If you are unable to read files on a Lustre filesystem with HDF5 library v1.10 and above, mount the FS with the localflock option. The reason is that flock support is absent by default.
h5ls --version
h5ls: Version 1.10.1
Error:
h5ls snapshot_063.hdf5
snapshot_063.hdf5: unable to open file
What?!
mount -t lustre -o localflock 10.10.10.10@o2ib:/lustre /lustre
Now everything is ok!!
h5ls snapshot.hdf5
Header Group
PartType0 Group
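To confirm that the option actually took effect, you can check the mount table. A minimal sketch: `check_flock` is a hypothetical helper, with the mounts file parameterized (it defaults to /proc/mounts; /lustre is the mount point from this post):

```shell
#!/bin/sh
# Check whether a mount point carries the flock or localflock option.
# $1 = mount point, $2 = mounts table file (defaults to /proc/mounts).
check_flock() {
    mnt=$1
    tab=${2:-/proc/mounts}
    # field 2 is the mount point, field 4 the comma-separated options
    awk -v m="$mnt" '$2 == m { print $4 }' "$tab" \
        | grep -qE '(^|,)(local)?flock(,|$)' \
        && echo "flock enabled on $mnt" \
        || echo "flock missing on $mnt"
}
```

Running `check_flock /lustre` on the client then tells you whether remounting is needed.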
Tuesday, 13 February 2018
Wednesday, 10 January 2018
Lustre: a big performance hit on lfs find after patching CVE-2017-5754, CVE-2017-5753, CVE-2017-5715
Enable: sh set-protection.sh 1
time lfs find /lustre/arm2arm/ | wc -l
1454706
real 0m14.941s
user 0m1.633s
sys 0m10.770s
Disable: sh set-protection.sh 0
time lfs find /lustre/arm2arm/ | wc -l
1454706
real 0m10.468s
user 0m0.959s
sys 0m5.521s
Let us hope that the situation will change in the near future....
The content of set-protection.sh is:
#!/bin/bash
# 0 disables, 1 enables the Meltdown/Spectre mitigations via debugfs
[ ! -d /sys/kernel/debug/x86 ] && mount -t debugfs debugfs /sys/kernel/debug
echo "$1" > /sys/kernel/debug/x86/pti_enabled
echo "$1" > /sys/kernel/debug/x86/ibrs_enabled
echo "$1" > /sys/kernel/debug/x86/ibpb_enabled
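To read the current state back before and after toggling, a small helper can print the three debugfs files. A sketch: `show_protection` is a hypothetical helper, with the directory parameterized so it can be pointed at /sys/kernel/debug/x86 on a real host:

```shell
#!/bin/sh
# Print the current value of each mitigation toggle.
# $1 = toggle directory (default: /sys/kernel/debug/x86).
show_protection() {
    d=${1:-/sys/kernel/debug/x86}
    for f in pti_enabled ibrs_enabled ibpb_enabled; do
        # only print toggles that actually exist on this kernel
        [ -r "$d/$f" ] && echo "$f=$(cat "$d/$f")"
    done
}
```

Run `show_protection` as root (debugfs is not readable by normal users).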
Tuesday, 28 November 2017
Lustre 2.10.x DKMS usage
Get the source for the lustre-client-dkms package from:
https://build.hpdd.intel.com/job/lustre-b2_10/arch=x86_64,build_type=client,distro=el7,ib_stack=inkernel/
1) rpmbuild --rebuild --without servers lustre-client-dkms-2.10.2_RC1-1.el7.src.rpm
2) yum install /root/rpmbuild/RPMS/noarch/lustre-client-dkms-2.10.2_RC1-1.el7.centos.noarch.rpm
It will rebuild the modules for the active kernel with the following command:
/bin/bash /sbin/dkms build -m lustre-client -v 2.10.2_RC1 -k 3.10.0-693.5.2.el7.x86_64
PS: troubleshooting
If dkms status shows something like this:
dkms status
lustre-client, 2.10.2_RC1, 3.10.0-693.5.2.el7.x86_64, x86_64: installed (original_module exists) (WARNING! Diff between built and installed module!)
then:
1) dkms --force remove -m lustre-client -v 2.10.2_RC1 -k 3.10.0-693.5.2.el7.x86_64
2) rm -fr /lib/modules/3.10.0-693.5.2.el7.x86_64/extra/lustre
3) find and remove the module leftovers manually: find /lib/modules/ | grep lustre
4) dkms --force install -m lustre-client -v 2.10.2_RC1 -k 3.10.0-693.5.2.el7.x86_64
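Step 3 can be done with a helper that only lists the leftovers, so you can review before deleting anything. A sketch: `lustre_leftovers` is a hypothetical helper, with the modules root parameterized (use /lib/modules on a real host):

```shell
#!/bin/sh
# List lustre module leftovers under a modules tree, without deleting.
# $1 = modules root (default: /lib/modules).
lustre_leftovers() {
    root=${1:-/lib/modules}
    # matches both leftover directories and .ko files
    find "$root" -name '*lustre*' 2>/dev/null
}
```

If the output is empty, the forced reinstall in step 4 starts from a clean tree.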
Tuesday, 12 September 2017
Lustre upgrade troubles again: unable to mount after upgrading to 2.10.x
The failure is the following:
LDISKFS-fs (sda): Unrecognized mount option "context="unconfined_u:object_r:user_tmp_t:s0"" or missing value
But SELinux is disabled, so we should remove that mount option. To fix it:
tunefs.lustre --mountfsoptions="user_xattr,errors=remount-ro" /dev/sda
Now it mounts as expected.
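The string surgery can be scripted if you have several targets: drop any context= entry from a comma-separated option string before handing the result to tunefs.lustre --mountfsoptions. A sketch of the string manipulation only (`strip_context` is a hypothetical helper; the tunefs.lustre call itself needs the real device):

```shell
#!/bin/sh
# Remove any context=... entry from a comma-separated mount-option string.
strip_context() {
    # split on commas, drop context= lines, join back with commas
    printf '%s\n' "$1" | tr ',' '\n' | grep -v '^context=' | paste -sd, -
}
```

For example, `strip_context 'user_xattr,context="unconfined_u:object_r:user_tmp_t:s0",errors=remount-ro'` prints `user_xattr,errors=remount-ro`.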
Recover files on Lustre after an OST corruption.
Assuming OST number 4 is corrupted. To get the OST ID on the client, one can use lfs df:
lfs df /archive
UUID 1K-blocks Used Available Use% Mounted on
arch-MDT0000_UUID 99428812 44432680 48286884 48% /archive[MDT:0]
arch-OST0000_UUID 63838042896 46275072544 14344891120 76% /archive[OST:0]
arch-OST0001_UUID 63838042896 46036859640 14583104024 76% /archive[OST:1]
arch-OST0002_UUID 63838042896 34406650692 26213311960 57% /archive[OST:2]
arch-OST0003_UUID 63838042896 39355270936 21264676344 65% /archive[OST:3]
arch-OST0004_UUID 63838042896 7102256308 53517690972 12% /archive[OST:4]
If the OST got corrupted, the file attributes are still on the MDS, so we can filter all corrupted files as follows:
lfs find /archive --ost 4 -type f | xargs -I% sh -c "[ ! -f % ] && echo %" | tee -a recover/todo/OST04-corrupted.txt
This assumes that the file names contain no spaces or other nasty characters; for the general case one should use a script with proper handling of the filenames.
Now once the file list is complete, one can use the following script to copy the backup files over the corrupted ones. Here we assume the backup path is /backup and the target path is /archive:
cat recover.sh
#!/bin/bash
# Replace each corrupted file with its copy from /backup, in parallel.
file=$1
cat "$file" | xargs -I{} sh -c "[ -f \"/backup{}\" ] && echo 'unlink \"{}\"; cp -a \"/backup{}\" \"{}\"'" | parallel --progress --eta
./recover.sh recover/todo/OST04-corrupted.txt
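For file names containing spaces (though still not embedded newlines), a read-loop variant avoids the xargs quoting gymnastics. A sketch: `recover_safe` is a hypothetical helper, with the backup root parameterized (the post uses /backup, and the list holds absolute target paths as produced by lfs find):

```shell
#!/bin/sh
# Copy backup copies over corrupted files, one absolute path per list line.
# $1 = file list, $2 = backup root (default: /backup).
recover_safe() {
    list=$1
    broot=${2:-/backup}
    while IFS= read -r f; do
        if [ -f "$broot$f" ]; then
            rm -f -- "$f"               # like unlink in the original script
            cp -a -- "$broot$f" "$f"
        else
            echo "no backup for: $f" >&2
        fi
    done < "$list"
}
```

This is sequential rather than parallel; for large lists the parallel pipeline above is faster.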
Tuesday, 5 September 2017
Migrating lustre from 2.9.x to 2.10.x: Network setup
If you upgrade Lustre 2.9.0 to 2.10, the lnet service will fail due to the misconfigured /etc/lnet.conf that comes with the lustre-2.10.0-1.el7.x86_64 server package.
In 2.9 the line "options lnet networks=o2ib(ib0),o2ib3(ib0)" in /etc/modprobe.d/lnet.conf usually brings up the LNet network, but in 2.10 it is simply ignored. This is due to the new network management in 2.10.x and above.
Digging a little bit more, it looks like the daemon is unable to load /etc/lnet.conf.
The temporary solution I found is to set up the LNet network manually:
lnetctl net add --net o2ib --if ib0
Then dump a YAML file of the working configuration and restart lnet:
lnetctl net show --verbose >/etc/lnet.conf
service lnet stop
service lnet start
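For reference, the dumped /etc/lnet.conf is YAML; it looks roughly like the fragment below. The exact keys and values vary between 2.10.x releases and your fabric, so treat this as illustrative only and always generate yours with lnetctl net show as above:

```yaml
net:
    - net type: o2ib
      local NI(s):
        - nid: 10.10.10.10@o2ib
          status: up
          interfaces:
              0: ib0
```

If the lnet service still fails, check that this file parses by feeding it back with lnetctl import.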
Thursday, 2 February 2017
Lustre 2.9: migrate data from an OST
One of the OSTs is full and we got a new OST, so let us migrate:
lfs find /archive -obd arch-OST0001_UUID -type f -size +1M | lfs_migrate -y
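lfs find is Lustre-aware (it can filter by OST), but the size part of the selection can be cross-checked on any filesystem with plain find. A generic sketch of "regular files over a threshold under a path" (`big_files` is a hypothetical helper; the +1M syntax is GNU find's):

```shell
#!/bin/sh
# List regular files larger than a given size threshold.
# $1 = path, $2 = find(1) size expression, e.g. +1M.
big_files() {
    find "$1" -type f -size "$2"
}
```

Comparing `big_files /archive +1M | wc -l` with the lfs find count gives a rough sanity check of how much data the migration will move.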