Showing posts with the label lustre. Show all posts

Tuesday, 13 February 2018

HDF5 v1.10 or above on the Lustre FS

If you are unable to open files on a Lustre filesystem with HDF5 library v1.10 or above, you should mount the filesystem with the localflock option. The reason is that flock support is absent by default on Lustre mounts, while HDF5 1.10 uses file locking when opening files.

h5ls --version
h5ls: Version 1.10.1
Error:
h5ls snapshot.hdf5 snapshot_063.hdf5: unable to open file
What?!

mount -t lustre -o localflock 10.10.10.10@o2ib:/lustre /lustre

Now everything is ok!!
 
h5ls snapshot.hdf5
Header                   Group
PartType0                Group 
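To check in advance whether locking works on a mount point, one can run a small probe with the flock utility (a sketch; /tmp stands in here for the Lustre mount point — on a Lustre FS mounted without localflock/flock this reports "no"):

```shell
# Probe whether flock-style locking works in a directory.
# HDF5 >= 1.10 needs working file locks on the target filesystem.
dir=/tmp                                  # stand-in for the Lustre mount point
tmpf=$(mktemp "$dir/flock-test.XXXXXX")   # temporary file to lock
if flock -n "$tmpf" true; then lock_ok=yes; else lock_ok=no; fi
rm -f "$tmpf"
echo "flock on $dir: $lock_ok"
```

As an alternative to remounting, recent HDF5 1.10 releases also honor the environment variable HDF5_USE_FILE_LOCKING=FALSE, which makes the library skip locking entirely.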

Wednesday, 10 January 2018

Lustre: a big performance hit on lfs find after the patches for CVE-2017-5754, CVE-2017-5753, CVE-2017-5715 (Meltdown/Spectre)


Enable: sh set-protection.sh 1
time lfs find /lustre/arm2arm/| wc -l
1454706

real    0m14.941s
user    0m1.633s
sys    0m10.770s

Disable: sh set-protection.sh 0

time lfs find /lustre/arm2arm/| wc -l
1454706

real    0m10.468s
user    0m0.959s
sys    0m5.521s

Let us hope that the situation will change in the near future....

And the content of set-protection.sh is:

#!/bin/bash
# Mount debugfs if it is not mounted yet, then write 0/1 into the mitigation knobs.
[ ! -d /sys/kernel/debug/x86 ] && mount -t debugfs debugfs /sys/kernel/debug
echo "$1" > /sys/kernel/debug/x86/pti_enabled
echo "$1" > /sys/kernel/debug/x86/ibrs_enabled
echo "$1" > /sys/kernel/debug/x86/ibpb_enabled
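To see the current state of these knobs before toggling them, a small loop over the same debugfs files can help (a sketch; on kernels without this RHEL/CentOS debugfs interface each knob is reported as "not available"):

```shell
# Report the current state of each mitigation knob used by set-protection.sh.
status=$(for k in pti ibrs ibpb; do
    f="/sys/kernel/debug/x86/${k}_enabled"
    if [ -r "$f" ]; then
        echo "$k: $(cat "$f")"     # 0 = disabled, 1 = enabled
    else
        echo "$k: not available"   # debugfs interface absent on this kernel
    fi
done)
echo "$status"
```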

Tuesday, 28 November 2017

Lustre 2.10.x dkms usage

Get the source for the lustre-client-dkms package from:
https://build.hpdd.intel.com/job/lustre-b2_10/arch=x86_64,build_type=client,distro=el7,ib_stack=inkernel/

1) rpmbuild --rebuild --without servers lustre-client-dkms-2.10.2_RC1-1.el7.src.rpm
2) yum install /root/rpmbuild/RPMS/noarch/lustre-client-dkms-2.10.2_RC1-1.el7.centos.noarch.rpm
It will rebuild the modules for the active kernel with a command like:
 
/bin/bash /sbin/dkms build -m lustre-client -v 2.10.2_RC1 -k 3.10.0-693.5.2.el7.x86_64

PS: troubleshooting
If dkms status shows something like this:
 dkms status
lustre-client, 2.10.2_RC1, 3.10.0-693.5.2.el7.x86_64, x86_64: installed (original_module exists) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
1) dkms --force remove  -m  lustre-client -v 2.10.2_RC1 -k 3.10.0-693.5.2.el7.x86_64
2) rm -fr  /lib/modules/3.10.0-693.5.2.el7.x86_64/extra/lustre
3) find and manually remove any module leftovers: find /lib/modules/ | grep lustre
4) dkms --force install -m  lustre-client -v 2.10.2_RC1 -k 3.10.0-693.5.2.el7.x86_64
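The warning above means that the module dkms built and the copy installed under /lib/modules differ. The check behind it can be reproduced by comparing the two files (a demo with stand-in files under /tmp so it is runnable anywhere; on a real node you would compare the .ko under /var/lib/dkms/lustre-client/... with the one under /lib/modules/$(uname -r)/extra/ — those paths are assumptions):

```shell
# Demo: the built and the installed module should be byte-identical.
printf 'module-bytes' > /tmp/built.ko       # stands in for the dkms-built .ko
printf 'module-bytes' > /tmp/installed.ko   # stands in for the installed copy
if cmp -s /tmp/built.ko /tmp/installed.ko; then same=yes; else same=no; fi
echo "modules identical: $same"
```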

Tuesday, 12 September 2017

Again lustre upgrade troubles: unable to mount after upgrade to 2.10.x

The failure is the following:
LDISKFS-fs (sda): Unrecognized mount option "context="unconfined_u:object_r:user_tmp_t:s0"" or missing value
But SELinux is disabled, so we should remove that mount option.
To fix it:
tunefs.lustre --mountfsoptions="user_xattr,errors=remount-ro" /dev/sda
Now it mounts as expected.

Recover a single file on the Lustre due to the OST corruption.

Assume OST number 4 is corrupted. To get the OST IDs on the client one can use lfs df:
lfs df /archive
UUID                   1K-blocks        Used   Available Use% Mounted on
arch-MDT0000_UUID       99428812    44432680    48286884  48% /archive[MDT:0]
arch-OST0000_UUID    63838042896 46275072544 14344891120  76% /archive[OST:0]
arch-OST0001_UUID    63838042896 46036859640 14583104024  76% /archive[OST:1]
arch-OST0002_UUID    63838042896 34406650692 26213311960  57% /archive[OST:2]
arch-OST0003_UUID    63838042896 39355270936 21264676344  65% /archive[OST:3]
arch-OST0004_UUID    63838042896  7102256308 53517690972  12% /archive[OST:4]
If the OST got corrupted, the file attributes are still on the MDS, so we can filter out all corrupted files as follows:

lfs find /archive --ost 4  -type f | xargs -I% sh -c "[ ! -f % ]&& echo %" | tee -a recover/todo/OST04-corrupted.txt
This assumes that filenames do not contain spaces or other nasty characters; for the general case one should use a Python script for proper handling of the filenames.
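A middle ground that at least survives spaces is to replace xargs with a while-read loop (a sketch; demonstrated on stand-in paths under /tmp so it is runnable — in production the list would come from lfs find /archive --ost 4 -type f as above):

```shell
# Demo of a space-safe variant of the corruption filter: read paths line
# by line and keep only those that no longer exist on disk.
mkdir -p "/tmp/demo ost"
: > "/tmp/demo ost/still here.txt"        # this file exists
missing=$(printf '%s\n' "/tmp/demo ost/still here.txt" "/tmp/demo ost/gone file.txt" |
while IFS= read -r f; do
    [ ! -f "$f" ] && printf '%s\n' "$f"   # print only the missing (corrupted) ones
done)
echo "$missing"
```

This still cannot handle filenames containing newlines; for those, null-terminated handling in a Python script remains the safe option.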

Now, once the file list is complete, one can use the following script to copy the backup files over the corrupted ones; here we assume the backup path is /backup and the target path is /archive:

cat recover.sh
#!/bin/bash
# For each listed path that has a backup copy, emit an unlink+copy command;
# parallel then executes the generated commands.
file="$1"
cat "$file" | xargs -I{} sh -c "[ -f \"/backup{}\" ] && echo 'unlink \"{}\"; cp -a \"/backup{}\" \"{}\"'" | parallel --progress --eta

./recover.sh recover/todo/OST04-corrupted.txt
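The unlink-then-copy step itself can be exercised on stand-in trees before running it against the real filesystem (a demo; /tmp/backup and /tmp/archive stand in for /backup and the target tree):

```shell
# Demo of the recover step: replace a "corrupted" target file with its backup.
mkdir -p /tmp/backup/data /tmp/archive/data
echo good    > /tmp/backup/data/a.txt     # intact backup copy
echo corrupt > /tmp/archive/data/a.txt    # corrupted target
echo "/data/a.txt" | while IFS= read -r f; do
    [ -f "/tmp/backup$f" ] && { rm -f "/tmp/archive$f"; cp -a "/tmp/backup$f" "/tmp/archive$f"; }
done
recovered=$(cat /tmp/archive/data/a.txt)
echo "$recovered"
```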

Tuesday, 5 September 2017

Migrating lustre from 2.9.x to 2.10.x: Network setup

If you upgrade Lustre 2.9.0 to 2.10, the lnet service will fail due to the misconfigured /etc/lnet.conf which comes with the lustre-2.10.0-1.el7.x86_64 server package.
In 2.9 the line "options lnet networks=o2ib(ib0),o2ib3(ib0)" in /etc/modprobe.d/lnet.conf usually brings the LNet network up, but in 2.10 it is simply ignored. This is due to the new network management in 2.10.x and above.
Digging a little bit more, it looks like the daemon is unable to load /etc/lnet.conf.

The temporary solution I found is to set the LNet network manually:
lnetctl net add --net o2ib --if ib0
Then dump the YAML of the working configuration and restart lnet:
lnetctl net show --verbose >/etc/lnet.conf
service lnet stop
service lnet start
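For reference, the dumped /etc/lnet.conf ends up looking roughly like this (a sketch of the lnetctl net show YAML; the NID and interface name depend on your fabric):

```yaml
net:
    - net type: o2ib
      local NI(s):
        - nid: 10.10.10.1@o2ib
          interfaces:
              0: ib0
```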


Thursday, 2 February 2017

Lustre 2.9: migrate data from an OST

One of the OSTs is full and we got a new OST; let us migrate files off the full one:

lfs find /archive  -obd arch-OST0001_UUID -type f -size +1M | lfs_migrate -y