Tuesday, 12 September 2017

Lustre upgrade troubles again: unable to mount after upgrading to 2.10.x

The mount fails with the following error:
LDISKFS-fs (sda): Unrecognized mount option "context="unconfined_u:object_r:user_tmp_t:s0"" or missing value
SELinux is disabled, however, so this mount option should be removed.
To fix it, reset the stored mount options:
tunefs.lustre --mountfsoptions="user_xattr,errors=remount-ro" /dev/sda
After that, the target mounts as expected.

Recovering individual files on Lustre after an OST corruption

Assume that OST number 4 is corrupted. To get the OST index on the client, use lfs df:
lfs df /archive
UUID                   1K-blocks        Used   Available Use% Mounted on
arch-MDT0000_UUID       99428812    44432680    48286884  48% /archive[MDT:0]
arch-OST0000_UUID    63838042896 46275072544 14344891120  76% /archive[OST:0]
arch-OST0001_UUID    63838042896 46036859640 14583104024  76% /archive[OST:1]
arch-OST0002_UUID    63838042896 34406650692 26213311960  57% /archive[OST:2]
arch-OST0003_UUID    63838042896 39355270936 21264676344  65% /archive[OST:3]
arch-OST0004_UUID    63838042896  7102256308 53517690972  12% /archive[OST:4]
If an OST is corrupted, the file attributes are still available on the MDS, so we can list all affected files as follows:

lfs find /archive --ost 4  -type f | xargs -I% sh -c "[ ! -f % ]&& echo %" | tee -a recover/todo/OST04-corrupted.txt
This assumes that the file names contain no spaces or other awkward characters. For the general case, use a Python script that handles the file names properly.
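
As a lighter alternative to a full script, the same filter can be sketched as a shell loop that reads the list line by line, which copes with spaces and quotes in file names (but still not with embedded newlines). The function name list_unreadable is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: read one path per line on stdin and print every
# path that is NOT accessible as a regular file.
# Spaces, quotes and backslashes in names are preserved; names with
# embedded newlines are not handled.
list_unreadable() {
    local path
    while IFS= read -r path; do
        [ -f "$path" ] || printf '%s\n' "$path"
    done
}

# Usage on a Lustre client (same paths as in the one-liner above):
#   lfs find /archive --ost 4 -type f | list_unreadable | tee -a recover/todo/OST04-corrupted.txt
```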

Once the file list is complete, the following script replaces the corrupted files with their backup copies. Here we assume that the backup path is /backup and the target path is /archive:

 cat recover.sh
#!/bin/bash
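# Usage: ./recover.sh <file-list>, one path per line
file="$1"
# For each listed path that exists under /backup, emit an "unlink + copy" command line; parallel runs them.
xargs -I{} sh -c "[ -f \"/backup{}\" ] && echo 'unlink \"{}\"; cp -a \"/backup{}\" \"{}\"'" < "$file" | parallel --progress --eta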

./recover.sh recover/todo/OST04-corrupted.txt
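
If file names with spaces are a concern, the restore step can also be sketched as a plain sequential loop without xargs and parallel. This is slower, and the function name restore_from_backup and its optional second argument are assumptions for this example, not part of the original script.

```shell
#!/bin/bash
# restore_from_backup LIST [BACKUP_ROOT]   (hypothetical helper)
# LIST holds one absolute path per line; BACKUP_ROOT defaults to /backup.
restore_from_backup() {
    local list="$1" backup_root="${2:-/backup}" path
    while IFS= read -r path; do
        # Skip empty lines and files that have no backup copy.
        [ -n "$path" ] || continue
        [ -f "${backup_root}${path}" ] || continue
        # Remove the damaged entry first, then copy with attributes preserved.
        unlink "$path" 2>/dev/null || true
        cp -a -- "${backup_root}${path}" "$path" && printf 'restored %s\n' "$path"
    done < "$list"
}

# Usage:
#   restore_from_backup recover/todo/OST04-corrupted.txt /backup
```

Files without a backup copy are silently skipped, as in recover.sh, so it is worth comparing the number of "restored" lines with the length of the list afterwards.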

Friday, 8 September 2017

Running ANSYS on Linux

Check which version is installed:
ls /ansys_inc/
v172
v182

To run v182, simply start:
/ansys_inc/v182/Framework/bin/Linux64/runwb2

Wednesday, 6 September 2017

Recovering UEFI boot on new laptops: Windows 10 + Linux

On a new laptop the boot loader somehow reverted to booting Windows 10 only. To recover the GRUB entry, run the following command in cmd.exe as Administrator:
 bcdedit /set {bootmgr} path \EFI\ubuntu\grubx64.efi

Tuesday, 5 September 2017

Migrating Lustre from 2.9.x to 2.10.x: network setup

If you upgrade Lustre from 2.9.0 to 2.10, the lnet service will fail because of the misconfigured /etc/lnet.conf that ships with the lustre-2.10.0-1.el7.x86_64 server package.
In 2.9, the line "options lnet networks=o2ib(ib0),o2ib3(ib0)" in /etc/modprobe.d/lnt.conf usually brings the LNet network up, but in 2.10 it is simply ignored. This is due to the new network management in 2.10.x and above.
Digging a little deeper, it looks like the daemon is unable to load /etc/lnet.conf.

The temporary solution I found is to set up the LNet network manually:
lnetctl net add --net o2ib --if ib0
Then dump the working configuration as a YAML file and restart lnet:
lnetctl net show --verbose >/etc/lnet.conf
service lnet stop
service lnet start
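
When the old modprobe line lists several networks, each of them needs its own "lnetctl net add" call. The sketch below turns the value of the old networks= option into the matching commands and only prints them, so they can be reviewed before running. The function name networks_to_lnetctl is made up for this example, and it assumes exactly one interface per network, as in the line quoted above.

```shell
#!/bin/bash
# Hypothetical helper: convert the value of the old
# "options lnet networks=..." line into lnetctl commands.
# Assumes entries of the form net(iface) with ONE interface each.
networks_to_lnetctl() {
    local spec="$1" entry net iface
    local IFS=','
    for entry in $spec; do
        case "$entry" in
            *"("*")") ;;      # looks like net(iface): handle it
            *) continue ;;    # anything else is skipped
        esac
        net=${entry%%(*}      # text before the first "("
        iface=${entry#*(}     # text after the first "("
        iface=${iface%)}      # drop the closing ")"
        printf 'lnetctl net add --net %s --if %s\n' "$net" "$iface"
    done
}

# Usage:
#   networks_to_lnetctl 'o2ib(ib0),o2ib3(ib0)'
```

For the modprobe line quoted above this prints one command for o2ib and one for o2ib3, both on ib0.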