Sunday, January 5, 2014

Lustre 2.4.2 performance tests: ZFS vs ext4 backends.

The fastest MySQL performance I have seen was on RAID10 with 24 x 4 TB SAS disks.
Before the Lustre tests, dd on both backends was delivering almost 2.5 GB/s:
seq 1 8 | xargs -I{} echo "dd if=/dev/zero of=/data/test.{} bs=4M count=1000 &" | sh

Next I tested with the Lustre 2.4.2 MDT:
1) ZFS 0.6.2 on CentOS 6.5
RAID10-ZFS (R0)
mds:>thrhi=16 dir_count=16 file_count=200000 mds-survey
Mon Dec 23 14:48:36 CET 2013 /usr/bin/mds-survey from test.host
mdt 1 file  200000 dir   16 thr   16 create 9185.17 [   0.00,15999.01] lookup 421928.47 [421928.47,421928.47] md_getattr 392586.40 [392586.40,392586.40] setxattr 56024.07 [25998.10,29999.37] destroy 10387.71 [   0.00,15999.41]
done!

2) RAID10-ldiskfs
mds:> thrhi=16 dir_count=16 file_count=200000 mds-survey
Mon Dec 23 15:06:46 CET 2013 /usr/bin/mds-survey from test.host
mdt 1 file  200000 dir   16 thr   16 create 126533.11 [126533.11,126533.11] lookup 1129547.84 [1129547.84,1129547.84] md_getattr 776042.03 [776042.03,776042.03] setxattr 115977.53 [115977.53,115977.53] destroy 156916.53 [156916.53,156916.53]
done!

The result: ZFS is slower than ldiskfs. The worst cases are create (~14x) and destroy (~15x), which are more than 10 times slower than the ext4-based ldiskfs; lookup, md_getattr and setxattr are about 2-3 times slower.

One possible reason is that ZFS was running on top of a hardware RAID10. With a plain HBA, letting ZFS manage the disks itself, it might perform better.

Got a bad Lustre OST? Disable it!!

DEACTIVATE WRITES to OST
=========================
mgs:> lctl conf_param fsname-OST0002.osc.active=0
 

Check which OSTs are active or inactive:

cat /proc/fs/lustre/lov/fsname-MDT0000-mdtlov/target_obd
 
Simply replace fsname with the name of your file system.
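For example, for a file system named lustre01 (a made-up name here), deactivating OST0002 and later re-activating it would look like this:

mgs:> lctl conf_param lustre01-OST0002.osc.active=0
mgs:> lctl conf_param lustre01-OST0002.osc.active=1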

Before taking an OST out of service or re-formatting it you can move the data off it.
Assuming we want to move all data off OST0002, run this on a client:
lfs find /lustre --ost fsname-OST0002 | xargs -P4 -I{} /opt/admin/rebalance-lustre.sh {}
 
And the script:
cat /opt/admin/rebalance-lustre.sh
#!/bin/bash
# copy the file to a temporary name (the new copy gets allocated on the
# remaining active OSTs), then swap it back into place
file="$1"
fn=$(mktemp --dry-run "${file}.lureb-XXXXX")
echo "$file" "$fn"
cp -a "$file" "$fn" && unlink "$file" && mv "$fn" "$file"

=====================================================
Note: this does not work properly with file names that contain special characters such as spaces.
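If file names may contain spaces or other special characters, a null-delimited pipeline avoids the problem. This is only a sketch and assumes your lfs supports the --print0/-0 option (check lfs help find):

lfs find /lustre --ost fsname-OST0002 --print0 | xargs -0 -P4 -I{} /opt/admin/rebalance-lustre.sh {}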

Thursday, January 2, 2014

Upgrading Lustre from 2.4.1 to 2.4.2

For the upgrade it is recommended to unmount the clients, but if you cannot, as in my case, just leave them mounted.

Before the upgrade make sure the servers are stopped:
  1. umount OST 
  2. umount MDT
  3. umount MGS

Note that the order is important!
On one server I got a kernel panic because the MGS was unmounted before the OSTs went down.
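A minimal sketch of the shutdown, with made-up mount points (use your own):

# on every OSS: unmount all OSTs first
umount /mnt/lustre/ost0
# on the MDS: unmount the MDT
umount /mnt/lustre/mdt
# on the MGS (often the same node as the MDS): unmount the MGS last
umount /mnt/lustre/mgs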


Then remove all Lustre modules from the servers:
service lustre stop; service lnet stop; lustre_rmmod; service lnet stop
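A quick check that nothing is left loaded before installing the new RPMs:

lsmod | egrep 'lnet|lustre|libcfs|obdclass'

If this prints nothing, all Lustre modules are gone.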



Download the latest e2fsprogs from:
http://downloads.whamcloud.com/public/e2fsprogs/

NOTE: if you google "lustre e2fsprogs" you will end up with tons of wrong/old links.
The up-to-date ones are from whamcloud.com!!!

Install e2fsprogs:
yum install e2fsprogs-1.42.7.wc2-7.el6.x86_64.rpm e2fsprogs-libs-1.42.7.wc2-7.el6.x86_64.rpm libcom_err-1.42.7.wc2-7.el6.x86_64.rpm libss-1.42.7.wc2-7.el6.x86_64.rpm
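A quick check that the Whamcloud build is now the installed one (the version should carry the .wc suffix):

rpm -q e2fsprogs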

NOTE: once you have installed these e2fsprogs packages you cannot simply remove them any more.

The last step is installing all RPMs from the latest maintenance release.

Download the recent Lustre release from:
http://downloads.whamcloud.com/public/lustre/latest-maintenance-release/el6/server/RPMS/x86_64/

NOTE: If you don't use the ZFS back-end, do not install lustre-osd-zfs*.
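A minimal sketch of the server install, assuming an ldiskfs-only server and that the downloaded RPMs sit in the current directory (the globs are only illustrative, use the exact filenames you downloaded):

yum install ./kernel-*lustre*.rpm ./lustre-2.4.2-*.rpm ./lustre-modules-*.rpm ./lustre-osd-ldiskfs-*.rpm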

Reboot the servers and mount, in this order:
  1. MGS
  2. MDT
  3. OSTs 
Wait until recovery is finished (you can check this as shown below). Then go to the clients; df -h should work again.
During the first hour on a heavily loaded cluster you will probably see very high load on the OSTs and the MDT. This is expected, due to parallel commits from the clients.
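To see whether recovery has finished you can look at the recovery_status files under /proc (paths as I know them from 2.4 servers, adjust if yours differ):

# on the OSS nodes
cat /proc/fs/lustre/obdfilter/*/recovery_status
# on the MDS
cat /proc/fs/lustre/mdt/*/recovery_status

The status field shows RECOVERING while clients reconnect and COMPLETE when recovery is done.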

Silence the message: "padlock: VIA PadLock Hash Engine not detected."

If your motherboard has no PadLock device, then on every service lnet start you will get error messages like these in dmesg:
alg: No test for crc32 (crc32-table)
alg: No test for adler32 (adler32-zlib)
alg: No test for crc32 (crc32-pclmul)

padlock: VIA PadLock Hash Engine not detected.


or similar.

To silence the padlock message, simply add this line to your /etc/modprobe.d/blacklist.conf:

blacklist padlock
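For example, as root:

echo "blacklist padlock" >> /etc/modprobe.d/blacklist.conf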

Lustre 2.4.2 client on SL 6.x: recompile for older kernels.

It is a fact that in cluster environments it is not always possible to keep the Linux kernel up to date.
Of course it is recommended to update the kernels on the whole cluster from time to time when critical security bugs are fixed.
Recently Lustre 2.4.2 has been released, built against kernel-2.6.32-358.23.2.el6.
If you run a different kernel it is not a problem to recompile the client.
In order to recompile:
1) download the client source RPM:
wget http://downloads.whamcloud.com/public/lustre/latest-maintenance-release/el6/client/SRPMS/lustre-client-2.4.2-2.6.32_358.23.2.el6.x86_64.src.rpm
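Before running the rebuild in step 2 you also need the kernel headers for the running kernel and the usual build tools; a minimal sketch, assuming yum:

yum install "kernel-devel-$(uname -r)" gcc make rpm-build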

2) rebuild using the following command:

rpmbuild --rebuild --define 'lustre_name lustre-client' --define 'configure_args --disable-server' lustre-client-2.4.2-2.6.32_358.23.2.el6.x86_64.src.rpm

If you run this as root, the new RPMs will be generated in:
/root/rpmbuild/RPMS/x86_64

To install:
cd /root/rpmbuild/RPMS/x86_64 && yum install ./*.rpm

The last step is to reload the lnet and lustre modules before remounting the Lustre filesystem:
service lustre stop; service lnet stop; lustre_rmmod; service lnet stop

Sometimes the osc still has open references, therefore we need to stop lnet twice.

Finally:
service lnet start;modprobe lustre;dmesg

The dmesg output should contain something like:
Lustre: Lustre: Build Version: 2.4.2-RC2--PRISTINE-2.6.32-358.14.1.el6.x86_64

If you would like to check which version of the client is running, just look at:
cat /proc/fs/lustre/version
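It typically contains something like this (format as I know it from 2.4 clients; the build string will match your own rebuild):

lustre: 2.4.2
kernel: patchless_client
build:  2.4.2-RC2--PRISTINE-2.6.32-358.14.1.el6.x86_64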

So we did it: a Lustre client upgrade without rebooting. Have fun with the new Lustre client.