Wednesday, 23 January 2013

Installing SRPT on Scientific Linux 6.3/6.x PART-1

All these instructions assume that you have machines with IB cards connected to an IB switch.
My tests were done with Mellanox DDR (ConnectX-2, 20 Gb/s) and QDR (ConnectX-3, 40 Gb/s) cards.
Assume machineA has a block device/RAID6 at /dev/vg00/lv_01 and you want to export it to machineB as a block/SCSI device.

Why do we need it?
Situation 1: We bought a bunch of RAID boxes, each with 24 HDDs and very fast local IO. Can we use their full power for an MS SQL or MySQL database? Can we combine them into one striped LVM on the DB server?
Situation 2: We have a server with a huge RAID in LVM and we are running out of space.
Can we expand the LVM with another RAID without plugging in yet another physical RAID?
Can we expand the LVM without any downtime of the server? Yes we can, if we bring the new, empty RAID to the server over InfiniBand, as sketched below.
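To make Situation 2 concrete, here is a minimal sketch of the LVM side, assuming the SRP-attached RAID shows up on the server as /dev/sdx (a hypothetical device name) and that vg00/lv_01 carries an ext4 filesystem:

# the exported RAID appears as a plain local SCSI disk, e.g. /dev/sdx (assumption)
pvcreate /dev/sdx                      # turn it into an LVM physical volume
vgextend vg00 /dev/sdx                 # grow the volume group with the new disk
lvextend -l +100%FREE /dev/vg00/lv_01  # hand all new space to the logical volume
resize2fs /dev/vg00/lv_01              # grow ext4 online, no downtime needed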

There are several solutions for this: iSCSI (TCP/IP) is slow, iSER (RDMA) is fast, and SRP (RDMA) is the fastest, with almost no overhead.

Yes, we can export a block device with the iSCSI protocol. One can export a block device from a Linux machine to a Windows machine over TCP/IP, or better, use IP-over-IB to get around 300 MB/s (a minimal IPoIB config is sketched below). Under concurrent and heavy loads iSCSI uses a lot of CPU, and in some cases performance drops to only 10% of the original RAID performance. Due to the overhead of the iSCSI protocol I never got more than 350 MB/s.
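For reference, a minimal IP-over-IB sketch on SL 6.x, assuming the first IB port shows up as ib0 and that 192.168.100.0/24 is a free subnet on the fabric (both assumptions):

# /etc/sysconfig/network-scripts/ifcfg-ib0 (hypothetical addresses)
DEVICE=ib0
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.168.100.1
NETMASK=255.255.255.0

# the ib_ipoib module must be loaded (normally done by the rdma service), then:
ifup ib0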

There is another, more advanced protocol: iSER, which is iSCSI over RDMA. It brings more performance; in my personal tests it reached about 30-40% of the original RAID. Unfortunately, under heavy loads kernel panics occur for unknown reasons. iSER performance was about 400-420 MB/s.

Reading the Mellanox white paper "Building a Scalable Storage with InfiniBand" motivated me to look into the SRP protocol.


Let us install SRPT and mount it on another machine.

Assuming machineA is the target and machineB is the initiator.
In plain language: machineA exports the block device and machineB is the client.

There are several steps to get the whole machinery running.
First of all, on a clean SL 6.3 installation do: yum -y update && yum -y upgrade.
The next step depends on your IB card: if the card already works with RDMA, you don't need OFED at all.
Some of my machines have QDR cards (Mellanox Technologies MT27500 Family [ConnectX-3]); they are fine with the 2.6.32-279.x kernels.
But another card, a dual-port DDR, does not work without OFED.
Getting RDMA running without and with OFED:
1) Without OFED.
In order to test your IB installation you can test the RDMA connection; the utilities come with librdmacm-utils (it contains some examples of how to use the CM library):
machineA:> rdma_server
then on machineB:
machineB:> rdma_client -s serverA.ibIP
The output should not contain negative results:
rdma_client: start
rdma_server: end 0
rdma_client: end 0
This looks nice.
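If the test fails, a couple of standard diagnostics help narrow things down (ibstat comes with infiniband-diags, ibv_devinfo with libibverbs-utils):

ibstat                            # port State should be Active, Physical state LinkUp
ibv_devinfo                       # the HCA should list its ports as PORT_ACTIVE
lsmod | grep -E 'ib_|rdma|mlx4'   # verify the IB/RDMA kernel modules are loaded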

2) With OFED we need to be careful: it ships a broken SRPT, and with that installed we will not be able to compile SCST properly.
2.1) In the OFED folder: ./configure -c myofed.conf
You can generate myofed.conf like this:
./install.pl -p
It will generate ofed-all.conf; rename it to myofed.conf.
Edit that file: set the options below and remove everything else you don't require, like mpi or sdp, etc.:
nfsrdma=n
rnfs-utils=n
srp=n
srpt=n
iser=n

I disabled nfs here as well, so we are not going to run the buggy NFS-RDMA on the SRPT server. iSER brings only about 30% of the original RAID performance, so we say no to the iSER/iSCSI protocols. The whole OFED step is consolidated in the sketch below.
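Putting it together, a consolidated sketch using only the commands shown above (the unpack path is an assumption; adjust to your OFED version):

cd /path/to/OFED              # wherever you unpacked the OFED tarball (assumption)
./install.pl -p               # generates ofed-all.conf with all available options
mv ofed-all.conf myofed.conf
vi myofed.conf                # set srp=n srpt=n iser=n nfsrdma=n rnfs-utils=n; drop mpi, sdp, ...
./configure -c myofed.conf    # configure/install OFED with the trimmed config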

To use SCST we need to compile it from source.
So let me explain the build steps:

WARNING: We are going to compile the target part; after this point, any kernel update will break the SRPT installation.


On the machine where you are going to compile the code, create a build user and give it sudo rights:
adduser mockbuild
visudo
Add the line:
mockbuild ALL=(ALL) NOPASSWD: ALL
We need to set up our rpmbuild environment:
mkdir -p ~/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
echo "%_topdir      %(echo $HOME)/rpmbuild
%_smp_mflags  -j3
%_signature gpg
%_gpg_name Your Name (Your Title)" > ~/.rpmmacros
Time to get some other directories prepared…
mkdir ~/scst
mkdir ~/mlx
cd ~/rpmbuild/SOURCES
wget http://downloads.sourceforge.net/project/scst/srpt/2.2.0/srpt-2.2.0.tar.bz2
wget http://downloads.sourceforge.net/project/scst/scst/2.2.0/scst-2.2.0.tar.bz2
wget http://downloads.sourceforge.net/project/scst/scstadmin/2.2.0/scstadmin-2.2.0.tar.bz2
wget http://downloads.sourceforge.net/project/scst/iscsi-scst/2.2.0/iscsi-scst-2.2.0.tar.bz2
cp ./srpt-2.2.0.tar.bz2 ~/scst
wget -O ~/rpmbuild/SPECS/scst.spec.newlines http://pastebin.com/raw.php?i=HLMskJKK
cd ~/rpmbuild/SPECS
tr -d '\r' < scst.spec.newlines > scst.spec
rm scst.spec.newlines
cd ~/scst
tar jxvf srpt-2.2.0.tar.bz2

# Patch the kernel; you can put this into a shell script and execute it later
if [ -e /lib/modules/$(uname -r)/build/scripts/Makefile.lib ]; then
    cd /lib/modules/$(uname -r)/build
else
    cd /usr/src/linux-$(uname -r)
fi
sudo patch -p1 < /home/mockbuild/scst/srpt-2.2.0/patches/kernel-$(uname -r | sed -e 's|-.*||')-pre-cflags.patch
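If you want a safety check first, GNU patch can simulate the run; a sketch using the same patch file:

# dry run: report what would change without writing anything
sudo patch -p1 --dry-run < /home/mockbuild/scst/srpt-2.2.0/patches/kernel-$(uname -r | sed -e 's|-.*||')-pre-cflags.patch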


# Compile and build the RPMs
cd ~/rpmbuild/SPECS
rpmbuild -ba scst.spec

If you see an error during the compilation, please follow its advice, which will tell you to remove the IB modules from the kernel path.
WARNING: Before installation remove /lib/modules/$(uname -r)/updates/drivers/infiniband/ulp/srpt if it exists (a guarded one-liner follows below)!!
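A safe way to do that removal is to guard it, so it only runs when a leftover directory from OFED actually exists:

# remove a stale srpt module tree left behind by OFED, if present
d=/lib/modules/$(uname -r)/updates/drivers/infiniband/ulp/srpt
[ -d "$d" ] && sudo rm -rf "$d"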
Almost done.
Let us install the RPMs on the target machine:
copy the RPMs from ~mockbuild/rpmbuild/RPMS/..etc (for example with scp, as sketched below)
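One way to copy them over, assuming the RPMs landed in the x86_64 subdirectory and the target machine is reachable as machineA (both assumptions):

# copy the freshly built RPMs to the target's root home directory
scp ~mockbuild/rpmbuild/RPMS/x86_64/*.rpm root@machineA:~/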

rpm -Uhv ~/kernel-module-scst-core-2.6.32-???.el6-2.2.0-1.ab.x86_64.rpm
rpm -Uhv ~/kernel-module-scst-srpt-2.6.32-???.el6-2.2.0-1.ab.x86_64.rpm
rpm -Uhv ~/scst-tools-2.6.32-220.4.2.el6-???-1.ab.x86_64.rpm

# regenerate module dependencies
depmod -aq $(uname -r)
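A quick sanity check that the freshly installed modules actually load (scst and ib_srpt are the module names shipped with SCST/SRPT 2.2.0):

modprobe scst                  # SCST core
modprobe ib_srpt               # SRP target driver
lsmod | grep -E 'scst|srpt'    # both modules should now show up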

 Done!
In the next post I will show how to configure scst.


Thanks to Uwe Sauter from HLRS for motivating me to publish this.
Credits go to http://sonicostech.wordpress.com/2012/02/23/howto-infiniband-srp-target-on-centos-6-incl-rpm-spec/ for the detailed explanation, which worked for me.
