Let us start with CentOS 6.5
1) enable epel: yum localinstall -y http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
2) yum install clustershell -y
3) yum install ansible
Let us run the w command on multiple hosts:
clush -w localhost,anotherhost -B -b w
ansible all -i 'localhost,anotherhost,' -c local -m command -a "w"
Then analyze the output.
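For more hosts it is handier to keep them in an inventory file instead of listing them on the command line. A minimal sketch (the file name ./hosts and the group name allnodes are just examples):
cat > ./hosts <<EOF
[allnodes]
localhost ansible_connection=local
anotherhost
EOF
ansible allnodes -i ./hosts -m command -a "w"
clustershell has a similar mechanism via its groups file (on CentOS 6 typically /etc/clustershell/groups, one "groupname: node1,node2" line per group), after which clush -g groupname -b w addresses the whole group.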
OpenSM lid and more...
On CentOS 6.5
yum groupinstall "Infiniband Support"
yum install infiniband-diags
The InfiniBand diagnostics tools contain nice "weapons" to eliminate network bugs.
One of them is:
saquery - query InfiniBand subnet administration attributes
Its output looks like this:
NodeRecord dump:
lid.....................0x7D
reserved................0x0
base_version............0x1
class_version...........0x1
node_type...............Switch
num_ports...............36
sys_guid................0xf452140300365230
node_guid...............0xf452140300365230
port_guid...............0xf452140300365230
partition_cap...........0x8
device_id...............0xC738
revision................0xA2
port_num................0
vendor_id...............0x2C9
NodeDescription.........MF0;switch-cf2826:SX6036/U1
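Besides saquery, a few other tools from infiniband-diags are handy for mapping lids to nodes; a short sketch (lid 0x7D / 125 is just the example from the dump above):
sminfo                 # lid and state of the master subnet manager
ibstat                 # local HCA: port state, lid, rate
ibswitches             # list all switches discovered on the fabric
smpquery nodeinfo 125  # NodeInfo of lid 125 (0x7D from the dump above)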
Fix InfiniBand QDR or FDR troubles on fat-memory machines.
Fix memory trouble on fat machines:
The formula to compute the maximum value of pagepool when using RDMA is:
2^log_num_mtt x 2^log_mtts_per_seg x PAGE_SIZE > 2 x pagepool
For example: 2^20 x 2^4 x 4 KiB = 64 GiB, enough for a pagepool of up to 32 GiB.
Add to /etc/modprobe.d/mlx4_core.conf:
options mlx4_core log_num_mtt=20 log_mtts_per_seg=4
Check the changes after the module has been reloaded:
more /sys/module/mlx4_core/parameters/log_num_mtt
more /sys/module/mlx4_core/parameters/log_mtts_per_seg
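The same formula can be evaluated directly from the live parameters; a small sketch using bash arithmetic (PAGE_SIZE taken from getconf):
mtt=$(cat /sys/module/mlx4_core/parameters/log_num_mtt)
seg=$(cat /sys/module/mlx4_core/parameters/log_mtts_per_seg)
page=$(getconf PAGE_SIZE)
# 2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE, printed in GiB
echo "max registerable memory: $(( (1 << (mtt + seg)) * page / 1024 / 1024 / 1024 )) GiB"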
CentOS 7 after install:
CentOS 7 management is different from CentOS 5.x and 6.x.
It uses systemctl to manage services.
systemctl enable sshd
systemctl list-unit-files
systemctl get-default
systemctl set-default multi-user.target
systemctl disable firstboot-graphical.service
systemctl disable bluetooth.service
systemctl enable network.service
systemctl show
systemd-analyze
systemd-analyze blame
Tune other stuff as you need.
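systemctl also handles your own units; a minimal sketch of a custom service (the name myapp.service and the binary path are hypothetical):
cat > /etc/systemd/system/myapp.service <<EOF
[Unit]
Description=My application
After=network.target

[Service]
ExecStart=/usr/local/bin/myapp
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable myapp.service
systemctl start myapp.service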
Lustre 2.1.6 server and 2.5.3 client.
Are they compatible?
The answer is yes!
Upgrading the Lustre client using yum on CentOS 6.5:
vim /etc/yum.repos.d/lustre.repo
[hpddLustreserver]
name=CentOS-$releasever - Lustre
baseurl=https://downloads.hpdd.intel.com/public/lustre/latest-maintenance-release/el6/server/
gpgcheck=0
enabled=1
[e2fsprogs]
name=CentOS-$releasever - Ldiskfs
baseurl=https://downloads.hpdd.intel.com/public/e2fsprogs/latest/el6/RPMS/
gpgcheck=0
enabled=1
[hpddLustreclient]
name=CentOS-$releasever - Lustre
baseurl=https://downloads.hpdd.intel.com/public/lustre/latest-maintenance-release/el6/client/
gpgcheck=0
enabled=1
On client:
1) yum update lustre-client -y
2) and finally a one-liner to restart the Lustre mount point:
umount /lustre;lustre_rmmod;service lnet stop;service lnet start;mount /lustre
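To confirm that the new client is actually in use after the remount, a quick sketch:
rpm -q lustre-client       # installed package version
lctl get_param version     # version of the loaded Lustre modules
lfs df -h /lustre          # the filesystem is mounted and reachable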
oVirt 3.4.3-1: recover a VM from an "unknown" state.
After an iSCSI (iSER) storage failure and local disk corruption on one host, one of the HA VMs was stuck in the "?" (unknown) status.
Restarting ovirt-engine and the hosts did not help much. The host that owned the VM was gone, yet the web portal kept showing it in an unknown state, and I was not able to reboot or stop it; all services were lost along with the bad disk on that host. The main problem turned out to be the missing iSCSI disk storage: the disk was hanging in a "locked" state.
I found a simple solution in 3 steps:
1) find the hanging disk ID in the web interface; it looks something like this: 324f9089-0a40-4744-aa33-5c5a108f7f43
2) on the ovirt-engine server: su - postgres
3) psql -U postgres engine -c "select fn_db_unlock_disk('324f9089-0a40-4744-aa33-5c5a108f7f43');"
After these steps, take down the hanging host from the web interface. The HA VM will come up on another healthy node.
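If you want to double-check the lock before and after step 3, the image status can also be read from the engine database. This is only a sketch and assumes the standard engine schema, where images.image_group_id holds the disk ID from the web interface and imagestatus 2 means locked, 1 means OK:
psql -U postgres engine -c "select image_guid, imagestatus from images where image_group_id = '324f9089-0a40-4744-aa33-5c5a108f7f43';"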