My college recently purchased a Dell M610 cluster, and I am in charge of administering it. The cluster consists of 8 nodes, each with two 1Gb Ethernet cards and one InfiniBand card (or whatever it should be called). The nodes are connected through two backplane Dell PowerConnect 6220 Ethernet switches and one Mellanox M3601Q switch in the chassis. The InfiniBand switch does not come with subnet management.
I decided to use Gentoo as the base system for a clean, slim installation; its package management system, Portage, is a particular attraction. The basic system installation was no problem, and setting up the Ethernet cards was easy. InfiniBand, however, was a lot of trouble at first, because the OFED package from either Mellanox or OpenFabrics only supports Red Hat and SUSE Linux boxes. There is not much documentation available, and all the packages are rpm packages… The official Portage tree does not include any InfiniBand packages. The Gentoo-science overlay has a sys-infiniband category, but it does not seem to be well curated and has many issues. I spent quite a lot of time figuring out a solution. Here I record what I did, for future reference and for anyone who might run into the same problem.
Step 1: check the hardware information
lspci
The line relevant to InfiniBand is as follows:
04:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s – IB QDR / 10GigE] (rev b0)
Check whether the card is supported by the Linux kernel at http://kmuto.jp/debian/hcl/.
It is supported from kernel v2.6.25 onward and uses the “mlx4_core” driver.
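To avoid scanning the full lspci listing by eye, the output can be filtered. This is only a small sketch: find_ib_devices is a helper name I made up for illustration, and the match on "infiniband" or "mellanox" is an assumption that fits the card above but may need adjusting for other vendors.

```shell
#!/bin/sh
# find_ib_devices: hypothetical helper, not part of any package.
# Reads lspci-style text on stdin and keeps only lines that mention
# InfiniBand or Mellanox (case-insensitive).
find_ib_devices() {
    grep -i -E 'infiniband|mellanox'
}

# Query the real hardware only when lspci is installed.
if command -v lspci >/dev/null 2>&1; then
    lspci | find_ib_devices || echo "no InfiniBand device found" >&2
fi
```

On a node from this cluster, the filter should leave just the 04:00.0 line quoted above.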
Step 2: kernel compilation
I use the Gentoo kernel source 2.6.34-gentoo-r12. I assume other versions from 2.6.25 onward should work as well.
a) Device Drivers -> set “Infiniband support” as a module. Under “Infiniband support”, set the following as modules: “Infiniband userspace MAD support”, “Infiniband userspace access (verbs and CM)”, “Mellanox ConnectX HCA support”, “IP-over-Infiniband”, “Infiniband SCSI RDMA Protocol”, and “iSCSI Extensions for RDMA (iser)”. Set “IP-over-InfiniBand Connected Mode Support” as built-in. For other InfiniBand cards, choose the matching driver instead of “Mellanox ConnectX HCA support”.
b) Device Drivers -> Network device support: enable “Ethernet (10000 Mbit)” (the Gigabit Ethernet cards have already been configured). Set “Mellanox Technologies ConnectX 10G support” as a module. This provides the mlx4_en driver.
c) Run “make && make modules_install” to compile the kernel and install the modules.
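The choices in a) and b) can be double-checked against the resulting .config before compiling. The sketch below rests on a few assumptions: the CONFIG_ symbol names are my own mapping of the menu labels for 2.6.3x kernels (the help text in menuconfig shows the real symbol for each entry), check_ib_config is a made-up helper name, and /usr/src/linux is the usual Gentoo symlink to the kernel source.

```shell
#!/bin/sh
# check_ib_config FILE: hypothetical helper that reports whether each
# expected option appears in the kernel .config exactly as listed.
# Symbol names are assumed from the menu labels; verify in menuconfig.
check_ib_config() {
    config_file=$1
    missing=0
    for opt in \
        CONFIG_INFINIBAND=m \
        CONFIG_INFINIBAND_USER_MAD=m \
        CONFIG_INFINIBAND_USER_ACCESS=m \
        CONFIG_MLX4_INFINIBAND=m \
        CONFIG_INFINIBAND_IPOIB=m \
        CONFIG_INFINIBAND_IPOIB_CM=y \
        CONFIG_INFINIBAND_SRP=m \
        CONFIG_INFINIBAND_ISER=m \
        CONFIG_NETDEV_10000=y \
        CONFIG_MLX4_EN=m
    do
        # -x matches the whole line, -F treats the option as a fixed string.
        if grep -q -x -F "$opt" "$config_file"; then
            echo "ok       $opt"
        else
            echo "MISSING  $opt"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed location of the kernel source on a Gentoo box.
if [ -f /usr/src/linux/.config ]; then
    check_ib_config /usr/src/linux/.config || echo "some options differ from the list above" >&2
fi
```

A line reported as MISSING means either that the option is not set that way or that my guess at the symbol name is wrong for your kernel version.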
At this point the kernel is ready. After rebooting, run lsmod; you should see “mlx4_core” loaded.
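The lsmod check can also be scripted, which is handy when repeating it on all 8 nodes. Again this is only a sketch: module_loaded is a made-up helper name, and it only checks mlx4_core and mlx4_en, the two drivers named above. Whether mlx4_en loads automatically is something I have not verified, hence the modprobe hint.

```shell
#!/bin/sh
# module_loaded NAME: hypothetical helper that reads lsmod-style text
# on stdin and succeeds if NAME appears in the first column.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Inspect the running kernel only when lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    for mod in mlx4_core mlx4_en; do
        if lsmod | module_loaded "$mod"; then
            echo "$mod loaded"
        else
            echo "$mod not loaded (try: modprobe $mod)"
        fi
    done
fi
```

Matching on the first column keeps a module from being counted as loaded merely because another module lists it in the "Used by" column.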
Tags: cluster, gentoo, infiniband, ofed