Posts Tagged ‘rdma’

Gentoo Cluster: ofa_kernel installation

Friday, July 29th, 2011

Previously, I had set up the cluster and installed the Infiniband kernel modules and userspace libraries. However, a problem was lingering: whenever the command ibv_devinfo was run, the following error message was given.

mlx4: There is a mismatch between the kernel and the userspace libraries: Kernel does not support XRC. Exiting.
Failed to open device

I had been ignoring this message, but recently I needed to run some serious work with parallel computational power. The same error showed up now and then, and MPI communication could not be established except via the TCP/IP socket. The error was so annoying that I decided to solve the problem.
As the first step, I downloaded the OFED-1.5.3.2 installation package from the OpenFabrics website and extracted the ofa_kernel-1.5.3.2 package from it. I had tried earlier versions, but they could not be installed on my kernel (2.6.38-gentoo-r6). The typical configure-make-make install procedure was used to build the modules. However, with the configuration option --with-nfsrdma-mod, the NFS/RDMA modules (svcrdma and xprtrdma) failed to compile; there were just too many errors. Even after I manually patched all the offending lines and the compilation finished, the modules could not be loaded at all, so I had to give up on that option.
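For reference, the build was roughly the sequence below; the module-selection flags are reconstructed from memory and may differ between OFED versions, so treat this as a sketch rather than the exact command line.

cd ofa_kernel-1.5.3.2
./configure --with-core-mod --with-mlx4-mod --with-ipoib-mod   # without --with-nfsrdma-mod
make
make install
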
The newly installed modules were placed under /lib/modules/`uname -r`/updates. After rebooting, the computer froze during boot-up, and lots of error messages containing “Bad RIP value” showed up. It turned out to be caused by NFS client mounting, so after “netmount” was removed from the default runlevel, rebooting was okay. Now the problem seems solved: the command ibv_devinfo gives the information I expected.
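Removing netmount is just the standard OpenRC command; network filesystems can still be mounted manually after boot, and the replacement OFED modules can be checked in the updates directory.

rc-update del netmount default
ls /lib/modules/$(uname -r)/updates   # the newly built modules live here
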

hca_id: mlx4_0
    transport:          InfiniBand (0)
    fw_ver:             2.7.710
    node_guid:          f04d:a290:9778:efe0
    sys_image_guid:     f04d:a290:9778:efe3
    vendor_id:          0x02c9
    vendor_part_id:     26428
    hw_ver:             0xB0
    board_id:           DEL08F0120009
    phys_port_cnt:      2
        port:   1
            state:          PORT_ACTIVE (4)
            max_mtu:        2048 (4)
            active_mtu:     2048 (4)
            sm_lid:         6
            port_lid:       3
            port_lmc:       0x00
            link_layer:     IB

        port:   2
            state:          PORT_DOWN (1)
            max_mtu:        2048 (4)
            active_mtu:     2048 (4)
            sm_lid:         0
            port_lid:       0
            port_lmc:       0x00
            link_layer:     IB


Other diagnostic commands also work fine.
But now a new problem emerges. Although the built-in NFS/RDMA modules of the kernel (2.6.38-gentoo-r6) could still be loaded, whenever I tried to mount a network folder with the rdma protocol, the “Bad RIP value” error appeared and the mount failed. Therefore, I have to switch to the traditional TCP protocol. This seems an okay compromise.
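In practice that just means mounting over IPoIB with the ordinary TCP transport instead of the rdma option, something like the following (using the same export as in the NFS/RDMA post below):

mount -o proto=tcp 10.0.0.1:/mnt/ps4000e/home /mnt/ps4000e/home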

After the kernel modules were updated, I installed MVAPICH2 (1.7rc1) using the usual 3-step (configure, make, make install) installation procedure. I ran some basic test jobs and the osu_benchmarks. Running the jobs with mpiexec was fine, but when using mpirun_rsh, the following errors were produced and no results came out.

[unset]: Unable to get host entry for
[unset]: Unable to connect to on 33276
start..Fatal error in MPI_Init:
Other MPI error
...

By checking the source code, it seems the problem is related to a function called gethostbyname, which is declared in netdb.h. I also still need to figure out how to use the package with PBS.
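Since mpirun_rsh resolves node names through gethostbyname, my current guess is that every node's hostname has to be resolvable on every node, e.g. via /etc/hosts entries along these lines (the addresses and hostnames here are hypothetical):

# /etc/hosts
10.0.0.1    master
10.0.0.2    node02
10.0.0.3    node03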

Gentoo: NFS/RDMA (Infiniband)

Tuesday, July 12th, 2011

Our cluster system includes a Dell EqualLogic PS4000e iSCSI SAN storage array (16 TB). I use it for database storage and for the home directories of regular users. The storage array is mounted on the master node via the iSCSI initiator at the mount point /mnt/ps4000e/. The sub-directory /mnt/ps4000e/home is then exported across the cluster, so each node has access to the same home directory and users do not need to move their data files between nodes. NFS provides the network-based mounting; the NFS server/client is easy to install by following the guideline at http://en.gentoo-wiki.com/wiki/NFS/Server, with data transferred via the IPoIB mechanism. But since we have an Infiniband network, we can use RDMA instead, and NFS/RDMA achieves much higher speed. Here is my experience setting up NFS/RDMA.
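As a side note, attaching the PS4000e to the master node boils down to the usual open-iscsi steps; the group IP, target name, and device name below are placeholders, not the real ones from our system.

iscsiadm -m discovery -t sendtargets -p 10.1.0.10              # discover targets on the SAN group IP
iscsiadm -m node -T iqn.2001-05.com.equallogic:example --login # log in to the discovered target
mount /dev/sdb1 /mnt/ps4000e                                   # device name depends on your system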

Step 1: Kernel compilation
1) Requirements for NFS Server/Client
For the server node, enable File systems/Network File Systems/NFS server support.
For the client node, enable File systems/Network File Systems/NFS client support.
2) Requirements for RDMA support
Drivers for Infiniband should be compiled as modules, as described in a previous post. Check that RDMA support is enabled: make sure that SUNRPC_XPRT_RDMA in the .config file has a value of M (see the sketch below).
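In .config terms, the relevant symbols look roughly like the following; whether the NFS options are built in (y) or modular (m) is up to you, only the RDMA transport has to be m here.

# NFS server (server node) and NFS client (client nodes)
CONFIG_NFSD=m
CONFIG_NFS_FS=m
# RDMA transport for the SUN RPC layer
CONFIG_SUNRPC_XPRT_RDMA=m
# Infiniband drivers as modules
CONFIG_INFINIBAND=m
CONFIG_MLX4_INFINIBAND=m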

Step 2: emerge net-fs/nfs-utils
Version 1.2.3-r1 was installed. The portmap package is no longer needed; rpcbind will be pulled in as a dependency instead. If you see an error message saying that the nfs-utils package is blocked by portmap, unmerge portmap first. If portmap is pulled in by ypserv, unmerge the ypserv and ypbind packages first, and re-emerge ypserv and ypbind after nfs-utils is installed.
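The package shuffling amounts to something like this (a sketch of the Portage commands, only needed if portmap actually blocks nfs-utils):

emerge --unmerge ypserv ypbind portmap
emerge net-fs/nfs-utils
emerge ypserv ypbind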

Step 3: Create the mount point.
edit the /etc/exports file. add the following line,

# /etc/exports: NFS file systems being exported. See exports(5).
/mnt/ps4000e/home 10.0.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_subtree_check,no_root_squash)

The option insecure is important here because the NFS/RDMA client does not use a reserved port.
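If the NFS service is already running when you edit /etc/exports, the export table can be reloaded without a restart using exportfs from nfs-utils:

exportfs -ra   # re-read /etc/exports
exportfs -v    # list the currently active exports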

Step 4: Load necessary modules.
On the server node, svcrdma is needed. On the client node, xprtrdma is needed. I added them to the /etc/init.d/nfs script file: put the following lines into an appropriate place in the init.d file.

# svcrdma: server-side module for NFS/RDMA
# xprtrdma: client-side module for NFS/RDMA
/sbin/modprobe svcrdma > /dev/null 2>&1
/sbin/modprobe xprtrdma > /dev/null 2>&1

Remember to unload them when stopping the service, or add the corresponding rmmod commands to the script.
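A minimal sketch of what I mean, assuming the usual start()/stop() layout of the Gentoo /etc/init.d/nfs script:

# in start(), before the daemons are launched
/sbin/modprobe svcrdma > /dev/null 2>&1
/sbin/modprobe xprtrdma > /dev/null 2>&1

# in stop(), after the daemons have exited
/sbin/rmmod xprtrdma > /dev/null 2>&1
/sbin/rmmod svcrdma > /dev/null 2>&1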

Step 5: Instruct the server to listen on the RDMA transport.

echo "rdma 20049" > /proc/fs/nfsd/portlist

I added this command to the nfs script as well.
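To confirm that the RDMA listener was registered (the nfsd filesystem must be mounted, which the nfs init script takes care of), check the port list:

cat /proc/fs/nfsd/portlist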

Step 6: Start the NFS service

/etc/init.d/nfs start

Or add the script to the default run level.

rc-update add nfs default

Step 7. Mount the file system on the client node.
First, ensure that the module xprtrdma has been loaded.

modprobe xprtrdma

Then, use the following command to mount the NFS/RDMA server:

mount -o rdma,port=20049 10.0.0.1:/mnt/ps4000e/home /mnt/ps4000e/home

To verify that the mount is using RDMA, run cat /proc/mounts to check the proto field.
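The relevant line should end up looking roughly like this (illustrative output with most options trimmed, not copied verbatim from my system):

10.0.0.1:/mnt/ps4000e/home /mnt/ps4000e/home nfs rw,proto=rdma,port=20049,addr=10.0.0.1 0 0
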
Alternatively, for automatic mounting during boot-up, add the following record to the file /etc/fstab.

10.0.0.1:/mnt/ps4000e/home /mnt/ps4000e/home nfs _netdev,proto=rdma,port=20049 0 2

Use the init.d script netmount to mount the NFS/RDMA server.
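That is, add netmount to the default runlevel so the fstab entry is mounted automatically at boot:

rc-update add netmount default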