Recently, I needed to process a large number of SDF files, and some of them are too big (>500MB) to load into memory all at once. An immediate solution is to split these big files into smaller chunks. The two commands that come with Linux, split and csplit, seem unable to meet this need. split is convenient for splitting a text file by lines; but an SDF file can contain many molecular records and they do not have a fixed size, so split will break the integrity of the records at the chunk boundaries. One has to manually fix the head and tail of each resulting file. csplit is more flexible and can split a file according to patterns. Its weak point is that there is no way to specify how many matched patterns to skip before splitting. As a result, if I use “$$$$” as the record delimiter to csplit an SDF file, it will put each molecular record into its own file. There are just too many of them! That is not what I want. (One could use cat to concatenate them together, but it is too troublesome because of the large number of files.)
I wrote clsplit to meet this need. It is available here. The basic idea behind this script is to simulate the fixing work that is needed after splitting the file by a specified number of lines. It can be called using the following syntax.
clsplit PATTERN line_number file_name
PATTERN must be a valid pattern for grep. For example, to split a big SDF file, the following command can be used.
clsplit \$\$\$\$ 10000 my.sdf
The resulting files usually do not have exactly 10000 lines, but the exact number of lines hardly matters. What matters more is preserving the integrity of each record.
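The idea can be sketched in a few lines of shell and awk. This is only an illustration, not the clsplit script itself: the function name split_records, the output prefix argument and the four-digit chunk numbering are all assumptions, and the pattern here is an awk extended regular expression rather than a grep pattern. A chunk is closed only on a line that matches the delimiter, and only once the chunk already holds at least the requested number of lines.

```shell
#!/bin/sh
# Sketch only, NOT the author's clsplit script.
# Usage: split_records PATTERN MIN_LINES FILE [PREFIX]
# PATTERN is an awk extended regular expression matching the delimiter line.
split_records() {
    pattern=$1
    min_lines=$2
    file=$3
    prefix=${4:-chunk_}   # hypothetical default prefix for output files

    # The pattern is passed through the environment so that awk does not
    # interpret backslashes in it the way "-v" assignments would.
    PAT="$pattern" awk -v n="$min_lines" -v prefix="$prefix" '
        BEGIN { part = 0; count = 0; out = sprintf("%s%04d", prefix, part) }
        {
            print > out
            count++
            # Cut only on a delimiter line, and only once the chunk is big enough,
            # so that no record is ever broken across two files.
            if (count >= n && $0 ~ ENVIRON["PAT"]) {
                close(out)
                part++
                count = 0
                out = sprintf("%s%04d", prefix, part)
            }
        }
    ' "$file"
}
```

With this sketch, split_records '^\$\$\$\$' 10000 my.sdf would write chunk_0000, chunk_0001 and so on, each ending on a “$$$$” line and each at least 10000 lines long (except possibly the last).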
clsplit: Combination of `split’ and `csplit’
August 15th, 2011
Gentoo Cluster: Gamess Installation with MVAPICH2 and PBS
August 1st, 2011
Gamess is an electronic structure calculation package. Its installation is easy if you just want to use the “sockets” communication mode: just emerge it as you regularly do, then use “rungms” to submit your job. The default rungms is fine for running the serial code; for parallel computation, you still need to tune the script slightly. But since our cluster has Infiniband installed, it is better to go with the “mpi” communication mode. It took me quite some time to figure out how to install it correctly and make it run with mpiexec.hydra alone or with OpenPBS (Torque). Here is how I did it.
Software packages related:
1. gamess-20101001.3 (Download it beforehand from its developer’s website)
2. mvapich2-1.7rc1. (Previous versions should be okay and I installed it under /usr/local/)
3. OFED-1.5.3.2. (Userspace libraries for Infiniband. See my previous post. Only updated kernel modules installed. Userspace libraries should be the same as in OFED-1.5.3.1)
4. torque-2.4.14 (OpenPBS)
Steps
1. Update the gamess-20101001.3.ebuild with this one and manifest it.
2. Unmask the mpi USE flag for gamess in /usr/portage/profiles/base/package.use.mask.
3. Add sci-chemistry/gamess mpi to /etc/portage/package.use; then emerge -av gamess.
4. Update rungms with this one.
5. Create a new script pbsgms as this one.
6. Add kernel.shmmax=XXXXX to /etc/sysctl.conf, in which XXXXX is a large enough integer for shared memory (the default value of 32MB is too small for DDI). Run /sbin/sysctl -w kernel.shmmax=XXXXX to update the setting on the fly.
Added on Sept. 9, 2011. It seems that kernel.shmall=XXXXX should be modified as well. Please bear in mind that the unit for kernel.shmall is pages, while the unit for kernel.shmmax is bytes. A page is usually 4096 bytes (use getconf PAGE_SIZE to verify).
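Because the two settings use different units, it may help to derive one from the other. The following is a minimal sketch: the helper name bytes_to_pages and the 4 GiB value are only illustrative assumptions, not values taken from my cluster. It rounds up so that kernel.shmall covers at least one segment of the maximum size.

```shell
#!/bin/sh
# Hypothetical helper: convert a byte count to whole pages, rounding up.
# Usage: bytes_to_pages BYTES [PAGE_SIZE]
bytes_to_pages() {
    bytes=$1
    page_size=${2:-$(getconf PAGE_SIZE)}
    echo $(( (bytes + page_size - 1) / page_size ))
}

# Illustrative value only: allow shared memory segments of up to 4 GiB.
shmmax=$((4 * 1024 * 1024 * 1024))
shmall=$(bytes_to_pages "$shmmax")

# These two lines have the form expected in /etc/sysctl.conf.
echo "kernel.shmmax=$shmmax"
echo "kernel.shmall=$shmall"
```

kernel.shmall limits the total amount of shared memory on the system, so the value printed here is a lower bound rather than a recommendation.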
7. Environment settings. Create a file /etc/env.d/99gamess with the following content:
GMS_TARGET=mpi
GMS_SCR=/tmp/gamess
GMS_HOSTS=~/.hosts
GMS_MPI_KICK=hydra
GMS_MPI_PATH=/usr/local/bin
Then update your profile (on Gentoo, run env-update and then source /etc/profile).
8. Create a hostfile, ~/.hosts, with one node name per line:
node1
node2
...
This file is only needed when invoking rungms directly.
9. Test your installation: copy the test job input file exam20.inp from /usr/share/gamess/tests/; submit the job using pbsgms exam20 (you will be prompted for the other settings), or using rungms exam20 00 4.
Explanations
1. Two changes were made to the ebuild file.
(a) The installation suggestions given in the Gamess documentation are not enough. More libraries than just mpich need to be passed to lked, the linker program for Gamess.
(b) MPI environment constants need to be exported to the installation program, compddi, through a temporary file, install.info.
2. Many changes were made to the script rungms. I cannot remember all of them; some are as follows.
(a) For parallel computation, the scratch files are put under /tmp on each node by default.
(b) The script works with pbsgms.
(c) System-wide settings for Gamess can be put under /etc/env.d.
(d) A host file is needed if PBS is not used. By default, it should be at ~/.hosts. If it is not found, the job runs on the local host only.
3. The script pbsgms is based on sge-pbs, shipped with the Gamess installation package. I made numerous changes to get it to work with Torque.
Gentoo Cluster: ofa_kernel installation
July 29th, 2011
Previously, I set up the cluster and installed the Infiniband kernel modules and userspace libraries. However, a problem lingered: whenever the command ibv_devinfo was run, the following error message was given.
mlx4: There is a mismatch between the kernel and the userspace libraries: Kernel does not support XRC. Exiting.
Failed to open device
I had been ignoring this message. But recently I needed to run some serious work with parallel computational power. The same error showed up now and then, and MPI communication could not be established except via TCP/IP sockets. The error was so annoying that I decided to solve the problem.
As the first step, I downloaded the OFED-1.5.3.2 installation package from the OpenFabrics website and extracted the ofa_kernel-1.5.3.2 package from it. I had tried previous versions, but could not install them on my kernel (2.6.38-gentoo-r6). The typical configure, make, make install procedure was used to install the modules. However, with the configuration option --with-nfsrdma-mod, the NFS/RDMA modules (svcrdma and xprtrdma) failed to compile; there were just too many errors. Even after I manually modified all the offending statements and the compilation finished, the modules could not be loaded at all. So I had to give up on that option.
The newly installed modules were placed under /lib/modules/`uname -r`/updates. After rebooting, the computer froze during boot-up, and lots of error messages with “Bad RIP value” were shown. It turned out that this was due to NFS client mounting. After “netmount” was removed from the default runlevel, rebooting was okay. Now the problem seems solved. The command ibv_devinfo gives the information I expected.
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.7.710
node_guid: f04d:a290:9778:efe0
sys_image_guid: f04d:a290:9778:efe3
vendor_id: 0x02c9
vendor_part_id: 26428
hw_ver: 0xB0
board_id: DEL08F0120009
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 6
port_lid: 3
port_lmc: 0x00
link_layer: IB
port: 2
state: PORT_DOWN (1)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: IB
Other diagnostic commands also work fine.
But now a new problem has emerged. Although the built-in NFS/RDMA modules of the kernel (2.6.38-gentoo-r6) could be loaded, whenever I tried to mount a network folder with the rdma protocol, an error message related to “Bad RIP value” appeared and the mounting failed. Therefore, I had to switch to the traditional TCP protocol. This seems an okay compromise.
After the kernel modules were updated, I installed MVAPICH2 (1.7rc1) using the 3-step installation procedure. I have run some basic test jobs and the osu_benchmarks. Running the jobs with mpiexec was fine. But when using mpirun_rsh, the following errors were produced and the jobs failed.
[unset]: Unable to get host entry for
[unset]: Unable to connect to on 33276
start..Fatal error in MPI_Init:
Other MPI error
...
From checking the source code, the problem seems to be related to a function called gethostbyname, which is declared in netdb.h. How to use the package with PBS still needs to be figured out.
Gentoo Cluster: a Strange OpenMPI Problem
July 27th, 2011
Yesterday, I tried out some MPI jobs on our Gentoo cluster. A really weird problem occurred and was then solved. One test job is the mpihello code listed below. At first, I used both qsub mpihello and the plain command line mpirun -np 16 --hostfile hosts mpihello. When the number of processes is low, say 1 or 2 processes per node, the jobs end very quickly. But if the number of processes exceeds some threshold, the job just hangs and never ends unless it is killed by PBS or by me. The threshold seems to be larger when using just mpirun than when using qsub. The command pbsnodes shows that all nodes are up and free. A debug test shows that the master process does not receive the messages from the other processes; that is, MPI_Recv waits forever.
Solution: Both the Infiniband adapter and the Ethernet network cards were running. After the bonded Ethernet cards were disabled on node 7 and node 8, the problem was solved. I am still not exactly sure about the cause, since other nodes still have bonded Ethernet cards running. But so far, it is in an okay working state.
#include <stdio.h>
#include <string.h>
#include <unistd.h>   /* gethostname() */
#include "mpi.h"

int main(int argc, char *argv[])
{
    int my_rank;   /* rank of this process */
    int p;         /* total number of processes */
    int source;
    int dest;
    int tag = 0;
    char message[100];
    char hostname[100];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (my_rank != 0) {
        /* Worker: send one greeting to the master (rank 0). */
        gethostname(hostname, sizeof(hostname));
        /* snprintf keeps a long host name from overflowing the buffer. */
        snprintf(message, sizeof(message),
                 "Work: Hello world from process %d, on %s!",
                 my_rank, hostname);
        dest = 0;
        MPI_Send(message, strlen(message) + 1, MPI_CHAR,
                 dest, tag, MPI_COMM_WORLD);
        // fprintf(stdout, "going away..., %d, %s\n", my_rank, hostname);
    } else {
        /* Master: receive one message from every other process. */
        gethostname(hostname, sizeof(hostname));
        printf("p = %d\n", p);
        printf("Hello world from master process %d, on %s!\n",
               my_rank, hostname);
        for (source = 1; source < p; source++) {
            MPI_Recv(message, 100, MPI_CHAR, MPI_ANY_SOURCE, tag,
                     MPI_COMM_WORLD, &status);
            printf("Master Revd:%s\n", message);
        }
    }
    MPI_Finalize();
    return 0;
}
umount: device is busy
July 27th, 2011
Sometimes, when I try to umount a mounted device, the following error occurs.
xwang@node1 ~ $ umount /mnt/ps4000e/home
umount.nfs: /mnt/ps4000e/home: device is busy
umount.nfs: /mnt/ps4000e/home: device is busy
No one was logged in except myself, I was not using that directory, and no other user’s job was running. It was really a mystery which process caused the “device is busy” error. Using Google, I found the solution at http://ocaoimh.ie/2008/02/13/how-to-umount-when-the-device-is-busy/.
The solution is to use fuser to find out.
xwang # fuser -m /mnt/ps4000e/home/
/mnt/ps4000e/home/: 7706c
See the manual, man fuser, for a full description of this command. Here 7706 is the ID of the process that is currently using the mounted device. According to the fuser manual, the trailing letter ‘c’ means that the process has its current directory there. So use ps to find out what the process is.
xwang # ps 7706
PID TTY STAT TIME COMMAND
7706 ? Ss 0:00 /usr/bin/orted --daemonize ....
Now the reason is obvious. I had started an MPI job earlier and it did not end normally. orted is the administration process started by root, and it had not exited. After the process was killed, the device could be unmounted.
Gentoo: NFS/RDMA (Infiniband)
July 12th, 2011
Our cluster system includes a Dell EqualLogic PS4000e iSCSI SAN storage array (16T). I use it for database storage and for the home directories of regular users. The storage array is mounted on the master node using an iSCSI initiator, at the mount point /mnt/ps4000e/. The sub-directory /mnt/ps4000e/home is then exported across the cluster, so each node has access to the same home directory and everyday users do not need to move their data files between nodes. NFS provides the network-based mounting. The NFS server/client is easy to install by following the guideline at http://en.gentoo-wiki.com/wiki/NFS/Server. By default, data transfer goes via the IPoIB mechanism. But since we have an Infiniband network, we can use RDMA instead, and NFS/RDMA achieves much faster speed. Here is my experience of setting up NFS/RDMA.
Step 1: Kernel compilation
1) Requirements for NFS Server/Client
For the server node, turn on File systems/Network File Systems/NFS server support.
For the client node, turn on File systems/Network File Systems/NFS client support.
2) Requirements for RDMA support
Drivers for Infiniband should be compiled as modules, as described in a previous post. Check that RDMA support is enabled: make sure that SUNRPC_XPRT_RDMA in the .config file is set to M (that is, CONFIG_SUNRPC_XPRT_RDMA=m).
Step 2: emerge net-fs/nfs-utils
Version 1.2.3-r1 was installed. The portmap package is no longer needed; rpcbind will be installed as a dependency instead. If you see an error message saying that the nfs-utils package is blocked by portmap, un-emerge portmap first. If portmap is pulled in by ypserv, un-emerge the ypserv and ypbind packages first; after nfs-utils is installed, emerge ypserv and ypbind again.
Step 3: Create the mount point.
Edit the /etc/exports file and add the following line:
# /etc/exports: NFS file systems being exported. See exports(5).
/mnt/ps4000e/home 10.0.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_subtree_check,no_root_squash)
The option insecure is important here because the NFS/RDMA client does not use a reserved port.
Step 4: Load necessary modules.
On the server node, svcrdma is needed; on the client node, xprtrdma is needed. I added them to the /etc/init.d/nfs script file. Put the following statements in an appropriate place in the init.d script.
# svcrdma: server-side module for NFS/RDMA
# xprtrdma: client-side module for NFS/RDMA
/sbin/modprobe svcrdma > /dev/null 2>&1
/sbin/modprobe xprtrdma > /dev/null 2>&1
Remember to unload them when stopping the services, or add the corresponding rmmod commands to the script.
Step 5: Instruct the server to listen on the RDMA transport.
echo "rdma 20049" > /proc/fs/nfsd/portlist
I added it into the nfs script as well.
Step 6: Start the NFS service
/etc/init.d/nfs start
Or add the script to the default run level.
rc-update add nfs default
Step 7. Mount the file system on the client node.
First, ensure that the module xprtrdma has been loaded.
modprobe xprtrdma
Then, use the following command to mount the NFS/RDMA server:
mount -o rdma,port=20049 10.0.0.1:/mnt/ps4000e/home /mnt/ps4000e/home
To verify that the mount is using RDMA, run cat /proc/mounts and check the proto field.
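This check can also be scripted. The following is a minimal sketch; the function name mount_uses_rdma and its optional second argument (an alternative mounts file, useful for testing) are assumptions for illustration. It looks up the mount point in the second field of /proc/mounts and searches the comma-separated options in the fourth field for proto=rdma.

```shell
#!/bin/sh
# Hypothetical helper: succeed if MOUNTPOINT is mounted with proto=rdma.
# Usage: mount_uses_rdma MOUNTPOINT [MOUNTS_FILE]
mount_uses_rdma() {
    mountpoint=$1
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$mountpoint" '
        # Fields: device, mount point, fs type, options, dump, pass
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "proto=rdma") found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$mounts_file"
}
```

For example, mount_uses_rdma /mnt/ps4000e/home && echo "RDMA in use". The mount point has to be written exactly as it appears in /proc/mounts, without a trailing slash.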
Alternatively, for automatic mounting during boot-up, add the following record to the file /etc/fstab.
10.0.0.1:/mnt/ps4000e/home /mnt/ps4000e/home nfs _netdev,proto=rdma,port=20049 0 2
Use the init.d script netmount to mount the NFS/RDMA server.
Infiniband Installation on Gentoo (II)
July 10th, 2011
In a previous post, I wrote the first part of my experience installing Infiniband adapters on a Gentoo cluster. Recently, I upgraded the system and found that I had forgotten the details of the installation and setup. So I need to write down what I have done.
Step 1: Turn on the infiniband modules in the kernel as discussed in the previous post.
Step 2: Emerge the necessary packages. They used to be in the science overlay, but have now (July 2011) moved to the main tree under the sys-infiniband category. On my cluster system, the following packages were installed.
sys-infiniband/dapl-2.0.32
sys-infiniband/infiniband-diags-1.5.8
sys-infiniband/libibcm-1.0.5
sys-infiniband/libibcommon-1.1.2_p20090314
sys-infiniband/libibmad-1.3.7
sys-infiniband/libibumad-1.3.7
sys-infiniband/libibverbs-1.1.4
sys-infiniband/libipathverbs-1.2
sys-infiniband/libmlx4-1.0.1
sys-infiniband/libmthca-1.0.5-r2
sys-infiniband/libnes-1.1.1
sys-infiniband/librdmacm-1.0.14.1
sys-infiniband/libsdp-1.1.108
sys-infiniband/openib-1.4
sys-infiniband/openib-files-1.5.3.1
sys-infiniband/opensm-3.3.9
sys-infiniband/perftest-1.3.0
Step 3: Edit the configuration file, /etc/infiniband/openib.conf. The following is the content of my configuration file.
# Start HCA driver upon boot
ONBOOT=yes
# Load UCM module
UCM_LOAD=no
# Load RDMA_CM module
RDMA_CM_LOAD=yes
# Load RDMA_UCM module
RDMA_UCM_LOAD=yes
# Increase ib_mad thread priority
RENICE_IB_MAD=no
# Load MTHCA
MTHCA_LOAD=no
# Load IPATH
IPATH_LOAD=no
# Load eHCA
EHCA_LOAD=no
# Load MLX4 modules
MLX4_LOAD=yes
# Load IPoIB
IPOIB_LOAD=yes
# Enable IPoIB Connected Mode
SET_IPOIB_CM=yes
# Enable IPoIB High Availability daemon
# Xianlong Wang
#IPOIBHA_ENABLE=yes
#PRIMARY_IPOIB_DEV=ib0
#SECONDARY_IPOIB_DEV=ib1
# Load SDP module
#SDP_LOAD=yes
# Load SRP module
#SRP_LOAD=no
# Enable SRP High Availability daemon
#SRPHA_ENABLE=no
# Load ISER module
#ISER_LOAD=no
# Load RDS module
#RDS_LOAD=no
# Load VNIC module
#VNIC_LOAD=yes
Step 4: Edit the init.d script, /etc/init.d/openib. This is the important part. The original script does not seem to load all the necessary modules, or does not load them in the right order. After all the if-clauses for setting POST_LOAD_MODULES, change the following:
PRE_UNLOAD_MODULES="ib_rds ib_ucm kdapl ib_srp_target scsi_target ib_srp ib_iser ib_sdp rdma_ucm rdma_cm ib_addr ib_cm ib_local_sa findex"
POST_UNLOAD_MODULES="$PRE_UNLOAD_MODULES ib_ipoib ib_sa ib_uverbs ib_umad"
to the following (pay attention to the additions: svcrdma, xprtrdma and ib_ipoib):
#Xianlong Wang
# svcrdma: server-side module for NFS/RDMA
# xprtrdma: client-side module for NFS/RDMA
POST_LOAD_MODULES="$POST_LOAD_MODULES svcrdma xprtrdma"
#Xianlong Wang
#add ib_ipoib before ib_cm
#PRE_UNLOAD_MODULES="ib_rds ib_ucm kdapl ib_srp_target scsi_target ib_srp ib_iser ib_sdp rdma_ucm rdma_cm ib_addr ib_cm ib_local_sa findex"
# add xprtrdma module for NFS server and client
PRE_UNLOAD_MODULES="xprtrdma svcrdma ib_rds ib_ucm kdapl ib_srp_target scsi_target ib_srp ib_iser ib_sdp rdma_ucm rdma_cm ib_addr ib_ipoib ib_cm ib_local_sa findex"
# Xianlong Wang
if [ "X${MLX4_LOAD}" == "Xyes" ]; then
PRE_UNLOAD_MODULES="mlx4_en mlx4_ib mlx4_core ${PRE_UNLOAD_MODULES}"
fi
In the start() function, after einfo "Loading HCA and Access Layer drivers", add the following to load the necessary modules:
# Xianlong Wang, hard-coded
if [[ "${MLX4_LOAD}" == "yes" ]]; then
/sbin/modprobe mlx4_core > /dev/null 2>&1
rc=$[ $rc + $? ]
/sbin/modprobe mlx4_ib > /dev/null 2>&1
rc=$[ $rc + $? ]
/sbin/modprobe mlx4_en > /dev/null 2>&1
rc=$[ $rc + $? ]
fi
Step 5: Add the init.d scripts, openib and opensm, to the default run level.
rc-update add openib default
rc-update add opensm default
Step 6: Edit the /etc/conf.d/net file for the IPoIB settings. Create the symbolic link /etc/init.d/net.ib0 pointing to /etc/init.d/net.lo.
config_ib0=("10.0.0.1/24")
routes_ib0=("default via 10.0.0.1")
nis_domain_ib0="abc"
nis_servers_ib0="10.0.0.1"
Then add net.ib0 to the default run level.
rc-update add net.ib0 default
After rebooting, check the port status by running ibstatus. The following output is given:
Infiniband device 'mlx4_0' port 1 status:
default gid: fe80:0000:0000:0000:f04d:a290:9778:efbd
base lid: 0x1
sm lid: 0x6
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
link_layer: InfiniBand
Infiniband device 'mlx4_0' port 2 status:
default gid: fe80:0000:0000:0000:f04d:a290:9778:efbe
base lid: 0x0
sm lid: 0x0
state: 1: DOWN
phys state: 2: Polling
rate: 70 Gb/sec (4X)
link_layer: InfiniBand
Use ifconfig to check the IP address. The following output is given.
ib0 Link encap:InfiniBand HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
inet addr:10.0.0.1 Bcast:10.0.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
RX packets:43233 errors:0 dropped:0 overruns:0 frame:0
TX packets:44438 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:256
RX bytes:5280799 (5.0 MiB) TX bytes:2771849 (2.6 MiB)
There is still a problem: ibv_devinfo outputs the following error message.
mlx4: There is a mismatch between the kernel and the userspace libraries: Kernel does not support XRC. Exiting.
Failed to open device
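Impressions of Hunan
February 13th, 2011
1. People in the mountains love to build multi-story houses, rich or poor alike. This is an older two-story wooden building (formerly the village primary school, now unused); only a very few villagers still live in small buildings like this. Most houses have been rebuilt as concrete-frame structures, ranging from two to five stories. Villagers who are less well-off build in several stages, completing the whole construction over many years.
2. In most villagers’ homes, the ancestral tablet stands in the center of the ground floor; quite a few households also enshrine Mao Zedong.
3. Hunan is hilly; the valleys have abundant rainfall, fertile soil and high temperatures, which suit rice growing. Perhaps it was this environment that produced the rice expert Yuan Longping. This poster introduces one of his rice experimental fields: Anjiang.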
Infiniband Installation on Gentoo
December 7th, 2010
My college recently purchased a Dell M610 cluster and I am in charge of its administration. The cluster consists of 8 nodes, and each node has two 1Gb Ethernet cards and one Infiniband card (or whatever it should be called). The nodes are connected by two backplane Dell PowerConnect 6220 Ethernet switches and one Mellanox M3601Q switch in the chassis. The Infiniband switch does not come with subnet management.
I decided to choose Gentoo as the base system for a clean and slim installation; in particular, its package management system, Portage, is a great attraction. Basic system installation was no problem and the Ethernet card setup was easy. But Infiniband was big trouble at the beginning, because the OFED package from either Mellanox or OpenFabrics only supports Red Hat and SUSE Linux boxes. There is not much documentation available and all the packages are rpm packages. The official Portage tree does not include any Infiniband packages. The Gentoo-science overlay has a category sys-infiniband, but the packages do not seem well curated and have many issues. I spent quite a lot of time figuring out a solution. Here I record what I did, for future reference and for anyone who might encounter the same problem.
Step 1: Check the hardware information
lspci
The information regarding Infiniband is as follows:
04:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s – IB QDR / 10GigE] (rev b0)
Check if it is supported by the Linux kernel at http://kmuto.jp/debian/hcl/.
It is supported by kernels from v2.6.25 onward and uses the “mlx4_core” driver.
Step 2: kernel compilation
I use the gentoo kernel source 2.6.34-gentoo-r12. Other versions higher than 2.6.25 should be fine, I assume.
a) Device Drivers -> set “Infiniband support” as module. Under “Infiniband support”, set the following ones as modules: “Infiniband userspace MAD support, Infiniband userspace access (verbs and CM), Mellanox ConnectX HCA support, IP-over-Infiniband, Infiniband SCSI RDMA Protocol, iSCSI Extensions for RDMA (iser)”. Set “IP-over-InfiniBand Connected Mode Support” as built-in. For other Infiniband cards, choose drivers other than “Mellanox ConnectX HCA”.
b) Device Drivers -> Network device support: enable Ethernet (10000 Mbit) (the Gigabit Ethernet card has already been configured). Set “Mellanox Technologies ConnectX 10G support” as module. This provides the driver mlx4_en.
c) Run “make && make modules_install” to compile the kernel.
So far, we have a ready kernel. After rebooting, run lsmod and you should see “mlx4_core” loaded.
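The same check can be done from a script. This is a small sketch under assumptions: the function name module_loaded and its optional second argument are made up for illustration. It reads /proc/modules, which is the file that lsmod formats, and compares the first field of each line with the module name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if MODULE appears in the list of loaded modules.
# Usage: module_loaded MODULE [MODULES_FILE]
module_loaded() {
    module=$1
    modules_file=${2:-/proc/modules}
    # The first field of each line in /proc/modules is the module name.
    awk -v m="$module" '
        $1 == m { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$modules_file"
}
```

For example, module_loaded mlx4_core || echo "mlx4_core is not loaded". Comparing the whole first field avoids false matches on modules whose names merely contain the string, such as mlx4_ib.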