Re: [AMBER] Running Amber 11 simulations using pmemd.cuda.MPI

From: Baker D.J. <>
Date: Wed, 6 Jul 2011 16:45:56 +0100

Hi Ross,

Thank you for your advice. I've spent the day working on mvapich2 and Amber. Rebuilding pmemd.cuda.MPI using mvapich2 is exactly what's needed. Using 4 GPUs (that is, two compute nodes) on the PME/Cellulose_production_NPT benchmark example, I can get the simulation done in 15 minutes. This is excellent scaling, since the same simulation takes 30 minutes using 2 GPUs.
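As a quick check on the scaling figure, the two timings quoted above can be turned into a speedup and a parallel efficiency. This is only a sketch of the arithmetic: the function name is illustrative, and the inputs are the rounded timings from this message, not fresh measurements.

```python
def scaling(t_base_min, gpus_base, t_new_min, gpus_new):
    """Return (speedup, parallel efficiency) going from the base run to the new run."""
    speedup = t_base_min / t_new_min      # how much faster the new run is
    ideal = gpus_new / gpus_base          # speedup expected from perfect scaling
    return speedup, speedup / ideal

# 30 minutes on 2 GPUs versus 15 minutes on 4 GPUs (figures from this message)
speedup, efficiency = scaling(30.0, 2, 15.0, 4)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

An efficiency close to 100% means that doubling the GPU count roughly halved the wall-clock time.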

I'll need to open this testing up to some of the Amber users here before we can call it a success. Pity that OpenMPI doesn't do the job -- I'm not that keen to have to offer another flavor of MPI2 on the cluster. Taking a look at the latest version of OpenMPI, offhand it appears that they are nowhere close to supporting GPUs properly.

Best regards -- David.

-----Original Message-----
From: Ross Walker []
Sent: Tuesday, July 05, 2011 4:58 PM
To: 'AMBER Mailing List'
Subject: Re: [AMBER] Running Amber 11 simulations using pmemd.cuda.MPI

Hi David,

> We recently installed Amber 11 on our RHELS computational cluster. I
> built Amber 11 for both CPUs and GPUs. We have 15 compute nodes each
> with 2 Fermi GPUs installed. All these GPU nodes have QDR Mellanox
> Infiniband cards installed. One of the users and I can successfully
> run Amber simulations using pmemd.cuda.MPI over 2 GPUs (that is
> locally on one of the compute nodes) - the speed up isn't bad. On the
> other hand I've so far failed to run a simulation using multiple nodes
> (let's say over 4 GPUs). In this case, the calculation appears to
> hang, and I see very little output - apart from the GPUs being
> detected and general set up, etc, etc. I've been working with a couple
> of the Amber PME benchmarks.

Have you tested the CPU code across multiple nodes? I assume this scales fine? - You should check that just to make sure. In particular make sure things are being routed correctly over the IB interface and not TCP/IP for example. Also make sure you aren't sharing the IB interface with NFS traffic or IP traffic for example.

> Could anyone please advise us. I've already noted that we have a
> fairly top notch IB network - the Qlogic switch and Mellanox cards are
> all QDR. I built pmemd.cuda.MPI with the Intel compilers, cuda 3.1,
> and OpenMPI 1.3.3. Could it be that I should employ another flavor of
> MPI or that OpenMPI needs to be configured in a particular way?

1) Use CUDA 3.2, it fixes a LOT. Also make sure you are using AMBER with all of the latest bugfixes applied. Check

2) I highly advise AGAINST using OpenMPI. Its performance is pretty terrible. I suggest using MVAPICH2. We use MVAPICH2-1.5, which is what the benchmarks on the page were done with. This was 2 GPUs per node and 1 QDR IB card per node. Check that your IB card is in an X16 slot (along with both GPUs), otherwise you won't be getting the maximum performance out of the IB card. You should also enable GPU Direct in the MVAPICH setup, which the Mellanox cards should support. I am not entirely sure how to enable this in the MVAPICH setup, though, as I have never had to build the cluster software stack myself. You can check with Mellanox directly; they should have a white paper explaining how to do this.

Also start with just 1 GPU per node (export CUDA_VISIBLE_DEVICES on each node and make sure your NODEFILE is set up to give processes out to each node in turn) and see if you can scale.
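A minimal sketch of that one-GPU-per-node setup follows. It assumes a node file with one host name per line (check what your launcher expects), and it uses hypothetical host names node0 and node1 with GPU index 0 as the visible device. None of these names come from the original setup.

```python
import os

def round_robin_nodefile(nodes, nprocs):
    """List hosts so that consecutive processes land on consecutive nodes."""
    return [nodes[i % len(nodes)] for i in range(nprocs)]

nodes = ["node0", "node1"]                      # hypothetical host names
lines = round_robin_nodefile(nodes, nprocs=2)   # one process per node

# Node file contents, one host name per line (assumed format)
print("\n".join(lines))

# Environment for each process: expose only the first GPU on the node
env = dict(os.environ, CUDA_VISIBLE_DEVICES="0")
print("CUDA_VISIBLE_DEVICES =", env["CUDA_VISIBLE_DEVICES"])
```

With only one device visible per node, each process can only pick up that GPU, so any scaling seen here comes from going across nodes rather than from the second GPU in the same box.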

Having said that, before you do the above make sure you are running correctly across the nodes. That is, for 4 GPUs you should do mpirun -np 4. Make sure the first 2 processes get given to node 0 and the next 2 to node 1. If you have 8 core nodes and use a default NODEFILE, it will end up putting all 4 processes on the first node, so you end up running 4 GPU tasks on 2 GPUs and performance is thus utterly destroyed. So check this carefully before you do all the above. I do recommend MVAPICH2 and GPU Direct though.
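The placement problem can be illustrated with a small model. This is a sketch only: it assumes the launcher hands rank i to the i-th node file entry in order (wrapping around if it runs out), which you should confirm for your MPI, and the host names are hypothetical.

```python
from collections import Counter

def place_ranks(nodefile_lines, nprocs):
    """Assign rank i to the i-th node file entry, wrapping around (assumed launcher behaviour)."""
    return [nodefile_lines[i % len(nodefile_lines)] for i in range(nprocs)]

def oversubscribed(placement, gpus_per_node):
    """Return the nodes that were given more processes than they have GPUs."""
    counts = Counter(placement)
    return {node: n for node, n in counts.items() if n > gpus_per_node}

default_nodefile = ["node0"] * 8 + ["node1"] * 8   # one entry per CPU core
gpu_nodefile = ["node0"] * 2 + ["node1"] * 2       # one entry per GPU

for name, nodefile in [("default", default_nodefile), ("per-GPU", gpu_nodefile)]:
    placement = place_ranks(nodefile, 4)
    print(name, placement, oversubscribed(placement, gpus_per_node=2))
```

Under this model the per-core node file puts all four processes on node0, while the per-GPU node file gives two to each node, which is the layout described above.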

All the best

|\oss Walker

| Assistant Research Professor |
| San Diego Supercomputer Center |
| Adjunct Assistant Professor |
| Dept. of Chemistry and Biochemistry |
| University of California San Diego |
| NVIDIA Fellow |
| | |
| Tel: +1 858 822 0854 | EMail:- |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.

AMBER mailing list

Received on Wed Jul 06 2011 - 09:00:03 PDT