Hello,
We recently installed Amber 11 on our RHEL computational cluster. I built Amber 11 for both CPUs and GPUs. We have 15 compute nodes, each with 2 Fermi GPUs installed, and all of these GPU nodes have QDR Mellanox InfiniBand cards. One of the users and I can successfully run Amber simulations using pmemd.cuda.MPI over 2 GPUs (that is, locally on a single compute node) - the speed-up isn't bad. On the other hand, I have so far failed to run a simulation across multiple nodes (say, over 4 GPUs). In that case the calculation appears to hang, and I see very little output - apart from the GPUs being detected, the general set-up, etc. I've been working with a couple of the Amber PME benchmarks.
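For reference, this is roughly how I am launching the multi-node runs (the hostnames and file names below are just illustrative stand-ins for our actual setup):

    $ cat hosts.gpu
    node01 slots=2
    node02 slots=2
    $ mpirun -np 4 --hostfile hosts.gpu \
          $AMBERHOME/bin/pmemd.cuda.MPI -O -i mdin -o mdout -p prmtop -c inpcrd

The equivalent command with -np 2 on a single host is what works fine.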
Could anyone please advise us? I've already noted that we have a fairly top-notch IB network - the QLogic switch and the Mellanox cards are all QDR. I built pmemd.cuda.MPI with the Intel compilers, CUDA 3.1, and OpenMPI 1.3.3. Could it be that I should use another flavor of MPI, or that OpenMPI needs to be configured in a particular way?
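For what it's worth, I have also been wondering whether I ought to force the InfiniBand transport explicitly rather than let OpenMPI pick one - something along these lines, though I am not certain this is the right set of BTLs for our hardware:

    $ mpirun -np 4 --hostfile hosts.gpu \
          --mca btl openib,self,sm \
          $AMBERHOME/bin/pmemd.cuda.MPI -O -i mdin -o mdout -p prmtop -c inpcrd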
Any tips or thoughts would be much appreciated.
Best regards - David.
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber