Hello,
I'm working on building Amber 11 with the latest set of bug fixes; I'm primarily interested in the CUDA performance patch provided by bugfix 17. I can now run Amber simulations across multiple GPUs/nodes, specifically across 4 GPUs on 2 nodes (2 GPUs installed in each compute node). Up until this morning this simulation was crashing with a segmentation fault.
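In case it helps anyone reproduce this, the patching step looked roughly like the following (the bugfix file name and patch options are from memory, so treat this as a sketch rather than the exact commands):

    # Apply the cumulative Amber 11 bugfixes (through bugfix 17) from $AMBERHOME.
    # File name is approximate; adjust to the actual bugfix download.
    cd $AMBERHOME
    patch -p0 -N < bugfix.all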
The key to getting the simulation going was to set CUDA_NIC_INTEROP=1 in the job (a sketch of the relevant job step is below). Could someone please help me get my head around this? I found this fix via a web search, and I think it has something to do with my NICs not supporting GPUDirect v2, but I'm not sure I really understand the situation. My build environment was CUDA 4.0, Intel compilers, Amber 11 (bugfixes 1-17) and AmberTools 1.5. I've tried building the parallel CUDA executable with both mvapich2-1.6 and openmpi-1.4.3. Oddly enough, with one of the Amber benchmarks I get similar performance from both MPIs, which surprises me: the PME/Cellulose_production_NPT benchmark completes in 8 minutes on 4 GPUs either way. I'm using generic OFED v1.5.* to drive the IB network, by the way.
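For completeness, the relevant part of my job script now looks roughly like this (Open MPI syntax shown; the hostfile and benchmark file names are placeholders, and mvapich2 uses a different launcher syntax):

    # The workaround: export CUDA_NIC_INTEROP before launching; as I understand it,
    # this makes the CUDA driver allocate pinned memory the IB stack can also register.
    # With Open MPI, -x propagates the variable to the remote ranks.
    export CUDA_NIC_INTEROP=1
    mpirun -np 4 -hostfile hosts -x CUDA_NIC_INTEROP \
        $AMBERHOME/bin/pmemd.cuda.MPI -O -i mdin -p prmtop -c inpcrd -o mdout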
Best regards -- David,
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Sep 14 2011 - 04:30:03 PDT