Re: [AMBER] Parallel limit?

From: Ross Walker <ross.rosswalker.co.uk>
Date: Wed, 28 Sep 2011 20:45:38 -0700

Hi Hu,

> I'm trying to run pmemd.cuda.MPI in parallel on a GPU cluster (one
> C2050/node).
> When using a few nodes it worked, but errors occurred when using more
> nodes.
> The top limit seems to have something to do with the simulated systems.
> For some systems even 96 nodes would work, while for others it would not.
> I'm using PME and am sure there are 32x more atoms than nodes.
> The only messages I could find is:
>
> forrtl: severe (174): SIGSEGV, segmentation fault occurred

Try setting this in your bashrc file

export CUDA_NIC_INTEROP=1

And make sure it gets sourced on all of the nodes.
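
If it helps, here is a minimal sketch of doing that, assuming bash on
every node, a shared home directory, and an Open MPI style launcher with
a hostfile named "hosts" (the launcher flags and file names are
placeholders for whatever your site uses):

    # Add the variable to ~/.bashrc so every node picks it up:
    echo 'export CUDA_NIC_INTEROP=1' >> ~/.bashrc

    # Verify that every MPI rank actually sees it:
    mpirun -np 8 --hostfile hosts \
        bash -c 'source ~/.bashrc; echo "$(hostname): CUDA_NIC_INTEROP=$CUDA_NIC_INTEROP"'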

As for scaling in parallel: the current scaling limit is approximately 8
GPUs. We've never even attempted to run the code past 16 GPUs!

See http://ambermd.org/gpus/benchmarks.htm#Benchmarks for example
benchmarks.

Replica Exchange simulations will be able to support many, many more GPUs,
but that support will not be released until AMBER 12.

Unfortunately it is unrealistic to expect GPU scaling beyond about 8 nodes
at present: individual GPUs are now massively powerful compared to the CPUs
in those nodes, but nobody has come up with a <100ns latency, petabyte
bandwidth interconnect yet to actually balance things out. Once the
per-step compute time on each GPU falls toward the microsecond scale, the
all-to-all communication needed for the PME FFT dominates the wall time and
adding more GPUs stops helping. We might be able to use neutrinos, since
they seem to travel faster than the speed of light these days. ;-)

I am hoping we can get to 16 GPUs once we have a GPU-parallel FFT, but that
will likely need PCI-E Gen 3 + FDR InfiniBand and a bunch of other tweaks.

Of course we could scale to many more nodes by just making the code really,
really slow - inserting no-ops between all the machine code instructions
would immediately improve scaling, but that kind of defeats the point.
Unless you are writing a comp sci communication on how great things
scale. ;-)

Your best bet right now is to run multiple different simulations at once,
each one using between 4 and 8 GPUs. You should be running multiple
simulations with different ig (random seed) values anyway, so run 16
simulations on 8 GPUs each instead of trying to use 96 GPUs for a single
run.
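
As a sketch of such a sweep, assuming a template input file
"mdin.template" whose &cntrl namelist contains the line "ig = SEED," and
topology/coordinate files named prmtop and inpcrd (all placeholder names;
in practice you would submit each run as its own batch job rather than
backgrounding them all at once):

    # Launch 16 independent 8-GPU runs, each with a distinct random seed:
    for i in $(seq 1 16); do
        sed "s/SEED/$((10000 + i))/" mdin.template > mdin.$i
        mpirun -np 8 pmemd.cuda.MPI -O -i mdin.$i -p prmtop -c inpcrd \
            -o mdout.$i -r restrt.$i -x mdcrd.$i &
    done
    wait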

All the best
Ross

/\
\/
|\oss Walker

---------------------------------------------------------
| Assistant Research Professor |
| San Diego Supercomputer Center |
| Adjunct Assistant Professor |
| Dept. of Chemistry and Biochemistry |
| University of California San Diego |
| NVIDIA Fellow |
| http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
---------------------------------------------------------

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Sep 28 2011 - 21:00:04 PDT