I am occasionally tempted to write a multithreaded SSE/AVX version of PME
for crazy people, but then I wake up from the nightmare and I do something
useful instead.
On Wed, May 14, 2014 at 10:39 AM, Ross Walker <ross.rosswalker.co.uk> wrote:
> Not strictly true.
>
> pmemd.cuda.MPI is there to facilitate multi-GPU runs either on different
> nodes (not recommended) or within the same node.
>
> E.g. suppose you have a system with 2 GPUs in it. You could do either:
>
> cd run1
> export CUDA_VISIBLE_DEVICES=0
> nohup $AMBERHOME/bin/pmemd.cuda -O -i ... &
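> # run1 is now running in the background on GPU 0; start run2 on GPU 1: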
> cd ../run2
> export CUDA_VISIBLE_DEVICES=1
> nohup $AMBERHOME/bin/pmemd.cuda -O -i ... &
>
> And BOTH calculations will run at full speed (using a total of 2 of your
> CPU cores). This is different from a lot of other codes, which run into
> contention here because they also use the CPU cores and therefore rely on
> PCI-E communication on every step.
>
> Or you could do:
>
> cd run1
> export CUDA_VISIBLE_DEVICES=0,1
> mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ...
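> # no '&' here, so the second run below only starts once run1 has finished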
> cd ../run2
> mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ...
>
> This will take longer overall than two single-GPU runs, since scaling a
> single run to multiple GPUs is far from linear, BUT if you want to get run1
> completed as quickly as possible this works.
>
>
> Note: if you are using AMBER 14 and your two GPUs can talk to each other
> via peer to peer (they should be able to if they are on the same IOH
> controller / physical CPU socket), and you have true PCI-E gen 3 x16
> bandwidth to each, then you should see very good multi-GPU performance.
>
> If you have 4 GPUs (you'd need a two-socket system right now for this to
> be full bandwidth) then you could run 2 x 2-GPU runs at the same time with
> AMBER 14, one using GPUs 0 and 1 and one using GPUs 2 and 3, assuming this
> matches how they talk to each other over peer to peer. Or 4 x 1-GPU, or
> 2 x 1-GPU plus 1 x 2-GPU. No production motherboard currently supports
> 4-way peer to peer, but when they do the code should scale well to 4 GPUs.
>
> Multi-node runs are a bad idea for anything other than REMD and other
> loosely coupled work with GPUs right now, because interconnect bandwidth
> has sadly not kept up with GPU improvements, so modern GPUs (K40,
> GTX-Titan-Black, etc.) are too fast for the interconnect.
>
> For now, what is on http://ambermd.org/gpus/ for running in parallel
> applies to AMBER 12 (even though it is on the AMBER 14 page) - I have not
> had a chance to update it yet. I am just finalizing a short piece of code
> that will test which GPUs in a node can communicate via peer to peer, so
> one knows what to set CUDA_VISIBLE_DEVICES to, and then I'll update that
> section.
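>
> (As a rough sketch of what such a check amounts to - this is not that
> script, just an illustration - the CUDA runtime lets you ask
> cudaDeviceCanAccessPeer for every pair of devices:)
>
> #include <stdio.h>
> #include <cuda_runtime.h>
>
> /* Minimal peer-to-peer capability check; compile with, e.g.:
>    nvcc p2p_check.cu -o p2p_check                              */
> int main(void)
> {
>     int n = 0;
>     cudaGetDeviceCount(&n);
>     for (int i = 0; i < n; i++) {
>         for (int j = 0; j < n; j++) {
>             if (i == j) continue;
>             int ok = 0;
>             /* ok is set to 1 if device i can directly access device j */
>             cudaDeviceCanAccessPeer(&ok, i, j);
>             printf("GPU %d -> GPU %d : peer to peer %s\n",
>                    i, j, ok ? "YES" : "NO");
>         }
>     }
>     return 0;
> }
>
> A pair that reports YES in both directions is the sort of pair you would
> want to put in CUDA_VISIBLE_DEVICES for a 2-GPU run.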
>
> In terms of performance, see http://ambermd.org/gpus/benchmarks.htm for
> updated numbers with AMBER 14. In my experience, if you run like-for-like
> simulations with Gromacs (that is, NOT doing crazy things like only
> updating the pair list every 20 steps and other such hacks) then I think
> you will find that AMBER on a single GPU beats Gromacs on two GPUs - and
> if you add to that the cumulative performance of running two single-GPU
> jobs, one on each GPU, then it wins hands down. For raw throughput on a
> single job using two GPUs, AMBER 14 should be faster, from the testing I
> have done trying to run identical calculations, than any other MD code
> right now on the same hardware.
>
> And you still get your remaining CPU cores free to run some QM/MM or other
> such calculation on. Bonus! ;-)
>
> Hope that helps. Sorry the instructions on the website are not current - I
> am trying to get it done as quickly as possible.
>
> All the best
> Ross
>
>
> On 5/14/14, 10:13 AM, "MURAT OZTURK" <murozturk.ku.edu.tr> wrote:
>
> >To clarify, pmemd.cuda.MPI is only there to facilitate multi-GPU runs
> >when GPUs are on different nodes, then?
> >
> >This is very different from Gromacs, where I can do multi-CPU + multi-GPU.
> >I wonder how the performance will compare.
> >
> >
> >On Wed, May 14, 2014 at 6:57 PM, Ross Walker <ross.rosswalker.co.uk>
> >wrote:
> >
> >> To add to Jason's answer - you can of course use the remaining 19 CPU
> >> cores (make sure there are really 20 cores in your machine and not
> >> 10 cores + 10 hyperthreads) for something else while the GPU run is
> >> running.
> >>
> >> cd GPU_run
> >> nohup $AMBERHOME/bin/pmemd.cuda -O -i ... &
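> >> # the GPU job needs only 1 CPU core, leaving 19 for the CPU-only run below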
> >> cd ../CPU_run
> >> nohup mpirun -np 19 $AMBERHOME/bin/pmemd.MPI -O -i ... &
> >>
> >> All the best
> >> Ross
> >>
> >>
> >> On 5/14/14, 8:17 AM, "Jason Swails" <jason.swails.gmail.com> wrote:
> >>
> >> >On Wed, 2014-05-14 at 17:49 +0300, MURAT OZTURK wrote:
> >> >> I will be running on a single node with 20 cpus and 1 gpu installed.
> >> >>
> >> >> Do I have to use pmemd.cuda.MPI for this, or is pmemd.cuda enough..?
> >> >>
> >> >> How do I specify the number of CPUs used with pmemd.cuda? I can't
> >> >> seem to find this information in the manual.
> >> >
> >> >Just pmemd.cuda. The thing about pmemd.cuda is that it runs the
> >> >_entire_ calculation on the GPU, so adding CPUs buys you nothing.
> >> >
> >> >The way it is designed, each CPU thread will launch a GPU thread as
> >> >well (so you are stuck using 1 CPU for each GPU).
> >> >
> >> >HTH,
> >> >Jason
> >> >
> >> >--
> >> >Jason M. Swails
> >> >BioMaPS,
> >> >Rutgers University
> >> >Postdoctoral Researcher
> >> >
> >> >
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed May 14 2014 - 11:30:03 PDT