Re: [AMBER] Protocol for multiple CPU + single GPU run on a single node

From: MURAT OZTURK <murozturk.ku.edu.tr>
Date: Wed, 14 May 2014 22:22:24 +0300

No, you need not say more (performance-wise). :)

Actually I was curious about the inner workings. The page you linked also
mentions a 4 fs timestep. This is not recommended in AMBER (nor in GROMACS) as
far as I know; I mostly use a 2 fs, sometimes even a 1 fs, timestep. If their
ns/day measurement is based on that, then it is off by 50%. And if they do
actually pull off a 4 fs timestep, then how?

Also, is there any way to put comments in the namelist files for
sander/pmemd? I can't find a comment syntax anywhere.

Regards,

Murat

On Wed, May 14, 2014 at 9:49 PM, Ross Walker <ross.rosswalker.co.uk> wrote:

> From the ACEMD website http://www.acellera.com/products/d-soriano/
> (estimated by reading off their plot, which has no actual raw numbers, just
> bar graphs and a coarse scale):
>
> DHFR NVT 4 fs  1 x GTX780 = 212 ns/day
>                3 x GTX780 = 282 ns/day (THREE GPUs)
>
> AMBER 14 NPT 4 fs (note this is with a barostat as well; I don't have NVT
> numbers to hand, but they will be faster than NPT for sure).
>
> DHFR NPT 4 fs  1 x GTX780 = 239.14 ns/day
>                2 x GTX780 = 334.29 ns/day (TWO GPUs)
>
> Need I say more?
>
>
> On 5/14/14, 11:35 AM, "MURAT OZTURK" <murozturk.ku.edu.tr> wrote:
>
> >Oh so very exciting. I never quite grasped how fast the GPUs are and how
> >slow the bus is, apparently.
> >
> >Since we are already slightly off-topic, I dare to ask if anybody has any
> >experience with ACEMD? They claim to have a state-of-the-art pure GPU
> >implementation. Do we know if it is fundamentally different from the AMBER
> >implementation?
> >
> >Regards,
> >
> >Murat
> >
> >
> >On Wed, May 14, 2014 at 9:10 PM, Scott Le Grand
> ><varelse2005.gmail.com> wrote:
> >
> >> I am occasionally tempted to write a multithreaded SSE/AVX version of PME
> >> for crazy people, but then I wake up from the nightmare and I do something
> >> useful instead.
> >>
> >>
> >>
> >>
> >> On Wed, May 14, 2014 at 10:39 AM, Ross Walker <ross.rosswalker.co.uk>
> >> wrote:
> >>
> >> > Not strictly true.
> >> >
> >> > pmemd.cuda.MPI is there to facilitate multi-GPU runs either on different
> >> > nodes (not recommended) or within the same node.
> >> >
> >> > E.g. suppose you have a system with 2 GPUs in it. You could do either:
> >> >
> >> > cd run1
> >> > export CUDA_VISIBLE_DEVICES=0
> >> > nohup $AMBERHOME/bin/pmemd.cuda -O -i ... &
> >> > cd ../run2
> >> > export CUDA_VISIBLE_DEVICES=1
> >> > nohup $AMBERHOME/bin/pmemd.cuda -O -i ... &
> >> >
> >> > And BOTH calculations will run at full speed (using a total of 2 of your
> >> > CPU cores). This is different from a lot of other codes, which have
> >> > contention here because they use the CPU cores as well and therefore rely
> >> > on PCI-E communication on every step.
> >> >
> >> > Or you could do:
> >> >
> >> > cd run1
> >> > export CUDA_VISIBLE_DEVICES=0,1
> >> > mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ...
> >> > cd ../run2
> >> > mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ...
> >> >
> >> > This will take longer than two single-GPU runs, since scaling to
> >> > multiple GPUs for a single run is far from linear, BUT if you want to get
> >> > run1 completed as quickly as possible this works.
> >> >
> >> >
> >> > Note if you are using AMBER 14 and your two GPUs can talk to each other
> >> > via peer to peer (should be able to if they are on the same IOH
> >> > controller / physical CPU socket), and you have true PCI-E gen 3 x16
> >> > bandwidth to each, then you should see very good multi-GPU performance.
> >> >
> >> > If you have 4 GPUs (you'd need a two-socket system right now for this to
> >> > be full bandwidth) then you could run 2 x 2 GPU runs at the same time with
> >> > AMBER 14, one using 0 and 1 and one using 2 and 3, assuming this matches
> >> > how they talk to each other over peer to peer. Or 4 x 1 GPU, or 2 x 1 GPU
> >> > and 1 x 2 GPU. Currently no production motherboard supports 4-way peer to
> >> > peer yet, but when they do the code should scale well to 4 GPUs.
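> >> >
> >> > For example (and this pairing is just an assumption - check which devices
> >> > are actually peer-to-peer partners on your own board), two simultaneous
> >> > 2 x GPU runs might look like:
> >> >
> >> > cd run1
> >> > export CUDA_VISIBLE_DEVICES=0,1
> >> > nohup mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ... &
> >> > cd ../run2
> >> > export CUDA_VISIBLE_DEVICES=2,3
> >> > nohup mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ... &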
> >> >
> >> > Multi-node is a bad idea for things other than REMD and other loosely
> >> > coupled stuff with GPUs right now, because interconnect bandwidth has
> >> > sadly not kept up with GPU improvements, so modern GPUs (K40,
> >> > GTX-Titan-Black, etc.) are too fast for the interconnect.
> >> >
> >> > For now, what is on http://ambermd.org/gpus/ for running in parallel
> >> > applies to AMBER 12 (even though it is on the AMBER 14 page) - I have not
> >> > had a chance to update it yet. I am just finalizing a short piece of code
> >> > that will test which GPUs in a node can communicate via peer to peer, so
> >> > one knows what to set CUDA_VISIBLE_DEVICES to, and then I'll update that
> >> > section.
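> >> >
> >> > In the meantime, a minimal sketch of that kind of check (NOT the utility I
> >> > am talking about, just an illustration using the standard CUDA runtime
> >> > call cudaDeviceCanAccessPeer) would be something like:
> >> >
> >> > /* p2p_check.cu - list which GPU pairs in this node can use peer to peer.
> >> >    Build (for example): nvcc -o p2p_check p2p_check.cu                  */
> >> > #include <stdio.h>
> >> > #include <cuda_runtime.h>
> >> >
> >> > int main(void)
> >> > {
> >> >     int n = 0;
> >> >     if (cudaGetDeviceCount(&n) != cudaSuccess || n < 2) {
> >> >         printf("Fewer than two CUDA devices visible - nothing to check.\n");
> >> >         return 1;
> >> >     }
> >> >     for (int i = 0; i < n; i++) {
> >> >         for (int j = 0; j < n; j++) {
> >> >             if (i == j) continue;
> >> >             int can = 0;
> >> >             /* can == 1 means device i can directly access device j */
> >> >             cudaDeviceCanAccessPeer(&can, i, j);
> >> >             printf("GPU %d -> GPU %d : peer to peer %s\n",
> >> >                    i, j, can ? "YES" : "no");
> >> >         }
> >> >     }
> >> >     return 0;
> >> > }
> >> >
> >> > GPUs that report YES for each other are the ones to group together in
> >> > CUDA_VISIBLE_DEVICES for a pmemd.cuda.MPI run.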
> >> >
> >> > In terms of performance, see http://ambermd.org/gpus/benchmarks.htm for
> >> > updated numbers with AMBER 14. From my experience, if you run like-for-like
> >> > simulations with Gromacs (that is, NOT doing crazy things like only
> >> > updating the pair list every 20 steps and other such hacks) then I think
> >> > you will find that AMBER on a single GPU beats Gromacs on two GPUs - and
> >> > if you add to that the cumulative performance of running two single-GPU
> >> > jobs, one on each GPU, then it wins hands down. For raw throughput on a
> >> > single job using two GPUs, AMBER 14 should be faster, from the testing I
> >> > have done trying to run identical calculations, than any other MD code
> >> > right now on the same hardware.
> >> >
> >> > And you still get your remaining CPU cores free to run some QM/MM or
> >> > other such calculation on. Bonus! ;-)
> >> >
> >> > Hope that helps. Sorry the instructions on the website are not current -
> >> > I am trying to get it done as quickly as possible.
> >> >
> >> > All the best
> >> > Ross
> >> >
> >> >
> >> > On 5/14/14, 10:13 AM, "MURAT OZTURK" <murozturk.ku.edu.tr> wrote:
> >> >
> >> > >To clarify, pmemd.cuda.MPI is only there to facilitate multi-GPU runs
> >> > >when the GPUs are on different nodes, then?
> >> > >
> >> > >This is very different from Gromacs, where I can do multi-CPU +
> >> > >multi-GPU. I wonder how the performance will compare.
> >> > >
> >> > >
> >> > >On Wed, May 14, 2014 at 6:57 PM, Ross Walker <ross.rosswalker.co.uk>
> >> > >wrote:
> >> > >
> >> > >> To add to Jason's answer - you can of course use the remaining 19 CPUs
> >> > >> (make sure there are really 20 cores in your machine and not 10 cores +
> >> > >> 10 hyperthreads) for something else while the GPU run is running.
> >> > >>
> >> > >> cd GPU_run
> >> > >> nohup $AMBERHOME/bin/pmemd.cuda -O -i ... &
> >> > >> cd ../CPU_run
> >> > >> nohup mpirun -np 19 $AMBERHOME/bin/pmemd.MPI -O -i ... &
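> >> > >>
> >> > >> If you want to be certain the CPU-only job stays off the core that is
> >> > >> driving the GPU, you could pin it with something like taskset (Linux
> >> > >> only, and the core list here is just an example - adjust it for your
> >> > >> own machine and MPI's own binding settings):
> >> > >>
> >> > >> nohup taskset -c 1-19 mpirun -np 19 $AMBERHOME/bin/pmemd.MPI -O -i ... &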
> >> > >>
> >> > >> All the best
> >> > >> Ross
> >> > >>
> >> > >>
> >> > >> On 5/14/14, 8:17 AM, "Jason Swails" <jason.swails.gmail.com> wrote:
> >> > >>
> >> > >> >On Wed, 2014-05-14 at 17:49 +0300, MURAT OZTURK wrote:
> >> > >> >> I will be running on a single node with 20 CPUs and 1 GPU installed.
> >> > >> >>
> >> > >> >> Do I have to use pmemd.cuda.MPI for this, or is pmemd.cuda enough?
> >> > >> >>
> >> > >> >> How do I specify the number of CPUs used with pmemd.cuda? I can't
> >> > >> >> seem to find this information in the manual.
> >> > >> >
> >> > >> >Just pmemd.cuda. The thing about pmemd.cuda is that it runs the
> >> > >> >_entire_ calculation on the GPU, so adding CPUs buys you nothing.
> >> > >> >
> >> > >> >The way it is designed, each CPU thread will launch a GPU thread as
> >> > >> >well (so you are stuck using 1 CPU for each GPU).
> >> > >> >
> >> > >> >HTH,
> >> > >> >Jason
> >> > >> >
> >> > >> >--
> >> > >> >Jason M. Swails
> >> > >> >BioMaPS,
> >> > >> >Rutgers University
> >> > >> >Postdoctoral Researcher
> >> > >> >
> >> > >> >
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber