Re: [AMBER] Protocol for multiple CPU + single GPU run on a single node

From: Ross Walker <ross.rosswalker.co.uk>
Date: Wed, 14 May 2014 11:49:41 -0700

From the ACEMD website http://www.acellera.com/products/d-soriano/
(estimated by reading off their plot, which has no actual raw numbers, just
bar graphs on a coarse scale):

DHFR NVT 4 fs:  1 x GTX780 = 212 ns/day
                3 x GTX780 = 282 ns/day (THREE GPUs)

AMBER 14 NPT 4 fs (note this is with a barostat as well - I don't have NVT
numbers to hand, but they will certainly be faster than NPT):

DHFR NPT 4 fs:  1 x GTX780 = 239.14 ns/day
                2 x GTX780 = 334.29 ns/day (TWO GPUs)
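
A rough per-GPU reading of those numbers (back-of-the-envelope only, and
setting the NVT vs NPT caveat aside):

  ACEMD:    282 / 3 GPUs   ~  94 ns/day per GPU  (vs 212 ns/day on one GPU)
  AMBER 14: 334.29 / 2 GPUs ~ 167 ns/day per GPU (vs 239.14 ns/day on one),
            and two independent single-GPU runs give ~2 x 239 = 478 ns/day
            of aggregate sampling.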

Need I say more?


On 5/14/14, 11:35 AM, "MURAT OZTURK" <murozturk.ku.edu.tr> wrote:

>Oh, so very exciting. I never quite grasped how fast the GPUs are and how
>slow the bus is, apparently.
>
>Since we are already slightly off-topic, I dare to ask if anybody has any
>experience with ACEMD? They claim to have a state-of-the-art pure-GPU
>implementation. Do we know if it is fundamentally different from the AMBER
>implementation?
>
>Regards,
>
>Murat
>
>
>On Wed, May 14, 2014 at 9:10 PM, Scott Le Grand <varelse2005.gmail.com>
>wrote:
>
>> I am occasionally tempted to write a multithreaded SSE/AVX version of PME
>> for crazy people, but then I wake up from the nightmare and I do something
>> useful instead.
>>
>>
>>
>>
>> On Wed, May 14, 2014 at 10:39 AM, Ross Walker <ross.rosswalker.co.uk>
>> wrote:
>>
>> > Not strictly true.
>> >
>> > pmemd.cuda.MPI is there to facilitate multi-GPU runs either on different
>> > nodes (not recommended) or within the same node.
>> >
>> > E.g. suppose you have a system with 2 GPUs in it. You could do either:
>> >
>> > cd run1
>> > export CUDA_VISIBLE_DEVICES=0
>> > nohup $AMBERHOME/bin/pmemd.cuda -O -i ... &
>> > cd ../run2
>> > export CUDA_VISIBLE_DEVICES=1
>> > nohup $AMBERHOME/bin/pmemd.cuda -O -i ... &
>> >
>> > And BOTH calculations will run at full speed (using a total of 2 of your
>> > CPU cores). This is different from a lot of other codes, which have
>> > contention here because they also use the CPU cores and so rely on PCI-E
>> > communication on every step.
>> >
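>> > As a minimal sketch of the same idea for N GPUs - assuming the run
>> > directories run1 ... runN already exist and that the input/output file
>> > names below are just placeholders for your own - a small shell loop
>> > works too:
>> >
>> > #!/bin/bash
>> > # One independent single-GPU pmemd.cuda job per GPU, one directory per job.
>> > NGPUS=2                              # number of GPUs in the node
>> > for ((i=0; i<NGPUS; i++)); do
>> >   cd "run$((i+1))"
>> >   # Make only GPU $i visible to this job
>> >   CUDA_VISIBLE_DEVICES=$i nohup "$AMBERHOME/bin/pmemd.cuda" -O -i md.in \
>> >       -o md.out -p prmtop -c inpcrd -r restrt -x mdcrd &
>> >   cd ..
>> > done
>> >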
>> > Or you could do:
>> >
>> > cd run1
>> > export CUDA_VISIBLE_DEVICES=0,1
>> > mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ...
>> > cd ../run2
>> > mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ...
>> >
>> > This gives less total throughput than two single-GPU runs, since scaling a
>> > single run to multiple GPUs is far from linear, BUT if you want to get run1
>> > completed as quickly as possible this works.
>> >
>> >
>> > Note that if you are using AMBER 14, and your two GPUs can talk to each
>> > other via peer to peer (which they should be able to do if they are on the
>> > same IOH controller / physical CPU socket), and you have true PCI-E gen 3
>> > x16 bandwidth to each, then you should see very good multi-GPU performance.
>> >
>> > If you have 4 GPUs (you'd need a two-socket system right now for this to
>> > be full bandwidth) then you could run 2 x 2-GPU runs at the same time with
>> > AMBER 14, one using GPUs 0 and 1 and one using 2 and 3, assuming this
>> > matches how they talk to each other over peer to peer. Or 4 x 1-GPU, or
>> > 2 x 1-GPU plus 1 x 2-GPU. Currently no production motherboard supports
>> > 4-way peer to peer, but when one does the code should scale well to 4 GPUs.
>> >
>> > Multi-node runs are a bad idea for anything other than REMD and other
>> > loosely coupled work with GPUs right now, because interconnect bandwidth
>> > has sadly not kept up with GPU improvements, so modern GPUs (K40,
>> > GTX-Titan-Black, etc.) are too fast for the interconnect.
>> >
>> > For now, what is on http://ambermd.org/gpus/ for running in parallel
>> > applies to AMBER 12 (even though it is on the AMBER 14 page) - I have not
>> > had a chance to update it yet. I am just finalizing a short piece of code
>> > that will test which GPUs in a node can communicate via peer to peer, so
>> > one knows what to set CUDA_VISIBLE_DEVICES to, and then I'll update that
>> > section.
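>> >
>> > In the meantime, a rough way to see which GPUs hang off the same PCI-E
>> > root (and so are likely peer-to-peer capable) is the topology matrix from
>> > nvidia-smi, on driver versions that provide it - a sketch of the idea
>> > only, not the test code mentioned above:
>> >
>> > # Pairs reported as PIX/PXB/PHB sit under the same root complex and are
>> > # candidates for peer to peer; SYS means the path crosses the QPI link
>> > # between sockets.
>> > nvidia-smi topo -m
>> >
>> > # Then point a 2-GPU run at such a pair, e.g. GPUs 0 and 1:
>> > export CUDA_VISIBLE_DEVICES=0,1
>> > mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ...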
>> >
>> > In terms of performance - see http://ambermd.org/gpus/benchmarks.htm for
>> > updated numbers with AMBER 14. From my experience, if you run like-for-like
>> > simulations with GROMACS (that is, NOT doing crazy things like only
>> > updating the pair list every 20 steps and other such hacks), then I think
>> > you will find that AMBER on a single GPU beats GROMACS on two GPUs - and
>> > if you add to that the cumulative performance of running two single-GPU
>> > jobs, one on each GPU, then it wins hands down. For raw throughput on a
>> > single job using two GPUs, AMBER 14 should be faster, from the testing I
>> > have done trying to run identical calculations, than any other MD code
>> > right now on the same hardware.
>> >
>> > And you still get your remaining CPU cores free to run some QM/MM or other
>> > such calculation on. Bonus! ;-)
>> >
>> > Hope that helps. Sorry the instructions on the website are not current - I
>> > am trying to get them updated as quickly as possible.
>> >
>> > All the best
>> > Ross
>> >
>> >
>> > On 5/14/14, 10:13 AM, "MURAT OZTURK" <murozturk.ku.edu.tr> wrote:
>> >
>> > >To clarify, pmemd.cuda.MPI is only there to facilitate multi-GPU runs
>> > >when GPUs are on different nodes, then?
>> > >
>> > >This is very different from GROMACS, where I can do multi-CPU + multi-GPU.
>> > >I wonder how the performance will compare.
>> > >
>> > >
>> > >On Wed, May 14, 2014 at 6:57 PM, Ross Walker <ross.rosswalker.co.uk>
>> > >wrote:
>> > >
>> > >> To add to Jason's answer - you can of course use the remaining 19 CPUs
>> > >> (make sure there are really 20 cores in your machine and not 10 cores +
>> > >> 10 hyperthreads) for something else while the GPU run is running.
>> > >>
>> > >> cd GPU_run
>> > >> nohup $AMBERHOME/bin/pmemd.cuda -O -i ... &
>> > >> cd ../CPU_run
>> > >> nohup mpirun -np 19 $AMBERHOME/bin/pmemd.MPI -O -i ... &
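>> > >>
>> > >> If the two jobs end up fighting over the same cores, one (very
>> > >> system-dependent) option is to pin the CPU-only run away from the core
>> > >> that is driving the GPU - a sketch only, assuming Linux, cores numbered
>> > >> 0-19, and an mpirun that does not apply its own binding:
>> > >>
>> > >> # Leave core 0 for pmemd.cuda; confine the 19-way CPU run to cores 1-19.
>> > >> nohup taskset -c 1-19 mpirun -np 19 $AMBERHOME/bin/pmemd.MPI -O -i ... &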
>> > >>
>> > >> All the best
>> > >> Ross
>> > >>
>> > >>
>> > >> On 5/14/14, 8:17 AM, "Jason Swails" <jason.swails.gmail.com> wrote:
>> > >>
>> > >> >On Wed, 2014-05-14 at 17:49 +0300, MURAT OZTURK wrote:
>> > >> >> I will be running on a single node with 20 CPUs and 1 GPU installed.
>> > >> >>
>> > >> >> Do I have to use pmemd.cuda.MPI for this, or is pmemd.cuda enough?
>> > >> >>
>> > >> >> How do I specify the number of CPUs used with pmemd.cuda? I can't
>> > >> >> seem to find this information in the manual.
>> > >> >
>> > >> >Just pmemd.cuda. The thing about pmemd.cuda is that it runs the
>> > >> >_entire_ calculation on the GPU, so adding CPUs buys you nothing.
>> > >> >
>> > >> >The way it is designed, each CPU thread will launch a GPU thread as
>> > >> >well (so you are stuck using 1 CPU for each GPU).
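>> > >> >
>> > >> >(You can see this for yourself while a run is going: nvidia-smi lists a
>> > >> >single pmemd.cuda process attached to the GPU, and top shows that same
>> > >> >process occupying one CPU core.)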
>> > >> >
>> > >> >HTH,
>> > >> >Jason
>> > >> >
>> > >> >--
>> > >> >Jason M. Swails
>> > >> >BioMaPS,
>> > >> >Rutgers University
>> > >> >Postdoctoral Researcher
>> > >> >
>> > >> >



_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed May 14 2014 - 12:00:03 PDT