Re: [AMBER] Protocol for multiple CPU + single GPU run on a single node

From: Ross Walker <ross.rosswalker.co.uk>
Date: Wed, 14 May 2014 12:36:11 -0700

They do it EXACTLY the same way we do it in AMBER 14: using hydrogen
mass repartitioning (HMR). In fact, the only reason we added support
for it to AMBER 14 was that I was pissed off with people comparing
runs with a 4fs timestep against AMBER running with a 2fs one and
could never convince them to do a fair comparison. Now one can run 4fs
and compare against other codes directly.

However, in the process of doing this Adrian and I have actually
tested 4fs with HMR and found that, at least in preliminary testing,
it seems to work pretty well. A paper on it is in the works.
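
For anyone who wants to try it before the paper is out, here is a
minimal sketch of the setup (file names are just placeholders, and the
exact ParmEd invocation may differ a little between AmberTools
versions): repartition the topology once with ParmEd, then raise the
timestep in the mdin.

  # Repartition hydrogen masses in the prmtop (ParmEd's
  # HMassRepartition action; default new H mass is 3.024 amu).
  parmed.py -p complex.prmtop << EOF
  HMassRepartition
  outparm complex.hmr.prmtop
  quit
  EOF

Then run the repartitioned prmtop with SHAKE on and a 4fs step, i.e.
ntc=2, ntf=2, dt=0.004 in &cntrl. HMR does not remove the need for
constraints on bonds to hydrogen; it just slows the fastest motions
enough for the 4fs step to stay stable.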

All the best
Ross



On 5/14/14, 12:22 PM, "MURAT OZTURK" <murozturk.ku.edu.tr> wrote:

>No, you need not say more (performance-wise). :)
>
>Actually I was curious about the inner workings. The page you linked
>also mentions a 4 fs timestep. As far as I know this is not
>recommended in AMBER (or GROMACS). I mostly use a 2 fs, sometimes even
>a 1 fs timestep. If their ns/day measurement is based on 4 fs, then it
>is off by 50%. If they really do pull off a 4 fs timestep, then how?
>
>Also, is there any way to put comments in the namelist files for
>sander/pmemd? I can't find a comment syntax anywhere.
>
>Regards,
>
>Murat
>
>
>
>
>
>
>On Wed, May 14, 2014 at 9:49 PM, Ross Walker <ross.rosswalker.co.uk>
>wrote:
>
>> From the ACEMD website http://www.acellera.com/products/d-soriano/
>> (estimating from their plot, which has no actual raw numbers, just
>> bar graphs and a coarse scale):
>>
>> DHFR NVT 4fs  1 x GTX780 = 212 ns/day
>>               3 x GTX780 = 282 ns/day  (THREE GPUs)
>>
>> AMBER 14 NPT 4fs (note this is with a barostat as well - I don't
>> have NVT numbers to hand, but they will be faster than NPT for
>> sure):
>>
>> DHFR NPT 4fs  1 x GTX780 = 239.14 ns/day
>>               2 x GTX780 = 334.29 ns/day  (TWO GPUs)
>>
>> Need I say more?
>>
>>
>> On 5/14/14, 11:35 AM, "MURAT OZTURK" <murozturk.ku.edu.tr> wrote:
>>
>> >Oh, very exciting. Apparently I never quite grasped how fast the
>> >GPUs are and how slow the bus is.
>> >
>> >Since we are already slightly off-topic, I dare to ask whether
>> >anybody has any experience with ACEMD. They claim to have a
>> >state-of-the-art pure-GPU implementation. Do we know if it is
>> >fundamentally different from the AMBER implementation?
>> >
>> >Regards,
>> >
>> >Murat
>> >
>> >
>> >On Wed, May 14, 2014 at 9:10 PM, Scott Le Grand <varelse2005.gmail.com>
>> >wrote:
>> >
>> >> I am occasionally tempted to write a multithreaded SSE/AVX
>> >> version of PME for crazy people, but then I wake up from the
>> >> nightmare and I do something useful instead.
>> >>
>> >>
>> >>
>> >>
>> >> On Wed, May 14, 2014 at 10:39 AM, Ross Walker <ross.rosswalker.co.uk>
>> >> wrote:
>> >>
>> >> > Not strictly true.
>> >> >
>> >> > pmemd.cuda.MPI is there to facilitate multi-GPU runs, either on
>> >> > different nodes (not recommended) or within the same node.
>> >> >
>> >> > E.g. suppose you have a system with 2 GPUs in it. You could do
>> >> > either:
>> >> >
>> >> > cd run1
>> >> > export CUDA_VISIBLE_DEVICES=0
>> >> > nohup $AMBERHOME/bin/pmemd.cuda -O -i ... &
>> >> > cd ../run2
>> >> > export CUDA_VISIBLE_DEVICES=1
>> >> > nohup $AMBERHOME/bin/pmemd.cuda -O -i ... &
>> >> >
>> >> > And BOTH calculations will run at full speed (using a total of
>> >> > 2 of your CPU cores). This is different from a lot of other
>> >> > codes, which have contention here because they also use the CPU
>> >> > cores and therefore rely on PCI-E communication on every step.
>> >> >
>> >> > Or you could do:
>> >> >
>> >> > cd run1
>> >> > export CUDA_VISIBLE_DEVICES=0,1
>> >> > mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ...
>> >> > cd ../run2
>> >> > mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ...
>> >> >
>> >> > This will take longer in total than two single-GPU runs, since
>> >> > scaling a single run to multiple GPUs is far from linear, BUT
>> >> > if you want to get run1 completed as quickly as possible this
>> >> > works.
>> >> >
>> >> >
>> >> > Note that if you are using AMBER 14, if your two GPUs can talk
>> >> > to each other via peer to peer (they should be able to if they
>> >> > are on the same IOH controller / physical CPU socket), and if
>> >> > you have true PCI-E gen 3 x16 bandwidth to each, then you
>> >> > should see very good multi-GPU performance.
>> >> >
>> >> > If you have 4 GPUs (you'd need a two-socket system right now
>> >> > for this to be full bandwidth) then you could run 2 x 2-GPU
>> >> > runs at the same time with AMBER 14, one using GPUs 0 and 1 and
>> >> > one using GPUs 2 and 3 - assuming this matches how they talk to
>> >> > each other over peer to peer (see the sketch below). Or 4 x
>> >> > 1-GPU runs, or 2 x 1-GPU and 1 x 2-GPU. Currently no production
>> >> > motherboard supports 4-way peer to peer, but when one does the
>> >> > code should scale well to 4 GPUs.
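>> >> >
>> >> > For example, following the same pattern as above (just a
>> >> > sketch - the pairing of device numbers that share peer to peer
>> >> > may differ on your board):
>> >> >
>> >> > cd run1
>> >> > export CUDA_VISIBLE_DEVICES=0,1
>> >> > nohup mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ... &
>> >> > cd ../run2
>> >> > export CUDA_VISIBLE_DEVICES=2,3
>> >> > nohup mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ... &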
>> >> >
>> >> > Multi-node runs are a bad idea for anything other than REMD and
>> >> > other loosely coupled approaches with GPUs right now, because
>> >> > interconnect bandwidth has sadly not kept up with GPU
>> >> > improvements, so modern GPUs (K40, GTX-Titan-Black, etc.) are
>> >> > too fast for the interconnect.
>> >> >
>> >> > For now, what is on http://ambermd.org/gpus/ for running in
>> >> > parallel applies to AMBER 12 (even though it is on the AMBER 14
>> >> > page) - I have not had a chance to update it yet. I am just
>> >> > finalizing a short piece of code that will test which GPUs in a
>> >> > node can communicate via peer to peer, so one knows what to set
>> >> > CUDA_VISIBLE_DEVICES to, and then I'll update that section.
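>> >> >
>> >> > In the meantime, one rough way to see how the GPUs in a node
>> >> > are laid out (this assumes a newer NVIDIA driver whose
>> >> > nvidia-smi supports the topology query, and note that
>> >> > nvidia-smi's numbering can differ from the CUDA runtime's
>> >> > ordering) is:
>> >> >
>> >> > nvidia-smi -L        # list the GPUs
>> >> > nvidia-smi topo -m   # show the GPU-to-GPU connection matrix
>> >> >
>> >> > GPUs that hang off the same PCI-E root / CPU socket are the
>> >> > ones to group together in CUDA_VISIBLE_DEVICES.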
>> >> >
>> >> > In terms of performance, see
>> >> > http://ambermd.org/gpus/benchmarks.htm for updated numbers with
>> >> > AMBER 14. In my experience, if you run like-for-like
>> >> > simulations with GROMACS (that is, NOT doing crazy things like
>> >> > only updating the pair list every 20 steps and other such
>> >> > hacks) then I think you will find that AMBER on a single GPU
>> >> > beats GROMACS on two GPUs - and if you add the cumulative
>> >> > performance of running two single-GPU jobs, one on each GPU, it
>> >> > wins hands down. For raw throughput on a single job using two
>> >> > GPUs, AMBER 14 should be faster than any other MD code right
>> >> > now on the same hardware, based on the testing I have done
>> >> > trying to run identical calculations.
>> >> >
>> >> > And you still get your remaining CPU cores free to run some
>> >> > QM/MM or other such calculation on. Bonus! ;-)
>> >> >
>> >> > Hope that helps. Sorry the instructions on the website are not
>> >> > current - I am trying to get it done as quickly as possible.
>> >> >
>> >> > All the best
>> >> > Ross
>> >> >
>> >> >
>> >> > On 5/14/14, 10:13 AM, "MURAT OZTURK" <murozturk.ku.edu.tr> wrote:
>> >> >
>> >> > >To clarify, pmemd.cuda.MPI is only there to facilitate
>> >> > >multi-GPU runs when the GPUs are on different nodes, then?
>> >> > >
>> >> > >This is very different from GROMACS, where I can do multi-CPU +
>> >> > >multi-GPU. I wonder how the performance will compare.
>> >> > >
>> >> > >
>> >> > >On Wed, May 14, 2014 at 6:57 PM, Ross Walker <ross.rosswalker.co.uk>
>> >> > >wrote:
>> >> > >
>> >> > >> To add to Jason's answer - you can of course use the
>> >> > >> remaining 19 CPUs (make sure there are really 20 physical
>> >> > >> cores in your machine and not 10 cores + 10 hyperthreads)
>> >> > >> for something else while the GPU run is running:
>> >> > >>
>> >> > >> cd GPU_run
>> >> > >> nohup $AMBERHOME/bin/pmemd.cuda -O -i ... &
>> >> > >> cd ../CPU_run
>> >> > >> nohup mpirun -np 19 $AMBERHOME/bin/pmemd.MPI -O -i ... &
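>> >> > >>
>> >> > >> (One quick way to check the physical core count vs.
>> >> > >> hyperthreads on a Linux box, assuming lscpu is available:
>> >> > >>
>> >> > >> lscpu | grep -E 'Socket|Core|Thread'
>> >> > >>
>> >> > >> Physical cores = Socket(s) x Core(s) per socket; a Thread(s)
>> >> > >> per core value above 1 means hyperthreading is on.)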
>> >> > >>
>> >> > >> All the best
>> >> > >> Ross
>> >> > >>
>> >> > >>
>> >> > >> On 5/14/14, 8:17 AM, "Jason Swails" <jason.swails.gmail.com>
>> >> > >> wrote:
>> >> > >>
>> >> > >> >On Wed, 2014-05-14 at 17:49 +0300, MURAT OZTURK wrote:
>> >> > >> >> I will be running on a single node with 20 CPUs and 1 GPU
>> >> > >> >> installed.
>> >> > >> >>
>> >> > >> >> Do I have to use pmemd.cuda.MPI for this, or is pmemd.cuda
>> >> > >> >> enough?
>> >> > >> >>
>> >> > >> >> How do I specify the number of CPUs used with pmemd.cuda?
>> >> > >> >> I can't seem to find this information in the manual.
>> >> > >> >
>> >> > >> >Just pmemd.cuda. The thing about pmemd.cuda is that it runs
>> >> > >> >the _entire_ calculation on the GPU, so adding CPUs buys you
>> >> > >> >nothing.
>> >> > >> >
>> >> > >> >The way it is designed, each CPU thread will launch a GPU
>> >> > >> >thread as well (so you are stuck using 1 CPU for each GPU).
>> >> > >> >
>> >> > >> >HTH,
>> >> > >> >Jason
>> >> > >> >
>> >> > >> >--
>> >> > >> >Jason M. Swails
>> >> > >> >BioMaPS,
>> >> > >> >Rutgers University
>> >> > >> >Postdoctoral Researcher

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed May 14 2014 - 13:00:04 PDT