Re: [AMBER] how to improve GPU running? from Ross Walker on 2012-04-18 (Amber Archive Apr 2012)

From: Ross Walker <rosscwalker.gmail.com>
Date: Wed, 18 Apr 2012 07:10:37 -0400

Hi Albert,

You could improve things a bit by using an 8 Å cutoff (cut=8.0), and by running at constant volume (ntb=1) if your density is well equilibrated. Using ntt=1 will also be quicker than ntt=3 if your system is well thermally equilibrated. You could also ask the system administrators to turn off ECC on the GPUs, but you will probably get a religious-type 'no' in response.
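
As a rough illustration of those input changes, here is a minimal Python sketch that starts from the &cntrl settings in the md.in quoted below and applies them. The helper names (apply_suggestions, render_cntrl) are made up for this example, and tautp=1.0 is an assumed value (the Amber default), not something stated in this thread. Dropping taup and gamma_ln is also my reading, since they only apply to pressure coupling and to ntt=3 respectively.

```python
# Sketch only: apply the suggested speed-ups to the &cntrl settings
# from the md.in quoted later in this thread.

original = {
    "imin": 0, "irest": 1, "ntx": 5,
    "nstlim": 250000000, "dt": 0.002,
    "ntc": 2, "ntf": 2,
    "cut": 10.0, "ntb": 2, "ntp": 1, "taup": 2.0,
    "ntpr": 1000, "ntwx": 1000, "ntwr": 50000,
    "ntt": 3, "gamma_ln": 2.0,
    "temp0": 300.0,
}


def apply_suggestions(settings):
    """Return a copy with an 8 A cutoff, constant volume and ntt=1."""
    new = dict(settings)
    new["cut"] = 8.0           # shorter direct-space cutoff
    new["ntb"] = 1             # periodic box at constant volume
    new["ntp"] = 0             # no pressure coupling
    new.pop("taup", None)      # only meaningful when ntp > 0
    new["ntt"] = 1             # weak-coupling thermostat
    new.pop("gamma_ln", None)  # only used by the ntt=3 Langevin thermostat
    new["tautp"] = 1.0         # ASSUMPTION: Amber default, not from the thread
    return new


def render_cntrl(settings, title="production dynamics"):
    """Format the settings as an Amber-style &cntrl namelist."""
    body = ",\n".join(f"  {key}={value}" for key, value in settings.items())
    return f"{title}\n &cntrl\n{body},\n /\n"


tuned = apply_suggestions(original)
print(render_cntrl(tuned))
```

Only switch to constant volume and ntt=1 once density and temperature are actually stable, as noted above.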

As for the parallel scaling, there is not much that can be done, since Forge is fundamentally flawed from the beginning. Blame Dell for building what must be the world's worst design for a GPU cluster ever conceived. About the best you can hope for on Forge is 6 independent single-GPU runs per node. The design is just too utterly awful for anything else, sorry.
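
To put numbers on the scaling, here is a small Python sketch using the ns/day figures from the quoted message below. It reads the "NX6" labels as N resource units and takes the 1X6 figure as the single-unit baseline. Both are assumptions, since the thread never defines the labels. The last column assumes that N independent runs each sustain the 1X6 rate, which is a guess about the hardware rather than a measurement.

```python
# ns/day figures as posted in the quoted message, keyed by the N in "NX6".
# ASSUMPTION: N counts resource units (GPUs or nodes); the thread does not say.
ns_per_day = {1: 17.98, 2: 19.41, 3: 20.13, 4: 19.70,
              5: 19.62, 6: 19.03, 10: 18.33}


def scaling_table(rates):
    """Return (n, rate, speedup, efficiency, independent_aggregate) rows."""
    base = rates[1]
    rows = []
    for n in sorted(rates):
        speedup = rates[n] / base   # relative to the 1X6 run
        efficiency = speedup / n    # 1.0 would be perfect scaling
        independent = n * base      # n separate runs at the 1X6 rate
        rows.append((n, rates[n], speedup, efficiency, independent))
    return rows


for n, rate, speedup, eff, indep in scaling_table(ns_per_day):
    print(f"{n:>2}X6  {rate:6.2f} ns/day  speedup {speedup:4.2f}  "
          f"efficiency {eff:6.1%}  independent runs {indep:7.2f} ns/day total")
```

The independent-runs total is spread over separate trajectories, so it only helps if your project can use several shorter simulations rather than one long one.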

You will likely have better success using Keeneland. Or try an MDsimcluster machine, as we highlight on http://ambermd.org/gpus/. These are actually a reasonable design, and you can scale to 8 GPUs (see the benchmarks on that page for 2xM2090 per node).

All the best
Ross

On Apr 18, 2012, at 6:52, Albert <mailmd2011.gmail.com> wrote:

> hello:
> thank you very much for your kind reply. Does anybody else have any
> idea how to improve it?
> here is my md.in file:
>
> production dynamics
> &cntrl
> imin=0, irest=1, ntx=5,
> nstlim=250000000, dt=0.002,
> ntc=2, ntf=2,
> cut=10.0, ntb=2, ntp=1, taup=2.0,
> ntpr=1000, ntwx=1000, ntwr=50000,
> ntt=3, gamma_ln=2.0,
> temp0=300.0,
> /
>
>
> thank you very much
>
>
> On 04/18/2012 09:36 AM, steinbrt.rci.rutgers.edu wrote:
>> Hi,
>>
>>> some test for a 50,000 atoms protein/water system,
>>> command:
>>> 1X8 16.44
>> I am not one of the CUDA developers, but to me that does not look unusual,
>> depending on your GPUs. Compare to
>>
>> http://ambermd.org/gpus/benchmarks.htm#Benchmarks
>>
>> I assume that 1X8 means one 8-core node with a single GPU, right? 10-20 ns/day
>> for a medium-large system is what I'd expect.
>>
>>> 1X6 17.98
>>> 2X6 19.41
>>> 3X6 20.13
>>> 4X6 19.70
>>> 5X6 19.62
>>> 6X6 19.03
>>> 10X6 18.33
>>> It seems that the efficiency is not so high and the best one is 3X6 with
>>> around 20.1 ns/day. Since I am going to run hundreds of ns, it would
>>> take such a long time to be finished.....
>> I would argue that you gain almost nothing from scaling to a third GPU, so
>> 2 or even 1 GPU is the optimal spot to run your simulation. Adding 50%
>> more resources to gain less than 5% more throughput seems wasteful to me.
>> As you can see, multi-GPU scaling is not very efficient here, though that
>> depends on your machine setup.
>>
>> As for the long time your simulation would then take: *are you kidding
>> me?* I hate to sound exceptionally old here, but when I started doing MD
>> (say 5 years ago) I'd have killed for multinanosecond simulations on a
>> single machine, especially when waiting for a three-week 1 ns
>> equilibration to finish. So I guess the efficiency you see is the best one
>> could get at the moment and it is actually very very impressive!
>>
>> Please imagine the last paragraph wrapped in <rant> tags ;-)
>>
>> Thomas
>>
>> Dr. Thomas Steinbrecher
>> formerly at the
>> BioMaps Institute
>> Rutgers University
>> 610 Taylor Rd.
>> Piscataway, NJ 08854
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber

Received on Wed Apr 18 2012 - 04:30:04 PDT