Re: [AMBER] Reduced Performance with pmemd.cuda Compared With Benchmarks.

From: Bill Ross <ross@cgl.ucsf.edu>
Date: Wed, 5 Aug 2020 23:52:10 -0700

It might help the better-informed people here if you mentioned your topology.

AMBER's CUDA code is after my time, but I'll go ahead and speculate: the
nonbonded ops on the GPU itself should cost about the same for any given
number of atoms, but if your topology is off the developers' beaten track,
there may be a bottleneck in collecting the data. What does nvidia-smi say
about GPU usage?
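
For example, something like this will sample utilization once a second
while the run is going (a generic sketch, not tailored to your setup):

    # Poll GPU utilization and memory use once a second (Ctrl-C to stop).
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv -l 1

If utilization sits well below ~90% during the run, the GPU is probably
waiting on the host for something.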

Bill

(I play with CUDA for TensorFlow.)


On 8/5/20 11:43 PM, 李奕言 wrote:
> Thanks again for your kind reply.
>
> As much as I would like to upgrade, I am still wondering whether there is any solution within Amber16.
> The benchmark tests went well, but my actual system, with a similar number of atoms, ran much more slowly. Is this a general problem in Amber16 that can only be fixed in newer versions? Does the performance of pmemd.cuda differ among systems, or does it depend only on the number of atoms?
>
> I would be grateful if you could help.
>
>> -----Original Messages-----
>> From: "David Cerutti" <dscerutti.gmail.com>
>> Sent Time: 2020-08-06 10:48:36 (Thursday)
>> To: "AMBER Mailing List" <amber.ambermd.org>
>> Cc:
>> Subject: Re: [AMBER] Reduced Performance with pmemd.cuda Compared With Benchmarks.
>>
>> We made some performance improvements in Amber18 that will carry over to
>> Amber20 once the patch hits. The sooner you can upgrade, the better; even
>> without the patch, Amber20 will probably still be faster than Amber16 on
>> a GTX-1080Ti.
>>
>> Dave
>>
>>
>> On Wed, Aug 5, 2020 at 9:59 PM 李奕言 <liyiyan@pku.edu.cn> wrote:
>>
>>> Thank you for the information.
>>> Unfortunately, I am using Amber16. Does the same problem occur in Amber16?
>>>
>>>
>>>> -----Original Messages-----
>>>> From: "David Cerutti" <dscerutti.gmail.com>
>>>> Sent Time: 2020-08-04 12:40:51 (Tuesday)
>>>> To: "AMBER Mailing List" <amber.ambermd.org>
>>>> Cc:
>>>> Subject: Re: [AMBER] Reduced Performance with pmemd.cuda Compared With
>>> Benchmarks.
>>>> This is a known problem in Amber20, which has yet to be fixed pending
>>>> some other developments in the master branch. We know the solution, but
>>>> the release version for now takes precautions that will only become
>>>> necessary in CUDA 11 and later. There is nothing wrong with the code,
>>>> but we are still deliberating what patch to make.
>>>>
>>>> Dave
>>>>
>>>>
>>>> On Sun, Aug 2, 2020 at 11:22 PM 李奕言 <liyiyan@pku.edu.cn> wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> I am a new Amber user running MD simulations on a GTX 1080 Ti with
>>>>> pmemd.cuda in Amber 16, and I have encountered a considerable drop in
>>>>> performance.
>>>>>
>>>>> First I ran the AMBER 16 GPU acceleration benchmark sets. For the NPT
>>>>> simulation of Factor IX with 90,906 atoms, I got ~89 ns/day on a single
>>>>> GPU, slightly below the published benchmark performance (92.08 ns/day),
>>>>> which is acceptable. Since the benchmark results were fine, I did not
>>>>> suspect any problem with the GPU.
>>>>>
>>>>> When it came to my membrane protein system, which contains a GPCR dimer
>>>>> (ff14SB), POPC lipids (Lipid14), water (TIP3P), and ions, and has a
>>>>> fairly similar atom count of 95,096, I was getting only ~65 ns/day on a
>>>>> single GPU. This was not what I expected, given that 95,096 is not THAT
>>>>> much larger than 90,906.
>>>>>
>>>>> My input file for the NPT simulation is as follows. I have tried to
>>>>> stay as close as possible to the benchmark input.
>>>>>
>>>>> NPT Production Run
>>>>>
>>>>> &cntrl
>>>>> nstlim=250000000, dt=0.002, ntx=5, irest=1, ntpr=50000, ntwr=50000000,
>>>>> ntwx=50000,
>>>>> temp0=300.0, ntt=1, tautp=10.0,
>>>>> ntb=2, ntp=1, barostat=2,
>>>>> ntc=2, ntf=2,
>>>>> ioutfm=1,
>>>>> &end
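>>>>>
>>>>> (For reference, a minimal launch line for this deck on a single GPU;
>>>>> the file names here are placeholders, not my actual ones:)
>>>>>
>>>>> # Pin the run to one GPU and launch pmemd.cuda with the deck above.
>>>>> export CUDA_VISIBLE_DEVICES=0
>>>>> pmemd.cuda -O -i mdin.npt -o mdout.npt -p system.prmtop \
>>>>>            -c equil.rst -r prod.rst -x prod.nc
>>>>>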
>>>>> 1/ I have also cut down on writing output and trajectories for better
>>>>> performance by raising the ntpr and ntwx values, at the cost of fewer
>>>>> trajectory snapshots. Is this necessary?
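>>>>>
>>>>> (One way to check this empirically; a sketch, with a shortened nstlim
>>>>> and placeholder file names:)
>>>>>
>>>>> # Time a short run at two trajectory-write frequencies and compare the
>>>>> # ns/day reported in the final timing section of each mdout.
>>>>> for ntwx in 1000 50000; do
>>>>>   sed -e "s/nstlim=[0-9]*/nstlim=50000/" \
>>>>>       -e "s/ntwx=[0-9]*/ntwx=${ntwx}/" mdin.npt > mdin.${ntwx}
>>>>>   pmemd.cuda -O -i mdin.${ntwx} -o mdout.${ntwx} -p system.prmtop \
>>>>>              -c equil.rst -r restrt.${ntwx} -x prod.${ntwx}.nc
>>>>>   grep "ns/day" mdout.${ntwx} | tail -1
>>>>> done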
>>>>>
>>>>> 2/ Also, I am currently running an aMD simulation with similar input
>>>>> settings, which gives ~42 ns/day. Is aMD destined to be slower than
>>>>> conventional MD?
>>>>>
>>>>> Production Run with Accelerated MD
>>>>> &cntrl
>>>>> nstlim=250000000, dt=0.002, ntx=5, irest=1, ntpr=1000, ntwr=50000000,
>>>>> ntwx=1000, ntwprt=8873,
>>>>> temp0=300.0, ntt=1, tautp=10.0,
>>>>> ntb=2, ntp=1, barostat=2, cut=10.0,
>>>>> ntc=2, ntf=2,
>>>>> ioutfm=1,
>>>>> iamd=3,
>>>>> ethreshd=23916, alphad=1104,
>>>>> ethreshp=-187677, alphap=19011,
>>>>> &end
>>>>>
>>>>> 3/ Does the performance of pmemd.cuda differ among systems, or does it
>>>>> depend only on the number of atoms?
>>>>>
>>>>> I am hoping to get tips for improving the performance of my NPT runs.
>>>>>
>>>>> Many thanks for any reply.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Ian
-- 
Phobrain.com
_______________________________________________
AMBER mailing list
AMBER@ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Aug 06 2020 - 00:00:02 PDT