Re: [AMBER] Fwd: increase speed in amber using implcit solvent model

From: Matias Machado <mmachado.pasteur.edu.uy>
Date: Tue, 26 Jun 2018 13:27:58 -0300 (UYT)

Dear Chhaya,

Implicit solvent isn't as cheap as many people may think... there are several reasons for that (e.g. see this paper by Schulten [https://www.ncbi.nlm.nih.gov/pubmed/22121340]).

Notice, for example, that you are explicitly calculating all N^2 Coulomb interactions by using a cut-off of 999 A! You may speed up the CPU code by shortening that cut-off to e.g. 18 A (but remember that means you are shifting the electrostatic potential and may compromise the accuracy of the calculation).
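
For instance, a minimal sketch of what that change might look like in an MD input (&cntrl namelist); apart from igb/cut/rgbmax, every value below is a generic placeholder to adapt to your own heat/production inputs:

Shorter GB cut-off test (illustrative only)
 &cntrl
  imin=0, ntx=1, irest=0,
  ntb=0, igb=8,                       ! no box, GB-neck2 implicit solvent
  cut=18.0, rgbmax=18.0,              ! shortened non-bonded and Born-radii cut-offs
  ntc=2, ntf=2, dt=0.002,             ! SHAKE on bonds to H, 2 fs time step
  ntt=3, gamma_ln=1.0, temp0=300.0,   ! Langevin thermostat
  nstlim=5000,                        ! short test run (10 ps)
  ntpr=500, ntwx=500,
 /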

Anyway, I would say that nowadays it is cheaper to run systems of >5000 atoms (or so) in explicit solvent, as PME performs better; e.g. compare the AMBER GPU benchmarks on DHFR (23558 atoms) and the Nucleosome (25095 atoms), both running at 2 fs, in explicit and implicit solvent respectively [http://ambermd.org/gpus/benchmarks.htm]... However, I must mention that there are some efforts to get better size scalability in implicit solvent (see [https://pubs.acs.org/doi/abs/10.1021/acs.jctc.6b00712]). In that sense, I think (IMHO) the main advantage of implicit solvent is a faster exploration of the potential energy surface due to the reduced solvent friction... which means 1 ns of MD in explicit solvent may not be equivalent to 1 ns in implicit solvent (it may be more)... so there is some speed-up there, though it is very difficult to quantify...
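
Just to make the comparison concrete, an explicit-solvent (PME) run of the same kind would use settings roughly like these; again, only a sketch with generic values, assuming a solvated, neutralized prmtop:

Explicit solvent (PME) MD sketch (illustrative only)
 &cntrl
  imin=0, ntx=5, irest=1,
  ntb=2, ntp=1, barostat=2,           ! periodic box, constant pressure (MC barostat)
  cut=9.0,                            ! short real-space cut-off; PME handles the long range
  ntc=2, ntf=2, dt=0.002,
  ntt=3, gamma_ln=2.0, temp0=300.0,
  nstlim=500000,                      ! 1 ns
  ntpr=5000, ntwx=5000,
 /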

Regarding the CPU vs GPU performance of the AMBER code... I don't think it is fair to compare a few CPU cores against the thousands in a GPU... but in my own experience the CPU code is by far less performant than the GPU one (at least in terms of scalability); hopefully the upcoming midpoint version of pmemd will narrow that difference...

By the way... the performance you are getting is about what is expected for that system size; as a reference, check the AMBER GPU benchmark on the Nucleosome (25095 atoms) in implicit solvent [http://ambermd.org/gpus/benchmarks.htm]:

# 36 CPU : 0.29 ns/day
# 2 K40 : 8 ns/day
# 4 K40 : 14 ns/day

(you would expect about twice those values, as your system is about half that size)
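
Roughly: 13010 atoms is about half of 25095, so something like 2 x 8 ≈ 16 ns/day on 2 K40 and 2 x 14 ≈ 28 ns/day on 4 K40 would be a reasonable expectation, which is in line with the 20.27 and 31.32 ns/day you report below (the cost does not scale exactly linearly with size, so take this only as a rough estimate).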

Hope this helps...

Best,

Matías

------------------------------------
PhD.
Researcher at Biomolecular Simulations Lab.
Institut Pasteur de Montevideo | Uruguay
[http://pasteur.uy/en/laboratorios-eng/lsbm]
[http://www.sirahff.com]


----- Original Message -----
From: "Chhaya Singh" <chhayasingh014.gmail.com>
To: "david case" <david.case.rutgers.edu>, "AMBER Mailing List" <amber.ambermd.org>, "Carlos Simmerling" <carlos.simmerling.gmail.com>
Sent: Monday, June 25, 2018 3:57:37
Asunto: Re: [AMBER] Fwd: increase speed in amber using implcit solvent model

Hey,
I have also tried using the GPUs as you suggested.

Using 4 K40 GPUs, I am getting a speed of:

Average timings for last 4950 steps:
| Elapsed(s) = 27.31 Per Step(ms) = 5.52
| ns/day = 31.32 seconds/ns = 2758.71
|
| Average timings for all steps:
| Elapsed(s) = 27.59 Per Step(ms) = 5.52
| ns/day = 31.32 seconds/ns = 2759.04


That means I am getting a speed of 31.32 ns/day.

I also tried using 2 K40 GPUs:

 Average timings for last 4950 steps:
| Elapsed(s) = 42.20 Per Step(ms) = 8.53
| ns/day = 20.27 seconds/ns = 4262.74
|
| Average timings for all steps:
| Elapsed(s) = 42.63 Per Step(ms) = 8.53
| ns/day = 20.27 seconds/ns = 4262.90



Is there any improvement that I can make?


On 25 June 2018 at 12:14, Chhaya Singh <chhayasingh014.gmail.com> wrote:

> Hello,
> I have 13010 atoms in my system.
> The CPU that I am using has the following details:
>
> Intel E5-2670 series CPUs with 16 cores/node and 64 GB RAM.
>
> The command lines I am using are:
>
> #mpirun -machinefile $PBS_NODEFILE -np $NPROCS $AMBERHOME/bin/sander.MPI
> -O -i min.in -p fib.prmtop -c fib.inpcrd -r min.rst -o min.out
>
>
> mpirun -machinefile $PBS_NODEFILE -np $NPROCS $AMBERHOME/bin/pmemd.MPI -O
> -i heat1.in -p fib.prmtop -c min.rst -r heat1.rst -o heat1.out -x heat1.nc
> -inf heat1.mdinfo
>
>
> My min.in file has the following parameters:
> Stage 1 - minimisation of fibril
> &cntrl
> imin=1, maxcyc=1000, ncyc=500,
> cut=999., rgbmax=999.,igb=8, ntb=0,
> ntpr=100
> /
>
> Using 1 node I am getting a speed of:
> Average timings for last 50 steps:
> | Elapsed(s) = 33.80 Per Step(ms) = 675.91
> | ns/day = 0.06 seconds/ns = 1351821.64
> |
> | Average timings for all steps:
> | Elapsed(s) = 26982.45 Per Step(ms) = 674.56
> | ns/day = 0.06 seconds/ns = 1349122.30
>
> Using 2 nodes I am getting a speed of:
>
> Average timings for last 150 steps:
> | Elapsed(s) = 51.65 Per Step(ms) = 344.37
> | ns/day = 0.13 seconds/ns = 688733.19
> |
> | Average timings for all steps:
> | Elapsed(s) = 13726.02 Per Step(ms) = 343.15
> | ns/day = 0.13 seconds/ns = 686300.95
>
>
> Using 8 nodes:
> Average timings for last 250 steps:
> | Elapsed(s) = 23.90 Per Step(ms) = 95.62
> | ns/day = 0.45 seconds/ns = 191236.85
> |
> | Average timings for all steps:
> | Elapsed(s) = 955.23 Per Step(ms) = 95.52
> | ns/day = 0.45 seconds/ns = 191045.67
>
>
> Using 10 nodes I am getting a speed of:
>
> Average timings for last 200 steps:
> | Elapsed(s) = 16.28 Per Step(ms) = 81.42
> | ns/day = 0.53 seconds/ns = 162838.32
> |
> | Average timings for all steps:
> | Elapsed(s) = 3224.53 Per Step(ms) = 80.61
> | ns/day = 0.54 seconds/ns = 161226.62
>
>
> On 25 June 2018 at 00:19, David A Case <david.case.rutgers.edu> wrote:
>
>> On Sun, Jun 24, 2018, Chhaya Singh wrote:
>>
>> > I am trying to perform a simulation of a protein using an implicit
>> > solvent model with force field ff14sbonlysc and igb = 8.
>> > I am getting a very low speed using 2 nodes. The speed I get now is
>> > less than a ns/day.
>>
>> It would help a lot to know how many atoms are in your protein. Less
>> crucial, but still important, would be to know what cpu you are using.
>> (Or is this actually a GPU simulation?) When you say "2 nodes", exactly
>> what is meant? Can you provide the command line that you used to run
>> the simulation?
>>
>> Some general hints (beyond the good advice that Carlos has already
>> given):
>>
>> a. be sure you are using pmemd.MPI, not sander.MPI (if pmemd is
>> available)
>> b. if possible, see if increasing the number of MPI threads helps
>> c. you can run tests with a cutoff (cut and rgbmax) of 20 or 25: you
>> will still have some artifacts from the cutoff, but they may be
>> small enough to live with.
>> d. if your system is indeed quite large, you may benefit from the
>> hierarchical charge partitioning (GB-HCP) model. See the manual
>> for details.
>>
>> ....dac
>>
>>
>
>

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Jun 26 2018 - 09:30:01 PDT