Re: [AMBER] change in density on multiple gpus

From: Ross Walker <ross.rosswalker.co.uk>
Date: Fri, 10 Nov 2017 12:08:30 +0100

Hi Neha,

Thanks for testing. I'm glad to see the problem is isolated to a specific option. My suspicion would be scaledMD rather than the restraints, but we'll take a look and see where the problem is.

All the best
Ross

> On Nov 9, 2017, at 01:22, Neha Gandhi <n.gandhiau.gmail.com> wrote:
>
> Hi Ross,
>
> I tried a small test run with a peptide in a cubic box, with no restraints
> or scaledMD. Using parallel GPUs (M40), the average density is correct, so
> either scaledMD or the restraints are the problem in parallel.
>
> NSTEP =     1000   TIME(PS) =    1002.000   TEMP(K) =   291.84   PRESS =    38.3
> Etot   =  -123129.5273  EKtot   =    20192.0117  EPtot      =  -143321.5390
> BOND   =       47.5188  ANGLE   =      184.6402  DIHED      =      177.1934
> 1-4 NB =       67.0236  1-4 EEL =      680.7912  VDWAALS    =    20459.2485
> EELEC  =  -164937.9546  EHBOND  =        0.0000  RESTRAINT  =        0.0000
> EKCMT  =    10055.8297  VIRIAL  =     9768.3861  VOLUME     =   347191.7786
>                                                  Density    =        1.0015
> ------------------------------------------------------------------------------
>
> wrapping first mol.: 25.57034 36.16192 62.63428
>
> NSTEP =     2000   TIME(PS) =    1004.000   TEMP(K) =   293.17   PRESS =    38.2
> Etot   =  -122959.5146  EKtot   =    20284.1270  EPtot      =  -143243.6415
> BOND   =       61.8348  ANGLE   =      170.9486  DIHED      =      174.6173
> 1-4 NB =       60.0539  1-4 EEL =      673.0535  VDWAALS    =    20535.7533
> EELEC  =  -164919.9029  EHBOND  =        0.0000  RESTRAINT  =        0.0000
> EKCMT  =    10090.8180  VIRIAL  =     9803.9675  VOLUME     =   347496.5420
>                                                  Density    =        1.0007
> ------------------------------------------------------------------------------
>
> Cheers then,
> Neha
>
> On 9 November 2017 at 07:33, Ross Walker <ross.rosswalker.co.uk> wrote:
>
>> Hi Neha,
>>
>> The best approach for systems that require you to request an entire node
>> is just to run two separate jobs. E.g.
>>
>> cd job1
>> $AMBERHOME/bin/pmemd.cuda -O -i mdin -o mdout ...... &
>> cd ../job2
>> $AMBERHOME/bin/pmemd.cuda -O -i mdin -o mdout ...... &
>> wait
>>
>> The key here is the wait keyword, which stops the script from returning
>> until both jobs have completed. If the jobs are the same system run with
>> the same input, just with different random seeds to get improved sampling,
>> they should take almost identical time, so the script load balances well.
>> The other option is to write a loop in here, keeping the wait statement,
>> that runs each of the 2 calculations through a series of segments, say
>> lots of 50 ns production steps one after the other. That way, if one job
>> finishes quickly, that GPU gets given more work to do while the other one
>> is still running.
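>>
>> For example, a rough sketch of that kind of loop (untested, and the file
>> names, segment count and the CUDA_VISIBLE_DEVICES pinning are placeholders
>> to adapt to your own setup):
>>
>> ( cd job1
>>   export CUDA_VISIBLE_DEVICES=0        # keep this copy on the first GPU
>>   for i in 1 2 3 4; do                 # e.g. four 50 ns production segments
>>     $AMBERHOME/bin/pmemd.cuda -O -i md.in -o md_$i.out -p prmtop \
>>         -c md_$((i-1)).rst -r md_$i.rst -x md_$i.nc   # md_0.rst = starting restart
>>   done ) &
>> ( cd job2
>>   export CUDA_VISIBLE_DEVICES=1        # keep this copy on the second GPU
>>   for i in 1 2 3 4; do
>>     $AMBERHOME/bin/pmemd.cuda -O -i md.in -o md_$i.out -p prmtop \
>>         -c md_$((i-1)).rst -r md_$i.rst -x md_$i.nc
>>   done ) &
>> wait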
>>
>> With regard to the problem you are seeing when using both GPUs for a
>> single calculation, this definitely looks like some kind of bug, possibly
>> related to the scaledMD. Can you try leaving out all the options after
>> ig=-1 (no restraints and no scaled MD) and see if you get the same
>> difference between single GPU and multi-GPU?
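>>
>> That is, something like this cut-down version of the namelist you posted
>> (just a sketch, keeping everything else about the run the same):
>>
>> &cntrl
>>  imin=0, irest=1, ntx=5,
>>  nstlim=5000000, dt=0.002,
>>  ntc=2, ntf=2,
>>  cut=12.0, ntb=2, ntp=1, taup=1,
>>  ntpr=5000, ntwx=5000, ntwr=5000,
>>  ntt=3, gamma_ln=5.0,
>>  temp0=310.0, iwrap=1, ig=-1,
>> /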
>>
>> All the best
>> Ross
>>
>>> On Nov 8, 2017, at 4:24 PM, Neha Gandhi <n.gandhiau.gmail.com> wrote:
>>>
>>> Hi Ross,
>>>
>>> I understand that a single GPU performs better than parallel GPUs for
>>> some systems. We are required to request 2 GPUs per job on most
>>> supercomputing facilities in Australia, even if the job will only use a
>>> single GPU.
>>>
>>> The specification of my job is below
>>>
>>> AMBER version 16.
>>>
>>> &cntrl
>>>  imin=0, irest=1, ntx=5,
>>>  nstlim=5000000, dt=0.002,
>>>  ntc=2, ntf=2,
>>>  cut=12.0, ntb=2, ntp=1,
>>>  taup=1,
>>>  ntpr=5000, ntwx=5000, ntwr=5000,
>>>  ntt=3, gamma_ln=5.0,
>>>  temp0=310.0, iwrap=1, ig=-1,
>>>  ntr=1,
>>>  restraintmask=':CNT',
>>>  restraint_wt=2.0,
>>>  scaledMD=1,
>>>  scaledMD_lambda=0.70,
>>> /
>>>
>>> Jobscript on HPC
>>>
>>> #!/bin/bash -l
>>> #PBS -N cnt2
>>> #PBS -l walltime=24:00:00
>>> #PBS -l select=1:ncpus=2:ngpus=2:mpiprocs=2:gputype=M40:mem=10gb
>>> #PBS -j oe
>>>
>>> module purge
>>> module load amber/16-iomkl-2016.09-gcc-4.9.3-2.25-ambertools-16-patchlevel-5-14-cuda
>>>
>>> cd $PBS_O_WORKDIR
>>> mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i smd.in -o smd1.out \
>>>     -p solvated.prmtop -c npt1.rst -r smd1.rst -x smd1.netcdf -ref npt1.rst
>>>
>>>
>>> |------------------- GPU DEVICE INFO --------------------
>>> |
>>> | Task ID: 0
>>> | CUDA_VISIBLE_DEVICES: 0,1
>>> | CUDA Capable Devices Detected: 2
>>> | CUDA Device ID in use: 0
>>> | CUDA Device Name: Tesla M40 24GB
>>> | CUDA Device Global Mem Size: 22971 MB
>>> | CUDA Device Num Multiprocessors: 24
>>> | CUDA Device Core Freq: 1.11 GHz
>>> |
>>> |
>>> | Task ID: 1
>>> | CUDA_VISIBLE_DEVICES: 0,1
>>> | CUDA Capable Devices Detected: 2
>>> | CUDA Device ID in use: 1
>>> | CUDA Device Name: Tesla M40 24GB
>>> | CUDA Device Global Mem Size: 22971 MB
>>> | CUDA Device Num Multiprocessors: 24
>>> | CUDA Device Core Freq: 1.11 GHz
>>> |
>>>
>>> I can confirm that the same job on a single GPU shows a density of 0.99.
>>>
>>> NSTEP =     5000   TIME(PS) =   40660.000   TEMP(K) =   310.08   PRESS =    28.4
>>> Etot   =  -267200.5515  EKtot   =   113029.5000  EPtot      =  -380230.0515
>>> BOND   =     2973.9049  ANGLE   =     3436.4722  DIHED      =     5568.2840
>>> 1-4 NB =     5125.8320  1-4 EEL =     9867.1070  VDWAALS    =    76308.4188
>>> EELEC  =  -646585.5951  EHBOND  =        0.0000  RESTRAINT  =      119.7885
>>> EAMBER (non-restraint)  =  -380349.8399
>>> EKCMT  =    54095.5514  VIRIAL  =    52953.5751  VOLUME     =  1860086.9230
>>>                                                  Density    =        0.9909
>>> ------------------------------------------------------------------------------
>>>
>>>
>>> I also tried the parallel job on Pascal GPUs on a different HPC, but the
>>> density issue on parallel GPUs is the same there.
>>>
>>> Thanks,
>>> Neha
>>>
>>> On 9 November 2017 at 00:21, Ross Walker <rosscwalker.gmail.com> wrote:
>>>
>>>> Hi Neha,
>>>>
>>>> There should be no difference in the density, or any of the other
>>>> properties, between using a single GPU and multiple GPUs. Can you
>>>> confirm that if you restart the calculation on a single GPU from the
>>>> restart file at the 40.65 ns stage you show below, the simulation
>>>> continues at a density of approximately 0.99?
>>>>
>>>> The key is to isolate whether the problem comes just from using more
>>>> than one GPU, and not from some other issue such as an incorrect setting
>>>> in the mdin file, for example.
>>>>
>>>> It would also help if you could provide some more details about your
>>>> setup: GPU model, AMBER version, etc.
>>>>
>>>> As an aside, note that running a single simulation on multiple GPUs is
>>>> not always faster, so you might want to check that you actually get a
>>>> speed improvement from using more than one GPU at once. That is separate
>>>> from the issue you are reporting, though, since even if it runs slower
>>>> on multiple GPUs it should not give incorrect answers.
>>>>
>>>> All the best
>>>> Ross
>>>>
>>>>> On Nov 8, 2017, at 1:15 AM, Neha Gandhi <n.gandhiau.gmail.com> wrote:
>>>>>
>>>>> Dear List,
>>>>>
>>>>> I am running an NPT simulation using parallel GPUs. With single-GPU
>>>>> pmemd.cuda, the density is 0.99.
>>>>>
>>>>>
>>>>> A V E R A G E S O V E R 2000 S T E P S
>>>>>
>>>>>
>>>>> NSTEP = 10000000   TIME(PS) =   40650.000   TEMP(K) =   309.99   PRESS =     1.8
>>>>> Etot   =  -461754.3426  EKtot   =   120156.4717  EPtot      =  -581910.8142
>>>>> BOND   =     2213.1206  ANGLE   =     2543.5550  DIHED      =     5043.7658
>>>>> 1-4 NB =     5223.7909  1-4 EEL =     9977.1082  VDWAALS    =    81530.0154
>>>>> EELEC  =  -688536.8366  EHBOND  =        0.0000  RESTRAINT  =       94.6664
>>>>> EAMBER (non-restraint)  =  -582005.4806
>>>>> EKCMT  =    57615.1448  VIRIAL  =    57540.2827  VOLUME     =  1978006.8374
>>>>>                                                  Density    =        0.9904
>>>>> ------------------------------------------------------------------------------
>>>>>
>>>>>
>>>>> When I use parallel GPUs (the jobs are still ongoing), the density is
>>>>> 0.87. I was wondering if this is the expected behaviour when using
>>>>> pmemd.cuda.MPI.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> NSTEP =    70000   TIME(PS) =   40790.000   TEMP(K) =   310.56   PRESS =    -7.3
>>>>> Etot   =  -203302.7551  EKtot   =   113220.4141  EPtot      =  -316523.1692
>>>>> BOND   =     2874.4111  ANGLE   =     3392.6226  DIHED      =     5522.4047
>>>>> 1-4 NB =     5274.6312  1-4 EEL =     9963.2258  VDWAALS    =    54760.1196
>>>>> EELEC  =  -534081.2338  EHBOND  =        0.0000  RESTRAINT  =      117.8627
>>>>> EAMBER (non-restraint)  =  -316641.0319
>>>>> EKCMT  =    53755.0242  VIRIAL  =    54084.9861  VOLUME     =  2107014.1166
>>>>>                                                  Density    =        0.8749
>>>>> ------------------------------------------------------------------------------
>>>>>
>>>>>
>>>>> NSTEP =    75000   TIME(PS) =   40800.000   TEMP(K) =   309.02   PRESS =    12.8
>>>>> Etot   =  -204018.1082  EKtot   =   112657.9297  EPtot      =  -316676.0379
>>>>> BOND   =     2955.5325  ANGLE   =     3323.3666  DIHED      =     5514.8454
>>>>> 1-4 NB =     5328.7581  1-4 EEL =     9964.6209  VDWAALS    =    55136.4175
>>>>> EELEC  =  -534740.1213  EHBOND  =        0.0000  RESTRAINT  =      122.2405
>>>>> EAMBER (non-restraint)  =  -316798.2784
>>>>> EKCMT  =    53530.0160  VIRIAL  =    52950.1399  VOLUME     =  2104363.9330
>>>>>                                                  Density    =        0.8760
>>>>> ------------------------------------------------------------------------------
>>>>>
>>>>> I am happy to provide more information on the job script and input files
>>>>> if required.
>>>>>
>>>>> Regards,
>>>>> Neha
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Dr. Neha S. Gandhi,
>>> Vice Chancellor's Research Fellow,
>>> Queensland University of Technology,
>>> 2 George Street, Brisbane, QLD 4000
>>> Australia
>>> LinkedIn
>>> Research Gate
>
>
>
> --
> Regards,
> Dr. Neha S. Gandhi,
> Vice Chancellor's Research Fellow,
> Queensland University of Technology,
> 2 George Street, Brisbane, QLD 4000
> Australia
> LinkedIn
> Research Gate


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Nov 10 2017 - 03:30:02 PST