Re: [AMBER] change in density on multiple gpus

From: Neha Gandhi <n.gandhiau.gmail.com>
Date: Thu, 9 Nov 2017 10:22:48 +1000

Hi Ross,

I tried a small test run with a peptide in a cubic box, with no restraints or
SMD. Using parallel GPUs (M40), the average density is correct, so either SMD
or the restraints are the problem in parallel.

 NSTEP =     1000   TIME(PS) =    1002.000  TEMP(K) =   291.84  PRESS =    38.3
 Etot   =   -123129.5273  EKtot   =     20192.0117  EPtot      =   -143321.5390
 BOND   =        47.5188  ANGLE   =       184.6402  DIHED      =       177.1934
 1-4 NB =        67.0236  1-4 EEL =       680.7912  VDWAALS    =     20459.2485
 EELEC  =   -164937.9546  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 EKCMT  =     10055.8297  VIRIAL  =      9768.3861  VOLUME     =    347191.7786
                                                    Density    =         1.0015
 ------------------------------------------------------------------------------

wrapping first mol.: 25.57034 36.16192 62.63428

 NSTEP =     2000   TIME(PS) =    1004.000  TEMP(K) =   293.17  PRESS =    38.2
 Etot   =   -122959.5146  EKtot   =     20284.1270  EPtot      =   -143243.6415
 BOND   =        61.8348  ANGLE   =       170.9486  DIHED      =       174.6173
 1-4 NB =        60.0539  1-4 EEL =       673.0535  VDWAALS    =     20535.7533
 EELEC  =   -164919.9029  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 EKCMT  =     10090.8180  VIRIAL  =      9803.9675  VOLUME     =    347496.5420
                                                    Density    =         1.0007
 ------------------------------------------------------------------------------

Cheers then,
Neha

On 9 November 2017 at 07:33, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi Neha,
>
> The best approach for systems that require you to request an entire node
> is just to run two separate jobs, e.g.
>
> cd job1
> $AMBERHOME/bin/pmemd.cuda -O -i mdin -o mdout ...... &
> cd ../job2
> $AMBERHOME/bin/pmemd.cuda -O -i mdin -o mdout ...... &
> wait
>
> The key here is the wait keyword, which stops the script from returning until
> both jobs have completed. If the two jobs are the same system run with the
> same input, just with different random seeds to get improved sampling, they
> should take almost identical time, so the script load balances well. The other
> option is to write a loop in here, with the wait statement, that runs the 2
> calculations as a series of segments - say lots of 50 ns production segments
> one after the other. That way, if one segment finishes quickly, that GPU gets
> given more work to do while the other one is still running; see the sketch
> below.
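>
> A rough sketch of that looped variant (the segment count, file names and the
> CUDA_VISIBLE_DEVICES pinning are placeholders/assumptions, so adjust them to
> your own setup; it assumes the starting restart in each directory is md0.rst):
>
> (
>   export CUDA_VISIBLE_DEVICES=0   # pin this job to the first GPU
>   cd job1
>   for i in 1 2 3 4 5; do
>     # each segment restarts from the previous segment's restart file
>     $AMBERHOME/bin/pmemd.cuda -O -i mdin -o md${i}.out -p prmtop -c md$((i-1)).rst -r md${i}.rst -x md${i}.nc
>   done
> ) &
> (
>   export CUDA_VISIBLE_DEVICES=1   # and this one to the second GPU
>   cd job2
>   for i in 1 2 3 4 5; do
>     $AMBERHOME/bin/pmemd.cuda -O -i mdin -o md${i}.out -p prmtop -c md$((i-1)).rst -r md${i}.rst -x md${i}.nc
>   done
> ) &
> wait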
>
> With regards to the problem you are seeing when using both GPUs for a single
> calculation, though, this definitely looks like some kind of bug. It may be
> related to the scaledMD. Can you try this leaving out all the options after
> the ig=-1 (no restraints and no scaled MD) - something like the stripped-down
> test sketched below - and see if you get the same difference between single
> GPU and multi-GPU?
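>
> For example, something along these lines (just a sketch based on the mdin you
> posted, with everything after ig=-1 dropped and nstlim shortened so the test
> runs quickly - the file names are placeholders):
>
> cat > mdin.test <<EOF
>  &cntrl
>   imin=0, irest=1, ntx=5,
>   nstlim=50000, dt=0.002,
>   ntc=2, ntf=2,
>   cut=12.0, ntb=2, ntp=1, taup=1,
>   ntpr=5000, ntwx=5000, ntwr=5000,
>   ntt=3, gamma_ln=5.0,
>   temp0=310.0, iwrap=1, ig=-1,
>  /
> EOF
> # run the same restart once on one GPU and once on two, then compare densities
> $AMBERHOME/bin/pmemd.cuda -O -i mdin.test -o test_1gpu.out -p solvated.prmtop -c npt1.rst -r test_1gpu.rst -x test_1gpu.nc
> mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i mdin.test -o test_2gpu.out -p solvated.prmtop -c npt1.rst -r test_2gpu.rst -x test_2gpu.nc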
>
> All the best
> Ross
>
> > On Nov 8, 2017, at 4:24 PM, Neha Gandhi <n.gandhiau.gmail.com> wrote:
> >
> > Hi Ross,
> >
> > I understand that a single GPU performs better than parallel GPUs for some
> > systems. We are required to request 2 GPUs per job on most supercomputing
> > facilities in Australia, even if the job will only use a single GPU.
> >
> > The specification of my job is below
> >
> > AMBER version 16.
> >
> > &cntrl
> >   imin=0, irest=1, ntx=5,
> >   nstlim=5000000, dt=0.002,
> >   ntc=2, ntf=2,
> >   cut=12.0, ntb=2, ntp=1,
> >   taup=1,
> >   ntpr=5000, ntwx=5000, ntwr=5000,
> >   ntt=3, gamma_ln=5.0,
> >   temp0=310.0, iwrap=1, ig=-1,
> >   ntr=1,
> >   restraintmask=':CNT',
> >   restraint_wt=2.0,
> >   scaledMD=1,
> >   scaledMD_lambda=0.70,
> > /
> >
> >
> >
> > Jobscript on HPC
> >
> > #!/bin/bash -l
> > #PBS -N cnt2
> > #PBS -l walltime=24:00:00
> > #PBS -l select=1:ncpus=2:ngpus=2:mpiprocs=2:gputype=M40:mem=10gb
> > #PBS -j oe
> >
> > module purge
> > module load amber/16-iomkl-2016.09-gcc-4.9.3-2.25-ambertools-16-patchlevel-5-14-cuda
> >
> > cd $PBS_O_WORKDIR
> > mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i smd.in -o smd1.out -p solvated.prmtop -c npt1.rst -r smd1.rst -x smd1.netcdf -ref npt1.rst
> >
> >
> > |------------------- GPU DEVICE INFO --------------------
> > |
> > | Task ID: 0
> > | CUDA_VISIBLE_DEVICES: 0,1
> > | CUDA Capable Devices Detected: 2
> > | CUDA Device ID in use: 0
> > | CUDA Device Name: Tesla M40 24GB
> > | CUDA Device Global Mem Size: 22971 MB
> > | CUDA Device Num Multiprocessors: 24
> > | CUDA Device Core Freq: 1.11 GHz
> > |
> > |
> > | Task ID: 1
> > | CUDA_VISIBLE_DEVICES: 0,1
> > | CUDA Capable Devices Detected: 2
> > | CUDA Device ID in use: 1
> > | CUDA Device Name: Tesla M40 24GB
> > | CUDA Device Global Mem Size: 22971 MB
> > | CUDA Device Num Multiprocessors: 24
> > | CUDA Device Core Freq: 1.11 GHz
> > |
> >
> > I can confirm that the same job on a single GPU shows a density of 0.99.
> >
> > NSTEP =     5000   TIME(PS) =   40660.000  TEMP(K) =   310.08  PRESS =    28.4
> > Etot   =   -267200.5515  EKtot   =    113029.5000  EPtot      =   -380230.0515
> > BOND   =      2973.9049  ANGLE   =      3436.4722  DIHED      =      5568.2840
> > 1-4 NB =      5125.8320  1-4 EEL =      9867.1070  VDWAALS    =     76308.4188
> > EELEC  =   -646585.5951  EHBOND  =         0.0000  RESTRAINT  =       119.7885
> > EAMBER (non-restraint) =   -380349.8399
> > EKCMT  =     54095.5514  VIRIAL  =     52953.5751  VOLUME     =   1860086.9230
> >                                                    Density    =         0.9909
> > ------------------------------------------------------------------------------
> >
> >
> > I also tried a parallel job on Pascal GPUs on a different HPC, but there are
> > issues with the density on parallel GPUs there as well.
> >
> > Thanks,
> > Neha
> >
> > On 9 November 2017 at 00:21, Ross Walker <rosscwalker.gmail.com> wrote:
> >
> >> Hi Neha,
> >>
> >> There should be no difference in the density - or any of the properties -
> >> using a single or multiple GPUs. Can you confirm that if you restart the
> >> calculation from the restart file at the 40.65ns stage you show below on a
> >> single GPU then the simulation continues at a density of approximately 0.99?
> >>
> >> The key is to isolate whether the problem comes just from using more than 1
> >> GPU, and not from some other issue such as an incorrect setting in the mdin
> >> file, for example.
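> >>
> >> Something along these lines, say (the file names here are only placeholders
> >> - substitute whatever your actual mdin, prmtop and 40.65ns restart file are
> >> called):
> >>
> >> $AMBERHOME/bin/pmemd.cuda -O -i mdin -o mdout_1gpu -p prmtop -c restrt_40.65ns -r restrt_test -x mdcrd_test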
> >>
> >> It would also help if you can provide some more details about your setup:
> >> GPU model, AMBER version, etc.
> >>
> >> Note, as an aside, running a single simulation on multiple GPUs is not
> >> always faster, so you might want to check that you actually get a speed
> >> improvement from using more than 1 GPU at once. Although that's separate
> >> from the issue you are reporting, since even if it runs slower on multiple
> >> GPUs it shouldn't give incorrect answers.
> >>
> >> All the best
> >> Ross
> >>
> >>> On Nov 8, 2017, at 1:15 AM, Neha Gandhi <n.gandhiau.gmail.com> wrote:
> >>>
> >>> Dear List,
> >>>
> >>> I am running an NPT simulation using parallel GPUs. Using pmemd.cuda
> >>> (a single GPU), the density is 0.99.
> >>>
> >>>
> >>> A V E R A G E S O V E R 2000 S T E P S
> >>>
> >>>
> >>> NSTEP = 10000000   TIME(PS) =   40650.000  TEMP(K) =   309.99  PRESS =     1.8
> >>> Etot   =   -461754.3426  EKtot   =    120156.4717  EPtot      =   -581910.8142
> >>> BOND   =      2213.1206  ANGLE   =      2543.5550  DIHED      =      5043.7658
> >>> 1-4 NB =      5223.7909  1-4 EEL =      9977.1082  VDWAALS    =     81530.0154
> >>> EELEC  =   -688536.8366  EHBOND  =         0.0000  RESTRAINT  =        94.6664
> >>> EAMBER (non-restraint) =   -582005.4806
> >>> EKCMT  =     57615.1448  VIRIAL  =     57540.2827  VOLUME     =   1978006.8374
> >>>                                                    Density    =         0.9904
> >>> ------------------------------------------------------------------------------
> >>>
> >>>
> >>> When I use parallel GPUs (the jobs are still ongoing), the density is 0.87.
> >>> I was wondering if this is the expected behaviour when using
> >>> pmemd.cuda.MPI.
> >>>
> >>>
> >>>
> >>>
> >>> NSTEP =    70000   TIME(PS) =   40790.000  TEMP(K) =   310.56  PRESS =    -7.3
> >>> Etot   =   -203302.7551  EKtot   =    113220.4141  EPtot      =   -316523.1692
> >>> BOND   =      2874.4111  ANGLE   =      3392.6226  DIHED      =      5522.4047
> >>> 1-4 NB =      5274.6312  1-4 EEL =      9963.2258  VDWAALS    =     54760.1196
> >>> EELEC  =   -534081.2338  EHBOND  =         0.0000  RESTRAINT  =       117.8627
> >>> EAMBER (non-restraint) =   -316641.0319
> >>> EKCMT  =     53755.0242  VIRIAL  =     54084.9861  VOLUME     =   2107014.1166
> >>>                                                    Density    =         0.8749
> >>> ------------------------------------------------------------------------------
> >>>
> >>>
> >>> NSTEP =    75000   TIME(PS) =   40800.000  TEMP(K) =   309.02  PRESS =    12.8
> >>> Etot   =   -204018.1082  EKtot   =    112657.9297  EPtot      =   -316676.0379
> >>> BOND   =      2955.5325  ANGLE   =      3323.3666  DIHED      =      5514.8454
> >>> 1-4 NB =      5328.7581  1-4 EEL =      9964.6209  VDWAALS    =     55136.4175
> >>> EELEC  =   -534740.1213  EHBOND  =         0.0000  RESTRAINT  =       122.2405
> >>> EAMBER (non-restraint) =   -316798.2784
> >>> EKCMT  =     53530.0160  VIRIAL  =     52950.1399  VOLUME     =   2104363.9330
> >>>                                                    Density    =         0.8760
> >>> ------------------------------------------------------------------------------
> >>>
> >>> I am happy to provide more information on the job script and input files
> >>> if required.
> >>>
> >>> Regards,
> >>> Neha
> >>
> >>
> >>
> >
> >
> >
> > --
> > Regards,
> > Dr. Neha S. Gandhi,
> > Vice Chancellor's Research Fellow,
> > Queensland University of Technology,
> > 2 George Street, Brisbane, QLD 4000
> > Australia
> > LinkedIn
> > Research Gate
>
>
>



-- 
Regards,
Dr. Neha S. Gandhi,
Vice Chancellor's Research Fellow,
Queensland University of Technology,
2 George Street, Brisbane, QLD 4000
Australia
LinkedIn
Research Gate
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Nov 08 2017 - 16:30:02 PST