Hi Neha,
The best approach on systems that require you to request an entire node is just to run two separate jobs from a single batch script. E.g.
cd job1
$AMBERHOME/bin/pmemd.cuda -O -i mdin -o mdout ...... &    # first run, launched in the background
cd ../job2
$AMBERHOME/bin/pmemd.cuda -O -i mdin -o mdout ...... &    # second run, also in the background
wait                                                      # block until both background runs have completed
The key is the wait command, which stops the script from returning until both background jobs have completed. If the two jobs are the same system run with the same input and only different random seeds (to improve sampling), they should take almost identical time, so the script load-balances well. The other option is to put a loop in here, still ending with the wait statement, so that each of the two calculations works through a series of steps - say lots of 50 ns production segments one after the other. That way, if one job finishes early, its GPU is given more work while the other is still running; see the sketch below.
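A rough sketch of what I mean, assuming two directories job1 and job2, ten 50 ns segments per GPU, and placeholder file names (md.in, prmtop, md0.rst and so on) that you would adapt to your own setup. I have also pinned each chain to one GPU with CUDA_VISIBLE_DEVICES; you can drop that if your GPUs run in process-exclusive compute mode:

#!/bin/bash
# Sketch: run a chain of production segments in each directory, one chain per GPU.
run_chain () {                              # usage: run_chain <directory> <gpu id>
    cd "$1" || exit 1
    export CUDA_VISIBLE_DEVICES=$2          # pin this chain to a single GPU
    for i in $(seq 1 10); do                # e.g. 10 x 50 ns production segments
        prev=$((i - 1))
        $AMBERHOME/bin/pmemd.cuda -O -i md.in -p prmtop \
            -c md${prev}.rst -r md${i}.rst -o md${i}.out -x md${i}.nc
    done
}
( run_chain job1 0 ) &                      # subshells keep each chain's cd local
( run_chain job2 1 ) &
wait                                        # return only once both chains have finished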
With regards to the problem you are seeing when using both GPUs for a single calculation, this definitely looks like some kind of bug. It may be related to the scaledMD, though. Can you try leaving out all the options after ig=-1 (no restraints and no scaled MD) and see if you still get the same difference between the single-GPU and multi-GPU runs?
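For reference, the stripped-down test input I have in mind would look roughly like this - your settings as quoted below, kept up to ig=-1, with the restraint and scaledMD lines removed (a sketch only, so double check it against your original mdin):

&cntrl
  imin=0, irest=1, ntx=5,
  nstlim=5000000, dt=0.002,
  ntc=2, ntf=2,
  cut=12.0, ntb=2, ntp=1, taup=1,
  ntpr=5000, ntwx=5000, ntwr=5000,
  ntt=3, gamma_ln=5.0,
  temp0=310.0, iwrap=1, ig=-1,
 /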
All the best
Ross
> On Nov 8, 2017, at 4:24 PM, Neha Gandhi <n.gandhiau.gmail.com> wrote:
>
> Hi Ross,
>
> I understand that a single GPU performs better than parallel GPUs for some
> systems. We are required to request 2 GPUs per job on most supercomputing
> facilities in Australia, even if the job will only use a single GPU.
>
> The specification of my job is below:
>
> AMBER version 16.
>
> &cntrl
>  imin=0, irest=1, ntx=5,
>  nstlim=5000000, dt=0.002,
>  ntc=2, ntf=2,
>  cut=12.0, ntb=2, ntp=1,
>  taup=1,
>  ntpr=5000, ntwx=5000, ntwr=5000,
>  ntt=3, gamma_ln=5.0,
>  temp0=310.0, iwrap=1, ig=-1,
>  ntr=1,
>  restraintmask=':CNT',
>  restraint_wt=2.0,
>  scaledMD=1,
>  scaledMD_lambda=0.70,
> /
>
> Jobscript on HPC
>
> #!/bin/bash -l
> #PBS -N cnt2
> #PBS -l walltime=24:00:00
> #PBS -l select=1:ncpus=2:ngpus=2:mpiprocs=2:gputype=M40:mem=10gb
> #PBS -j oe
>
> module purge
> module load amber/16-iomkl-2016.09-gcc-4.9.3-2.25-ambertools-16-patchlevel-5-14-cuda
>
> cd $PBS_O_WORKDIR
> mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i smd.in -o smd1.out -p solvated.prmtop -c npt1.rst -r smd1.rst -x smd1.netcdf -ref npt1.rst
>
>
> |------------------- GPU DEVICE INFO --------------------
> |
> | Task ID: 0
> | CUDA_VISIBLE_DEVICES: 0,1
> | CUDA Capable Devices Detected: 2
> | CUDA Device ID in use: 0
> | CUDA Device Name: Tesla M40 24GB
> | CUDA Device Global Mem Size: 22971 MB
> | CUDA Device Num Multiprocessors: 24
> | CUDA Device Core Freq: 1.11 GHz
> |
> |
> | Task ID: 1
> | CUDA_VISIBLE_DEVICES: 0,1
> | CUDA Capable Devices Detected: 2
> | CUDA Device ID in use: 1
> | CUDA Device Name: Tesla M40 24GB
> | CUDA Device Global Mem Size: 22971 MB
> | CUDA Device Num Multiprocessors: 24
> | CUDA Device Core Freq: 1.11 GHz
> |
>
> I can confirm that the same job run on a single GPU shows a density of 0.99.
>
>  NSTEP =     5000   TIME(PS) =   40660.000  TEMP(K) =   310.08  PRESS =    28.4
>  Etot   =   -267200.5515  EKtot   =    113029.5000  EPtot      =   -380230.0515
>  BOND   =      2973.9049  ANGLE   =      3436.4722  DIHED      =      5568.2840
>  1-4 NB =      5125.8320  1-4 EEL =      9867.1070  VDWAALS    =     76308.4188
>  EELEC  =   -646585.5951  EHBOND  =         0.0000  RESTRAINT  =       119.7885
>  EAMBER (non-restraint)  =   -380349.8399
>  EKCMT  =     54095.5514  VIRIAL  =     52953.5751  VOLUME     =   1860086.9230
>                                                     Density    =         0.9909
> ------------------------------------------------------------------------------
>
>
> I also tried a parallel job on Pascal GPUs on a different HPC, but there are
> issues with the density on parallel GPUs there as well.
>
> Thanks,
> Neha
>
> On 9 November 2017 at 00:21, Ross Walker <rosscwalker.gmail.com> wrote:
>
>> Hi Neha,
>>
>> There should be no difference in the density - or any of the properties -
>> when using a single GPU or multiple GPUs. Can you confirm that if you restart
>> the calculation on a single GPU from the restart file at the 40.65 ns stage you
>> show below, the simulation continues at a density of approximately 0.99?
>>
>> The key is to isolate whether the problem comes purely from using more than one
>> GPU, and not from some other issue such as an incorrect setting in the mdin file,
>> for example.
>>
>> It would also help if you could provide some more details about your setup:
>> GPU model, AMBER version, etc.
>>
>> Note, as an aside, running a single simulation on multiple GPUs is not always
>> faster, so you might want to check that you actually get a speed improvement
>> from using more than one GPU at once. That is separate from the issue you are
>> reporting, though, since even if the run is slower on multiple GPUs it shouldn't
>> give incorrect answers.
>>
>> All the best
>> Ross
>>
>>> On Nov 8, 2017, at 1:15 AM, Neha Gandhi <n.gandhiau.gmail.com> wrote:
>>>
>>> Dear List,
>>>
>>> I am running an NPT simulation using parallel GPUs. Upon using pmemd.cuda,
>>> the density is 0.99.
>>>
>>>
>>> A V E R A G E S O V E R 2000 S T E P S
>>>
>>>
>>> NSTEP = 10000000   TIME(PS) =   40650.000  TEMP(K) =   309.99  PRESS =     1.8
>>> Etot   =   -461754.3426  EKtot   =    120156.4717  EPtot      =   -581910.8142
>>> BOND   =      2213.1206  ANGLE   =      2543.5550  DIHED      =      5043.7658
>>> 1-4 NB =      5223.7909  1-4 EEL =      9977.1082  VDWAALS    =     81530.0154
>>> EELEC  =   -688536.8366  EHBOND  =         0.0000  RESTRAINT  =        94.6664
>>> EAMBER (non-restraint)  =   -582005.4806
>>> EKCMT  =     57615.1448  VIRIAL  =     57540.2827  VOLUME     =   1978006.8374
>>>                                                    Density    =         0.9904
>>> ------------------------------------------------------------------------------
>>>
>>>
>>> When I use parallel GPUs (the jobs are still ongoing), the density is 0.87.
>>> I was wondering if this is the expected behaviour when using
>>> pmemd.cuda.MPI.
>>>
>>>
>>>
>>>
>>> NSTEP =    70000   TIME(PS) =   40790.000  TEMP(K) =   310.56  PRESS =    -7.3
>>> Etot   =   -203302.7551  EKtot   =    113220.4141  EPtot      =   -316523.1692
>>> BOND   =      2874.4111  ANGLE   =      3392.6226  DIHED      =      5522.4047
>>> 1-4 NB =      5274.6312  1-4 EEL =      9963.2258  VDWAALS    =     54760.1196
>>> EELEC  =   -534081.2338  EHBOND  =         0.0000  RESTRAINT  =       117.8627
>>> EAMBER (non-restraint)  =   -316641.0319
>>> EKCMT  =     53755.0242  VIRIAL  =     54084.9861  VOLUME     =   2107014.1166
>>>                                                    Density    =         0.8749
>>> ------------------------------------------------------------------------------
>>>
>>>
>>> NSTEP =    75000   TIME(PS) =   40800.000  TEMP(K) =   309.02  PRESS =    12.8
>>> Etot   =   -204018.1082  EKtot   =    112657.9297  EPtot      =   -316676.0379
>>> BOND   =      2955.5325  ANGLE   =      3323.3666  DIHED      =      5514.8454
>>> 1-4 NB =      5328.7581  1-4 EEL =      9964.6209  VDWAALS    =     55136.4175
>>> EELEC  =   -534740.1213  EHBOND  =         0.0000  RESTRAINT  =       122.2405
>>> EAMBER (non-restraint)  =   -316798.2784
>>> EKCMT  =     53530.0160  VIRIAL  =     52950.1399  VOLUME     =   2104363.9330
>>>                                                    Density    =         0.8760
>>> ------------------------------------------------------------------------------
>>>
>>> I am happy to provide more information on the job script and input files if
>>> required.
>>>
>>> Regards,
>>> Neha
>
>
>
> --
> Regards,
> Dr. Neha S. Gandhi,
> Vice Chancellor's Research Fellow,
> Queensland University of Technology,
> 2 George Street, Brisbane, QLD 4000
> Australia
> LinkedIn
> Research Gate
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Nov 08 2017 - 14:00:02 PST