[AMBER] Amber16 on GPUs and speed differences between CUDA_VISIBLE_DEVICES=0, 1 or 0, 2

From: Neale, Christopher Andrew <cneale.lanl.gov>
Date: Mon, 19 Dec 2016 18:09:00 +0000

Dear Users:

I am trying to run multiple Amber16 instances on a single node (separate, standard simulations). This is a system from Exxact. Each node has 4x NVIDIA GeForce GTX TITAN GPUs (pre-Pascal) and 12 physical cores.

I tried the script below. Although there are 4 GPUs per node, I can only get full speed when I run two single-GPU jobs, and those jobs have to go on, e.g., GPUs 0 and 2 rather than GPUs 0 and 1. With the script below, I get 115 ns/day (per simulation) if I set GPUA=0 and GPUB=2, but only 65 ns/day if I set GPUA=0 and GPUB=1.
    
GPUA=0
GPUB=1

{
  # Job A: restrict this process to one GPU via CUDA_VISIBLE_DEVICES
  export CUDA_VISIBLE_DEVICES=$GPUA
  NAM=A
  mpirun -np 1 ${AMBERHOME}/bin/pmemd.cuda.MPI -O -i md_restart.in -o ${NAM}.out -p this.prmtop -c MD1.rst -r ${NAM}.rst -x ${NAM}.mdcrd -inf ${NAM}.info -l ${NAM}.log
} &

{
  # Job B: identical inputs, different GPU and output names
  export CUDA_VISIBLE_DEVICES=$GPUB
  NAM=B
  mpirun -np 1 ${AMBERHOME}/bin/pmemd.cuda.MPI -O -i md_restart.in -o ${NAM}.out -p this.prmtop -c MD1.rst -r ${NAM}.rst -x ${NAM}.mdcrd -inf ${NAM}.info -l ${NAM}.log
} &

# wait for both background jobs to finish
wait
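
In case CPU placement is part of this, I also considered pinning each job to the CPU socket that (judging by the bus IDs in the nvidia-smi output below) looks local to its GPU. This is only a sketch, assuming numactl is installed and that GPUs 0-1 hang off socket 0 and GPUs 2-3 off socket 1; the output names are placeholders:

# Hypothetical variant of job A: bind the rank and its memory to NUMA
# node 0, on the assumption that GPU 0 is attached to that socket.
export CUDA_VISIBLE_DEVICES=$GPUA
mpirun -np 1 numactl --cpunodebind=0 --membind=0 \
  ${AMBERHOME}/bin/pmemd.cuda.MPI -O -i md_restart.in -o A.out \
  -p this.prmtop -c MD1.rst -r A.rst -x A.mdcrd -inf A.info -l A.log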

################################



$ nvidia-smi
+------------------------------------------------------+
| NVIDIA-SMI 346.89     Driver Version: 346.89         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...    On | 0000:02:00.0     Off |                  N/A |
| 22%   31C    P8    28W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...    On | 0000:03:00.0     Off |                  N/A |
| 22%   26C    P8    14W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX TIT...    On | 0000:81:00.0     Off |                  N/A |
| 22%   25C    P8    14W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX TIT...    On | 0000:82:00.0     Off |                  N/A |
| 22%   24C    P8    15W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+


##################################

I found this post on doing the same type of thing: http://archive.ambermd.org/201405/0364.html, but it doesn't mention anything about slowdowns.

One other thing I noted: a 2-GPU job is faster if I use GPUs 0 and 1 (160 ns/day), but on GPUs 0 and 2 it is no faster than a single-GPU job (118 ns/day).
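
For completeness, the 2-GPU runs used the same command pattern as above, along these lines (output names here are placeholders):

export CUDA_VISIBLE_DEVICES=0,1   # or 0,2 for the slower pairing
mpirun -np 2 ${AMBERHOME}/bin/pmemd.cuda.MPI -O -i md_restart.in -o AB.out \
  -p this.prmtop -c MD1.rst -r AB.rst -x AB.mdcrd -inf AB.info -l AB.log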

I'm guessing that this is all hardware related, but since it's an Exxact system I am surprised that I can't find much mention of this when I search the internet.
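
If it helps diagnose things, I can also dump the GPU/PCIe topology; I believe recent drivers support a topology matrix (it is an assumption on my part that driver 346.89 already includes it):

# Print the inter-GPU topology matrix; the entries should show which
# GPU pairs share a PCIe host bridge and which have to cross between
# CPU sockets, which the bus IDs above (02/03 vs 81/82) seem to suggest.
nvidia-smi topo -m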

Thank you for your help,
Chris.
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Dec 19 2016 - 10:30:02 PST