Re: [AMBER] Select cuda ID device in PMEMD

From: Ross Walker <ross.rosswalker.co.uk>
Date: Thu, 17 Nov 2011 21:19:22 -0800

Hi Gonzalo,

> Thanks a lot for the info, I will read it very carefully and make some
> testing. Yes, it seems that pmemd.cuda.mpi is programmed not to use any
> used
> GPU, but in my case, if I run, let's say, two mpirun jobs on the same

Correction: the code is set up not to use an already-used GPU within the same
run. It cannot determine which GPUs are in use by other codes or by other
instances of AMBER.

> 6-GPUs
> node using 3 GPUs each, each job takes forever and reading the .out
> files,
> it seems that the two jobs are using the same GPUs:
>
> md.out:| CUDA Device ID in use: 0
> md.out:| CUDA Device ID in use: 1
> md.out:| CUDA Device ID in use: 2
> md.out2:| CUDA Device ID in use: 0
> md.out2:| CUDA Device ID in use: 1
> md.out2:| CUDA Device ID in use: 2
>
> Anyway, I will read your info and try the CUDA_VISIBLE_DEVICES option.

Exactly. Here you are running both calculations on GPUs 0 to 2, so each GPU
is running two calculations, hence the slowdown. What you want to do is set:

export CUDA_VISIBLE_DEVICES="0,1,2"

before running the first job and then

export CUDA_VISIBLE_DEVICES="3,4,5"

before running the second job.
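
Putting that together, a minimal sketch of how you might launch the two
3-GPU jobs on the same node might look like the following. The mpirun
options and the input/topology/coordinate file names are just placeholders
for whatever you are already using; the md.out and md.out2 names match the
output files you quoted above.

# Job 1 on physical GPUs 0-2
export CUDA_VISIBLE_DEVICES="0,1,2"
mpirun -np 3 pmemd.cuda.MPI -O -i md.in -o md.out -p prmtop -c inpcrd &

# Job 2 on physical GPUs 3-5. Re-exporting before the second launch is
# fine even in the same shell, since each job captures the environment
# at the moment it starts.
export CUDA_VISIBLE_DEVICES="3,4,5"
mpirun -np 3 pmemd.cuda.MPI -O -i md2.in -o md.out2 -p prmtop -c inpcrd &

CUDA_VISIBLE_DEVICES is just an environment variable, so each job simply
picks up whatever value was exported in the shell (or batch script) that
launched it.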

You can check GPU utilization with the command

nvidia-smi -a
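
For example, while both jobs are running you could do something along these
lines to see which GPUs are actually busy (the exact section and field names
in the -a output vary with driver version, so treat the grep pattern as a
rough sketch):

nvidia-smi -a | grep -i -A 2 "Utilization"

or just watch the full report refresh every couple of seconds:

watch -n 2 nvidia-smi -a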

Note that CUDA_VISIBLE_DEVICES actually rebases the hardware GPU IDs, so in
the second case above the code will still see the GPUs as IDs 0, 1 and 2
even though those physically correspond to GPUs 3, 4 and 5.
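
So, assuming your output files keep the same format as the md.out excerpt
above, the second job should still report something like:

md.out2:| CUDA Device ID in use: 0
md.out2:| CUDA Device ID in use: 1
md.out2:| CUDA Device ID in use: 2

even though it is actually running on physical GPUs 3, 4 and 5; the printed
device IDs are the rebased (logical) ones, not the physical ones.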

Note, though, that if you have 6 GPUs in a node then, given current PCI-E
gen 2 limitations, the GPUs are almost certainly sharing x16 channels. The
GPUs will not all have dedicated x16 PCI-E bandwidth, and so parallel GPU
performance is likely to be pretty terrible. Unfortunately these
multi-GPU-per-node machines are terribly designed, but marketers keep
flogging them without any real regard for, or understanding of, their actual
performance. This is the reason we created the MD SimCluster program. See
http://ambermd.org/news.html#simcluster and
http://exxactcorp.com/testdrive/md/ for a machine that is well designed,
balanced and optimized for best performance with AMBER (and most parallel
GPU codes, for that matter). The best advice is to avoid the likes of Dell
etc. with their bargain-basement GPU breakout boxes.

Note that if you already have the physical hardware, you might want to
consider pulling two of the GPUs out of each node so that each node has 4
GPUs, with every GPU in its own dedicated x16 slot. This is really the only
way at present (until Intel fixes their QPI to conform to the PCI-E spec and
allows peer-to-peer transfers) to get reasonable performance in parallel
across multiple GPUs.
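
If you want to check how your current nodes are actually wired, on
reasonably recent drivers the nvidia-smi -a report includes the PCI-E link
generation and width for each GPU, and lspci can show the same thing from
the OS side. The exact field names depend on the driver and lspci versions
(and lspci may need root to show everything), so treat these as rough
sketches:

nvidia-smi -a | grep -i -A 4 "Link"

lspci -vv | grep -i -e "nvidia" -e "LnkSta"

This should at least show whether each GPU reports a full x16 link or
something narrower.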

All the best
Ross

/\
\/
|\oss Walker

---------------------------------------------------------
| Assistant Research Professor |
| San Diego Supercomputer Center |
| Adjunct Assistant Professor |
| Dept. of Chemistry and Biochemistry |
| University of California San Diego |
| NVIDIA Fellow |
| http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
---------------------------------------------------------

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.





_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber