Re: [AMBER] GPU performance and GPU Direct

From: Gould, Ian R
Date: Wed, 7 Mar 2012 14:14:08 +0000

Hi Thomas,

To the best of my knowledge there should be no performance improvement/hit
if you activate GPU Direct for two GPU cards within a single node. But I
am probably wrong and Ross or Scott will correct me here.

If you've used telinit to switch to runlevel 3 then X will not be running,
and the amount of memory needed to run Linux in command-line mode is very
small. It is only an issue with regard to memory on the GPU card; if you
have 3 GB or more it really should never be an issue, and there are no
performance implications.
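
One way to check how much card memory the console is actually holding is to
ask nvidia-smi. The sketch below is only an illustration: it assumes a driver
recent enough to support the --query-gpu option, the helper names are
hypothetical, and the sample numbers are made up rather than measured.

```python
import subprocess

# nvidia-smi query: GPU index, memory in use and total memory (MiB), as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' lines into {index: (used_mib, total_mib)}."""
    usage = {}
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        usage[index] = (used, total)
    return usage


def query_gpu_memory():
    """Run nvidia-smi (assumed to be on PATH) and return the parsed table."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Hypothetical sample output, for illustration only (not measured values).
sample = "0, 5, 3071\n1, 1, 3071\n"
for gpu, (used, total) in parse_gpu_memory(sample).items():
    print(f"GPU {gpu}: {used} MiB of {total} MiB in use")
```

On a real machine, calling query_gpu_memory() before starting a job shows
whether the display is taking a meaningful share of either card.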

The 20% is about normal for small jobs across two GPUs in a single box if
they are sharing the same PCI bus. If memory serves (and I'm too lazy to
check the benchmark page), the results Ross has up for 2 GPUs in a single
system are for a dual-processor Xeon box where each GPU is on its own
independent channel. My own finding from running two GPUs in a single box
with a single-processor motherboard is that I see roughly a 20% speedup for
small systems; the best I've ever seen is about a 35% speedup for a really
big system, the cellulose test case. My personal view on this is not to run
one job across both GPUs but to run a separate serial (single-GPU) job on
each, as you get the most ns/day that way.
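
The arithmetic behind that preference can be sketched as follows. This is a
rough illustration only: the 10 ns/day baseline is a hypothetical figure, the
function name is made up, and it assumes two independent single-GPU jobs do
not slow each other down.

```python
def aggregate_throughput(single_gpu_ns_per_day, parallel_speedup, n_gpus=2):
    """Return (one parallel run, sum of independent runs) in ns/day."""
    # One job spread across all GPUs: single-GPU rate times the observed speedup.
    parallel = single_gpu_ns_per_day * parallel_speedup
    # One independent job per GPU: total sampling is the sum over GPUs.
    independent = single_gpu_ns_per_day * n_gpus
    return parallel, independent


base = 10.0  # hypothetical single-GPU rate in ns/day, not a benchmark result
for label, speedup in [("small system, 20% speedup", 1.20),
                       ("large system, 35% speedup", 1.35)]:
    parallel, independent = aggregate_throughput(base, speedup)
    print(f"{label}: one 2-GPU run = {parallel:.1f} ns/day, "
          f"two 1-GPU runs = {independent:.1f} ns/day in total")
```

A single 2-GPU run still reaches a given simulated time sooner for one
trajectory; the independent runs only win on total ns/day across jobs.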

Anyway, my 2 cents.


Women love us for our defects. If we have enough of them, they will
forgive us everything, even our intellects.
Oscar Wilde,
Dr Ian R Gould
Reader in Computational Chemical Biology
Department of Chemistry
Imperial College London
Exhibition Road
Tel +44 (0)207 594 5809

On 07/03/2012 13:26, Thomas Steinbrecher wrote:
>Hi CUDA users,
>I have a few short questions on running pmemd.cuda{,.MPI}:
>- It is suggested to enable GPU Direct for running across multiple nodes.
>Will this also affect performance for two GPU cards within the same node
>or does it only concern Infiniband connects?
>- My machine has only two GPUs, no onboard graphics. Therefore, GPU0
>defaults to screen output, even at runlevel 3. If I boot up without a
>screen attached, will one of my GPUs still handle displaying a useless
>login screen? Does this affect performance? If yes, how to avoid?
>- Scaling to 2 GPUs even in the same node gives about a 20% speedup
>(comparable to the benchmarks). Is there a specific bottleneck for this,
>like input file settings, system size etc? Did anyone see significantly
>better scaling under some circumstances? Not that running 900 cores in a
>single machine isn't awesome, but at the moment it seems running two
>independent sims on each GPU would be preferable.
>Kind Regards,
>Dr. Thomas Steinbrecher
>formerly at the
>BioMaps Institute
>Rutgers University
>610 Taylor Rd.
>Piscataway, NJ 08854
>AMBER mailing list
Received on Wed Mar 07 2012 - 06:30:02 PST