Re: [AMBER] issue using CUDA+MPI+OpenMP

From: Jason Swails <>
Date: Thu, 9 Jul 2015 09:06:51 -0400

On Thu, Jul 9, 2015 at 8:40 AM, Brent Krueger <> wrote:

> A quick follow-up question for anyone who has tried what Ross mentions. My
> impression is that if you are running 4 pmemd.cuda jobs on this node that
> Ross envisioned, your hard drive will be reasonably busy saving MD data
> from those 4 jobs. So, if one was to also run a job that utilizes the
> CPUs, then the storage system might be very busy. Something like Gaussian
> that would be doing a lot of scratch read/writes might lead to some pretty
> significant bottlenecks? Something like a pmemd CPU job might not be too
> bad?

Some of this depends on your write options. If, as you should, you set
ioutfm=1 and ntxo=2 to always write NetCDF, that will significantly reduce
the I/O pressure on the filesystem.
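As a rough sketch, those options go in the &cntrl namelist of the mdin file. The step counts, filenames, and the ntwx value below are illustrative choices, not settings from the original post:

```
 &cntrl
   imin   = 0,       ! run MD rather than minimization
   nstlim = 500000,  ! total number of MD steps (example value)
   dt     = 0.002,   ! 2 fs time step (example value)
   ntwx   = 5000,    ! write a snapshot every 5000 steps; infrequent writes keep I/O low
   ioutfm = 1,       ! binary NetCDF trajectory instead of formatted ASCII
   ntxo   = 2,       ! NetCDF restart file as well
 /
```

NetCDF output is both more compact and faster to write than the formatted ASCII alternative, which is why it eases pressure on a shared disk.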

Some related benchmarks may help. A while ago I ran some quick benchmarks
of a JAC-like system, measuring performance as a function of how often
snapshots are written:

As you can see, the performance really doesn't start to degrade measurably
until you start to write every 10 to 50 steps. The performance degradation
is a combination of the increased GPU<->CPU communications, trajectory I/O,
and the related operations (like wrapping, if applicable). And these were
all done on a typical desktop (with no RAID) -- it's easy to get better
parallel filesystem performance.

Based on this information, I would say that even when running 4 independent
GPU simulations on a single desktop, the I/O buffers are not *that*
saturated, so I doubt disk I/O will be much of a limitation for the other
applications you run alongside them.

All the best,

Jason M. Swails
Rutgers University
Postdoctoral Researcher
AMBER mailing list
Received on Thu Jul 09 2015 - 06:30:02 PDT