Re: [AMBER] pmemd cuda MPI and PBS_GPUFILE

From: Scott Brozell <sbrozell.rci.rutgers.edu>
Date: Wed, 7 Nov 2012 12:25:13 -0500

Hi,

On Tue, Nov 06, 2012 at 10:50:52AM -0500, Jason Swails wrote:
> On Tue, Nov 6, 2012 at 12:38 AM, Ross Walker <ross.rosswalker.co.uk> wrote:
> > >http://ambermd.org/gpus/#Running
> > >" Ideally you would have a batch scheduling system that will set
> > >everything up for you correctly "
> > >
> > >In fact, PBS does just that with its PBS_GPUFILE, e.g.,
> > >#PBS -l nodes=2:ppn=x:gpus=2
> > >...
> > >cat $PBS_GPUFILE
> > >cat /var/spool/batch/torque/aux//517906.batch.edugpu
> > >n0659-gpu1
> > >n0659-gpu0
> > >n0658-gpu1
> > >n0658-gpu0
> > >
> > >And a reliable PBS source indicates that the PBS_GPUFILE and its syntax
> > >are stable.
> > >When will pmemd support PBS_GPUFILE?
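
(As an aside: a one-liner along these lines, assuming the hostname-gpuN lines
shown above, turns that file into a count of allocated GPUs per node.

    # count allocated GPUs per node; assumes lines like "n0659-gpu1"
    sort "$PBS_GPUFILE" | sed 's/-gpu[0-9]*$//' | uniq -c
)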
>
> In response to Scott, I can't imagine this happening, at least in the
> foreseeable future. The effort that would be put into learning and
> implementing the required parts of the PBS API will most likely go into
> feature development and enhancements instead. IMO, it's the MPIs that
> should support this, not the CUDA applications themselves. mpiexec and/or
> mpirun should, when compiled against the existing torque API, be able to
> discriminate and launch processes strictly on the allocated GPUs. Most
> (all?) MPIs already have the code to support torque integration, so it
> seems a simpl*er* task for them, and well worth generalizing above and
> beyond pmemd.cuda(.MPI).

That seems reasonable, so I'll contact them.


> > >Please provide a workaround script that takes a $PBS_GPUFILE and spews
> > >all the necessary environment variables to run on the specified gpus.
> >
> > Volunteers? - Should be pretty simple for some Bash whizz to figure this
> > out.
>
> This is surprisingly not simple to do in general if/when you use GPUs
> scattered across different nodes. Suppose you have 3 GPUs per node (e.g.,
> Keeneland), and you want to use 8 total GPUs (say, for a REMD job or
> something). To make things clean, we ask for 4 nodes, 2 GPUs per node, so
> we are charged only for what we need. PBS_GPUFILE can then list GPUs 0 and 1
> on node 1, GPUs 1 and 2 on node 2, etc., depending on which GPUs are already
> in use. (We can take this a step further and just ask for any 8 GPUs,
> regardless of the node or GPU number.)
>
> So you need to be able to set this environment variable on a per-thread
> basis. As this is unnecessary for CPUs, I don't think this has really been
> addressed before.
>
> The staff at the UF HPC has written a script that seems to work correctly
> (that is, CUDA_VISIBLE_DEVICES is set on a per-process basis so that only
> the GPUs specified in PBS_GPUFILE are used).
>
> The solution is here: http://wiki.hpc.ufl.edu/doc/CUDA#pbsgpu-wrapper (I
> have attached the pbsgpu-wrapper script they reference in there).
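
The idea, as I understand it, is roughly the following; this is only my own
sketch, not the UF script. It assumes an MPI that exports a local-rank
variable (OMPI_COMM_WORLD_LOCAL_RANK for Open MPI, MV2_COMM_WORLD_LOCAL_RANK
for MVAPICH2) and PBS_GPUFILE lines of the form hostname-gpuN as shown above.

    #!/bin/sh
    # Sketch of a per-process GPU wrapper; not the UF pbsgpu-wrapper itself.
    # Local rank of this process on its node (0 if no such variable is set).
    localrank=${OMPI_COMM_WORLD_LOCAL_RANK:-${MV2_COMM_WORLD_LOCAL_RANK:-0}}

    # GPU ids allocated to this host, taken from lines like "n0659-gpu1".
    host=$(hostname -s)
    gpus=$(grep "^${host}-gpu" "$PBS_GPUFILE" | sed 's/^.*-gpu//')

    # Give this process the (localrank+1)-th GPU allocated on its node.
    CUDA_VISIBLE_DEVICES=$(printf '%s\n' "$gpus" | sed -n "$((localrank + 1))p")
    export CUDA_VISIBLE_DEVICES

    # Run the real command, e.g. pmemd.cuda.MPI and its arguments.
    exec "$@"

mpirun then launches the wrapper instead of the pmemd binary directly, and
each process exec's pmemd.cuda.MPI with only its own GPU visible.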
>
> Note that in many cases this may be overkill. If you are required to request
> entire nodes and all of their GPUs (or you do so as a general practice), then
> this is unnecessary (just let the GPUs be chosen by default). If you are
> running only on a single node, you can parse PBS_GPUFILE directly and set a
> single CUDA_VISIBLE_DEVICES for all threads.
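
For that single-node case, something as small as this should do, again
assuming hostname-gpuN lines in $PBS_GPUFILE (the mpirun invocation below is
only illustrative):

    # Expose every GPU allocated to this job on the one node, e.g. "0,1".
    export CUDA_VISIBLE_DEVICES=$(sed 's/^.*-gpu//' "$PBS_GPUFILE" | sort -n | paste -sd, -)
    mpirun -np 2 pmemd.cuda.MPI -O -i mdin -p prmtop -c inpcrd -o mdout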

Excellent, thanks for the scripts.
FWIW, I already knew that writing a general-purpose, robust script, and
handling requests for less than full nodes, is not a trivial problem.
But a solution that starts from the PBS_GPUFILE, whether it is handled by the
MPI implementation, the pmemd binaries, or helper scripts, is overdue IMO.

The current pmemd approach, in which the user sets CUDA_VISIBLE_DEVICES by
hand, might be OK for home-grown clusters, but it is not ready for prime time.
I recommend that some polish (author info, etc.) be added to your
institution's scripts and that they be distributed with Amber.
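
For concreteness, the kind of job script I have in mind would look something
like this; the resource request matches the example above, while the walltime,
process count, and input names are only placeholders:

    #!/bin/sh
    #PBS -l nodes=2:ppn=2:gpus=2
    #PBS -l walltime=24:00:00
    cd "$PBS_O_WORKDIR"
    # pbsgpu-wrapper reads $PBS_GPUFILE and sets CUDA_VISIBLE_DEVICES per process.
    mpirun -np 4 ./pbsgpu-wrapper pmemd.cuda.MPI -O -i mdin -p prmtop -c inpcrd -o mdout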


On Tue, Nov 06, 2012 at 04:23:59PM +0000, Jodi Ann Hadden wrote:
> Just wanted to note that the line
>
> #PBS -l nodes=2:ppn=2:gpus=2
>
> will not work if jobs are submitted through a scheduler that does not understand GPUs as a resource, such as MAUI.

We have a scheduler that understands GPUs, but thanks for your response.

scott


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Nov 07 2012 - 09:30:03 PST