Thanks. I ran "ps" and there are just two pmemd.cuda jobs running; "nvidia-smi"
also shows that only two GPUs are occupied. Anyway, I am running these jobs
in serial now. As you mentioned above, parallel scalability is currently
limited, so there is not much advantage to running in parallel.
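
For the record, this is roughly what I checked (the exact grep pattern is
just how I happened to invoke it, nothing special):

  ps aux | grep [p]memd.cuda    # list running pmemd.cuda processes (bracket keeps grep from matching itself)
  nvidia-smi                    # per-GPU utilization and the processes holding each GPU
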
Thanks again.
Victor
On Wed, Aug 7, 2013 at 2:44 PM, Jason Swails <jason.swails.gmail.com> wrote:
> On Wed, Aug 7, 2013 at 3:37 PM, Victor Ma <victordsmagift.gmail.com> wrote:
>
> > Thanks for both replies. The simulation does run fine in serial. I
> > checked em1.out. It says "CUDA (GPU): Minimization is NOT supported in
> > parallel on GPUs." The message is pretty clear. I then tested the
> > parallel calculation with a production run. The command is
> >
> > mpirun --machinefile=nodefile -np 2 pmemd.cuda.MPI -O -i prod.in -o
> > prod.out -c md3.rst -p complex_wat.prm -r prod.rst -x prod.crd -ref
> > md3.rst &
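> >
> > (For completeness, the minimization itself runs fine in serial with
> > plain pmemd.cuda; e.g. something like
> >
> > pmemd.cuda -O -i em1.in -o em1.out -c complex_wat.inpcrd -p
> > complex_wat.prm -r em1.rst -ref complex_wat.inpcrd
> >
> > works without complaint, so only the production MD goes through mpirun.)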
> >
> > In the nodefile, I put
> > localhost:2
> > (My machine has 4 GPUs and 24 CPU cores.)
> >
> > This time, the error message is "cudaMemcpyToSymbol: SetSim copy to cSim
> > failed all CUDA-capable devices are busy or unavailable". But I do have
> > two idle GPUs on this machine. Any ideas?
> >
>
> Maybe there is a rogue process running that is 'occupying' the GPU...?
>
> You could always check the output of "nvidia-smi" to see, although I'm
> not sure that would necessarily tell you whether a process has merely
> claimed the GPU. If you only run pmemd.cuda on the GPUs, you could use
> the "ps" utility to look for pmemd.cuda jobs (or CUDA NAMD, GROMACS,
> etc.).
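>
> One more guess, based only on the wording of the error: if the GPUs are
> set to exclusive compute mode, a new CUDA context on an occupied device
> fails with exactly that "busy or unavailable" message. Something like
>
> nvidia-smi -q | grep "Compute Mode"   # check each GPU's compute mode
> export CUDA_VISIBLE_DEVICES=2,3      # hypothetical IDs; use whichever GPUs are actually idle
> mpirun -np 2 pmemd.cuda.MPI -O -i prod.in -o prod.out ...
>
> would show the mode and pin the run to the free devices.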
>
> HTH,
> Jason
>
>
> >
> > Thanks.
> >
> > Victor
> >
> >
> > On Wed, Aug 7, 2013 at 1:50 PM, David A Case <case.biomaps.rutgers.edu>
> > wrote:
> >
> > > On Wed, Aug 07, 2013, Victor Ma wrote:
> > > >
> > > > I have amber12 and openmpi installed and configured on a 4-GPU
> > > > machine. I'd like to run a multi-GPU Amber simulation. Here is the
> > > > command I used:
> > > >
> > > > mpirun --machinefile=nodefile -np 2 pmemd.cuda.MPI -O -i em1.in -o
> > > > em1.out -c complex_wat.inpcrd -p complex_wat.prm -r em1.rst -ref
> > > > complex_wat.inpcrd &
> > > >
> > > > And the error message I got is:
> > > > application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
> > > >
> > > > Does that mean my openmpi is not properly configured?
> > >
> > > Could be anything. First, run the test suite to see if you have a
> > > generic problem (e.g. with your OpenMPI configuration).
> > >
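> > > (For the parallel tests, something along these lines -- assuming a
> > > standard $AMBERHOME setup -- exercises the MPI build:
> > >
> > > export DO_PARALLEL="mpirun -np 2"   # command prefix the test harness uses for MPI runs
> > > cd $AMBERHOME && make test.parallel
> > >
> > > If those fail as well, the problem is in the MPI setup rather than
> > > in your inputs.)
> > >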
> > > Second, look at the output files, especially em1.out. Most likely,
> > > there is an error message there. The MPI_Abort message just tells you
> > > that the process failed; you have to look at the actual outputs to
> > > find out why.
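> > >
> > > A quick way to surface it is something like:
> > >
> > > tail -n 30 em1.out       # errors usually land at the end of the output
> > > grep -i error *.out      # or search all the output files at once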
> > >
> > > ...dac
> > >
> > >
>
>
>
> --
> Jason M. Swails
> BioMaPS,
> Rutgers University
> Postdoctoral Researcher
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber