Re: [AMBER] Problem with GPU equilibration step [ cudaMalloc GpuBuffer::Allocate failed out of memory]

From: Yogeeshwar Ajjugal <bo14resch11004.iith.ac.in>
Date: Thu, 10 Dec 2015 11:34:51 +0530

Dear Jason,

            Thank you for your helpful suggestion.

Regards
yoogi

On Wed, Dec 9, 2015 at 6:16 PM, Jason Swails <jason.swails.gmail.com> wrote:

> On Wed, Dec 9, 2015 at 12:42 AM, Yogeeshwar Ajjugal <
> bo14resch11004.iith.ac.in> wrote:
>
> > Dear amber users,
> >
> > I am trying to run the equilibration step on a GPU, but it fails with
> > a cudaMalloc error. I am attaching my PBS script below; any help would
> > be appreciated.
> >
> > #!/bin/bash
> > #PBS -l nodes=1:ppn=16:GPU
> > #PBS -l walltime=07:00:00:00
> > #PBS -q GPUq
> > #PBS -e err_""$PBS_JOBID
> > #PBS -o out_""$PBS_JOBID
> > #PBS -r n
> > #PBS -V
> > #PBS -M bo14resch11004.iith.ac.in
> >
> > export I_MPI_JOB_CONTEXT=$PBS_JOBID
> > export OMP_NUM_THREADS=2
> > echo PBS JOB id is $PBS_JOBID
> > echo PBS_NODEFILE is $PBS_NODEFILE
> > echo PBS_QUEUE is $PBS_QUEUE
> > NPROCS=`wc -l < $PBS_NODEFILE`
> > echo NPROCS is $NPROCS
> > NRINGS=`cat $PBS_NODEFILE |sort|uniq|wc -l`
> > echo NRINGS is $NRINGS
> > NGPUS=`expr $NRINGS \* 2`
> > echo NGPUS is $NGPUS
> > cd $PBS_O_WORKDIR
> >
> > export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
> >
> > export AMBERHOME=/home/external/iith/yajjugal/programs/amber_gpu/amber12
> >
> > cd $PBS_O_WORKDIR
> > mpirun -machinefile $PBS_NODEFILE -np 16 \
> >     /home/external/iith/yajjugal/programs/amber_gpu/amber12/bin/pmemd.cuda \
> >     -O -i step1.inp -o step1.out -r step1.rst -p ../input/prmtop \
> >     -c ../input/prmcrd -ref ../input/prmcrd
> >
> > mpirun -machinefile $PBS_NODEFILE -np 16 \
> >     /home/external/iith/yajjugal/programs/amber_gpu/amber12/bin/pmemd.cuda \
> >     -O -i step2.inp -o step2.out -r step2.rst -p ../input/prmtop \
> >     -c step1.rst -ref step1.rst
> >
>
> There are several things wrong here. First, pmemd.cuda is a serial
> program, not a parallel one. What this script does is run 16 copies of
> the same job on all of the available GPUs on the assigned compute node
> (based on your script, it would seem each node has 2 GPUs). This is
> bad: the output files from each copy will try to overwrite those from
> the other copies, the copies will compete for resources, and so on.
> pmemd.cuda, like all serial programs, is meant to be used *without*
> mpirun (or, if the cluster requires mpirun be used, via "mpirun -np 1"
> to force only a single process).
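>
> For example, here is a sketch of step 1 as a single serial GPU run
> (paths and input names taken from your script, untested here):
>
>     cd $PBS_O_WORKDIR
>     # pmemd.cuda is serial: launch exactly one copy, with no mpirun
>     $AMBERHOME/bin/pmemd.cuda -O -i step1.inp -o step1.out \
>         -r step1.rst -p ../input/prmtop -c ../input/prmcrd \
>         -ref ../input/prmcrd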
>
> Second, pmemd.cuda.MPI (the parallel version of pmemd.cuda)
> parallelizes across GPUs -- it does not use the CPUs for much
> computing. As a result, you should use -np # where # is the number of
> GPUs you are requesting, NOT the number of CPUs. Even if you used
> pmemd.cuda.MPI in the commands above, it would try to use 16 GPUs; if
> your node only has 2, it would run 8 processes on each GPU, which will
> ruin performance. That said, I do not find the parallel performance of
> pmemd.cuda.MPI in Amber 12 very impressive (Amber 14, I believe, is
> substantially better due to peer-to-peer support). You might be better
> off running independent serial simulations, even if the parallel
> simulations work.
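>
> If you do want to try the parallel build on a 2-GPU node, match -np to
> the GPU count your script already computes in $NGPUS (a sketch,
> assuming pmemd.cuda.MPI was built alongside pmemd.cuda):
>
>     # one MPI process per GPU, not per CPU core
>     mpirun -machinefile $PBS_NODEFILE -np $NGPUS \
>         $AMBERHOME/bin/pmemd.cuda.MPI -O -i step1.inp -o step1.out \
>         -r step1.rst -p ../input/prmtop -c ../input/prmcrd \
>         -ref ../input/prmcrd
>
> For independent serial runs instead, pin each copy to its own GPU with
> CUDA_VISIBLE_DEVICES (the run0/run1 input names here are hypothetical):
>
>     CUDA_VISIBLE_DEVICES=0 $AMBERHOME/bin/pmemd.cuda -O -i run0.inp \
>         -o run0.out -r run0.rst -p ../input/prmtop -c ../input/prmcrd &
>     CUDA_VISIBLE_DEVICES=1 $AMBERHOME/bin/pmemd.cuda -O -i run1.inp \
>         -o run1.out -r run1.rst -p ../input/prmtop -c ../input/prmcrd &
>     wait    # block until both background runs finish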
>
> HTH,
> Jason
>
> --
> Jason M. Swails
> BioMaPS,
> Rutgers University
> Postdoctoral Researcher
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Dec 09 2015 - 22:30:03 PST