Re: [AMBER] pmemd.cuda.MPI error - integer divide by zero

From: Bill Miller III <brmilleriii.gmail.com>
Date: Tue, 14 Jul 2015 21:11:34 -0400

I just wanted to follow up. After your helpful suggestions, I found
that the Amber CUDA parallel tests gave the same error message I
originally posted. When I ran the same simulation with pmemd.MPI on
the CPUs, pmemd.MPI did not recognize that I was running in parallel
(even though I was using 'mpirun -np 16') and gave an error message
saying that pmemd.MPI requires at least 2 processors. Based on an
off-list suggestion from Jason, this turned out to be a mix-up of MPI
versions: I had previously compiled using a different Open MPI version,
and that build was not completely cleared out when I updated the code
and re-compiled.
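
For anyone who hits the same thing, a quick way to check for this kind
of mismatch is to compare the mpirun found on your PATH against the MPI
library that pmemd.MPI actually links to, along these lines (adjust
paths for your install):

  which mpirun                  # which launcher is found first on PATH
  mpirun --version              # Open MPI version of that launcher
  ldd $AMBERHOME/bin/pmemd.MPI | grep -i mpi   # MPI libs the binary uses

If the launcher and the libmpi the binary resolves to come from
different Open MPI installs, you can get exactly the symptom above:
every process initializes its own MPI world of size 1, so each rank
thinks it is running alone.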

In the end, I did a complete re-compile with the correct Open MPI
version, and now the tests pass and my simulation runs successfully.
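
In case it helps anyone searching the archives later, the fix was just
the standard clean rebuild, roughly as follows (your configure flags
may differ):

  cd $AMBERHOME
  make clean
  ./configure -cuda -mpi gnu    # with the correct Open MPI first on PATH
  make install
  export DO_PARALLEL="mpirun -np 2"
  make test                     # with DO_PARALLEL set, this should also
                                # exercise the parallel GPU tests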

Thanks again for all the help.

-Bill

On Sat, Jul 11, 2015 at 8:27 AM, Jason Swails <jason.swails.gmail.com>
wrote:

> On Fri, Jul 10, 2015 at 8:03 PM, David A Case <david.case.rutgers.edu>
> wrote:
>
> > On Fri, Jul 10, 2015, Bill Miller III wrote:
> > >
> > > I am trying to get pmemd.cuda.MPI to run on two GTX-980s in parallel
> > > on a workstation running RedHat 6.6. I have re-compiled with an
> > > updated Amber14 (released with patches as of today, not the
> > > developers' tree) using Open MPI 1.6.5 and GNU compilers
> > > (gcc/gfortran v. 4.4.7-11). Whenever I try to run an MD simulation
> > > in parallel, I get the following error messages immediately. I
> > > tried googling for several of the messages, but nothing seemed
> > > appropriate for my particular situation. Any ideas?
> >
> > 1. Do the pmemd.cuda.MPI test cases pass? This will help discriminate
> > between problems with your input and problems with the installation.
> >
> > 2. Is your system equilibrated, or might it have bad forces? Because
> > of the way pmemd.cuda is coded, systems with bad forces (e.g. ones
> > that have not been minimized and/or equilibrated) can get errors on
> > GPUs. The workaround is to equilibrate the system on a CPU before
> > moving to GPUs.
> >
> > I recognize that this is not much help, but looking especially at
> > point 1 will help decide if you linked to the wrong libraries
> > (unlikely).
> >
>
> Yeah, point 1 is probably the best bet here. I doubt (2) factors in, since
> the divide-by-zero exception occurs in the setup routines (before any
> forces are computed). If the pmemd.cuda.MPI tests pass, we will likely
> need copies of the input files to verify and reproduce the problem.
>
> Also -- how does this work with pmemd.MPI?
>
> All the best,
> Jason
>
> --
> Jason M. Swails
> Postdoctoral Researcher
> BioMaPS, Rutgers University
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>



-- 
Bill Miller III
Post-doc
University of Richmond
417-549-0952
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Jul 14 2015 - 18:30:02 PDT