Re: [AMBER] pmemd.cuda.MPI error - integer divide by zero

From: Jason Swails <jason.swails.gmail.com>
Date: Sat, 11 Jul 2015 08:27:58 -0400

On Fri, Jul 10, 2015 at 8:03 PM, David A Case <david.case.rutgers.edu>
wrote:

> On Fri, Jul 10, 2015, Bill Miller III wrote:
> >
> > I am trying to get pmemd.cuda.MPI to run on two GTX-980s in parallel on a
> > workstation running RedHat 6.6. I have re-compiled with an updated Amber14
> > (released with patches as of today, not developers tree) using openmpi
> > 1.6.5 and gnu (gcc/gfortran v. 4.4.7-11). Whenever I try to run an MD
> > simulation in parallel, I get the following error messages immediately. I
> > tried googling for several of the messages, but nothing seemed appropriate
> > for my particular situation. Any ideas?
>
> 1. Do the pmemd.cuda.MPI test cases pass? This will help discriminate between
> problems with your input and problems with the installation.
>
> 2. Is your system equilibrated, or might it have bad forces? Because of the
> way pmemd.cuda is coded, systems with bad forces (e.g. that have not been
> minimized and/or equilibrated) can get errors on GPUs. The workaround is to
> equilibrate the system on a CPU before moving to GPUs.
>
> I recognize that this is not much help, but looking esp. at point 1 will
> help decide if you linked to the wrong libraries (unlikely).
>

Yeah, point 1 is probably the best bet here. I doubt (2) factors in, since
the divide-by-zero exception occurs in the setup routines (before any
forces are computed). If the pmemd.cuda.MPI tests pass, we will likely
need copies of the input files to verify and reproduce the problem.

Also -- does the same simulation run correctly with pmemd.MPI?

All the best,
Jason

--
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sat Jul 11 2015 - 05:30:03 PDT