[AMBER] BUG and FIX: pmemd crashes when vlimit exceeded

From: <Don.Bashford.stjude.org>
Date: Tue, 10 Aug 2010 19:46:58 -0500

I was running pmemd from Amber10 under MPI on 16 processors and it
crashed with messages to stderr like:

   MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
   with errorcode 1

shortly after emitting the warning to stdout:

  vlimit exceeded for step 747615; vmax = 39.4731

vlimit was at its default value (20.0). The crash came after more than
1 ns of production, and this was the only vlimit warning. It looks to
me like the problem comes from around line 733 in
amber11/src/pmemd/src/runmd.fpp:

        ! Only violations on the master node are actually reported
        ! to avoid both MPI communication and non-master writes.
        write(mdout, '(a,i6,a,f10.4)') 'vlimit exceeded for step ', nstep, &
                                        '; vmax = ', vmax

Although the comment says only the master will report, I don't see any
code that actually enforces that. Elsewhere in runmd.fpp, writes to mdout
are protected by an "if (master) then .... end if" construct immediately
around the write statement. So I assume the fix is just to do the same
here.
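
A minimal standalone sketch of that guard follows. It is untested
against the real pmemd source. The program wrapper, the hard-coded
values, and the use of output_unit for mdout are stand-ins for
illustration only; in runmd.fpp itself, only the "if (master) then" and
"end if" lines would be new, assuming the logical "master" used by the
other guarded writes is in scope at that point:

        program vlimit_guard_sketch
          use, intrinsic :: iso_fortran_env, only: output_unit
          implicit none

          ! Hypothetical stand-ins for pmemd's variables (not taken from
          ! the Amber source): mdout is mapped to stdout here.
          integer, parameter :: mdout = output_unit
          logical            :: master
          integer            :: nstep
          double precision   :: vmax

          master = .true.       ! in pmemd this would be true on one rank only
          nstep  = 747615       ! values copied from the warning quoted above
          vmax   = 39.4731d0

          ! Guard the write so that only the master touches mdout.
          if (master) then
            write(mdout, '(a,i6,a,f10.4)') 'vlimit exceeded for step ', nstep, &
                                            '; vmax = ', vmax
          end if
        end program vlimit_guard_sketch

With master set to .false. the program prints nothing, which is the
behavior the existing comment in runmd.fpp describes for non-master
nodes.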

I don't know much about MPI. Is it usual for an MPI application to
crash if a non-master tries to write? Is this dependent on your MPI
implementation/environment?

I experienced this problem in Amber10 with patches up to bugfix 30.
The more recent bugfixes don't seem to cover it, and the problem seems
to still be there in the Amber11 source.

Don Bashford
Department of Structural Biology
Saint Jude Children's Research Hospital
Memphis, TN


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Aug 10 2010 - 18:00:05 PDT