Re: AMBER: pmemd error from Robert Duke on 2006-09-14 (Amber Archive Sep 2006)

From: Robert Duke <rduke.email.unc.edu>
Date: Thu, 14 Sep 2006 12:46:30 -0400

Bala -
This is pretty much the exact same error you reported before (before Aug 17), right down to the stack trace, with perhaps slight differences in the mdin file, but still a Langevin dynamics run, and on a cluster using around 50-64 processors. Did you fix the potential issue with stacksize I suggested back then? Are you sure? Can you determine that stacksize is indeed unlimited on ALL nodes when you use MPI to start a process?

If that is not it, then there could be an install issue (i.e., maybe something is wrong with how MPI was installed), but figuring something out like that would require building a debug copy, enabling stack traces, etc. It could also be a compiler version issue, which can be a pain to track down. If you really want help on this sort of thing, you have to give us a lot more info, and if there is a system issue, that could require system access by someone knowledgeable.

If it is not a stacksize issue, it should show up on a smaller number of processors, but because one doesn't get exactly reproducible results from run to run (due to roundoff errors in network operations, where the order of additions/subtractions and network indeterminacy get you), it can be heck to reproduce these things.

Now, last time you said it also occurred for sander. That makes it fairly unlikely that it is a pmemd MPI code issue, because the parallel code is completely different in the two programs. Because the run uses Langevin dynamics, it would be nice to know if it dies pretty routinely in short order. I vaguely remember some Langevin dynamics problems in early implementations of sander and pmemd, but I don't think you should be seeing this in Amber 9 or a fully patched Amber 8 (sorry, I don't remember the details, but there was some scenario where Langevin dynamics would consistently blow up - Ross Walker, do you remember this?)
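One way to check the stacksize question on every node is to start a tiny reporting script with the same launcher used for the real job. The following is only a minimal sketch, assuming Python 3 is available on the compute nodes and that the launcher hands this script the same limits it would hand pmemd; the helper names `describe_limit` and `stack_report` and the file name `check_stack.py` are made up for illustration.

```python
import resource
import socket


def describe_limit(value):
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    # Stack limits are reported in bytes; show kB as `ulimit -s` does.
    return f"{value // 1024} kB"


def stack_report():
    """Return one line with this host's name and its stack limits."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return (
        f"{socket.gethostname()}: "
        f"stack soft={describe_limit(soft)} hard={describe_limit(hard)}"
    )


if __name__ == "__main__":
    # Each launched task prints its own line, so every node shows up.
    print(stack_report())
```

Launched as, for example, `srun -n 50 python check_stack.py`, each task prints one line; any host whose soft limit is not "unlimited" is a candidate for the stacksize problem.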
Regards - Bob Duke

----- Original Message -----
  From: bala
  To: ambermail
  Sent: Thursday, September 14, 2006 11:34 AM
  Subject: AMBER: pmemd error

  Dear Amber users,

  I am using pmemd for my simulation. The simulation ran fine for short durations (50 ps or 100 ps). When I submitted a job for 1 ns, it was interrupted without any error in the output file. The error I am getting is shown below. I am using 50 processors for my job. Could someone kindly advise me on this?

  forrtl: severe (174): SIGSEGV, segmentation fault occurred
  Image PC Routine Line Source
  libvapi.so 0000002A96BEF4AF Unknown Unknown Unknown
  srun: error: n71: task2: Exited with exit code 174
  srun: Terminating job
  srun: error: n70: task0: Exited with exit code 174
  --------------------------------------------------------------------------------
  The following is my input file

   &cntrl
   imin=0,ntx=7,irest=1,
   ntpr=100,ntwx=500,
   ntb=2,cut=10.0,
   ntr=1,nstlim=1000000,
   ntt=3,gamma_ln=1,tempi=300.0,temp0=300.0,
   ntp=1,
   ntc=2,ntf=2
  /
  Hold the nuclic acid
  5.0
  RES 1 25
  END
  END

  thanks in advance,
  c.bala

Received on Sun Sep 17 2006 - 06:07:13 PDT