Re: AMBER: amber 9 sander crashed with "forrtl: severe (174): SIGSEGV, segmentation fault occurred"

From: Shuzhi Wang <Shuzhi.Wang.Colorado.EDU>
Date: Mon, 17 Dec 2007 15:49:57 -0700

Hi Dr. Walker,

Thank you very much for your help. I ran the tests you suggested last
week, and I think I have found the problem: the time step I used was too
large. When I change dt from 2 fs to 1 fs, everything works. So we can at
least rule out a compiler bug, which would have been much more serious.
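
For anyone who wants to double-check which time step a run actually used,
here is a minimal Python sketch that derives it from two consecutive
NSTEP / TIME(PS) records in an mdout file. The function name
effective_dt_fs is made up for illustration, and the pattern assumes the
record layout shown in the output quoted in this thread; treat it as a
rough check, not as part of Amber.

```python
import re

# Matches the step counter and elapsed time on an mdout energy record line,
# e.g. " NSTEP =    17800   TIME(PS) =      37.800  TEMP(K) = ..."
RECORD = re.compile(r"NSTEP\s*=\s*(\d+)\s+TIME\(PS\)\s*=\s*([-+]?\d*\.?\d+)")


def effective_dt_fs(lines):
    """Return the time step in fs implied by the first two energy records.

    Returns None if fewer than two distinct records are found.
    (Hypothetical helper, not an Amber tool.)
    """
    first = None
    for line in lines:
        match = RECORD.search(line)
        if not match:
            continue
        step, time_ps = int(match.group(1)), float(match.group(2))
        if first is None:
            first = (step, time_ps)
        elif step != first[0]:
            # dt = elapsed time / elapsed steps, converted from ps to fs
            return (time_ps - first[1]) / (step - first[0]) * 1000.0
    return None
```

It can be fed an open file, e.g. effective_dt_fs(open("md.out")), where
md.out stands for your own output file. Because mdout prints the time with
three decimals, records that are many steps apart give a more reliable
value.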

Shuzhi "James" Wang

Please see below for my response:

>Hi Shuzhi
>
>My first question is a simple one. Have you run the test cases both in
>serial and in parallel? If so do they all pass? Do other simulations all
>run fine?
>
>You need to do this step before we can debug any further since from what
>you have said so far it suggests that it may be hardware problems -
>possible interconnect failure if it only happens in parallel - or possibly
>a compiler bug.
>
>Have you tried PMEMD? Does the same problem occur in both PMEMD and in
>sander.MPI?
>
>Also if you set ntpr=1 and ntwx=1 what happens? Does it still fail? It may
>be possible that you have a bad structure - sometimes this only shows up
>when you switch to constant pressure. If you run with ntwx=1 and ntpr=1 you
>may be able to see the structure start to blow up before some division by
>zero or similar infinite energy problem is leading to the segfault. However,
>the fact it runs okay in amber 8 and 7 suggests it is most probably a
>compiler bug issue and running the test cases might help identify it.
>
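
When following the ntpr=1 suggestion above, the mdout file gets long, so a
small script can help locate the step where the energies first go bad. The
following is a minimal Python sketch under the assumption that energy
records start with an "NSTEP =" line and that bad values are printed as NaN
(or as asterisks when a number overflows its output field), as in the
output quoted in this thread. The name first_bad_step is made up for
illustration.

```python
import re

# Each energy record in mdout starts with a line carrying the step counter.
STEP = re.compile(r"NSTEP\s*=\s*(\d+)")


def is_bad(token):
    """True for NaN (any case) or a field of asterisks (format overflow)."""
    return token.lower() == "nan" or token.startswith("*")


def first_bad_step(lines):
    """Return the NSTEP of the first record with a bad value, else None.

    Hypothetical helper for illustration, not part of Amber.
    """
    current = None
    for line in lines:
        match = STEP.search(line)
        if match:
            current = int(match.group(1))
        # The NSTEP line itself can carry NaN (TEMP, PRESS), so check it too.
        if current is not None and any(is_bad(tok) for tok in line.split()):
            return current
    return None
```

Called as first_bad_step(open("md.out")), with md.out standing for your
own output file, it returns the step number to inspect in the trajectory
written with ntwx=1.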

----- Original Message -----
From: "Ross Walker" <ross.rosswalker.co.uk>
To: <amber.scripps.edu>
Sent: Tuesday, December 11, 2007 2:32 PM
Subject: RE: AMBER: amber 9 sander crashed with "forrtl: severe (174):
SIGSEGV, segmentation fault occurred"


> Hi Shuzhi
>
> My first question is a simple one. Have you run the test cases both in
> serial and in parallel? If so do they all pass? Do other simulations all
> run fine?
>
> You need to do this step before we can debug any further since from what
> you have said so far it suggests that it may be hardware problems -
> possible interconnect failure if it only happens in parallel - or possibly
> a compiler bug.
>
> Have you tried PMEMD? Does the same problem occur in both PMEMD and in
> sander.MPI?
>
> Also if you set ntpr=1 and ntwx=1 what happens? Does it still fail? It may
> be possible that you have a bad structure - sometimes this only shows up
> when you switch to constant pressure. If you run with ntwx=1 and ntpr=1
> you may be able to see the structure start to blow up before some division
> by zero or similar infinite energy problem is leading to the segfault.
> However, the fact it runs okay in amber 8 and 7 suggests it is most
> probably a compiler bug issue and running the test cases might help
> identify it.
>
> Good luck,
> Ross
>
> /\
> \/
> |\oss Walker
>
> | HPC Consultant and Staff Scientist |
> | San Diego Supercomputer Center |
> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> | http://www.rosswalker.co.uk | PGP Key available on request |
>
> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
> be read every day, and should not be used for urgent or sensitive issues.
>
>> -----Original Message-----
>> From: owner-amber.scripps.edu
>> [mailto:owner-amber.scripps.edu] On Behalf Of Shuzhi Wang
>> Sent: Tuesday, December 11, 2007 13:07
>> To: amber.scripps.edu
>> Cc: Shuzhi Wang
>> Subject: AMBER: amber 9 sander crashed with "forrtl: severe
>> (174): SIGSEGV, segmentation fault occurred"
>>
>> Dear all,
>>
>> (Sorry for the long email, but my problem is complicated and I cannot
>> shorten it.)
>>
>> I am a new user of Amber, and I ran into a very frustrating problem on
>> my first attempt at running Amber 9: SANDER keeps crashing after a
>> varying number of steps, with the following error message:
>> ----------error message with output context---------------
>>  NSTEP =    17800   TIME(PS) =      37.800  TEMP(K) =   285.13  PRESS =  -656.4
>>  Etot   =     -2390.0295  EKtot   =      1023.2938  EPtot      =     -3413.3233
>>  BOND   =         1.2793  ANGLE   =         0.4961  DIHED      =         0.0002
>>  1-4 NB =         0.0000  1-4 EEL =         0.0000  VDWAALS    =       209.2466
>>  EELEC  =     -3624.3456  EHBOND  =         0.0000  RESTRAINT  =         0.0000
>>  EKCMT  =       506.8383  VIRIAL  =       996.4304  VOLUME     =     34547.6103
>>                                                     Density    =         0.5226
>>  Ewald error estimate:   0.3956E-03
>>
>> ------------------------------------------------------------------------------
>>
>> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>> Image      PC                Routine  Line     Source
>> sander     0000000000548A0C  Unknown  Unknown  Unknown
>> sander     00000000004FAB86  Unknown  Unknown  Unknown
>> sander     00000000006BE194  Unknown  Unknown  Unknown
>> sander     00000000004DBE6B  Unknown  Unknown  Unknown
>> sander     00000000004ADF9E  Unknown  Unknown  Unknown
>> sander     00000000004AA218  Unknown  Unknown  Unknown
>> sander     0000000000404062  Unknown  Unknown  Unknown
>> libc.so.6  0000003BA081D8A4  Unknown  Unknown  Unknown
>> sander     0000000000403FA9  Unknown  Unknown  Unknown
>>
>>  NSTEP =    17900   TIME(PS) =      37.900  TEMP(K) =      NaN  PRESS =     NaN
>>  Etot   =            NaN  EKtot   =            NaN  EPtot      =            NaN
>>  BOND   =         1.5918  ANGLE   =         0.6282  DIHED      =         0.2988
>>  1-4 NB =         0.0000  1-4 EEL =         0.0000  VDWAALS    =            NaN
>>  EELEC  =            NaN  EHBOND  =         0.0000  RESTRAINT  =         0.0000
>>  EKCMT  =       532.8891  VIRIAL  =            NaN  VOLUME     =     34531.6889
>>                                                     Density    =         0.5228
>>  Ewald error estimate:          NaN
>>
>> ------------------------------------------------------------------------------
>>
>> The whole situation is as follows:
>>
>> I want to run an NVT MD simulation at 300 K of a nitrate ion in a cubic
>> box of 600 POL3 water molecules with periodic boundary conditions. I
>> first generated the prmtop and inpcrd files using LEaP. I then minimized
>> the system and heated it from 0 K to 300 K using NVT MD. In the third
>> step, I ran NPT MD at 300 K to get the correct density (~1 g/cc). It was
>> at this step that I found the problem. The input file is attached below,
>> together with the command used to start the simulation:
>> ---------------input-----------------
>> NO3-.(H2O)600: 100ps MD NPT
>> &cntrl
>> imin = 0,
>> irest = 1, ntx = 7,
>> ntb = 2, pres0 = 0.7, ntp = 1, taup = 5.0,
>> ipol = 0,
>> cut = 12.0,
>> ntc = 2, ntf = 2,
>> tempi = 300.0, temp0 = 300.0,
>> ntt = 3, gamma_ln = 1.0,
>> nstlim = 100000, dt = 0.001
>> ntpr = 100, ntwx = 100, ntwr = 1000
>> /
>> ---------bash script to run sander--------------
>> sander -O -i nit_600pol3_cube_md2.in -o nit_600pol3_cube_md2.out \
>>        -p nit_600pol3_cube.prmtop -c nit_600pol3_cube_md1.rst \
>>        -r nit_600pol3_cube_md2.rst -x nit_600pol3_cube_md2.mdcrd
>>
>>
>> I searched the mail archive and found only a similar problem involving
>> DIVCON, which has already been corrected by a bugfix for Amber 9. This
>> Amber 9 was compiled using Intel Fortran compiler 10.0.023. All bug
>> fixes for Amber 9 had been applied before compilation.
>>
>> I tried the following things:
>> 1) Changing the parameters, which didn't help at all. Amber still
>> crashed, although not after exactly the same number of steps.
>> 2) Running the same simulation on H2O in a 600 POL3 water box (i.e. a
>> 601 POL3 water box), in which the same problem occurred.
>> 3) Using Amber 8 (compiled with Intel Fortran compiler v9) and Amber 7
>> (compiled with some other Fortran compiler, but I don't know which
>> one). Amber 7 worked and finished the simulation, but it was slower
>> than Amber 9, cannot do NTT=3 temperature scaling, and there was no
>> parallel sander I could use. Amber 8 displayed the same problem as
>> Amber 9.
>>
>> I wonder if anyone can kindly help me out of this frustrating
>> situation.
>>
>> thanks,
>> Shuzhi "James" Wang
>> -----------------------------------------------------------------------
>> The AMBER Mail Reflector
>> To post, send mail to amber.scripps.edu
>> To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
>>
Received on Wed Dec 19 2007 - 06:07:22 PST