RE: AMBER: problems with restart of MD

From: Ross Walker <ross.rosswalker.co.uk>
Date: Sat, 12 Jan 2008 18:58:20 -0800

Hi Vijay,
 
Here's the problem - line 488: 174.1520850
-93.25969211280.4501416************-208.2782643************
 
Your coordinates have grown so large that they no longer fit in the space
allocated for them in the file, so it prints *'s instead. If you run this in
serial it will probably quit with an error about a formatted read, but in
parallel things can sometimes just hang. If you see weird errors in parallel
it is always best to rerun in serial, interactively, so that any errors are
not "lost".
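
A quick way to catch this before restarting is to scan the restart file for
asterisks. The sketch below is only an illustration: it assumes a formatted
(ASCII) restart file with 12-character fields and 7 decimal places, as the
quoted lines suggest, and the names fits_in_field and find_overflow_lines are
made up here, not part of Amber.

```python
FIELD_WIDTH = 12  # assumption: 12-character columns, as in the quoted lines
DECIMALS = 7      # assumption: 7 decimal places, as in the quoted lines


def fits_in_field(value):
    """True if value, printed with DECIMALS decimals, fits in FIELD_WIDTH chars."""
    return len(f"{value:.{DECIMALS}f}") <= FIELD_WIDTH


def find_overflow_lines(lines):
    """Return (line_number, text) pairs for lines that contain '*' marks.

    Line numbers are 1-based, matching what a text editor shows.
    """
    hits = []
    for number, text in enumerate(lines, start=1):
        if "*" in text:
            hits.append((number, text.rstrip("\n")))
    return hits
```

To use it, open the restart file and pass the file object (or a list of its
lines) to find_overflow_lines. With these field settings the largest value that
fits is 9999.9999999 and the most negative is -999.9999999, so anything that
drifts beyond that is written as stars.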
 
How long has your simulation been running at this point? Normally it takes
at least 50ns or so for things to have diffused far enough to cause the
above problem. If your simulation time is much less than this then you could
have problems that are causing your system to blow up.
 
I would recommend going back to the previous restart file (one that doesn't
have stars in it) and rerunning the simulation, this time with nscm=1000,
which will remove center of mass motion and stop your system translating
through space. However, the fact that line 489 has:
 
1867.9560045-227.8549628 617.5614458

while everything else is around:
 
 337.8596051 341.2630291-537.0048903 337.8908765 342.2356117-536.7168717

suggests to me that some part of your system has taken off and translated a
long way away. You don't have ions or water here, do you?
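
To find which atoms have wandered off, the coordinate lines can be cut into
fixed-width fields and compared against the bulk of the system. This is a
minimal sketch under stated assumptions: 12-character fields, coordinates
stored as consecutive x, y, z triples, and an arbitrary 500 Angstrom cutoff.
The function names are hypothetical, and the caller is expected to pass only
the coordinate lines (not the header lines or any velocity section).

```python
from statistics import median

FIELD_WIDTH = 12  # assumption: 12-character columns, as in the quoted lines


def split_fields(line):
    """Cut one line into fixed-width fields; fields with '*' become None."""
    text = line.rstrip("\n")
    fields = []
    for start in range(0, len(text), FIELD_WIDTH):
        chunk = text[start:start + FIELD_WIDTH]
        if not chunk.strip():
            continue  # skip empty padding
        fields.append(None if "*" in chunk else float(chunk))
    return fields


def parse_atoms(coord_lines):
    """Return a list of (x, y, z) tuples; overflowed values are None."""
    values = []
    for line in coord_lines:
        values.extend(split_fields(line))
    usable = len(values) - len(values) % 3  # drop an incomplete trailing triple
    return [tuple(values[i:i + 3]) for i in range(0, usable, 3)]


def flag_stray_atoms(atoms, cutoff=500.0):
    """Return 1-based indices of atoms that are unreadable or far from the rest.

    'Far' means more than `cutoff` from the per-axis median; the cutoff is an
    arbitrary illustrative value, not an Amber setting.
    """
    medians = []
    for axis in range(3):
        known = [atom[axis] for atom in atoms if atom[axis] is not None]
        medians.append(median(known))
    stray = []
    for index, atom in enumerate(atoms, start=1):
        if any(v is None or abs(v - m) > cutoff for v, m in zip(atom, medians)):
            stray.append(index)
    return stray
```

The indices returned are positions within the lines passed in, so they only
match real atom numbers if the first coordinate line of the file is included.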
 
I would check on your system first - perhaps run from the restart with ntwx=1
for a few hundred steps and visualize it to see what is happening. Also make
sure you have nscm set, or you will still have problems later when the entire
system ends up translating too far due to the center of mass motion imparted
by the thermostat.
 
All the best
Ross
 
/\
\/
|\oss Walker

| Assistant Research Professor |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
| http://www.rosswalker.co.uk <http://www.rosswalker.co.uk/> | PGP Key
available on request |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.
 


  _____

From: owner-amber.scripps.edu [mailto:owner-amber.scripps.edu] On Behalf Of
Vijay Singh
Sent: Saturday, January 12, 2008 01:49
To: amber.scripps.edu
Subject: Re: AMBER: problems with restart of MD


Dear Dr.Ross,

    The result continues to be the same even with NTX = 5. The error log
file shows the following -


forrtl: severe (64): input conversion error, unit 9, file
/mnt/home/singhvij/mdout800_1.rst
Image PC Routine Line Source
sander.MPI 00000000007E2492 Unknown Unknown Unknown
sander.MPI 00000000007E1692 Unknown Unknown Unknown
sander.MPI 0000000000798246 Unknown Unknown Unknown
sander.MPI 000000000074D63E Unknown Unknown Unknown
sander.MPI 000000000074CC5A Unknown Unknown Unknown
sander.MPI 000000000076D3A9 Unknown Unknown Unknown
sander.MPI 00000000004F4556 Unknown Unknown Unknown
sander.MPI 00000000004B8854 Unknown Unknown Unknown
sander.MPI 00000000004B2BAD Unknown Unknown Unknown
sander.MPI 0000000000405E32 Unknown Unknown Unknown
libc.so.6 00002BA629E1A154 Unknown Unknown Unknown
sander.MPI 0000000000405D6A Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
sander.MPI 000000000073763F Unknown Unknown Unknown
sander.MPI 0000000000736740 Unknown Unknown Unknown



I have no idea how to proceed from here. In case it is needed, I am attaching
the "mdout800_1.rst" file for your perusal.

Thanks a lot,
Vijay











On Jan 11, 2008 3:04 PM, Vijay Singh <vijayratan.singh.gmail.com> wrote:


Hi,

   Thanks for the response. I actually tried NTX = 5 too. The result was the
same as with NTX=7. But I will go ahead and try once again, and maybe wait a
little longer to see if the output file updates properly.

Thanks again,
Vijay


On Jan 11, 2008 2:56 PM, Ross Walker <ross.rosswalker.co.uk> wrote:


Hi Vijay,
 
The issue is that you are running a non-periodic simulation here (ntb=0) but
when you restart you are setting ntx=7 which tells sander to expect box
information from the input coordinate file. Since your input coordinate file
does not have any box info the code is hanging there waiting for that
information to be appended to the file. I realize we should probably find a
better way to do this in the code so it fails gracefully rather than just
hanging but this isn't always easy in parallel.
 
Anyway, to solve your problem, set ntx=5 and everything should be good. Also
note that with Amber 9 you can always set NTX=5 and it will automatically load
the box info if you are running a periodic simulation. Thus ntx=7 is actually
deprecated as an option, which is why it is no longer in the manual.
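
The mismatch described above can be checked for before a job is submitted.
The following is a rough sketch only: it pulls "name = value" pairs out of the
input text with a regular expression rather than a real namelist parser, and
read_cntrl and restart_warnings are hypothetical helper names. The single rule
it applies (ntb=0 together with ntx=7) is the one described in this message.

```python
import re


def read_cntrl(text):
    """Collect 'name = value' pairs from mdin text into a dict of strings."""
    pairs = re.findall(r"(\w+)\s*=\s*([^,\s/]+)", text)
    return {name.lower(): value for name, value in pairs}


def restart_warnings(settings):
    """Return a list of warning strings for the ntb/ntx mismatch."""
    warnings = []
    if settings.get("ntb") == "0" and settings.get("ntx") == "7":
        warnings.append(
            "ntb=0 (non-periodic) with ntx=7: box info is expected but the "
            "coordinate file has none; use ntx=5"
        )
    return warnings
```

Read the input file into a string, pass it through read_cntrl, and print
whatever restart_warnings returns. Values are compared as plain strings, so
unusual spellings such as "ntx=7." would slip through.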
 
All the best
Ross


 


  _____


From: owner-amber.scripps.edu [mailto:owner-amber.scripps.edu] On Behalf Of
Vijay Singh

Sent: Friday, January 11, 2008 09:30

To: amber.scripps.edu
Subject: AMBER: problems with restart of MD



Hi,



Not sure if my messages are reaching the right destination. I did not get
any response on 2 different occasions earlier. Nevertheless, here is another
try.


I am using amber9 and doing some very basic MD. I am having some trouble with
the restart of an MD production run, and I am not sure where I am going wrong.
After the initial minimization, the first part of the run is fine.

The input file looks like this -




&cntrl
 imin = 0, ntb = 0, irest = 0,
 igb = 1, ntpr = 10000, ntwx = 1000,
 ntt = 3, gamma_ln = 1.0,
 temp0 = 800.0,tempi = 800.0,
 nstlim = 40000000, dt = 0.001,
 cut = 999
/

mpiexec $AMBERHOME/exe/sander.MPI -O -i md_800k_1.in -o md800_1.out -c
t57c_min.rst -p t57c.prmtop -r mdout800_1.rst -x mdout800_1.mdcrd
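
For reference, the simulated time covered by one such run follows directly
from nstlim and dt in the input above (dt is in picoseconds). A small worked
calculation, with variable names chosen here for illustration:

```python
nstlim = 40_000_000   # number of MD steps, from the input above
dt_ps = 0.001         # time step in picoseconds, from the input above

total_ps = nstlim * dt_ps      # simulated time in picoseconds
total_ns = total_ps / 1000.0   # simulated time in nanoseconds

print(f"{total_ns:.1f} ns per run")
```

That is about 40 ns for each run with these settings.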









Up to this point I get all the output I need. But the 2nd step below, the
restart, is where I get stuck. The input is as follows -






&cntrl
 imin = 0, ntb = 0, irest = 1, ntx = 7,
 igb = 1, ntpr = 10000, ntwx = 1000,
 ntt = 3, gamma_ln = 1.0,
 temp0 = 800.0,
 nstlim =40000000, dt = 0.001,
 cut = 999
/



#mpiexec $AMBERHOME/exe/sander.MPI -O -i md_800k_2.in -o md800_2.out -c
mdout800_1.rst -p t57c.prmtop -r mdout800_2.rst -x mdout800_2.mdcrd





From here I don't get any output. The mdout file stops with -

Langevin dynamics temperature regulation:
   ig = 71277
   temp0 = 800.00000, tempi = 0.00000, gamma_ln= 1.00000
| INFO: Old style inpcrd file read



--------------------------------------------------------------------------------
3. ATOMIC COORDINATES AND VELOCITIES
--------------------------------------------------------------------------------








Could someone please help me with this?

Regards
Vijay




-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
Received on Sun Jan 13 2008 - 06:07:39 PST