Re: AMBER: replica exchange hangs

From: Carlos Simmerling <carlos.simmerling.stonybrook.edu>
Date: Wed, 01 Mar 2006 13:37:04 -0500

By "still working", do you mean that it is still taking CPU time?
We haven't seen that. One possible problem with REMD is that
if one of the replicas fails (a SHAKE error, perhaps) then the
others wait for it, and MPI isn't always good about killing the
whole set. As with normal MD in a parallel run, Amber doesn't
always report the reason for failure if it happens on a non-master
node that doesn't have access to the output file.
I'm a bit puzzled about these reports, though: I have run lots of REMD
and haven't experienced it.

Peter's experience, with the hang persisting even after changing the seed and so on, is
really odd. Different seeds should change the trajectory, so
anything that happens should be different.

One thing to check in the restart files that you use as input
is that they all have a different T (2nd line, third
number, present only in REMD restarts). If a job
crashes, the restarts may not all be from the same time point, and
matching temperatures will cause a crash. This part is fixed in Amber
9, along with some other changes.
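
The restart check described above could be scripted roughly as follows. This is
only a sketch: it assumes the numbers on the 2nd line are separated by
whitespace and that the second number is the time and the third is T; the
function names and the file name pattern in the usage comment are made up for
illustration.

```python
from collections import defaultdict


def read_header(path):
    """Return (time, temperature) from the 2nd line of a REMD restart file.

    Assumes whitespace-separated numbers on line 2, with the time as the
    second number and T as the third.
    """
    with open(path) as fh:
        fh.readline()                    # line 1: title, ignored
        fields = fh.readline().split()   # line 2: header numbers
    if len(fields) < 3:
        raise ValueError(f"{path}: no third number on line 2 (not a REMD restart?)")
    return float(fields[1]), float(fields[2])


def check_restarts(paths):
    """Return (duplicates, times) for a set of restart files.

    duplicates maps each temperature shared by more than one file to the
    list of files carrying it; times is the set of distinct time values.
    """
    by_temp = defaultdict(list)
    times = set()
    for path in paths:
        time, temp = read_header(path)
        by_temp[temp].append(path)
        times.add(time)
    duplicates = {t: p for t, p in by_temp.items() if len(p) > 1}
    return duplicates, times


# Hypothetical usage (the file name pattern is an assumption):
#   import glob
#   duplicates, times = check_restarts(sorted(glob.glob("remd.rst.*")))
#   A non-empty `duplicates`, or more than one value in `times`, points to
#   restarts that were not all written at the same time point.
```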
Carlos

===================================================================
Carlos L. Simmerling, Ph.D.
Associate Professor Phone: (631) 632-1336
Center for Structural Biology Fax: (631) 632-1555
Stony Brook University E-mail: carlos.simmerling.stonybrook.edu
Stony Brook, NY 11794-5115

Academic year 2005 address:
Brookhaven National Laboratory
Computational Science Center
Upton, NY 11973
===================================================================




Sergey Krishtal wrote:

>Hello!
>
>I have the same problem: I performed REMD simulations with 16 replicas on Xeon machines. After some time Amber 8 is still working but without any output written.
>I'm also very interested in that.
>
>Best regards,
>Sergey
>
>
>-----Original Message-----
>From: Peter Varnai <pv232.cam.ac.uk>
>To: amber.scripps.edu
>Date: Wed, 1 Mar 2006 17:29:34 +0000 (GMT)
>Subject: AMBER: replica exchange hangs
>
>
>
>>Dear Amber-group,
>>
>>A 16 replica exchange simulation (with LD and GB) ran successfully
>>over 10000 exchanges every 1 ps (including several restarts). However
>>after a certain successful exchange the code hangs: CPU is used and no
>>output is written. This behaviour was seen on multiple platforms and
>>compilations, in both batch and interactive modes. I restarted it from the
>>last successful rst files and tried different random seeds/exchange
>>frequencies, and finally attempted an exchange every step (1 fs) and checked
>>the final written output files, but found nothing unusual in the coordinates
>>or energies.
>>
>>I would welcome any suggestions as to what might be wrong with it.
>>
>>Thanks for your help.
>>
>>Best regards,
>>Peter
>>-----------------------------------------------------------------------
>>The AMBER Mail Reflector
>>To post, send mail to amber.scripps.edu
>>To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
>>
>>
>>
>

Received on Sun Mar 05 2006 - 06:10:12 PST