Re: [AMBER] Restart file for pmemd not showing all information

From: Dean Cuebas <deancuebas.missouristate.edu>
Date: Tue, 27 Dec 2011 14:29:01 -0600

Sorry for not responding earlier... The only time I have seen the problem
is on a CRAY XD1 with SUSE Linux.

Dean

On 12/25/11 4:30 PM, "Duke, Robert E Jr" <rduke.email.unc.edu> wrote:

>Hi Filip,
>Hmmm, I am thinking that what you may have here is OS-specific. I am
>glad to hear that it is only an "interrupted run" problem, but then I
>think you are solely at the mercy of how the machine does file system
>buffering. Still, a correctly operating machine should not simply be
>dropping bytes on the floor if the job terminates unexpectedly, and even
>then, currently used journaling filesystems should rarely miss much, even
>if the OS crashes. So I presume you are dealing with unexpected job
>termination, not unexpected machine crashing? I am going to act like it
>is Christmas (i.e., stop working for the day), but if you give me the info
>about job crash vs. machine crash, I'll think about it all a little more
>early next week. I need to review the code, but it is my guess that I
>never designed restart-writing to survive absolutely everything that
>could go wrong with the machine (i.e., I don't believe I close and reopen
>the file at every write, and that is what you need to do to be certain
>that all file system buffers are more-or-less immediately flushed to disk
>- doing this would be a definite performance hit I expect, especially in
>parallel, as it stalls the master process). So it is my guess this is in
>no way a bug, but if you are having machines crashing left and right,
>could be a heck of an annoyance (but I would think that the crashing
>itself would be something that really ought to be addressed..., if that
>is what is happening).
>Best Regards - Bob
>
>________________________________________
>From: filip fratev [filipfratev.yahoo.com]
>Sent: Sunday, December 25, 2011 1:48 PM
>To: amber.ambermd.org
>Subject: Re: [AMBER] Restart file for pmemd not showing all information
>
>Hi Bob,
>Amber11 gives me a correct restart file only at the final step (when the
>simulation finishes), i.e. if I run a 1ns simulation I obtain a correct
>file only after 1ns. Thus I can still run heating, density equilibration
>and so on for my system, if that is what you mean. My problem is that if
>I run 100ns and something goes wrong after 50ns, I am not able to restart
>and continue my simulation. Moreover, from what I know from Ross, if you
>set the same "ig" value for pmemd.CUDA the simulation should continue in
>exactly the same way. The failure is permanent. For my test today I used
>the standard Amber CUDA test files, but as an example I can also give:
> &cntrl
> imin=0,irest=1,ntx=5,
> nstlim=50000000,dt=0.002,
> ntc=2,ntf=2,ig=-1,iwrap=1,
> cut=8.0, ntb=2, ntp=1,
> taup=1.0,
> ntpr=5000, ntwx=5000, ntwr=10000,
> ntt=3, gamma_ln=2.0,
> temp0=300.0,
> ioutfm=1,
> /
>
>Unfortunately, I don't have Amber10 but can probably find Amber9; is it
>OK for these tests? It is interesting because I know colleagues who have
>the same problem and use the same OS (Suse, gcc). On the other hand, from
>our discussions here I know of people not experiencing this problem under
>Suse, for example Marek, if I am not wrong...
>
>All the best,
>Filip
>
>________________________________
> From: "Duke, Robert E Jr" <rduke.email.unc.edu>
>To: filip fratev <filipfratev.yahoo.com>; AMBER Mailing List
><amber.ambermd.org>
>Sent: Sunday, December 25, 2011 10:52 PM
>Subject: RE: [AMBER] Restart file for pmemd not showing all information
>
>Hi Filip,
>Do you have access to pmemd 10? Can you try that? That would tell us
>whether it is a problem specific to your system, or Amber 11. I don't
>work on Amber 11 much myself, so would probably suggest that Walker's
>group pick it up, if it isolates to 11. I don't understand your
>statement that you don't use restarts much - I don't see how you would get
>trajectories of any length without using them, but maybe you are using
>amber a bit differently than what I am used to. It also might not hurt
>if you post what your mdin looks like for these runs. What is the
>failure rate?
>Thanks - Bob
>
>________________________________________
>From: filip fratev [filipfratev.yahoo.com]
>Sent: Sunday, December 25, 2011 12:46 PM
>To: AMBER Mailing List
>Subject: Re: [AMBER] Restart file for pmemd not showing all information
>
>Hi Bob,
>>Does this happen to you with Amber 11, and while using CUDA/CUDA.MPI? If
>>you run non-CUDA pmemd.mpi, can you get it to happen? Sounds to me like
>>you are talking small cluster systems, in-lab, correct?
>
>Yes, I use just several individual desktop machines and Amber11. I tried
>again right now and the problem is the same when using both
>pmemd.cuda.MPI and pmemd.MPI, as well as when I use the serial version.
>It is very strange. I noticed this problem a year ago, but because I
>never used restart files I am only reporting it here now.
>
>All the best,
>Filip
>
>
>
>
>________________________________
>From: "Duke, Robert E Jr" <rduke.email.unc.edu>
>To: filip fratev <filipfratev.yahoo.com>; AMBER Mailing List
><amber.ambermd.org>
>Sent: Sunday, December 25, 2011 9:03 PM
>Subject: Re: [AMBER] Restart file for pmemd not showing all information
>
>Thanks filip,
>So the question for everyone with pmemd restart file problems becomes
>this: Does this happen to you with Amber 11, and while using
>CUDA/CUDA.MPI? The other question would be, "if you run non-CUDA
>pmemd.mpi (amber11 or amber10), can you get it to happen?". We then can
>distinguish between something specific to a version/build type of pmemd
>vs. a possible OS problem. Sounds to me like you are talking small
>cluster systems, in-lab, correct? (i.e., you are not running at one of the
>big supercomputer centers with some sort of super-optimized parallel file
>system).
>Best Regards - Bob Duke
>
>________________________________________
>From: filip fratev [filipfratev.yahoo.com]
>Sent: Sunday, December 25, 2011 3:30 AM
>To: AMBER Mailing List
>Subject: Re: [AMBER] Restart file for pmemd not showing all information
>
>Hi all,
>Merry Christmas and happy New Year!
>
>I have the same problem: some atoms are missing and there is no
>information about the box. I have never obtained a full restart file
>during the simulations. I use pmemd.CUDA and CUDA.MPI compiled with
>gcc 4.3, 4.5 and 4.6 on different systems under Suse 11.3, 11.4 and 12.1.
>The only proper restart files are those obtained after the end of the
>simulation.
>
>What might be the problem, and how can I solve it?
>
>
>All the best,
>Filip
>
>
>________________________________
>From: Bill Ross <ross.cgl.ucsf.EDU>
>To: amber.ambermd.org
>Sent: Saturday, December 24, 2011 11:26 PM
>Subject: Re: [AMBER] Restart file for pmemd not showing all information
>
>> If memory serves, really the only way we could flush the buffers during
>> a run was an actual close and reopen cycle
>
>How about flush()?
>
> http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gfortran/FLUSH.html
>
>Though I think close/open would be easier to trust.
>
>Bill
>
>_______________________________________________
>AMBER mailing list
>AMBER.ambermd.org
>http://lists.ambermd.org/mailman/listinfo/amber

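For readers following the flush() versus close/reopen discussion quoted above, here is a minimal sketch of the general technique in Python. It is only an illustration, not how pmemd (which is Fortran) writes its restart file. The function name write_restart_durably and the ".rst_tmp_" prefix are made up for this example. The idea is to write the new restart to a temporary file, flush the program's buffer, ask the OS to push its own cache to disk, and only then rename the file over the old restart, so that an interrupted job leaves either the previous complete restart or the new complete one.

```python
import os
import tempfile


def write_restart_durably(path, text):
    """Write `text` to `path` so that a partial file is never left at `path`.

    Hypothetical helper for illustration; the name is not from Amber.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".rst_tmp_")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
            handle.flush()             # program buffer -> OS
            os.fsync(handle.fileno())  # OS cache -> disk (may be slow)
        # Swap the finished file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The fsync call is the expensive part, which fits Bob's expectation that forcing buffers to disk at every restart write would cost performance. A plain flush alone only hands the data to the OS and does not guarantee that it has reached the disk.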


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Dec 27 2011 - 12:30:02 PST