Re: [AMBER] Restart file for pmemd not showing all information

From: filip fratev <filipfratev.yahoo.com>
Date: Tue, 14 Feb 2012 08:38:57 -0800 (PST)

Hi all,
I have not been able to solve my problem with restart files. We have four PCs
with completely different hardware, except for the GPUs (GTX 580s) and the OS
(SUSE 11.3 or 11.4 installed on all of them). I wonder what could cause this
problem. Could someone using SUSE try to reproduce it?
 
All the best,
Filip


________________________________
 From: "Duke, Robert E Jr" <rduke.email.unc.edu>
To: filip fratev <filipfratev.yahoo.com>; AMBER Mailing List <amber.ambermd.org>
Sent: Monday, December 26, 2011 12:30 AM
Subject: RE: [AMBER] Restart file for pmemd not showing all information
 
Hi Filip,
Hmmm, I am thinking that what you may have here is OS-specific.  I am glad to hear that it is only an "interrupted run" problem, but then I I think you are solely at the mercy of how the machine does file system buffering.  Still, a correctly operating machine should not simply be dropping bytes on the floor if the job terminates unexpectedly, and even then, currently used journaling filesystems should rarely miss much, even if the OS crashes.  So I presume you are dealing with unexpected job termination, not unexpected machine crashing?  I am going to act like it is Christmas (ie., stop working for the day), but if you give me the info about job crash vs. machine crash, I'll think about it all a little more early next week.  I need to review the code, but it is my guess that I never designed restart-writing to survive absolutely everything that could go wrong with the machine (ie., I don't believe I close and reopen the file at every write, and
 that is what you need to do to be certain that all file system buffers are more-or-less immediately flushed to disk - doing this would be a definite performance hit I expect, especially in parallel, as it stalls the master process).  So it is my guess this is in no way a bug, but if you are having machines crashing left and right, could be a heck of an annoyance (but I would think that the crashing itself would be something that really ought to be addressed..., if that is what is happening).
Best Regards - Bob
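
The close-and-reopen approach described above can be sketched as a toy Python loop. This is not pmemd's Fortran code; md_loop and its placeholder "state" are hypothetical. Each restart write opens the file, rewrites it completely, and closes it, so nothing is left in the program's own output buffer between writes. Closing hands the data to the operating system; on its own it does not guarantee the data has reached the physical disk.

```python
import os
import tempfile


def md_loop(restart_path: str, nsteps: int, ntwr: int) -> None:
    """Toy MD loop (hypothetical) that rewrites a restart file every ntwr steps.

    The file is opened, fully rewritten and closed at each write, so a later
    kill of this process cannot leave it half-written in a user-space buffer.
    """
    coords = [0.0, 0.0, 0.0]  # placeholder stand-in for the real system state
    for step in range(1, nsteps + 1):
        coords = [x + 0.1 for x in coords]  # placeholder "integration" step
        if step % ntwr == 0:
            # Open, write, close: leaving the with-block flushes and closes the file.
            with open(restart_path, "w") as fh:
                fh.write(f"step {step}\n")
                fh.write(" ".join(f"{x:12.7f}" for x in coords) + "\n")


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "restrt")
    md_loop(path, nsteps=100, ntwr=10)
    with open(path) as fh:
        print(fh.read(), end="")
```

Reopening on every write trades some speed for robustness, which is the performance cost Bob mentions for the master process.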

________________________________________
From: filip fratev [filipfratev.yahoo.com]
Sent: Sunday, December 25, 2011 1:48 PM
To: amber.ambermd.org
Subject: Re: [AMBER] Restart file for pmemd not showing all information

Hi Bob,
Amber11 gives me a correct restart file only at the final step (when the
simulation finishes); i.e., if I run a 1 ns simulation, I obtain a correct file
only after 1 ns. So I can run heating, density equilibration and so on for my
system, if that is what you mean. My problem is that if I run 100 ns and
something goes wrong after 50 ns, I am not able to restart and continue my
simulation. Moreover, as I understand from Ross, if you set the same "ig" value
for pmemd.cuda, the simulation should continue in exactly the same way. The
failure is permanent, i.e., it happens every time. For my test today I used the
standard Amber CUDA test files, but as an example I can give:
&cntrl
  imin=0,irest=1,ntx=5,
  nstlim=50000000,dt=0.002,
  ntc=2,ntf=2,ig=-1,iwrap=1,
  cut=8.0, ntb=2, ntp=1,
  taup=1.0,
  ntpr=5000, ntwx=5000, ntwr=10000,
  ntt=3, gamma_ln=2.0,
  temp0=300.0,
  ioutfm=1,
/
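
As a rough check of what the &cntrl values above imply, the following Python sketch converts the step counts into simulated time. It assumes dt is in ps (as in Amber) and that ntwr, ntwx and ntpr are intervals counted in MD steps; the helper name md_schedule is hypothetical.

```python
def md_schedule(nstlim: int, dt_ps: float, ntwr: int, ntwx: int, ntpr: int) -> dict:
    """Convert step-based &cntrl settings into simulated-time intervals.

    Assumes dt is given in picoseconds and ntwr, ntwx, ntpr are step intervals.
    """
    return {
        "total_ns": nstlim * dt_ps / 1000.0,  # total simulated time
        "restart_every_ps": ntwr * dt_ps,     # restart file write interval
        "frame_every_ps": ntwx * dt_ps,       # trajectory frame interval
        "print_every_ps": ntpr * dt_ps,       # energy printout interval
        "restart_writes": nstlim // ntwr,     # number of restart writes in the run
    }


if __name__ == "__main__":
    # Values taken from the &cntrl namelist above.
    sched = md_schedule(nstlim=50_000_000, dt_ps=0.002, ntwr=10_000, ntwx=5_000, ntpr=5_000)
    for key, value in sched.items():
        print(f"{key}: {value:g}")
```

With these settings the run covers 100 ns and the restart file is rewritten every 20 ps, so after a failure at 50 ns a complete restart file should be at most 20 ps behind the point of failure.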

Unfortunately, I don't have Amber10, but I can probably find Amber9; is that OK
for these tests? It is interesting because the colleagues I know who have the
same problem all use the same OS (SUSE, gcc). On the other hand, from our
discussions here I know of people who do not experience this problem under
SUSE, for example Marek, if I am not wrong...

All the best,
Filip

________________________________
From: "Duke, Robert E Jr" <rduke.email.unc.edu>
To: filip fratev <filipfratev.yahoo.com>; AMBER Mailing List <amber.ambermd.org>
Sent: Sunday, December 25, 2011 10:52 PM
Subject: RE: [AMBER] Restart file for pmemd not showing all information

Hi Filip,
Do you have access to pmemd 10? Can you try that? That would tell us whether it is a problem specific to your system or to Amber 11. I don't work on Amber 11 much myself, so I would probably suggest that Walker's group pick it up if it isolates to 11. I don't understand your statement that you don't use restarts much; I don't see how you would get trajectories of any length without using them, but maybe you are using Amber a bit differently than what I am used to. It also might not hurt if you post what your mdin looks like for these runs. What is the failure rate?
Thanks - Bob

________________________________________
From: filip fratev [filipfratev.yahoo.com]
Sent: Sunday, December 25, 2011 12:46 PM
To: AMBER Mailing List
Subject: Re: [AMBER] Restart file for pmemd not showing all information

Hi Bob,
>Does this happen to you with Amber 11, and while using CUDA/CUDA.MPI? If you run non-CUDA pmemd.mpi, can you get it to happen? Sounds to me like you are talking small cluster systems, in-lab, correct?

Yes, I use just several individual desktop machines with Amber11. I tried again just now, and the problem is the same with pmemd.cuda.MPI, with pmemd.MPI, and with the serial version.
It is very strange. I noticed this problem a year ago, but because I never used restart files, I am only reporting it here now.

All the best,
Filip




________________________________
From: "Duke, Robert E Jr" <rduke.email.unc.edu>
To: filip fratev <filipfratev.yahoo.com>; AMBER Mailing List <amber.ambermd.org>
Sent: Sunday, December 25, 2011 9:03 PM
Subject: Re: [AMBER] Restart file for pmemd not showing all information

Thanks Filip,
So the question for everyone with pmemd restart file problems becomes this: Does this happen to you with Amber 11, and while using CUDA/CUDA.MPI? The other question would be, "if you run non-CUDA pmemd.mpi (amber11 or amber10), can you get it to happen?". We can then distinguish between something specific to a version/build type of pmemd vs. a possible OS problem. Sounds to me like you are talking small cluster systems, in-lab, correct? (i.e., you are not running at one of the big supercomputer centers with some sort of super-optimized parallel file system).
Best Regards - Bob Duke

________________________________________
From: filip fratev [filipfratev.yahoo.com]
Sent: Sunday, December 25, 2011 3:30 AM
To: AMBER Mailing List
Subject: Re: [AMBER] Restart file for pmemd not showing all information

Hi all,
Merry Christmas and happy New Year!

I have the same problem: some atoms are missing and there is no information
about the box. I have never obtained a full restart file during a simulation.
I use pmemd.cuda and pmemd.cuda.MPI compiled with gcc 4.3, 4.5 and 4.6 on
different systems under SUSE 11.3, 11.4 and 12.1. The only proper restart files
are those obtained after the end of the simulation.

What might be the problem, and how can I solve it?
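
One way to tell a truncated restart file from a complete one is to compare its line count with what the atom count implies. The Python sketch below assumes the ASCII (formatted) Amber restart layout: a title line, a line beginning with the atom count, coordinates written six numbers (two atoms) per line, velocities in the same layout (written for MD runs), and a final box line for periodic runs. The function names are hypothetical, and a matching line count is only a rough sanity check, not proof that the file is intact.

```python
import math


def expected_restart_lines(natom: int, has_velocities: bool = True, has_box: bool = True) -> int:
    """Lines expected in an ASCII Amber restart file (assumed layout).

    1 title line, 1 natom/time line, coordinates at 6 reals per line
    (3 per atom, i.e. 2 atoms per line), optional velocities in the same
    layout, and an optional single box line.
    """
    block = math.ceil(natom / 2)
    n = 2 + block
    if has_velocities:
        n += block
    if has_box:
        n += 1
    return n


def restart_looks_complete(path: str, has_velocities: bool = True, has_box: bool = True) -> bool:
    """Heuristic: True if the file has at least the expected number of lines."""
    with open(path) as fh:
        lines = fh.read().splitlines()
    # Drop trailing blank lines only; the title line itself may be blank.
    while lines and not lines[-1].strip():
        lines.pop()
    if len(lines) < 2:
        return False
    natom = int(lines[1].split()[0])
    return len(lines) >= expected_restart_lines(natom, has_velocities, has_box)
```

A file that is missing atoms and the box line, as described above, would fail this check.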


All the best,
Filip


________________________________
From: Bill Ross <ross.cgl.ucsf.EDU>
To: amber.ambermd.org
Sent: Saturday, December 24, 2011 11:26 PM
Subject: Re: [AMBER] Restart file for pmemd not showing all information

> If memory serves, really the only way we could flush the buffers during
> a run was an actual close and reopen cycle

How about flush()?

  http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gfortran/FLUSH.html

Though I think close/open would be easier to trust.

Bill
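
Bill's flush() suggestion maps onto Python roughly as follows. This is a Python analogue of the gfortran FLUSH call linked above, not pmemd's actual code, and append_record is a hypothetical helper. flush() moves data from the program's buffer to the operating system, which is enough for the data to survive the process being killed; os.fsync() additionally asks the OS to commit it to the storage device, which matters if the whole machine may go down. This mirrors the job-crash versus machine-crash distinction raised in the thread.

```python
import os
import tempfile


def append_record(fh, text: str, durable: bool = False) -> None:
    """Write one record and push it out of Python's user-space buffer.

    durable=False: flush() only; the data is visible to other processes and
    survives this process being killed, but may still sit in the OS cache.
    durable=True: additionally fsync(), asking the OS to write it to disk.
    """
    fh.write(text)
    fh.flush()
    if durable:
        os.fsync(fh.fileno())


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.txt")
    with open(path, "w") as fh:
        append_record(fh, "step 5000\n")
        # A second reader sees the record even though fh is still open.
        with open(path) as reader:
            print(reader.read(), end="")
        append_record(fh, "step 10000\n", durable=True)
```

The fsync variant is likely to cost noticeably more per write than a plain flush, so which one is appropriate depends on whether process kills or machine crashes are the concern.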

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Feb 14 2012 - 09:00:02 PST