Hi Filip,
Hmmm, I am thinking that what you have here may be OS-specific. I am glad to hear that it is only an "interrupted run" problem, but then I think you are solely at the mercy of how the machine does file system buffering. Still, a correctly operating machine should not simply be dropping bytes on the floor if the job terminates unexpectedly, and even then, currently used journaling filesystems should rarely miss much, even if the OS crashes. So I presume you are dealing with unexpected job termination, not unexpected machine crashing? I am going to act like it is Christmas (i.e., stop working for the day), but if you give me the info about job crash vs. machine crash, I'll think about it all a little more early next week.
I need to review the code, but it is my guess that I never designed restart-writing to survive absolutely everything that could go wrong with the machine (i.e., I don't believe I close and reopen the file at every write, and that is what you need to do to be certain that all file system buffers are more-or-less immediately flushed to disk; doing this would be a definite performance hit, I expect, especially in parallel, as it stalls the master process). So it is my guess this is in no way a bug, but if you are having machines crashing left and right, it could be a heck of an annoyance (but I would think that the crashing itself would be something that really ought to be addressed, if that is what is happening).
Best Regards - Bob
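The close-and-reopen idea described above can be illustrated with a minimal Python sketch. It is not taken from the pmemd source, and the function name write_restart is hypothetical. It opens a fresh file for every snapshot, flushes and syncs it, and closes it. The temporary-file-plus-rename step is an addition that is not mentioned in the thread; it is included on the assumption that the goal is for an interrupted job never to leave a half-written restart file behind.

```python
import os
import tempfile


def write_restart(path, text):
    """Write one restart snapshot so that `path` is always a complete file.

    Hypothetical helper for illustration only; not pmemd's actual code.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write into a temporary file in the same directory as the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".restrt_tmp_")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
            handle.flush()             # push Python's buffer to the OS
            os.fsync(handle.fileno())  # ask the OS to commit its cache to disk
        # The file is closed at this point; swap it in over the old restart.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Calling write_restart("restrt", snapshot_text) at every restart interval would pay the open/sync/close cost each time, which is the performance hit mentioned above.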
________________________________________
From: filip fratev [filipfratev.yahoo.com]
Sent: Sunday, December 25, 2011 1:48 PM
To: amber.ambermd.org
Subject: Re: [AMBER] Restart file for pmemd not showing all information
Hi Bob,
Amber11 gives me a correct restart file only at the final step (when the simulation finishes), i.e. if I run a 1ns simulation I obtain a correct file only after 1ns. Thus I can heat my system, equilibrate its density and so on, if that is what you mean. My problem is that if I run 100ns and something goes wrong after 50ns, I am not able to restart and continue my simulation.
Moreover, from what I know from Ross, if you set the same "ig" value for pmemd.CUDA, the simulation should continue in exactly the same way. The failure is permanent. For my test today I used the standard Amber CUDA test files, but as an example I can also give:
&cntrl
imin=0,irest=1,ntx=5,
nstlim=50000000,dt=0.002,
ntc=2,ntf=2,ig=-1,iwrap=1,
cut=8.0, ntb=2, ntp=1,
taup=1.0,
ntpr=5000, ntwx=5000, ntwr=10000,
ntt=3, gamma_ln=2.0,
temp0=300.0,
ioutfm=1,
/
Unfortunately, I don't have Amber10, but I can probably find Amber9; is it OK for these tests? It is interesting, because I know colleagues who have the same problem, but they use the same OS (Suse, gcc). On the other hand, from our discussions here I know of people not experiencing this problem under Suse, for example Marek, if I am not wrong...
All the best,
Filip
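As a quick check of what the mdin above asks for, the following short Python sketch works out the run length and the restart interval from the posted values. It assumes the usual Amber meanings of the keywords (dt in picoseconds, ntwr and ntwx as step counts between writes); the variable names simply mirror the namelist.

```python
# Values copied from the &cntrl namelist above.
nstlim = 50_000_000   # total MD steps
dt = 0.002            # time step, assumed to be in ps
ntwr = 10_000         # steps between restart-file writes
ntwx = 5_000          # steps between trajectory frames

total_ns = nstlim * dt / 1000.0        # ps -> ns
restart_interval_ps = ntwr * dt        # simulated time between restart writes
restart_writes = nstlim // ntwr        # restart files written over the run
trajectory_frames = nstlim // ntwx     # frames written to the trajectory

print(f"run length: {total_ns:.0f} ns")
print(f"restart every {restart_interval_ps:.0f} ps ({restart_writes} writes)")
print(f"trajectory frames: {trajectory_frames}")
```

Under these assumptions the run is 100 ns long with a restart write every 20 ps, so an intact mid-run restart file would cost at most about 20 ps of simulation after an interruption.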
________________________________
From: "Duke, Robert E Jr" <rduke.email.unc.edu>
To: filip fratev <filipfratev.yahoo.com>; AMBER Mailing List <amber.ambermd.org>
Sent: Sunday, December 25, 2011 10:52 PM
Subject: RE: [AMBER] Restart file for pmemd not showing all information
Hi Filip,
Do you have access to pmemd 10? Can you try that? That would tell us whether it is a problem specific to your system, or to Amber 11. I don't work on Amber 11 much myself, so I would probably suggest that Walker's group pick it up if it isolates to 11. I don't understand your statement that you don't use restarts much; I don't see how you would get trajectories of any length without using them, but maybe you are using Amber a bit differently than what I am used to. It also might not hurt if you post what your mdin looks like for these runs. What is the failure rate?
Thanks - Bob
________________________________________
From: filip fratev [filipfratev.yahoo.com]
Sent: Sunday, December 25, 2011 12:46 PM
To: AMBER Mailing List
Subject: Re: [AMBER] Restart file for pmemd not showing all information
Hi Bob,
>Does this happen to you with Amber 11, and while using CUDA/CUDA.MPI? If you run non-CUDA pmemd.mpi, can you get it to happen? Sounds to me like you are talking small cluster systems, in-lab, correct?
Yes, I use just several individual desktop machines and Amber11. I tried again right now and the problem is the same when using both pmemd.cuda.MPI and pmemd.MPI, as well as when I use the serial version.
It is very strange. I noticed this problem a year ago, but because I never used restart files, I am only reporting it here now.
All the best,
Filip
________________________________
From: "Duke, Robert E Jr" <rduke.email.unc.edu>
To: filip fratev <filipfratev.yahoo.com>; AMBER Mailing List <amber.ambermd.org>
Sent: Sunday, December 25, 2011 9:03 PM
Subject: Re: [AMBER] Restart file for pmemd not showing all information
Thanks Filip,
So the question for everyone with pmemd restart file problems becomes this: Does this happen to you with Amber 11, and while using CUDA/CUDA.MPI? The other question would be, "If you run non-CUDA pmemd.mpi (amber11 or amber10), can you get it to happen?" We can then distinguish between something specific to a version/build type of pmemd vs. a possible OS problem. Sounds to me like you are talking small cluster systems, in-lab, correct? (i.e., you are not running at one of the big supercomputer centers with some sort of super-optimized parallel file system).
Best Regards - Bob Duke
________________________________________
From: filip fratev [filipfratev.yahoo.com]
Sent: Sunday, December 25, 2011 3:30 AM
To: AMBER Mailing List
Subject: Re: [AMBER] Restart file for pmemd not showing all information
Hi all,
Merry Christmas and happy New Year!
I have the same problem: some atoms are missing and there is no information about the box. I have never obtained a full restart file during a simulation. I use pmemd.CUDA and CUDA.MPI compiled with gcc 4.3, 4.5 and 4.6 on different systems under Suse 11.3, 11.4 and 12.1. The only proper restart files are those obtained after the end of the simulation.
What might be the problem, and how can I solve it?
All the best,
Filip
________________________________
From: Bill Ross <ross.cgl.ucsf.EDU>
To: amber.ambermd.org
Sent: Saturday, December 24, 2011 11:26 PM
Subject: Re: [AMBER] Restart file for pmemd not showing all information
> If memory serves, really the only way we could flush the buffers during
> a run was an actual close and reopen cycle
How about flush()?
http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gfortran/FLUSH.html
Though I think close/open would be easier to trust.
Bill
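The effect a flush is meant to have can be shown with a small Python sketch. This is only an analogy to the Fortran FLUSH suggested above, not pmemd code, and the file name is made up. It writes a little data to a file that stays open, then checks how many bytes another reader would see before and after an explicit flush. The os.fsync call is an extra step, added on the assumption that one also wants the OS cache committed to disk.

```python
import os
import tempfile

# Hypothetical file standing in for a restart file that is kept open.
path = os.path.join(tempfile.mkdtemp(), "restrt_demo")

handle = open(path, "w")            # default buffering: small writes stay in memory
handle.write("partial restart data\n")
before = os.path.getsize(path)      # bytes visible to other readers so far

handle.flush()                      # hand the buffered bytes to the OS
os.fsync(handle.fileno())           # ask the OS to commit them to disk
after = os.path.getsize(path)       # bytes visible after the flush

handle.close()
print("visible before flush:", before)
print("visible after flush:", after)
```

If the job were killed between the write and the flush, the buffered bytes would never reach the file, which is one way a restart file can end up incomplete after an interrupted run.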
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sun Dec 25 2011 - 15:00:03 PST