Filip,
I was able to get the full information on the restart file by using Amber
10 instead of Amber 11. I know that doesn't solve the problem, but that's
how I avoided the problem from a couple of months ago. Sorry for the
delayed response. Hope this helps.
Sasha Perkins
Penn State University
On Tue, Feb 14, 2012 at 11:38 AM, filip fratev <filipfratev.yahoo.com> wrote:
> Hi all,
> I was not able to solve my problem with restart files. We have 4 PCs with
> completely different hardware, except for the GPUs (GTX580s) and the OS
> (SUSE 11.3-11.4), which is installed on all of them. I was wondering what
> could provoke this problem. Could someone using SUSE try to reproduce it?
>
> All the best,
> Filip
>
>
> ________________________________
> From: "Duke, Robert E Jr" <rduke.email.unc.edu>
> To: filip fratev <filipfratev.yahoo.com>; AMBER Mailing List <
> amber.ambermd.org>
> Sent: Monday, December 26, 2011 12:30 AM
> Subject: RE: [AMBER] Restart file for pmemd not showing all information
>
> Hi Filip,
> Hmmm, I am thinking that what you may have here is OS-specific. I am glad
> to hear that it is only an "interrupted run" problem, but then I think
> you are solely at the mercy of how the machine does file system buffering.
> Still, a correctly operating machine should not simply be dropping bytes on
> the floor if the job terminates unexpectedly, and even then, currently used
> journaling filesystems should rarely miss much, even if the OS crashes. So
> I presume you are dealing with unexpected job termination, not unexpected
> machine crashing? I am going to act like it is Christmas (i.e., stop
> working for the day), but if you give me the info about job crash vs.
> machine crash, I'll think about it all a little more early next week. I
> need to review the code, but it is my guess that I never designed
> restart-writing to survive absolutely everything that could go wrong with
> the machine (i.e., I don't believe I close and reopen the file at every
> write, and that is what you need to do to be certain that all file system
> buffers are more-or-less immediately flushed to disk - doing this would be a
> definite performance hit I expect, especially in parallel, as it stalls the
> master process). So it is my guess this is in no way a bug, but if you are
> having machines crashing left and right, could be a heck of an annoyance
> (but I would think that the crashing itself would be something that really
> ought to be addressed..., if that is what is happening).
> Best Regards - Bob
>
> ________________________________________
> From: filip fratev [filipfratev.yahoo.com]
> Sent: Sunday, December 25, 2011 1:48 PM
> To: amber.ambermd.org
> Subject: Re: [AMBER] Restart file for pmemd not showing all information
>
> Hi Bob,
> Amber11 gives me a correct restart file only at the final step (when the
> simulation finishes), i.e. if I run a 1 ns simulation I obtain a correct
> file only after 1 ns. Thus I can run heating, density equilibration and so
> on for my system, if that is what you mean. My problem is that if I run
> 100 ns and something goes wrong after 50 ns, I am not able to restart and
> continue my simulation. Moreover, from what I know from Ross, if you set the
> same "ig" value for pmemd.CUDA the simulation should continue in exactly the
> same way. The failure is permanent. For my test today I used the standard
> Amber CUDA test files, but as an example I can also give:
> &cntrl
> imin=0,irest=1,ntx=5,
> nstlim=50000000,dt=0.002,
> ntc=2,ntf=2,ig=-1,iwrap=1,
> cut=8.0, ntb=2, ntp=1,
> taup=1.0,
> ntpr=5000, ntwx=5000, ntwr=10000,
> ntt=3, gamma_ln=2.0,
> temp0=300.0,
> ioutfm=1,
> /
>
> Unfortunately, I don't have Amber10 but can probably find Amber9; is it OK
> for these tests? It is interesting because I know colleagues who have the
> same problem and use the same OS (Suse, gcc). On the other hand, from our
> discussions here I know of people not experiencing this problem under Suse,
> for example Marek, if I am not wrong...
>
> All the best,
> Filip
>
> ________________________________
> From: "Duke, Robert E Jr" <rduke.email.unc.edu>
> To: filip fratev <filipfratev.yahoo.com>; AMBER Mailing List <
> amber.ambermd.org>
> Sent: Sunday, December 25, 2011 10:52 PM
> Subject: RE: [AMBER] Restart file for pmemd not showing all information
>
> Hi Filip,
> Do you have access to pmemd 10? Can you try that? That would tell us
> whether it is a problem specific to your system, or Amber 11. I don't work
> on Amber 11 much myself, so would probably suggest that Walker's group pick
> it up, if it isolates to 11. I don't understand your statement that you
> don't use restarts much - I don't see how you would get trajectories of any
> length without using them, but maybe you are using amber a bit differently
> than what I am used to. It also might not hurt if you post what your mdin
> looks like for these runs. What is the failure rate?
> Thanks - Bob
>
> ________________________________________
> From: filip fratev [filipfratev.yahoo.com]
> Sent: Sunday, December 25, 2011 12:46 PM
> To: AMBER Mailing List
> Subject: Re: [AMBER] Restart file for pmemd not showing all information
>
> Hi Bob,
> > Does this happen to you with Amber 11, and while using CUDA/CUDA.MPI? If
> > you run non-CUDA pmemd.mpi, can you get it to happen? Sounds to me like
> > you are talking small cluster systems, in-lab, correct?
>
> Yes, I use just several individual desktop machines and Amber11. I tried
> again right now and the problem is the same when using both pmemd.cuda.MPI
> and pmemd.MPI, as well as when I use the serial version.
> It is very strange. I noticed this problem one year ago, but because I
> never used restart files I am only reporting it here now.
>
> All the best,
> Filip
>
>
>
>
> ________________________________
> From: "Duke, Robert E Jr" <rduke.email.unc.edu>
> To: filip fratev <filipfratev.yahoo.com>; AMBER Mailing List <
> amber.ambermd.org>
> Sent: Sunday, December 25, 2011 9:03 PM
> Subject: Re: [AMBER] Restart file for pmemd not showing all information
>
> Thanks filip,
> So the question for everyone with pmemd restart file problems becomes
> this: Does this happen to you with Amber 11, and while using
> CUDA/CUDA.MPI? The other question would be, "if you run non-CUDA pmemd.mpi
> (amber11 or amber10), can you get it to happen?". We then can distinguish
> between something specific to a version/build type of pmemd vs. a possible
> OS problem. Sounds to me like you are talking small cluster systems,
> in-lab, correct? (i.e., you are not running at one of the big supercomputer
> centers with some sort of super-optimized parallel file system).
> Best Regards - Bob Duke
>
> ________________________________________
> From: filip fratev [filipfratev.yahoo.com]
> Sent: Sunday, December 25, 2011 3:30 AM
> To: AMBER Mailing List
> Subject: Re: [AMBER] Restart file for pmemd not showing all information
>
> Hi all,
> Merry Christmas and happy New Year!
>
> I have the same problem - some atoms are missing and there is no
> information about the box. I have never obtained a full restart file during
> a simulation. I use pmemd.CUDA and CUDA.MPI compiled with gcc 4.3, 4.5 and
> 4.6 on different systems under Suse 11.3, 11.4 and 12.1. The only proper
> restart files are those obtained after the end of the simulation.
>
> What might be the problem and how can I solve it?
>
>
> All the best,
> Filip
>
>
> ________________________________
> From: Bill Ross <ross.cgl.ucsf.EDU>
> To: amber.ambermd.org
> Sent: Saturday, December 24, 2011 11:26 PM
> Subject: Re: [AMBER] Restart file for pmemd not showing all information
>
> > If memory serves, really the only way we could flush the buffers during
> > a run was an actual close and reopen cycle
>
> How about flush()?
>
> http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gfortran/FLUSH.html
>
> Though I think close/open would be easier to trust.
>
> Bill
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
--
Sincerely,
Sasha Perkins
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Feb 14 2012 - 09:00:03 PST
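
The buffering point discussed in the thread (flush versus close/reopen, and
restart files that are only complete once the run ends) can be illustrated
outside of pmemd. What follows is a minimal Python sketch, not pmemd's actual
restart-writing code: the function name write_restart_atomically and the file
name "restrt" are assumptions made for the example. It writes each new restart
to a temporary file, flushes and syncs it, and only then renames it over the
previous restart, so an interrupted job leaves the last complete file in place.

```python
import os
import tempfile


def write_restart_atomically(path, text):
    """Replace `path` with `text` without ever exposing a partial file.

    Hypothetical helper for illustration only; it is not part of Amber.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".rst_tmp_")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
            handle.flush()             # push the program's buffer to the OS
            os.fsync(handle.fileno())  # ask the OS to write its buffers to disk
        os.replace(tmp_path, path)     # atomic rename on POSIX filesystems
    except BaseException:
        # On any failure, drop the temporary file and keep the old restart.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Calling write_restart_atomically("restrt", frame_text) at every restart
interval costs one extra sync and rename per write, which is the kind of
stall on the writing process that Bob mentions as a performance concern.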