On Thu, Jul 26, 2012 at 12:02 PM, David Case <dacase.rci.rutgers.edu> wrote:
> On Jul 26, 2012, at 9:14 AM, Jan-Philip Gehrcke <jgehrcke.googlemail.com>
> wrote:
>
> >
> > Actually "the only sure-fire way of forcing a buffer flush is to close
> > the open file unit" is imprecise.
>
> My question is this: is this a real problem? I very rarely have jobs that
> fail to finish or fail to get the results written to disk. And I generally
> limit any single run to one or two days, in order to limit the loss if
> something prevents results from being dumped to disk. And in the rare
> times when something bad happens, I almost always start again from the last
> fully completed run, and never try to rescue some partial set.
>
> So, what are other people's experience? Are there places where this is a
> bigger problem than I see?
>
I'll pitch in with Niel on this point. I've seen performance on
supercomputers fluctuate significantly from run to run (simple restarts of
the same system), to the point that run times can vary by at least 20-30
minutes for my simulations. I'm a bit more conservative than even you are
-- I try to keep my wallclock times at ~5 hours for GB runs and ~10-20
hours for PME runs. In my simulations (288 cores), some runs would finish
in 3:30 on a 4-hour wall clock, while others would hit the limit and be
killed (albeit close to the end), with output time stamps indicating the
files were being written to right up until the walltime expired.
However, in my tests/experience, open/close in Fortran has provided
reliable buffer flushing on systems where a simple flush call proved
insufficient. So until open/close fails to work as expected in
sander/pmemd, I think we can avoid hacking fsync into our Fortran programs
(and further attempts to sidestep the fact that no standard will issue
'guarantees').
All the best,
Jason
--
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032