Re: [AMBER] Amber12 issue with SMD protocol from Jan-Philip Gehrcke on 2012-07-26 (Amber Archive Jul 2012)

From: Jan-Philip Gehrcke <jgehrcke.googlemail.com>
Date: Thu, 26 Jul 2012 15:14:08 +0200

Jason,

let's elaborate on this a bit more :)

Actually, "the only sure-fire way of forcing a buffer flush is to close
the open file unit" is imprecise. First of all, the "buffer flush" you
mean is from the program to the kernel. There are certainly more buffers
on the way from the file object in the program to physical storage.
Then, and most importantly, the kernel may still defer writing data to
the file system even after POSIX close() has been called. Quote from
http://linux.die.net/man/2/close:

"A successful close does not guarantee that the data has been
successfully saved to disk, as the kernel defers writes. It is not
common for a file system to flush the buffers when the stream is closed.
If you need to be sure that the data is physically stored use fsync(2).
(It will depend on the disk hardware at this point.)"

Hence, I am not sure what the practical difference is between fflush()
and a close()/open() combination when it comes to getting data onto the
disk. About fflush():

"Note that fflush() only flushes the user space buffers provided by the
C library. To ensure that the data is physically stored on disk the
kernel buffers must be flushed too, for example, with sync(2) or fsync(2)."
from http://linux.die.net/man/3/fflush
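
To illustrate what that user-space buffer means in practice, here is a
minimal Python sketch (Python rather than Amber's Fortran, purely for
illustration; the function name and the sample payload are made up). It
assumes a buffered binary file object with an 8 KiB buffer: a small
write stays inside the process until flush() is called, so the kernel
reports a file size of zero before the flush.

```python
import os
import tempfile


def sizes_around_flush(payload: bytes = b"step 1  12.5\n"):
    """Return the file size reported by the kernel before and after flush()."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # 8 KiB user-space buffer: a small write stays inside the process.
        with open(path, "wb", buffering=8192) as f:
            f.write(payload)
            before = os.stat(path).st_size  # kernel has not seen the data yet
            f.flush()                       # user-space buffer -> kernel
            after = os.stat(path).st_size   # kernel now knows about the data
        return before, after
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(sizes_around_flush())
```

Note that "after" only tells us the kernel has accepted the data. It
says nothing about whether the data has reached the storage device.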

Hence, the only way to be sure that data has been 'provided to' the file
system is to use POSIX fsync(). If that was the intention of the bugfix
you mentioned, it should have been done that way and not via close().
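
A minimal sketch of that approach, written in Python and not taken from
the Amber sources (the helper name and the file name in the usage note
are hypothetical): flush the user-space buffer first, then ask the
kernel to push the file's data to the storage device with os.fsync(),
which wraps fsync(2) on POSIX systems.

```python
import os


def append_and_sync(f, text: str) -> None:
    """Write text to an open file object and request that it be stored."""
    f.write(text)
    f.flush()              # user-space buffer -> kernel (like fflush())
    os.fsync(f.fileno())   # kernel buffers -> storage device (fsync(2))
```

Calling something like this after every record written to, say, a
dist_vs_t output file would trade some throughput for a smaller window
of lost data when a job is killed at the wall-time limit.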

Let me quote another thing from http://stackoverflow.com/a/706688/145400:

"
It is also important to note that fsync does not guarantee a file is on
disk; it just guarantees that the OS has asked the filesystem to flush
changes to the disk. The filesystem does not have to write anything to disk
[...]
Luckily, all of the common filesystems for Linux do in fact write the
changes to disk; unluckily that still doesn't guarantee the file is on
the disk. Many hard drives come with write buffering turned on (and
therefore have their own buffers that fsync does not flush). And some
drives/raid controllers even lie to you about having flushed their buffers.
"

Cheers,

Jan-Philip

On 07/26/2012 02:12 PM, Jason Swails wrote:
> Just a minor comment here: Newer Linux kernels are becoming increasingly
> aggressive when it comes to file buffering, and even "flush" commands in
> Fortran are unable to force writing of the buffer in many instances.
>
> The only sure-fire way of forcing a buffer flush is to close the open file
> unit. If you ensure that the job will finish before the wall time, you
> will get a full file. Otherwise, you can modify the source code directly
> to close this file and reopen it whenever you want its contents flushed.
> The restrt file was recently treated this way (after the bugfixes are all
> applied) to fix this issue with restart files.
>
> HTH,
> Jason
>
> On Thu, Jul 26, 2012 at 6:18 AM, Jan-Philip Gehrcke <jgehrcke.googlemail.com> wrote:
>
>> Hey Agostino,
>>
>> let me just add my experiences on this topic. I've also seen that the
>> distance file is written in a different way than other files (mdout for
>> instance). But still, the resulting behavior depends on the environment
>> (most likely on the file system in use). On one cluster, I've seen the
>> distance file content being just as up to date as the trajectory
>> file. On another cluster, I have not seen any contents in that file
>> until the sander process finished.
>>
>> I would agree that this file should be flushed more often during an SMD
>> run in order to lose as little data as possible in case of an unexpected
>> event.
>>
>> Jan-Philip
>>
>>
>>
>> On 07/26/2012 12:05 PM, Agostino Bruno wrote:
>>> Dear developers,
>>>
>>> I am writing to you because I just installed the new version of Amber
>>> (Amber12). After the installation I performed the analysis tests for
>>> all the tools available in Amber, and everything went fine. I then
>>> tried to run an SMD job (of about 20 ns) using pmemd.MPI. This also
>>> went fine; the only problem is a delay (a sort of lag time) in Amber
>>> writing the text file containing the distance, the force, and the
>>> work done during the simulation (this file is usually called
>>> dist_vs_t). My concern is that I am working on a public CPU cluster
>>> (a University Consortium), where I have a limited number of hours for
>>> the queues and for the job (usually 72 hours). When the time available
>>> for the calculation expired, I obtained the exact number of frames
>>> for the SMD run (i.e. 2000 frames), but a reduced number of values in
>>> the dist_vs_t file (i.e. 1500). I think that this is due to the delay
>>> of Amber in writing the dist_vs_t text file.
>>>
>>> I would like to ask whether there is a way to reduce the lag with
>>> which Amber writes the dist_vs_t data, so as to have the same number
>>> of frames and dist_vs_t values at the end of the calculation (when
>>> the simulation is stopped because the queue time on the public
>>> cluster has expired).
>>>
>>> Thank you very much for your collaboration
>>>
>>> Kindest regards,
>>>
>>> Agostino
>>>
>>> --
>>> Agostino Bruno, PhD
>>> Dipartimento Farmaceutico
>>> Universita' degli Studi di Parma
>>>
>>>
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>
>>
>>
>>
>>
>
>
>

Received on Thu Jul 26 2012 - 06:30:02 PDT