PMEMD is now _so_ fast that disk write integrity may merit a second
look :)
I just had the unfortunate experience of running out of my disk space
allocation on our cluster during a PMEMD run that was generating GBs of
trajectory frames (very fast). PMEMD did not halt, but kept running
while I was deleting files to free up space.
Fortunately, I caught the corrupted zero-data frames in my trajectory
file when doing RMSD vs start analysis...
I suggest that if PMEMD cannot write a trajectory frame, it should halt
with a hard error and EXIT_STATUS <> 0 (and I appreciate that this
request may not be trivial in a parallel or GPU environment).
In pmemd's bintraj.F90 there is this kind of code, which writes the
trajectory frames:

  if (unit .eq. mdcrd) then
    call checkNCerror(nf90_put_var(mdcrd_ncid, mdcrd_time_var_id, (/ t /), &
                      start=(/ mdcrd_frame /), count=(/ 1 /)), 'write time')
    call checkNCerror(nf90_sync(mdcrd_ncid))

I found checkNCerror in the Amber16 tools Fortran code and copy it
here.
As you can see, it prints an error message but then returns (ouch).
I can't think of a case where checkNCerror should return after a failure.
It's a pretty serious situation, and I would recommend a hard exit instead
of a return after the message is output:

  !--------------------------------------------------------------------
  !> MODULE AMBERNETCDF FUNCTION CHECKNCERROR()
  !> .brief Passive check for netcdf error.
  subroutine checkNCerror(err, location)
    use netcdf
    implicit none
    integer, intent(in) :: err
    character(*), optional, intent(in) :: location
    if (err .ne. nf90_noerr) then
      write(mdout, '(a,a)') 'NetCDF error: ', trim(nf90_strerror(err))
      if (present(location)) then
        write(mdout, '(a,a)') ' at ', location
      end if
    end if
    ! ******* CAN WE PLEASE EXIT HERE INSTEAD OF RETURN ****** ???
  end subroutine checkNCerror
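
For illustration, the change requested above might look roughly like the
sketch below. This is not Amber code: the module name nc_failfast and the
routine name checkNCerror_fatal are made up, mdout is passed in as an
argument rather than taken from Amber's modules, and a serial run is
assumed. In an MPI build, something like MPI_Abort would presumably be
needed so that all ranks stop, not just the one that hit the error.

```fortran
! Hypothetical fail-fast variant of checkNCerror (a sketch, not Amber source).
! Assumes the NetCDF Fortran 90 module is available and that mdout is an
! already-open output unit supplied by the caller.
module nc_failfast
  implicit none
contains

  subroutine checkNCerror_fatal(err, mdout, location)
    use netcdf, only: nf90_noerr, nf90_strerror
    integer, intent(in) :: err                       ! NetCDF return code
    integer, intent(in) :: mdout                     ! open unit for messages
    character(*), optional, intent(in) :: location   ! what was being done

    ! Nothing to do when the NetCDF call succeeded.
    if (err .eq. nf90_noerr) return

    write(mdout, '(a,a)') 'NetCDF error: ', trim(nf90_strerror(err))
    if (present(location)) then
      write(mdout, '(a,a)') '  at ', location
    end if

    ! Push the message out before terminating, then stop with a
    ! non-zero stop code instead of returning to the caller.
    flush(mdout)
    error stop 1
  end subroutine checkNCerror_fatal

end module nc_failfast
```

With a routine like this, a failed nf90_put_var or nf90_sync (for example,
on a full disk) would end the run at the first bad frame instead of letting
it continue. Note that "error stop" with an integer code is Fortran 2008,
and the standard only recommends, rather than requires, that the code be
passed on as the process exit status.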
Received on Thu Apr 20 2017 - 16:00:02 PDT