Re: [AMBER] How to read mdinfo file without crashing the simulation?

From: Dr. M. Shahid <mohammad.shahid.gmail.com>
Date: Mon, 25 Apr 2016 14:31:51 +0200

Hi,

One other thing that might help is to increase your ntpr from 1000 to, say,
10,000 or 20,000 or more. Printing output every 1000 steps is too frequent
for a 50 ns multi-GPU simulation, all the more so in an RMS env.
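
As a rough sketch of what that change means in wall-clock time: the snippet
below assumes mdinfo is rewritten once every ntpr steps (an assumption, not
checked against the pmemd source) and uses the 4.70 ms/step reported in the
mdinfo quoted further down. The function name is made up for illustration.

```python
# Sketch: wall-clock time between mdinfo rewrites for a given ntpr.
# Assumption: the file is refreshed once every ntpr steps.
MS_PER_STEP = 4.70  # "Per Step(ms)" value from the quoted mdinfo


def seconds_between_writes(ntpr, ms_per_step=MS_PER_STEP):
    """Return the seconds elapsed between two consecutive ntpr outputs."""
    return ntpr * ms_per_step / 1000.0


for ntpr in (1000, 10000, 20000):
    interval = seconds_between_writes(ntpr)
    print(f"ntpr = {ntpr:>6}: one write every {interval:6.1f} s")
```

Under that assumption, a larger ntpr makes the window in which a "cat mdinfo"
can coincide with a rewrite come around less often; it does not remove it.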

Best regards,

--
Shahid.
On Mon, Apr 25, 2016 at 2:00 PM, Karolina Markowska <markowska.kar.gmail.com> wrote:
> Ok, this sounds like something that might work.
> Could you advise me on how to run a job with this mdinfo trick?
>
> Best regards.
>
>
>
> 2016-04-25 0:46 GMT+02:00 Bill Ross <ross.cgl.ucsf.edu>:
>
> > One way to hack around it would be to have the code output a new file
> > each time: mdinfo.1, mdinfo.2 or what have you (numbered by step). Then
> > you'd never collide, though you'd have to delete a bunch of small files,
> > and worst case might exceed the number of files a directory can hold.
> >
> > Bill
> >
> > On 4/24/16 12:16 PM, Karolina Markowska wrote:
> > > I can do "cat mdinfo" and in most cases everything runs just fine.
> > > But if I run "cat mdinfo" at the exact moment when the contents of
> > > mdinfo change, I get the resource unavailable error and the simulation
> > > crashes. Also, if I use "tail mdinfo" while pmemd changes the mdinfo
> > > file, everything crashes. This is probably related to the queueing
> > > system, because when I run the simulation without submitting it to the
> > > queue, everything is OK.
> > > If I don't look into the mdinfo file during the whole simulation,
> > > everything runs OK.
> > >
> > > 2016-04-23 20:55 GMT+02:00 Bill Ross <ross.cgl.ucsf.edu>:
> > >
> > >> contents of mdinfo
> > >>
> > >> On 4/23/16 11:53 AM, Bill Ross wrote:
> > >>> So with pmemd, you can run for an arbitrary amount of time, and then
> > >>> the moment you 'cat mdinfo' the simulation crashes on resource
> > >>> unavailable, and the contents of pmemd are different each time, and if
> > >>> you don't look at the file, the job runs to completion?
> > >>>
> > >>> Bill
> > >>>
> > >>> On 4/22/16 2:24 AM, Karolina Markowska wrote:
> > >>>> I'm using Ubuntu 14.04, and the file system is ext4. We're using
> > >>>> quota.
> > >>>>
> > >>>> I don't have any problem with "ls -l mdinfo". I'm the owner of this
> > >>>> file, and I (theoretically) can read it or change it. The file is
> > >>>> present with a non-zero length and I can read it using the
> > >>>> "cat mdinfo" command.
> > >>>> -rw-r----- 1 karolinam user     1257 kwi 21 14:57 mdinfo
> > >>>>
> > >>>> It looks OK, I guess:
> > >>>>
> > >>>>     NSTEP =   950000   TIME(PS) =   52980.000  TEMP(K) =   298.41  PRESS =     0.0
> > >>>>     Etot   =   -140475.4219  EKtot   =     36982.3438  EPtot      =   -177457.7657
> > >>>>     BOND   =      1208.0342  ANGLE   =      3085.0042  DIHED      =      5565.0947
> > >>>>     1-4 NB =      1290.0424  1-4 EEL =     12081.5006  VDWAALS    =     21022.0260
> > >>>>     EELEC  =   -221794.1217  EHBOND  =         0.0000  RESTRAINT  =         0.0000
> > >>>>     EAMD_BOOST  =        84.6540
> > >>>>
> > >>>> ------------------------------------------------------------------------------
> > >>>> | Current Timing Info
> > >>>> | -------------------
> > >>>> | Total steps :  25000000 | Completed :    950000 | Remaining :  24050000
> > >>>> |
> > >>>> | Average timings for last   20000 steps:
> > >>>> |     Elapsed(s) =      93.93 Per Step(ms) =       4.70
> > >>>> |         ns/day =      36.79   seconds/ns =    2348.27
> > >>>> |
> > >>>> | Average timings for all steps:
> > >>>> |     Elapsed(s) =    4461.87 Per Step(ms) =       4.70
> > >>>> |         ns/day =      36.79   seconds/ns =    2348.36
> > >>>> |
> > >>>> |
> > >>>> | Estimated time remaining:      31.4 hours.
> > >>>>
> > >>>> ------------------------------------------------------------------------------
> > >>>> This issue does not depend on the type of MD simulation I run: it
> > >>>> happens during both classical MD and aMD.
> > >>>> I reran the same job using the CPU, typed "tail -f mdinfo" and nothing
> > >>>> happened. The simulation is running.
> > >>>> Could it be a problem with pmemd.cuda?
> > >>>> I've run a simulation on a cluster without PBS (on CPU and GPU) and...
> > >>>> everything worked. I don't get it.
> > >>>>
> > >>>> Best regards.
> > >>>>
> > >>>>
> > >>>> 2016-04-21 14:14 GMT+02:00 David A Case <david.case.rutgers.edu>:
> > >>>>
> > >>>>> On Thu, Apr 21, 2016, Karolina Markowska wrote:
> > >>>>>> I have a strange problem. I'm running different aMD simulation tests
> > >>>>>> and I want to compare the timings. I know I can find that kind of
> > >>>>>> information in the mdinfo file, but here comes the problem: several
> > >>>>>> times when I opened the mdinfo file (using just "cat mdinfo"), the
> > >>>>>> simulation crashed and I got an error:
> > >>>>>> At line 810 of file runfiles.F90 (unit = 7, file = 'mdinfo')
> > >>>>>> Fortran runtime error: Resource temporarily unavailable
> > >>>>> I don't remember any reports like this, and the amber developers
> > >>>>> (including me) do this all the time.
> > >>>>>
> > >>>>> What is your operating system; do you know what sort of file system is
> > >>>>> being used on the drive where the mdinfo file is?
> > >>>>>
> > >>>>> Do you run into problems with commands like "ls -l mdinfo"?  Is a
> > >>>>> mdinfo file present (with non-zero length) when you execute the
> > >>>>> "cat mdinfo" command?
> > >>>>>
> > >>>>> Can you try to narrow down the problem? Does it depend on using aMD vs.
> > >>>>> regular MD?  Using GPUs vs CPUs?  Submitted to a queuing system vs.
> > >>>>> running interactively?
> > >>>>>
> > >>>>> ....thx...dac
> > >>>>>
> > >>>>>
> > >>>>> _______________________________________________
> > >>>>> AMBER mailing list
> > >>>>> AMBER.ambermd.org
> > >>>>> http://lists.ambermd.org/mailman/listinfo/amber
> > >>>>>
Received on Mon Apr 25 2016 - 06:00:05 PDT