Hi,
One other thing that might help is to increase your ntpr from 1000 to, say,
10,000 or 20,000 or more. An ntpr of 1000 prints output far too often for a
50 ns multi-GPU simulation, especially in an RMS (queueing system) environment.
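To put numbers on this suggestion, here is a short Python sketch that estimates how often mdinfo would be rewritten for different ntpr values (ntpr is set in the &cntrl namelist of the mdin input). It assumes that pmemd refreshes mdinfo once every ntpr steps. It also reuses figures from the mdinfo excerpt quoted later in this thread: 25,000,000 total steps at about 4.70 ms per step. The helper name `mdinfo_schedule` is hypothetical.

```python
# Rough arithmetic behind the ntpr suggestion.
# Assumption: mdinfo is rewritten once every ntpr steps.
# Numbers taken from the mdinfo excerpt in this thread.

TOTAL_STEPS = 25_000_000   # "Total steps" reported in mdinfo
MS_PER_STEP = 4.70         # "Per Step(ms)" reported in mdinfo


def mdinfo_schedule(ntpr: int) -> tuple:
    """Return (number of mdinfo rewrites, seconds between rewrites)."""
    writes = TOTAL_STEPS // ntpr
    interval_s = ntpr * MS_PER_STEP / 1000.0
    return writes, interval_s


for ntpr in (1_000, 10_000, 20_000):
    writes, interval_s = mdinfo_schedule(ntpr)
    print(f"ntpr={ntpr:>6}: {writes:>6} rewrites, one every {interval_s:5.1f} s")
```

At ntpr=1000, mdinfo is rewritten roughly every 4.7 s, or 25,000 times over the run. At ntpr=20000 it is rewritten about every 94 s, or 1,250 times. That leaves far fewer moments at which a `cat` or `tail` can coincide with a write.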
Best regards,
--
Shahid.
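Karolina asks below how to run a job with the numbered-mdinfo trick that Bill Ross suggests further down the thread. Nothing in this thread indicates that pmemd can do this out of the box, so it would mean changing how the program writes mdinfo. The Python sketch here only illustrates the write/read pattern Bill describes; it is not pmemd's code. The `mdinfo.<step>` naming, the helpers `write_report`, `list_reports` and `read_latest`, and the `keep` pruning are all hypothetical. The pruning is added to address Bill's concern about piling up small files.

```python
import os
import re
import tempfile
from typing import List, Optional

# Hypothetical sketch of the "new file per report" idea: each report goes to
# its own file, mdinfo.<step>, instead of rewriting a single mdinfo in place.
# This is NOT how pmemd works; it only illustrates the pattern.

REPORT_RE = re.compile(r"^mdinfo\.(\d+)$")


def list_reports(directory: str) -> List[int]:
    """Step numbers of existing mdinfo.<step> files, sorted numerically."""
    steps = []
    for name in os.listdir(directory):
        match = REPORT_RE.match(name)
        if match:
            steps.append(int(match.group(1)))
    return sorted(steps)


def write_report(directory: str, step: int, text: str, keep: int = 5) -> str:
    """Write mdinfo.<step>, then delete all but the newest `keep` reports."""
    path = os.path.join(directory, f"mdinfo.{step}")
    with open(path, "w") as fh:
        fh.write(text)
    keep = max(keep, 1)  # always keep at least the highest-numbered report
    for old_step in list_reports(directory)[:-keep]:
        os.remove(os.path.join(directory, f"mdinfo.{old_step}"))
    return path


def read_latest(directory: str) -> Optional[str]:
    """Text of the highest-numbered report, or None if there are none."""
    steps = list_reports(directory)
    if not steps:
        return None
    with open(os.path.join(directory, f"mdinfo.{steps[-1]}")) as fh:
        return fh.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        for step in range(1000, 8000, 1000):
            write_report(workdir, step, f"NSTEP = {step}\n", keep=3)
        print(list_reports(workdir))         # [5000, 6000, 7000]
        print(read_latest(workdir), end="")  # NSTEP = 7000
```

Because every report lands in a fresh file, the writer never reopens a file that a reader might be holding. The trade-offs are twofold. First, a reader that lists the directory at just the wrong moment could see the newest file while it is still being written. Second, the reader could lose a file to pruning between listing and opening it.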
On Mon, Apr 25, 2016 at 2:00 PM, Karolina Markowska <markowska.kar.gmail.com> wrote:
> OK, this sounds like something that might work.
> Could you give me some advice on how to run a job with this mdinfo trick?
>
> Best regards.
>
>
>
> 2016-04-25 0:46 GMT+02:00 Bill Ross <ross.cgl.ucsf.edu>:
>
> > One way to hack around it would be to have the code output a new file
> > each time: mdinfo.1, mdinfo.2 or what have you (numbered by step). Then
> > you'd never collide, though you'd have to delete a bunch of small files,
> > and worst case might exceed the number of files a directory can hold.
> >
> > Bill
> >
> > On 4/24/16 12:16 PM, Karolina Markowska wrote:
> > > I can do "cat mdinfo" and in most cases everything runs just fine.
> > > But if I run "cat mdinfo" at the exact moment when the contents of mdinfo
> > > change, I get the "resource unavailable" error and the simulation crashes.
> > > Also, if I use "tail mdinfo" while pmemd is changing the mdinfo file,
> > > everything crashes. This is probably related to the queueing system,
> > > because when I run the simulation without submitting it to the queue,
> > > everything is OK.
> > > If I don't look at the mdinfo file during the whole simulation, everything
> > > runs OK.
> > >
> > > 2016-04-23 20:55 GMT+02:00 Bill Ross <ross.cgl.ucsf.edu>:
> > >
> > >> contents of mdinfo
> > >>
> > >> On 4/23/16 11:53 AM, Bill Ross wrote:
> > >>> So with pmemd, you can run for an arbitrary amount of time, and then the
> > >>> moment you 'cat mdinfo' the simulation crashes on resource unavailable,
> > >>> and the contents of pmemd are different each time, and if you don't look
> > >>> at the file, the job runs to completion?
> > >>>
> > >>> Bill
> > >>>
> > >>> On 4/22/16 2:24 AM, Karolina Markowska wrote:
> > >>>> I'm using Ubuntu 14.04, and the file system is ext4. We're using quota.
> > >>>>
> > >>>> I don't have any problem with "ls -l mdinfo". I'm the owner of this
> > >>>> file, so I can (theoretically) read or change it. The file is present
> > >>>> with a non-zero length, and I can read it using the "cat mdinfo" command.
> > >>>> -rw-r----- 1 karolinam user 1257 kwi 21 14:57 mdinfo
> > >>>>
> > >>>> It looks OK, I guess:
> > >>>>
> > >>>>  NSTEP =   950000   TIME(PS) =   52980.000  TEMP(K) =   298.41  PRESS =     0.0
> > >>>>  Etot   =   -140475.4219  EKtot   =     36982.3438  EPtot      =   -177457.7657
> > >>>>  BOND   =      1208.0342  ANGLE   =      3085.0042  DIHED      =      5565.0947
> > >>>>  1-4 NB =      1290.0424  1-4 EEL =     12081.5006  VDWAALS    =     21022.0260
> > >>>>  EELEC  =   -221794.1217  EHBOND  =         0.0000  RESTRAINT  =         0.0000
> > >>>>  EAMD_BOOST  =        84.6540
> > >>>> ------------------------------------------------------------------------------
> > >>>> | Current Timing Info
> > >>>> | -------------------
> > >>>> | Total steps :  25000000 | Completed :    950000 | Remaining :  24050000
> > >>>> |
> > >>>> | Average timings for last   20000 steps:
> > >>>> |     Elapsed(s) =      93.93 Per Step(ms) =       4.70
> > >>>> |         ns/day =      36.79   seconds/ns =    2348.27
> > >>>> |
> > >>>> | Average timings for all steps:
> > >>>> |     Elapsed(s) =    4461.87 Per Step(ms) =       4.70
> > >>>> |         ns/day =      36.79   seconds/ns =    2348.36
> > >>>> |
> > >>>> |
> > >>>> | Estimated time remaining:      31.4 hours.
> > >>>> ------------------------------------------------------------------------------
> > >>>> This issue does not depend on the type of MD simulation I run; it happens
> > >>>> during both classical MD and aMD.
> > >>>> I reran the same job on the CPU, typed "tail -f mdinfo", and nothing
> > >>>> happened. The simulation kept running.
> > >>>> Could it be a problem with pmemd.cuda?
> > >>>> I've run a simulation on a cluster without PBS (on CPU and GPU) and...
> > >>>> everything worked. I don't get it.
> > >>>>
> > >>>> Best regards.
> > >>>>
> > >>>>
> > >>>> 2016-04-21 14:14 GMT+02:00 David A Case <david.case.rutgers.edu>:
> > >>>>
> > >>>>> On Thu, Apr 21, 2016, Karolina Markowska wrote:
> > >>>>>> I have a strange problem. I'm running various aMD simulation tests and
> > >>>>>> I want to compare the timings. I know I can find that kind of
> > >>>>>> information in the mdinfo file, but here comes the problem: several
> > >>>>>> times when I opened the mdinfo file (using just "cat mdinfo"), the
> > >>>>>> simulation crashed and I got an error:
> > >>>>>> At line 810 of file runfiles.F90 (unit = 7, file = 'mdinfo')
> > >>>>>> Fortran runtime error: Resource temporarily unavailable
> > >>>>> I don't remember any reports like this, and the Amber developers
> > >>>>> (including me) do this all the time.
> > >>>>>
> > >>>>> What is your operating system? Do you know what sort of file system is
> > >>>>> being used on the drive where the mdinfo file is?
> > >>>>>
> > >>>>> Do you run into problems with commands like "ls -l mdinfo"? Is an
> > >>>>> mdinfo file present (with non-zero length) when you execute the
> > >>>>> "cat mdinfo" command?
> > >>>>> Can you try to narrow down the problem? Does it depend on using aMD vs.
> > >>>>> regular MD? Using GPUs vs. CPUs? Submitted to a queuing system vs.
> > >>>>> running interactively?
> > >>>>>
> > >>>>> ....thx...dac
> > >>>>>
> > >>>>>
> > >>>>> _______________________________________________
> > >>>>> AMBER mailing list
> > >>>>> AMBER.ambermd.org
> > >>>>> http://lists.ambermd.org/mailman/listinfo/amber
> > >>>>>
> > >>
> > >>
> >
> >
> >
>
Received on Mon Apr 25 2016 - 06:00:05 PDT
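The original goal in this thread was to compare timings, so it may help to see how the fields in the "Current Timing Info" block quoted above relate to one another. This Python sketch recomputes ms/step, seconds/ns, ns/day and the remaining time from the elapsed time and the step counts. It assumes a 2 fs time step (dt = 0.002 ps), which is consistent with 25,000,000 steps for the 50 ns run mentioned above. The function name `timing_summary` is hypothetical.

```python
# Recompute the derived fields of an mdinfo "Current Timing Info" block.
# Assumption: dt = 0.002 ps (2 fs); change DT_PS for other time steps.

SECONDS_PER_DAY = 86_400
DT_PS = 0.002


def timing_summary(elapsed_s: float, steps_done: int, remaining_steps: int,
                   dt_ps: float = DT_PS) -> dict:
    """Derive ms/step, seconds/ns, ns/day and hours remaining."""
    s_per_step = elapsed_s / steps_done
    steps_per_ns = 1000.0 / dt_ps            # 1 ns = 1000 ps
    seconds_per_ns = s_per_step * steps_per_ns
    return {
        "ms/step": s_per_step * 1000.0,
        "seconds/ns": seconds_per_ns,
        "ns/day": SECONDS_PER_DAY / seconds_per_ns,
        "hours remaining": remaining_steps * s_per_step / 3600.0,
    }


# Values from the "Average timings for all steps" section of the excerpt.
summary = timing_summary(elapsed_s=4461.87, steps_done=950_000,
                         remaining_steps=24_050_000)
for key, value in summary.items():
    print(f"{key:>16}: {value:.2f}")
```

This reproduces the reported 4.70 ms/step, 36.79 ns/day and roughly 31.4 hours remaining. seconds/ns comes out as 2348.35 against the reported 2348.36, a gap most likely due to the rounded elapsed time printed in mdinfo.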