Re: [AMBER] Question about mdinfo file. from Robert Duke on 2010-08-17 (Amber Archive Aug 2010)

From: Robert Duke <rduke.email.unc.edu>
Date: Tue, 17 Aug 2010 13:08:06 -0400

Wow, you guys put a lot of good stuff in there :-) I am not in the habit of
looking at mdinfo, obviously; I just do most of this stuff in my head. But
on performance, yes, it does pick up with time as a general rule, probably
primarily as a function of load balancing, but also if there is any settling
of the dynamics (so if there are more "hot spots" early in equilibration,
this results in the need to build the pairlist more frequently). When I
used to do lots of benchmarking, I had determined that you needed somewhere
on the order of 5-10K steps when working in the neighborhood of 128 procs
for load balancing to get completely fine-tuned.
Regards - Bob
----- Original Message -----
From: "Ross Walker" <ross.rosswalker.co.uk>
To: "'AMBER Mailing List'" <amber.ambermd.org>
Sent: Tuesday, August 17, 2010 12:52 PM
Subject: Re: [AMBER] Question about mdinfo file.

> Hi Mayank,
>
> To follow up on what others have said.
>
> Firstly, the mdinfo file in pmemd is only flushed at specific time
> intervals, which is why it is not always in sync with mdout. Short answer:
> don't worry about it; you just have to wait a while. The interval is 60
> seconds, so if you are running a multi-hour job it makes no difference.
>
> An example of the output shown in the mdinfo file is:
>
>  NSTEP =   960000   TIME(PS) =   28170.000  TEMP(K) =   310.64  PRESS =     0.0
>  Etot   =   -262918.0681  EKtot   =     76197.9193  EPtot      =   -339115.9874
>  BOND   =      4711.2662  ANGLE   =     12150.7208  DIHED      =     15850.2509
>  1-4 NB =      5280.4684  1-4 EEL =     67234.5675  VDWAALS    =     29059.5335
>  EELEC  =   -473402.7946  EHBOND  =         0.0000  RESTRAINT  =         0.0000
>  Ewald error estimate:   0.2009E-04
>
> ------------------------------------------------------------------------------
> | Current Timing Info
> | -------------------
> | Total steps : 2500000 | Completed : 960000 | Remaining : 1540000
> |
> | Average timings for last 5000 steps:
> | Elapsed(s) = 64.4 Per Step(ms) = 12.9
> | ns/day = 13.4 seconds/ns = 6436.2
> |
> | Average timings for all steps:
> | Elapsed(s) = 12434.9 Per Step(ms) = 13.0
> | ns/day = 13.3 seconds/ns = 6476.5
> |
> |
> | Estimated time remaining: 5.5 hours.
> ------------------------------------------------------------------------------
>
> The timing info is an estimate of the performance and comes in two forms.
> It is calculated entirely from the wallclock time, to a resolution of
> milliseconds. The timers are started after the initial setup time, so the
> figures should reflect real performance, although they are still estimates.
>
> The first section, "Average timings for last XXXX steps:", provides the
> approximate speed for the last XXXX steps of MD. This is an instantaneous
> measure of the performance since the last mdinfo file was written, so
> typically over the last 60 seconds. It is useful for seeing whether
> something has gone wrong with performance during a run, for example a node
> that is misbehaving and running slowly. If the performance for the last
> XXXX steps is significantly worse than the average timing, then you should
> consider checking your hardware etc.
>
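As a rough illustration of that comparison, here is a minimal Python sketch. The function names and the 10% tolerance are my own assumptions for illustration, not anything pmemd defines; the inputs are simply the two "Per Step(ms)" values read from mdinfo.

```python
def slowdown_ratio(recent_ms_per_step: float, overall_ms_per_step: float) -> float:
    """Ratio of recent to overall time per step (1.0 means no change)."""
    if overall_ms_per_step <= 0:
        raise ValueError("overall_ms_per_step must be positive")
    return recent_ms_per_step / overall_ms_per_step


def looks_degraded(recent_ms_per_step: float,
                   overall_ms_per_step: float,
                   tolerance: float = 0.10) -> bool:
    """True if the recent window is slower than the run average by more
    than `tolerance` (a hypothetical threshold; tune it to your noise)."""
    return slowdown_ratio(recent_ms_per_step, overall_ms_per_step) > 1.0 + tolerance


# Values taken from the sample mdinfo above: 12.9 ms recent, 13.0 ms overall.
print(looks_degraded(12.9, 13.0))
```

Because the recent window is noisy, a single reading above the threshold is only a hint; several consecutive slow readings are a better reason to go and check the hardware.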
> The second section, "Average timings for all steps:", shows the performance
> over the entire run so far. This is the metric that really tells you the
> achievable performance you are getting. It is useful if you want to quickly
> see whether you are getting better performance by using more processors
> etc. Note that it is still an estimate.
>
> Finally there is the line "Estimated time remaining:". This is an
> indication of how long the run will take to complete the remaining nstlim
> steps based on the current performance. The main use of this, apart from
> checking whether it will take, say, 200 days to complete, is to see whether
> the calculation is likely to complete within the remaining wallclock limit
> of your batch job. There is nothing more annoying than having a job killed
> by a queuing system 10 minutes before it was due to finish.
>
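The wallclock check described above could be scripted along these lines. This is only a sketch: the regular expression is written against the one sample line shown in this thread, the unit table (seconds, minutes, hours, days) is an assumption about what pmemd may print, and the 10 minute safety margin is arbitrary.

```python
import re

# Assumed unit names; only "hours" is confirmed by the sample in this thread.
_UNIT_SECONDS = {"seconds": 1.0, "minutes": 60.0, "hours": 3600.0, "days": 86400.0}

_REMAINING_RE = re.compile(
    r"Estimated time remaining:\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)"
)


def remaining_seconds(mdinfo_text: str):
    """Return the estimated time remaining in seconds, or None if not found."""
    match = _REMAINING_RE.search(mdinfo_text)
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2).lower()
    if unit not in _UNIT_SECONDS:
        return None
    return value * _UNIT_SECONDS[unit]


def will_finish(mdinfo_text: str, wallclock_left_s: float,
                safety_margin_s: float = 600.0):
    """True/False if an estimate was found, None otherwise."""
    estimate = remaining_seconds(mdinfo_text)
    if estimate is None:
        return None
    return estimate + safety_margin_s <= wallclock_left_s


sample = "| Estimated time remaining: 5.5 hours."
print(will_finish(sample, wallclock_left_s=6 * 3600))
```

How you obtain the remaining wallclock time depends on your queuing system, so it is left as a plain argument here.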
> All of this is information that you could calculate manually yourself from
> the output file, the queuing system's wallclock time to date, etc., but
> this approach just makes it much easier to see.
>
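For anyone who wants to do that manual calculation, a minimal Python sketch follows. It assumes a 2 fs time step (dt = 0.002 ps); that value is not stated in the thread, but it is consistent with the sample numbers above. The function and key names are made up for illustration.

```python
def timing_summary(elapsed_s: float, steps_done: int, steps_total: int,
                   dt_ps: float = 0.002) -> dict:
    """Derive mdinfo-style timing figures from elapsed wallclock time.

    dt_ps is the MD time step in picoseconds (assumed 0.002 here).
    """
    if steps_done <= 0:
        raise ValueError("steps_done must be positive")
    per_step_s = elapsed_s / steps_done
    steps_per_ns = 1000.0 / dt_ps            # 1 ns = 1000 ps
    seconds_per_ns = per_step_s * steps_per_ns
    return {
        "per_step_ms": per_step_s * 1000.0,
        "seconds_per_ns": seconds_per_ns,
        "ns_per_day": 86400.0 / seconds_per_ns,
        "hours_remaining": (steps_total - steps_done) * per_step_s / 3600.0,
    }


# "All steps" figures from the sample: 12434.9 s for 960000 of 2500000 steps.
for name, value in timing_summary(12434.9, 960000, 2500000).items():
    print(f"{name:16s} {value:10.1f}")
```

With the sample inputs this gives roughly 13.0 ms per step, 6476.5 seconds/ns, 13.3 ns/day and 5.5 hours remaining, in line with the mdinfo block quoted above.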
> With regards to instantaneous performance, it could be higher than the
> average performance for a number of reasons. First, the code can actually
> get faster with time: load balancing can help improve performance over
> time during a run. Additionally, changes in the density of your system etc.
> can slightly influence the mathematical cost per step, communication
> overheads etc. The second reason is just a basic statistical argument
> related to convergence of averages. The average over the last XXXX steps
> (approx. 60 seconds) has much more noise on it than the average over all
> steps.
>
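To put a rough number on the statistical argument: if per-step times were independent with the same spread (a simplifying assumption; real step times are correlated through load balancing and pairlist builds), the standard error of a mean over N steps scales as 1/sqrt(N). A small Python sketch of the resulting ratio, with a made-up function name:

```python
import math


def window_noise_ratio(window_steps: int, total_steps: int) -> float:
    """How many times noisier the windowed average is than the full-run
    average, assuming independent, identically distributed step times."""
    if window_steps <= 0 or total_steps <= 0:
        raise ValueError("step counts must be positive")
    return math.sqrt(total_steps / window_steps)


# Sample from above: 5000-step window versus 960000 completed steps.
print(round(window_noise_ratio(5000, 960000), 1))
```

Under that assumption the 5000-step figure in the sample is about 14 times noisier than the all-steps figure, so small differences between the two are not worth reading much into.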
> I hope that helps.
>
> All the best
> Ross
>
>> -----Original Message-----
>> From: Mayank Daga [mailto:mdaga.vt.edu]
>> Sent: Tuesday, August 17, 2010 8:55 AM
>> To: AMBER Mailing List
>> Subject: Re: [AMBER] Question about mdinfo file.
>>
>> What I am concerned about is how the ns/day would be affected depending on
>> whether or not the simulation runs to completion. The mdinfo file states
>> the ns/day obtained over the last 'x' steps; hence, if 'x' = 10000 and not
>> 1000, is there a chance the average would be better for 10000 steps?
>> ~mayank
>>
>> On Tue, Aug 17, 2010 at 11:31 AM, Jason Swails
>> <jason.swails.gmail.com> wrote:
>>
>> > Hello,
>> >
>> > I believe that pmemd does not update the mdinfo file as often as it
>> > updates the mdout file due to performance implications. You can figure
>> > out the source of this discrepancy by digging through the pmemd code,
>> > but this has no effect on your results.
>> >
>> > Hope this helps,
>> > Jason
>> >
>> > On Tue, Aug 17, 2010 at 11:10 AM, Mayank Daga <mdaga.vt.edu> wrote:
>> >
>> > > Hi,
>> > >
>> > > I am a newbie using AMBER on the GPUs.
>> > > When I run my simulations, I see two output files, mdout and mdinfo.
>> > > In the mdinfo file, I see the timing details as to how many ns/day I
>> > > get. The issue is that some steps are always uncompleted according to
>> > > this file, while mdout lists all the steps as completed. Why is there
>> > > this discrepancy? For example, if I run a simulation for 10000 steps,
>> > > mdinfo shows 9000 steps remaining while mdout lists energy values for
>> > > all 10000 steps.
>> > >
>> > > I am using the input files as downloaded from the AMBER website, and
>> > > to run the simulation I use:
>> > > ~/amber11/bin/pmemd.cuda -O -i mdin -o mdout -p prmtop -c inpcrd -r restrt -x mdcrd -gpu 0
>> > >
>> > > Please explain this behaviour.
>> > >
>> > > Thanks,
>> > > ~mayank
>> > >
>> > >
>> > > --
>> > > Mayank Daga | SyNeRGy Laboratory | Dept. of Computer Science
>> > > Virginia Tech | http://synergy.cs.vt.edu | http://www.cs.vt.edu
>> > > _______________________________________________
>> > > AMBER mailing list
>> > > AMBER.ambermd.org
>> > > http://lists.ambermd.org/mailman/listinfo/amber
>> > >
>> >
>> >
>> >
>> > --
>> > Jason M. Swails
>> > Quantum Theory Project,
>> > University of Florida
>> > Ph.D. Graduate Student
>> > 352-392-4032
>> >
>>
>

Received on Tue Aug 17 2010 - 10:30:23 PDT