Re: AMBER: timing info output from pmemd

From: Robert Duke <rduke.email.unc.edu>
Date: Fri, 11 Jul 2008 09:41:21 -0400

Presuming that you are not writing over an NFS share, the difference should
be no more than 0-3% for a low-scaling job such as one you would run on an
ethernet cluster. I suppose if you were dumping output every step, the
percentage could go up. Where this flush-interval stuff really comes into
play is at 32+ processors, and most typically 128+ processors, in instances
where, because of the way the i/o subsystem is designed, you don't get any
mdout output in a convenient timespan.

I originally suggested fiddling with this setting, thinking that if the
problem occurred at a regular interval, changing the setting would change
that interval and identify the open/close as the point of vulnerability.
But since you say the timing is random, I am not sure what it would really
tell us. If the run is less than 1 hr, you can essentially turn the flush
off, and then if the problem goes away, that would say it is linked somehow
to closing and reopening the file. That still doesn't fix it, though, and
it is still likely a problem down in the file-system control variables,
which are getting stomped on by something else. This may be in user space,
where the application could be stomping on them, but I am not sure (fortran
layers its own file abstraction over the system file abstraction, and data
structures there could get hosed, I would think).
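
(For what it's worth, the close-and-reopen dance is the portable way to
force a flush in Fortran 95; a standard FLUSH statement only arrived in
Fortran 2003, and before that flush() was a vendor extension. A minimal
sketch of the idiom - the subroutine and argument names are made up, this
is not pmemd's actual code:

    ! Hypothetical flush-by-reopen, the portable f95 idiom.
    subroutine flush_by_reopen(unit, fname)
      implicit none
      integer, intent(in)          :: unit
      character(len=*), intent(in) :: fname
      ! Closing forces the runtime to hand its buffered data to the OS...
      close(unit)
      ! ...and reopening in append mode lets writes pick up where they
      ! left off.
      open(unit=unit, file=fname, status='old', position='append')
    end subroutine flush_by_reopen

Every such cycle goes back through the runtime's unit bookkeeping, which
is why trashed control variables there could plausibly surface right at
the open/close.)
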
I still think the best bet is that we have a somehow-incompatible
combination of libraries, compilers, etc.; if there were some sort of
buffer-overrun problem in the code itself, bad things would be happening
with pmemd all over the place (not impossible, just seems unlikely). I
think I did suggest just running the factor ix benchmark (or jac, if you
prefer) with a few things tweaked, like nstlim and maybe the output params,
to see whether this occurs in a vanilla situation or whether there is
something unusual about the combination of things you are doing here (which
I, at least, have never done myself).
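
If you do try the turn-the-flush-off experiment on one of the benchmarks,
something along these lines in the &cntrl namelist is what I have in mind.
The md settings are illustrative rather than the benchmark's stock input,
and I am assuming mdout_flush_interval is in seconds, with 3600 (one hour)
as the effective maximum:

    &cntrl
      imin = 0, ntb = 1,            ! illustrative md settings
      nstlim = 25000, dt = 0.002,   ! long enough to reach ~25000 steps
      ntpr = 500, ntwx = 500,
      mdout_flush_interval = 3600,  ! flush mdout at most once per hour;
                                    ! 0 should flush on every write
    /

With the interval at its maximum, a run of under an hour essentially never
takes the close/reopen path between its first and last write.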
Regards - Bob

----- Original Message -----
From: "Vlad Cojocaru" <Vlad.Cojocaru.eml-r.villa-bosch.de>
To: <amber.scripps.edu>
Sent: Friday, July 11, 2008 8:33 AM
Subject: Re: AMBER: timing info output from pmemd


> In fact, I was trying to test whether I still get the output problem
> described in the previous thread when setting mdout_flush_interval to 0
> or to some other value. In parallel, I wanted to see how much slower
> pmemd is when mdout_flush_interval is set to 0, for instance. Since the
> output problem appears on average only after about 25000 steps, having a
> regular timing output (in a similar fashion to what NAMD does) would tell
> me immediately the difference in performance between runs using different
> mdout_flush_interval values, without the need to test that in advance.
>
> If the difference in performance is significant, it would be useless to
> run tests of at least 25000 steps to see if the output problem is still
> present.
>
> On the other hand, I am thinking of compiling a version of Amber 10 for
> my own use, since the compilation we have here produces these problems,
> and such a regular timing output would be convenient for quick tests of
> different compilations (using different compilers). But of course this
> can also be done by running the benchmark runs.
>
> To summarize, this output is not really a necessary feature ... I was
> just wondering if there is an option to have it.
>
> Vlad
>
>
>
> Robert Duke wrote:
>> No, and I guess I don't understand why you would want to be able to do
>> that. Are you looking for variations in the performance of the machine,
>> or what? In pmemd there is what is basically a parallelization log
>> (logfile), which is sort of similar to the sander profile file in that
>> it offers summary parallel-performance info. It also has the ability to
>> dump details about how FFTs are being distributed, and details about
>> workload redistribution, including just how much time each processor is
>> spending doing what since the last workload redistribution. This is
>> intended for working on parallel performance problems, and the higher
>> dumping levels may not even be documented (the namelist variable is
>> loadbal_verbose in &cntrl; default 0; 1 gives a bit of additional info;
>> by 3 you are getting a whole bunch of detail). This may not be what you
>> want, but it is what I use to debug parallel performance problems.
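>>
>> A minimal sketch of turning that on, if you want to look (only
>> loadbal_verbose is the point here; the other &cntrl settings are
>> illustrative):
>>
>>     &cntrl
>>       nstlim = 25000, dt = 0.002,   ! illustrative values
>>       ntpr = 500,
>>       loadbal_verbose = 1,          ! default 0; 3 is very detailed
>>     /
>>
>> The extra detail should show up in the logfile, not in mdout.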
>> Regards - Bob Duke
>>
>> ----- Original Message ----- From: "Vlad Cojocaru"
>> <Vlad.Cojocaru.eml-r.villa-bosch.de>
>> To: "AMBER list" <amber.scripps.edu>
>> Sent: Friday, July 11, 2008 4:49 AM
>> Subject: AMBER: timing info output from pmemd
>>
>>
>>> Dear Bob, amber users,
>>>
>>> Is there a way to print timing info (time/mdstep) at regular intervals
>>> in pmemd (and/or sander) ?
>>>
>>> Vlad
>>>
>>
>
> --
> ----------------------------------------------------------------------------
> Dr. Vlad Cojocaru
>
> EML Research gGmbH
> Schloss-Wolfsbrunnenweg 33
> 69118 Heidelberg
>
> Tel: ++49-6221-533266
> Fax: ++49-6221-533298
>
> e-mail: Vlad.Cojocaru[at]eml-r.villa-bosch.de
>
> http://projects.villa-bosch.de/mcm/people/cojocaru/
>
> ----------------------------------------------------------------------------
> EML Research gGmbH
> Amtgericht Mannheim / HRB 337446
> Managing Partner: Dr. h.c. Klaus Tschira
> Scientific and Managing Director: Prof. Dr.-Ing. Andreas Reuter
> http://www.eml-r.org
> ----------------------------------------------------------------------------
>
>

-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" (in the *body* of the email)
to majordomo.scripps.edu
Received on Sun Jul 13 2008 - 06:07:47 PDT