-----<amber-bounces.ambermd.org> wrote: -----

To: AMBER Mailing List <amber.ambermd.org>
From: Jason Swails <jason.swails.gmail.com>
Sent by: <amber-bounces.ambermd.org>
Date: 02/22/2010 04:38PM
Subject: Re: [AMBER] ptraj.MPI issue with "uninterruptible sleep"
Hello,
On Mon, Feb 22, 2010 at 7:04 PM, Trevor Gokey <tgokey.sfsu.edu> wrote:
>
> Hello,
>
> First I want to say that I am excited about the upcoming GPU support
> in AMBER 11. Thanks for the information, Ross Walker.
>
> I have since moved on to ptraj.MPI while I wait for AMBER 11. OpenMPI
> 1.4.1, AmberTools 1.3, and parallel ptraj all compiled without error
> on my Ubuntu system. I used the GNU compilers, gcc 4.4. However, when
> running the <mpirun -np 4 ptraj.MPI> command, top shows all of the
> processes are in the "D" state. The processes seem to randomly go
> into the "R" state for no more than a second, and I've never seen all
> processes running at the same time. Furthermore, running
> <mpirun -np 4 ptraj.MPI> on a trajectory takes longer than -np 2,
> which takes longer than serial ptraj. OpenMPI seems to be creating
> the overhead to multithread, but no usable multithreading is
> occurring. Running just a regular <mpirun ptraj.MPI> produces no "D"
> state in the single process.
I do not have much experience on this part. To my knowledge, ptraj.MPI
is (presently) aimed primarily at parallelizing file reading.
I'm not sure. If you run something with mpirun ptraj.MPI, it splits the
work into the appropriate number of ranks, each with an equal number of
sets to process. However, there is only a single progress bar for the
entire process, so it's hard to say exactly how ptraj.MPI is doing its
magic. Actually, if I recall from the AT manual, ptraj.MPI differs from
serial ptraj in that serial ptraj reads the trajectory twice (once to
read/check, once to process), while ptraj.MPI gets rid of this and
reads the trajectory only once. So perhaps ptraj.MPI has in fact found
a way to process the trajectory in parallel. I could be wrong.
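
To illustrate the kind of split I mean, here is a sketch of one
plausible scheme (this is NOT taken from the ptraj source; the frame
count and the contiguous-block choice are my own assumptions):

/* Sketch: divide nframes trajectory frames evenly across MPI ranks.
 * Illustrative only -- not ptraj's actual code. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    long nframes = 10000;  /* hypothetical total frame count */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Contiguous block decomposition; early ranks absorb the
     * remainder so every frame is assigned exactly once. */
    long base  = nframes / size;
    long extra = nframes % size;
    long start = rank * base + (rank < extra ? rank : extra);
    long stop  = start + base + (rank < extra ? 1 : 0);  /* exclusive */

    printf("rank %d of %d: frames %ld..%ld\n",
           rank, size, start, stop - 1);

    /* Each rank would then seek to its own offset in the trajectory
     * file and process only frames [start, stop). */
    MPI_Finalize();
    return 0;
}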
I'm not sure that many of the actual manipulations are done in parallel
yet as of AT 1.3. Someone please correct me if I'm mistaken. If your
trajectory files are not very big, I don't see how helpful ptraj.MPI
will be...
Well, I discovered all of this while trying to benchmark ptraj.MPI.
Converting an 8.6 GB mdcrd file to binpos took 6 minutes with mpirun
-np 4, 5.5 minutes with -np 2, and 5 minutes with serial, all
consistently. That led me to investigate and find the sleeping
processes.
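
Back-of-the-envelope, those timings work out to roughly 8.6 GB / 300 s
~ 29 MB/s for serial, ~26 MB/s for -np 2 (330 s), and ~24 MB/s for
-np 4 (360 s). Each extra reader actually lowered the effective
throughput, which is the opposite of what you'd expect if the job were
compute-bound, and is consistent with several ranks contending for the
same disk.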
I just discovered something that might make this more of a local issue
on my part. Serial ptraj does go into "D" 2-3% of the time. I do have
LVM on top of RAID 0 with 2 HDDs, so I'm wondering if ptraj/OpenMPI
isn't playing nice with my setup. hdparm reports nice sequential read
numbers, so I can't really say it's specifically my configuration. I'll
reinstall with just a single hard drive and see if my luck turns.
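
For what it's worth, the "D" state is uninterruptible sleep: the
process is blocked inside a kernel call (almost always disk I/O) and
won't even take signals until the call returns. One way to see what
the ranks are actually waiting on is the kernel wait-channel column in
ps, e.g. (exact field names vary a bit between procps versions):

  ps -eo pid,stat,wchan:20,comm | grep ptraj

If wchan points into the block-I/O or filesystem code on most samples,
the processes really are stalled on the disk rather than inside MPI.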
>
> This is probably more of an OpenMPI issue, but I thought I'd post my
> experience on the AMBER board to see if anyone has had this
> experience. I've been trying to figure out how to go about debugging
> OpenMPI, and it seems like a bear.
>
> On a side note about ptraj.MPI (but perhaps pertinent)--the
> traditional ptraj command [ptraj prmtop < infile] does not work for
> me with mpirun -np 4 ptraj.MPI. It exits with the error "MULTIPTRAJ
> must be run with <TOP> and <INPUT>". My fix to this problem is
> leaving out the < sign, and ptraj.MPI worked.
This is subtle, but it's a different way of providing input. The <
sign feeds the input file to the program as stdin (it does not register
as a command-line argument), so ptraj reads the file's lines from
standard input rather than opening the file and reading them that way.
Leaving that sign out makes ptraj read it as a file (and now it DOES
count as a command-line argument). I'm guessing it's easier to ignore
stdin streams in a multi-threaded environment and just read files, but
again I may be mistaken here. In any case, ptraj.MPI will quit with
fewer than 3 arguments (the first being ptraj, the second the prmtop,
the third the input file).
Thus, taking the < out certainly makes it work, that's all that needs
to be done, and the above is the reason. So this was just a long-winded
explanation for something you figured out easily enough, though
hopefully it'll be at least slightly helpful to someone.
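
To make that concrete, here is roughly what such an argument check
looks like (a schematic sketch, not ptraj's actual source):

/* Sketch: "prog top < in" passes only 2 argv entries and puts the
 * input on stdin; "prog top in" passes 3 and names the file.
 * Illustrative only -- not ptraj's actual code. */
#include <stdio.h>

int main(int argc, char *argv[])
{
    FILE *in;

    if (argc < 3) {
        /* A parallel build can insist on a real file name, because a
         * "< file" redirection never shows up in argv at all. */
        fprintf(stderr, "must be run with <TOP> and <INPUT>\n");
        return 1;
    }

    in = fopen(argv[2], "r");  /* open the input file by name */
    if (in == NULL) {
        perror(argv[2]);
        return 1;
    }
    /* ... parse commands from 'in' instead of stdin ... */
    fclose(in);
    return 0;
}

So [ptraj prmtop < infile] hands the program only two argv entries
(the program name and the prmtop), while [mpirun -np 4 ptraj.MPI
prmtop infile] supplies all three, which is why dropping the < sign
satisfies the check.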
Thanks, I do appreciate the feedback.
All the best,
Jason
--
---------------------------------------
= Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Graduate Student
352-392-4032
-Trevor
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Feb 22 2010 - 17:30:02 PST