Re: [AMBER] ptraj.MPI issue with "uninterruptible sleep"

From: Trevor Gokey <tgokey.sfsu.edu>
Date: Mon, 22 Feb 2010 17:11:25 -0800

   -----[1]<amber-bounces.ambermd.org> wrote: -----

     To: AMBER Mailing List [2]<amber.ambermd.org>
     From: Jason Swails [3]<jason.swails.gmail.com>
     Sent by: [4]<amber-bounces.ambermd.org>
     Date: 02/22/2010 04:38PM
     Subject: Re: [AMBER] ptraj.MPI issue with "uninterruptible sleep"
     Hello,
     On Mon, Feb 22, 2010 at 7:04 PM, Trevor Gokey [5]<tgokey.sfsu.edu>
     wrote:
>
> Hello,
>
> First I want to say that I am excited about the upcoming GPU support in
> AMBER 11. Thanks for the information, Ross Walker.
>
> I have since moved on to ptraj.MPI while I wait for AMBER 11. Both OpenMPI
> 1.4.1, AmberTools 1.3, and parallel ptraj compiled without error on my
> Ubuntu system. I used the GNU compilers, gcc 4.4. However, when running
> the <mpirun -np 4 ptraj.MPI> command, top shows all of the processes are
> in the "D" state. The processes seem to randomly go into the "R" state for
> no more than a second, and I've never seen all processes running at the
> same time. Furthermore, running <mpirun -np 4 ptraj.MPI> on a trajectory
> takes longer than -np 2, which takes longer than serial ptraj. OpenMPI
> seems to be creating the overhead to multithread, but no usable
> multithreading is occurring. Running just regular <mpirun ptraj.MPI>
> produces no "D" state in the single process.
     I do not have much experience on this part. To my knowledge,
     ptraj.MPI is (presently) aimed primarily at parallelizing file
     reading.
     I'm not sure. If you run something with mpirun ptraj.MPI, it splits it up
     into the appropriate number of ranks which have an equal number of sets to
     process. However there is only a single progress bar for the entire
     process, so it's hard to say how exactly ptraj.MPI is doing its magic.
     Actually, if I recall from the AT manual, ptraj.MPI differs from serial
     ptraj in that serial ptraj reads the trajectory twice: once to
     read/check, once to process. ptraj.MPI gets rid of this and reads the
     trajectory only once. So perhaps ptraj.MPI has in fact found a way to
     process the trajectory in parallel. I could be wrong.
     I'm not sure that many of the actual manipulations are done
     in parallel yet as of AT 1.3. Someone please correct me if I'm
     mistaken. If your trajectory files are not very big, I don't see how
     helpful ptraj.MPI will be...
     Well I discovered all of this while trying to benchmark ptraj.MPI.
     Converting an 8.6 GB mdcrd file to binpos took 6 minutes with mpirun -np 4,
     5.5 minutes with -np 2, and 5 minutes with serial, all consistently.
     That led me to investigate and find the sleeping processes.
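     (For context, a conversion like that boils down to a ptraj script of
     roughly this shape; the file names below are just placeholders, not my
     actual ones:

          # convert.in: read the mdcrd trajectory and write it back out in binpos format
          trajin  traj.mdcrd
          trajout traj.binpos binpos

     driven by <ptraj prmtop convert.in> for the serial runs and
     <mpirun -np 4 ptraj.MPI prmtop convert.in> for the parallel ones.)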
     I just discovered something that might make this more of a local issue on
     my part. Serial ptraj does go into "D" 2-3% of the time. I do have LVM
     on top of RAID 0 with 2 HDDs, so I'm wondering if ptraj/OpenMPI isn't
     playing nice with my setup. hdparm reports nice sequential read numbers,
     so I can't really say it's specifically my configuration. I'll reinstall
     with just a single hard drive and see if my luck turns.
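     (For anyone who wants to check the same thing on their own machine, the
     process state and kernel wait channel can be polled with something along
     these lines; adjust the process name to whatever your build is called:

          # print state (STAT) and wait channel once a second;
          # "D" is uninterruptible sleep, typically a process blocked on disk I/O
          watch -n 1 'ps -C ptraj.MPI -o pid,stat,wchan:20,pcpu,comm'

     which should at least separate ranks blocked on I/O from ranks
     busy-waiting inside MPI.)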
>
> This is probably more of an OpenMPI issue perhaps, but I thought I'd
> post my experience on the AMBER board to see if anyone has had this
> experience. I've been trying to figure out how to go about debugging
> OpenMPI, and it seems like a bear.
>
> On a side note about ptraj.MPI (but perhaps pertinent)--the traditional
> ptraj command [ptraj prmtop < infile] does not work for me with mpirun
> -np 4 ptraj.MPI. It exits with the error "MULTIPTRAJ must be run with
> <TOP> and <INPUT>". My fix to this problem is leaving out the < sign, and
> ptraj.MPI worked.
     This is subtle, but it's a different way of providing input. The <
     sign dumps the input file into the program as stdin (it does not
     register as a command-line argument), so ptraj reads the file lines as
     standard input rather than opening the file and reading them that way.
     Leaving that sign out makes ptraj read it as a file (and now it DOES
     count as a command-line argument). I'm guessing it's easier to ignore
     stdin streams in a multi-threaded environment and just read files, but
     again I may be mistaken here. In any case, ptraj.MPI will quit with
     fewer than 3 arguments (the first being ptraj, the second being prmtop,
     the third being the input file).
     Thus, taking < out certainly makes it work, and that's all that needs
     to be done, and the above is the reason. So here was just a
     long-winded explanation for something you figured out how to do easily
     enough, though hopefully it'll be at least slightly helpful to
     someone.
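     Put concretely, with placeholder file names, the two invocations differ
     only in how the input script reaches the program:

          # serial ptraj: the input script arrives on stdin via the shell redirect
          ptraj my.prmtop < ptraj.in

          # ptraj.MPI: the input script must be an actual command-line argument
          mpirun -np 4 ptraj.MPI my.prmtop ptraj.in

     which fits the guess above that the parallel build simply sidesteps
     having to share a stdin stream.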
     Thanks, I do appreciate the feedback.
     All the best,
     Jason
     --
     ---------------------------------------
     Jason M. Swails
     Quantum Theory Project,
     University of Florida
     Ph.D. Graduate Student
     352-392-4032
     -Trevor
     _______________________________________________
     AMBER mailing list
     [6]AMBER.ambermd.org
     [7]http://lists.ambermd.org/mailman/listinfo/amber

   
References

   1. 3D"mailto:amber-bounces.ambermd.org"
   2. 3D"mailto:amber.ambermd.org"
   3. 3D"mailto:jason.swai 4. 3D"mailto:amber-bounces.ambermd.org"
   5. 3D"mailto:tgok 6. 3D"mailto:AMBER.ambermd.org 7. 3D"http://lists.ambermd.org/mailman/list_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Feb 22 2010 - 17:30:02 PST