RE: [AMBER] ptraj.MPI issue with "uninterruptible sleep"

From: Ross Walker <ross.rosswalker.co.uk>
Date: Mon, 22 Feb 2010 17:37:23 -0800

Hi Trevor,

> <mpirun -np 4 ptraj.MPI> command, top shows all of the processes are in
> the "D" state. The processes seem to randomly go into the "R" state for
> no more than a second, and I've never seen all processes running at the
> same time. Furthermore, running <mpirun -np 4 ptraj.MPI> on a trajectory
> takes longer than -np 2, which takes longer than serial ptraj. OpenMPI
> seems to be creating the overhead to multithread, but no usable
> multithreading is occurring. Running just regular <mpirun ptraj.MPI>
> produces no "D" state in the single process.

The performance / behavior you see will depend very much on the type of
ptraj run you are doing. The code works by splitting the trajectory file
into chunks and processing each chunk on a different processor. To begin
with, not all actions are supported in parallel, and if you try to use an
unsupported action the code will fall back to serial mode for the entire
run.

Secondly, if the analysis you are doing is very quick, meaning the run is
I/O heavy, then you may not see any improvement in parallel. Something like
a simple trajectory read, image and write, or a water strip may not benefit
from running in parallel. This is particularly true if the specs of your
system are not good. If you just have a single 5400 rpm SATA drive then a
simple image will probably be completely I/O bound, and running in parallel
will just thrash the disk and actually reduce performance. Ideally your
system needs to support parallel I/O, either with something like a Lustre
parallel filesystem or at least a decent high-performance RAID array.
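
A quick, if rough, way to check whether a given run is disk limited is to
time the same input serially and in parallel (ptraj.in here is just a stand
in for whatever input script you are using):

  time ptraj foo.prmtop < ptraj.in
  time mpirun -np 2 ptraj.MPI foo.prmtop ptraj.in
  time mpirun -np 4 ptraj.MPI foo.prmtop ptraj.in

If the wall-clock time does not drop (or gets worse) as you add processes,
the run is almost certainly I/O bound.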

Additionally, if your trajectory file is small you may not see any benefit
from running in parallel.

Ideally, to get the best performance improvement you want a large
trajectory file (not gzipped, since this defaults back to serial I/O), you
want to be doing lots of expensive actions such as checkoverlap, rms,
contacts etc., and you need a decent I/O subsystem such as a GOOD RAID
system or parallel I/O / parallel SAN.
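
As a rough sketch of what such a run might look like (the file names and
the atom mask below are just placeholders, and the exact action syntax
should be checked against the ptraj chapter of the manual):

  mpirun -np 4 ptraj.MPI foo.prmtop ptraj.in

with ptraj.in containing something like:

  trajin big_production.mdcrd
  rms first out rms.dat :1-100@CA
  checkoverlap :1-100
  trajout processed.mdcrd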
 
> ptraj command [ptraj prmtop < infile] does not work for me with mpirun
> -np 4 ptraj.MPI. It exits with the error "MULTIPTRAJ must be run with
> <TOP> and <INPUT>". My fix to this problem is leaving out the < sign,
> and ptraj.MPI worked.

This is because redirection of standard input is difficult to do
consistently when running in parallel. For example,

mpirun -np 4 ptraj.MPI foo.prmtop <foobar

means read foobar as standard input, as if it was typed using the keyboard.
In parallel, however, this can mean several things: all 4 threads could
read it (from a single 'pseudo' keyboard); all 4 could read from their own
'different' pseudo keyboards; just the master thread could read it and the
other threads process nothing; or just the master could read it and
silently communicate it to the other 3 threads. It is thus simpler in
parallel to specify the name of the input file on the command line and have
the code open it explicitly in the traditional manner by which files are
read.
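
So the rule of thumb is: redirect in serial, pass the file name in
parallel. Something along these lines (foo.prmtop and foobar as above):

  # serial: input script read from standard input
  ptraj foo.prmtop < foobar

  # parallel: input script named explicitly as an argument
  mpirun -np 4 ptraj.MPI foo.prmtop foobar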

Having checked the manual I see this is not explained clearly there, so I
should post an update.

All the best
Ross

/\
\/
|\oss Walker

| Assistant Research Professor |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
| http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Feb 22 2010 - 18:00:03 PST