AMBER: REMD and mpiexec

From: Steve Young <chemadm.hamilton.edu>
Date: Tue, 05 Jun 2007 20:53:14 -0400

Hello,
        We have a Beowulf cluster running torque-2.0.0p7 (PBS), Red Hat
Enterprise 4, Amber9, and mpich2-1.0.5. We've had a heck of a time with
using mpiexec vs. mpirun when trying to run different aspects of
sander.MPI.

  Some history: if we use mpirun (with or without the PBS queue
system), sander.MPI runs as expected, and we get good output with no errors.

However, there is one major issue: the nodes that PBS allocates are not
always the nodes the job actually ends up running on, since mpich manages
process placement itself when launching with mpirun. After posting to the
mpich-discuss list, I learned I needed to use the mpiexec program instead,
and it turned out I needed the version from OSC that works with Torque.
After installing OSC mpiexec, we ran some normal sander.MPI jobs and
received the expected output. So now we are starting to test some Replica
Exchange jobs that we've run on other clusters.
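
For reference, a stripped-down version of the kind of PBS job script we
use looks like the sketch below; the resource request, file names, and the
assumption that sander.MPI is on the PATH are illustrative rather than our
exact setup:

#!/bin/sh
#PBS -l nodes=4:ppn=4
#PBS -N sander_test
cd $PBS_O_WORKDIR

# Plain mpich2 mpirun places the processes itself (via the mpd ring /
# machine list), which is how a job can land on nodes other than the
# ones PBS allocated:
# mpirun -np 16 sander.MPI -O -i mdin -o mdout -p prmtop -c inpcrd -r restrt

# The OSC mpiexec reads the Torque allocation directly and starts one
# process per assigned slot, so no -np or host list is needed:
mpiexec sander.MPI -O -i mdin -o mdout -p prmtop -c inpcrd -r restrt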

Here is some of my post from the mpich-discuss listserv:

<.... snip ...>
OK, so I got the OSC version of mpiexec. This appears to work very well
for running normal sander.MPI: requesting 16 CPUs, we verify good output
and near 100% utilization of all 16 processes. The next thing we want to
use is another part of Amber called Replica Exchange, which is basically
the same sander.MPI program run with different arguments. When I run this
part of the program I end up with the following results:

 Error: specified more groups ( 8 ) than the number of processors ( 1 ) !
[unset]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
 Error: specified more groups ( 8 ) than the number of processors ( 1 ) !
[unset]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
 Error: specified more groups ( 8 ) than the number of processors ( 1 ) !


Now, I realize I should be posting to the Amber list, since this appears
to be an Amber-related problem, and I would tend to believe that myself.
But what I can't explain is why the program runs fine with the exact same
files when I change back to the original version of mpirun.

<.... snip....>
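
For what it's worth, here is the shape of the replica exchange launch; the
group file name and process counts below are illustrative, not our exact
inputs. The multi-group mode needs the total number of MPI processes to be
at least equal to, and evenly divisible by, the number of groups given
with -ng:

# Hypothetical REMD launch: 8 replicas (-ng 8), with the group file
# listing one sander command line per replica.
# Under plain mpirun we give the process count explicitly:
# mpirun -np 16 sander.MPI -ng 8 -groupfile remd.groupfile
# Under the OSC mpiexec the process count comes from the PBS allocation:
mpiexec sander.MPI -ng 8 -groupfile remd.groupfile

The error above is what sander.MPI prints when it sees fewer MPI tasks
than groups, and each process in that output identifies itself as process
0 of a 1-processor job, so under mpiexec the replicas do not appear to be
sharing one 16-process MPI_COMM_WORLD.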


So does this mean that Amber9 isn't working properly with the OSC
version of mpiexec? Which combinations of Amber and MPI work best on a
Torque Beowulf cluster? Thanks in advance for any advice.

-Steve


-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu