"Unit 5 Error" with a Linux/MPICH Amber7

From: Vincent BOSQUIER <vbosquier_at_nimes.syntem.com>
Date: Mon 6 May 2002 17:50:07 +0200

Hi all,

I have installed AMBER7 with MPICH on a Linux RedHat-7.2 cluster. To validate my installation, I ran a script that had already been run successfully with AMBER7 on a single-CPU SGI server. The script runs several commands in succession, including calls to sander, and has the following structure:

-----------

#!/bin/csh -f

setenv AMBERHOME /data/test/amber7
setenv MPICH_HOME /usr/share/mpi
setenv DO_PARALLEL "$MPICH_HOME/bin/mpirun -np 4 -machinefile $MPICH_HOME/share/machines.LINUX"

$AMBERHOME/exe/sander -O \
    -i mini1.in \
    -o test1.out \
    -p test.top \
    -c test.crd \
    -inf test1.info \
    -r test1.rst

$AMBERHOME/exe/sander -O \
    -i mini2.in \
    (...)


-----------

sander crashes with the following error messages whenever I run this script:

-----------

  Unit 5 Error on OPEN: mini1.in
[0] MPI Abort by user Aborting program !
[0] Aborting program!
p0_3492: p4_error: : 1

  Unit 5 Error on OPEN: mini2.in
[0] MPI Abort by user Aborting program !
[0] Aborting program!
p0_3493: p4_error: : 1

(...)

-----------

As I said above, a researcher in our molecular modeling team ran the same test files on an SGI machine where I had previously installed AMBER7 locally without MPICH, and it worked fine. Is there a problem with our input files? Do the input files for AMBER7 differ depending on whether you run it on one processor or on several?
What I am sure about today is that "make test.sander" passed without any problem on the cluster. I don't know whether MPICH is correctly configured, but I think it is, based on some tests I ran successfully (see below).

Can anyone tell me what a "Unit 5 error" is, and how I can fix it so that sander runs normally on all the processors I define in the machinefile?

We have also seen sander crash with a "Unit 6 error" that seemed to be related to the ".out" files. Does anyone have any information about this as well?
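
A minimal sketch of a pre-flight check that could be run before each sander call. It assumes (this is not confirmed anywhere above) that "Unit 5" refers to the file passed with -i and "Unit 6" to the file passed with -o, so it only tests whether the inputs are readable and the output directory is writable from the current working directory. The helper names check_inputs and check_outdir are hypothetical and not part of AMBER or MPICH, and the sketch is written for /bin/sh rather than csh.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; not part of AMBER or MPICH.

# check_inputs FILE...
# Report every file that is missing or unreadable from the current
# working directory. Returns non-zero if at least one file fails.
check_inputs() {
    rc=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "cannot read: $f (cwd: $(pwd))" >&2
            rc=1
        fi
    done
    return $rc
}

# check_outdir DIR
# Succeed only if DIR exists and is writable, since the .out, .info
# and .rst files would be created there.
check_outdir() {
    [ -d "$1" ] && [ -w "$1" ]
}
```

For example, "check_inputs mini1.in test.top test.crd && check_outdir ." could be run on each node in the directory where the sander processes actually start, to see whether the files are visible from there.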

Here is some information about the machines and the tests I ran to validate my MPICH installation. Maybe it will give you an idea of what is happening:

-----------

IBM x330series - Linux RedHat-7.2
Test of parallel computing using "mpich-1.2.0" installed from RedHat's RPMs.
$MPICH_HOME=/usr/share/mpi
DO_PARALLEL="$MPICH_HOME/bin/mpirun -np 4 -machinefile $MPICH_HOME/share/machines.LINUX"
The MPICH machinefile is "machines.LINUX" and contains 4 lines formatted like this:

machine2.ourdomain
machine2.ourdomain
machine1.ourdomain
machine1.ourdomain

"machine2" and "machine1" are dual-processor nodes in my cluster.

The /data/test directory is a local directory on "machine1" and is NFS-mounted on "machine2" where /data/test is also the name of the mountpoint.
User "me" owns $MPICH_HOME directory (and all of its contents).
User "me" also owns /data/test directory (and all of its contents, including the "cpi" executable file).
Command line used and associated results look like this:

<me_at_machine1:/data/test>/usr/share/mpi/bin/mpirun -np 4 -machinefile /usr/share/mpi/share/machines.LINUX ./cpi
Process 0 on machine1.ourdomain
Process 3 on machine1.ourdomain
Process 1 on machine2.ourdomain
Process 2 on machine2.ourdomain
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.001346

-----------

Thanks in advance to all those who will help me.

Vincent.


---------------------------------------------------------------------
Vincent Bosquier
IT Engineer

Synt:em
Computational Drug Discovery
Parc Scientifique G.Besse
Allee Charles Babbage
30035 Nimes Cedex 1
France

E-mail: vbosquier_at_syntem.com
Ligne directe: +33 (0)466 042 294
Standard: +33 (0)466 048 666
Fax: +33 (0)466 048 667
---------------------------------------------------------------------
Discover New Drugs, Discover Synt:em
        http://www.syntem.com
---------------------------------------------------------------------
Received on Mon May 06 2002 - 08:50:07 PDT