Re: "Unit 5 Error" with a Linux/MPICH Amber7

From: Stéphane Teletchéa <steletch_at_biomedicale.univ-paris5.fr>
Date: Mon 6 May 2002 18:22:42 +0200

Hi Vincent, it seems that you have forgotten to put mini1.in in the
correct directory, or that your mini1.in is incorrect (corrupted, wrong format, etc.).
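For context (an editor's sketch, not from the original thread): in Fortran, unit 5 is the preconnected standard-input unit, and sander attaches the file given with -i to it, so "Unit 5 Error on OPEN: mini1.in" just means the process could not open mini1.in from its working directory. The shell analogue, with a throwaway directory standing in for a remote process's startup directory:

```shell
#!/bin/sh
# Sketch: simulate the "Unit 5" failure mode at the shell level.
# A process started in a directory that does not contain mini1.in cannot
# open it by relative name, exactly like sander's OPEN on unit 5.
workdir=$(mktemp -d)    # stands in for a remote node's startup directory
cd "$workdir" || exit 1
if [ -r mini1.in ]; then
  result="open would succeed"
else
  result="open fails: no mini1.in in $workdir"
fi
echo "$result"
```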

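If the inputs are in place and it still fails, a hypothetical diagnostic (the paths are taken from your mail, but this exact command is my suggestion, not something already in the thread) is to run a trivial command under mpirun and compare the working directory that each MPI process reports:

```shell
#!/bin/sh
# Hypothetical diagnostic: build the mpirun command line that would print
# each process's host and working directory; on the cluster you would run
# it with eval to see whether remote processes start where you expect.
MPIRUN=/usr/share/mpi/bin/mpirun                 # path from the mail
MACHINEFILE=/usr/share/mpi/share/machines.LINUX  # path from the mail
CMD="$MPIRUN -np 4 -machinefile $MACHINEFILE /bin/sh -c 'hostname; pwd'"
echo "$CMD"                                      # run with: eval "$CMD"
```

If any process prints a directory that does not contain mini1.in, that would explain the failed OPEN.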
Stef

-- 
*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*
Teletchéa Stéphane - CNRS UMR 8601
Lab. de chimie et biochimie pharmacologiques et toxicologiques
45 rue des Saints-Peres 75270 Paris cedex 06
tel : (33) - 1 42 86 20 86 - fax : (33) - 1 42 86 83 87
e-mail : steletch_at_biomedicale.univ-paris5.fr
*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*
On Monday 6 May 2002 at 17:50, Vincent BOSQUIER wrote:
> Hi all,
>
> I have installed AMBER7 with MPICH on a Linux RedHat-7.2 cluster. To
> validate my installation, I run a script that has already been run on
> AMBER7 installed on a 1 CPU SGI server. This script includes several
> successive commands, including calls to sander. The script has the
> following structure:
>
> -----------
>
> #!/bin/csh -f
>
> setenv $AMBERHOME /data/test/amber7
> setenv MPICH_HOME /usr/share/mpi
> setenv DO_PARALLEL "$MPICH_HOME/bin/mpirun -np 4 -machinefile $MPICH_HOME/share/machines.LINUX"
>
> $AMBERHOME/exe/sander -O \
>                       -i mini1.in \
>                       -o test1.out \
>                       -p test.top \
>                       -c test.crd \
>                       -inf test1.info \
>                       -r test1.rst
>
> $AMBERHOME/exe/sander -O \
>                       -i mini2.in \
>                       (...)
>
>
> -----------
>
> Whenever I try to run such a script, sander crashes with the following
> error messages:
>
> -----------
>
>   Unit    5 Error on OPEN: mini1.in
> [0] MPI Abort by user Aborting program !
> [0] Aborting program!
> p0_3492:  p4_error: : 1
>
>   Unit    5 Error on OPEN: mini2.in
> [0] MPI Abort by user Aborting program !
> [0] Aborting program!
> p0_3493:  p4_error: : 1
>
> (...)
>
> -----------
>
> As I already said above, a researcher in our molecular modeling team tried
> to run the same test files on an SGI machine where I previously installed
> AMBER7 locally without MPICH and it worked fine. Is there a problem with
> our input files? Is there a difference in the input files for AMBER7 when
> you run it on 1 or on several processors? Today, what I'm sure about is that
> "make test.sander" passed without any problem on the cluster. I don't know
> whether MPICH is correctly configured or not, but I think it is, because of
> some tests I have successfully made (see below).
>
> Can anyone tell me what a "Unit 5 error" is, and how I can deal with it so that
> sander runs normally with all the processors I define in the machinefile?
>
> We also experienced sander crashes with a "Unit 6 error" that seemed
> to be related to the ".out" files. Does anyone have any information about this too?
>
> Here is some information about the machines and the tests I ran to
> validate my MPICH module. Maybe it will help you have an idea of what is
> happening:
>
> -----------
>
> IBM x330series - Linux RedHat-7.2
> Test of parallel computing using "mpich-1.2.0" installed from RedHat's RPMs.
> $MPICH_HOME=/usr/share/mpi
> DO_PARALLEL="$MPICH_HOME/bin/mpirun -np 4 -machinefile $MPICH_HOME/share/machines.LINUX"
> The MPICH machinefile is "machines.LINUX" and contains 4 lines formatted
> this way:
>
> machine2.ourdomain
> machine2.ourdomain
> machine1.ourdomain
> machine1.ourdomain
>
> "machine2" and "machine1" are dual-processor nodes in my cluster.
>
> The /data/test directory is a local directory on "machine1" and is
> NFS-mounted on "machine2" where /data/test is also the name of the
> mountpoint. User "me" owns $MPICH_HOME directory (and all of its contents).
> User "me" also owns /data/test directory (and all of its contents,
> including the "cpi" executable file). Command line used and associated
> results look like this:
>
> <me_at_machine1:/data/test>/usr/share/mpi/bin/mpirun -np 4 -machinefile /usr/share/mpi/share/machines.LINUX ./cpi
> Process 0 on machine1.ourdomain
> Process 3 on machine1.ourdomain
> Process 1 on machine2.ourdomain
> Process 2 on machine2.ourdomain
> pi is approximately 3.1416009869231249, Error is 0.0000083333333318
> wall clock time = 0.001346
>
> -----------
>
> Thanks in advance to all those who will help me.
>
> Vincent.
>
>
> ---------------------------------------------------------------------
> Vincent Bosquier
> IT Engineer
>
> Synt:em
> Computational Drug Discovery
> Parc Scientifique G.Besse
> Allee Charles Babbage
> 30035 Nimes Cedex 1
> France
>
> E-mail: vbosquier_at_syntem.com
> Direct line: +33 (0)466 042 294
> Switchboard: +33 (0)466 048 666
> Fax: +33 (0)466 048 667
> ---------------------------------------------------------------------
> Discover New Drugs, Discover Synt:em
> 	http://www.syntem.com
> ---------------------------------------------------------------------
Received on Mon May 06 2002 - 09:22:42 PDT