Not able to do parallel run

From: Stéphane <steletch_at_biomedicale.univ-paris5.fr>
Date: Fri 30 Nov 2001 16:36:37 +0100

Hi all,

As seen before on the mailing list, more and more people are using AMBER on Linux
clusters.
We bought one, and now have 4 dual-processor Athlon 1.2 GHz nodes.
I compiled AMBER with mpich-1.2.2.3 following the procedure described in
README and README.parallel, using Machine.g77_mpich.
The program compiled without errors,
but I am not able to get through the test procedure.
I get lots of errors like:
diffing run3e_w_300.out.save with run3e_w_300.out
PASSED
diffing run3e_w_300.mc.save with run3e_w_300.mc
PASSED
cd lj_lj/test; ./run
b run1_b_300
[1] MPI Abort by user Aborting program !
[1] Aborting program!
 Error in setpar: check code, input
p1_28946: p4_error: : 1
bm_list_15437: p4_error: net_recv read: probable EOF on socket: 1

make[1]: *** [test] Interruption
make: *** [CMC] Interruption

I know not all of the code is parallelized, but I assumed the test procedure copes
with that, so I don't understand why it is not working.

I also tried a well-known script for MPI, and I get the same errors when I
run it.
As you can see below, I tried modifying P4_GLOBMEMSIZE, but it did not help:

P4 est de : 100000000
DO est de :/usr/local/mpich-1.2.2.3/bin/mpirun -np 4 -nolocal
dirbas est de : /home/admin/Tests/Test_4_proc/DM_Tcte300_H2O/
La valeur de i est de : 2
Dynamique #2\tGCCGGGTCGC.dn300K2\tven nov 30 15:56:59 CET 2001
 All processors started
3 - MPI_RECV : Invalid rank 133
[3] Aborting program !
[3] Aborting program!
p3_9590: p4_error: : 8262
Connection failed for reason: : Connection refused
Connection failed for reason: : Connection refused
rm_l_1_13537: (2.644713) net_recv failed for fd = 6
rm_l_1_13537: p4_error: net_recv read, errno = : 104
Connection failed for reason: : Connection refused
Connection failed for reason: : Connection refused
bm_list_28969: (4.683018) net_recv failed for fd = 5
bm_list_28969: p4_error: net_recv read, errno = : 104
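
For readers trying to reproduce this, the kind of launch implied by the output above can be sketched as follows. This is only an illustration, not the actual script: the variable names mirror the labels printed in the output (P4, DO, dirbas), while the sander location and the input/output file names are assumptions. It is written as a dry run that prints the command instead of executing it.

```shell
#!/bin/sh
# Dry-run sketch: prints the MPI launch line instead of executing it.

# Memory pool for MPICH's ch_p4 device, in bytes (value taken from the output above).
P4_GLOBMEMSIZE=100000000
export P4_GLOBMEMSIZE

# Launcher exactly as printed in the output above.
DO="/usr/local/mpich-1.2.2.3/bin/mpirun -np 4 -nolocal"

# Working directory as printed in the output above.
dirbas="/home/admin/Tests/Test_4_proc/DM_Tcte300_H2O/"

# ASSUMPTION: the sander path and all file names below are placeholders.
SANDER="${AMBERHOME:-/path/to/amber}/exe/sander"
CMD="$DO $SANDER -O -i md.in -o md.out -p prmtop -c inpcrd -r restrt"

echo "P4_GLOBMEMSIZE=$P4_GLOBMEMSIZE"
echo "cd $dirbas && $CMD"
```

Printing the command first makes it easy to confirm that the environment variable is exported and that the process count matches the machines actually available before launching a real job.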


Finally: I applied all patches up to patch 27 (thanks to D. Case, I could use the
patch command; I had to apply those from B. Ross manually) and checked that
I was using the right version.
Again, it compiled without errors, and all my environment variables are set correctly (PATH and
AMBERHOME).
Again, the tests failed.

I should mention that mpich-1.2.2.3 ships with its own tests to check
whether everything is working, and those tests passed.
I hope I have provided enough information and that you can help me :-))

PS: I have been told that, because of PME, AMBER cannot go beyond 2
processors. Is that true?

Regards,
Stephane Teletchea
Received on Fri Nov 30 2001 - 07:36:37 PST