Greetings,
I am trying to build a Linux cluster to run AMBER simulations. While
installing MPICH I ran into a problem that has been troubling me for some
days now. Perhaps one of you knows a solution and can help me with it.
My system consists of:
Hardware:
1 Linux PC (Athlon 1800) with 2 NICs, acting as master
1 Linux PC (SMP dual Athlon 1600), acting as node (more of those to come
once the system runs)
1 Allied Telesyn switch connecting the computers
Software:
SuSE Linux 8.0 (kernel 2.2.13) installed on both machines
the node's home directory and MPICH directory are NFS-mounted
(NFS version 2) from the master
I made the following modifications:
I allowed passwordless rsh login between all computers (the tstmachines
script of mpich worked without errors, I also tried rsh host true
with all of them)
I installed MPICH-1.2.4 with options device=ch_p4 and comm=shared
(I tried without the options first, but the problem stayed the
same)
I set up a machines.LINUX file with
> master
> node1:2
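In case it helps to reproduce the setup, the checks above can be scripted roughly as follows. This is only a sketch: it assumes the machines file holds one "host" or "host:n" entry per line (without the "> " quoting used in this mail) and that '#' starts a comment. The function names and the RSH override variable are made up for illustration and are not part of MPICH.

```shell
# Sum the process slots listed in a ch_p4 machines file.
# A line is "host" (one slot) or "host:n" (n slots).
count_slots() {
    awk '
        /^[ \t]*(#|$)/ { next }            # skip comments and blank lines
        {
            n = split($1, part, ":")       # part[1] = host, part[2] = slots
            total += (n > 1 ? part[2] : 1)
        }
        END { print total + 0 }
    ' "$1"
}

# Run "rsh <host> true" for every host in the machines file and report
# the hosts where the passwordless login does not work.
# RSH is a hypothetical override, so another remote shell can be swapped in.
check_rsh() {
    failed=0
    # Strip comments and the ":n" suffix; blank lines vanish in word splitting.
    for host in $(sed -e 's/#.*//' -e 's/:.*//' "$1"); do
        # stdin comes from /dev/null so rsh cannot wait for input
        if ${RSH:-rsh} "$host" true </dev/null; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done
    return $failed
}
```

Called as "count_slots machines.LINUX" and "check_rsh machines.LINUX", this should report three slots for the file above and one status line per host.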
Problem:
When I try to run the cpi test program with mpirun, it fails as soon as I
try to use processors from both machines, that is:
mpirun -np 1 /examples/basic/cpi
runs without problem
mpirun -np 2 /examples/basic/cpi
hangs after creating the PI-file:
> running /usr/local/mpich-1.2.4/examples/basic/cpi on 2 LINUX
> ch_p4 processors
> Created /home/tom/PI23485
The PI-file is:
> master 0 /usr/local/mpich-1.2.4/examples/basic/cpi
> node1 1 /usr/local/mpich-1.2.4/examples/basic/cpi
When I switch the two names in the machines file, it also runs with
-np 1 but hangs with -np 2.
When I try -np 3 it also hangs; the PI-file is:
> pc2-117 0 /usr/local/mpich-1.2.4/examples/basic/cpi
> node1 2 /usr/local/mpich-1.2.4/examples/basic/cpi
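As far as I understand the p4 procgroup format, the number on the first line counts processes in addition to the master process itself, while the number on each later line is how many processes to start on that host. Under that assumption (which may be wrong), both PI-files match the requested -np value, and a small sketch to check this could look like the following; the function name is made up for illustration.

```shell
# Count the processes described by a p4 procgroup (PI) file.
# Assumption: column 2 of the first line counts processes *in addition to*
# the master process, so one is added to the sum of column 2.
procgroup_np() {
    awk 'NF >= 2 { total += $2 } END { print total + 1 }' "$1"
}
```

Run on the files above (again without the "> " quoting), this should give 2 for the first PI-file and 3 for the second, so the files themselves look consistent with what I asked mpirun for.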
I'm afraid that, as a newbie to Linux, I cannot solve this alone. I didn't
find hints on this problem in the MPICH or AMBER mail archives or
documentation, partly because I don't know exactly what I'm looking for.
Please mail me if anyone has a clue what to try next.
Kind regards,
Thomas
Received on Mon Jul 08 2002 - 03:02:52 PDT