Hi,
could you run mpirun with the -echo option,
mpirun -echo -np 2 .....
and send us the output? Also, it is better not to run programs as root.
Regards
Walter Langel
root wrote:
>Greetings,
>
>I am trying to build a Linux cluster to run AMBER simulations. While
>installing MPICH I ran across a problem that has been troubling me for some
>days now. Perhaps one of you knows a solution and can help me with it.
>
>My system consists of:
>
>Hardware:
>
> 1 Linux PC (Athlon 1800) with 2 NICs acting as master
>
> 1 Linux PC (SMP dual Athlon 1600) acting as node (more of those to come
> when the system runs)
>
> 1 Allied Telesyn switch connecting the computers
>
>Software:
>
> SuSE Linux 8.0 (kernel 2.2.13) installed on both machines
> the node's home directory and the MPICH directory are NFS-mounted
> (NFS version 2) from the master
>
> I made the following modifications:
>
> I allowed passwordless rsh login between all computers (the tstmachines
> script of MPICH worked without errors; I also tried rsh host true
> with all of them)
>
> I installed MPICH-1.2.4 with options device=ch_p4 and comm=shared
> (I tried without the options first, but the problem stayed the
> same)
>
> I set up a machines.LINUX file with
>
> > master
> > node1:2
>
>Problem:
>
> When I try to run the cpi test program with mpirun, it fails as soon as
> I try to use processors from both machines, that is:
>
> mpirun -np 1 /examples/basic/cpi
>
> runs without problem
>
> mpirun -np 2 /examples/basic/cpi
>
> hangs after creating the PI-file:
>
> > running /usr/local/mpich-1.2.4/examples/basic/cpi on 2 LINUX
> > ch_p4 processors
> > Created /home/tom/PI23485
>
> The PI-file is:
> > master 0 /usr/local/mpich-1.2.4/examples/basic/cpi
> > node1 1 /usr/local/mpich-1.2.4/examples/basic/cpi
>
> When I switch the two names in the machines file, it also runs with
> -np 1, but hangs with -np 2.
>
> When I try with -np 3 it also hangs; the PI-file is:
>
> > pc2-117 0 /usr/local/mpich-1.2.4/examples/basic/cpi
> > node1 2 /usr/local/mpich-1.2.4/examples/basic/cpi
>
>I'm afraid that, as a newbie to Linux, I cannot solve this alone. I didn't
>find hints on this problem in the MPICH or AMBER mail archives or
>documentation, partly because I don't know exactly what I'm looking for.
>
>Please mail me if anyone has an idea what to try next.
>
>Kind regards,
>
>Thomas
>
>
>
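A note on the two PI-files quoted above: the one from the -np 3 run names the first host pc2-117, while the machines.LINUX file lists it as master. Whether this is related to the hang is not clear from the post, but it is cheap to check. The following is a minimal Python sketch, assuming the machines file has one "host" or "host:n" entry per line (with "#" starting a comment) and the PI-file has "host n program" lines exactly as shown in the post. The function names are made up for illustration and are not part of MPICH.

```python
def parse_machines(text):
    """Return {host: slots} from machines-file text ('host' or 'host:n' per line)."""
    hosts = {}
    for raw in text.splitlines():
        # Assumption: '#' starts a comment; blank lines are ignored.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, _, count = line.partition(":")
        hosts[name.strip()] = int(count) if count.strip() else 1
    return hosts


def parse_pi_file(text):
    """Return [(host, count, program)] from lines of the form 'host n program'."""
    entries = []
    for raw in text.splitlines():
        parts = raw.split(None, 2)
        if len(parts) == 3:
            entries.append((parts[0], int(parts[1]), parts[2]))
    return entries


def unknown_hosts(machines_text, pi_text):
    """List host names used in the PI-file that the machines file does not contain."""
    known = parse_machines(machines_text)
    return [host for host, _, _ in parse_pi_file(pi_text) if host not in known]


# Data copied from the post.
MACHINES = "master\nnode1:2\n"
PI_NP2 = (
    "master 0 /usr/local/mpich-1.2.4/examples/basic/cpi\n"
    "node1 1 /usr/local/mpich-1.2.4/examples/basic/cpi\n"
)
PI_NP3 = (
    "pc2-117 0 /usr/local/mpich-1.2.4/examples/basic/cpi\n"
    "node1 2 /usr/local/mpich-1.2.4/examples/basic/cpi\n"
)

print(unknown_hosts(MACHINES, PI_NP2))  # prints []
print(unknown_hosts(MACHINES, PI_NP3))  # prints ['pc2-117']
```

If a name such as pc2-117 shows up here, it may be worth checking that it resolves to the same address on the master and on node1, since the master has two NICs.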
--
Prof. Dr. Walter Langel
Institut fuer Chemie und Biochemie
Universitaet Greifswald
Soldmannstrasse 23
D-17487 Greifswald
Germany
Tel +49 3834 86 4423
Fax +49 3834 86 4413
http://www.chemie.uni-greifswald.de/~plasma
Received on Mon Jul 08 2002 - 05:26:36 PDT