Installing MPICH for AMBER on a Linux cluster

From: root <root_at_pc2-117.physchem.uni-freiburg.de>
Date: Mon 8 Jul 2002 12:02:52 +0200 (CEST)

Greetings,

I am trying to build a Linux cluster to run AMBER simulations. While
installing MPICH I ran into a problem that has been troubling me for several
days now. Perhaps one of you knows a solution and can help me with it.

My system consists of:

Hardware:

        1 Linux PC (Athlon 1800) with 2 NICs, acting as master

        1 Linux PC (SMP dual Athlon 1600) acting as node (more of those to come
        when the system runs)

        1 Allied Telesyn switch connecting the computers

Software:

        SuSE Linux 8.0 (kernel 2.2.13) installed on both machines
        the node's home directory and MPICH directory are NFS-mounted
        (NFS version 2) from the master

        I made the following modifications:

        I allowed passwordless rsh login between all computers (the tstmachines
        script of MPICH ran without errors, and I also tested rsh host true
        on all of them)

        I installed MPICH-1.2.4 with options device=ch_p4 and comm=shared
        (I tried without the options first, but the problem stayed the
        same)

        I set up a machines.LINUX file with

> master
> node1:2
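
As a minimal sketch of how the entries in such a machines file can be read, the Python snippet below parses the "host" and "host:count" forms shown above. The function name parse_machines is hypothetical, and treating "#" as a comment marker is an assumption based on the sample machines files shipped with MPICH; this is not MPICH's own parser.

```python
# Sketch only: parse_machines is a hypothetical helper, not part of MPICH.
def parse_machines(text):
    """Return a list of (hostname, process_count) pairs.

    Assumes one entry per line, either "host" or "host:count",
    and that "#" starts a comment (an assumption, see lead-in).
    """
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue  # skip empty lines
        host, sep, count = line.partition(":")
        # A bare hostname counts as one process slot.
        entries.append((host.strip(), int(count) if sep else 1))
    return entries


# The machines.LINUX content quoted in the message.
MACHINES = """\
master
node1:2
"""

if __name__ == "__main__":
    for host, count in parse_machines(MACHINES):
        print(f"{host}: {count} process slot(s)")
```

With the file above this lists one slot on master and two on node1, which matches the single-CPU master and the dual-CPU node described under Hardware.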

Problem:

        When I try to run the cpi test program with mpirun, it fails when I
        try to use processors from both machines. That is:

        mpirun -np 1 /examples/basic/cpi

        runs without problem

        mpirun -np 2 /examples/basic/cpi

        hangs after creating the PI-file:

> running /usr/local/mpich-1.2.4/examples/basic/cpi on 2 LINUX
> ch_p4 processors
> Created /home/tom/PI23485

        The PI-file is:
> master 0 /usr/local/mpich-1.2.4/examples/basic/cpi
> node1 1 /usr/local/mpich-1.2.4/examples/basic/cpi

        When I switch the two names in the machines file, it again runs with
        -np 1, but hangs with -np 2.

        When I try with -np 3 it also hangs; the PI-file is:

> pc2-117 0 /usr/local/mpich-1.2.4/examples/basic/cpi
> node1 2 /usr/local/mpich-1.2.4/examples/basic/cpi
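
As a rough consistency check on the two PI-files quoted above, the Python sketch below counts the processes each one describes. It assumes the usual p4 procgroup convention, in which every line reads "host count program" and the count on the first line gives the number of additional processes on the launching host, so the launching process itself adds one. The function name total_processes is hypothetical.

```python
# Sketch only: total_processes is a hypothetical helper, not part of MPICH.
def total_processes(procgroup_text):
    """Count the processes described by a p4 procgroup (PI) file.

    Assumed convention: each non-empty line is "host count program";
    the launching process is not included in any count, so add one.
    """
    counts = [int(line.split()[1])
              for line in procgroup_text.splitlines() if line.strip()]
    return 1 + sum(counts)


# PI-file contents as quoted in the message.
PI_NP2 = """\
master 0 /usr/local/mpich-1.2.4/examples/basic/cpi
node1 1 /usr/local/mpich-1.2.4/examples/basic/cpi
"""
PI_NP3 = """\
pc2-117 0 /usr/local/mpich-1.2.4/examples/basic/cpi
node1 2 /usr/local/mpich-1.2.4/examples/basic/cpi
"""

print(total_processes(PI_NP2))  # 2
print(total_processes(PI_NP3))  # 3
```

Under that assumption both files match the requested -np values, which suggests the process counts themselves are generated as intended.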

I'm afraid that, as a newbie to Linux, I cannot solve this alone. I didn't find
hints on this problem in the MPICH or AMBER mail archives or
documentation, partly because I don't know exactly what I'm looking for.

Please mail me if you have any idea what to try next.

Kind regards,

Thomas
Received on Mon Jul 08 2002 - 03:02:52 PDT