Re: [AMBER] cannot use more than one node

From: Jiri Wiesner <wiesner.chemi.muni.cz>
Date: Sun, 16 Dec 2012 14:24:38 +0100

Hello Fabian:
A suitable way to use more nodes depends very much on how your computer
cluster is configured and how your MPI software is compiled. It seems that
you use MPICH (I have no experience with it). I will describe the way
I do it; I use OpenMPI.

Prerequisites:
1. the cluster nodes can use "password-less" ssh, meaning there is
either a Kerberos service running or ssh keys are configured (see the
example after this list)
2. I assume there is no shared storage that the nodes can use to store
files during the calculation, e.g. each node has its own /scratch directory.
3. your MPI software is configured without PBS support (I do not
recommend that as a general solution, because PBS should be aware of
all processes launched by MPI. I have to do it this way on our cluster,
because OpenMPI does not compile with "--with-tm", so I use:
./configure --prefix=$OPAL_PREFIX --with-tm=no)
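
For prerequisite 1, key-based ssh can be set up along these lines (the
node name node02 is only an example; adjust it to your cluster):

# Generate a key pair with an empty passphrase (run once on the main node).
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Install the public key on each node you need to reach without a password.
ssh-copy-id node02

# Verify: this should print the remote hostname without a password prompt.
ssh node02 hostname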

Create a script which does the following:
1. "tell" PBS to allocate two nodes
2. copy the whole MPI installation and the program binary to the current
directory
3. set up some shell variables OPAL_PREFIX, LD_LIBRARY_PATH, PATH
4. create a copy of the current directory on the other nodes (Let's call
them slave nodes)
5. launch pmemd
6. copy back the results (mdout, mdcrd, restrt) from the slave nodes
7. delete the data from the slave nodes

Once this script is executed by PBS, it is provided with a shell on one
of the allocated nodes (the main node). You can use $PBS_NODEFILE, which
is set up by PBS before the script is executed, to get the names of the
slave nodes. I am attaching a functional prototype of such a script
(without the PBS directives; my software is located on an AFS filesystem).
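
In case the attachment does not reach you, here is a minimal sketch of
what such a script could look like (bash, OpenMPI, two 12-core nodes;
the /software paths and file names below are placeholders, not my actual
setup):

#!/bin/bash
#PBS -l select=2:ncpus=12:mpiprocs=12

# Step 1 is done by the select line above; PBS starts this shell on the
# main node. Work in the directory the job was submitted from.
cd "$PBS_O_WORKDIR" || exit 1

# Step 2: copy the MPI installation and the pmemd binary here, so the
# same absolute path can be replicated on the slave nodes.
cp -r /software/openmpi .
cp /software/amber12/bin/pmemd.MPI .

# Step 3: point OpenMPI and the dynamic loader at the relocated installation.
export OPAL_PREFIX="$PWD/openmpi"
export PATH="$OPAL_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$OPAL_PREFIX/lib:$LD_LIBRARY_PATH"

# Step 4: mirror this directory on every slave node. $PBS_NODEFILE lists
# one host name per MPI process; sort -u reduces it to unique nodes.
# (This assumes `hostname` matches the names PBS writes into the file.)
SLAVES=$(sort -u "$PBS_NODEFILE" | grep -v "^$(hostname)")
for node in $SLAVES; do
    ssh "$node" "mkdir -p '$PWD'"
    scp -r "$PWD"/* "$node:$PWD/"
done

# Step 5: launch pmemd on all allocated cores (2 nodes x 12 cores = 24).
mpirun -np 24 -hostfile "$PBS_NODEFILE" ./pmemd.MPI -O \
    -i prod.in -p system.prmtop -c prod_2.rst \
    -o prod_3.out -x prod_3.mdcrd -r prod_3.rst

# Steps 6 and 7: fetch whatever results the slave nodes wrote, then
# delete the mirrored directories on the slaves.
for node in $SLAVES; do
    scp "$node:$PWD/prod_3.*" . || true
    ssh "$node" "rm -rf '$PWD'"
done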

A word of warning: I suggest that you benchmark all possible scenarios
(two nodes, all CPU cores; two nodes, half of the cores), because you
might actually find that you gain nothing at all. That depends on the
network connection between the nodes, the size of your system, and so on.
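
For instance, using the select syntax from your script, the two
scenarios differ only in the mpiprocs value (which controls how many MPI
processes PBS puts on each node) and in the matching process count
passed to mpirun:

# Two nodes, all 12 cores per node -> 24 MPI processes:
#PBS -l select=2:ncpus=12:mpiprocs=12

# Two nodes, half of the cores per node -> 12 MPI processes:
#PBS -l select=2:ncpus=12:mpiprocs=6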

Cheers,
Jiri Wiesner


On 16/12/12 09:26, Fabian Glaser wrote:
> Hi,
>
> I am running dynamics successfully with ONE node, using the following combination in my PBS file:
>
> #PBS -l select=1:ncpus=12:mpiprocs=12
>
> mpirun -hostfile $PBS_NODEFILE pmemd.MPI -O -i prod.in -p 3SO6_clean.prmtop -c 3SO6_clean_prod_2.rst -o 3SO6_clean_prod_3.out -x 3SO6_clean_prod_3.mdcrd -r 3SO6_clean_prod_3.rst
>
> But when I try to use 2 or more nodes, I get an error from the system, which I think is not connected to PBS.
>
> For example, with:
> #PBS -l select=2:ncpus=12:mpiprocs=12
>
> Here is the error:
>
> HYDU_create_process (./utils/launch/launch.c:94): execvp error on file 3SO6_clean_prod_3.rst (No such file or directory)
>
> So is it that one of the nodes does not find the output file?
>
> What am I doing wrong?
>
> Thanks!
>
> Fabian
>
>
> _______________________________
> Fabian Glaser, PhD
> Bioinformatics Knowledge Unit,
> The Lorry I. Lokey Interdisciplinary
> Center for Life Sciences and Engineering
>
> Technion - Israel Institute of Technology
> Haifa 32000, ISRAEL
> fglaser.technion.ac.il
> Tel: +972 4 8293701
> Fax: +972 4 8225153
>
>


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

Received on Sun Dec 16 2012 - 05:30:03 PST