On Mon, Dec 29, 2014 at 5:33 AM, Fabian Glaser <fabian.glaser.gmail.com>
wrote:
> Thanks Bill,
>
> We use PBS queuing, and although all the internal variables of the job
> look OK (24 CPUs from two different nodes), the same problem occurs: the
> job runs on only one node with 24 subprocesses (two per CPU) while the
> second node stays empty, instead of running 24 CPUs across two nodes.
> The environment variables printed when the job ends, and their actual
> values from the PBS output, are listed below, along with the PBS script
> I use. The Amber output files, by the way, are produced just fine; the
> problem is that the job is not spread across more than one node.
>
> So it seems the PBS queue is working correctly, but something is
> preventing the job from using two nodes. Do you still think the problem
> is in the system, or do you think we should recompile?
>
> Thanks a lot,
>
> Fabian
>
>
> • PBS_O_HOST - name of the host upon which qsub command is running
> • PBS_O_QUEUE - name of the original queue to which the job was
> submitted
> • PBS_O_WORKDIR - absolute path of the current working directory
> of the qsub command
> • PBS_ENVIRONMENT - set to PBS_BATCH to indicate the job is a
> batch job, or to PBS_INTERACTIVE to indicate the job is a PBS interactive
> job
> • PBS_JOBID - the job identifier assigned to the job by the batch
> system
> • PBS_JOBNAME - the job name supplied by the user
> • PBS_NODEFILE - the name of the file containing the list of nodes
> assigned to the job
> • PBS_QUEUE - the name of the queue from which the job is executed
>
> PBS output:
> ===
> -bash-4.1$ more ETA_1_min3.o1076308
> /u/fglaser/projects/IsraelVlodavsky/hep1_v2/MD/ETA_1/min
> tamnun.default.domain
> all_l_p
> /u/fglaser/projects/IsraelVlodavsky/hep1_v2/MD/ETA_1/min
> PBS_BATCH
> 1076308.tamnun
> ETA_1_min3
> all_l_p_exe
> nodes (24 cpu total):
> n032.default.domain
> n034.default.domain
>
> PBS file
> ======
>
> #!/bin/sh
> #
> # job name (default is the name of pbs script file)
> #---------------------------------------------------
> #PBS -N ETA_1_min3
> # Submit the job to the queue "queue_name"
> #---------------------------------------------------
> #PBS -q all_l_p
> # Send the mail messages (see below) to the specified user address
> #-----------------------------------------------------------------
> #PBS -M fglaser.technion.ac.il
> # send me mail when the job begins, ends, or aborts
> #---------------------------------------------------
> #PBS -m bea
> # resource limits: number and distribution of parallel processes
> #------------------------------------------------------------------
> #PBS -l select=2:ncpus=12:mpiprocs=12
> #
> # comment: this select statement means: use M chunks (nodes), and run
> # N (<= 12) MPI tasks on N CPUs on each of the M nodes.
> # "-l place=scatter" will use exactly N CPUs from each node, while
> # omitting the "-l place" statement will fill all available CPUs of the
> # M nodes (see the illustrative line below)
> #
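> # (Illustrative only, not part of the original script: on PBS Pro,
> # forcing exactly 12 MPI tasks onto each of the 2 requested nodes would
> # use a placement directive like the following.)
> # #PBS -l place=scatter
> #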
> # specifying working directory
> #------------------------------------------------------
> echo $PBS_O_WORKDIR
> echo $PBS_O_HOST
> echo $PBS_O_QUEUE
> echo $PBS_O_WORKDIR
> echo $PBS_ENVIRONMENT
> echo $PBS_JOBID
> echo $PBS_JOBNAME
> echo $PBS_QUEUE
>
>
> cd $PBS_O_WORKDIR
>
>
> # This counts the total number of CPUs assigned to the job
> # (one line per MPI rank in $PBS_NODEFILE)
> NP=$(wc -l $PBS_NODEFILE | awk '{print $1}')
> echo "nodes ($NP cpu total):"
> sort $PBS_NODEFILE | uniq
>
>
> export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH
> export PATH=/usr/lib64/openmpi/bin:$PATH
> source /usr/local/amber14/amber.sh
>
> #source /usr/local/amber14/setup.sh
>
> # running MPI executable with M*N processes
> #------------------------------------------------------
>
> mpirun -np 24 pmemd.MPI -O -i min.in -o min.out -p
> ../hep1_system_ETA_ETA_1.prmtop -c ../hep1_system_ETA_ETA_1.prmcrd -r
> min.rst -ref ../hep1_system_ETA_ETA_1.prmcrd
>
A number of people have already pointed out what they think is happening
(and I agree with them): you are not giving any instruction here to tell
the MPI implementation WHERE to actually run those 24 processes. In some
cases (depending on how your MPI is installed), this will mean that all 24
processes are run on the same node. If this is happening, you need to
provide a machine file to mpirun to tell it exactly where to start all of
the processes. In the case of Open MPI, this can be done with --hostfile;
so something like this:
mpirun -np 24 --hostfile $PBS_NODEFILE pmemd.MPI -O ...
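Adapted to the script you pasted (just a sketch, assuming the Open MPI
mpirun from /usr/lib64/openmpi/bin that you already put on your PATH), the
launch line would look something like:

mpirun -np 24 --hostfile $PBS_NODEFILE pmemd.MPI -O -i min.in -o min.out \
    -p ../hep1_system_ETA_ETA_1.prmtop -c ../hep1_system_ETA_ETA_1.prmcrd \
    -r min.rst -ref ../hep1_system_ETA_ETA_1.prmcrd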
That said, the most common PBS implementation (Torque) provides an API so
that applications can be made aware of the scheduler. The various MPI
implementations (OpenMPI, MPICH, etc.) can all be built with Torque
integration, which will make it *much* easier to use within the PBS
environment. For example, on one of the HPC systems I've used in the past,
I was able to use the command:
mpiexec pmemd.MPI -O ...
with no arguments at all to mpiexec/mpirun -- in this case, mpiexec was
able to figure out how many processes to run and where to run them because
it was integrated directly with the scheduler.
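(A quick way to check this -- just a suggestion, and it assumes Open MPI --
is to look for the Torque/tm components in your build:

ompi_info | grep tm

If no tm components show up, your mpirun has no scheduler integration and
you will need to pass the host list explicitly, as above.)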
HTH,
Jason
--
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher