Re: [AMBER] pmemd.MPI fails to run

From: Fabian Glaser <fabian.glaser.gmail.com>
Date: Mon, 29 Dec 2014 12:33:18 +0200

Thanks Bill,

We use PBS queuing, and even though all the job's internal variables look correct (24 CPUs from two different nodes), the same problem occurs: the job runs on only one node with 24 processes (2 processes per CPU) instead of starting 24 processes across two nodes, and the second node stays empty. The variables and their actual values from the PBS output are listed below, followed by the PBS script I use. The Amber output files, by the way, are produced just fine; the problem is that the job does not spread over more than one node.

So it seems the PBS queue is working correctly, but something is preventing the job from using two nodes. Do you still think the problem is in the system, or do you think we should recompile?

Thanks a lot,

Fabian


        • PBS_O_HOST - name of the host upon which qsub command is running
        • PBS_O_QUEUE - name of the original queue to which the job was submitted
        • PBS_O_WORKDIR - absolute path of the current working directory of the qsub command
        • PBS_ENVIRONMENT - set to PBS_BATCH to indicate the job is a batch job, or to PBS_INTERACTIVE to indicate the job is a PBS interactive job
        • PBS_JOBID - the job identifier assigned to the job by the batch system
        • PBS_JOBNAME - the job name supplied by the user
        • PBS_NODEFILE - the name of the file containing the list of nodes assigned to the job
        • PBS_QUEUE - the name of the queue from which the job is executed

PBS output:
===
-bash-4.1$ more ETA_1_min3.o1076308
/u/fglaser/projects/IsraelVlodavsky/hep1_v2/MD/ETA_1/min
tamnun.default.domain
all_l_p
/u/fglaser/projects/IsraelVlodavsky/hep1_v2/MD/ETA_1/min
PBS_BATCH
1076308.tamnun
ETA_1_min3
all_l_p_exe
nodes (24 cpu total):
n032.default.domain
n034.default.domain

PBS file
======

#!/bin/sh
#
# job name (default is the name of pbs script file)
#---------------------------------------------------
#PBS -N ETA_1_min3
# Submit the job to the queue "queue_name"
#---------------------------------------------------
#PBS -q all_l_p
# Send the mail messages (see below) to the specified user address
#-----------------------------------------------------------------
#PBS -M fglaser.technion.ac.il
# send mail when the job begins, ends, or aborts
#---------------------------------------------------
#PBS -m bea
# resource limits: number and distribution of parallel processes
#------------------------------------------------------------------
#PBS -l select=2:ncpus=12:mpiprocs=12
#
# comment: this select statement means: use M chunks (nodes), with
# N (<= 12) CPUs for N MPI tasks on each of the M nodes.
# "-l place=scatter" places exactly N tasks on each node, while omitting
# the "-l place" statement lets PBS fill all available CPUs of the M nodes.
#
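# A hedged sketch (not in the original script): to place exactly 12 MPI
# tasks on each of the 2 requested nodes, an explicit placement directive
# could be added to the resource requests above, e.g.:
#
#   #PBS -l place=scatter
#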
# specifying working directory
#------------------------------------------------------
echo $PBS_O_WORKDIR
echo $PBS_O_HOST
echo $PBS_O_QUEUE
echo $PBS_O_WORKDIR
echo $PBS_ENVIRONMENT
echo $PBS_JOBID
echo $PBS_JOBNAME
echo $PBS_QUEUE


cd $PBS_O_WORKDIR


# Count the CPU slots assigned to the job (one line per CPU in $PBS_NODEFILE)
NP=$(wc -l $PBS_NODEFILE | awk '{print $1}')
echo "nodes ($NP cpu total):"
sort $PBS_NODEFILE | uniq


export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH
export PATH=/usr/lib64/openmpi/bin:$PATH
source /usr/local/amber14/amber.sh

#source /usr/local/amber14/setup.sh

# running MPI executable with M*N processes
#------------------------------------------------------

mpirun -np 24 pmemd.MPI -O -i min.in -o min.out -p ../hep1_system_ETA_ETA_1.prmtop -c ../hep1_system_ETA_ETA_1.prmcrd -r min.rst -ref ../hep1_system_ETA_ETA_1.prmcrd
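
# A hedged sketch (not in the original script): as the quoted reply below
# points out, mpirun needs the list of nodes assigned to the job, otherwise
# it defaults to the local host. With OpenMPI under PBS the nodefile can be
# passed explicitly, e.g.:
#
#   mpirun -np 24 -machinefile $PBS_NODEFILE pmemd.MPI -O -i min.in -o min.out \
#       -p ../hep1_system_ETA_ETA_1.prmtop -c ../hep1_system_ETA_ETA_1.prmcrd \
#       -r min.rst -ref ../hep1_system_ETA_ETA_1.prmcrd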




=====





_______________________________
Fabian Glaser, PhD

Technion - Israel Institute of Technology
Haifa 32000, ISRAEL

fglaser.technion.ac.il
Tel: +972 4 8293701
Fax: +972 4 8225153

> On Dec 29, 2014, at 6:19 AM, Bill Ross <ross.cgl.ucsf.edu> wrote:
>
> My suggestion would be to just use a queueing system. I don't currently
> use one myself, but classics NQS and PBS stem from the supercomputer
> center building I worked at at NASA while I was the secretary and Leap
> programmer for AMBER, and they were a natural fit even before clusters
> came along.
>
> Vis a vis clusters, GPUs seem to be the thing these days - noting that
> there is now peer-to-peering with them, they could be schedulable as
> singles or doubles (granularity a queuing system would help with). E.g.
> if you have the advertised hardware to create the need:
>
> http://ambermd.org/news.html#4Ux8GPU
>
> The more that is invested in hardware, the more likely it is shared
> and/or optimized with a queueing system.
>
> Bill
>
> On 12/28/2014 11:07 AM, Thomas Cheatham wrote:
>>> I am not sure about it; we have successfully run amber 14 from PBS
>>> without any PBS_NODEFILE variable, but I will try to use it.
>> Any mpirun command needs a list of the nodes to run on, otherwise it
>> defaults to the node the command was run from. There must be some way on
>> your cluster to specify which nodes are assigned to the current job; the
>> mpirun command itself does not have the built in intelligence to
>> automatically figure it out. Usually this comes from the queuing system;
>> if you are not running a queuing system, then you can create the
>> "nodelist" by hand. Searching google for "mpirun tutorial" shows some
>> examples...
>>
>>> What about Intel MPI?
>>>
>>>>> Can AMBER 14 work with Intel MPI generally?
>> Yes, or even with the built-in MPI version that comes with AMBER 14 or
>> mpich; the AMBER reference manual has clear discussion of this. For the
>> Intel compile, there is an extra flag to configure, -intelmpi
>>
>> All of the compiles assume you have the matching mpicc and mpif90 in your
>> path, and as mentioned previously, you want all the MPI commands to match.
>> You showed this with the openmpi compile, just neglected to specify the
>> host_file that lists what nodes to run on. If running PBS, as Ross Walker
>> mentioned, this is usually set to the variable $PBS_NODEFILE
>>
>> --tec3
>>
>
>

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Dec 29 2014 - 03:00:05 PST