Re: [AMBER] PBS script

From: Carlos Sosa <sosa0006.r.umn.edu>
Date: Tue, 30 Aug 2011 09:57:24 -0500

Bruno,

This is a script that I am using and it works fine (I use Intel MPI).
However, you might need to modify it for your site; I had to modify
it slightly for my runs.

This is how it works (it assumes that all the files are in the working
directory):

./submit.bash 4 1     In this case, it will ask for 4 different nodes
                      and 1 MPI task per node.

or

./submit.bash 2 2     In this case, it will ask for 2 different nodes
                      and 2 MPI tasks per node.

The nodes are selected in the script called bench.jac.pmemd_pbs.

submit.bash invokes "qsub" and submits bench.jac.pmemd_pbs

Here is submit.bash:

cat submit.bash
#!/bin/bash
if [ $# -ne 2 ] ; then
echo "usage: $0 NODES PPN"
exit
fi

typeset -i NODES PPN NP
NODES=$1
PPN=$2
let NP=${NODES}*${PPN}

qsub -V -v NP=$NP -lnodes=$NODES:ppn=12 -lwalltime=00:30:00 \
     ./bench.jac.pmemd_pbs
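
So, for example, "./submit.bash 4 1" ends up issuing something like this
(with the ppn and walltime values hard-coded above):

qsub -V -v NP=4 -lnodes=4:ppn=12 -lwalltime=00:30:00 ./bench.jac.pmemd_pbs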


Here is bench.jac.pmemd_pbs:

cat bench.jac.pmemd_pbs
#!/bin/bash
#PBS -N JACjob
#PBS -j oe
#PBS -l nodes=2:ppn=12
#PBS -l walltime=1:00:00

cd $PBS_O_WORKDIR

typeset -i NODES NP PPN

MPDHOSTS=./mpd.hosts
uniq $PBS_NODEFILE > mpd.hosts
NODES=`wc -l $MPDHOSTS | awk '{print $1}'`
#NODES=`uniq $PBS_NODEFILE | wc -l | awk '{print $1}'`
if [ "$NP" == "" ] ; then
  echo "NP not defined.. exiting"
  exit
else
let PPN=${NP}/${NODES}
fi
#
PID=$$
export I_MPI_JOB_CONTEXT=${PID}

mpdboot -n $NODES -r ssh -f $MPDHOSTS
mpdtrace -l

#export I_MPI_DEBUG=4
#export I_MPI_FALLBACK_DEVICE=disable
#export I_MPI_PROCESSOR_LIST=allcores
#export I_MPI_FABRICS=shm:tmi
#export I_MPI_TMI_PROVIDER=psm

echo "run environment.."
which icc
which mpiexec
echo LD_LIBRARY_PATH=$LD_LIBRARY_PATH
echo ""
env | grep I_MPI
echo ""

for j in `uniq $PBS_NODEFILE`
do
echo "$j"
done >> nodelist

# AMBER files
output=jac_cal42.out
sander=../../amber11/exe/pmemd.MPI
#DO_PARALLEL="mpiexec -ppn $PPN -machinefile nodelist -envall -np $NP"
DO_PARALLEL="mpiexec -ppn $PPN -envall -np $NP"

$DO_PARALLEL $sander -O -i mdin.jac -c inpcrd.equil -o $output < /dev/null

mpdallexit
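
If you want to double-check the task placement, a line like the following
just before the pmemd command (while the mpd ring is still up; "hostname" is
only a stand-in test program) should print each node name PPN times:

mpiexec -ppn $PPN -np $NP hostname | sort | uniq -c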



On Tue, Aug 30, 2011 at 9:37 AM, Ross Walker <ross.rosswalker.co.uk> wrote:
> Hi Bruno,
>
> There is probably one of two things happening.
>
> 1) Your PBS script is running all the threads on the same node rather than
> distributing them. Note, you normally specify resources in PBS with
>
> #PBS -l nodes=4:ppn=8
>
> Which would give you a total of 32 MPI threads spread over 4 nodes.
>
> Try cat'ing the contents of $PBS_NODEFILE and see which nodes it shows as
> running on. You can also try 'mpirun -np 64 hostname', which will show you
> the hostname of every node an MPI thread is running on.
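>
> For example, something like this (a minimal check; the exact launcher and
> flags depend on your MPI) should show where the tasks land:
>
> cat $PBS_NODEFILE | sort | uniq -c
> mpirun -np 64 hostname | sort | uniq -c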
>
> 2) Your mpi is messed up in some way. One possibility is that the mpi you
> are using is not the infiniband one and thus it is running over the Ethernet
> interconnect instead of the IB interface. You should probably try to use
> mvapich2 as the mpi library and try running some of the bandwidth and ping
> tests that come with that to make sure everything is performing correctly.
> Then make sure you recompile AMBER with this mpi version.
>
> I would suggest starting at just 2 MPI tasks. Start by running pmemd in
> serial and see what performance you get. Then try running the  MPI version
> on 2 cores and see what happens. Try the standard benchmarks included here:
> http://ambermd.org/amber11_bench_files/Amber11_Benchmark_Suite.tar.gz
>
> All the best
> Ross
>
>> -----Original Message-----
>> From: Bruno Rodrigues [mailto:bbrodrigues.gmail.com]
>> Sent: Monday, August 29, 2011 8:34 PM
>> To: AMBER Mailing List
>> Subject: Re: [AMBER] PBS script
>>
>> The serial sander runs at 0.6 ns/day....
>>
>> Could it be a problem with MPI?
>>
>> On Mon, Aug 29, 2011 at 7:13 PM, Bruno Rodrigues
>> <bbrodrigues.gmail.com> wrote:
>>
>> > I've found that on the former cluster, the logfile printed out the FFT
>> > slab distribution, and now it's the FFT block distribution.
>> > Does this mean that something has substantially changed in the way FFT
>> > distributes the blocks?
>> >
>> >
>> > On Mon, Aug 29, 2011 at 6:05 PM, Bruno Rodrigues
>> > <bbrodrigues.gmail.com> wrote:
>> >
>> >> it's InfiniBand, at 40 Gbps.
>> >>
>> >>
>> >>> On Mon, Aug 29, 2011 at 5:59 PM, Jason Swails
>> >>> <jason.swails.gmail.com> wrote:
>> >>
>> >>> What kind of interconnect does your cluster have?
>> >>>
>> >>> On Mon, Aug 29, 2011 at 4:54 PM, Bruno Rodrigues
>> >>> <bbrodrigues.gmail.com> wrote:
>> >>>
>> >>> > After the changes you suggested, I got this information in the
>> >>> > output:
>> >>> >
>> >>> > | Dynamic Memory, Types Used:
>> >>> > | Reals              688690
>> >>> > | Integers           595564
>> >>> >
>> >>> > | Nonbonded Pairs Initial Allocation:      146264
>> >>> >
>> >>> > | Running AMBER/MPI version on   64 nodes
>> >>> >
>> >>> > and still a performance of 0.2 ns/day.
>> >>> >
>> >>> > There is now a log file that didn't appear before, with the
>> >>> > following information:
>> >>> >
>> >>> >
>> >>> > Initial FFT Block Distribution Based on Workload Estimate:
>> >>> >
>> >>> >  FFT blocks assigned to   12 tasks
>> >>> >
>> >>> > First FFT Block Distribution Based on Actual Workload:
>> >>> >
>> >>> >  FFT blocks assigned to   56 tasks
>> >>> >
>> >>> > Image Distribution at run step     344:
>> >>> >
>> >>> >  Count of images assigned to each task:
>> >>> >          340      437      412      335      542      572      542      516
>> >>> >          291      256       99        1        0        0        0        0
>> >>> >            0        0        0        0        0        0      230      184
>> >>> >            1        0      244      352      436        6       82      219
>> >>> >           23        1        2       64      137      283      173       59
>> >>> >          290      133      233       81      253      198      341      173
>> >>> >          280      330      367      267      157      117      407      125
>> >>> >          361      374      533      455      606      646     1003      905
>> >>> >
>> >>> >
>> >>> > What does it mean?
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> > On Mon, Aug 29, 2011 at 5:34 PM, Jason Swails
>> >>> > <jason.swails.gmail.com> wrote:
>> >>> >
>> >>> > > Every PBS system is set up differently, so it's impossible for us
>> >>> > > to tell what may be happening for sure.  However, I suspect that
>> >>> > > you're not getting 64 CPUs like you think you are.
>> >>> > >
>> >>> > > On Mon, Aug 29, 2011 at 4:05 PM, Bruno Rodrigues
>> >>> > > <bbrodrigues.gmail.com> wrote:
>> >>> > >
>> >>> > > > Dear All,
>> >>> > > >
>> >>> > > > I'm trying to run parallel Amber 11 on a cluster with PBS. I've
>> >>> > > > checked the parallel installation and it's quite fine (the log
>> >>> > > > file attached).
>> >>> > > >
>> >>> > > > However, the performance is always between 0.1 and 0.5 ns/day,
>> >>> > > > no matter the number of processors I choose. Is there something
>> >>> > > > missing in my script?
>> >>> > > >
>> >>> > > > Here are the changes I made to my configure (for the parallel
>> >>> > > > version):
>> >>> > > > mpicc --> icc -lmpi
>> >>> > > > mpif90 --> ifort -lmpi
>> >>> > > >
>> >>> > > > This generated the correct config.h needed for the Fortran
>> >>> > > > compiler.
>> >>> > > >
>> >>> > > > However, the problem persists with the GNU build, so I guess it
>> >>> > > > has nothing to do with the installation; it's pretty much a
>> >>> > > > submission problem. Here is an example of my job:
>> >>> > > >
>> >>> > > > #!/bin/bash
>> >>> > > > #
>> >>> > > > #################################################
>> >>> > > > # THIS JOB IS TO EQUILIBRATE THE SYSTEM AT 300K #
>> >>> > > > # TO BE USED IN FUTURE SIMULATIONS. IT STARTS   #
>> >>> > > > # FROM THE EQUILIBRATION ON CHACOBO, WHERE 1ns  #
>> >>> > > > # WAS PERFORMED AFTER THE DNA WAS RELEASED.     #
>> >>> > > > #################################################
>> >>> > > > #
>> >>> > > > #PBS -S /bin/sh
>> >>> > > > #
>> >>> > > > # Job name
>> >>> > > > #PBS -N prod_slow
>> >>> > > > #
>> >>> > > > # Errors merged into standard output
>> >>> > > > #PBS -j oe
>> >>> > > > #
>> >>> > > > # Parallel environment request and number of slots
>> >>> > > > #PBS -l select=64:ncpus=1
>> >>> > > > #PBS -l walltime=200:00:00
>> >>> > > >
>> >>> > > > #
>> >>> > > > cd $PBS_O_WORKDIR
>> >>> > > >
>> >>> > > > export sander=/home/u/bbr/bin/amber11/bin/pmemd.MPI
>> >>> > > >
>> >>> > >
>> >>> > > In here, add the line
>> >>> > >
>> >>> > > CPUS=`cat $PBS_NODEFILE | wc -l`
>> >>> > >
>> >>> > >
>> >>> > > >
>> >>> > > >  l=heat20
>> >>> > > > f=prod01
>> >>> > > > mpiexec -n 64 $sander -O -i $PWD/$f.in -o $PWD/$f.out -inf $PWD/$f.inf \
>> >>> > > >        -c $PWD/1D20_wat_tip3pf.$l -ref $PWD/1D20_wat_tip3pf.$l \
>> >>> > > >        -r $PWD/1D20_wat_tip3pf.$f \
>> >>> > > >        -p $PWD/1D20_wat_tip3pf.top -x $PWD/1D20_wat_tip3pf$f.x \
>> >>> > > >        -e $PWD/1D20_wat_tip3pf$f.ene
>> >>> > > >
>> >>> > >
>> >>> > > Change the beginning to "mpiexec -n $CPUS" instead of "mpiexec -n
>> >>> > > 64". pmemd.MPI should report how many processors are being used,
>> >>> > > which should help you make sure that you're at least allocating
>> >>> > > all the processors you want.  You could also consider passing
>> >>> > > mpiexec the PBS_NODEFILE if you find out how to pass your mpiexec
>> >>> > > a hostfile or nodefile or something (this makes sure that each
>> >>> > > thread is bound to the proper processor).
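>> >>> > >
>> >>> > > Putting both suggestions together, the launch line would look
>> >>> > > something like this (just a sketch; the hostfile option name
>> >>> > > varies between MPI implementations, e.g. -machinefile or
>> >>> > > -hostfile, and "..." stands for the rest of your unchanged
>> >>> > > pmemd arguments):
>> >>> > >
>> >>> > > CPUS=`cat $PBS_NODEFILE | wc -l`
>> >>> > > mpiexec -n $CPUS -machinefile $PBS_NODEFILE $sander -O -i $PWD/$f.in ...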
>> >>> > >
>> >>> > > HTH,
>> >>> > > Jason
>> >>> > >
>> >>> > > --
>> >>> > > Jason M. Swails
>> >>> > > Quantum Theory Project,
>> >>> > > University of Florida
>> >>> > > Ph.D. Candidate
>> >>> > > 352-392-4032
>> >>> > > _______________________________________________
>> >>> > > AMBER mailing list
>> >>> > > AMBER.ambermd.org
>> >>> > > http://lists.ambermd.org/mailman/listinfo/amber
>> >>> > >
>> >>> >
>> >>> >
>> >>> >
>> >>> > --
>> >>> > --
>> >>> > Bruno Barbosa Rodrigues
>> >>> > PhD Student - Physics Department
>> >>> > Universidade Federal de Minas Gerais - UFMG
>> >>> > Belo Horizonte - Brazil
>> >>> > _______________________________________________
>> >>> > AMBER mailing list
>> >>> > AMBER.ambermd.org
>> >>> > http://lists.ambermd.org/mailman/listinfo/amber
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Jason M. Swails
>> >>> Quantum Theory Project,
>> >>> University of Florida
>> >>> Ph.D. Candidate
>> >>> 352-392-4032
>> >>> _______________________________________________
>> >>> AMBER mailing list
>> >>> AMBER.ambermd.org
>> >>> http://lists.ambermd.org/mailman/listinfo/amber
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> --
>> >> Bruno Barbosa Rodrigues
>> >> PhD Student - Physics Department
>> >> Universidade Federal de Minas Gerais - UFMG
>> >> Belo Horizonte - Brazil
>> >>
>> >
>> >
>> >
>> > --
>> > --
>> > Bruno Barbosa Rodrigues
>> > PhD Student - Physics Department
>> > Universidade Federal de Minas Gerais - UFMG
>> > Belo Horizonte - Brazil
>> >
>>
>>
>>
>> --
>> --
>> Bruno Barbosa Rodrigues
>> PhD Student - Physics Department
>> Universidade Federal de Minas Gerais - UFMG
>> Belo Horizonte - Brazil
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>



-- 
Carlos P Sosa, Ph.D.
Adjunct Assistant Professor
Biomedical Informatics and Computational Biology (BICB)
University of  Minnesota Rochester
300 University Square
R0869A
111 S Broadway
Rochester, MN 55904
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Aug 30 2011 - 08:00:04 PDT