Dear Dr. Carlos,
Thank you for the script. It automates a lot of my work and gives me a
deeper understanding of PBS.
I'll switch to it!
I would also suggest (Ross can back me up on this) not changing anything in
the configure file, because doing so messes everything up.
However, for the newest Intel compilers it is necessary to switch from
em64t to intel64. Is this something that calls for a bugfix?
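
For example, if the old library directory is what shows up in the generated
config.h (as the MKL path does in some setups; this is only an illustration,
not something taken from your script), a one-line change is enough:

  # replace every reference to the old em64t library directory with intel64
  sed -i 's|/lib/em64t|/lib/intel64|g' config.h
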
On Tue, Aug 30, 2011 at 11:57 AM, Carlos Sosa <sosa0006.r.umn.edu> wrote:
> Bruno,
>
> This is a script that I am using, and it works fine (I use Intel MPI).
> However, you might need to modify it for your site. I had to modify
> it slightly for my runs.
>
> This is how it works (it assumes that all the files are in the working
> directory):
>
> ./submit.bash 4 1 In this case, it will ask for 4
> different nodes and 1 MPI task per node.
>
> or
>
> ./submit.bash 2 2 In this case, it will ask for 2
> different nodes and 2 MPI tasks per node.
>
> The nodes are selected in the script called: bench.jac.pmemd_pbs
>
> submit.bash invokes "qsub" and submits bench.jac.pmemd_pbs
>
> Here is submit.bash:
>
> cat submit.bash
> #!/bin/bash
> if [ $# -ne 2 ] ; then
> echo "usage: $0 NODES PPN"
> exit
> fi
>
> typeset -i NODES PPN NP
> NODES=$1
> PPN=$2
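> # total number of MPI tasks; passed to the job script via 'qsub -v NP=$NP' below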
> let NP=${NODES}*${PPN}
>
> qsub -V -v NP=$NP -l nodes=$NODES:ppn=12 -l walltime=00:30:00 \
>     ./bench.jac.pmemd_pbs
>
>
> Here is bench.jac.pmemd_pbs:
>
> cat bench.jac.pmemd_pbs
> #!/bin/bash
> #PBS -N JACjob
> #PBS -j oe
> #PBS -l nodes=2:ppn=12
> #PBS -l walltime=1:00:00
>
> cd $PBS_O_WORKDIR
>
> typeset -i NODES NP PPN
>
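> # Build an mpd hostfile with one line per unique node. NODES is then the
> # node count, and PPN is recovered from the NP value passed in by submit.bash.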
> MPDHOSTS=./mpd.hosts
> uniq $PBS_NODEFILE > mpd.hosts
> NODES=`wc -l $MPDHOSTS | awk '{print $1}'`
> #NODES=`uniq $PBS_NODEFILE | wc -l | awk '{print $1}'`
> if [ "$NP" == "" ] ; then
> echo "NP not defined.. exiting"
> exit
> else
> let PPN=${NP}/${NODES}
> fi
> #
> PID=$$
> export I_MPI_JOB_CONTEXT=${PID}
>
> mpdboot -n $NODES -r ssh -f $MPDHOSTS
> mpdtrace -l
>
> #export I_MPI_DEBUG=4
> #export I_MPI_FALLBACK_DEVICE=disable
> #export I_MPI_PROCESSOR_LIST=allcores
> #export I_MPI_FABRICS=shm:tmi
> #export I_MPI_TMI_PROVIDER=psm
>
> echo "run environment.."
> which icc
> which mpiexec
> echo LD_LIBRARY_PATH=$LD_LIBRARY_PATH
> echo ""
> env | grep I_MPI
> echo ""
>
> for j in `uniq $PBS_NODEFILE`
> do
> echo "$j"
> done >> nodelist
>
> # AMBER files
> output=jac_cal42.out
> sander=../../amber11/exe/pmemd.MPI
> #DO_PARALLEL="mpiexec -ppn $PPN -machinefile nodelist -envall -np $NP"
> DO_PARALLEL="mpiexec -ppn $PPN -envall -np $NP"
>
> $DO_PARALLEL $sander -O -i mdin.jac -c inpcrd.equil -o $output < /dev/null
>
> mpdallexit
>
>
>
> On Tue, Aug 30, 2011 at 9:37 AM, Ross Walker <ross.rosswalker.co.uk>
> wrote:
> > Hi Bruno,
> >
> > There is probably one of two things happening.
> >
> > 1) Your PBS script is running all the threads on the same node rather than
> > distributing them. Note, you normally specify resources in PBS with
> >
> > #PBS -l nodes=4:ppn=8
> >
> > Which would give you a total of 32 MPI threads spread over 4 nodes.
> >
> > Try cat'ing the contents of $PBS_NODEFILE and see which nodes it shows as
> > running on. You can also try 'mpirun -np 64 hostname', which will show you
> > the hostname of every node an MPI thread is running on.
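> >
> > For example (64 here is just whatever task count you requested):
> >
> >   sort $PBS_NODEFILE | uniq -c             # slots PBS allocated on each node
> >   mpirun -np 64 hostname | sort | uniq -c  # where the MPI ranks actually land
> >
> > If every line shows the same hostname, all the threads are piling onto one
> > node.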
> >
> > 2) Your MPI is messed up in some way. One possibility is that the MPI you
> > are using is not the InfiniBand one, and thus it is running over the Ethernet
> > interconnect instead of the IB interface. You should probably try to use
> > mvapich2 as the MPI library and try running some of the bandwidth and ping
> > tests that come with it to make sure everything is performing correctly.
> > Then make sure you recompile AMBER with this MPI version.
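> >
> > For example, mvapich2 ships the OSU micro-benchmarks; something along these
> > lines (binary locations and hostnames below are placeholders for your site):
> >
> >   mpirun_rsh -np 2 node01 node02 ./osu_bw       # point-to-point bandwidth
> >   mpirun_rsh -np 2 node01 node02 ./osu_latency  # point-to-point latency
> >
> > Over 40 Gbps InfiniBand osu_bw should report a few GB/s; numbers closer to
> > gigabit-Ethernet speeds mean the traffic is not going over the IB fabric.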
> >
> > I would suggest starting at just 2 MPI tasks. Start by running pmemd in
> > serial and see what performance you get. Then try running the MPI version
> > on 2 cores and see what happens. Try the standard benchmarks included here:
> > http://ambermd.org/amber11_bench_files/Amber11_Benchmark_Suite.tar.gz
> >
> > All the best
> > Ross
> >
> >> -----Original Message-----
> >> From: Bruno Rodrigues [mailto:bbrodrigues.gmail.com]
> >> Sent: Monday, August 29, 2011 8:34 PM
> >> To: AMBER Mailing List
> >> Subject: Re: [AMBER] PBS script
> >>
> >> The serial sander runs at 0.6 ns/day...
> >>
> >> Could it be a problem with MPI?
> >>
> >> On Mon, Aug 29, 2011 at 7:13 PM, Bruno Rodrigues
> >> <bbrodrigues.gmail.com>wrote:
> >>
> >> > I've found that on the former cluster the logfile printed out the FFT slab
> >> > distribution, and now it's an FFT block distribution.
> >> > Does it mean that something has substantially changed in the way FFT
> >> > distributes the blocks?
> >> >
> >> >
> >> > On Mon, Aug 29, 2011 at 6:05 PM, Bruno Rodrigues
> >> <bbrodrigues.gmail.com>wrote:
> >> >
> >> >> it's InfiniBand, at 40 Gbps.
> >> >>
> >> >>
> >> >> On Mon, Aug 29, 2011 at 5:59 PM, Jason Swails
> >> <jason.swails.gmail.com>wrote:
> >> >>
> >> >>> What kind of interconnect does your cluster have?
> >> >>>
> >> >>> On Mon, Aug 29, 2011 at 4:54 PM, Bruno Rodrigues
> >> <bbrodrigues.gmail.com
> >> >>> >wrote:
> >> >>>
> >> >>> > After the changes you suggested, I got this information in the output:
> >> >>> >
> >> >>> > | Dynamic Memory, Types Used:
> >> >>> > | Reals 688690
> >> >>> > | Integers 595564
> >> >>> >
> >> >>> > | Nonbonded Pairs Initial Allocation: 146264
> >> >>> >
> >> >>> > | Running AMBER/MPI version on 64 nodes
> >> >>> >
> >> >>> > and still a performance of 0.2 ns/day.
> >> >>> >
> >> >>> > There is now a logfile that didn't appear before, with the following
> >> >>> > information:
> >> >>> >
> >> >>> >
> >> >>> > Initial FFT Block Distribution Based on Workload Estimate:
> >> >>> >
> >> >>> > FFT blocks assigned to 12 tasks
> >> >>> >
> >> >>> > First FFT Block Distribution Based on Actual Workload:
> >> >>> >
> >> >>> > FFT blocks assigned to 56 tasks
> >> >>> >
> >> >>> > Image Distribution at run step 344:
> >> >>> >
> >> >>> > Count of images assigned to each task:
> >> >>> >      340    437    412    335    542    572    542    516
> >> >>> >      291    256     99      1      0      0      0      0
> >> >>> >        0      0      0      0      0      0    230    184
> >> >>> >        1      0    244    352    436      6     82    219
> >> >>> >       23      1      2     64    137    283    173     59
> >> >>> >      290    133    233     81    253    198    341    173
> >> >>> >      280    330    367    267    157    117    407    125
> >> >>> >      361    374    533    455    606    646   1003    905
> >> >>> >
> >> >>> >
> >> >>> > What does it mean?
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> > On Mon, Aug 29, 2011 at 5:34 PM, Jason Swails
> >> <jason.swails.gmail.com
> >> >>> > >wrote:
> >> >>> >
> >> >>> > > Every PBS system is set up differently, so it's impossible for us to
> >> >>> > > tell what may be happening for sure. However, I suspect that you're
> >> >>> > > not getting 64 CPUs like you think you are.
> >> >>> > >
> >> >>> > > On Mon, Aug 29, 2011 at 4:05 PM, Bruno Rodrigues <
> >> >>> bbrodrigues.gmail.com
> >> >>> > > >wrote:
> >> >>> > >
> >> >>> > > > Dear All,
> >> >>> > > >
> >> >>> > > > I'm trying to run parallel Amber 11 on a cluster with PBS. I've
> >> >>> > > > checked the parallel installation and it's quite fine (log file
> >> >>> > > > attached).
> >> >>> > > >
> >> >>> > > > However, the performance is always between 0.1 and 0.5 ns/day, no
> >> >>> > > > matter how many processors I choose. Is there something missing in
> >> >>> > > > my script?
> >> >>> > > >
> >> >>> > > > Here are the changes I made to my configure (for the parallel
> >> >>> > > > version):
> >> >>> > > > mpicc --> icc -lmpi
> >> >>> > > > mpif90 --> ifort -lmpi
> >> >>> > > >
> >> >>> > > > This generated the correct config.h needed for the Fortran
> >> >>> > > > compiler.
> >> >>> > > >
> >> >>> > > > However, the problem persists with the GNU build, so I guess it has
> >> >>> > > > nothing to do with the installation; it's pretty much a submission
> >> >>> > > > problem. Here is an example of my job:
> >> >>> > > >
> >> >>> > > > #!/bin/bash
> >> >>> > > > #
> >> >>> > > > #################################################
> >> >>> > > > # THIS JOB IS TO EQUILIBRATE THE SYSTEM AT 300K #
> >> >>> > > > # TO BE USED IN FUTURE SIMULATIONS. IT STARTS #
> >> >>> > > > # FROM THE EQUILIBRATION ON CHACOBO, WHERE 1ns #
> >> >>> > > > # WAS PERFORMED AFTER THE DNA WAS RELEASED. #
> >> >>> > > > #################################################
> >> >>> > > > #
> >> >>> > > > #PBS -S /bin/sh
> >> >>> > > > #
> >> >>> > > > # Job name
> >> >>> > > > #PBS -N prod_slow
> >> >>> > > > #
> >> >>> > > > # Errors merged into standard output
> >> >>> > > > #PBS -j oe
> >> >>> > > > #
> >> >>> > > > # Parallel environment request and number of slots
> >> >>> > > > #PBS -l select=64:ncpus=1
> >> >>> > > > #PBS -l walltime=200:00:00
> >> >>> > > >
> >> >>> > > > #
> >> >>> > > > cd $PBS_O_WORKDIR
> >> >>> > > >
> >> >>> > > > export sander=/home/u/bbr/bin/amber11/bin/pmemd.MPI
> >> >>> > > >
> >> >>> > >
> >> >>> > > In here, add the line
> >> >>> > >
> >> >>> > > CPUS=`cat $PBS_NODEFILE | wc -l`
> >> >>> > >
> >> >>> > >
> >> >>> > > >
> >> >>> > > > l=heat20
> >> >>> > > > f=prod01
> >> >>> > > > mpiexec -n 64 $sander -O -i $PWD/$f.in -o $PWD/$f.out -inf $PWD/$f.inf \
> >> >>> > > >   -c $PWD/1D20_wat_tip3pf.$l -ref $PWD/1D20_wat_tip3pf.$l -r $PWD/1D20_wat_tip3pf.$f \
> >> >>> > > >   -p $PWD/1D20_wat_tip3pf.top -x $PWD/1D20_wat_tip3pf$f.x -e $PWD/1D20_wat_tip3pf$f.ene
> >> >>> > > >
> >> >>> > >
> >> >>> > > change the beginning to "mpiexec -n $CPUS" instead of "mpiexec -n 64".
> >> >>> > > pmemd.MPI should report how many processors are being used, which
> >> >>> > > should help you make sure that you're at least allocating all the
> >> >>> > > processors you want to be. You could also consider passing mpiexec the
> >> >>> > > PBS_NODEFILE if you find out how to pass your mpiexec a hostfile or
> >> >>> > > nodefile or something (this makes sure that each thread is bound to
> >> >>> > > the proper processor).
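> >> >>> > >
> >> >>> > > For example, with an MPICH-style mpiexec something like this usually
> >> >>> > > works (the exact flag name for the machine file differs between MPI
> >> >>> > > stacks, so check yours):
> >> >>> > >
> >> >>> > >   CPUS=`cat $PBS_NODEFILE | wc -l`
> >> >>> > >   mpiexec -machinefile $PBS_NODEFILE -n $CPUS $sander -O -i $PWD/$f.in ...
> >> >>> > >
> >> >>> > > (keeping the rest of the flags from your command as they are).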
> >> >>> > >
> >> >>> > > HTH,
> >> >>> > > Jason
> >> >>> > >
> >> >>> > > --
> >> >>> > > Jason M. Swails
> >> >>> > > Quantum Theory Project,
> >> >>> > > University of Florida
> >> >>> > > Ph.D. Candidate
> >> >>> > > 352-392-4032
> >> >>> > > _______________________________________________
> >> >>> > > AMBER mailing list
> >> >>> > > AMBER.ambermd.org
> >> >>> > > http://lists.ambermd.org/mailman/listinfo/amber
> >> >>> > >
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> > --
> >> >>> > --
> >> >>> > Bruno Barbosa Rodrigues
> >> >>> > PhD Student - Physics Department
> >> >>> > Universidade Federal de Minas Gerais - UFMG
> >> >>> > Belo Horizonte - Brazil
> >> >>> > _______________________________________________
> >> >>> > AMBER mailing list
> >> >>> > AMBER.ambermd.org
> >> >>> > http://lists.ambermd.org/mailman/listinfo/amber
> >> >>> >
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Jason M. Swails
> >> >>> Quantum Theory Project,
> >> >>> University of Florida
> >> >>> Ph.D. Candidate
> >> >>> 352-392-4032
> >> >>> _______________________________________________
> >> >>> AMBER mailing list
> >> >>> AMBER.ambermd.org
> >> >>> http://lists.ambermd.org/mailman/listinfo/amber
> >> >>>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> --
> >> >> Bruno Barbosa Rodrigues
> >> >> PhD Student - Physics Department
> >> >> Universidade Federal de Minas Gerais - UFMG
> >> >> Belo Horizonte - Brazil
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > --
> >> > Bruno Barbosa Rodrigues
> >> > PhD Student - Physics Department
> >> > Universidade Federal de Minas Gerais - UFMG
> >> > Belo Horizonte - Brazil
> >> >
> >>
> >>
> >>
> >> --
> >> --
> >> Bruno Barbosa Rodrigues
> >> PhD Student - Physics Department
> >> Universidade Federal de Minas Gerais - UFMG
> >> Belo Horizonte - Brazil
> >> _______________________________________________
> >> AMBER mailing list
> >> AMBER.ambermd.org
> >> http://lists.ambermd.org/mailman/listinfo/amber
> >
> >
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
> >
>
>
>
> --
> Carlos P Sosa, Ph.D.
> Adjunct Assistant Professor
> Biomedical Informatics and Computational Biology (BICB)
> University of Minnesota Rochester
> 300 University Square
> R0869A
> 111 S Broadway
> Rochester, MN 55904
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
--
--
Bruno Barbosa Rodrigues
PhD Student - Physics Department
Universidade Federal de Minas Gerais - UFMG
Belo Horizonte - Brazil
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber