Hi Bruno,
There is probably one of two things happening.
1) Your PBS script is running all the tasks on the same node rather than
distributing them across nodes. Note that you normally specify resources in
PBS with
#PBS -l nodes=4:ppn=8
which would give you a total of 32 MPI tasks spread over 4 nodes (4 nodes x
8 processors per node).
Try cat'ing the contents of $PBS_NODEFILE to see which nodes the job was
assigned. You can also try 'mpirun -np 64 hostname', which will print the
hostname of the node every MPI task is running on.
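For example, a minimal job preamble plus sanity check might look like this
(the resource values and mpirun syntax are illustrative; adjust them for
your site and MPI stack):

```shell
#!/bin/bash
#PBS -l nodes=4:ppn=8
#PBS -l walltime=1:00:00

cd $PBS_O_WORKDIR

# How many slots did PBS actually hand us, and on which hosts?
NP=$(wc -l < "$PBS_NODEFILE")
echo "PBS allocated $NP slots on:"
sort "$PBS_NODEFILE" | uniq -c

# Print the hostname of every MPI rank; if they all match, the job
# is packed onto a single node rather than spread across four.
mpirun -np "$NP" hostname
```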
2) Your MPI is misconfigured in some way. One possibility is that the MPI
you are using is not InfiniBand-aware, so traffic is running over the
Ethernet interconnect instead of the IB interface. You should probably try
MVAPICH2 as the MPI library and run some of the bandwidth and ping tests
that come with it to make sure everything is performing correctly. Then
make sure you recompile AMBER with this MPI version.
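For instance, MVAPICH2 ships with the OSU micro-benchmarks (osu_bw,
osu_latency); a 2-task run with one rank on each of two nodes will show
whether the fabric is performing (the hostfile name and paths here are
placeholders):

```shell
# two_nodes lists one hostname per line for two different machines.
# Over QDR InfiniBand, osu_bw should report bandwidth on the order of
# GB/s; ~100 MB/s suggests traffic is going over gigabit Ethernet.
mpirun -np 2 -hostfile two_nodes ./osu_bw
mpirun -np 2 -hostfile two_nodes ./osu_latency
```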
I would suggest starting with just 2 MPI tasks. First run pmemd in serial
and see what performance you get. Then run the MPI version on 2 cores and
compare. Try the standard benchmarks included here:
http://ambermd.org/amber11_bench_files/Amber11_Benchmark_Suite.tar.gz
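In other words, something along these lines (the input/topology names are
placeholders for whichever benchmark you pick from the suite):

```shell
# Baseline: serial pmemd
$AMBERHOME/bin/pmemd -O -i mdin -p prmtop -c inpcrd -o serial.out

# Same job on 2 MPI tasks
mpirun -np 2 $AMBERHOME/bin/pmemd.MPI -O -i mdin -p prmtop -c inpcrd -o np2.out

# The timings section at the end of each output reports ns/day;
# 2 tasks should come in close to twice the serial number.
grep "ns/day" serial.out np2.out
```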
All the best
Ross
> -----Original Message-----
> From: Bruno Rodrigues [mailto:bbrodrigues.gmail.com]
> Sent: Monday, August 29, 2011 8:34 PM
> To: AMBER Mailing List
> Subject: Re: [AMBER] PBS script
>
> the serial sander runs at 0.6 ns/day...
>
> may it be a problem with mpi?
>
> On Mon, Aug 29, 2011 at 7:13 PM, Bruno Rodrigues
> <bbrodrigues.gmail.com> wrote:
>
> > I've found that on the former cluster, the logfile prints out the FFT slab
> > distribution, and now it's FFT block distribution.
> > Does it mean that something has substantially changed in the way FFT
> > distributes the blocks?
> >
> >
> > On Mon, Aug 29, 2011 at 6:05 PM, Bruno Rodrigues
> > <bbrodrigues.gmail.com> wrote:
> >
> >> it's InfiniBand, at 40 Gbps.
> >>
> >>
> >> On Mon, Aug 29, 2011 at 5:59 PM, Jason Swails
> >> <jason.swails.gmail.com> wrote:
> >>
> >>> What kind of interconnect does your cluster have?
> >>>
> >>> > On Mon, Aug 29, 2011 at 4:54 PM, Bruno Rodrigues
> >>> > <bbrodrigues.gmail.com> wrote:
> >>>
> >>> > After the changes you suggested, I got this information in the output:
> >>> >
> >>> > | Dynamic Memory, Types Used:
> >>> > | Reals 688690
> >>> > | Integers 595564
> >>> >
> >>> > | Nonbonded Pairs Initial Allocation: 146264
> >>> >
> >>> > | Running AMBER/MPI version on 64 nodes
> >>> >
> >>> > and still a performance of 0.2 ns/day.
> >>> >
> >>> > There is now a log file that didn't appear before, with the following
> >>> > information:
> >>> >
> >>> >
> >>> > Initial FFT Block Distribution Based on Workload Estimate:
> >>> >
> >>> > FFT blocks assigned to 12 tasks
> >>> >
> >>> > First FFT Block Distribution Based on Actual Workload:
> >>> >
> >>> > FFT blocks assigned to 56 tasks
> >>> >
> >>> > Image Distribution at run step 344:
> >>> >
> >>> > Count of images assigned to each task:
> >>> >    340   437   412   335   542   572   542   516
> >>> >    291   256    99     1     0     0     0     0
> >>> >      0     0     0     0     0     0   230   184
> >>> >      1     0   244   352   436     6    82   219
> >>> >     23     1     2    64   137   283   173    59
> >>> >    290   133   233    81   253   198   341   173
> >>> >    280   330   367   267   157   117   407   125
> >>> >    361   374   533   455   606   646  1003   905
> >>> >
> >>> >
> >>> > What does it mean?
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > On Mon, Aug 29, 2011 at 5:34 PM, Jason Swails
> >>> > <jason.swails.gmail.com> wrote:
> >>> >
> >>> > > Every PBS system is set up differently, so it's impossible for us to
> >>> > > tell what may be happening for sure. However, I suspect that you're
> >>> > > not getting 64 CPUs like you think you are.
> >>> > >
> >>> > > On Mon, Aug 29, 2011 at 4:05 PM, Bruno Rodrigues
> >>> > > <bbrodrigues.gmail.com> wrote:
> >>> > >
> >>> > > > Dear All,
> >>> > > >
> >>> > > > I'm trying to run parallel Amber 11 on a cluster with PBS. I've
> >>> > > > checked the parallel installation and it's quite fine (the log file
> >>> > > > attached).
> >>> > > >
> >>> > > > However, the performance is always between 0.1 and 0.5 ns/day, no
> >>> > > > matter the number of processors I choose. Is there something missing
> >>> > > > in my script?
> >>> > > >
> >>> > > > Here are the changes I made to my configure (for the parallel
> >>> > > > version):
> >>> > > > mpicc --> icc -lmpi
> >>> > > > mpif90 --> ifort -lmpi
> >>> > > >
> >>> > > > This generated the correct config.h needed for the Fortran compiler.
> >>> > > >
> >>> > > > However, the problem persists with the GNU build, so I guess it has
> >>> > > > nothing to do with the installation; it's pretty much a submission
> >>> > > > problem. Here is an example of my job:
> >>> > > >
> >>> > > > #!/bin/bash
> >>> > > > #
> >>> > > > #################################################
> >>> > > > # THIS JOB IS TO EQUILIBRATE THE SYSTEM AT 300K #
> >>> > > > # TO BE USED IN FUTURE SIMULATIONS. IT STARTS #
> >>> > > > # FROM THE EQUILIBRATION ON CHACOBO, WHERE 1ns #
> >>> > > > # WAS PERFORMED AFTER THE DNA WAS RELEASED. #
> >>> > > > #################################################
> >>> > > > #
> >>> > > > #PBS -S /bin/sh
> >>> > > > #
> >>> > > > # Nome do job
> >>> > > > #PBS -N prod_slow
> >>> > > > #
> >>> > > > #Erro na saida padrao
> >>> > > > #PBS -j oe
> >>> > > > #
> >>> > > > # Chamada do ambiente paralelo e numero de slots
> >>> > > > #PBS -l select=64:ncpus=1
> >>> > > > #PBS -l walltime=200:00:00
> >>> > > >
> >>> > > > #
> >>> > > > cd $PBS_O_WORKDIR
> >>> > > >
> >>> > > > export sander=/home/u/bbr/bin/amber11/bin/pmemd.MPI
> >>> > > >
> >>> > >
> >>> > > In here, add the line
> >>> > >
> >>> > > CPUS=`cat $PBS_NODEFILE | wc -l`
> >>> > >
> >>> > >
> >>> > > >
> >>> > > > l=heat20
> >>> > > > f=prod01
> >>> > > > mpiexec -n 64 $sander -O -i $PWD/$f.in -o $PWD/$f.out \
> >>> > > >     -inf $PWD/$f.inf -c $PWD/1D20_wat_tip3pf.$l \
> >>> > > >     -ref $PWD/1D20_wat_tip3pf.$l -r $PWD/1D20_wat_tip3pf.$f \
> >>> > > >     -p $PWD/1D20_wat_tip3pf.top -x $PWD/1D20_wat_tip3pf$f.x \
> >>> > > >     -e $PWD/1D20_wat_tip3pf$f.ene
> >>> > > >
> >>> > >
> >>> > > change the beginning to "mpiexec -n $CPUS" instead of "mpiexec -n 64".
> >>> > > pmemd.MPI should report how many processors are being used, which
> >>> > > should help you make sure that you're at least allocating all the
> >>> > > processors you want to be. You could also consider passing mpiexec the
> >>> > > PBS_NODEFILE, if you find out how to pass your mpiexec a hostfile or
> >>> > > nodefile or something (this makes sure that each thread is bound to
> >>> > > the proper processor).
> >>> > >
> >>> > > HTH,
> >>> > > Jason
> >>> > >
> >>> > > --
> >>> > > Jason M. Swails
> >>> > > Quantum Theory Project,
> >>> > > University of Florida
> >>> > > Ph.D. Candidate
> >>> > > 352-392-4032
> >>> > > _______________________________________________
> >>> > > AMBER mailing list
> >>> > > AMBER.ambermd.org
> >>> > > http://lists.ambermd.org/mailman/listinfo/amber
> >>> > >
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > --
> >>> > Bruno Barbosa Rodrigues
> >>> > PhD Student - Physics Department
> >>> > Universidade Federal de Minas Gerais - UFMG
> >>> > Belo Horizonte - Brazil
> >>> >
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >>
> >
> >
> >
> >
>
>
>
> --
> --
> Bruno Barbosa Rodrigues
> PhD Student - Physics Department
> Universidade Federal de Minas Gerais - UFMG
> Belo Horizonte - Brazil
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Aug 30 2011 - 08:00:02 PDT