Thank you, Jason.
On Tue, Aug 30, 2011 at 11:37 AM, Ross Walker <ross.rosswalker.co.uk> wrote:
> Hi Bruno,
>
> There is probably one of two things happening.
>
> 1) Your PBS script is running all the threads on the same node rather than
> distributing them. Note, you normally specify resources in PBS with
>
> #PBS -l nodes=4:ppn=8
>
> Which would give you a total of 32 MPI threads spread over 4 nodes.
>
The line I was using came from my cluster's manual, but it didn't work at all.
Yours is working, although it was not what solved my problem.
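Still, for the record, a preamble along these lines is what ended up working
here (only a sketch; the node and core counts are an example and depend on the
cluster's actual layout):

    #!/bin/bash
    #PBS -N prod_slow
    #PBS -j oe
    # 4 nodes x 8 cores = 32 MPI tasks; adjust to your cluster
    #PBS -l nodes=4:ppn=8
    #PBS -l walltime=200:00:00
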
> Try cat'ing the contents of $PBS_NODEFILE and see which nodes it shows as
> running on. You can also try 'mpirun -np 64 hostname', which will show you
> the hostname of every node an MPI thread is running on.
>
The processes were distributed correctly.
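For completeness, the quick check that confirmed it looks roughly like this
(standard PBS/Torque variables; adjust the rank count to your job):

    # list the nodes PBS actually allocated, with how many slots on each
    cat $PBS_NODEFILE | sort | uniq -c

    # launch one trivial task per MPI rank and see where the ranks land
    mpirun -np 64 hostname | sort | uniq -c
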
> 2) Your mpi is messed up in some way. One possibility is that the mpi you
> are using is not the infiniband one and thus it is running over the
> Ethernet
> interconnect instead of the IB interface. You should probably try to use
> mvapich2 as the mpi library and try running some of the bandwidth and ping
> tests that come with that to make sure everything is performing correctly.
> Then make sure you recompile AMBER with this mpi version.
>
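(For anyone hitting the same problem: a quick way to check this is to run the
point-to-point tests that ship with MVAPICH2, the OSU micro-benchmarks. The
install path below is only a guess for illustration; adjust it to wherever
your MPI actually lives.)

    # hypothetical install location -- the OSU tests ship with MVAPICH2
    OSU=/opt/mvapich2/libexec/osu-micro-benchmarks/mpi/pt2pt

    # run two ranks, ideally on two different nodes (how to place them on
    # separate nodes depends on your mpirun's -hostfile/-machinefile option)
    mpirun -np 2 $OSU/osu_bw        # 40 Gbps IB should reach a few GB/s
    mpirun -np 2 $OSU/osu_latency   # and a few microseconds of latency
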
The MPI was incredibly messed up. I was using SGI's MPT with the GNU
compilers. I ended up having to change many, many flags in the configure files
as well as in config.h, and in the end I realized it should not be THAT
complicated (playing with mpicc --> gcc -lmpi and so on).
Then I asked the administrator to install mpicc, mpif90 and a proper MPI.
Result: my simulations are running as fast as Sonic!
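With a proper MPI in place, the job body now follows Jason's suggestion quoted
further down. This is only a sketch of it; whether your mpiexec also needs the
nodefile passed explicitly (and under which flag) depends on the
implementation:

    cd $PBS_O_WORKDIR
    export sander=/home/u/bbr/bin/amber11/bin/pmemd.MPI

    # one MPI rank per line in the nodefile, instead of hard-coding 64
    CPUS=`cat $PBS_NODEFILE | wc -l`

    l=heat20
    f=prod01
    mpiexec -n $CPUS $sander -O -i $PWD/$f.in -o $PWD/$f.out -inf $PWD/$f.inf \
        -c $PWD/1D20_wat_tip3pf.$l -ref $PWD/1D20_wat_tip3pf.$l \
        -r $PWD/1D20_wat_tip3pf.$f -p $PWD/1D20_wat_tip3pf.top \
        -x $PWD/1D20_wat_tip3pf$f.x -e $PWD/1D20_wat_tip3pf$f.ene
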
> I would suggest starting at just 2 MPI tasks. Start by running pmemd in
> serial and see what performance you get. Then try running the MPI version
> on 2 cores and see what happens. Try the standard benchmarks included here:
> http://ambermd.org/amber11_bench_files/Amber11_Benchmark_Suite.tar.gz
>
> All the best
> Ross
>
> > -----Original Message-----
> > From: Bruno Rodrigues [mailto:bbrodrigues.gmail.com]
> > Sent: Monday, August 29, 2011 8:34 PM
> > To: AMBER Mailing List
> > Subject: Re: [AMBER] PBS script
> >
> > the serial sander runs at 0.6 ns/day...
> >
> > could it be a problem with MPI?
> >
> > On Mon, Aug 29, 2011 at 7:13 PM, Bruno Rodrigues
> > <bbrodrigues.gmail.com>wrote:
> >
> > > I've found that on the former cluster the logfile prints out the FFT
> > > slab distribution, and now it's an FFT block distribution.
> > > Does it mean that something has substantially changed in the way FFT
> > > distributes the blocks?
> > >
> > >
> > > On Mon, Aug 29, 2011 at 6:05 PM, Bruno Rodrigues
> > <bbrodrigues.gmail.com>wrote:
> > >
> > >> it's InfiniBand, at 40 Gbps.
> > >>
> > >>
> > >> On Mon, Aug 29, 2011 at 5:59 PM, Jason Swails
> > <jason.swails.gmail.com>wrote:
> > >>
> > >>> What kind of interconnect does your cluster have?
> > >>>
> > >>> On Mon, Aug 29, 2011 at 4:54 PM, Bruno Rodrigues
> > <bbrodrigues.gmail.com
> > >>> >wrote:
> > >>>
> > >>> > After the changes you suggested, I got this information in the
> > >>> > output:
> > >>> >
> > >>> > | Dynamic Memory, Types Used:
> > >>> > | Reals 688690
> > >>> > | Integers 595564
> > >>> >
> > >>> > | Nonbonded Pairs Initial Allocation: 146264
> > >>> >
> > >>> > | Running AMBER/MPI version on 64 nodes
> > >>> >
> > >>> > and still a performance of 0.2 ns/day.
> > >>> >
> > >>> > There is now a log file that didn't appear before, with the
> > >>> > following information:
> > >>> >
> > >>> >
> > >>> > Initial FFT Block Distribution Based on Workload Estimate:
> > >>> >
> > >>> > FFT blocks assigned to 12 tasks
> > >>> >
> > >>> > First FFT Block Distribution Based on Actual Workload:
> > >>> >
> > >>> > FFT blocks assigned to 56 tasks
> > >>> >
> > >>> > Image Distribution at run step 344:
> > >>> >
> > >>> > Count of images assigned to each task:
> > >>> >    340   437   412   335   542   572   542   516
> > >>> >    291   256    99     1     0     0     0     0
> > >>> >      0     0     0     0     0     0   230   184
> > >>> >      1     0   244   352   436     6    82   219
> > >>> >     23     1     2    64   137   283   173    59
> > >>> >    290   133   233    81   253   198   341   173
> > >>> >    280   330   367   267   157   117   407   125
> > >>> >    361   374   533   455   606   646  1003   905
> > >>> >
> > >>> >
> > >>> > What does it mean?
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>> > On Mon, Aug 29, 2011 at 5:34 PM, Jason Swails
> > <jason.swails.gmail.com
> > >>> > >wrote:
> > >>> >
> > >>> > > Every PBS system is set up differently, so it's impossible for us
> > >>> > > to tell what may be happening for sure. However, I suspect that
> > >>> > > you're not getting 64 CPUs like you think you are.
> > >>> > >
> > >>> > > On Mon, Aug 29, 2011 at 4:05 PM, Bruno Rodrigues <
> > >>> bbrodrigues.gmail.com
> > >>> > > >wrote:
> > >>> > >
> > >>> > > > Dear All,
> > >>> > > >
> > >>> > > > I'm trying to run parallel Amber 11 on a cluster with PBS. I've
> > >>> > > > checked the parallel installation and it's quite fine (the log
> > >>> > > > file attached).
> > >>> > > >
> > >>> > > > However, the performance is always between 0.1 and 0.5 ns/day,
> > >>> > > > no matter the number of processors I choose. Is there something
> > >>> > > > missing in my script?
> > >>> > > >
> > >>> > > > Here are the changes I made to my configure (for the parallel
> > >>> > > > version):
> > >>> > > > mpicc --> icc -lmpi
> > >>> > > > mpif90 --> ifort -lmpi
> > >>> > > >
> > >>> > > > This generated the correct config.h needed for the Fortran
> > >>> > > > compiler.
> > >>> > > >
> > >>> > > > However, the problem persists with the GNU build, so I guess it
> > >>> > > > has nothing to do with the installation; it is pretty much a
> > >>> > > > submission problem. Here is an example of my job:
> > >>> > > >
> > >>> > > > #!/bin/bash
> > >>> > > > #
> > >>> > > > #################################################
> > >>> > > > # THIS JOB IS TO EQUILIBRATE THE SYSTEM AT 300K #
> > >>> > > > # TO BE USED IN FUTURE SIMULATIONS. IT STARTS #
> > >>> > > > # FROM THE EQUILIBRATION ON CHACOBO, WHERE 1ns #
> > >>> > > > # WAS PERFORMED AFTER THE DNA WAS RELEASED. #
> > >>> > > > #################################################
> > >>> > > > #
> > >>> > > > #PBS -S /bin/sh
> > >>> > > > #
> > >>> > > > # Job name
> > >>> > > > #PBS -N prod_slow
> > >>> > > > #
> > >>> > > > # Errors to standard output
> > >>> > > > #PBS -j oe
> > >>> > > > #
> > >>> > > > # Parallel environment request and number of slots
> > >>> > > > #PBS -l select=64:ncpus=1
> > >>> > > > #PBS -l walltime=200:00:00
> > >>> > > >
> > >>> > > > #
> > >>> > > > cd $PBS_O_WORKDIR
> > >>> > > >
> > >>> > > > export sander=/home/u/bbr/bin/amber11/bin/pmemd.MPI
> > >>> > > >
> > >>> > >
> > >>> > > In here, add the line
> > >>> > >
> > >>> > > CPUS=`cat $PBS_NODEFILE | wc -l`
> > >>> > >
> > >>> > >
> > >>> > > >
> > >>> > > > l=heat20
> > >>> > > > f=prod01
> > >>> > > > mpiexec -n 64 $sander -O -i $PWD/$f.in -o $PWD/$f.out \
> > >>> > > >     -inf $PWD/$f.inf -c $PWD/1D20_wat_tip3pf.$l \
> > >>> > > >     -ref $PWD/1D20_wat_tip3pf.$l -r $PWD/1D20_wat_tip3pf.$f \
> > >>> > > >     -p $PWD/1D20_wat_tip3pf.top -x $PWD/1D20_wat_tip3pf$f.x \
> > >>> > > >     -e $PWD/1D20_wat_tip3pf$f.ene
> > >>> > > >
> > >>> > >
> > >>> > > Change the beginning to "mpiexec -n $CPUS" instead of "mpiexec -n
> > >>> > > 64". pmemd.MPI should report how many processors are being used,
> > >>> > > which should help you make sure that you're at least allocating
> > >>> > > all the processors you want to be. You could also consider passing
> > >>> > > mpiexec the PBS_NODEFILE if you find out how to pass your mpiexec
> > >>> > > a hostfile or nodefile or something (this makes sure that each
> > >>> > > thread is bound to the proper processor).
> > >>> > >
> > >>> > > HTH,
> > >>> > > Jason
> > >>> > >
> > >>> > > --
> > >>> > > Jason M. Swails
> > >>> > > Quantum Theory Project,
> > >>> > > University of Florida
> > >>> > > Ph.D. Candidate
> > >>> > > 352-392-4032
> > >>> > > _______________________________________________
> > >>> > > AMBER mailing list
> > >>> > > AMBER.ambermd.org
> > >>> > > http://lists.ambermd.org/mailman/listinfo/amber
> > >>> > >
> > >>> >
> > >>> >
> > >>> >
> > >>> > --
> > >>> > --
> > >>> > Bruno Barbosa Rodrigues
> > >>> > PhD Student - Physics Department
> > >>> > Universidade Federal de Minas Gerais - UFMG
> > >>> > Belo Horizonte - Brazil
> > >>> > _______________________________________________
> > >>> > AMBER mailing list
> > >>> > AMBER.ambermd.org
> > >>> > http://lists.ambermd.org/mailman/listinfo/amber
> > >>> >
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Jason M. Swails
> > >>> Quantum Theory Project,
> > >>> University of Florida
> > >>> Ph.D. Candidate
> > >>> 352-392-4032
> > >>> _______________________________________________
> > >>> AMBER mailing list
> > >>> AMBER.ambermd.org
> > >>> http://lists.ambermd.org/mailman/listinfo/amber
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> --
> > >> Bruno Barbosa Rodrigues
> > >> PhD Student - Physics Department
> > >> Universidade Federal de Minas Gerais - UFMG
> > >> Belo Horizonte - Brazil
> > >>
> > >
> > >
> > >
> > > --
> > > --
> > > Bruno Barbosa Rodrigues
> > > PhD Student - Physics Department
> > > Universidade Federal de Minas Gerais - UFMG
> > > Belo Horizonte - Brazil
> > >
> >
> >
> >
> > --
> > --
> > Bruno Barbosa Rodrigues
> > PhD Student - Physics Department
> > Universidade Federal de Minas Gerais - UFMG
> > Belo Horizonte - Brazil
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
>
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
--
--
Bruno Barbosa Rodrigues
PhD Student - Physics Department
Universidade Federal de Minas Gerais - UFMG
Belo Horizonte - Brazil
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber