I've found that on the former cluster the logfile printed the FFT slab
distribution, while now it reports an FFT block distribution.
Does this mean that something has substantially changed in the way the FFT
blocks are distributed?
On Mon, Aug 29, 2011 at 6:05 PM, Bruno Rodrigues <bbrodrigues.gmail.com> wrote:
> it's InfiniBand, at 40 Gbps.
>
>
> On Mon, Aug 29, 2011 at 5:59 PM, Jason Swails <jason.swails.gmail.com> wrote:
>
>> What kind of interconnect does your cluster have?
>>
>> > On Mon, Aug 29, 2011 at 4:54 PM, Bruno Rodrigues <bbrodrigues.gmail.com> wrote:
>>
>> > After the changes you suggested, I got this information in the output:
>> >
>> > | Dynamic Memory, Types Used:
>> > | Reals 688690
>> > | Integers 595564
>> >
>> > | Nonbonded Pairs Initial Allocation: 146264
>> >
>> > | Running AMBER/MPI version on 64 nodes
>> >
>> > and still a performance of 0.2 ns/day.
>> >
>> > There is now a log file that didn't appear before, with the following
>> > information:
>> >
>> >
>> > Initial FFT Block Distribution Based on Workload Estimate:
>> >
>> > FFT blocks assigned to 12 tasks
>> >
>> > First FFT Block Distribution Based on Actual Workload:
>> >
>> > FFT blocks assigned to 56 tasks
>> >
>> > Image Distribution at run step 344:
>> >
>> > Count of images assigned to each task:
>> >   340  437  412  335  542  572  542  516
>> >   291  256   99    1    0    0    0    0
>> >     0    0    0    0    0    0  230  184
>> >     1    0  244  352  436    6   82  219
>> >    23    1    2   64  137  283  173   59
>> >   290  133  233   81  253  198  341  173
>> >   280  330  367  267  157  117  407  125
>> >   361  374  533  455  606  646 1003  905
>> >
>> >
>> > What does it mean?
>> >
>> >
>> >
>> >
>> > On Mon, Aug 29, 2011 at 5:34 PM, Jason Swails <jason.swails.gmail.com> wrote:
>> >
>> > > Every PBS system is set up differently, so it's impossible for us to
>> > > tell what may be happening for sure. However, I suspect that you're
>> > > not getting 64 CPUs like you think you are.
>> > >
>> > > On Mon, Aug 29, 2011 at 4:05 PM, Bruno Rodrigues <bbrodrigues.gmail.com> wrote:
>> > >
>> > > > Dear All,
>> > > >
>> > > > I'm trying to run parallel Amber 11 on a cluster with PBS. I've
>> > > > checked the parallel installation and it looks fine (log file attached).
>> > > >
>> > > > However, the performance is always between 0.1 and 0.5 ns/day, no
>> > > > matter the number of processors I choose. Is there something
>> > > > missing in my script?
>> > > >
>> > > > Here are the changes I made to configure (for the parallel version):
>> > > > mpicc --> icc -lmpi
>> > > > mpif90 --> ifort -lmpi
>> > > >
>> > > > This generated the correct config.h needed for the fortran compiler.
>> > > >
>> > > > However, the problem persists with the GNU build as well, so I
>> > > > guess it has nothing to do with the installation; it's most likely
>> > > > a submission problem. Here is an example of my job:
>> > > >
>> > > > #!/bin/bash
>> > > > #
>> > > > #################################################
>> > > > # THIS JOB IS TO EQUILIBRATE THE SYSTEM AT 300K #
>> > > > # TO BE USED IN FUTURE SIMULATIONS. IT STARTS #
>> > > > # FROM THE EQUILIBRATION ON CHACOBO, WHERE 1ns #
>> > > > # WAS PERFORMED AFTER THE DNA WAS RELEASED. #
>> > > > #################################################
>> > > > #
>> > > > #PBS -S /bin/sh
>> > > > #
>> > > > # Job name
>> > > > #PBS -N prod_slow
>> > > > #
>> > > > # Merge standard error into standard output
>> > > > #PBS -j oe
>> > > > #
>> > > > # Parallel environment request and number of slots
>> > > > #PBS -l select=64:ncpus=1
>> > > > #PBS -l walltime=200:00:00
>> > > >
>> > > > #
>> > > > cd $PBS_O_WORKDIR
>> > > >
>> > > > export sander=/home/u/bbr/bin/amber11/bin/pmemd.MPI
>> > > >
>> > >
>> > > Here, add the line
>> > >
>> > > CPUS=`cat $PBS_NODEFILE | wc -l`
>> > >
>> > >
>> > > >
>> > > > l=heat20
>> > > > f=prod01
>> > > > mpiexec -n 64 $sander -O -i $PWD/$f.in -o $PWD/$f.out \
>> > > >     -inf $PWD/$f.inf -c $PWD/1D20_wat_tip3pf.$l \
>> > > >     -ref $PWD/1D20_wat_tip3pf.$l -r $PWD/1D20_wat_tip3pf.$f \
>> > > >     -p $PWD/1D20_wat_tip3pf.top -x $PWD/1D20_wat_tip3pf$f.x \
>> > > >     -e $PWD/1D20_wat_tip3pf$f.ene
>> > > >
>> > >
>> > > Then change the launch line to "mpiexec -n $CPUS" instead of
>> > > "mpiexec -n 64". pmemd.MPI reports how many processors are in use,
>> > > which should help you confirm that you are actually allocating all
>> > > the processors you requested. You could also consider passing
>> > > PBS_NODEFILE to mpiexec as a hostfile/machinefile, if your mpiexec
>> > > accepts one (this makes sure that each thread is bound to the proper
>> > > processor).
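>> > > Putting both suggestions together, the relevant part of the script
>> > > would look like the sketch below. The nodefile created here is only
>> > > a stand-in for the one PBS exports as $PBS_NODEFILE, and the
>> > > -machinefile flag name varies between MPI implementations (some use
>> > > -hostfile or -f):

```shell
#!/bin/sh
# Sketch only: under PBS, $PBS_NODEFILE is set by the scheduler and lists
# one line per allocated slot. We fake a 4-slot allocation here just to
# show how the CPU count is derived.
PBS_NODEFILE=$(mktemp)
printf 'node01\nnode01\nnode02\nnode02\n' > "$PBS_NODEFILE"

# Count the allocated slots instead of hard-coding -n 64.
CPUS=$(wc -l < "$PBS_NODEFILE")
echo "allocated CPUs: $CPUS"    # prints: allocated CPUs: 4

# The real launch would then be (shown as a comment, not executed here;
# the machinefile option name depends on your MPI implementation):
#   mpiexec -n "$CPUS" -machinefile "$PBS_NODEFILE" $sander -O -i $f.in ...

rm -f "$PBS_NODEFILE"
```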
>> > >
>> > > HTH,
>> > > Jason
>> > >
>> > > --
>> > > Jason M. Swails
>> > > Quantum Theory Project,
>> > > University of Florida
>> > > Ph.D. Candidate
>> > > 352-392-4032
>> > > _______________________________________________
>> > > AMBER mailing list
>> > > AMBER.ambermd.org
>> > > http://lists.ambermd.org/mailman/listinfo/amber
>> > >
>> >
>> >
>> >
>> > --
>> > Bruno Barbosa Rodrigues
>> > PhD Student - Physics Department
>> > Universidade Federal de Minas Gerais - UFMG
>> > Belo Horizonte - Brazil
>> >
>>
>>
>>
>> --
>> Jason M. Swails
>> Quantum Theory Project,
>> University of Florida
>> Ph.D. Candidate
>> 352-392-4032
>>
>
>
>
> --
> Bruno Barbosa Rodrigues
> PhD Student - Physics Department
> Universidade Federal de Minas Gerais - UFMG
> Belo Horizonte - Brazil
>
--
Bruno Barbosa Rodrigues
PhD Student - Physics Department
Universidade Federal de Minas Gerais - UFMG
Belo Horizonte - Brazil
Received on Mon Aug 29 2011 - 15:30:02 PDT