On Fri, Sep 21, 2012 at 1:51 AM, marawan hussain
<marawanhussain.yahoo.com> wrote:
> Hi Jason,
> I followed your suggestion and use the following script:
>
> #!/bin/bash
> #PBS -l mem=1gb
> #PBS -l walltime=01:10:00
> #PBS -N m8_npt
>
> echo "Num Procs is `cat $PBS_NODEFILE | wc -l`"
> source /usr/local/modules/init/bash
> module load amber/x86_64/gnu/12_mpi
> module load mvapich2/1.8
> cd $PBS_O_WORKDIR
>
> mpirun -f NODEFILE -np 8 pmemd.MPI -O -i eq_1_heat.in -p
> com_solvated_m8.top -c min_solventonly_5.rst -r eq_1_heat.rst -x
> eq_1_heat.mdcrd -o eq_1_heat.out -ref min_solventonly_5.rst
>
This should be $PBS_NODEFILE, not NODEFILE (notice the PBS_ and the leading
$). Again, I highly suggest removing the "-np 8" and letting PBS tell mpirun
how many processes to start.
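A minimal sketch of what the corrected launch could look like. It keeps the "-f" hostfile flag from your script and, instead of a hard-coded "-np 8", reads the process count from the PBS nodefile (an explicit fallback in case your mpirun does not pick the count up from the hostfile by itself). The count_procs helper is a name I made up for illustration; it is not part of PBS or Amber.

```shell
#!/bin/bash
# Sketch only. count_procs is a hypothetical helper, not a PBS or Amber tool.

# The PBS nodefile has one line per assigned processor,
# so the number of non-empty lines is the number of MPI processes to start.
count_procs() {
    grep -c . "$1"
}

# Only launch inside a PBS job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    NP=$(count_procs "$PBS_NODEFILE")
    echo "Launching $NP processes from $PBS_NODEFILE"
    mpirun -f "$PBS_NODEFILE" -np "$NP" pmemd.MPI -O -i eq_1_heat.in \
        -p com_solvated_m8.top -c min_solventonly_5.rst -r eq_1_heat.rst \
        -x eq_1_heat.mdcrd -o eq_1_heat.out -ref min_solventonly_5.rst
fi
```

Run outside of a PBS job, this does nothing, since PBS_NODEFILE is not set there.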
>
>
>
> I use this NODEFILE:
> node001
> node002
> node003
> node004
> node005
> node006
> node007
> node008
>
You should use the node file provided by PBS. To find out what this is,
you can use a command like:
echo =========
cat $PBS_NODEFILE
echo =========
Which should print the PBS-provided nodefile between 2 lines of ='s.
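If the raw list is long, a per-node summary can be easier to read. The sketch below assumes the usual PBS nodefile layout, where each hostname is repeated once per processor assigned on that node; summarize_nodefile is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only. summarize_nodefile is a hypothetical helper.

# Print "hostname: count" for each node, where count is the number of
# times the hostname appears (i.e. processors assigned on that node).
summarize_nodefile() {
    sort "$1" | uniq -c | awk '{ printf "%s: %d\n", $2, $1 }'
}

# Inside a PBS job, summarize the nodefile PBS provided.
if [ -n "${PBS_NODEFILE:-}" ]; then
    summarize_nodefile "$PBS_NODEFILE"
fi
```

A nodefile with a single line will show up here as one node with a count of 1.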
> Strangely enough, i didn't get any output, although when i view the running
> jobs under my account the job is still running....I also noticed
> this behavior before when i was using the (#PBS -l nodes=8) in the script
> before..
>
PBS redirects stdout and stderr to different files whose default names
depend on the PBS configuration. You can specify the names of the output
and error files using the lines
#PBS -o pbs.out
#PBS -e pbs.err
(or you can make the second line "#PBS -j oe" to print the output and error
to the same place). Then look in the pbs.out file for the "Num Procs is
X" line.
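As a rough sketch, the top of the job script could look like this. The directives have to come before the first executable command, and the file name pbs.out is just the example name used above. The /dev/null fallback is my own addition so the snippet also runs outside of a PBS job.

```shell
#!/bin/bash
#PBS -l walltime=01:10:00
#PBS -N m8_npt
#PBS -o pbs.out
#PBS -j oe

# Count the lines of the PBS-provided nodefile (one per assigned processor).
# Outside a PBS job the variable is unset, so fall back to an empty file.
nodefile=${PBS_NODEFILE:-/dev/null}
num_procs=$(grep -c . "$nodefile" || true)
echo "Num Procs is $num_procs"
```

With "-j oe", anything the job writes to stderr ends up in pbs.out as well.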
> Also, i removed the -f NODEFILE keyword from the command line and typed the
> command you sent me, the output contained this line:
>
> Num Procs is 1
>
This means your PBS_NODEFILE has a single line (and therefore PBS assigned
your job a single processor).
> I also got this logfile:
>
>
>
> Static FFT Slab Distribution:
>
> FFT slabs assigned to 8 tasks
> Maximum of 12 xy slabs per task
> Maximum of 12 zx slabs per task
> Count of FFT xy slabs assigned to each task:
> 12 12 12 12 12 12 12 12
> Count of FFT xz slabs assigned to each task:
> 12 12 11 11 11 11 11 11
>
This is a pmemd log file, unrelated to PBS in any way.
>
> Could you please comment...I feel as if one processor is doing 8
> jobs....Is it true....???
>
It appears so. And this will absolutely trash scaling. It will actually
probably spread out the jobs to other processors on the same node (which
will annoy anybody else trying to run jobs on that node, since you are
using processors you are not assigned to!) In fact, many sysadmins will
kill jobs that use more than they requested like this if they find them.
If you have only 8 cores per node (which is very common), then this may use
all 8 processors in that node (you will be lucky if nobody is assigned the
other processors in that node). In this case, you can expect *some*
speedup up to 8 cores, but you should expect scaling to end after that,
since you will be running multiple MPI processes on the same core. (This may
explain the scaling behavior you've observed.)
You should ask your sysadmin how to request nodes and such. A popular
option is to request nodes like:
#PBS -l nodes=8:ppn=8
Which will request 8 nodes, and 8 processors on each node (for 64
processors total). This will set up a PBS_NODEFILE with 64 lines. If
properly passed to mpirun, this will just 'do the right thing'.
You can see http://jswails.wikidot.com/using-pbs for a mini tutorial on
using PBS.
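To illustrate what the nodes=8:ppn=8 request should produce, here is a sketch that builds a mock nodefile in the layout PBS typically uses (each hostname repeated once per processor) and counts it the same way your script does. The node001-style hostnames are borrowed from your NODEFILE, and make_mock_nodefile is a hypothetical helper; on the cluster you would read $PBS_NODEFILE instead.

```shell
#!/bin/bash
# Sketch only. make_mock_nodefile is a hypothetical helper that imitates
# the nodefile layout: one line per processor, hostname repeated ppn times.
make_mock_nodefile() {
    local nodes=$1 ppn=$2 n p
    for ((n = 1; n <= nodes; n++)); do
        for ((p = 1; p <= ppn; p++)); do
            printf 'node%03d\n' "$n"
        done
    done
}

mock=$(mktemp)
make_mock_nodefile 8 8 > "$mock"

# Same count the job script prints; prints "Num Procs is 64" here.
echo "Num Procs is $(grep -c . "$mock")"
# Number of distinct hosts; prints "Distinct nodes: 8" here.
echo "Distinct nodes: $(sort -u "$mock" | grep -c .)"

rm -f "$mock"
```

If your real job prints "Num Procs is 1" after such a request, the resource line is not being honored and that is worth raising with your sysadmin.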
HTH,
Jason
--
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Sep 21 2012 - 05:30:02 PDT