Re: [AMBER] sander.MPI

From: Fabian Glaser <fglaser.technion.ac.il>
Date: Tue, 18 Dec 2012 13:21:05 +0200

Hi all,

I am still having problems running on more than one node. I followed Jason's kind suggestions to try to track down the problem but found nothing significant; please see our system people's answers below:


" Given the
> > > > communication required on each step, though, unless your nodes
have a fast
> > > > interconnect (e.g., some type of infiniband) "

System people's answer: Tamnun (our cluster) is designed for multi-node processing, so the above is not an issue.
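
(If it would help, I can also check this myself on one of the compute nodes. Assuming the standard InfiniBand diagnostic tools are installed there, something like:

ibstat          # each port's "State:" line should read Active
ibv_devinfo     # "state:" should read PORT_ACTIVE

should show whether an InfiniBand adapter is present and up.)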


"If you're really having a problem running on multiple nodes, the issue
> > > > > is
> > > > > probably somewhere in your system configuration or your MPI
> > > > > installation."

Answer: Other users use MPI without problems.

" If you want to test inter-node MPI with a
> > > > > very
> > > > > simple program, try running something like this:
> > > > >
> > > > > mpiexec -hostfile $PBS_NODEFILE $AMBERHOME/test/numprocs
> > > > >
> > > > > Which should just output the total number of processors you
asked
for to
> > > > > the PBS output file (#PBS -o <pbs_output>)"

I tried the test, and the results are as expected; PBS appears to be able to access all 24 processors:

#!/bin/sh
#
#PBS -N test
#PBS -q nano_h_p
#PBS -M fglaser.technion.ac.il
#PBS -m bea
# request 2 nodes with 12 cores each, 12 MPI ranks per node
#PBS -l select=2:ncpus=12:mpiprocs=12
#PBS -o pbs_output

# print the number of MPI processes launched across the allocated nodes
mpirun -hostfile $PBS_NODEFILE $AMBERHOME/test/numprocs

This is the content of pbs_output in each case.

For #PBS -l select=1:ncpus=12:mpiprocs=12
the output is:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
12

For #PBS -l select=2:ncpus=12:mpiprocs=12
the output is:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
24
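
(One more sanity check I can run: assuming our mpirun accepts the same -hostfile option, launching a trivial command across the allocation, e.g.

mpirun -hostfile $PBS_NODEFILE hostname    # should print 24 host names, 12 per node

inside the same select=2 job would confirm that MPI actually starts processes on both nodes rather than just counting slots.)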

I would appreciate your help in trying to find the problem.

Thanks!!

Fabian
_______________________________
Fabian Glaser, PhD
Bioinformatics Knowledge Unit,
The Lorry I. Lokey Interdisciplinary
Center for Life Sciences and Engineering

Technion - Israel Institute of Technology
Haifa 32000, ISRAEL
fglaser.technion.ac.il
Tel: +972 4 8293701
Fax: +972 4 8225153

On Dec 16, 2012, at 4:48 PM, Jason Swails wrote:

> On Sun, Dec 16, 2012 at 2:21 AM, Fabian Glaser <fglaser.technion.ac.il> wrote:
>
>> Hi,
>>
>> I am using the following PBS file to run sander
>>
>> #PBS -l select=1:ncpus=12:mpiprocs=12
>> ...
>> mpirun -hostfile $PBS_NODEFILE pmemd.MPI -O -i prod.in -p
>> 3SO6_clean.prmtop -c 3SO6_clean_prod_1.rst -o 3SO6_clean_prod_2.out -x
>> 3SO6_clean_prod_2.mdcrd -r 3SO6_clean_prod_2.rst
>>
>> Which runs perfectly, at a rate of about ns/day = 3.67
>>
>> But if I try to use more than one node, for example:
>> #PBS -l select=2:ncpus=12:mpiprocs=12
>>
>> The job does not seem to start, or at least the output files are not written...
>>
>> Is there a way to use more than one node? Or any way to accelerate the
>> process?
>>
>
> I have never had problems running on multiple nodes. Given the
> communication required on each step, though, unless your nodes have a fast
> interconnect (e.g., some type of infiniband) you will be better off just
> using 1 node if each node has 12 cores available, IMO.
>
> If you're really having a problem running on multiple nodes, the issue is
> probably somewhere in your system configuration or your MPI installation.
> Some systems may require you to set up password-less login between nodes
> using an ssh-key, since multi-node jobs need to send information between
> nodes. Since we just use the MPI API, the problem is highly unlikely to be
> Amber.
>
> I would suggest contacting your system administrator for this cluster with
> the problems you're having. If you want to test inter-node MPI with a very
> simple program, try running something like this:
>
> mpiexec -hostfile $PBS_NODEFILE $AMBERHOME/test/numprocs
>
> Which should just output the total number of processors you asked for to
> the PBS output file (#PBS -o <pbs_output>)
>
> Good luck,
> Jason
>
> --
> Jason M. Swails
> Quantum Theory Project,
> University of Florida
> Ph.D. Candidate
> 352-392-4032


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Dec 18 2012 - 03:30:03 PST