Hi all,
I am still having problems running on more than one node. I followed Jason's kind suggestions to try to track down the problem and found nothing significant. Please see our system people's answers below:
" Given the
> > > > communication required on each step, though, unless your nodes
have a fast
> > > > interconnect (e.g., some type of infiniband) "
System people answer: Tamnun (our cluster) is designed for multinode proccessing, so the above isn't a question.
"If you're really having a problem running on multiple nodes, the issue
> > > > > is
> > > > > probably somewhere in your system configuration or your MPI
> > > > > installation."
Answer: Other user use MPI without problems.
" If you want to test inter-node MPI with a
> > > > > very
> > > > > simple program, try running something like this:
> > > > >
> > > > > mpiexec -hostfile $PBS_NODEFILE $AMBERHOME/test/numprocs
> > > > >
> > > > > Which should just output the total number of processors you
asked
for to
> > > > > the PBS output file (#PBS -o <pbs_output>)"
I tried, the answers are as expected, that is:
I tried the test, it appears that the PBS is able to access the 24 processors:
#!/bin/sh
#
#PBS -N test
#PBS -q nano_h_p
#PBS -M fglaser.technion.ac.il
#PBS -m bea
#PBS -l select=2:ncpus=12:mpiprocs=12
#PBS -o pbs_output
mpirun -hostfile $PBS_NODEFILE $AMBERHOME/test/numprocs
This is the content of pbs_output.
For #PBS -l select=1:ncpus=12:mpiprocs=12
the output is:
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
12
For #PBS -l select=2:ncpus=12:mpiprocs=12
the output is:
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
24
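In case it helps with debugging, here is a minimal sketch of a further test I could run to check on which hosts the MPI processes actually start. It reuses the same queue and select line as above, and assumes our mpirun accepts -hostfile as in my script; the job name and output file name are only placeholders:

#!/bin/sh
#PBS -N mpi_host_test
#PBS -q nano_h_p
#PBS -l select=2:ncpus=12:mpiprocs=12
#PBS -j oe
#PBS -o pbs_host_output

# Show which nodes PBS allocated to this job
echo "Contents of PBS_NODEFILE:"
cat $PBS_NODEFILE

# Launch one process per allocated slot; each process just reports its
# host name, so if inter-node MPI startup works the output should list
# each of the two nodes 12 times.
mpirun -hostfile $PBS_NODEFILE hostname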
I would appreciate your help in trying to find the problem.
thanks!!
Fabian
_______________________________
Fabian Glaser, PhD
Bioinformatics Knowledge Unit,
The Lorry I. Lokey Interdisciplinary
Center for Life Sciences and Engineering
Technion - Israel Institute of Technology
Haifa 32000, ISRAEL
fglaser.technion.ac.il
Tel: +972 4 8293701
Fax: +972 4 8225153
On Dec 16, 2012, at 4:48 PM, Jason Swails wrote:
> On Sun, Dec 16, 2012 at 2:21 AM, Fabian Glaser <fglaser.technion.ac.il>wrote:
>
>> Hi,
>>
>> I am using the following PBS file to run sander
>>
>> #PBS -l select=1:ncpus=12:mpiprocs=12
>> ...
>> mpirun -hostfile $PBS_NODEFILE pmemd.MPI -O -i prod.in -p
>> 3SO6_clean.prmtop -c 3SO6_clean_prod_1.rst -o 3SO6_clean_prod_2.out -x
>> 3SO6_clean_prod_2.mdcrd -r 3SO6_clean_prod_2.rst
>>
>> Which runs perfectly, at a rate of about ns/day = 3.67
>>
>> But if I try to use more than one node, for example:
>> #PBS -l select=2:ncpus=12:mpiprocs=12
>>
>> The job does not seem to start, or at least output files are not written...
>>
>> Is there a way to use more than one node? Or any way to accelerate the
>> process?
>>
>
> I have never had problems running on multiple nodes. Given the
> communication required on each step, though, unless your nodes have a fast
> interconnect (e.g., some type of infiniband) you will be better off just
> using 1 node if each node has 12 cores available, IMO.
>
> If you're really having a problem running on multiple nodes, the issue is
> probably somewhere in your system configuration or your MPI installation.
> Some systems may require you to set up password-less login between nodes
> using an ssh-key, since multi-node jobs need to send information between
> nodes. Since we just use the MPI API, the problem is highly unlikely to be
> Amber.
>
> I would suggest contacting your system administrator for this cluster with
> the problems you're having. If you want to test inter-node MPI with a very
> simple program, try running something like this:
>
> mpiexec -hostfile $PBS_NODEFILE $AMBERHOME/test/numprocs
>
> Which should just output the total number of processors you asked for to
> the PBS output file (#PBS -o <pbs_output>)
>
> Good luck,
> Jason
>
> --
> Jason M. Swails
> Quantum Theory Project,
> University of Florida
> Ph.D. Candidate
> 352-392-4032
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber