Hi,
I've managed to rerun the minimization case and tee the output to a file
(strangely, the cluster sometimes gives more information than at other
times).
I think from the error message below that you are right. The message
says that the capacity for NB pairs was exceeded, yet the numbers do not
seem to confirm this: the counts reported are smaller than the capacity
(in every error message the limit, 2424, is larger than each of the
numbers printed before it).
Or does this point to something else?
Thank you,
Ioana
Here is a more explicit error message from my minimization run:
Job 56060.lomax.nas.nasa.gov started on Wed Dec 18 17:47:11 PST 2002
mpirun -np 128 $AMBERHOME/exe/sander -O -i ./box20Amin.in -o ./box20Amin_128cpu.out -p ./box20A.prmtop -c ./box20A.prmcrd -r ./box20A_128cpu.restrt -ref ./box20A.prmcrd -inf ./box20A.128cpu.mdinfo
* NB pairs 171 2394 exceeds capacity ( 2424) 2
SIZE OF NONBOND LIST = 2424
EWALD BOMB in subroutine ewald_list
Non bond list overflow!
check MAXPR in locmem.f
* NB pairs 150 2347 exceeds capacity ( 2424) 13
SIZE OF NONBOND LIST = 2424
EWALD BOMB in subroutine ewald_list
Non bond list overflow!
check MAXPR in locmem.f
* NB pairs 159 2325 exceeds capacity ( 2424) 14
SIZE OF NONBOND LIST = 2424
EWALD BOMB in subroutine ewald_list
Non bond list overflow!
check MAXPR in locmem.f
* NB pairs 206 2360 exceeds capacity ( 2424) 67
SIZE OF NONBOND LIST = 2424
EWALD BOMB in subroutine ewald_list
Non bond list overflow!
check MAXPR in locmem.f
* NB pairs 137 2382 exceeds capacity ( 2424) 69
SIZE OF NONBOND LIST = 2424
EWALD BOMB in subroutine ewald_list
Non bond list overflow!
check MAXPR in locmem.f
* NB pairs 179 2310 exceeds capacity ( 2424) 79
SIZE OF NONBOND LIST = 2424
EWALD BOMB in subroutine ewald_list
Non bond list overflow!
check MAXPR in locmem.f
* NB pairs 211 2249 exceeds capacity ( 2424) 81
SIZE OF NONBOND LIST = 2424
EWALD BOMB in subroutine ewald_list
Non bond list overflow!
check MAXPR in locmem.f
* NB pairs 126 2395 exceeds capacity ( 2424) 86
SIZE OF NONBOND LIST = 2424
EWALD BOMB in subroutine ewald_list
Non bond list overflow!
check MAXPR in locmem.f
* NB pairs 185 2259 exceeds capacity ( 2424) 93
SIZE OF NONBOND LIST = 2424
EWALD BOMB in subroutine ewald_list
Non bond list overflow!
check MAXPR in locmem.f
* NB pairs 249 2273 exceeds capacity ( 2424) 95
SIZE OF NONBOND LIST = 2424
EWALD BOMB in subroutine ewald_list
Non bond list overflow!
check MAXPR in locmem.f
* NB pairs 209 2394 exceeds capacity ( 2424) 97
SIZE OF NONBOND LIST = 2424
EWALD BOMB in subroutine ewald_list
Non bond list overflow!
check MAXPR in locmem.f
* NB pairs 171 2299 exceeds capacity ( 2424) 98
SIZE OF NONBOND LIST = 2424
EWALD BOMB in subroutine ewald_list
Non bond list overflow!
check MAXPR in locmem.f
* NB pairs 145 2285 exceeds capacity ( 2424) 100
SIZE OF NONBOND LIST = 2424
EWALD BOMB in subroutine ewald_list
Non bond list overflow!
check MAXPR in locmem.f
* NB pairs 161 2400 exceeds capacity ( 2424) 111
SIZE OF NONBOND LIST = 2424
EWALD BOMB in subroutine ewald_list
Non bond list overflow!
check MAXPR in locmem.f
MPI: MPI_COMM_WORLD rank 2 has terminated without calling MPI_Finalize()
exit
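
A quick check on the figures above, offered as a sketch rather than a
diagnosis. The layout of the "NB pairs" line is not explained in the
output, so the field names used here (first, second, capacity, tag) are
labels chosen for illustration, and the idea that the test might apply to
the sum of the two counts is an assumption, not something confirmed by
the sander source. The Python snippet below only parses the fourteen
error lines and compares each count, and their sum, with the reported
capacity of 2424:

```python
import re

# Error lines copied verbatim from the sander output above.
LOG = """\
* NB pairs 171 2394 exceeds capacity ( 2424) 2
* NB pairs 150 2347 exceeds capacity ( 2424) 13
* NB pairs 159 2325 exceeds capacity ( 2424) 14
* NB pairs 206 2360 exceeds capacity ( 2424) 67
* NB pairs 137 2382 exceeds capacity ( 2424) 69
* NB pairs 179 2310 exceeds capacity ( 2424) 79
* NB pairs 211 2249 exceeds capacity ( 2424) 81
* NB pairs 126 2395 exceeds capacity ( 2424) 86
* NB pairs 185 2259 exceeds capacity ( 2424) 93
* NB pairs 249 2273 exceeds capacity ( 2424) 95
* NB pairs 209 2394 exceeds capacity ( 2424) 97
* NB pairs 171 2299 exceeds capacity ( 2424) 98
* NB pairs 145 2285 exceeds capacity ( 2424) 100
* NB pairs 161 2400 exceeds capacity ( 2424) 111
"""

# Assumed layout (hypothetical field names, not taken from any manual):
#   * NB pairs <first> <second> exceeds capacity ( <capacity>) <tag>
# The meaning of <tag> is unknown here; it might be the MPI rank.
PATTERN = re.compile(
    r"\*\s*NB pairs\s+(\d+)\s+(\d+)\s+exceeds capacity\s+\(\s*(\d+)\s*\)\s+(\d+)"
)


def parse_nb_errors(text):
    """Return one dict per 'NB pairs ... exceeds capacity' line in text."""
    rows = []
    for match in PATTERN.finditer(text):
        first, second, capacity, tag = (int(g) for g in match.groups())
        rows.append(
            {
                "first": first,
                "second": second,
                "capacity": capacity,
                "tag": tag,
                "total": first + second,  # assumption: the sum is what matters
            }
        )
    return rows


rows = parse_nb_errors(LOG)
for r in rows:
    print(
        f"tag {r['tag']:>3}: {r['first']:>4} + {r['second']:>4} = {r['total']:>4}"
        f"  (capacity {r['capacity']}, sum over capacity: {r['total'] > r['capacity']})"
    )
```

For these lines, each individual count is below 2424, but the sum of the
two counts is above it in all fourteen cases, so the message may be less
contradictory than it first looks. Whether that is what ewald_list
actually tests would need to be checked against the code.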
____________________________________________________________________
On Wed, 18 Dec 2002, darden wrote:
> Dear Ioana
> I'll guess that your problem is that you have too few atoms per processor.
> A 20A water box probably has 1000 or fewer atoms. That makes
> fewer than 10 atoms per processor. Unfortunately sander (e.g. PME sander)
> may have some hidden assumptions about system size somewhere. Thus, rather
> than recognizing that it's in trouble with some aspect of the simulation,
> it simply dies. PME should, I think, be fine with that many processors,
> but with more atoms.
> Hope this helps
> Tom Darden
> On Wed, 18 Dec 2002, Ioana Cozmuta wrote:
>
> > Hello,
> >
> > Thanks for the replies and sorry for not being more explicit. In the
> > output file this is the last thing written before the job stops.
> >
> > Here is my input file (it is for a cubic water box L=20A, using PBC and a
> > cutoff of 8 A.)
> >
> > Initial minimization of the water box, 20A, PBC, 8.0 cut
> > &cntrl
> > ntx = 1, irest = 0, ntxo = 1,
> > ntpr = 1,
> > ntf = 1, ntb = 1,
> > cut = 8.0, scee = 1.2,
> > ibelly = 0, ntr = 0,
> > imin = 1,
> > maxcyc = 300,
> > ncyc = 50,
> > ntmin = 1, dx0 = 0.1, dxm = 0.5, drms = 0.0001,
> > &end
> >
> > I assumed that AMBER will automatically use PME to calculate the
> > electrostatic energy (PME seems to be faster and more accurate than simple
> > Ewald), so I did not add any explicit Ewald settings.
> >
> > Here is the command to submit the job:
> >
> > mpirun -np 128 $AMBERHOME/exe/sander -O -i ./box20Amin.in \
> > -o ./box20Amin_128cpu.out \
> > -p ./box20A.prmtop \
> > -c ./box20A.prmcrd \
> > -r ./box20A_128cpu.restrt \
> > -ref ./box20A.prmcrd \
> > -inf ./box20A.128cpu.mdinfo
> >
> > Could the problem be that PME is not designed to run on 128
> > processors? What is the limit for PME in terms of CPUs?
> > What is the corresponding limit for normal Ewald calculations?
> >
> > Thank you,
> > Ioana
> >
> >
> >
> > On Wed, 18 Dec 2002, David A. Case wrote:
> >
> > > On Tue, Dec 17, 2002, Ioana Cozmuta wrote:
> > > >
> > > > APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
> > > > using 5000.0 points per unit in tabled values
> > > > TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
> > > > | CHECK switch(x): max rel err = 0.2088E-14 at 2.598900
> > > > | CHECK d/dx switch(x): max rel err = 0.7671E-11 at 2.757160
> > > > ---------------------------------------------------
> > >
> > > Above is normal, and is not an error message. From the info given, I
> > > can't tell why the job stops.
> > >
> > > ..dac
> > >
> > > --
> > >
> > > ==================================================================
> > > David A. Case | e-mail: case_at_scripps.edu
> > > Dept. of Molecular Biology, TPC15 | fax: +1-858-784-8896
> > > The Scripps Research Institute | phone: +1-858-784-9768
> > > 10550 N. Torrey Pines Rd. | home page:
> > > La Jolla CA 92037 USA | http://www.scripps.edu/case
> > > ==================================================================
> > >
> > >
> >
>
>
Received on Wed Dec 18 2002 - 18:02:26 PST