Re: message during minimization from David A. Case on 2002-12-19 (Amber Archive Dec 2002)

From: David A. Case <case_at_scripps.edu>
Date: Thu 19 Dec 2002 08:05:00 -0800

On Wed, Dec 18, 2002, Ioana Cozmuta wrote:

[problem was trying to run minimization on 128 processors]

> Here is a more explicit error message from my minimization run:
>
> Job 56060.lomax.nas.nasa.gov started on Wed Dec 18 17:47:11 PST 2002
> mpirun -np 128 $AMBERHOME/exe/sander -O -i ./box20Amin.in -o ./box20Amin_128cpu.out -p ./box20A.prmtop -c ./box20A.prmcrd -r ./box20A_128cpu.restrt -ref ./box20A.prmcrd -inf ./box20A.128cpu.mdinfo
>
> * NB pairs 171 2394 exceeds capacity ( 2424) 2
> SIZE OF NONBOND LIST = 2424
> EWALD BOMB in subroutine ewald_list
> Non bond list overflow!
> check MAXPR in locmem.f
>

OK...here's what is happening: Amber assumes that the nonbonded list can
be divided fairly evenly among all processors. With a large number of
CPUs, the "granularity" becomes coarse enough that the algorithm for the
division no longer works. In your case, the assumed size of the nonbonded
list for each processor is very small (only 2424 elements), but some
processors require more than this.
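To make the granularity argument concrete, here is a toy model in Python. It is not sander's code. It assumes, hypothetically, that atoms are split into contiguous blocks of equal atom count, that each atom's pairs stay with the processor that owns it, and that the per-processor capacity is an even share of the total list plus a fixed 20% slack. The names `per_processor_loads` and `even_split_capacity` and the pair counts are made up for illustration.

```python
def per_processor_loads(pairs_per_atom, nproc):
    """Hypothetical split: nproc contiguous blocks of (nearly) equal atom
    count; each atom's pairs stay with its owning block. Returns pairs per block."""
    natom = len(pairs_per_atom)
    loads = []
    for p in range(nproc):
        start = p * natom // nproc
        end = (p + 1) * natom // nproc
        loads.append(sum(pairs_per_atom[start:end]))
    return loads


def even_split_capacity(total_pairs, nproc, slack_percent=20):
    """Hypothetical capacity: ceiling of an even share, plus a fixed slack."""
    share = -(-total_pairs // nproc)  # integer ceiling division
    return share * (100 + slack_percent) // 100


if __name__ == "__main__":
    # Made-up system: 64 atoms in a dense region, 960 atoms in a sparse one.
    pairs = [200] * 64 + [100] * 960
    for nproc in (4, 128):
        loads = per_processor_loads(pairs, nproc)
        cap = even_split_capacity(sum(pairs), nproc)
        # With 4 blocks the dense atoms are averaged with sparse ones and fit;
        # with 128 blocks some blocks sit entirely in the dense region and overflow.
        print(nproc, max(loads), cap, max(loads) > cap)
```

The point of the sketch is that the even share shrinks as 1/nproc, while the worst block stops benefiting from averaging once blocks become small, so a fixed slack that was adequate at low processor counts eventually fails.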

Go into locmem.f, search for where MAXPR is calculated, and increase its
estimate; you could easily give each processor 10 times as big a value,
and that should get you going.
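The suggested fix amounts to a larger safety factor in the per-processor estimate. The arithmetic below is a hypothetical Python sketch, not the actual MAXPR formula in locmem.f (which this message does not show): `padded_maxpr` and `fits` are invented names, and the "even share times a factor" rule is an assumption.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def padded_maxpr(total_pairs: int, nproc: int, factor: int = 10) -> int:
    """Hypothetical per-processor pair-list size: `factor` times an even share.
    Illustrative assumption only; not the formula used in locmem.f."""
    return factor * ceil_div(total_pairs, nproc)


def fits(loads: list, capacity: int) -> bool:
    """True if every processor's pair count fits in the allocated list."""
    return max(loads) <= capacity


if __name__ == "__main__":
    total, nproc = 108800, 128  # hypothetical system size
    for factor in (1, 10):
        cap = padded_maxpr(total, nproc, factor)
        # Suppose one processor owns a dense block of 1600 pairs.
        print(factor, cap, fits([1600, 800], cap))
```

A tenfold factor is cheap at high processor counts: each processor still reserves only about factor/nproc of the full list, roughly 8% with 10 and 128 CPUs. For the run above, 10 times the reported capacity is 24240 elements per processor.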

..good luck...dac

-- 
==================================================================
David A. Case                     |  e-mail:      case_at_scripps.edu
Dept. of Molecular Biology, TPC15 |  fax:          +1-858-784-8896
The Scripps Research Institute    |  phone:        +1-858-784-9768
10550 N. Torrey Pines Rd.         |  home page:                   
La Jolla CA 92037  USA            |    http://www.scripps.edu/case
==================================================================

Received on Thu Dec 19 2002 - 08:05:00 PST