Error running AMBER6 on Beowulf cluster

From: <arubin_at_unmc.edu>
Date: Thu 21 Nov 2002 17:10:39 -0600

Dear Amber users,

    We ran into a problem with an MD simulation using AMBER6 on a Beowulf
  cluster (RedHat, Myrinet, PG compiler). To run an MPI job on 8 processors
  we used the "mpirun.ch_gm" script, but the calculation stops abnormally.
  Does anyone have an idea of what is going on? I am attaching the error
  message and the output file (see below).
  ********************************************************************
  # message - sander_06135:
  Atom division among processors for gb:
  | 0 1402 2804 4206 5608 7010 8412 9814
  | 11220
  | Running AMBER/MPI version on 8 nodes


       Sum of charges from parm topology file = 0.00000000
       Forcing neutrality...
   ---------------------------------------------------
   APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
   using 5000.0 points per unit in tabled values
   TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
   ---------------------------------------------------
   APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
   using 5000.0 points per unit in tabled values
   TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
   ---------------------------------------------------
   APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
   using 5000.0 points per unit in tabled values
   TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
  | Atom division among processors:
  | 0 1404 2805 4206 5610 7011 8412 9816
  | 11220
  | Atom division among processors for gb:
  | 0 1402 2804 4206 5608 7010 8412 9814
  | 11220
  | Running AMBER/MPI version on 8 nodes


       Sum of charges from parm topology file = 0.00000000
       Forcing neutrality...
   ---------------------------------------------------
   APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
   using 5000.0 points per unit in tabled values
   TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
   ---------------------------------------------------
   APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
   using 5000.0 points per unit in tabled values
   TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
   ---------------------------------------------------
   APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
   using 5000.0 points per unit in tabled values
   TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
  | Atom division among processors:
  | 0 1404 2805 4206 5610 7011 8412 9816
  | 11220
  | Atom division among processors for gb:
  | 0 1402 2804 4206 5608 7010 8412 9814
  | 11220
  | Running AMBER/MPI version on 8 nodes


       Sum of charges from parm topology file = 0.00000000
       Forcing neutrality...
   ---------------------------------------------------
   APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
   using 5000.0 points per unit in tabled values
   TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
  | CHECK switch(x): max rel err = 0.3242E-14 at 2.436720
  | CHECK d/dx switch(x): max rel err = 0.8064E-11 at 2.761360
   ---------------------------------------------------
       Total number of mask terms = 12578
       Total number of mask terms = 25156
  | CHECK switch(x): max rel err = 0.3242E-14 at 2.436720
  | CHECK d/dx switch(x): max rel err = 0.8064E-11 at 2.761360
   ---------------------------------------------------
  ???????????????????
   ---------------------------------------------------
       Total number of mask terms = 12578
       Total number of mask terms = 25156
  | CHECK switch(x): max rel err = 0.3242E-14 at 2.436720
  | CHECK d/dx switch(x): max rel err = 0.8064E-11 at 2.761360
   ---------------------------------------------------
       Total number of mask terms = 12578
       Total number of mask terms = 25156
  | Total Ewald setup time = 0.14000000

  ------------------------------------------------------------------------------

  | Total Ewald setup time = 0.15000000| Total Ewald setup time =
  0.19000000|
  Total Ewald setup time = 0.19000000

  Total Ewald setup time = 0.20000000| Total Ewald setup time =
  0.22000000|
  Total Ewald setup time = 0.19000000

  Unit 7 Error on OPEN:
  [7] MPI Abort by user Aborting program !
  [7] Aborting program!

  [4] MPI Abort by user Aborting program !
  [4] Aborting program!
  done
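
  As a reading aid for the "Atom division among processors" lines in the
  message above, here is a minimal Python sketch that turns the two boundary
  lists into per-processor atom counts. It assumes each list is a set of
  cumulative atom offsets (9 boundaries for 8 processors); that reading is an
  assumption about the log format, and the function name is only illustrative.

```python
# Boundary lists copied from the message above. Reading them as cumulative
# atom offsets (9 boundaries -> 8 processors) is an assumption.
standard = [0, 1404, 2805, 4206, 5610, 7011, 8412, 9816, 11220]
gb = [0, 1402, 2804, 4206, 5608, 7010, 8412, 9814, 11220]


def atoms_per_processor(boundaries):
    """Return the atom count of each consecutive [start, end) slice."""
    return [end - start for start, end in zip(boundaries, boundaries[1:])]


standard_counts = atoms_per_processor(standard)
gb_counts = atoms_per_processor(gb)

for rank, (n_std, n_gb) in enumerate(zip(standard_counts, gb_counts)):
    print(f"processor {rank}: {n_std} atoms (standard), {n_gb} atoms (gb)")
```

  Under that reading both divisions account for all 11220 atoms (NATOM in the
  output file), so the atom partitioning itself looks consistent on every
  processor.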




  *****************************************************************************
  # output file:
  mdrest_5.out

            -------------------------------------------------------
            Amber 6 SANDER Scripps/UCSF 1999
            -------------------------------------------------------

  | Fri Oct 18 13:38:37 2002


  File Assignments:
  |MDIN : mdrst_5.in
  |MDOUT: mdrst_5.out
  |INPCR: mdrst_4_ahg21_W.xyz
  |PARM : ahg21_W.top
  |RESTR: mdrst_5_ahg21_W.xyz
  |REFC : mdrst_4_ahg21_W.xyz
  |MDVEL: mdvel
  |MDEN : mden
  |MDCRD: mdrst_5_ahg21_W.traj
  |MDINF: test.info


   Here is the input file:

  MD run(mdrst_5.in) for P=const (ntb=2)with force constant 10.0 kcal/mol
   &cntrl
      imin=0, irest=1, ntx=7,
      ntt=1, tempi=283.0, temp0=283.0, tautp=2.0,
      ntb=2, ntp=1,
      ntc=2, tol=0.000001,
      scee=1.2, cut=9.0,
      ntwx=100, ntpr=100,
      nstlim=10000,
      ntr=1,
   &end
  Group input for restrained atoms (Harmonic rest. on solute coord.)
   10.0
  RES 1 21
  END
  END
   1. RESOURCE USE:

   getting box info from bottom of parm
   getting new box info from bottom of inpcrd
  | peek_ewald_inpcrd: Box info found

     EWALD SPECIFIC INPUT:

  | Using the T3D specific (FFT3D0) Fast Fourier Transform
   -------------------------------------------------
   NO EWALD INPUT FOUND: USING DEFAULTS
   -------------------------------------------------
       Largest sphere to fit in unit cell has radius = 18.431
       Calculating ew_coeff from dsum_tol,cutoff
       Box X = 78.705 Box Y = 36.863 Box Z = 38.447
       Alpha = 90.000 Beta = 90.000 Gamma = 90.000
       NFFT1 = 80 NFFT2 = 36 NFFT3 = 40
       Cutoff= 9.000 Tol =0.100E-04
       Ewald Coefficient = 0.30768
       Interpolation order = 4

   NATOM = 11220 NTYPES = 16 NBONH = 11068 MBONA = 158
   NTHETH = 334 MTHETA = 222 NPHIH = 687 MPHIA = 473
   NHPARM = 0 NPARM = 0 NNB = 16230 NRES = 3662
   NBONA = 158 NTHETA = 222 NPHIA = 473 NUMBND = 34
   NUMANG = 67 NPTRA = 34 NATYP = 27 NPHB = 4
   IFBOX = 1 NMXRS = 41 IFCAP = 0


     EWALD MEMORY USE:

  | Total heap storage needed = 1161
  | Adjacent nonbond minimum mask = 16230
  | Max number of pointers = 25
  | List build maxmask = 32460
  | Maximage = 16418

     EWALD LOCMEM POINTER OFFSETS
  | Real memory needed by PME = 1161
  | Size of EEDTABLE = 20768
  | Real memory needed by EEDTABLE = 83072
  | Integer memory needed by ADJ = 32460
  | Integer memory used by local nonb= 271519
  | Real memory used by local nonb = 183894

  | MAX NONBOND PAIRS = 5000000

  | Memory Use Allocated Used
  | Real 2500000 784668
  | Hollerith 600000 70984
  | Integer 2000000 561701

  | Max Nonbonded Pairs: 5000000

       BOX TYPE: RECTILINEAR

    2. CONTROL DATA FOR THE RUN


       TIMLIM= 999999. IREST = 1 IBELLY= 0
       IMIN = 0
       IPOL = 0

       NTX = 7 NTXO = 1
       IG = 71277 TEMPI = 283.00 HEAT = 0.000

       NTB = 2 BOXX = 78.705
       BOXY = 36.863 BOXZ = 38.447

       NTT = 1 TEMP0 = 283.000
       DTEMP = 0.000 TAUTP = 2.000
       VLIMIT= 0.000

       NTP = 1 PRES0 = 1.000 COMP = 44.600
       TAUP = 0.200 NPSCAL= 0

       NTCM = 0 NSCM = 9999999

       NSTLIM=10000 NTU = 1
       T = 0.000 DT = 0.00100

       NTC = 2 TOL = 0.00000 JFASTW = 0

       NTF = 1 NSNB = 25

       CUT = 9.000 SCNB = 2.000
       SCEE = 1.200 DIELC = 1.000

       NTPR = 100 NTWR = 50 NTWX = 100
       NTWV = 0 NTWE = 0 IOUTFM= 0
       NTWPRT= 0 NTWPR0= 0 NTAVE= 0

       NTR = 1 NTRX = 1
       TAUR = 0.00000 NMROPT= 0 PENCUT= 0.10000

       IVCAP = 0 MATCAP= 0 FCAP = 1.500

     OTHER DATA:

       IFCAP = 0 NATCAP= 0 CUTCAP= 0.000
       XCAP = 0.000 YCAP = 0.000 ZCAP = 0.000

       VRAND= 0

       NATOM = 11220 NRES = 3662

       Water definition for fast triangulated model:
       Resname = WAT ; Oxygen_name = O ; Hyd1_name = H1 ; Hyd2_name = H2
  | PLEVEL = 1: runmd parallelization, no EKCMR

      LOADING THE CONSTRAINED ATOMS AS GROUPS

    5. REFERENCE ATOM COORDINATES

      ----- READING GROUP 1; TITLE:
   Group input for restrained atoms (Harmonic rest. on solute coord.)

       GROUP 1 HAS HARMONIC CONSTRAINTS 10.00000
   GRP 1 RES 1 TO 21
        Number of atoms in this group = 297
      ----- END OF GROUP READ -----

     3. ATOMIC COORDINATES AND VELOCITIES

       Largest sphere to fit in unit cell has radius = 18.431
   NEW EWALD BOX PARAMETERS from inpcrd file:
       A = 78.70522 B = 36.86298 C = 38.44706

       ALPHA = 90.00000 BETA = 90.00000 GAMMA = 90.00000

   begin time read from input coords = 50.000 ps

   Number of triangulated 3-point waters found: 3641
  | Atom division among processors:
  | 0 1404 2805 4206 5610 7011 8412 9816
  | 11220
  | Atom division among processors for gb:
  | 0 1402 2804 4206 5608 7010 8412 9814
  | 11220
  | Running AMBER/MPI version on 8 nodes


       Sum of charges from parm topology file = 0.00000000
       Forcing neutrality...
   ---------------------------------------------------
   APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
   using 5000.0 points per unit in tabled values
   TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
  | CHECK switch(x): max rel err = 0.3242E-14 at 2.4367

  ***************************************************************
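
  Since the message above ends with "Unit 7 Error on OPEN:" reported by
  processes [7] and [4], one thing that could be checked is whether the files
  named under "File Assignments" are reachable from every node. The sketch
  below is only a possible diagnostic, not a confirmed cause: the split into
  input and output files is my reading of the labels, the function name is
  hypothetical, and the working directory is assumed to be the one the job
  runs in.

```python
import os
import socket

# File names copied from the "File Assignments" section of the output file.
# Which ones are inputs and which are outputs is an assumption.
INPUT_FILES = {
    "MDIN": "mdrst_5.in",
    "INPCR": "mdrst_4_ahg21_W.xyz",
    "PARM": "ahg21_W.top",
    "REFC": "mdrst_4_ahg21_W.xyz",
}
OUTPUT_FILES = {
    "MDOUT": "mdrst_5.out",
    "RESTR": "mdrst_5_ahg21_W.xyz",
    "MDVEL": "mdvel",
    "MDEN": "mden",
    "MDCRD": "mdrst_5_ahg21_W.traj",
    "MDINF": "test.info",
}


def check_files(workdir="."):
    """Return a list of human-readable problems found in workdir."""
    problems = []
    # Inputs must exist and be readable.
    for label, name in INPUT_FILES.items():
        path = os.path.join(workdir, name)
        if not os.path.isfile(path):
            problems.append(f"{label}: {path} is missing")
        elif not os.access(path, os.R_OK):
            problems.append(f"{label}: {path} is not readable")
    # Outputs need a writable directory and, if present, a writable file.
    if not os.access(workdir, os.W_OK):
        problems.append(f"directory {workdir} is not writable")
    for label, name in OUTPUT_FILES.items():
        path = os.path.join(workdir, name)
        if os.path.exists(path) and not os.access(path, os.W_OK):
            problems.append(f"{label}: {path} exists but is not writable")
    return problems


if __name__ == "__main__":
    # Prefix each problem with the host name so runs on different nodes
    # can be compared.
    for problem in check_files():
        print(f"{socket.gethostname()}: {problem}")
```

  Running this once on each node, from the job's working directory, would show
  whether any node sees a different set of files from the others.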

  I am very sorry for such a large file.
  Thanks a lot,

Alexander Rubinshtein, Ph.D.
UNMC Eppley Cancer Center
Molecular Modeling Core Facility
_________________________________
University of Nebraska Medical Center
986805 Nebraska Medical Center
Omaha, Nebraska 68198-6805
USA
Office: (402) 559-7809
Fax: (402) 559-4651
E-mail: arubin_at_unmc.edu
WWW: http://www.unmc.edu/Eppley
Received on Thu Nov 21 2002 - 15:10:39 PST