[AMBER] Running a 606580-atom system on Kraken

From: Hector Chang <chang52.purdue.edu>
Date: Tue, 11 Jan 2011 14:52:39 -0500

Dear Amber Users,

I would like to ask about a molecular dynamics run in AMBER. I am running a
production simulation at constant pressure (NPT), but the job is canceled or
crashes a few seconds after it starts. My system contains about 600,000 atoms in
explicit solvent. I have tried both "sander.MPI" and "pmemd", and both give the
same problem. Minimization and the constant-volume (NVT) run complete without
problems, but as soon as I start the NPT run, the job is canceled or crashes.


If it helps, this is my input file:

&cntrl
  imin = 0, irest = 1, ntx = 7,
  ntb = 2, pres0 = 1.0, ntp = 1,
  taup = 2.0,
  cut = 40.0, ntr = 1,
  ntc = 2, ntf = 2,
  tempi = 300.0, temp0 = 300.0,
  ntt = 3, gamma_ln = 1.0,
  nstlim = 50000, dt = 0.002, ig = -1,
  ntpr = 100, ntwx = 100, ntwr = 100
 /

Keep Struct fixed with weak restraints
10.0
ATOMS 986 20185
END
END

The following report is from an AMBER expert at TeraGrid:


I have investigated this issue with Amber11. I have not been able to fix it,
although the error is easy to reproduce. All of the segmentation faults produce
the following stack trace:



Program terminated with signal 11, Segmentation fault.
#0 0x00000000006602d5 in free ()
(gdb) bt
#0 0x00000000006602d5 in free ()
#1 0x000000000049d972 in mol_list_mod_setup_molecule_lists_ () at ./mol_list.f90:224
#2 0x00000000004e3abd in pme_alltasks_setup_mod_pme_alltasks_setup_ () at ./pme_alltasks_setup.f90:126
#3 0x00000000004ca4ce in pmemd () at ./pmemd.f90:129
#4 0x0000000000400330 in main ()



This is definitely a memory allocation or deallocation issue within the code.
These are the initial setup routines that run after all the input data has been
read and broadcast to all processes. They build the molecule lists, whose size
depends on the number of atoms in the calculation. It is possible that there
are simply too many atoms.



However, I ran a benchmark case (Cellulose NPT) that contains 408609 atoms.
This is fewer than the user is trying to run (606580 atoms), but not by much,
and this benchmark ran successfully. I also ran a modified version of the
benchmark, with input-file options changed to closely resemble this user's run.
That run was also successful, so no option in the input file seems to be
causing the problem.



I tried running the code at various core counts, from 12 to 120. All runs
reproduce the segmentation fault.
I tried running the code using 1 core per socket. All runs reproduce the
segmentation fault.



I applied all known bug fixes to the code. Runs with the patched code still
reproduce the segmentation fault.



Using the sander.MPI executable instead of pmemd.MPI produces the following
error instead of the segmentation fault (12 cores):



NATOM = 606580 NTYPES = 23 NBONH = 586742 MBONA = 18645
 NTHETH = 800 MTHETA = 27640 NPHIH = 1345 MPHIA = 24420
 NHPARM = 0 NPARM = 0 NNB = 857864 NRES = 195911
 NBONA = 18645 NTHETA = 27640 NPHIA = 24420 NUMBND = 57
 NUMANG = 113 NPTRA = 33 NATYP = 40 NPHB = 1
 IFBOX = 1 NMXRS = 114 IFCAP = 0 NEXTRA = 0
 NCOPY = 0



Unreasonably large value for MAXPR: 0.15E+11



This value is too large when it is greater than or equal to 2^31, i.e. when it
will not fit in a signed 32-bit integer. The value is calculated from several
formulas, one of which is [NATOM*(NATOM-1)]/(2*MPI_processes). Once again, this
points to a possible link with the number of atoms. However, I have not been
able to get this case to run by using more processors; the same error appears.
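The per-process term quoted above can be checked with a few lines of arithmetic. The sketch below (the function names are hypothetical, not part of AMBER) evaluates NATOM*(NATOM-1)/(2*MPI_processes) for this system and compares it with the largest signed 32-bit integer. It also computes how many processes this one term alone would need in order to fit. As the report notes, MAXPR comes from several formulas, so this is not sander's full estimate.

```python
# Sketch: evaluate the MAXPR term quoted in the report,
#   NATOM * (NATOM - 1) / (2 * MPI_processes),
# and compare it with the largest signed 32-bit integer.
# Assumption: this is only ONE of the formulas sander uses for MAXPR,
# so it does not reproduce sander's full estimate.

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def pair_term(natom: int, nproc: int) -> int:
    """All atom pairs, NATOM*(NATOM-1)/2, divided among nproc MPI processes."""
    return natom * (natom - 1) // (2 * nproc)


def fits_int32(value: int) -> bool:
    """True if value is representable as a signed 32-bit integer."""
    return value <= INT32_MAX


def min_procs_to_fit(natom: int) -> int:
    """Smallest nproc for which pair_term(natom, nproc) fits in 32 bits."""
    total_pairs = natom * (natom - 1) // 2
    # floor(total_pairs / n) <= INT32_MAX  <=>  n > total_pairs / 2**31
    return total_pairs // 2**31 + 1


if __name__ == "__main__":
    natom = 606580
    term = pair_term(natom, 12)
    print(f"12 processes: {term:.2E}, fits in int32: {fits_int32(term)}")
    print(f"processes needed for this term alone: {min_procs_to_fit(natom)}")
```

With 12 processes this prints 1.53E+10, consistent with the 0.15E+11 in the sander message, and it reports that this term alone would need 86 processes. Because MAXPR is computed from several formulas, fitting this single term does not by itself show that a given process count will work.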



Using the pmemd executable from Amber 9, I receive the following error instead
of a segmentation fault. However, the file formats may have changed between
versions 9 and 11.



| ERROR: Bad residue/molecule data in prmtop!
| Residue 34(atoms 618- 684) is in multiple molecules.
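The Amber 9 message says that a residue straddles a molecule boundary, which is one concrete consistency check that can be run on the input data. The following is a minimal sketch, assuming the residue start atoms (the prmtop RESIDUE_POINTER section) and the molecule sizes (the ATOMS_PER_MOLECULE section, present for periodic systems) have already been read into Python lists; the prmtop parsing is left out and the function names are hypothetical. Because molecules are contiguous blocks of atoms, a residue spans more than one molecule exactly when its first and last atoms fall in different molecules.

```python
# Sketch: find residues whose atoms fall in more than one molecule.
# Assumption: residue_pointer holds the 1-based first atom of each residue
# (prmtop RESIDUE_POINTER) and atoms_per_molecule holds the atom count of
# each molecule in order (prmtop ATOMS_PER_MOLECULE). Parsing is omitted.
from typing import List, Tuple


def residue_ranges(residue_pointer: List[int], natom: int) -> List[Tuple[int, int]]:
    """Inclusive (first, last) 1-based atom range of every residue."""
    lasts = [p - 1 for p in residue_pointer[1:]] + [natom]
    return list(zip(residue_pointer, lasts))


def molecule_index(atoms_per_molecule: List[int], natom: int) -> List[int]:
    """Entry a of the result is the 0-based molecule of 1-based atom a."""
    mol = [-1]  # placeholder so list positions match 1-based atom numbers
    for m, count in enumerate(atoms_per_molecule):
        mol.extend([m] * count)
    if len(mol) - 1 != natom:
        raise ValueError("molecule sizes do not add up to NATOM")
    return mol


def split_residues(residue_pointer: List[int], atoms_per_molecule: List[int],
                   natom: int) -> List[Tuple[int, int, int]]:
    """(residue number, first atom, last atom) for residues spanning >1 molecule."""
    mol = molecule_index(atoms_per_molecule, natom)
    bad = []
    for resnum, (first, last) in enumerate(residue_ranges(residue_pointer, natom), start=1):
        # Molecules are contiguous atom blocks, so comparing the two ends suffices.
        if mol[first] != mol[last]:
            bad.append((resnum, first, last))
    return bad


if __name__ == "__main__":
    # Toy system: 10 atoms, residues start at atoms 1, 4, 8; two 5-atom molecules.
    print(split_residues([1, 4, 8], [5, 5], 10))  # [(2, 4, 7)]
```

For the system in this thread, a hit such as (34, 618, 684) would correspond to the Amber 9 error above.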



In conclusion, the problem is one of two things: 1) a bug in the Amber11 source
code that is very specific to systems with large numbers of atoms, or 2) a
problem in the user's input that is specific to the user's input data.

To test this, the user could produce a similar case that contains fewer atoms
(fewer than 408609 would be great!). If that case runs successfully, item (1) is
most likely; if it does not, item (2) becomes a possibility. In either case, the
user should check the input data for consistency in order to rule out item (2).
If it is item (1), a bug report should be sent to the developers.


Do you have any suggestions for solving this?


Thanks,
Hector

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Jan 11 2011 - 12:00:02 PST