[AMBER] Segmentation fault

From: sylvester kisembo <akiikius.yahoo.com>
Date: Fri, 21 Apr 2017 13:35:04 +0000 (UTC)

Hi all,
I have been trying to get some runs going on the supercomputer GPUs. Specs are below (I get a similar outcome on CPUs):
NVIDIA Maxwell K80 GPU Nodes
1. Node count: 36
2. CPU cores:GPUs/node: 24:4
3. CPU:GPU DRAM/node: 128 GB:40 GB
However, I get the following error message (truncated because of its length):
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Backtrace for this error:
Backtrace for this error:#0  0x2B71C0FD7337#1  0x2B71C0FD794E#2  0x3C3723269F#0  0x2B4D739D5337#0  0x2AD7BE157337#1  0x2B4D739D594E#0  #0  0x0x2B2362D503372B21910CD337
#1  #0  0x2AD7BE15794E0x2B224416D337#00x#2  #0  2B0A5CEBB3370x#0  0x3C3723269F#0  2B604CB313370x0x#0  2B54F1AF8337#1  #1
2B2AD10EA3370x0x0x2B880BD65337#22B21910CD94E0x2B2362D5094E#1
3C3723269F0x#02B224416D94E0x#1  2B8FEB9863370x2B0A5CEBB94E#1  #1  0x0x#0  2B604CB3194E2B54F1AF894E0x
[comet-25-70.sdsc.edu:mpispawn_1][readline] Unexpected End-Of-File on file descriptor 24. MPI process died?
[comet-25-70.sdsc.edu:mpispawn_1][mtpmi_processops] Error while reading PMI socket. MPI process died?
[comet-25-68.sdsc.edu:mpispawn_0][child_handler] MPI process (rank: 11, pid: 23565) terminated with signal 11 -> abort job
[comet-25-70.sdsc.edu:mpispawn_1][child_handler] MPI process (rank: 26, pid: 6930) terminated with signal 11 -> abort job
[comet-25-68.sdsc.edu:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node comet-25-68 aborted: Error while reading a PMI socket (4)
/opt/amber/bin/pmemd.cuda.MPI: error while loading shared libraries: libcurand.so.8.0: cannot open shared object file: No such file or directory
/opt/amber/bin/pmemd.cuda.MPI: error while loading shared libraries: libcurand.so.8.0: cannot open shared object file: No such file or directory
/opt/amber/bin/pmemd.cuda.MPI: error while loading shared libraries: libcurand.so.8.0: cannot open shared object file: No such file or directory
[comet-25-68.sdsc.edu:mpispawn_0][child_handler] MPI process (rank: 1, pid: 23756) exited with status 127
[comet-25-68.sdsc.edu:mpispawn_0][child_handler] MPI process (rank: 6, pid: 23761) exited with status 127
[comet-25-68.sdsc.edu:mpispawn_0][child_handler] MPI process (rank: 2, pid: 23757) exited with status 127
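The launcher output above appears to mix two kinds of failure: some ranks were terminated with signal 11 (the segmentation fault), while others exited with status 127 alongside the libcurand.so.8.0 loader error. As a rough sketch only, the messages could be tallied per failure type like this. The function name `summarize` and the regular expressions are illustrative assumptions, not part of Amber or of the MPI launcher, and the sample text is a subset of the messages shown above.

```python
import re
from collections import Counter

# Sample launcher messages copied from the output above (subset).
LOG = """\
[comet-25-68.sdsc.edu:mpispawn_0][child_handler] MPI process (rank: 11, pid: 23565) terminated with signal 11 -> abort job
[comet-25-70.sdsc.edu:mpispawn_1][child_handler] MPI process (rank: 26, pid: 6930) terminated with signal 11 -> abort job
/opt/amber/bin/pmemd.cuda.MPI: error while loading shared libraries: libcurand.so.8.0: cannot open shared object file: No such file or directory
/opt/amber/bin/pmemd.cuda.MPI: error while loading shared libraries: libcurand.so.8.0: cannot open shared object file: No such file or directory
/opt/amber/bin/pmemd.cuda.MPI: error while loading shared libraries: libcurand.so.8.0: cannot open shared object file: No such file or directory
[comet-25-68.sdsc.edu:mpispawn_0][child_handler] MPI process (rank: 1, pid: 23756) exited with status 127
[comet-25-68.sdsc.edu:mpispawn_0][child_handler] MPI process (rank: 6, pid: 23761) exited with status 127
[comet-25-68.sdsc.edu:mpispawn_0][child_handler] MPI process (rank: 2, pid: 23757) exited with status 127
"""

# Hypothetical patterns written for the message shapes seen in this log only.
RANK_RE = re.compile(
    r"\[(?P<host>[^:\]]+):[^\]]*\]\[child_handler\] MPI process "
    r"\(rank: (?P<rank>\d+), pid: \d+\) "
    r"(?P<how>terminated with signal|exited with status) (?P<code>\d+)"
)
LIB_RE = re.compile(r"error while loading shared libraries: (?P<lib>[^:\s]+):")


def summarize(log):
    """Return (failures, missing_libs) found in the launcher output.

    failures maps (how, code) to a sorted list of MPI ranks;
    missing_libs counts each library the loader reported as not found.
    """
    failures = {}
    # finditer scans the whole string, so it also works when the
    # messages are run together on one line without newlines.
    for m in RANK_RE.finditer(log):
        key = (m["how"], int(m["code"]))
        failures.setdefault(key, []).append(int(m["rank"]))
    for ranks in failures.values():
        ranks.sort()
    missing = Counter(m["lib"] for m in LIB_RE.finditer(log))
    return failures, missing


if __name__ == "__main__":
    failures, missing = summarize(LOG)
    for (how, code), ranks in sorted(failures.items()):
        print(f"{how} {code}: ranks {ranks}")
    for lib, count in missing.items():
        print(f"missing library {lib}: reported {count} time(s)")
```

Separating the two groups may help show whether the missing library and the segmentation fault affect the same ranks or different ones.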
This is my minimization file (min1.in):
minimize structure
 &cntrl
  imin=1,maxcyc=20000, ntmin=1, ncyc=5000,
  ntb=1, cut=8, ntwx=500, ioutfm=1,iwrap=1,
  ntr=1, ntwprt=0, restraintmask=':1', restraint_wt=2.0,
 /
 &ewald
 /

This is my 2nd minimization file (min2.in):
minimize structure
 &cntrl
  imin=1,maxcyc=50000, ntmin=1, ncyc=5000,iwrap=1,
  ntwprt=0, ntb=1, cut=8, ntwx=500, ioutfm=1,
 /
 &ewald
 /


These are my equilibration files:
equil1.in
CZRA : equilibration
 &cntrl
  nstlim=100000, dt=0.002,ntx=1,irest=0,ntpr=500,ntwr=5000,ntwx=5000,
  tempi=0, temp0=300.0, ntt=3, ig=-1, imin=0,
  ntb=1, cut=8, iwrap=1, ntp=0,
  ntc=2, ntf=2, gamma_ln = 2.0,
  ioutfm=1,ntr=1, restraintmask=':1', restraint_wt=2.0,nmropt=1
 /
 &wt TYPE='TEMP0', istep1=0, istep2=100000,
  value1=0, value2=300.0, /
 &wt TYPE='END' /
equil2.in
[stumusii.comet-ln2 min_eq_prod_5]$ cat equil2.in
CZRA : equilibration
 &cntrl
  nstlim=500000, dt=0.002,ntx=7,irest=1,ntpr=1000,ntwx=1000,
  tempi=300.0, temp0=300.0, ntt=3, imin=0, ntwv=-1,
  ntb=2, cut=8,ig=-1,ntwr=1000,
  pres0 = 1.0, ntp = 1, iwrap=1,
  taup = 2.0, ig=-1,
  ntc=2, ntf=2, gamma_ln = 2.0,
  ioutfm=1,
 /
 &ewald
 /
This is the production file:
CZRA : equilibration
 &cntrl
  nstlim=10000000, dt=0.002, ntx=5, irest=1, ntpr=1000, ntwx=10000,
  tempi=300.0, temp0=300.0, ntt=3, imin=0, ntwv=-1,
  ntb=2, cut=8, ig=-1, ntwr=1000, ntwprt=0,
  pres0 = 1.0, ntp=1, iwrap=1,
  taup = 2.0, barostat=2,
  ntc=2, ntf=2, gamma_ln = 2.0,
  ioutfm=1,
 /
 &ewald
 /
This is the mdinfo output:
 NSTEP =        0   TIME(PS) =       0.000  TEMP(K) =     0.00  PRESS =     0.0
 Etot   =   -139172.1824  EKtot   =         0.0000  EPtot      =   -139172.1824
 BOND   =         4.9660  ANGLE   =        16.7275  DIHED      =       162.8560
 1-4 NB =        38.1430  1-4 EEL =       -79.6157  VDWAALS    =     34613.3061
 EELEC  =   -173928.5652  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 Ewald error estimate:   0.3256E-03
 NMR restraints: Bond =    0.000   Angle =     0.000   Torsion =     0.000
===============================================================================

This is the final result of the 2nd minimization file:
 FINAL RESULTS


   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
   13397      -1.2770E+05     3.4370E-03     2.2172E-01     H15       131

 BOND    =    11477.9062  ANGLE   =       16.7275  DIHED      =      162.8560
 VDWAALS =    34613.3061  EEL     =  -173928.5652  HBOND      =        0.0000
 1-4 VDW =       38.1430  1-4 EEL =      -79.6157  RESTRAINT  =        0.0000
--------------------------------------------------------------------------------
   5.  TIMINGS
--------------------------------------------------------------------------------
|                Build the list             0.11 ( 3.40% of List )
|                Other                      3.12 (96.60% of List )
|             List time                  3.23 ( 2.49% of Nonbo)
|                   Short_ene time            49.47 (79.59% of Direc)
|                   Other                     12.69 (20.41% of Direc)
|                Direct Ewald time         62.16 (49.22% of Ewald)
|                Adjust Ewald time          0.42 ( 0.33% of Ewald)
|                Self Ewald time            0.02 ( 0.02% of Ewald)
|                   Fill Bspline coeffs        3.78 ( 8.53% of Recip)
|                   Fill charge grid           2.31 ( 5.21% of Recip)
|                   Scalar sum                 2.91 ( 6.57% of Recip)
|                   Grad sum                   3.58 ( 8.08% of Recip)
|                      FFT back comm time        23.56 (75.81% of FFT t)
|                      Other                      7.52 (24.19% of FFT t)
|                   FFT time                  31.07 (70.07% of Recip)
|                   Other                      0.69 ( 1.55% of Recip)
|                Recip Ewald time          44.34 (35.11% of Ewald)
|                Force Adjust              13.03 (10.31% of Ewald)
|                Virial junk                6.23 ( 4.93% of Ewald)
|                Start synchronizatio       0.02 ( 0.02% of Ewald)
|                Other                      0.08 ( 0.06% of Ewald)
|             Ewald time               126.30 (97.49% of Nonbo)
|             Other                      0.02 ( 0.01% of Nonbo)
|          Nonbond force            129.55 (80.78% of Force)
|          Bond/Angle/Dihedral        0.39 ( 0.24% of Force)
|          FRC Collect time          21.28 (13.27% of Force)
|          Other                      9.16 ( 5.71% of Force)
|       Force time               160.37 (100.0% of Runmd)
|    Runmd Time               160.37 (72.41% of Total)
|    Other                     61.08 (27.58% of Total)
| Total time               221.46 (100.0% of ALL  )
| Highest rstack allocated:      89990
| Highest istack allocated:       2452
|           Job began  at 05:48:29.038  on 04/21/2017
|           Setup done at 05:48:29.281  on 04/21/2017
|           Run   done at 05:52:10.501  on 04/21/2017
|     wallclock() was called  589556 times
I appreciate any input. If anyone needs the .prmtop and .inpcrd (coordinate) files or other info, please let me know and I can email you a compressed file.
Thanks!
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Apr 21 2017 - 07:00:03 PDT