Re: [AMBER] Segmentation fault

From: sylvester kisembo <akiikius.yahoo.com>
Date: Fri, 21 Apr 2017 23:50:39 +0000 (UTC)

After running the job with a wall time longer than 30 minutes, the segmentation fault reappeared. The sys-admin advised me to load cuda/8.0 to address the CUDA error you pointed out. Below is the full error message for the run with no CUDA and no MPI, followed by the error for the run with CUDA and no MPI:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0  0x2B01E4104337
#1  0x2B01E410494E
#2  0x381F23269F
#3  0x585DA7 in __nblist_MOD_grid_ucell
#4  0x589D93 in __nblist_MOD_nonbond_list
#5  0x7283F9 in force_
#6  0x5369A4 in runmd_
#7  0x4ECB86 in sander_
#8  0x4E4753 in MAIN__
[The same message and backtrace were printed by each process, interleaved in the raw output; only the addresses in frames #0 and #1 differ between copies.]
[comet-11-71.sdsc.edu:mpispawn_0][child_handler] MPI process (rank: 11, pid: 18509) terminated with signal 11 -> abort job
[comet-11-71.sdsc.edu:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node comet-11-71 aborted: MPI process error (1)
  Unit    9 Error on OPEN: equil-out1.rst
STOP PMEMD Terminated Abnormally!
[The two lines above were printed 24 times.]
  Unit    9 Error on OPEN: equil-out2.rst
STOP PMEMD Terminated Abnormally!
[The two lines above were printed 24 times.]
For the run with CUDA and no MPI, the error message is below:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0  0x2B8DD69F6337
#1  0x2B8DD69F694E
#2  0x3F8BC3269F
#3  0x585DA7 in __nblist_MOD_grid_ucell
#4  0x589D93 in __nblist_MOD_nonbond_list
#5  0x7283F9 in force_
#6  0x5369A4 in runmd_
#7  0x4ECB86 in sander_
#8  0x4E4753 in MAIN__
[The same message and backtrace were printed by each process, interleaved in the raw output; only the addresses in frames #0 and #1 differ between copies.]
[comet-18-58.sdsc.edu:mpispawn_0][child_handler] MPI process (rank: 14, pid: 13409) terminated with signal 11 -> abort job
[comet-18-58.sdsc.edu:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node comet-18-58 aborted: MPI process error (1)
/opt/amber/bin/pmemd.cuda: error while loading shared libraries: libcurand.so.8.0: cannot open shared object file: No such file or directory
[The line above was printed six times.]
[comet-18-58.sdsc.edu:mpispawn_0][child_handler] MPI process (rank: 0, pid: 13549) exited with status 127

    On Friday, April 21, 2017 9:49 AM, David Case <david.case.rutgers.edu> wrote:
 

 On Fri, Apr 21, 2017, sylvester kisembo wrote:

> I have been trying to get some runs going on the supercomputer
> GPUs. Specs below (i get similar out come on CPUs):

OK: if you are getting segfaults on the CPU, then use the CPU for debugging:
let's remove any dependence on GPUs.

Second: run short serial CPU runs: let's remove any dependence on MPI.

Third: where is the error occurring (minimization, dynamics)?  Can you create
a short test run on a serial CPU that illustrates the error?
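
A minimal sketch of such a short serial test, written in Python. It only builds an input file and a command line; it does not run anything unless the last line is uncommented. The file names (test.in, system.prmtop, system.inpcrd) and the step count are placeholders, not taken from this thread, and the &cntrl settings are a bare-bones assumption that will likely need the options from your real input added back in.

```python
import subprocess


def short_test_input(stage, steps=100):
    """Return mdin text for a short minimization ('min') or dynamics ('md') test."""
    if stage == "min":
        # imin=1 selects minimization; maxcyc caps the number of cycles.
        body = f"  imin=1, maxcyc={steps}, ntpr=10,"
    elif stage == "md":
        # imin=0 selects dynamics; nstlim is the number of steps, dt is in ps.
        body = f"  imin=0, nstlim={steps}, dt=0.001, ntpr=10,"
    else:
        raise ValueError("stage must be 'min' or 'md'")
    return f"short serial {stage} test\n &cntrl\n{body}\n /\n"


def serial_command(mdin, prmtop, inpcrd, prefix, exe="sander"):
    """Build a serial (no MPI, no GPU) command line as an argument list."""
    return [exe, "-O",
            "-i", mdin,
            "-o", prefix + ".out",
            "-p", prmtop,
            "-c", inpcrd,
            "-r", prefix + ".rst"]


if __name__ == "__main__":
    # Placeholder file names: substitute your own topology and coordinates.
    with open("test.in", "w") as handle:
        handle.write(short_test_input("md"))
    cmd = serial_command("test.in", "system.prmtop", "system.inpcrd", "test")
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually launch the run
```

Running it once with "min" and once with "md" is one way to see which stage triggers the fault.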

> /opt/amber/bin/pmemd.cuda.MPI: error while loading shared
> libraries: libcurand.so.8.0: cannot open shared object file: No such
> file or directory

The sort of error above suggests that your LD_LIBRARY_PATH is not set
correctly.  It also shows that you are trying to run pmemd.cuda.MPI.
After trying the CPU ideas above (and if they work), *first* do a short
run (that you know works on a CPU) on a single GPU (i.e. use pmemd.cuda.)
It's possible that you are seeing an environment problem that is very simple
to fix, but you need to narrow down the problem first.
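
One way to check the LD_LIBRARY_PATH diagnosis is to ask the dynamic linker which shared libraries the executable cannot resolve in the current environment. The sketch below is only an illustration: it assumes a Linux system where `ldd` is available and reports unresolved libraries as `name => not found`; the helper names are made up for this example.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # The library name is the text before the '=>' arrow.
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path):
    """Run ldd on an executable and list its unresolved shared libraries."""
    result = subprocess.run(["ldd", path], capture_output=True,
                            text=True, check=False)
    return missing_libraries(result.stdout)


if __name__ == "__main__":
    # Path taken from the error message quoted above.
    print(check_binary("/opt/amber/bin/pmemd.cuda"))
```

If libcurand.so.8.0 shows up in the list after loading the CUDA module, the directory containing it is still not on LD_LIBRARY_PATH in the job's environment.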

[Aside, to you and others: few problems benefit much by running
on multiple GPUs.  Be sure you can use pmemd.cuda itself first, and
understand the tradeoffs between running one simulation on 2 or more GPUs vs
running several simultaneous simulations, each on a single GPU. (Replica
exchange calculations are an exception here.)]

....dac



   
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Apr 21 2017 - 17:00:02 PDT