bash: module: line 1: syntax error: unexpected end of file
bash: error importing function definition for `BASH_FUNC_module'
bash: module: line 1: syntax error: unexpected end of file
bash: error importing function definition for `BASH_FUNC_module'
bash: module: line 1: syntax error: unexpected end of file
bash: error importing function definition for `BASH_FUNC_module'
--------------------------------------------------------------------------
WARNING: Open MPI will create a shared memory backing file in a
directory that appears to be mounted on a network filesystem.
Creating the shared memory backup file on a network file system, such
as NFS or Lustre is not recommended -- it may cause excessive network
traffic to your file servers and/or cause shared memory traffic in
Open MPI to be much slower than expected.

You may want to check what the typical temporary directory is on your
node. Possible sources of the location of this temporary directory
include the $TEMPDIR, $TEMP, and $TMP environment variables.

Note, too, that system administrators can set a list of filesystems
where Open MPI is disallowed from creating temporary files by setting
the MCA parameter "orte_no_session_dir".

  Local host: quake20.cluster.net
  Filename: /SGE-tmp/586075.1.all.q/ompi.quake20.1072/jf.26458/1/shared_mem_cuda_pool.quake20

You can set the MCA paramter shmem_mmap_enable_nfs_warning to 0 to
disable this message.
--------------------------------------------------------------------------
[1575623180.662351] [quake20:20675:0] mxm.c:196 MXM WARN The 'ulimit -s' on the system is set to 'unlimited'. This may have negative performance implications. Please set the stack size to the default value (10240)
[1575623180.771565] [quake20:20675:0] mxm.c:196 MXM WARN The 'ulimit -s' on the system is set to 'unlimited'. This may have negative performance implications. Please set the stack size to the default value (10240)
[1575623181.111464] [quake20:20674:0] mxm.c:196 MXM WARN The 'ulimit -s' on the system is set to 'unlimited'. This may have negative performance implications. Please set the stack size to the default value (10240)
[1575623181.221717] [quake20:20674:0] mxm.c:196 MXM WARN The 'ulimit -s' on the system is set to 'unlimited'. This may have negative performance implications. Please set the stack size to the default value (10240)
[1575623181.737770] [quake15:20108:0] mxm.c:196 MXM WARN The 'ulimit -s' on the system is set to 'unlimited'. This may have negative performance implications. Please set the stack size to the default value (10240)
[1575623181.795044] [quake16:22174:0] mxm.c:196 MXM WARN The 'ulimit -s' on the system is set to 'unlimited'. This may have negative performance implications. Please set the stack size to the default value (10240)
[1575623181.802774] [quake16:22174:0] mxm.c:196 MXM WARN The 'ulimit -s' on the system is set to 'unlimited'. This may have negative performance implications. Please set the stack size to the default value (10240)
[1575623181.845594] [quake13:20045:0] mxm.c:196 MXM WARN The 'ulimit -s' on the system is set to 'unlimited'. This may have negative performance implications. Please set the stack size to the default value (10240)
[1575623181.887448] [quake15:20108:0] mxm.c:196 MXM WARN The 'ulimit -s' on the system is set to 'unlimited'. This may have negative performance implications. Please set the stack size to the default value (10240)
[1575623181.941982] [quake16:22173:0] mxm.c:196 MXM WARN The 'ulimit -s' on the system is set to 'unlimited'. This may have negative performance implications. Please set the stack size to the default value (10240)
[1575623181.949117] [quake16:22173:0] mxm.c:196 MXM WARN The 'ulimit -s' on the system is set to 'unlimited'. This may have negative performance implications. Please set the stack size to the default value (10240)
[1575623181.995937] [quake13:20045:0] mxm.c:196 MXM WARN The 'ulimit -s' on the system is set to 'unlimited'. This may have negative performance implications. Please set the stack size to the default value (10240)
[1575623182.092450] [quake15:20107:0] mxm.c:196 MXM WARN The 'ulimit -s' on the system is set to 'unlimited'. This may have negative performance implications. Please set the stack size to the default value (10240)
[1575623182.121360] [quake13:20044:0] mxm.c:196 MXM WARN The 'ulimit -s' on the system is set to 'unlimited'. This may have negative performance implications. Please set the stack size to the default value (10240)
[1575623182.201881] [quake15:20107:0] mxm.c:196 MXM WARN The 'ulimit -s' on the system is set to 'unlimited'. This may have negative performance implications. Please set the stack size to the default value (10240)
[1575623182.230565] [quake13:20044:0] mxm.c:196 MXM WARN The 'ulimit -s' on the system is set to 'unlimited'. This may have negative performance implications. Please set the stack size to the default value (10240)
Running multipmemd version of pmemd Amber18
Total processors = 8
Number of groups = 8
--------------------------------------------------------------------------
A process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host: [[26458,1],4] (PID 22173)

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
[quake13.cluster.net:20028] 7 more processes have sent help message help-opal-shmem-mmap.txt / mmap on nfs
[quake13.cluster.net:20028] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[quake13.cluster.net:20028] 5 more processes have sent help message help-opal-runtime.txt / opal_init:warn-fork
[quake16:22174:0] Caught signal 11 (Segmentation fault)
[quake13:20044:0] Caught signal 11 (Segmentation fault)
[quake16:22173:0] Caught signal 11 (Segmentation fault)
[quake15:20108:0] Caught signal 11 (Segmentation fault)
[quake15:20107:0] Caught signal 11 (Segmentation fault)
[quake13:20045:0] Caught signal 11 (Segmentation fault)
[quake20:20675:0] Caught signal 11 (Segmentation fault)
[quake20:20674:0] Caught signal 11 (Segmentation fault)
==== backtrace ====
2 0x000000000005a50c mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.5.3093/src/mxm/util/debug/debug.c:641
3 0x000000000005a67c mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.5.3093/src/mxm/util/debug/debug.c:616
==== backtrace ====
4 0x00000035b5232660 killpg() ??:0
5 0x00000000005a073f gpu_gb_ene_() ??:0
6 0x00000000004ca5b8 __gb_force_mod_MOD_gb_cph_ene() ??:0
7 0x000000000053cf41 __constantph_mod_MOD_cnstph_explicitmd() ??:0
8 0x000000000049d11f __runmd_mod_MOD_runmd() ??:0
9 0x00000000004ec59f MAIN__() pmemd.F90:0
10 0x00000000004ed39d main() ??:0
11 0x00000035b521ed1d __libc_start_main() ??:0
12 0x00000000004088a9 _start() ??:0
===================
2 0x000000000005a50c mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.5.3093/src/mxm/util/debug/debug.c:641
3 0x000000000005a67c mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.5.3093/src/mxm/util/debug/debug.c:616
4 0x00000035b5232660 killpg() ??:0
5 0x00000000005a073f gpu_gb_ene_() ??:0
6 0x00000000004ca5b8 __gb_force_mod_MOD_gb_cph_ene() ??:0
7 0x000000000053cf41 __constantph_mod_MOD_cnstph_explicitmd() ??:0
8 0x000000000049d11f __runmd_mod_MOD_runmd() ??:0
9 0x00000000004ec59f MAIN__() pmemd.F90:0
10 0x00000000004ed39d main() ??:0
11 0x00000035b521ed1d __libc_start_main() ??:0
12 0x00000000004088a9 _start() ??:0
===================
==== backtrace ====
==== backtrace ====
2 0x000000000005a50c mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.5.3093/src/mxm/util/debug/debug.c:641
3 0x000000000005a67c mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.5.3093/src/mxm/util/debug/debug.c:616
4 0x0000003f26632660 killpg() ??:0
5 0x00000000005a073f gpu_gb_ene_() ??:0
6 0x00000000004ca5b8 __gb_force_mod_MOD_gb_cph_ene() ??:0
7 0x000000000053cf41 __constantph_mod_MOD_cnstph_explicitmd() ??:0
8 0x000000000049d11f __runmd_mod_MOD_runmd() ??:0
9 0x00000000004ec59f MAIN__() pmemd.F90:0
2 0x000000000005a50c mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.5.3093/src/mxm/util/debug/debug.c:641
3 0x000000000005a67c mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.5.3093/src/mxm/util/debug/debug.c:616
4 0x0000003f26632660 killpg() ??:0
5 0x00000000005a073f gpu_gb_ene_() ??:0
6 0x00000000004ca5b8 __gb_force_mod_MOD_gb_cph_ene() ??:0
7 0x000000000053cf41 __constantph_mod_MOD_cnstph_explicitmd() ??:0
8 0x000000000049d11f __runmd_mod_MOD_runmd() ??:0
9 0x00000000004ec59f MAIN__() pmemd.F90:0
10 0x00000000004ed39d main() ??:0
11 0x0000003f2661ed1d __libc_start_main() ??:0
12 0x00000000004088a9 _start() ??:0
===================
10 0x00000000004ed39d main() ??:0
11 0x0000003f2661ed1d __libc_start_main() ??:0
12 0x00000000004088a9 _start() ??:0
===================
==== backtrace ====
==== backtrace ====
2 0x000000000005a50c mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.5.3093/src/mxm/util/debug/debug.c:641
3 0x000000000005a67c mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.5.3093/src/mxm/util/debug/debug.c:616
4 0x0000003259c32660 killpg() ??:0
5 0x00000000005a073f gpu_gb_ene_() ??:0
6 0x00000000004ca5b8 __gb_force_mod_MOD_gb_cph_ene() ??:0
7 0x000000000053cf41 __constantph_mod_MOD_cnstph_explicitmd() ??:0
8 0x000000000049d11f __runmd_mod_MOD_runmd() ??:0
9 0x00000000004ec59f MAIN__() pmemd.F90:0
10 0x00000000004ed39d main() ??:0
11 0x0000003259c1ed1d __libc_start_main() ??:0
12 0x00000000004088a9 _start() ??:0
===================
2 0x000000000005a50c mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.5.3093/src/mxm/util/debug/debug.c:641
3 0x000000000005a67c mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.5.3093/src/mxm/util/debug/debug.c:616
4 0x0000003259c32660 killpg() ??:0
5 0x00000000005a073f gpu_gb_ene_() ??:0
6 0x00000000004ca5b8 __gb_force_mod_MOD_gb_cph_ene() ??:0
7 0x000000000053cf41 __constantph_mod_MOD_cnstph_explicitmd() ??:0
8 0x000000000049d11f __runmd_mod_MOD_runmd() ??:0
9 0x00000000004ec59f MAIN__() pmemd.F90:0
10 0x00000000004ed39d main() ??:0
11 0x0000003259c1ed1d __libc_start_main() ??:0
12 0x00000000004088a9 _start() ??:0
===================
==== backtrace ====
==== backtrace ====
2 0x000000000005a50c mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.5.3093/src/mxm/util/debug/debug.c:641
3 0x000000000005a67c mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.5.3093/src/mxm/util/debug/debug.c:616
4 0x0000003d66232660 killpg() ??:0
5 0x00000000005a073f gpu_gb_ene_() ??:0
6 0x00000000004ca5b8 __gb_force_mod_MOD_gb_cph_ene() ??:0
7 0x000000000053cf41 __constantph_mod_MOD_cnstph_explicitmd() ??:0
8 0x000000000049d11f __runmd_mod_MOD_runmd() ??:0
9 0x00000000004ec59f MAIN__() pmemd.F90:0
10 0x00000000004ed39d main() ??:0
2 0x000000000005a50c mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.5.3093/src/mxm/util/debug/debug.c:641
3 0x000000000005a67c mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.5.3093/src/mxm/util/debug/debug.c:616
4 0x0000003d66232660 killpg() ??:0
5 0x00000000005a073f gpu_gb_ene_() ??:0
6 0x00000000004ca5b8 __gb_force_mod_MOD_gb_cph_ene() ??:0
7 0x000000000053cf41 __constantph_mod_MOD_cnstph_explicitmd() ??:0
8 0x000000000049d11f __runmd_mod_MOD_runmd() ??:0
9 0x00000000004ec59f MAIN__() pmemd.F90:0
10 0x00000000004ed39d main() ??:0
11 0x0000003d6621ed1d __libc_start_main() ??:0
12 0x00000000004088a9 _start() ??:0
===================
11 0x0000003d6621ed1d __libc_start_main() ??:0
12 0x00000000004088a9 _start() ??:0
===================
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
[quake20.cluster.net:20669] PMIX ERROR: NO-PERMISSIONS in file gds_dstore.c at line 702
[quake20.cluster.net:20669] PMIX ERROR: NO-PERMISSIONS in file gds_dstore.c at line 711
[quake15.cluster.net:20102] PMIX ERROR: NO-PERMISSIONS in file gds_dstore.c at line 702
[quake16.cluster.net:22168] PMIX ERROR: NO-PERMISSIONS in file gds_dstore.c at line 702
[quake15.cluster.net:20102] PMIX ERROR: NO-PERMISSIONS in file gds_dstore.c at line 711
[quake16.cluster.net:22168] PMIX ERROR: NO-PERMISSIONS in file gds_dstore.c at line 711
--------------------------------------------------------------------------
mpirun noticed that process rank 3 with PID 20108 on node quake15 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[quake13.cluster.net:20028] PMIX ERROR: NO-PERMISSIONS in file gds_dstore.c at line 702
[quake13.cluster.net:20028] PMIX ERROR: NO-PERMISSIONS in file gds_dstore.c at line 711