[AMBER] hpc error

From: Damiano Spadoni <enxds6.nottingham.ac.uk>
Date: Tue, 18 Aug 2015 13:43:23 +0000

Dear Amber creators,

I am trying to run the following script, which covers the production stage of my simulation, on an HPC facility:

#!/bin/bash --login

#PBS -N S4F4_md
#PBS -l walltime=24:00:00
#PBS -l select=4
#PBS -j oe
#PBS -A e280-Croft

module load amber

cd /work/e280/e280/enxds6/SF4

aprun -n 96 sander.MPI -O -i S4F4_md.in -o S4F4_md1.out -p SF4.prmtop -c S4F4_heat.rst -r S4F4_md1.rst -x S4F4_md1.mdcrd
aprun -n 96 sander.MPI -O -i S4F4_md.in -o S4F4_md2.out -p SF4.prmtop -c S4F4_md1.rst -r S4F4_md2.rst -x S4F4_md2.mdcrd
aprun -n 96 sander.MPI -O -i S4F4_md.in -o S4F4_md3.out -p SF4.prmtop -c S4F4_md2.rst -r S4F4_md3.rst -x S4F4_md3.mdcrd
aprun -n 96 sander.MPI -O -i S4F4_md.in -o S4F4_md4.out -p SF4.prmtop -c S4F4_md3.rst -r S4F4_md4.rst -x S4F4_md4.mdcrd
aprun -n 96 sander.MPI -O -i S4F4_md.in -o S4F4_md5.out -p SF4.prmtop -c S4F4_md4.rst -r S4F4_md5.rst -x S4F4_md5.mdcrd
echo "DONE"

but I always receive the same error message:

ModuleCmd_Load.c(226):ERROR:105: Unable to locate a modulefile for 'null'
ModuleCmd_Load.c(226):ERROR:105: Unable to locate a modulefile for 'null'
 partition error in shake on processor 2
 this processor has atoms 13055 through 19568
 atom 19568 is within this range
 atom 19569 is not within this range !
Rank 2 [Tue Aug 18 14:22:06 2015] [c4-2c2s8n3] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0 0xDD499D in _gfortran_backtrace at backtrace.c:258
#1 0xDBBAB0 in _gfortrani_backtrace_handler at compile_options.c:129
#2 0xE3C43F in system
#3 0xE3C3EB in raise at pt-raise.c:41
#4 0xE4C8E0 in abort at abort.c:92
#5 0xD03EE1 in MPID_Abort
#6 0xCE3ED2 in MPI_Abort
#7 0xCBF1B4 in pmpi_abort
#8 0x56EBB8 in mexit_
#9 0x4D7DCF in shake_
#10 0x4C19AD in runmd_
#11 0x47EE31 in sander_
#12 0x4783D5 in MAIN__ at multisander.F90:0
Rank 0 [Tue Aug 18 14:22:06 2015] [c4-2c2s8n3] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0 0xDD499D in _gfortran_backtrace at backtrace.c:258
#1 0xDBBAB0 in _gfortrani_backtrace_handler at compile_options.c:129
#2 0xE3C43F in system
#3 0xE3C3EB in raise at pt-raise.c:41
#4 0xE4C8E0 in abort at abort.c:92
#5 0xD03EE1 in MPID_Abort
#6 0xCE3ED2 in MPI_Abort
#7 0xCBF1B4 in pmpi_abort
#8 0x56EBB8 in mexit_
#9 0x4D7DCF in shake_
#10 0x4C19AD in runmd_
#11 0x47EE31 in sander_
#12 0x4783D5 in MAIN__ at multisander.F90:0
_pmiu_daemon(SIGCHLD): [NID 04003] [c4-2c2s8n3] [Tue Aug 18 14:22:07 2015] PE RANK 0 exit signal Aborted
[NID 04003] 2015-08-18 14:22:07 Apid 16936164: initiated application termination
Application 16936164 exit codes: 134
Application 16936164 exit signals: Killed
Application 16936164 resources: utime ~2s, stime ~5s, Rss ~122380, inblocks ~277489, outblocks ~171369

  Error opening unit 30: File "S4F4_md1.rst" is missing or unreadable
Rank 0 [Tue Aug 18 14:22:15 2015] [c4-2c2s8n3] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0 0xDD499D in _gfortran_backtrace at backtrace.c:258
#1 0xDBBAB0 in _gfortrani_backtrace_handler at compile_options.c:129
#2 0xE3C43F in system
#3 0xE3C3EB in raise at pt-raise.c:41
#4 0xE4C8E0 in abort at abort.c:92
#5 0xD03EE1 in MPID_Abort
#6 0xCE3ED2 in MPI_Abort
#7 0xCBF1B4 in pmpi_abort
#8 0x56EBB8 in mexit_
#9 0x5518AC in amopen_
#10 0x5EFD7F in amoeba_check_newstyle_inpcrd_
#11 0x4FCE7B in load_ewald_info_
#12 0x4A644E in mdread1_
#13 0x47954E in sander_
#14 0x4783D5 in MAIN__ at multisander.F90:0
_pmiu_daemon(SIGCHLD): [NID 04003] [c4-2c2s8n3] [Tue Aug 18 14:22:16 2015] PE RANK 0 exit signal Aborted
[NID 04003] 2015-08-18 14:22:16 Apid 16936165: initiated application termination
Application 16936165 exit codes: 134
Application 16936165 exit signals: Killed
Application 16936165 resources: utime ~0s, stime ~4s, Rss ~12120, inblocks ~75764, outblocks ~171353

  Error opening unit 30: File "S4F4_md2.rst" is missing or unreadable
Rank 0 [Tue Aug 18 14:22:21 2015] [c4-2c2s8n3] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0 0xDD499D in _gfortran_backtrace at backtrace.c:258
#1 0xDBBAB0 in _gfortrani_backtrace_handler at compile_options.c:129
#2 0xE3C43F in system
#3 0xE3C3EB in raise at pt-raise.c:41
#4 0xE4C8E0 in abort at abort.c:92
#5 0xD03EE1 in MPID_Abort
#6 0xCE3ED2 in MPI_Abort
#7 0xCBF1B4 in pmpi_abort
#8 0x56EBB8 in mexit_
#9 0x5518AC in amopen_
#10 0x5EFD7F in amoeba_check_newstyle_inpcrd_
#11 0x4FCE7B in load_ewald_info_
#12 0x4A644E in mdread1_
#13 0x47954E in sander_
#14 0x4783D5 in MAIN__ at multisander.F90:0
_pmiu_daemon(SIGCHLD): [NID 04003] [c4-2c2s8n3] [Tue Aug 18 14:22:21 2015] PE RANK 0 exit signal Aborted
[NID 04003] 2015-08-18 14:22:21 Apid 16936166: initiated application termination
Application 16936166 exit codes: 134
Application 16936166 exit signals: Killed
Application 16936166 resources: utime ~0s, stime ~4s, Rss ~12120, inblocks ~75764, outblocks ~171353

  Error opening unit 30: File "S4F4_md3.rst" is missing or unreadable
Rank 0 [Tue Aug 18 14:22:26 2015] [c4-2c2s8n3] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0 0xDD499D in _gfortran_backtrace at backtrace.c:258
#1 0xDBBAB0 in _gfortrani_backtrace_handler at compile_options.c:129
#2 0xE3C43F in system
#3 0xE3C3EB in raise at pt-raise.c:41
#4 0xE4C8E0 in abort at abort.c:92
#5 0xD03EE1 in MPID_Abort
#6 0xCE3ED2 in MPI_Abort
#7 0xCBF1B4 in pmpi_abort
#8 0x56EBB8 in mexit_
#9 0x5518AC in amopen_
#10 0x5EFD7F in amoeba_check_newstyle_inpcrd_
#11 0x4FCE7B in load_ewald_info_
#12 0x4A644E in mdread1_
#13 0x47954E in sander_
#14 0x4783D5 in MAIN__ at multisander.F90:0
_pmiu_daemon(SIGCHLD): [NID 04003] [c4-2c2s8n3] [Tue Aug 18 14:22:27 2015] PE RANK 0 exit signal Aborted
[NID 04003] 2015-08-18 14:22:27 Apid 16936168: initiated application termination
Application 16936168 exit codes: 134
Application 16936168 exit signals: Killed
Application 16936168 resources: utime ~0s, stime ~4s, Rss ~12120, inblocks ~75764, outblocks ~171353

  Error opening unit 30: File "S4F4_md4.rst" is missing or unreadable
Rank 0 [Tue Aug 18 14:22:29 2015] [c4-2c2s8n3] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0 0xDD499D in _gfortran_backtrace at backtrace.c:258
#1 0xDBBAB0 in _gfortrani_backtrace_handler at compile_options.c:129
#2 0xE3C43F in system
#3 0xE3C3EB in raise at pt-raise.c:41
#4 0xE4C8E0 in abort at abort.c:92
#5 0xD03EE1 in MPID_Abort
#6 0xCE3ED2 in MPI_Abort
#7 0xCBF1B4 in pmpi_abort
#8 0x56EBB8 in mexit_
#9 0x5518AC in amopen_
#10 0x5EFD7F in amoeba_check_newstyle_inpcrd_
#11 0x4FCE7B in load_ewald_info_
#12 0x4A644E in mdread1_
#13 0x47954E in sander_
#14 0x4783D5 in MAIN__ at multisander.F90:0
_pmiu_daemon(SIGCHLD): [NID 04003] [c4-2c2s8n3] [Tue Aug 18 14:22:29 2015] PE RANK 0 exit signal Aborted
[NID 04003] 2015-08-18 14:22:29 Apid 16936169: initiated application termination
Application 16936169 exit codes: 134
Application 16936169 exit signals: Killed
Application 16936169 resources: utime ~0s, stime ~4s, Rss ~12120, inblocks ~75764, outblocks ~171353
DONE

All the requested files are in the correct directory.
The HPC helpdesk said there might be an error in the SF4.prmtop file, but they are not sure about it.
This is the first time I am trying to run this simulation on a cluster. I previously ran it (just the first sander.MPI command) on my own machine and it worked, but I want to repeat the simulation on a cluster.
Any suggestions about what I am probably missing?

Best
Damiano
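
Note on the log above: only the first run reports the SHAKE partition error. The four later failures (File "S4F4_md1.rst" is missing or unreadable, and so on) appear to follow from it, since each run reads the restart file that the previous run was supposed to write. A minimal sketch of the same chain, written as a loop that stops at the first failure, is given here. The function name run_segments and its two arguments are illustrative assumptions, not part of Amber or of the original script; the file names and sander.MPI options are copied from the script above. It also assumes the launcher returns a nonzero exit status when the run aborts (the log reports exit code 134 for the aborted applications).

```bash
#!/bin/bash
# Sketch only: run chained MD segments and stop at the first failure.
# run_segments is a hypothetical helper, not an Amber or PBS command.

run_segments() {
    local launcher="$1"          # command prefix, e.g. "aprun -n 96 sander.MPI"
    local nseg="$2"              # number of production segments to chain
    local prev="S4F4_heat.rst"   # restart file written by the heating step
    local i out

    for i in $(seq 1 "$nseg"); do
        out="S4F4_md${i}"

        # Refuse to start if the input restart file is absent or empty.
        if [ ! -s "$prev" ]; then
            echo "segment $i: input restart $prev missing or empty" >&2
            return 1
        fi

        # $launcher is left unquoted on purpose so that a prefix such as
        # "aprun -n 96 sander.MPI" is split into separate words.
        if ! $launcher -O -i S4F4_md.in -o "${out}.out" -p SF4.prmtop \
                -c "$prev" -r "${out}.rst" -x "${out}.mdcrd"; then
            echo "segment $i failed; not starting later segments" >&2
            return 1
        fi

        # The restart file just written feeds the next segment.
        prev="${out}.rst"
    done
    echo "DONE"
}

# Usage inside the job script (not executed here):
# run_segments "aprun -n 96 sander.MPI" 5
```

This does not address the SHAKE partition error itself. It only keeps one failed segment from producing a cascade of secondary "missing or unreadable" errors, so the first real error is easier to spot.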





_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Aug 18 2015 - 07:00:03 PDT