Is it possible to submit an Amber job on a cluster as a Slurm job array?
I tried this with MMPBSA.py.MPI and some of my jobs failed. I don't
understand why only some of them failed. Also, each time I relaunch the
jobs, the error messages are quite different. I'm using the following
script to analyse a couple of ns of my MD simulation.
My batch script:
#!/bin/bash
#SBATCH --array=48-62
#SBATCH --job-name=mmpbsa
#SBATCH --ntasks-per-node=20
#SBATCH --time=24:00:00
NUMBER=$SLURM_ARRAY_TASK_ID
mpirun -np 20 MMPBSA.py.MPI -O \
-i mmpbsa.in \
-o bss-fum-tol-${NUMBER}ns.dat \
-sp com-bss-fum-tol-wat.prmtop \
-cp com-bss-fum-tol.prmtop \
-rp rec-bss-fum.prmtop \
-lp lig-toluen.prmtop \
-y $SCRATCH/20.03.18-30_md/bss-fum-tol-md${NUMBER}.crd
Some (truncated) error messages:
forrtl: severe (24): end-of-file during read, unit 24, file _MMPBSA_complex.mdcrd.19
Image PC Routine Line Source
libifcore.so.5 00002AF4B5C02947 Unknown Unknown Unknown
libifcore.so.5 00002AF4B5C3BA33 Unknown Unknown Unknown
sander 000000000050679B Unknown Unknown Unknown
sander 0000000000503263 Unknown Unknown Unknown
sander 00000000004F81D2 Unknown Unknown Unknown
sander 000000000046A4CE Unknown Unknown Unknown
libc.so.6 00002AF4B6B4E3D5 Unknown Unknown Unknown
sander 000000000046A3C9 Unknown Unknown Unknown
...
CalcError: /net/software/local/amber/amber16/bin/sander failed with prmtop com-bss-fum-tol.prmtop!
Error occured on rank 0.
Exiting. All files have been retained.
self.prmtop))
CalcError: /net/software/local/amber/amber16/bin/sander failed with prmtop com-bss-fum-tol.prmtop!
Error occured on rank 4.
Exiting. All files have been retained.
calc.run(rank, stdout=stdout, stderr=stderr)
File "/net/software/local/amber/amber16/lib/python2.7/site-packages/MMPBSA_mods/calculation.py", line 157, in run
self.prmtop))
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
calc.run(rank, stdout=stdout, stderr=stderr)
File "/net/software/local/amber/amber16/lib/python2.7/site-packages/MMPBSA_mods/calculation.py", line 157, in run
self.prmtop))
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 4
==============================================
Beginning PB calculations with /net/software/local/amber/amber16/bin/sander
calculating complex contribution...
calculating receptor contribution...
calculating ligand contribution...
File "/net/software/local/amber/amber16/bin/MMPBSA.py.MPI", line 108, in <module>
app.parse_output_files()
File "/net/software/local/amber/amber16/lib/python2.7/site-packages/MMPBSA_mods/main.py", line 930, in parse_output_files
self.using_chamber)}
File "/net/software/local/amber/amber16/lib/python2.7/site-packages/MMPBSA_mods/amber_outputs.py", line 708, in __init__
AmberOutput._read(self)
File "/net/software/local/amber/amber16/lib/python2.7/site-packages/MMPBSA_mods/amber_outputs.py", line 343, in _read
self._get_energies(output_file)
File "/net/software/local/amber/amber16/lib/python2.7/site-packages/MMPBSA_mods/amber_outputs.py", line 737, in _get_energies
self.data['VDWAALS'].append(float(words[2]))
ValueError: could not convert string to float: *************
Error occured on rank 0.
Exiting. All files have been retained.
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
==============================================
Beginning PB calculations with /net/software/local/amber/amber16/bin/sander
calculating complex contribution...
calculating receptor contribution...
calculating ligand contribution...
Timing:
Total setup time: 0.063 min.
Creating trajectories with cpptraj: 0.154 min.
Total calculation time: 11.878 min.
Total GB calculation time: 0.513 min.
Total PB calculation time: 11.253 min.
Statistics calculation & output writing: 0.000 min.
Total time taken: 12.150 min.
File "/net/software/local/amber/amber16/bin/MMPBSA.py.MPI", line 110, in <module>
app.finalize()
File "/net/software/local/amber/amber16/lib/python2.7/site-packages/MMPBSA_mods/main.py", line 681, in finalize
self.remove(self.INPUT['keep_files'])
File "/net/software/local/amber/amber16/lib/python2.7/site-packages/MMPBSA_mods/main.py", line 869, in remove
utils.remove(flag, mpi_size=self.mpi_size, fnpre=self.pre)
File "/net/software/local/amber/amber16/lib/python2.7/site-packages/MMPBSA_mods/utils.py", line 123, in remove
os.remove(fil)
OSError: [Errno 2] No such file or directory: '_MMPBSA_pb.mdin'
Error occured on rank 0.
Exiting. All files have been retained.
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
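
The file names in the errors above (_MMPBSA_complex.mdcrd.19, _MMPBSA_pb.mdin) are intermediate files that MMPBSA.py writes into the current working directory. One possible explanation, assuming all array tasks start in the same directory, is that the tasks overwrite or delete each other's intermediate files, which would also fit the errors changing from run to run. A minimal sketch of the batch script with one private working directory per array task is below. The directory name mmpbsa_task_<N>, the fallback task ID 48, and the check for the executables are my own additions for illustration, not part of the original script.

```shell
#!/bin/bash
#SBATCH --array=48-62
#SBATCH --job-name=mmpbsa
#SBATCH --ntasks-per-node=20
#SBATCH --time=24:00:00

# Array index set by Slurm; the fallback to 48 is only for dry runs outside Slurm.
NUMBER=${SLURM_ARRAY_TASK_ID:-48}

# Directory the job was submitted from (assumed to hold mmpbsa.in and the prmtops).
SUBMIT_DIR=${SLURM_SUBMIT_DIR:-$PWD}

# Hypothetical naming scheme: one working directory per array task, so the
# _MMPBSA_* intermediate files of different tasks cannot collide.
WORKDIR="$SUBMIT_DIR/mmpbsa_task_${NUMBER}"
mkdir -p "$WORKDIR"
cd "$WORKDIR" || exit 1

# Build the command as positional parameters; inputs use absolute paths
# because the task no longer runs in the submit directory.
set -- mpirun -np 20 MMPBSA.py.MPI -O \
    -i  "$SUBMIT_DIR/mmpbsa.in" \
    -o  "bss-fum-tol-${NUMBER}ns.dat" \
    -sp "$SUBMIT_DIR/com-bss-fum-tol-wat.prmtop" \
    -cp "$SUBMIT_DIR/com-bss-fum-tol.prmtop" \
    -rp "$SUBMIT_DIR/rec-bss-fum.prmtop" \
    -lp "$SUBMIT_DIR/lig-toluen.prmtop" \
    -y  "$SCRATCH/20.03.18-30_md/bss-fum-tol-md${NUMBER}.crd"

echo "Task ${NUMBER} in ${WORKDIR}: $*"

# Run only if both executables are available; otherwise just report.
if command -v mpirun >/dev/null 2>&1 && command -v MMPBSA.py.MPI >/dev/null 2>&1; then
    "$@"
else
    echo "mpirun or MMPBSA.py.MPI not on PATH; command printed only" >&2
fi
```

With this layout each result file bss-fum-tol-<N>ns.dat ends up inside its task directory rather than in the submit directory. MMPBSA.py also appears to accept a -prefix option that changes the prefix of the intermediate files, which might achieve the same isolation without separate directories; please check MMPBSA.py --help on your installation before relying on it.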
Received on Thu Apr 23 2020 - 08:00:02 PDT