Hi Amber users,
I am trying to use the following scripts to calculate delta G values with MMPBSA.py. Some jobs finished successfully, but the system administrator just killed my remaining jobs. I have included their mail, which explains why the jobs were killed. Could you please tell me which parameters I need to change in my scripts?
Script:
#!/bin/bash
#PBS -q default
#PBS -N Amber
#PBS -j oe
# one node, eight processors; pmem is the requested memory per processor
#PBS -l nodes=1:ppn=8
#PBS -l walltime=500:00:00
#PBS -l pmem=1GB
#PBS -M
#PBS -m abe
#
export AMBERHOME=/usr/local/amber12-pgi
#
cd /chpchome/dhossain/SHARED/DHPROJECT/AMBER/L112A-GBimp/TEST
#
$AMBERHOME/bin/MMPBSA.py -O -i mmpbsa.in -o FINAL_RESULTS_MMPBSA.dat -cp M11-L112A.gas.prmtop -rp M11.gas.prmtop -lp L112A.gas.prmtop -y *.mdcrd
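For reference, my mmpbsa.in follows the general form below; the option values shown here are illustrative placeholders based on the manual's example input, not necessarily my exact settings:

Illustrative input for running GB and PB
&general
   startframe=1, endframe=50, interval=1, verbose=1,
/
&gb
   igb=5, saltcon=0.100,
/
&pb
   istrng=0.100,
/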
The administrator's explanation:
Hi Delwar,
Did you change any of the parameters in Amber before resubmitting your jobs? I will need to kill job 6720 running on node cluster3-95 as it is running like the ones we saw this morning.
Please understand that we cannot let the systems be affected or fail because of erroneous jobs. As I informed you earlier, your program was unfortunately killed to protect our computing resources, and I asked you to change the Amber parameters so that your simulation fits within those resources.
In this case the job needed more memory than the compute nodes physically have, which caused swapping. This is not a parameter in the queue submission script, but one within Amber. We are not experts in Amber. It may be quicker, and you will get better support for this specific issue, if you contact Amber support and ask which parameter to change to reduce the amount of memory so that your simulation can run on our cluster.
FYI: each compute node of cluster3 has 48GB of physical memory shared by all eight processors (6GB per processor), and we need to reserve at least several GB of that for the system.
As you know, we are doing our best to provide as much computing resource as possible.
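If I understand the MMPBSA.py manual correctly, the memory demand of the PB step is dominated by the finite-difference grid, whose size is controlled in the &pb namelist mainly by fillratio (ratio of grid dimension to solute dimension, default 4.0) and scale (grid points per Angstrom, default 2.0). Is a coarser, tighter grid along the lines of the sketch below (my untested guess, presumably at some cost in accuracy) the kind of parameter change the administrator means?

&pb
   istrng=0.100,
   fillratio=2.0,
   scale=1.0,
/

Halving both values would cut the number of grid points per dimension by a factor of 4, i.e. roughly 64-fold fewer grid points overall.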
Secondly, when I use the parallel version with the following script, I get the following error message:
#!/bin/sh
#PBS -N rasraf_parallel
#PBS -o parallel.out
#PBS -e parallel.err
#PBS -m abe
#PBS -M
#PBS -q default
#PBS -l nodes=4:ppn=8
#PBS -l pmem=2GB
module load amber12-gcc
SANDER=MMPBSA.py.MPI
MDIN=mmpbsa.in
OUTPUT=FINAL_RESULTS_MMPBSA.dat
CPTOP=M11-bec.gas.prmtop
RPTOP=M11.gas.prmtop
LPTOP=beclin.gas.prmtop
MDCRD=M11-bec_md1.mdcrd
PROG=progress.log
#
cd $PBS_O_WORKDIR
#
#
# one MPI rank per slot listed in the node file
export NUM_PROCS=$(wc -l < $PBS_NODEFILE)
#
mpirun --mca mpi_paffinity_alone 1 -np $NUM_PROCS -machinefile $PBS_NODEFILE \
    -x MX_RCACHE=0 --mca pml cm \
    $SANDER -i $MDIN -o $OUTPUT -cp $CPTOP -rp $RPTOP -lp $LPTOP -y $MDCRD > $PROG 2>&1
#
Contents of progress.log:
Running MMPBSA.MPI on 32 processors
Reading command-line arguments and input files...
Loading and checking parameter files for compatibility...
mmpbsa_py_energy found! Using /usr/local/amber12-gcc/bin/mmpbsa_py_energy
cpptraj found! Using /usr/local/amber12-gcc/bin/cpptraj
ptraj found! Using /usr/local/amber12-gcc/bin/ptraj
Preparing trajectories for simulation...
50 frames were processed by cpptraj for use in calculation.
Beginning GB calculations with /usr/local/amber12-gcc/bin/mmpbsa_py_energy
calculating complex contribution...
calculating receptor contribution...
calculating ligand contribution...
Beginning PB calculations with /usr/local/amber12-gcc/bin/mmpbsa_py_energy
calculating complex contribution...
CalcError: /usr/local/amber12-gcc/bin/mmpbsa_py_energy failed with prmtop M11-bec.gas.prmtop!
Error occured on rank 1.
Exiting. All files have been retained.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 25991 on
node cluster3-126.chpc.ndsu.nodak.edu exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
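The GB stage completes and PB then fails on the complex, so I suspect this is the same memory problem as in my first question: each MMPBSA.py.MPI rank runs its own mmpbsa_py_energy, so eight ranks per node have to share the 48GB. If that is right, would launching fewer ranks per node, as in the untested sketch below, be a sensible workaround?

#!/bin/sh
#PBS -N rasraf_parallel
#PBS -q default
# still request whole nodes so no other job shares their memory
#PBS -l nodes=4:ppn=8
#PBS -l pmem=2GB
module load amber12-gcc
cd $PBS_O_WORKDIR
# run only 2 ranks per node (8 total instead of 32), leaving ~24GB per rank instead of ~6GB
mpirun -npernode 2 -machinefile $PBS_NODEFILE \
    MMPBSA.py.MPI -i mmpbsa.in -o FINAL_RESULTS_MMPBSA.dat \
    -cp M11-bec.gas.prmtop -rp M11.gas.prmtop -lp beclin.gas.prmtop \
    -y M11-bec_md1.mdcrd > progress.log 2>&1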
Thank you.
With best regards
Delwar