Dear List,
I am using AMBER 12.3.1, and most of my simulations scale up to 40
ns/day using 2 GPUs (1 GPU per node). However, whenever I try to use a
jobscript with more than four nodes, my job crashes without any error
message. My jobscript is below. When I use 2 nodes, prod1.mdcrd is
the first file written, and after prod1.mdcrd completes (up to 10 ns)
the job starts writing prod2.mdcrd. However, when I try to use 4 nodes,
all the runs (prod1.out, prod2.out, ...) start simultaneously and the
job crashes.
Has anybody experienced this? Any feedback is appreciated. How do I
submit a job across multiple GPU nodes? The supercomputing
cluster has 1 GPU and 12 CPUs per node.
Regards,
Neha
=======================================================
#PBS -l walltime=24:00:00
#PBS -l select=4:ncpus=12:mpiprocs=1:ngpus=1:mem=64GB
#PBS -W group_list=partner420
#PBS -q workq
module load openmpi
module load cuda
module load amber-dev
cd $PBS_O_WORKDIR
mpirun $AMBERHOME/bin/pmemd.cuda_SPFP.MPI -O -i sander.in4 \
    -o prod1.out2 -r prod1.rst -p solvated.prmtop -c anneal.rst \
    -x prod1.mdcrd
mpirun $AMBERHOME/bin/pmemd.cuda_SPFP.MPI -O -i sander.in4 \
    -o prod2.out2 -r prod2.rst -p solvated.prmtop -c prod1.rst \
    -x prod2.mdcrd
mpirun $AMBERHOME/bin/pmemd.cuda_SPFP.MPI -O -i sander.in4 \
    -o prod3.out2 -r prod3.rst -p solvated.prmtop -c prod2.rst \
    -x prod3.mdcrd
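
The three chained runs in the jobscript can also be written as a loop that stops as soon as one segment fails or leaves no restart file. This is only a minimal sketch and does not by itself explain the crash. The names run_segments and RUN_CMD are hypothetical helpers introduced for illustration; the file names and pmemd flags are the ones used in the jobscript.

```shell
# Hypothetical sketch: run the production segments strictly one after another.
# RUN_CMD is an assumed variable holding the launcher plus the pmemd binary.
RUN_CMD=${RUN_CMD:-"mpirun ${AMBERHOME:-}/bin/pmemd.cuda_SPFP.MPI"}

run_segments() {
    local prev="$1"   # restart file that seeds the first segment
    local last="$2"   # number of segments to run
    local i=1
    while [ "$i" -le "$last" ]; do
        # Same flags as the jobscript; only the segment index changes.
        $RUN_CMD -O -i sander.in4 -o "prod$i.out2" -r "prod$i.rst" \
            -p solvated.prmtop -c "$prev" -x "prod$i.mdcrd" || {
            echo "segment $i failed" >&2
            return 1
        }
        # Do not start the next segment without a non-empty restart file.
        [ -s "prod$i.rst" ] || { echo "prod$i.rst missing" >&2; return 1; }
        prev="prod$i.rst"
        i=$((i + 1))
    done
}
```

Calling `run_segments anneal.rst 3` after the `cd $PBS_O_WORKDIR` line would reproduce the prod1 to prod3 sequence, with the difference that a failed segment ends the job with a message on stderr instead of passing silently to the next run.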
--
Regards,
Dr. Neha S. Gandhi,
Curtin Research Fellow,
School of Biomedical Sciences,
Curtin University,
Perth GPO U1987
Australia
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Dec 10 2013 - 03:30:02 PST