Re: [AMBER] hpc error from David A Case on 2015-08-28 (Amber Archive Aug 2015)

From: David A Case <david.case.rutgers.edu>
Date: Fri, 28 Aug 2015 08:20:16 -0400

On Fri, Aug 28, 2015, Damiano Spadoni wrote:
>
> following your suggestions, I tried to run my simulation using far fewer
> cores and processors and increasing them little by little, but a new
> problem occurs: the hpc facility is writing weird output files (here I
> report my last try on 64 nodes, but the problem is the same even on 2)
> where the same step is repeated more than once, making it impossible to
> finish any jobs. Here I report the error message, and some other files
> are attached:

Here is at least one key problem with your script:

aprun -n 768 pmemd -O -i S4F4_md_200ps.in -o S4F4_md200ps32.out -p
SF4.prmtop -c S4F4_heat.rst -r S4F4_md200ps32.rst -x S4F4_md200ps32.mdcrd

You need to be running pmemd.MPI *not* pmemd. You are essentially running the
same (serial) program on every core, and the output is being intermingled.
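
As a sketch of the fix, the launch line would differ from the one above only
in the executable name. The snippet below builds that line in a variable and
prints it so it can be checked before being pasted into the job script. The
NP and CMD variable names are just for illustration, and 768 is simply the
value carried over from the original script:

```shell
#!/bin/sh
# Number of MPI ranks; 768 is the value from the original script,
# not a recommendation.
NP=768

# Same files and flags as before; only "pmemd" becomes "pmemd.MPI".
CMD="aprun -n $NP pmemd.MPI -O -i S4F4_md_200ps.in -o S4F4_md200ps32.out"
CMD="$CMD -p SF4.prmtop -c S4F4_heat.rst -r S4F4_md200ps32.rst -x S4F4_md200ps32.mdcrd"

# Print the launch line for inspection rather than running it here.
echo "$CMD"
```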

And, while you are trying to get things working, why not run a much shorter
job? (Set nstlim to 100 or so, with a small value of ntpr.)
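
One way to get such a short test input without editing the production file by
hand is sketched below. It assumes that nstlim and ntpr already appear in the
input file in lowercase as name=value pairs (I have not seen the file, so that
is a guess). The function name make_short_input and the value ntpr=10 are
only illustrative:

```shell
#!/bin/sh
# make_short_input SRC DST
# Copy the mdin file SRC to DST with nstlim set to 100 and ntpr set to 10.
# Assumes both variables are already present in SRC, in lowercase, written
# as name=value (spaces around "=" are tolerated). Missing variables are
# not added.
make_short_input() {
    sed -e 's/nstlim *= *[0-9][0-9]*/nstlim=100/' \
        -e 's/ntpr *= *[0-9][0-9]*/ntpr=10/' "$1" > "$2"
}
```

It could then be called as "make_short_input S4F4_md_200ps.in S4F4_test.in",
where S4F4_test.in is a made-up name for the test input.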

Even the correct program (pmemd.MPI) will probably never scale to 768 cores.
Be sure to try a variety of core counts to find the one that performs best.
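
A rough sketch of such a scan: the loop below prints one launch line per core
count, each writing to its own output files, so the timing information
reported for each run can be compared afterwards. The core counts, the scan_
file prefix, and the short test input S4F4_test.in are all placeholders, not
values taken from your setup:

```shell
#!/bin/sh
# Core counts to try; placeholder values, not a recommendation.
CORES="24 48 96 192"

# Print one launch line per core count. Each run gets its own output,
# restart, and trajectory files so that runs do not overwrite each other.
scan_cmds() {
    for n in $CORES; do
        echo "aprun -n ${n} pmemd.MPI -O -i S4F4_test.in -o scan_${n}.out" \
             "-p SF4.prmtop -c S4F4_heat.rst -r scan_${n}.rst -x scan_${n}.mdcrd"
    done
}

# Show the launch lines; copy the ones you want into the job script.
scan_cmds
```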

....dac

Received on Fri Aug 28 2015 - 05:30:05 PDT