Re: [AMBER] Error Message while using GPU from David A Case on 2020-06-28 (Amber Archive Jun 2020)

From: David A Case <david.case.rutgers.edu>
Date: Sun, 28 Jun 2020 15:19:51 -0400

On Sun, Jun 28, 2020, Sivanandam Magudeeswaran wrote:

> Recently we have installed Amber 20 in
>Fujitsu HPC machine. While running amber jobs using pmemd.cuda_DPFP.MPI
>routine with 4 nodes (each node has 24 processors), it gives the following
>error. Please give your valuable suggestions to rectify that error.

This looks like an error in the simulation itself, one that happens very
early on:

vlimit exceeded for step 52; vmax = 5669.1427
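
If the output is long, a small script can pull out every such warning so
you can see at which step the velocities first blow up. This is only a
sketch: the pattern is taken from the single line quoted above, and the
function name find_vlimit_warnings is my own invention, not part of Amber.

```python
import re

# Pattern based on the one warning line quoted in this thread; other
# Amber versions may format the message differently (assumption).
VLIMIT_RE = re.compile(
    r"vlimit exceeded for step\s+(\d+);\s*vmax\s*=\s*([0-9.+\-Ee]+)"
)

def find_vlimit_warnings(text):
    """Return a list of (step, vmax) tuples for each warning in `text`."""
    return [(int(m.group(1)), float(m.group(2)))
            for m in VLIMIT_RE.finditer(text)]

sample = "vlimit exceeded for step 52; vmax = 5669.1427"
print(find_vlimit_warnings(sample))  # prints [(52, 5669.1427)]
```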

Re-run the simulation with the serial CPU code, setting ntpr=1 and
nstlim=100 (say). See if you can get clues to what is happening from
this more detailed output.
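
One way to make that short debugging input without hand-editing is to
patch the two values in a copy of your existing mdin file. What follows
is a rough sketch: set_cntrl_value is a hypothetical helper, the sample
input is made up (I have not seen your actual mdin), and the simple
regular expression assumes each variable appears at most once, inside
&cntrl.

```python
import re

def set_cntrl_value(mdin_text, name, value):
    """Set `name=value` in the text of an mdin file.

    Replaces the first existing assignment of `name`; if there is none,
    adds one on a new line right after '&cntrl'.
    """
    assign = re.compile(r"\b" + re.escape(name) + r"\s*=\s*[^,\s/]+",
                        re.IGNORECASE)
    new_assign = "%s=%s" % (name, value)
    if assign.search(mdin_text):
        return assign.sub(lambda m: new_assign, mdin_text, count=1)
    return re.sub(r"&cntrl",
                  lambda m: m.group(0) + "\n  " + new_assign + ",",
                  mdin_text, count=1, flags=re.IGNORECASE)

# Hypothetical production input; the values are for illustration only.
original_mdin = """production run
 &cntrl
  imin=0, nstlim=500000, dt=0.002,
  ntpr=1000, ntwx=1000,
 /
"""

# Print energies every step, and stop after 100 steps.
debug_mdin = set_cntrl_value(original_mdin, "ntpr", 1)
debug_mdin = set_cntrl_value(debug_mdin, "nstlim", 100)
print(debug_mdin)
```

Save the result under a new name and pass it to the serial CPU program
with -i, leaving the rest of your command line as it was.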

Is this part of a continuing simulation? (That is, has it run for a
long time with apparently correct results, or are you just starting
out?) You should generally make sure that your system is stable, and
has converged on approximately the correct density, before moving to a
GPU. And, you should test whether going to multiple GPUs actually
provides much speed-up over a single GPU run.
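
To put a number on that comparison, divide the throughput of the
multi-GPU run by that of the single-GPU run, then divide by the number
of GPUs. The sketch below does just this arithmetic; gpu_scaling is a
made-up name, and the ns/day figures are placeholders, not measurements.

```python
def gpu_scaling(ns_per_day_single, ns_per_day_multi, n_gpus):
    """Return (speedup, parallel efficiency) of a multi-GPU run.

    speedup    = multi-GPU throughput / single-GPU throughput
    efficiency = speedup / number of GPUs (1.0 would be ideal scaling)
    """
    speedup = ns_per_day_multi / ns_per_day_single
    efficiency = speedup / n_gpus
    return speedup, efficiency

# Placeholder numbers for illustration: take the real ones from the
# timing summaries of your own single-GPU and 2-GPU runs.
speedup, efficiency = gpu_scaling(100.0, 130.0, 2)
print("speedup %.2fx, efficiency %.0f%%" % (speedup, 100 * efficiency))
```

If the efficiency is well below 1, running independent single-GPU jobs
is likely to be a better use of the hardware.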

Also, I'm guessing that you may have incorrectly specified the MPI
setup, since you seem to have something like 96 MPI processes:

>95 more processes have sent help message ....

This makes no sense for GPU runs: you certainly don't have 96 GPUs on
those four nodes. So, (a) make sure that things work on the CPU; (b)
move to a single GPU and repeat; (c) carefully expand to multiple GPUs,
if required, starting with 2.
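
A quick sanity check along these lines can be scripted before a job is
submitted. This sketch assumes the usual convention of one MPI process
per GPU; check_gpu_ranks is a hypothetical helper, and the figure of 2
GPUs per node is a guess for illustration, since I don't know your
hardware.

```python
def check_gpu_ranks(n_ranks, gpus_per_node, n_nodes=1):
    """Return (ok, message) comparing the MPI process count to GPU count.

    Assumes one MPI process per GPU, so more processes than GPUs is
    flagged as a likely mistake in the launch command.
    """
    total_gpus = gpus_per_node * n_nodes
    if n_ranks > total_gpus:
        return False, ("%d MPI processes requested but only %d GPUs "
                       "available" % (n_ranks, total_gpus))
    return True, "%d MPI processes on %d GPUs" % (n_ranks, total_gpus)

# 96 processes as in the error output; 2 GPUs per node is an assumption.
ok, message = check_gpu_ranks(96, gpus_per_node=2, n_nodes=4)
print(ok, message)

# The first multi-GPU test suggested above: 2 processes on one node.
ok, message = check_gpu_ranks(2, gpus_per_node=2)
print(ok, message)
```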

...good luck...dac

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sun Jun 28 2020 - 12:30:03 PDT