Re: [AMBER] Problem of running a simulation in A100 GPU server.

From: David A Case <david.case.rutgers.edu>
Date: Sun, 13 Jun 2021 21:47:17 -0400

On Sun, Jun 13, 2021, Jong Young Joung wrote:

>After installing Amber20 in a GPU server using A100 card,
>I tried a simulation of a bound structure of a certain protein-ligand.
>Right after the simulation run started, the system began to be unstable.

>However, when I tried the same simulation with the same md preparation
>files in the GPU server using V100 card, the simulation continued to
>progress stably.

Since you get bad behavior "right after the simulation run started", you
should be able to run parallel short simulations on A100 vs V100, using
identical inputs and setting a small value for ntpr to give maximum
information. Try to narrow down as much as you can what the differences
are. Are the energies identical on the first step? If so, how long does it
take for them to diverge?
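
One way to make that comparison is sketched below. It assumes both runs wrote standard mdout files with the usual "NSTEP = ..." and "EPtot = ..." energy records, and that ntpr was the same in both. The function names and the tolerance are illustrative choices, not anything provided by Amber.

```python
import re

# Fields as they appear in mdout energy records, e.g.
#  NSTEP =        0   TIME(PS) = ...
#  Etot   = ...  EKtot   = ...  EPtot      = ...
NSTEP_RE = re.compile(r"NSTEP\s*=\s*(\d+)")
EPTOT_RE = re.compile(r"EPtot\s*=\s*(-?\d+\.\d+)")


def read_eptot(text):
    """Return {nstep: EPtot} parsed from the text of an mdout file.

    Only the first record for each step is kept, so the averages and
    RMS-fluctuation blocks at the end (which repeat the final NSTEP)
    do not overwrite the real value.
    """
    energies = {}
    step = None
    for line in text.splitlines():
        m = NSTEP_RE.search(line)
        if m:
            step = int(m.group(1))
            continue
        m = EPTOT_RE.search(line)
        if m and step is not None:
            energies.setdefault(step, float(m.group(1)))
            step = None
    return energies


def first_divergence(run_a, run_b, tol=1e-3):
    """Return the first common step where EPtot differs by more than tol.

    tol (kcal/mol) is an arbitrary assumption; choose it to suit the
    precision model in use. Returns None if no common step differs.
    """
    for step in sorted(set(run_a) & set(run_b)):
        if abs(run_a[step] - run_b[step]) > tol:
            return step
    return None
```

To use it, read each output file into a string (for example with open("a100.mdout").read(), where the file name is hypothetical), pass both through read_eptot, and hand the two results to first_divergence. A result of 0 means the energies already disagree on the first step; a later step tells you how long the two cards stay in agreement.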

I'm assuming here that the cuda test cases pass on both the A100 and V100 cards.
If not, one needs to explore possible problems with the installation, or
(potentially) with the hardware itself. But you should be in good shape to
track this down: you have a machine where things appear to work, and one
where things fail; so there are a number of things that should come to mind
to try to narrow down the underlying cause.

....dac


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sun Jun 13 2021 - 19:00:03 PDT