Re: [AMBER] CUDA errors on a 700k system

From: Bill Ross <ross.cgl.ucsf.edu>
Date: Sun, 14 Jul 2019 11:07:55 -0700

Have you tried running on a CPU? Maybe just to get past the start, in
case there's a GPU-specific numerical hump in Amber or the GPUs.

Also, this is a common type of problem if you are equilibrating too fast.

The manual likely describes vlimit, but searching for "vlimit sander amber" turns up:

The variable vlimit resets the velocity to the value of VLIMIT once it
becomes greater than abs(VLIMIT).  This can be used to avoid occasional
instabilities in molecular dynamics runs, and is especially important
for simulated annealing runs because of the high temperature.  It should
be set to some value between 10 and 20, which is well above the most
probable velocity in a Maxwell-Boltzmann distribution at room
temperature.  A warning message will be printed whenever the velocities
are modified.

http://ambermd.org/tutorials/advanced/tutorial4/index.htm
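To make the quoted behaviour concrete, here is a minimal Python sketch. It is not Amber's code. It assumes, as the sander manual describes, that the limit is applied to each Cartesian velocity component with its sign kept, and that vlimit=0 means "no limit". The names `apply_vlimit` and `most_probable_speed` are hypothetical. The second function checks the "well above the most probable velocity" claim, assuming Amber's internal units (kcal/mol, g/mol, Angstrom), in which one velocity unit is 1 Angstrom per 1/20.455 ps.

```python
import math

# Gas constant in kcal/(mol*K). With masses in g/mol and lengths in
# Angstrom this yields velocities in Amber's internal unit
# (Angstrom per 1/20.455 ps). Assumed unit convention.
R_KCAL = 0.0019872


def apply_vlimit(velocities, vlimit):
    """Hypothetical sketch of vlimit: clamp each component to +/-abs(vlimit).

    velocities: list of (vx, vy, vz) tuples, one per atom.
    Returns (clamped_velocities, n_modified). Amber prints a warning
    whenever velocities are modified, i.e. when n_modified > 0.
    """
    limit = abs(vlimit)
    if limit == 0.0:
        # vlimit=0 is treated here as "no limit" (assumption).
        return [tuple(v) for v in velocities], 0
    clamped = []
    n_modified = 0
    for atom in velocities:
        new_atom = []
        for v in atom:
            if abs(v) > limit:
                # Reset the component to the limit, keeping its sign.
                new_atom.append(math.copysign(limit, v))
                n_modified += 1
            else:
                new_atom.append(v)
        clamped.append(tuple(new_atom))
    return clamped, n_modified


def most_probable_speed(mass_amu, temp_k):
    """Maxwell-Boltzmann most probable speed sqrt(2RT/m), in Amber units."""
    return math.sqrt(2.0 * R_KCAL * temp_k / mass_amu)


if __name__ == "__main__":
    vels = [(0.5, -25.0, 3.0), (12.0, 0.1, -0.2)]
    clamped, n = apply_vlimit(vels, 10.0)
    print(clamped, n)  # [(0.5, -10.0, 3.0), (10.0, 0.1, -0.2)] 2
    # Hydrogen (1.008 g/mol) at 300 K:
    print(round(most_probable_speed(1.008, 300.0), 2))  # 1.09
```

For hydrogen, the lightest atom, at 300 K the most probable speed comes out near 1.09, so a vlimit of 10 is about nine times larger. Components that trip the limit are far out in the tail of the thermal distribution.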

Bill

On 7/14/19 11:00 AM, Dmitry Suplatov wrote:
> Dear Amber Users,
>
> I am running a classical NVT simulation of a 700k system on Tesla P100s. I
> run 10 MD replicas of the same system on different nodes.
>
> All MDs generally run fine for 80-100 ns of production (i.e., after EM,
> EQ, etc.), and then I run into problems.
>
> When running on *two GPUs* in *peer-to-peer* mode (single node = 2 cards),
> I get the following error:
> gpu_allreduce cudaDeviceSynchronize failed an illegal memory access was
> encountered
>
> When running on *one GPU*, I get the following error:
> Error: an illegal memory access was encountered launching kernel kNLSkinTest
>
> When I add the *vlimit=20,* option to my config file, *some MDs run
> normally on one GPU* while others encounter the same error. Nothing changes
> in peer-to-peer mode.
>
> When I set the *vlimit=10,* option in my config file, *all MDs run
> normally on one GPU*. Nothing changes in peer-to-peer mode.
>
> My QUESTIONS are:
>
> 1/ What does the "vlimit" option do? I searched the Amber mailing lists
> but cannot find what it means.
>
> 2/ Does setting vlimit affect the performance or the biological output of
> my simulations?
>
> 3/ What would you suggest?
>
> Thank you,
> Dmitry
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
Received on Sun Jul 14 2019 - 11:30:03 PDT