Dear Amber Users,
I am running a classical NVT simulation of a 700k system on Tesla P100s, with 10
MD replicas of the same system on different nodes.
All MDs generally run for 80-100 ns of production (i.e., after EM, EQ,
etc.) and then fail with the errors below.
When running on *two GPUs* in *peer-to-peer* mode (single node = 2 cards),
I get the following error:
gpu_allreduce cudaDeviceSynchronize failed an illegal memory access was
encountered
When running on *one GPU*, I get the following error:
Error: an illegal memory access was encountered launching kernel kNLSkinTest
When I add the *vlimit=20* option to my config file, *some MDs run
normally on one GPU* while others hit the same error. Nothing changes
in peer-to-peer mode.
When I set *vlimit=10* in my config file, *all MDs run
normally on one GPU*. Again, nothing changes in peer-to-peer mode.
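For reference, here is a minimal sketch of where the option sits in the &cntrl namelist of an Amber mdin file. Only vlimit=10 is taken from the runs described above. Every other value (thermostat, time step, cutoff, output frequencies, segment length) is an illustrative assumption for a generic NVT production run, not the actual settings used.

```
NVT production (sketch; every value except vlimit is an assumption)
 &cntrl
   imin=0, irest=1, ntx=5,          ! continue MD from an equilibrated restart
   nstlim=500000, dt=0.002,         ! 1 ns per segment at a 2 fs time step
   ntc=2, ntf=2,                    ! SHAKE on bonds involving hydrogen
   ntb=1, cut=9.0,                  ! constant volume, 9 A direct-space cutoff
   ntt=3, gamma_ln=2.0,             ! Langevin thermostat (assumed)
   temp0=300.0, ig=-1,              ! target temperature, random seed
   ntpr=5000, ntwx=5000, ntwr=50000,
   vlimit=10.0,                     ! velocity limit used in the runs above
 /
```

Changing only the vlimit line between otherwise identical inputs (unset, 20, 10) should reproduce the three cases described.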
My QUESTIONS are:
1/ What does the "vlimit" option do? I searched the Amber mailing list
archives but could not find an explanation.
2/ Does setting vlimit affect the performance or the biological output of my
simulations?
3/ What would you suggest?
Thank you,
Dmitry
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sun Jul 14 2019 - 11:30:03 PDT