[AMBER] Amber20 Performance on RTX (Turing): known problems, with a patch forthcoming

From: David Cerutti <dscerutti.gmail.com>
Date: Fri, 29 May 2020 02:30:20 -0400

Dear Users,

As has been shared on this listserv, many users are finding that Amber20 is
not as fast on Turing architectures for PME simulations as Amber18. The
source of this problem has now been identified, and it affects much more
than just Turing: the 15-20% slowdown seen on Turing is merely the most
severe case.

The slowdown itself does NOT reflect any bugs or issues that would
necessitate repeating experiments. The problem, rather, is that some
future-proofing that our collaborators at NVIDIA kindly performed for us
has introduced additional synchronization work on the GPU. The benefit of this is
that, come CUDA 11 and the new Ampere chipset, pmemd.cuda is already
prepared to run on the cards (at a substantially greater speed than is
currently possible with a V100, which in my view competes with RTX-6000 for
top dog). However, legacy chipsets that do not need to perform the
synchronization required for CUDA 11 to work properly will suffer a
performance penalty as a result.

Contrary to what I warned yesterday afternoon, a fix is possible and we
already have it. A compiler-specific directive will create separate code
paths for the various chipsets and mask out the synchronization where it is
not needed, recovering the Amber18 performance while still keeping the code
in a state that is ready for the next architecture.
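For readers curious what such a fix looks like in practice: the patch itself is not shown here, but the general technique of architecture-specific code paths can be sketched with CUDA's built-in __CUDA_ARCH__ macro, which expands in device code to the compute capability being compiled. The kernel below is a hypothetical illustration of the idea, not the actual Amber20 code; the kernel name and the choice of which architectures get the synchronization are assumptions for the sake of the example.

```cuda
// Hypothetical sketch, NOT the actual Amber20 patch.  __CUDA_ARCH__
// expands to the compute capability of the architecture being compiled
// (e.g. 700 for Volta, 750 for Turing, 800 for Ampere), so the compiler
// emits separate code paths per chipset and the extra synchronization
// is compiled in only where the newer toolkit and hardware require it.
__global__ void accumulateForces(float *forces, const float *contrib, int n)
{
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < n) {
    forces[idx] += contrib[idx];
  }
#if __CUDA_ARCH__ >= 800
  // Newer architectures: keep the explicit warp-level synchronization
  // needed for correct behavior under CUDA 11.
  __syncwarp();
#endif
  // Legacy architectures (Turing and earlier, in this sketch) skip the
  // sync entirely and so recover the Amber18-era performance.
}
```

Because the guard is resolved at compile time, the legacy code path contains no trace of the synchronization instruction, rather than paying a runtime branch to skip it.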

I would like to thank Scott Legrand, Peng Wang, and others at NVIDIA who
contributed either to the future-proofing or the short-term recovery
effort. As you sit at home preparing your new simulations, please enjoy
some ice cream or other simple treat while you await the forthcoming patch
that will put Amber20 back where it should be on the benchmarks.


Dave Cerutti
AMBER mailing list
Received on Fri May 29 2020 - 00:00:02 PDT