Thank you--this is very helpful!
For the record, here are the details on the build that finally worked for us (RHEL 7.9):
We're using Spack, and on the 'develop' branch we are at
commit b9bb303063299ba512a19168d1635172edb0e944
Author: Sebastian Pipping <sebastian.pipping.org>
Date: Wed Feb 2 00:29:29 2022 +0100
With that, we installed with
spack install amber@20 %gcc +x11 +mpi +cuda ^cuda@10.2.89 ^intel-mpi
It doesn't look like much, but we tried at least 15 other Spack invocations, with various compilers and versions of MPI and CUDA, all of which failed in various ways (usually compilation failures). We never did get a vanilla compile working by following the Amber instructions directly. It's certainly possible that we're missing something simple.
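For anyone trying to reproduce this, the full sequence looked roughly like the following (clone location and shell setup are illustrative; adjust for your site):

  git clone https://github.com/spack/spack.git
  cd spack
  git checkout b9bb303063299ba512a19168d1635172edb0e944   # the 'develop' commit noted above
  . share/spack/setup-env.sh
  spack install amber@20 %gcc +x11 +mpi +cuda ^cuda@10.2.89 ^intel-mpi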
This works for our K80s. Trying this version on an A100 produces the "cudaMemcpyToSymbol: SetSim copy to cSim failed invalid device symbol" error, which I believe indicates a gencode mismatch. (I'm a bit surprised that the A100 can't run K80 code, but apparently it can't.)
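One way to sanity-check this, sketched here with an illustrative binary path, is to list the GPU architectures embedded in the executable with cuobjdump. The K80 is compute capability 3.7 (sm_37), the A100 is 8.0 (sm_80), and CUDA 10.2 cannot generate sm_80 device code:

  # List the cubins (SASS architectures) embedded in the executable (path is an example):
  cuobjdump --list-elf $AMBERHOME/bin/pmemd.cuda
  # If nothing newer than sm_37 shows up, the A100 has no compatible
  # device code to load, which matches the invalid-device-symbol error.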
We also tried advancing the 'cuda' version to better support our (few) A100s. In particular, it appears that some version of CUDA 11 is required to make the best use of an A100. (I can't recall which version.) The CUDA 11 versions currently available in Spack don't lead to a successful Amber compile. We'll look into it more later.
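For reference, the kind of spec we were attempting looked roughly like the following (shown with an unpinned CUDA 11 constraint purely as an illustration; none of the CUDA 11 combinations we tried built successfully):

  spack install amber@20 %gcc +x11 +mpi +cuda ^cuda@11 ^intel-mpi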
Mike
-----Original Message-----
From: David A Case <david.case.rutgers.edu>
Sent: Friday, February 4, 2022 5:47 AM
To: AMBER Mailing List <amber.ambermd.org>
Subject: Re: [AMBER] any working singularity or docker recipe? (esp for CUDA and MPI)
On Fri, Feb 04, 2022, Michael Coleman wrote:
>Finally managed a path to get this compiled (details to follow). In
>testing 'pmemd.cuda.MPI' on the 'jac' benchmark, I'm seeing our multiple
>K80s lighting up as expected, but there is no benefit in wall-clock time
>for adding multiple GPUs. If anything, adding GPUs increases running time.
That is a common experience, although others might comment on what they
expect for jac and K80s. Check out the DHFR (aka jac) benchmark results
here for K80s:
https://ambermd.org/gpus14/benchmarks.htm
This shows significant speedups in going to 2 or 4 GPUs, but results
like this are generally quite dependent on the interconnect hardware and
its software settings. And look at the last note on "Maximizing GPU
performance" here:
https://ambermd.org/GPULogistics.php
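As a point of comparison, a two-GPU run of the jac/DHFR benchmark would look something like the sketch below (input file names are generic placeholders, not the exact benchmark-suite names):

  # Make two GPUs visible, one per MPI rank:
  export CUDA_VISIBLE_DEVICES=0,1
  mpirun -np 2 pmemd.cuda.MPI -O -i mdin -p prmtop -c inpcrd -o mdout -r restrt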
>Is there an available example that would be expected to show a benefit with
>(say) eight or sixteen K80s?
I don't know of any such example for a single MD run. Multiple GPUs can be
great for independent simulations, or for things like replica-exchange,
where communication between GPUs is minimal.
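For example, a common pattern (sketched here with placeholder file names) is to pin each independent simulation to its own GPU:

  # Four independent runs, one per GPU; each process sees only its own device:
  for i in 0 1 2 3; do
    CUDA_VISIBLE_DEVICES=$i pmemd.cuda -O -i mdin -p prmtop -c inpcrd.$i \
        -o mdout.$i -r restrt.$i &
  done
  wait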
....dac
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber