[AMBER] Major GPU Update Released

From: Ross Walker <ross.rosswalker.co.uk>
Date: Fri, 19 Aug 2011 08:22:59 -0700

Dear Fellow Amberites,

After several months of promising, I am now pleased to announce that we have
released a major update to the GPU accelerated PMEMD code in AMBER and I
would encourage you to update your code. This update, ostensibly labeled
bugfix.17, almost doubles performance for PME calculations, without
introducing any new approximations, while also adding support for extra
points and fixing a number of outstanding bugs related to the use of NPT with
restraints and organic solvents. It also addresses the issues with runs
locking up on GTX4XX and 5XX cards in a way that does not impact performance
on other GPUs. Full details of the GPU implementation and updated benchmarks
are available on http://ambermd.org/gpus/

Some example performance improvements (old performance numbers in parentheses):

                         Performance (ns/day)
               1xC2070      2xC2070      4xC2070      8xC2070      1xGTX580  2xGTX580
JAC prod NVE   35.7 (21.1)  49.2 (31.0)  69.7 (44.4)  85.2 (52.7)  50.8      67.6
JAC prod NPT   30.0 (18.5)  40.7 (27.3)  55.5 (38.4)  67.8 (45.6)  40.7      55.2
Factor IX NVE  11.1 (5.3)   14.8 (8.8)   21.2 (14.0)  29.3 (20.2)  14.3      18.5
Factor IX NPT  9.0 (4.4)    12.2 (6.5)   17.8 (8.8)   20.5 (10.9)  11.8      16.0
Cellulose NPT  2.0 (1.1)    2.9 (1.7)    4.2 (2.6)    5.9 (3.8)    Insuf Mem
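
For readers who want the speedup factors rather than the raw ns/day figures,
the following is a minimal Python sketch that divides new by old throughput
for the single-C2070 column of the table above. The dictionary name and the
speedup() helper are illustrative only and are not part of AMBER.

```python
# Single-C2070 throughput in ns/day, copied from the table: (new, old).
# The names below are illustrative and not part of AMBER.
single_c2070 = {
    "JAC prod NVE": (35.7, 21.1),
    "JAC prod NPT": (30.0, 18.5),
    "Factor IX NVE": (11.1, 5.3),
    "Factor IX NPT": (9.0, 4.4),
    "Cellulose NPT": (2.0, 1.1),
}


def speedup(new_ns_per_day, old_ns_per_day):
    """Return the ratio of new to old throughput."""
    return new_ns_per_day / old_ns_per_day


speedups = {name: speedup(new, old) for name, (new, old) in single_c2070.items()}

for name, factor in speedups.items():
    print(f"{name:14s} {factor:.2f}x")
```

The same helper can be applied to the multi-GPU columns by swapping in the
corresponding pairs of numbers.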

I intend to post some example hardware configurations on the GPU section of
the Amber website and, in collaboration with NVIDIA and certain hardware
vendors, we will shortly have hardware configurations available, termed MD
SimCluster(s), that can be purchased with AMBER preinstalled. I will post
more details on this program once it is formally announced.

The specifics of the update are as follows:

------------------------------------------------------------------------------
********>Bugfix 17:
Author: Ross Walker & Scott Le Grand
Date: 18 Aug 2011
Programs: pmemd.cuda
Description:  Major update. Updates CUDA implementation to v2.2. This patch
              provides a significant update to the GPU support. It more than
              doubles performance across the board and fixes numerous bugs,
              especially involving NPT.
              Performance notes:
              Full benchmarks are available on http://ambermd.org/gpus/
              Some brief examples of the performance improvements are:
                i) JAC Production NVE
                     1xC2070 35.70 ns/day (new code), 21.11 ns/day (old code)
                     8xC2070 85.70 ns/day (new code), 52.68 ns/day (old code)
               ii) FactorIX Production NVE
                     1xC2070 11.08 ns/day (new code), 5.30 ns/day (old code)
                     8xC2070 29.30 ns/day (new code), 20.20 ns/day (old code)
              Updates include:
              1) Updated the setting for no_ntt3_sync when ig=-1 - remove
                 ifdef NO_NTT3_SYNC. This update applies to both CPU and GPU
                 and will improve scaling considerably in both cases.
              2) Support for extra points added to GPU code [serial only]
                 (single extra point e.g. TIP4PEW).
              3) Power of 2 limitation on GPU count is removed. Both GB and
                 PME can now use 1,2,3,4,5 etc GPUs.
              4) NTT=2 is now accelerated on GPU.
              5) Check that ATOMS_PER_MOLECULE sums to NATOM for constant
                 pressure to test for bogus prmtops. Applies to both CPU and
                 GPU code.
              6) Use CURAND from CUDA toolkit in place of built in random
                 number generator.
              7) Multiple bug fixes including:
                 i) Fixes NPT restraint issues leading to kernel launch
                    failures.
                ii) Fix MPI energy differences due to missing initial
                    backwards half step.
               iii) Fix certain SM2.0 series crashes for PME runs with > 8A
                    cutoff.
                iv) Fixes 8 node NPT crash.
                 v) Multiple other minor bug fixes.
              8) Performance for PME simulations is improved by between 1.8
                 and 2.2x both in serial and parallel.
              9) Support CUDA 4.0 compiler.
              10) Fixes kReduceSoluteCOM error when running with large
                  numbers of solute molecules or many solvent molecules with
                  atom counts > 3. The previous limit of 460 on C1060 and
                  1535 on C20xx has been removed, although performance will
                  be impacted if you have more molecules with >3 atoms in
                  your system.
              11) Introduces permanent lockup fix for GTX5XX and GTX4XX
                  cards without impacting performance as the previous
                  workaround did.
              12) Adds files necessary for Windows build of GPU accelerated
                  code to be described on the AMBER Website shortly.
------------------------------------------------------------------------------
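
Item 5 in the update list describes a sanity check that ATOMS_PER_MOLECULE
sums to NATOM. As a rough illustration of the idea only (this is not the
pmemd implementation), the Python sketch below performs the same comparison
on the text of a prmtop file. It assumes the new-style prmtop layout with
%FLAG headers, that NATOM is the first entry under POINTERS, and that both
sections use fixed-width 10I8 integer fields. The function names are
hypothetical.

```python
def read_flag_ints(prmtop_text, flag, width=8):
    """Collect the integers under one %FLAG section.

    Assumes fixed-width integer fields (10I8 by default), as an
    illustration rather than a full prmtop parser.
    """
    values = []
    in_section = False
    for line in prmtop_text.splitlines():
        if line.startswith("%FLAG"):
            # Enter the requested section, leave any other one.
            in_section = line.split()[1] == flag
            continue
        if not in_section or line.startswith("%"):
            # Skip other sections and %FORMAT / %COMMENT lines.
            continue
        for start in range(0, len(line), width):
            field = line[start:start + width]
            if field.strip():
                values.append(int(field))
    return values


def check_atoms_per_molecule(prmtop_text):
    """Return (ok, natom, total) where total is sum(ATOMS_PER_MOLECULE)."""
    natom = read_flag_ints(prmtop_text, "POINTERS")[0]
    total = sum(read_flag_ints(prmtop_text, "ATOMS_PER_MOLECULE"))
    return total == natom, natom, total
```

A prmtop for which the first returned value is False would be one of the
"bogus" topologies the check is meant to catch before a constant pressure
run.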
You can obtain the update from http://ambermd.org/bugfixes11.html
Comments are welcome.
All the best
Ross
/\
\/
|\oss Walker
---------------------------------------------------------
|             Assistant Research Professor              |
|            San Diego Supercomputer Center             |
|             Adjunct Assistant Professor               |
|         Dept. of Chemistry and Biochemistry           |
|          University of California San Diego           |
|                     NVIDIA Fellow                     |
| http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk  | 
---------------------------------------------------------
Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.  
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Aug 19 2011 - 08:30:03 PDT