[AMBER] RTX2080TI Performance

From: Ross Walker <ross.rosswalker.co.uk>
Date: Thu, 11 Oct 2018 07:19:47 -0400

Hi All,

I finally have provisional numbers for RTX2080TI GPUs. I tested 4 GPUs with the stock reference design coolers. The two central GPUs hit 93C before dropping off the bus. Clearly 2080TI cooling is going to be a challenge. I pulled the two middle GPUs and ran with 2 x 2080TI well spaced out.

The following are the AMBER 18 numbers with the initial Turing tweaks. Note that these are still provisional: I still have to run the validation tests to make sure the GPUs get the right answers. One thing is for sure, we really are pushing the power envelope of these GPUs. How's this for maxed out:

Thu Oct 11 00:31:51 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.57                 Driver Version: 410.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:1A:00.0 Off |                  N/A |
| 60%   81C    P2   259W / 260W |   1495MiB / 10989MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:68:00.0 Off |                  N/A |
| 31%   60C    P0     1W / 260W |      0MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     53398      C   /usr/local/amber18/bin/pmemd.cuda           1485MiB |
+-----------------------------------------------------------------------------+
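
For anyone wanting to watch temperature and power draw while a run is in progress, the following is a minimal sketch that parses nvidia-smi's CSV query output. The helper names and the 85 C warning threshold are assumptions for illustration and are not part of AMBER or the benchmark scripts. The sample string mimics the readings in the table above and is not captured output.

```python
import subprocess

# Field names accepted by `nvidia-smi --query-gpu=...`.
QUERY = "index,temperature.gpu,power.draw,power.limit,utilization.gpu"


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts.

    Assumes every field is numeric; fields nvidia-smi reports as
    unavailable would need extra handling.
    """
    gpus = []
    for line in text.strip().splitlines():
        idx, temp, draw, limit, util = [f.strip() for f in line.split(",")]
        gpus.append({
            "index": int(idx),
            "temp_c": float(temp),
            "power_w": float(draw),
            "limit_w": float(limit),
            "util_pct": float(util),
        })
    return gpus


def hot_gpus(gpus, temp_limit_c=85.0):
    """Return indices of GPUs at or above a (hypothetical) warning threshold."""
    return [g["index"] for g in gpus if g["temp_c"] >= temp_limit_c]


def read_gpus():
    """Query the local driver; requires nvidia-smi on the PATH."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


# Illustrative input only, shaped like the two GPUs shown above.
sample = "0, 81, 259.00, 260.00, 100\n1, 60, 1.00, 260.00, 0\n"
for g in parse_gpu_csv(sample):
    print(g["index"], g["temp_c"], f'{g["power_w"] / g["limit_w"]:.0%} of power cap')
```

Calling read_gpus() in a loop with a short sleep would give a simple log of how close each card sits to its power cap during a benchmark.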

And yes, the benchmarks below are correct! The RTX2080TI is indeed equivalent to the 8x more expensive V100, and we do indeed get over 1 microsecond per day for the DHFR 4fs benchmark. :-)

New Benchmark Suite with thread count tweak for Turing

System                          Class             CPU (1)  CPU (4)    GPU 0    GPU 1
------------------------------- ---------------- -------- -------- -------- --------
JAC_production_NVE_4fs          PME (Standard)                       915.37   900.75
JAC_production_NPT_4fs          PME (Standard)                       846.79   836.68
Cellulose_production_NVE_4fs    PME (Standard)                        68.00    67.52
Cellulose_production_NPT_4fs    PME (Standard)                        64.75    64.58
FactorIX_production_NVE_4fs     PME (Standard)                       320.49   319.07
FactorIX_production_NPT_4fs     PME (Standard)                       299.55   295.82
STMV_production_NVE_4fs         PME (Standard)                        23.44    23.36
STMV_production_NPT_4fs         PME (Standard)                        21.96    21.87

JAC_production_NVE_4fs          PME (Optimized)                     1033.45  1012.65
JAC_production_NPT_4fs          PME (Optimized)                      948.41   940.27
Cellulose_production_NVE_4fs    PME (Optimized)                       72.61    72.24
Cellulose_production_NPT_4fs    PME (Optimized)                       68.61    68.58
FactorIX_production_NVE_4fs     PME (Optimized)                      354.46   353.35
FactorIX_production_NPT_4fs     PME (Optimized)                      329.38   327.69
STMV_production_NVE_4fs         PME (Optimized)                       25.45    25.39
STMV_production_NPT_4fs         PME (Optimized)                       23.67    23.69

TRPCage                         GB                                  2514.65  2521.09
myoglobin                       GB                                  1250.55  1251.71
nucleosome                      GB                                    30.15    29.93
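
To put the Standard and Optimized PME rows side by side, here is a small sketch that computes the ratio for each system. It assumes the values are in ns/day (which the "over 1 microsecond per day" remark implies) and uses the first of the two value columns for each system. The dictionary layout and function name are illustrative only.

```python
# First value column from the table above: (Standard, Optimized), assumed ns/day.
first_column = {
    "JAC_production_NVE_4fs": (915.37, 1033.45),
    "JAC_production_NPT_4fs": (846.79, 948.41),
    "Cellulose_production_NVE_4fs": (68.00, 72.61),
    "Cellulose_production_NPT_4fs": (64.75, 68.61),
    "FactorIX_production_NVE_4fs": (320.49, 354.46),
    "FactorIX_production_NPT_4fs": (299.55, 329.38),
    "STMV_production_NVE_4fs": (23.44, 25.45),
    "STMV_production_NPT_4fs": (21.96, 23.67),
}


def speedup(standard, optimized):
    """Ratio of Optimized to Standard throughput."""
    return optimized / standard


for name, (std, opt) in first_column.items():
    # 1000 ns = 1 microsecond, so ns/day / 1000 gives microseconds/day.
    print(f"{name:30s} {speedup(std, opt):.3f}x  {opt / 1000:.3f} us/day")
```

On these numbers the Optimized settings come out roughly 6 to 13 percent faster than Standard, with the largest gain on the smallest system (JAC).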


Original Benchmark Suite with thread count tweak for Turing

JAC_PRODUCTION_NVE - 23,558 atoms PME 4fs
-----------------------------------------

      [0] 1 x GPU: | ns/day = 1017.27 seconds/ns = 84.93
      [1] 1 x GPU: | ns/day = 1017.20 seconds/ns = 84.94
Multiple Single GPU Run Performance
      [0] 1 x GPU: | ns/day = 1011.77 seconds/ns = 85.39
      [1] 1 x GPU: | ns/day = 1005.58 seconds/ns = 85.92

JAC_PRODUCTION_NPT - 23,558 atoms PME 4fs
-----------------------------------------

      [0] 1 x GPU: | ns/day = 940.66 seconds/ns = 91.85
      [1] 1 x GPU: | ns/day = 947.24 seconds/ns = 91.21
Multiple Single GPU Run Performance
      [0] 1 x GPU: | ns/day = 943.47 seconds/ns = 91.58
      [1] 1 x GPU: | ns/day = 939.12 seconds/ns = 92.00

JAC_PRODUCTION_NVE - 23,558 atoms PME 2fs
-----------------------------------------

      [0] 1 x GPU: | ns/day = 535.28 seconds/ns = 161.41
      [1] 1 x GPU: | ns/day = 533.23 seconds/ns = 162.03
Multiple Single GPU Run Performance
      [0] 1 x GPU: | ns/day = 536.59 seconds/ns = 161.02
      [1] 1 x GPU: | ns/day = 527.42 seconds/ns = 163.82

JAC_PRODUCTION_NPT - 23,558 atoms PME 2fs
-----------------------------------------

      [0] 1 x GPU: | ns/day = 485.28 seconds/ns = 178.04
      [1] 1 x GPU: | ns/day = 486.15 seconds/ns = 177.72
Multiple Single GPU Run Performance
      [0] 1 x GPU: | ns/day = 483.66 seconds/ns = 178.64
      [1] 1 x GPU: | ns/day = 477.95 seconds/ns = 180.77

FACTOR_IX_PRODUCTION_NVE - 90,906 atoms PME
-------------------------------------------

      [0] 1 x GPU: | ns/day = 195.32 seconds/ns = 442.36
      [1] 1 x GPU: | ns/day = 199.12 seconds/ns = 433.91
Multiple Single GPU Run Performance
      [0] 1 x GPU: | ns/day = 200.88 seconds/ns = 430.11
      [1] 1 x GPU: | ns/day = 197.50 seconds/ns = 437.47

FACTOR_IX_PRODUCTION_NPT - 90,906 atoms PME
-------------------------------------------

      [0] 1 x GPU: | ns/day = 182.97 seconds/ns = 472.21
      [1] 1 x GPU: | ns/day = 184.14 seconds/ns = 469.21
Multiple Single GPU Run Performance
      [0] 1 x GPU: | ns/day = 182.33 seconds/ns = 473.88
      [1] 1 x GPU: | ns/day = 183.67 seconds/ns = 470.41

CELLULOSE_PRODUCTION_NVE - 408,609 atoms PME
--------------------------------------------

      [0] 1 x GPU: | ns/day = 41.97 seconds/ns = 2058.82
      [1] 1 x GPU: | ns/day = 41.79 seconds/ns = 2067.66
Multiple Single GPU Run Performance
      [0] 1 x GPU: | ns/day = 41.77 seconds/ns = 2068.26
      [1] 1 x GPU: | ns/day = 41.55 seconds/ns = 2079.29

CELLULOSE_PRODUCTION_NPT - 408,609 atoms PME
--------------------------------------------

      [0] 1 x GPU: | ns/day = 39.28 seconds/ns = 2199.86
      [1] 1 x GPU: | ns/day = 39.34 seconds/ns = 2196.28
Multiple Single GPU Run Performance
      [0] 1 x GPU: | ns/day = 39.23 seconds/ns = 2202.29
      [1] 1 x GPU: | ns/day = 39.03 seconds/ns = 2213.82

STMV_PRODUCTION_NPT - 1,067,095 atoms PME
-----------------------------------------

      [0] 1 x GPU: | ns/day = 24.39 seconds/ns = 3542.69
      [1] 1 x GPU: | ns/day = 24.38 seconds/ns = 3543.43
Multiple Single GPU Run Performance
      [0] 1 x GPU: | ns/day = 24.32 seconds/ns = 3552.87
      [1] 1 x GPU: | ns/day = 24.17 seconds/ns = 3574.82

TRPCAGE_PRODUCTION - 304 atoms GB
---------------------------------

      [0] 1 x GPU: | ns/day = 1151.69 seconds/ns = 75.02
      [1] 1 x GPU: | ns/day = 1133.17 seconds/ns = 76.25
Multiple Single GPU Run Performance
      [0] 1 x GPU: | ns/day = 1124.11 seconds/ns = 76.86
      [1] 1 x GPU: | ns/day = 1184.88 seconds/ns = 72.92

MYOGLOBIN_PRODUCTION - 2,492 atoms GB
-------------------------------------

      [0] 1 x GPU: | ns/day = 563.33 seconds/ns = 153.37
      [1] 1 x GPU: | ns/day = 501.07 seconds/ns = 172.43
Multiple Single GPU Run Performance
      [0] 1 x GPU: | ns/day = 545.35 seconds/ns = 158.43
      [1] 1 x GPU: | ns/day = 551.93 seconds/ns = 156.54

NUCLEOSOME_PRODUCTION - 25,095 atoms GB
---------------------------------------

      [0] 1 x GPU: | ns/day = 15.52 seconds/ns = 5568.28
      [1] 1 x GPU: | ns/day = 15.35 seconds/ns = 5628.69
Multiple Single GPU Run Performance
      [0] 1 x GPU: | ns/day = 15.40 seconds/ns = 5609.17
      [1] 1 x GPU: | ns/day = 15.24 seconds/ns = 5669.86


All the best
Ross



_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Oct 11 2018 - 04:30:02 PDT