Hi Ross,
On Fri, Sep 07, 2012 at 12:45:12AM +0100, Ross Walker wrote:
> Hi Thomas,
>
> The issue is not one of the code actually working or not but one of
> performance. Unfortunately I don't have any 1.3 hardware left to test on
> but our assumption is that SPFP will be substantially slower than SPDP on
> V1.3 (C1060 type) hardware. If you have something you ran previously with
> SPDP on the same card it would be interesting to know what the performance
> is if you use the new default SPFP version of the code.
Here are the performance numbers of SPDP vs. SPFP I got for a
single M1060 using CUDA-4.2. I assume the single GPU is most
relevant as it is also the baseline for parallel runs. Let me know
if you are also interested in parallel runs or if you would like to
have an account on our cluster for testing yourself.
The testcases and inputs are the ones from your Amber GPU webpage.
For SPDP the numbers are typically slightly higher than what you
reported previously on the webpage (probably due to CUDA-4.2 vs.
3.2).
GB/TRPCage
12p7-spdp:| ns/day = 240.43
12p9-SPFP:| ns/day = 229.90
GB/myoglobin
12p7-spdp:| ns/day = 31.55
12p9-SPFP:| ns/day = 25.91
GB/nucleosome
12p7-spdp:| ns/day = 0.41
12p9-SPFP:| ns/day = 0.27
PME/JAC_production_NVE
12p7-spdp:| ns/day = 17.31
12p9-SPFP:| ns/day = 15.76
PME/JAC_production_NPT
12p7-spdp:| ns/day = 12.41
12p9-SPFP:| ns/day = 13.78
PME/FactorIX_production_NVE_128_64_64
12p7-spdp:| ns/day = 5.33
12p9-SPFP:| ns/day = 4.91
PME/FactorIX_production_NPT
12p7-spdp:| ns/day = 3.72
12p9-SPFP:| ns/day = 4.14
PME/Cellulose_production_NVE
12p7-spdp:| ns/day = 1.07
12p9-SPFP:| ns/day = 0.71
PME/Cellulose_production_NPT
12p7-spdp:| ns/day = 0.67
12p9-SPFP:| ns/day = 0.65
Thus, contrary to what you expected, the new SPFP on M1060 is not
always (significantly) slower. In two of the NPT cases (JAC and
FactorIX) the new SPFP even outperforms the old SPDP version by
about 11%.
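For convenience, the relative differences can be recomputed from the
numbers listed above. This is only a small sketch: the ns/day values are
copied from this mail, while the names RESULTS, relative_change and
summarize are made up for illustration and are not part of Amber.

```python
# ns/day values copied from the list above: (12p7-spdp, 12p9-SPFP).
RESULTS = {
    "GB/TRPCage": (240.43, 229.90),
    "GB/myoglobin": (31.55, 25.91),
    "GB/nucleosome": (0.41, 0.27),
    "PME/JAC_production_NVE": (17.31, 15.76),
    "PME/JAC_production_NPT": (12.41, 13.78),
    "PME/FactorIX_production_NVE_128_64_64": (5.33, 4.91),
    "PME/FactorIX_production_NPT": (3.72, 4.14),
    "PME/Cellulose_production_NVE": (1.07, 0.71),
    "PME/Cellulose_production_NPT": (0.67, 0.65),
}


def relative_change(spdp, spfp):
    """Percent change in ns/day when going from SPDP to SPFP."""
    return 100.0 * (spfp - spdp) / spdp


def summarize(results):
    """Map each test case to its percent change, rounded to one decimal."""
    return {
        name: round(relative_change(spdp, spfp), 1)
        for name, (spdp, spfp) in results.items()
    }


if __name__ == "__main__":
    # Positive values mean SPFP is faster than SPDP for that test case.
    for name, pct in summarize(RESULTS).items():
        print(f"{name:40s} {pct:+6.1f} %")
```

Positive percentages mark the cases where SPFP is faster. With the
ns/day values being rounded to two decimals, the percentages for the
small systems (e.g. nucleosome) are only approximate.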
Best,
thomas
Details of the system / binaries:
- Linux tg00X 2.6.32-42-server #95-Ubuntu SMP Wed Jul 25 16:10:49 UTC 2012 x86_64 GNU/Linux
- 2x Intel Xeon X5550 (4 Core Nehalem @ 2.66 GHz)
- SMT active; fixed frequency; pmemd process pinned to Core 1
- CUDA Toolkit 4.2
- NVRM version: NVIDIA UNIX x86_64 Kernel Module 304.37 Wed Aug 8 19:52:48 PDT 2012
|------------------- GPU DEVICE INFO --------------------
|
| CUDA Capable Devices Detected: 2
| CUDA Device ID in use: 0
| CUDA Device Name: Tesla M1060
| CUDA Device Global Mem Size: 4095 MB
| CUDA Device Num Multiprocessors: 30
| CUDA Device Core Freq: 1.30 GHz
|
|--------------------------------------------------------
- 12p7-spdp = amber-gpu/12-p07-at12p07-cuda4.2-intel12.1-intelmpi
- 12p9-SPFP = amber-gpu/12-p09-at12p23-cuda4.2-intel12.1-intelmpi
| Conditional Compilation Defines Used:
| DIRFRC_COMTRANS
| DIRFRC_EFS
| DIRFRC_NOVEC
| PUBFFT
| FFTLOADBAL_2PROC
| BINTRAJ
| CUDA
> All the best
> Ross
>
>
> >> On Wed, Sep 5, 2012 at 6:58 AM, Thomas Zeiser wrote
> >>
> >> > Hello,
> >> >
> >> > according to the web page, NVidia GPUs with Hardware Version 1.3
> >> > should only support SPDP (and DPDP) but not the new SPFP mode.
> >> >
> >> > However, if a SPFP binary is run on a M1060 card which is supposed
> >> > to be HW-1.3, pmemd.cuda does not abort. (At least not for
> >> > Cellulose NPT).
> >> >
> >> > What does that mean?
> >> > (1) SPFP is supported on HW-1.3, or
> >> > (2) there is a bug in pmemd.cuda not aborting if SPFP is detected
> >> > on HW-1.3, or
> >> > (3) pmemd.cuda automatically switches (at runtime) to a correct
> >> > non-SPFP branch for HW-1.3 despite being compiled with default options
> >> > and mentioning SPFP
> >> > (4) SPFP is only used in parts which are not relevant for Cellulose
> >> > NPT, thus, everything is o.k. for this specific case
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Sep 12 2012 - 02:00:04 PDT