Hi,
On a fresh installation of Amber14 built with cuda-5.0.35 on old hardware,
all pmemd.cuda tests fail with:
cudaMemcpyToSymbol: SetSim copy to cSim failed invalid device symbol
I found one earlier post on the reflector where the CUDA installation was apparently faulty:
http://archive.ambermd.org/201301/0399.html
But this cuda-5.0.35 installation has been tested and works for the samples and
simple SDK examples.
So I wonder whether this hardware is still supported by pmemd.cuda?
thanks,
scott
+------------------------------------------------------+
| NVIDIA-SMI 331.62     Driver Version: 331.62         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  HICx16 + Graphics   Off  | 0000:24:00.0     N/A |                  N/A |
|100%   89C  N/A     N/A /  N/A |      2MiB /   255MiB |     N/A    E. Thread |
+-------------------------------+----------------------+----------------------+
|   1  Quadroplex 2200 S4  Off  | 0000:28:00.0     N/A |                  N/A |
| N/A   36C  N/A     N/A /  N/A |      3MiB /  4095MiB |     N/A    E. Thread |
+-------------------------------+----------------------+----------------------+
|   2  Quadroplex 2200 S4  Off  | 0000:29:00.0     N/A |                  N/A |
| N/A   36C  N/A     N/A /  N/A |     71MiB /  4095MiB |     N/A    E. Thread |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
|    1            Not Supported                                               |
|    2            Not Supported                                               |
+-----------------------------------------------------------------------------+
/usr/local/cuda-5.0.35/1_Utilities/deviceQuery/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 2 CUDA Capable device(s)
Device 0: "Quadroplex 2200 S4"
CUDA Driver Version / Runtime Version 6.0 / 5.0
CUDA Capability Major/Minor version number: 1.3
Total amount of global memory: 4096 MBytes (4294770688 bytes)
(30) Multiprocessors x ( 8) CUDA Cores/MP: 240 CUDA Cores
GPU Clock rate: 1296 MHz (1.30 GHz)
Memory Clock rate: 800 Mhz
Memory Bus Width: 512-bit
Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 40 / 0
Compute Mode:
< Exclusive (only one host thread in one process is able to use ::cudaSetDevice() with this device) >
Device 1: "Quadroplex 2200 S4"
CUDA Driver Version / Runtime Version 6.0 / 5.0
CUDA Capability Major/Minor version number: 1.3
Total amount of global memory: 4096 MBytes (4294770688 bytes)
(30) Multiprocessors x ( 8) CUDA Cores/MP: 240 CUDA Cores
GPU Clock rate: 1296 MHz (1.30 GHz)
Memory Clock rate: 800 Mhz
Memory Bus Width: 512-bit
Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 41 / 0
Compute Mode:
< Exclusive (only one host thread in one process is able to use ::cudaSetDevice() with this device) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 5.0, NumDevs = 2,
Device0 = Quadroplex 2200 S4, Device1 = Quadroplex 2200 S4
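For reference, the compute capability reported in the deviceQuery output above can be pulled out and compared against whatever minimum a given build requires. This is only a minimal Python sketch: the function names are made up for illustration, and the (2, 0) threshold in the usage lines is an assumed placeholder, not a value taken from the Amber documentation. The regular expressions assume only the deviceQuery line format quoted above.

```python
import re

# Two lines per device, copied from the deviceQuery output quoted above.
SAMPLE = '''Device 0: "Quadroplex 2200 S4"
CUDA Capability Major/Minor version number: 1.3
Device 1: "Quadroplex 2200 S4"
CUDA Capability Major/Minor version number: 1.3
'''

# Matches lines such as: Device 0: "Quadroplex 2200 S4"
DEVICE_RE = re.compile(r'^\s*Device (\d+): "(.+)"\s*$')
# Matches the capability line and captures major and minor separately.
CAP_RE = re.compile(
    r'CUDA Capability Major/Minor version number:\s*(\d+)\.(\d+)')


def parse_device_query(text):
    """Return a list of (device_id, name, (major, minor)) tuples."""
    devices = []
    current = None  # (device_id, name) awaiting its capability line
    for line in text.splitlines():
        m = DEVICE_RE.match(line)
        if m:
            current = (int(m.group(1)), m.group(2))
            continue
        m = CAP_RE.search(line)
        if m and current is not None:
            capability = (int(m.group(1)), int(m.group(2)))
            devices.append((current[0], current[1], capability))
            current = None
    return devices


def below_minimum(devices, minimum):
    """Return the devices whose (major, minor) is lower than `minimum`."""
    return [d for d in devices if d[2] < minimum]


if __name__ == "__main__":
    # ASSUMPTION: (2, 0) is a placeholder threshold for illustration only.
    assumed_minimum = (2, 0)
    for dev_id, name, cap in below_minimum(parse_device_query(SAMPLE),
                                           assumed_minimum):
        print("Device %d (%s): capability %d.%d is below %d.%d"
              % (dev_id, name, cap[0], cap[1],
                 assumed_minimum[0], assumed_minimum[1]))
```

Comparing the capabilities as (major, minor) tuples rather than as floats avoids mis-ordering versions such as 1.10 and 1.3. The actual minimum for a given pmemd.cuda release should be taken from the Amber GPU documentation.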
-------------------------------------------------------
Amber 14 SANDER 2014
-------------------------------------------------------
| PMEMD implementation of SANDER, Release 14
| Run on 09/09/2014 at 16:24:16
| Executable path: ../../../bin/pmemd.cuda
| Working directory: /tmp/pbstmp.17067098/test/cuda/cellulose
| Hostname: opt2636
[-O]verwriting output
File Assignments:
| MDIN: mdin
| MDOUT: mdout
| INPCRD: inpcrd
| PARM: prmtop
| RESTRT: restrt
| REFC: refc
| MDVEL: mdvel
| MDEN: mden
| MDCRD: mdcrd
| MDINFO: mdinfo
| MDFRC: mdfrc
Here is the input file:
Typical Production MD NVT
&cntrl
ntx=5, irest=1,
ntc=2, ntf=2,
nstlim=20,
ntpr=1, ntwx=0,
ntwr=40,
dt=0.002, cut=8.,
ntt=1, tautp=10.0,
temp0=300.0,
ntb=2, ntp=1,tautp=10.0,
ioutfm=1,
/
&ewald
nfft1=288, nfft2=128, nfft3=128,netfrc=0,
/
|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
| Version 14.0.1
|
| 06/20/2014
|
| Implementation by:
| Ross C. Walker (SDSC)
| Scott Le Grand (nVIDIA)
|
| CAUTION: The CUDA code is currently experimental.
| You use it at your own risk. Be sure to
| check ALL results carefully.
|
| Precision model in use:
| [SPFP] - Mixed Single/Double/Fixed Point Precision.
| (Default)
|
|--------------------------------------------------------
|----------------- CITATION INFORMATION -----------------
|
| When publishing work that utilized the CUDA version
| of AMBER, please cite the following in addition to
| the regular AMBER citations:
|
| - Romelia Salomon-Ferrer; Andreas W. Goetz; Duncan
| Poole; Scott Le Grand; Ross C. Walker "Routine
| microsecond molecular dynamics simulations with
| AMBER - Part II: Particle Mesh Ewald", J. Chem.
| Theory Comput., 2013, 9 (9), pp3878-3888,
| DOI: 10.1021/ct400314y.
|
| - Andreas W. Goetz; Mark J. Williamson; Dong Xu;
| Duncan Poole; Scott Le Grand; Ross C. Walker
| "Routine microsecond molecular dynamics simulations
| with AMBER - Part I: Generalized Born", J. Chem.
| Theory Comput., 2012, 8 (5), pp1542-1555.
|
| - Scott Le Grand; Andreas W. Goetz; Ross C. Walker
| "SPFP: Speed without compromise - a mixed precision
| model for GPU accelerated molecular dynamics
| simulations.", Comp. Phys. Comm., 2013, 184
| pp374-380, DOI: 10.1016/j.cpc.2012.09.022
|
|--------------------------------------------------------
|------------------- GPU DEVICE INFO --------------------
|
| CUDA_VISIBLE_DEVICES: 0
| CUDA Capable Devices Detected: 1
| CUDA Device ID in use: 0
| CUDA Device Name: Quadroplex 2200 S4
| CUDA Device Global Mem Size: 4095 MB
| CUDA Device Num Multiprocessors: 30
| CUDA Device Core Freq: 1.30 GHz
|
|--------------------------------------------------------
| Conditional Compilation Defines Used:
| PUBFFT
| BINTRAJ
| CUDA
| EMIL
| Largest sphere to fit in unit cell has radius = 61.751
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Sep 09 2014 - 23:30:02 PDT