[AMBER] cudaMemcpyToSymbol: SetSim copy to cSim failed invalid device symbol

From: Scott Brozell <sbrozell.rci.rutgers.edu>
Date: Wed, 10 Sep 2014 02:23:18 -0400


For a fresh installation of Amber14 with cuda-5.0.35 on old hardware,
all pmemd.cuda tests fail:

cudaMemcpyToSymbol: SetSim copy to cSim failed invalid device symbol

I found one reflector post in which the cuda installation was apparently faulty:

But this cuda-5.0.35 installation has been tested and works for the
samples and simple SDK examples.

So I wonder whether the hardware is now supported by pmemd.cuda?


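+------------------------------------------------------+
| NVIDIA-SMI 331.62     Driver Version: 331.62         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  HICx16 + Graphics   Off  | 0000:24:00.0     N/A |                  N/A |
|100%   89C  N/A     N/A /  N/A |      2MiB /   255MiB |     N/A   E. Thread  |
+-------------------------------+----------------------+----------------------+
|   1  Quadroplex 2200 S4  Off  | 0000:28:00.0     N/A |                  N/A |
| N/A   36C  N/A     N/A /  N/A |      3MiB /  4095MiB |     N/A   E. Thread  |
+-------------------------------+----------------------+----------------------+
|   2  Quadroplex 2200 S4  Off  | 0000:29:00.0     N/A |                  N/A |
| N/A   36C  N/A     N/A /  N/A |     71MiB /  4095MiB |     N/A   E. Thread  |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
|    1            Not Supported                                               |
|    2            Not Supported                                               |
+-----------------------------------------------------------------------------+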

/usr/local/cuda-5.0.35/1_Utilities/deviceQuery/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "Quadroplex 2200 S4"
  CUDA Driver Version / Runtime Version 6.0 / 5.0
  CUDA Capability Major/Minor version number: 1.3
  Total amount of global memory: 4096 MBytes (4294770688 bytes)
  (30) Multiprocessors x ( 8) CUDA Cores/MP: 240 CUDA Cores
  GPU Clock rate: 1296 MHz (1.30 GHz)
  Memory Clock rate: 800 Mhz
  Memory Bus Width: 512-bit
  Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per multiprocessor: 1024
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 256 bytes
  Concurrent copy and kernel execution: Yes with 1 copy engine(s)
  Run time limit on kernels: No
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Alignment requirement for Surfaces: Yes
  Device has ECC support: Disabled
  Device supports Unified Addressing (UVA): No
  Device PCI Bus ID / PCI location ID: 40 / 0
  Compute Mode:
     < Exclusive (only one host thread in one process is able to use ::cudaSetDevice() with this device) >

Device 1: "Quadroplex 2200 S4"
  CUDA Driver Version / Runtime Version 6.0 / 5.0
  CUDA Capability Major/Minor version number: 1.3
  Total amount of global memory: 4096 MBytes (4294770688 bytes)
  (30) Multiprocessors x ( 8) CUDA Cores/MP: 240 CUDA Cores
  GPU Clock rate: 1296 MHz (1.30 GHz)
  Memory Clock rate: 800 Mhz
  Memory Bus Width: 512-bit
  Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per multiprocessor: 1024
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 256 bytes
  Concurrent copy and kernel execution: Yes with 1 copy engine(s)
  Run time limit on kernels: No
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Alignment requirement for Surfaces: Yes
  Device has ECC support: Disabled
  Device supports Unified Addressing (UVA): No
  Device PCI Bus ID / PCI location ID: 41 / 0
  Compute Mode:
     < Exclusive (only one host thread in one process is able to use ::cudaSetDevice() with this device) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 5.0, NumDevs = 2,
 Device0 = Quadroplex 2200 S4, Device1 = Quadroplex 2200 S4
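
The deviceQuery output above reports compute capability 1.3 for both
boards. An "invalid device symbol" error from cudaMemcpyToSymbol is often
reported when the executable contains no device code built for the GPU's
architecture, so comparing that capability against the minimum the build
targets is a reasonable first check. Below is a minimal Python sketch of
that comparison. It is not part of Amber: the function names are
hypothetical, and the threshold of 2.0 is an assumption for illustration,
since this post does not state what pmemd.cuda in Amber14 requires.

```python
import re

# ASSUMPTION: the post does not state the minimum compute capability that
# pmemd.cuda needs; (2, 0) is a placeholder to be replaced with the value
# from the Amber documentation or the build configuration.
ASSUMED_MIN_CAPABILITY = (2, 0)

DEVICE_RE = re.compile(r'^Device (\d+): "(.+)"\s*$')
CAPABILITY_RE = re.compile(
    r"CUDA Capability Major/Minor version number:\s*(\d+)\.(\d+)"
)


def parse_capabilities(device_query_text):
    """Return [(device_id, name, (major, minor)), ...] from deviceQuery text."""
    devices = []
    current = None  # (device_id, name) of the device block being read
    for raw_line in device_query_text.splitlines():
        line = raw_line.strip()
        header = DEVICE_RE.match(line)
        if header:
            current = (int(header.group(1)), header.group(2))
            continue
        cap = CAPABILITY_RE.search(line)
        if cap and current is not None:
            capability = (int(cap.group(1)), int(cap.group(2)))
            devices.append((current[0], current[1], capability))
            current = None
    return devices


def below_minimum(devices, minimum=ASSUMED_MIN_CAPABILITY):
    """Return the devices whose (major, minor) capability is below minimum."""
    return [dev for dev in devices if dev[2] < minimum]


# Excerpt of the deviceQuery output quoted in this post.
SAMPLE = """\
Device 0: "Quadroplex 2200 S4"
  CUDA Driver Version / Runtime Version 6.0 / 5.0
  CUDA Capability Major/Minor version number: 1.3
Device 1: "Quadroplex 2200 S4"
  CUDA Driver Version / Runtime Version 6.0 / 5.0
  CUDA Capability Major/Minor version number: 1.3
"""

if __name__ == "__main__":
    for dev_id, name, cap in below_minimum(parse_capabilities(SAMPLE)):
        print(f"Device {dev_id} ({name}): capability {cap[0]}.{cap[1]} "
              f"is below the assumed minimum")
```

In practice the full deviceQuery output would be piped into
parse_capabilities, and the threshold replaced with whatever minimum the
pmemd.cuda build actually targets.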

          Amber 14 SANDER 2014

| PMEMD implementation of SANDER, Release 14

| Run on 09/09/2014 at 16:24:16

| Executable path: ../../../bin/pmemd.cuda
| Working directory: /tmp/pbstmp.17067098/test/cuda/cellulose
| Hostname: opt2636

  [-O]verwriting output

File Assignments:
| MDIN: mdin
| MDOUT: mdout
| INPCRD: inpcrd
| PARM: prmtop
| RESTRT: restrt
| REFC: refc
| MDVEL: mdvel
| MDEN: mden
| MDCRD: mdcrd
| MDINFO: mdinfo
| MDFRC: mdfrc

 Here is the input file:
 Typical Production MD NVT
   ntx=5, irest=1,
   ntc=2, ntf=2,
   ntpr=1, ntwx=0,
   dt=0.002, cut=8.,
   ntt=1, tautp=10.0,
   ntb=2, ntp=1,tautp=10.0,
  nfft1=288, nfft2=128, nfft3=128,netfrc=0,

|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
| Version 14.0.1
| 06/20/2014
| Implementation by:
| Ross C. Walker (SDSC)
| Scott Le Grand (nVIDIA)
| CAUTION: The CUDA code is currently experimental.
| You use it at your own risk. Be sure to
| check ALL results carefully.
| Precision model in use:
| [SPFP] - Mixed Single/Double/Fixed Point Precision.
| (Default)
|----------------- CITATION INFORMATION -----------------
| When publishing work that utilized the CUDA version
| of AMBER, please cite the following in addition to
| the regular AMBER citations:
| - Romelia Salomon-Ferrer; Andreas W. Goetz; Duncan
| Poole; Scott Le Grand; Ross C. Walker "Routine
| microsecond molecular dynamics simulations with
| AMBER - Part II: Particle Mesh Ewald", J. Chem.
| Theory Comput., 2013, 9 (9), pp3878-3888,
| DOI: 10.1021/ct400314y.
| - Andreas W. Goetz; Mark J. Williamson; Dong Xu;
| Duncan Poole; Scott Le Grand; Ross C. Walker
| "Routine microsecond molecular dynamics simulations
| with AMBER - Part I: Generalized Born", J. Chem.
| Theory Comput., 2012, 8 (5), pp1542-1555.
| - Scott Le Grand; Andreas W. Goetz; Ross C. Walker
| "SPFP: Speed without compromise - a mixed precision
| model for GPU accelerated molecular dynamics
| simulations.", Comp. Phys. Comm., 2013, 184
| pp374-380, DOI: 10.1016/j.cpc.2012.09.022
|------------------- GPU DEVICE INFO --------------------
| CUDA Capable Devices Detected: 1
| CUDA Device ID in use: 0
| CUDA Device Name: Quadroplex 2200 S4
| CUDA Device Global Mem Size: 4095 MB
| CUDA Device Num Multiprocessors: 30
| CUDA Device Core Freq: 1.30 GHz
| Conditional Compilation Defines Used:
| Largest sphere to fit in unit cell has radius = 61.751

AMBER mailing list
Received on Tue Sep 09 2014 - 23:30:02 PDT