Re: [AMBER] cudaMemcpyToSymbol: SetSim copy to cSim failed invalid device symbol

From: Scott Le Grand <varelse2005.gmail.com>
Date: Wed, 10 Sep 2014 09:57:49 -0700

SM 1.3? That is not supported anymore; pmemd.cuda needs SM 2.0 or later.
I'm surprised this even ran...
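
For anyone who wants to check a node before running the tests, here is a minimal sketch that scans deviceQuery output for devices below the SM 2.0 cutoff. It is not part of Amber: the function name unsupported_devices, the MIN_SM constant, and the regular expression are all illustrative assumptions, and the sample text is trimmed from the deviceQuery output quoted below.

```python
import re

# Assumption: the minimum comes from the "SM 2.0 or later" statement in this reply.
MIN_SM = (2, 0)

# Matches a device header and the first capability line that follows it,
# in the format printed by the deviceQuery sample.
CAP_RE = re.compile(
    r'Device (\d+): "([^"]+)".*?'
    r'CUDA Capability Major/Minor version number:\s*(\d+)\.(\d+)',
    re.DOTALL,
)


def unsupported_devices(device_query_text, min_sm=MIN_SM):
    """Return (index, name, (major, minor)) for each device below min_sm."""
    found = []
    for idx, name, major, minor in CAP_RE.findall(device_query_text):
        sm = (int(major), int(minor))
        if sm < min_sm:  # tuple comparison: major first, then minor
            found.append((int(idx), name, sm))
    return found


# Trimmed from the deviceQuery output quoted in this thread.
sample = '''
Device 0: "Quadroplex 2200 S4"
  CUDA Driver Version / Runtime Version 6.0 / 5.0
  CUDA Capability Major/Minor version number: 1.3
Device 1: "Quadroplex 2200 S4"
  CUDA Driver Version / Runtime Version 6.0 / 5.0
  CUDA Capability Major/Minor version number: 1.3
'''

for idx, name, sm in unsupported_devices(sample):
    print(f"Device {idx} ({name}): SM {sm[0]}.{sm[1]} is below "
          f"SM {MIN_SM[0]}.{MIN_SM[1]}")
```

The pattern does not anchor on the start of a line, so it should also work on output that carries mail quoting prefixes.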



On Tue, Sep 9, 2014 at 11:23 PM, Scott Brozell <sbrozell.rci.rutgers.edu> wrote:

> Hi,
>
> For a fresh installation of Amber14 with cuda-5.0.35 on old hardware,
> all pmemd.cuda tests fail:
>
> cudaMemcpyToSymbol: SetSim copy to cSim failed invalid device symbol
>
> I see one reflector post where the cuda installation was apparently faulty:
> http://archive.ambermd.org/201301/0399.html
>
> But this cuda-5.0.35 installation has been tested and works for the
> samples and simple SDK examples.
>
> So I wonder whether this hardware is still supported by pmemd.cuda?
>
> thanks,
> scott
>
> +------------------------------------------------------+
> | NVIDIA-SMI 331.62     Driver Version: 331.62         |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  HICx16 + Graphics   Off  | 0000:24:00.0     N/A |                  N/A |
> |100%   89C  N/A     N/A /  N/A |      2MiB /   255MiB |     N/A   E. Thread  |
> +-------------------------------+----------------------+----------------------+
> |   1  Quadroplex 2200 S4  Off  | 0000:28:00.0     N/A |                  N/A |
> | N/A   36C  N/A     N/A /  N/A |      3MiB /  4095MiB |     N/A   E. Thread  |
> +-------------------------------+----------------------+----------------------+
> |   2  Quadroplex 2200 S4  Off  | 0000:29:00.0     N/A |                  N/A |
> | N/A   36C  N/A     N/A /  N/A |     71MiB /  4095MiB |     N/A   E. Thread  |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Compute processes:                                               GPU Memory |
> |  GPU       PID  Process name                                     Usage      |
> |=============================================================================|
> |    0            Not Supported                                               |
> |    1            Not Supported                                               |
> |    2            Not Supported                                               |
> +-----------------------------------------------------------------------------+
>
> /usr/local/cuda-5.0.35/1_Utilities/deviceQuery/deviceQuery Starting...
>
> CUDA Device Query (Runtime API) version (CUDART static linking)
>
> Detected 2 CUDA Capable device(s)
>
> Device 0: "Quadroplex 2200 S4"
> CUDA Driver Version / Runtime Version 6.0 / 5.0
> CUDA Capability Major/Minor version number: 1.3
> Total amount of global memory: 4096 MBytes (4294770688 bytes)
> (30) Multiprocessors x ( 8) CUDA Cores/MP: 240 CUDA Cores
> GPU Clock rate: 1296 MHz (1.30 GHz)
> Memory Clock rate: 800 Mhz
> Memory Bus Width: 512-bit
> Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
> Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
> Total amount of constant memory: 65536 bytes
> Total amount of shared memory per block: 16384 bytes
> Total number of registers available per block: 16384
> Warp size: 32
> Maximum number of threads per multiprocessor: 1024
> Maximum number of threads per block: 512
> Maximum sizes of each dimension of a block: 512 x 512 x 64
> Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
> Maximum memory pitch: 2147483647 bytes
> Texture alignment: 256 bytes
> Concurrent copy and kernel execution: Yes with 1 copy engine(s)
> Run time limit on kernels: No
> Integrated GPU sharing Host Memory: No
> Support host page-locked memory mapping: Yes
> Alignment requirement for Surfaces: Yes
> Device has ECC support: Disabled
> Device supports Unified Addressing (UVA): No
> Device PCI Bus ID / PCI location ID: 40 / 0
> Compute Mode:
> < Exclusive (only one host thread in one process is able to use ::cudaSetDevice() with this device) >
>
> Device 1: "Quadroplex 2200 S4"
> CUDA Driver Version / Runtime Version 6.0 / 5.0
> CUDA Capability Major/Minor version number: 1.3
> Total amount of global memory: 4096 MBytes (4294770688 bytes)
> (30) Multiprocessors x ( 8) CUDA Cores/MP: 240 CUDA Cores
> GPU Clock rate: 1296 MHz (1.30 GHz)
> Memory Clock rate: 800 Mhz
> Memory Bus Width: 512-bit
> Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
> Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
> Total amount of constant memory: 65536 bytes
> Total amount of shared memory per block: 16384 bytes
> Total number of registers available per block: 16384
> Warp size: 32
> Maximum number of threads per multiprocessor: 1024
> Maximum number of threads per block: 512
> Maximum sizes of each dimension of a block: 512 x 512 x 64
> Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
> Maximum memory pitch: 2147483647 bytes
> Texture alignment: 256 bytes
> Concurrent copy and kernel execution: Yes with 1 copy engine(s)
> Run time limit on kernels: No
> Integrated GPU sharing Host Memory: No
> Support host page-locked memory mapping: Yes
> Alignment requirement for Surfaces: Yes
> Device has ECC support: Disabled
> Device supports Unified Addressing (UVA): No
> Device PCI Bus ID / PCI location ID: 41 / 0
> Compute Mode:
> < Exclusive (only one host thread in one process is able to use ::cudaSetDevice() with this device) >
>
> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 5.0, NumDevs = 2, Device0 = Quadroplex 2200 S4, Device1 = Quadroplex 2200 S4
>
>
> -------------------------------------------------------
> Amber 14 SANDER 2014
> -------------------------------------------------------
>
> | PMEMD implementation of SANDER, Release 14
>
> | Run on 09/09/2014 at 16:24:16
>
> | Executable path: ../../../bin/pmemd.cuda
> | Working directory: /tmp/pbstmp.17067098/test/cuda/cellulose
> | Hostname: opt2636
>
> [-O]verwriting output
>
> File Assignments:
> | MDIN: mdin
> | MDOUT: mdout
> | INPCRD: inpcrd
> | PARM: prmtop
> | RESTRT: restrt
> | REFC: refc
> | MDVEL: mdvel
> | MDEN: mden
> | MDCRD: mdcrd
> | MDINFO: mdinfo
> | MDFRC: mdfrc
>
>
> Here is the input file:
>
> Typical Production MD NVT
> &cntrl
> ntx=5, irest=1,
> ntc=2, ntf=2,
> nstlim=20,
> ntpr=1, ntwx=0,
> ntwr=40,
> dt=0.002, cut=8.,
> ntt=1, tautp=10.0,
> temp0=300.0,
> ntb=2, ntp=1,tautp=10.0,
> ioutfm=1,
> /
> &ewald
> nfft1=288, nfft2=128, nfft3=128,netfrc=0,
> /
>
>
>
> |--------------------- INFORMATION ----------------------
> | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> | Version 14.0.1
> |
> | 06/20/2014
> |
> | Implementation by:
> | Ross C. Walker (SDSC)
> | Scott Le Grand (nVIDIA)
> |
> | CAUTION: The CUDA code is currently experimental.
> | You use it at your own risk. Be sure to
> | check ALL results carefully.
> |
> | Precision model in use:
> | [SPFP] - Mixed Single/Double/Fixed Point Precision.
> | (Default)
> |
> |--------------------------------------------------------
>
> |----------------- CITATION INFORMATION -----------------
> |
> | When publishing work that utilized the CUDA version
> | of AMBER, please cite the following in addition to
> | the regular AMBER citations:
> |
> | - Romelia Salomon-Ferrer; Andreas W. Goetz; Duncan
> | Poole; Scott Le Grand; Ross C. Walker "Routine
> | microsecond molecular dynamics simulations with
> | AMBER - Part II: Particle Mesh Ewald", J. Chem.
> | Theory Comput., 2013, 9 (9), pp3878-3888,
> | DOI: 10.1021/ct400314y.
> |
> | - Andreas W. Goetz; Mark J. Williamson; Dong Xu;
> | Duncan Poole; Scott Le Grand; Ross C. Walker
> | "Routine microsecond molecular dynamics simulations
> | with AMBER - Part I: Generalized Born", J. Chem.
> | Theory Comput., 2012, 8 (5), pp1542-1555.
> |
> | - Scott Le Grand; Andreas W. Goetz; Ross C. Walker
> | "SPFP: Speed without compromise - a mixed precision
> | model for GPU accelerated molecular dynamics
> | simulations.", Comp. Phys. Comm., 2013, 184
> | pp374-380, DOI: 10.1016/j.cpc.2012.09.022
> |
> |--------------------------------------------------------
>
> |------------------- GPU DEVICE INFO --------------------
> |
> | CUDA_VISIBLE_DEVICES: 0
> | CUDA Capable Devices Detected: 1
> | CUDA Device ID in use: 0
> | CUDA Device Name: Quadroplex 2200 S4
> | CUDA Device Global Mem Size: 4095 MB
> | CUDA Device Num Multiprocessors: 30
> | CUDA Device Core Freq: 1.30 GHz
> |
> |--------------------------------------------------------
>
>
> | Conditional Compilation Defines Used:
> | PUBFFT
> | BINTRAJ
> | CUDA
> | EMIL
>
> | Largest sphere to fit in unit cell has radius = 61.751
>
>
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
Received on Wed Sep 10 2014 - 10:00:05 PDT