Re: [AMBER] cudaMemcpyToSymbol: SetSim copy to cSim failed invalid device symbol from Ross Walker on 2014-09-10 (Amber Archive Sep 2014)

From: Ross Walker <ross.rosswalker.co.uk>
Date: Wed, 10 Sep 2014 09:51:22 -0700

Hi Scott,

Wow, I've never seen that description for a GPU before. These must be old,
old, old. :-( They also report their hardware version (CUDA Capability) as
1.3. Unfortunately, AMBER 14 requires hardware with compute capability 2.0
or later, which is why it didn't work. I am confused about why it didn't
quit with an error, though. The following code in
$AMBERHOME/src/pmemd/src/cuda/gpu.cpp is supposed to detect this and quit
with an error:

    {
        cudaGetDeviceProperties(&deviceProp, gpu->gpu_device_id);
#ifdef MPI
        if (deviceProp.canMapHostMemory && (deviceProp.major >= 2))
#else
        if (deviceProp.major >= 2)
#endif
            device = gpu->gpu_device_id;
        else
        {
#ifdef MPI
            printf("Selected GPU does not support both zero-copy and SM 2.0, exiting.\n");
#else
            printf("Selected GPU lacks SM 2.0 or better support, exiting.\n");
#endif
            cudaThreadExit();
            exit(-1);
        }
    }

It would be interesting to see what value deviceProp.major has in this
section for your GPUs.
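
As a rough sketch of what that check should decide for these cards, here is
the same condition as a standalone function. This is not the pmemd code:
DeviceProps is a hypothetical stand-in for the cudaDeviceProp fields the
check reads, and deviceIsSupported and the mpiBuild flag (in place of the
#ifdef MPI) are names made up for illustration.

```cpp
// Hypothetical stand-in for the cudaDeviceProp fields read by the check.
struct DeviceProps {
    int major;              // compute capability, major version
    int minor;              // compute capability, minor version
    bool canMapHostMemory;  // zero-copy (mapped page-locked host memory)
};

// Same condition as the excerpt above: SM 2.0 or later is always required,
// and MPI builds additionally require zero-copy support.
// mpiBuild replaces the #ifdef MPI of the original (an assumption made so
// that both branches can be exercised in one binary).
bool deviceIsSupported(const DeviceProps& p, bool mpiBuild) {
    if (mpiBuild) {
        return p.canMapHostMemory && (p.major >= 2);
    }
    return p.major >= 2;
}
```

With the values deviceQuery reports for the Quadroplex 2200 S4 (capability
1.3, host page-locked memory mapping supported), deviceIsSupported returns
false for both the serial and the MPI case, so the exit branch is the one
that should have been taken.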

All the best
Ross

On 9/9/14, 11:23 PM, "Scott Brozell" <sbrozell.rci.rutgers.edu> wrote:

>Hi,
>
>For a fresh installation of Amber14 with cuda-5.0.35 on old hardware,
>all pmemd.cuda tests fail:
>
>cudaMemcpyToSymbol: SetSim copy to cSim failed invalid device symbol
>
>I see one reflector where the cuda installation was apparently faulty:
>http://archive.ambermd.org/201301/0399.html
>
>But the cuda-5.0.35 has been tested and works for the samples and
>simple SDK examples.
>
>So I wonder whether the hardware is now supported by pmemd.cuda?
>
>thanks,
>scott
>
>+------------------------------------------------------+
>| NVIDIA-SMI 331.62     Driver Version: 331.62         |
>|-------------------------------+----------------------+----------------------+
>| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
>| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
>|===============================+======================+======================|
>|   0  HICx16 + Graphics   Off  | 0000:24:00.0     N/A |                  N/A |
>|100%   89C  N/A     N/A /  N/A |      2MiB /   255MiB |     N/A   E. Thread  |
>+-------------------------------+----------------------+----------------------+
>|   1  Quadroplex 2200 S4  Off  | 0000:28:00.0     N/A |                  N/A |
>| N/A   36C  N/A     N/A /  N/A |      3MiB /  4095MiB |     N/A   E. Thread  |
>+-------------------------------+----------------------+----------------------+
>|   2  Quadroplex 2200 S4  Off  | 0000:29:00.0     N/A |                  N/A |
>| N/A   36C  N/A     N/A /  N/A |     71MiB /  4095MiB |     N/A   E. Thread  |
>+-------------------------------+----------------------+----------------------+
>
>+-----------------------------------------------------------------------------+
>| Compute processes:                                               GPU Memory |
>|  GPU       PID  Process name                                     Usage      |
>|=============================================================================|
>|    0            Not Supported                                               |
>|    1            Not Supported                                               |
>|    2            Not Supported                                               |
>+-----------------------------------------------------------------------------+
>
>/usr/local/cuda-5.0.35/1_Utilities/deviceQuery/deviceQuery Starting...
>
> CUDA Device Query (Runtime API) version (CUDART static linking)
>
>Detected 2 CUDA Capable device(s)
>
>Device 0: "Quadroplex 2200 S4"
> CUDA Driver Version / Runtime Version 6.0 / 5.0
> CUDA Capability Major/Minor version number: 1.3
> Total amount of global memory: 4096 MBytes (4294770688 bytes)
> (30) Multiprocessors x ( 8) CUDA Cores/MP: 240 CUDA Cores
> GPU Clock rate: 1296 MHz (1.30 GHz)
> Memory Clock rate: 800 Mhz
> Memory Bus Width: 512-bit
> Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
> Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
> Total amount of constant memory: 65536 bytes
> Total amount of shared memory per block: 16384 bytes
> Total number of registers available per block: 16384
> Warp size: 32
> Maximum number of threads per multiprocessor: 1024
> Maximum number of threads per block: 512
> Maximum sizes of each dimension of a block: 512 x 512 x 64
> Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
> Maximum memory pitch: 2147483647 bytes
> Texture alignment: 256 bytes
> Concurrent copy and kernel execution: Yes with 1 copy engine(s)
> Run time limit on kernels: No
> Integrated GPU sharing Host Memory: No
> Support host page-locked memory mapping: Yes
> Alignment requirement for Surfaces: Yes
> Device has ECC support: Disabled
> Device supports Unified Addressing (UVA): No
> Device PCI Bus ID / PCI location ID: 40 / 0
> Compute Mode:
> < Exclusive (only one host thread in one process is able to use ::cudaSetDevice() with this device) >
>
>Device 1: "Quadroplex 2200 S4"
> CUDA Driver Version / Runtime Version 6.0 / 5.0
> CUDA Capability Major/Minor version number: 1.3
> Total amount of global memory: 4096 MBytes (4294770688 bytes)
> (30) Multiprocessors x ( 8) CUDA Cores/MP: 240 CUDA Cores
> GPU Clock rate: 1296 MHz (1.30 GHz)
> Memory Clock rate: 800 Mhz
> Memory Bus Width: 512-bit
> Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
> Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
> Total amount of constant memory: 65536 bytes
> Total amount of shared memory per block: 16384 bytes
> Total number of registers available per block: 16384
> Warp size: 32
> Maximum number of threads per multiprocessor: 1024
> Maximum number of threads per block: 512
> Maximum sizes of each dimension of a block: 512 x 512 x 64
> Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
> Maximum memory pitch: 2147483647 bytes
> Texture alignment: 256 bytes
> Concurrent copy and kernel execution: Yes with 1 copy engine(s)
> Run time limit on kernels: No
> Integrated GPU sharing Host Memory: No
> Support host page-locked memory mapping: Yes
> Alignment requirement for Surfaces: Yes
> Device has ECC support: Disabled
> Device supports Unified Addressing (UVA): No
> Device PCI Bus ID / PCI location ID: 41 / 0
> Compute Mode:
> < Exclusive (only one host thread in one process is able to use ::cudaSetDevice() with this device) >
>
>deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 5.0, NumDevs = 2, Device0 = Quadroplex 2200 S4, Device1 = Quadroplex 2200 S4
>
>
> -------------------------------------------------------
> Amber 14 SANDER 2014
> -------------------------------------------------------
>
>| PMEMD implementation of SANDER, Release 14
>
>| Run on 09/09/2014 at 16:24:16
>
>| Executable path: ../../../bin/pmemd.cuda
>| Working directory: /tmp/pbstmp.17067098/test/cuda/cellulose
>| Hostname: opt2636
>
> [-O]verwriting output
>
>File Assignments:
>| MDIN: mdin
>| MDOUT: mdout
>| INPCRD: inpcrd
>| PARM: prmtop
>| RESTRT: restrt
>| REFC: refc
>| MDVEL: mdvel
>| MDEN: mden
>| MDCRD: mdcrd
>| MDINFO: mdinfo
>| MDFRC: mdfrc
>
> Here is the input file:
>
> Typical Production MD NVT
> &cntrl
> ntx=5, irest=1,
> ntc=2, ntf=2,
> nstlim=20,
> ntpr=1, ntwx=0,
> ntwr=40,
> dt=0.002, cut=8.,
> ntt=1, tautp=10.0,
> temp0=300.0,
> ntb=2, ntp=1,tautp=10.0,
> ioutfm=1,
> /
> &ewald
> nfft1=288, nfft2=128, nfft3=128,netfrc=0,
> /
>
>|--------------------- INFORMATION ----------------------
>| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
>| Version 14.0.1
>|
>| 06/20/2014
>|
>| Implementation by:
>| Ross C. Walker (SDSC)
>| Scott Le Grand (nVIDIA)
>|
>| CAUTION: The CUDA code is currently experimental.
>| You use it at your own risk. Be sure to
>| check ALL results carefully.
>|
>| Precision model in use:
>| [SPFP] - Mixed Single/Double/Fixed Point Precision.
>| (Default)
>|
>|--------------------------------------------------------
>
>|----------------- CITATION INFORMATION -----------------
>|
>| When publishing work that utilized the CUDA version
>| of AMBER, please cite the following in addition to
>| the regular AMBER citations:
>|
>| - Romelia Salomon-Ferrer; Andreas W. Goetz; Duncan
>| Poole; Scott Le Grand; Ross C. Walker "Routine
>| microsecond molecular dynamics simulations with
>| AMBER - Part II: Particle Mesh Ewald", J. Chem.
>| Theory Comput., 2013, 9 (9), pp3878-3888,
>| DOI: 10.1021/ct400314y.
>|
>| - Andreas W. Goetz; Mark J. Williamson; Dong Xu;
>| Duncan Poole; Scott Le Grand; Ross C. Walker
>| "Routine microsecond molecular dynamics simulations
>| with AMBER - Part I: Generalized Born", J. Chem.
>| Theory Comput., 2012, 8 (5), pp1542-1555.
>|
>| - Scott Le Grand; Andreas W. Goetz; Ross C. Walker
>| "SPFP: Speed without compromise - a mixed precision
>| model for GPU accelerated molecular dynamics
>| simulations.", Comp. Phys. Comm., 2013, 184
>| pp374-380, DOI: 10.1016/j.cpc.2012.09.022
>|
>|--------------------------------------------------------
>
>|------------------- GPU DEVICE INFO --------------------
>|
>| CUDA_VISIBLE_DEVICES: 0
>| CUDA Capable Devices Detected: 1
>| CUDA Device ID in use: 0
>| CUDA Device Name: Quadroplex 2200 S4
>| CUDA Device Global Mem Size: 4095 MB
>| CUDA Device Num Multiprocessors: 30
>| CUDA Device Core Freq: 1.30 GHz
>|
>|--------------------------------------------------------
>
>
>| Conditional Compilation Defines Used:
>| PUBFFT
>| BINTRAJ
>| CUDA
>| EMIL
>
>| Largest sphere to fit in unit cell has radius = 61.751
>
>
>
>_______________________________________________
>AMBER mailing list
>AMBER.ambermd.org
>http://lists.ambermd.org/mailman/listinfo/amber

Received on Wed Sep 10 2014 - 10:00:04 PDT