Hi,
For a fresh installation of Amber14 with cuda-5.0.35 on old hardware,
all pmemd.cuda tests fail with:
cudaMemcpyToSymbol: SetSim copy to cSim failed invalid device symbol
I found one earlier post on the reflector where the CUDA installation was apparently faulty:
http://archive.ambermd.org/201301/0399.html
But this cuda-5.0.35 installation has been tested and works for the samples and
simple SDK examples.
So I wonder whether this hardware (compute capability 1.3, per deviceQuery) is supported by pmemd.cuda?
thanks,
scott
+------------------------------------------------------+
| NVIDIA-SMI 331.62     Driver Version: 331.62         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  HICx16 + Graphics   Off  | 0000:24:00.0     N/A |                  N/A |
|100%   89C  N/A     N/A /  N/A |      2MiB /   255MiB |     N/A    E. Thread |
+-------------------------------+----------------------+----------------------+
|   1  Quadroplex 2200 S4  Off  | 0000:28:00.0     N/A |                  N/A |
| N/A   36C  N/A     N/A /  N/A |      3MiB /  4095MiB |     N/A    E. Thread |
+-------------------------------+----------------------+----------------------+
|   2  Quadroplex 2200 S4  Off  | 0000:29:00.0     N/A |                  N/A |
| N/A   36C  N/A     N/A /  N/A |     71MiB /  4095MiB |     N/A    E. Thread |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
|    1            Not Supported                                               |
|    2            Not Supported                                               |
+-----------------------------------------------------------------------------+
/usr/local/cuda-5.0.35/1_Utilities/deviceQuery/deviceQuery Starting...
 CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 2 CUDA Capable device(s)
Device 0: "Quadroplex 2200 S4"
  CUDA Driver Version / Runtime Version          6.0 / 5.0
  CUDA Capability Major/Minor version number:    1.3
  Total amount of global memory:                 4096 MBytes (4294770688 bytes)
  (30) Multiprocessors x (  8) CUDA Cores/MP:    240 CUDA Cores
  GPU Clock rate:                                1296 MHz (1.30 GHz)
  Memory Clock rate:                             800 Mhz
  Memory Bus Width:                              512-bit
  Max Texture Dimension Size (x,y,z)             1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers        1D=(8192) x 512, 2D=(8192,8192) x 512
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           40 / 0
  Compute Mode:
     < Exclusive (only one host thread in one process is able to use ::cudaSetDevice() with this device) >
Device 1: "Quadroplex 2200 S4"
  CUDA Driver Version / Runtime Version          6.0 / 5.0
  CUDA Capability Major/Minor version number:    1.3
  Total amount of global memory:                 4096 MBytes (4294770688 bytes)
  (30) Multiprocessors x (  8) CUDA Cores/MP:    240 CUDA Cores
  GPU Clock rate:                                1296 MHz (1.30 GHz)
  Memory Clock rate:                             800 Mhz
  Memory Bus Width:                              512-bit
  Max Texture Dimension Size (x,y,z)             1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers        1D=(8192) x 512, 2D=(8192,8192) x 512
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           41 / 0
  Compute Mode:
     < Exclusive (only one host thread in one process is able to use ::cudaSetDevice() with this device) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 5.0, NumDevs = 2,
 Device0 = Quadroplex 2200 S4, Device1 = Quadroplex 2200 S4
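
For anyone comparing their own cards, here is a minimal sketch that pulls the compute capability of each device out of deviceQuery output like the above and flags anything below a threshold. The threshold of 2.0 is an assumption for illustration only, not a requirement taken from the Amber documentation, and the function names are hypothetical helpers.

```python
import re

# Assumed threshold for illustration only; consult the Amber GPU
# documentation for the actual requirement of your pmemd.cuda version.
ASSUMED_MIN_CAPABILITY = (2, 0)


def parse_capabilities(device_query_text):
    """Return [(device_name, (major, minor)), ...] from deviceQuery output."""
    devices = []
    name = None
    for line in device_query_text.splitlines():
        # Header line such as: Device 0: "Quadroplex 2200 S4"
        m = re.match(r'\s*Device (\d+): "(.*)"', line)
        if m:
            name = m.group(2)
            continue
        # Capability line such as: CUDA Capability Major/Minor version number: 1.3
        m = re.search(
            r"CUDA Capability Major/Minor version number:\s*(\d+)\.(\d+)", line
        )
        if m and name is not None:
            devices.append((name, (int(m.group(1)), int(m.group(2)))))
            name = None
    return devices


def below_threshold(devices, minimum=ASSUMED_MIN_CAPABILITY):
    """Return the devices whose (major, minor) is lower than the minimum."""
    return [d for d in devices if d[1] < minimum]


# Two lines per device, copied from the deviceQuery output in this post.
sample = '''Device 0: "Quadroplex 2200 S4"
  CUDA Capability Major/Minor version number:    1.3
Device 1: "Quadroplex 2200 S4"
  CUDA Capability Major/Minor version number:    1.3
'''

devices = parse_capabilities(sample)
for name, cap in below_threshold(devices):
    print(f"{name}: compute capability {cap[0]}.{cap[1]} is below the assumed minimum")
```

In practice the full deviceQuery output could be read from a file and passed to parse_capabilities unchanged, since lines that match neither pattern are ignored.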
          -------------------------------------------------------
          Amber 14 SANDER                              2014
          -------------------------------------------------------
| PMEMD implementation of SANDER, Release 14
| Run on 09/09/2014 at 16:24:16
|   Executable path: ../../../bin/pmemd.cuda
| Working directory: /tmp/pbstmp.17067098/test/cuda/cellulose
|          Hostname: opt2636
  [-O]verwriting output
File Assignments:
|   MDIN: mdin                                                                  
|  MDOUT: mdout                                                                 
| INPCRD: inpcrd                                                                
|   PARM: prmtop                                                                
| RESTRT: restrt                                                                
|   REFC: refc                                                                  
|  MDVEL: mdvel                                                                 
|   MDEN: mden                                                                  
|  MDCRD: mdcrd                                                                 
| MDINFO: mdinfo                                                                
|  MDFRC: mdfrc                                                                 
 
 Here is the input file:
 
 Typical Production MD NVT                                                     
 &cntrl                                                                        
   ntx=5, irest=1,                                                             
   ntc=2, ntf=2,                                                               
   nstlim=20,                                                                  
   ntpr=1, ntwx=0,                                                             
   ntwr=40,                                                                    
   dt=0.002, cut=8.,                                                           
   ntt=1, tautp=10.0,                                                          
   temp0=300.0,                                                                
   ntb=2, ntp=1,tautp=10.0,                                                    
   ioutfm=1,                                                                   
 /                                                                             
 &ewald                                                                        
  nfft1=288, nfft2=128, nfft3=128,netfrc=0,                                    
 /                                                                             
 
|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
|                    Version 14.0.1
| 
|                      06/20/2014
| 
| Implementation by:
|                    Ross C. Walker     (SDSC)
|                    Scott Le Grand     (nVIDIA)
| 
| CAUTION: The CUDA code is currently experimental.
|          You use it at your own risk. Be sure to
|          check ALL results carefully.
| 
| Precision model in use:
|      [SPFP] - Mixed Single/Double/Fixed Point Precision.
|               (Default)
| 
|--------------------------------------------------------
 
|----------------- CITATION INFORMATION -----------------
|
|    When publishing work that utilized the CUDA version
|    of AMBER, please cite the following in addition to
|    the regular AMBER citations:
|
|  - Romelia Salomon-Ferrer; Andreas W. Goetz; Duncan
|    Poole; Scott Le Grand; Ross C. Walker "Routine
|    microsecond molecular dynamics simulations with
|    AMBER - Part II: Particle Mesh Ewald", J. Chem.
|    Theory Comput., 2013, 9 (9), pp3878-3888,
|    DOI: 10.1021/ct400314y.
|
|  - Andreas W. Goetz; Mark J. Williamson; Dong Xu;
|    Duncan Poole; Scott Le Grand; Ross C. Walker
|    "Routine microsecond molecular dynamics simulations
|    with AMBER - Part I: Generalized Born", J. Chem.
|    Theory Comput., 2012, 8 (5), pp1542-1555.
|
|  - Scott Le Grand; Andreas W. Goetz; Ross C. Walker
|    "SPFP: Speed without compromise - a mixed precision
|    model for GPU accelerated molecular dynamics
|    simulations.", Comp. Phys. Comm., 2013, 184
|    pp374-380, DOI: 10.1016/j.cpc.2012.09.022
|
|--------------------------------------------------------
 
|------------------- GPU DEVICE INFO --------------------
|
|            CUDA_VISIBLE_DEVICES: 0
|   CUDA Capable Devices Detected:      1
|           CUDA Device ID in use:      0
|                CUDA Device Name: Quadroplex 2200 S4
|     CUDA Device Global Mem Size:   4095 MB
| CUDA Device Num Multiprocessors:     30
|           CUDA Device Core Freq:   1.30 GHz
|
|--------------------------------------------------------
 
 
| Conditional Compilation Defines Used:
| PUBFFT
| BINTRAJ
| CUDA
| EMIL
 
| Largest sphere to fit in unit cell has radius =    61.751
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Sep 09 2014 - 23:30:02 PDT