Re: [AMBER] ERROR: GPU runs fail with nfft Error only when running 2x geforce TITANS in same machine

From: ET <sketchfoot.gmail.com>
Date: Tue, 18 Jun 2013 06:24:17 +0100

PS: The machine is running in headless mode on CentOS 6.


#### bandwidth test for currently installed TITAN-b:

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX TITAN
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes) Bandwidth(MB/s)
   33554432 6002.5

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes) Bandwidth(MB/s)
   33554432 6165.5

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes) Bandwidth(MB/s)
   33554432 220723.8
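
For reference, the pinned host-to-device figure above is essentially what a timed loop of cudaMemcpy calls over page-locked memory measures. Below is a minimal sketch along those lines (my own illustration, not the bandwidthTest sample itself; it assumes the CUDA runtime API and whatever device is currently selected):

// Times a 32 MiB pinned host-to-device copy, roughly what bandwidthTest reports.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 32u << 20;           // 32 MiB, same transfer size as above
    void *h_buf = nullptr, *d_buf = nullptr;
    cudaMallocHost(&h_buf, bytes);            // pinned (page-locked) host memory
    cudaMalloc(&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < 100; ++i)             // average over 100 copies
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double mb_per_s = (100.0 * bytes / (1024.0 * 1024.0)) / (ms / 1000.0);
    printf("Host to Device (pinned): %.1f MB/s\n", mb_per_s);

    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}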



### deviceQuery

deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX TITAN"
  CUDA Driver Version / Runtime Version 5.5 / 5.0
  CUDA Capability Major/Minor version number: 3.5
  Total amount of global memory: 6143 MBytes (6441730048 bytes)
  (14) Multiprocessors x (192) CUDA Cores/MP: 2688 CUDA Cores
  GPU Clock rate: 928 MHz (0.93 GHz)
  Memory Clock rate: 3004 Mhz
  Memory Bus Width: 384-bit
  L2 Cache Size: 1572864 bytes
  Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
  Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 65536
  Warp size: 32
  Maximum number of threads per multiprocessor: 2048
  Maximum number of threads per block: 1024
  Maximum sizes of each dimension of a block: 1024 x 1024 x 64
  Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 512 bytes
  Concurrent copy and kernel execution: Yes with 1 copy engine(s)
  Run time limit on kernels: No
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Alignment requirement for Surfaces: Yes
  Device has ECC support: Disabled
  Device supports Unified Addressing (UVA): Yes
  Device PCI Bus ID / PCI location ID: 3 / 0
  Compute Mode:
     < Exclusive Process (many threads in one process is able to use ::cudaSetDevice() with this device) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce GTX TITAN
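
To confirm that both TITANs are actually visible once the second card goes back in, the same information deviceQuery prints can be pulled from cudaGetDeviceCount / cudaGetDeviceProperties. A minimal sketch (my own, not the SDK sample):

// Lists every visible CUDA device: name, compute capability, memory,
// PCI bus ID and compute mode.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    printf("Detected %d CUDA capable device(s)\n", n);
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        printf("Device %d: \"%s\"  CC %d.%d  %zu MB  PCI bus %d  computeMode %d\n",
               i, p.name, p.major, p.minor,
               p.totalGlobalMem >> 20, p.pciBusID, (int)p.computeMode);
    }
    return 0;
}

With both cards installed this should report two devices; if only one shows up, the problem sits below the level of AMBER.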



On 18 June 2013 06:21, ET <sketchfoot.gmail.com> wrote:

> Hi,
>
> I am trying to run NPT simulations with pmemd.cuda on TITAN graphics
> cards. The equilibration steps were completed with the CPU version of
> sander.
>
> I have 2x EVGA Superclocked TITAN cards. There have been problems with the
> TITAN graphics cards and I RMA'd one. I have benchmarked both cards
> after the RMA and determined that they have no obvious problems that would
> warrant them being RMA'd again. There is, however, an issue with the AMBER
> CUDA code and TITANs in general, as discussed in the following thread:
>
> < sorry, can't find it, but it's ~200 posts long and titled: experiences
> with EVGA GTX TITAN Superclocked - memtestG80 - UNDERclocking in Linux ? >
>
> As I'm not sure whether this is the same issue, I'm posting this in a new
> thread.
>
> I began running 12x 100 ns production runs using TITAN-a. There were no
> problems. After waiting for and testing the replacement card (TITAN-b), I
> put that into the machine as well, so both cards were working on finishing
> the total of 300 segments.
>
> Very shortly afterwards, all the segments had failed, though the cards still
> showed 100% utilisation, and I did not realise until I checked the out files,
> which showed "ERROR: nfft1 must be in the range of blah, blah, blah" (error
> posted below). This was pretty weird, as I am used to jobs failing visibly
> rather than carrying on eating resources whilst doing nothing.
>
> So I pulled TITAN-a out and restarted the calculations with TITAN-b
> from the last good rst (usually two segments back). There have been no
> problems at all and all the simulations have completed.
>
> My hardware specs are:
> Gigabyte GA-X58-UD7 mobo
> i7-930 processor
> 6GB RAM
> 1200 Watt Bequiet power supply
>
>
>
> Does anyone have any idea as to what's going on?
>
>
> br,
> g
>
> ############################################################
> ############################################################
> -------------------------------------------------------
> Amber 12 SANDER 2012
> -------------------------------------------------------
>
> | PMEMD implementation of SANDER, Release 12
>
> | Run on 06/09/2013 at 16:26:10
>
> [-O]verwriting output
>
> File Assignments:
> |   MDIN: prod.in
> |  MDOUT: md_4.out
> | INPCRD: md_3.rst
> |   PARM: ../leap/TMC_I54V-V82S_Complex_25.parm
> | RESTRT: md_4.rst
> |   REFC: refc
> |  MDVEL: mdvel
> |   MDEN: mden
> |  MDCRD: md_4.ncdf
> | MDINFO: mdinfo
>
>
>
> Here is the input file:
>
> Constant pressure constant temperature production run
>  &cntrl
>   nstlim=2000000, dt=0.002, ntx=5, irest=1, ntpr=250, ntwr=1000, ntwx=500,
>   temp0=300.0, ntt=1, tautp=2.0, ioutfm=1, ig=-1, ntxo=2,
>   ntb=2, ntp=1,
>   ntc=2, ntf=2,
>   nrespa=1,
>  &end
>
>
>
> Note: ig = -1. Setting random seed based on wallclock time in microseconds.
>
> |--------------------- INFORMATION ----------------------
> | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> | Version 12.3
> |
> | 04/24/2013
> |
> | Implementation by:
> | Ross C. Walker (SDSC)
> | Scott Le Grand (nVIDIA)
> | Duncan Poole (nVIDIA)
> |
> | CAUTION: The CUDA code is currently experimental.
> | You use it at your own risk. Be sure to
> | check ALL results carefully.
> |
> | Precision model in use:
> | [SPFP] - Mixed Single/Double/Fixed Point Precision.
> | (Default)
> |
> |--------------------------------------------------------
>
> |----------------- CITATION INFORMATION -----------------
> |
> | When publishing work that utilized the CUDA version
> | of AMBER, please cite the following in addition to
> | the regular AMBER citations:
> |
> | - Romelia Salomon-Ferrer; Andreas W. Goetz; Duncan
> | Poole; Scott Le Grand; Ross C. Walker "Routine
> | microsecond molecular dynamics simulations with
> | AMBER - Part II: Particle Mesh Ewald", J. Chem.
> | Theory Comput., 2012, (In review).
> |
> | - Andreas W. Goetz; Mark J. Williamson; Dong Xu;
> | Duncan Poole; Scott Le Grand; Ross C. Walker
> | "Routine microsecond molecular dynamics simulations
> | with AMBER - Part I: Generalized Born", J. Chem.
> | Theory Comput., 2012, 8 (5), pp1542-1555.
> |
> | - Scott Le Grand; Andreas W. Goetz; Ross C. Walker
> | "SPFP: Speed without compromise - a mixed precision
> | model for GPU accelerated molecular dynamics
> | simulations.", Comp. Phys. Comm., 2013, 184
> | pp374-380, DOI: 10.1016/j.cpc.2012.09.022
> |
> |--------------------------------------------------------
>
> |------------------- GPU DEVICE INFO --------------------
> |
> | CUDA Capable Devices Detected: 2
> | CUDA Device ID in use: 0
> | CUDA Device Name: GeForce GTX TITAN
> | CUDA Device Global Mem Size: 6143 MB
> | CUDA Device Num Multiprocessors: 14
> | CUDA Device Core Freq: 0.93 GHz
> |
> |--------------------------------------------------------
>
> | ERROR: nfft1 must be in the range of 6 to 512!
> | ERROR: nfft2 must be in the range of 6 to 512!
> | ERROR: nfft3 must be in the range of 6 to 512!
> | ERROR: a must be in the range of 0.10000E+01 to 0.10000E+04!
> | ERROR: b must be in the range of 0.10000E+01 to 0.10000E+04!
> | ERROR: c must be in the range of 0.10000E+01 to 0.10000E+04!
>
> Input errors occurred. Terminating execution.
> ############################################################
> ############################################################
>
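
A note on the errors above: nfft1/nfft2/nfft3 are the PME charge-grid dimensions, and when they are not set explicitly pmemd derives them from the unit cell lengths a, b and c at startup, so seeing all six range checks fail at once usually suggests that the box information the run started from (or the coordinates written out by the failing run) was garbage. If the restarts are formatted ASCII files (an assumption; NetCDF restarts would need ncdump instead), a quick sanity check of the box line before resubmitting could look like this sketch, where the 1.0 to 1000.0 window is just the range quoted by the error message itself:

// Reads the last non-empty line of an ASCII Amber restart (box lengths and
// angles) and checks that a, b, c are within the range pmemd.cuda accepts.
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

int main(int argc, char **argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s file.rst\n", argv[0]); return 1; }
    std::ifstream in(argv[1]);
    std::string line, last;
    while (std::getline(in, line))
        if (!line.empty()) last = line;       // keep the final non-empty line

    std::istringstream fields(last);
    double a, b, c, alpha, beta, gamma;
    if (!(fields >> a >> b >> c >> alpha >> beta >> gamma)) {
        std::fprintf(stderr, "could not parse a box line from %s\n", argv[1]);
        return 1;
    }
    bool ok = (a >= 1.0 && a <= 1000.0) &&
              (b >= 1.0 && b <= 1000.0) &&
              (c >= 1.0 && c <= 1000.0);
    std::printf("%s: a=%.4f b=%.4f c=%.4f -> %s\n",
                argv[1], a, b, c, ok ? "box looks sane" : "box out of range");
    return ok ? 0 : 2;
}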
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Jun 17 2013 - 22:30:02 PDT