Re: [AMBER] ERROR: GPU runs fail with nfft error only when running 2x GeForce TITANs in the same machine

From: Scott Le Grand <varelse2005.gmail.com>
Date: Wed, 19 Jun 2013 10:15:34 -0700

| ERROR: nfft1 must be in the range of 6 to 512!
| ERROR: nfft2 must be in the range of 6 to 512!
| ERROR: nfft3 must be in the range of 6 to 512!
| ERROR: a must be in the range of 0.10000E+01 to 0.10000E+04!
| ERROR: b must be in the range of 0.10000E+01 to 0.10000E+04!
| ERROR: c must be in the range of 0.10000E+01 to 0.10000E+04!

That's not the CUDA code or the GPU. That's something *very* bad with your
machine having somehow convinced itself of something even worse. No idea
what, though.
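
For what it's worth, those particular checks fire on values pmemd reads in,
not on anything computed by the GPU: the box edges a/b/c come in with the
restart coordinates, and nfft1-3 are derived from them. If an earlier
segment blew up and wrote a corrupted box, any run started from that
restart will die with exactly these messages. A minimal sketch of checking
the box record (hypothetical helper; assumes a formatted ASCII restart,
where the last line carries the box lengths and angles -- with ntxo=2 the
restart is NetCDF instead and the same numbers live in the
cell_lengths/cell_angles variables):

import sys

def box_of(rst_path):
    # Last line of an ASCII AMBER restart with box info holds
    # a, b, c, alpha, beta, gamma.
    with open(rst_path) as f:
        fields = f.readlines()[-1].split()
    return [float(v) for v in fields[:6]]

a, b, c = box_of(sys.argv[1])[:3]
# pmemd rejects box edges outside 1.0..1000.0 Angstroms (the a/b/c errors
# above); NaN fails both comparisons, so a blown-up box is caught here too.
sane = all(1.0 <= x <= 1000.0 for x in (a, b, c))
print("box edges:", a, b, c, "->", "sane" if sane else "corrupted")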




On Mon, Jun 17, 2013 at 10:24 PM, ET <sketchfoot.gmail.com> wrote:

> PS: The machine is running in headless mode on CentOS 6.
>
>
> #### bandwidth test for currently installed TITAN-b:
>
> [CUDA Bandwidth Test] - Starting...
> Running on...
>
> Device 0: GeForce GTX TITAN
> Quick Mode
>
> Host to Device Bandwidth, 1 Device(s)
> PINNED Memory Transfers
> Transfer Size (Bytes) Bandwidth(MB/s)
> 33554432 6002.5
>
> Device to Host Bandwidth, 1 Device(s)
> PINNED Memory Transfers
> Transfer Size (Bytes) Bandwidth(MB/s)
> 33554432 6165.5
>
> Device to Device Bandwidth, 1 Device(s)
> PINNED Memory Transfers
> Transfer Size (Bytes) Bandwidth(MB/s)
> 33554432 220723.8
>
>
>
> ### deviceQuery
>
> deviceQuery Starting...
>
> CUDA Device Query (Runtime API) version (CUDART static linking)
>
> Detected 1 CUDA Capable device(s)
>
> Device 0: "GeForce GTX TITAN"
> CUDA Driver Version / Runtime Version 5.5 / 5.0
> CUDA Capability Major/Minor version number: 3.5
> Total amount of global memory: 6143 MBytes (6441730048 bytes)
> (14) Multiprocessors x (192) CUDA Cores/MP: 2688 CUDA Cores
> GPU Clock rate: 928 MHz (0.93 GHz)
> Memory Clock rate: 3004 Mhz
> Memory Bus Width: 384-bit
> L2 Cache Size: 1572864 bytes
> Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
> Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
> Total amount of constant memory: 65536 bytes
> Total amount of shared memory per block: 49152 bytes
> Total number of registers available per block: 65536
> Warp size: 32
> Maximum number of threads per multiprocessor: 2048
> Maximum number of threads per block: 1024
> Maximum sizes of each dimension of a block: 1024 x 1024 x 64
> Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
> Maximum memory pitch: 2147483647 bytes
> Texture alignment: 512 bytes
> Concurrent copy and kernel execution: Yes with 1 copy engine(s)
> Run time limit on kernels: No
> Integrated GPU sharing Host Memory: No
> Support host page-locked memory mapping: Yes
> Alignment requirement for Surfaces: Yes
> Device has ECC support: Disabled
> Device supports Unified Addressing (UVA): Yes
> Device PCI Bus ID / PCI location ID: 3 / 0
> Compute Mode:
> < Exclusive Process (many threads in one process is able to use ::cudaSetDevice() with this device) >
>
> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce GTX TITAN
>
>
>
> On 18 June 2013 06:21, ET <sketchfoot.gmail.com> wrote:
>
> > Hi,
> >
> > I am trying to run NPT simulations using pmemd.cuda on TITAN graphics
> > cards. The equilibration steps were completed with the CPU version of
> > sander.
> >
> > I have 2x EVGA Superclocked TITAN cards. There have been problems with
> > the TITAN graphics cards, and I RMA'd one. I have benchmarked both cards
> > after the RMA and determined that they have no obvious problems that
> > would warrant them being RMA'd again. There is, though, an issue with
> > the AMBER CUDA code and TITANs in general, as discussed in the following
> > thread:
> >
> > < sorry, can't find it, but it's ~200 posts long and titled: experiences
> > with EVGA GTX TITAN Superclocked - memtestG80 - UNDERclocking in Linux ? >
> >
> > As I'm not sure whether this is the same issue, I'm posting this in a new
> > thread.
> >
> > I began running 12x 100 ns production runs using TITAN-a. There were no
> > problems. After waiting for and testing the replacement card (TITAN-b),
> > I put that into the machine as well, so both cards were working on
> > finishing the total of 300 segments.
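> >
> > (Aside: one way to keep two independent pmemd.cuda jobs on separate
> > cards is to pin each process to a device with CUDA_VISIBLE_DEVICES
> > before it starts. A minimal launcher sketch -- the directory and file
> > names here are hypothetical, not the ones from this run:)
> >
> > import os, subprocess
> >
> > jobs = [("seg_a", "0"), ("seg_b", "1")]  # (working dir, GPU id)
> > procs = []
> > for wdir, gpu in jobs:
> >     env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)  # pin to one card
> >     procs.append(subprocess.Popen(
> >         ["pmemd.cuda", "-O", "-i", "prod.in", "-p", "complex.parm",
> >          "-c", "prev.rst", "-o", "md.out", "-r", "md.rst", "-x", "md.ncdf"],
> >         cwd=wdir, env=env))
> > for p in procs:
> >     p.wait()  # let both segments run to completion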
> >
> > Very shortly, all the segments had failed, though the cards still
> > showed 100% utilisation; I did not realise until I checked the output
> > files, which showed "ERROR: nfft1 must be in the range of blah, blah,
> > blah" (error posted below). This was pretty weird, as I am used to jobs
> > failing visibly rather than carrying on eating resources whilst doing
> > nothing.
> >
> > So I pulled TITAN-a out and restarted the calculations with TITAN-b
> > from the last good rst, usually two back. There have been no problems
> > at all, and all the simulations have completed.
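> >
> > (The "last good rst" can be picked mechanically: scan the numbered
> > restarts and keep the newest one whose box record still passes the same
> > 1..1000 A range check pmemd applies. A short sketch, assuming formatted
> > ASCII restarts and the md_N.rst naming used here:)
> >
> > import glob, re
> >
> > def box_is_sane(rst):
> >     # Box record is the last line of an ASCII restart: a, b, c, angles
> >     with open(rst) as f:
> >         fields = f.readlines()[-1].split()
> >     try:
> >         a, b, c = (float(v) for v in fields[:3])
> >     except ValueError:  # '***' overflow or truncated fields -> corrupted
> >         return False
> >     return all(1.0 <= x <= 1000.0 for x in (a, b, c))  # NaN fails too
> >
> > rsts = sorted(glob.glob("md_*.rst"),
> >               key=lambda p: int(re.search(r"md_(\d+)\.rst$", p).group(1)))
> > good = [r for r in rsts if box_is_sane(r)]
> > print("restart from:", good[-1] if good else "no sane restart found")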
> >
> > My hardware specs are:
> > Gigabyte GA-X58-UD7 mobo
> > i7-930 processor
> > 6GB RAM
> > 1200 W be quiet! power supply
> >
> >
> >
> > Does anyone have any idea as to what's going on?
> >
> >
> > br,
> > g
> >
> > ############################################################
> > ############################################################
> > -------------------------------------------------------
> > Amber 12 SANDER 2012
> > -------------------------------------------------------
> >
> > | PMEMD implementation of SANDER, Release 12
> >
> > | Run on 06/09/2013 at 16:26:10
> >
> > [-O]verwriting output
> >
> > File Assignments:
> > | MDIN: prod.in
> > | MDOUT: md_4.out
> > | INPCRD: md_3.rst
> > | PARM: ../leap/TMC_I54V-V82S_Complex_25.parm
> > | RESTRT: md_4.rst
> > | REFC: refc
> > | MDVEL: mdvel
> > | MDEN: mden
> > | MDCRD: md_4.ncdf
> > | MDINFO: mdinfo
> >
> > Here is the input file:
> >
> > Constant pressure constant temperature production run
> > &cntrl
> > nstlim=2000000, dt=0.002, ntx=5, irest=1, ntpr=250, ntwr=1000, ntwx=500,
> > temp0=300.0, ntt=1, tautp=2.0, ioutfm=1, ig=-1, ntxo=2,
> > ntb=2, ntp=1,
> > ntc=2, ntf=2,
> > nrespa=1,
> > &end
> >
> > Note: ig = -1. Setting random seed based on wallclock time in microseconds.
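> >
> > (Nothing in this mdin sets nfft1-3, so pmemd derives the PME grid from
> > the box read with the restart: roughly one grid point per Angstrom,
> > rounded up to an FFT-friendly size. An illustrative sketch only -- the
> > exact rounding rule and allowed prime factors in pmemd may differ:)
> >
> > def fft_friendly(n):
> >     # keep FFT sizes to products of small primes (2, 3, 5 assumed here)
> >     for p in (2, 3, 5):
> >         while n % p == 0:
> >             n //= p
> >     return n == 1
> >
> > def pick_nfft(box_edge, spacing=1.0):
> >     n = max(6, int(box_edge / spacing))
> >     while not fft_friendly(n):
> >         n += 1
> >     return n
> >
> > # A typical ~80 A box edge gives nfft = 80 (2**4 * 5), well inside
> > # 6..512, so tripping that check means the box edges were garbage.
> > print(pick_nfft(80.0))  # -> 80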
> >
> > |--------------------- INFORMATION ----------------------
> > | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> > | Version 12.3
> > |
> > | 04/24/2013
> > |
> > | Implementation by:
> > | Ross C. Walker (SDSC)
> > | Scott Le Grand (nVIDIA)
> > | Duncan Poole (nVIDIA)
> > |
> > | CAUTION: The CUDA code is currently experimental.
> > | You use it at your own risk. Be sure to
> > | check ALL results carefully.
> > |
> > | Precision model in use:
> > | [SPFP] - Mixed Single/Double/Fixed Point Precision.
> > | (Default)
> > |
> > |--------------------------------------------------------
> >
> > |----------------- CITATION INFORMATION -----------------
> > |
> > | When publishing work that utilized the CUDA version
> > | of AMBER, please cite the following in addition to
> > | the regular AMBER citations:
> > |
> > | - Romelia Salomon-Ferrer; Andreas W. Goetz; Duncan
> > | Poole; Scott Le Grand; Ross C. Walker "Routine
> > | microsecond molecular dynamics simulations with
> > | AMBER - Part II: Particle Mesh Ewald", J. Chem.
> > | Theory Comput., 2012, (In review).
> > |
> > | - Andreas W. Goetz; Mark J. Williamson; Dong Xu;
> > | Duncan Poole; Scott Le Grand; Ross C. Walker
> > | "Routine microsecond molecular dynamics simulations
> > | with AMBER - Part I: Generalized Born", J. Chem.
> > | Theory Comput., 2012, 8 (5), pp1542-1555.
> > |
> > | - Scott Le Grand; Andreas W. Goetz; Ross C. Walker
> > | "SPFP: Speed without compromise - a mixed precision
> > | model for GPU accelerated molecular dynamics
> > | simulations.", Comp. Phys. Comm., 2013, 184
> > | pp374-380, DOI: 10.1016/j.cpc.2012.09.022
> > |
> > |--------------------------------------------------------
> >
> > |------------------- GPU DEVICE INFO --------------------
> > |
> > | CUDA Capable Devices Detected: 2
> > | CUDA Device ID in use: 0
> > | CUDA Device Name: GeForce GTX TITAN
> > | CUDA Device Global Mem Size: 6143 MB
> > | CUDA Device Num Multiprocessors: 14
> > | CUDA Device Core Freq: 0.93 GHz
> > |
> > |--------------------------------------------------------
> >
> > | ERROR: nfft1 must be in the range of 6 to 512!
> > | ERROR: nfft2 must be in the range of 6 to 512!
> > | ERROR: nfft3 must be in the range of 6 to 512!
> > | ERROR: a must be in the range of 0.10000E+01 to 0.10000E+04!
> > | ERROR: b must be in the range of 0.10000E+01 to 0.10000E+04!
> > | ERROR: c must be in the range of 0.10000E+01 to 0.10000E+04!
> >
> > Input errors occurred. Terminating execution.
> > ############################################################
> > ############################################################
> >
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Jun 19 2013 - 10:30:06 PDT