Re: [AMBER] ERROR: GPU runs fail with nfft Error only when running 2x geforce TITANS in same machine

From: ET <sketchfoot.gmail.com>
Date: Thu, 20 Jun 2013 20:07:12 +0100

I thought this was probably the case, but I just wanted to make sure that it
was not my setup, as the mobo is ~4 years old. TBH, I'm going to RMA the
TITANs. I need some sims done quite quickly and can't really wait for a
fix. Thanks very much to both of you for your comments. :)

br,
g


On 20 June 2013 20:02, Scott Le Grand <varelse2005.gmail.com> wrote:

> Sounds like the Titan is freaking out actually...
>
> Sigh...
>
>
>
>
> On Thu, Jun 20, 2013 at 11:42 AM, ET <sketchfoot.gmail.com> wrote:
>
> > Hi Ross,
> >
> > What you say makes sense. The error does occur at the start of the
> > production segment. However, IMO it is GPU related: if you look at the
> > previous segment, the one that generated the restart file for the segment
> > that failed, there are ******'s written in the outfile.
> >
> > Restarting the run with the nfft error inevitably fails, but restarting
> > from the previous run works fine in all the cases I have tried so far. So
> > the error that led to NaN in the coordinate file only happens
> > intermittently.
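> >
> > For reference, a rough sketch of how I restart a failed segment from the
> > previous segment's restart file (file names as in the File Assignments
> > quoted below; the standard $AMBERHOME setup is assumed):
> >
> > # restart segment 4 from the last good restart file (md_3.rst);
> > # -O overwrites the partial output left by the failed attempt
> > $AMBERHOME/bin/pmemd.cuda -O -i prod.in \
> >     -p ../leap/TMC_I54V-V82S_Complex_25.parm \
> >     -c md_3.rst -o md_4.out -r md_4.rst -x md_4.ncdf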
> >
> > I can play the trajectories fine, so my input parameters are fine. It's
> > just that along the way the written coordinates get messed up. This
> > occurs with very high frequency (i.e. inevitably) in my dual-GPU setup.
> > The problem happens very rarely (but still occurs) in single-GPU mode.
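> >
> > A quick way to spot where things went bad (a rough sketch; adjust the
> > globs to your file naming):
> >
> > # list output files where coordinates/energies overflowed the Fortran
> > # format and were written as asterisks
> > grep -l '\*\*\*\*' md_*.out
> > # list ASCII restart files containing NaNs (NetCDF restarts need ncdump)
> > grep -il 'nan' md_*.rst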
> >
> > What do you think?
> >
> >
> > br,
> > g
> >
> >
> >
> >
> > On 20 June 2013 17:37, Ross Walker <ross.rosswalker.co.uk> wrote:
> >
> > > Hi ET,
> > >
> > > This cannot possibly be bandwidth limited. This error is triggered in
> > > the CPU code (vanilla Fortran) long before any GPU calculations are
> > > fired up. It is an initial check by the CPU at the time it is reading
> > > in the coordinates. Have you tried this with the CPU version of PMEMD?
> > > What does that report?
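> > >
> > > Something like the following, with the exact same input files, should
> > > tell you whether the restart file itself is bad (paths are just an
> > > example):
> > >
> > > # serial CPU build; pmemd.MPI under mpirun works just as well
> > > $AMBERHOME/bin/pmemd -O -i prod.in \
> > >     -p ../leap/TMC_I54V-V82S_Complex_25.parm \
> > >     -c md_3.rst -o md_4_cpu.out -r md_4_cpu.rst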
> > >
> > > All the best
> > > Ross
> > >
> > >
> > >
> > > On 6/20/13 7:56 AM, "ET" <sketchfoot.gmail.com> wrote:
> > >
> > > >Hi Scott,
> > > >
> > > >Hmmm. I'm thinking this is maybe a bandwidth issue. The problem occurs
> > > >when running the cards in dual-GPU config, and the failure in
> > > >single-GPU mode occurred on a machine with strangely low benchmark
> > > >results.
> > > >
> > > >If I load the other card back into dual mode, I'll run the bandwidth
> > > >test again. The slots should both have x16 bandwidth even if both are
> > > >populated. Do you think an unusual number of peripherals, such as
> > > >HDDs, will make a difference?
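> > > >
> > > >For what it's worth, the CUDA samples' bandwidthTest can be pointed at
> > > >each card in turn while both are installed, e.g.:
> > > >
> > > ># test each device separately, pinned memory as in the runs above
> > > >./bandwidthTest --device=0 --memory=pinned
> > > >./bandwidthTest --device=1 --memory=pinned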
> > > >
> > > >Also, you say that it is not the CUDA code or the GPU. This isn't an
> > > >OS error, is it? Or is it AMBER reporting an underlying OS error?
> > > >
> > > >
> > > >##########################################
> > > >
> > > >[CUDA Bandwidth Test] - Starting...
> > > >Running on...
> > > >
> > > > Device 0: GeForce GTX TITAN
> > > > Quick Mode
> > > >
> > > > Host to Device Bandwidth, 1 Device(s)
> > > > PINNED Memory Transfers
> > > > Transfer Size (Bytes) Bandwidth(MB/s)
> > > > 33554432 3930.7
> > > >
> > > > Device to Host Bandwidth, 1 Device(s)
> > > > PINNED Memory Transfers
> > > > Transfer Size (Bytes) Bandwidth(MB/s)
> > > > 33554432 2100.4
> > > >
> > > > Device to Device Bandwidth, 1 Device(s)
> > > > PINNED Memory Transfers
> > > > Transfer Size (Bytes) Bandwidth(MB/s)
> > > > 33554432 220731.1
> > > >
> > > >
> > > >
> > > >br,
> > > >g
> > > >
> > > >
> > > >On 19 June 2013 18:15, Scott Le Grand <varelse2005.gmail.com> wrote:
> > > >
> > > >> | ERROR: nfft1 must be in the range of 6 to 512!
> > > >> | ERROR: nfft2 must be in the range of 6 to 512!
> > > >> | ERROR: nfft3 must be in the range of 6 to 512!
> > > >> | ERROR: a must be in the range of 0.10000E+01 to 0.10000E+04!
> > > >> | ERROR: b must be in the range of 0.10000E+01 to 0.10000E+04!
> > > >> | ERROR: c must be in the range of 0.10000E+01 to 0.10000E+04!
> > > >>
> > > >> That's not the CUDA code or the GPU. That's something *very* bad
> > > >> with your machine having somehow convinced itself of something even
> > > >> worse. No idea what, though.
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Mon, Jun 17, 2013 at 10:24 PM, ET <sketchfoot.gmail.com> wrote:
> > > >>
> > > >> > ps. The machine is running in headless mode on CentOS 6
> > > >> >
> > > >> >
> > > >> > #### bandwidth test for currently installed TITAN-b:
> > > >> >
> > > >> > [CUDA Bandwidth Test] - Starting...
> > > >> > Running on...
> > > >> >
> > > >> > Device 0: GeForce GTX TITAN
> > > >> > Quick Mode
> > > >> >
> > > >> > Host to Device Bandwidth, 1 Device(s)
> > > >> > PINNED Memory Transfers
> > > >> > Transfer Size (Bytes) Bandwidth(MB/s)
> > > >> > 33554432 6002.5
> > > >> >
> > > >> > Device to Host Bandwidth, 1 Device(s)
> > > >> > PINNED Memory Transfers
> > > >> > Transfer Size (Bytes) Bandwidth(MB/s)
> > > >> > 33554432 6165.5
> > > >> >
> > > >> > Device to Device Bandwidth, 1 Device(s)
> > > >> > PINNED Memory Transfers
> > > >> > Transfer Size (Bytes) Bandwidth(MB/s)
> > > >> > 33554432 220723.8
> > > >> >
> > > >> >
> > > >> >
> > > >> > ### deviceQuery
> > > >> >
> > > >> > deviceQuery Starting...
> > > >> >
> > > >> > CUDA Device Query (Runtime API) version (CUDART static linking)
> > > >> >
> > > >> > Detected 1 CUDA Capable device(s)
> > > >> >
> > > >> > Device 0: "GeForce GTX TITAN"
> > > >> > CUDA Driver Version / Runtime Version 5.5 / 5.0
> > > >> > CUDA Capability Major/Minor version number: 3.5
> > > >> > Total amount of global memory: 6143 MBytes (6441730048 bytes)
> > > >> > (14) Multiprocessors x (192) CUDA Cores/MP: 2688 CUDA Cores
> > > >> > GPU Clock rate: 928 MHz (0.93 GHz)
> > > >> > Memory Clock rate: 3004 MHz
> > > >> > Memory Bus Width: 384-bit
> > > >> > L2 Cache Size: 1572864 bytes
> > > >> > Max Texture Dimension Size (x,y,z) 1D=(65536),
> > > >> > 2D=(65536,65536), 3D=(4096,4096,4096)
> > > >> > Max Layered Texture Size (dim) x layers 1D=(16384) x 2048,
> > > >> > 2D=(16384,16384) x 2048
> > > >> > Total amount of constant memory: 65536 bytes
> > > >> > Total amount of shared memory per block: 49152 bytes
> > > >> > Total number of registers available per block: 65536
> > > >> > Warp size: 32
> > > >> > Maximum number of threads per multiprocessor: 2048
> > > >> > Maximum number of threads per block: 1024
> > > >> > Maximum sizes of each dimension of a block: 1024 x 1024 x 64
> > > >> > Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
> > > >> > Maximum memory pitch: 2147483647 bytes
> > > >> > Texture alignment: 512 bytes
> > > >> > Concurrent copy and kernel execution: Yes with 1 copy engine(s)
> > > >> > Run time limit on kernels: No
> > > >> > Integrated GPU sharing Host Memory: No
> > > >> > Support host page-locked memory mapping: Yes
> > > >> > Alignment requirement for Surfaces: Yes
> > > >> > Device has ECC support: Disabled
> > > >> > Device supports Unified Addressing (UVA): Yes
> > > >> > Device PCI Bus ID / PCI location ID: 3 / 0
> > > >> > Compute Mode:
> > > >> > < Exclusive Process (many threads in one process is able to use
> > > >> > ::cudaSetDevice() with this device) >
> > > >> >
> > > >> > deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5,
> > > >> > CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce GTX TITAN
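> > > >> >
> > > >> > (The card is in Exclusive Process compute mode; for anyone wanting
> > > >> > to reproduce that, it can be queried and set with nvidia-smi:
> > > >> >
> > > >> > nvidia-smi -q -d COMPUTE             # query the current compute mode
> > > >> > nvidia-smi -i 0 -c EXCLUSIVE_PROCESS # set it for device 0, as root
> > > >> > )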
> > > >> >
> > > >> >
> > > >> >
> > > >> > On 18 June 2013 06:21, ET <sketchfoot.gmail.com> wrote:
> > > >> >
> > > >> > > Hi,
> > > >> > >
> > > >> > > I am trying to run NPT simulations with pmemd.cuda on TITAN
> > > >> > > graphics cards. The equilibration steps were completed with the
> > > >> > > CPU version of sander.
> > > >> > >
> > > >> > > I have 2x EVGA Superclocked TITAN cards. There have been problems
> > > >> > > with the TITAN graphics cards, and I RMA'd one. I have benchmarked
> > > >> > > both cards after the RMA and determined that they have no obvious
> > > >> > > problems that would warrant them being RMA'd again, though there
> > > >> > > is an issue with the AMBER CUDA code and TITANs in general, as
> > > >> > > discussed in the following thread:
> > > >> > >
> > > >> > > < sorry, can't find it, but it's ~200 posts long and titled:
> > > >> > > "experiences with EVGA GTX TITAN Superclocked - memtestG80 -
> > > >> > > UNDERclocking in Linux"? >
> > > >> > >
> > > >> > >
> > > >> > > As I'm not sure whether this is the same issue, I'm posting this
> > > >> > > in a new thread.
> > > >> > >
> > > >> > > I began running 12x 100 ns production runs using TITAN-a. There
> > > >> > > were no problems. After waiting for and testing the replacement
> > > >> > > card (TITAN-b), I put that into the machine as well, so both cards
> > > >> > > were working on finishing the total of 300 segments.
> > > >> > >
> > > >> > > Very shortly afterwards, all the segments had failed, though the
> > > >> > > cards still showed 100% utilisation; I did not realise until I
> > > >> > > checked the outfiles, which showed "ERROR: nfft1 must be in the
> > > >> > > range of blah, blah, blah" (error posted below). This was pretty
> > > >> > > weird, as I am used to jobs visibly failing, not carrying on
> > > >> > > eating resources whilst doing nothing.
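> > > >> > >
> > > >> > > One way to catch this earlier is to watch the cards and the
> > > >> > > outfiles rather than trusting utilisation alone, e.g.:
> > > >> > >
> > > >> > > # refresh the GPU status every 5 seconds; a wedged job keeps
> > > >> > > # 100% utilisation while the outfile stops growing
> > > >> > > nvidia-smi -l 5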
> > > >> > >
> > > >> > > So I pulled TITAN-a out and restarted the calculations with
> > > >> > > TITAN-b from the last good rst, usually two back. There have been
> > > >> > > no problems at all, and all the simulations have completed.
> > > >> > >
> > > >> > > My hardware specs are:
> > > >> > > Gigabyte GA-X58-UD7 mobo
> > > >> > > i7-930 processor
> > > >> > > 6GB RAM
> > > >> > > 1200 Watt Bequiet power supply
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > Does anyone have any idea as to what's going on?
> > > >> > >
> > > >> > >
> > > >> > > br,
> > > >> > > g
> > > >> > >
> > > >> > > ############################################################
> > > >> > > ############################################################
> > > >> > >
> > > >> > > -------------------------------------------------------
> > > >> > > Amber 12 SANDER 2012
> > > >> > > -------------------------------------------------------
> > > >> > >
> > > >> > > | PMEMD implementation of SANDER, Release 12
> > > >> > >
> > > >> > > | Run on 06/09/2013 at 16:26:10
> > > >> > >
> > > >> > > [-O]verwriting output
> > > >> > >
> > > >> > > File Assignments:
> > > >> > > | MDIN: prod.in
> > > >> > >
> > > >> > > | MDOUT: md_4.out
> > > >> > >
> > > >> > > | INPCRD: md_3.rst
> > > >> > >
> > > >> > > | PARM: ../leap/TMC_I54V-V82S_Complex_25.parm
> > > >> > >
> > > >> > > | RESTRT: md_4.rst
> > > >> > >
> > > >> > > | REFC: refc
> > > >> > >
> > > >> > > | MDVEL: mdvel
> > > >> > >
> > > >> > > | MDEN: mden
> > > >> > >
> > > >> > > | MDCRD: md_4.ncdf
> > > >> > >
> > > >> > > | MDINFO: mdinfo
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > Here is the input file:
> > > >> > >
> > > >> > > Constant pressure constant temperature production run
> > > >> > > &cntrl
> > > >> > >   nstlim=2000000, dt=0.002, ntx=5, irest=1, ntpr=250, ntwr=1000,
> > > >> > >   ntwx=500,
> > > >> > >   temp0=300.0, ntt=1, tautp=2.0, ioutfm=1, ig=-1, ntxo=2,
> > > >> > >   ntb=2, ntp=1,
> > > >> > >   ntc=2, ntf=2,
> > > >> > >   nrespa=1,
> > > >> > > &end
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > Note: ig = -1. Setting random seed based on wallclock time in
> > > >> > > microseconds.
> > > >> > >
> > > >> > > |--------------------- INFORMATION ----------------------
> > > >> > > | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> > > >> > > | Version 12.3
> > > >> > > |
> > > >> > > | 04/24/2013
> > > >> > > |
> > > >> > > | Implementation by:
> > > >> > > | Ross C. Walker (SDSC)
> > > >> > > | Scott Le Grand (nVIDIA)
> > > >> > > | Duncan Poole (nVIDIA)
> > > >> > > |
> > > >> > > | CAUTION: The CUDA code is currently experimental.
> > > >> > > | You use it at your own risk. Be sure to
> > > >> > > | check ALL results carefully.
> > > >> > > |
> > > >> > > | Precision model in use:
> > > >> > > | [SPFP] - Mixed Single/Double/Fixed Point Precision.
> > > >> > > | (Default)
> > > >> > > |
> > > >> > > |--------------------------------------------------------
> > > >> > >
> > > >> > > |----------------- CITATION INFORMATION -----------------
> > > >> > > |
> > > >> > > | When publishing work that utilized the CUDA version
> > > >> > > | of AMBER, please cite the following in addition to
> > > >> > > | the regular AMBER citations:
> > > >> > > |
> > > >> > > | - Romelia Salomon-Ferrer; Andreas W. Goetz; Duncan
> > > >> > > | Poole; Scott Le Grand; Ross C. Walker "Routine
> > > >> > > | microsecond molecular dynamics simulations with
> > > >> > > | AMBER - Part II: Particle Mesh Ewald", J. Chem.
> > > >> > > | Theory Comput., 2012, (In review).
> > > >> > > |
> > > >> > > | - Andreas W. Goetz; Mark J. Williamson; Dong Xu;
> > > >> > > | Duncan Poole; Scott Le Grand; Ross C. Walker
> > > >> > > | "Routine microsecond molecular dynamics simulations
> > > >> > > | with AMBER - Part I: Generalized Born", J. Chem.
> > > >> > > | Theory Comput., 2012, 8 (5), pp1542-1555.
> > > >> > > |
> > > >> > > | - Scott Le Grand; Andreas W. Goetz; Ross C. Walker
> > > >> > > | "SPFP: Speed without compromise - a mixed precision
> > > >> > > | model for GPU accelerated molecular dynamics
> > > >> > > | simulations.", Comp. Phys. Comm., 2013, 184
> > > >> > > | pp374-380, DOI: 10.1016/j.cpc.2012.09.022
> > > >> > > |
> > > >> > > |--------------------------------------------------------
> > > >> > >
> > > >> > > |------------------- GPU DEVICE INFO --------------------
> > > >> > > |
> > > >> > > | CUDA Capable Devices Detected: 2
> > > >> > > | CUDA Device ID in use: 0
> > > >> > > | CUDA Device Name: GeForce GTX TITAN
> > > >> > > | CUDA Device Global Mem Size: 6143 MB
> > > >> > > | CUDA Device Num Multiprocessors: 14
> > > >> > > | CUDA Device Core Freq: 0.93 GHz
> > > >> > > |
> > > >> > > |--------------------------------------------------------
> > > >> > >
> > > >> > > | ERROR: nfft1 must be in the range of 6 to 512!
> > > >> > > | ERROR: nfft2 must be in the range of 6 to 512!
> > > >> > > | ERROR: nfft3 must be in the range of 6 to 512!
> > > >> > > | ERROR: a must be in the range of 0.10000E+01 to 0.10000E+04!
> > > >> > > | ERROR: b must be in the range of 0.10000E+01 to 0.10000E+04!
> > > >> > > | ERROR: c must be in the range of 0.10000E+01 to 0.10000E+04!
> > > >> > >
> > > >> > > Input errors occurred. Terminating execution.
> > > >> > > ############################################################
> > > >> > > ############################################################
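> > > >> > >
> > > >> > > Since these checks fire while the box dimensions are read from
> > > >> > > the restart file, the box record itself is worth inspecting. A
> > > >> > > rough sketch, depending on the restart format:
> > > >> > >
> > > >> > > # ASCII restart (ntxo=1): box lengths/angles are on the last line
> > > >> > > tail -1 md_3.rst
> > > >> > > # NetCDF restart (ntxo=2, as in the input above): use ncdump
> > > >> > > ncdump -v cell_lengths,cell_angles md_3.rst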
> > > >> > >
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber