Re: [AMBER] ERROR: GPU runs fail with nfft Error only when running 2x geforce TITANS in same machine from ET on 2013-06-20 (Amber Archive Jun 2013)

From: ET <sketchfoot.gmail.com>
Date: Thu, 20 Jun 2013 19:42:42 +0100

Hi Ross,

What you say makes sense. The error does occur at the start of the
production segment. However, IMO it is GPU-related: if you look at the
previous segment, the one that generated the restart file for the segment
that failed, there are ******'s written in the outfile.

Restarting the run that gave the nfft error inevitably fails, but restarting
from the previous run works fine in all the cases I have tried so far. So
the error that led to NaN in the co-ordinate file only happens
intermittently.
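
As a rough illustration of why NaN values in the restart file would trip the
start-up checks: a range test of the kind implied by the error messages
rejects NaN, because every comparison with NaN is false. This is a Python
sketch, not the actual pmemd Fortran, and the function name is made up. Only
the limits (1.0 to 1000.0 for a, b, c and 6 to 512 for nfft1-3) are taken
from the error output. I am also assuming the FFT grid sizes are derived from
the box dimensions when they are not set explicitly, which would explain why
the nfft errors appear alongside the a, b, c ones.

```python
import math

# Limits copied from the error messages in the failing outfile.
BOX_LIMITS = (1.0, 1000.0)
NFFT_LIMITS = (6, 512)


def out_of_range(values, low, high):
    """Return the names whose value is not within [low, high].

    NaN always fails, since any comparison involving NaN evaluates to False.
    """
    return [name for name, value in values.items()
            if not (low <= value <= high)]


# A box read from a corrupted restart file (all NaN) fails every check.
bad_box = out_of_range({"a": math.nan, "b": math.nan, "c": math.nan},
                       *BOX_LIMITS)

# A sane box (illustrative numbers only) passes.
good_box = out_of_range({"a": 80.0, "b": 85.5, "c": 90.0}, *BOX_LIMITS)
```

This matches the pattern in the outfile, where all six checks fail together
rather than just one of them.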

I can play the trajectories fine, so my input parameters are fine. It's
just that along the way the written co-ordinates get messed up. This occurs
with very high frequency (i.e. inevitably) in my dual-GPU setup. The problem
happens very rarely (but still occurs) in single-GPU mode.

What do you think?

br,
g

On 20 June 2013 17:37, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi ET,
>
> This cannot possibly be bandwidth limited. This error is triggered in the
> CPU code (vanilla Fortran) long before any GPU calculations are fired up.
> It is an initial check by the CPU at the time it is reading in the
> coordinates. Have you tried this with the CPU version of PMEMD? What does
> that report?
>
> All the best
> Ross
>
>
>
> On 6/20/13 7:56 AM, "ET" <sketchfoot.gmail.com> wrote:
>
> >Hi Scott,
> >
> >Hmmm. I'm thinking this is maybe a bandwidth issue. The problem occurs when
> >running the card in dual-GPU config, and the failure in single-GPU mode
> >occurred on a machine with strangely low results.
> >
> >If I load the other card back into dual mode, I'll run the bandwidthTest
> >again. The slots should both have x16 bandwidth even if both are populated.
> >Do you think an unusual number of peripherals such as HDDs will make a
> >difference?
> >
> >Also you say that it is not CUDA code or the GPU. This isn't an OS error, is
> >it? Or is it AMBER reporting an underlying OS error?
> >
> >
> >##########################################
> >
> >[CUDA Bandwidth Test] - Starting...
> >Running on...
> >
> > Device 0: GeForce GTX TITAN
> > Quick Mode
> >
> > Host to Device Bandwidth, 1 Device(s)
> > PINNED Memory Transfers
> > Transfer Size (Bytes) Bandwidth(MB/s)
> > 33554432 3930.7
> >
> > Device to Host Bandwidth, 1 Device(s)
> > PINNED Memory Transfers
> > Transfer Size (Bytes) Bandwidth(MB/s)
> > 33554432 2100.4
> >
> > Device to Device Bandwidth, 1 Device(s)
> > PINNED Memory Transfers
> > Transfer Size (Bytes) Bandwidth(MB/s)
> > 33554432 220731.1
> >
> >
> >
> >br,
> >g
> >
> >
> >On 19 June 2013 18:15, Scott Le Grand <varelse2005.gmail.com> wrote:
> >
> >> | ERROR: nfft1 must be in the range of 6 to 512!
> >> | ERROR: nfft2 must be in the range of 6 to 512!
> >> | ERROR: nfft3 must be in the range of 6 to 512!
> >> | ERROR: a must be in the range of 0.10000E+01 to 0.10000E+04!
> >> | ERROR: b must be in the range of 0.10000E+01 to 0.10000E+04!
> >> | ERROR: c must be in the range of 0.10000E+01 to 0.10000E+04!
> >>
> >> That's not the CUDA code or the GPU. That's something *very* bad with your
> >> machine having somehow convinced itself of something even worse. No idea
> >> what though
> >>
> >>
> >>
> >>
> >> On Mon, Jun 17, 2013 at 10:24 PM, ET <sketchfoot.gmail.com> wrote:
> >>
> >> > ps. The machine is running in headless mode on centos 6
> >> >
> >> >
> >> > #### bandwidth test for currently installed TITAN-b:
> >> >
> >> > [CUDA Bandwidth Test] - Starting...
> >> > Running on...
> >> >
> >> > Device 0: GeForce GTX TITAN
> >> > Quick Mode
> >> >
> >> > Host to Device Bandwidth, 1 Device(s)
> >> > PINNED Memory Transfers
> >> > Transfer Size (Bytes) Bandwidth(MB/s)
> >> > 33554432 6002.5
> >> >
> >> > Device to Host Bandwidth, 1 Device(s)
> >> > PINNED Memory Transfers
> >> > Transfer Size (Bytes) Bandwidth(MB/s)
> >> > 33554432 6165.5
> >> >
> >> > Device to Device Bandwidth, 1 Device(s)
> >> > PINNED Memory Transfers
> >> > Transfer Size (Bytes) Bandwidth(MB/s)
> >> > 33554432 220723.8
> >> >
> >> >
> >> >
> >> > ### deviceQuery
> >> >
> >> > deviceQuery Starting...
> >> >
> >> > CUDA Device Query (Runtime API) version (CUDART static linking)
> >> >
> >> > Detected 1 CUDA Capable device(s)
> >> >
> >> > Device 0: "GeForce GTX TITAN"
> >> > CUDA Driver Version / Runtime Version 5.5 / 5.0
> >> > CUDA Capability Major/Minor version number: 3.5
> >> > Total amount of global memory: 6143 MBytes (6441730048 bytes)
> >> > (14) Multiprocessors x (192) CUDA Cores/MP: 2688 CUDA Cores
> >> > GPU Clock rate: 928 MHz (0.93 GHz)
> >> > Memory Clock rate: 3004 Mhz
> >> > Memory Bus Width: 384-bit
> >> > L2 Cache Size: 1572864 bytes
> >> > Max Texture Dimension Size (x,y,z) 1D=(65536),
> >> > 2D=(65536,65536), 3D=(4096,4096,4096)
> >> > Max Layered Texture Size (dim) x layers 1D=(16384) x 2048,
> >> > 2D=(16384,16384) x 2048
> >> > Total amount of constant memory: 65536 bytes
> >> > Total amount of shared memory per block: 49152 bytes
> >> > Total number of registers available per block: 65536
> >> > Warp size: 32
> >> > Maximum number of threads per multiprocessor: 2048
> >> > Maximum number of threads per block: 1024
> >> > Maximum sizes of each dimension of a block: 1024 x 1024 x 64
> >> > Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
> >> > Maximum memory pitch: 2147483647 bytes
> >> > Texture alignment: 512 bytes
> >> > Concurrent copy and kernel execution: Yes with 1 copy engine(s)
> >> > Run time limit on kernels: No
> >> > Integrated GPU sharing Host Memory: No
> >> > Support host page-locked memory mapping: Yes
> >> > Alignment requirement for Surfaces: Yes
> >> > Device has ECC support: Disabled
> >> > Device supports Unified Addressing (UVA): Yes
> >> > Device PCI Bus ID / PCI location ID: 3 / 0
> >> > Compute Mode:
> >> > < Exclusive Process (many threads in one process is able to use
> >> > ::cudaSetDevice() with this device) >
> >> >
> >> > deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime
> >> > Version = 5.0, NumDevs = 1, Device0 = GeForce GTX TITAN
> >> >
> >> >
> >> >
> >> > On 18 June 2013 06:21, ET <sketchfoot.gmail.com> wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > I am trying to run NPT simulations with pmemd.cuda on TITAN graphics
> >> > > cards. The equilibration & steps were completed with the CPU version of
> >> > > sander.
> >> > >
> >> > > I have 2x EVGA superclocked TITAN cards. There have been problems with the
> >> > > TITAN graphics cards and I RMA'd one. I have benchmarked both the cards
> >> > > after the RMA and determined that they have no obvious problems that would
> >> > > warrant them being RMA'd again. Though there is an issue with the AMBER
> >> > > cuda code and TITANs in general as discussed in the following thread:
> >> > >
> >> > > < sorry, can't find it, but it's ~200 posts long and titled: experiences
> >> > > with EVGA GTX TITAN Superclocked - memtestG80 - UNDERclocking in Linux ?
> >> > >
> >> > >
> >> > > As I'm not sure whether this is the same issue, I'm posting this in a new
> >> > > thread.
> >> > >
> >> > > I began running 12 100ns production runs using TITAN-a. There were no
> >> > > problems. After waiting for and testing the replacement card (TITAN-b), I
> >> > > put that into the machine as well. So both cards were working on finishing
> >> > > the total of 300 segments.
> >> > >
> >> > > Very shortly, all the segments had failed, though the cards still showed
> >> > > 100% utilisation, and I did not realise until I checked the outfiles, which
> >> > > showed "ERROR: nfft1 must be in the range of blah, blah, blah" (error
> >> > > posted below). This was pretty weird as I am used to the jobs visibly
> >> > > failing and not carrying on eating resources whilst doing nothing.
> >> > >
> >> > > So I pulled TITAN-a out and restarted the calculations with TITAN-b
> >> > > from the last good rst, usually 2 back. There have been no problems at all
> >> > > and all the simulations have completed.
> >> > >
> >> > > My hardware specs are:
> >> > > Gigabyte GA-X58-UD7 mobo
> >> > > i7-930 processor
> >> > > 6GB RAM
> >> > > 1200 Watt Bequiet power supply
> >> > >
> >> > >
> >> > >
> >> > > Does anyone have any idea as to what's going on?
> >> > >
> >> > >
> >> > > br,
> >> > > g
> >> > >
> >> > > ############################################################
> >> > > ############################################################
> >> > > -------------------------------------------------------
> >> > > Amber 12 SANDER 2012
> >> > > -------------------------------------------------------
> >> > >
> >> > > | PMEMD implementation of SANDER, Release 12
> >> > >
> >> > > | Run on 06/09/2013 at 16:26:10
> >> > >
> >> > > [-O]verwriting output
> >> > >
> >> > > File Assignments:
> >> > > | MDIN: prod.in
> >> > >
> >> > > | MDOUT: md_4.out
> >> > >
> >> > > | INPCRD: md_3.rst
> >> > >
> >> > > | PARM: ../leap/TMC_I54V-V82S_Complex_25.parm
> >> > >
> >> > > | RESTRT: md_4.rst
> >> > >
> >> > > | REFC: refc
> >> > >
> >> > > | MDVEL: mdvel
> >> > >
> >> > > | MDEN: mden
> >> > >
> >> > > | MDCRD: md_4.ncdf
> >> > >
> >> > > | MDINFO: mdinfo
> >> > >
> >> > >
> >> > >
> >> > > Here is the input file:
> >> > >
> >> > > Constant pressure constant temperature production run
> >> > >
> >> > > &cntrl
> >> > >
> >> > > nstlim=2000000, dt=0.002, ntx=5, irest=1, ntpr=250, ntwr=1000, ntwx=500,
> >> > >
> >> > > temp0=300.0, ntt=1, tautp=2.0, ioutfm=1, ig=-1, ntxo=2,
> >> > >
> >> > >
> >> > >
> >> > > ntb=2, ntp=1,
> >> > >
> >> > >
> >> > >
> >> > > ntc=2, ntf=2,
> >> > >
> >> > >
> >> > >
> >> > > nrespa=1,
> >> > >
> >> > > &end
> >> > >
> >> > >
> >> > >
> >> > > Note: ig = -1. Setting random seed based on wallclock time in microseconds.
> >> > >
> >> > > |--------------------- INFORMATION ----------------------
> >> > > | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> >> > > | Version 12.3
> >> > > |
> >> > > | 04/24/2013
> >> > > |
> >> > > | Implementation by:
> >> > > | Ross C. Walker (SDSC)
> >> > > | Scott Le Grand (nVIDIA)
> >> > > | Duncan Poole (nVIDIA)
> >> > > |
> >> > > | CAUTION: The CUDA code is currently experimental.
> >> > > | You use it at your own risk. Be sure to
> >> > > | check ALL results carefully.
> >> > > |
> >> > > | Precision model in use:
> >> > > | [SPFP] - Mixed Single/Double/Fixed Point Precision.
> >> > > | (Default)
> >> > > |
> >> > > |--------------------------------------------------------
> >> > >
> >> > > |----------------- CITATION INFORMATION -----------------
> >> > > |
> >> > > | When publishing work that utilized the CUDA version
> >> > > | of AMBER, please cite the following in addition to
> >> > > | the regular AMBER citations:
> >> > > |
> >> > > | - Romelia Salomon-Ferrer; Andreas W. Goetz; Duncan
> >> > > | Poole; Scott Le Grand; Ross C. Walker "Routine
> >> > > | microsecond molecular dynamics simulations with
> >> > > | AMBER - Part II: Particle Mesh Ewald", J. Chem.
> >> > > | Theory Comput., 2012, (In review).
> >> > > |
> >> > > | - Andreas W. Goetz; Mark J. Williamson; Dong Xu;
> >> > > | Duncan Poole; Scott Le Grand; Ross C. Walker
> >> > > | "Routine microsecond molecular dynamics simulations
> >> > > | with AMBER - Part I: Generalized Born", J. Chem.
> >> > > | Theory Comput., 2012, 8 (5), pp1542-1555.
> >> > > |
> >> > > | - Scott Le Grand; Andreas W. Goetz; Ross C. Walker
> >> > > | "SPFP: Speed without compromise - a mixed precision
> >> > > | model for GPU accelerated molecular dynamics
> >> > > | simulations.", Comp. Phys. Comm., 2013, 184
> >> > > | pp374-380, DOI: 10.1016/j.cpc.2012.09.022
> >> > > |
> >> > > |--------------------------------------------------------
> >> > >
> >> > > |------------------- GPU DEVICE INFO --------------------
> >> > > |
> >> > > | CUDA Capable Devices Detected: 2
> >> > > | CUDA Device ID in use: 0
> >> > > | CUDA Device Name: GeForce GTX TITAN
> >> > > | CUDA Device Global Mem Size: 6143 MB
> >> > > | CUDA Device Num Multiprocessors: 14
> >> > > | CUDA Device Core Freq: 0.93 GHz
> >> > > |
> >> > > |--------------------------------------------------------
> >> > >
> >> > > | ERROR: nfft1 must be in the range of 6 to 512!
> >> > > | ERROR: nfft2 must be in the range of 6 to 512!
> >> > > | ERROR: nfft3 must be in the range of 6 to 512!
> >> > > | ERROR: a must be in the range of 0.10000E+01 to 0.10000E+04!
> >> > > | ERROR: b must be in the range of 0.10000E+01 to 0.10000E+04!
> >> > > | ERROR: c must be in the range of 0.10000E+01 to 0.10000E+04!
> >> > >
> >> > > Input errors occurred. Terminating execution.
> >> > > ############################################################
> >> > > ############################################################
> >> > >
> >> > _______________________________________________
> >> > AMBER mailing list
> >> > AMBER.ambermd.org
> >> > http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Jun 20 2013 - 12:00:22 PDT