Re: [AMBER] CUDA running error

From: Ross Walker <ross.rosswalker.co.uk>
Date: Tue, 8 May 2012 10:24:30 -0700

Hi Albert,

So just to confirm (forget parallel for the moment - it's not of any use on GTX590s anyway and for now will just complicate debugging)...

"Serial GPU does NOT work with Intel Parallel Studio XE 2011 for Linux?"

What does 'ifort -V' give?
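
For example, something along these lines (adjust for your environment; the nvcc
and mpif90 checks are just suggestions to pin down the rest of the toolchain):

ifort -V
nvcc --version
mpif90 -show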

All the best
Ross

> -----Original Message-----
> From: Albert [mailto:mailmd2011.gmail.com]
> Sent: Tuesday, May 08, 2012 10:12 AM
> To: AMBER Mailing List
> Subject: Re: [AMBER] CUDA running error
>
> Hi Ross,
>
> Thanks a lot for the prompt reply.
> I am using Intel® Parallel Studio XE 2011 for Linux
> <https://registrationcenter.intel.com/RegCenter/NComForm.aspx?ProductID=1540>
> and mpich2-1.4.1. The CPU MPI build works fine with the Intel compiler, but
> neither the serial nor the MPI CUDA version works, even though I don't get
> any errors when I compile them. Both fail as soon as I try to run a job.
> Yes, I submitted the errors in the previous thread, and here they are:
>
>
>
> mpiexec -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -i md.in -p bm.prmtop -c
> npt.rst -o md.out -r md.rst -x md.mdcrd &
>
>
> but it failed with the following information:
>
>
>
>
>
> --------md.out-----------
> -------------------------------------------------------
> Amber 12 SANDER 2012
> -------------------------------------------------------
>
> | PMEMD implementation of SANDER, Release 12
>
> | Run on 05/07/2012 at 22:26:15
>
> File Assignments:
> | MDIN: md.in
> | MDOUT: md.out
> | INPCRD: npt.rst
> | PARM: bm.prmtop
> | RESTRT: md.rst
> | REFC: refc
> | MDVEL: mdvel
> | MDEN: mden
> | MDCRD: md.mdcrd
> | MDINFO: mdinfo
> |LOGFILE: logfile
>
>
> Here is the input file:
>
> production dynamics
> &cntrl
> imin=0, irest=1, ntx=5,
> nstlim=10000000, dt=0.002,
> ntc=2, ntf=2,
> cut=10.0, ntb=2, ntp=1, taup=2.0,
> ntpr=5000, ntwx=5000, ntwr=50000,
> ntt=3, gamma_ln=2.0,
> temp0=300.0,
> /
>
>
> |--------------------- INFORMATION ----------------------
> | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> | Version 12.0
> |
> | 03/19/2012
> |
> | Implementation by:
> | Ross C. Walker (SDSC)
> | Scott Le Grand (nVIDIA)
> | Duncan Poole (nVIDIA)
> |
> | CAUTION: The CUDA code is currently experimental.
> | You use it at your own risk. Be sure to
> | check ALL results carefully.
> |
> | Precision model in use:
> | [SPDP] - Hybrid Single/Double Precision (Default).
> |
> |--------------------------------------------------------
>
> |------------------- GPU DEVICE INFO --------------------
> |
> | Task ID: 0
> | CUDA Capable Devices Detected: 4
> | CUDA Device ID in use: 0
> | CUDA Device Name: GeForce GTX 590
> | CUDA Device Global Mem Size: 1535 MB
> | CUDA Device Num Multiprocessors: 0
> | CUDA Device Core Freq: 1.22 GHz
> |
> |
> | Task ID: 1
> | CUDA Capable Devices Detected: 4
> | CUDA Device ID in use: 1
> | CUDA Device Name: GeForce GTX 590
> | CUDA Device Global Mem Size: 1535 MB
> | CUDA Device Num Multiprocessors: 0
> | CUDA Device Core Freq: 1.22 GHz
> |
> |--------------------------------------------------------
>
>
> | Conditional Compilation Defines Used:
> | DIRFRC_COMTRANS
> | DIRFRC_EFS
> | DIRFRC_NOVEC
> | MPI
> | PUBFFT
> | FFTLOADBAL_2PROC
> | BINTRAJ
> | MKL
> | CUDA
>
> | Largest sphere to fit in unit cell has radius = 33.920
>
> | New format PARM file being parsed.
> | Version = 1.000 Date = 05/02/12 Time = 13:49:08
>
> | Note: 1-4 EEL scale factors are being read from the topology file.
>
> | Note: 1-4 VDW scale factors are being read from the topology file.
> | Duplicated 0 dihedrals
>
> | Duplicated 0 dihedrals
>
> --------------------------------------------------------------------------------
> 1. RESOURCE USE:
> --------------------------------------------------------------------------------
>
> getting new box info from bottom of inpcrd
>
> NATOM = 36356 NTYPES = 19 NBONH = 33899 MBONA = 2451
> NTHETH = 5199 MTHETA = 3321 NPHIH = 10329 MPHIA = 8468
> NHPARM = 0 NPARM = 0 NNB = 67990 NRES = 10898
> NBONA = 2451 NTHETA = 3321 NPHIA = 8468 NUMBND = 61
> NUMANG = 120 NPTRA = 71 NATYP = 45 NPHB = 1
> IFBOX = 1 NMXRS = 24 IFCAP = 0 NEXTRA = 0
> NCOPY = 0
>
> | Coordinate Index Table dimensions: 12 12 11
> | Direct force subcell size = 6.0700 6.1089 6.1673
>
> BOX TYPE: RECTILINEAR
>
> --------------------------------------------------------------------------------
> 2. CONTROL DATA FOR THE RUN
> --------------------------------------------------------------------------------
>
> default_name
>
> General flags:
> imin = 0, nmropt = 0
>
> Nature and format of input:
> ntx = 5, irest = 1, ntrx = 1
>
> Nature and format of output:
> ntxo = 1, ntpr = 5000, ntrx = 1, ntwr = 50000
> iwrap = 0, ntwx = 5000, ntwv = 0, ntwe = 0
> ioutfm = 0, ntwprt = 0, idecomp = 0, rbornstat= 0
>
> Potential function:
> ntf = 2, ntb = 2, igb = 0, nsnb = 25
> ipol = 0, gbsa = 0, iesp = 0
> dielc = 1.00000, cut = 10.00000, intdiel = 1.00000
>
> Frozen or restrained atoms:
> ibelly = 0, ntr = 0
>
> Molecular dynamics:
> nstlim = 10000000, nscm = 1000, nrespa = 1
> t = 0.00000, dt = 0.00200, vlimit = -1.00000
>
> Langevin dynamics temperature regulation:
> ig = 71277
> temp0 = 300.00000, tempi = 0.00000, gamma_ln= 2.00000
>
> Pressure regulation:
> ntp = 1
> pres0 = 1.00000, comp = 44.60000, taup = 2.00000
>
> SHAKE:
> ntc = 2, jfastw = 0
> tol = 0.00001
>
> | Intermolecular bonds treatment:
> | no_intermolecular_bonds = 1
>
> | Energy averages sample interval:
> | ene_avg_sampling = 5000
>
> Ewald parameters:
> verbose = 0, ew_type = 0, nbflag = 1, use_pme = 1
> vdwmeth = 1, eedmeth = 1, netfrc = 1
> Box X = 72.839 Box Y = 73.307 Box Z = 67.840
> Alpha = 90.000 Beta = 90.000 Gamma = 90.000
> NFFT1 = 80 NFFT2 = 80 NFFT3 = 64
> Cutoff= 10.000 Tol =0.100E-04
> Ewald Coefficient = 0.27511
> Interpolation order = 4
>
> | PMEMD ewald parallel performance parameters:
> | block_fft = 0
> | fft_blk_y_divisor = 2
> | excl_recip = 0
> | excl_master = 0
> | atm_redist_freq = 320
>
> --------------------------------------------------------------------------------
> 3. ATOMIC COORDINATES AND VELOCITIES
> --------------------------------------------------------------------------------
>
> default_name
> begin time read from input coords = 1300.000 ps
>
>
> Number of triangulated 3-point waters found: 10538
>
> Sum of charges from parm topology file = -0.00000015
> Forcing neutrality...
>
>
>
> --------------logfile--------------------------------
> FFT slabs assigned to 1 tasks
> Maximum of 64 xy slabs per task
> Maximum of 80 zx slabs per task
> Count of FFT xy slabs assigned to each task:
> 0 64
> Count of FFT xz slabs assigned to each task:
> 0 80
>
>
> -----------terminal--------------log---------------------
> Image              PC                Routine            Line        Source
> pmemd.cuda.MPI     000000000057E4BD  Unknown            Unknown     Unknown
> pmemd.cuda.MPI     000000000057EB62  Unknown            Unknown     Unknown
> pmemd.cuda.MPI     0000000000555DF5  Unknown            Unknown     Unknown
> pmemd.cuda.MPI     000000000051D5F2  Unknown            Unknown     Unknown
> pmemd.cuda.MPI     00000000004F901E  Unknown            Unknown     Unknown
> pmemd.cuda.MPI     00000000004057FC  Unknown            Unknown     Unknown
> libc.so.6          00002B98685A8BFD  Unknown            Unknown     Unknown
> pmemd.cuda.MPI     00000000004056F9  Unknown            Unknown     Unknown
> forrtl: severe (71): integer divide by zero
> Image              PC                Routine            Line        Source
> pmemd.cuda.MPI     000000000057E4BD  Unknown            Unknown     Unknown
> pmemd.cuda.MPI     000000000057EB62  Unknown            Unknown     Unknown
> pmemd.cuda.MPI     0000000000555DF5  Unknown            Unknown     Unknown
> pmemd.cuda.MPI     000000000051D5F2  Unknown            Unknown     Unknown
> pmemd.cuda.MPI     00000000004F901E  Unknown            Unknown     Unknown
> pmemd.cuda.MPI     00000000004057FC  Unknown            Unknown     Unknown
> libc.so.6          00002AF6CA4B0BFD  Unknown            Unknown     Unknown
> pmemd.cuda.MPI     00000000004056F9  Unknown            Unknown     Unknown
>
>
>
>
> On 05/08/2012 06:57 PM, Ross Walker wrote:
> > Hi Albert,
> >
> > Which version of Intel and which version of Mpich2 and which version of
> > nvcc?
> >
> > It works fine for me with Intel 11.1.069 and mpich2-1.4. Also does the CPU
> > version work fine with your intel and mpich2?
> >
> > Please please please make sure you isolate errors to the GPU code before
> > reporting them. I.e. ALWAYS test the cpu codes thoroughly first.
> >
> > I assume the serial GPU code works fine with Intel? - Also note that you
> > will see little to no performance improvement with the Intel compilers over
> > GNU for the GPU code.
> >
> > All the best
> > Ross
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue May 08 2012 - 10:30:05 PDT