Re: [AMBER] CUDA running error

From: Albert <mailmd2011.gmail.com>
Date: Tue, 08 May 2012 19:41:44 +0200

Hi Ross,

   here is the output of "ifort -V":

Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on
Intel(R) 64, Version 12.1.3.293 Build 20120212
Copyright (C) 1985-2012 Intel Corporation. All rights reserved.
FOR NON-COMMERCIAL USE ONLY



On 05/08/2012 07:24 PM, Ross Walker wrote:
> Hi Albert,
>
> So just to confirm (forget parallel for the moment - it's not of any use on GTX590s anyway and for now will just complicate debugging)...
>
> "Serial GPU does NOT work with Intel Parallel Studio XE 2011 for Linux?"
>
> What does 'ifort -V' give?
>
> All the best
> Ross
>
>> -----Original Message-----
>> From: Albert [mailto:mailmd2011.gmail.com]
>> Sent: Tuesday, May 08, 2012 10:12 AM
>> To: AMBER Mailing List
>> Subject: Re: [AMBER] CUDA running error
>>
>> Hi Ross,
>>
>> Thanks a lot for the prompt reply.
>> I use Intel® Parallel Studio XE 2011 for Linux
>> <https://registrationcenter.intel.com/RegCenter/NComForm.aspx?ProductID=1540>
>> and mpich2-1.4.1. MPI works fine with the Intel compiler, but neither the
>> serial nor the MPI version of the CUDA code works, although I don't get any
>> errors when I compile them. They only fail when I try to run a job. Yes, I
>> posted the errors in the previous thread; here they are again:
>>
>>
>>
>> mpiexec -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -i md.in -p bm.prmtop -c
>> npt.rst -o md.out -r md.rst -x md.mdcrd&
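>>
>> (For reference, a corresponding serial GPU run on the same inputs - useful
>> for checking whether the serial code fails in the same way - would look
>> roughly like the following; the md_serial.* output names are just
>> placeholders:)
>>
>> $AMBERHOME/bin/pmemd.cuda -O -i md.in -p bm.prmtop -c npt.rst -o md_serial.out -r md_serial.rst -x md_serial.mdcrd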
>>
>>
>> The parallel run failed with the following information:
>>
>>
>>
>>
>>
>> --------md.out-----------
>> -------------------------------------------------------
>> Amber 12 SANDER 2012
>> -------------------------------------------------------
>>
>> | PMEMD implementation of SANDER, Release 12
>>
>> | Run on 05/07/2012 at 22:26:15
>>
>> File Assignments:
>> | MDIN: md.in
>> | MDOUT: md.out
>> | INPCRD: npt.rst
>> | PARM: bm.prmtop
>> | RESTRT: md.rst
>> | REFC: refc
>> | MDVEL: mdvel
>> | MDEN: mden
>> | MDCRD: md.mdcrd
>> | MDINFO: mdinfo
>> |LOGFILE: logfile
>>
>>
>> Here is the input file:
>>
>> production dynamics
>> &cntrl
>> imin=0, irest=1, ntx=5,
>> nstlim=10000000, dt=0.002,
>> ntc=2, ntf=2,
>> cut=10.0, ntb=2, ntp=1, taup=2.0,
>> ntpr=5000, ntwx=5000, ntwr=50000,
>> ntt=3, gamma_ln=2.0,
>> temp0=300.0,
>> /
>>
>>
>> |--------------------- INFORMATION ----------------------
>> | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
>> | Version 12.0
>> |
>> | 03/19/2012
>> |
>> | Implementation by:
>> | Ross C. Walker (SDSC)
>> | Scott Le Grand (nVIDIA)
>> | Duncan Poole (nVIDIA)
>> |
>> | CAUTION: The CUDA code is currently experimental.
>> | You use it at your own risk. Be sure to
>> | check ALL results carefully.
>> |
>> | Precision model in use:
>> | [SPDP] - Hybrid Single/Double Precision (Default).
>> |
>> |--------------------------------------------------------
>>
>> |------------------- GPU DEVICE INFO --------------------
>> |
>> | Task ID: 0
>> | CUDA Capable Devices Detected: 4
>> | CUDA Device ID in use: 0
>> | CUDA Device Name: GeForce GTX 590
>> | CUDA Device Global Mem Size: 1535 MB
>> | CUDA Device Num Multiprocessors: 0
>> | CUDA Device Core Freq: 1.22 GHz
>> |
>> |
>> | Task ID: 1
>> | CUDA Capable Devices Detected: 4
>> | CUDA Device ID in use: 1
>> | CUDA Device Name: GeForce GTX 590
>> | CUDA Device Global Mem Size: 1535 MB
>> | CUDA Device Num Multiprocessors: 0
>> | CUDA Device Core Freq: 1.22 GHz
>> |
>> |--------------------------------------------------------
>>
>>
>> | Conditional Compilation Defines Used:
>> | DIRFRC_COMTRANS
>> | DIRFRC_EFS
>> | DIRFRC_NOVEC
>> | MPI
>> | PUBFFT
>> | FFTLOADBAL_2PROC
>> | BINTRAJ
>> | MKL
>> | CUDA
>>
>> | Largest sphere to fit in unit cell has radius = 33.920
>>
>> | New format PARM file being parsed.
>> | Version = 1.000 Date = 05/02/12 Time = 13:49:08
>>
>> | Note: 1-4 EEL scale factors are being read from the topology file.
>>
>> | Note: 1-4 VDW scale factors are being read from the topology file.
>> | Duplicated 0 dihedrals
>>
>> | Duplicated 0 dihedrals
>>
>> --------------------------------------------------------------------------------
>> 1. RESOURCE USE:
>> --------------------------------------------------------------------------------
>>
>> getting new box info from bottom of inpcrd
>>
>> NATOM = 36356 NTYPES = 19 NBONH = 33899 MBONA = 2451
>> NTHETH = 5199 MTHETA = 3321 NPHIH = 10329 MPHIA = 8468
>> NHPARM = 0 NPARM = 0 NNB = 67990 NRES = 10898
>> NBONA = 2451 NTHETA = 3321 NPHIA = 8468 NUMBND = 61
>> NUMANG = 120 NPTRA = 71 NATYP = 45 NPHB = 1
>> IFBOX = 1 NMXRS = 24 IFCAP = 0 NEXTRA = 0
>> NCOPY = 0
>>
>> | Coordinate Index Table dimensions: 12 12 11
>> | Direct force subcell size = 6.0700 6.1089 6.1673
>>
>> BOX TYPE: RECTILINEAR
>>
>> --------------------------------------------------------------------------------
>> 2. CONTROL DATA FOR THE RUN
>> --------------------------------------------------------------------------------
>>
>> default_name
>>
>> General flags:
>> imin = 0, nmropt = 0
>>
>> Nature and format of input:
>> ntx = 5, irest = 1, ntrx = 1
>>
>> Nature and format of output:
>> ntxo = 1, ntpr = 5000, ntrx = 1, ntwr = 50000
>> iwrap = 0, ntwx = 5000, ntwv = 0, ntwe = 0
>> ioutfm = 0, ntwprt = 0, idecomp = 0, rbornstat= 0
>>
>> Potential function:
>> ntf = 2, ntb = 2, igb = 0, nsnb = 25
>> ipol = 0, gbsa = 0, iesp = 0
>> dielc = 1.00000, cut = 10.00000, intdiel = 1.00000
>>
>> Frozen or restrained atoms:
>> ibelly = 0, ntr = 0
>>
>> Molecular dynamics:
>> nstlim = 10000000, nscm = 1000, nrespa = 1
>> t = 0.00000, dt = 0.00200, vlimit = -1.00000
>>
>> Langevin dynamics temperature regulation:
>> ig = 71277
>> temp0 = 300.00000, tempi = 0.00000, gamma_ln= 2.00000
>>
>> Pressure regulation:
>> ntp = 1
>> pres0 = 1.00000, comp = 44.60000, taup = 2.00000
>>
>> SHAKE:
>> ntc = 2, jfastw = 0
>> tol = 0.00001
>>
>> | Intermolecular bonds treatment:
>> | no_intermolecular_bonds = 1
>>
>> | Energy averages sample interval:
>> | ene_avg_sampling = 5000
>>
>> Ewald parameters:
>> verbose = 0, ew_type = 0, nbflag = 1, use_pme = 1
>> vdwmeth = 1, eedmeth = 1, netfrc = 1
>> Box X = 72.839 Box Y = 73.307 Box Z = 67.840
>> Alpha = 90.000 Beta = 90.000 Gamma = 90.000
>> NFFT1 = 80 NFFT2 = 80 NFFT3 = 64
>> Cutoff= 10.000 Tol =0.100E-04
>> Ewald Coefficient = 0.27511
>> Interpolation order = 4
>>
>> | PMEMD ewald parallel performance parameters:
>> | block_fft = 0
>> | fft_blk_y_divisor = 2
>> | excl_recip = 0
>> | excl_master = 0
>> | atm_redist_freq = 320
>>
>> --------------------------------------------------------------------------------
>> 3. ATOMIC COORDINATES AND VELOCITIES
>> --------------------------------------------------------------------------------
>>
>> default_name
>> begin time read from input coords = 1300.000 ps
>>
>>
>> Number of triangulated 3-point waters found: 10538
>>
>> Sum of charges from parm topology file = -0.00000015
>> Forcing neutrality...
>>
>>
>>
>> --------------logfile--------------------------------
>> FFT slabs assigned to 1 tasks
>> Maximum of 64 xy slabs per task
>> Maximum of 80 zx slabs per task
>> Count of FFT xy slabs assigned to each task:
>> 0 64
>> Count of FFT xz slabs assigned to each task:
>> 0 80
>>
>>
>> -----------terminal--------------log---------------------
>> Image              PC                Routine            Line        Source
>> pmemd.cuda.MPI     000000000057E4BD  Unknown            Unknown     Unknown
>> pmemd.cuda.MPI     000000000057EB62  Unknown            Unknown     Unknown
>> pmemd.cuda.MPI     0000000000555DF5  Unknown            Unknown     Unknown
>> pmemd.cuda.MPI     000000000051D5F2  Unknown            Unknown     Unknown
>> pmemd.cuda.MPI     00000000004F901E  Unknown            Unknown     Unknown
>> pmemd.cuda.MPI     00000000004057FC  Unknown            Unknown     Unknown
>> libc.so.6          00002B98685A8BFD  Unknown            Unknown     Unknown
>> pmemd.cuda.MPI     00000000004056F9  Unknown            Unknown     Unknown
>> forrtl: severe (71): integer divide by zero
>> Image              PC                Routine            Line        Source
>> pmemd.cuda.MPI     000000000057E4BD  Unknown            Unknown     Unknown
>> pmemd.cuda.MPI     000000000057EB62  Unknown            Unknown     Unknown
>> pmemd.cuda.MPI     0000000000555DF5  Unknown            Unknown     Unknown
>> pmemd.cuda.MPI     000000000051D5F2  Unknown            Unknown     Unknown
>> pmemd.cuda.MPI     00000000004F901E  Unknown            Unknown     Unknown
>> pmemd.cuda.MPI     00000000004057FC  Unknown            Unknown     Unknown
>> libc.so.6          00002AF6CA4B0BFD  Unknown            Unknown     Unknown
>> pmemd.cuda.MPI     00000000004056F9  Unknown            Unknown     Unknown
>>
>>
>>
>>
>> On 05/08/2012 06:57 PM, Ross Walker wrote:
>>> Hi Albert,
>>>
>>> Which version of Intel and which version of Mpich2 and which version of
>>> nvcc?
>>>
>>> It works fine for me with Intel 11.1.069 and mpich2-1.4. Also does the CPU
>>> version work fine with your intel and mpich2?
>>>
>>> Please please please make sure you isolate errors to the GPU code before
>>> reporting them. I.e. ALWAYS test the cpu codes thoroughly first.
>>>
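>>> For example (just a sketch - the md_cpu* output names are placeholders), a
>>> short CPU check with your own input files, e.g. with a much smaller nstlim,
>>> would be something like:
>>>
>>> $AMBERHOME/bin/pmemd -O -i md.in -p bm.prmtop -c npt.rst -o md_cpu.out
>>> mpiexec -np 2 $AMBERHOME/bin/pmemd.MPI -O -i md.in -p bm.prmtop -c npt.rst -o md_cpu_mpi.out
>>>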
>>> I assume the serial GPU code works fine with Intel? - Also note that you
>>> will see little to no performance improvement with the Intel compilers over
>>> GNU for the GPU code.
>>>
>>> All the best
>>> Ross


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue May 08 2012 - 11:00:02 PDT