Hi Ross,
Thanks a lot for the prompt reply.
I use Intel® Parallel Studio XE 2011 for Linux
<https://registrationcenter.intel.com/RegCenter/NComForm.aspx?ProductID=1540>
and mpich2-1.4.1. The MPI build goes well with the Intel compiler, but neither
the serial nor the MPI CUDA code works, although I don't get any errors when
I compile them.
It doesn't work when I try to submit the job. Yes, I posted the errors in a
previous thread, and here they are again. I launch the run with:
mpiexec -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -i md.in -p bm.prmtop -c
npt.rst -o md.out -r md.rst -x md.mdcrd &
but it fails with the following output:
--------md.out-----------
-------------------------------------------------------
Amber 12 SANDER 2012
-------------------------------------------------------
| PMEMD implementation of SANDER, Release 12
| Run on 05/07/2012 at 22:26:15
File Assignments:
| MDIN: md.in
| MDOUT: md.out
| INPCRD: npt.rst
| PARM: bm.prmtop
| RESTRT: md.rst
| REFC: refc
| MDVEL: mdvel
| MDEN: mden
| MDCRD: md.mdcrd
| MDINFO: mdinfo
|LOGFILE: logfile
Here is the input file:
production dynamics
&cntrl
imin=0, irest=1, ntx=5,
nstlim=10000000, dt=0.002,
ntc=2, ntf=2,
cut=10.0, ntb=2, ntp=1, taup=2.0,
ntpr=5000, ntwx=5000, ntwr=50000,
ntt=3, gamma_ln=2.0,
temp0=300.0,
/
|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
| Version 12.0
|
| 03/19/2012
|
| Implementation by:
| Ross C. Walker (SDSC)
| Scott Le Grand (nVIDIA)
| Duncan Poole (nVIDIA)
|
| CAUTION: The CUDA code is currently experimental.
| You use it at your own risk. Be sure to
| check ALL results carefully.
|
| Precision model in use:
| [SPDP] - Hybrid Single/Double Precision (Default).
|
|--------------------------------------------------------
|------------------- GPU DEVICE INFO --------------------
|
| Task ID: 0
| CUDA Capable Devices Detected: 4
| CUDA Device ID in use: 0
| CUDA Device Name: GeForce GTX 590
| CUDA Device Global Mem Size: 1535 MB
| CUDA Device Num Multiprocessors: 0
| CUDA Device Core Freq: 1.22 GHz
|
|
| Task ID: 1
| CUDA Capable Devices Detected: 4
| CUDA Device ID in use: 1
| CUDA Device Name: GeForce GTX 590
| CUDA Device Global Mem Size: 1535 MB
| CUDA Device Num Multiprocessors: 0
| CUDA Device Core Freq: 1.22 GHz
|
|--------------------------------------------------------
| Conditional Compilation Defines Used:
| DIRFRC_COMTRANS
| DIRFRC_EFS
| DIRFRC_NOVEC
| MPI
| PUBFFT
| FFTLOADBAL_2PROC
| BINTRAJ
| MKL
| CUDA
| Largest sphere to fit in unit cell has radius = 33.920
| New format PARM file being parsed.
| Version = 1.000 Date = 05/02/12 Time = 13:49:08
| Note: 1-4 EEL scale factors are being read from the topology file.
| Note: 1-4 VDW scale factors are being read from the topology file.
| Duplicated 0 dihedrals
| Duplicated 0 dihedrals
--------------------------------------------------------------------------------
1. RESOURCE USE:
--------------------------------------------------------------------------------
getting new box info from bottom of inpcrd
NATOM = 36356 NTYPES = 19 NBONH = 33899 MBONA = 2451
NTHETH = 5199 MTHETA = 3321 NPHIH = 10329 MPHIA = 8468
NHPARM = 0 NPARM = 0 NNB = 67990 NRES = 10898
NBONA = 2451 NTHETA = 3321 NPHIA = 8468 NUMBND = 61
NUMANG = 120 NPTRA = 71 NATYP = 45 NPHB = 1
IFBOX = 1 NMXRS = 24 IFCAP = 0 NEXTRA = 0
NCOPY = 0
| Coordinate Index Table dimensions: 12 12 11
| Direct force subcell size = 6.0700 6.1089 6.1673
BOX TYPE: RECTILINEAR
--------------------------------------------------------------------------------
2. CONTROL DATA FOR THE RUN
--------------------------------------------------------------------------------
default_name
General flags:
imin = 0, nmropt = 0
Nature and format of input:
ntx = 5, irest = 1, ntrx = 1
Nature and format of output:
ntxo = 1, ntpr = 5000, ntrx = 1, ntwr = 50000
iwrap = 0, ntwx = 5000, ntwv = 0, ntwe = 0
ioutfm = 0, ntwprt = 0, idecomp = 0, rbornstat= 0
Potential function:
ntf = 2, ntb = 2, igb = 0, nsnb = 25
ipol = 0, gbsa = 0, iesp = 0
dielc = 1.00000, cut = 10.00000, intdiel = 1.00000
Frozen or restrained atoms:
ibelly = 0, ntr = 0
Molecular dynamics:
nstlim = 10000000, nscm = 1000, nrespa = 1
t = 0.00000, dt = 0.00200, vlimit = -1.00000
Langevin dynamics temperature regulation:
ig = 71277
temp0 = 300.00000, tempi = 0.00000, gamma_ln= 2.00000
Pressure regulation:
ntp = 1
pres0 = 1.00000, comp = 44.60000, taup = 2.00000
SHAKE:
ntc = 2, jfastw = 0
tol = 0.00001
| Intermolecular bonds treatment:
| no_intermolecular_bonds = 1
| Energy averages sample interval:
| ene_avg_sampling = 5000
Ewald parameters:
verbose = 0, ew_type = 0, nbflag = 1, use_pme = 1
vdwmeth = 1, eedmeth = 1, netfrc = 1
Box X = 72.839 Box Y = 73.307 Box Z = 67.840
Alpha = 90.000 Beta = 90.000 Gamma = 90.000
NFFT1 = 80 NFFT2 = 80 NFFT3 = 64
Cutoff= 10.000 Tol =0.100E-04
Ewald Coefficient = 0.27511
Interpolation order = 4
| PMEMD ewald parallel performance parameters:
| block_fft = 0
| fft_blk_y_divisor = 2
| excl_recip = 0
| excl_master = 0
| atm_redist_freq = 320
--------------------------------------------------------------------------------
3. ATOMIC COORDINATES AND VELOCITIES
--------------------------------------------------------------------------------
default_name
begin time read from input coords = 1300.000 ps
Number of triangulated 3-point waters found: 10538
Sum of charges from parm topology file = -0.00000015
Forcing neutrality...
--------------logfile--------------------------------
FFT slabs assigned to 1 tasks
Maximum of 64 xy slabs per task
Maximum of 80 zx slabs per task
Count of FFT xy slabs assigned to each task:
0 64
Count of FFT xz slabs assigned to each task:
0 80
-----------terminal log---------------------
Image PC Routine Line Source
pmemd.cuda.MPI 000000000057E4BD Unknown Unknown Unknown
pmemd.cuda.MPI 000000000057EB62 Unknown Unknown Unknown
pmemd.cuda.MPI 0000000000555DF5 Unknown Unknown Unknown
pmemd.cuda.MPI 000000000051D5F2 Unknown Unknown Unknown
pmemd.cuda.MPI 00000000004F901E Unknown Unknown Unknown
pmemd.cuda.MPI 00000000004057FC Unknown Unknown Unknown
libc.so.6 00002B98685A8BFD Unknown Unknown Unknown
pmemd.cuda.MPI 00000000004056F9 Unknown Unknown Unknown
forrtl: severe (71): integer divide by zero
Image PC Routine Line Source
pmemd.cuda.MPI 000000000057E4BD Unknown Unknown Unknown
pmemd.cuda.MPI 000000000057EB62 Unknown Unknown Unknown
pmemd.cuda.MPI 0000000000555DF5 Unknown Unknown Unknown
pmemd.cuda.MPI 000000000051D5F2 Unknown Unknown Unknown
pmemd.cuda.MPI 00000000004F901E Unknown Unknown Unknown
pmemd.cuda.MPI 00000000004057FC Unknown Unknown Unknown
libc.so.6 00002AF6CA4B0BFD Unknown Unknown Unknown
pmemd.cuda.MPI 00000000004056F9 Unknown Unknown Unknown
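One thing that looks odd in the md.out above is that both tasks report
"CUDA Device Num Multiprocessors: 0" for the GTX 590, which may or may not be
related to the integer divide by zero. To cross-check what the CUDA runtime
actually reports to each MPI task, independent of AMBER, a small standalone
probe along these lines could help. This is only a sketch: the file name,
build line, and paths are assumptions; the API calls are the standard MPI and
CUDA runtime ones.

/* Hypothetical standalone probe (not part of AMBER): each MPI task prints
 * what the CUDA runtime reports for every visible device.  Build with
 * something like
 *   mpicc gpu_probe.c -I$CUDA_HOME/include -L$CUDA_HOME/lib64 -lcudart -o gpu_probe
 * (paths are assumptions) and run with the same launcher as above, e.g.
 *   mpiexec -np 2 ./gpu_probe
 */
#include <stdio.h>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    int rank = 0, ntask = 1, ndev = 0, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ntask);

    if (cudaGetDeviceCount(&ndev) != cudaSuccess) {
        fprintf(stderr, "task %d: cudaGetDeviceCount failed\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    for (i = 0; i < ndev; i++) {
        struct cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess) {
            fprintf(stderr, "task %d: cudaGetDeviceProperties(%d) failed\n",
                    rank, i);
            continue;
        }
        /* multiProcessorCount should be nonzero for a working board
         * (a GTX 590 has two GPU dies, each with its own SM count). */
        printf("task %d/%d  device %d: %s  SMs=%d  globalMem=%lu MB\n",
               rank, ntask, i, prop.name, prop.multiProcessorCount,
               (unsigned long)(prop.totalGlobalMem >> 20));
    }

    MPI_Finalize();
    return 0;
}

If this probe also prints 0 for the multiprocessor count, that would point at
the driver/toolkit setup rather than at pmemd.cuda.MPI itself; nonzero counts
would suggest the problem sits in the Intel + mpich2 + CUDA build of pmemd.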
On 05/08/2012 06:57 PM, Ross Walker wrote:
> Hi Albert,
>
> Which version of Intel and which version of Mpich2 and which version of
> nvcc?
>
> It works fine for me with Intel 11.1.069 and mpich2-1.4. Also does the CPU
> version work fine with your intel and mpich2?
>
> Please please please make sure you isolate errors to the GPU code before
> reporting them. I.e. ALWAYS test the cpu codes thoroughly first.
>
> I assume the serial GPU code works fine with Intel? - Also note that you
> will see little to no performance improvement with the Intel compilers over
> GNU for the GPU code.
>
> All the best
> Ross
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber