Re: [AMBER] Cuda test hangs with Amber12 compiled with CUDA5

From: Ismail, Mohd F. <farid.ou.edu>
Date: Fri, 19 Oct 2012 18:12:59 +0000

I think Scott mentioned just yesterday that CUDA 5 is not supported yet. Try searching the list archive.
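
In case it helps when comparing builds, here is a rough Python sketch for pulling the CUDA library versions out of `ldd` output like the listings quoted below. The function names are my own (hypothetical), and it only looks for the three CUDA libraries that appear in those listings (libcudart, libcufft, libcurand), so treat it as a starting point rather than a definitive check.

```python
import re
import subprocess

# Matches lines such as:
#   libcudart.so.5.0 => /usr/local/cuda/lib64/libcudart.so.5.0
# Only the three CUDA libraries seen in the quoted ldd listings are considered.
_CUDA_LIB = re.compile(r"^\s*(lib(?:cudart|cufft|curand))\.so\.([\d.]+)\s*=>",
                       re.MULTILINE)


def cuda_versions_from_ldd(ldd_output):
    """Return {library name: version string} parsed from ldd output text."""
    return {name: version for name, version in _CUDA_LIB.findall(ldd_output)}


def linked_cuda_versions(binary_path):
    """Run ldd on a binary and parse the result (assumes ldd is on PATH)."""
    result = subprocess.run(["ldd", binary_path], capture_output=True,
                            text=True, check=True)
    return cuda_versions_from_ldd(result.stdout)


# Example using two lines copied from the CUDA 5 build quoted below.
sample = """
     libcurand.so.5.0 => /usr/local/cuda/lib64/libcurand.so.5.0
     libcudart.so.5.0 => /usr/local/cuda/lib64/libcudart.so.5.0
"""
versions = cuda_versions_from_ldd(sample)
```

Calling `linked_cuda_versions("pmemd.cuda")` on each build should then show at a glance whether a binary was linked against the 5.0 or the 4.x runtime.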

*******************************
Mohd Ismail
Graduate Student
Dept. of Chemistry/Biochemistry
University of Oklahoma
Norman 73019

________________________________________
From: Jean-Christophe Ducom [jcducom.scripps.edu]
Sent: Friday, October 19, 2012 12:40 PM
To: amber.ambermd.org
Subject: [AMBER] Cuda test hangs with Amber12 compiled with CUDA5

Hi-
We have updated our GPU nodes to CUDA 5, and some Amber12 tests are hanging.
The system is running openSUSE 11.3 on 16 cores (E5-2650 0 @ 2.00GHz) with
86GB of memory. The GPU card is an NVIDIA Tesla M2090 (/usr/bin/nvidia-smi
-pm 1 -c 3 --ecc-config=0).

Amber12 (patched) compiled with CUDA5:
-----------------------------------------------------------------
> ldd pmemd.cuda
     linux-vdso.so.1 => (0x00007fff571ff000)
     libcurand.so.5.0 => /usr/local/cuda/lib64/libcurand.so.5.0 (0x00007f526aa0e000)
     libcufft.so.5.0 => /usr/local/cuda/lib64/libcufft.so.5.0 (0x00007f5268a31000)
     libcudart.so.5.0 => /usr/local/cuda/lib64/libcudart.so.5.0 (0x00007f52687d5000)
     libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007f52684ef000)
     libm.so.6 => /lib64/libm.so.6 (0x00007f5268298000)
     libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f5268082000)
     libc.so.6 => /lib64/libc.so.6 (0x00007f5267d22000)
     libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f5267a18000)
     libdl.so.2 => /lib64/libdl.so.2 (0x00007f5267814000)
     libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f52675f7000)
     librt.so.1 => /lib64/librt.so.1 (0x00007f52673ee000)
     /lib64/ld-linux-x86-64.so.2 (0x00007f526c933000)

>make test.cuda
[...]
---------------------------------------------
Running Extended CUDA Explicit solvent tests.
       Precision Model = SPFP
---------------------------------------------
cd 4096wat/ && ./Run.pure_wat SPFP
/opt/applications/amber/12/gnu/include/netcdf.mod

It hangs there forever.

> cat mdout.pure_wat

           -------------------------------------------------------
           Amber 12 SANDER 2012
           -------------------------------------------------------
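
A hang like this can be made easier to spot by running the test under a time limit. The following is only a minimal Python sketch: the helper name `run_with_timeout` is hypothetical, and the 600-second limit in the usage comment is an arbitrary assumption, not a value taken from the Amber test suite.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s, cwd=None):
    """Run cmd and return ("ok", returncode), or ("hung", None) on timeout.

    Note: subprocess.run kills only the direct child when the timeout
    expires. A wrapper script may leave the program it launched (for
    example a GPU binary) still running, so that may need manual cleanup.
    """
    try:
        done = subprocess.run(cmd, cwd=cwd, timeout=timeout_s,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return ("hung", None)
    return ("ok", done.returncode)


# Hypothetical usage for the test above (the time limit is a guess):
#   run_with_timeout(["./Run.pure_wat", "SPFP"], 600, cwd="4096wat")

# Self-contained demonstration with a child process that sleeps too long.
demo = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(5)"], 0.5)
```

Output is discarded here to keep the sketch short; redirecting it to a file instead would preserve whatever the test printed before it stalled.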

However, the GPU card seems to be working:
# nvidia-smi -a

==============NVSMI LOG==============

Timestamp : Fri Oct 19 10:04:28 2012
Driver Version : 304.54

Attached GPUs : 1
GPU 0000:42:00.0
     Product Name : Tesla M2090
     Display Mode : Disabled
     Persistence Mode : Disabled
     Driver Model
         Current : N/A
         Pending : N/A
     Serial Number : 0321412003326
     GPU UUID : GPU-8ba67388-783d-ca37-1d16-286dc5764189
     VBIOS Version : 70.10.46.00.01
     Inforom Version
         Image Version : N/A
         OEM Object : 1.1
         ECC Object : 2.0
         Power Management Object : 4.0
     GPU Operation Mode
         Current : N/A
         Pending : N/A
     PCI
         Bus : 0x42
         Device : 0x00
         Domain : 0x0000
         Device Id : 0x109110DE
         Bus Id : 0000:42:00.0
         Sub System Id : 0x088710DE
         GPU Link Info
             PCIe Generation
                 Max : 2
                 Current : 2
             Link Width
                 Max : 16x
                 Current : 16x
     Fan Speed : N/A
     Performance State : P0
     Clocks Throttle Reasons : N/A
     Memory Usage
         Total : 6143 MB
         Used : 115 MB
         Free : 6028 MB
     Compute Mode : Default
     Utilization
         Gpu : 99 %
         Memory : 0 %
     Ecc Mode
         Current : Disabled
         Pending : Disabled
     ECC Errors
         Volatile
             Single Bit
                 Device Memory : N/A
                 Register File : N/A
                 L1 Cache : N/A
                 L2 Cache : N/A
                 Texture Memory : N/A
                 Total : N/A
             Double Bit
                 Device Memory : N/A
                 Register File : N/A
                 L1 Cache : N/A
                 L2 Cache : N/A
                 Texture Memory : N/A
                 Total : N/A
         Aggregate
             Single Bit
                 Device Memory : N/A
                 Register File : N/A
                 L1 Cache : N/A
                 L2 Cache : N/A
                 Texture Memory : N/A
                 Total : N/A
             Double Bit
                 Device Memory : N/A
                 Register File : N/A
                 L1 Cache : N/A
                 L2 Cache : N/A
                 Texture Memory : N/A
                 Total : N/A
     Temperature
         Gpu : N/A
     Power Readings
         Power Management : Supported
         Power Draw : 95.12 W
         Power Limit : 225.00 W
         Default Power Limit : N/A
         Min Power Limit : N/A
         Max Power Limit : N/A
     Clocks
         Graphics : 650 MHz
         SM : 1301 MHz
         Memory : 1848 MHz
     Applications Clocks
         Graphics : N/A
         Memory : N/A
     Max Clocks
         Graphics : 650 MHz
         SM : 1301 MHz
         Memory : 1848 MHz
     Compute Processes
         Process ID : 6829
             Name : ../../../bin/pmemd.cuda_SPFP
             Used GPU Memory : 101 MB


Amber12 (patched) compiled with CUDA4.2:
--------------------------------------------------------------------
> ldd pmemd.cuda
     linux-vdso.so.1 => (0x00007ffff31ff000)
     libcurand.so.4 => /usr/local/cuda/lib64/libcurand.so.4 (0x00007fb49375e000)
     libcufft.so.4 => /usr/local/cuda/lib64/libcufft.so.4 (0x00007fb491726000)
     libcudart.so.4 => /usr/local/cuda/lib64/libcudart.so.4 (0x00007fb4914cb000)
     libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007fb4911e5000)
     libm.so.6 => /lib64/libm.so.6 (0x00007fb490f8e000)
     libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fb490d78000)
     libc.so.6 => /lib64/libc.so.6 (0x00007fb490a18000)
     libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fb49070e000)
     libdl.so.2 => /lib64/libdl.so.2 (0x00007fb49050a000)
     libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fb4902ed000)
     librt.so.1 => /lib64/librt.so.1 (0x00007fb4900e4000)
     /lib64/ld-linux-x86-64.so.2 (0x00007fb495870000)


>make test.cuda
[...]
---------------------------------------------
Running Extended CUDA Explicit solvent tests.
       Precision Model = SPFP
---------------------------------------------
cd 4096wat/ && ./Run.pure_wat SPFP
/opt/applications/amber/12/gnu/include/netcdf.mod
diffing mdout.pure_wat.GPU_SPFP with mdout.pure_wat
PASSED
[...]
All the tests passed successfully

Any ideas?
Let me know if you need additional info.
Best,
JC



_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

Received on Fri Oct 19 2012 - 11:30:03 PDT