Re: [AMBER] Cuda test hangs with Amber12 compiled with CUDA5

From: Ross Walker <ross.rosswalker.co.uk>
Date: Fri, 19 Oct 2012 11:19:59 -0700

CUDA 5.0 is not currently supported. Thanks NVIDIA :(

Stick with 4.2 until we get a patch out.

All the best
Ross
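
For anyone following this advice, the two ldd listings quoted below show which CUDA runtime each pmemd.cuda build picked up (libcudart.so.5.0 versus libcudart.so.4). Below is a minimal sketch that pulls that version out of ldd output, so a CUDA 5.0 build can be spotted before running the tests. It assumes a POSIX shell and glibc-style ldd output. The function names are hypothetical and not part of Amber.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Amber): report which CUDA runtime
# (libcudart) an executable is linked against, by parsing `ldd` output.

# Read ldd-style text on stdin and print the libcudart soname suffix
# (e.g. "5.0" or "4"). Prints nothing if libcudart is not listed.
cudart_version() {
  sed -n 's/^[[:space:]]*libcudart\.so\.\([0-9][0-9.]*\)[[:space:]].*/\1/p' | head -n 1
}

# Classify the runtime found on stdin. Per the advice in this thread
# (Oct 2012), CUDA 5.x builds of Amber12 are flagged as unsupported.
# Returns 0 = other version, 1 = CUDA 5.x, 2 = libcudart not found.
check_cudart() {
  ver=$(cudart_version)
  case "$ver" in
    "") echo "libcudart not found in ldd output"; return 2 ;;
    5*) echo "linked against CUDA $ver: not supported yet, rebuild with 4.2"; return 1 ;;
    *)  echo "linked against CUDA $ver"; return 0 ;;
  esac
}
```

Usage, assuming $AMBERHOME points at the Amber12 tree: `ldd "$AMBERHOME/bin/pmemd.cuda" | check_cudart`. Parsing ldd output keeps the check independent of how the binary was built. It only reports what the dynamic loader resolves at run time.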


On 10/19/12 10:40 AM, "Jean-Christophe Ducom" <jcducom.scripps.edu> wrote:

>Hi-
>We have updated our GPU nodes to CUDA 5, and some Amber12 tests are
>hanging.
>The system is running openSUSE 11.3 on 16 cores (E5-2650 0 @ 2.00GHz) with
>86GB of memory. The GPU card is an NVIDIA Tesla M2090 (/usr/bin/nvidia-smi
>-pm 1 -c 3 --ecc-config=0).
>
>Amber12 (patched) compiled with CUDA5:
>-----------------------------------------------------------------
> > ldd pmemd.cuda
> linux-vdso.so.1 => (0x00007fff571ff000)
> libcurand.so.5.0 => /usr/local/cuda/lib64/libcurand.so.5.0 (0x00007f526aa0e000)
> libcufft.so.5.0 => /usr/local/cuda/lib64/libcufft.so.5.0 (0x00007f5268a31000)
> libcudart.so.5.0 => /usr/local/cuda/lib64/libcudart.so.5.0 (0x00007f52687d5000)
> libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007f52684ef000)
> libm.so.6 => /lib64/libm.so.6 (0x00007f5268298000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f5268082000)
> libc.so.6 => /lib64/libc.so.6 (0x00007f5267d22000)
> libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f5267a18000)
> libdl.so.2 => /lib64/libdl.so.2 (0x00007f5267814000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f52675f7000)
> librt.so.1 => /lib64/librt.so.1 (0x00007f52673ee000)
> /lib64/ld-linux-x86-64.so.2 (0x00007f526c933000)
>
> >make test.cuda
>[...]
>---------------------------------------------
>Running Extended CUDA Explicit solvent tests.
> Precision Model = SPFP
>---------------------------------------------
>cd 4096wat/ && ./Run.pure_wat SPFP
>/opt/applications/amber/12/gnu/include/netcdf.mod
>
>It hangs there forever.
>
> > cat mdout.pure_wat
>
> -------------------------------------------------------
> Amber 12 SANDER 2012
> -------------------------------------------------------
>
>However the GPU card seems to be working
># nvidia-smi -a
>
>==============NVSMI LOG==============
>
>Timestamp : Fri Oct 19 10:04:28 2012
>Driver Version : 304.54
>
>Attached GPUs : 1
>GPU 0000:42:00.0
> Product Name : Tesla M2090
> Display Mode : Disabled
> Persistence Mode : Disabled
> Driver Model
> Current : N/A
> Pending : N/A
> Serial Number : 0321412003326
> GPU UUID : GPU-8ba67388-783d-ca37-1d16-286dc5764189
> VBIOS Version : 70.10.46.00.01
> Inforom Version
> Image Version : N/A
> OEM Object : 1.1
> ECC Object : 2.0
> Power Management Object : 4.0
> GPU Operation Mode
> Current : N/A
> Pending : N/A
> PCI
> Bus : 0x42
> Device : 0x00
> Domain : 0x0000
> Device Id : 0x109110DE
> Bus Id : 0000:42:00.0
> Sub System Id : 0x088710DE
> GPU Link Info
> PCIe Generation
> Max : 2
> Current : 2
> Link Width
> Max : 16x
> Current : 16x
> Fan Speed : N/A
> Performance State : P0
> Clocks Throttle Reasons : N/A
> Memory Usage
> Total : 6143 MB
> Used : 115 MB
> Free : 6028 MB
> Compute Mode : Default
> Utilization
> Gpu : 99 %
> Memory : 0 %
> Ecc Mode
> Current : Disabled
> Pending : Disabled
> ECC Errors
> Volatile
> Single Bit
> Device Memory : N/A
> Register File : N/A
> L1 Cache : N/A
> L2 Cache : N/A
> Texture Memory : N/A
> Total : N/A
> Double Bit
> Device Memory : N/A
> Register File : N/A
> L1 Cache : N/A
> L2 Cache : N/A
> Texture Memory : N/A
> Total : N/A
> Aggregate
> Single Bit
> Device Memory : N/A
> Register File : N/A
> L1 Cache : N/A
> L2 Cache : N/A
> Texture Memory : N/A
> Total : N/A
> Double Bit
> Device Memory : N/A
> Register File : N/A
> L1 Cache : N/A
> L2 Cache : N/A
> Texture Memory : N/A
> Total : N/A
> Temperature
> Gpu : N/A
> Power Readings
> Power Management : Supported
> Power Draw : 95.12 W
> Power Limit : 225.00 W
> Default Power Limit : N/A
> Min Power Limit : N/A
> Max Power Limit : N/A
> Clocks
> Graphics : 650 MHz
> SM : 1301 MHz
> Memory : 1848 MHz
> Applications Clocks
> Graphics : N/A
> Memory : N/A
> Max Clocks
> Graphics : 650 MHz
> SM : 1301 MHz
> Memory : 1848 MHz
> Compute Processes
> Process ID : 6829
> Name : ../../../bin/pmemd.cuda_SPFP
> Used GPU Memory : 101 MB
>
>
>Amber12 (patched) compiled with CUDA4.2:
>--------------------------------------------------------------------
> > ldd pmemd.cuda
> linux-vdso.so.1 => (0x00007ffff31ff000)
> libcurand.so.4 => /usr/local/cuda/lib64/libcurand.so.4 (0x00007fb49375e000)
> libcufft.so.4 => /usr/local/cuda/lib64/libcufft.so.4 (0x00007fb491726000)
> libcudart.so.4 => /usr/local/cuda/lib64/libcudart.so.4 (0x00007fb4914cb000)
> libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007fb4911e5000)
> libm.so.6 => /lib64/libm.so.6 (0x00007fb490f8e000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fb490d78000)
> libc.so.6 => /lib64/libc.so.6 (0x00007fb490a18000)
> libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fb49070e000)
> libdl.so.2 => /lib64/libdl.so.2 (0x00007fb49050a000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fb4902ed000)
> librt.so.1 => /lib64/librt.so.1 (0x00007fb4900e4000)
> /lib64/ld-linux-x86-64.so.2 (0x00007fb495870000)
>
>
> >make test.cuda
>[...]
>---------------------------------------------
>Running Extended CUDA Explicit solvent tests.
> Precision Model = SPFP
>---------------------------------------------
>cd 4096wat/ && ./Run.pure_wat SPFP
>/opt/applications/amber/12/gnu/include/netcdf.mod
>diffing mdout.pure_wat.GPU_SPFP with mdout.pure_wat
>PASSED
>[...]
>All the tests passed successfully
>
>Any ideas?
>Let me know if you need additional info.
>Best,
>JC
>
>
>
>_______________________________________________
>AMBER mailing list
>AMBER.ambermd.org
>http://lists.ambermd.org/mailman/listinfo/amber
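
The CUDA 5.0 build does not fail in 4096wat/Run.pure_wat. It simply never returns. When bisecting a setup like this, it can help to run individual tests under a time limit so that a hang is reported rather than blocking `make test.cuda` indefinitely. Below is a minimal sketch that assumes GNU coreutils `timeout`, which exits with status 124 when the limit expires. `run_with_limit` is a hypothetical helper, and the limit in the usage line is an arbitrary example.

```shell
#!/bin/sh
# Hypothetical helper: run a command under a wall-clock limit and report a
# timeout explicitly. Requires GNU coreutils `timeout`, which exits with
# status 124 when the limit expires; other statuses are passed through.
run_with_limit() {
  limit=$1
  shift
  rc=0
  timeout "$limit" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "TIMEOUT after $limit: $*" >&2
  fi
  return "$rc"
}
```

Example (limit in seconds, chosen arbitrarily): `cd 4096wat/ && run_with_limit 600 ./Run.pure_wat SPFP`. A status of 124 then distinguishes a hang from an ordinary test failure.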



Received on Fri Oct 19 2012 - 11:30:05 PDT