Hi-
We have updated our GPU nodes to CUDA5 and some Amber12 tests are hanging.
The system is running openSUSE 11.3 on 16 cores (E5-2650 0 @ 2.00GHz) with
86GB of memory. The GPU card is an NVIDIA Tesla M2090 (/usr/bin/nvidia-smi
-pm 1 -c 3 --ecc-config=0).
Amber12 (patched) compiled with CUDA5:
-----------------------------------------------------------------
> ldd pmemd.cuda
linux-vdso.so.1 => (0x00007fff571ff000)
libcurand.so.5.0 => /usr/local/cuda/lib64/libcurand.so.5.0 (0x00007f526aa0e000)
libcufft.so.5.0 => /usr/local/cuda/lib64/libcufft.so.5.0 (0x00007f5268a31000)
libcudart.so.5.0 => /usr/local/cuda/lib64/libcudart.so.5.0 (0x00007f52687d5000)
libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007f52684ef000)
libm.so.6 => /lib64/libm.so.6 (0x00007f5268298000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f5268082000)
libc.so.6 => /lib64/libc.so.6 (0x00007f5267d22000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f5267a18000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f5267814000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f52675f7000)
librt.so.1 => /lib64/librt.so.1 (0x00007f52673ee000)
/lib64/ld-linux-x86-64.so.2 (0x00007f526c933000)
>make test.cuda
[...]
---------------------------------------------
Running Extended CUDA Explicit solvent tests.
Precision Model = SPFP
---------------------------------------------
cd 4096wat/ && ./Run.pure_wat SPFP
/opt/applications/amber/12/gnu/include/netcdf.mod
It hangs there forever, and mdout.pure_wat contains only the banner:
> cat mdout.pure_wat
-------------------------------------------------------
Amber 12 SANDER 2012
-------------------------------------------------------
However, the GPU card seems to be working:
# nvidia-smi -a
==============NVSMI LOG==============
Timestamp : Fri Oct 19 10:04:28 2012
Driver Version : 304.54
Attached GPUs : 1
GPU 0000:42:00.0
Product Name : Tesla M2090
Display Mode : Disabled
Persistence Mode : Disabled
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0321412003326
GPU UUID : GPU-8ba67388-783d-ca37-1d16-286dc5764189
VBIOS Version : 70.10.46.00.01
Inforom Version
Image Version : N/A
OEM Object : 1.1
ECC Object : 2.0
Power Management Object : 4.0
GPU Operation Mode
Current : N/A
Pending : N/A
PCI
Bus : 0x42
Device : 0x00
Domain : 0x0000
Device Id : 0x109110DE
Bus Id : 0000:42:00.0
Sub System Id : 0x088710DE
GPU Link Info
PCIe Generation
Max : 2
Current : 2
Link Width
Max : 16x
Current : 16x
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons : N/A
Memory Usage
Total : 6143 MB
Used : 115 MB
Free : 6028 MB
Compute Mode : Default
Utilization
Gpu : 99 %
Memory : 0 %
Ecc Mode
Current : Disabled
Pending : Disabled
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Total : N/A
Temperature
Gpu : N/A
Power Readings
Power Management : Supported
Power Draw : 95.12 W
Power Limit : 225.00 W
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 650 MHz
SM : 1301 MHz
Memory : 1848 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 650 MHz
SM : 1301 MHz
Memory : 1848 MHz
Compute Processes
Process ID : 6829
Name : ../../../bin/pmemd.cuda_SPFP
Used GPU Memory : 101 MB
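The full log is long, so here is a minimal sketch of a filter that reduces it to the fields most relevant to the hang (modes, utilization, and the compute process). The function name gpu_summary is hypothetical, and the field names are simply copied from the output above; they may differ with other driver versions.

```shell
#!/bin/sh
# Hypothetical helper: read `nvidia-smi -a` output on stdin and keep only the
# lines for product name, persistence/compute mode, the "Gpu" entries
# (utilization and temperature), and the compute process details.
# Field names are copied from the log above (driver 304.54) and are an
# assumption for any other driver version.
gpu_summary() {
    grep -E '^[[:space:]]*(Product Name|Persistence Mode|Compute Mode|Gpu|Process ID|Name|Used GPU Memory)[[:space:]]*:'
}

# Example use:
#   nvidia-smi -a | gpu_summary
```

Run against the log above, this keeps the Persistence Mode and Compute Mode lines next to the pmemd.cuda_SPFP process entry, which makes it easier to compare them with the settings requested via nvidia-smi -pm 1 -c 3.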
Amber12 (patched) compiled with CUDA4.2:
--------------------------------------------------------------------
> ldd pmemd.cuda
linux-vdso.so.1 => (0x00007ffff31ff000)
libcurand.so.4 => /usr/local/cuda/lib64/libcurand.so.4 (0x00007fb49375e000)
libcufft.so.4 => /usr/local/cuda/lib64/libcufft.so.4 (0x00007fb491726000)
libcudart.so.4 => /usr/local/cuda/lib64/libcudart.so.4 (0x00007fb4914cb000)
libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007fb4911e5000)
libm.so.6 => /lib64/libm.so.6 (0x00007fb490f8e000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fb490d78000)
libc.so.6 => /lib64/libc.so.6 (0x00007fb490a18000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fb49070e000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fb49050a000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fb4902ed000)
librt.so.1 => /lib64/librt.so.1 (0x00007fb4900e4000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb495870000)
>make test.cuda
[...]
---------------------------------------------
Running Extended CUDA Explicit solvent tests.
Precision Model = SPFP
---------------------------------------------
cd 4096wat/ && ./Run.pure_wat SPFP
/opt/applications/amber/12/gnu/include/netcdf.mod
diffing mdout.pure_wat.GPU_SPFP with mdout.pure_wat
PASSED
[...]
All the tests pass with the CUDA4.2 build.
Any ideas why the CUDA5 build hangs?
Let me know if you need additional info.
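In case it is useful for reproducing the hang without blocking a whole test run, here is a minimal sketch that runs a single test under a time limit. The helper name run_with_limit and the 600-second limit are assumptions for illustration and are not part of the Amber test suite; it relies only on the coreutils timeout command, which exits with status 124 when the limit is reached.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amber): run a command under a time limit
# so that a hung test is reported instead of blocking forever.
# Usage: run_with_limit SECONDS COMMAND [ARGS...]
run_with_limit() {
    limit="$1"; shift
    timeout "$limit" "$@"
    status=$?
    # coreutils `timeout` returns 124 when it had to stop the command.
    if [ "$status" -eq 124 ]; then
        echo "HUNG: '$*' did not finish within ${limit}s"
    fi
    return "$status"
}

# Example use from the test directory (command taken from the output above;
# the 600 s limit is an arbitrary choice):
#   cd 4096wat/ && run_with_limit 600 ./Run.pure_wat SPFP
```

The helper returns the command's own exit status when it finishes in time, so a passing test is unaffected.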
Best,
JC
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Oct 19 2012 - 11:00:02 PDT