Hello,
I'm still trying to figure out why the MPI CUDA tests are failing.
If I run the tests with DO_PARALLEL="mpirun -np 4" and restrict
CUDA_VISIBLE_DEVICES to a single GPU (either 0 or 1), all tests pass; the
failures only appear when both GPUs are visible. I see the same behavior
with OpenMPI 1.8, 1.10, and 2.0, and with MPICH 3.1.
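For reference, this is roughly how I am driving the tests (a sketch of my
own workflow, run from $AMBERHOME after the -cuda -mpi build; the target is
just the standard "make test"):

    export DO_PARALLEL="mpirun -np 4"

    export CUDA_VISIBLE_DEVICES=0      # or =1: every test passes
    make test

    export CUDA_VISIBLE_DEVICES=0,1    # both GPUs visible: the failures return
    make test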
I ran gpuP2PCheck just in case peer-to-peer communication between the GPUs
was the problem. It confirms that P2P communication is working:
CUDA-capable device count: 2
GPU0 " Tesla K80"
GPU1 " Tesla K80"
Two way peer access between:
GPU0 and GPU1: YES
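For anyone who wants to reproduce the same check without gpuP2PCheck, the
stock simpleP2P sample that ships with CUDA 7.5 does essentially the same
test (the path below assumes the default samples install location):

    cd /usr/local/cuda-7.5/samples/0_Simple/simpleP2P
    make
    ./simpleP2P    # reports whether peer access is supported and runs a P2P copy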
If it's of any use, here is the output of nvidia-smi -q:
==============NVSMI LOG==============
Timestamp : Thu Aug 11 22:42:34 2016
Driver Version : 352.93
Attached GPUs : 2
GPU 0000:05:00.0
Product Name : Tesla K80
Product Brand : Tesla
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 1920
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0325015055313
GPU UUID : GPU-a65eaa77-8871-ded5-b6ee-5268404192f1
Minor Number : 0
VBIOS Version : 80.21.1B.00.01
MultiGPU Board : Yes
Board ID : 0x300
Inforom Version
Image Version : 2080.0200.00.04
OEM Object : 1.1
ECC Object : 3.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
PCI
Bus : 0x05
Device : 0x00
Domain : 0x0000
Device Id : 0x102D10DE
Bus Id : 0000:05:00.0
Sub System Id : 0x106C10DE
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : PLX
Firmware : 0xF0472900
Replays since reset : 0
Tx Throughput : N/A
Rx Throughput : N/A
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Active
SW Power Cap : Not Active
HW Slowdown : Not Active
Unknown : Not Active
FB Memory Usage
Total : 12287 MiB
Used : 56 MiB
Free : 12231 MiB
BAR1 Memory Usage
Total : 16384 MiB
Used : 2 MiB
Free : 16382 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Ecc Mode
Current : Disabled
Pending : Disabled
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Total : N/A
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending : No
Temperature
GPU Current Temp : 31 C
GPU Shutdown Temp : 93 C
GPU Slowdown Temp : 88 C
Power Readings
Power Management : Supported
Power Draw : 59.20 W
Power Limit : 149.00 W
Default Power Limit : 149.00 W
Enforced Power Limit : 149.00 W
Min Power Limit : 100.00 W
Max Power Limit : 175.00 W
Clocks
Graphics : 562 MHz
SM : 562 MHz
Memory : 2505 MHz
Applications Clocks
Graphics : 562 MHz
Memory : 2505 MHz
Default Applications Clocks
Graphics : 562 MHz
Memory : 2505 MHz
Max Clocks
Graphics : 875 MHz
SM : 875 MHz
Memory : 2505 MHz
Clock Policy
Auto Boost : On
Auto Boost Default : On
Processes : None
GPU 0000:06:00.0
Product Name : Tesla K80
Product Brand : Tesla
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 1920
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0325015055313
GPU UUID : GPU-21c2be1c-72a9-1b68-adab-459d05dd7adc
Minor Number : 1
VBIOS Version : 80.21.1B.00.02
MultiGPU Board : Yes
Board ID : 0x300
Inforom Version
Image Version : 2080.0200.00.04
OEM Object : 1.1
ECC Object : 3.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
PCI
Bus : 0x06
Device : 0x00
Domain : 0x0000
Device Id : 0x102D10DE
Bus Id : 0000:06:00.0
Sub System Id : 0x106C10DE
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : PLX
Firmware : 0xF0472900
Replays since reset : 0
Tx Throughput : N/A
Rx Throughput : N/A
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Active
SW Power Cap : Not Active
HW Slowdown : Not Active
Unknown : Not Active
FB Memory Usage
Total : 12287 MiB
Used : 56 MiB
Free : 12231 MiB
BAR1 Memory Usage
Total : 16384 MiB
Used : 2 MiB
Free : 16382 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Ecc Mode
Current : Disabled
Pending : Disabled
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Total : N/A
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending : No
Temperature
GPU Current Temp : 24 C
GPU Shutdown Temp : 93 C
GPU Slowdown Temp : 88 C
Power Readings
Power Management : Supported
Power Draw : 70.89 W
Power Limit : 149.00 W
Default Power Limit : 149.00 W
Enforced Power Limit : 149.00 W
Min Power Limit : 100.00 W
Max Power Limit : 175.00 W
Clocks
Graphics : 562 MHz
SM : 562 MHz
Memory : 2505 MHz
Applications Clocks
Graphics : 562 MHz
Memory : 2505 MHz
Default Applications Clocks
Graphics : 562 MHz
Memory : 2505 MHz
Max Clocks
Graphics : 875 MHz
SM : 875 MHz
Memory : 2505 MHz
Clock Policy
Auto Boost : On
Auto Boost Default : On
Processes : None
If it matters, when I run the tests with DO_PARALLEL="mpirun -np 4", each
MPI process shows up on both GPUs in nvidia-smi. For example:
# gpu     pid   type    sm   mem   enc   dec   command
# Idx       #    C/G     %     %     %     %   name
    0   30599      C    24     0     0     0   pmemd.cuda_DPFP
    0   30600      C     0     0     0     0   pmemd.cuda_DPFP
    0   30601      C    11     0     0     0   pmemd.cuda_DPFP
    0   30602      C     0     0     0     0   pmemd.cuda_DPFP
    1   30599      C     0     0     0     0   pmemd.cuda_DPFP
    1   30600      C    36     0     0     0   pmemd.cuda_DPFP
    1   30601      C     0     0     0     0   pmemd.cuda_DPFP
    1   30602      C     6     0     0     0   pmemd.cuda_DPFP
Is that expected behavior?
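For what it's worth, a wrapper along these lines would pin each local rank
to a single GPU, which I could use to test whether the dual-GPU contexts are
related to the failures (just a sketch: bind_gpu.sh is my own naming, and
OMPI_COMM_WORLD_LOCAL_RANK is OpenMPI-specific; MPICH sets a different
variable):

    #!/bin/bash
    # bind_gpu.sh - give each local MPI rank exactly one of the two GPUs
    lrank=${OMPI_COMM_WORLD_LOCAL_RANK:-0}      # local rank, set by OpenMPI's mpirun
    export CUDA_VISIBLE_DEVICES=$((lrank % 2))  # two K80 GPUs in this node
    exec "$@"

The tests would then be run with DO_PARALLEL set to "mpirun -np 4" followed
by the absolute path to the wrapper.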
Has anybody else had any problems using K80s with MPI and CUDA? Or using
CentOS/RHEL 6?
This machine does have dual CPUs; could that be a factor?
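In case the topology is relevant, this is roughly how I would check which
CPU socket each GPU hangs off of (lscpu and lspci are standard; nvidia-smi
topo -m may not be available on every driver version):

    lscpu | grep -i numa    # socket / NUMA layout
    lspci -tv               # PCIe tree: which root port the K80's PLX bridge is behind
    nvidia-smi topo -m      # GPU/CPU affinity matrix, if this driver supports it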
I'm currently using AmberTools version 16.12 and Amber version 16.05.
Any insight would be greatly appreciated.
Thanks,
Steve
On Mon, Jul 25, 2016 at 3:06 PM, Steven Ford <sford123.ibbr.umd.edu> wrote:
> Ross,
>
> This is CentOS version 6.7 with kernel version 2.6.32-573.22.1.el6.x86_64.
>
> The output of nvidia-smi is:
>
> +------------------------------------------------------+
> | NVIDIA-SMI 352.79     Driver Version: 352.79         |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla K80           Off  | 0000:05:00.0     Off |                  Off |
> | N/A   34C    P0    59W / 149W |     56MiB / 12287MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  Tesla K80           Off  | 0000:06:00.0     Off |                  Off |
> | N/A   27C    P0    48W / 149W |     56MiB / 12287MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Processes:                                                       GPU Memory |
> |  GPU       PID  Type  Process name                               Usage      |
> |=============================================================================|
> |  No running processes found                                                 |
> +-----------------------------------------------------------------------------+
>
> The version of nvcc:
>
> nvcc: NVIDIA (R) Cuda compiler driver
> Copyright (c) 2005-2015 NVIDIA Corporation
> Built on Tue_Aug_11_14:27:32_CDT_2015
> Cuda compilation tools, release 7.5, V7.5.17
>
> I used the GNU compilers, version 4.4.7.
>
> I am using OpenMPI version 1.8.1-5.el6 from the CentOS repository. I have
> not tried any other MPI installation.
>
> Output of mpif90 --showme:
>
> gfortran -I/usr/include/openmpi-x86_64 -pthread -I/usr/lib64/openmpi/lib
> -Wl,-rpath -Wl,/usr/lib64/openmpi/lib -Wl,--enable-new-dtags
> -L/usr/lib64/openmpi/lib -lmpi_usempi -lmpi_mpifh -lmpi
>
>
> I set DO_PARALLEL to "mpirun -np 2"
>
> The parallel tests for the CPU were all successful.
>
> I had not run 'make clean' in between each step. I tried the tests again
> this morning after running 'make clean' and got the same result.
>
> I applied all patches this morning before testing again. I am using
> AmberTools 16.10 and Amber 16.04
>
>
> Thanks,
>
> Steve
>
> On Sat, Jul 23, 2016 at 6:32 PM, Ross Walker <ross.rosswalker.co.uk>
> wrote:
>
>> Hi Steven,
>>
>> This is a large number of very worrying failures. Something is definitely
>> very wrong here and I'd like to investigate further. Can you give me some
>> more details about your system please. This includes:
>>
>> The specifics of what version of Linux you are using.
>>
>> The output of nvidia-smi
>>
>> nvcc -V (might be lower case v to get version info).
>>
>> Did you use the GNU compilers or the Intel compilers and in either case
>> which version?
>>
>> OpenMPI - can you confirm the version again and also send me the output
>> of mpif90 --showme (it might be --show or -show or something similar) -
>> essentially I want to see what the underlying compilation line is.
>>
>> Can you confirm what you had $DO_PARALLEL set to when you ran make test
>> for the parallel GPU build? Also can you confirm whether the regular (CPU)
>> parallel build passed the tests, please?
>>
>> Also did you run 'make clean' before each build step? E.g.
>>
>> ./configure -cuda gnu
>> make -j8 install
>> make test
>> *make clean*
>>
>> ./configure -cuda -mpi gnu
>> make -j8 install
>> make test
>>
>> Have you tried any other MPI installations? - E.g. MPICH?
>>
>> And finally can you please confirm which version of Amber (and
>> AmberTools) this is and which patches have been applied?
>>
>> Thanks.
>>
>> All the best
>> Ross
>>
>> On Jul 21, 2016, at 14:20, Steven Ford <sford123.ibbr.umd.edu> wrote:
>>
>> Ross,
>>
>> Attached are the log and diff files. Thank you for taking a look.
>>
>> Regards,
>>
>> Steve
>>
>> On Thu, Jul 21, 2016 at 5:34 AM, Ross Walker <ross.rosswalker.co.uk>
>> wrote:
>>
>>> Hi Steve,
>>>
>>> Indeed that is too big a difference to just be rounding error - although
>>> if those tests are using Langevin or Andersen for the thermostat that would
>>> explain it (different random number streams) - although those tests are
>>> supposed to be skipped in parallel.
>>>
>>> Can you send me a copy directly of your .log and .dif files for the 2
>>> GPU run and I'll take a closer look at it.
>>>
>>> All the best
>>> Ross
>>>
>>> > On Jul 20, 2016, at 21:19, Steven Ford <sford123.ibbr.umd.edu> wrote:
>>> >
>>> > Hello All,
>>> >
>>> > I am currently trying to get Amber16 installed and running on our
>>> > computing cluster. Our researchers are primarily interested in running
>>> > the GPU-accelerated programs. For GPU computing jobs, we have one
>>> > CentOS 6.7 node with a Tesla K80.
>>> >
>>> > I was able to build Amber16 and run the serial/parallel CPU tests plus
>>> > the serial GPU tests with all file comparisons passing. However, only 5
>>> > of the parallel GPU test comparisons succeeded, while the other 100
>>> > failed.
>>> >
>>> > Examining the diff file shows that some of the numbers are off by only
>>> > a small amount, as the documentation says can happen. For example:
>>> >
>>> > 66c66
>>> > < NSTEP = 1  TIME(PS) = 50.002  TEMP(K) = 351.27  PRESS = 0.
>>> > > NSTEP = 1  TIME(PS) = 50.002  TEMP(K) = 353.29  PRESS = 0.
>>> >
>>> > This may also be too large to attribute to a rounding error, but it is a
>>> > small difference compared to others:
>>> >
>>> > 85c85
>>> > < Etot = -217.1552   EKtot = 238.6655  EPtot = -455.8207
>>> > > Etot = -1014.2562  EKtot = 244.6242  EPtot = -1258.8804
>>> >
>>> > This was built with CUDA 7.5 and OpenMPI 1.8, and run with
>>> > DO_PARALLEL="mpirun -np 2".
>>> >
>>> > Any idea what else could be affecting the output?
>>> >
>>> > Thanks,
>>> >
>>> > Steve
>>> >
>>> > --
>>> > Steven Ford
>>> > IT Infrastructure Specialist
>>> > Institute for Bioscience and Biotechnology Research
>>> > University of Maryland
>>> > (240)314-6405
>>>
>>>
>>>
>>
>>
>>
>> --
>> Steven Ford
>> IT Infrastructure Specialist
>> Institute for Bioscience and Biotechnology Research
>> University of Maryland
>> (240)314-6405
>> <2016-07-20_11-17-52.diff><2016-07-20_11-17-52.log>
>>
>>
>>
>
>
> --
> Steven Ford
> IT Infrastructure Specialist
> Institute for Bioscience and Biotechnology Research
> University of Maryland
> (240)314-6405
>
--
Steven Ford
IT Infrastructure Specialist
Institute for Bioscience and Biotechnology Research
University of Maryland
(240)314-6405
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber