Re: [AMBER] Amber16 Parallel CUDA Tests

From: Steven Ford <sford123.ibbr.umd.edu>
Date: Fri, 12 Aug 2016 00:33:15 -0400

Ross,

Thanks, I will look for BIOS updates. Firmware aside, is there any
configuration setting in the BIOS that would affect this?

Thanks,

Steve

On Aug 12, 2016 12:29 AM, "Ross Walker" <ross.rosswalker.co.uk> wrote:

> Hi Steven,
>
> Ah I thought you meant you had 4 GPUs as in 2 K80s rather than a single
> K80 card that contains 2 GPUs.
>
> Either way this shows your hardware is incorrectly configured or has a
> buggy BIOS. Who makes it? You probably need to go back to them and get an
> updated BIOS that properly handles peer-to-peer communication.
>
> You could also check with the motherboard manufacturer and see if they have
> an up-to-date BIOS that fixes this bug.
>
> All of those ACSCtl entries reported by lspci should have a minus sign
> after them if the BIOS is configured correctly.
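>
> As a stopgap while you chase the BIOS fix, some sites clear those ACS
> redirect bits on the PLX bridges by hand with setpci. Treat the following
> as a rough sketch only - the ECAP_ACS register name and the +0x6
> control-word offset are assumptions about your pciutils version, and the
> change does not survive a reboot:
>
> # run as root: clear ACSCtl on every PLX (vendor 10b5) bridge
> # (assumption: your pciutils recognizes the ECAP_ACS name)
> for BDF in $(lspci -d "10b5:*" | awk '{print $1}'); do
>     setpci -v -s "$BDF" ECAP_ACS+0x6.w=0000
> done
>
> # re-check: every ACSCtl flag should now show a minus
> lspci -d "10b5:*" -vvv | grep ACSCtl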
>
> All the best
> Ross
>
> On Aug 11, 2016, at 9:21 PM, Steven Ford <sford123.ibbr.umd.edu> wrote:
>
> Ross,
>
> The output of lspci -d "10b5:*" -vvv | grep ACSCtl is:
>
> ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+
> EgressCtrl- DirectTrans-
> ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+
> EgressCtrl- DirectTrans-
>
>
> With CUDA_VISIBLE_DEVICES unset:
>
> [./simpleP2P] - Starting...
> Checking for multiple GPUs...
> CUDA-capable device count: 2
> > GPU0 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> > GPU1 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
>
> Checking GPU(s) for support of peer to peer memory access...
> > Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU1) : Yes
> > Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU0) : Yes
> Enabling peer access between GPU0 and GPU1...
> Checking GPU0 and GPU1 for UVA capabilities...
> > Tesla K80 (GPU0) supports UVA: Yes
> > Tesla K80 (GPU1) supports UVA: Yes
> Both GPUs can support UVA, enabling...
> Allocating buffers (64MB on GPU0, GPU1 and CPU Host)...
> Creating event handles...
> cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 1.11GB/s
> Preparing host buffer and memcpy to GPU0...
> Run kernel on GPU1, taking source data from GPU0 and writing to GPU1...
> Run kernel on GPU0, taking source data from GPU1 and writing to GPU0...
> Copy data back to host from GPU0 and verify results...
> Verification error . element 0: val = nan, ref = 0.000000
> Verification error . element 1: val = nan, ref = 4.000000
> Verification error . element 2: val = nan, ref = 8.000000
> Verification error . element 3: val = nan, ref = 12.000000
> Verification error . element 4: val = nan, ref = 16.000000
> Verification error . element 5: val = nan, ref = 20.000000
> Verification error . element 6: val = nan, ref = 24.000000
> Verification error . element 7: val = nan, ref = 28.000000
> Verification error . element 8: val = nan, ref = 32.000000
> Verification error . element 9: val = nan, ref = 36.000000
> Verification error . element 10: val = nan, ref = 40.000000
> Verification error . element 11: val = nan, ref = 44.000000
> Disabling peer access...
> Shutting down...
> Test failed!
>
> With CUDA_VISIBLE_DEVICES=0,1
>
> [./simpleP2P] - Starting...
> Checking for multiple GPUs...
> CUDA-capable device count: 2
> > GPU0 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> > GPU1 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
>
> Checking GPU(s) for support of peer to peer memory access...
> > Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU1) : Yes
> > Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU0) : Yes
> Enabling peer access between GPU0 and GPU1...
> Checking GPU0 and GPU1 for UVA capabilities...
> > Tesla K80 (GPU0) supports UVA: Yes
> > Tesla K80 (GPU1) supports UVA: Yes
> Both GPUs can support UVA, enabling...
> Allocating buffers (64MB on GPU0, GPU1 and CPU Host)...
> Creating event handles...
> cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 1.11GB/s
> Preparing host buffer and memcpy to GPU0...
> Run kernel on GPU1, taking source data from GPU0 and writing to GPU1...
> Run kernel on GPU0, taking source data from GPU1 and writing to GPU0...
> Copy data back to host from GPU0 and verify results...
> Verification error . element 0: val = nan, ref = 0.000000
> Verification error . element 1: val = nan, ref = 4.000000
> Verification error . element 2: val = nan, ref = 8.000000
> Verification error . element 3: val = nan, ref = 12.000000
> Verification error . element 4: val = nan, ref = 16.000000
> Verification error . element 5: val = nan, ref = 20.000000
> Verification error . element 6: val = nan, ref = 24.000000
> Verification error . element 7: val = nan, ref = 28.000000
> Verification error . element 8: val = nan, ref = 32.000000
> Verification error . element 9: val = nan, ref = 36.000000
> Verification error . element 10: val = nan, ref = 40.000000
> Verification error . element 11: val = nan, ref = 44.000000
> Disabling peer access...
> Shutting down...
> Test failed!
>
>
> With CUDA_VISIBLE_DEVICES=2,3
>
> [./simpleP2P] - Starting...
> Checking for multiple GPUs...
> CUDA error at simpleP2P.cu:63 code=38(cudaErrorNoDevice)
> "cudaGetDeviceCount(&gpu_n)"
>
>
> and with CUDA_VISIBLE_DEVICES=0,2
>
> CUDA-capable device count: 1
> Two or more GPUs with SM 2.0 or higher capability are required for
> ./simpleP2P.
> Waiving test.
>
>
> I'm guessing the last two tests fail because I have only one card with two
> K80 GPUs on it, so there are no devices 2 or 3. It seems like something's
> awry with the peer-to-peer communication between 0 and 1. Is it possible for
> them to be on different PCIe domains even though they are on the same
> physical card?
>
> This makes me wonder: if each PCIe slot is connected to one CPU, should
> this system either use only one CPU, or have a second K80 in the other PCIe
> slot that is connected to the other CPU?
>
> If it helps, nvidia-smi topo -m shows:
>
> GPU0 GPU1 CPU Affinity
> GPU0 X PIX 0-7,16-23
> GPU1 PIX X 0-7,16-23
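>
> If it would help, I can also pull the PCI tree and the NUMA placement of
> each GPU; something like this (the bus IDs below are the 0000:05:00.0 and
> 0000:06:00.0 addresses reported by nvidia-smi above):
>
> # show the PCI tree - both GPUs should hang off the same PLX switch
> lspci -tv
>
> # NUMA node each GPU's bus is attached to (-1 means the BIOS didn't report one)
> cat /sys/bus/pci/devices/0000:05:00.0/numa_node
> cat /sys/bus/pci/devices/0000:06:00.0/numa_node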
>
>
> Thanks again,
>
> Steve
>
> On Thu, Aug 11, 2016 at 11:17 PM, Ross Walker <ross.rosswalker.co.uk>
> wrote:
>
>> Hi Steve,
>>
>> I suspect your hardware is misconfigured. Can you run a couple of tests
>> please?
>>
>> With CUDA_VISIBLE_DEVICES unset
>>
>> 1) As root run: lspci -d "10b5:*" -vvv | grep ACSCtl
>>
>> and post the output here.
>>
>> 2) Compile the CUDA samples installed as part of CUDA 7.5 and then run
>> the following:
>>
>> unset CUDA_VISIBLE_DEVICES
>> ./simpleP2P
>>
>> export CUDA_VISIBLE_DEVICES=0,1
>> ./simpleP2P
>>
>> export CUDA_VISIBLE_DEVICES=2,3
>> ./simpleP2P
>>
>> export CUDA_VISIBLE_DEVICES=0,2
>> ./simpleP2P
>>
>> And post the results here.
>>
>> My suspicion is that your two K80s are on different PCIe domains
>> connected to different CPU sockets, BUT your BIOS is misconfigured such that
>> it is incorrectly reporting that the two K80s can talk to each other via
>> P2P. Thus the first two simpleP2P runs above should pass. The last one will
>> likely report that P2P is possible, but then the bandwidth will be very low
>> and it will ultimately fail the test because the array received by GPU 2
>> will be garbage.
>>
>> If my suspicions are correct you would find the following behavior with
>> AMBER
>>
>> 4 x 1-GPU runs, one on each GPU, would be fine.
>> (1 or 2) x 2-GPU runs will be fine if you use GPUs 0,1 and 2,3 but will
>> fail if you were to use 0,2 - 0,3 - 1,2 or 1,3.
>> 1 x 4-GPU runs will fail unless you restrict them to GPUs 0,1 or 2,3, and
>> thus overload the GPUs.
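>>
>> For example (the input and output file names below are just placeholders),
>> pinning each 2-GPU AMBER run to one pair of GPUs is simply a matter of
>> setting CUDA_VISIBLE_DEVICES for each mpirun:
>>
>> # two independent 2-GPU jobs, one per GPU pair
>> CUDA_VISIBLE_DEVICES=0,1 mpirun -np 2 pmemd.cuda.MPI -O -i md.in -p prmtop -c inpcrd -o md_gpu01.out &
>> CUDA_VISIBLE_DEVICES=2,3 mpirun -np 2 pmemd.cuda.MPI -O -i md.in -p prmtop -c inpcrd -o md_gpu23.out &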
>>
>> PS: nvidia-smi reporting 2 threads per MPI task is not an issue - it is to
>> be expected.
>>
>> All the best
>> Ross
>>
>> On Aug 11, 2016, at 7:54 PM, Steven Ford <sford123.ibbr.umd.edu> wrote:
>>
>> Hello,
>>
>> I'm still trying to figure out why the MPI CUDA tests are failing.
>>
>> If I run tests with DO_PARALLEL="mpirun -np 4" and limit
>> CUDA_VISIBLE_DEVICES to only 0 or 1, all tests pass. I get the same
>> behavior with OpenMPI 1.8, 1.10, and 2.0, and with MPICH 3.1.
>>
>> I ran gpuP2PCheck just in case communication between the GPUs was the
>> problem. It confirms that communication is working:
>>
>> CUDA-capable device count: 2
>> GPU0 " Tesla K80"
>> GPU1 " Tesla K80"
>>
>> Two way peer access between:
>> GPU0 and GPU1: YES
>>
>> If it's of any use, here is the output of nvidia-smi -q:
>>
>> ==============NVSMI LOG==============
>>
>> Timestamp : Thu Aug 11 22:42:34 2016
>> Driver Version : 352.93
>>
>> Attached GPUs : 2
>> GPU 0000:05:00.0
>> Product Name : Tesla K80
>> Product Brand : Tesla
>> Display Mode : Disabled
>> Display Active : Disabled
>> Persistence Mode : Disabled
>> Accounting Mode : Disabled
>> Accounting Mode Buffer Size : 1920
>> Driver Model
>> Current : N/A
>> Pending : N/A
>> Serial Number : 0325015055313
>> GPU UUID : GPU-a65eaa77-8871-ded5-b6ee-5268404192f1
>> Minor Number : 0
>> VBIOS Version : 80.21.1B.00.01
>> MultiGPU Board : Yes
>> Board ID : 0x300
>> Inforom Version
>> Image Version : 2080.0200.00.04
>> OEM Object : 1.1
>> ECC Object : 3.0
>> Power Management Object : N/A
>> GPU Operation Mode
>> Current : N/A
>> Pending : N/A
>> PCI
>> Bus : 0x05
>> Device : 0x00
>> Domain : 0x0000
>> Device Id : 0x102D10DE
>> Bus Id : 0000:05:00.0
>> Sub System Id : 0x106C10DE
>> GPU Link Info
>> PCIe Generation
>> Max : 3
>> Current : 3
>> Link Width
>> Max : 16x
>> Current : 16x
>> Bridge Chip
>> Type : PLX
>> Firmware : 0xF0472900
>> Replays since reset : 0
>> Tx Throughput : N/A
>> Rx Throughput : N/A
>> Fan Speed : N/A
>> Performance State : P0
>> Clocks Throttle Reasons
>> Idle : Not Active
>> Applications Clocks Setting : Active
>> SW Power Cap : Not Active
>> HW Slowdown : Not Active
>> Unknown : Not Active
>> FB Memory Usage
>> Total : 12287 MiB
>> Used : 56 MiB
>> Free : 12231 MiB
>> BAR1 Memory Usage
>> Total : 16384 MiB
>> Used : 2 MiB
>> Free : 16382 MiB
>> Compute Mode : Default
>> Utilization
>> Gpu : 0 %
>> Memory : 0 %
>> Encoder : 0 %
>> Decoder : 0 %
>> Ecc Mode
>> Current : Disabled
>> Pending : Disabled
>> ECC Errors
>> Volatile
>> Single Bit
>> Device Memory : N/A
>> Register File : N/A
>> L1 Cache : N/A
>> L2 Cache : N/A
>> Texture Memory : N/A
>> Total : N/A
>> Double Bit
>> Device Memory : N/A
>> Register File : N/A
>> L1 Cache : N/A
>> L2 Cache : N/A
>> Texture Memory : N/A
>> Total : N/A
>> Aggregate
>> Single Bit
>> Device Memory : N/A
>> Register File : N/A
>> L1 Cache : N/A
>> L2 Cache : N/A
>> Texture Memory : N/A
>> Total : N/A
>> Double Bit
>> Device Memory : N/A
>> Register File : N/A
>> L1 Cache : N/A
>> L2 Cache : N/A
>> Texture Memory : N/A
>> Total : N/A
>> Retired Pages
>> Single Bit ECC : 0
>> Double Bit ECC : 0
>> Pending : No
>> Temperature
>> GPU Current Temp : 31 C
>> GPU Shutdown Temp : 93 C
>> GPU Slowdown Temp : 88 C
>> Power Readings
>> Power Management : Supported
>> Power Draw : 59.20 W
>> Power Limit : 149.00 W
>> Default Power Limit : 149.00 W
>> Enforced Power Limit : 149.00 W
>> Min Power Limit : 100.00 W
>> Max Power Limit : 175.00 W
>> Clocks
>> Graphics : 562 MHz
>> SM : 562 MHz
>> Memory : 2505 MHz
>> Applications Clocks
>> Graphics : 562 MHz
>> Memory : 2505 MHz
>> Default Applications Clocks
>> Graphics : 562 MHz
>> Memory : 2505 MHz
>> Max Clocks
>> Graphics : 875 MHz
>> SM : 875 MHz
>> Memory : 2505 MHz
>> Clock Policy
>> Auto Boost : On
>> Auto Boost Default : On
>> Processes : None
>>
>> GPU 0000:06:00.0
>> Product Name : Tesla K80
>> Product Brand : Tesla
>> Display Mode : Disabled
>> Display Active : Disabled
>> Persistence Mode : Disabled
>> Accounting Mode : Disabled
>> Accounting Mode Buffer Size : 1920
>> Driver Model
>> Current : N/A
>> Pending : N/A
>> Serial Number : 0325015055313
>> GPU UUID : GPU-21c2be1c-72a9-1b68-adab-459d05dd7adc
>> Minor Number : 1
>> VBIOS Version : 80.21.1B.00.02
>> MultiGPU Board : Yes
>> Board ID : 0x300
>> Inforom Version
>> Image Version : 2080.0200.00.04
>> OEM Object : 1.1
>> ECC Object : 3.0
>> Power Management Object : N/A
>> GPU Operation Mode
>> Current : N/A
>> Pending : N/A
>> PCI
>> Bus : 0x06
>> Device : 0x00
>> Domain : 0x0000
>> Device Id : 0x102D10DE
>> Bus Id : 0000:06:00.0
>> Sub System Id : 0x106C10DE
>> GPU Link Info
>> PCIe Generation
>> Max : 3
>> Current : 3
>> Link Width
>> Max : 16x
>> Current : 16x
>> Bridge Chip
>> Type : PLX
>> Firmware : 0xF0472900
>> Replays since reset : 0
>> Tx Throughput : N/A
>> Rx Throughput : N/A
>> Fan Speed : N/A
>> Performance State : P0
>> Clocks Throttle Reasons
>> Idle : Not Active
>> Applications Clocks Setting : Active
>> SW Power Cap : Not Active
>> HW Slowdown : Not Active
>> Unknown : Not Active
>> FB Memory Usage
>> Total : 12287 MiB
>> Used : 56 MiB
>> Free : 12231 MiB
>> BAR1 Memory Usage
>> Total : 16384 MiB
>> Used : 2 MiB
>> Free : 16382 MiB
>> Compute Mode : Default
>> Utilization
>> Gpu : 0 %
>> Memory : 0 %
>> Encoder : 0 %
>> Decoder : 0 %
>> Ecc Mode
>> Current : Disabled
>> Pending : Disabled
>> ECC Errors
>> Volatile
>> Single Bit
>> Device Memory : N/A
>> Register File : N/A
>> L1 Cache : N/A
>> L2 Cache : N/A
>> Texture Memory : N/A
>> Total : N/A
>> Double Bit
>> Device Memory : N/A
>> Register File : N/A
>> L1 Cache : N/A
>> L2 Cache : N/A
>> Texture Memory : N/A
>> Total : N/A
>> Aggregate
>> Single Bit
>> Device Memory : N/A
>> Register File : N/A
>> L1 Cache : N/A
>> L2 Cache : N/A
>> Texture Memory : N/A
>> Total : N/A
>> Double Bit
>> Device Memory : N/A
>> Register File : N/A
>> L1 Cache : N/A
>> L2 Cache : N/A
>> Texture Memory : N/A
>> Total : N/A
>> Retired Pages
>> Single Bit ECC : 0
>> Double Bit ECC : 0
>> Pending : No
>> Temperature
>> GPU Current Temp : 24 C
>> GPU Shutdown Temp : 93 C
>> GPU Slowdown Temp : 88 C
>> Power Readings
>> Power Management : Supported
>> Power Draw : 70.89 W
>> Power Limit : 149.00 W
>> Default Power Limit : 149.00 W
>> Enforced Power Limit : 149.00 W
>> Min Power Limit : 100.00 W
>> Max Power Limit : 175.00 W
>> Clocks
>> Graphics : 562 MHz
>> SM : 562 MHz
>> Memory : 2505 MHz
>> Applications Clocks
>> Graphics : 562 MHz
>> Memory : 2505 MHz
>> Default Applications Clocks
>> Graphics : 562 MHz
>> Memory : 2505 MHz
>> Max Clocks
>> Graphics : 875 MHz
>> SM : 875 MHz
>> Memory : 2505 MHz
>> Clock Policy
>> Auto Boost : On
>> Auto Boost Default : On
>> Processes : None
>>
>>
>> If it matters, when I do the tests with DO_PARALLEL="mpirun -np 4", I see
>> that each process is running a thread on both GPUs. For example:
>>
>> # gpu pid type sm mem enc dec command
>> # Idx # C/G % % % % name
>> 0 30599 C 24 0 0 0 pmemd.cuda_DPFP
>> 0 30600 C 0 0 0 0 pmemd.cuda_DPFP
>> 0 30601 C 11 0 0 0 pmemd.cuda_DPFP
>> 0 30602 C 0 0 0 0 pmemd.cuda_DPFP
>> 1 30599 C 0 0 0 0 pmemd.cuda_DPFP
>> 1 30600 C 36 0 0 0 pmemd.cuda_DPFP
>> 1 30601 C 0 0 0 0 pmemd.cuda_DPFP
>> 1 30602 C 6 0 0 0 pmemd.cuda_DPFP
>>
>> Is that expected behavior?
>>
>> Has anybody else had any problems using K80s with MPI and CUDA? Or using
>> CentOS/RHEL 6?
>>
>> This machine does have dual CPUs; could that be a factor?
>>
>> I'm currently using AmberTools version 16.12 and Amber version 16.05.
>>
>> Any insight would be greatly appreciated.
>>
>> Thanks,
>>
>> Steve
>>
>>
>>
>> On Mon, Jul 25, 2016 at 3:06 PM, Steven Ford <sford123.ibbr.umd.edu>
>> wrote:
>>
>>> Ross,
>>>
>>> This is CentOS version 6.7 with kernel version
>>> 2.6.32-573.22.1.el6.x86_64.
>>>
>>> The output of nvidia-smi is:
>>>
>>> +------------------------------------------------------+
>>> | NVIDIA-SMI 352.79     Driver Version: 352.79         |
>>> |-------------------------------+----------------------+----------------------+
>>> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
>>> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
>>> |===============================+======================+======================|
>>> |   0  Tesla K80           Off  | 0000:05:00.0     Off |                  Off |
>>> | N/A   34C    P0    59W / 149W |     56MiB / 12287MiB |      0%      Default |
>>> +-------------------------------+----------------------+----------------------+
>>> |   1  Tesla K80           Off  | 0000:06:00.0     Off |                  Off |
>>> | N/A   27C    P0    48W / 149W |     56MiB / 12287MiB |      0%      Default |
>>> +-------------------------------+----------------------+----------------------+
>>>
>>> +-----------------------------------------------------------------------------+
>>> | Processes:                                                       GPU Memory |
>>> |  GPU       PID  Type  Process name                               Usage      |
>>> |=============================================================================|
>>> |  No running processes found                                                 |
>>> +-----------------------------------------------------------------------------+
>>>
>>> The version of nvcc:
>>>
>>> nvcc: NVIDIA (R) Cuda compiler driver
>>> Copyright (c) 2005-2015 NVIDIA Corporation
>>> Built on Tue_Aug_11_14:27:32_CDT_2015
>>> Cuda compilation tools, release 7.5, V7.5.17
>>>
>>> I used the GNU compilers, version 4.4.7.
>>>
>>> I am using OpenMPI version 1.8.1-5.el6 from the CentOS repository. I
>>> have not tried any other MPI installation.
>>>
>>> Output of mpif90 --showme:
>>>
>>> gfortran -I/usr/include/openmpi-x86_64 -pthread -I/usr/lib64/openmpi/lib
>>> -Wl,-rpath -Wl,/usr/lib64/openmpi/lib -Wl,--enable-new-dtags
>>> -L/usr/lib64/openmpi/lib -lmpi_usempi -lmpi_mpifh -lmpi
>>>
>>>
>>> I set DO_PARALLEL to "mpirun -np 2"
>>>
>>> The parallel tests for the CPU were all successful.
>>>
>>> I had not run 'make clean' in between each step. I tried the tests again
>>> this morning after running 'make clean' and got the same result.
>>>
>>> I applied all patches this morning before testing again. I am using
>>> AmberTools 16.10 and Amber 16.04.
>>>
>>>
>>> Thanks,
>>>
>>> Steve
>>>
>>> On Sat, Jul 23, 2016 at 6:32 PM, Ross Walker <ross.rosswalker.co.uk>
>>> wrote:
>>>
>>>> Hi Steven,
>>>>
>>>> This is a large number of very worrying failures. Something is
>>>> definitely very wrong here and I'd like to investigate further. Can you
>>>> give me some more details about your system, please? This includes:
>>>>
>>>> The specifics of what version of Linux you are using.
>>>>
>>>> The output of nvidia-smi
>>>>
>>>> nvcc -V (might be lower case v to get version info).
>>>>
>>>> Did you use the GNU compilers or the Intel compilers and in either case
>>>> which version?
>>>>
>>>> OpenMPI - can you confirm the version again and also send me the output
>>>> of mpif90 --showme (it might be --show or -show or something similar) -
>>>> essentially I want to see what the underlying compilation line is.
>>>>
>>>> Can you confirm what you had $DO_PARALLEL set to when you ran make test
>>>> for the parallel GPU build? Also, can you confirm whether the regular (CPU)
>>>> parallel build passed the tests?
>>>>
>>>> Also did you run 'make clean' before each build step? E.g.
>>>>
>>>> ./configure -cuda gnu
>>>> make -j8 install
>>>> make test
>>>> *make clean*
>>>>
>>>> ./configure -cuda -mpi gnu
>>>> make -j8 install
>>>> make test
>>>>
>>>> Have you tried any other MPI installations? - E.g. MPICH?
>>>>
>>>> And finally can you please confirm which version of Amber (and
>>>> AmberTools) this is and which patches have been applied?
>>>>
>>>> Thanks.
>>>>
>>>> All the best
>>>> Ross
>>>>
>>>> On Jul 21, 2016, at 14:20, Steven Ford <sford123.ibbr.umd.edu> wrote:
>>>>
>>>> Ross,
>>>>
>>>> Attached are the log and diff files. Thank you for taking a look.
>>>>
>>>> Regards,
>>>>
>>>> Steve
>>>>
>>>> On Thu, Jul 21, 2016 at 5:34 AM, Ross Walker <ross.rosswalker.co.uk>
>>>> wrote:
>>>>
>>>>> Hi Steve,
>>>>>
>>>>> Indeed, that is too big a difference to just be rounding error -
>>>>> although if those tests are using Langevin or Andersen for the thermostat,
>>>>> that would explain it (different random number streams) - but those
>>>>> tests are supposed to be skipped in parallel.
>>>>>
>>>>> Can you send me a copy directly of your .log and .dif files for the 2
>>>>> GPU run and I'll take a closer look at it.
>>>>>
>>>>> All the best
>>>>> Ross
>>>>>
>>>>> > On Jul 20, 2016, at 21:19, Steven Ford <sford123.ibbr.umd.edu>
>>>>> wrote:
>>>>> >
>>>>> > Hello All,
>>>>> >
>>>>> > I am currently trying to get Amber16 installed and running on our
>>>>> > computing cluster. Our researchers are primarily interested in running
>>>>> > the GPU-accelerated programs. For GPU computing jobs, we have one
>>>>> > CentOS 6.7 node with a Tesla K80.
>>>>> >
>>>>> > I was able to build Amber16 and run the Serial/Parallel CPU plus the
>>>>> > Serial GPU tests with all file comparisons passing. However, only 5
>>>>> > parallel GPU tests succeeded, while the other 100 comparisons failed.
>>>>> >
>>>>> > Examining the diff file shows that some of the numbers are not off by
>>>>> > much, as the documentation said could happen. For example:
>>>>> >
>>>>> > 66c66
>>>>> > < NSTEP = 1 TIME(PS) = 50.002 TEMP(K) = 351.27 PRESS = 0.
>>>>> >> NSTEP = 1 TIME(PS) = 50.002 TEMP(K) = 353.29 PRESS = 0.
>>>>> >
>>>>> > This may also be too large to attribute to a rounding error, but it is
>>>>> > a small difference compared to others:
>>>>> >
>>>>> > 85c85
>>>>> > < Etot = -217.1552 EKtot = 238.6655 EPtot = -455.8207
>>>>> >> Etot = -1014.2562 EKtot = 244.6242 EPtot = -1258.8804
>>>>> >
>>>>> > This was built with CUDA 7.5 and OpenMPI 1.8, and run with
>>>>> > DO_PARALLEL="mpirun -np 2".
>>>>> >
>>>>> > Any idea what else could be affecting the output?
>>>>> >
>>>>> > Thanks,
>>>>> >
>>>>> > Steve
>>>>> >
>>>>> > --
>>>>> > Steven Ford
>>>>> > IT Infrastructure Specialist
>>>>> > Institute for Bioscience and Biotechnology Research
>>>>> > University of Maryland
>>>>> > (240)314-6405
>>>>> > _______________________________________________
>>>>> > AMBER mailing list
>>>>> > AMBER.ambermd.org
>>>>> > http://lists.ambermd.org/mailman/listinfo/amber
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> AMBER mailing list
>>>>> AMBER.ambermd.org
>>>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Steven Ford
>>>> IT Infrastructure Specialist
>>>> Institute for Bioscience and Biotechnology Research
>>>> University of Maryland
>>>> (240)314-6405
>>>> <2016-07-20_11-17-52.diff><2016-07-20_11-17-52.log>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Steven Ford
>>> IT Infrastructure Specialist
>>> Institute for Bioscience and Biotechnology Research
>>> University of Maryland
>>> (240)314-6405
>>>
>>
>>
>>
>> --
>> Steven Ford
>> IT Infrastructure Specialist
>> Institute for Bioscience and Biotechnology Research
>> University of Maryland
>> (240)314-6405
>>
>>
>>
>
>
> --
> Steven Ford
> IT Infrastructure Specialist
> Institute for Bioscience and Biotechnology Research
> University of Maryland
> (240)314-6405
>
>
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Aug 11 2016 - 22:00:02 PDT