Re: [AMBER] Amber16 Parallel CUDA Tests

From: Steven Ford <sford123.ibbr.umd.edu>
Date: Fri, 12 Aug 2016 12:44:54 -0400

Ross,

I updated the BIOS and disabled ACSCtrl. All 115 tests pass now.

Thanks,

Steve

On Aug 12, 2016 12:39 AM, "Ross Walker" <ross.rosswalker.co.uk> wrote:

> Hi Steven,
>
> Yes, there is, but the naming may vary by motherboard manufacturer. Look for
> something along the lines of ACSCtrl (Supermicro calls it this) and set it
> to disabled if the option exists.
>
> All the best
> Ross
>
> > On Aug 11, 2016, at 9:33 PM, Steven Ford <sford123.ibbr.umd.edu> wrote:
> >
> > Ross,
> >
> > Thanks, I will look for BIOS updates. Firmware aside, is there any configuration in the BIOS that would affect this?
> >
> > Thanks,
> >
> > Steve
> >
> >
> > On Aug 12, 2016 12:29 AM, "Ross Walker" <ross.rosswalker.co.uk> wrote:
> > Hi Steven,
> >
> > Ah, I thought you meant you had 4 GPUs, as in 2 K80s, rather than a single K80 card that contains 2 GPUs.
> >
> > Either way, this shows your hardware is incorrectly configured or has a buggy BIOS. Who makes it? You probably need to go back to them and get an updated BIOS that properly handles peer-to-peer communication.
> >
> > You could also check the motherboard manufacturer and see if they have an up-to-date BIOS that fixes this bug.
> >
> > All those entries reported by lspci should have a minus after them if things are correct in the BIOS.
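> >
> > For reference, a correctly configured bridge would report all of those bits disabled. This is only a sketch of the expected output, based on the lines you posted, not verified on your board:
> >
> > ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-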
> >
> > All the best
> > Ross
> >
> >> On Aug 11, 2016, at 9:21 PM, Steven Ford <sford123.ibbr.umd.edu> wrote:
> >>
> >> Ross,
> >>
> >> The output of lspci -d "10b5:*" -vvv | grep ACSCtl is:
> >>
> >> ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
> >> ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
> >>
> >>
> >> With CUDA_VISIBLE_DEVICES unset:
> >>
> >> [./simpleP2P] - Starting...
> >> Checking for multiple GPUs...
> >> CUDA-capable device count: 2
> >> > GPU0 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> >> > GPU1 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> >>
> >> Checking GPU(s) for support of peer to peer memory access...
> >> > Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU1) : Yes
> >> > Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU0) : Yes
> >> Enabling peer access between GPU0 and GPU1...
> >> Checking GPU0 and GPU1 for UVA capabilities...
> >> > Tesla K80 (GPU0) supports UVA: Yes
> >> > Tesla K80 (GPU1) supports UVA: Yes
> >> Both GPUs can support UVA, enabling...
> >> Allocating buffers (64MB on GPU0, GPU1 and CPU Host)...
> >> Creating event handles...
> >> cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 1.11GB/s
> >> Preparing host buffer and memcpy to GPU0...
> >> Run kernel on GPU1, taking source data from GPU0 and writing to GPU1...
> >> Run kernel on GPU0, taking source data from GPU1 and writing to GPU0...
> >> Copy data back to host from GPU0 and verify results...
> >> Verification error @ element 0: val = nan, ref = 0.000000
> >> Verification error @ element 1: val = nan, ref = 4.000000
> >> Verification error @ element 2: val = nan, ref = 8.000000
> >> Verification error @ element 3: val = nan, ref = 12.000000
> >> Verification error @ element 4: val = nan, ref = 16.000000
> >> Verification error @ element 5: val = nan, ref = 20.000000
> >> Verification error @ element 6: val = nan, ref = 24.000000
> >> Verification error @ element 7: val = nan, ref = 28.000000
> >> Verification error @ element 8: val = nan, ref = 32.000000
> >> Verification error @ element 9: val = nan, ref = 36.000000
> >> Verification error @ element 10: val = nan, ref = 40.000000
> >> Verification error @ element 11: val = nan, ref = 44.000000
> >> Disabling peer access...
> >> Shutting down...
> >> Test failed!
> >>
> >> With CUDA_VISIBLE_DEVICES=0,1
> >>
> >> [./simpleP2P] - Starting...
> >> Checking for multiple GPUs...
> >> CUDA-capable device count: 2
> >> > GPU0 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> >> > GPU1 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> >>
> >> Checking GPU(s) for support of peer to peer memory access...
> >> > Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU1) : Yes
> >> > Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU0) : Yes
> >> Enabling peer access between GPU0 and GPU1...
> >> Checking GPU0 and GPU1 for UVA capabilities...
> >> > Tesla K80 (GPU0) supports UVA: Yes
> >> > Tesla K80 (GPU1) supports UVA: Yes
> >> Both GPUs can support UVA, enabling...
> >> Allocating buffers (64MB on GPU0, GPU1 and CPU Host)...
> >> Creating event handles...
> >> cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 1.11GB/s
> >> Preparing host buffer and memcpy to GPU0...
> >> Run kernel on GPU1, taking source data from GPU0 and writing to GPU1...
> >> Run kernel on GPU0, taking source data from GPU1 and writing to GPU0...
> >> Copy data back to host from GPU0 and verify results...
> >> Verification error @ element 0: val = nan, ref = 0.000000
> >> Verification error @ element 1: val = nan, ref = 4.000000
> >> Verification error @ element 2: val = nan, ref = 8.000000
> >> Verification error @ element 3: val = nan, ref = 12.000000
> >> Verification error @ element 4: val = nan, ref = 16.000000
> >> Verification error @ element 5: val = nan, ref = 20.000000
> >> Verification error @ element 6: val = nan, ref = 24.000000
> >> Verification error @ element 7: val = nan, ref = 28.000000
> >> Verification error @ element 8: val = nan, ref = 32.000000
> >> Verification error @ element 9: val = nan, ref = 36.000000
> >> Verification error @ element 10: val = nan, ref = 40.000000
> >> Verification error @ element 11: val = nan, ref = 44.000000
> >> Disabling peer access...
> >> Shutting down...
> >> Test failed!
> >>
> >>
> >> With CUDA_VISIBLE_DEVICES=2,3
> >>
> >> [./simpleP2P] - Starting...
> >> Checking for multiple GPUs...
> >> CUDA error at simpleP2P.cu:63 code=38(cudaErrorNoDevice) "cudaGetDeviceCount(&gpu_n)"
> >>
> >>
> >> and with CUDA_VISIBLE_DEVICES=0,2
> >>
> >> CUDA-capable device count: 1
> >> Two or more GPUs with SM 2.0 or higher capability are required for ./simpleP2P.
> >> Waiving test.
> >>
> >>
> >> I'm guessing the last two tests fail because I have only one card with two K80 GPUs on it, so no devices 2 or 3. It seems like something's awry with the peer-to-peer communication between 0 and 1.
> >> Is it possible for them to be on different PCIe domains even though they are on the same physical card?
> >>
> >> This makes me wonder: If each PCIe slot is connected to one CPU, should
> this system either use only one CPU or have another K80 in the other PCIe
> slot that's connected to the other CPU?
> >>
> >> If it helps, nvidia-smi topo -m shows:
> >>
> >>        GPU0   GPU1   CPU Affinity
> >> GPU0    X     PIX    0-7,16-23
> >> GPU1   PIX     X     0-7,16-23
> >>
> >>
> >> Thanks again,
> >>
> >> Steve
> >>
> >> On Thu, Aug 11, 2016 at 11:17 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
> >> Hi Steve,
> >>
> >> I suspect your hardware is misconfigured. Can you run a couple of tests, please?
> >>
> >> With CUDA_VISIBLE_DEVICES unset
> >>
> >> 1) As root run: lspci -d "10b5:*" -vvv | grep ACSCtl
> >>
> >> and post the output here.
> >>
> >> 2) Compile the CUDA samples installed as part of CUDA 7.5 and then run
> the following:
> >>
> >> unset CUDA_VISIBLE_DEVICES
> >> ./simpleP2P
> >>
> >> export CUDA_VISIBLE_DEVICES=0,1
> >> ./simpleP2P
> >>
> >> export CUDA_VISIBLE_DEVICES=2,3
> >> ./simpleP2P
> >>
> >> export CUDA_VISIBLE_DEVICES=0,2
> >> ./simpleP2P
> >>
> >> And post the results here.
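> >>
> >> If it is easier, the four runs can be scripted in one pass, e.g. (an untested sketch; the path assumes the default CUDA 7.5 samples location and may differ on your machine):
> >>
> >> cd ~/NVIDIA_CUDA-7.5_Samples/0_Simple/simpleP2P
> >> for devs in unset 0,1 2,3 0,2; do
> >>   if [ "$devs" = "unset" ]; then unset CUDA_VISIBLE_DEVICES; else export CUDA_VISIBLE_DEVICES="$devs"; fi
> >>   echo "=== CUDA_VISIBLE_DEVICES=$devs ==="
> >>   ./simpleP2P
> >> done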
> >>
> >> My suspicion is that your two K80s are on different PCI-E domains connected to different CPU sockets BUT your BIOS is misconfigured such that it is incorrectly reporting that the two K80s can talk to each other via P2P.
> >> Thus the first two simpleP2P runs above should pass. The last one will likely report that P2P is possible, but the bandwidth will be very low and it will ultimately fail the test because the array received by GPU 2 will be garbage.
> >>
> >> If my suspicions are correct, you would find the following behavior with AMBER:
> >>
> >> 4 x 1 GPU runs, one on each GPU, would be fine.
> >> (1 or 2) x 2 GPU runs will be fine if you use GPUs 0,1 and 2,3, but will fail if you were to use 0,2 - 0,3 - 1,2 or 1,3 (see the example below).
> >> 1 x 4 GPU runs will fail unless you restrict it to GPUs 0,1 or 2,3, and thus overload the GPUs.
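> >>
> >> As a concrete illustration of the 2-GPU case (a hypothetical sketch - substitute your own input files for mdin/prmtop/inpcrd), pinning a run to a P2P-capable pair would look like:
> >>
> >> export CUDA_VISIBLE_DEVICES=0,1
> >> mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i mdin -o mdout -p prmtop -c inpcrd -r restrt
> >>
> >> while the same command with CUDA_VISIBLE_DEVICES=0,2 would be expected to fail as described above.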
> >>
> >> P.S. nvidia-smi reporting 2 threads per MPI task is not an issue - it is to be expected.
> >>
> >> All the best
> >> Ross
> >>
> >>> On Aug 11, 2016, at 7:54 PM, Steven Ford <sford123.ibbr.umd.edu> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I'm still trying to figure out why the MPI CUDA tests are failing.
> >>>
> >>> If I run tests with DO_PARALLEL="mpirun -np 4" and limit CUDA_VISIBLE_DEVICES to only 0 or 1, all tests pass. I get the same behavior with OpenMPI 1.8, 1.10, 2.0, and MPICH 3.1.
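> >>>
> >>> For reference, the passing single-GPU runs were done roughly like this (a sketch of my setup, with the tree already configured with -cuda -mpi as below):
> >>>
> >>> export DO_PARALLEL="mpirun -np 4"
> >>> export CUDA_VISIBLE_DEVICES=0
> >>> cd $AMBERHOME && make test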
> >>>
> >>> I ran gpuP2PCheck just in case communication between the GPUs was the
> problem. It confirms that communication is working:
> >>>
> >>> CUDA-capable device count: 2
> >>> GPU0 " Tesla K80"
> >>> GPU1 " Tesla K80"
> >>>
> >>> Two way peer access between:
> >>> GPU0 and GPU1: YES
> >>>
> >>> If it's of any use, here is the output of nvidia-smi -q:
> >>>
> >>> ==============NVSMI LOG==============
> >>>
> >>> Timestamp : Thu Aug 11 22:42:34 2016
> >>> Driver Version : 352.93
> >>>
> >>> Attached GPUs : 2
> >>> GPU 0000:05:00.0
> >>> Product Name : Tesla K80
> >>> Product Brand : Tesla
> >>> Display Mode : Disabled
> >>> Display Active : Disabled
> >>> Persistence Mode : Disabled
> >>> Accounting Mode : Disabled
> >>> Accounting Mode Buffer Size : 1920
> >>> Driver Model
> >>> Current : N/A
> >>> Pending : N/A
> >>> Serial Number : 0325015055313
> >>> GPU UUID : GPU-a65eaa77-8871-ded5-b6ee-5268404192f1
> >>> Minor Number : 0
> >>> VBIOS Version : 80.21.1B.00.01
> >>> MultiGPU Board : Yes
> >>> Board ID : 0x300
> >>> Inforom Version
> >>> Image Version : 2080.0200.00.04
> >>> OEM Object : 1.1
> >>> ECC Object : 3.0
> >>> Power Management Object : N/A
> >>> GPU Operation Mode
> >>> Current : N/A
> >>> Pending : N/A
> >>> PCI
> >>> Bus : 0x05
> >>> Device : 0x00
> >>> Domain : 0x0000
> >>> Device Id : 0x102D10DE
> >>> Bus Id : 0000:05:00.0
> >>> Sub System Id : 0x106C10DE
> >>> GPU Link Info
> >>> PCIe Generation
> >>> Max : 3
> >>> Current : 3
> >>> Link Width
> >>> Max : 16x
> >>> Current : 16x
> >>> Bridge Chip
> >>> Type : PLX
> >>> Firmware : 0xF0472900
> >>> Replays since reset : 0
> >>> Tx Throughput : N/A
> >>> Rx Throughput : N/A
> >>> Fan Speed : N/A
> >>> Performance State : P0
> >>> Clocks Throttle Reasons
> >>> Idle : Not Active
> >>> Applications Clocks Setting : Active
> >>> SW Power Cap : Not Active
> >>> HW Slowdown : Not Active
> >>> Unknown : Not Active
> >>> FB Memory Usage
> >>> Total : 12287 MiB
> >>> Used : 56 MiB
> >>> Free : 12231 MiB
> >>> BAR1 Memory Usage
> >>> Total : 16384 MiB
> >>> Used : 2 MiB
> >>> Free : 16382 MiB
> >>> Compute Mode : Default
> >>> Utilization
> >>> Gpu : 0 %
> >>> Memory : 0 %
> >>> Encoder : 0 %
> >>> Decoder : 0 %
> >>> Ecc Mode
> >>> Current : Disabled
> >>> Pending : Disabled
> >>> ECC Errors
> >>> Volatile
> >>> Single Bit
> >>> Device Memory : N/A
> >>> Register File : N/A
> >>> L1 Cache : N/A
> >>> L2 Cache : N/A
> >>> Texture Memory : N/A
> >>> Total : N/A
> >>> Double Bit
> >>> Device Memory : N/A
> >>> Register File : N/A
> >>> L1 Cache : N/A
> >>> L2 Cache : N/A
> >>> Texture Memory : N/A
> >>> Total : N/A
> >>> Aggregate
> >>> Single Bit
> >>> Device Memory : N/A
> >>> Register File : N/A
> >>> L1 Cache : N/A
> >>> L2 Cache : N/A
> >>> Texture Memory : N/A
> >>> Total : N/A
> >>> Double Bit
> >>> Device Memory : N/A
> >>> Register File : N/A
> >>> L1 Cache : N/A
> >>> L2 Cache : N/A
> >>> Texture Memory : N/A
> >>> Total : N/A
> >>> Retired Pages
> >>> Single Bit ECC : 0
> >>> Double Bit ECC : 0
> >>> Pending : No
> >>> Temperature
> >>> GPU Current Temp : 31 C
> >>> GPU Shutdown Temp : 93 C
> >>> GPU Slowdown Temp : 88 C
> >>> Power Readings
> >>> Power Management : Supported
> >>> Power Draw : 59.20 W
> >>> Power Limit : 149.00 W
> >>> Default Power Limit : 149.00 W
> >>> Enforced Power Limit : 149.00 W
> >>> Min Power Limit : 100.00 W
> >>> Max Power Limit : 175.00 W
> >>> Clocks
> >>> Graphics : 562 MHz
> >>> SM : 562 MHz
> >>> Memory : 2505 MHz
> >>> Applications Clocks
> >>> Graphics : 562 MHz
> >>> Memory : 2505 MHz
> >>> Default Applications Clocks
> >>> Graphics : 562 MHz
> >>> Memory : 2505 MHz
> >>> Max Clocks
> >>> Graphics : 875 MHz
> >>> SM : 875 MHz
> >>> Memory : 2505 MHz
> >>> Clock Policy
> >>> Auto Boost : On
> >>> Auto Boost Default : On
> >>> Processes : None
> >>>
> >>> GPU 0000:06:00.0
> >>> Product Name : Tesla K80
> >>> Product Brand : Tesla
> >>> Display Mode : Disabled
> >>> Display Active : Disabled
> >>> Persistence Mode : Disabled
> >>> Accounting Mode : Disabled
> >>> Accounting Mode Buffer Size : 1920
> >>> Driver Model
> >>> Current : N/A
> >>> Pending : N/A
> >>> Serial Number : 0325015055313
> >>> GPU UUID : GPU-21c2be1c-72a9-1b68-adab-459d05dd7adc
> >>> Minor Number : 1
> >>> VBIOS Version : 80.21.1B.00.02
> >>> MultiGPU Board : Yes
> >>> Board ID : 0x300
> >>> Inforom Version
> >>> Image Version : 2080.0200.00.04
> >>> OEM Object : 1.1
> >>> ECC Object : 3.0
> >>> Power Management Object : N/A
> >>> GPU Operation Mode
> >>> Current : N/A
> >>> Pending : N/A
> >>> PCI
> >>> Bus : 0x06
> >>> Device : 0x00
> >>> Domain : 0x0000
> >>> Device Id : 0x102D10DE
> >>> Bus Id : 0000:06:00.0
> >>> Sub System Id : 0x106C10DE
> >>> GPU Link Info
> >>> PCIe Generation
> >>> Max : 3
> >>> Current : 3
> >>> Link Width
> >>> Max : 16x
> >>> Current : 16x
> >>> Bridge Chip
> >>> Type : PLX
> >>> Firmware : 0xF0472900
> >>> Replays since reset : 0
> >>> Tx Throughput : N/A
> >>> Rx Throughput : N/A
> >>> Fan Speed : N/A
> >>> Performance State : P0
> >>> Clocks Throttle Reasons
> >>> Idle : Not Active
> >>> Applications Clocks Setting : Active
> >>> SW Power Cap : Not Active
> >>> HW Slowdown : Not Active
> >>> Unknown : Not Active
> >>> FB Memory Usage
> >>> Total : 12287 MiB
> >>> Used : 56 MiB
> >>> Free : 12231 MiB
> >>> BAR1 Memory Usage
> >>> Total : 16384 MiB
> >>> Used : 2 MiB
> >>> Free : 16382 MiB
> >>> Compute Mode : Default
> >>> Utilization
> >>> Gpu : 0 %
> >>> Memory : 0 %
> >>> Encoder : 0 %
> >>> Decoder : 0 %
> >>> Ecc Mode
> >>> Current : Disabled
> >>> Pending : Disabled
> >>> ECC Errors
> >>> Volatile
> >>> Single Bit
> >>> Device Memory : N/A
> >>> Register File : N/A
> >>> L1 Cache : N/A
> >>> L2 Cache : N/A
> >>> Texture Memory : N/A
> >>> Total : N/A
> >>> Double Bit
> >>> Device Memory : N/A
> >>> Register File : N/A
> >>> L1 Cache : N/A
> >>> L2 Cache : N/A
> >>> Texture Memory : N/A
> >>> Total : N/A
> >>> Aggregate
> >>> Single Bit
> >>> Device Memory : N/A
> >>> Register File : N/A
> >>> L1 Cache : N/A
> >>> L2 Cache : N/A
> >>> Texture Memory : N/A
> >>> Total : N/A
> >>> Double Bit
> >>> Device Memory : N/A
> >>> Register File : N/A
> >>> L1 Cache : N/A
> >>> L2 Cache : N/A
> >>> Texture Memory : N/A
> >>> Total : N/A
> >>> Retired Pages
> >>> Single Bit ECC : 0
> >>> Double Bit ECC : 0
> >>> Pending : No
> >>> Temperature
> >>> GPU Current Temp : 24 C
> >>> GPU Shutdown Temp : 93 C
> >>> GPU Slowdown Temp : 88 C
> >>> Power Readings
> >>> Power Management : Supported
> >>> Power Draw : 70.89 W
> >>> Power Limit : 149.00 W
> >>> Default Power Limit : 149.00 W
> >>> Enforced Power Limit : 149.00 W
> >>> Min Power Limit : 100.00 W
> >>> Max Power Limit : 175.00 W
> >>> Clocks
> >>> Graphics : 562 MHz
> >>> SM : 562 MHz
> >>> Memory : 2505 MHz
> >>> Applications Clocks
> >>> Graphics : 562 MHz
> >>> Memory : 2505 MHz
> >>> Default Applications Clocks
> >>> Graphics : 562 MHz
> >>> Memory : 2505 MHz
> >>> Max Clocks
> >>> Graphics : 875 MHz
> >>> SM : 875 MHz
> >>> Memory : 2505 MHz
> >>> Clock Policy
> >>> Auto Boost : On
> >>> Auto Boost Default : On
> >>> Processes : None
> >>>
> >>>
> >>> If it matters, when I do the tests with DO_PARALLEL="mpirun -np 4", I
> see that each process is running a thread on both GPUs. For example:
> >>>
> >>> # gpu        pid  type    sm   mem   enc   dec   command
> >>> # Idx          #   C/G     %     %     %     %   name
> >>>     0      30599     C    24     0     0     0   pmemd.cuda_DPFP
> >>>     0      30600     C     0     0     0     0   pmemd.cuda_DPFP
> >>>     0      30601     C    11     0     0     0   pmemd.cuda_DPFP
> >>>     0      30602     C     0     0     0     0   pmemd.cuda_DPFP
> >>>     1      30599     C     0     0     0     0   pmemd.cuda_DPFP
> >>>     1      30600     C    36     0     0     0   pmemd.cuda_DPFP
> >>>     1      30601     C     0     0     0     0   pmemd.cuda_DPFP
> >>>     1      30602     C     6     0     0     0   pmemd.cuda_DPFP
> >>>
> >>> Is that expected behavior?
> >>>
> >>> Has anybody else had any problems using K80s with MPI and CUDA? Or
> using CentOS/RHEL 6?
> >>>
> >>> This machine does have dual CPUs; could that be a factor?
> >>>
> >>> I'm currently using AmberTools version 16.12 and Amber version 16.05.
> >>>
> >>> Any insight would be greatly appreciated.
> >>>
> >>> Thanks,
> >>>
> >>> Steve
> >>>
> >>>
> >>>
> >>> On Mon, Jul 25, 2016 at 3:06 PM, Steven Ford <sford123.ibbr.umd.edu> wrote:
> >>> Ross,
> >>>
> >>> This is CentOS version 6.7 with kernel version
> 2.6.32-573.22.1.el6.x86_64.
> >>>
> >>> The output of nvidia-smi is:
> >>>
> >>> +------------------------------------------------------+
> >>> | NVIDIA-SMI 352.79     Driver Version: 352.79         |
> >>> |-------------------------------+----------------------+----------------------+
> >>> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> >>> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> >>> |===============================+======================+======================|
> >>> |   0  Tesla K80           Off  | 0000:05:00.0     Off |                  Off |
> >>> | N/A   34C    P0    59W / 149W |     56MiB / 12287MiB |      0%      Default |
> >>> +-------------------------------+----------------------+----------------------+
> >>> |   1  Tesla K80           Off  | 0000:06:00.0     Off |                  Off |
> >>> | N/A   27C    P0    48W / 149W |     56MiB / 12287MiB |      0%      Default |
> >>> +-------------------------------+----------------------+----------------------+
> >>>
> >>> +-----------------------------------------------------------------------------+
> >>> | Processes:                                                       GPU Memory |
> >>> |  GPU       PID  Type  Process name                               Usage      |
> >>> |=============================================================================|
> >>> |  No running processes found                                                 |
> >>> +-----------------------------------------------------------------------------+
> >>>
> >>> The version of nvcc:
> >>>
> >>> nvcc: NVIDIA (R) Cuda compiler driver
> >>> Copyright (c) 2005-2015 NVIDIA Corporation
> >>> Built on Tue_Aug_11_14:27:32_CDT_2015
> >>> Cuda compilation tools, release 7.5, V7.5.17
> >>>
> >>> I used the GNU compilers, version 4.4.7.
> >>>
> >>> I am using OpenMPI version 1.8.1-5.el6 from the CentOS repository. I
> have not tried any other MPI installation.
> >>>
> >>> Output of mpif90 --showme:
> >>>
> >>> gfortran -I/usr/include/openmpi-x86_64 -pthread -I/usr/lib64/openmpi/lib -Wl,-rpath -Wl,/usr/lib64/openmpi/lib -Wl,--enable-new-dtags -L/usr/lib64/openmpi/lib -lmpi_usempi -lmpi_mpifh -lmpi
> >>>
> >>>
> >>> I set DO_PARALLEL to "mpirun -np 2"
> >>>
> >>> The parallel tests for the CPU were all successful.
> >>>
> >>> I had not run 'make clean' in between each step. I tried the tests
> again this morning after running 'make clean' and got the same result.
> >>>
> >>> I applied all patches this morning before testing again. I am using AmberTools 16.10 and Amber 16.04.
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> Steve
> >>>
> >>> On Sat, Jul 23, 2016 at 6:32 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
> >>> Hi Steven,
> >>>
> >>> This is a large number of very worrying failures. Something is definitely very wrong here and I'd like to investigate further. Can you give me some more details about your system, please? This includes:
> >>>
> >>> The specifics of what version of Linux you are using.
> >>>
> >>> The output of nvidia-smi
> >>>
> >>> nvcc -V (might be lower case v to get version info).
> >>>
> >>> Did you use the GNU compilers or the Intel compilers and in either
> case which version?
> >>>
> >>> OpenMPI - can you confirm the version again and also send me the
> output of mpif90 --showme (it might be --show or -show or something
> similar) - essentially I want to see what the underlying compilation line
> is.
> >>>
> >>> Can you confirm what you had $DO_PARALLEL set to when you ran make test for the parallel GPU build? Also, can you confirm whether the regular (CPU) parallel build passed the tests, please?
> >>>
> >>> Also, did you run 'make clean' before each build step? E.g.:
> >>>
> >>> ./configure -cuda gnu
> >>> make -j8 install
> >>> make test
> >>> make clean
> >>>
> >>> ./configure -cuda -mpi gnu
> >>> make -j8 install
> >>> make test
> >>>
> >>> Have you tried any other MPI installations? - E.g. MPICH?
> >>>
> >>> And finally can you please confirm which version of Amber (and
> AmberTools) this is and which patches have been applied?
> >>>
> >>> Thanks.
> >>>
> >>> All the best
> >>> Ross
> >>>
> >>>> On Jul 21, 2016, at 14:20, Steven Ford <sford123.ibbr.umd.edu> wrote:
> >>>>
> >>>> Ross,
> >>>>
> >>>> Attached are the log and diff files. Thank you for taking a look.
> >>>>
> >>>> Regards,
> >>>>
> >>>> Steve
> >>>>
> >>>> On Thu, Jul 21, 2016 at 5:34 AM, Ross Walker <ross.rosswalker.co.uk> wrote:
> >>>> Hi Steve,
> >>>>
> >>>> Indeed, that is too big a difference to be just rounding error. If those tests are using a Langevin or Andersen thermostat, that would explain it (different random number streams), although those tests are supposed to be skipped in parallel.
> >>>>
> >>>> Can you send me a copy of your .log and .dif files for the 2 GPU run directly, and I'll take a closer look at them.
> >>>>
> >>>> All the best
> >>>> Ross
> >>>>
> >>>> > On Jul 20, 2016, at 21:19, Steven Ford <sford123.ibbr.umd.edu> wrote:
> >>>> >
> >>>> > Hello All,
> >>>> >
> >>>> > I am currently trying to get Amber16 installed and running on our computing cluster. Our researchers are primarily interested in running the GPU-accelerated programs. For GPU computing jobs, we have one CentOS 6.7 node with a Tesla K80.
> >>>> >
> >>>> > I was able to build Amber16 and run the Serial/Parallel CPU plus
> the Serial
> >>>> > GPU tests with all file comparisons passing. However, only 5
> parallel GPU
> >>>> > tests succeeded, while the other 100 comparisons failed.
> >>>> >
> >>>> > Examining the diff file shows that some of the numbers are off only slightly, as the documentation said could happen. For example:
> >>>> >
> >>>> > 66c66
> >>>> > < NSTEP = 1 TIME(PS) = 50.002 TEMP(K) = 351.27 PRESS = 0.
> >>>> > > NSTEP = 1 TIME(PS) = 50.002 TEMP(K) = 353.29 PRESS = 0.
> >>>> >
> >>>> > This may also be too large to attribute to a rounding error, but it
> is a
> >>>> > small difference compared to others:
> >>>> >
> >>>> > 85c85
> >>>> > < Etot = -217.1552 EKtot = 238.6655 EPtot = -455.8207
> >>>> > > Etot = -1014.2562 EKtot = 244.6242 EPtot = -1258.8804
> >>>> >
> >>>> > This was built with CUDA 7.5, OpenMPI 1.8, and run with DO_PARALLEL="mpirun -np 2".
> >>>> >
> >>>> > Any idea what else could be affecting the output?
> >>>> >
> >>>> > Thanks,
> >>>> >
> >>>> > Steve
> >>>> >
> >>>> > --
> >>>> > Steven Ford
> >>>> > IT Infrastructure Specialist
> >>>> > Institute for Bioscience and Biotechnology Research
> >>>> > University of Maryland
> >>>> > (240)314-6405
> >>>> > _______________________________________________
> >>>> > AMBER mailing list
> >>>> > AMBER.ambermd.org
> >>>> > http://lists.ambermd.org/mailman/listinfo/amber
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> AMBER mailing list
> >>>> AMBER.ambermd.org
> >>>> http://lists.ambermd.org/mailman/listinfo/amber
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Steven Ford
> >>>> IT Infrastructure Specialist
> >>>> Institute for Bioscience and Biotechnology Research
> >>>> University of Maryland
> >>>> (240)314-6405
> >>>> <2016-07-20_11-17-52.diff><2016-07-20_11-17-52.log>
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Steven Ford
> >>> IT Infrastructure Specialist
> >>> Institute for Bioscience and Biotechnology Research
> >>> University of Maryland
> >>> (240)314-6405
> >>>
> >>>
> >>>
> >>> --
> >>> Steven Ford
> >>> IT Infrastructure Specialist
> >>> Institute for Bioscience and Biotechnology Research
> >>> University of Maryland
> >>> (240)314-6405
> >>
> >>
> >>
> >>
> >> --
> >> Steven Ford
> >> IT Infrastructure Specialist
> >> Institute for Bioscience and Biotechnology Research
> >> University of Maryland
> >> (240)314-6405
> >
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Aug 12 2016 - 10:00:03 PDT