Hi Steven,
Ah, I thought you meant you had 4 GPUs, as in two K80s, rather than a single K80 card that contains two GPUs.
Either way, this shows your hardware is incorrectly configured or has a buggy BIOS. Who makes it? You probably need to go back to them and get an updated BIOS that properly handles peer-to-peer communication.
You could also check with the motherboard manufacturer to see whether they have an up-to-date BIOS that fixes this bug.
All of the ACSCtl flags reported by lspci should have a minus ('-') after them if the BIOS is configured correctly.
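If it helps, here is a minimal sketch of a check you can script up (it assumes the bridges are the PLX parts with vendor ID 10b5 from your lspci output, and that it is run as root so the extended capabilities are visible):

# List the ACS control flags for every PLX bridge; any flag ending in '+'
# means that bridge is still redirecting/blocking peer-to-peer traffic.
for dev in $(lspci -d "10b5:*" | awk '{print $1}'); do
    echo "Bridge $dev:"
    lspci -s "$dev" -vvv | grep ACSCtl
done
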
All the best
Ross
> On Aug 11, 2016, at 9:21 PM, Steven Ford <sford123.ibbr.umd.edu> wrote:
>
> Ross,
>
> The output of lspci -d "10b5:*" -vvv | grep ACSCtl is:
>
> ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
> ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
>
>
> With CUDA_VISIBLE_DEVICES unset:
>
> [./simpleP2P] - Starting...
> Checking for multiple GPUs...
> CUDA-capable device count: 2
> > GPU0 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> > GPU1 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
>
> Checking GPU(s) for support of peer to peer memory access...
> > Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU1) : Yes
> > Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU0) : Yes
> Enabling peer access between GPU0 and GPU1...
> Checking GPU0 and GPU1 for UVA capabilities...
> > Tesla K80 (GPU0) supports UVA: Yes
> > Tesla K80 (GPU1) supports UVA: Yes
> Both GPUs can support UVA, enabling...
> Allocating buffers (64MB on GPU0, GPU1 and CPU Host)...
> Creating event handles...
> cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 1.11GB/s
> Preparing host buffer and memcpy to GPU0...
> Run kernel on GPU1, taking source data from GPU0 and writing to GPU1...
> Run kernel on GPU0, taking source data from GPU1 and writing to GPU0...
> Copy data back to host from GPU0 and verify results...
> Verification error @ element 0: val = nan, ref = 0.000000
> Verification error @ element 1: val = nan, ref = 4.000000
> Verification error @ element 2: val = nan, ref = 8.000000
> Verification error @ element 3: val = nan, ref = 12.000000
> Verification error @ element 4: val = nan, ref = 16.000000
> Verification error @ element 5: val = nan, ref = 20.000000
> Verification error @ element 6: val = nan, ref = 24.000000
> Verification error @ element 7: val = nan, ref = 28.000000
> Verification error @ element 8: val = nan, ref = 32.000000
> Verification error @ element 9: val = nan, ref = 36.000000
> Verification error @ element 10: val = nan, ref = 40.000000
> Verification error @ element 11: val = nan, ref = 44.000000
> Disabling peer access...
> Shutting down...
> Test failed!
>
> With CUDA_VISIBLE_DEVICES=0,1:
>
> [./simpleP2P] - Starting...
> Checking for multiple GPUs...
> CUDA-capable device count: 2
> > GPU0 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> > GPU1 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
>
> Checking GPU(s) for support of peer to peer memory access...
> > Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU1) : Yes
> > Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU0) : Yes
> Enabling peer access between GPU0 and GPU1...
> Checking GPU0 and GPU1 for UVA capabilities...
> > Tesla K80 (GPU0) supports UVA: Yes
> > Tesla K80 (GPU1) supports UVA: Yes
> Both GPUs can support UVA, enabling...
> Allocating buffers (64MB on GPU0, GPU1 and CPU Host)...
> Creating event handles...
> cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 1.11GB/s
> Preparing host buffer and memcpy to GPU0...
> Run kernel on GPU1, taking source data from GPU0 and writing to GPU1...
> Run kernel on GPU0, taking source data from GPU1 and writing to GPU0...
> Copy data back to host from GPU0 and verify results...
> Verification error @ element 0: val = nan, ref = 0.000000
> Verification error @ element 1: val = nan, ref = 4.000000
> Verification error @ element 2: val = nan, ref = 8.000000
> Verification error @ element 3: val = nan, ref = 12.000000
> Verification error @ element 4: val = nan, ref = 16.000000
> Verification error @ element 5: val = nan, ref = 20.000000
> Verification error @ element 6: val = nan, ref = 24.000000
> Verification error @ element 7: val = nan, ref = 28.000000
> Verification error @ element 8: val = nan, ref = 32.000000
> Verification error @ element 9: val = nan, ref = 36.000000
> Verification error @ element 10: val = nan, ref = 40.000000
> Verification error @ element 11: val = nan, ref = 44.000000
> Disabling peer access...
> Shutting down...
> Test failed!
>
>
> With CUDA_VISIBLE_DEVICES=2,3:
>
> [./simpleP2P] - Starting...
> Checking for multiple GPUs...
> CUDA error at simpleP2P.cu:63 code=38(cudaErrorNoDevice) "cudaGetDeviceCount(&gpu_n)"
>
>
> and with CUDA_VISIBLE_DEVICES=0,2:
>
> CUDA-capable device count: 1
> Two or more GPUs with SM 2.0 or higher capability are required for ./simpleP2P.
> Waiving test.
>
>
> I'm guessing the last two tests fail because I have only one card with two K80 GPUs on it, so there are no devices 2 or 3. It seems like something is awry with the peer-to-peer communication between 0 and 1. Is it possible for them to be on different PCIe domains even though they are on the same physical card?
>
> This makes me wonder: If each PCIe slot is connected to one CPU, should this system either use only one CPU or have another K80 in the other PCIe slot that's connected to the other CPU?
>
> If it helps, nvidia-smi topo -m shows:
>
> GPU0 GPU1 CPU Affinity
> GPU0 X PIX 0-7,16-23
> GPU1 PIX X 0-7,16-23
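>
> In case it is useful, a quick way to double-check that placement (a sketch, assuming stock pciutils and the standard NVIDIA driver utilities) would be:
>
> # Show the PCI tree; both K80 GPUs should hang off the same PLX switch.
> lspci -tv | grep -i -e plx -e nvidia
> # Report the PCI domain and bus ID per GPU; both should be in domain 0x0000.
> nvidia-smi --query-gpu=index,pci.domain,pci.bus_id --format=csv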
>
>
> Thanks again,
>
> Steve
>
> On Thu, Aug 11, 2016 at 11:17 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
> Hi Steve,
>
> I suspect your hardware is misconfigured. Can you run a couple of tests, please?
>
> With CUDA_VISIBLE_DEVICES unset
>
> 1) As root run: lspci -d "10b5:*" -vvv | grep ACSCtl
>
> and post the output here.
>
> 2) Compile the CUDA samples installed as part of CUDA 7.5 and then run the following:
>
> unset CUDA_VISIBLE_DEVICES
> ./simpleP2P
>
> export CUDA_VISIBLE_DEVICES=0,1
> ./simpleP2P
>
> export CUDA_VISIBLE_DEVICES=2,3
> ./simpleP2P
>
> export CUDA_VISIBLE_DEVICES=0,2
> ./simpleP2P
>
> And post the results here.
>
> My suspicion is that your two K80s are on different PCI-E domains, connected to different CPU sockets, but your BIOS is misconfigured such that it incorrectly reports that the two K80s can talk to each other via P2P. Thus the first two simpleP2P runs above should pass. The last one will likely report that P2P is possible, but the bandwidth will be very low and it will ultimately fail the test because the array received by GPU 2 will be garbage.
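>
> If it helps, an independent check (just a sketch - the sample path below is hypothetical, so adjust it to wherever your CUDA 7.5 samples were copied, and it only applies if your samples include p2pBandwidthLatencyTest) is to compare the P2P-disabled and P2P-enabled bandwidth matrices and to look for ACS/IOMMU messages in the kernel log:
>
> # Hypothetical sample location - adjust to your own copy of the CUDA samples.
> cd ~/NVIDIA_CUDA-7.5_Samples/1_Utilities/p2pBandwidthLatencyTest && make
> ./p2pBandwidthLatencyTest          # P2P-enabled bandwidth no better than disabled (or garbage results) points to the same problem
> dmesg | grep -i -e acs -e iommu    # kernel messages about peer traffic being redirected or blocked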
>
> If my suspicions are correct, you would find the following behavior with AMBER:
>
> 4 x 1 GPU runs, one on each GPU, would be fine.
> (1 or 2) x 2 GPU runs will be fine if you use GPUs 0,1 and 2,3, but will fail if you were to use 0,2; 0,3; 1,2; or 1,3.
> 1 x 4 GPU runs will fail unless you restrict them to GPUs 0,1 or 2,3 and thus overload the GPUs.
>
> P.S. nvidia-smi reporting 2 threads per MPI task is not an issue - it is to be expected.
>
> All the best
> Ross
>
>> On Aug 11, 2016, at 7:54 PM, Steven Ford <sford123.ibbr.umd.edu> wrote:
>>
>> Hello,
>>
>> I'm still trying to figure out why the MPI CUDA tests are failing.
>>
>> If I run tests with DO_PARALLEL="mpirun -np 4" and limit CUDA_VISIBLE_DEVICES to only 0 or 1, all tests pass. I get the same behavior with OpenMPI 1.8, 1.10, and 2.0, and with MPICH 3.1.
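>>
>> For reference, the passing single-GPU runs were along these lines (a sketch; $AMBERHOME is assumed to point at the Amber 16 install):
>>
>> # Run the parallel GPU test suite with only one device visible.
>> export DO_PARALLEL="mpirun -np 4"
>> export CUDA_VISIBLE_DEVICES=0    # also passes with =1; only fails when both 0 and 1 are visible
>> cd $AMBERHOME && make test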
>>
>> I ran gpuP2PCheck just in case communication between the GPUs was the problem. It confirms that communication is working:
>>
>> CUDA-capable device count: 2
>> GPU0 " Tesla K80"
>> GPU1 " Tesla K80"
>>
>> Two way peer access between:
>> GPU0 and GPU1: YES
>>
>> If it's of any use, here is the output of nvidia-smi -q:
>>
>> ==============NVSMI LOG==============
>>
>> Timestamp : Thu Aug 11 22:42:34 2016
>> Driver Version : 352.93
>>
>> Attached GPUs : 2
>> GPU 0000:05:00.0
>> Product Name : Tesla K80
>> Product Brand : Tesla
>> Display Mode : Disabled
>> Display Active : Disabled
>> Persistence Mode : Disabled
>> Accounting Mode : Disabled
>> Accounting Mode Buffer Size : 1920
>> Driver Model
>> Current : N/A
>> Pending : N/A
>> Serial Number : 0325015055313
>> GPU UUID : GPU-a65eaa77-8871-ded5-b6ee-5268404192f1
>> Minor Number : 0
>> VBIOS Version : 80.21.1B.00.01
>> MultiGPU Board : Yes
>> Board ID : 0x300
>> Inforom Version
>> Image Version : 2080.0200.00.04
>> OEM Object : 1.1
>> ECC Object : 3.0
>> Power Management Object : N/A
>> GPU Operation Mode
>> Current : N/A
>> Pending : N/A
>> PCI
>> Bus : 0x05
>> Device : 0x00
>> Domain : 0x0000
>> Device Id : 0x102D10DE
>> Bus Id : 0000:05:00.0
>> Sub System Id : 0x106C10DE
>> GPU Link Info
>> PCIe Generation
>> Max : 3
>> Current : 3
>> Link Width
>> Max : 16x
>> Current : 16x
>> Bridge Chip
>> Type : PLX
>> Firmware : 0xF0472900
>> Replays since reset : 0
>> Tx Throughput : N/A
>> Rx Throughput : N/A
>> Fan Speed : N/A
>> Performance State : P0
>> Clocks Throttle Reasons
>> Idle : Not Active
>> Applications Clocks Setting : Active
>> SW Power Cap : Not Active
>> HW Slowdown : Not Active
>> Unknown : Not Active
>> FB Memory Usage
>> Total : 12287 MiB
>> Used : 56 MiB
>> Free : 12231 MiB
>> BAR1 Memory Usage
>> Total : 16384 MiB
>> Used : 2 MiB
>> Free : 16382 MiB
>> Compute Mode : Default
>> Utilization
>> Gpu : 0 %
>> Memory : 0 %
>> Encoder : 0 %
>> Decoder : 0 %
>> Ecc Mode
>> Current : Disabled
>> Pending : Disabled
>> ECC Errors
>> Volatile
>> Single Bit
>> Device Memory : N/A
>> Register File : N/A
>> L1 Cache : N/A
>> L2 Cache : N/A
>> Texture Memory : N/A
>> Total : N/A
>> Double Bit
>> Device Memory : N/A
>> Register File : N/A
>> L1 Cache : N/A
>> L2 Cache : N/A
>> Texture Memory : N/A
>> Total : N/A
>> Aggregate
>> Single Bit
>> Device Memory : N/A
>> Register File : N/A
>> L1 Cache : N/A
>> L2 Cache : N/A
>> Texture Memory : N/A
>> Total : N/A
>> Double Bit
>> Device Memory : N/A
>> Register File : N/A
>> L1 Cache : N/A
>> L2 Cache : N/A
>> Texture Memory : N/A
>> Total : N/A
>> Retired Pages
>> Single Bit ECC : 0
>> Double Bit ECC : 0
>> Pending : No
>> Temperature
>> GPU Current Temp : 31 C
>> GPU Shutdown Temp : 93 C
>> GPU Slowdown Temp : 88 C
>> Power Readings
>> Power Management : Supported
>> Power Draw : 59.20 W
>> Power Limit : 149.00 W
>> Default Power Limit : 149.00 W
>> Enforced Power Limit : 149.00 W
>> Min Power Limit : 100.00 W
>> Max Power Limit : 175.00 W
>> Clocks
>> Graphics : 562 MHz
>> SM : 562 MHz
>> Memory : 2505 MHz
>> Applications Clocks
>> Graphics : 562 MHz
>> Memory : 2505 MHz
>> Default Applications Clocks
>> Graphics : 562 MHz
>> Memory : 2505 MHz
>> Max Clocks
>> Graphics : 875 MHz
>> SM : 875 MHz
>> Memory : 2505 MHz
>> Clock Policy
>> Auto Boost : On
>> Auto Boost Default : On
>> Processes : None
>>
>> GPU 0000:06:00.0
>> Product Name : Tesla K80
>> Product Brand : Tesla
>> Display Mode : Disabled
>> Display Active : Disabled
>> Persistence Mode : Disabled
>> Accounting Mode : Disabled
>> Accounting Mode Buffer Size : 1920
>> Driver Model
>> Current : N/A
>> Pending : N/A
>> Serial Number : 0325015055313
>> GPU UUID : GPU-21c2be1c-72a9-1b68-adab-459d05dd7adc
>> Minor Number : 1
>> VBIOS Version : 80.21.1B.00.02
>> MultiGPU Board : Yes
>> Board ID : 0x300
>> Inforom Version
>> Image Version : 2080.0200.00.04
>> OEM Object : 1.1
>> ECC Object : 3.0
>> Power Management Object : N/A
>> GPU Operation Mode
>> Current : N/A
>> Pending : N/A
>> PCI
>> Bus : 0x06
>> Device : 0x00
>> Domain : 0x0000
>> Device Id : 0x102D10DE
>> Bus Id : 0000:06:00.0
>> Sub System Id : 0x106C10DE
>> GPU Link Info
>> PCIe Generation
>> Max : 3
>> Current : 3
>> Link Width
>> Max : 16x
>> Current : 16x
>> Bridge Chip
>> Type : PLX
>> Firmware : 0xF0472900
>> Replays since reset : 0
>> Tx Throughput : N/A
>> Rx Throughput : N/A
>> Fan Speed : N/A
>> Performance State : P0
>> Clocks Throttle Reasons
>> Idle : Not Active
>> Applications Clocks Setting : Active
>> SW Power Cap : Not Active
>> HW Slowdown : Not Active
>> Unknown : Not Active
>> FB Memory Usage
>> Total : 12287 MiB
>> Used : 56 MiB
>> Free : 12231 MiB
>> BAR1 Memory Usage
>> Total : 16384 MiB
>> Used : 2 MiB
>> Free : 16382 MiB
>> Compute Mode : Default
>> Utilization
>> Gpu : 0 %
>> Memory : 0 %
>> Encoder : 0 %
>> Decoder : 0 %
>> Ecc Mode
>> Current : Disabled
>> Pending : Disabled
>> ECC Errors
>> Volatile
>> Single Bit
>> Device Memory : N/A
>> Register File : N/A
>> L1 Cache : N/A
>> L2 Cache : N/A
>> Texture Memory : N/A
>> Total : N/A
>> Double Bit
>> Device Memory : N/A
>> Register File : N/A
>> L1 Cache : N/A
>> L2 Cache : N/A
>> Texture Memory : N/A
>> Total : N/A
>> Aggregate
>> Single Bit
>> Device Memory : N/A
>> Register File : N/A
>> L1 Cache : N/A
>> L2 Cache : N/A
>> Texture Memory : N/A
>> Total : N/A
>> Double Bit
>> Device Memory : N/A
>> Register File : N/A
>> L1 Cache : N/A
>> L2 Cache : N/A
>> Texture Memory : N/A
>> Total : N/A
>> Retired Pages
>> Single Bit ECC : 0
>> Double Bit ECC : 0
>> Pending : No
>> Temperature
>> GPU Current Temp : 24 C
>> GPU Shutdown Temp : 93 C
>> GPU Slowdown Temp : 88 C
>> Power Readings
>> Power Management : Supported
>> Power Draw : 70.89 W
>> Power Limit : 149.00 W
>> Default Power Limit : 149.00 W
>> Enforced Power Limit : 149.00 W
>> Min Power Limit : 100.00 W
>> Max Power Limit : 175.00 W
>> Clocks
>> Graphics : 562 MHz
>> SM : 562 MHz
>> Memory : 2505 MHz
>> Applications Clocks
>> Graphics : 562 MHz
>> Memory : 2505 MHz
>> Default Applications Clocks
>> Graphics : 562 MHz
>> Memory : 2505 MHz
>> Max Clocks
>> Graphics : 875 MHz
>> SM : 875 MHz
>> Memory : 2505 MHz
>> Clock Policy
>> Auto Boost : On
>> Auto Boost Default : On
>> Processes : None
>>
>>
>> If it matters, when I do the tests with DO_PARALLEL="mpirun -np 4", I see that each process is running a thread on both GPUs. For example:
>>
>> # gpu pid type sm mem enc dec command
>> # Idx # C/G % % % % name
>> 0 30599 C 24 0 0 0 pmemd.cuda_DPFP
>> 0 30600 C 0 0 0 0 pmemd.cuda_DPFP
>> 0 30601 C 11 0 0 0 pmemd.cuda_DPFP
>> 0 30602 C 0 0 0 0 pmemd.cuda_DPFP
>> 1 30599 C 0 0 0 0 pmemd.cuda_DPFP
>> 1 30600 C 36 0 0 0 pmemd.cuda_DPFP
>> 1 30601 C 0 0 0 0 pmemd.cuda_DPFP
>> 1 30602 C 6 0 0 0 pmemd.cuda_DPFP
>>
>> Is that expected behavior?
>>
>> Has anybody else had any problems using K80s with MPI and CUDA? Or using CentOS/RHEL 6?
>>
>> This machine does have dual CPUs; could that be a factor?
>>
>> I'm currently using AmberTools version 16.12 and Amber version 16.05.
>>
>> Any insight would be greatly appreciated.
>>
>> Thanks,
>>
>> Steve
>>
>>
>>
>> On Mon, Jul 25, 2016 at 3:06 PM, Steven Ford <sford123.ibbr.umd.edu> wrote:
>> Ross,
>>
>> This is CentOS version 6.7 with kernel version 2.6.32-573.22.1.el6.x86_64.
>>
>> The output of nvidia-smi is:
>>
>> +------------------------------------------------------+
>> | NVIDIA-SMI 352.79 Driver Version: 352.79 |
>> |-------------------------------+----------------------+----------------------+
>> | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
>> | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
>> |===============================+======================+======================|
>> | 0 Tesla K80 Off | 0000:05:00.0 Off | Off |
>> | N/A 34C P0 59W / 149W | 56MiB / 12287MiB | 0% Default |
>> +-------------------------------+----------------------+----------------------+
>> | 1 Tesla K80 Off | 0000:06:00.0 Off | Off |
>> | N/A 27C P0 48W / 149W | 56MiB / 12287MiB | 0% Default |
>> +-------------------------------+----------------------+----------------------+
>>
>> +-----------------------------------------------------------------------------+
>> | Processes: GPU Memory |
>> | GPU PID Type Process name Usage |
>> |=============================================================================|
>> | No running processes found |
>> +-----------------------------------------------------------------------------+
>>
>> The version of nvcc:
>>
>> nvcc: NVIDIA (R) Cuda compiler driver
>> Copyright (c) 2005-2015 NVIDIA Corporation
>> Built on Tue_Aug_11_14:27:32_CDT_2015
>> Cuda compilation tools, release 7.5, V7.5.17
>>
>> I used the GNU compilers, version 4.4.7.
>>
>> I am using OpenMPI version 1.8.1-5.el6 from the CentOS repository. I have not tried any other MPI installation.
>>
>> Output of mpif90 --showme:
>>
>> gfortran -I/usr/include/openmpi-x86_64 -pthread -I/usr/lib64/openmpi/lib -Wl,-rpath -Wl,/usr/lib64/openmpi/lib -Wl,--enable-new-dtags -L/usr/lib64/openmpi/lib -lmpi_usempi -lmpi_mpifh -lmpi
>>
>>
>> I set DO_PARALLEL to "mpirun -np 2"
>>
>> The parallel tests for the CPU were all successful.
>>
>> I had not run 'make clean' in between each step. I tried the tests again this morning after running 'make clean' and got the same result.
>>
>> I applied all patches this morning before testing again. I am using AmberTools 16.10 and Amber 16.04.
>>
>>
>> Thanks,
>>
>> Steve
>>
>> On Sat, Jul 23, 2016 at 6:32 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
>> Hi Steven,
>>
>> This is a large number of very worrying failures. Something is definitely very wrong here and I'd like to investigate further. Can you give me some more details about your system, please? This includes:
>>
>> The specifics of what version of Linux you are using.
>>
>> The output of nvidia-smi
>>
>> nvcc -V (or nvcc --version) for the CUDA version info.
>>
>> Did you use the GNU compilers or the Intel compilers, and in either case, which version?
>>
>> OpenMPI - can you confirm the version again, and also send me the output of mpif90 --showme (it might be --show or -show or something similar)? Essentially I want to see what the underlying compilation line is.
>>
>> Can you confirm what you had $DO_PARALLEL set to when you ran 'make test' for the parallel GPU build? Also, can you confirm whether the regular (CPU) parallel build passed the tests?
>>
>> Also did you run 'make clean' before each build step? E.g.
>>
>> ./configure -cuda gnu
>> make -j8 install
>> make test
>> make clean
>>
>> ./configure -cuda -mpi gnu
>> make -j8 install
>> make test
>>
>> Have you tried any other MPI installations, e.g. MPICH?
>>
>> And finally can you please confirm which version of Amber (and AmberTools) this is and which patches have been applied?
>>
>> Thanks.
>>
>> All the best
>> Ross
>>
>>> On Jul 21, 2016, at 14:20, Steven Ford <sford123.ibbr.umd.edu> wrote:
>>>
>>> Ross,
>>>
>>> Attached are the log and diff files. Thank you for taking a look.
>>>
>>> Regards,
>>>
>>> Steve
>>>
>>> On Thu, Jul 21, 2016 at 5:34 AM, Ross Walker <ross.rosswalker.co.uk> wrote:
>>> Hi Steve,
>>>
>>> Indeed, that is too big a difference to be just rounding error. If those tests use a Langevin or Andersen thermostat, that would explain it (different random number streams), although those tests are supposed to be skipped in parallel.
>>>
>>> Can you send me your .log and .dif files for the 2-GPU run directly and I'll take a closer look at them.
>>>
>>> All the best
>>> Ross
>>>
>>> > On Jul 20, 2016, at 21:19, Steven Ford <sford123.ibbr.umd.edu> wrote:
>>> >
>>> > Hello All,
>>> >
>>> > I am currently trying to get Amber16 installed and running on our computing
>>> > cluster. Our researchers are primarily interested in running the GPU
>>> > accelerated programs. For GPU computing jobs, we have one CentOS 6.7 node
>>> > with a Tesla K80.
>>> >
>>> > I was able to build Amber16 and run the Serial/Parallel CPU plus the Serial
>>> > GPU tests with all file comparisons passing. However, only 5 parallel GPU
>>> > tests succeeded, while the other 100 comparisons failed.
>>> >
>>> > Examining the diff file shows that some of the numbers are not off by much,
>>> > as the documentation said could happen. For example:
>>> >
>>> > 66c66
>>> > < NSTEP = 1 TIME(PS) = 50.002 TEMP(K) = 351.27 PRESS = 0.
>>> >> NSTEP = 1 TIME(PS) = 50.002 TEMP(K) = 353.29 PRESS = 0.
>>> >
>>> > This may also be too large to attribute to a rounding error, but it is a
>>> > small difference compared to others:
>>> >
>>> > 85c85
>>> > < Etot = -217.1552 EKtot = 238.6655 EPtot = -455.8207
>>> >> Etot = -1014.2562 EKtot = 244.6242 EPtot = -1258.8804
>>> >
>>> > This was built with CUDA 7.5, OpenMPI 1.8, and run with DO_PARALLEL="mpirun
>>> > -np 2"
>>> >
>>> > Any idea what else could be affecting the output?
>>> >
>>> > Thanks,
>>> >
>>> > Steve
>>> >
>>> > --
>>> > Steven Ford
>>> > IT Infrastructure Specialist
>>> > Institute for Bioscience and Biotechnology Research
>>> > University of Maryland
>>> > (240)314-6405
>>> > _______________________________________________
>>> > AMBER mailing list
>>> > AMBER.ambermd.org
>>> > http://lists.ambermd.org/mailman/listinfo/amber
>>>
>>>
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>
>>>
>>>
>>> --
>>> Steven Ford
>>> IT Infrastructure Specialist
>>> Institute for Bioscience and Biotechnology Research
>>> University of Maryland
>>> (240)314-6405
>>> <2016-07-20_11-17-52.diff><2016-07-20_11-17-52.log>
>>
>>
>>
>>
>> --
>> Steven Ford
>> IT Infrastructure Specialist
>> Institute for Bioscience and Biotechnology Research
>> University of Maryland
>> (240)314-6405
>>
>>
>>
>> --
>> Steven Ford
>> IT Infrastructure Specialist
>> Institute for Bioscience and Biotechnology Research
>> University of Maryland
>> (240)314-6405
>
>
>
>
> --
> Steven Ford
> IT Infrastructure Specialist
> Institute for Bioscience and Biotechnology Research
> University of Maryland
> (240)314-6405
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber