Re: [AMBER] Issues compiling Amber with CUDA support

From: Ross Walker <ross.rosswalker.co.uk>
Date: Sat, 27 Aug 2016 15:53:28 -0700

Hi Joe,

I would say yes, except that this test uses vrand and the difference occurs at step 20. I bet if you compare the two output files themselves (rather than the diff, which doesn't tell you the whole story) you'll find that they match up to step 19, then the output says something along the lines of randomizing velocities (Andersen thermostat), and the difference appears only after that. That just means the random number stream differs between the two runs, most likely because the GPU and CUDA version Shawn used produce a different stream of random numbers from cuRAND than the test systems I used. Unfortunately this is a pain with anything that uses random number streams and is not easy to avoid. If such a difference occurred in the absence of random number use, or before the random number generator was called, then I'd be concerned.
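
A minimal sketch of that file-to-file comparison, in Python, assuming the mdout energy blocks look like the lines quoted in the diff below ("NSTEP = ...", "Etot = ... EKtot = ... EPtot = ..."). The function names and the tolerance are illustrative only and are not part of the Amber test suite. It reports the first step at which the two runs disagree, which can then be checked against where the velocities get randomized.

```python
import re

# Only the total, kinetic and potential energies are compared in this sketch.
FIELD_RE = re.compile(r"\b(Etot|EKtot|EPtot)\s*=\s*(-?\d+(?:\.\d*)?)")
STEP_RE = re.compile(r"NSTEP\s*=\s*(\d+)")


def parse_energies(mdout_text):
    """Map each NSTEP value to the energy fields printed in its block."""
    steps = {}
    current = None
    for line in mdout_text.splitlines():
        match = STEP_RE.search(line)
        if match:
            current = int(match.group(1))
            steps.setdefault(current, {})
        if current is not None:
            for name, value in FIELD_RE.findall(line):
                # Keep the first value seen for a step, so a later summary
                # block that repeats the step number does not overwrite it.
                steps[current].setdefault(name, float(value))
    return steps


def first_divergent_step(run_a, run_b, rel_tol=1e-4):
    """Return the first step whose energies differ by more than rel_tol."""
    for step in sorted(set(run_a) & set(run_b)):
        for name, a in run_a[step].items():
            b = run_b[step].get(name)
            if b is None:
                continue
            scale = max(abs(a), abs(b), 1e-12)
            if abs(a - b) / scale > rel_tol:
                return step
    return None
```

Read the saved reference mdout and the newly generated one into strings, pass each through parse_energies, and hand both results to first_divergent_step. If the returned step is the first one after the thermostat randomizes velocities, the mismatch is down to the random number stream.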

All the best
Ross

> On Aug 27, 2016, at 1:10 PM, Baker, Joseph <bakerj.tcnj.edu> wrote:
>
> Hi Ross,
>
> I just checked the diff logs that Shawn posted at his link (we are in the
> process of getting Amber installed on a new cluster at TCNJ) and some of
> the differences we are seeing are a bit bigger. For example, are the
> differences below large enough to be concerning? Thanks, Joe
>
> possible FAILURE: check mdout.vrand.dif
> /opt/tcnjhpc/amberdev/amber16/test/cuda/4096wat
>
> < NSTEP = 20 TIME(PS) = 1.020 TEMP(K) = 298.31 PRESS = 0.
>> NSTEP = 20 TIME(PS) = 1.020 TEMP(K) = 296.70 PRESS = 0.
> 212c212
>
> < Etot = -32126.2133 EKtot = 7283.3078 EPtot = -39409.5211
>> Etot = -32067.4187 EKtot = 7244.0343 EPtot = -39311.4530
> 214c214
>
> < 1-4 NB = 0. 1-4 EEL = 0. VDWAALS = 6013.9630
>> 1-4 NB = 0. 1-4 EEL = 0. VDWAALS = 6025.7161
> 215c215
>
> < EELEC = -45423.4842 EHBOND = 0. RESTRAINT = 0.
>> EELEC = -45337.1691 EHBOND = 0. RESTRAINT = 0.
>
>
>
> ------
> Joseph Baker, PhD
> Assistant Professor
> Department of Chemistry
> C101 Science Complex
> The College of New Jersey
> Ewing, NJ 08628
> Phone: (609) 771-3173
> Web: http://bakerj.pages.tcnj.edu/
>
>
> On Thu, Aug 25, 2016 at 7:30 AM, Ross Walker <ross.rosswalker.co.uk> wrote:
>
>> Hi Shawn,
>>
>> Assuming the differences are all similar to the cellulose example you show
>> below, where they differ only on a few lines and only by a very small amount,
>> they are innocuous and you can ignore them. I've never tested on a Quadro NVS
>> 510 GPU but I'd expect it to work fine, although it is possible some tests
>> may fail by running out of memory; it depends on how much is on the card.
>>
>> Other than that, though, the minor differences can be ignored.
>>
>> All the best
>> Ross
>>
>>> On Aug 24, 2016, at 14:47, Sivy, Shawn <ssivy.tcnj.edu> wrote:
>>>
>>> Hello,
>>>
>>> I’m hoping someone could help me with compiling the Amber software on our
>>> systems. I have an older model Intel Xeon 5500 server with an NVIDIA
>>> Quadro NVS 510 (GK107) GPU card in it. I’ve successfully compiled the
>>> serial and parallel versions of Amber using gcc 4.8.5. I’m trying to
>>> compile the serial CUDA version now. The “make install” builds without
>>> errors, but the “make test” gets “possible failures” on 9 comparisons.
>>> I’ve tried building with CUDA 7.5 and CUDA 8.0 resulting in the same
>>> failures. Does anyone have a suggestion on what to try next? I’m assuming
>>> “make test” needs to run without any errors. I suspect maybe the GPU I’m
>>> using doesn’t have the precision that a Tesla or recent GTX card has.
>>>
>>>
>>> Below is some sample output from the “make install” and the resulting log
>>> and diff files. The complete contents of the log and diff files can be
>>> found at https://www.tcnj.edu/~ssivy/amberlogs/
>>>
>>> Thanks in advance for any assistance.
>>>
>>> ...
>>>
>>> make[2]: Leaving directory `/opt/tcnjhpc/amberdev/amber16/test'
>>> 130 file comparisons passed
>>> 9 file comparisons failed
>>> 0 tests experienced errors
>>> Test log file saved as /opt/tcnjhpc/amberdev/amber16/logs/test_amber_cuda/2016-08-24_11-35-41.log
>>> Test diffs file saved as /opt/tcnjhpc/amberdev/amber16/logs/test_amber_cuda/2016-08-24_11-35-41.diff
>>> make[1]: Leaving directory `/opt/tcnjhpc/amberdev/amber16/test'
>>>
>>>
>>> Example from log file:
>>>
>>> ==============================================================
>>> cd cellulose/ && ./Run.cellulose_nvt DPFP /opt/tcnjhpc/amberdev/amber16/include/netcdf.mod
>>> diffing mdout.cellulose_nvt.GPU_DPFP with mdout.cellulose_nvt
>>> possible FAILURE: check mdout.cellulose_nvt.dif
>>> ==============================================================
>>>
>>> Example from diff file:
>>>
>>> possible FAILURE: check mdout.cellulose_nvt.dif
>>>
>>> /opt/tcnjhpc/amberdev/amber16/test/cuda/cellulose
>>>
>>> 212c212
>>>
>>> < Etot = 5.8651 EKtot = 273.2191 EPtot = 276.4278
>>>
>>> > Etot = 5.8650 EKtot = 273.2191 EPtot = 276.4278
>>>
>>> ### Maximum absolute error in matching lines = 1.00e-04 at line 212 field 3
>>>
>>> ### Maximum relative error in matching lines = 1.71e-05 at line 212 field 3
>>>
>>> ---------------------------------------
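
The error summary in the diff excerpt above can be reproduced by hand. The sketch below is a Python illustration, not the script the Amber test suite actually runs. It assumes that fields are whitespace-separated and counted from 1, and that the relative error is the absolute error divided by the larger of the two magnitudes. Both are assumptions about how the summary is computed. The function name is hypothetical.

```python
def largest_field_error(line_a, line_b):
    """Return (abs_err, rel_err, field) for the numeric field that differs most.

    Fields are whitespace-separated and numbered from 1. Non-numeric fields
    such as "Etot" and "=" are skipped. Returns (0.0, 0.0, None) when no
    numeric field differs.
    """
    best = (0.0, 0.0, None)
    pairs = zip(line_a.split(), line_b.split())
    for index, (raw_a, raw_b) in enumerate(pairs, start=1):
        try:
            a, b = float(raw_a), float(raw_b)
        except ValueError:
            continue  # a label, not a number
        abs_err = abs(a - b)
        scale = max(abs(a), abs(b))
        rel_err = abs_err / scale if scale else 0.0
        if abs_err > best[0]:
            best = (abs_err, rel_err, index)
    return best


reference = "Etot = 5.8651 EKtot = 273.2191 EPtot = 276.4278"
observed = "Etot = 5.8650 EKtot = 273.2191 EPtot = 276.4278"
abs_err, rel_err, field = largest_field_error(reference, observed)
print(f"abs = {abs_err:.2e}, rel = {rel_err:.2e}, field = {field}")
```

For the two Etot lines quoted above, the only differing value is the third whitespace-separated field, and the difference is in the last printed digit.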
>>>
>>>
>>> ------------------------------
>>> The College of New Jersey <http://tcnj.pages.tcnj.edu/>
>>> Shawn Sivy
>>> HPC System Administrator
>>> School of Science
>>> PO Box 7718 Ewing, NJ 08628-0718
>>> 609-771-3475
>>> ssivy.tcnj.edu <email.tcnj.edu>
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber


Received on Sat Aug 27 2016 - 16:00:03 PDT