Re: [AMBER] Amber 14 w/ CUDA - unclear "make test"-errors from Ross Walker on 2016-02-11 (Amber Archive Feb 2016)

From: Ross Walker <ross.rosswalker.co.uk>
Date: Thu, 11 Feb 2016 07:29:26 -0800

FYI, this will be fixed in AMBER 16.

For now, if you are concerned, you can build the DPFP model and test that:

./configure -cuda_DPFP gnu
make install
cd test
./test_amber_cuda.sh DPFP

The only difference here is the precision model, so you get less rounding error. The rest of the code is identical, so this gives you a valid test of whether things are working.

Ultimately, our test cases are way too complicated from the user's perspective. The test suite is really designed as a set of regression tests for those modifying the code, whereas what an end user needs is a test that just checks the compilation worked and exercises a reasonable range of options to look for obvious compiler bugs. Unfortunately, nobody has volunteered yet to split the testing in this way, so users run the very long and complicated regression suite, which can lead to confusion.

TL;DR: you are fine. The issue is rounding differences on different hardware. This is tricky to deal with in Newtonian integrators, but the AMBER 16 approach should be more robust.
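
If you want to convince yourself that two correctly aligned outputs differ only by rounding, something like the following rough sketch may help. It is not part of the AMBER test suite and is not how the suite's own comparison works; the function name numdiff_tol and the default relative tolerance of 1e-4 are assumptions made for illustration. It compares two files line by line, lets numeric fields differ within the tolerance, and prints the number of lines that really differ.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with AMBER): count lines that differ
# between two files once numeric fields are allowed a relative tolerance.
# usage: numdiff_tol FILE_A FILE_B [REL_TOL]
numdiff_tol() {
    awk -v tol="${3:-1e-4}" '
        function abs(x) { x = x + 0; return x < 0 ? -x : x }
        # True for plain or exponent-style decimal numbers.
        function isnum(s) {
            return s ~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/
        }
        # First file: remember every line.
        NR == FNR { ref[FNR] = $0; nref = FNR; next }
        # Second file: compare field by field against the stored line.
        {
            nb = FNR
            n = split(ref[FNR], a, " ")
            if (n != NF) { bad++; next }
            for (i = 1; i <= NF; i++) {
                if (a[i] == $i) continue
                if (isnum(a[i]) && isnum($i)) {
                    scale = abs(a[i]) > abs($i) ? abs(a[i]) : abs($i)
                    if (abs(a[i] - $i) <= tol * scale) continue
                }
                bad++
                break
            }
        }
        END {
            # A different line count is itself a real difference.
            if (nb + 0 != nref + 0) bad++
            print bad + 0
            exit (bad > 0)
        }
    ' "$1" "$2"
}
```

Calling numdiff_tol run1.out run2.out 1e-4 (file names made up here) prints 0 and returns success when only rounding-level differences remain. It will not help with the cases where extra lines are inserted, since the files have to line up first.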

> On Feb 11, 2016, at 05:03, Jason Swails <jason.swails.gmail.com> wrote:
>
> On Thu, Feb 11, 2016 at 7:36 AM, Falko Jähnert <
> falko.jaehnert.biochemtech.uni-halle.de> wrote:
>
>> Dear Amberlings,
>>
>>
>>
>> First of all, thanks a lot for helping me out with my last problem „Howto cpptraj -
>> multiple trajin-commands in one line“. @Jean-Marc Billod: I did it your way
>> and this works just fine!
>>
>>
>>
>> Now I’ve got a little concern about the results of my installation of Amber
>> 14. The make test procedure at the parallel installation level (both with 2
>> and 4 threads) went through without a single error, not even rounding
>> differences. After that I compiled Amber 14 the usual way to get
>> CUDA support. Now make test produces some rounding errors, which are okay
>> (I hope), but also errors where lines are inserted in one of the compared
>> files (*.diff), which produces a lot of differences. If one compares the
>> numbers on the correctly aligned lines, then everything is fine (I hope;
>> again with some rounding errors). To make my problem clearer, I attached the
>> *log and *.diff files, shortened to display only the unclear diffs.
>>
>>
>>
>> Can I safely ignore these diffs? If not, could someone provide any
>> information on handling this problem?
>>
>
> This is a known deficiency in the CUDA testing infrastructure. All of the
> larger failures (i.e., that are not round-off) arise from stochastic
> methods (ntt=2 or ntt=3) where the random number stream is different on
> every GPU.
>
> While there is a way to fix it (and it is on the to-do list), it apparently
> hasn't been important enough to make it to the top yet.
>
> HTH,
> Jason
> 
> --
> Jason M. Swails
> BioMaPS,
> Rutgers University
> Postdoctoral Researcher
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber

Received on Thu Feb 11 2016 - 08:00:06 PST