Re: [AMBER] multinode pmemd.cuda.MPI jac9999 behavior

From: Scott Brozell <sbrozell.rci.rutgers.edu>
Date: Tue, 25 Apr 2017 13:48:19 -0400

On Tue, Apr 25, 2017 at 01:27:45PM -0400, David Case wrote:
> On Tue, Apr 25, 2017, Scott Brozell wrote:
> >
> > On point 1:
> > Perhaps it was not clear, but I showed both the small and large
> > test results. In other words, Intel gives 4 energies for small
> > and 2 energies for large; GNU gives 1 energy for small and
> > 1 energy for large.
> >
> > There have also been multiple experiments over the whole cluster
> > yielding the same exact energies.
> >
> > This certainly seems like good GPUs and some issue with the Intel
> > compilers. Perhaps it is time to contact our Intel colleagues
> > if we have no explanation.
>
> Agreed...but it may also be time to stop trying to support Intel compilers for
> pmemd.cuda. People will try that (to no benefit) simply because it is an
> available option and because they think the Intel compilers are "better".
>
> (There are a very small number of people who somehow have Intel compilers
> but not GNU compilers available, and for some reason cannot install the
> latter. On the other side of the equation, 99+% of the real-world testing
> of pmemd.cuda is done using the GNU compilers, and Intel introduces new
> bugs into their compiler every year, so it's a big headache to support
> this combination.)
>
> Have you tried putting the Intel compilers into debug mode (-O0), to see
> what happens?

A first step would be simply to put that 99% statement on the Amber GPU
web page. That alone would have been enough to get me to build the CUDA
stuff with GNU.
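
For what it's worth, the switch itself is small. A rough sketch, assuming
an Amber16-era source tree where configure takes the compiler family as
its last argument (exact flags and test targets vary by version):

    cd $AMBERHOME
    # serial CUDA build with the GNU toolchain
    ./configure -cuda gnu
    make install
    # MPI build (pmemd.cuda.MPI); needs an MPI stack built with the same compilers
    ./configure -cuda -mpi gnu
    make install
    # run the shipped tests for whichever build was configured last
    make test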

Using Intel was the path of least resistance because there was evidence
of superior performance for pmemd.MPI built with the Intel compilers.
And on a large cluster with many users, standardizing on one compiler
avoids issues with modulefiles, rpath, and installation scripts.

No, I have not tried -O0; I may.
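
If I do, the plan would be roughly the following, on the assumption that
the configure-generated config.h is where the optimization flags end up
(the variable names and default levels differ between Amber versions):

    # reconfigure the CUDA build with Intel, then drop the optimization level
    ./configure -cuda intel
    # inspect which flags configure actually wrote out
    grep -n -- '-O[123]' config.h
    # crude but effective: knock every optimization level down to -O0
    sed -i 's/-O[123]/-O0/g' config.h
    make clean && make install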

thanks,
scott

> > On Tue, Apr 25, 2017 at 09:09:14AM -0400, Daniel Roe wrote:
> > >
> > > My experience with the GPU validation test is that with Intel
> > > compilers I usually end up with final energies flipping between two
> > > different values. With GNU compilers and the same GPUs I only get one
> > > energy each time. This is why I only use GNU compilers for the CUDA
> > > stuff. If there is more variation than that (i.e., more than 2 values
> > > for Intel or more than 1 for GNU), that indicates a "bad" GPU.
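
For anyone repeating this kind of check, the procedure behind those
numbers is just repeated identical runs compared on the energies they
print; a sketch, with placeholder file names for the jac9999-style inputs:

    # run the same deterministic benchmark several times
    for i in $(seq 1 10); do
        $AMBERHOME/bin/pmemd.cuda -O -i mdin -p prmtop -c inpcrd \
            -o mdout.$i -r restrt.$i -x mdcrd.$i
    done
    # pull the last Etot line from each output and count the distinct values;
    # one distinct value is the expected result, several point at the
    # compiler or the GPU
    for i in $(seq 1 10); do grep 'Etot' mdout.$i | tail -1; done | sort | uniq -c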


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Apr 25 2017 - 11:00:03 PDT