Re: [AMBER] transition metal system works with pmemd, crashes with pmemd.cuda

From: Marc van der Kamp <marcvanderkamp.gmail.com>
Date: Wed, 5 Sep 2012 12:18:58 +0100

Hi again,

The old driver was being used by mistake (it had not been commented out). Now
295.41 is installed, but Patrick's problem persists, i.e. NaN values appear in
the energy output after a few steps. (With Amber 11's pmemd.cuda, the
simulation kept running even though it was producing NaN values.)
I assume that when you say "it runs at my end", there aren't any NaN
values?
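(As an aside, the first NaN in an mdout energy file is easy to locate with grep; a minimal sketch, where sample_mdout.txt is a hypothetical, abbreviated stand-in for a real output file:)

```shell
# Hypothetical, abbreviated mdout-style energy output for illustration only.
cat > sample_mdout.txt <<'EOF'
 NSTEP =      100   TIME(PS) =       0.200  TEMP(K) =   298.15
 Etot   =   -45021.3012  EKtot   =     9123.4567  EPtot  =   -54144.7579
 NSTEP =      200   TIME(PS) =       0.400  TEMP(K) =      NaN
 Etot   =          NaN  EKtot   =          NaN  EPtot  =          NaN
EOF

# Print the line number and content of the first NaN, so the step at which
# the run blew up is easy to find in a long output file.
grep -n -m 1 'NaN' sample_mdout.txt
```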

I'll ask about getting you an account on the machine in question. It is our
institution-wide (University of Bristol) HPC cluster (BlueCrystal), so I'm
not sure if that is going to be possible.

--Marc
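(The driver-version requirement discussed in this thread can also be checked mechanically. A minimal sketch using GNU `sort -V`; the version strings are the ones quoted in the thread, and on a real node the installed version would come from `nvidia-smi` or /proc/driver/nvidia/version rather than being hard-coded:)

```shell
MINIMUM=295.41        # minimum driver for the CUDA 4.2 toolkit, per the thread
INSTALLED=270.41.19   # hypothetical: the old driver mentioned earlier

# sort -V orders version strings numerically; if the installed version sorts
# first and differs from the minimum, it is too old.
if [ "$(printf '%s\n%s\n' "$INSTALLED" "$MINIMUM" | sort -V | head -n 1)" = "$INSTALLED" ] \
   && [ "$INSTALLED" != "$MINIMUM" ]; then
    echo "driver $INSTALLED is older than $MINIMUM: upgrade needed"
else
    echo "driver $INSTALLED satisfies the $MINIMUM minimum"
fi
```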

On 4 September 2012 22:34, Scott Le Grand <varelse2005.gmail.com> wrote:

> Well, can your sysadmin give me an account on the machine in question?
>
> Or please point him to http://developer.nvidia.com/cuda/cuda-downloads
>
> It indicates 295.41 as the latest driver...
>
> Your point about passing test.cuda is valid, but since it runs at my end,
> there's not much else I can suggest...
>
>
> On Tue, Sep 4, 2012 at 1:31 PM, Marc van der Kamp
> > <marcvanderkamp.gmail.com> wrote:
>
> > PS As pmemd.cuda passed essentially all of make test.cuda, I'd think
> > the driver shouldn't be the issue?
> > It is only Patrick's test cases with transition-metal cofactors that
> > are failing. So you can run these same test cases without problems?
> > --Marc
> >
> > On 4 September 2012 21:21, Marc van der Kamp <marcvanderkamp.gmail.com>
> > wrote:
> >
> > > Thanks Scott,
> > > The driver was installed by the sysadmin - I don't have root access on
> > > this cluster.
> > > He wrote to me that it was "the latest driver", but apparently not. I'll
> > > ask him to download a fresh driver from NVIDIA.
> > >
> > > Can pmemd.cuda post-bugfix.9 still be compiled with the 4.0 toolkit? I
> > > initially tried this, but got an error saying I needed to use 4.2.
> > >
> > > Thanks,
> > > --Marc
> > >
> > > On 4 September 2012 21:15, Scott Le Grand <varelse2005.gmail.com>
> > > wrote:
> > >
> > >> Your driver is far too old for 4.2. Either install a newer driver or
> > >> use the 4.0 toolkit...
> > >> On Sep 4, 2012 1:08 PM, "Marc van der Kamp" <marcvanderkamp.gmail.com>
> > >> wrote:
> > >>
> > >> > Hi Scott,
> > >> >
> > >> > I compiled pmemd.cuda_SPFP for Patrick and ran make test.cuda. All
> > >> > tests passed, apart from a few (6, I think) that had only minor
> > >> > differences in values (in the 4th digit).
> > >> >
> > >> > The CUDA Toolkit:
> > >> > $ nvcc -V
> > >> > nvcc: NVIDIA (R) Cuda compiler driver
> > >> > Copyright (c) 2005-2012 NVIDIA Corporation
> > >> > Built on Thu_Apr__5_00:24:31_PDT_2012
> > >> > Cuda compilation tools, release 4.2, V0.2.1221
> > >> >
> > >> > The driver was freshly installed today:
> > >> > devdriver_4.0_linux_64_270.41.19
> > >> >
> > >> > Cards on the node where both test.cuda and Patrick's jobs ran:
> > >> > $ nvidia-smi -L
> > >> > GPU 0: Tesla M2050 (S/N: 0322310084063)
> > >> > GPU 1: Tesla M2050 (S/N: 0322310082367)
> > >> >
> > >> > Hope this helps,
> > >> > Marc
> > >> >
> > >> >
> > >> > On 4 September 2012 19:44, Scott Le Grand <varelse2005.gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Does your build of pmemd.cuda pass a make test.cuda?
> > >> > >
> > >> > > Also what CUDA Toolkit/Display driver are you using?
> > >> > >
> > >> > > On Tue, Sep 4, 2012 at 10:27 AM, Patrick von Glehn
> > >> > > <patrickvonglehn.gmail.com> wrote:
> > >> > >
> > >> > > > Hi Jason and Scott,
> > >> > > >
> > >> > > > Unfortunately bugfix 9 has not solved the problem.
> > >> > > >
> > >> > > > To reiterate for anyone else who is interested: molecular
> > >> > > > dynamics on my system of interest runs smoothly with pmemd, but
> > >> > > > the system blows up when run with pmemd.cuda on GPUs (a few
> > >> > > > atoms in the region of the hexacoordinated cobalt fly off in
> > >> > > > different directions). This happens with either a 0.002 ps
> > >> > > > timestep or a 0.000002 ps timestep.
> > >> > > >
> > >> > > > I initially ran the calculations on NVIDIA Tesla M2090 GPUs with
> > >> > > > pmemd.cuda_SPDP, and then I tried again on NVIDIA Fermi M2050
> > >> > > > GPUs with bugfix.9 applied.
> > >> > > >
> > >> > > > Input files can be found attached to the first message in this
> > >> thread.
> > >> > > >
> > >> > > > Any help would be greatly appreciated,
> > >> > > >
> > >> > > > Patrick von Glehn
> > >> > > > PhD student in the Harvey and Mulholland groups
> > >> > > > Centre for Computational Chemistry
> > >> > > > University of Bristol
> > >> > > >
> > >> > > > On 22 August 2012 15:50, Jason Swails <jason.swails.gmail.com>
> > >> > > > wrote:
> > >> > > > > On Wed, Aug 22, 2012 at 10:28 AM, Patrick von Glehn
> > >> > > > > <patrickvonglehn.gmail.com> wrote:
> > >> > > > >
> > >> > > > >> Hi Scott,
> > >> > > > >>
> > >> > > > >> Thanks for your reply.
> > >> > > > >>
> > >> > > > >> Do you have reason to believe that the new patch will resolve
> > >> > > > >> this error? Were you able to reproduce the error with an
> > >> > > > >> unpatched version of Amber? Also, forgive my ignorance, but
> > >> > > > >> what does TOT mean?
> > >> > > > >>
> > >> > > > >
> > >> > > > > Top Of Tree, I think :). What this means is that he doesn't
> > >> > > > > see the error with the soon-to-be-released pmemd.cuda upgrade
> > >> > > > > (I don't think the current version of Amber was tested, but
> > >> > > > > the upcoming patch is known to have fixed a handful of bugs).
> > >> > > > >
> > >> > > > >
> > >> > > > >> What sort of timescale are we talking about here for the new
> > >> > > > >> patch release? Days/weeks/months? I am very keen to get my
> > >> > > > >> GPU simulations going!
> > >> > > > >>
> > >> > > > >
> > >> > > > > No promises here, but in conversations I've had with Ross, I
> > >> > > > > would say we're aiming for 'days'. The patch is a large one,
> > >> > > > > and has to be handled with care, but we're taking a crack at
> > >> > > > > generating the patch tonight. If the merge goes smoothly and
> > >> > > > > everything tests out correctly the first time through, you
> > >> > > > > probably will not have more than a few days to wait.
> > >> > > > >
> > >> > > > > HTH,
> > >> > > > > Jason
> > >> > > > >
> > >> > > > > --
> > >> > > > > Jason M. Swails
> > >> > > > > Quantum Theory Project,
> > >> > > > > University of Florida
> > >> > > > > Ph.D. Candidate
> > >> > > > > 352-392-4032
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Sep 05 2012 - 04:30:03 PDT