Re: [AMBER] transition metal system works on pmemd crashes on pmemd.cuda

From: Scott Le Grand <varelse2005.gmail.com>
Date: Wed, 5 Sep 2012 05:50:16 -0700

Yep, it runs all 25,000 steps. Looking at the output you guys sent,
whatever's happening to Patrick is hosed from the get-go...
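
To pin down how early the run goes bad, it may help to find the first step whose energy record contains NaN. Below is a minimal sketch, assuming the energy output prints an `NSTEP = <n>` header before each energy block and writes a literal `NaN` for bad values; the function name `first_nan_step` is hypothetical and not part of Amber.

```python
import re

# Assumed layout: each energy record starts with a line containing "NSTEP = <n>".
NSTEP_RE = re.compile(r"NSTEP\s*=\s*(\d+)")
# Match "NaN" as a standalone token, in any letter case.
NAN_RE = re.compile(r"\bnan\b", re.IGNORECASE)


def first_nan_step(lines):
    """Return the NSTEP of the first record that contains NaN, or None.

    `lines` is any iterable of strings, for example an open file handle.
    If NaN appears before any NSTEP header has been seen, None is returned.
    """
    current_step = None
    for line in lines:
        match = NSTEP_RE.search(line)
        if match:
            current_step = int(match.group(1))
        if NAN_RE.search(line):
            return current_step
    return None
```

With an output file on disk this could be called as `first_nan_step(open("mdout"))`, where `mdout` is whatever name the run wrote its output to. A result of 1 or 2 would suggest the problem is there from the first force evaluation rather than building up over time.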



On Wed, Sep 5, 2012 at 4:18 AM, Marc van der Kamp
<marcvanderkamp.gmail.com> wrote:

> Hi again,
>
> The old driver was used by mistake (not commented out). Now 295.41 is
> installed, but Patrick's problem persists, i.e. NaN values appear in the
> energy output after a few steps. (With Amber 11 pmemd.cuda the simulation
> kept running even though it was producing NaN values.)
> I assume that when you say "it runs at my end" that there aren't any NaN
> values?
>
> I'll ask about getting you an account on the machine in question. It is our
> institution-wide (University of Bristol) HPC cluster (BlueCrystal), so I'm
> not sure if that is going to be possible.
>
> --Marc
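
On the driver mix-up: a quick sanity check is to compare the installed driver version against the minimum the toolkit needs before digging further. The sketch below is illustrative only. The version table is an assumption built from the versions quoted in this thread (270.41.19 for the 4.0 toolkit, 295.41 for 4.2), and `driver_ok` is a hypothetical helper, not an NVIDIA or Amber tool.

```python
# Assumed minimum driver per toolkit, taken from the versions quoted in this
# thread rather than from official release notes.
MIN_DRIVER = {"4.0": "270.41.19", "4.2": "295.41"}


def parse_version(text):
    """Turn a dotted version string such as '295.41' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def driver_ok(toolkit, driver):
    """Return True if `driver` is at least the assumed minimum for `toolkit`."""
    return parse_version(driver) >= parse_version(MIN_DRIVER[toolkit])
```

For example, `driver_ok("4.2", "270.41.19")` is False, which matches the "driver is far too old for 4.2" diagnosis, while `driver_ok("4.2", "295.41")` is True.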
>
> On 4 September 2012 22:34, Scott Le Grand <varelse2005.gmail.com> wrote:
>
> > Well, can your sysadmin give me an account on the machine in question?
> >
> > Or please point him to http://developer.nvidia.com/cuda/cuda-downloads
> >
> > It indicates 295.41 as the latest driver...
> >
> > Your point about passing test.cuda is valid, but since it runs at my end,
> > there's not much else I can suggest...
> >
> >
> > On Tue, Sep 4, 2012 at 1:31 PM, Marc van der Kamp
> > <marcvanderkamp.gmail.com> wrote:
> >
> > > PS As pmemd.cuda essentially passed make test.cuda completely, I'd think
> > > the driver shouldn't be the issue?
> > > It is only Patrick's test cases with transition metals in co-factors that
> > > are failing. So you can run these same test cases without problems?
> > > --Marc
> > >
> > > > On 4 September 2012 21:21, Marc van der Kamp <marcvanderkamp.gmail.com> wrote:
> > >
> > > > Thanks Scott,
> > > > > The driver was installed by the sysadmin - I don't have root access on
> > > > > this cluster.
> > > > > He wrote to me that it was "the latest driver", but apparently not. I'll
> > > > > ask him to download a fresh driver from nvidia.
> > > > >
> > > > > Can pmemd.cuda post-bugfix.9 still be compiled with the 4.0 toolkit? I
> > > > > initially tried this, but got an error saying I needed to use 4.2.
> > > >
> > > > Thanks,
> > > > --Marc
> > > >
> > > > > On 4 September 2012 21:15, Scott Le Grand <varelse2005.gmail.com> wrote:
> > > >
> > > > >> Your driver is far too old for 4.2. Either install a newer driver or use
> > > > >> the 4.0 toolkit...
> > > > >> On Sep 4, 2012 1:08 PM, "Marc van der Kamp" <marcvanderkamp.gmail.com>
> > > > >> wrote:
> > > >>
> > > >> > Hi Scott,
> > > >> >
> > > > >> > I compiled pmemd.cuda_SPFP for Patrick and ran make test.cuda. All tests
> > > > >> > passed, apart from a few (6 I think) that only had minor differences in
> > > > >> > values (different 4th digit).
> > > >> >
> > > >> > The CUDA Toolkit:
> > > >> > $ nvcc -V
> > > >> > nvcc: NVIDIA (R) Cuda compiler driver
> > > >> > Copyright (c) 2005-2012 NVIDIA Corporation
> > > >> > Built on Thu_Apr__5_00:24:31_PDT_2012
> > > >> > Cuda compilation tools, release 4.2, V0.2.1221
> > > >> >
> > > >> > The driver was freshly installed today:
> > > >> > devdriver_4.0_linux_64_270.41.19
> > > >> >
> > > >> > Cards on the node where both test.cuda and Patrick's jobs ran:
> > > >> > $ nvidia-smi -L
> > > >> > GPU 0: Tesla M2050 (S/N: 0322310084063)
> > > >> > GPU 1: Tesla M2050 (S/N: 0322310082367)
> > > >> >
> > > >> > Hope this helps,
> > > >> > Marc
> > > >> >
> > > >> >
> > > > >> > On 4 September 2012 19:44, Scott Le Grand <varelse2005.gmail.com> wrote:
> > > >> >
> > > >> > > Does your build of pmemd.cuda pass a make test.cuda?
> > > >> > >
> > > >> > > Also what CUDA Toolkit/Display driver are you using?
> > > >> > >
> > > >> > > On Tue, Sep 4, 2012 at 10:27 AM, Patrick von Glehn <
> > > >> > > patrickvonglehn.gmail.com> wrote:
> > > >> > >
> > > >> > > > Hi Jason and Scott,
> > > >> > > >
> > > >> > > > Unfortunately bugfix 9 has not solved the problem.
> > > >> > > >
> > > > >> > > > To reiterate for anyone else who is interested, molecular dynamics on
> > > > >> > > > my system of interest runs smoothly with pmemd but the system blows up
> > > > >> > > > when run with pmemd.cuda on GPUs (a few atoms in the region of the
> > > > >> > > > hexacoordinated cobalt fly off in different directions). This happens
> > > > >> > > > with either a 0.002ps timestep or a 0.000002ps timestep.
> > > > >> > > >
> > > > >> > > > I initially ran the calculations on NVIDIA Tesla M2090 GPUs with
> > > > >> > > > pmemd.cuda_SPDP and then I tried again on NVIDIA Fermi M2050 GPUs with
> > > > >> > > > bugfix.9 applied.
> > > > >> > > >
> > > > >> > > > Input files can be found attached to the first message in this thread.
> > > >> > > >
> > > >> > > > Any help would be greatly appreciated,
> > > >> > > >
> > > >> > > > Patrick von Glehn
> > > >> > > > PhD student in the Harvey and Mulholland groups
> > > >> > > > Centre for Computational Chemistry
> > > >> > > > University of Bristol
> > > >> > > >
> > > > >> > > > On 22 August 2012 15:50, Jason Swails <jason.swails.gmail.com> wrote:
> > > >> > > > > On Wed, Aug 22, 2012 at 10:28 AM, Patrick von Glehn <
> > > >> > > > > patrickvonglehn.gmail.com> wrote:
> > > >> > > > >
> > > >> > > > >> Hi Scott,
> > > >> > > > >>
> > > >> > > > >> Thanks for your reply.
> > > >> > > > >>
> > > > >> > > > >> Do you have reason to believe that the new patch will resolve this
> > > > >> > > > >> error? Were you able to reproduce the error with an unpatched version
> > > > >> > > > >> of amber? Also, forgive my ignorance, but what does TOT mean?
> > > >> > > > >>
> > > >> > > > >
> > > > >> > > > > Top Of Tree, I think :). What this means is that he doesn't see the error
> > > > >> > > > > with the soon-to-be-released pmemd.cuda upgrade (I don't think the current
> > > > >> > > > > version of amber was tested, but the upcoming patch is known to have fixed
> > > > >> > > > > a handful of bugs).
> > > >> > > > >
> > > >> > > > >
> > > > >> > > > >> What sort of timescale are we talking about here for the new patch
> > > > >> > > > >> release? Days/weeks/months? I am very keen to get my GPU simulations
> > > > >> > > > >> going!
> > > >> > > > >>
> > > >> > > > >
> > > > >> > > > > No promises here, but in conversations I've had with Ross, I would say
> > > > >> > > > > we're aiming for 'days'. The patch is a large one, and has to be handled
> > > > >> > > > > with care, but we're taking a crack at generating the patch tonight. If
> > > > >> > > > > the merge goes smoothly and everything tests out correctly the first time
> > > > >> > > > > through, you probably will not have more than a few days to wait.
> > > >> > > > >
> > > >> > > > > HTH,
> > > >> > > > > Jason
> > > >> > > > >
> > > >> > > > > --
> > > >> > > > > Jason M. Swails
> > > >> > > > > Quantum Theory Project,
> > > >> > > > > University of Florida
> > > >> > > > > Ph.D. Candidate
> > > >> > > > > 352-392-4032
> > > >> > > > > _______________________________________________
> > > >> > > > > AMBER mailing list
> > > >> > > > > AMBER.ambermd.org
> > > >> > > > > http://lists.ambermd.org/mailman/listinfo/amber
> > > >> > > >
> > > >> > > > _______________________________________________
> > > >> > > > AMBER mailing list
> > > >> > > > AMBER.ambermd.org
> > > >> > > > http://lists.ambermd.org/mailman/listinfo/amber
> > > >> > > >
> > > >> > > _______________________________________________
> > > >> > > AMBER mailing list
> > > >> > > AMBER.ambermd.org
> > > >> > > http://lists.ambermd.org/mailman/listinfo/amber
> > > >> > >
> > > >> > _______________________________________________
> > > >> > AMBER mailing list
> > > >> > AMBER.ambermd.org
> > > >> > http://lists.ambermd.org/mailman/listinfo/amber
> > > >> >
> > > >> _______________________________________________
> > > >> AMBER mailing list
> > > >> AMBER.ambermd.org
> > > >> http://lists.ambermd.org/mailman/listinfo/amber
> > > >>
> > > >
> > > >
> > > _______________________________________________
> > > AMBER mailing list
> > > AMBER.ambermd.org
> > > http://lists.ambermd.org/mailman/listinfo/amber
> > >
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
> >
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Sep 05 2012 - 06:00:08 PDT