Re: [AMBER] Error: unspecified launch failure launching kernel kReduceForces

From: Aron Broom <broomsday.gmail.com>
Date: Mon, 2 Apr 2012 12:43:21 -0400

As one more thing to add: AMBER running on a GPU, particularly on the GTX
cards, often seems to run into a problem where the coordinates and
velocities get lost from one step to the next. Maybe you've already done
it, but Ross's response made me think that you should search your restart
file for any 'NaN' entries.
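
A minimal sketch of that search, assuming the restart file is the plain-text
(formatted) kind rather than NetCDF. The function name is made up for
illustration. Besides 'NaN' it also flags 'Inf' and runs of asterisks, since
Fortran formatted output prints '*' when a number overflows its field:

```python
import re

# NaN/Inf tokens, or asterisks left by a Fortran field overflow.
BAD_VALUE = re.compile(r"nan|inf|\*+", re.IGNORECASE)


def find_bad_entries(path):
    """Return (line_number, line_text) for each suspicious line in a text restart file."""
    hits = []
    with open(path, "r", errors="replace") as handle:
        for lineno, line in enumerate(handle, start=1):
            # Skip the title line: free text may contain 'nan' or 'inf' by chance.
            if lineno == 1:
                continue
            if BAD_VALUE.search(line):
                hits.append((lineno, line.rstrip("\n")))
    return hits
```

Calling find_bad_entries("heat1_G3_DOTA.rst") and getting back an empty list
would mean that no such entries were found in that file.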

~Aron

On Mon, Apr 2, 2012 at 12:36 PM, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi Giovanni,
>
> > My system has inside a gadolinium DOTA molecule - the Gd atom has 8
> > coordination bonds parametrized according to literature.
>
> I don't think we ever tested the GPU code with 8 bonds to a single atom
> since things like Gadolinium fall way outside the remit of what AMBER would
> normally be used to simulate. Can you post your input files please (a
> private message to me is fine) along with the exact command lines you use
> to reproduce the problem and I will see if I can confirm it and then figure
> out what is going wrong.
>
> That said I am confused by some of what you are reporting:
>
> > The problem is that the system passes successfully the minimization
> > step with reasonable energy and everything seems to be ok.
>
> Did you check the structure manually after minimization? Does everything
> look good? Does the energy trend during minimization look OK? Also, did
> you try minimizing with the CPU?
>
> > Error: unspecified launch failure launching kernel kReduceForces
> > cudaFree GpuBuffer::Deallocate failed unspecified launch failure
> >
> > At line 109 of file inpcrd_dat.f90 (unit = 9, file =
> > 'heat1_G3_DOTA.rst')
> >
> > Fortran runtime error: End of file
> >
> > STOP PMEMD Terminated Abnormally!
>
> This is VERY strange. The unspecified launch error for kReduceForces seems
> reasonable: if there are infinite forces, atoms sitting on top of each
> other, or other strange structural anomalies, then this is where it will
> crash. What I don't understand is how you got the second error out. Did
> this all come out of the same run? The end of file for the inpcrd file
> suggests one of two things. Either you set irest / ntx wrong such that the
> code expects velocities to be in the inpcrd file and they aren't, or you
> gave the code an inpcrd file lacking box information (i.e. you minimized
> without periodic boundaries) but then requested a periodic simulation.
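
Both possibilities can be checked from the line count of the restart file.
The sketch below assumes the usual formatted layout: a title line, a line
starting with the atom count, then six values per line for coordinates,
optionally the same number of lines again for velocities, and optionally one
final box line. The function name is hypothetical, and the result is a guess
from line counts only (for 1 or 2 atoms the "coordinates + box" and
"coordinates + velocities" counts coincide):

```python
import math


def classify_restart(path):
    """Guess what a formatted restart/inpcrd file contains from its line count."""
    with open(path, "r", errors="replace") as handle:
        lines = handle.read().splitlines()
    # Drop trailing blank lines, but keep a blank title line if there is one.
    while lines and not lines[-1].strip():
        lines.pop()
    if len(lines) < 2:
        return "truncated: missing title or atom-count line"
    natom = int(lines[1].split()[0])   # second line starts with the atom count
    block = math.ceil(3 * natom / 6)   # six values per line, three per atom
    data = len(lines) - 2              # lines after title and atom-count lines
    layouts = {
        block: "coordinates only (no velocities, no box)",
        block + 1: "coordinates + box (no velocities)",
        2 * block: "coordinates + velocities (no box)",
        2 * block + 1: "coordinates + velocities + box",
    }
    return layouts.get(data, f"unexpected: {data} data lines for {natom} atoms")
```

A file reported as lacking velocities would not match a run that asks for
them to be read, and a file without a box line would not match a periodic
run.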
>
> It is also possible that your inpcrd (restart) file is corrupt for some
> reason. Either way, I can't see how the code would error here saying it
> can't read the restart file but still produce energy output for step
> zero. So where did the energy output for step 0 that you show below come
> from?
>
> > Another strange thing is that if the same MD step is run with pmemd
> > instead of pmemd.cuda, no error is reported.
> >
> > Curiously, however the energies reported are quite different!
>
> Did you also run the minimization with CPU? Try this please:
>
> 1) Minimize using the CPU code. Check the output carefully.
>
> 2) Run with nstlim=10, ntpr=1 and ntwx=1, using the CPU minimization
> restart file, for both pmemd and pmemd.cuda and compare.
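
Step 2 amounts to two runs that differ only in the executable. A small
sketch that prints both command lines follows. The file names (test.in,
system.prmtop, min_cpu.rst) and the output tags are placeholders, not names
from the original report, and test.in is assumed to be the existing MD input
with nstlim=10, ntpr=1 and ntwx=1 set:

```python
import shlex


def build_command(engine, mdin, prmtop, inpcrd, tag):
    """Return the argument list for one short test run with the given engine."""
    return [
        engine, "-O",
        "-i", mdin,              # control input with nstlim=10, ntpr=1, ntwx=1
        "-p", prmtop,            # topology file
        "-c", inpcrd,            # restart file written by the CPU minimization
        "-o", f"{tag}.out",      # mdout, energies printed every step
        "-r", f"{tag}.rst",      # final restart
        "-x", f"{tag}.mdcrd",    # trajectory, written every step
    ]


# Same inputs for both engines, so any step-0 difference comes from the code path.
for engine, tag in (("pmemd", "test_cpu"), ("pmemd.cuda", "test_gpu")):
    print(shlex.join(build_command(engine, "test.in", "system.prmtop", "min_cpu.rst", tag)))
```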
>
> > In fact PMEMD reports "usual/safe" negative values, while pmemd.cuda
> > reports a "worse" energetic situation.
>
> The origin is arbitrary in MD simulations, so the sign is not very
> indicative, BUT big differences on step 0 for otherwise identical input do
> suggest a bug.
>
> GPU
> >  NSTEP =        0   TIME(PS) =       0.000  TEMP(K) =     0.00  PRESS =     0.0
> >  Etot   =   1108141.2453  EKtot   =         0.0000  EPtot      =   1108141.2453
> >  BOND   =      9126.0102  ANGLE   =      3780.4845  DIHED      =      2107.0903
> >  1-4 NB =      1401.2256  1-4 EEL =    -42904.2608  VDWAALS    =   1773510.4445
> >  EELEC  =   -638879.7492  EHBOND  =         0.0000  RESTRAINT  =         0.0000
>
> CPU
> >  NSTEP =        0   TIME(PS) =       0.000  TEMP(K) =     0.00  PRESS =     0.0
> >  Etot   =   -585383.3566  EKtot   =         0.0000  EPtot      =   -585383.3566
> >  BOND   =      9126.0102  ANGLE   =      3780.4845  DIHED      =      2107.0903
> >  1-4 NB =      1401.2256  1-4 EEL =    -42904.2608  VDWAALS    =     72165.7215
> >  EELEC  =   -631059.6280  EHBOND  =         0.0000  RESTRAINT  =         0.0000
> >  Ewald error estimate:   0.2749E-03
>
> So interestingly here your bond, angle, dihedral, 1-4 NB and 1-4 EEL terms
> are all identical. EELEC differs by about 1%, but the bulk of the
> difference is coming from the VDWAALS term, which is radically different in
> each case. This should make it easier to track down what is going on. It
> may be that your Gadolinium has very 'unorthodox' VDW parameters. Can you
> post them as well please?
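
A term-by-term comparison like this is easy to script. The sketch below
takes the two step-0 energy blocks as pasted in this thread and lists every
term whose GPU and CPU values differ, with the size of the difference. The
regular expression and function names are illustrative only and assume the
"NAME = value" layout shown here:

```python
import re

GPU_STEP0 = """
 Etot   =   1108141.2453  EKtot   =         0.0000  EPtot      =   1108141.2453
 BOND   =      9126.0102  ANGLE   =      3780.4845  DIHED      =      2107.0903
 1-4 NB =      1401.2256  1-4 EEL =    -42904.2608  VDWAALS    =   1773510.4445
 EELEC  =   -638879.7492  EHBOND  =         0.0000  RESTRAINT  =         0.0000
"""

CPU_STEP0 = """
 Etot   =   -585383.3566  EKtot   =         0.0000  EPtot      =   -585383.3566
 BOND   =      9126.0102  ANGLE   =      3780.4845  DIHED      =      2107.0903
 1-4 NB =      1401.2256  1-4 EEL =    -42904.2608  VDWAALS    =     72165.7215
 EELEC  =   -631059.6280  EHBOND  =         0.0000  RESTRAINT  =         0.0000
"""

# A term name is either "1-4 <word>" or a single word, followed by "= <number>".
TERM = re.compile(r"(1-4 \w+|\w+)\s*=\s*(-?\d+\.\d+)")


def parse_terms(block):
    """Map each energy term name to its value."""
    return {name: float(value) for name, value in TERM.findall(block)}


def differing_terms(a, b, tol=1e-4):
    """Return {term: a - b} for terms present in both whose values differ by more than tol."""
    return {k: a[k] - b[k] for k in a if k in b and abs(a[k] - b[k]) > tol}


gpu, cpu = parse_terms(GPU_STEP0), parse_terms(CPU_STEP0)
for name, delta in differing_terms(gpu, cpu).items():
    print(f"{name:10s} GPU - CPU = {delta:16.4f}")
```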
>
> > Am I missing something?
>
> Can you confirm that you are running with the very latest patched version
> of the AMBER 11 code?
>
> The mdout file should contain:
>
> |--------------------- INFORMATION ----------------------
> | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> | Version 2.3
>
> Note the version '2.3'.
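
For checking several mdout files at once, a small sketch that pulls the
version out of the banner quoted above. It assumes the banner text is
exactly as shown here and that the "Version" line follows within a few
lines. The function name is made up:

```python
import re


def cuda_version(mdout_text):
    """Return the pmemd.cuda version string from mdout text, or None if absent."""
    lines = mdout_text.splitlines()
    for i, line in enumerate(lines):
        if "GPU (CUDA) Version of PMEMD in use" in line:
            # The version number is expected on one of the next few lines.
            for follow in lines[i + 1 : i + 4]:
                match = re.search(r"Version\s+([\d.]+)", follow)
                if match:
                    return match.group(1)
    return None
```

A return value of None would mean the banner was not found, for example
because the output came from the CPU code.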
>
> All the best
> Ross
>
> /\
> \/
> |\oss Walker
>
> ---------------------------------------------------------
> | Assistant Research Professor |
> | San Diego Supercomputer Center |
> | Adjunct Assistant Professor |
> | Dept. of Chemistry and Biochemistry |
> | University of California San Diego |
> | NVIDIA Fellow |
> | http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> ---------------------------------------------------------
>
> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
> be read every day, and should not be used for urgent or sensitive issues.
>
>
>
>
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>



-- 
Aron Broom M.Sc
PhD Student
Department of Chemistry
University of Waterloo
Received on Mon Apr 02 2012 - 10:00:08 PDT