Re: [AMBER] NaN error in .rst files

From: Ross Walker <>
Date: Wed, 26 Jan 2011 22:51:11 -0800

Hi Marek,

> -----------------------------------
> Error: unspecified launch failure launching kernel kClearForces
> cudaFree GpuBuffer::Deallocate failed unspecified launch failure
> STOP PMEMD Terminated Abnormally!
> -----------------------------------
> has nothing common with the "NaN error story" !

I agree, with the caveat that the two could be related: whatever is causing
the NaNs may, in some cases, also cause the kernel to abort. One useful thing
to check is whether a run time limit on kernels is set. Run deviceQuery and
look for the following line:

  Run time limit on kernels: No

If this says Yes then it is possible that the watchdog is inadvertently
killing kernels that run for too long. This is a long shot, but possible. The
usual reason for a Yes is that X11 is running on the same card you are running
the calculations on. This is not advisable: X competes with the calculation
for memory and resources, and the run time limit on kernels is set so that a
long-running kernel cannot freeze the X interface. Really, any machine doing
GPU runs should be in runlevel 3 (i.e. with no X server running). Just
something to try; it will be interesting to see whether the problem goes away.
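
If you have several cards or nodes to check, something like the following
could save some reading. It is only a rough sketch in Python: it assumes you
have saved the text printed by deviceQuery and that each device reports the
setting on a line of the form shown above. The function name
kernel_time_limits is made up for this example and is not part of AMBER or
the CUDA SDK.

```python
import re

# Matches the deviceQuery line quoted above, for example
# "  Run time limit on kernels: No". The exact spacing after the colon is
# an assumption, so any amount of whitespace is accepted.
_LIMIT_LINE = re.compile(
    r"^[ \t]*Run time limit on kernels:\s*(Yes|No)\s*$",
    re.MULTILINE,
)


def kernel_time_limits(device_query_output):
    """Return one boolean per matching line, in the order the lines appear.

    True means the run time limit is enabled for that device (the watchdog
    may kill long-running kernels); False means it is disabled.
    """
    return [
        match.group(1) == "Yes"
        for match in _LIMIT_LINE.finditer(device_query_output)
    ]
```

Pass it the saved deviceQuery text; any True in the returned list points to a
card that is most likely also driving a display.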

All the best

|\oss Walker

| Assistant Research Professor |
| San Diego Supercomputer Center |
| Adjunct Assistant Professor |
| Dept. of Chemistry and Biochemistry |
| University of California San Diego |
| NVIDIA Fellow |
| | |
| Tel: +1 858 822 0854 | EMail:- |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.

AMBER mailing list
Received on Wed Jan 26 2011 - 23:00:03 PST