On Fri, 2014-09-26 at 12:31 +0000, Parker de Waal wrote:
> Another update - I've switched from CUDA 6.5 to 5.5, however the error remains the same:
>
> Error: unspecified launch failure launching kernel kClearForces
> cudaFree GpuBuffer::Deallocate failed unspecified launch failure
>
> Is it possible that because both my cartesian restraints (only on
> chain A) AND NMR restraints (tethering chain B to chain A) affect some
> of the same residues that PMEMD.CUDA is crashing and that this
> combination is not possible?
What happens if you turn off all of the restraints? Does it work?
Does it work with the CPU? Here is a good workflow for debugging
pmemd.cuda issues:
1. Run the pmemd.cuda test suite and make sure that the tests run and
that most of them pass (any failures should be only small numerical
differences). (This should really be step 0, since hopefully you've
already done this part.)
2. Does the pmemd.cuda simulation fail reliably? That is, do you get
the EXACT same behavior every time? (i.e., it fails on the same step
with the same error message, and yields identical energies each time if
it gets that far?)
3a. If the answer to 2 is "no", download and run the GPU validation test
suite. If that test fails, the GPU itself is bad: return it for
replacement (RMA) and skip to step 6. If the test passes, continue to
3b or 4.
3b. Try reducing the feature set of your simulation to make sure
"plain" molecular dynamics works. If "plain" MD works, try adding the
"extras" separately until you find the minimal combination that fails.
This will help narrow down where to look.
4. Run the problematic simulation using pmemd on the CPU. Does it fail?
5a. If it fails, have a look at the error message -- pmemd and pmemd.MPI
are very often better at giving informative error messages than
pmemd.cuda (which often just says the equivalent of "something went
wrong").
5b. If the CPU works, then the problem is more likely (although still
not certainly) a problem with pmemd.cuda, so consult this list, giving
as much detail as possible (including what you've done so far).
6. Have a beer. Or a glass of milk.
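
For step 2, one way to check whether two runs behave identically is to
compare the energies printed in their mdout files. The following is only
a minimal sketch: it assumes the usual MD mdout layout, where an
"NSTEP = ..." line is followed by an "Etot = ..." line, and the function
names parse_energies and first_divergence are made up for this example.
Values are compared as printed text, so any difference in the last
printed digit counts as a mismatch.

```python
import re
from itertools import zip_longest

# Assumption: each energy record in mdout starts with an "NSTEP = <n>" line
# and the following lines contain "Etot = <value>".
_NSTEP = re.compile(r"NSTEP\s*=\s*(\d+)")
_ETOT = re.compile(r"\bEtot\s*=\s*(-?\d+\.\d+)")


def parse_energies(text):
    """Return [(nstep, etot_as_printed), ...] in the order they appear."""
    records = []
    step = None
    for line in text.splitlines():
        match = _NSTEP.search(line)
        if match:
            step = int(match.group(1))
            continue
        match = _ETOT.search(line)
        if match and step is not None:
            # Keep the energy as a string so the comparison is exact.
            records.append((step, match.group(1)))
            step = None
    return records


def first_divergence(run_a, run_b):
    """Return the first differing (record_a, record_b) pair, or None.

    If one run stopped earlier than the other, the missing record is None.
    """
    for rec_a, rec_b in zip_longest(run_a, run_b):
        if rec_a != rec_b:
            return rec_a, rec_b
    return None


# Example use (file names are placeholders):
#   with open("run1.mdout") as fa, open("run2.mdout") as fb:
#       print(first_divergence(parse_energies(fa.read()),
#                              parse_energies(fb.read())))
```

A result of None means the two runs printed the same energies at the same
steps. Anything else shows the first step where they part ways. Note that
the averages printed at the end of mdout also match this pattern, so they
are compared too.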
Of course there are more things you can try, such as reducing ntpr and
ntwx to very small numbers (e.g., 1) so that energies and coordinates
are written every step and you can visualize what is happening leading
up to the crash. If the crash only happens after thousands of steps, you
may first have to run some MD to get your system "close" to it.
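
As a rough illustration of the ntpr/ntwx suggestion, the relevant part of
an mdin file might look like the fragment below. This is only a sketch:
every other setting in your &cntrl namelist stays as it is in your own
input, and the nstlim value is a made-up placeholder for "just enough
steps to reach the crash".

```
 &cntrl
   ntpr = 1,      ! print energies to mdout every step
   ntwx = 1,      ! write coordinates to the trajectory every step
   nstlim = 200,  ! hypothetical value: just long enough to hit the crash
 /
```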
Hope this helps,
Jason
--
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher
Received on Fri Sep 26 2014 - 07:30:02 PDT