Re: [AMBER] Fwd: cudaMemcpy GpuBuffer::Download failed

From: Ross Walker <ross.rosswalker.co.uk>
Date: Mon, 25 Mar 2013 10:22:05 -0700

Hi Alessandro,


This typically means something is wrong with your structure. What ends up
happening is that you get a NaN in the force array, and this causes the
code to crash when it tries to download data from the GPU. It is not easy
to trap this within the GPU code itself because of the tens of thousands
of threads that are running. I would take a careful look at your
simulation results: does anything look out of place, are any energies or
temperatures unreasonably high, does the structure look OK? One place this
seems to happen is with hydrogen atoms colliding with other atoms:
hydroxyl hydrogens have zero VDW parameters and occasionally come too
close to other atoms. This was never really a problem in the past, since
it was rare and people didn't run for long, but now that people are
routinely running microsecond+ simulations it is starting to bite more
often.
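
As a rough aid for the "does anything look out of place" check, here is a
minimal Python sketch that scans the energy records in an mdout file for
NaN values, overflowed fields (printed as asterisks), or very high
temperatures. The function name scan_mdout and the 1000 K threshold are
my own illustrative assumptions and are not part of AMBER; it only looks
at the NSTEP, TEMP(K), Etot and EPtot fields.

```python
import math
import re

# Only a few fields of the mdout energy record are inspected here.
FIELD = re.compile(r"(NSTEP|TEMP\(K\)|Etot|EPtot)\s*=\s*(\S+)")


def scan_mdout(lines, max_temp=1000.0):
    """Return (nstep, label, raw_value) for every suspicious entry.

    max_temp is an arbitrary illustrative threshold (assumption),
    not a value recommended by AMBER.
    """
    problems = []
    nstep = None  # most recent NSTEP seen, kept as the raw string
    for line in lines:
        for label, raw in FIELD.findall(line):
            if label == "NSTEP":
                nstep = raw
                continue
            try:
                value = float(raw)
            except ValueError:
                # Field overflow is printed as a run of asterisks.
                problems.append((nstep, label, raw))
                continue
            if not math.isfinite(value):
                # NaN or Infinity in an energy or temperature field.
                problems.append((nstep, label, raw))
            elif label == "TEMP(K)" and value > max_temp:
                problems.append((nstep, label, raw))
    return problems
```

To try it on the run above, something like
`with open("md.out") as fh: print(scan_mdout(fh))` should list the first
steps at which things start to go wrong, which gives a place to start
looking at the trajectory.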

Does it always crash in the same place if you start with the same random
seed and run the exact same simulation?

Have you ever seen it crash if you run on the CPU?

If you restart from the previous restart file does it crash very quickly?

It's going to take a little digging to figure out what is going wrong
unfortunately.

All the best
Ross

On 3/25/13 5:06 AM, "Alessandro Orro" <alessandro.orro.itb.cnr.it> wrote:

>dear all
>
>I'm trying to run a MD simulation with pmemd.cuda using the cmdline
>
>time pmemd.cuda -O -i md.in -o md.out -p com.wat.leap.prm7 -c npt.rst7
>-ref npt.rst7 -x md.trj -inf md.info -r md.rst7;
>
>this is the md.in file
>
>production dynamics
> &cntrl
>  imin=0, irest=1, ntx=5,
>  nstlim=25000000, dt=0.002,
>  ntc=2, ntf=2,
>  cut=10.0, ntb=2, ntp=1, taup=2.0,
>  ntpr=1000, ntwx=1000, ntwr=50000,
>  ntt=3, gamma_ln=2.0,
>  temp0=300.0,
> /
>
>After about 1000 min the run crashes with the error
>
>cudaMemcpy GpuBuffer::Download failed unspecified launch failure
>
>I also tried with ig=-1 and ntf=1, as suggested by someone in this mailing
>list, but the error is the same.
>
>I believe I am using the most recent version, because md.out contains
>
>|--------------------- INFORMATION ----------------------
>| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
>| Version 12.2
>|
>| 01/10/2013
>|
>| Implementation by:
>| Ross C. Walker (SDSC)
>| Scott Le Grand (nVIDIA)
>| Duncan Poole (nVIDIA)
>|
>| CAUTION: The CUDA code is currently experimental.
>| You use it at your own risk. Be sure to
>| check ALL results carefully.
>|
>| Precision model in use:
>| [SPFP] - Mixed Single/Double/Fixed Point Precision.
>| (Default)
>|
>|--------------------------------------------------------
>
>Using another protein-ligand complex, the simulation finished correctly.
>Any suggestions?
>
>thank you in advance
>
>Alessandro
>_______________________________________________
>AMBER mailing list
>AMBER.ambermd.org
>http://lists.ambermd.org/mailman/listinfo/amber

Received on Mon Mar 25 2013 - 10:30:03 PDT