Re: [AMBER] NaN error on traj and output with AMBER CUDA - strange reproducible error

From: Marek Maly <marek.maly.ujep.cz>
Date: Sun, 23 Jan 2011 01:47:06 +0100

Hi Ross,
I am not sure if the card/motherboard incompatibility is the key reason
here.

We have brand new computers with six-core CPUs. Each of these computers
has just one GTX 470.

The random error which I described (cudaFree GpuBuffer::Deallocate failed
unspecified launch failure)
is relatively rare, but I need more testing to see whether there are any
statistical differences between
individual machines (we have 19 such PCs) and to judge whether some PCs have
a significantly higher probability of such
errors than the others. If I identify such machines, some particular
incompatibility might also play a role there,
but in my opinion the major reason is heating and long runs (days), for
which these "gaming" GPUs are not
designed.
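
A minimal sketch of how such a per-machine comparison could be tallied,
assuming the output text of each run has already been collected per host
(the host names, the helper names and the 0.01 threshold below are
hypothetical, not part of AMBER). It counts runs whose output contains the
"unspecified launch failure" string and flags hosts whose failure count
would be improbably high if all machines shared the pooled failure rate:

```python
from math import comb

# String taken from the error message quoted in this thread.
FAILURE_MARKER = "unspecified launch failure"


def count_failures(logs_by_host):
    """Map host -> (failed_runs, total_runs), given host -> list of run-output texts."""
    return {
        host: (sum(FAILURE_MARKER in text for text in logs), len(logs))
        for host, logs in logs_by_host.items()
    }


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def suspicious_hosts(counts, alpha=0.01):
    """Hosts whose failure count is improbably high under the pooled failure rate.

    alpha is an arbitrary illustrative threshold (assumption).
    """
    total_fail = sum(f for f, _ in counts.values())
    total_runs = sum(n for _, n in counts.values())
    if total_runs == 0:
        return []
    pooled = total_fail / total_runs
    return sorted(
        host
        for host, (f, n) in counts.items()
        if n > 0 and f > 0 and binomial_tail(f, n, pooled) < alpha
    )
```

With 19 machines tested at once, a stricter threshold (for example alpha
divided by the number of hosts) would probably be more appropriate, and with
so rare an error many runs per machine are needed before anything shows up.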

   Best wishes,

      Marek



On Sat, 22 Jan 2011 21:24:33 +0100, Ross Walker <ross.rosswalker.co.uk>
wrote:

> Hi Marek,
>
>> One of those errors, which seems to appear randomly and is not
>> reproducible, is this one:
>>
>>
>> Error: unspecified launch failure launching kernel kClearForces
>> cudaFree GpuBuffer::Deallocate failed unspecified launch failure
>> STOP PMEMD Terminated Abnormally!
>>
>> This kind of error might be solved/minimised with the Peker approach
>> or with improved cooling (liquid cooling ...), I guess.
>
> I've seen this problem as well on one of my machines but never on another
> one. The difference between them is the motherboard. I have pretty much
> come to the conclusion that if your motherboard was manufactured before the
> GTX4XX series cards came out then you may well have some strange
> incompatibilities. A BIOS update may help. To give you an example of what I
> have seen: if I have 2 GTX480s in one of the machines (SuperMicro X7DWA-N
> motherboard) then one of the cards often gives weird kernel launch failures,
> but only on certain sizes of simulation. The other works fine. If I swap the
> boards around then the other one fails, suggesting it is PCI-E slot dependent
> and not an issue with the card itself. Having just one card in either slot
> works fine. 2 x C2070s also work fine. So what the problem is is anyone's
> guess. Insufficient power? Weird motherboard incompatibility? Driver issue?
> Who knows... On a newer motherboard I have not seen the issue, though, so I
> am inclined to think it is an incompatibility issue with the motherboard.
>
> I think one may just need to accept that if you want to use GTX4XX / 5XX
> cards then things will always be slightly temperamental. If you are happy
> living with this then all is fine.
>
> I'd be interested to hear from people who have NOT seen this problem, though,
> and what motherboards / BIOS versions they are using.
> All the best
> Ross
>
> /\
> \/
> |\oss Walker
>
> ---------------------------------------------------------
> | Assistant Research Professor |
> | San Diego Supercomputer Center |
> | Adjunct Assistant Professor |
> | Dept. of Chemistry and Biochemistry |
> | University of California San Diego |
> | NVIDIA Fellow |
> | http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> ---------------------------------------------------------
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber


Received on Sat Jan 22 2011 - 17:00:06 PST