Hi,
Before deciding that you have a bad GPU, you could try installing the latest
GTX 460 drivers and the CUDA 5.0 toolkit, then recompiling pmemd with the GNU
compilers and all patches applied.
Hector.
> Run JAC NVE for 100,000 iterations. If it crashes, you have a bad GPU.
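A rough way to script the stress test suggested above, as a sketch only. It reads "iterations" as MD steps (nstlim), and it assumes a benchmark directory holding files named mdin, prmtop and inpcrd (a hypothetical layout; adjust to wherever your copy of the JAC NVE benchmark lives) with pmemd.cuda on the PATH. The helper names set_nstlim and run_stress_test are made up for illustration.

```python
import re
import subprocess
from pathlib import Path


def set_nstlim(mdin_text: str, steps: int) -> str:
    """Return the mdin text with every nstlim value replaced by `steps`."""
    new_text, count = re.subn(
        r"(nstlim\s*=\s*)\d+", rf"\g<1>{steps}", mdin_text, flags=re.IGNORECASE
    )
    if count == 0:
        raise ValueError("no nstlim setting found in mdin")
    return new_text


def run_stress_test(workdir: str, steps: int = 100_000) -> int:
    """Run pmemd.cuda on a lengthened copy of the input; return its exit code.

    Assumed (hypothetical) layout: workdir contains mdin, prmtop and inpcrd.
    """
    work = Path(workdir)
    long_mdin = work / "mdin.long"
    long_mdin.write_text(set_nstlim((work / "mdin").read_text(), steps))
    cmd = [
        "pmemd.cuda", "-O",
        "-i", long_mdin.name,
        "-o", "mdout.long",
        "-p", "prmtop",
        "-c", "inpcrd",
    ]
    return subprocess.run(cmd, cwd=work).returncode
```

Calling run_stress_test on the benchmark directory and then checking whether mdout.long reaches step 100,000 gives a repeatable version of the test. The original mdin is left untouched, so the short benchmark can still be rerun for comparison.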
> On Mar 11, 2013 6:59 PM, "John Gehman" <jgehman.unimelb.edu.au> wrote:
>
>> Many Thanks Jason, Hector, and Ross,
>>
>> To answer Ross's questions:
>> -- No, I cannot find any error messages anywhere. I've checked md.out
>> files and /var/log files, and monitored nvidia-smi: no evidence of any
>> problems.
>> -- I've confirmed that the fan is running fine; however, I think it's
>> probably correct that the fault is temperature related. I tested again
>> this morning, at a slightly cooler ambient temperature than the tests I
>> reported earlier, which were run later in the day (during a general heat
>> wave here in Australia). The md ran longer this time (2-3 minutes), but I
>> think it failed at similar temperatures (the last temps caught with manual
>> nvidia-smi updates before failure were 67-70C).
>> -- The card does not fall off the bus: no reboot is required, and from
>> what I can find on the web, I believe there would be a log entry in
>> /var/log if I were to suffer such an event.
>> -- I'll probe a bit further into the results to look for "crazy". The
>> cuda tests reported no bona fide errors, and 7/88 "possible failures", all
>> of which were "Maximum * error …" messages for differences in the last
>> digit of specified values; all the tests fundamentally ran and completed,
>> though.
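Rather than catching temperatures by hand, a small logger can record them right up to the moment the job dies. The sketch below assumes an nvidia-smi new enough to support the --query-gpu option (older drivers may only offer `nvidia-smi -q -d TEMPERATURE`, whose output would need different parsing). The names parse_temps and log_temps are invented for this example.

```python
import subprocess
import time
from typing import List

# One temperature per GPU, one per line, with no header or unit suffix.
QUERY = [
    "nvidia-smi",
    "--query-gpu=temperature.gpu",
    "--format=csv,noheader,nounits",
]


def parse_temps(output: str) -> List[int]:
    """Parse nvidia-smi output into a list of temperatures in deg C."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def log_temps(path: str, interval_s: float = 1.0, samples: int = 600) -> None:
    """Append timestamped GPU temperatures to `path`, flushing each sample
    so the last reading before a crash is not lost in a buffer."""
    with open(path, "a") as log:
        for _ in range(samples):
            result = subprocess.run(
                QUERY, capture_output=True, text=True, check=True
            )
            temps = parse_temps(result.stdout)
            log.write(f"{time.strftime('%H:%M:%S')} {temps}\n")
            log.flush()
            time.sleep(interval_s)
```

Started in a second terminal just before the MD run, log_temps("gpu_temps.log") leaves a record that can be lined up against the time of the last md.out update, which would show whether the failures really cluster around the same temperature.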
>>
>> CUDA-Z runs fine for as long as I've let it go (longer than the two
>> minutes it takes AMBER to fail), although the temperature doesn't hit
>> the same level.
>>
>> Certainly please let me know if the above follow-up sheds any more light
>> on the matter, but it all sounds fairly likely that I've got a dodgy card,
>> and buying a replacement is warranted. I take your point, Jason, that
>> quality/reliability and performance may *both* scale with the model
>> selected, even if Hector got lucky. Maybe I need to have another look down
>> the back of the sofa before going shopping. Many thanks for your help!
>>
>> Kind Regards,
>> John
>>
>> ==== === == = = = = = = = = = =
>> John Gehman Office +61 3 8344 2417
>> ARC Future Fellow Fax +61 3 9347 8189
>> School of Chemistry Magnets +61 3 8344 2470
>> Bio21 Institute Mobile +61 407 536 585
>> 30 Flemington Rd jgehman.unimelb.edu.au
>> Univ. of Melbourne .GehmanLab
>> VIC 3010 Australia
>> http://www2.chemistry.unimelb.edu.au/staff/jgehman/research/
>>
>> "Science really suffers from bureaucracy. If we hadn't broken
>> every single WHO rule many times over, we would never
>> have defeated smallpox. Never."
>> -- Isao Arita, final director of the WHO smallpox eradication program
>>
>> ==== === == = = = = = = = = = =
>>
>> From: Ross Walker <ross.rosswalker.co.uk>
>> Reply-To: AMBER Mailing List <amber.ambermd.org>
>> Date: Tuesday, 12 March 2013 2:20 AM
>> To: AMBER Mailing List <amber.ambermd.org>
>> Subject: Re: [AMBER] GTX 460 ?
>>
>> Hi John
>>
>> The list on the amber website is far from exhaustive, mainly because I
>> can't keep up with all the various models of GPU that NVIDIA releases. The
>> GTX460 and 465 should both work fine with AMBER, although I've not tested
>> them. The fact that the code runs some MD indicates that it should work.
>> What you are seeing is indicative of a faulty GPU. Are there no error
>> messages reported anywhere? Does it always fail at the same point or
>> just roughly the same point? Does the GPU drop off the bus completely
>> (requiring a reboot to see it again)? Typically, when a job runs for a
>> few minutes and then stops, it implies an overheating GPU, maybe a fan not
>> working properly, for example. It could also mean dodgy memory on the GPU,
>> which happens sometimes, although in that case the results are normally
>> crazy before the crash.
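Two of these questions (does it fail at the same point, and do the results go crazy first) can be checked from the mdout files of the failed runs. The following is only a sketch: it assumes the usual "NSTEP = ..." energy records in mdout and treats NaN values or runs of asterisks (Fortran's marker for a number too large for its field) as "crazy". The function name scan_mdout is made up for illustration.

```python
import re
from typing import List, Optional, Tuple

NSTEP_RE = re.compile(r"NSTEP\s*=\s*(\d+)")


def scan_mdout(text: str) -> Tuple[Optional[int], List[str]]:
    """Return the last step number printed and any suspicious-looking lines."""
    last_step: Optional[int] = None
    suspicious: List[str] = []
    for line in text.splitlines():
        match = NSTEP_RE.search(line)
        if match:
            last_step = int(match.group(1))
        # NaN or overflowed fields suggest the numbers went bad before the stop.
        if "NaN" in line or "****" in line:
            suspicious.append(line.strip())
    return last_step, suspicious
```

Running this over the mdout of several failed runs and comparing the last step numbers shows whether the job stops at exactly the same point or only roughly the same point. An empty list of suspicious lines is consistent with the overheating explanation rather than bad memory, though it does not prove it.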
>>
>> Do the test cases all pass?
>>
>> As for the GTX560 - yes that should work fine.
>>
>> All the best
>> Ross
>>
>>
>> On 3/10/13 10:54 PM, "John Gehman" <jgehman.unimelb.edu.au> wrote:
>>
>> Dear Amber Fans,
>>
>> Could anybody confirm whether or not the nVidia GTX 460 chipset should
>> work with Amber12? It's not on the list at
>> http://ambermd.org/gpus/#supported_gpus, which I presume is drawing a
>> distinction between hardware revision/compute capability 2.1 vs 2.0 [e.g.
>> for the GTX 465] per the guideline on that page. However, v2.1 *does*
>> appear to provide double precision, and the GTX560 which *is* OK'd for
>> Amber12 appears to actually be v2.1 as well (ref
>> https://developer.nvidia.com/cuda-gpus).
>>
>> The problem is that my Amber12 jobs seem to die with no errors or
>> explanation after about 30 ps on my GPU. This has happened for one of my
>> runs, as well as one of the benchmark runs, both of which run fine (albeit
>> slowly, of course) on a single CPU.
>>
>> I am trying to ascertain whether the GPU is to blame, and if so, whether
>> a GTX560 (Ti) will actually get me going, or not.
>>
>> Many Thanks!
>> John Gehman
>> University of Melbourne
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org<mailto:AMBER.ambermd.org>
>> http://lists.ambermd.org/mailman/listinfo/amber
--------------------------------------
Dr. Hector A. Baldoni
Area de Quimica General e Inorganica
Universidad Nacional de San Luis
Chacabuco 917 (D5700BWS)
San Luis - Argentina
hbaldoni at unsl dot edu dot ar
Tel.:+54-(0)266-4423789 ext. 157
--------------------------------------
Received on Tue Mar 12 2013 - 12:30:03 PDT