Re: [AMBER] GTX 460 ?

From: Aron Broom <broomsday.gmail.com>
Date: Tue, 12 Mar 2013 16:39:24 -0400

Also, if you check out the SimTK or OpenMM people's website, I believe they
have a GPU version of the popular memtest86 application, which lets you
quickly or exhaustively check your GPU's memory.

I found that my AMBER jobs on a GTX 580 would often complete, but with all
positions as NaN after a few fs. This did not happen on a 570 or M2070,
and running that application showed the 580 had a number of bad memory
sectors.
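
As a rough illustration of spotting the NaN symptom described above, here is
a minimal Python sketch that reports the first line of a text coordinate dump
containing a non-finite value. The function name and the sample data are made
up for illustration, and it assumes whitespace-separated numbers; fixed-width
AMBER formats can run columns together, so treat it as a starting point only.

```python
import math


def first_nonfinite_line(lines):
    """Return the 1-based number of the first line with a NaN/Inf value, else None."""
    for lineno, line in enumerate(lines, start=1):
        for token in line.split():
            try:
                value = float(token)
            except ValueError:
                continue  # skip titles, labels, and other non-numeric text
            if not math.isfinite(value):
                return lineno
    return None


# Hypothetical sample data, not real AMBER output.
sample = [
    "demo coordinates",
    "  1.000  2.000  3.000",
    "  NaN  NaN  NaN",
]
print(first_nonfinite_line(sample))
```

To check a real file, pass an open file handle instead of the sample list,
since the function only needs an iterable of lines.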

~Aron

On Tue, Mar 12, 2013 at 3:20 PM, Hector A. Baldoni <hbaldoni.unsl.edu.ar> wrote:

> Hi,
>
> Before deciding that you have a bad GPU, you could try installing the
> latest GTX 460 drivers and the CUDA 5.0 toolkit, then recompiling pmemd
> with the GNU compilers and all patches included.
>
> Hector.
>
> > Run JAC NVE for 100,000 iterations. If it crashes, you have a bad GPU.
> > On Mar 11, 2013 6:59 PM, "John Gehman" <jgehman.unimelb.edu.au> wrote:
> >
> >> Many Thanks Jason, Hector, and Ross,
> >>
> >> To answer Ross's questions:
> >> -- No, I cannot find any error messages anywhere. I've checked md.out
> >> files, /var/log files, monitored nvidia-smi, no evidence of any
> >> problems.
> >> -- I've confirmed that the fan is running fine, however I think it's
> >> probably correct that the fault is temperature related: I tested again
> >> this morning, at slightly cooler ambient temperature than the tests I
> >> reported earlier which were run later in the day (during a general heat
> >> wave here in Australia) -- the md ran longer this time (2-3 minutes),
> >> but I think failed at similar temperatures (last caught temps with
> >> manual nvidia-smi updates before failure were 67-70C)
> >> -- The card does not fall off the bus: no reboot required, and from
> >> what I can find on the web, I believe there should be a log entry in
> >> /var/log if I were to suffer such an event.
> >> -- I'll probe a bit further into the results to look for "crazy". The
> >> cuda tests reported no bona fide errors, and 7/88 "possible failures",
> >> all of which were "Maximum * error …" messages for differences in the
> >> last digit of specified values; all the tests fundamentally ran and
> >> completed, though.
> >>
> >> CUDA-Z runs fine for as long as I've let it go (longer than the two
> >> minutes it takes AMBER to fail), although the temperature doesn't hit
> >> the same level.
> >>
> >> Certainly please let me know if the above follow-up sheds any more light
> >> on the matter, but it all sounds fairly likely that I've got a dodgy
> >> card, and buying a replacement is warranted. I take your point, Jason,
> >> that quality/reliability and performance may *both* scale with the model
> >> selected, even if Hector got lucky. Maybe I need to have another look
> >> down the back of the sofa before going shopping. Many thanks for your
> >> help!
> >>
> >> Kind Regards,
> >> John
> >>
> >> ==== === == = = = = = = = = =
> >> =
> >> John Gehman Office +61 3 8344 2417
> >> ARC Future Fellow Fax +61 3 9347 8189
> >> School of Chemistry Magnets +61 3 8344 2470
> >> Bio21 Institute Mobile +61 407 536 585
> >> 30 Flemington Rd jgehman.unimelb.edu.au
> >> Univ. of Melbourne .GehmanLab
> >> VIC 3010 Australia
> >> http://www2.chemistry.unimelb.edu.au/staff/jgehman/research/
> >>
> >> "Science really suffers from bureaucracy. If we hadn't broken
> >> every single WHO rule many times over, we would never
> >> have defeated smallpox. Never."
> >> -- Isao Arita, final director of the WHO smallpox eradication program
> >>
> >> ==== === == = = = = = = = = =
> >> =
> >>
> >> From: Ross Walker <ross.rosswalker.co.uk>
> >> Reply-To: AMBER Mailing List <amber.ambermd.org>
> >> Date: Tuesday, 12 March 2013 2:20 AM
> >> To: AMBER Mailing List <amber.ambermd.org>
> >> Subject: Re: [AMBER] GTX 460 ?
> >>
> >> Hi John
> >>
> >> The list on the amber website is far from exhaustive. Mainly because I
> >> can't keep up with all the various models of GPU that NVIDIA release.
> >> The GTX460 and 465 should both work fine with AMBER although I've not
> >> tested them. The fact that the code runs some MD is indicative that it
> >> should work. What you are seeing is indicative of a faulty GPU. Are
> >> there no error messages reported anywhere? Does it always fail at the
> >> same point or just roughly the same point? Does the GPU drop off the bus
> >> completely (requiring a reboot to see it again)? Typically when a job
> >> runs for a few minutes and then stops it implies an overheating GPU,
> >> maybe a fan not working properly for example. It could also mean dodgy
> >> memory on the GPU, which happens sometimes, although in that case the
> >> results are normally crazy before the crash.
> >>
> >> Do the test cases all pass?
> >>
> >> As for the GTX560 - yes that should work fine.
> >>
> >> All the best
> >> Ross
> >>
> >>
> >> On 3/10/13 10:54 PM, "John Gehman" <jgehman.unimelb.edu.au> wrote:
> >>
> >> Dear Amber Fans,
> >>
> >> Could anybody confirm whether or not the nVidia GTX 460 chipset should
> >> work with Amber12? It's not on the list at
> >> http://ambermd.org/gpus/#supported_gpus, which I presume is drawing a
> >> distinction between hardware revision/compute capability 2.1 vs 2.0
> >> [e.g. for the GTX 465] per the guideline on that page. However, v2.1 *does*
> >> appear to provide double precision, and the GTX560 which *is* OK'd for
> >> Amber12 appears to actually be v2.1 as well (ref
> >> https://developer.nvidia.com/cuda-gpus).
> >>
> >> The problem is that my Amber12 jobs seem to die with no errors or
> >> explanation after about 30 ps on my GPU. This has happened for one of my
> >> runs, as well as one of the benchmark runs, which run fine (albeit
> >> slowly, of course) on a single CPU.
> >>
> >> I am trying to ascertain whether the GPU is to blame, and if so, whether
> >> a GTX560 (Ti) will actually get me going, or not.
> >>
> >> Many Thanks!
> >> John Gehman
> >> University of Melbourne
> >>
> >> ==== === == = = = = = = = = =
> >> =
> >> John Gehman Office +61 3 8344 2417
> >> ARC Future Fellow Fax +61 3 9347 8189
> >> School of Chemistry Magnets +61 3 8344 2470
> >> Bio21 Institute Mobile +61 407 536 585
> >> 30 Flemington Rd jgehman.unimelb.edu.au
> >> Univ. of Melbourne .GehmanLab
> >> VIC 3010 Australia
> >> http://www2.chemistry.unimelb.edu.au/staff/jgehman/research/
> >>
> >> "Crooked nails hold better" (JDG, unpublished data)
> >>
> >> ==== === == = = = = = = = = =
> >> =
> >>
> >> _______________________________________________
> >> AMBER mailing list
> >> AMBER.ambermd.org
> >> http://lists.ambermd.org/mailman/listinfo/amber
> >>
> >
>
>
> --------------------------------------
> Dr. Hector A. Baldoni
> Area de Quimica General e Inorganica
> Universidad Nacional de San Luis
> Chacabuco 917 (D5700BWS)
> San Luis - Argentina
> hbaldoni at unsl dot edu dot ar
> Tel.:+54-(0)266-4423789 ext. 157
> --------------------------------------
>
>



-- 
Aron Broom M.Sc
PhD Student
Department of Chemistry
University of Waterloo
Received on Tue Mar 12 2013 - 14:00:03 PDT