Re: [AMBER] Monitoring GPU voltage

From: Kenneth Lam <kenneth.lam.zh.gmail.com>
Date: Thu, 4 Dec 2014 18:00:52 -0500

Hi Ross,

We're currently running four GTX 780s in each of our machines, and some of
them are failing. We suspect a power issue, so we'd like to track the power
consumption of each card while running AMBER to find out whether the cards
are failing because the PSU cannot supply enough power. We need something
that logs GPU power consumption continuously during a run, and an ammeter
would not give us a running log. Thanks!

Kenneth
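
For reference, a minimal sketch of the kind of running log described above.
It assumes a card and driver combination for which nvidia-smi actually
reports power.draw, which, per the rest of this thread, may not be the case
for the GTX 680/780 with the drivers discussed here. The function names, the
output file name, and the polling interval are illustrative choices, not
anything from AMBER or NVIDIA. Unsupported readings are logged as empty
fields rather than dropped.

```python
import csv
import subprocess
import time
from datetime import datetime

# nvidia-smi query: one "index, watts" line per GPU, no header, no units.
QUERY = ["nvidia-smi", "--query-gpu=index,power.draw",
         "--format=csv,noheader,nounits"]


def parse_power_rows(text):
    """Parse 'index, watts' lines into (gpu_index, watts or None) tuples."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or unexpected lines
        try:
            watts = float(parts[1])
        except ValueError:
            watts = None  # e.g. "[Not Supported]" when the card does not report
        rows.append((int(parts[0]), watts))
    return rows


def log_power(path, interval_s=1.0, samples=None):
    """Append timestamped per-GPU power readings to a CSV file.

    Runs until interrupted when samples is None.
    """
    taken = 0
    with open(path, "a", newline="") as fh:
        writer = csv.writer(fh)
        while samples is None or taken < samples:
            out = subprocess.run(QUERY, capture_output=True, text=True,
                                 check=True).stdout
            stamp = datetime.now().isoformat(timespec="seconds")
            for gpu, watts in parse_power_rows(out):
                writer.writerow([stamp, gpu, "" if watts is None else watts])
            fh.flush()  # keep the log usable if the machine goes down mid-run
            taken += 1
            time.sleep(interval_s)
```

Calling log_power("gpu_power.csv", interval_s=5) in a separate shell while
the AMBER job runs would append one row per GPU every five seconds. If every
reading comes back empty, the driver is not exposing power data for that
card and a hardware-level measurement is still needed.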


On Thu, Dec 4, 2014 at 5:16 PM, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi Kenneth,
>
> Can I ask why you need to monitor the power consumption of the GPUs - it's
> not really information that is of much use as far as I can tell. If you
> want to know the power consumption it's probably better just to stick an
> ammeter on the lead to the power supply.
>
> All the best
> Ross
>
>
> On 12/4/14, 1:24 PM, "Kenneth Lam" <kenneth.lam.zh.gmail.com> wrote:
>
> >Unfortunately, the fixes mentioned in the linked threads point to the same
> >github repository that Tru indicated. Users there have noted that this
> >fix does not work past 331.20, as pointed out here in this thread.
> >https://github.com/CFSworks/nvml_fix/issues/5
> >
> >We've tried going through nvidia-settings, but it does not report the
> >voltage consumption. Are there any other alternatives available? Thanks
> >again!
> >
> >Kenneth
> >
> >On Thu, Dec 4, 2014 at 4:14 PM, Jason Swails <jason.swails.gmail.com> wrote:
> >
> >>
> >> > On Dec 4, 2014, at 3:08 PM, Kenneth Lam <kenneth.lam.zh.gmail.com> wrote:
> >> >
> >> > Hello all,
> >> >
> >> > We're unable to monitor the voltage going to our GTX 680s and 780s. We
> >> > have been trying to use nvidia-smi to do so, but it does not support any
> >> > cards past the GTX500 series. Is there a recommended software that works
> >> > with current gen GPUs (GTX 680+) and works on Linux, or should this be
> >> > done at the hardware level? If yes, what would you recommend? Thanks!
> >>
> >> This has been discussed in the nVidia forums before with some fixes to
> >> NVML being proposed (really more of a band-aid). You can try the fixes
> >> discussed there (may be outdated). Alternatively, I think you can also
> >> get that information in nvidia-settings.
> >>
> >> https://devtalk.nvidia.com/default/topic/560248/system-management-and-monitoring-nvml-/bug-nvml-incorrectly-detects-certain-gpus-as-unsupported-/
> >>
> >> These statistics *are* reported for the Tesla line, so I'm not sure if
> >> this is a marketing move that nVidia is using to promote their HPC line
> >> or what (but according to the above thread, such reporting _is_
> >> supported in hardware for those cards).
> >>
> >> HTH,
> >> Jason
> >>
> >> --
> >> Jason M. Swails
> >> BioMaPS,
> >> Rutgers University
> >> Postdoctoral Researcher
> >>
> >> _______________________________________________
> >> AMBER mailing list
> >> AMBER.ambermd.org
> >> http://lists.ambermd.org/mailman/listinfo/amber
> >>
>
>
>
>
Received on Thu Dec 04 2014 - 15:30:02 PST