Re: [AMBER] GPU kernel error

From: Ross Walker <ross.rosswalker.co.uk>
Date: Sun, 17 Jun 2012 08:25:21 -0700

Hi Dmitry,

I can almost guarantee that this is caused by inadequate cooling of your
M2090 card. The fact that it locks the machine up is key. I have seen exactly
the same issue with test cards I have in homemade machines. The M2090s have
to be in a properly ducted chassis designed specifically for them. I would
suggest both sending the card back under warranty and contacting the
manufacturer of your machine to find out whether their cooling system is
properly certified for M2090 cards. If it is, then you might want to check
for malfunctioning fans etc., but most of these 1U or 2U server chassis for
M2090s have full fan monitoring and alarms, so if there is a fan issue it
should already be alarming.
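
If you want to watch the card temperature yourself while a job runs, something
along these lines might help. It is only a rough sketch: it assumes a driver
whose nvidia-smi supports the --query-gpu option, and the 85 C threshold and
the function names are made up for illustration, not taken from any NVIDIA
specification.

```python
import subprocess

# Hypothetical warning threshold in degrees C; check the card's own spec.
WARN_TEMP_C = 85


def parse_gpu_temps(csv_text):
    """Turn 'index, name, temperature' rows into (index, name, temp_c) tuples."""
    readings = []
    for line in csv_text.strip().splitlines():
        index, rest = line.split(",", 1)
        name, temp = rest.rsplit(",", 1)
        readings.append((int(index), name.strip(), int(temp)))
    return readings


def query_gpu_temps():
    """Ask nvidia-smi for the current temperature of every GPU."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,name,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_temps(result.stdout)


def hot_gpus(readings, limit=WARN_TEMP_C):
    """Return only the readings at or above the warning threshold."""
    return [reading for reading in readings if reading[2] >= limit]
```

Calling hot_gpus(query_gpu_temps()) every few seconds during a run and logging
the result should show whether the temperature climbs before the lockup.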

Out of interest, what are the exact specs of your system? Who makes it? If you
let me know the model and the manufacturer I can try to escalate within
NVIDIA.
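
When gathering those details, a list of the kernel messages that precede each
lockup would be useful as well. A minimal sketch, assuming the log lives in
/var/log/messages and that the two message strings quoted in the report below
are the ones of interest (the function names are hypothetical):

```python
# Message fragments copied from the kernel output quoted in the report.
NMI_MARKERS = (
    "Do you have a strange power saving mode enabled?",
    "Dazed and confused, but trying to continue",
)


def find_nmi_lines(lines):
    """Return the log lines that contain any of the marker messages."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(marker in line for marker in NMI_MARKERS)
    ]


def scan_log(path="/var/log/messages"):
    """Scan a syslog file; reading it usually requires root privileges."""
    with open(path, errors="replace") as handle:
        return find_nmi_lines(handle)
```

Comparing the timestamps of the matching lines with the time each run died
would show whether the two always go together.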

All the best
Ross

> -----Original Message-----
> From: Dmitry Mukha [mailto:dvmukha.gmail.com]
> Sent: Sunday, June 17, 2012 5:54 AM
> To: AMBER Mailing List
> Subject: Re: [AMBER] GPU kernel error
>
> Hi Ross!
>
> What do you think, could it be connected with some BIOS settings? I faced
> a similar problem with an M2090. In /var/log/messages I found something
> like this right before the 'stuck' error (this example was taken from a
> forum, but the text of the messages was the same):
>
> Message from syslogd.phoenix at Apr 20 13:05:41 ...
> kernel:[ 4787.436095] Do you have a strange power saving mode enabled?
>
> Message from syslogd.phoenix at Apr 20 13:05:41 ...
> kernel:[ 4787.436104] Dazed and confused, but trying to continue
>
> The system is Fedora 16, kernel 3.3.6-3.fc16.x86_64. The error reproduces
> irregularly; all tests ran fine. The PC goes mad and needs to be rebooted
> manually because shutdown -r now doesn't work.
>
> 2012/6/16 Ross Walker <ross.rosswalker.co.uk>
> >
> > Hi Fernando,
> >
> > Is this the only output you get, or is there anything in the mdout file?
> > Also, is it reproducible, as in it always occurs at the same step every
> > time? What about if you try a slightly different system or simulation
> > parameters?
> >
> --
> Sincerely,
> Dmitry Mukha
> Institute of Bioorganic Chemistry, NAS, Minsk, Belarus
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber


Received on Sun Jun 17 2012 - 08:30:03 PDT