Re: [AMBER] NaN error in .rst files

From: filip fratev <filipfratev.yahoo.com>
Date: Tue, 25 Jan 2011 00:25:08 -0800 (PST)

Hi Peker,
I use a GTX470 card too and, as I reported previously, I have had no problems since bugfix12. Since bugfix12 was released I have run more than 100 ns of simulations, including several restarts. What kind of OC (overclock) are you using?

About the heat issue.
Most of you probably know these options, but I will share them again:

You can run the graphical interface of nvidia-settings (nvidia-settings X Server Display Configuration), which lets you follow the temperature levels. Just type: nvidia-settings. This package comes with the Nvidia drivers, so it should already be on your system. There are also many programs that show the CPU and GPU temperatures, and these should be included in your Linux distribution too (if it is Linux). The latest one I have seen is a Python-based program written just a few days ago. It is very helpful because it shows graphically almost everything about your card (memory usage, temperature and so on), so you don't need to use nvidia-smi anymore:

http://www.nvnews.net/vbulletin/showthread.php?p=2382425
http://sourceforge.net/projects/nvchart/
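
If you prefer to log the temperature from a script while a run is in progress, here is a minimal Python sketch. It assumes a driver whose nvidia-smi supports the --query-gpu and --format=csv options (these come with drivers newer than the ones discussed in this thread). The function names and the 80C limit are only illustrative choices, not anything provided by Amber or Nvidia.

```python
import subprocess


def _to_int(field):
    """Return the field as an int, or None for values such as '[N/A]'."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_status(csv_text):
    """Parse 'temperature, fan speed' CSV lines into (temp_c, fan_percent) tuples.

    One tuple is returned per GPU, in the order nvidia-smi lists them.
    """
    readings = []
    for line in csv_text.strip().splitlines():
        fields = line.split(",")
        if len(fields) != 2:
            continue  # skip anything that is not a two-column line
        readings.append((_to_int(fields[0]), _to_int(fields[1])))
    return readings


def query_gpu_status():
    """Ask nvidia-smi for the temperature (C) and fan speed (%) of every GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=temperature.gpu,fan.speed",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_status(result.stdout)


def too_hot(readings, limit_c=80):
    """Return the indices of GPUs whose temperature is at or above limit_c."""
    return [
        index
        for index, (temp_c, _fan) in enumerate(readings)
        if temp_c is not None and temp_c >= limit_c
    ]
```

Calling query_gpu_status() every few seconds during a pmemd.cuda run and writing the result to a file gives you a temperature history that you can compare against the time at which the NaNs appear.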

You probably know that the default GTX470 fan speed is ONLY 40%. During Amber calculations I run mine with the fan speed at 75-80% in order to keep the temperature at or below 70C, but sometimes it still reaches 80C. So if you use the default fan speed for Amber calculations I cannot imagine what your temperature is, and this really can be a problem. Unfortunately, Nvidia continues to irritate people and does not provide an overclocking option for Fermi (even in the new beta drivers 270.13), but fortunately there is a fan control option. You just need to add this line to your Xorg configuration file, inside the Device section: Option "Coolbits" "5". Then restart, type nvidia-settings, and you should see that fan control is enabled and that you can set your fan speed.
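
For reference, the Device section would then look roughly like the fragment below. Only the Coolbits line comes from the setting described above. The Identifier and Driver lines are placeholders for whatever your existing xorg.conf already contains, so keep your own values there. As far as I know it is the "4" bit of the Coolbits value that enables manual fan control.

```
Section "Device"
    # Placeholder values: keep the Identifier and Driver lines you already have.
    Identifier "Device0"
    Driver     "nvidia"
    # Enables the fan control page in nvidia-settings.
    Option     "Coolbits" "5"
EndSection
```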

Hope that helps,

Filip


--- On Tue, 1/25/11, peker milas <pekermilas.gmail.com> wrote:

> From: peker milas <pekermilas.gmail.com>
> Subject: [AMBER] NaN error in .rst files
> To: "AMBER Mailing List" <amber.ambermd.org>
> Date: Tuesday, January 25, 2011, 12:42 AM
> Dear Amber coders/users
>
> We had a discussion about this problem couple days ago. Briefly at
> some point of simulations pmemd.cuda creates NaN s in the restart
> files. Then the next step of simulation messes up. This is sort of a
> problem which happens in totally random way. As some users already
> pointed out, part of it can be related with heating problem because
> GTX 470/480 cards were not designed for this kind of intense
> calculations.  Although i totally agree with this, i couldn't find an
> easy way of observing this behavior. In fact there is only nvidia-smi
> command which shows me the current state of the card (like its
> temperature, etc...). So i really want to go little bit deep with this
> problem and i would like to know if anybody knows a good indicator for
> checking cards behavior in detail. It can be a command or any sort
> additional package/program for monitoring it.
>
> Any help will be appreciated
>
> regards
> peker milas
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>

Received on Tue Jan 25 2011 - 00:30:05 PST