Re: [AMBER] JAC test on GTX 470 error produced/run again w/o changes/need help

From: Ross Walker <ross.rosswalker.co.uk>
Date: Wed, 8 Sep 2010 20:42:12 -0700

Hi Sergio,

> Because there has been some question of temperatures, I am running the
> X server and the nvidia-settings tool this time. Since the job is
> small I don't expect any interference, but this will allow me to record
> temperatures. I will rerun the job w/o X again to eliminate any
> doubts about X server interference. The temperatures are 83 C gpu, 56
> C board, 55% fan. The job already passed the 120 ps where it crashed

This does not tell you about specific hot spots within the GPU, though, so
I'm not sure how much can be read from it. If you can somehow force the fan
to run at 100%, that would be more useful. Or open the case and point the
biggest fan you can find straight at the GPU and see if that helps. Note,
though, that the real test will be whether underclocking the GPU and memory
speeds to match the C2050 fixes the problem.

> Please help: I ran that patch with the new bugfix.all for Amber11 to
> include patches 7, and 8 that I did not have. It appears that in patch
> 4, which I had previously applied, the patch did not get skipped and I
> got a message saying:
> Patching file src/pmemd/src/cuda/gpu.cpp
> HUNK #4 Failed at 2072
> HUNK #5 Failed at 2259
> HUNK #6 Failed at 2773 out of 6 hunks failed - saving rejects in ...

My advice at this point would be to delete the whole of your amber11 tree
(or archive it somewhere). Then re-extract a vanilla copy from the
Amber11.tar.bz2 file and the AMBERTools1.4.tar.bz2 files and then apply the
AMBER 11 bugfix.all and AmberTools bugfix.all. Ultimately this will be much
more efficient than trying to debug the above issue, and it will give you
peace of mind that things are patched correctly.
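
As a rough illustration of the sequence above, here is a small Python sketch
that only builds and prints the shell commands; it does not run anything.
Several things here are assumptions rather than facts from this thread: that
both archives unpack into a directory named amber11, that the bugfix files
are applied from inside that directory with "patch -p0 -N", and the bugfix
file names used in the example (both files are called bugfix.all upstream,
so they are renamed here to tell them apart).

```python
import shlex


def fresh_tree_commands(amber_tar, tools_tar, amber_fix, tools_fix,
                        tree="amber11", backup="amber11.old"):
    """Return shell command strings for a clean re-extract and patch.

    Nothing is executed. The bugfix paths are used after "cd tree", so they
    must be absolute or relative to the tree directory.
    """
    q = shlex.quote
    return [
        # Archive the existing (possibly mis-patched) tree instead of deleting it.
        f"mv {q(tree)} {q(backup)}",
        # Re-extract vanilla copies of both archives.
        f"tar xjf {q(amber_tar)}",
        f"tar xjf {q(tools_tar)}",
        # Apply each bugfix file once, from inside the fresh tree (assumed layout).
        f"cd {q(tree)} && patch -p0 -N < {q(amber_fix)}",
        f"cd {q(tree)} && patch -p0 -N < {q(tools_fix)}",
    ]


if __name__ == "__main__":
    # Hypothetical file names for the two downloaded bugfix files.
    for cmd in fresh_tree_commands("Amber11.tar.bz2", "AMBERTools1.4.tar.bz2",
                                   "../amber11_bugfix.all",
                                   "../ambertools_bugfix.all"):
        print(cmd)
```

Review the printed commands before running any of them by hand, and adjust
the paths to wherever the archives and bugfix files actually live.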
 
> At any rate, I have not recompiled Amber 11. I know I should have just
> applied patches 7,8 individually, and instead I got lazy and used the
> entire bugfix.all.

No, this 'should' work, but it relies on how well patch can recognize whether
something is already patched. In your case, though, this suggests to me that
something was 'fishy' with the gpu.cpp file in the first place, so
re-extracting everything from the original tar files is probably safer.
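
If you save the output of patch to a file, a few lines of Python can list
which hunks failed in which file, which makes it easy to confirm that a
fresh tree patches cleanly. This is only a sketch: the function name is made
up, and the matching is case-insensitive because the exact wording and
capitalization of the messages can differ between patch versions. The sample
text is the output quoted earlier in this message, copied as-is.

```python
import re

# Patterns for the two kinds of line we care about in patch output.
PATCHING = re.compile(r"patching file\s+(\S+)", re.IGNORECASE)
HUNK_FAILED = re.compile(r"hunk\s+#(\d+)\s+failed\s+at\s+(\d+)", re.IGNORECASE)


def failed_hunks(patch_output):
    """Map each patched file to a list of (hunk number, line) pairs that failed."""
    failures = {}
    current = None  # file named by the most recent "patching file" line
    for line in patch_output.splitlines():
        match = PATCHING.search(line)
        if match:
            current = match.group(1)
            continue
        match = HUNK_FAILED.search(line)
        if match:
            hunk, lineno = int(match.group(1)), int(match.group(2))
            failures.setdefault(current, []).append((hunk, lineno))
    return failures


if __name__ == "__main__":
    sample = """Patching file src/pmemd/src/cuda/gpu.cpp
HUNK #4 Failed at 2072
HUNK #5 Failed at 2259
HUNK #6 Failed at 2773 out of 6 hunks failed - saving rejects in ...
"""
    for path, hunks in failed_hunks(sample).items():
        print(path, hunks)
```

An empty result for the whole bugfix.all run is what you want to see on a
freshly extracted tree.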

All the best
Ross

/\
\/
|\oss Walker

---------------------------------------------------------
| Assistant Research Professor |
| San Diego Supercomputer Center |
| Adjunct Assistant Professor |
| Dept. of Chemistry and Biochemistry |
| University of California San Diego |
| NVIDIA Fellow |
| http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
---------------------------------------------------------

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.




_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Sep 08 2010 - 21:00:03 PDT