AMBER: floating point assist faults on IA64 PMEMD 9

From: Jarrod Smith <jarrod.smith.vanderbilt.edu>
Date: Wed, 13 Dec 2006 12:42:06 -0600 (CST)

Hi all,

We see several messages like the following every few seconds in
/var/log/messages on our Altix 350:

Dec 13 10:42:54 4A:thresher kernel: pmemd(15284): floating-point assist
fault at ip 40000000000c5021, isr 0000020000000008
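
To get a feel for how often this happens and whether the faults all come
from the same instruction address, the messages could be tallied with a
short script. The following is only a sketch: the pattern is modeled on
the single message quoted above, and the function name count_faults is
made up for illustration.

```python
import re
from collections import Counter

# Pattern modeled on the one kernel message quoted above; other kernels
# may format the line differently (assumption).
FAULT_RE = re.compile(
    r"kernel: (?P<prog>\S+)\((?P<pid>\d+)\): floating-point assist\s+"
    r"fault at ip (?P<ip>[0-9a-fA-F]+), isr (?P<isr>[0-9a-fA-F]+)"
)


def count_faults(log_text):
    """Count assist-fault messages per (program, instruction pointer)."""
    counts = Counter()
    for match in FAULT_RE.finditer(log_text):
        counts[(match.group("prog"), match.group("ip"))] += 1
    return counts


if __name__ == "__main__":
    # Reading /var/log/messages usually requires root privileges.
    with open("/var/log/messages") as log_file:
        for (prog, ip), n in count_faults(log_file.read()).most_common(10):
            print(f"{n:8d}  {prog}  ip={ip}")
```

If only a handful of addresses dominate the tally, a tool such as
addr2line, given a binary built with debug information, might help map
them back to the routines involved.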

Our executable was compiled with ifort 9.1.040 and links against Intel
MKL 8. config.h was generated with "./configure sgi_altix ifort mpi". All
the tests in the test.pmemd suite pass. I am asking only because it seems
likely that we could get better performance if we can avoid this
condition.

There's lots of info out there about what this means and how to avoid it.
For example:

http://i-cluster2.inrialpes.fr/doc/misc/fpswa.txt
http://h21007.www2.hp.com/dspp/tech/tech_TechDocumentDetailPage_IDX/1,1701,165,00.html

Also possibly of interest:

http://www.intel.com/design/itanium/downloads/24541501.pdf

It seems like the "-ftz" option to ifort should do the trick. It has
always been set by default in config.h for compiling the .o files, and
I've since added it to LOADFLAGS as well so that it is also present at
link time. Even so, the issue remains.
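
For anyone unfamiliar with the condition: as I understand the documents
linked above, these faults are raised when an operation involves
denormal (subnormal) numbers and the hardware hands it off to the
kernel's software assist layer, and "-ftz" is meant to flush such values
to zero instead. The sketch below only illustrates that idea in Python.
The helper names are hypothetical, the flush is emulated in software, and
it says nothing about how ifort actually implements the option.

```python
import sys

# Smallest positive normalized double; nonzero magnitudes below this
# are denormal (subnormal).
SMALLEST_NORMAL = sys.float_info.min


def is_denormal(x):
    """Return True if x is a nonzero value below the normal range."""
    return x != 0.0 and abs(x) < SMALLEST_NORMAL


def flush_to_zero(x):
    """Emulate flush-to-zero: replace a denormal value with 0.0."""
    return 0.0 if is_denormal(x) else x


if __name__ == "__main__":
    tiny = SMALLEST_NORMAL / 4.0  # underflows gradually into the denormal range
    print(tiny, is_denormal(tiny), flush_to_zero(tiny))
```

A calculation that keeps producing values like "tiny" above is the kind
of thing that would trigger the assist path repeatedly unless the results
are flushed.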

I've also attached the text of a related SGI knowledgebase entry to this
message. I tried their suggestion (-O2 -ftz) and this also had no impact.
Now I'm out of ideas. Any comments from pmemd and/or ia64 experts would
be much appreciated.

Sincerely,

Jarrod Smith

-- 
Jarrod A. Smith, Ph.D.
Asst. Director, Center for Structural Biology
Research Assoc. Professor, Biochemistry
Vanderbilt University


-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu

Received on Thu Dec 14 2006 - 04:59:59 PST