Re: AMBER: floating point assist faults on IA64 PMEMD 9

From: Robert Duke <rduke.email.unc.edu>
Date: Wed, 13 Dec 2006 15:22:23 -0500

Jarrod -
Okay, I remember this stuff from Amber 8, but I have not heard of problems in
a long time. It could be that folks just don't see the logs. On the one Altix
that I can routinely access, I can't see the logs because I am not root. As
you say, we know that there is not a correctness issue here, but we could
possibly be running faster, as the floating-point software assist code is
known to be slow. So I have two possible guesses, given that the default
pmemd build already sets the -ftz flag, which should turn this stuff off:
1) Perhaps the MKL library was not built by the vendor with this flag set?
It would seem they would do this, but who knows.
2) What about previous versions of the compiler? Are you seeing this only
after upgrading?
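
For anyone following along who has not met this before: as far as I
understand it, these assists are usually tied to denormal (subnormal)
operands or results, which the hardware hands off to software, and -ftz asks
for such results to be flushed to zero instead. The Python sketch below only
emulates that idea in software; it is not what ifort or pmemd actually does,
and the helper names is_subnormal and flush_to_zero are made up for
illustration.

```python
import math
import sys

# Smallest positive *normal* double; anything nonzero below this in
# magnitude is a subnormal (denormal) value.
SMALLEST_NORMAL = sys.float_info.min


def is_subnormal(x):
    """Return True if x is a nonzero double too small to be normal."""
    return x != 0.0 and abs(x) < SMALLEST_NORMAL


def flush_to_zero(x):
    """Software emulation (an assumption, for illustration only) of
    flush-to-zero: subnormal values become a zero of the same sign,
    everything else passes through unchanged."""
    return math.copysign(0.0, x) if is_subnormal(x) else x


# Halving the smallest normal number lands in the subnormal range on a
# machine running with gradual underflow enabled (the usual default).
tiny = SMALLEST_NORMAL / 2
```

Calling is_subnormal(tiny) should report a subnormal value, and
flush_to_zero(tiny) should give zero, while ordinary values such as 1.0 pass
through untouched. The point is only that a flushed result costs a little
accuracy near zero in exchange for never taking the slow path.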

I just got a cc'd mail from Roberto Gomperts, our SGI applications
specialist; he thinks it is probably not the MKL, but I just don't know.
I'll happily let Roberto run with this one; Intel should listen to him a
little better than the rest of us ;-) (thanks, Roberto!)

Regards - Bob Duke


----- Original Message -----
From: "Jarrod Smith" <jarrod.smith.vanderbilt.edu>
To: <amber.scripps.edu>
Sent: Wednesday, December 13, 2006 1:42 PM
Subject: AMBER: floating point assist faults on IA64 PMEMD 9


> Hi all,
>
> We see several of these every few seconds in /var/log/messages on our
> Altix 350:
>
> Dec 13 10:42:54 4A:thresher kernel: pmemd(15284): floating-point assist
> fault at ip 40000000000c5021, isr 0000020000000008
>
> Our executable was compiled with ifort 9.1.040, and links to the Intel MKL
> 8. config.h is created with "./configure sgi_altix ifort mpi". All the
> tests in the test.pmemd suite pass. I am only asking this question
> because it seems likely that we may be able to get better performance if
> we can avoid this condition.
>
> There's lots of info out there about what this means and how to avoid it.
> For example:
>
> http://i-cluster2.inrialpes.fr/doc/misc/fpswa.txt
> http://h21007.www2.hp.com/dspp/tech/tech_TechDocumentDetailPage_IDX/1,1701,165,00.html
>
> Also possibly of interest:
>
> http://www.intel.com/design/itanium/downloads/24541501.pdf
>
> It seems like the "-ftz" option to ifort should do the trick. This has
> always been set by default in config.h for building the .o files and I've
> since added it to the LOADFLAGS as well so that it's there at link time,
> too. Even so, the issue remains.
>
> I've also attached the text of a related SGI knowledgebase entry to this
> message. I tried their suggestion (-O2 -ftz) and this also had no impact.
> Now I'm out of ideas. Any comments from pmemd and/or ia64 experts would
> be much appreciated.
>
> Sincerely,
>
> Jarrod Smith
>
> --
> Jarrod A. Smith, Ph.D.
> Asst. Director, Center for Structural Biology
> Research Assoc. Professor, Biochemistry
> Vanderbilt University
>
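
The kernel line quoted above has a regular shape, so for anyone with read
access to the log, a small Python sketch like the following could tally the
faults by program and instruction pointer, which might help narrow down
whether they come from one spot or many. The regular expression is inferred
from the single sample line in this message, and the function name
tally_faults is hypothetical; other kernels may format the message
differently.

```python
import re
from collections import Counter

# Pattern inferred from the one sample line in this thread, e.g.
#   ... kernel: pmemd(15284): floating-point assist fault at ip <hex>, isr <hex>
FAULT_RE = re.compile(
    r"(?P<prog>[\w.-]+)\((?P<pid>\d+)\): floating-point assist\s+fault "
    r"at ip (?P<ip>[0-9a-fA-F]+), isr (?P<isr>[0-9a-fA-F]+)"
)


def tally_faults(lines):
    """Count assist-fault messages per (program, ip) pair.

    `lines` is any iterable of log lines; lines that do not match the
    pattern are ignored.
    """
    counts = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            counts[(match.group("prog"), match.group("ip"))] += 1
    return counts
```

A typical use would be to open /var/log/messages (root is usually required),
pass the file object to tally_faults, and print the result of
most_common(10) to see which instruction addresses dominate.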


-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
Received on Thu Dec 14 2006 - 05:00:01 PST