I'm writing not to ask for help or to request a bugfix, but to leave a
record of a workaround for a failure of the Run.dhfr.min serial sander
test on our JS20 IBM PowerPC cluster under Linux with the IBM XLF90
Fortran compiler (flags -O3 -qstrict -q64); the bug appears with both
compiler versions 9.1 and 10.1.
Without the workaround, the failure appears as a large, positive EEL
energy value in the Run.dhfr.min test, starting at step 2.
I hope that these notes might save someone else some time if they
encounter similar problems. And, if anyone at Scripps would like me to
do further detailed debugging, I'd be happy to work with them off-line.
The workaround is simply to compile $AMBERHOME/src/sander/runmin.f with
optimization turned off. Though not elegant, a brute-force approach is
to hand-edit $AMBERHOME/src/sander/depend so that the FOPTFLAGS
reference in the rule for runmin.o is replaced with an explicit list of
flags that disables optimization, i.e. "-O0" in place of "-O3 -qstrict",
with all the other FOPTFLAGS flags left unchanged.
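To make that concrete, the substitution is just a hand-expansion of
FOPTFLAGS (as listed in our config.h below) with the optimization
switches swapped out. I'm showing only the flag lists here as a sketch,
since the exact shape of the runmin.o rule in your depend file may
differ:

# what the runmin.o rule effectively uses now, via $(FOPTFLAGS):
-qfixed -O3 -qstrict -q64 -qmaxmem=-1 -qarch=auto -qtune=auto -c $(LOCALFLAGS) $(AMBERBUILDFLAGS)
# hand-expanded replacement with optimization disabled:
-qfixed -O0 -q64 -qmaxmem=-1 -qarch=auto -qtune=auto -c $(LOCALFLAGS) $(AMBERBUILDFLAGS)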
-----------------------------------------------------
More information/details for those of you interested:
-----------------------------------------------------
Excerpted from config.h, our Fortran compiler options are:
#------------------------------------------------------------------------------
# Fortran preprocessing and compiler.
# FPPFLAGS holds the main Fortran options, such as whether MPI is used.
#------------------------------------------------------------------------------
FPPFLAGS= -P -traditional-cpp -DCLINK_PLAIN -DXLF90 $(AMBERBUILDFLAGS) -DF90_TIMER
FPP= cpp $(FPPFLAGS)
FC= xlf90
FFLAGS= -qfixed -c $(LOCALFLAGS) $(AMBERBUILDFLAGS) -q64
FOPTFLAGS= -qfixed -O3 -qstrict -q64 -qmaxmem=-1 -qarch=auto -qtune=auto -c $(LOCALFLAGS) $(AMBERBUILDFLAGS)
FREEFORMAT_FLAG= -qfree=f90 -q64
The first few lines of mdout.dhfr.min.dif after ./Run.dhfr.min fails
show a clear miscalculation.
Note that step 1 has a reasonable EEL and is not a problem, but EEL
then goes very positive at step 2 and in subsequent steps.
$ head -18 mdout.dhfr.min.dif
83c83
< 2 -7.1873E+4 1.7336E+1 1.0535E+2 C 1855
---
> 2 9.1283E+5 8.1856E+1 1.8334E+2 O 13223
85c85
< VDWAALS = 8106.9482 EEL = -89882.6659 HBOND = 0.
---
> VDWAALS = 8106.9482 EEL = 894818.7954 HBOND = 0. <<< OUCH BAD EEL in step 2
88,91c88,91
< 3 -7.1885E+4 1.7318E+1 1.0448E+2 C 1855
< BOND = 443.9987 ANGLE = 1271.5621 DIHED = 967.6424
< VDWAALS = 8104.0210 EEL = -89883.8930 HBOND = 0. <<< GOOD/EXPECTED RESULT
< 1-4 VDW = 545.7307 1-4 EEL = 6666.3248 RESTRAINT = 0.
---
> 3 9.1284E+5 8.1855E+1 1.8333E+2 O 13223
> BOND = 447.9747 ANGLE = 1274.3896 DIHED = 968.0028
> VDWAALS = 8106.9279 EEL = 894831.2130 HBOND = 0. <<< OUCH BAD EEL
> 1-4 VDW = 545.8298 1-4 EEL = 6666.3730 RESTRAINT = 0.
More about our setup
----------------------
A summary of our architecture is here:
http://www.accre.vanderbilt.edu/mission/researcher_text/pubstext.php
To be compatible with libraries on our system that are not under our
control, we must compile sander (at least the MPI version) with the
-q64 flag.
( Compiling all modules with -O0 or -q32 eliminated the xlf90 problem in
serial sander. )
Using the newest 10.1 compiler had no impact: -q64 with -O3 -qstrict
continued to fail, so we don't think there is a "compiler patch" yet
that fixes this.
Using a binary search approach, we quickly homed in on runmin.f as the
"problem", and with optimization turned off for this one module all
appears well.
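For anyone who wants to try the same sort of bisection, one way to do
it is to delete a suspect subset of the .o files, rebuild only those
with optimization off, relink, and re-run the test. The sketch below is
not a transcript of what we typed: the bare make invocation and the
FOPTFLAGS contents are assumptions you should adapt from your own
Makefile and config.h, and note that a command-line FOPTFLAGS override
applies to every object rebuilt in that pass.

$ cd $AMBERHOME/src/sander
$ rm -f runmin.o     # or whatever subset of .o files you currently suspect
$ make FOPTFLAGS='-qfixed -O0 -q64 -qmaxmem=-1 -qarch=auto -qtune=auto -c $(LOCALFLAGS) $(AMBERBUILDFLAGS)'
$ # then re-run Run.dhfr.min as above and check the EEL values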
It would be nice to report this to IBM as a short sample program, but
without assistance I don't think I can produce that very quickly.
I am not a Fortran guru, but I am very experienced in C/C++ and gdb
debugging. If anyone would like to coach me through a little debugging
under xlf90, I'd be happy to try to home in on exactly which lines of
code are so distasteful to xlf90 for ppc/-q64.
Chris