Different numbers on different architectures.

From: David Smith <David.Smith_at_cup.uni-muenchen.de>
Date: Fri 04 Oct 2002 15:33:24 +0200

Hello all,

Up until now I have been running AMBER compiled with g77 on Linux
systems. Just recently I got access to some new platforms on which I
attempted to compile AMBER. I would like to use this mail to report on
my experiences and, in particular, ask about the seriousness of the
differing results I get between these builds.

For the benefit of those who might be able to help me, I will try to
include as much relevant detail as possible, so the mail may end up
being rather long.

I am still running with AMBER 6 (sorry, I like sander_classic), which I
understand is not supported anymore. However, I think most of the issues
are not version specific, especially as my main interest is in the
results coming out of gibbs.

For each build I ran the tests provided as well as some jobs of my own.

A) G77

For example, on an AMD processor using an unmodified version of
Machine.g77, the gibbs tests pass with only one possible failure which
is the following in gibbs_2.out.dif:


----
144,146c144,146
<  delta(LAMBDA)=0.2500000E-01  dA/d(LAMBDA) [SLOPE]=  0.00000E+00
<  slope*delta(LAMBDA)= 0.00000E+00  corr. coef.= 0.000000  pts for line=   0.00
<  delA(for)-delA(rev)= 0.00000E+00  multiplier=  1.0000
---
>  delta(LAMBDA)=0.2500000E-01  dA/d(LAMBDA) [SLOPE]=   0.0000
>  slope*delta(LAMBDA)=  0.0000      corr. coef.= 0.000000  pts for line=   0.00
>  delA(for)-delA(rev)=  0.0000      multiplier=  1.0000
----
This is only a difference in the output format of the zeros.

I also have a test job of my own that I have been using. This is a
perturbation in a small box of water with periodic boundary conditions
in the NTP ensemble. I use internal constraints via intr=1 and itor=2. I
equilibrate this box using gibbs at lambda=1 by setting NSTMEQ >
NSTLIM. I then use the restart file from this run to start the
perturbation (51 windows of 2ps each, with electrostatic decoupling and
thermodynamic integration, 102ps in total for the electrostatic part),
reading in the positions, velocities and box dimensions via ntx=7.
The input file is as follows:
perturbation of bu1 (to bu3) in water
 &cntrl
irest=0, ntx=7, init=4, ntb=2, nrun=51,
dt=0.001, nstlim=2000, nstmeq=600, nstmul=1400,
ntc=2, ntf=2, intr=1,
temp0=300.0, tautp=0.5, ntt=1,
ntp=1, pres0=1.0, taup=0.5, npscal=1,
cut=9.0, scnb=2.0, scee=1.2, dielc=1.0,
cutprt=12.0, nsnb=20,
ielper=1, intprt=0,
idifrg=1, isande=1,
almda=1.0, almdel=0.02, isldyn=-3,
ntpr=200, ntwx=200,
 &end
00002 00001 00000 00000 00000 00000 00002  000.00000 001.00000
0000.00000 001.82532 000.00000 001.82795 0000 0000
other internal constraints ...
run with:
gibbs -O -i ti-100e.in -o ti-100e.out -r ti-100e.rst \
         -p bu1-s-pert.top -c equil.rst -ms ti-100e.sum -x ti-100e.crd
and the final energy is:
    Lambda   =  0.000000    F_energy  =    0.40692
    Enthalpy =    0.76915   T*Entropy =    0.36223 
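
As a sanity check on the schedule implied by the input file above, here is a minimal Python sketch of the arithmetic. It assumes that each of the nrun windows runs nstlim steps of dt picoseconds, and that lambda moves from almda down to 0 in steps of almdel with both end points included. The variable names simply mirror the namelist; nothing here calls AMBER.

```python
# Values copied from the &cntrl namelist above.
dt = 0.001       # time step in ps
nstlim = 2000    # MD steps per window
nstmeq = 600     # equilibration steps per window
nstmul = 1400    # data-collection steps per window
nrun = 51        # number of windows
almda = 1.0      # starting lambda
almdel = 0.02    # lambda increment per window

ps_per_window = dt * nstlim
total_ps = ps_per_window * nrun

# Assumption: windows sit at lambda = 1.00, 0.98, ..., 0.00.
windows_from_lambda = round(almda / almdel) + 1

print(f"{ps_per_window:.1f} ps per window, {total_ps:.1f} ps in total")
print(f"windows implied by almdel: {windows_from_lambda} (nrun = {nrun})")
print(f"nstmeq + nstmul = {nstmeq + nstmul} (nstlim = {nstlim})")
```

Under these assumptions the numbers agree with the 51 windows of 2ps and the 102ps total quoted above.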
B) PGF77
I recently got hold of the Portland compiler and thought I'd give it a
go. I used an unaltered Machine.pgf77 file.
This time gibbs_2.out.dif has:
146c146
<  delA(for)-delA(rev)= 0.00000E+00  multiplier=  1.0000
---
>  delA(for)-delA(rev)= 0.92044E-15  multiplier=  1.0000
which I thought was a pretty small difference.
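
A difference of order 1e-15 is at the level of double-precision rounding. One plausible source (an assumption, not something verified for these builds) is that different compilers evaluate the same sums in a different order, and floating-point addition is not associative. A minimal Python illustration, unrelated to the AMBER code itself:

```python
import sys

a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # one evaluation order
regrouped = a + (b + c)       # the same sum, grouped differently

# The two results are not bit-identical, but they differ only at rounding level.
print(left_to_right == regrouped)
print(abs(left_to_right - regrouped))
print(sys.float_info.epsilon)  # spacing of doubles just above 1.0
```

In a short, deterministic test like gibbs_2 such differences stay tiny; in a long MD run they can grow, since the trajectories are sensitive to small perturbations.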
For my job above, I used the same input file with the same script and
the same PINCRD (equil.rst).
The final energy is pretty close to before (at least G if not H and S) 
but not quite the same:
    Lambda   =  0.000000    F_energy  =    0.39719
    Enthalpy =    0.20729   T*Entropy =   -0.18991 
In addition, a window by window comparison shows quite some differences
(e.g. window 35 has F= 0.0112 for g77 and F=0.0057 for pgf77).
C) Alpha Linux
I also got hold of a couple of Compaq workstations and wanted to try
Alpha Linux (RedHat 7.2), which I recently installed on them. I got the
Compaq compilers and used the following (slightly modified)
version of Machine.alpha_linux:
setenv MACHINE "DEC Alpha linux"
setenv MACH AXP_OSF
setenv MACHINEFLAGS " -DPREC -DREGNML -DEWALD -DHAS_FTN_ERFC"
# CPP is the cpp for this machine
setenv CPP "/lib/cpp -traditional"
# SYSDIR is the name of the system-specific source directory for makemake
setenv SYSDIR Machines/alpha
setenv LOADLIB  "/usr/lib/libcxml.a "
# COMPILER ALIASES:
setenv CC "ccc "
setenv LOADCC "ccc "
setenv VENDOR_BLAS yes
setenv VENDOR_LAPACK yes
# LOADER/LINKER:
setenv LOAD "fort  -convert big_endian "
setenv L0 "fort -arch host -extend_source -convert big_endian -c -tune host -fast "
setenv L1 "fort  -arch host -extend_source -convert big_endian -c -O -tune host -fast "
setenv L2 "fort  -arch host -extend_source -convert big_endian -c -O -tune host -fast "
setenv L3 "fort  -arch host -extend_source -convert big_endian -c -O5 -tune host -fast -unroll 3 "
# ranlib, if it exists
setenv RANLIB ranlib
#--------------------------------
I then had to compile leap separately using gcc, without the
-taso, -non_shared, and -ldnet_stub flags found in
$AMBERHOME/src/leap/src/leap/Imakefile.
This time gibbs_2.out.dif has:
144,146c144,146
<  delta(LAMBDA)=0.2500000E-01  dA/d(LAMBDA) [SLOPE]=  0.00000E+00
<  slope*delta(LAMBDA)= 0.00000E+00  corr. coef.= 0.000000  pts for line=   0.00
<  delA(for)-delA(rev)= 0.00000E+00  multiplier=  1.0000
---
>  delta(LAMBDA)=0.2500000E-01  dA/d(LAMBDA) [SLOPE]=   0.0000
>  slope*delta(LAMBDA)=  0.0000      corr. coef.= 0.000000  pts for line=   0.00
>  delA(for)-delA(rev)= 0.65746E-15  multiplier=  1.0000
Again I ran my test job and the final answer is:
    Lambda   =  0.000000    F_energy  =    0.41389
    Enthalpy =    0.33619   T*Entropy =   -0.07770
Once again, the window by window comparison shows quite some
differences.
D) SGI
Finally, on an SGI (8 x R12000), I compiled with an unaltered
Machine.sgi (I used the sgi_mpi for sander).
gibbs_2.out.dif:
102a103
>      |  Running shared memory parallel version on     4 processors
146c147
<  delA(for)-delA(rev)= 0.00000E+00  multiplier=  1.0000
---
>  delA(for)-delA(rev)= 0.26298E-15  multiplier=  1.0000
and my job gives:
 
    Lambda   =  0.000000    F_energy  =    0.37039
    Enthalpy =    0.31467   T*Entropy =   -0.05572
With MACHINE=Machine.sgi_nopar all tests pass, but my job gives
the same answer.
The window by window comparison is as before (quite different on all
versions).
In summary:
Water box.
compiler/architecture   F_energy
g77/i686                0.40692
pgf77/i686              0.39719
compaq/alpha            0.41389
mips/sgi                0.37039
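
To put a number on the scatter in the water-box table above, the sketch below treats the four builds as four independent estimates of the same quantity and reports their mean, range, and sample standard deviation. That independence is an assumption: the runs share a starting structure and only diverge as the simulation proceeds. The dictionary name is illustrative.

```python
from statistics import mean, stdev

# F_energy per build, copied from the water-box table above.
f_energy = {
    "g77/i686": 0.40692,
    "pgf77/i686": 0.39719,
    "compaq/alpha": 0.41389,
    "mips/sgi": 0.37039,
}

values = list(f_energy.values())
avg = mean(values)
spread = max(values) - min(values)
sd = stdev(values)  # sample standard deviation over the four builds

print(f"mean = {avg:.5f}, range = {spread:.5f}, sample sd = {sd:.5f}")
```

With only four values the standard deviation is itself poorly determined, so it is best read as a rough scale for the build-to-build noise rather than as an error bar.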
I also have a 50ps FEP (double-wide) run of a protein in a droplet,
which I have not yet run under pgf77, but for the other builds I get:
Protein.
compiler/architecture   DG(forward)   DG(reverse)
g77/i686                 0.12555      -0.16087
compaq/alpha             0.17262      -0.21105
mips/sgi                 0.02955      -0.07174
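
For the double-wide run, one common consistency check is to compare the forward and reverse values, which would be equal and opposite in the limit of perfect sampling. The sketch below assumes that sign convention applies to the protein table above and computes, per build, the hysteresis (forward plus reverse) and an averaged estimate (forward minus reverse, halved). The function names are illustrative only.

```python
# (DG_forward, DG_reverse) per build, copied from the protein table above.
protein = {
    "g77/i686": (0.12555, -0.16087),
    "compaq/alpha": (0.17262, -0.21105),
    "mips/sgi": (0.02955, -0.07174),
}

def hysteresis(forward, reverse):
    """Zero if forward and reverse are exactly equal and opposite."""
    return forward + reverse

def averaged(forward, reverse):
    """Mean of the forward value and the sign-flipped reverse value."""
    return (forward - reverse) / 2.0

for build, (fwd, rev) in protein.items():
    print(f"{build:14s} hysteresis = {hysteresis(fwd, rev):+.5f}  "
          f"average = {averaged(fwd, rev):.5f}")
```

On these numbers the hysteresis is similar for all three builds (about -0.035 to -0.042), while the averaged estimates range from roughly 0.05 to 0.19, so the build-to-build scatter is larger than the forward/reverse mismatch within any one build.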
I know that both of the runs are quite short but I was expecting a
little better agreement.
I was just wondering whether the developers have a feel for whether this
is the normal amount of noise observed across different architectures,
or whether I have done something obviously wrong.
Is there any way to reduce this variation, or is it something one just
has to live with? In general, I extend my runs to times long
enough to see convergence. It seems to me that this is something I
cannot do across different architectures; would everybody agree?
If you got this far, thanks for your patience and I will really
appreciate any comments anybody has.
---------------------------------------
Dr. David Smith
Department of Chemistry
Ludwig Maximilians University
Butenandt-Str. 5-13, D-81377 Munich
Germany
Tel.: +49 (0)89 2180 7740
Fax.: +49 (0)89 2180 7738
e-mail: David.Smith_at_cup.uni-muenchen.de
---------------------------------------
Received on Fri Oct 04 2002 - 06:33:24 PDT