Re: AMBER: Test fails in Parallel version Amber9

From: Scott Brozell <sbrozell.scripps.edu>
Date: Tue, 16 Oct 2007 11:35:24 -0700

Hi,

On Tue, 16 Oct 2007, gong wb wrote:

> We have compiled the serial and parallel versions of Amber9. The test
> of the serial version reported 4 possible errors, all of which are
> round-off errors. But the test of the parallel version gave other types
> of errors.
> Our installation steps (parallel version):
> cd $AMBERHOME
> cp bugfix.all ./
> patch -p0 -N -r patch_rejects < ./bugfix.all
> cd src
> ./configure -static -lam -bintraj ifort_ia32
> make clean
> make parallel >& make_parallel.log
>
> We have checked the logfile and found no error messages.
> Here is the information about our operating system and compiler versions.
> Operating system: Red Hat Linux 8
> >uname -a
> Linux nodeXX 2.4.18-26.7.xsmp #1 SMP Mon Feb 24 09:37:16 EST 2003 i686
> i686 i386 GNU/Linux
> Hardware: parallel cluster (each node has two CPUs)
> >cat /proc/cpuinfo
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 15
> model : 2
> model name : Intel(R) Xeon(TM) CPU 2.40GHz
> stepping : 7
> cpu MHz : 2392.217
> cache size : 512 KB
> Physical processor ID : 0
> Number of siblings : 1
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 2
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
> bogomips : 4750.85
> (The other CPU is identical, so its info is omitted.)
> Compiler version: Intel Fortran 8.1
> >ifort -v
> Version 8.1
> >gcc -v
> Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.2/specs
> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
> --infodir=/usr/share/info --enable-shared --enable-threads=posix
> --disable-checking --host=i386-redhat-linux --with-system-zlib
> --enable-__cxa_atexit
> Thread model: posix
> gcc version 3.2 20020903 (Red Hat Linux 8.0 3.2-7)
>
> We use lam_mpi_7.1.3 for parallel calculation; it was compiled using the
> same Intel C and Fortran compilers (Version 8.1).
> These are our test steps:
> First, create a nodefile for lam with this content:
> nodeXX cpu=2
> nodeXX cpu=2
> Then, set the DO_PARALLEL and start lamboot:
> export DO_PARALLEL='/opt/lam_mpi_7.1.3/bin/mpirun -np 4'
> /opt/lam_mpi_7.1.3/bin/lamboot -prefix /opt/lam_mpi_7.1.3 ./nodefile
> make test.parallel >& test.log
>
> The DIFF file is:
> possible FAILURE: check rms.dif
> /public/amber9/test/bintraj
> 1,10d0
> < 1.00 0.
> < 2.00 0.00998
> < 3.00 0.02162
> < 4.00 0.03381
> < 5.00 0.04485
> < 6.00 0.05458
> < 7.00 0.06244
> < 8.00 0.06958
> < 9.00 0.07753
> < 10.00 0.08657

The diff shows the entire expected output as missing,
which indicates that the file rms was not created.
According to test/bintraj/Run.bintraj, rms is written by ptraj,
so the problem may be with ptraj.
Examine ptraj.out and try running the last steps of Run.bintraj
(the ones involving ptraj) manually.
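
The check described above can be sketched as a small shell function.
This is only an illustration: the function name check_ptraj_output is
made up here, and the file names rms and ptraj.out are the ones mentioned
in this thread. The actual ptraj command line is not reproduced; it should
be copied from Run.bintraj itself.

```shell
# Hypothetical helper: report whether the bintraj test produced its rms file.
# Argument: the test directory (for example $AMBERHOME/test/bintraj).
check_ptraj_output() {
    dir="$1"
    if [ -s "$dir/rms" ]; then
        # rms exists and is non-empty, so ptraj wrote its output.
        echo "rms present"
    elif [ -f "$dir/ptraj.out" ]; then
        # rms is missing or empty; ptraj.out should say where ptraj stopped.
        echo "rms missing; last lines of ptraj.out:"
        tail -n 20 "$dir/ptraj.out"
    else
        echo "rms missing and no ptraj.out; ptraj may not have started"
    fi
}
```

If neither file is present, the test script may have cleaned up after
itself, in which case running the ptraj step by hand is the only way to
see the error.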

> ---------------------------------------
> possible FAILURE: check mdout.jar.001.dif
> /public/amber9/test/jar_multi
> 177c177
> < Etot = -3538.3785 EKtot = 478.2764 EPtot = -4016.6548
> ---
> > Etot = -3538.3784 EKtot = 478.2764 EPtot = -4016.6548
> 180c180
> < EELEC = -18.4200 EGB = -2503.6434 RESTRAINT = 3.6286
> ---
> > EELEC = -18.4199 EGB = -2503.6434 RESTRAINT = 3.6286

Insignificant diffs: Etot and EELEC each differ by one unit in the last
printed decimal place, which is round-off.
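
One way to make "insignificant" concrete is to compare the two values
against an absolute tolerance. The sketch below is an assumption-laden
illustration, not part of the Amber test suite: the helper name within_tol
and the tolerance of 0.001 are chosen here only for the example.

```shell
# Hypothetical helper: succeed (exit 0) if |a - b| <= tol, else exit 1.
# Usage: within_tol A B TOL
within_tol() {
    awk -v a="$1" -v b="$2" -v tol="$3" 'BEGIN {
        d = a - b
        if (d < 0) d = -d
        # exit status 1 when the difference exceeds the tolerance
        exit (d > tol + 0)
    }'
}

# Values taken from the jar_multi diff quoted above.
within_tol -3538.3785 -3538.3784 0.001 && echo "Etot: round-off"
within_tol -18.4200 -18.4199 0.001 && echo "EELEC: round-off"
```

Both comparisons pass with this tolerance, whereas the temperatures
reported in the rem_gb_4rep diff (234.76 and 261.02) would not.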

> ---------------------------------------
> possible FAILURE: check rem.log.dif
> /public/amber9/test/rem_gb_4rep
> 26c26
> < 2 1.15 234.76 -3.24 300.00 400.00 0.80
> ---
> > 2 1.15 261.02 -4.61 300.00 400.00 0.80
> ---------------------------------------
> possible FAILURE: check reminfo.000.dif
> /public/amber9/test/rem_gb_4rep
> 16,20c16,20
> < NSTEP = 100 TIME(PS) = 100.800 TEMP(K) = 234.76 PRESS = 0.
> < Etot = 21.0164 EKtot = 24.2585 EPtot = -3.2421
> < BOND = 14.3725 ANGLE = 19.8208 DIHED = 25.4361
> < 1-4 NB = 5.7103 1-4 EEL = 182.5250 VDWAALS = -5.9319
> < EELEC = -213.6574 EGB = -31.5175 RESTRAINT = 0.
> ---
> > NSTEP = 100 TIME(PS) = 100.800 TEMP(K) = 261.02 PRESS = 0.
> > Etot = 22.3628 EKtot = 26.9719 EPtot = -4.6092
> > BOND = 14.9791 ANGLE = 17.9986 DIHED = 25.4386
> > 1-4 NB = 5.6257 1-4 EEL = 182.4183 VDWAALS = -5.8981
> > EELEC = -213.4152 EGB = -31.7563 RESTRAINT = 0.

The second replica at the last exchange differs (TEMP(K) 234.76 vs. 261.02).
This may be insignificant; is it reproducible?
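
Reproducibility can be checked by repeating the run and comparing the
output file from each repetition. The function below is a generic sketch:
the name same_each_run is hypothetical, and the command to pass in is the
test's own Run script in /public/amber9/test/rem_gb_4rep, whose name is
not given in this thread. It also assumes that the script leaves the
output file in place after each run.

```shell
# Hypothetical helper: run a command N times and report whether the named
# output file is byte-identical after every run.
# Usage: same_each_run N OUTPUT_FILE COMMAND [ARGS...]
same_each_run() {
    n="$1"; out="$2"; shift 2
    first=""
    i=1
    while [ "$i" -le "$n" ]; do
        # Run the command; its exit status is ignored because a failed
        # comparison inside a test script need not mean the run failed.
        "$@" >/dev/null 2>&1
        # Checksum the output file; give up if it cannot be read.
        sum=$(cksum < "$out") || return 2
        if [ -z "$first" ]; then
            first="$sum"
        elif [ "$sum" != "$first" ]; then
            echo "run $i differs from run 1"
            return 1
        fi
        i=$((i + 1))
    done
    echo "identical in $n runs"
}
```

If the file changes from run to run with the same inputs, the difference
is more likely run-to-run variation than a build problem; if it is
identical every time but differs from the saved reference, that is worth
reporting.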

> ---------------------------------------
> There are four FAILUREs; one is a round-off error, but we can't figure
> out the others. We hope that you can help us, thanks!

Thanks for the clear reporting.
Scott

-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
Received on Wed Oct 17 2007 - 06:07:54 PDT