Re: AMBER: Test fails in Parallel version Amber9

From: gong wb <bnmrcamber.gmail.com>
Date: Wed, 17 Oct 2007 15:54:30 +0800

Dear Scott,


On 10/17/07, Scott Brozell <sbrozell.scripps.edu> wrote:
> Hi,
>
> On Tue, 16 Oct 2007, gong wb wrote:
>
> > We have compiled the serial and parallel versions of Amber9. The tests
> > of the serial version reported 4 possible failures, all of them
> > round-off errors. But the tests of the parallel version gave other
> > types of errors.
> > Our installation steps (parallel version):
> > cd $AMBERHOME
> > cp bugfix.all ./
> > patch -p0 -N -r patch_rejects < ./bugfix.all
> > cd src
> > ./configure -static -lam -bintraj ifort_ia32
> > make clean
> > make parallel >& make_parallel.log
> >
> > We checked the log file and found no error messages.
> > Here is the information about our operating system and compiler versions.
> > Operating system: Red Hat Linux 8
> > >uname -a
> > Linux nodeXX 2.4.18-26.7.xsmp #1 SMP Mon Feb 24 09:37:16 EST 2003 i686 i686 i386 GNU/Linux
> > Hardware: parallel cluster (each node has two CPUs)
> > >cat /proc/cpuinfo
> > processor : 0
> > vendor_id : GenuineIntel
> > cpu family : 15
> > model : 2
> > model name : Intel(R) Xeon(TM) CPU 2.40GHz
> > stepping : 7
> > cpu MHz : 2392.217
> > cache size : 512 KB
> > Physical processor ID : 0
> > Number of siblings : 1
> > fdiv_bug : no
> > hlt_bug : no
> > f00f_bug : no
> > coma_bug : no
> > fpu : yes
> > fpu_exception : yes
> > cpuid level : 2
> > wp : yes
> > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> > mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
> > bogomips : 4750.85
> > (The other CPU is identical, so its info is omitted.)
> > Compiler version: Intel Fortran 8.1
> > >ifort -v
> > Version 8.1
> > >gcc -v
> > Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.2/specs
> > Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
> > --infodir=/usr/share/info --enable-shared --enable-threads=posix
> > --disable-checking --host=i386-redhat-linux --with-system-zlib
> > --enable-__cxa_atexit
> > Thread model: posix
> > gcc version 3.2 20020903 (Red Hat Linux 8.0 3.2-7)
> >
> > We use lam_mpi_7.1.3 for parallel calculations; it was compiled with the
> > same Intel C and Fortran compilers (version 8.1).
> > Our test steps were as follows.
> > First, create a nodefile for LAM with this content:
> > nodeXX cpu=2
> > nodeXX cpu=2
> > Then set DO_PARALLEL, run lamboot, and start the tests:
> > export DO_PARALLEL='/opt/lam_mpi_7.1.3/bin/mpirun -np 4'
> > /opt/lam_mpi_7.1.3/bin/lamboot -prefix /opt/lam_mpi_7.1.3 ./nodefile
> > make test.parallel >& test.log
> >
> > The DIFF file is:
> > possible FAILURE: check rms.dif
> > /public/amber9/test/bintraj
> > 1,10d0
> > < 1.00 0.
> > < 2.00 0.00998
> > < 3.00 0.02162
> > < 4.00 0.03381
> > < 5.00 0.04485
> > < 6.00 0.05458
> > < 7.00 0.06244
> > < 8.00 0.06958
> > < 9.00 0.07753
> > < 10.00 0.08657
>
> The diff shows the whole normal output.
> This indicates that file rms was not created.
> According to test/bintraj/Run.bintraj
> the problem may be with ptraj.
> Examine ptraj.out and try running the last steps of Run.bintraj
> involving ptraj manually.
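The check suggested above could be sketched as a small shell helper.
This is only a sketch: the function name check_bintraj is hypothetical,
and it assumes the file names rms, ptraj.out and Run.bintraj mentioned
in this thread all sit in the test directory passed as its argument.

```shell
# Hypothetical helper: report whether the bintraj test produced its rms file.
check_bintraj() {
    dir=$1
    if [ -s "$dir/rms" ]; then
        echo "rms present"
    else
        echo "rms missing"
        # Show the end of ptraj.out, if it exists, to look for an error message.
        [ -f "$dir/ptraj.out" ] && tail -n 20 "$dir/ptraj.out"
        # List the lines of the run script that mention ptraj, if it exists.
        [ -f "$dir/Run.bintraj" ] && grep -n ptraj "$dir/Run.bintraj"
    fi
    return 0
}
```

With the path from the DIFF file it would be called as
check_bintraj /public/amber9/test/bintraj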
>
> > ---------------------------------------
> > possible FAILURE: check mdout.jar.001.dif
> > /public/amber9/test/jar_multi
> > 177c177
> > < Etot = -3538.3785 EKtot = 478.2764 EPtot = -4016.6548
> > ---
> > > Etot = -3538.3784 EKtot = 478.2764 EPtot = -4016.6548
> > 180c180
> > < EELEC = -18.4200 EGB = -2503.6434 RESTRAINT = 3.6286
> > ---
> > > EELEC = -18.4199 EGB = -2503.6434 RESTRAINT = 3.6286
>
> Insignificant difs.
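One way to make "insignificant" concrete is to compare the differing
values against a small absolute tolerance. The helper below is a
minimal sketch, not part of the Amber test suite; the name within_tol
and the tolerance of 0.0002 (one unit in the last printed digit, plus
slack for floating-point rounding) are assumptions.

```shell
# Hypothetical helper: succeed if two numbers agree within an absolute tolerance.
within_tol() {
    awk -v a="$1" -v b="$2" -v tol="$3" 'BEGIN {
        d = a - b
        if (d < 0) d = -d          # absolute difference
        if (d <= tol) exit 0       # close enough: treat as round-off
        exit 1                     # a real difference
    }'
}

# Etot values from mdout.jar.001.dif differ only in the last digit.
within_tol -3538.3785 -3538.3784 0.0002 && echo "round-off"
# TEMP(K) values from rem.log.dif differ by far more than that.
within_tol 234.76 261.02 0.0002 || echo "real difference"
```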
>
> > ---------------------------------------
> > possible FAILURE: check rem.log.dif
> > /public/amber9/test/rem_gb_4rep
> > 26c26
> > < 2 1.15 234.76 -3.24 300.00 400.00 0.80
> > ---
> > > 2 1.15 261.02 -4.61 300.00 400.00 0.80
> > ---------------------------------------
> > possible FAILURE: check reminfo.000.dif
> > /public/amber9/test/rem_gb_4rep
> > 16,20c16,20
> > < NSTEP = 100 TIME(PS) = 100.800 TEMP(K) = 234.76 PRESS = 0.
> > < Etot = 21.0164 EKtot = 24.2585 EPtot = -3.2421
> > < BOND = 14.3725 ANGLE = 19.8208 DIHED = 25.4361
> > < 1-4 NB = 5.7103 1-4 EEL = 182.5250 VDWAALS = -5.9319
> > < EELEC = -213.6574 EGB = -31.5175 RESTRAINT = 0.
> > ---
> > > NSTEP = 100 TIME(PS) = 100.800 TEMP(K) = 261.02 PRESS = 0.
> > > Etot = 22.3628 EKtot = 26.9719 EPtot = -4.6092
> > > BOND = 14.9791 ANGLE = 17.9986 DIHED = 25.4386
> > > 1-4 NB = 5.6257 1-4 EEL = 182.4183 VDWAALS = -5.8981
> > > EELEC = -213.4152 EGB = -31.7563 RESTRAINT = 0.
>
> The 2nd replica of the last exchange is different.
> This may be insignificant; is it reproducible?
>
> > ---------------------------------------
> > There are four FAILUREs; one is a round-off error, but we can't figure
> > out the others. We hope that you can help us. Thanks!
>
> Thanks for the clear reporting.
> Scott
>
> -----------------------------------------------------------------------
> The AMBER Mail Reflector
> To post, send mail to amber.scripps.edu
> To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
>
Received on Sun Oct 21 2007 - 06:07:05 PDT