Re: [AMBER] Issues compiling Amber with CUDA support

From: Baker, Joseph <bakerj.tcnj.edu>
Date: Sun, 28 Aug 2016 00:48:21 -0400

Hi Ross,

Yep, random velocities actually get set at step 5 (I just posted step 20).
When I compare the two mdout files, they are identical up to that step and
only diverge after the velocities are randomized. Thanks!
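
For anyone who wants to repeat the check, the comparison can be sketched in a few lines of Python. This is only a rough sketch, not part of the Amber test suite: it assumes the energy records look like the "NAME = value" lines quoted in this thread, and the function names (parse_steps, rel_diff, first_divergence) and the default tolerance are made up for illustration. It reports the first step at which any field differs by more than a relative tolerance, so you can see whether the divergence starts before or after the velocities are randomized.

```python
import re
import sys

# Matches "NAME = number" pairs as they appear in the energy lines quoted in
# this thread, e.g. "NSTEP = 20", "TEMP(K) = 298.31", "1-4 NB = 0.".
# Assumption: real mdout files follow the same layout.
PAIR = re.compile(r"(\S[^=]*?)\s*=\s*([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)")


def parse_steps(text):
    """Return {step: {field: value}} for each NSTEP block in mdout-style text."""
    steps, current = {}, None
    for line in text.splitlines():
        pairs = [(k.strip(), float(v)) for k, v in PAIR.findall(line)]
        if not pairs:
            current = None  # a blank or separator line ends the current block
            continue
        if pairs[0][0] == "NSTEP":
            step = int(pairs[0][1])
            # Keep only the first block seen for a step; later blocks that
            # reuse the same NSTEP (e.g. summaries) are skipped.
            current = None if step in steps else steps.setdefault(step, {})
        if current is not None:
            current.update(pairs)
    return steps


def rel_diff(a, b):
    """Relative difference, scaled by the larger magnitude of the two values."""
    scale = max(abs(a), abs(b))
    return 0.0 if scale == 0.0 else abs(a - b) / scale


def first_divergence(ref, new, rel_tol=1e-4):
    """Return (step, field, ref_value, new_value) for the first mismatch, or None."""
    for step in sorted(set(ref) & set(new)):
        for field, a in ref[step].items():
            b = new[step].get(field)
            if b is not None and rel_diff(a, b) > rel_tol:
                return step, field, a, b
    return None


# Hypothetical usage: python first_divergence.py mdout.reference mdout.new
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_ref, open(sys.argv[2]) as f_new:
        print(first_divergence(parse_steps(f_ref.read()), parse_steps(f_new.read())))
```

If the first reported step comes right after the point where velocities are randomized, the mismatch is most likely just a different random number stream. If it comes earlier, it deserves a closer look.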

Joe


------
Joseph Baker, PhD
Assistant Professor
Department of Chemistry
C101 Science Complex
The College of New Jersey
Ewing, NJ 08628
Phone: (609) 771-3173
Web: http://bakerj.pages.tcnj.edu/


On Sat, Aug 27, 2016 at 6:53 PM, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi Joe,
>
> I would say yes, except that this test uses vrand and the difference
> shows up at step 20. I bet if you actually compare the two output files
> (rather than the diff, which doesn't tell you the whole story) you'll find
> that they match up to step 19, then there is a message along the lines of
> randomizing velocities (Andersen thermostat), and the difference occurs
> after that. That just means the random number stream differs between the
> two runs, likely because the GPU and CUDA version he used produce a
> different stream of random numbers from cuRAND than the test systems I
> used. Unfortunately this is a pain with anything that uses random number
> streams and is not easy to avoid. If such a difference occurred in the
> absence of random number use, or before the random number generator was
> called, then I'd be concerned.
>
> All the best
> Ross
>
> > On Aug 27, 2016, at 1:10 PM, Baker, Joseph <bakerj.tcnj.edu> wrote:
> >
> > Hi Ross,
> >
> > I just checked the diff logs that Shawn posted at his link (we are in the
> > process of getting Amber installed on a new cluster at TCNJ) and some of
> > the differences we are seeing are a bit bigger. For example, are the
> > differences below large enough to be concerning? Thanks, Joe
> >
> > possible FAILURE: check mdout.vrand.dif
> > /opt/tcnjhpc/amberdev/amber16/test/cuda/4096wat
> >
> > < NSTEP = 20 TIME(PS) = 1.020 TEMP(K) = 298.31 PRESS = 0.
> > > NSTEP = 20 TIME(PS) = 1.020 TEMP(K) = 296.70 PRESS = 0.
> > 212c212
> > < Etot = -32126.2133 EKtot = 7283.3078 EPtot = -39409.5211
> > > Etot = -32067.4187 EKtot = 7244.0343 EPtot = -39311.4530
> > 214c214
> > < 1-4 NB = 0. 1-4 EEL = 0. VDWAALS = 6013.9630
> > > 1-4 NB = 0. 1-4 EEL = 0. VDWAALS = 6025.7161
> > 215c215
> > < EELEC = -45423.4842 EHBOND = 0. RESTRAINT = 0.
> > > EELEC = -45337.1691 EHBOND = 0. RESTRAINT = 0.
> >
> >
> >
> > On Thu, Aug 25, 2016 at 7:30 AM, Ross Walker <ross.rosswalker.co.uk> wrote:
> >
> >> Hi Shawn,
> >>
> >> Assuming the differences are all similar to the cellulose example you
> >> show below where they differ only on a few lines and only by a very
> >> small amount it is innocuous and you can ignore it. I've never tested on
> >> a Quadro NVS 510 GPU but I'd expect it to work fine - although it is
> >> possible some tests may fail due to being out of memory - it depends on
> >> how much is on the card.
> >>
> >> Other than that though the minor differences can be ignored.
> >>
> >> All the best
> >> Ross
> >>
> >>> On Aug 24, 2016, at 14:47, Sivy, Shawn <ssivy.tcnj.edu> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I’m hoping someone could help me with compiling the Amber software on
> >>> our systems. I have an older model Intel Xeon 5500 server with an
> >>> NVIDIA Quadro NVS 510 (GK107) GPU card in it. I’ve successfully
> >>> compiled the serial and parallel versions of Amber using gcc 4.8.5.
> >>> I’m trying to compile the serial CUDA version now. The “make install”
> >>> builds without errors, but the “make test” gets “possible failures” on
> >>> 9 comparisons. I’ve tried building with CUDA 7.5 and CUDA 8.0
> >>> resulting in the same failures. Does anyone have a suggestion on what
> >>> to try next? I’m assuming “make test” needs to run without any errors.
> >>> I suspect maybe the GPU I’m using doesn’t have the precision that a
> >>> Tesla or recent GTX card has.
> >>>
> >>>
> >>> Below is some sample output from the “make test” run and the resulting
> >>> log and diff files. The complete contents of the log and diff files can
> >>> be found at https://www.tcnj.edu/~ssivy/amberlogs/
> >>>
> >>> Thanks in advance for any assistance.
> >>>
> >>> ...
> >>>
> >>> make[2]: Leaving directory `/opt/tcnjhpc/amberdev/amber16/test'
> >>> 130 file comparisons passed
> >>> 9 file comparisons failed
> >>> 0 tests experienced errors
> >>> Test log file saved as /opt/tcnjhpc/amberdev/amber16/logs/test_amber_cuda/2016-08-24_11-35-41.log
> >>> Test diffs file saved as /opt/tcnjhpc/amberdev/amber16/logs/test_amber_cuda/2016-08-24_11-35-41.diff
> >>> make[1]: Leaving directory `/opt/tcnjhpc/amberdev/amber16/test'
> >>>
> >>>
> >>> Example from log file:
> >>>
> >>> ==============================================================
> >>> cd cellulose/ && ./Run.cellulose_nvt DPFP /opt/tcnjhpc/amberdev/amber16/include/netcdf.mod
> >>> diffing mdout.cellulose_nvt.GPU_DPFP with mdout.cellulose_nvt
> >>> possible FAILURE: check mdout.cellulose_nvt.dif
> >>> ==============================================================
> >>>
> >>> Example from diff file:
> >>>
> >>> possible FAILURE: check mdout.cellulose_nvt.dif
> >>>
> >>> /opt/tcnjhpc/amberdev/amber16/test/cuda/cellulose
> >>>
> >>> 212c212
> >>> < Etot = 5.8651 EKtot = 273.2191 EPtot = 276.4278
> >>> > Etot = 5.8650 EKtot = 273.2191 EPtot = 276.4278
> >>> ### Maximum absolute error in matching lines = 1.00e-04 at line 212 field 3
> >>> ### Maximum relative error in matching lines = 1.71e-05 at line 212 field 3
> >>>
> >>> ---------------------------------------
> >>>
> >>>
> >>> ------------------------------
> >>> Shawn Sivy
> >>> HPC System Administrator
> >>> School of Science
> >>> The College of New Jersey <http://tcnj.pages.tcnj.edu/>
> >>> PO Box 7718 Ewing, NJ 08628-0718
> >>> 609-771-3475
> >>> ssivy.tcnj.edu
> >>> _______________________________________________
> >>> AMBER mailing list
> >>> AMBER.ambermd.org
> >>> http://lists.ambermd.org/mailman/listinfo/amber
Received on Sat Aug 27 2016 - 22:00:02 PDT