Re: [AMBER] NaN error in .rst files - UPDATE

From: Ross Walker <ross.rosswalker.co.uk>
Date: Thu, 27 Jan 2011 18:20:42 -0800

Hi Marek,

These failures are because your GTX470 doesn't have enough GPU memory to run
the nucleosome test cases. This is to be expected.
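
To see which test directories actually hit the error, the log can be scanned for "Program error" lines. The following is only a minimal sketch, assuming the log layout shown in the listing quoted below (one "cd ... && ./Run..." line per test, with the error text printed before a "Program error" line). The function name find_errored_tests and the SAMPLE_LOG excerpt are illustrative and are not part of the AMBER test suite.

```python
# Hypothetical helper (not part of AMBER): list the tests in a
# test_amber_cuda.sh log that ended with "Program error".

SAMPLE_LOG = """\
> cd chamber/dhfr_cmap/ && ./Run.dhfr_charmm.min -1 SPDP netcdf.mod
> diffing mdout.dhfr_charmm_min.GPU_SPDP with mdout.dhfr_charmm_min
> PASSED
> cd nucleosome/ && ./Run_min.1 -1 SPDP netcdf.mod
> cudaMalloc GpuBuffer::Allocate failed out of memory
> ./Run_min.1: Program error
> make[1]: *** [test.pmemd.cuda.gb] Error 1
> cd gb_ala3/ && ./Run.igb1_ntc1_min -1 SPDP netcdf.mod
> diffing igb1_ntc1_min.out.GPU_SPDP with igb1_ntc1_min.out
> PASSED
"""


def find_errored_tests(log_text):
    """Return (command, messages) pairs for tests that hit 'Program error'."""
    errored = []
    command = None   # the most recent "cd ... && ./Run..." line
    messages = []    # lines printed since that command
    for raw in log_text.splitlines():
        # Drop e-mail quote markers ("> ") and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        if line.startswith("cd ") and "&&" in line:
            command, messages = line, []
        elif command is not None:
            if line.endswith("Program error"):
                errored.append((command, " | ".join(messages)))
                command = None
            else:
                messages.append(line)
    return errored


if __name__ == "__main__":
    for cmd, why in find_errored_tests(SAMPLE_LOG):
        print(cmd)
        print("   ", why)
```

If every reported command sits in nucleosome/ and carries the cudaMalloc out-of-memory message, the errors are the expected memory limitation rather than a problem with the build.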

All the best
Ross

> -----Original Message-----
> From: Marek Maly [mailto:marek.maly.ujep.cz]
> Sent: Thursday, January 27, 2011 9:29 AM
> To: AMBER Mailing List
> Subject: Re: [AMBER] NaN error in .rst files - UPDATE
>
> Hello Jason,
>
> thanks for the comment regarding the compiler.
>
> Regarding the pmemd.cuda tests,
>
> here is the result (the detailed listing is below):
>
> 53 file comparisons passed
> 0 file comparisons failed
> 5 tests experienced errors
>
> so I would say that my installation passed the tests relatively well.
>
> Moreover, regarding my explicit solvent simulations, I have already run many
> simulations here and compared the data obtained with CPU and also Tesla
> results on the same molecular systems, and it was OK.
>
> The only problem here is the two types of errors described earlier.
>
> I have to say that, on the other hand, I got many errors from the
> pmemd.MPI tests (although compilation was OK),
> but I didn't care much about this, as I wanted to use just pmemd.cuda here.
>
> Best wishes,
>
> Marek
>
> Below is the full list of pmemd.cuda tests on our GTX 470 system.
>
>
> maly.physics ~ $ ssh sta-6
> Last login: Thu Jan 27 17:25:01 2011 from physics
> Have a lot of fun...
> mmaly.sta-6:~> cd _APPS/amber/test/
> mmaly.sta-6:~/_APPS/amber/test> ./test_amber_cuda.sh
> Using default GPU_ID = -1
> Using default PREC_MODEL = SPDP
> cd cuda && make -k test.pmemd.cuda GPU_ID=-1 PREC_MODEL=SPDP
> make[1]: Entering directory `/home/mmaly/_APPS/amber/test/cuda'
> ------------------------------------
> Running CUDA Implicit solvent tests.
> Precision Model = SPDP
> GPU_ID = -1
> ------------------------------------
> cd trpcage/ && ./Run_md_trpcage -1 SPDP netcdf.mod
> diffing trpcage_md.out.GPU_SPDP with trpcage_md.out
> PASSED
> ==========================================================
> ====
> cd myoglobin/ && ./Run_md_myoglobin -1 SPDP netcdf.mod
> diffing myoglobin_md.out.GPU_SPDP with myoglobin_md.out
> PASSED
> ==========================================================
> ====
> cd chamber/dhfr/ && ./Run.dhfr_charmm.md -1 SPDP netcdf.mod
> diffing mdout.dhfr_charmm_md.GPU_SPDP with mdout.dhfr_charmm_md
> PASSED
> ==========================================================
> ====
> cd chamber/dhfr/ && ./Run.dhfr_charmm.min -1 SPDP netcdf.mod
> diffing mdout.dhfr_charmm_min.GPU_SPDP with mdout.dhfr_charmm_min
> PASSED
> ==========================================================
> ====
> cd chamber/dhfr_cmap/ && ./Run.dhfr_charmm.md -1 SPDP netcdf.mod
> diffing mdout.dhfr_charmm_md.GPU_SPDP with mdout.dhfr_charmm_md
> PASSED
> ==========================================================
> ====
> cd chamber/dhfr_cmap/ && ./Run.dhfr_charmm.min -1 SPDP netcdf.mod
> diffing mdout.dhfr_charmm_min.GPU_SPDP with mdout.dhfr_charmm_min
> PASSED
> ==========================================================
> ====
> cd nucleosome/ && ./Run_min.1 -1 SPDP netcdf.mod
> cudaMalloc GpuBuffer::Allocate failed out of memory
> ./Run_min.1: Program error
> make[1]: *** [test.pmemd.cuda.gb] Error 1
> cd gb_ala3/ && ./Run.igb1_ntc1_min -1 SPDP netcdf.mod
> diffing igb1_ntc1_min.out.GPU_SPDP with igb1_ntc1_min.out
> PASSED
> ==========================================================
> ====
> cd gb_ala3/ && ./Run.irest1_ntt0_igb1_ntc1 -1 SPDP netcdf.mod
> diffing irest1_ntt0_igb1_ntc1.out.GPU_SPDP with irest1_ntt0_igb1_ntc1.out
> PASSED
> ==========================================================
> ====
> diffing irest1_ntt0_igb1_ntc1.rst.GPU_SPDP with irest1_ntt0_igb1_ntc1.rst
> PASSED
> ==========================================================
> ====
> diffing irest1_ntt0_igb1_ntc1.mdcrd.GPU_SPDP with
> irest1_ntt0_igb1_ntc1.mdcrd
> PASSED
> ==========================================================
> ====
> cd gb_ala3/ && ./Run.irest0_ntt0_igb1_ntc1_hotstart -1 SPDP netcdf.mod
> diffing irest0_ntt0_igb1_ntc1_hotstart.out.GPU_SPDP with
> irest0_ntt0_igb1_ntc1_hotstart.out
> PASSED
> ==========================================================
> ====
> cd gb_ala3/ && ./Run.irest0_ntt0_igb1_ntc1_coldstart -1 SPDP netcdf.mod
> diffing irest0_ntt0_igb1_ntc1_coldstart.out.GPU_SPDP with
> irest0_ntt0_igb1_ntc1_coldstart.out
> PASSED
> ==========================================================
> ====
> cd gb_ala3/ && ./Run.irest1_ntt0_igb1_ntc2 -1 SPDP netcdf.mod
> diffing irest1_ntt0_igb1_ntc2.out.GPU_SPDP with irest1_ntt0_igb1_ntc2.out
> PASSED
> ==========================================================
> ====
> diffing irest1_ntt0_igb1_ntc2.rst.GPU_SPDP with irest1_ntt0_igb1_ntc2.rst
> PASSED
> ==========================================================
> ====
> diffing irest1_ntt0_igb1_ntc2.mdcrd.GPU_SPDP with
> irest1_ntt0_igb1_ntc2.mdcrd
> PASSED
> ==========================================================
> ====
> cd gb_ala3/ && ./Run.irest1_ntt0_igb2_ntc1 -1 SPDP netcdf.mod
> diffing irest1_ntt0_igb2_ntc1.out.GPU_SPDP with irest1_ntt0_igb2_ntc1.out
> PASSED
> ==========================================================
> ====
> cd gb_ala3/ && ./Run.irest1_ntt0_igb2_ntc2 -1 SPDP netcdf.mod
> diffing irest1_ntt0_igb2_ntc2.out.GPU_SPDP with irest1_ntt0_igb2_ntc2.out
> PASSED
> ==========================================================
> ====
> cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc1 -1 SPDP netcdf.mod
> diffing irest1_ntt0_igb5_ntc1.out.GPU_SPDP with irest1_ntt0_igb5_ntc1.out
> PASSED
> ==========================================================
> ====
> cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc2 -1 SPDP netcdf.mod
> diffing irest1_ntt0_igb5_ntc2.out.GPU_SPDP with irest1_ntt0_igb5_ntc2.out
> PASSED
> ==========================================================
> ====
> cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc2_ntr1 -1 SPDP netcdf.mod
> diffing irest1_ntt0_igb5_ntc2_ntr1.out.GPU_SPDP with
> irest1_ntt0_igb5_ntc2_ntr1.out
> PASSED
> ==========================================================
> ====
> cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc2_saltcon -1 SPDP netcdf.mod
> diffing irest1_ntt0_igb5_ntc2_saltcon.out.GPU_SPDP with
> irest1_ntt0_igb5_ntc2_saltcon.out
> PASSED
> ==========================================================
> ====
> cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc2_rgbmax -1 SPDP netcdf.mod
> diffing irest1_ntt0_igb5_ntc2_rgbmax.out.GPU_SPDP with
> irest1_ntt0_igb5_ntc2_rgbmax.out
> PASSED
> ==========================================================
> ====
> cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc2_alpb -1 SPDP netcdf.mod
> diffing irest1_ntt0_igb5_ntc2_alpb.out.GPU_SPDP with
> irest1_ntt0_igb5_ntc2_alpb.out
> PASSED
> ==========================================================
> ====
> cd gb_ala3/ && ./Run.irest1_ntt1_igb1_ntc2 -1 SPDP netcdf.mod
> diffing irest1_ntt1_igb1_ntc2.out.GPU_SPDP with irest1_ntt1_igb1_ntc2.out
> PASSED
> ==========================================================
> ====
> cd gb_ala3/ && ./Run.irest1_ntt2_igb1_ntc2 -1 SPDP netcdf.mod
> diffing irest1_ntt2_igb1_ntc2.out.GPU_SPDP with irest1_ntt2_igb1_ntc2.out
> PASSED
> ==========================================================
> ====
> cd gb_ala3/ && ./Run.irest1_ntt3_igb1_ntc2 -1 SPDP netcdf.mod
> diffing irest1_ntt3_igb1_ntc2.out.GPU_SPDP with irest1_ntt3_igb1_ntc2.out
> PASSED
> ==========================================================
> ====
> ------------------------------------
> Running CUDA Explicit solvent tests.
> Precision Model = SPDP
> GPU_ID = -1
> ------------------------------------
> cd 4096wat/ && ./Run.pure_wat -1 SPDP netcdf.mod
> diffing mdout.pure_wat.GPU_SPDP with mdout.pure_wat
> PASSED
> ==========================================================
> ====
> cd 4096wat/ && ./Run.vrand -1 SPDP netcdf.mod
> diffing mdout.vrand.GPU_SPDP with mdout.vrand
> PASSED
> ==========================================================
> ====
> cd 4096wat_oct/ && ./Run.pure_wat_oct_NVE -1 SPDP netcdf.mod
> diffing mdout.pure_wat_oct_nve.GPU_SPDP with
> mdout.pure_wat_oct_nve
> PASSED
> ==========================================================
> ====
> diffing mdcrd.pure_wat_oct_nve.GPU_SPDP with mdcrd.pure_wat_oct_nve
> PASSED
> ==========================================================
> ====
> cd 4096wat_oct/ && ./Run.pure_wat_oct_NVT_NTT1 -1 SPDP netcdf.mod
> diffing mdout.pure_wat_oct_nvt_ntt1.GPU_SPDP with
> mdout.pure_wat_oct_nvt_ntt1
> PASSED
> ==========================================================
> ====
> diffing mdcrd.pure_wat_oct_nvt_ntt1.GPU_SPDP with
> mdcrd.pure_wat_oct_nvt_ntt1
> PASSED
> ==========================================================
> ====
> cd 4096wat_oct/ && ./Run.pure_wat_oct_NVT_NTT2 -1 SPDP netcdf.mod
> diffing mdout.pure_wat_oct_nvt_ntt2.GPU_SPDP with
> mdout.pure_wat_oct_nvt_ntt2
> PASSED
> ==========================================================
> ====
> diffing mdcrd.pure_wat_oct_nvt_ntt2.GPU_SPDP with
> mdcrd.pure_wat_oct_nvt_ntt2
> PASSED
> ==========================================================
> ====
> cd 4096wat_oct/ && ./Run.pure_wat_oct_NVT_NTT3 -1 SPDP netcdf.mod
> diffing mdout.pure_wat_oct_nvt_ntt3.GPU_SPDP with
> mdout.pure_wat_oct_nvt_ntt3
> PASSED
> ==========================================================
> ====
> diffing mdcrd.pure_wat_oct_nvt_ntt3.GPU_SPDP with
> mdcrd.pure_wat_oct_nvt_ntt3
> PASSED
> ==========================================================
> ====
> cd 4096wat_oct/ && ./Run.pure_wat_oct_NPT_NTT1 -1 SPDP netcdf.mod
> diffing mdout.pure_wat_oct_npt_ntt1.GPU_SPDP with
> mdout.pure_wat_oct_npt_ntt1
> PASSED
> ==========================================================
> ====
> diffing mdcrd.pure_wat_oct_npt_ntt1.GPU_SPDP with
> mdcrd.pure_wat_oct_npt_ntt1
> PASSED
> ==========================================================
> ====
> cd jac/ && ./Run.jac -1 SPDP netcdf.mod
> diffing jac.out.GPU_SPDP with jac.out
> PASSED
> ==========================================================
> ====
> cd dhfr/ && ./Run.dhfr -1 SPDP netcdf.mod
> diffing mdout.dhfr.GPU_SPDP with mdout.dhfr
> PASSED
> ==========================================================
> ====
> cd dhfr/ && ./Run.dhfr.ntr1 -1 SPDP netcdf.mod
> diffing mdout.dhfr.ntr1.GPU_SPDP with mdout.dhfr.ntr1
> PASSED
> ==========================================================
> ====
> cd dhfr/ && ./Run.dhfr.ntb2 -1 SPDP netcdf.mod
> diffing mdout.dhfr.ntb2.GPU_SPDP with mdout.dhfr.ntb2
> PASSED
> ==========================================================
> ====
> cd dhfr/ && ./Run.dhfr.ntb2_ntt1 -1 SPDP netcdf.mod
> diffing mdout.dhfr.ntb2_ntt1.GPU_SPDP with mdout.dhfr.ntb2_ntt1
> PASSED
> ==========================================================
> ====
> cd dhfr/ && ./Run.dhfr.ntb2_ntt1_ntr1 -1 SPDP netcdf.mod
> diffing mdout.dhfr.ntb2_ntt1_ntr1.GPU_SPDP with
> mdout.dhfr.ntb2_ntt1_ntr1
> PASSED
> ==========================================================
> ====
> cd dhfr/ && ./Run.dhfr.ntb2_ntt3 -1 SPDP netcdf.mod
> diffing mdout.dhfr.ntb2_ntt3.GPU_SPDP with mdout.dhfr.ntb2_ntt3
> PASSED
> ==========================================================
> ====
> cd dhfr/ && ./Run.dhfr.min -1 SPDP netcdf.mod
> diffing mdout.dhfr.min.GPU_SPDP with mdout.dhfr.min
> PASSED
> ==========================================================
> ====
> cd dhfr/ && ./Run.dhfr.noshake -1 SPDP netcdf.mod
> diffing mdout.dhfr.noshake.GPU_SPDP with mdout.dhfr.noshake
> PASSED
> ==========================================================
> ====
> cd chamber/dhfr_pbc/ && ./Run.dhfr_pbc_charmm_noshake.md -1 SPDP netcdf.mod
> diffing mdout.dhfr_charmm_pbc_noshake_md.GPU_SPDP with
> mdout.dhfr_charmm_pbc_noshake_md
> PASSED
> ==========================================================
> ====
> cd chamber/dhfr_pbc/ && ./Run.dhfr_pbc_charmm_noshake.min -1 SPDP netcdf.mod
> diffing mdout.dhfr_charmm_pbc_noshake_min.GPU_SPDP with
> mdout.dhfr_charmm_pbc_noshake_min
> PASSED
> ==========================================================
> ====
> cd chamber/dhfr_cmap_pbc/ && ./Run.dhfr_cmap_pbc_charmm.md -1 SPDP netcdf.mod
> diffing mdout.dhfr_charmm_pbc_md.GPU_SPDP with
> mdout.dhfr_charmm_pbc_md
> PASSED
> ==========================================================
> ====
> cd chamber/dhfr_cmap_pbc/ && ./Run.dhfr_cmap_pbc_charmm.min -1 SPDP netcdf.mod
> diffing mdout.dhfr_charmm_pbc_min.GPU_SPDP with
> mdout.dhfr_charmm_pbc_min
> PASSED
> ==========================================================
> ====
> cd chamber/dhfr_cmap_pbc/ && ./Run.dhfr_cmap_pbc_charmm_noshake.md -1 SPDP netcdf.mod
> diffing mdout.dhfr_charmm_pbc_noshake_md.GPU_SPDP with
> mdout.dhfr_charmm_pbc_noshake_md
> PASSED
> ==========================================================
> ====
> cd chamber/dhfr_cmap_pbc/ && ./Run.dhfr_cmap_pbc_charmm_noshake.min -1 SPDP netcdf.mod
> diffing mdout.dhfr_charmm_pbc_noshake_min.GPU_SPDP with
> mdout.dhfr_charmm_pbc_noshake_min
> PASSED
> ==========================================================
> ====
> make[1]: Target `test.pmemd.cuda' not remade because of errors.
> make[1]: Leaving directory `/home/mmaly/_APPS/amber/test/cuda'
> make: *** [test.pmemd.cuda] Error 2
> make: Target `test.serial.cuda' not remade because of errors.
> 53 file comparisons passed
> 0 file comparisons failed
> 5 tests experienced errors
> Test log file saved as logs/test_amber_cuda/2011-01-27_17-29-54.log
> No test diffs to save!
>
>
> On Thu, 27 Jan 2011 17:58:06 +0100, Jason Swails <jason.swails.gmail.com>
> wrote:
>
> > Hello,
> >
> > As a note on GCC 4.5 -- it is now up to version 4.5.2 (stable). While you
> > may be fine downgrading, you may be fine upgrading as well (4.5.0 is a
> > very early release of a new compiler).
> >
> > However -- did the tests pass with your current install?
> >
> > 2011/1/27 Marek Maly <marek.maly.ujep.cz>
> >
> >> Hi Peker,
> >> thanks a lot !
> >>
> >> Since you also have a slightly older gcc (4.4.3) than I do (4.5.0) but,
> >> like me, CUDA 3.2, your result increases my suspicion that the problem
> >> really lies in the gcc version (simply too new for serious work :(( ).
> >> Anyway, it is clear that the first thing I have to try is to recompile
> >> everything with an older gcc and see what happens then ...
> >>
> >> Best wishes,
> >>
> >> Marek
> >>
> >>
> >>
> >>
> >> On Thu, 27 Jan 2011 17:09:42 +0100, peker milas <pekermilas.gmail.com>
> >> wrote:
> >>
> >> > Hi again Marek,
> >> >
> >> > It finished 85000 steps without giving me any NaNs ??? I wanted to let
> >> > you know
> >> >
> >> > best
> >> > peker
> >> >
> >> > On Thu, Jan 27, 2011 at 10:42 AM, filip fratev <filipfratev.yahoo.com>
> >> > wrote:
> >> >> Hi Marek,
> >> >> I performed 50 000 steps. You can find the outputs in the attached file.
> >> >> Update: Because the output files are about 2.7 MB and the attachment needs
> >> >> to be approved by a moderator, I am also sending you the results privately
> >> >> (just in case).
> >> >>
> >> >> My system:
> >> >> --------------------------------------
> >> >> Linux 2.6.34-12-desktop x86_64
> >> >> openSUSE 11.3 (x86_64)
> >> >> GeForce GTX 470
> >> >> NVIDIA 260.19.36
> >> >> AMD Phenom(tm) II X6 1090T Processor
> >> >> RAM: 7.8 GiB
> >> >> --------------------------------------
> >> >> Cuda 3.1, everything compiled by gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch
> >> >> revision 152973] due to described issues with CUDA and newer versions.
> >> >>
> >> >> Regards,
> >> >> Filip
> >> >>
> >> >> _______________________________________________
> >> >> AMBER mailing list
> >> >> AMBER.ambermd.org
> >> >> http://lists.ambermd.org/mailman/listinfo/amber
> >> >>
> >> >
> >> >
> >>
> >>
> >> --
> >> This message was created by Opera's revolutionary e-mail client:
> >> http://www.opera.com/mail/
> >>
> >>
> >
> >
> >
>
>
>


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Jan 27 2011 - 18:30:02 PST