Re: [AMBER] NaN error in .rst files - UPDATE

From: peker milas <pekermilas.gmail.com>
Date: Thu, 27 Jan 2011 13:28:55 -0500

Hi Marek,

The files are too big in total to attach, as you probably already
know, so here is a link where you can download them:

http://goldnerlab.physics.umass.edu/~peker/Marek_Maly/

As you will see, I did not get any NaNs or other failures during
the whole run. I hope this helps. Also, our GPU is currently idle
because of these NaN errors; I mean I cannot continue with my runs.
So if there is any way I can help debug this issue, I definitely
can and would like to help.

best
peker

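Checking a run for NaNs by eye gets tedious with large restart files, so a small script can help. Below is a minimal sketch, assuming the restart files are ASCII (formatted) .rst files whose first line is a free-text title; it will not work on NetCDF restarts. The function name `find_bad_lines` and the choice of suspicious tokens (NaN, Inf, and runs of asterisks from overflowed fixed-width fields) are assumptions for illustration, not part of AMBER.

```python
import re

# Tokens treated as suspicious (an assumption of this sketch): literal
# NaN/Inf, and runs of asterisks, which formatted output typically shows
# when a number does not fit its fixed-width field.
BAD_TOKEN = re.compile(r"nan|inf|\*{3,}", re.IGNORECASE)


def find_bad_lines(path):
    """Return (line_number, text) pairs for lines containing a bad token."""
    hits = []
    with open(path, "r", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            # Assumed layout: line 1 is a free-text title, so skip it.
            if number == 1:
                continue
            if BAD_TOKEN.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    import sys

    for restart in sys.argv[1:]:
        for number, text in find_bad_lines(restart):
            print(f"{restart}:{number}: {text}")
```

Run it as `python check_rst.py *.rst` (the script name is arbitrary); no output means none of the listed tokens were found.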

2011/1/27 Marek Maly <marek.maly.ujep.cz>:
> Hello Jason,
>
> thanks for the comment regarding the compiler.
>
> Regarding the pmemd.cuda tests:
>
> here is the result (the detailed listing is below):
>
> 53 file comparisons passed
> 0 file comparisons failed
> 5 tests experienced errors
>
> so I would say that my installation passed the tests relatively well.
>
> Moreover, regarding my explicit solvent simulations, I have already run
> many simulations here and compared the data with CPU and Tesla
> results on the same molecular systems, and everything was OK.
>
> The only problem here is the two types of errors described earlier.
>
> On the other hand, I have to say that I got many errors from the
> pmemd.MPI tests (although compilation was OK),
> but I didn't care much about this, as I wanted to use just pmemd.cuda here.
>
> Best wishes,
>
>    Marek
>
> Below is the full list of pmemd.cuda tests on our GTX 470 system.
>
>
> maly.physics ~ $ ssh sta-6
> Last login: Thu Jan 27 17:25:01 2011 from physics
> Have a lot of fun...
> mmaly.sta-6:~> cd _APPS/amber/test/
> mmaly.sta-6:~/_APPS/amber/test> ./test_amber_cuda.sh
> Using default GPU_ID = -1
> Using default PREC_MODEL = SPDP
> cd cuda && make -k test.pmemd.cuda GPU_ID=-1 PREC_MODEL=SPDP
> make[1]: Entering directory `/home/mmaly/_APPS/amber/test/cuda'
> ------------------------------------
> Running CUDA Implicit solvent tests.
>   Precision Model = SPDP
>            GPU_ID = -1
> ------------------------------------
> cd trpcage/ && ./Run_md_trpcage -1 SPDP netcdf.mod
> diffing trpcage_md.out.GPU_SPDP with trpcage_md.out
> PASSED
> ==============================================================
> cd myoglobin/ && ./Run_md_myoglobin -1 SPDP netcdf.mod
> diffing myoglobin_md.out.GPU_SPDP with myoglobin_md.out
> PASSED
> ==============================================================
> cd chamber/dhfr/ && ./Run.dhfr_charmm.md -1 SPDP netcdf.mod
> diffing mdout.dhfr_charmm_md.GPU_SPDP with mdout.dhfr_charmm_md
> PASSED
> ==============================================================
> cd chamber/dhfr/ && ./Run.dhfr_charmm.min -1 SPDP netcdf.mod
> diffing mdout.dhfr_charmm_min.GPU_SPDP with mdout.dhfr_charmm_min
> PASSED
> ==============================================================
> cd chamber/dhfr_cmap/ && ./Run.dhfr_charmm.md -1 SPDP netcdf.mod
> diffing mdout.dhfr_charmm_md.GPU_SPDP with mdout.dhfr_charmm_md
> PASSED
> ==============================================================
> cd chamber/dhfr_cmap/ && ./Run.dhfr_charmm.min -1 SPDP netcdf.mod
> diffing mdout.dhfr_charmm_min.GPU_SPDP with mdout.dhfr_charmm_min
> PASSED
> ==============================================================
> cd nucleosome/ && ./Run_min.1 -1 SPDP netcdf.mod
> cudaMalloc GpuBuffer::Allocate failed out of memory
>   ./Run_min.1:  Program error
> make[1]: *** [test.pmemd.cuda.gb] Error 1
> cd gb_ala3/ && ./Run.igb1_ntc1_min -1 SPDP netcdf.mod
> diffing igb1_ntc1_min.out.GPU_SPDP with igb1_ntc1_min.out
> PASSED
> ==============================================================
> cd gb_ala3/ && ./Run.irest1_ntt0_igb1_ntc1 -1 SPDP netcdf.mod
> diffing irest1_ntt0_igb1_ntc1.out.GPU_SPDP with irest1_ntt0_igb1_ntc1.out
> PASSED
> ==============================================================
> diffing irest1_ntt0_igb1_ntc1.rst.GPU_SPDP with irest1_ntt0_igb1_ntc1.rst
> PASSED
> ==============================================================
> diffing irest1_ntt0_igb1_ntc1.mdcrd.GPU_SPDP with
> irest1_ntt0_igb1_ntc1.mdcrd
> PASSED
> ==============================================================
> cd gb_ala3/ && ./Run.irest0_ntt0_igb1_ntc1_hotstart -1 SPDP netcdf.mod
> diffing irest0_ntt0_igb1_ntc1_hotstart.out.GPU_SPDP with
> irest0_ntt0_igb1_ntc1_hotstart.out
> PASSED
> ==============================================================
> cd gb_ala3/ && ./Run.irest0_ntt0_igb1_ntc1_coldstart -1 SPDP netcdf.mod
> diffing irest0_ntt0_igb1_ntc1_coldstart.out.GPU_SPDP with
> irest0_ntt0_igb1_ntc1_coldstart.out
> PASSED
> ==============================================================
> cd gb_ala3/ && ./Run.irest1_ntt0_igb1_ntc2 -1 SPDP netcdf.mod
> diffing irest1_ntt0_igb1_ntc2.out.GPU_SPDP with irest1_ntt0_igb1_ntc2.out
> PASSED
> ==============================================================
> diffing irest1_ntt0_igb1_ntc2.rst.GPU_SPDP with irest1_ntt0_igb1_ntc2.rst
> PASSED
> ==============================================================
> diffing irest1_ntt0_igb1_ntc2.mdcrd.GPU_SPDP with
> irest1_ntt0_igb1_ntc2.mdcrd
> PASSED
> ==============================================================
> cd gb_ala3/ && ./Run.irest1_ntt0_igb2_ntc1 -1 SPDP netcdf.mod
> diffing irest1_ntt0_igb2_ntc1.out.GPU_SPDP with irest1_ntt0_igb2_ntc1.out
> PASSED
> ==============================================================
> cd gb_ala3/ && ./Run.irest1_ntt0_igb2_ntc2 -1 SPDP netcdf.mod
> diffing irest1_ntt0_igb2_ntc2.out.GPU_SPDP with irest1_ntt0_igb2_ntc2.out
> PASSED
> ==============================================================
> cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc1 -1 SPDP netcdf.mod
> diffing irest1_ntt0_igb5_ntc1.out.GPU_SPDP with irest1_ntt0_igb5_ntc1.out
> PASSED
> ==============================================================
> cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc2 -1 SPDP netcdf.mod
> diffing irest1_ntt0_igb5_ntc2.out.GPU_SPDP with irest1_ntt0_igb5_ntc2.out
> PASSED
> ==============================================================
> cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc2_ntr1 -1 SPDP netcdf.mod
> diffing irest1_ntt0_igb5_ntc2_ntr1.out.GPU_SPDP with
> irest1_ntt0_igb5_ntc2_ntr1.out
> PASSED
> ==============================================================
> cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc2_saltcon -1 SPDP netcdf.mod
> diffing irest1_ntt0_igb5_ntc2_saltcon.out.GPU_SPDP with
> irest1_ntt0_igb5_ntc2_saltcon.out
> PASSED
> ==============================================================
> cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc2_rgbmax -1 SPDP netcdf.mod
> diffing irest1_ntt0_igb5_ntc2_rgbmax.out.GPU_SPDP with
> irest1_ntt0_igb5_ntc2_rgbmax.out
> PASSED
> ==============================================================
> cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc2_alpb -1 SPDP netcdf.mod
> diffing irest1_ntt0_igb5_ntc2_alpb.out.GPU_SPDP with
> irest1_ntt0_igb5_ntc2_alpb.out
> PASSED
> ==============================================================
> cd gb_ala3/ && ./Run.irest1_ntt1_igb1_ntc2 -1 SPDP netcdf.mod
> diffing irest1_ntt1_igb1_ntc2.out.GPU_SPDP with irest1_ntt1_igb1_ntc2.out
> PASSED
> ==============================================================
> cd gb_ala3/ && ./Run.irest1_ntt2_igb1_ntc2 -1 SPDP netcdf.mod
> diffing irest1_ntt2_igb1_ntc2.out.GPU_SPDP with irest1_ntt2_igb1_ntc2.out
> PASSED
> ==============================================================
> cd gb_ala3/ && ./Run.irest1_ntt3_igb1_ntc2 -1 SPDP netcdf.mod
> diffing irest1_ntt3_igb1_ntc2.out.GPU_SPDP with irest1_ntt3_igb1_ntc2.out
> PASSED
> ==============================================================
> ------------------------------------
> Running CUDA Explicit solvent tests.
>   Precision Model = SPDP
>            GPU_ID = -1
> ------------------------------------
> cd 4096wat/ && ./Run.pure_wat -1 SPDP netcdf.mod
> diffing mdout.pure_wat.GPU_SPDP with mdout.pure_wat
> PASSED
> ==============================================================
> cd 4096wat/ && ./Run.vrand -1 SPDP netcdf.mod
> diffing mdout.vrand.GPU_SPDP with mdout.vrand
> PASSED
> ==============================================================
> cd 4096wat_oct/ && ./Run.pure_wat_oct_NVE -1 SPDP netcdf.mod
> diffing mdout.pure_wat_oct_nve.GPU_SPDP with mdout.pure_wat_oct_nve
> PASSED
> ==============================================================
> diffing mdcrd.pure_wat_oct_nve.GPU_SPDP with mdcrd.pure_wat_oct_nve
> PASSED
> ==============================================================
> cd 4096wat_oct/ && ./Run.pure_wat_oct_NVT_NTT1 -1 SPDP netcdf.mod
> diffing mdout.pure_wat_oct_nvt_ntt1.GPU_SPDP with
> mdout.pure_wat_oct_nvt_ntt1
> PASSED
> ==============================================================
> diffing mdcrd.pure_wat_oct_nvt_ntt1.GPU_SPDP with
> mdcrd.pure_wat_oct_nvt_ntt1
> PASSED
> ==============================================================
> cd 4096wat_oct/ && ./Run.pure_wat_oct_NVT_NTT2 -1 SPDP netcdf.mod
> diffing mdout.pure_wat_oct_nvt_ntt2.GPU_SPDP with
> mdout.pure_wat_oct_nvt_ntt2
> PASSED
> ==============================================================
> diffing mdcrd.pure_wat_oct_nvt_ntt2.GPU_SPDP with
> mdcrd.pure_wat_oct_nvt_ntt2
> PASSED
> ==============================================================
> cd 4096wat_oct/ && ./Run.pure_wat_oct_NVT_NTT3 -1 SPDP netcdf.mod
> diffing mdout.pure_wat_oct_nvt_ntt3.GPU_SPDP with
> mdout.pure_wat_oct_nvt_ntt3
> PASSED
> ==============================================================
> diffing mdcrd.pure_wat_oct_nvt_ntt3.GPU_SPDP with
> mdcrd.pure_wat_oct_nvt_ntt3
> PASSED
> ==============================================================
> cd 4096wat_oct/ && ./Run.pure_wat_oct_NPT_NTT1 -1 SPDP netcdf.mod
> diffing mdout.pure_wat_oct_npt_ntt1.GPU_SPDP with
> mdout.pure_wat_oct_npt_ntt1
> PASSED
> ==============================================================
> diffing mdcrd.pure_wat_oct_npt_ntt1.GPU_SPDP with
> mdcrd.pure_wat_oct_npt_ntt1
> PASSED
> ==============================================================
> cd jac/ && ./Run.jac -1 SPDP netcdf.mod
> diffing jac.out.GPU_SPDP with jac.out
> PASSED
> ==============================================================
> cd dhfr/ && ./Run.dhfr -1 SPDP netcdf.mod
> diffing mdout.dhfr.GPU_SPDP with mdout.dhfr
> PASSED
> ==============================================================
> cd dhfr/ && ./Run.dhfr.ntr1 -1 SPDP netcdf.mod
> diffing mdout.dhfr.ntr1.GPU_SPDP with mdout.dhfr.ntr1
> PASSED
> ==============================================================
> cd dhfr/ && ./Run.dhfr.ntb2 -1 SPDP netcdf.mod
> diffing mdout.dhfr.ntb2.GPU_SPDP with mdout.dhfr.ntb2
> PASSED
> ==============================================================
> cd dhfr/ && ./Run.dhfr.ntb2_ntt1 -1 SPDP netcdf.mod
> diffing mdout.dhfr.ntb2_ntt1.GPU_SPDP with mdout.dhfr.ntb2_ntt1
> PASSED
> ==============================================================
> cd dhfr/ && ./Run.dhfr.ntb2_ntt1_ntr1 -1 SPDP netcdf.mod
> diffing mdout.dhfr.ntb2_ntt1_ntr1.GPU_SPDP with mdout.dhfr.ntb2_ntt1_ntr1
> PASSED
> ==============================================================
> cd dhfr/ && ./Run.dhfr.ntb2_ntt3 -1 SPDP netcdf.mod
> diffing mdout.dhfr.ntb2_ntt3.GPU_SPDP with mdout.dhfr.ntb2_ntt3
> PASSED
> ==============================================================
> cd dhfr/ && ./Run.dhfr.min -1 SPDP netcdf.mod
> diffing mdout.dhfr.min.GPU_SPDP with mdout.dhfr.min
> PASSED
> ==============================================================
> cd dhfr/ && ./Run.dhfr.noshake -1 SPDP netcdf.mod
> diffing mdout.dhfr.noshake.GPU_SPDP with mdout.dhfr.noshake
> PASSED
> ==============================================================
> cd chamber/dhfr_pbc/ && ./Run.dhfr_pbc_charmm_noshake.md -1 SPDP netcdf.mod
> diffing mdout.dhfr_charmm_pbc_noshake_md.GPU_SPDP with
> mdout.dhfr_charmm_pbc_noshake_md
> PASSED
> ==============================================================
> cd chamber/dhfr_pbc/ && ./Run.dhfr_pbc_charmm_noshake.min -1 SPDP
> netcdf.mod
> diffing mdout.dhfr_charmm_pbc_noshake_min.GPU_SPDP with
> mdout.dhfr_charmm_pbc_noshake_min
> PASSED
> ==============================================================
> cd chamber/dhfr_cmap_pbc/ && ./Run.dhfr_cmap_pbc_charmm.md -1 SPDP
> netcdf.mod
> diffing mdout.dhfr_charmm_pbc_md.GPU_SPDP with mdout.dhfr_charmm_pbc_md
> PASSED
> ==============================================================
> cd chamber/dhfr_cmap_pbc/ && ./Run.dhfr_cmap_pbc_charmm.min -1 SPDP
> netcdf.mod
> diffing mdout.dhfr_charmm_pbc_min.GPU_SPDP with mdout.dhfr_charmm_pbc_min
> PASSED
> ==============================================================
> cd chamber/dhfr_cmap_pbc/ && ./Run.dhfr_cmap_pbc_charmm_noshake.md -1 SPDP
> netcdf.mod
> diffing mdout.dhfr_charmm_pbc_noshake_md.GPU_SPDP with
> mdout.dhfr_charmm_pbc_noshake_md
> PASSED
> ==============================================================
> cd chamber/dhfr_cmap_pbc/ && ./Run.dhfr_cmap_pbc_charmm_noshake.min -1
> SPDP netcdf.mod
> diffing mdout.dhfr_charmm_pbc_noshake_min.GPU_SPDP with
> mdout.dhfr_charmm_pbc_noshake_min
> PASSED
> ==============================================================
> make[1]: Target `test.pmemd.cuda' not remade because of errors.
> make[1]: Leaving directory `/home/mmaly/_APPS/amber/test/cuda'
> make: *** [test.pmemd.cuda] Error 2
> make: Target `test.serial.cuda' not remade because of errors.
> 53 file comparisons passed
> 0 file comparisons failed
> 5 tests experienced errors
> Test log file saved as logs/test_amber_cuda/2011-01-27_17-29-54.log
> No test diffs to save!
>
> On Thu, 27 Jan 2011 17:58:06 +0100, Jason Swails <jason.swails.gmail.com>
> wrote:
>
>> Hello,
>>
>> As a note on GCC 4.5 -- it is now up to version 4.5.2 (stable).  While you
>> may be fine downgrading, you may also be fine upgrading as well (4.5.0 is a
>> very early release of a new compiler).
>>
>> However -- did the tests pass with your current install?
>>
>> 2011/1/27 Marek Maly <marek.maly.ujep.cz>
>>
>>> Hi Peker,
>>> thanks a lot!
>>>
>>> As you also have a slightly older gcc (4.4.3) than mine (4.5.0), but
>>> the same Cuda 3.2 as I do, your result increased my suspicion that the
>>> problem really lies in the gcc version (simply too new for serious
>>> work :(( ).
>>> Anyway, it is clear that the first thing I have to try is to recompile
>>> everything with an older gcc and see what happens then ...
>>>
>>> Best wishes,
>>>
>>>     Marek
>>>
>>>
>>>
>>>
>>> On Thu, 27 Jan 2011 17:09:42 +0100, peker milas <pekermilas.gmail.com>
>>> wrote:
>>>
>>> > Hi again Marek,
>>> >
>>> > It finished 85000 steps without giving me any NaNs ??? I wanted to let
>>> > you know
>>> >
>>> > best
>>> > peker
>>> >
>>> > On Thu, Jan 27, 2011 at 10:42 AM, filip fratev <filipfratev.yahoo.com>
>>> > wrote:
>>> >> Hi Marek,
>>> >> I performed 50 000 steps. You can find the outputs in the attached file.
>>> >> Update: because the output files are about 2.7 MB and the attachment
>>> >> needs to be approved by a moderator, I am also sending you the results
>>> >> privately (just in case).
>>> >>
>>> >> My system:
>>> >> --------------------------------------
>>> >> Linux 2.6.34-12-desktop x86_64
>>> >> openSUSE 11.3 (x86_64)
>>> >> GeForce GTX 470
>>> >> NVIDIA 260.19.36
>>> >> AMD Phenom(tm) II X6 1090T Processor
>>> >> RAM:  7.8 GiB
>>> >> --------------------------------------
>>> >> Cuda 3.1, everything compiled by gcc (SUSE Linux) 4.3.4
>>> >> [gcc-4_3-branch revision 152973] due to the described issues with CUDA
>>> >> and newer versions.
>>> >>
>>> >> Regards,
>>> >> Filip
>>> >>
>>> >>
>>> >> _______________________________________________
>>> >> AMBER mailing list
>>> >> AMBER.ambermd.org
>>> >> http://lists.ambermd.org/mailman/listinfo/amber
>>> >>
>>> >
>>>
>>>
>>> --
>>> This message was created with Opera's revolutionary e-mail client:
>>> http://www.opera.com/mail/
>>>
>>>
>>
>>
>>
>
>
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Jan 27 2011 - 10:30:04 PST
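
The quoted pmemd.cuda listing shows one explicit "Program error" line (the nucleosome test, which ran out of GPU memory), while the summary reports 5 tests with errors; how the test script arrives at its count is not shown in this thread. To locate failing tests in a saved log, something like the following could be used. It is only a sketch: the function name `summarize_test_log` is made up for illustration, and it assumes the log follows the pattern seen above (a `cd ... && ./Run...` line per test, `PASSED` lines, and `Program error` lines).

```python
import re


def summarize_test_log(text):
    """Count PASSED comparisons and list tests that hit a program error."""
    passed = 0
    errors = []
    current_test = None
    for raw in text.splitlines():
        # Drop e-mail quote markers such as "> " or ">>> " if present.
        line = re.sub(r"^(?:>\s?)+", "", raw).strip()
        if line.startswith("cd ") and "&&" in line:
            # Remember which test the following lines belong to.
            current_test = line
        elif line == "PASSED":
            passed += 1
        elif "Program error" in line:
            errors.append(current_test)
    return passed, errors


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "r", errors="replace") as handle:
        n_passed, failed = summarize_test_log(handle.read())
    print(f"{n_passed} comparisons passed")
    for test in failed:
        print(f"program error in: {test}")
```

Pass it the log file named at the end of the test run, for example the one under `logs/test_amber_cuda/`.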