Re: [AMBER] NaN error in .rst files - UPDATE

From: Marek Maly <marek.maly.ujep.cz>
Date: Thu, 27 Jan 2011 19:46:17 +0100

OK, thanks again.

For me the important information is that you didn't obtain any NaNs
with my input files, which do reproduce NaNs on my PCs.
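
In case it helps anyone repeat this check on their own output, here is a
minimal sketch of scanning restart files for NaNs. It assumes ASCII
(formatted) .rst files in which a bad value appears as the literal text
"NaN", and it skips the first line on the assumption that it is a free-text
title. The function name find_nan_lines is only illustrative and is not
part of AMBER.

```python
import re

# Matches "NaN" in any letter case, even when it is packed into a
# fixed-width numeric field next to other values.
NAN_RE = re.compile(r"nan", re.IGNORECASE)


def find_nan_lines(path):
    """Return (line_number, text) pairs for lines that contain 'NaN'.

    Assumption: the file is an ASCII restart file whose first line is a
    free-text title, so that line is not checked.
    """
    hits = []
    with open(path, "r", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if number == 1:
                continue  # title line may legitimately contain "nan"
            if NAN_RE.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits
```

Calling find_nan_lines("md.rst") (the file name is a placeholder) returns
an empty list for a clean file, so a run can be checked after every restart
write rather than only at the end.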

At the moment I have no clear advice for you. I think the only thing we
can do for now is to recompile with a different gcc version: either the
older one Filip uses (gcc (SUSE Linux) 4.3.4) or, as Jason suggested,
the newest 4.5.x. If that doesn't help, the next step is to go from
Cuda 3.2 back to 3.1 to mimic Filip's whole configuration:

Cuda 3.1, everything compiled by gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch
revision 152973]
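
To compare such configurations between machines, a small sketch like the
following could record what is actually on the PATH. It only assumes that
the compilers respond to --version (gcc and nvcc both do); the helper
names tool_version and toolchain_report are made up for this example.

```python
import shutil
import subprocess


def tool_version(command, flag="--version"):
    """Return the version text printed by `command flag`.

    Returns None when the command is not found on the PATH.
    """
    if shutil.which(command) is None:
        return None
    result = subprocess.run(
        [command, flag],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout.strip()


def toolchain_report(commands=("gcc", "nvcc")):
    """Map each command name to its version text (or None if missing)."""
    return {name: tool_version(name) for name in commands}
```

Printing toolchain_report() on each machine and comparing the results
shows which compiler and CUDA toolkit the shell picks up, which is not
necessarily the one used when AMBER was built.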

Sorry, I really don't know for the moment ...

Best,

    Marek



On Thu, 27 Jan 2011 19:28:55 +0100, peker milas <pekermilas.gmail.com>
wrote:

> Hi Marek,
>
> The files together are too big to attach, but I believe you already
> knew that. So here is a link where you can download them:
>
> http://goldnerlab.physics.umass.edu/~peker/Marek_Maly/
>
> As you will see, I didn't get any NaNs or other failures during
> the whole run. I hope this helps. Our GPU is also currently idle
> because of these NaN errors; I mean I can't continue with my runs. So
> if there is a way to help debug this issue, I definitely can and
> would like to help.
>
> best
> peker
>
>
> 2011/1/27 Marek Maly <marek.maly.ujep.cz>:
>> Hello Jason,
>>
>> thanks for the comment regarding the compiler.
>>
>> Regarding the pmemd.cuda tests, here is the result (the detailed
>> listing is below):
>>
>> 53 file comparisons passed
>> 0 file comparisons failed
>> 5 tests experienced errors
>>
>> so I would say that my installation passed the tests relatively well.
>>
>> Moreover, regarding my explicit solvent simulations: I have already
>> run many simulations here and compared the data with CPU results and
>> with Tesla results on the same molecular systems, and it was OK.
>>
>> The only problem here is the two types of errors described earlier.
>>
>> On the other hand, I have to say that I obtained many errors from the
>> pmemd.MPI tests (although compilation was OK), but I didn't care much
>> about this because I wanted to use only pmemd.cuda here.
>>
>> Best wishes,
>>
>> Marek
>>
>> Below is the full list of pmemd.cuda tests on our GTX 470 system.
>>
>>
>> maly.physics ~ $ ssh sta-6
>> Last login: Thu Jan 27 17:25:01 2011 from physics
>> Have a lot of fun...
>> mmaly.sta-6:~> cd _APPS/amber/test/
>> mmaly.sta-6:~/_APPS/amber/test> ./test_amber_cuda.sh
>> Using default GPU_ID = -1
>> Using default PREC_MODEL = SPDP
>> cd cuda && make -k test.pmemd.cuda GPU_ID=-1 PREC_MODEL=SPDP
>> make[1]: Entering directory `/home/mmaly/_APPS/amber/test/cuda'
>> ------------------------------------
>> Running CUDA Implicit solvent tests.
>> Precision Model = SPDP
>> GPU_ID = -1
>> ------------------------------------
>> cd trpcage/ && ./Run_md_trpcage -1 SPDP netcdf.mod
>> diffing trpcage_md.out.GPU_SPDP with trpcage_md.out
>> PASSED
>> ==============================================================
>> cd myoglobin/ && ./Run_md_myoglobin -1 SPDP netcdf.mod
>> diffing myoglobin_md.out.GPU_SPDP with myoglobin_md.out
>> PASSED
>> ==============================================================
>> cd chamber/dhfr/ && ./Run.dhfr_charmm.md -1 SPDP netcdf.mod
>> diffing mdout.dhfr_charmm_md.GPU_SPDP with mdout.dhfr_charmm_md
>> PASSED
>> ==============================================================
>> cd chamber/dhfr/ && ./Run.dhfr_charmm.min -1 SPDP netcdf.mod
>> diffing mdout.dhfr_charmm_min.GPU_SPDP with mdout.dhfr_charmm_min
>> PASSED
>> ==============================================================
>> cd chamber/dhfr_cmap/ && ./Run.dhfr_charmm.md -1 SPDP netcdf.mod
>> diffing mdout.dhfr_charmm_md.GPU_SPDP with mdout.dhfr_charmm_md
>> PASSED
>> ==============================================================
>> cd chamber/dhfr_cmap/ && ./Run.dhfr_charmm.min -1 SPDP netcdf.mod
>> diffing mdout.dhfr_charmm_min.GPU_SPDP with mdout.dhfr_charmm_min
>> PASSED
>> ==============================================================
>> cd nucleosome/ && ./Run_min.1 -1 SPDP netcdf.mod
>> cudaMalloc GpuBuffer::Allocate failed out of memory
>> ./Run_min.1: Program error
>> make[1]: *** [test.pmemd.cuda.gb] Error 1
>> cd gb_ala3/ && ./Run.igb1_ntc1_min -1 SPDP netcdf.mod
>> diffing igb1_ntc1_min.out.GPU_SPDP with igb1_ntc1_min.out
>> PASSED
>> ==============================================================
>> cd gb_ala3/ && ./Run.irest1_ntt0_igb1_ntc1 -1 SPDP netcdf.mod
>> diffing irest1_ntt0_igb1_ntc1.out.GPU_SPDP with
>> irest1_ntt0_igb1_ntc1.out
>> PASSED
>> ==============================================================
>> diffing irest1_ntt0_igb1_ntc1.rst.GPU_SPDP with
>> irest1_ntt0_igb1_ntc1.rst
>> PASSED
>> ==============================================================
>> diffing irest1_ntt0_igb1_ntc1.mdcrd.GPU_SPDP with
>> irest1_ntt0_igb1_ntc1.mdcrd
>> PASSED
>> ==============================================================
>> cd gb_ala3/ && ./Run.irest0_ntt0_igb1_ntc1_hotstart -1 SPDP netcdf.mod
>> diffing irest0_ntt0_igb1_ntc1_hotstart.out.GPU_SPDP with
>> irest0_ntt0_igb1_ntc1_hotstart.out
>> PASSED
>> ==============================================================
>> cd gb_ala3/ && ./Run.irest0_ntt0_igb1_ntc1_coldstart -1 SPDP netcdf.mod
>> diffing irest0_ntt0_igb1_ntc1_coldstart.out.GPU_SPDP with
>> irest0_ntt0_igb1_ntc1_coldstart.out
>> PASSED
>> ==============================================================
>> cd gb_ala3/ && ./Run.irest1_ntt0_igb1_ntc2 -1 SPDP netcdf.mod
>> diffing irest1_ntt0_igb1_ntc2.out.GPU_SPDP with
>> irest1_ntt0_igb1_ntc2.out
>> PASSED
>> ==============================================================
>> diffing irest1_ntt0_igb1_ntc2.rst.GPU_SPDP with
>> irest1_ntt0_igb1_ntc2.rst
>> PASSED
>> ==============================================================
>> diffing irest1_ntt0_igb1_ntc2.mdcrd.GPU_SPDP with
>> irest1_ntt0_igb1_ntc2.mdcrd
>> PASSED
>> ==============================================================
>> cd gb_ala3/ && ./Run.irest1_ntt0_igb2_ntc1 -1 SPDP netcdf.mod
>> diffing irest1_ntt0_igb2_ntc1.out.GPU_SPDP with
>> irest1_ntt0_igb2_ntc1.out
>> PASSED
>> ==============================================================
>> cd gb_ala3/ && ./Run.irest1_ntt0_igb2_ntc2 -1 SPDP netcdf.mod
>> diffing irest1_ntt0_igb2_ntc2.out.GPU_SPDP with
>> irest1_ntt0_igb2_ntc2.out
>> PASSED
>> ==============================================================
>> cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc1 -1 SPDP netcdf.mod
>> diffing irest1_ntt0_igb5_ntc1.out.GPU_SPDP with
>> irest1_ntt0_igb5_ntc1.out
>> PASSED
>> ==============================================================
>> cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc2 -1 SPDP netcdf.mod
>> diffing irest1_ntt0_igb5_ntc2.out.GPU_SPDP with
>> irest1_ntt0_igb5_ntc2.out
>> PASSED
>> ==============================================================
>> cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc2_ntr1 -1 SPDP netcdf.mod
>> diffing irest1_ntt0_igb5_ntc2_ntr1.out.GPU_SPDP with
>> irest1_ntt0_igb5_ntc2_ntr1.out
>> PASSED
>> ==============================================================
>> cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc2_saltcon -1 SPDP netcdf.mod
>> diffing irest1_ntt0_igb5_ntc2_saltcon.out.GPU_SPDP with
>> irest1_ntt0_igb5_ntc2_saltcon.out
>> PASSED
>> ==============================================================
>> cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc2_rgbmax -1 SPDP netcdf.mod
>> diffing irest1_ntt0_igb5_ntc2_rgbmax.out.GPU_SPDP with
>> irest1_ntt0_igb5_ntc2_rgbmax.out
>> PASSED
>> ==============================================================
>> cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc2_alpb -1 SPDP netcdf.mod
>> diffing irest1_ntt0_igb5_ntc2_alpb.out.GPU_SPDP with
>> irest1_ntt0_igb5_ntc2_alpb.out
>> PASSED
>> ==============================================================
>> cd gb_ala3/ && ./Run.irest1_ntt1_igb1_ntc2 -1 SPDP netcdf.mod
>> diffing irest1_ntt1_igb1_ntc2.out.GPU_SPDP with
>> irest1_ntt1_igb1_ntc2.out
>> PASSED
>> ==============================================================
>> cd gb_ala3/ && ./Run.irest1_ntt2_igb1_ntc2 -1 SPDP netcdf.mod
>> diffing irest1_ntt2_igb1_ntc2.out.GPU_SPDP with
>> irest1_ntt2_igb1_ntc2.out
>> PASSED
>> ==============================================================
>> cd gb_ala3/ && ./Run.irest1_ntt3_igb1_ntc2 -1 SPDP netcdf.mod
>> diffing irest1_ntt3_igb1_ntc2.out.GPU_SPDP with
>> irest1_ntt3_igb1_ntc2.out
>> PASSED
>> ==============================================================
>> ------------------------------------
>> Running CUDA Explicit solvent tests.
>> Precision Model = SPDP
>> GPU_ID = -1
>> ------------------------------------
>> cd 4096wat/ && ./Run.pure_wat -1 SPDP netcdf.mod
>> diffing mdout.pure_wat.GPU_SPDP with mdout.pure_wat
>> PASSED
>> ==============================================================
>> cd 4096wat/ && ./Run.vrand -1 SPDP netcdf.mod
>> diffing mdout.vrand.GPU_SPDP with mdout.vrand
>> PASSED
>> ==============================================================
>> cd 4096wat_oct/ && ./Run.pure_wat_oct_NVE -1 SPDP netcdf.mod
>> diffing mdout.pure_wat_oct_nve.GPU_SPDP with mdout.pure_wat_oct_nve
>> PASSED
>> ==============================================================
>> diffing mdcrd.pure_wat_oct_nve.GPU_SPDP with mdcrd.pure_wat_oct_nve
>> PASSED
>> ==============================================================
>> cd 4096wat_oct/ && ./Run.pure_wat_oct_NVT_NTT1 -1 SPDP netcdf.mod
>> diffing mdout.pure_wat_oct_nvt_ntt1.GPU_SPDP with
>> mdout.pure_wat_oct_nvt_ntt1
>> PASSED
>> ==============================================================
>> diffing mdcrd.pure_wat_oct_nvt_ntt1.GPU_SPDP with
>> mdcrd.pure_wat_oct_nvt_ntt1
>> PASSED
>> ==============================================================
>> cd 4096wat_oct/ && ./Run.pure_wat_oct_NVT_NTT2 -1 SPDP netcdf.mod
>> diffing mdout.pure_wat_oct_nvt_ntt2.GPU_SPDP with
>> mdout.pure_wat_oct_nvt_ntt2
>> PASSED
>> ==============================================================
>> diffing mdcrd.pure_wat_oct_nvt_ntt2.GPU_SPDP with
>> mdcrd.pure_wat_oct_nvt_ntt2
>> PASSED
>> ==============================================================
>> cd 4096wat_oct/ && ./Run.pure_wat_oct_NVT_NTT3 -1 SPDP netcdf.mod
>> diffing mdout.pure_wat_oct_nvt_ntt3.GPU_SPDP with
>> mdout.pure_wat_oct_nvt_ntt3
>> PASSED
>> ==============================================================
>> diffing mdcrd.pure_wat_oct_nvt_ntt3.GPU_SPDP with
>> mdcrd.pure_wat_oct_nvt_ntt3
>> PASSED
>> ==============================================================
>> cd 4096wat_oct/ && ./Run.pure_wat_oct_NPT_NTT1 -1 SPDP netcdf.mod
>> diffing mdout.pure_wat_oct_npt_ntt1.GPU_SPDP with
>> mdout.pure_wat_oct_npt_ntt1
>> PASSED
>> ==============================================================
>> diffing mdcrd.pure_wat_oct_npt_ntt1.GPU_SPDP with
>> mdcrd.pure_wat_oct_npt_ntt1
>> PASSED
>> ==============================================================
>> cd jac/ && ./Run.jac -1 SPDP netcdf.mod
>> diffing jac.out.GPU_SPDP with jac.out
>> PASSED
>> ==============================================================
>> cd dhfr/ && ./Run.dhfr -1 SPDP netcdf.mod
>> diffing mdout.dhfr.GPU_SPDP with mdout.dhfr
>> PASSED
>> ==============================================================
>> cd dhfr/ && ./Run.dhfr.ntr1 -1 SPDP netcdf.mod
>> diffing mdout.dhfr.ntr1.GPU_SPDP with mdout.dhfr.ntr1
>> PASSED
>> ==============================================================
>> cd dhfr/ && ./Run.dhfr.ntb2 -1 SPDP netcdf.mod
>> diffing mdout.dhfr.ntb2.GPU_SPDP with mdout.dhfr.ntb2
>> PASSED
>> ==============================================================
>> cd dhfr/ && ./Run.dhfr.ntb2_ntt1 -1 SPDP netcdf.mod
>> diffing mdout.dhfr.ntb2_ntt1.GPU_SPDP with mdout.dhfr.ntb2_ntt1
>> PASSED
>> ==============================================================
>> cd dhfr/ && ./Run.dhfr.ntb2_ntt1_ntr1 -1 SPDP netcdf.mod
>> diffing mdout.dhfr.ntb2_ntt1_ntr1.GPU_SPDP with
>> mdout.dhfr.ntb2_ntt1_ntr1
>> PASSED
>> ==============================================================
>> cd dhfr/ && ./Run.dhfr.ntb2_ntt3 -1 SPDP netcdf.mod
>> diffing mdout.dhfr.ntb2_ntt3.GPU_SPDP with mdout.dhfr.ntb2_ntt3
>> PASSED
>> ==============================================================
>> cd dhfr/ && ./Run.dhfr.min -1 SPDP netcdf.mod
>> diffing mdout.dhfr.min.GPU_SPDP with mdout.dhfr.min
>> PASSED
>> ==============================================================
>> cd dhfr/ && ./Run.dhfr.noshake -1 SPDP netcdf.mod
>> diffing mdout.dhfr.noshake.GPU_SPDP with mdout.dhfr.noshake
>> PASSED
>> ==============================================================
>> cd chamber/dhfr_pbc/ && ./Run.dhfr_pbc_charmm_noshake.md -1 SPDP
>> netcdf.mod
>> diffing mdout.dhfr_charmm_pbc_noshake_md.GPU_SPDP with
>> mdout.dhfr_charmm_pbc_noshake_md
>> PASSED
>> ==============================================================
>> cd chamber/dhfr_pbc/ && ./Run.dhfr_pbc_charmm_noshake.min -1 SPDP
>> netcdf.mod
>> diffing mdout.dhfr_charmm_pbc_noshake_min.GPU_SPDP with
>> mdout.dhfr_charmm_pbc_noshake_min
>> PASSED
>> ==============================================================
>> cd chamber/dhfr_cmap_pbc/ && ./Run.dhfr_cmap_pbc_charmm.md -1 SPDP
>> netcdf.mod
>> diffing mdout.dhfr_charmm_pbc_md.GPU_SPDP with mdout.dhfr_charmm_pbc_md
>> PASSED
>> ==============================================================
>> cd chamber/dhfr_cmap_pbc/ && ./Run.dhfr_cmap_pbc_charmm.min -1 SPDP
>> netcdf.mod
>> diffing mdout.dhfr_charmm_pbc_min.GPU_SPDP with
>> mdout.dhfr_charmm_pbc_min
>> PASSED
>> ==============================================================
>> cd chamber/dhfr_cmap_pbc/ && ./Run.dhfr_cmap_pbc_charmm_noshake.md -1
>> SPDP
>> netcdf.mod
>> diffing mdout.dhfr_charmm_pbc_noshake_md.GPU_SPDP with
>> mdout.dhfr_charmm_pbc_noshake_md
>> PASSED
>> ==============================================================
>> cd chamber/dhfr_cmap_pbc/ && ./Run.dhfr_cmap_pbc_charmm_noshake.min -1
>> SPDP netcdf.mod
>> diffing mdout.dhfr_charmm_pbc_noshake_min.GPU_SPDP with
>> mdout.dhfr_charmm_pbc_noshake_min
>> PASSED
>> ==============================================================
>> make[1]: Target `test.pmemd.cuda' not remade because of errors.
>> make[1]: Leaving directory `/home/mmaly/_APPS/amber/test/cuda'
>> make: *** [test.pmemd.cuda] Error 2
>> make: Target `test.serial.cuda' not remade because of errors.
>> 53 file comparisons passed
>> 0 file comparisons failed
>> 5 tests experienced errors
>> Test log file saved as logs/test_amber_cuda/2011-01-27_17-29-54.log
>> No test diffs to save!
>>
>> On Thu, 27 Jan 2011 17:58:06 +0100, Jason Swails
>> <jason.swails.gmail.com> wrote:
>>
>>> Hello,
>>>
>>> As a note on GCC 4.5 -- it is now up to version 4.5.2 (stable). While
>>> you may be fine downgrading, you may also be fine upgrading as well
>>> (4.5.0 is a very early release of a new compiler).
>>>
>>> However -- did the tests pass with your current install?
>>>
>>> 2011/1/27 Marek Maly <marek.maly.ujep.cz>
>>>
>>>> Hi Peker,
>>>> thanks a lot!
>>>>
>>>> Since you also have a slightly older gcc (gcc 4.4.3) than I do
>>>> (gcc 4.5.0) but, like me, Cuda 3.2, your result increases my
>>>> suspicion that the problem really lies in the gcc version (simply
>>>> too new for serious work :(( ).
>>>> Anyway, it is clear that the first thing I have to try is to
>>>> recompile everything with an older gcc and see what happens then ...
>>>>
>>>> Best wishes,
>>>>
>>>> Marek
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, 27 Jan 2011 17:09:42 +0100, peker milas <pekermilas.gmail.com>
>>>> wrote:
>>>>
>>>> > Hi again Marek,
>>>> >
>>>> > It finished 85000 steps without giving me any NaNs ??? I wanted to
>>>> > let you know
>>>> >
>>>> > best
>>>> > peker
>>>> >
>>>> > On Thu, Jan 27, 2011 at 10:42 AM, filip fratev
>>>> > <filipfratev.yahoo.com> wrote:
>>>> >> Hi Marek,
>>>> >> I performed 50 000 steps. You can find the outputs in the attached
>>>> >> file.
>>>> >> Update: Because the output files are about 2.7 MB and the
>>>> >> attachment needs to be approved by a moderator, I am also sending
>>>> >> you the results privately (just in case).
>>>> >>
>>>> >> My system:
>>>> >> --------------------------------------
>>>> >> Linux 2.6.34-12-desktop x86_64
>>>> >> openSUSE 11.3 (x86_64)
>>>> >> GeForce GTX 470
>>>> >> NVIDIA 260.19.36
>>>> >> AMD Phenom(tm) II X6 1090T Processor
>>>> >> RAM: 7.8 GiB
>>>> >> --------------------------------------
>>>> >> Cuda 3.1, everything compiled by gcc (SUSE Linux) 4.3.4
>>>> >> [gcc-4_3-branch revision 152973] due to described issues with CUDA
>>>> >> and newer versions.
>>>> >>
>>>> >> Regards,
>>>> >> Filip
>>>> >>
>>>> >>
>>>> >> _______________________________________________
>>>> >> AMBER mailing list
>>>> >> AMBER.ambermd.org
>>>> >> http://lists.ambermd.org/mailman/listinfo/amber
>>>> >>
>>>> >
>>>> > __________ Information from ESET NOD32 Antivirus, database version 5824
>>>> > (20110127) __________
>>>> >
>>>> > This message was checked by ESET NOD32 Antivirus.
>>>> >
>>>> > http://www.eset.cz
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>> --
>>>> This message was created with Opera's revolutionary e-mail client:
>>>> http://www.opera.com/mail/
>>>>
>>>
>>>
>>>
>>


Received on Thu Jan 27 2011 - 11:00:08 PST