Re: [AMBER] NaN error in .rst files - UPDATE

From: Marek Maly <marek.maly.ujep.cz>
Date: Thu, 27 Jan 2011 18:29:06 +0100

Hello Jason,

Thanks for the comment regarding the compiler.

Regarding the pmemd.cuda tests:

Here is the result (the detailed listing is below):

53 file comparisons passed
0 file comparisons failed
5 tests experienced errors

So I would say that my installation passed the tests reasonably well.
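
For anyone who wants to check such a summary against a saved log, here is a minimal sketch in Python. It only counts the "PASSED" lines and the visible "Program error" lines; the function name tally and the sample text are my own illustration, and this is not necessarily how test_amber_cuda.sh itself computes its three numbers.

```python
def tally(log_text):
    """Count PASSED comparisons and visible 'Program error' lines in a log.

    Assumption: a passed comparison is a line that is exactly 'PASSED', and
    an errored run is a line ending in ': Program error'. The real test
    script may count errors differently.
    """
    passed = 0
    errors = 0
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped == "PASSED":
            passed += 1
        elif stripped.endswith(": Program error"):
            errors += 1
    return passed, errors


# Small excerpt in the same shape as the listing below (not the full log).
sample = """\
cd trpcage/ && ./Run_md_trpcage -1 SPDP netcdf.mod
diffing trpcage_md.out.GPU_SPDP with trpcage_md.out
PASSED
cd nucleosome/ && ./Run_min.1 -1 SPDP netcdf.mod
cudaMalloc GpuBuffer::Allocate failed out of memory
   ./Run_min.1: Program error
"""

print(tally(sample))  # (1, 1)
```

To use it on a real run, read the saved log file into a string and pass it to tally.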

Moreover, regarding my explicit solvent simulations: I have already run a
lot of simulations here and compared the resulting data with CPU results and
also with Tesla results on the same molecular systems, and everything was OK.

The only problem here is the two types of errors described earlier.

I have to say that, on the other hand, I got many errors from the
pmemd.MPI tests (although compilation was OK),
but I didn't care much about this, as I wanted to use just pmemd.cuda here.

Best wishes,

    Marek

Below is the full list of pmemd.cuda tests on our GTX 470 system.


maly.physics ~ $ ssh sta-6
Last login: Thu Jan 27 17:25:01 2011 from physics
Have a lot of fun...
mmaly.sta-6:~> cd _APPS/amber/test/
mmaly.sta-6:~/_APPS/amber/test> ./test_amber_cuda.sh
Using default GPU_ID = -1
Using default PREC_MODEL = SPDP
cd cuda && make -k test.pmemd.cuda GPU_ID=-1 PREC_MODEL=SPDP
make[1]: Entering directory `/home/mmaly/_APPS/amber/test/cuda'
------------------------------------
Running CUDA Implicit solvent tests.
   Precision Model = SPDP
            GPU_ID = -1
------------------------------------
cd trpcage/ && ./Run_md_trpcage -1 SPDP netcdf.mod
diffing trpcage_md.out.GPU_SPDP with trpcage_md.out
PASSED
==============================================================
cd myoglobin/ && ./Run_md_myoglobin -1 SPDP netcdf.mod
diffing myoglobin_md.out.GPU_SPDP with myoglobin_md.out
PASSED
==============================================================
cd chamber/dhfr/ && ./Run.dhfr_charmm.md -1 SPDP netcdf.mod
diffing mdout.dhfr_charmm_md.GPU_SPDP with mdout.dhfr_charmm_md
PASSED
==============================================================
cd chamber/dhfr/ && ./Run.dhfr_charmm.min -1 SPDP netcdf.mod
diffing mdout.dhfr_charmm_min.GPU_SPDP with mdout.dhfr_charmm_min
PASSED
==============================================================
cd chamber/dhfr_cmap/ && ./Run.dhfr_charmm.md -1 SPDP netcdf.mod
diffing mdout.dhfr_charmm_md.GPU_SPDP with mdout.dhfr_charmm_md
PASSED
==============================================================
cd chamber/dhfr_cmap/ && ./Run.dhfr_charmm.min -1 SPDP netcdf.mod
diffing mdout.dhfr_charmm_min.GPU_SPDP with mdout.dhfr_charmm_min
PASSED
==============================================================
cd nucleosome/ && ./Run_min.1 -1 SPDP netcdf.mod
cudaMalloc GpuBuffer::Allocate failed out of memory
   ./Run_min.1: Program error
make[1]: *** [test.pmemd.cuda.gb] Error 1
cd gb_ala3/ && ./Run.igb1_ntc1_min -1 SPDP netcdf.mod
diffing igb1_ntc1_min.out.GPU_SPDP with igb1_ntc1_min.out
PASSED
==============================================================
cd gb_ala3/ && ./Run.irest1_ntt0_igb1_ntc1 -1 SPDP netcdf.mod
diffing irest1_ntt0_igb1_ntc1.out.GPU_SPDP with irest1_ntt0_igb1_ntc1.out
PASSED
==============================================================
diffing irest1_ntt0_igb1_ntc1.rst.GPU_SPDP with irest1_ntt0_igb1_ntc1.rst
PASSED
==============================================================
diffing irest1_ntt0_igb1_ntc1.mdcrd.GPU_SPDP with
irest1_ntt0_igb1_ntc1.mdcrd
PASSED
==============================================================
cd gb_ala3/ && ./Run.irest0_ntt0_igb1_ntc1_hotstart -1 SPDP netcdf.mod
diffing irest0_ntt0_igb1_ntc1_hotstart.out.GPU_SPDP with
irest0_ntt0_igb1_ntc1_hotstart.out
PASSED
==============================================================
cd gb_ala3/ && ./Run.irest0_ntt0_igb1_ntc1_coldstart -1 SPDP netcdf.mod
diffing irest0_ntt0_igb1_ntc1_coldstart.out.GPU_SPDP with
irest0_ntt0_igb1_ntc1_coldstart.out
PASSED
==============================================================
cd gb_ala3/ && ./Run.irest1_ntt0_igb1_ntc2 -1 SPDP netcdf.mod
diffing irest1_ntt0_igb1_ntc2.out.GPU_SPDP with irest1_ntt0_igb1_ntc2.out
PASSED
==============================================================
diffing irest1_ntt0_igb1_ntc2.rst.GPU_SPDP with irest1_ntt0_igb1_ntc2.rst
PASSED
==============================================================
diffing irest1_ntt0_igb1_ntc2.mdcrd.GPU_SPDP with
irest1_ntt0_igb1_ntc2.mdcrd
PASSED
==============================================================
cd gb_ala3/ && ./Run.irest1_ntt0_igb2_ntc1 -1 SPDP netcdf.mod
diffing irest1_ntt0_igb2_ntc1.out.GPU_SPDP with irest1_ntt0_igb2_ntc1.out
PASSED
==============================================================
cd gb_ala3/ && ./Run.irest1_ntt0_igb2_ntc2 -1 SPDP netcdf.mod
diffing irest1_ntt0_igb2_ntc2.out.GPU_SPDP with irest1_ntt0_igb2_ntc2.out
PASSED
==============================================================
cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc1 -1 SPDP netcdf.mod
diffing irest1_ntt0_igb5_ntc1.out.GPU_SPDP with irest1_ntt0_igb5_ntc1.out
PASSED
==============================================================
cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc2 -1 SPDP netcdf.mod
diffing irest1_ntt0_igb5_ntc2.out.GPU_SPDP with irest1_ntt0_igb5_ntc2.out
PASSED
==============================================================
cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc2_ntr1 -1 SPDP netcdf.mod
diffing irest1_ntt0_igb5_ntc2_ntr1.out.GPU_SPDP with
irest1_ntt0_igb5_ntc2_ntr1.out
PASSED
==============================================================
cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc2_saltcon -1 SPDP netcdf.mod
diffing irest1_ntt0_igb5_ntc2_saltcon.out.GPU_SPDP with
irest1_ntt0_igb5_ntc2_saltcon.out
PASSED
==============================================================
cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc2_rgbmax -1 SPDP netcdf.mod
diffing irest1_ntt0_igb5_ntc2_rgbmax.out.GPU_SPDP with
irest1_ntt0_igb5_ntc2_rgbmax.out
PASSED
==============================================================
cd gb_ala3/ && ./Run.irest1_ntt0_igb5_ntc2_alpb -1 SPDP netcdf.mod
diffing irest1_ntt0_igb5_ntc2_alpb.out.GPU_SPDP with
irest1_ntt0_igb5_ntc2_alpb.out
PASSED
==============================================================
cd gb_ala3/ && ./Run.irest1_ntt1_igb1_ntc2 -1 SPDP netcdf.mod
diffing irest1_ntt1_igb1_ntc2.out.GPU_SPDP with irest1_ntt1_igb1_ntc2.out
PASSED
==============================================================
cd gb_ala3/ && ./Run.irest1_ntt2_igb1_ntc2 -1 SPDP netcdf.mod
diffing irest1_ntt2_igb1_ntc2.out.GPU_SPDP with irest1_ntt2_igb1_ntc2.out
PASSED
==============================================================
cd gb_ala3/ && ./Run.irest1_ntt3_igb1_ntc2 -1 SPDP netcdf.mod
diffing irest1_ntt3_igb1_ntc2.out.GPU_SPDP with irest1_ntt3_igb1_ntc2.out
PASSED
==============================================================
------------------------------------
Running CUDA Explicit solvent tests.
   Precision Model = SPDP
            GPU_ID = -1
------------------------------------
cd 4096wat/ && ./Run.pure_wat -1 SPDP netcdf.mod
diffing mdout.pure_wat.GPU_SPDP with mdout.pure_wat
PASSED
==============================================================
cd 4096wat/ && ./Run.vrand -1 SPDP netcdf.mod
diffing mdout.vrand.GPU_SPDP with mdout.vrand
PASSED
==============================================================
cd 4096wat_oct/ && ./Run.pure_wat_oct_NVE -1 SPDP netcdf.mod
diffing mdout.pure_wat_oct_nve.GPU_SPDP with mdout.pure_wat_oct_nve
PASSED
==============================================================
diffing mdcrd.pure_wat_oct_nve.GPU_SPDP with mdcrd.pure_wat_oct_nve
PASSED
==============================================================
cd 4096wat_oct/ && ./Run.pure_wat_oct_NVT_NTT1 -1 SPDP netcdf.mod
diffing mdout.pure_wat_oct_nvt_ntt1.GPU_SPDP with
mdout.pure_wat_oct_nvt_ntt1
PASSED
==============================================================
diffing mdcrd.pure_wat_oct_nvt_ntt1.GPU_SPDP with
mdcrd.pure_wat_oct_nvt_ntt1
PASSED
==============================================================
cd 4096wat_oct/ && ./Run.pure_wat_oct_NVT_NTT2 -1 SPDP netcdf.mod
diffing mdout.pure_wat_oct_nvt_ntt2.GPU_SPDP with
mdout.pure_wat_oct_nvt_ntt2
PASSED
==============================================================
diffing mdcrd.pure_wat_oct_nvt_ntt2.GPU_SPDP with
mdcrd.pure_wat_oct_nvt_ntt2
PASSED
==============================================================
cd 4096wat_oct/ && ./Run.pure_wat_oct_NVT_NTT3 -1 SPDP netcdf.mod
diffing mdout.pure_wat_oct_nvt_ntt3.GPU_SPDP with
mdout.pure_wat_oct_nvt_ntt3
PASSED
==============================================================
diffing mdcrd.pure_wat_oct_nvt_ntt3.GPU_SPDP with
mdcrd.pure_wat_oct_nvt_ntt3
PASSED
==============================================================
cd 4096wat_oct/ && ./Run.pure_wat_oct_NPT_NTT1 -1 SPDP netcdf.mod
diffing mdout.pure_wat_oct_npt_ntt1.GPU_SPDP with
mdout.pure_wat_oct_npt_ntt1
PASSED
==============================================================
diffing mdcrd.pure_wat_oct_npt_ntt1.GPU_SPDP with
mdcrd.pure_wat_oct_npt_ntt1
PASSED
==============================================================
cd jac/ && ./Run.jac -1 SPDP netcdf.mod
diffing jac.out.GPU_SPDP with jac.out
PASSED
==============================================================
cd dhfr/ && ./Run.dhfr -1 SPDP netcdf.mod
diffing mdout.dhfr.GPU_SPDP with mdout.dhfr
PASSED
==============================================================
cd dhfr/ && ./Run.dhfr.ntr1 -1 SPDP netcdf.mod
diffing mdout.dhfr.ntr1.GPU_SPDP with mdout.dhfr.ntr1
PASSED
==============================================================
cd dhfr/ && ./Run.dhfr.ntb2 -1 SPDP netcdf.mod
diffing mdout.dhfr.ntb2.GPU_SPDP with mdout.dhfr.ntb2
PASSED
==============================================================
cd dhfr/ && ./Run.dhfr.ntb2_ntt1 -1 SPDP netcdf.mod
diffing mdout.dhfr.ntb2_ntt1.GPU_SPDP with mdout.dhfr.ntb2_ntt1
PASSED
==============================================================
cd dhfr/ && ./Run.dhfr.ntb2_ntt1_ntr1 -1 SPDP netcdf.mod
diffing mdout.dhfr.ntb2_ntt1_ntr1.GPU_SPDP with mdout.dhfr.ntb2_ntt1_ntr1
PASSED
==============================================================
cd dhfr/ && ./Run.dhfr.ntb2_ntt3 -1 SPDP netcdf.mod
diffing mdout.dhfr.ntb2_ntt3.GPU_SPDP with mdout.dhfr.ntb2_ntt3
PASSED
==============================================================
cd dhfr/ && ./Run.dhfr.min -1 SPDP netcdf.mod
diffing mdout.dhfr.min.GPU_SPDP with mdout.dhfr.min
PASSED
==============================================================
cd dhfr/ && ./Run.dhfr.noshake -1 SPDP netcdf.mod
diffing mdout.dhfr.noshake.GPU_SPDP with mdout.dhfr.noshake
PASSED
==============================================================
cd chamber/dhfr_pbc/ && ./Run.dhfr_pbc_charmm_noshake.md -1 SPDP netcdf.mod
diffing mdout.dhfr_charmm_pbc_noshake_md.GPU_SPDP with
mdout.dhfr_charmm_pbc_noshake_md
PASSED
==============================================================
cd chamber/dhfr_pbc/ && ./Run.dhfr_pbc_charmm_noshake.min -1 SPDP
netcdf.mod
diffing mdout.dhfr_charmm_pbc_noshake_min.GPU_SPDP with
mdout.dhfr_charmm_pbc_noshake_min
PASSED
==============================================================
cd chamber/dhfr_cmap_pbc/ && ./Run.dhfr_cmap_pbc_charmm.md -1 SPDP
netcdf.mod
diffing mdout.dhfr_charmm_pbc_md.GPU_SPDP with mdout.dhfr_charmm_pbc_md
PASSED
==============================================================
cd chamber/dhfr_cmap_pbc/ && ./Run.dhfr_cmap_pbc_charmm.min -1 SPDP
netcdf.mod
diffing mdout.dhfr_charmm_pbc_min.GPU_SPDP with mdout.dhfr_charmm_pbc_min
PASSED
==============================================================
cd chamber/dhfr_cmap_pbc/ && ./Run.dhfr_cmap_pbc_charmm_noshake.md -1 SPDP
netcdf.mod
diffing mdout.dhfr_charmm_pbc_noshake_md.GPU_SPDP with
mdout.dhfr_charmm_pbc_noshake_md
PASSED
==============================================================
cd chamber/dhfr_cmap_pbc/ && ./Run.dhfr_cmap_pbc_charmm_noshake.min -1
SPDP netcdf.mod
diffing mdout.dhfr_charmm_pbc_noshake_min.GPU_SPDP with
mdout.dhfr_charmm_pbc_noshake_min
PASSED
==============================================================
make[1]: Target `test.pmemd.cuda' not remade because of errors.
make[1]: Leaving directory `/home/mmaly/_APPS/amber/test/cuda'
make: *** [test.pmemd.cuda] Error 2
make: Target `test.serial.cuda' not remade because of errors.
53 file comparisons passed
0 file comparisons failed
5 tests experienced errors
Test log file saved as logs/test_amber_cuda/2011-01-27_17-29-54.log
No test diffs to save!
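
The only error visible in the listing above is the cudaMalloc out-of-memory failure in the nucleosome test. As a rough sketch (Python, with a hypothetical helper name failed_commands and a made-up sample), a log like this can be scanned to pair each "Program error" line with the test command that preceded it. It assumes every test starts with a "cd ... && ./Run..." line; commands wrapped over two lines by the mail client are captured only up to the line break.

```python
def failed_commands(log_text):
    """Return (command, message) pairs for each 'Program error' line.

    Assumptions: a test command is a line starting with 'cd ' that contains
    '&&', and the line just before 'Program error' holds the error message.
    """
    failures = []
    current_cmd = None   # most recent 'cd ... && ./Run...' line seen
    last_line = None     # line immediately before the current one
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("cd ") and "&&" in stripped:
            current_cmd = stripped
        elif stripped.endswith(": Program error"):
            failures.append((current_cmd, last_line))
        last_line = stripped
    return failures


# Small excerpt in the same shape as the listing above (not the full log).
sample = """\
cd trpcage/ && ./Run_md_trpcage -1 SPDP netcdf.mod
diffing trpcage_md.out.GPU_SPDP with trpcage_md.out
PASSED
cd nucleosome/ && ./Run_min.1 -1 SPDP netcdf.mod
cudaMalloc GpuBuffer::Allocate failed out of memory
   ./Run_min.1: Program error
"""

for cmd, msg in failed_commands(sample):
    print(cmd)
    print("  ->", msg)
```

On the sample this reports the nucleosome command together with the cudaMalloc message; tests that passed are skipped.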








On Thu, 27 Jan 2011 17:58:06 +0100, Jason Swails <jason.swails.gmail.com>
wrote:

> Hello,
>
> As a note on GCC 4.5 -- it is now up to version 4.5.2 (stable). While you
> may be fine downgrading, you may also be fine upgrading as well (4.5.0 is a
> very early release of a new compiler).
>
> However -- did the tests pass with your current install?
>
> 2011/1/27 Marek Maly <marek.maly.ujep.cz>
>
>> Hi Peker,
>> thanks a lot!
>>
>> Since you have a slightly older gcc (gcc 4.4.3) than I do (gcc 4.5.0) but,
>> like me, CUDA 3.2, your result increased my suspicion that the problem
>> really lies in the gcc version (simply too new for serious work :(( ).
>> Anyway, it is clear that the first thing I have to try is to use an older
>> gcc, recompile everything and see what happens then ...
>>
>> Best wishes,
>>
>> Marek
>>
>>
>>
>>
>> On Thu, 27 Jan 2011 17:09:42 +0100, peker milas <pekermilas.gmail.com>
>> wrote:
>>
>> > Hi again Marek,
>> >
>> > It finished 85000 steps without giving me any NaNs. I wanted to let
>> > you know.
>> >
>> > best
>> > peker
>> >
>> > On Thu, Jan 27, 2011 at 10:42 AM, filip fratev <filipfratev.yahoo.com>
>> > wrote:
>> >> Hi Marek,
>> >> I performed 50 000 steps. You can find the outputs in the attached file.
>> >> Update: Because the output files are about 2.7 MB and the attachment
>> >> needs to be approved by a moderator, I am also sending you the results
>> >> privately (just in case).
>> >>
>> >> My system:
>> >> --------------------------------------
>> >> Linux 2.6.34-12-desktop x86_64
>> >> openSUSE 11.3 (x86_64)
>> >> GeForce GTX 470
>> >> NVIDIA 260.19.36
>> >> AMD Phenom(tm) II X6 1090T Processor
>> >> RAM: 7.8 GiB
>> >> --------------------------------------
>> >> CUDA 3.1, everything compiled by gcc (SUSE Linux) 4.3.4
>> >> [gcc-4_3-branch revision 152973] due to the described issues with CUDA
>> >> and newer versions.
>> >>
>> >> Regards,
>> >> Filip
>> >> _______________________________________________
>> >> AMBER mailing list
>> >> AMBER.ambermd.org
>> >> http://lists.ambermd.org/mailman/listinfo/amber


Received on Thu Jan 27 2011 - 10:00:04 PST