Re: [AMBER] cuda test failing after installation

From: Ravi Abrol <raviabrol.gmail.com>
Date: Tue, 7 May 2019 12:43:28 -0700

Hi David,

During my testing, the specific file you asked for was lost to a
subsequent "make clean"; however, I am attaching the
test/cuda/myoglobin/myoglobin_md.out.dif file for your reference, which
shows the same type of error for the cuda_parallel tests.

Thanks,
Ravi



On Tue, May 7, 2019 at 11:03 AM David Cerutti <dscerutti.gmail.com> wrote:

> I think to diagnose this I would need to see the actual outputs of the
> test cases on those GTX-1080s. I don't have such a card (I do have a
> 1080Ti), but if you go into ${AMBERHOME}/test/cuda/amd/dhfr_pme/, for
> example, and show us the mdout.pme.amd2.dif file, that would be helpful.
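>
> If it helps, here is a minimal sketch for gathering every .dif file the
> test suite left behind (assuming a standard install layout; adjust
> ${AMBERHOME} to match yours):
>
>   cd ${AMBERHOME}/test/cuda
>   find . -name "*.dif" -print
>   cat amd/dhfr_pme/mdout.pme.amd2.dif
>
> The .dif files hold only the lines that differ from the saved test
> outputs, so they should be small enough to post to the list.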
>
> Dave (Cerutti)
>
>
> On Tue, May 7, 2019 at 1:27 PM Ravi Abrol <raviabrol.gmail.com> wrote:
>
> > Dear Dave,
> > Sorry it took a while to test this. Thanks for your suggestion to upgrade to
> > Amber18, which resolved these errors on 2 out of 3 workstations.
> >
> > All three workstations have the same OS (Pop!_OS), gcc, mpich, CUDA-9.2, etc.
> >
> > The workstations where this issue is resolved have either a GTX 970 or
> > two RTX 2080s.
> > The workstation on which the issue persists has two GTX 1080s.
> >
> > On this third workstation, other tests work fine (0 tests with errors),
> > but the test_amber_cuda_parallel tests all fail with messages like:
> > ******
> > cd trpcage/ && ./Run_md_trpcage DPFP
> > /usr/local/amber18/include/netcdf.mod
> > Note: The following floating-point exceptions are signalling:
> > IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
> > diffing trpcage_md.out.GPU_DPFP with trpcage_md.out
> > possible FAILURE: check trpcage_md.out.dif
> > ******
> > Here are the example cases with the biggest maximum absolute/relative errors:
> >
> > possible FAILURE: check nucleosome_md1_ntt1.out.dif
> > ### Maximum absolute error in matching lines = 1.35e+05 at line 251 field 4
> > possible FAILURE: check nucleosome_md2_ntt0.out.dif
> > ### Maximum absolute error in matching lines = 1.32e+05 at line 248 field 4
> > possible FAILURE: check mdout.gb.gamd2.dif
> > ### Maximum absolute error in matching lines = 3.61e+06 at line 293 field 3
> > ### Maximum relative error in matching lines = 8.75e+06 at line 309 field 3
> > possible FAILURE: check FactorIX_NVE.out.dif
> > ### Maximum absolute error in matching lines = 1.10e+06 at line 195 field 3
> > possible FAILURE: check mdout.dhfr.noshake.dif
> > ### Maximum absolute error in matching lines = 1.30e+05 at line 123 field 3
> > possible FAILURE: check mdout.dhfr_charmm_pbc_noshake_md.dif
> > ### Maximum absolute error in matching lines = 4.94e+05 at line 169 field 3
> > possible FAILURE: check mdout.dhfr_charmm_pbc_noshake_md.dif
> > ### Maximum absolute error in matching lines = 3.34e+05 at line 148 field 3
> > possible FAILURE: check mdout.ips.dif
> > ### Maximum absolute error in matching lines = 1.08e+05 at line 223 field 3
> > ### Maximum relative error in matching lines = 5.93e+04 at line 255 field 3
> > possible FAILURE: check mdout.pme.amd2.dif
> > ### Maximum absolute error in matching lines = 1.64e+06 at line 225 field 3
> > possible FAILURE: check mdout.dif
> > ### Maximum absolute error in matching lines = 8.00e+07 at line 257 field 4
> > possible FAILURE: check mdout.dif
> > ### Maximum absolute error in matching lines = 8.00e+07 at line 260 field 4
> > possible FAILURE: check mdout.dif
> > ### Maximum absolute error in matching lines = 8.00e+07 at line 258 field 4
> > possible FAILURE: check mdout.dif
> > ### Maximum absolute error in matching lines = 8.81e+08 at line 233 field 3
> > ### Maximum relative error in matching lines = 1.42e+04 at line 233 field 3
> > possible FAILURE: check mdout.dif
> > ### Maximum absolute error in matching lines = 3.45e+07 at line 209 field 3
> > possible FAILURE: check mdout.cellulose_nvt.dif
> > ### Maximum absolute error in matching lines = 4.59e+06 at line 193 field 3
> > ### Maximum relative error in matching lines = 1.70e+05 at line 207 field 3
> > possible FAILURE: check mdout.cellulose_npt.dif
> > ### Maximum absolute error in matching lines = 4.59e+06 at line 234 field 3
> > ### Maximum relative error in matching lines = 1.12e+05 at line 252 field 3
> >
> > How do I diagnose this problem?
> >
> > Thanks,
> > Ravi
> >
> >
> > On Sun, Mar 24, 2019 at 10:35 PM Ravi Abrol <raviabrol.gmail.com> wrote:
> >
> > > Thanks Dave for your reply.
> > >
> > > We have GTX 1080 with 6GB memory.
> > >
> > > The default mode for GPU testing was originally DPFP, which flagged
> > > even more tests with large errors.
> > > The runs I mentioned in my email below were done with SPFP. Hope that
> > > this helps.
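> > >
> > > For reference, the individual test scripts take the precision model as
> > > an argument, so a single case can be rerun by hand in either mode. A
> > > minimal sketch based on the trpcage example above (assuming the usual
> > > test directory layout):
> > >
> > >   cd ${AMBERHOME}/test/cuda/trpcage
> > >   ./Run_md_trpcage SPFP
> > >   ./Run_md_trpcage DPFP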
> > >
> > > Ravi
> > >
> > > On Sun, Mar 24, 2019 at 5:35 AM David Case <david.case.rutgers.edu> wrote:
> > >
> > >> On Wed, Mar 20, 2019, Ravi Abrol wrote:
> > >> >
> > >> >I installed amber16 on a new linux machine (running pop_os) and
> > >> >during the cuda testing (for both pmemd.cuda and pmemd.cuda.MPI),
> > >> >one of the tests failed:
> > >> >
> > >> >$AMBERHOME/test/cuda/large_solute_count/mdout.ntb2_ntt1.dif shows:
> > >> >### Maximum absolute error in matching lines = 7.44e+08 at line 112 field 3
> > >> >### Maximum relative error in matching lines = 1.38e+07 at line 112 field 3
> > >> >
> > >> >How do I diagnose this error?
> > >>
> > >> Sorry for the slow reply. What model of GPU are you using? How much
> > >> memory does it have? It's possible that you are overflowing memory
> > >> in a way that is not caught.
> > >>
> > >> Also, which tests are you running? SPFP or DPFP?
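> > >>
> > >> If you are not sure of the card or its memory, nvidia-smi will report
> > >> both; for example, a generic query (nothing Amber-specific):
> > >>
> > >>   nvidia-smi --query-gpu=name,memory.total --format=csv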
> > >>
> > >> Problems like this can indeed be hard to track down. I'm hoping that
> > >> this post will trigger memories of other users/developers, in case
> > >> they might have seen similar test failures.
> > >>
> > >> ....dac
> > >>


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

Received on Tue May 07 2019 - 13:00:03 PDT