Re: [AMBER] Fwd: amber14 parallel build problems

From: Hai Nguyen <nhai.qn.gmail.com>
Date: Wed, 18 Jan 2017 15:09:32 -0500

Can you first just send the log.* files from a fresh Amber install, as
suggested in my previous email?

To debug, only do one step at a time and free your mind from others.

Hai
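
One possible way to compare the two "ldd ./yacc" listings quoted below,
without reading them side by side, is a small shell helper along these lines.
This is only a sketch. The function names and the serial.txt / parallel.txt
file names are illustrative placeholders, not part of the Amber build system,
and it assumes the ldd output from each build tree has first been saved to a
file.

```shell
#!/bin/sh
# Sketch only: function and file names are hypothetical, not from Amber.

# Print the sorted, unique library names from `ldd` output read on stdin.
# Load addresses (the hex values in parentheses) change from run to run,
# so only the first field of each non-empty line is kept.
ldd_libs() {
    awk 'NF { print $1 }' | sort -u
}

# Print the libraries that appear only in the second saved ldd listing.
# $1 and $2 are files holding saved `ldd ./yacc` output.
ldd_only_in_second() {
    a=$(mktemp) && b=$(mktemp) || return 1
    ldd_libs < "$1" > "$a"
    ldd_libs < "$2" > "$b"
    comm -13 "$a" "$b"
    rm -f "$a" "$b"
}

# Print any lines where the dynamic loader could not resolve a library.
# Reads `ldd` output on stdin; prints nothing if everything resolved.
ldd_missing() {
    grep 'not found' || true
}
```

For example, after running "ldd ./yacc > serial.txt" in the serial tree and
"ldd ./yacc > parallel.txt" in the parallel tree, calling
"ldd_only_in_second serial.txt parallel.txt" lists what the mpicc link added,
and "ldd ./yacc | ldd_missing" shows whether anything fails to resolve on the
node where it is run.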

On Wed, Jan 18, 2017 at 3:05 PM, Ada Sedova <ada.a.sedova.gmail.com> wrote:

> So to make things clear, I have a serial build that I set aside for
> comparison, and then I ran ANOTHER serial build in another directory with a
> subsequent attempt at mpi build on top of it, as directed. I then tried
> calling the yacc call to cifparse in each directory. If you look in the
> other email thread there are details. The serial version exits with no
> error. The parallel version crashes. So something HAS changed about the
> yacc binary during the parallel build (after the serial build). It is
> either a rebuild, which shouldn't happen according to the Makefile for that
> part, or the library paths have been changed wrt yacc during the parallel
> build.
>
> It's possible that, because we use the Lustre filesystem for work nodes,
> the libraries for MPI are set differently for the work node and are
> incompatible with running yacc on the home node.
>
> On Wed, Jan 18, 2017 at 2:58 PM, Ada Sedova <ada.a.sedova.gmail.com>
> wrote:
>
> > Well, I guess that's because he did not reply to me with a cc to the
> > mailing list. I simply hit reply, and the thread was already, I thought,
> > to the mailing list, as it was initiated as such.
> >
> > On Wed, Jan 18, 2017 at 2:54 PM, Hai Nguyen <nhai.qn.gmail.com> wrote:
> >
> >> hi
> >>
> >> side note: It's clear that you did not attach the two_par.out and
> >> one_par.out files to the amber mailing list.
> >> You only sent them to David.
> >>
> >> Hai
> >>
> >> On Wed, Jan 18, 2017 at 2:44 PM, Ada Sedova <ada.a.sedova.gmail.com>
> >> wrote:
> >>
> >> > This was sent a few days ago with no response.
> >> >
> >> > Today I was told this info was not given.
> >> >
> >> > Here it is again.
> >> >
> >> >
> >> > AS
> >> >
> >> >
> >> > ---------- Forwarded message ----------
> >> > From: Ada Sedova <ada.a.sedova.gmail.com>
> >> > Date: Tue, Jan 17, 2017 at 11:19 AM
> >> > Subject: Re: [AMBER] amber14 parallel build problems
> >> > To: david.case.rutgers.edu
> >> >
> >> >
> >> > Yes, I would like to continue to debug this, as getting OLCF to update
> >> > to amber16 may be difficult, as it requires a purchase and thus a bunch
> >> > of paperwork and bureaucratic steps.
> >> >
> >> > The output logs from stdout and stderr from the failed mpi build are
> >> > attached.
> >> >
> >> > The complete output from mpicc -show was given above in this thread,
> >> > but I will repeat for convenience:
> >> >
> >> > -bash-4.1$ mpicc -show
> >> > >
> >> > > gcc -I/sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/include -pthread
> >> > > -L/usr/lib64
> >> > > -Wl,-rpath -Wl,/usr/lib64 -Wl,-rpath
> >> > > -Wl,/sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/lib -Wl,--enable-new-dtags
> >> > > -L/sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/lib -lmpi
> >> > >
> >> >
> >> > serial build:
> >> >
> >> > -bash-4.1$ ldd ./yacc
> >> >
> >> > linux-vdso.so.1 => (0x00007ffe30bfd000)
> >> >
> >> > libc.so.6 => /lib64/libc.so.6 (0x00007f3b7805c000)
> >> >
> >> > /lib64/ld-linux-x86-64.so.2 (0x00007f3b7841b000)
> >> >
> >> >
> >> > parallel build:
> >> >
> >> > -bash-4.1$ ldd ./yacc
> >> >
> >> > linux-vdso.so.1 => (0x00007ffe86cf6000)
> >> >
> >> > libmpi.so.1 => /sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/lib/libmpi.so.1 (0x00007f29e89ac000)
> >> >
> >> > libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f29e8764000)
> >> >
> >> > libc.so.6 => /lib64/libc.so.6 (0x00007f29e83d0000)
> >> >
> >> > librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x00007f29e81bb000)
> >> >
> >> > libosmcomp.so.3 => /usr/lib64/libosmcomp.so.3 (0x00007f29e7fad000)
> >> >
> >> > libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00007f29e7d9b000)
> >> >
> >> > libpsm_infinipath.so.1 => /usr/lib64/libpsm_infinipath.so.1
> >> > (0x00007f29e7b47000)
> >> >
> >> > libopen-rte.so.7 => /sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/lib/libopen-rte.so.7 (0x00007f29e7856000)
> >> >
> >> > libtorque.so.2 => /usr/lib64/libtorque.so.2 (0x00007f29e6f65000)
> >> >
> >> > libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x00007f29e6c12000)
> >> >
> >> > libz.so.1 => /lib64/libz.so.1 (0x00007f29e69fb000)
> >> >
> >> > libcrypto.so.10 => /usr/lib64/libcrypto.so.10 (0x00007f29e6617000)
> >> >
> >> > libssl.so.10 => /usr/lib64/libssl.so.10 (0x00007f29e63aa000)
> >> >
> >> > libopen-pal.so.6 => /sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/lib/libopen-pal.so.6 (0x00007f29e60b9000)
> >> >
> >> > libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007f29e5eae000)
> >> >
> >> > libdl.so.2 => /lib64/libdl.so.2 (0x00007f29e5caa000)
> >> >
> >> > librt.so.1 => /lib64/librt.so.1 (0x00007f29e5aa1000)
> >> >
> >> > libm.so.6 => /lib64/libm.so.6 (0x00007f29e581d000)
> >> >
> >> > libutil.so.1 => /lib64/libutil.so.1 (0x00007f29e561a000)
> >> >
> >> > /lib64/ld-linux-x86-64.so.2 (0x00007f29e8ec1000)
> >> >
> >> > libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x00007f29e5412000)
> >> >
> >> > libgcc_s.so.1 => /ccs/compilers/gcc/rhel6-x86_64/4.8.2/lib64/libgcc_s.so.1 (0x00007f29e51fc000)
> >> >
> >> > libnl.so.1 => /lib64/libnl.so.1 (0x00007f29e4faa000)
> >> >
> >> > libinfinipath.so.4 => /usr/lib64/libinfinipath.so.4 (0x00007f29e4d9b000)
> >> >
> >> > libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f29e4b97000)
> >> >
> >> > libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f29e4891000)
> >> >
> >> > libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f29e464c000)
> >> >
> >> > libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f29e4365000)
> >> >
> >> > libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f29e4161000)
> >> >
> >> > libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f29e3f34000)
> >> >
> >> > libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f29e3d29000)
> >> >
> >> > libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f29e3b25000)
> >> >
> >> > libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f29e390b000)
> >> >
> >> > libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f29e36ec000)
> >> >
> >> >
> >> > Thanks for your continued assistance.
> >> >
> >> >
> >> > Ada
> >> >
> >> > On Tue, Jan 17, 2017 at 9:10 AM, David Case <david.case.rutgers.edu>
> >> > wrote:
> >> >
> >> > > On Sun, Jan 15, 2017, Ada Sedova wrote:
> >> > >
> >> > > > Here are the results of the checks you suggested:
> >> > > >
> >> > > > 1) mpicc -show shows the correct gcc (4.8.2) that I used for serial
> >> > > > compile
> >> > >
> >> > > I'm not sure if you still want to try to debug this problem or not.
> >> > > If you do, please save the complete log of the "make install" run that
> >> > > fails, and send it as an attachment. So far, I think we have only seen
> >> > > snippets of the end of the error messages.
> >> > >
> >> > > Also: run the serial install, cd $AMBERHOME/bin and report the output
> >> > > from "ldd ./yacc". Do the same after the (failed) parallel install.
> >> > > Are there any differences in what libraries are loaded?
> >> > >
> >> > > Finally, please provide the full output from "mpicc -show", and let us
> >> > > know which MPI version you are using, and how you installed it.
> >> > >
> >> > > As best I can understand things, when you compile byacc with gcc, the
> >> > > resulting executable works, but when you later compile it with mpicc,
> >> > > the resulting executable tries to load a library that is not in your
> >> > > LD_LIBRARY_PATH. If this is correct, it is something we should be able
> >> > > to track down.
> >> > >
> >> > > [An alternative workaround is to comment out the line in the Makefile
> >> > > that (re-)makes yacc during the parallel compilation step. But if your
> >> > > mpicc is failing with yacc, it seems likely to fail at some other step
> >> > > as well.]
> >> > >
> >> > > Trying the whole procedure again with AmberTools16 is probably a good
> >> > > sanity check (removes some possible sources of problems).
> >> > >
> >> > > Sorry for all the problems you are seeing: I don't recall having ever
> >> > > seen this particular problem before.
> >> > >
> >> > > ...dac
> >> > >
> >> >
> >> > _______________________________________________
> >> > AMBER mailing list
> >> > AMBER.ambermd.org
> >> > http://lists.ambermd.org/mailman/listinfo/amber
> >> >
> >> >
Received on Wed Jan 18 2017 - 12:30:04 PST