Re: [AMBER] Fwd: amber14 parallel build problems

From: Ada Sedova <ada.a.sedova.gmail.com>
Date: Wed, 18 Jan 2017 15:30:21 -0500

ok. maybe this was the problem. we'll see. trying now.

On Wed, Jan 18, 2017 at 3:23 PM, Hai Nguyen <nhai.qn.gmail.com> wrote:

> "make clean" only clean the file objects like ".o"
>
> "make distclean" try (its best) to restore the amber to original
> distribution.
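>
> For reference, a full rebuild from a restored tree might look something like
> this (a sketch; the exact configure flags depend on your compilers):
>
>   cd $AMBERHOME
>   make distclean        # restore the tree to (near) original state
>   ./configure gnu       # reconfigure for the serial build
>   make install          # serial build
>   ./configure -mpi gnu  # reconfigure for the parallel build
>   make install          # parallel build on top of the serial one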
>
> Hai
>
> On Wed, Jan 18, 2017 at 3:20 PM, Ada Sedova <ada.a.sedova.gmail.com> wrote:
>
> > Ok. I have not tried make clean before the mpi build, because I assumed
> > it would undo the things created by the serial install that are needed by
> > the parallel version. And there is no distclean.
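> >
> > (A quick way to check whether the target exists, as a sketch:
> >
> >   grep -n "distclean" $AMBERHOME/Makefile
> >
> > will show the rule if the top-level Makefile defines one.)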
> >
> > On Wed, Jan 18, 2017 at 3:09 PM, Hai Nguyen <nhai.qn.gmail.com> wrote:
> >
> > > Can you just send the log.* files from a fresh amber install first, as
> > > suggested in my previous email?
> > >
> > > To debug, do only one step at a time and free your mind from the others.
> > >
> > > Hai
> > >
> > > On Wed, Jan 18, 2017 at 3:05 PM, Ada Sedova <ada.a.sedova.gmail.com> wrote:
> > >
> > > > So to make things clear: I have a serial build that I set aside for
> > > > comparison, and then I ran ANOTHER serial build in another directory,
> > > > with a subsequent attempt at an mpi build on top of it, as directed. I
> > > > then tried the yacc call to cifparse in each directory (the other email
> > > > thread has the details). The serial version exits with no error; the
> > > > parallel version crashes. So something HAS changed about the yacc
> > > > binary during the parallel build (after the serial build). It is either
> > > > a rebuild, which shouldn't happen according to the Makefile for that
> > > > part, or the library paths have been changed wrt yacc during the
> > > > parallel build.
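> > > >
> > > > A quick way to compare the two binaries side by side (a sketch; the
> > > > directory names stand in for my two build trees):
> > > >
> > > >   ldd serial_build/amber14/bin/yacc   > /tmp/yacc_serial.txt
> > > >   ldd parallel_build/amber14/bin/yacc > /tmp/yacc_parallel.txt
> > > >   diff /tmp/yacc_serial.txt /tmp/yacc_parallel.txt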
> > > >
> > > > It's possible that, because we use the Lustre filesystem for the work
> > > > nodes, the libraries for mpi are set differently for the work node and
> > > > are incompatible with running yacc on the home node.
> > > >
> > > > On Wed, Jan 18, 2017 at 2:58 PM, Ada Sedova <ada.a.sedova.gmail.com> wrote:
> > > >
> > > > > Well, I guess that's because he did not reply to me with a cc to the
> > > > > mailing list. I simply hit reply, and the thread was, I thought,
> > > > > already going to the mailing list, as it was initiated as such.
> > > > >
> > > > > On Wed, Jan 18, 2017 at 2:54 PM, Hai Nguyen <nhai.qn.gmail.com> wrote:
> > > > >
> > > > >> hi
> > > > >>
> > > > >> side note: it's clear that you did not attach the two_par.out and
> > > > >> one_par.out files to the amber mailing list. You only sent them to
> > > > >> David.
> > > > >>
> > > > >> Hai
> > > > >>
> > > > >> On Wed, Jan 18, 2017 at 2:44 PM, Ada Sedova <ada.a.sedova.gmail.com> wrote:
> > > > >>
> > > > >> > This was sent a few days ago with no response.
> > > > >> >
> > > > >> > Today I was told this info was not given.
> > > > >> >
> > > > >> > Here it is again.
> > > > >> >
> > > > >> >
> > > > >> > AS
> > > > >> >
> > > > >> >
> > > > >> > ---------- Forwarded message ----------
> > > > >> > From: Ada Sedova <ada.a.sedova.gmail.com>
> > > > >> > Date: Tue, Jan 17, 2017 at 11:19 AM
> > > > >> > Subject: Re: [AMBER] amber14 parallel build problems
> > > > >> > To: david.case.rutgers.edu
> > > > >> >
> > > > >> >
> > > > >> > Yes, I would like to continue to debug this, as getting OLCF to
> > > > >> > update to amber16 may be difficult: it requires a purchase and thus
> > > > >> > a bunch of paperwork and bureaucratic steps.
> > > > >> >
> > > > >> > The output logs from stdout and stderr from the failed mpi build
> > > > >> > are attached.
> > > > >> >
> > > > >> > The complete output from mpicc -show was given above in this
> > > > >> > thread, but I will repeat it for convenience:
> > > > >> > -bash-4.1$ mpicc -show
> > > > >> >
> > > > >> > gcc -I/sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/include -pthread
> > > > >> > -L/usr/lib64 -Wl,-rpath -Wl,/usr/lib64 -Wl,-rpath
> > > > >> > -Wl,/sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/lib -Wl,--enable-new-dtags
> > > > >> > -L/sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/lib -lmpi
> > > > >> >
> > > > >> > serial build:
> > > > >> >
> > > > >> > -bash-4.1$ ldd ./yacc
> > > > >> >     linux-vdso.so.1 => (0x00007ffe30bfd000)
> > > > >> >     libc.so.6 => /lib64/libc.so.6 (0x00007f3b7805c000)
> > > > >> >     /lib64/ld-linux-x86-64.so.2 (0x00007f3b7841b000)
> > > > >> >
> > > > >> > parallel build:
> > > > >> >
> > > > >> > -bash-4.1$ ldd ./yacc
> > > > >> >     linux-vdso.so.1 => (0x00007ffe86cf6000)
> > > > >> >     libmpi.so.1 => /sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/lib/libmpi.so.1 (0x00007f29e89ac000)
> > > > >> >     libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f29e8764000)
> > > > >> >     libc.so.6 => /lib64/libc.so.6 (0x00007f29e83d0000)
> > > > >> >     librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x00007f29e81bb000)
> > > > >> >     libosmcomp.so.3 => /usr/lib64/libosmcomp.so.3 (0x00007f29e7fad000)
> > > > >> >     libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00007f29e7d9b000)
> > > > >> >     libpsm_infinipath.so.1 => /usr/lib64/libpsm_infinipath.so.1 (0x00007f29e7b47000)
> > > > >> >     libopen-rte.so.7 => /sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/lib/libopen-rte.so.7 (0x00007f29e7856000)
> > > > >> >     libtorque.so.2 => /usr/lib64/libtorque.so.2 (0x00007f29e6f65000)
> > > > >> >     libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x00007f29e6c12000)
> > > > >> >     libz.so.1 => /lib64/libz.so.1 (0x00007f29e69fb000)
> > > > >> >     libcrypto.so.10 => /usr/lib64/libcrypto.so.10 (0x00007f29e6617000)
> > > > >> >     libssl.so.10 => /usr/lib64/libssl.so.10 (0x00007f29e63aa000)
> > > > >> >     libopen-pal.so.6 => /sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/lib/libopen-pal.so.6 (0x00007f29e60b9000)
> > > > >> >     libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007f29e5eae000)
> > > > >> >     libdl.so.2 => /lib64/libdl.so.2 (0x00007f29e5caa000)
> > > > >> >     librt.so.1 => /lib64/librt.so.1 (0x00007f29e5aa1000)
> > > > >> >     libm.so.6 => /lib64/libm.so.6 (0x00007f29e581d000)
> > > > >> >     libutil.so.1 => /lib64/libutil.so.1 (0x00007f29e561a000)
> > > > >> >     /lib64/ld-linux-x86-64.so.2 (0x00007f29e8ec1000)
> > > > >> >     libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x00007f29e5412000)
> > > > >> >     libgcc_s.so.1 => /ccs/compilers/gcc/rhel6-x86_64/4.8.2/lib64/libgcc_s.so.1 (0x00007f29e51fc000)
> > > > >> >     libnl.so.1 => /lib64/libnl.so.1 (0x00007f29e4faa000)
> > > > >> >     libinfinipath.so.4 => /usr/lib64/libinfinipath.so.4 (0x00007f29e4d9b000)
> > > > >> >     libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f29e4b97000)
> > > > >> >     libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f29e4891000)
> > > > >> >     libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f29e464c000)
> > > > >> >     libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f29e4365000)
> > > > >> >     libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f29e4161000)
> > > > >> >     libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f29e3f34000)
> > > > >> >     libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f29e3d29000)
> > > > >> >     libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f29e3b25000)
> > > > >> >     libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f29e390b000)
> > > > >> >     libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f29e36ec000)
> > > > >> >
> > > > >> > Thanks for your continued assistance.
> > > > >> >
> > > > >> >
> > > > >> > Ada
> > > > >> >
> > > > >> > On Tue, Jan 17, 2017 at 9:10 AM, David Case <david.case.rutgers.edu> wrote:
> > > > >> >
> > > > >> > > On Sun, Jan 15, 2017, Ada Sedova wrote:
> > > > >> > >
> > > > >> > > > Here are the results of the checks you suggested:
> > > > >> > > >
> > > > >> > > > 1) mpicc -show shows the correct gcc (4.8.2) that I used for
> > > > >> > > > the serial compile
> > > > >> > >
> > > > >> > > I'm not sure if you still want to try to debug this problem or
> > > > >> > > not. If you do, please save the complete log of the "make
> > > > >> > > install" run that fails, and send it as an attachment. So far, I
> > > > >> > > think we have only seen snippets of the end of the error
> > > > >> > > messages.
> > > > >> > >
> > > > >> > > Also: run the serial install, cd to $AMBERHOME/bin, and report
> > > > >> > > the output from "ldd ./yacc". Do the same after the (failed)
> > > > >> > > parallel install. Are there any differences in what libraries
> > > > >> > > are loaded?
> > > > >> > >
> > > > >> > > Finally, please provide the full output from "mpicc -show", and
> > > > >> > > let us know which MPI version you are using and how you
> > > > >> > > installed it.
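> > > > >> > >
> > > > >> > > A quick way to gather all of that in one go (a sketch; "mpirun
> > > > >> > > --version" assumes Open MPI or a compatible wrapper):
> > > > >> > >
> > > > >> > >   mpicc -show        # underlying compiler and link flags
> > > > >> > >   mpirun --version   # reports the MPI version
> > > > >> > >   which mpicc        # confirms which MPI install is on PATH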
> > > > >> > >
> > > > >> > > As best I can understand things, when you compile byacc with
> > > > >> > > gcc, the resulting executable works, but when you later compile
> > > > >> > > it with mpicc, the resulting executable tries to load a library
> > > > >> > > that is not in your LD_LIBRARY_PATH. If this is correct, it is
> > > > >> > > something we should be able to track down.
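> > > > >> > >
> > > > >> > > One way to test that hypothesis (a sketch, using standard glibc
> > > > >> > > loader tools):
> > > > >> > >
> > > > >> > >   ldd ./yacc | grep "not found"   # any library the loader cannot resolve
> > > > >> > >   LD_DEBUG=libs ./yacc            # trace the loader's search at run time
> > > > >> > >   echo $LD_LIBRARY_PATH           # compare against the paths above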
> > > > >> > >
> > > > >> > > [An alternative workaround is to comment out the line in the
> > > > >> > > Makefile that (re-)makes yacc during the parallel compilation
> > > > >> > > step. But if your mpicc is failing with yacc, it seems likely to
> > > > >> > > fail at some other step as well.]
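> > > > >> > >
> > > > >> > > An equivalent stopgap, if editing the Makefile is awkward (a
> > > > >> > > sketch, not an official procedure):
> > > > >> > >
> > > > >> > >   cp $AMBERHOME/bin/yacc /tmp/yacc.serial   # save the working gcc-built yacc
> > > > >> > >   # ... run the parallel build; if it overwrites yacc:
> > > > >> > >   cp /tmp/yacc.serial $AMBERHOME/bin/yacc   # restore it and re-run make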
> > > > >> > >
> > > > >> > > Trying the whole procedure again with AmberTools16 is probably a
> > > > >> > > good sanity check (removes some possible sources of problems).
> > > > >> > >
> > > > >> > > Sorry for all the problems you are seeing: I don't recall having
> > > > >> > > ever seen this particular problem before.
> > > > >> > >
> > > > >> > > ...dac
> > > > >> > >
> > > > >> >
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Jan 18 2017 - 13:00:02 PST