Re: [AMBER] Fwd: amber14 parallel build problems

From: Ada Sedova <ada.a.sedova.gmail.com>
Date: Wed, 18 Jan 2017 15:20:31 -0500

OK. I have not tried make clean before the mpi build, because I assumed that
would undo the things created by the serial install that are needed by the
parallel version. And there is no distclean.
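
For anyone repeating the "ldd ./yacc" comparison quoted further down, here is
a minimal sketch of how the two listings could be diffed once each has been
saved to a file. The function names ldd_libs and ldd_only_in_second are
hypothetical helpers made up for illustration; they are not part of Amber or
of Open MPI. The sketch assumes awk, sort, comm and mktemp are available.

```shell
#!/bin/sh
# Sketch only: ldd_libs and ldd_only_in_second are hypothetical helper
# names, not part of Amber or of any MPI distribution.

# Print the sorted library names (the field before "=>") found in a file
# holding saved "ldd" output. Lines without "=>" (the dynamic loader
# itself) are skipped.
ldd_libs() {
    awk '$2 == "=>" { print $1 }' "$1" | LC_ALL=C sort -u
}

# Print the libraries that appear in the second ldd listing but not in
# the first, one per line.
ldd_only_in_second() {
    first=$(mktemp) || return 1
    second=$(mktemp) || { rm -f "$first"; return 1; }
    ldd_libs "$1" > "$first"
    ldd_libs "$2" > "$second"
    # comm -13 keeps only the lines unique to the second (sorted) file.
    LC_ALL=C comm -13 "$first" "$second"
    rm -f "$first" "$second"
}
```

With the two listings saved under placeholder names such as serial.ldd and
parallel.ldd, "ldd_only_in_second serial.ldd parallel.ldd" prints the
libraries that only the second binary is linked against.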

On Wed, Jan 18, 2017 at 3:09 PM, Hai Nguyen <nhai.qn.gmail.com> wrote:

> Can you just send the log.* files, as I suggested in my previous message
> about first installing a fresh amber?
>
> To debug, only do one step at a time and free your mind from others.
>
> Hai
>
> On Wed, Jan 18, 2017 at 3:05 PM, Ada Sedova <ada.a.sedova.gmail.com>
> wrote:
>
> > So to make things clear, I have a serial build that I set aside for
> > comparison, and then I ran ANOTHER serial build in another directory
> > with a subsequent attempt at mpi build on top of it, as directed. I then
> > tried calling the yacc call to cifparse in each directory. If you look in
> > the other email thread there are details. The serial version exits with
> > no error. The parallel version crashes. So something HAS changed about
> > the yacc binary during the parallel build (after the serial build). It is
> > either a rebuild, which shouldn't happen according to the Makefile for
> > that part, or the library paths have been changed wrt yacc during the
> > parallel build.
> >
> > It's possible that, because we use the Lustre filesystem for work nodes,
> > the libraries for mpi are set differently for the work node, and that
> > this is incompatible with running yacc on the home node.
> >
> > On Wed, Jan 18, 2017 at 2:58 PM, Ada Sedova <ada.a.sedova.gmail.com>
> > wrote:
> >
> > > > Well, I guess that's because he did not reply to me with a cc to the
> > > > mailing list. I simply hit reply, and the thread was already, I
> > > > thought, to the mailing list, as it was initiated as such.
> > >
> > > On Wed, Jan 18, 2017 at 2:54 PM, Hai Nguyen <nhai.qn.gmail.com> wrote:
> > >
> > >> hi
> > >>
> > >> Side note: clearly you did not attach the two_par.out and one_par.out
> > >> files to the amber mailing list. You only sent them to David.
> > >>
> > >> Hai
> > >>
> > >> On Wed, Jan 18, 2017 at 2:44 PM, Ada Sedova <ada.a.sedova.gmail.com>
> > >> wrote:
> > >>
> > >> > This was sent a few days ago with no response.
> > >> >
> > >> > Today I was told this info was not given.
> > >> >
> > >> > Here it is again.
> > >> >
> > >> >
> > >> > AS
> > >> >
> > >> >
> > >> > ---------- Forwarded message ----------
> > >> > From: Ada Sedova <ada.a.sedova.gmail.com>
> > >> > Date: Tue, Jan 17, 2017 at 11:19 AM
> > >> > Subject: Re: [AMBER] amber14 parallel build problems
> > >> > To: david.case.rutgers.edu
> > >> >
> > >> >
> > >> > Yes, I would like to continue to debug this, as getting OLCF to
> > >> > update to amber16 may be difficult, as it requires a purchase and
> > >> > thus a bunch of paperwork and bureaucratic steps.
> > >> >
> > >> > The output logs from stdout and stderr from the failed mpi build are
> > >> > attached.
> > >> >
> > >> > The complete output from mpicc -show was given above in this thread,
> > >> > but I will repeat for convenience:
> > >> >
> > >> > -bash-4.1$ mpicc -show
> > >> > >
> > >> > > gcc -I/sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/include -pthread
> > >> > > -L/usr/lib64 -Wl,-rpath -Wl,/usr/lib64 -Wl,-rpath
> > >> > > -Wl,/sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/lib -Wl,--enable-new-dtags
> > >> > > -L/sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/lib -lmpi
> > >> > >
> > >> >
> > >> > serial build:
> > >> >
> > >> > -bash-4.1$ ldd ./yacc
> > >> > linux-vdso.so.1 => (0x00007ffe30bfd000)
> > >> > libc.so.6 => /lib64/libc.so.6 (0x00007f3b7805c000)
> > >> > /lib64/ld-linux-x86-64.so.2 (0x00007f3b7841b000)
> > >> >
> > >> >
> > >> > parallel build:
> > >> >
> > >> > -bash-4.1$ ldd ./yacc
> > >> > linux-vdso.so.1 => (0x00007ffe86cf6000)
> > >> > libmpi.so.1 => /sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/lib/libmpi.so.1 (0x00007f29e89ac000)
> > >> > libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f29e8764000)
> > >> > libc.so.6 => /lib64/libc.so.6 (0x00007f29e83d0000)
> > >> > librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x00007f29e81bb000)
> > >> > libosmcomp.so.3 => /usr/lib64/libosmcomp.so.3 (0x00007f29e7fad000)
> > >> > libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00007f29e7d9b000)
> > >> > libpsm_infinipath.so.1 => /usr/lib64/libpsm_infinipath.so.1 (0x00007f29e7b47000)
> > >> > libopen-rte.so.7 => /sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/lib/libopen-rte.so.7 (0x00007f29e7856000)
> > >> > libtorque.so.2 => /usr/lib64/libtorque.so.2 (0x00007f29e6f65000)
> > >> > libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x00007f29e6c12000)
> > >> > libz.so.1 => /lib64/libz.so.1 (0x00007f29e69fb000)
> > >> > libcrypto.so.10 => /usr/lib64/libcrypto.so.10 (0x00007f29e6617000)
> > >> > libssl.so.10 => /usr/lib64/libssl.so.10 (0x00007f29e63aa000)
> > >> > libopen-pal.so.6 => /sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/lib/libopen-pal.so.6 (0x00007f29e60b9000)
> > >> > libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007f29e5eae000)
> > >> > libdl.so.2 => /lib64/libdl.so.2 (0x00007f29e5caa000)
> > >> > librt.so.1 => /lib64/librt.so.1 (0x00007f29e5aa1000)
> > >> > libm.so.6 => /lib64/libm.so.6 (0x00007f29e581d000)
> > >> > libutil.so.1 => /lib64/libutil.so.1 (0x00007f29e561a000)
> > >> > /lib64/ld-linux-x86-64.so.2 (0x00007f29e8ec1000)
> > >> > libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x00007f29e5412000)
> > >> > libgcc_s.so.1 => /ccs/compilers/gcc/rhel6-x86_64/4.8.2/lib64/libgcc_s.so.1 (0x00007f29e51fc000)
> > >> > libnl.so.1 => /lib64/libnl.so.1 (0x00007f29e4faa000)
> > >> > libinfinipath.so.4 => /usr/lib64/libinfinipath.so.4 (0x00007f29e4d9b000)
> > >> > libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f29e4b97000)
> > >> > libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f29e4891000)
> > >> > libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f29e464c000)
> > >> > libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f29e4365000)
> > >> > libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f29e4161000)
> > >> > libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f29e3f34000)
> > >> > libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f29e3d29000)
> > >> > libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f29e3b25000)
> > >> > libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f29e390b000)
> > >> > libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f29e36ec000)
> > >> >
> > >> > Thanks for your continued assistance.
> > >> >
> > >> >
> > >> > Ada
> > >> >
> > >> > On Tue, Jan 17, 2017 at 9:10 AM, David Case <david.case.rutgers.edu>
> > >> > wrote:
> > >> >
> > >> > > On Sun, Jan 15, 2017, Ada Sedova wrote:
> > >> > >
> > >> > > > Here are the results of the checks you suggested:
> > >> > > >
> > >> > > > 1) mpicc -show shows the correct gcc (4.8.2) that I used for
> > >> > > > serial compile
> > >> > >
> > >> > > I'm not sure if you still want to try to debug this problem or
> > >> > > not. If you do, please save the complete log of the
> > >> > > "make install" run that fails, and send it as an attachment.
> > >> > > So far, I think we have only seen snippets of the end of the
> > >> > > error messages.
> > >> > >
> > >> > > Also: run the serial install, cd $AMBERHOME/bin and report the
> > >> > > output from "ldd ./yacc". Do the same after the (failed)
> > >> > > parallel install. Are there any differences in what libraries
> > >> > > are loaded?
> > >> > >
> > >> > > Finally, please provide the full output from "mpicc -show", and
> > >> > > let us know which MPI version you are using, and how you
> > >> > > installed it.
> > >> > >
> > >> > > As best I can understand things, when you compile byacc with
> > >> > > gcc, the resulting executable works, but when you later compile
> > >> > > it with mpicc, the resulting executable tries to load a library
> > >> > > that is not in your LD_LIBRARY_PATH. If this is correct, it is
> > >> > > something we should be able to track down.
> > >> > >
> > >> > > [An alternative workaround is to comment out the line in the
> > >> > > Makefile that (re-)makes yacc during the parallel compilation
> > >> > > step. But if your mpicc is failing with yacc, it seems likely
> > >> > > to fail at some other step as well.]
> > >> > >
> > >> > > Trying the whole procedure again with AmberTools16 is probably
> > >> > > a good sanity check (removes some possible sources of
> > >> > > problems).
> > >> > >
> > >> > > Sorry for all the problems you are seeing: I don't recall
> > >> > > having ever seen this particular problem before.
> > >> > >
> > >> > > ...dac
> > >> > >
> > >> >
> > >> > _______________________________________________
> > >> > AMBER mailing list
> > >> > AMBER.ambermd.org
> > >> > http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Jan 18 2017 - 12:30:05 PST