Re: [AMBER] Fwd: amber14 parallel build problems

From: Ada Sedova <ada.a.sedova.gmail.com>
Date: Wed, 18 Jan 2017 15:05:50 -0500

To make things clear: I have a serial build that I set aside for
comparison, and I ran ANOTHER serial build in a second directory, with a
subsequent MPI build attempted on top of it, as directed. I then invoked
the yacc call on cifparse in each directory (details are in the other
email thread). The serial version exits with no error; the parallel
version crashes. So something HAS changed about the yacc binary during
the parallel build (after the serial build): either yacc was rebuilt,
which shouldn't happen according to the Makefile for that part, or its
library paths were changed during the parallel build.
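
To pin down which of the two it is, a check along these lines should
work (serial_build and mpi_build here are placeholders for my two build
directories):

  # same checksum = the binary itself was not rebuilt
  md5sum serial_build/bin/yacc mpi_build/bin/yacc

  # modification times show whether the parallel step relinked yacc
  stat -c '%y %n' serial_build/bin/yacc mpi_build/bin/yacc

  # compare shared-library dependencies side by side
  diff <(ldd serial_build/bin/yacc) <(ldd mpi_build/bin/yacc)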

It's also possible that, because we use the Lustre filesystem for the
work nodes, the MPI libraries are set up differently on the work node,
and the result is incompatible with running yacc on the home node.
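
One way to test that would be to run the same checks in both places
(just a sketch; how to get a shell on a work node depends on the local
scheduler setup):

  # on the home node, then again inside a job on a work node
  echo $LD_LIBRARY_PATH
  ldd $AMBERHOME/bin/yacc

If libmpi and friends resolve to different paths (or fail to resolve)
on one of the two node types, that would explain the difference in
behavior.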

On Wed, Jan 18, 2017 at 2:58 PM, Ada Sedova <ada.a.sedova.gmail.com> wrote:

> Well, I guess that's because he did not reply to me with a cc to the
> mailing list. I simply hit reply and assumed the thread was still going
> to the list, since it was initiated there.
>
> On Wed, Jan 18, 2017 at 2:54 PM, Hai Nguyen <nhai.qn.gmail.com> wrote:
>
>> hi
>>
>> side note: It's clear that you did not attach the two_par.out and
>> one_par.out files to the amber mailing list. You only sent them to
>> David.
>>
>> Hai
>>
>> On Wed, Jan 18, 2017 at 2:44 PM, Ada Sedova <ada.a.sedova.gmail.com>
>> wrote:
>>
>> > This was sent a few days ago with no response.
>> >
>> > Today I was told this info was not given.
>> >
>> > Here it is again.
>> >
>> >
>> > AS
>> >
>> >
>> > ---------- Forwarded message ----------
>> > From: Ada Sedova <ada.a.sedova.gmail.com>
>> > Date: Tue, Jan 17, 2017 at 11:19 AM
>> > Subject: Re: [AMBER] amber14 parallel build problems
>> > To: david.case.rutgers.edu
>> >
>> >
>> > Yes, I would like to continue debugging this, since getting OLCF to
>> > update to amber16 may be difficult: it requires a purchase and thus a
>> > bunch of paperwork and bureaucratic steps.
>> >
>> > The output logs from stdout and stderr of the failed MPI build are
>> > attached.
>> >
>> > The complete output from mpicc -show was given earlier in this thread,
>> > but I will repeat it here for convenience:
>> >
>> > -bash-4.1$ mpicc -show
>> > gcc -I/sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/include -pthread
>> > -L/usr/lib64 -Wl,-rpath -Wl,/usr/lib64 -Wl,-rpath
>> > -Wl,/sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/lib -Wl,--enable-new-dtags
>> > -L/sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/lib -lmpi
>> >
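>> > (Note that anything linked through this wrapper picks up -lmpi. A
>> > trivial test makes the point:
>> >
>> > echo 'int main(void){return 0;}' > t.c
>> > mpicc -o t t.c && ldd ./t | grep libmpi
>> >
>> > so a yacc relinked with mpicc will depend on libmpi.so.1 even though
>> > it never calls MPI.)
>> >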
>> > serial build:
>> >
>> > -bash-4.1$ ldd ./yacc
>> > linux-vdso.so.1 => (0x00007ffe30bfd000)
>> > libc.so.6 => /lib64/libc.so.6 (0x00007f3b7805c000)
>> > /lib64/ld-linux-x86-64.so.2 (0x00007f3b7841b000)
>> >
>> >
>> > parallel build:
>> >
>> > -bash-4.1$ ldd ./yacc
>> > linux-vdso.so.1 => (0x00007ffe86cf6000)
>> > libmpi.so.1 => /sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/lib/libmpi.so.1 (0x00007f29e89ac000)
>> > libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f29e8764000)
>> > libc.so.6 => /lib64/libc.so.6 (0x00007f29e83d0000)
>> > librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x00007f29e81bb000)
>> > libosmcomp.so.3 => /usr/lib64/libosmcomp.so.3 (0x00007f29e7fad000)
>> > libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00007f29e7d9b000)
>> > libpsm_infinipath.so.1 => /usr/lib64/libpsm_infinipath.so.1 (0x00007f29e7b47000)
>> > libopen-rte.so.7 => /sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/lib/libopen-rte.so.7 (0x00007f29e7856000)
>> > libtorque.so.2 => /usr/lib64/libtorque.so.2 (0x00007f29e6f65000)
>> > libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x00007f29e6c12000)
>> > libz.so.1 => /lib64/libz.so.1 (0x00007f29e69fb000)
>> > libcrypto.so.10 => /usr/lib64/libcrypto.so.10 (0x00007f29e6617000)
>> > libssl.so.10 => /usr/lib64/libssl.so.10 (0x00007f29e63aa000)
>> > libopen-pal.so.6 => /sw/rhea/openmpi/1.8.4/rhel6.6_gcc4.8.2/lib/libopen-pal.so.6 (0x00007f29e60b9000)
>> > libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007f29e5eae000)
>> > libdl.so.2 => /lib64/libdl.so.2 (0x00007f29e5caa000)
>> > librt.so.1 => /lib64/librt.so.1 (0x00007f29e5aa1000)
>> > libm.so.6 => /lib64/libm.so.6 (0x00007f29e581d000)
>> > libutil.so.1 => /lib64/libutil.so.1 (0x00007f29e561a000)
>> > /lib64/ld-linux-x86-64.so.2 (0x00007f29e8ec1000)
>> > libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x00007f29e5412000)
>> > libgcc_s.so.1 => /ccs/compilers/gcc/rhel6-x86_64/4.8.2/lib64/libgcc_s.so.1 (0x00007f29e51fc000)
>> > libnl.so.1 => /lib64/libnl.so.1 (0x00007f29e4faa000)
>> > libinfinipath.so.4 => /usr/lib64/libinfinipath.so.4 (0x00007f29e4d9b000)
>> > libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f29e4b97000)
>> > libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f29e4891000)
>> > libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f29e464c000)
>> > libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f29e4365000)
>> > libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f29e4161000)
>> > libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f29e3f34000)
>> > libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f29e3d29000)
>> > libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f29e3b25000)
>> > libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f29e390b000)
>> > libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f29e36ec000)
>> >
>> >
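>> > Since ldd resolves every entry above, the failure may only appear at
>> > run time; a loader trace such as
>> >
>> > LD_DEBUG=libs ./yacc 2> yacc_ld.log
>> >
>> > (LD_DEBUG is a standard glibc loader feature) would record exactly
>> > which library was being processed when the crash happens.
>> >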
>> > Thanks for your continued assistance.
>> >
>> >
>> > Ada
>> >
>> > On Tue, Jan 17, 2017 at 9:10 AM, David Case <david.case.rutgers.edu>
>> > wrote:
>> >
>> > > On Sun, Jan 15, 2017, Ada Sedova wrote:
>> > >
>> > > > Here are the results of the checks you suggested:
>> > > >
>> > > > 1) mpicc -show shows the correct gcc (4.8.2) that I used for serial
>> > > compile
>> > >
>> > > I'm not sure if you still want to try to debug this problem or not.
>> > > If you do, please save the complete log of the "make install" run
>> > > that fails, and send it as an attachment. So far, I think we have
>> > > only seen snippets of the end of the error messages.
>> > > Also: run the serial install, cd $AMBERHOME/bin and report the output
>> > > from "ldd ./yacc". Do the same after the (failed) parallel install.
>> > > Are there any differences in what libraries are loaded?
>> > >
>> > > Finally, please provide the full output from "mpicc -show", and let
>> > > us know which MPI version you are using, and how you installed it.
>> > >
>> > > As best I can understand things, when you compile byacc with gcc,
>> > > the resulting executable works, but when you later compile it with
>> > > mpicc, the resulting executable tries to load a library that is not
>> > > in your LD_LIBRARY_PATH. If this is correct, it is something we
>> > > should be able to track down.
>> > >
>> > > [An alternative workaround is to comment out the line in the
>> > > Makefile that (re-)makes yacc during the parallel compilation step.
>> > > But if your mpicc is failing with yacc, it seems likely to fail at
>> > > some other step as well.]
>> > >
>> > > Trying the whole procedure again with AmberTools16 is probably a
>> > > good sanity check (removes some possible sources of problems).
>> > >
>> > > Sorry for all the problems you are seeing: I don't recall having
>> > > ever seen this particular problem before.
>> > >
>> > > ...dac
>> > >
>> >
>> > _______________________________________________
>> > AMBER mailing list
>> > AMBER.ambermd.org
>> > http://lists.ambermd.org/mailman/listinfo/amber
>> >
>> >
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>>
>
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Jan 18 2017 - 12:30:03 PST