[AMBER] problem with "make test.parallel" for AMBER12

From: Marc van der Kamp <marcvanderkamp.gmail.com>
Date: Mon, 16 Apr 2012 15:15:21 +0100

Hi,

I have a problem running the tests for the parallel executables. (My
apologies if this is not an AMBER issue per se.)

I have (commands sketched below):
- compiled & tested AMBER12 in serial (with success)
- downloaded and compiled openmpi-1.5.4 (using "./configure_openmpi -gnu"
  in $AMBERHOME/AmberTools/src/)
- compiled AMBER12 in parallel (with success, it seems)
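For reference, the full sequence was roughly as follows (a sketch; the
exact configure invocations may differ slightly between setups):

cd $AMBERHOME
./configure gnu              # serial build
make install
make test                    # serial tests passed

cd $AMBERHOME/AmberTools/src
./configure_openmpi -gnu     # builds the bundled openmpi-1.5.4

cd $AMBERHOME
./configure -mpi gnu         # parallel build
make install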

Now, I'd like to test AMBER12 in parallel.
First, I tried simply doing:

cd $AMBERHOME
make test

The first series of tests to run is in AmberTools/test/nab, and these
tests are fine.
Then, when trying to run tests in AmberTools/test/mmpbsa_py I get a bunch
of errors like this:

make[3]: Entering directory
`/export/users/chmwvdk/amber12/AmberTools/test/mmpbsa_py'
cd EstRAL_Files && ./Run.makeparms
This is not a parallel test.
cd 01_Generalized_Born && ./Run.GB
[curie.chm.bris.ac.uk:23918] [[32424,1],0] ORTE_ERROR_LOG: Data unpack
would read past end of buffer in file util/nidmap.c at line 371
[curie.chm.bris.ac.uk:23918] [[32424,1],0] ORTE_ERROR_LOG: Data unpack
would read past end of buffer in file base/ess_base_nidmap.c at line 62
[curie.chm.bris.ac.uk:23918] [[32424,1],0] ORTE_ERROR_LOG: Data unpack
would read past end of buffer in file ess_env_module.c at line 173
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_base_build_nidmap failed
  --> Returned value Data unpack would read past end of buffer (-26)
instead of ORTE_SUCCESS
--------------------------------------------------------------------------
...
...

This is unfortunate, but not really of great importance to me, as the
parallel executables of the AMBER12 programs (not AmberTools12) are what
I'm really after.
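From what I can tell, this particular "Data unpack would read past end of
buffer" error from ORTE usually indicates a mismatch between the mpirun
used at runtime and the Open MPI libraries the executables were built
against (e.g. another MPI installation being picked up first). A quick way
to check (generic commands, nothing Amber-specific):

which mpirun
mpirun --version                 # should report Open MPI 1.5.4

# if sander.MPI is dynamically linked, check which MPI libraries it uses
ldd $AMBERHOME/bin/sander.MPI | grep -i mpi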
However, when I get to this stage, the process simply 'hangs' after:
make[2]: Entering directory `/export/users/chmwvdk/amber12/test'
export TESTsander=/users/chmwvdk/amber12/bin/sander.MPI; make -k
test.sander.BASIC
make[3]: Entering directory `/export/users/chmwvdk/amber12/test'
cd cytosine && ./Run.cytosine


It turns out that the cytosine test has run fine, writing up to 'section 4'
in the output file (the values in cytosine/cytosine.out and
cytosine/cytosine.out.save are identical), but nothing happens after that.
In other words, the process seems to hang on writing the TIMINGS
information (section 5) to the output file.
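To see where it is stuck, one can attach to the hanging ranks (generic
Linux debugging, nothing Amber-specific; replace <pid> with an actual
process ID):

ps aux | grep sander.MPI      # find the PIDs of the hanging ranks
strace -p <pid>               # show the system call a rank is blocked in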

I checked that my environment was OK (as far as I can tell), i.e.
$PATH contains $AMBERHOME/bin
$LD_LIBRARY_PATH contains $AMBERHOME/lib
$MPI_HOME is set to $AMBERHOME
'which mpirun' gives $AMBERHOME/bin/mpirun
'which mpicc' gives $AMBERHOME/bin/mpicc
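One more thing worth checking is whether some other MPI installation is
listed ahead of the Amber one in those paths, e.g.:

echo $PATH | tr ':' '\n'               # look for another MPI's bin directory
echo $LD_LIBRARY_PATH | tr ':' '\n'    # look for another MPI's lib directory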

This installation is on a cluster, and I'm not sure whether the head node
is set up to run parallel jobs, so I also made a PBS submission script (as
suggested in http://archive.ambermd.org/200701/0112.html) and submitted
that.
This is the submission script:

#!/bin/bash
#
#PBS -l walltime=5:0:0,nodes=1:ppn=4
#PBS -q veryshort
#PBS -N parallel_test
#PBS -j oe

export DO_PARALLEL="mpirun -np 4 "
export AMBERHOME=/users/chmwvdk/amber12/
export MPI_HOME="$AMBERHOME"
cd $AMBERHOME/test
make test.parallel
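To narrow things down, the same script could also run a single test
directly instead of the whole suite, using the variables visible in the
make output above (a variant, untested as written):

export AMBERHOME=/users/chmwvdk/amber12
export DO_PARALLEL="mpirun -np 4 "
export TESTsander=$AMBERHOME/bin/sander.MPI
cd $AMBERHOME/test/cytosine && ./Run.cytosine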


Essentially the same thing happens - no output is produced past section 4
of the first test, cytosine/cytosine.out (which is identical to the
equivalent section in cytosine/cytosine.out.save). The job remains in the
'running' state, though. So the run again seems to hang on writing the
timing information (section 5)?

Strangely, when I run a simple job of my own via the same route (i.e.
replace "make test.parallel" in the script with a direct call to e.g.
$AMBERHOME/bin/sander.MPI with the usual flags), the job finishes as
normal (without errors) and I do get the full output, including the
timing information. So it appears that I can actually use sander.MPI
without problems...
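For concreteness, the kind of line that does work in the same PBS script
is a standard sander invocation along these lines (file names are
placeholders for my own inputs):

mpirun -np 4 $AMBERHOME/bin/sander.MPI -O -i md.in -o md.out \
    -p prmtop -c inpcrd -r restrt -x mdcrd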

How can I solve this, so that I can use 'make test' (or similar) to run the
series of tests for the parallel executables?

As an aside, I notice that the 'header' of output files from AMBER12 still
reads:

          -------------------------------------------------------
          Amber 11 SANDER 2010
          -------------------------------------------------------

That should probably be updated...

Many thanks in advance,
Marc
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber