Ross,
I tried the things you suggested. The replies are inline.
On Mon, 2008-04-14 at 14:32 -0700, Ross Walker wrote:
>  
> 
> Hi Sasha,
>  
>  export AMBERHOME=/data/amber9
> export MPI_HOME=/data/openmpi
> 
> source /opt/intel/cce/10.1.012/bin/iccvars.sh
> source /opt/intel/fce/10.1.012/bin/ifortvars.sh
> 
> PATH=/opt/intel/cce/10.1.012/bin:$PATH; export PATH
> PATH=/opt/intel/fce/10.1.012/bin:$PATH; export PATH 
>                 Try checking that MPI_HOME/bin is being picked up at
>                 the beginning of your path as well - to make sure that
>                 'which mpif90' and 'which mpirun' return the correct
>                 versions.
which mpif90 and which mpirun return the correct paths. Below is the
mpif90 -show output:
[sasha.abicluster ~]$ mpif90 -show
/opt/intel/fce/10.1.012/bin/ifort -I/data/openmpi/include -pthread
-I/data/openmpi/lib -L/data/openmpi/lib -lmpi_f90 -lmpi_f77 -lmpi
-lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil
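
The same check can be scripted so that it is easy to repeat on every node. This is only a sketch: the helper name check_tool_prefix is made up for illustration, and /data/openmpi is simply the MPI_HOME value from the setup quoted above.

```shell
# Report whether a tool found on PATH lives under the expected directory.
# Usage: check_tool_prefix TOOL EXPECTED_BIN_DIR   (hypothetical helper)
check_tool_prefix() {
    tool_path=$(command -v "$1" 2>/dev/null) || {
        echo "$1: not found on PATH"
        return 1
    }
    case "$tool_path" in
        "$2"/*) echo "$1: OK ($tool_path)" ;;
        *)      echo "$1: unexpected location ($tool_path)"; return 1 ;;
    esac
}

# Assumption: MPI_HOME=/data/openmpi, as exported in the setup above.
for tool in mpif90 mpirun; do
    # '|| true' keeps going so that both tools are always reported.
    check_tool_prefix "$tool" "${MPI_HOME:-/data/openmpi}/bin" || true
done
```

A tool that resolves to some other MPI installation is reported as being in an unexpected location instead of being silently accepted.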
>                  
>                 Also try running:
>                  
>                 mpif90 -show
>                  
>                 to make sure it returns the correct compiler etc. E.g.
>                 here is mine for ifort with mpich2:
>                  
>                 [14:21][caffeine:0.04][rcw:~]$ mpif90 -show
>                 ifort -g
>                 -I/usr/local/mpi/mpich2-1.0.3_ifort9.1.039/include
>                 -I/usr/local/mpi/mpich2-1.0.3_ifort9.1.039/include
>                 -L/usr/local/mpi/mpich2-1.0.3_ifort9.1.039/lib
>                 -lmpichf90 -lmpichf90 -lmpich -lpthread -lrt
>                  
>                  
>                 Serial version compiles ok with or without the -static
>                 flag, but make test.serial fails:
>                  
>                 So the serial version links against the MKL libraries
>                 okay then? It is just the parallel version below that
>                 doesn't? 
>                 cd qmmm2/2pk4; ./Run.2pk4_stan
>                 This test not set up for parallel, skipping 
>                  
>                 This is really weird - if you really did make
>                 test.serial but it returns that "This test not set up
>                 for parallel," then something is wrong here. Make sure
>                 the DO_PARALLEL and TESTsander variables are NOT set.
>                 Then try things again. My suspicion is that you have
>                 DO_PARALLEL set so it is running the serial version of
>                 sander through mpi - I.e. running multiple copies of
>                 the same code - hence errors opening restrt files etc.
I did check that DO_PARALLEL is not set, but it still fails:
        cd amoeba_wat1; ./Run.amoeba_wat1
        
          Unit   16 Error on OPEN:
        restrt
          ./Run.amoeba_wat1:  Program error
        make: *** [test.sander.AMOEBA] Error 1
        
That said, I'm still mostly concerned with the parallel version.
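
For reference, one way to confirm that both variables really are unset in the shell that runs the tests. This is a sketch; report_unset is a hypothetical helper name, not part of the Amber test suite.

```shell
# Print, for each named variable, whether it is set in the current shell.
# Returns 1 if any of them is set. (hypothetical helper)
report_unset() {
    rc=0
    for name in "$@"; do
        # ${VAR+x} expands to "x" only when VAR is set, even if it is empty.
        if eval "[ -n \"\${$name+x}\" ]"; then
            eval "echo \"$name is set to '\$$name'\""
            rc=1
        else
            echo "$name is not set"
        fi
    done
    return $rc
}

report_unset DO_PARALLEL TESTsander || true
```

Unlike testing for an empty string, this also catches a variable that is exported with an empty value.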
>                  
>                 Sequence of actions to compile parallel Amber (after
>                 patching the source):
>                 
>                 [sasha.abicluster src]$ ./configure -opteron -openmpi
>                 ifort_x86_64 
>                  
>                 Leave out the -opteron - I don't think it does
>                 anything with ifort anyway.
>                  
>                 After this, I edit config.h to replace ifort with
>                 mpif90 in FC and LOAD flags. It doesn't compile
>                 without it, and it might be useful to have a note
>                 about it in the installation instructions. 
>                  
>                 Don't do this... It should be using mpif90 otherwise
>                 you will be missing all sorts of library files that
>                 are needed. You shouldn't need to edit the config.h
>                 file at all. What is the problem when you do 'make
>                 parallel' with mpif90 in the config.h file? I assume
>                 mpif90 exists in your path and picks up the correct
>                 compiler?
If I don't put mpif90 in the config.h file (instead of ifort), the
parallel compilation fails immediately with these errors (static or
not):
        ifort -c -w95   -mp1 -O0 -FR  -o evb_init.o _evb_init.f
        fortcom: Error: _evb_init.f, line 171: Cannot open include file
        'mpif-common.h'
              include 'mpif-common.h'
        --------------^
        fortcom: Error: _evb_init.f, line 321: This name does not have a
        type, and must have an explicit type.   [MPI_INTEGER]
           call mpi_bcast ( ndim, 1, MPI_INTEGER, 0, commworld, ierr )
        -----------------------------^
        fortcom: Error: _evb_init.f, line 361: This name does not have a
        type, and must have an explicit type.   [MPI_DOUBLE_PRECISION]
                 call mpi_bcast ( xdat_dia(n)% q, ndim,
        MPI_DOUBLE_PRECISION, 0, commworld, ierr )
        ------------------------------------------------^
        fortcom: Error: _evb_init.f, line 366: This name does not have a
        type, and must have an explicit type.   [MPI_CHARACTER]
                 call mpi_bcast ( xdat_dia(n)% filename, 512,
        MPI_CHARACTER, 0, commworld, ierr )
        ------------------------------------------------------^
        compilation aborted for _evb_init.f (code 1)
        make[1]: *** [evb_init.o] Error 1
        make[1]: Leaving directory `/data/amber9/src/sander'
        make: *** [parallel] Error 2
So the problem is having "ifort" in the config.h, and is fixed (or,
should I say, worked around) by replacing it with "mpif90".
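
The error looks consistent with plain ifort being called without the -I/data/openmpi/include option that "mpif90 -show" adds. A small sketch for checking both halves of that: which compiler config.h names, and whether the header is where the wrapper points. The helper names are made up, and the config.h location under $AMBERHOME/src is an assumption.

```shell
# Print the FC and LOAD assignments from a config.h-style file.
# (hypothetical helper)
show_compiler_vars() {
    grep -E '^[[:space:]]*(FC|LOAD)[[:space:]]*=' "$1"
}

# Succeed if the Open MPI Fortran header is present in the given directory.
# (hypothetical helper)
has_mpi_header() {
    [ -f "$1/mpif-common.h" ]
}

# Assumption: configure wrote config.h into $AMBERHOME/src.
cfg="${AMBERHOME:-/data/amber9}/src/config.h"
if [ -f "$cfg" ]; then
    show_compiler_vars "$cfg"
fi

inc="${MPI_HOME:-/data/openmpi}/include"
if has_mpi_header "$inc"; then
    echo "mpif-common.h found in $inc"
else
    echo "mpif-common.h not found in $inc"
fi
```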
>                  
>                 make parallel creates the executables, but make
>                 test.parallel fails with this error:
>                  
>                 You are linking dynamically here I assume?
Yes, as I said before, -static always fails. 
>                  
>                 cd cytosine; ./Run.cytosine
>                 /data/amber9/exe/sander.MPI: error while loading
>                 shared libraries: libmkl_lapack.so: cannot open  
>                  
>                 This implies that the environment is someway different
>                 on different nodes. Typically this happens in parallel
>                 when you set some environment variables on one node
>                 but the other node (which is also running part of the
>                 mpi code) doesn't inherit these - hence it doesn't
>                 know where to look for the mkl libraries. Typically
>                 the simplest solution here is to try and compile
>                 statically and then you don't need to worry about it.
Well, that's the thing - static compilation ALWAYS fails.
I run the test immediately after the compilation on the same system, so
there are no differences in the environment. I haven't even gotten as
far as running it on multiple nodes, since it fails the post-compilation
test on the compilation node.
>                  
>                 Otherwise you will need to tweak things like the
>                 default .profile or .bashrc so that something like
>                 'mpirun -np 4 env' returns you the same thing from all
>                 nodes. Normally static linking (if you can do it)
>                 avoids this hassle though.
>                  
>         The strange thing is that libmkl_lapack.so is located in the
>         directory that was happily noticed by the ./configure script.
>         Same error is thrown when sander.MPI is attempted to run with
>         one of the test cases from the Amber tutorial (which is kind
>         of expected after the test error). 
>          
>                 It is very possible that the mpirun command (even if
>         you run everything on the same physical node you are compiling
>         on) is invoking a new shell and not picking up the correct
>         paths. Try editing /etc/bashrc on all nodes so they source the
>         compiler and mkl environment setup scripts on login. 
>         
>         Compilation with -static flag fails invariably with the
>         following message:
>         ld: cannot find -lmpi_f90
>         make[1]: *** [sander.MPI] Error 1
>         make[1]: Leaving directory `/data/amber9/src/sander'
>         make: *** [parallel] Error 2 
>          
>         I would hope this would go away with using mpif90 - although
>         maybe not if no static library is available for openmpi. There
>         should be a way to build a statically linkable openmpi (I do
>         it all the time with mpich2 without problems). So you could
>         try that. Although I would first look at making sure the
>         environment gets inherited correctly on all nodes under an
>         mpirun. 
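
With -static the linker only accepts static archives, so "cannot find -lmpi_f90" would be expected if the Open MPI install has no libmpi_f90.a. A sketch for checking that; has_static_lib is a hypothetical helper, the library names are the ones from the "mpif90 -show" output above, and /data/openmpi/lib is the assumed location.

```shell
# Report whether a static archive lib<NAME>.a exists in LIBDIR.
# Usage: has_static_lib NAME LIBDIR   (hypothetical helper)
has_static_lib() {
    if [ -f "$2/lib$1.a" ]; then
        echo "static:  $2/lib$1.a"
    else
        echo "missing: $2/lib$1.a"
        return 1
    fi
}

# Library names taken from the 'mpif90 -show' output.
for lib in mpi_f90 mpi_f77 mpi; do
    has_static_lib "$lib" "${MPI_HOME:-/data/openmpi}/lib" || true
done
```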
Once again, compiling without the -static flag succeeds, but it then
fails at "make test.parallel". I don't care much about static
compilation, so that part isn't a big deal, as long as I can get a
parallel version with MKL to work.
Finally, when I run "mpirun -np 4 env", the output contains this line:
LD_LIBRARY_PATH=/opt/intel/mkl/10.0.1.014/em64t/lib:/data/openmpi/include:/data/openmpi/lib:/opt/intel/fce/10.1.012/lib:/opt/intel/cce/10.1.012/lib
So the library directory is seen, but why the libraries can't be
located at runtime is still a mystery.
Let me know if you can think of any reasons for this.
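
One thing that can narrow this down is checking whether the file actually exists in any of the directories listed, rather than only whether the directory is on the path. A minimal sketch; find_on_path is a made-up helper name.

```shell
# List every directory on a colon-separated search path that contains FILE.
# Returns 1 if none does.
# Usage: find_on_path FILE SEARCH_PATH   (hypothetical helper)
find_on_path() {
    found=1
    old_ifs=$IFS
    IFS=:
    for dir in $2; do
        if [ -e "$dir/$1" ]; then
            echo "$dir/$1"
            found=0
        fi
    done
    IFS=$old_ifs
    return $found
}

find_on_path libmkl_lapack.so "${LD_LIBRARY_PATH:-}" \
    || echo "libmkl_lapack.so is not in any LD_LIBRARY_PATH directory"
```

Running "ldd /data/amber9/exe/sander.MPI" in the same shell should also show which shared libraries the loader resolves and which it marks as "not found".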
Thanks
Sasha
>         
>         All the best
>         Ross
>         /\
>         \/
>         |\oss Walker
>         
>         | Assistant Research Professor |
>         | San Diego Supercomputer Center |
>         | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
>         | http://www.rosswalker.co.uk | PGP Key available on request |
>         
>         Note: Electronic Mail is not secure, has no guarantee of
>         delivery, may not be read every day, and should not be used
>         for urgent or sensitive issues. 
>         
>          
-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
Received on Fri Apr 18 2008 - 21:19:22 PDT