RE: AMBER: Amber: Parallel Installation Problems

From: Ross Walker <ross.rosswalker.co.uk>
Date: Fri, 27 Oct 2006 10:53:10 -0700

Dear Mina,

You are fighting several problems at once here, and you need to reduce
things to the simplest possible setup. I would start by addressing the
following:

> I have compiled the parallel part of Amber using the mpich
> library (compiled by intel compilers version 8.1), while
> Amber9 has been compiled with ifort (version 9) and gcc.

While technically okay, I always find such an approach dangerous. My advice
is to go to premier.intel.com, log in and download the very latest 9.1
Fortran compiler, which is 9.1.039 I believe. Also obtain the C compiler
(v9.1.042 I think). Then get hold of a completely fresh copy of mpich or,
probably better, mpich2.

Configure it to build with the newest Intel ifort, icc and icpc compilers
(make sure you select the correct interconnect options). Once it is built,
run the test cases that are provided on the mpich page and check that they
all work. If they do, you are good to go to the next step.
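
A rough sketch of that configure step in sh syntax, assuming an unpacked
mpich2 source tree. The MPICH_SRC and MPICH_PREFIX paths are hypothetical
placeholders, and the CC/CXX/F77/F90 variable names are the ones I believe
mpich2's configure reads, so check its README. Interconnect-specific options
are deliberately left out because they depend on your hardware.

```shell
# Hypothetical locations; change both to match your system.
MPICH_SRC="${MPICH_SRC:-$HOME/src/mpich2}"
MPICH_PREFIX="${MPICH_PREFIX:-$HOME/opt/mpich2-intel}"

# Use the same Intel compiler family for C, C++ and Fortran so that the
# MPI library and Amber are built consistently.
export CC=icc
export CXX=icpc
export F77=ifort
export F90=ifort

if [ -x "$MPICH_SRC/configure" ]; then
  # Build and install in a subshell so the caller's directory is unchanged.
  (cd "$MPICH_SRC" && ./configure --prefix="$MPICH_PREFIX" \
     && make && make install)
else
  echo "No configure script in $MPICH_SRC; nothing built"
fi
```

After installing, put $MPICH_PREFIX/bin first in your PATH so that the Amber
build picks up this mpich and not an older one.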

Start with a clean amber9 directory and apply bugfix.all to it from the
amber website. This includes some workarounds for known bugs in the Intel
compilers.

Build this in parallel making sure it links against your new mpich/mpich2
installation.

Then run all the parallel test cases with -np 2 and see if there are any
problems. You can also try -np 4, but some test cases are not designed to
work with 4 CPUs.
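
As a sketch of that test run in sh syntax: the default AMBERHOME path and the
"mpirun -np 2" launcher string are assumptions (an LSF site may need
mpirun.lsf instead), and the log file name is just a suggestion.

```shell
# Hypothetical default; AMBERHOME must point at your amber9 tree.
AMBERHOME="${AMBERHOME:-$HOME/amber9}"
export AMBERHOME

# Launcher prefix used by the test scripts for every parallel run.
# Assumed value; replace with whatever starts MPI jobs on your machine.
export DO_PARALLEL="mpirun -np 2"

if [ -d "$AMBERHOME/test" ]; then
  # Run the whole parallel suite and keep a log to search for failures.
  (cd "$AMBERHOME/test" && make test.parallel 2>&1 | tee test.parallel.log)
else
  echo "No test directory under $AMBERHOME; nothing run"
fi
```

To repeat with four processes, change DO_PARALLEL to use -np 4 and rerun.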

See if your problems go away. If they do then great. If you still have
problems we can try and debug things further.

With regard to some of the errors you list below:

Run.noesy should not segfault. I'm not sure what is going on here, but it
could be related to the compiler mismatch with the MPI library.

The bintraj test case can be expected to fail if you did not compile in
support for binary trajectories (the -bintraj option to configure).

> A3) Test.sander.GB.MPI stops running as soon as Run.gbrna.ips
> is executed. Again the script runs perfectly on its own.

I'm not sure exactly what you mean here. Do you mean it works fine in serial
but not in parallel?

Note, if you are running the test cases manually yourself you have to be
very careful in parallel, as they expect the makefile to have set up certain
environment variables 'before' the script is executed.

At the very least you need to set DO_PARALLEL and TESTsander, e.g. in csh:
setenv TESTsander $AMBERHOME/exe/sander.MPI
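
For a single test run by hand, a minimal sketch in sh syntax (csh users would
use setenv instead of export). The default AMBERHOME, the "mpirun -np 2"
launcher and the test/noesy location of Run.noesy are assumptions here.

```shell
# Hypothetical default; AMBERHOME must point at your amber9 tree.
AMBERHOME="${AMBERHOME:-$HOME/amber9}"
export AMBERHOME

# The two variables the makefile would normally set before a parallel test.
export DO_PARALLEL="mpirun -np 2"               # assumed launcher string
export TESTsander="$AMBERHOME/exe/sander.MPI"

TEST_DIR="$AMBERHOME/test/noesy"                # assumed test location
if [ -x "$TESTsander" ] && [ -d "$TEST_DIR" ]; then
  # Run the script from its own directory, as the makefile does.
  (cd "$TEST_DIR" && ./Run.noesy)
else
  echo "sander.MPI or $TEST_DIR not found; variables set, no test run"
fi
```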

> B1) Then I tried to run some benchmarks and it seems I do not
> have the following directories under the benchmark dir
> mg_qmmm , ladh, 1NLN_qmmm

These were for internal development purposes and should have been removed
from the Makefile in the release version. I will create a bugfix that
addresses this.
 
> B2)Also I do not have the psander executable.

psander is still under development and is not complete in amber9. If you
look at the main targets for the tests:

test.serial
and
test.parallel

you will see that the psander tests are not run. Unfortunately this is not
so obvious in the benchmarks makefile. It should probably have been removed
from the Amber 9 release version but for the moment you can safely ignore
it. The relevant benchmarks are bench.sander and bench.pmemd.

> C) Finally, I do not manage to compile pmemd as this message
> appears during the compilation:
> ifort -c -auto -tpp7 -xN -ip -O3 loadbal.f90
> fortcom: Severe: **Internal compiler error: segmentation

Obtaining the very latest Intel compiler will hopefully fix this problem. As
the message says, this is a problem with the compiler and not with the pmemd
code, so with luck Intel have fixed it in the newer release.

All the best
Ross

/\
\/
|\oss Walker

| HPC Consultant and Staff Scientist |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
| http://www.rosswalker.co.uk | PGP Key available on request |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.

> -----Original Message-----
> From: owner-amber.scripps.edu
> [mailto:owner-amber.scripps.edu] On Behalf Of Maniopoulou, A (Mina)
> Sent: Friday, October 27, 2006 10:14
> To: amber.scripps.edu
> Subject: AMBER: Amber: Parallel Installation Problems
>
> Hallo again,
>
> I have compiled the parallel part of Amber using the mpich
> library (compiled by intel compilers version 8.1), while
> Amber9 has been compiled with ifort (version 9) and gcc.
> I run the tests (using lsf, but interactively ) using 4
> processors (apart from the test.sander.EVB that needs 2).
> All run successfully apart from test.sander.BASIC.MPI and
> test.sander.GB.MPI
> A1)Test.sander.BASIC.MPI runs till the end, but when
> Run.noesy is executed I get segmentation fault.
>
> cd noesy; ./Run.noesy
> mpirun.lsf -np 4
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> Image PC Routine Line
> Source
> sander.MPI 08134CC2 Unknown Unknown Unknown
> sander.MPI 08115992 Unknown Unknown Unknown
> .............. ...................
> ..................
> TID HOST_NAME COMMAND_LINE STATUS
> TERMINATION_TIME
> ==== ========== ================ =======================
> ===================
> 0001 grid-data1 gmmpirun_wrapper Exit (174)
> 10/27/2006 17:48:21
> 0002 grid-data1 gmmpirun_wrapper Exit (174)
> 10/27/2006 17:48:21
> 0003 grid-data1 gmmpirun_wrapper Exit (174)
> 10/27/2006 17:48:21
> 0004 grid-data1 gmmpirun_wrapper Exit (174)
> 10/27/2006 17:48:21
>
> When I run the Run.noesy script on its own, no segmentation
> fault appears. I get a possible failure though. And what
> happens is that noesy.out is identical with noesy.out.save
> and not noesy.out.mpi.save.
>
> A2) Test.sander.BASIC.MPI errors also, when Run.bintraj is executed.
> sander and ptraj: test sander netCDF output and ptraj netCDF input
> [0] MPI Abort by user Aborting program !
> [0] Aborting program!
> Killed by signal 15.
> Killed by signal 15.
> Killed by signal 15.
>
> TID HOST_NAME COMMAND_LINE STATUS
> TERMINATION_TIME
> ==== ========== ================ =======================
> ===================
> 0001 grid-data1 gmmpirun_wrapper Signaled (SIGKILL)
> 10/27/2006 17:48:50
> 0002 grid-data1 gmmpirun_wrapper Signaled (SIGKILL)
> 10/27/2006 17:48:50
> 0003 grid-data1 gmmpirun_wrapper Signaled (SIGKILL)
> 10/27/2006 17:48:46
> 0004 grid-data1 gmmpirun_wrapper Exit (255)
> 10/27/2006 17:48:43
>
> Terminated
> ./Run.bintraj: Program error
> make[1]: [test.sander.BASIC] Error 1 (ignored)
>
> When I run ./Run.bintraj on its own, it runs perfectly.
>
> A3) Test.sander.GB.MPI stops running as soon as Run.gbrna.ips
> is executed. Again the script runs perfectly on its own.
>
>
> B1) Then I tried to run some benchmarks and it seems I do not
> have the following directories under the benchmark dir
> mg_qmmm , ladh, 1NLN_qmmm
>
> B2)Also I do not have the psander executable.
>
>
> C) Finally, I do not manage to compile pmemd as this message
> appears during the compilation:
> ifort -c -auto -tpp7 -xN -ip -O3 loadbal.f90
> fortcom: Severe: **Internal compiler error: segmentation
> violation signal raised** Please report this error along with
> the circumstances in which it occurred in a Software Problem
> Report. Note: File and line given may not be explicit cause
> of this error.
> The config.h was created by ./configure linux_p4 ifort mpich_gm
>
> Thanks a lot,
>
> Mina Maniopoulou
> --------------------------------------------------------------
> ---------
> The AMBER Mail Reflector
> To post, send mail to amber.scripps.edu
> To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
>


Received on Sun Oct 29 2006 - 06:07:31 PST