Is this problem unique to Amber, or do you get this kind of issue with any
MPI program?
My suggestion is to write a quick MPI program that does some basic
collective communication and see if it works across different nodes. An
example Fortran program is:
program test_mpi
  ! Minimal check of MPI collective communication: broadcast one integer from rank 0
  implicit none
  include 'mpif.h'
  integer holder, ierr
  call mpi_init(ierr)
  holder = 1
  call mpi_bcast(holder, 1, mpi_integer, 0, mpi_comm_world, ierr)
  call mpi_finalize(ierr)
end program test_mpi
You can compile it with "mpif90 program_name.f90". Do you still get the
same error with this program?
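If it compiles, you could launch it across nodes the same way you launch
sander.MPI. A minimal sketch, assuming you saved the source as test_mpi.f90
and reusing the machinefile from your commands below:

mpif90 -o test_mpi test_mpi.f90
mpiexec -np 8 -machinefile /etc/mpich/machines.LINUX ./test_mpi

If this small program dies with the same "Communication error with rank 0"
message, that would point to the MPI installation or the network between the
nodes rather than to Amber itself.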
On Mon, May 14, 2012 at 11:04 PM, Syed Tarique Moin <tarisyed.yahoo.com> wrote:
> Hello,
>
> I have compiled mpich2 and amber12 with the Intel compiler successfully. When
> I run the job with sander.MPI on a single node with multiple cores, it runs
> without errors. But the same job run across multiple nodes gives the
> following errors with both mpirun and mpiexec.
>
> Kindly guide me.
>
> Thanks and Regards
>
>
> ------------------------------
>
> mpiexec -np 8 -machinefile /etc/mpich/machines.LINUX
> $AMBERHOME/bin/sander.MPI -O -i sim_cmplx_1000.in -o test.out -p a.prmtop
> -c sim_cmplx_1000_36.rst -r test.rst -x test.mdcrd -e test.mden &
> [1] 4374
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
> Fatal error in PMPI_Barrier: Other MPI error, error stack:
> PMPI_Barrier(425)...........: MPI_Barrier(MPI_COMM_WORLD) failed
> MPIR_Barrier_impl(306)......:
> MPIR_Bcast_impl(1321).......:
> MPIR_Bcast_intra(1155)......:
> MPIR_Bcast_binomial(213)....: Failure during collective
> MPIR_Barrier_impl(292)......:
> MPIR_Barrier_or_coll_fn(121):
> MPIR_Barrier_intra(83)......:
> dequeue_and_set_error(596)..: Communication error with rank 0
>
> [1]+ Exit 1 mpiexec -np 8 -machinefile
> /etc/mpich/machines.LINUX $AMBERHOME/bin/sander.MPI -O -i
> sim_cmplx_1000.in -o test.out -p a.prmtop
>
> -----------------------------------------------
> -------------------------------------------
> mpirun -np 8 -machinefile
> /etc/mpich/machines.LINUX $AMBERHOME/bin/sander.MPI -O -i
> sim_cmplx_1000.in -o test.out -p a.prmtop -c sim_cmplx_1000_36.rst -r
> test.rst -x test.mdcrd -e test.mden &
>
>
>
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
> Fatal error in PMPI_Barrier: Other MPI error, error stack:
> PMPI_Barrier(425)...........: MPI_Barrier(MPI_COMM_WORLD) failed
> MPIR_Barrier_impl(306)......:
> MPIR_Bcast_impl(1321).......:
> MPIR_Bcast_intra(1155)......:
> MPIR_Bcast_binomial(213)....: Failure during collective
> MPIR_Barrier_impl(292)......:
> MPIR_Barrier_or_coll_fn(121):
> MPIR_Barrier_intra(83)......:
> dequeue_and_set_error(596)..: Communication error with rank 0
>
> -------------------------------------------------------------
>
>
>
> Syed Tarique Moin
> Ph.D. Research Fellow,
> International Center for Chemical and Biological Sciences,
> University of Karachi
--
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber