Re: [AMBER] mpi problem

From: Syed Tarique Moin <tarisyed.yahoo.com>
Date: Tue, 15 May 2012 03:29:23 -0700 (PDT)

No, the program given below ran smoothly on a single node, but I do not know how to check whether it works across different nodes.
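
For what it is worth, a minimal sketch of one way to check this, not taken from the thread: extend the test program so that each rank reports which machine it ran on via the standard mpi_comm_rank and mpi_get_processor_name routines (the program name check_nodes below is just illustrative). If the output lists more than one hostname, the ranks really were spread across different nodes.

program check_nodes
  implicit none
  include 'mpif.h'
  character(len=MPI_MAX_PROCESSOR_NAME) :: procname
  integer :: rank, namelen, holder, ierr
  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, rank, ierr)
  call mpi_get_processor_name(procname, namelen, ierr)
  ! same collective as in the test program quoted below
  holder = 1
  call mpi_bcast(holder, 1, mpi_integer, 0, mpi_comm_world, ierr)
  write(*,*) 'rank ', rank, ' ran on ', procname(1:namelen)
  call mpi_finalize(ierr)
end program check_nodes

Compile it with mpif90 and launch it with the same -np and -machinefile options used for the sander.MPI jobs.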

Regards

 
Tarique




>________________________________
> From: Jason Swails <jason.swails.gmail.com>
>To: AMBER Mailing List <amber.ambermd.org>
>Sent: Tuesday, May 15, 2012 12:53 PM
>Subject: Re: [AMBER] mpi problem
>
>Is this problem unique to Amber, or do you get this kind of issue with any
>MPI program?
>
>My suggestion is to write a quick MPI program that does some basic
>collective communication and see if it works across different nodes.  An
>example Fortran program is:
>
>program test_mpi
>  implicit none
>  include 'mpif.h'
>  integer holder, ierr
>  call mpi_init(ierr)
>  holder = 1
>  call mpi_bcast(holder, 1, mpi_integer, 0, mpi_comm_world, ierr)
>  call mpi_finalize(ierr)
>end program test_mpi
>
>You can compile it with "mpif90 program_name.f90". Do you still get the
>same error with this program?
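>
>As a usage sketch (assuming the default a.out output name and the same
>machinefile as the sander.MPI runs), it could be launched across nodes with:
>
>mpif90 program_name.f90
>mpiexec -np 8 -machinefile /etc/mpich/machines.LINUX ./a.out
>
>If this small program fails with the same barrier/broadcast error, the
>problem is in the MPI setup across the nodes rather than in Amber.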
>
>On Mon, May 14, 2012 at 11:04 PM, Syed Tarique Moin <tarisyed.yahoo.com> wrote:
>
>> Hello,
>>
>> I have compiled MPICH2 and Amber12 with the Intel compilers successfully.
>> When I run the job with sander.MPI on a single node with multiple cores, it
>> runs without errors, but the same job run across multiple nodes gives the
>> following errors with both mpirun and mpiexec.
>>
>> Kindly guide me.
>>
>> Thanks and Regards
>>
>>
>> ------------------------------
>>
>> mpiexec -np 8 -machinefile /etc/mpich/machines.LINUX
>> $AMBERHOME/bin/sander.MPI -O -i sim_cmplx_1000.in -o test.out -p a.prmtop
>> -c sim_cmplx_1000_36.rst -r test.rst -x test.mdcrd -e test.mden &
>> [1] 4374
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>> Fatal error in PMPI_Barrier: Other MPI error, error stack:
>> PMPI_Barrier(425)...........: MPI_Barrier(MPI_COMM_WORLD) failed
>> MPIR_Barrier_impl(306)......:
>> MPIR_Bcast_impl(1321).......:
>> MPIR_Bcast_intra(1155)......:
>> MPIR_Bcast_binomial(213)....: Failure during collective
>> MPIR_Barrier_impl(292)......:
>> MPIR_Barrier_or_coll_fn(121):
>> MPIR_Barrier_intra(83)......:
>> dequeue_and_set_error(596)..: Communication error with rank 0
>>
>> [1]+  Exit 1                  mpiexec -np 8 -machinefile
>> /etc/mpich/machines.LINUX $AMBERHOME/bin/sander.MPI -O -i
>> sim_cmplx_1000.in -o test.out -p a.prmtop
>>
>> -------------------------------------------------------------
>> mpirun -np 8 -machinefile
>> /etc/mpich/machines.LINUX $AMBERHOME/bin/sander.MPI -O -i
>> sim_cmplx_1000.in -o test.out -p a.prmtop -c sim_cmplx_1000_36.rst -r
>> test.rst -x test.mdcrd -e test.mden &
>>
>>
>>
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>> Fatal error in PMPI_Barrier: Other MPI error, error stack:
>> PMPI_Barrier(425)...........: MPI_Barrier(MPI_COMM_WORLD) failed
>> MPIR_Barrier_impl(306)......:
>> MPIR_Bcast_impl(1321).......:
>> MPIR_Bcast_intra(1155)......:
>> MPIR_Bcast_binomial(213)....: Failure during collective
>> MPIR_Barrier_impl(292)......:
>> MPIR_Barrier_or_coll_fn(121):
>> MPIR_Barrier_intra(83)......:
>> dequeue_and_set_error(596)..: Communication error with rank 0
>>
>> -------------------------------------------------------------
>>
>>
>>
>> Syed Tarique Moin
>> Ph.D. Research Fellow,
>> International Center for Chemical and Biological Sciences,
>> University of Karachi
>
>
>
>--
>Jason M. Swails
>Quantum Theory Project,
>University of Florida
>Ph.D. Candidate
>352-392-4032
>
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue May 15 2012 - 03:30:05 PDT