Re: [AMBER] MPI error in running a multi-sander job for TI calculation

From: David A Case <david.case.rutgers.edu>
Date: Sun, 11 Feb 2018 08:30:12 -0500

On Sun, Feb 11, 2018, Ahsan Mohd wrote:

> I am getting an error in running a multi-sander job for TI calculation.
> I am running using command:
>
> mpirun -np 8 /root/amber16/bin/sander.MPI -ng 2 -groupfile step1.group
>
> and this is what I am getting:
>
> Running multisander version of sander Amber16
> Total processors = 8
> Number of groups = 2
>
> Fatal error in PMPI_Bcast: Other MPI error, error stack:
> PMPI_Bcast(1525)......: MPI_Bcast(buf=0x2b5c575076a8, count=2897,
> MPI_DOUBLE_PRECISION, root=0, comm=0x84000005) failed
> MPIR_Bcast_impl(1369).:
> MPIR_Bcast_intra(1160):
> MPIR_SMP_Bcast(1077)..: Failure during collective

Unfortunately, this is a low-level error message from the MPI library itself,
and it tells you little beyond the fact that a broadcast (MPI_Bcast) failed.
You'll have to do some debugging to narrow down the problem.

First, be sure (if you've not already done so) that the parallel tests PASS
for your installation. Pay particular attention to the tests in
$AMBERHOME/test/ti-*.

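Then try your TI calculation with -np 2 (so that each group has just a single
processor), and separately check a non-TI calculation with -np 8. That should
help show whether the failure is tied to the multi-group TI setup or to
running in parallel at all.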
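
The two runs just described might look like the sketch below. The sander.MPI
path and step1.group come from the original command; the file names md.in,
prmtop, inpcrd, md.out and restrt are placeholders for an ordinary (non-TI)
input set and are assumptions, not files from the post. The small run()
helper is also just for illustration: it only prints each command unless
RUN=1 is set, so the script can be checked before anything is launched.

```shell
#!/bin/sh
# Path taken from the original post; adjust for your installation.
SANDER=/root/amber16/bin/sander.MPI

# Hypothetical helper: print the command (dry run) unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "+ $*"
    fi
}

# 1. Same TI job, but with one processor per group (2 groups, 2 processes).
run mpirun -np 2 "$SANDER" -ng 2 -groupfile step1.group

# 2. An ordinary single-group (non-TI) run on all 8 processes.
#    md.in, prmtop, inpcrd, md.out and restrt are placeholder file names.
run mpirun -np 8 "$SANDER" -O -i md.in -p prmtop -c inpcrd -o md.out -r restrt
```

If the first run works but the original -np 8 job does not, the problem is
more likely in how the groups are split across processors; if the second run
also fails, suspect the MPI installation rather than the TI setup.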

...good luck....dac


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sun Feb 11 2018 - 06:00:10 PST