AMBER: Amber9 with MPICH2 failure at runtime

From: Sasha Buzko <obuzko.ucla.edu>
Date: Wed, 16 Apr 2008 14:43:07 -0700

Hi,
I installed and tested MPICH2 on several cluster nodes and compiled
Amber 9 with MKL support and static linking. make test.parallel went
fine, apart from a couple of possible failures (I haven't followed up
on those yet).
To test further, I used an example from an Amber tutorial (a piece of
DNA). When run with serial Amber, everything works and produces the
expected output. The parallel version, however, fails even when run on
a single node (one entry in the mpd.hosts file); the output is below.
I did view the resulting trajectory in Sirius, and it looked fine
except that it is incomplete compared with the serial output. Do you
have any suggestions as to why this might be happening in the parallel
version?

Thank you

Sasha


[sasha.node6 test]$ mpiexec -n 4 $AMBERHOME/exe/sander.MPI -O \
    -i /data/apps/amber/test/polyAT_vac_md1_nocut.in \
    -o /data/apps/amber/test/polyAT_vac_md1_nocut_mpich2.out \
    -c /data/apps/amber/test/polyAT_vac_init_min.rst \
    -p /data/apps/amber/test/polyAT_vac.prmtop \
    -r /data/apps/amber/test/polyAT_vac_md1_nocut_mpich2.rst \
    -x /data/apps/amber/test/polyAT_vac_md1_nocut_mpich2.mdcrd
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0[cli_0]:
aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2[cli_2]:
aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3[cli_3]:
aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1[cli_1]:
aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
 Frac coord min, max: -2.111647559080276E-005 0.999587572668685
 The system has extended beyond
     the extent of the virtual box.
 Restarting sander will recalculate
    a new virtual box with 30 Angstroms
    extra on each side, if there is a
    restart file for this configuration.
 SANDER BOMB in subroutine Routine: map_coords (ew_force.f)
 Atom out of bounds. If a restart has been written,
 restarting should resolve the error
 [the same message was printed, interleaved, by the other ranks]
rank 2 in job 2 node6.abicluster_39939 caused collective abort of all
ranks
  exit status of rank 2: return code 1
rank 0 in job 2 node6.abicluster_39939 caused collective abort of all
ranks
  exit status of rank 0: killed by signal 9
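(For what it's worth, the bomb message itself suggests one recovery
path: if the aborted run wrote the restart file named by -r above,
rerunning with that file as the new -c coordinates should make sander
rebuild the virtual box with 30 Angstroms of extra padding. A hedged
sketch, reusing the paths from the run above; the "_restart" output
names are invented to avoid clobbering the first run's files:)

```shell
# Restart from the .rst written by the aborted run (hypothetical names;
# per the SANDER BOMB message, restarting recalculates the virtual box
# with 30 Angstroms extra on each side).
mpiexec -n 4 $AMBERHOME/exe/sander.MPI -O \
    -i /data/apps/amber/test/polyAT_vac_md1_nocut.in \
    -o /data/apps/amber/test/polyAT_vac_md1_nocut_mpich2_restart.out \
    -c /data/apps/amber/test/polyAT_vac_md1_nocut_mpich2.rst \
    -p /data/apps/amber/test/polyAT_vac.prmtop \
    -r /data/apps/amber/test/polyAT_vac_md1_nocut_mpich2_restart.rst \
    -x /data/apps/amber/test/polyAT_vac_md1_nocut_mpich2_restart.mdcrd
```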


-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
Received on Fri Apr 18 2008 - 21:19:54 PDT