Hi,
I installed and tested MPICH2 on several cluster nodes, and compiled
amber9 with MKL support and static linking. make test.parallel went
fine, apart from a couple of possible failures (I haven't followed up
on those yet).
To test further, I used an example from an Amber tutorial (a piece of
DNA). Run with serial Amber, everything works and produces the expected
output. The parallel version, however, fails even when run on a single
node (one entry in the mpd.hosts file). The output is below. I viewed
the resulting trajectory in Sirius, and it looked fine, except that it
is incomplete compared with the serial version's output. Do you have
any suggestions as to why this might be happening in the parallel
version?
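To put a number on "incomplete", the line counts of the two trajectory
files can be compared. This is only a rough sketch: the serial file
name below is a hypothetical placeholder (only the parallel path is
taken from the mpiexec command), and count_lines is a helper made up
for illustration. Line count is just a proxy for the number of frames
written.

```shell
#!/bin/sh
# Print the number of lines in a file (rough proxy for frames written).
count_lines() {
    wc -l < "$1" | tr -d ' '
}

# ASSUMPTION: the serial trajectory name is hypothetical.
SERIAL=/data/apps/amber/test/polyAT_vac_md1_nocut.mdcrd
# Parallel trajectory path as passed to -x in the mpiexec command.
PARALLEL=/data/apps/amber/test/polyAT_vac_md1_nocut_mpich2.mdcrd

for f in "$SERIAL" "$PARALLEL"; do
    if [ -f "$f" ]; then
        echo "$f: $(count_lines "$f") lines"
    else
        echo "$f: not found"
    fi
done
```

A much smaller count for the parallel file would show how far the run
got before the abort.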
Thank you
Sasha
[sasha.node6 test]$ mpiexec -n 4 $AMBERHOME/exe/sander.MPI -O \
    -i /data/apps/amber/test/polyAT_vac_md1_nocut.in \
    -o /data/apps/amber/test/polyAT_vac_md1_nocut_mpich2.out \
    -c /data/apps/amber/test/polyAT_vac_init_min.rst \
    -p /data/apps/amber/test/polyAT_vac.prmtop \
    -r /data/apps/amber/test/polyAT_vac_md1_nocut_mpich2.rst \
    -x /data/apps/amber/test/polyAT_vac_md1_nocut_mpich2.mdcrd
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0[cli_0]:
aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2[cli_2]:
aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3[cli_3]:
aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1[cli_1]:
aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
Frac coord min, max: -2.111647559080276E-005 0.999587572668685
Frac coord min, max: -2.111647559080276E-005 0.999587572668685
The system has extended beyond
The system has extended beyond
the extent of the virtual box.
Restarting sander will recalculate
a new virtual box with 30 Angstroms
extra on each side, if there is a
the extent of the virtual box.
restart file for this configuration.
SANDER BOMB in subroutine Routine: map_coords (ew_force.f)
Atom out of bounds. If a restart has been written,
Restarting sander will recalculate
restarting should resolve the error
a new virtual box with 30 Angstroms
Frac coord min, max: -2.111647559080276E-005 0.999587572668685
The system has extended beyond
the extent of the virtual box.
Restarting sander will recalculate
a new virtual box with 30 Angstroms
extra on each side, if there is a
restart file for this configuration.
SANDER BOMB in subroutine Routine: map_coords (ew_force.f)
Atom out of bounds. If a restart has been written,
restarting should resolve the error
extra on each side, if there is a
restart file for this configuration.
SANDER BOMB in subroutine Routine: map_coords (ew_force.f)
Atom out of bounds. If a restart has been written,
restarting should resolve the error
rank 2 in job 2 node6.abicluster_39939 caused collective abort of all
ranks
exit status of rank 2: return code 1
rank 0 in job 2 node6.abicluster_39939 caused collective abort of all
ranks
exit status of rank 0: killed by signal 9
Received on Fri Apr 18 2008 - 21:19:54 PDT