Dear Ikuo,
This unfortunately looks like a hardware failure to me. The error is coming
from your MPI implementation (the mpiexec process manager), not from PMEMD.
You may have a node that is misbehaving (try a reboot) or a dodgy
interconnect cable.
Do any other MPI programs fail?
Try running over a completely different set of nodes and see if you see the
same error.
Also, does this error always occur at the same point in the output file or
is it random?
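
One rough way to check that last point, as a minimal Python sketch. It
assumes the mdout files from the aborted runs contain the usual
"NSTEP = ..." energy records; the function names and the script name are
made up for illustration and are not part of Amber.

```python
import re
import sys

# Matches the step counter in mdout energy records,
# e.g. " NSTEP =     5000   TIME(PS) =      10.000 ..."
NSTEP_RE = re.compile(r"NSTEP\s*=\s*(\d+)")


def last_nstep(mdout_text):
    """Return the last NSTEP value found in the text, or None if there is none."""
    steps = [int(m) for m in NSTEP_RE.findall(mdout_text)]
    return steps[-1] if steps else None


def summarize(last_steps):
    """Say whether the last-reached steps of several runs agree or differ."""
    reached = [s for s in last_steps if s is not None]
    if len(reached) < 2:
        return "not enough runs to compare"
    if len(set(reached)) == 1:
        return "same step every time"
    return "different steps"


if __name__ == "__main__":
    results = []
    for path in sys.argv[1:]:
        # errors="replace" tolerates stray bytes in a truncated output file
        with open(path, errors="replace") as fh:
            step = last_nstep(fh.read())
        results.append(step)
        print(f"{path}: last NSTEP = {step}")
    print(summarize(results))
```

Run it as, for example, "python last_step.py run1.out run2.out" (hypothetical
file names). The last step printed is only a lower bound on where the job
died, since energies are written at intervals and output may be buffered.
If the runs stop at clearly different steps, that points towards hardware
rather than the simulation itself.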
All the best
Ross
> -----Original Message-----
> From: kurisaki [mailto:kurisaki.ncube.human.nagoya-u.ac.jp]
> Sent: Wednesday, September 28, 2011 12:00 AM
> To: 'AMBER Mailing List'
> Subject: [AMBER] PMEMD job was aborted
>
> Dear Amber users and developers,
>
> Thank you for your continued support.
>
> I am now using pmemd.parallel from Amber 11, but my jobs keep
> aborting suddenly. The following error messages appeared:
>
> [mpiexec.rcc101] HYDU_sock_read (./utils/sock/sock.c:223): read errno (Input/output error)
> [mpiexec.rcc101] control_cb (./pm/pmiserv/pmiserv_cb.c:249): assert (!closed) failed
> [mpiexec.rcc101] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec.rcc101] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:206): error waiting for event
> [mpiexec.rcc101] main (./ui/mpich/mpiexec.c:404): process manager error waiting for completion
>
> Although all of the tests for the parallel build passed, Pmemd.mpi jobs
> are frequently aborted.
>
> I would be most grateful for any advice on resolving this problem.
>
> Yours sincerely,
>
> Ikuo Kurisaki
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Sep 28 2011 - 00:30:03 PDT