Hi Aragorn,

> I would like a bit of advice. I have compiled pmemd for a researcher
> here at Wayne State. When we run it it dies at varying points. We get
> various errors from mpirun such as:
> p8_15722: (1054.988281) net_recv failed for fd = 9
> p8_15722: p4_error: net_recv read, errno = : 110
> p14_21538: p4_error: Found a dead connection while looking for
> messages: 9

This looks like a hardware issue to me. Are you certain your interconnect is working properly? PMEMD can stress the interconnect far more than most codes, so problems such as flaky cables may not show up in other runs, especially if those codes use blocking MPI instead of the nonblocking MPI that pmemd uses.

I would suggest getting hold of a stress test suite for your MPI implementation and running it on the machine to check for hardware problems. A faulty link often shows up as low bandwidth for large messages in MPI bandwidth tests. I have seen this A LOT on InfiniBand, which can be VERY sensitive to flaky cable connections.
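
If you end up with bandwidth numbers for each pair of nodes, something like the rough Python sketch below can help pick out the suspect links. It assumes you have already collected (message size in bytes, MB/s) samples per node pair from whatever bandwidth test you run. The function name flag_slow_links, the min_size and threshold defaults, the host names and all of the numbers are made up for illustration and are not taken from any real benchmark or from your cluster.

```python
from statistics import median


def flag_slow_links(results, min_size=1 << 20, threshold=0.7):
    """Return node pairs whose large-message bandwidth looks abnormally low.

    results maps (host_a, host_b) to a list of (message_bytes, MB_per_s)
    samples. min_size and threshold are arbitrary illustrative defaults.
    """
    # Best bandwidth seen on each link for messages of at least min_size bytes
    large = {}
    for link, samples in results.items():
        big = [bw for size, bw in samples if size >= min_size]
        if big:
            large[link] = max(big)
    if not large:
        return []
    # Compare each link against the median link on the same machine
    typical = median(large.values())
    return sorted(link for link, bw in large.items() if bw < threshold * typical)


# Hypothetical measurements, for illustration only
example = {
    ("node01", "node02"): [(1024, 80.0), (1 << 20, 900.0), (4 << 20, 910.0)],
    ("node01", "node03"): [(1024, 79.0), (1 << 20, 310.0), (4 << 20, 295.0)],
    ("node02", "node03"): [(1024, 81.0), (1 << 20, 895.0), (4 << 20, 905.0)],
}
print(flag_slow_links(example))  # [('node01', 'node03')]
```

In this made-up data the small-message numbers look the same on every link, and only the large-message bandwidth between node01 and node03 stands out, which is the kind of pattern I would then chase down by reseating or swapping cables on those nodes.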

You should also check that you are linking against the correct MPI libraries, but I suspect hardware more than anything else.

Make sure you have applied all the latest bugfixes as well.

All the best

