AMBER: amber 10 and mpich2 (got eof on console error message from mpich2)

From: Vlad Cojocaru <>
Date: Thu, 17 Jul 2008 15:30:24 +0200

Dear amber users,

Maybe this is not the proper list for this question, but I searched all
the archives I could find (the mpich2 list as well) and found no answer. So
I am appealing to your experience with running MPI jobs.

As I reported before, I compiled AMBER 10 (including PMEMD) with MPICH2
(Intel compilers for both AMBER and MPICH2, no root). I did this on one
node (named 06-01), in a local directory available through the network.
Everything seemed fine and the executables (both sander.MPI and pmemd)
run nicely (the parallel performance of PMEMD is also quite good), so
I was very happy. However, in the beginning I only tested on the node I
compiled on (06-01) and on another one (06-02).

When I tried to run on a different node (05-02), I got an error:
mpiexec_node-05-02 (mpiexec 255): no msg recvd from mpd during version check

----------------------------command used
${MPI_HOME}/bin/mpiexec -gdb -machinefile machines -n 4 \
${AMBERHOME}/exe/pmemd -O -i .............

Trying to dissect this error, I started playing with the MPI daemons on
this node. I ran mpd and mpdtrace for diagnostics. To my surprise, mpdtrace
did not report the name of the node (as it correctly did previously on
06-01 and 06-02). Instead I got "mpdtrace (mpdtrace 57): got eof on
console". The full error message (shown below) suggests a connection
problem from node-05-02 to itself. However, I can ssh with a password
from 05-02 to itself.
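
The traceback points at a failed connect inside mpd_sockpair, so one way to
test the "node cannot reach itself" idea independently of mpd is a small
script like the one below. This is only a sketch under an assumption: that
the failure comes from the node being unable to reach a listener on the
address its own hostname resolves to. The function name check_self_connect
is hypothetical and is not part of MPICH2.

```python
import socket


def check_self_connect(host=None, timeout=5.0):
    """Listen on an ephemeral port, then connect to it via the host's own name.

    Hypothetical diagnostic helper (not part of MPICH2 or mpd).
    Returns a tuple (ok, detail).
    """
    host = host or socket.gethostname()
    try:
        # Resolve the name the same way any client on this node would.
        addr = socket.gethostbyname(host)
    except OSError as exc:
        return False, "cannot resolve %s: %s" % (host, exc)

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("", 0))  # all interfaces, port chosen by the OS
        listener.listen(1)
        port = listener.getsockname()[1]
        client.settimeout(timeout)
        # This is the step assumed to fail on the problem node.
        client.connect((addr, port))
        return True, "%s resolves to %s; self-connect on port %d succeeded" % (
            host, addr, port)
    except OSError as exc:
        return False, "%s resolves to %s; self-connect failed: %s" % (
            host, addr, exc)
    finally:
        client.close()
        listener.close()
```

Calling check_self_connect() with no arguments on node-05-02 and on 06-01
and comparing the two results would show whether the hostname resolves to
an address the node can actually reach. A timeout on 05-02 only would point
at name resolution or firewall settings on that node rather than at AMBER
or MPICH2.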

The nodes are AMD Opterons (05-02 has 2 dual-core CPUs, while
06-01 and 06-02 have 4 dual-core CPUs). OS=Debian Linux. I should also
say that there are some differences in the kernel between the 05-02 node
and the 06 nodes.

Has anybody seen such behavior before? If so, and you need more details,
please let me know which ones and I will provide them.

Best wishes

--full error message from mpdtrace -----
mpdtrace (mpdtrace 57): got eof on console
node-05-02_59965 (mpd_sockpair 226): connect 110 Connection timed out
node-05-02_59965 (mpd_sockpair 233): connect error with 110 Connection timed out
node-05-02_59965 (mpd_sockpair 244): connect 22 Invalid argument
node-05-02_59965: mpd_uncaught_except_tb handling:
  socket.error: (22, 'Invalid argument')
245 mpd_sockpair
        raise socket.error, errinfo
802 create_single_mem_ring
        self.lhsSock,self.rhsSock = mpd_sockpair()
848 enter_ring
    /scratch/node-06-01/cojocavd/Software/mpich2/bin/mpd 250 run
    /scratch/node-06-01/cojocavd/Software/mpich2/bin/mpd 1492 ?

Dr. Vlad Cojocaru
EML Research gGmbH
Schloss-Wolfsbrunnenweg 33
69118 Heidelberg
Tel: ++49-6221-533266
Fax: ++49-6221-533298
EML Research gGmbH
Amtgericht Mannheim / HRB 337446
Managing Partner: Dr. h.c. Klaus Tschira
Scientific and Managing Director: Prof. Dr.-Ing. Andreas Reuter
Received on Sun Jul 20 2008 - 06:07:25 PDT