A small correction:
I can ssh without a password from 05-02 to itself (in my previous email I
mistakenly wrote "with password" instead of "without password").
Sorry about that.
vlad
Vlad Cojocaru wrote:
> Dear amber users,
>
> Maybe this is not the proper list for this question, but I searched all
> the archives I could find (including the mpich2 list) and found no answer.
> So I am appealing to your experience with running MPI jobs.
>
> As I reported before, I compiled AMBER 10 (including PMEMD) with
> MPICH2 (Intel compilers for both AMBER and MPICH2, no root access). I did
> this on one node (named 06-01), in a local directory that is available
> over the network. Everything seemed fine, and the executables (both
> sander.MPI and pmemd) run nicely (the parallel performance of PMEMD
> is also quite good), so I was very happy. However, at the beginning I
> only tested on the node where I compiled (06-01) and on one other node (06-02).
>
> When I tried to run on a different node (05-02), I got an error:
> mpiexec_node-05-02 (mpiexec 255): no msg recvd from mpd during version check
>
> --command used -----
>
> ${MPI_HOME}/bin/mpiexec -gdb -machinefile machines -n 4 \
> ${AMBERHOME}/exe/pmemd -O -i .............
> -----
>
>
> Trying to dissect this error, I started experimenting with the MPI daemons
> on this node. I ran mpd and mpdtrace for diagnostics. To my surprise,
> mpdtrace did not report the name of the node (as it had correctly done
> on 06-01 and 06-02). Instead I got "mpdtrace (mpdtrace
> 57): got eof on console". The full error message (shown below)
> suggests a connection problem from node-05-02 to itself. However, I can
> ssh with password from 05-02 to itself.
>
> The nodes are AMD Opterons (05-02 has 2 dual-core CPUs, while
> 06-01 and 06-02 have 4 dual-core CPUs). The OS is Debian Linux. I should
> also say that there are some kernel differences between the 05-02
> node and the 06 nodes.
>
> Has anybody seen such behavior before? If so, and you need more details,
> please let me know which ones and I will provide them.
>
> Best wishes
> vlad
>
> --full error message from mpdtrace -----
> mpdtrace (mpdtrace 57): got eof on console
> node-05-02_59965 (mpd_sockpair 226): connect 110 Connection timed out
> node-05-02_59965 (mpd_sockpair 233): connect error with 110 Connection timed out
> node-05-02_59965 (mpd_sockpair 244): connect 22 Invalid argument
> node-05-02_59965: mpd_uncaught_except_tb handling:
> socket.error: (22, 'Invalid argument')
>
> /scratch/node-06-01/cojocavd/Software/mpich2-1.0.7-install/bin/mpdlib.py
> 245 mpd_sockpair
> raise socket.error, errinfo
>
> /scratch/node-06-01/cojocavd/Software/mpich2-1.0.7-install/bin/mpdlib.py
> 802 create_single_mem_ring
> self.lhsSock,self.rhsSock = mpd_sockpair()
>
> /scratch/node-06-01/cojocavd/Software/mpich2-1.0.7-install/bin/mpdlib.py
> 848 enter_ring
> rhsHandler=rhsHandler)
> /scratch/node-06-01/cojocavd/Software/mpich2/bin/mpd 250 run
> rhsHandler=self.handle_rhs_input)
> /scratch/node-06-01/cojocavd/Software/mpich2/bin/mpd 1492 ?
>
> mpd.run()
>
>
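For anyone who wants to reproduce the symptom outside of mpd: the traceback above ends in mpd_sockpair with "Connection timed out", which looks like the node failing to open a TCP connection to itself. One way to test that in isolation might be the sketch below. It is only a stand-in written with Python's standard socket module, not the mpd code itself; the function name check_self_connect and the assumption that the connection is attempted through the node's own hostname are mine and are not confirmed by the traceback.

```python
import socket


def check_self_connect(host=None, timeout=5.0):
    """Check whether this machine can open a TCP connection to itself.

    `host` defaults to the local hostname (an assumption about what a
    daemon on the node would use). Returns a dict describing the outcome.
    """
    host = host or socket.gethostname()
    result = {"host": host, "address": None, "connected": False, "error": None}

    # Step 1: does the name resolve to an IPv4 address at all?
    try:
        result["address"] = socket.gethostbyname(host)
    except OSError as exc:
        result["error"] = "name resolution failed: %s" % exc
        return result

    # Step 2: listen on an ephemeral port on all interfaces, then try to
    # connect to that port through the resolved address.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("", 0))
        listener.listen(1)
        port = listener.getsockname()[1]

        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        client.settimeout(timeout)
        try:
            client.connect((result["address"], port))
            result["connected"] = True
        except OSError as exc:
            # Covers timeouts, refused connections and unreachable addresses.
            result["error"] = "connect failed: %s" % exc
        finally:
            client.close()
    finally:
        listener.close()

    return result
```

Calling print(check_self_connect()) on 05-02 and on one of the 06 nodes and comparing the "address" and "error" fields should show whether the hostname resolves to an address that the node can actually reach on itself. A wrong /etc/hosts entry or a firewall rule would be possible causes to look at, but that is a guess on my side.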
--
----------------------------------------------------------------------------
Dr. Vlad Cojocaru
EML Research gGmbH
Schloss-Wolfsbrunnenweg 33
69118 Heidelberg
Tel: ++49-6221-533266
Fax: ++49-6221-533298
e-mail:Vlad.Cojocaru[at]eml-r.villa-bosch.de
http://projects.villa-bosch.de/mcm/people/cojocaru/
----------------------------------------------------------------------------
EML Research gGmbH
Amtgericht Mannheim / HRB 337446
Managing Partner: Dr. h.c. Klaus Tschira
Scientific and Managing Director: Prof. Dr.-Ing. Andreas Reuter
http://www.eml-r.org
----------------------------------------------------------------------------
-----------------------------------------------------------------------
The AMBER Mail Reflector
Received on Sun Jul 20 2008 - 06:07:25 PDT