Re: [AMBER] mpi with openmpi

From: Yan Gao <yan.gao.2001.gmail.com>
Date: Mon, 26 Jul 2010 19:38:35 -0700

Dear David,

You are right. Somehow the terminal I used could not log onto the node without
a passphrase, even though I had tested with ssh localhost before running qsub
and it worked.
Anyway, I re-ran the commands exec /usr/bin/ssh-agent $SHELL and ssh-add.

Now the "Permission denied (publickey,gssapi-with-mic,password)." message has
disappeared, but I get new errors and warnings:

*error: executing task of job 426882 failed: failed sending task to
execd.compute-1-0.local: can't find connection*
--------------------------------------------------------------------------
A daemon (pid 18123) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished

-catch_rsh
/opt/gridengine/default/spool/compute-2-27/active_jobs/426882.1/pe_hostfile
compute-2-27
compute-2-27
compute-1-0
compute-1-0
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.

*********************************************************************
Also, exec /usr/bin/ssh-agent $SHELL followed by ssh-add seems to work only
once: I have to redo it every time I open a new terminal. Am I doing
something wrong? Thanks.
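
On the ssh-agent question: an agent started with exec /usr/bin/ssh-agent $SHELL is visible only to that shell and its children, so a new terminal does not see it. Below is a minimal sketch of one common workaround, assuming a Bourne-style login shell (bash or sh; the "no job control" warning in the log hints that the job itself may run under csh/tcsh, where this syntax would not apply). The file name agent.env and both function names are hypothetical, not something from this thread.

```shell
# Sketch for a Bourne-style startup file such as ~/.bashrc.
# AGENT_ENV is a hypothetical location for the saved agent settings.
AGENT_ENV="${AGENT_ENV:-$HOME/.ssh/agent.env}"

agent_is_alive() {
    # ssh-add -l exits 0 (keys listed) or 1 (no keys) when an agent answers,
    # and 2 when no agent can be contacted.
    rc=0
    ssh-add -l >/dev/null 2>&1 || rc=$?
    [ "$rc" -le 1 ]
}

start_or_reuse_agent() {
    # Pick up the settings of an agent started from an earlier terminal.
    if [ -f "$AGENT_ENV" ]; then
        . "$AGENT_ENV" >/dev/null
    fi
    if ! agent_is_alive; then
        # No usable agent: start one, save its settings, and load the key once.
        (umask 077; ssh-agent -s > "$AGENT_ENV")
        . "$AGENT_ENV" >/dev/null
        ssh-add
    fi
}
```

Calling start_or_reuse_agent from the startup file would then prompt for the passphrase only when no running agent is found. Whether a batch job sees the agent is a separate matter that depends on how the scheduler passes the environment to the job.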

P.S. Just as I was about to send this message, the "Permission denied" error
came back in another run, so something seems unstable. I ran ssh to the node
afterwards and it worked without a passphrase. Any idea? I have used the full
paths.
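
To narrow down an intermittent "Permission denied", one option is to test non-interactive ssh to every allocated host from inside the job script, just before mpirun. This is a rough sketch only, assuming the host list is a text file with the host name in the first column (as in the listing above) and that its path is passed in by hand; unique_hosts and check_hosts are hypothetical helper names. BatchMode=yes makes ssh fail instead of prompting, which is closer to what happens inside a batch job than an interactive test.

```shell
unique_hosts() {
    # Print each host name (first column) once, ignoring blank lines.
    awk 'NF { print $1 }' "$1" | sort -u
}

check_hosts() {
    # Run a no-op command on every host without allowing any prompt.
    for host in $(unique_hosts "$1"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
               </dev/null >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}
```

For example, check_hosts /path/to/hostfile (a placeholder path) prints one line per distinct host, which shows whether the failure is tied to particular nodes or comes and goes on all of them.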

Regards,
Yan

On Mon, Jul 26, 2010 at 6:47 PM, David Watson <dewatson.olemiss.edu> wrote:

> First, make sure that you actually CAN log in w/o a passphrase:
> ssh localhost
>
> If so, everything is fine. If not, make sure that the
> permissions are correct on your home and ~/.ssh directories, on your
> ~/.ssh/authorized_keys file, and perhaps on every other file under ~/.ssh.
>
> Another gotcha is that if you set a passphrase, then you must use ssh-agent
> in order to facilitate logging in w/o a password (see the manual pages for
> ssh-agent for examples).
>
> Otherwise, if everything is fine then try the following:
> You may need to specify the full path to your openmpi executable (e.g.
> /path/to/mpirun) and also to sander.MPI (e.g. /path/to/amber/sander.MPI) in
> order for things to work correctly.
>
> Good luck
>
> On Jul 26, 2010, at 8:26 PM, Yan Gao wrote:
>
> > Hi there,
> >
> > I tried to run Amber with Open MPI on a Unix system.
> > I got the errors below in a trial run:
> >
> >
> >
> *********************************************************************************************************************
> > Permission denied, please try again.
> > Permission denied, please try again.
> > Permission denied (publickey,gssapi-with-mic,password).
> >
> --------------------------------------------------------------------------
> > A daemon (pid 17525) died unexpectedly with status 129 while attempting
> > to launch so we are aborting.
> >
> > There may be more information reported by the environment (see above).
> >
> > This may be because the daemon was unable to find all the needed shared
> > libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> > location of the shared libraries on the remote nodes and this will
> > automatically be forwarded to the remote nodes.
> >
> --------------------------------------------------------------------------
> >
> --------------------------------------------------------------------------
> > mpirun noticed that the job aborted, but has no info as to the process
> > that caused that situation.
> >
> --------------------------------------------------------------------------
> > mpirun: clean termination accomplished
> >
> > ********** this is in a separate output file **********
> > -catch_rsh
> >
> /opt/gridengine/default/spool/compute-0-19/active_jobs/426880.1/pe_hostfile
> > compute-0-19
> > compute-0-19
> > compute-0-18
> > compute-0-18
> > Warning: no access to tty (Bad file descriptor).
> > Thus no job control in this shell.
> >
> ######################################################################################################
> > /home/y1gao/soft/openmpi-1.4.2/bin/mpirun
> > libopen-rte.so.0 => /home/y1gao/soft/openmpi-1.4.2/lib/libopen-rte.so.0
> > (0x40001000)
> > libopen-pal.so.0 => /home/y1gao/soft/openmpi-1.4.2/lib/libopen-pal.so.0
> > (0x40078000)
> > libnuma.so.1 => /usr/lib/libnuma.so.1 (0x0077c000)
> > libdl.so.2 => /lib/libdl.so.2 (0x400d7000)
> > libnsl.so.1 => /lib/libnsl.so.1 (0x0080e000)
> > libutil.so.1 => /lib/libutil.so.1 (0x007c3000)
> > libm.so.6 => /lib/tls/libm.so.6 (0x00782000)
> > libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x007b9000)
> > libpthread.so.0 => /lib/tls/libpthread.so.0 (0x0089a000)
> > libc.so.6 => /lib/tls/libc.so.6 (0x0064f000)
> > libimf.so => /nas/y1gao/soft/intel-11.1.072/lib/ia32/libimf.so
> > (0x400dc000)
> > libsvml.so => /nas/y1gao/soft/intel-11.1.072/lib/ia32/libsvml.so
> > (0x40341000)
> > libintlc.so.5 => /nas/y1gao/soft/intel-11.1.072/lib/ia32/libintlc.so.5
> > (0x4046c000)
> > /lib/ld-linux.so.2 (0x00631000)
> > /nas/y1gao/soft/amber10/exe/sander.MPI
> > libsvml.so => /nas/y1gao/soft/intel-11.1.072/lib/ia32/libsvml.so
> > (0x40001000)
> > libmpi_f90.so.0 => /home/y1gao/soft/openmpi-1.4.2/lib/libmpi_f90.so.0
> > (0x4012b000)
> > libmpi_f77.so.0 => /home/y1gao/soft/openmpi-1.4.2/lib/libmpi_f77.so.0
> > (0x4012e000)
> > libmpi.so.0 => /home/y1gao/soft/openmpi-1.4.2/lib/libmpi.so.0
> > (0x40154000)
> > libopen-rte.so.0 => /home/y1gao/soft/openmpi-1.4.2/lib/libopen-rte.so.0
> > (0x40305000)
> > libopen-pal.so.0 => /home/y1gao/soft/openmpi-1.4.2/lib/libopen-pal.so.0
> > (0x4037d000)
> > libnuma.so.1 => /usr/lib/libnuma.so.1 (0x0077c000)
> > libdl.so.2 => /lib/libdl.so.2 (0x403dc000)
> > libnsl.so.1 => /lib/libnsl.so.1 (0x0080e000)
> > libutil.so.1 => /lib/libutil.so.1 (0x007c3000)
> > libm.so.6 => /lib/tls/libm.so.6 (0x00782000)
> > libpthread.so.0 => /lib/tls/libpthread.so.0 (0x0089a000)
> > libc.so.6 => /lib/tls/libc.so.6 (0x0064f000)
> > libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x007b9000)
> > libifport.so.5 => /nas/y1gao/soft/intel-11.1.072/lib/ia32/libifport.so.5
> > (0x403e1000)
> > libifcoremt.so.5 =>
> > /nas/y1gao/soft/intel-11.1.072/lib/ia32/libifcoremt.so.5 (0x40401000)
> > libimf.so => /nas/y1gao/soft/intel-11.1.072/lib/ia32/libimf.so
> > (0x40511000)
> > libintlc.so.5 => /nas/y1gao/soft/intel-11.1.072/lib/ia32/libintlc.so.5
> > (0x40776000)
> > /lib/ld-linux.so.2 (0x00631000)
> >
> ######################################################################################################
> >
> >
> >
> *****************************************************************************************************************
> >
> > I then googled "*Permission denied (publickey,gssapi-with-mic,password)*"
> > and set up the passphrase, so that I can log onto a node automatically
> > without entering the password/passphrase manually.
> > Then I tried again with MPI and got the same output. I am kind of stuck
> > here. Could anyone help me? Thanks!
> >
> > Regards,
> > --
> > Yan Gao
> > Jacobs School of Engineering
> > University of California, San Diego
> > Tel: 858-952-2308
> > Email: Yan.Gao.2001.gmail.com
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
>
>



-- 
Yan Gao
Jacobs School of Engineering
University of California, San Diego
Tel: 858-952-2308
Email: Yan.Gao.2001.gmail.com
Received on Mon Jul 26 2010 - 20:00:03 PDT