Re: [AMBER] mpi with openmpi

From: David Watson <dewatson.olemiss.edu>
Date: Mon, 26 Jul 2010 22:04:50 -0500

On Jul 26, 2010, at 9:38 PM, Yan Gao wrote:

> Dear David,
>
> You are right. Somehow the terminal I used can not log onto the node w/o
> passphrase. But I did use ssh localhost to test before qsub, and saw it
> worked.

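If you have a passphrase, then you will need to start ssh-agent and then run ssh-add before you try to execute your script. The agent has to be running first, otherwise ssh-add has nothing to hand the key to.

Try the following:
  echo `basename $SHELL`

If you see:
  "bash" then type:
    eval `ssh-agent -s`
  "csh" or "tcsh" then type:
    eval `ssh-agent -c`

Then, in either case, load your key into the agent (you will be asked for the passphrase once):
  ssh-add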

NOTE: this authentication only lasts for the terminal session in which you started the agent (and for programs launched from that session).
So, for instance, if you create a new shell in another window, you will have to do this all over again.
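
If retyping this gets tedious, the steps above could be wrapped up roughly as follows. This is only a sketch: agent_flag and ensure_agent are names I made up for illustration (they are not part of OpenSSH), and ensure_agent as written is for Bourne-style shells such as sh or bash only.

```shell
# Hypothetical helper: print the ssh-agent flag that matches a login shell.
# Pass it the shell path or name, e.g. agent_flag "$SHELL".
agent_flag() {
    case "$(basename "$1")" in
        csh|tcsh) echo "-c" ;;   # C-shell syntax (setenv ...)
        *)        echo "-s" ;;   # Bourne syntax (VAR=value; export VAR)
    esac
}

# Hypothetical helper for sh/bash: start an agent only when this session
# cannot reach one. ssh-add -l exits with status 2 in that case.
ensure_agent() {
    status=0
    ssh-add -l >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 2 ]; then
        eval "$(ssh-agent -s)" >/dev/null
    fi
}
```

In a bash session you would then run ensure_agent followed by ssh-add. A terminal started separately still needs its own agent unless it inherits SSH_AUTH_SOCK from the session that started one.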

If you have a public-key authentication mechanism set up, you might try to remove the passphrase from the key, although this weakens your security: anyone who gets hold of the private key file can then log in as you.
I will let you look this up on your own. (google: remove ssh passphrase dsa)

> <snip>
> Also the exec /usr/bin/ssh-agent $SHELL and ssh-add, seems can work for one
> time only. I have to re-do it next time I open a new terminal. Am I doing
> wrong? Thanks.
>

No, you are doing it right, but maybe it's not doing what you expected.

> ps. just as I am going to send this message, I found the "permission deny"
> error returned in another run.... seems something unstable?? I ssh the node
> afterwards and it worked w/o passphrase. Any idea? I have used the full
> paths.
>
> Regards,
> Yan
>

You have to use the full paths because you haven't set up your environment appropriately for your shell.
You seem to be using some type of PBS script, and that's fine.

I intentionally asked you to do what I did because I felt you might be having this type of error.
You need to edit your ~/.cshrc or ~/.bashrc (depending upon whether you are using csh or bash) so that when you execute:
   ssh REMOTENODE.YANGAOSCOMPUTER.COM which mpirun
you see /home/y1gao/soft/openmpi-1.4.2/bin/mpirun (or something very similar). Change REMOTENODE.YANGAOSCOMPUTER.COM in the example to the actual name or IP address of the remote node you are testing.

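You'll know something is wrong if you get something like "/usr/local/openmpi/bin/mpirun" or "/usr/bin/mpirun". In your specific case you have already compiled your own openmpi, so either of those would mean that a different, system-wide installation is being picked up on the remote node.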
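
For bash, the startup-file edit might look something like the sketch below. The prefix is the one shown in your ldd output, so adjust it if your install lives elsewhere. OMPI_HOME and mpirun_ok are names I picked for illustration, not anything openmpi defines.

```shell
# Candidate lines for ~/.bashrc (bash syntax; csh users need setenv instead).
# OMPI_HOME is an illustrative variable name; the path comes from the ldd output.
OMPI_HOME=/home/y1gao/soft/openmpi-1.4.2
export PATH="$OMPI_HOME/bin:$PATH"
# Prepend the library directory without leaving a stray ':' when the variable is empty.
export LD_LIBRARY_PATH="$OMPI_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Hypothetical helper: report whether a resolved mpirun path is the one we built.
mpirun_ok() {
    case "$1" in
        "$OMPI_HOME"/bin/mpirun) echo "ok" ;;
        *)                       echo "wrong mpirun: $1" ;;
    esac
}
```

You could then check a node with something like: mpirun_ok "`ssh REMOTENODE which mpirun`" (again substituting the real node name). One thing to watch for: some default ~/.bashrc files return early for non-interactive shells, so these lines may need to go above any such test for the ssh check to see them.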

> On Mon, Jul 26, 2010 at 6:47 PM, David Watson <dewatson.olemiss.edu> wrote:
>
>> First, make sure that you actually CAN log in w/o a passphrase:
>> ssh localhost
>>
>> If so, everything is fine, if not, you should make sure that the
>> permissions are correct on your ~/.ssh and home directories, and that
>> permissions are correct on your ~/.ssh/authorized_keys file and perhaps
>> every other file under ~/.ssh
>>
>> Another gotcha is that if you set a passphrase, then you must use ssh-agent
>> in order to facilitate logging in w/o a password (see the manual pages for
>> ssh-agent for examples).
>>
>> Otherwise, if everything is fine then try the following:
>> You may need to specify the full path to your openmpi executable (e.g.
>> /path/to/mpirun) and also to sander.MPI (e.g. /path/to/amber/sander.MPI) in
>> order for things to work correctly.
>>
>> Good luck
>>
>> On Jul 26, 2010, at 8:26 PM, Yan Gao wrote:
>>
>>> Hi there,
>>>
>>> I tried to run amber with openmpi on a unix system.
>>> I got below errors when I did a trial:
>>>
>>>
>>>
>> *********************************************************************************************************************
>>> Permission denied, please try again.
>>> Permission denied, please try again.
>>> Permission denied (publickey,gssapi-with-mic,password).
>>>
>> --------------------------------------------------------------------------
>>> A daemon (pid 17525) died unexpectedly with status 129 while attempting
>>> to launch so we are aborting.
>>>
>>> There may be more information reported by the environment (see above).
>>>
>>> This may be because the daemon was unable to find all the needed shared
>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have
>> the
>>> location of the shared libraries on the remote nodes and this will
>>> automatically be forwarded to the remote nodes.
>>>
>> --------------------------------------------------------------------------
>>>
>> --------------------------------------------------------------------------
>>> mpirun noticed that the job aborted, but has no info as to the process
>>> that caused that situation.
>>>
>> --------------------------------------------------------------------------
>>> mpirun: clean termination accomplished
>>>
>>> *****************************************************this is in a
>> separate
>>> output file************************************************************
>>> -catch_rsh
>>>
>> /opt/gridengine/default/spool/compute-0-19/active_jobs/426880.1/pe_hostfile
>>> compute-0-19
>>> compute-0-19
>>> compute-0-18
>>> compute-0-18
>>> Warning: no access to tty (Bad file descriptor).
>>> Thus no job control in this shell.
>>>
>> ######################################################################################################
>>> /home/y1gao/soft/openmpi-1.4.2/bin/mpirun
>>> libopen-rte.so.0 => /home/y1gao/soft/openmpi-1.4.2/lib/libopen-rte.so.0
>>> (0x40001000)
>>> libopen-pal.so.0 => /home/y1gao/soft/openmpi-1.4.2/lib/libopen-pal.so.0
>>> (0x40078000)
>>> libnuma.so.1 => /usr/lib/libnuma.so.1 (0x0077c000)
>>> libdl.so.2 => /lib/libdl.so.2 (0x400d7000)
>>> libnsl.so.1 => /lib/libnsl.so.1 (0x0080e000)
>>> libutil.so.1 => /lib/libutil.so.1 (0x007c3000)
>>> libm.so.6 => /lib/tls/libm.so.6 (0x00782000)
>>> libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x007b9000)
>>> libpthread.so.0 => /lib/tls/libpthread.so.0 (0x0089a000)
>>> libc.so.6 => /lib/tls/libc.so.6 (0x0064f000)
>>> libimf.so => /nas/y1gao/soft/intel-11.1.072/lib/ia32/libimf.so
>>> (0x400dc000)
>>> libsvml.so => /nas/y1gao/soft/intel-11.1.072/lib/ia32/libsvml.so
>>> (0x40341000)
>>> libintlc.so.5 => /nas/y1gao/soft/intel-11.1.072/lib/ia32/libintlc.so.5
>>> (0x4046c000)
>>> /lib/ld-linux.so.2 (0x00631000)
>>> /nas/y1gao/soft/amber10/exe/sander.MPI
>>> libsvml.so => /nas/y1gao/soft/intel-11.1.072/lib/ia32/libsvml.so
>>> (0x40001000)
>>> libmpi_f90.so.0 => /home/y1gao/soft/openmpi-1.4.2/lib/libmpi_f90.so.0
>>> (0x4012b000)
>>> libmpi_f77.so.0 => /home/y1gao/soft/openmpi-1.4.2/lib/libmpi_f77.so.0
>>> (0x4012e000)
>>> libmpi.so.0 => /home/y1gao/soft/openmpi-1.4.2/lib/libmpi.so.0
>>> (0x40154000)
>>> libopen-rte.so.0 => /home/y1gao/soft/openmpi-1.4.2/lib/libopen-rte.so.0
>>> (0x40305000)
>>> libopen-pal.so.0 => /home/y1gao/soft/openmpi-1.4.2/lib/libopen-pal.so.0
>>> (0x4037d000)
>>> libnuma.so.1 => /usr/lib/libnuma.so.1 (0x0077c000)
>>> libdl.so.2 => /lib/libdl.so.2 (0x403dc000)
>>> libnsl.so.1 => /lib/libnsl.so.1 (0x0080e000)
>>> libutil.so.1 => /lib/libutil.so.1 (0x007c3000)
>>> libm.so.6 => /lib/tls/libm.so.6 (0x00782000)
>>> libpthread.so.0 => /lib/tls/libpthread.so.0 (0x0089a000)
>>> libc.so.6 => /lib/tls/libc.so.6 (0x0064f000)
>>> libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x007b9000)
>>> libifport.so.5 => /nas/y1gao/soft/intel-11.1.072/lib/ia32/libifport.so.5
>>> (0x403e1000)
>>> libifcoremt.so.5 =>
>>> /nas/y1gao/soft/intel-11.1.072/lib/ia32/libifcoremt.so.5 (0x40401000)
>>> libimf.so => /nas/y1gao/soft/intel-11.1.072/lib/ia32/libimf.so
>>> (0x40511000)
>>> libintlc.so.5 => /nas/y1gao/soft/intel-11.1.072/lib/ia32/libintlc.so.5
>>> (0x40776000)
>>> /lib/ld-linux.so.2 (0x00631000)
>>>
>> ######################################################################################################
>>>
>>>
>>>
>> *****************************************************************************************************************
>>>
>>> I then google "*Permission denied (publickey,gssapi-with-mic,password)*",
>>> and setup the passphrase. So I can automatically log onto a node without
>>> inputting the password/passphrase manually.
>>> Then I tried again with mpi, and got the same output. I am kind of stuck
>>> here, could anyone help me. Thanks!
>>>
>>> Regards,
>>> --
>>> Yan Gao
>>> Jacobs School of Engineering
>>> University of California, San Diego
>>> Tel: 858-952-2308
>>> Email: Yan.Gao.2001.gmail.com
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Jul 26 2010 - 20:30:03 PDT