RE: [AMBER] mpirun

From: <wl2290.columbia.edu>
Date: Thu, 08 Jan 2009 17:33:52 -0500

Dear Ross and All,

Thanks for your help. I have made sure that PATH is set so that lamboot
can now be executed from any directory:

echo $PATH
--> /usr/bin:/bin:...:/opt/amber10/bin:/opt/amber10/exe
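Ross's first check (quoted below) asks whether lamboot and mpirun resolve to the copies in /opt/amber10/bin rather than to other copies elsewhere on PATH. Below is a minimal sketch of that check as a shell function. It assumes the intended LAM install is the one in /opt/amber10/bin, and the function name check_lam_tool is hypothetical: it is not part of LAM or Amber.

```shell
#!/bin/sh
# Hypothetical helper (not part of LAM or Amber): report which copy of a
# tool the shell will actually run, and flag it if it is not the copy
# under the expected directory (assumed here to be /opt/amber10/bin).
check_lam_tool() {
  tool=$1
  prefix=${2:-/opt/amber10/bin}
  # 'command -v' prints the path the shell would use, or fails if none.
  if ! found=$(command -v "$tool"); then
    echo "$tool: not found on PATH"
    return 2
  fi
  if [ "$found" = "$prefix/$tool" ]; then
    echo "$tool: OK ($found)"
  else
    echo "$tool: WARNING, resolves to $found instead of $prefix/$tool"
    return 1
  fi
}
```

For example: for t in lamboot lamd mpirun; do check_lam_tool "$t"; done. If lamd resolves somewhere other than /opt/amber10/bin, that would match Ross's suspicion (quoted below) that lamboot starts a different lamd.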

Then, running 'make test.parallel.MM < /dev/null' in /test/ gives the error below:
export TESTsander=/exe/sander.MPI; make test.sander.BASIC
make[1]: Entering directory `/usr/opt/amber10/test'
cd cytosine && ./Run.cytosine
in.md: Permission denied.
mpirun: cannot start /exe/sander.MPI on n0 (o): No such file or directory
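The program path in this error, /exe/sander.MPI, is what a path written as $AMBERHOME/exe/sander.MPI would expand to if AMBERHOME were empty in the shell that runs make. That reading is an assumption, since the test Makefile is not quoted here, but it is cheap to rule out. Below is a minimal sketch of the check. The function name check_amberhome is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: confirm that AMBERHOME is set and that the parallel
# sander binary exists under it. If AMBERHOME is empty, a path written as
# $AMBERHOME/exe/sander.MPI collapses to /exe/sander.MPI.
check_amberhome() {
  if [ -z "${AMBERHOME:-}" ]; then
    echo "AMBERHOME is not set"
    return 1
  fi
  if [ ! -x "$AMBERHOME/exe/sander.MPI" ]; then
    echo "no executable sander.MPI in $AMBERHOME/exe"
    return 1
  fi
  echo "OK: $AMBERHOME/exe/sander.MPI"
}
```

Run it in the same shell that runs make, because make picks up AMBERHOME from that shell's environment.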

I also tried running it directly in test/cytosine:

mpirun -np 4 sander.MPI -O -i in.md -c crd.md.23 -o cytosine.out

No output files were generated, and I got:

   Unit 6 Error on OPEN: cytosine.out
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 7085 failed on node n0 (127.0.0.1) with exit status 1.
-----------------------------------------------------------------------------
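Both "in.md: Permission denied" from Run.cytosine and sander's "Unit 6 Error on OPEN: cytosine.out" would be consistent with the test directory not being writable by the user running the tests. That is an assumption; the logs alone do not prove it. Below is a minimal sketch of the check. The function name check_writable is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check whether the current user can create files in a
# directory (default: the current one), as sander needs for its -o output.
check_writable() {
  dir=${1:-.}
  if [ -w "$dir" ]; then
    echo "writable: $dir"
  else
    echo "not writable by $(id -un) (or missing): $dir"
    return 1
  fi
}
```

For example, run check_writable in test/cytosine as the same user who runs mpirun.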

Files in amber10/bin:
-rwxr-xr-x 1 root root 296305 2009-01-06 15:22 addles
-rwxr-xr-x 1 root root 62445 2009-01-06 15:12 hboot
lrwxrwxrwx 1 root root 5 2009-01-06 15:13 hcc -> mpicc
lrwxrwxrwx 1 root root 5 2009-01-06 15:13 hcp -> mpiCC
lrwxrwxrwx 1 root root 6 2009-01-06 15:13 hf77 -> mpif77
-rwxr-xr-x 1 root root 194474 2009-01-06 15:12 lamboot
-rwxr-xr-x 1 root root 170819 2009-01-06 15:12 lamcheckpoint
-rwxr-xr-x 1 root root 99990 2009-01-06 15:12 lamclean
-rwxr-xr-x 1 root root 280842 2009-01-06 15:12 lamd
-rwxr-xr-x 1 root root 124152 2009-01-06 15:12 lamexec
-rwxr-xr-x 1 root root 210535 2009-01-06 15:12 lamgrow
-rwxr-xr-x 1 root root 89938 2009-01-06 15:12 lamhalt
-rwxr-xr-x 1 root root 698998 2009-01-06 15:12 laminfo
-rwxr-xr-x 1 root root 94986 2009-01-06 15:12 lamnodes
-rwxr-xr-x 1 root root 170816 2009-01-06 15:12 lamrestart
-rwxr-xr-x 1 root root 99665 2009-01-06 15:12 lamshrink
-rwxr-xr-x 1 root root 99368 2009-01-06 15:12 lamtrace
-rwxr-xr-x 1 root root 194194 2009-01-06 15:12 lamwipe
-rwxr-xr-x 1 root root 316 2009-01-06 15:22 lmodprmtop
-rwxr-xr-x 1 root root 61509 2009-01-06 15:13 mpic++
-rwxr-xr-x 1 root root 61506 2009-01-06 15:13 mpicc
lrwxrwxrwx 1 root root 6 2009-01-06 15:13 mpiCC -> mpic++
-rwxr-xr-x 1 root root 19941 2009-01-06 15:12 mpiexec
-rwxr-xr-x 1 root root 61509 2009-01-06 15:13 mpif77
-rwxr-xr-x 1 root root 118025 2009-01-06 15:12 mpimsg
-rwxr-xr-x 1 root root 228654 2009-01-06 15:12 mpirun
-rwxr-xr-x 1 root root 117102 2009-01-06 15:12 mpitask
-rwxr-xr-x 1 root root 199219 2009-01-06 15:19 ncdump
-rwxr-xr-x 1 root root 189929 2009-01-06 15:12 recon
-rwxr-xr-x 1 root root 5577502 2009-01-06 15:22 sander.LES.MPI
-rwxr-xr-x 1 root root 5500906 2009-01-06 15:22 sander.MPI
-rwxr-xr-x 1 root root 57651 2009-01-06 15:12 tkill
-rwxr-xr-x 1 root root 99073 2009-01-06 15:12 tping
lrwxrwxrwx 1 root root 7 2009-01-06 15:12 wipe -> lamwipe

I was hoping that you could help.

Thank you!
Wen

Quoting Ross Walker <ross.rosswalker.co.uk>:

> Hi Wen
>
>> I am testing parallel programs which have been installed on our Linux
>> cluster:
>>
>> ls -l /opt/amber10/bin/mpirun
>> -rwxr-xr-x 1 root root 228654 2009-01-06 15:12 /opt/amber10/bin/mpirun
>> ls -l /opt/amber10/exe/sander.MPI
>> -rwxr-xr-x 1 root root 5500906 2009-01-06 15:22 /opt/amber10/exe/sander.MPI
>>
>> Run test/cytosine> /opt/amber10/bin/mpirun -np 4 /opt/amber10/exe/sander.MPI -O -i in.md -c crd.md.23 -o cytosine.out
>>
>> --> no lamd running on the host
>>
>> run /opt/amber10/bin/lamboot
>>
>> --> LAM 7.1.3 - Indiana University
>>
>> then ran the test again and got the same message, "no lamd running on
>> the host"
>
> This suggests a problem with the configuration on your machine. What does
> the 'run' command you list above actually do? It is running it on your local
> machine, yes?
>
> I would try a few simple things to check this.
>
> 1) Check your path and make sure mpirun and lamboot are the correct ones (in
> /opt/...) and not in /usr/bin etc.
>
> You can use: which lamboot
>
> to see what it returns.
>
> If need be, add /opt/amber10/bin/ to the 'beginning' of your PATH in your
> login files (such as .bashrc).
>
> 2) Check that MPI_HOME points to /opt/amber10/
>
> 3) Run 'ps aux' and see if any copies of lamd or lamboot are running and
> kill them if they are.
>
> 4) As a regular user (NOT ROOT, since lamboot cannot be run as root), do the
> following:
>
> lamboot
> mpirun -np 2 ls
>
> You should get 2 copies of ls running, which will return 2 directory
> listings. If this works, then you can try running an Amber simulation again.
>
> You could also see if lamboot has a verbose mode you can run it in -
> something like lamboot -v (I don't have lamboot installed on any of my
> machines to check unfortunately).
>
> I suspect, though, that your problem lies either in the version of lamboot
> that is running not matching the mpirun command (due to path issues), or in
> the correct lamboot running but starting a different version of lamd (due to
> path and MPI_HOME issues). Then, when you run mpirun, lamd quits silently
> and you are presented with the 'lamd not running' error.
>
> Just a guess - but it should give you some things to try.
>
> All the best
> Ross
>
>
> /\
> \/
> |\oss Walker
>
> | Assistant Research Professor |
> | San Diego Supercomputer Center |
> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> | http://www.rosswalker.co.uk | PGP Key available on request |
>
> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
> be read every day, and should not be used for urgent or sensitive issues.
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
>



Received on Fri Jan 09 2009 - 01:23:34 PST