Dear Ross and All,
Thanks for your help by email. I have made sure that PATH is set so
that lamboot can now be executed from any directory:
echo $PATH
--> /usr/bin:/bin:...:/opt/amber10/bin:/opt/amber10/exe
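Since echo $PATH only shows the search order, it may help to confirm which lamboot, mpirun and lamd the shell actually picks up. A minimal sketch; show_resolved is a hypothetical helper name, not part of LAM or Amber:

```shell
# Print the full path the shell would run for each named tool,
# or a note if the tool is not found on PATH.
show_resolved() {
  for tool in "$@"; do
    if resolved=$(command -v "$tool" 2>/dev/null); then
      printf '%s -> %s\n' "$tool" "$resolved"
    else
      printf '%s -> not found on PATH\n' "$tool"
    fi
  done
}

show_resolved lamboot mpirun lamd
```

If the three tools do not all resolve under the same install prefix, that would fit the mismatch Ross describes in the quoted message.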
Then, in the test directory, make test.parallel.MM < /dev/null gives the error below:
export TESTsander=/exe/sander.MPI; make test.sander.BASIC
make[1]: Entering directory `/usr/opt/amber10/test'
cd cytosine && ./Run.cytosine
in.md: Permission denied.
mpirun: cannot start /exe/sander.MPI on n0 (o): No such file or directory
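The path /exe/sander.MPI in the output above is what a Makefile line such as $(AMBERHOME)/exe/sander.MPI would produce if AMBERHOME were empty when make ran. That reading is an assumption, not something confirmed in this thread. A small check, with check_amber_home as a hypothetical helper:

```shell
# Report whether an Amber install root looks usable: the value is
# non-empty and exe/sander.MPI exists and is executable beneath it.
check_amber_home() {
  root=$1
  if [ -z "$root" ]; then
    echo "AMBERHOME is empty or unset"
    return 1
  fi
  if [ -x "$root/exe/sander.MPI" ]; then
    echo "found $root/exe/sander.MPI"
  else
    echo "no executable at $root/exe/sander.MPI"
    return 1
  fi
}

# Check the value in the current environment; do not abort on failure.
check_amber_home "${AMBERHOME:-}" || true
```

If it reports an empty value, setting AMBERHOME to the install directory (/opt/amber10, going by the PATH above) before running make may be worth trying.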
I also tried, in test/cytosine:
mpirun -np 4 sander.MPI -O -i in.md -c crd.md.23 -o cytosine.out
(no output files were generated)
Unit 6 Error on OPEN: cytosine.out
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
PID 7085 failed on node n0 (127.0.0.1) with exit status 1.
-----------------------------------------------------------------------------
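Both "in.md: Permission denied" and "Unit 6 Error on OPEN: cytosine.out" would be consistent with the test directory not being writable by the user running the test. This is a guess from the messages alone, not a confirmed diagnosis. A quick check, with can_write_here as a hypothetical helper:

```shell
# Say whether the current user can create files in a directory.
# Defaults to the current directory when no argument is given.
can_write_here() {
  dir=${1:-.}
  if [ -d "$dir" ] && [ -w "$dir" ]; then
    echo "writable: $dir"
  else
    echo "not writable: $dir"
    return 1
  fi
}

# Run from test/cytosine to check the directory the test writes into.
can_write_here . || true
```

If the directory turns out not to be writable, running the tests from a copy of the test tree that the user owns may be one option.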
files in amber10/bin:
-rwxr-xr-x 1 root root 296305 2009-01-06 15:22 addles
-rwxr-xr-x 1 root root 62445 2009-01-06 15:12 hboot
lrwxrwxrwx 1 root root 5 2009-01-06 15:13 hcc -> mpicc
lrwxrwxrwx 1 root root 5 2009-01-06 15:13 hcp -> mpiCC
lrwxrwxrwx 1 root root 6 2009-01-06 15:13 hf77 -> mpif77
-rwxr-xr-x 1 root root 194474 2009-01-06 15:12 lamboot
-rwxr-xr-x 1 root root 170819 2009-01-06 15:12 lamcheckpoint
-rwxr-xr-x 1 root root 99990 2009-01-06 15:12 lamclean
-rwxr-xr-x 1 root root 280842 2009-01-06 15:12 lamd
-rwxr-xr-x 1 root root 124152 2009-01-06 15:12 lamexec
-rwxr-xr-x 1 root root 210535 2009-01-06 15:12 lamgrow
-rwxr-xr-x 1 root root 89938 2009-01-06 15:12 lamhalt
-rwxr-xr-x 1 root root 698998 2009-01-06 15:12 laminfo
-rwxr-xr-x 1 root root 94986 2009-01-06 15:12 lamnodes
-rwxr-xr-x 1 root root 170816 2009-01-06 15:12 lamrestart
-rwxr-xr-x 1 root root 99665 2009-01-06 15:12 lamshrink
-rwxr-xr-x 1 root root 99368 2009-01-06 15:12 lamtrace
-rwxr-xr-x 1 root root 194194 2009-01-06 15:12 lamwipe
-rwxr-xr-x 1 root root 316 2009-01-06 15:22 lmodprmtop
-rwxr-xr-x 1 root root 61509 2009-01-06 15:13 mpic++
-rwxr-xr-x 1 root root 61506 2009-01-06 15:13 mpicc
lrwxrwxrwx 1 root root 6 2009-01-06 15:13 mpiCC -> mpic++
-rwxr-xr-x 1 root root 19941 2009-01-06 15:12 mpiexec
-rwxr-xr-x 1 root root 61509 2009-01-06 15:13 mpif77
-rwxr-xr-x 1 root root 118025 2009-01-06 15:12 mpimsg
-rwxr-xr-x 1 root root 228654 2009-01-06 15:12 mpirun
-rwxr-xr-x 1 root root 117102 2009-01-06 15:12 mpitask
-rwxr-xr-x 1 root root 199219 2009-01-06 15:19 ncdump
-rwxr-xr-x 1 root root 189929 2009-01-06 15:12 recon
-rwxr-xr-x 1 root root 5577502 2009-01-06 15:22 sander.LES.MPI
-rwxr-xr-x 1 root root 5500906 2009-01-06 15:22 sander.MPI
-rwxr-xr-x 1 root root 57651 2009-01-06 15:12 tkill
-rwxr-xr-x 1 root root 99073 2009-01-06 15:12 tping
lrwxrwxrwx 1 root root 7 2009-01-06 15:12 wipe -> lamwipe
I was hoping that you could help.
Thank you!
Wen
Quoting Ross Walker <ross.rosswalker.co.uk>:
> Hi Wen
>
>> I am testing parallel programs which have been installed on our linux
>> cluster:
>>
>> ls -l /opt/amber10/bin/mpirun
>> -rwxr-xr-x 1 root root 228654 2009-01-06 15:12 /opt/amber10/bin/mpirun
>> ls -l /opt/amber10/exe/sander.MPI
>> -rwxr-xr-x 1 root root 5500906 2009-01-06 15:22
>> /opt/amber10/exe/sander.MPI
>>
>> Run test/cytosine> /opt/amber10/bin/mpirun -np 4
>> /opt/amber10/exe/sander.MPI -O -i in.md -c crd.md.23 -o cytosine.out
>>
>> --> no lamd running on the host
>>
>> run /opt/amber10/bin/lamboot
>>
>> --> LAM 7.1.3 - Indiana University
>>
>> then run the test again, and got the same message "no lamd running on
>> the host"
>
> This suggests a problem with the configuration on your machine. What does
> the 'run' command you list above actually do? It is running on your local
> machine, yes?
>
> I would try a few simple checks.
>
> 1) Check your path and make sure mpirun and lamboot are the correct ones (in
> /opt/...) and not in /usr/bin etc.
>
> You can use: which lamboot
>
> to see what it returns.
>
> If need be add: /opt/amber10/bin/ to the 'beginning' of your path in your
> login files (such as .bashrc)
>
> 2) Check MPI_HOME points to /opt/amber10/
>
> 3) Run 'ps aux' and see if any copies of lamd or lamboot are running and
> kill them if they are.
>
> 4) As a regular user (NOT ROOT since lamboot cannot be run as root) do the
> following:
>
> lamboot
> mpirun -np 2 ls
>
> You should get 2 copies of ls run which will return 2 directory listings. If
> this works then you can try again running an amber simulation.
>
> You could also see if lamboot has a verbose mode you can run it in -
> something like lamboot -v (I don't have lamboot installed on any of my
> machines to check unfortunately).
>
> I suspect, though, that your problem is one of two things: either the lamboot
> that is running does not match the mpirun command (due to path issues), or the
> correct lamboot is running but it starts a different version of lamd (due to
> path and MPI_HOME issues). Then, when you run mpirun, lamd quits silently and
> you are presented with the 'no lamd running' error.
>
> Just a guess - but it should give you some things to try.
>
> All the best
> Ross
>
>
> /\
> \/
> |\oss Walker
>
> | Assistant Research Professor |
> | San Diego Supercomputer Center |
> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> | http://www.rosswalker.co.uk | PGP Key available on request |
>
> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
> be read every day, and should not be used for urgent or sensitive issues.
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
>
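Step 3 in the quoted message (look for leftover lamd or lamboot processes) can be sketched as a filter over ps aux output. This assumes the usual ps aux column layout, where field 2 is the PID and field 11 is the command; list_lam_procs is a hypothetical helper name:

```shell
# Read `ps aux` output on stdin and print "PID command" for every
# process whose executable name is exactly lamd or lamboot.
list_lam_procs() {
  awk '$11 ~ /(^|\/)(lamd|lamboot)$/ { print $2, $11 }'
}

# List any leftover LAM processes; prints nothing if there are none.
ps aux 2>/dev/null | list_lam_procs || true

# Once nothing stale is left, step 4 from the quoted message follows
# (as a regular user): lamboot, then mpirun -np 2 ls
```

Matching on the command field only avoids false hits such as an editor that has a file named lamd.txt open.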
Received on Fri Jan 09 2009 - 01:23:34 PST