Dear Ross,
Thank you for your quick response! After some additional
diagnostics, I discovered I could not run simple test cases either. I
emailed MPICH2 about this problem and was told that there was a bug in
version 1.0.5p2 (see thread below). I upgraded to version 1.0.5p4 and
the issue is now resolved. I wanted to pass along the information in
case any other users were having problems.
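For anyone retracing the checks suggested further down in this thread
(logging in to each node in the machine file without a password), here is a
minimal Python sketch. It rests on assumptions: the machine file is taken to
hold one "host" or "host:ncpus" entry per line with "#" comments, ssh is the
remote shell in use, and the helper names parse_machinefile, ssh_check_command
and check_hosts are made up for illustration. It only tests non-interactive
login; it does not detect the machinefile bug itself.

```python
import subprocess


def parse_machinefile(text):
    """Return host names, assuming one 'host' or 'host:ncpus' entry per line."""
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()    # drop comments and surrounding blanks
        if not line:
            continue
        entry = line.split()[0]                # ignore anything after the first field
        hosts.append(entry.split(":", 1)[0])   # strip an optional ':ncpus' suffix
    return hosts


def ssh_check_command(host, timeout_s=5):
    """Build an ssh command that fails rather than prompting for a password."""
    return ["ssh", "-o", "BatchMode=yes",
            "-o", f"ConnectTimeout={timeout_s}", host, "true"]


def check_hosts(hosts, timeout_s=5):
    """Map each host to True if a non-interactive ssh login succeeded."""
    results = {}
    for host in hosts:
        try:
            proc = subprocess.run(
                ssh_check_command(host, timeout_s),
                stdin=subprocess.DEVNULL,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s + 5,         # hard stop in case ssh itself stalls
            )
            results[host] = proc.returncode == 0
        except (subprocess.TimeoutExpired, OSError):
            results[host] = False
    return results
```

Calling check_hosts(parse_machinefile(open("machines").read())) returns a
dictionary of host names to True/False. A False entry points to a login or
connectivity problem on that node rather than to MPICH2.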
Sincerely,
Terry
//////////////////////////////////////////////
Are you using the latest patch release 1.0.5p4? It has a fix in it for a
problem with machinefile. See if it helps.
Rajeev
> -----Original Message-----
> From: Terry Lang [mailto:terry.lego.berkeley.edu]
> Sent: Monday, April 23, 2007 1:25 PM
> To: mpich2-maint.mcs.anl.gov
> Cc: mpich2-maint.mcs.anl.gov
> Subject: [MPICH2 Req #3371] problems with machinefile
>
> Dear MPICH2 Support Desk,
>
> I am trying to run software using mpd with a machinefile. I have run
> the following diagnostics:
>
> % mpiexec -n 2 /programs/mpich2/examples/cpi
>
> With the results:
>
> Process 0 of 2 is on cyclops
> Process 1 of 2 is on sabretooth
> pi is approximately 3.1415926544231265, Error is 0.0000000008333334
> wall clock time = 0.001908
> %
>
> However, when I run this command:
>
> % mpiexec -machinefile machines -n 2 /programs/mpich2/examples/cpi
>
> I get two lines of output....
>
> Process 0 of 2 is on cyclops
> Process 1 of 2 is on rogue
>
> ...and then the process hangs indefinitely. Here are the details of
> the system I am using:
>
> Intel compiler version 9.1
> MPICH2-1.0.5p2
> Debian OS on i686 Linux boxes
>
> Any insight would be greatly appreciated!
>
> Sincerely,
> Terry
>
>
Ross Walker wrote:
> Hi Terry,
>
>
>
>> mpiexec -machinefile <my_path>/machines -np 4 sander.MPI -O \
>> -i run_gb.in -o run_gb.out -p ptpb_gb.prm \
>> -c run_gb.rst -r run_gb.rst -x run_gb.crd < /dev/null
>>
>> but, when launched, the job just hangs. I am able to launch the job
>> successfully onto 4 processors using the command without the machine
>> file option. Has anyone had a similar experience?
>>
>
>
>> MPICH2-1.0.5p2
>> Debian OS on i686 Linux boxes
>>
>
> I suspect that this may be a permission issue with your system.
>
> Can you run any simple MPICH2 test cases, and do other MPI jobs run? If
> you rsh or ssh to one of the nodes specified in the machine file, can you do
> so without a password? Perhaps this part of the operation is just hanging. I
> assume you don't see any output from sander.MPI, which would imply it never
> even gets started.
>
> The reason it probably works on 4 local processors is that the mpiexec
> command doesn't have to rsh/ssh to another node to fire up the necessary
> executables.
>
> I would go back to the MPICH2 setup and make sure you can run some test jobs
> correctly and have set up your system as specified in the installation
> instructions for MPICH2.
>
> All the best
> Ross
>
> /\
> \/
> |\oss Walker
>
> | HPC Consultant and Staff Scientist |
> | San Diego Supercomputer Center |
> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> | http://www.rosswalker.co.uk | PGP Key available on request |
>
> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
> be read every day, and should not be used for urgent or sensitive issues.
>
>
> -----------------------------------------------------------------------
> The AMBER Mail Reflector
> To post, send mail to amber.scripps.edu
> To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
>
--
P. Therese Lang
Post Doc
Alber Lab, UC Berkeley
Received on Wed Apr 25 2007 - 06:07:38 PDT