Re: [AMBER] AMBER on a Mac Mini cluster PoC

From: Alan <alanwilter.gmail.com>
Date: Sun, 16 Aug 2009 23:30:19 +0100

Just to let you know, Xgrid support is broken in the OpenMPI 1.3.x series.
But that is not necessarily the cause of the issues you're having.
Alan

On Sun, Aug 16, 2009 at 18:08, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi Abdul,
>
> This looks to be an issue with your MPI installation rather than a problem
> with AMBER per se. Have you tested your OpenMPI installation? MPI
> distributions often ship with test suites, or you can download one to check
> that the installation works.
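
One quick check that does not involve AMBER at all is to have mpirun start a plain `hostname` on every slot. The sketch below only builds and prints that command so it can be inspected first. The `MACHINEFILE` default and the function name `mpi_smoke_cmd` are illustrative assumptions, not taken from the original post.

```shell
# Sketch: build an mpirun command that launches `hostname` on each slot.
# MACHINEFILE is an assumed location; point it at your own machine file.
MACHINEFILE="${MACHINEFILE:-./machinefile}"

# Print the launch command for a given process count (hypothetical helper).
mpi_smoke_cmd() {
    # $1 = number of processes to start
    echo "mpirun -machinefile $MACHINEFILE -np $1 hostname"
}

# Show the command for 4 processes; run it by hand once it looks right.
mpi_smoke_cmd 4
```

If the printed command, run by hand, lists each host name the expected number of times, process launch over ssh is probably working, and the problem is more likely in the MPI traffic between the processes.
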
>
> Also, I note that the error message mentions TCP connections, which to me
> implies you are using ethernet. Is this correct? If so, then you are unlikely
> to see any benefit from running in parallel, at least for regular MD runs.
> Ensemble runs such as replica exchange might work. Unfortunately, these days
> ethernet is too slow to be of much help in parallel. So you could try
> running in parallel with 2 processes on one machine, i.e. just one entry in
> your machine file and -np 2. Then run the tests and see how they do. This
> will at least test your actual parallel installation.
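
A minimal sketch of that single-machine setup, assuming a Bourne-style shell. The AMBERHOME default follows the paths in the original post; the machine file name `machinefile.single`, the `SINGLE_MF` variable and the `localhost` entry are assumptions made for illustration.

```shell
# Sketch: limit the AMBER parallel tests to one machine and 2 MPI processes.
AMBERHOME="${AMBERHOME:-/Users/Shared/amber10}"
# Hypothetical location for a one-entry machine file.
SINGLE_MF="${SINGLE_MF:-${TMPDIR:-/tmp}/machinefile.single}"

# One entry only: the local machine.
echo localhost > "$SINGLE_MF"

# DO_PARALLEL is the launcher prefix used by the AMBER test scripts.
export DO_PARALLEL="mpirun -machinefile $SINGLE_MF -np 2"
echo "$DO_PARALLEL"

# Then, by hand:
#   cd "$AMBERHOME/test" && make test.parallel
```

If the tests pass this way but hang with both machines listed, that points at the connection between the nodes rather than at the parallel build itself.
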
>
> Then you can try running some simulations yourself (or perhaps using the
> benchmarks here http://ambermd.org/amber10.bench1.html) across both nodes so
> you can compare performance. My guess is that it will probably actually be
> slower using 2 nodes rather than 1, but you might be lucky. It will certainly
> be easier to debug MPI problems (often caused by flaky interconnect
> hardware) when running a job yourself rather than inside the test
> environment.
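
To make the 1-node versus 2-node comparison concrete, the two wall-clock times can be reduced to a single speedup figure. This is only a sketch: the helper name `speedup` is made up here, and the numbers in the example call are placeholders, not measurements.

```shell
# Sketch: speedup = (wall time on 1 node) / (wall time on 2 nodes).
speedup() {
    # $1 = seconds for the 1-node run, $2 = seconds for the 2-node run
    awk -v t1="$1" -v t2="$2" 'BEGIN { printf "%.2f\n", t1 / t2 }'
}

# Placeholder timings: 120 s on one node, 150 s on two nodes.
speedup 120 150   # prints 0.80
```

A value below 1.00 means the 2-node run was slower than the 1-node run. The timings themselves could come from running each job under the shell's `time` keyword.
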
>
> Good luck,
> Ross
>
> > -----Original Message-----
> > From: amber-bounces.ambermd.org [mailto:amber-bounces.ambermd.org] On
> > Behalf Of Abdul Rehman Gani
> > Sent: Sunday, August 16, 2009 8:52 AM
> > To: amber.ambermd.org
> > Subject: [AMBER] AMBER on a Mac Mini cluster PoC
> >
> > Hi,
> >
> > I have installed Amber 10 with OpenMPI 1.3.3 on a 'cluster' of 2 Mac
> > minis (Intel) as a proof of concept. I compiled Amber using gcc and
> > gfortran.
> >
> > I was able to build the serial version and ran make test successfully.
> > I was also able to build the parallel version, but am having some
> > trouble with the test.
> >
> > Although the 2 Macs are configured using Xgrid, I also have
> > password-less ssh and OpenMPI is using that. I have not figured out how
> > to convince Amber to use Xgrid (using the DO_PARALLEL environment
> > variable perhaps?)
> >
> > Currently I have compiled Amber and OpenMPI on one Mac and copied both
> > folders to the other Mac. gfortran is installed on both. I then set up
> > the environment:
> >
> > AMBERHOME=/Users/Shared/amber10
> > DO_PARALLEL=mpirun -machinefile /Users/Shared/amber10/test/machinefile -np 4
> >
> > When I ran make test.parallel I got a successful first test
> > (Run.cytosine), but then the second test (Run.nonper) seems to go on
> > forever. I stopped the first run after 14 hours and it successfully shut
> > down all the sander.MPI processes (2 on each machine). My second run is
> > currently at 30:51 (CPU time) for the 2nd test.
> >
> > This is the output of the test thus far:-
> >
> > xmini101:test admin$ make test.parallel
> > export TESTsander=/Users/Shared/amber10/exe/sander.MPI; make test.sander.BASIC
> > cd cytosine && ./Run.cytosine
> > diffing cytosine.out.save with cytosine.out
> > PASSED
> > ==============================================================
> > cd nonper && ./Run.nonper
> > [xmini101.istnet.co.za][[60430,1],1][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[60430,1],2]
> > [xmini102.istnet.co.za][[60430,1],3][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[60430,1],0]
> >
> > There are currently two sander.MPI processes on each machine and each
> > process is consuming close to 100% CPU. Each Mac mini is an Intel Core 2
> > Duo machine with 1GB RAM and they are connected using Gigabit Ethernet.
> > One's CPU runs at 2GHz and the other's at 1.83GHz.
> >
> > Can anyone tell me what to look for to solve this issue?
> >
> > Thanks,
> >
> > Abdul
> >
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
>
>



--
Alan Wilter S. da Silva, D.Sc. - CCPN Research Associate
Department of Biochemistry, University of Cambridge.
80 Tennis Court Road, Cambridge CB2 1GA, UK.
>>http://www.bio.cam.ac.uk/~awd28<<
Received on Wed Aug 19 2009 - 23:13:37 PDT