RE: [AMBER] AMBER on a Mac Mini cluster PoC

From: Ross Walker <ross.rosswalker.co.uk>
Date: Sun, 16 Aug 2009 18:08:47 +0100

Hi Abdul,

This looks like an issue with your MPI installation rather than a problem
with AMBER per se. Have you tested your OpenMPI installation? Such packages
often come with test suites, or you can download one to check things out.

Also, I note that the error message mentions TCP connections, which to me
implies you are using Ethernet. Is this correct? If so, you are unlikely
to see any benefit from running in parallel, at least for regular MD runs;
Ethernet is too slow these days to be of much help there. Ensemble runs
such as replica exchange might work. For now you could try running in
parallel with 2 processes on one machine, i.e. just one entry in your
machine file and -np 2. Then run the tests and see how they do. This will
at least test your actual parallel installation.
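
A minimal sketch of that single-machine setup follows. The machine file
name and location are hypothetical, and using "localhost" as the only entry
is an assumption (the machine's own hostname should work equally well):

```shell
# Hypothetical location for a one-entry machine file; any writable path works.
WORKDIR="${TMPDIR:-/tmp}"
MACHINEFILE="$WORKDIR/machinefile.single"

# One entry only, so both MPI processes are started on the local machine.
echo "localhost" > "$MACHINEFILE"

# Same form as the DO_PARALLEL setting in the original message, but with -np 2.
export DO_PARALLEL="mpirun -machinefile $MACHINEFILE -np 2"
echo "DO_PARALLEL=$DO_PARALLEL"

# Next step (not run here): cd "$AMBERHOME/test" && make test.parallel
```

If the tests pass this way, the parallel build itself is probably fine and
the problem is more likely in the communication between the two machines.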

Then you can try running some simulations yourself (or perhaps use the
benchmarks here http://ambermd.org/amber10.bench1.html) across both nodes so
you can compare performance. My guess is that it will actually be slower
using 2 nodes rather than 1, but you might be lucky. It will certainly be
easier to debug MPI problems (often caused by flaky interconnect hardware)
when running a job yourself rather than inside the test environment.
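
One rough way to make that comparison is sketched below. The helper
time_run is hypothetical, the input and output file names are placeholders
for whichever benchmark you pick, and machinefile.single stands for a
one-entry machine file; only the sander.MPI path and the two-node machine
file come from your message:

```shell
# Hypothetical helper: run a command, discard its output, and print the
# elapsed wall-clock time in whole seconds together with its exit status.
time_run() {
    label=$1
    shift
    start=$(date +%s)
    if "$@" >/dev/null 2>&1; then
        status=0
    else
        status=$?
    fi
    end=$(date +%s)
    echo "$label: $((end - start)) s (exit status $status)"
}

# Example usage (not run here); mdin, prmtop and inpcrd are placeholders:
# time_run "1 node, 2 procs" mpirun -machinefile machinefile.single -np 2 \
#     /Users/Shared/amber10/exe/sander.MPI -O -i mdin -o mdout.1node -p prmtop -c inpcrd
# time_run "2 nodes, 4 procs" mpirun -machinefile /Users/Shared/amber10/test/machinefile -np 4 \
#     /Users/Shared/amber10/exe/sander.MPI -O -i mdin -o mdout.2node -p prmtop -c inpcrd
```

Whole seconds are coarse, so this is only useful for runs lasting a few
minutes or more; the timing section at the end of each mdout file gives a
more detailed breakdown.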

Good luck,
Ross

> -----Original Message-----
> From: amber-bounces.ambermd.org [mailto:amber-bounces.ambermd.org] On
> Behalf Of Abdul Rehman Gani
> Sent: Sunday, August 16, 2009 8:52 AM
> To: amber.ambermd.org
> Subject: [AMBER] AMBER on a Mac Mini cluster PoC
>
> Hi,
>
> I have installed Amber 10 with OpenMPI 1.3.3 on a 'cluster' of 2 Mac
> minis (Intel) as a proof of concept. I compiled Amber using gcc and
> gfortran.
>
> I was able to successfully build the serial version and successfully ran
> make test. I was also able to successfully build the parallel version,
> but am having some trouble with the test.
>
> Although the 2 Macs are configured using XGrid, I also have
> password-less ssh and OpenMPI is using that. I have not figured out how
> to convince Amber to use XGrid (using the DO_PARALLEL environment
> variable perhaps?)
>
> Currently I have compiled Amber and OpenMPI on one Mac and copied both
> folders to the other Mac. gfortran is installed on both. I then set up
> the environment:
>
> AMBERHOME=/User/Shared/amber10
> DO_PARALLEL=mpirun -machinefile /Users/Shared/amber10/test/machinefile -np 4
>
> When I ran make test.parallel I got a successful first test
> (Run.cytosine), but then the second test (Run.nonper) seems to go on
> forever. I stopped the first run after 14 hours and it successfully shut
> down all the sander.MPI processes (2 on each machine). My second run is
> currently at 30:51 (CPU time) for the 2nd test.
>
> This is the output of the test thus far:-
>
> xmini101:test admin$ make test.parallel
> export TESTsander=/Users/Shared/amber10/exe/sander.MPI; make test.sander.BASIC
> cd cytosine && ./Run.cytosine
> diffing cytosine.out.save with cytosine.out
> PASSED
> ==============================================================
> cd nonper && ./Run.nonper
> [xmini101.istnet.co.za][[60430,1],1][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack]
> received unexpected process identifier [[60430,1],2]
> [xmini102.istnet.co.za][[60430,1],3][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack]
> received unexpected process identifier [[60430,1],0]
>
> There are currently two sander.MPI processes on each machine and each
> process is consuming close to 100% CPU. Each Mac mini is an Intel Core 2
> Duo machine with 1GB RAM and they are connected using Gigabit Ethernet.
> One's CPU runs at 2GHz and the other at 1.83GHz.
>
> Can anyone tell me what to look for to solve this issue?
>
> Thanks,
>
> Abdul
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber

Received on Wed Aug 19 2009 - 23:11:50 PDT