[AMBER] AMBER on a Mac Mini cluster PoC

From: Abdul Rehman Gani <amber.infostream.co.za>
Date: Sun, 16 Aug 2009 16:51:44 +0100

Hi,

I have installed Amber 10 with OpenMPI 1.3.3 on a 'cluster' of 2 Mac
minis (Intel) as a proof of concept. I compiled Amber using gcc and
gfortran.

I was able to build the serial version, and make test ran successfully.
I was also able to build the parallel version, but am having some
trouble with the parallel test.

Although the 2 Macs are configured using XGrid, I also have
password-less ssh set up, and OpenMPI is using that. I have not figured
out how to convince Amber to use XGrid (using the DO_PARALLEL
environment variable, perhaps?).

Currently I have compiled Amber and OpenMPI on one Mac and copied both
folders to the other Mac. gfortran is installed on both. I then set up
the environment:-

AMBERHOME=/Users/Shared/amber10
DO_PARALLEL=mpirun -machinefile /Users/Shared/amber10/test/machinefile -np 4
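
For reference, here is a rough sketch of how these settings might be
exported in a Bourne-style shell. It assumes the install path is
/Users/Shared/amber10 (as the test output suggests) and two slots per
host; the machinefile contents and the temporary file location are
illustrative guesses, not a copy of my actual file.

```shell
# Assumption: Amber is installed at the same path on both machines.
export AMBERHOME=/Users/Shared/amber10

# Illustrative machinefile written to a temporary location; the real one
# lives under $AMBERHOME/test. Open MPI's hostfile format is one host per
# line, with an optional slots=N count (here 2, one per core).
MACHINEFILE="${TMPDIR:-/tmp}/machinefile.example"
cat > "$MACHINEFILE" <<'EOF'
xmini101.istnet.co.za slots=2
xmini102.istnet.co.za slots=2
EOF

# The value contains spaces, so it has to be quoted when assigned.
export DO_PARALLEL="mpirun -machinefile $MACHINEFILE -np 4"

# Show what the test scripts would prepend to each sander.MPI command.
echo "$DO_PARALLEL"
```

The quoting matters: without it the shell would assign only "mpirun" to
DO_PARALLEL and try to run the rest as a separate command.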

When I run make test.parallel, the first test (Run.cytosine) passes,
but the second test (Run.nonper) seems to go on forever. I stopped the
first run after 14 hours, and it successfully shut down all the
sander.MPI processes (2 on each machine). My second run is currently at
30:51 (CPU time) for the 2nd test.

This is the output of the test thus far:-

xmini101:test admin$ make test.parallel
export TESTsander=/Users/Shared/amber10/exe/sander.MPI; make test.sander.BASIC
cd cytosine && ./Run.cytosine
diffing cytosine.out.save with cytosine.out
PASSED
==============================================================
cd nonper && ./Run.nonper
[xmini101.istnet.co.za][[60430,1],1][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[60430,1],2]
[xmini102.istnet.co.za][[60430,1],3][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[60430,1],0]

There are currently two sander.MPI processes on each machine, and each
process is consuming close to 100% CPU. Each Mac mini is an Intel Core 2
Duo machine with 1 GB RAM, and they are connected using Gigabit
Ethernet. One CPU runs at 2 GHz and the other at 1.83 GHz.

Can anyone tell me what to look for to solve this issue?

Thanks,

Abdul

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Aug 19 2009 - 23:11:23 PDT