Thanks very much, David. You were right: the problem was in the crd file rather than in MPI or Amber. I have fixed the crd file, but now I have the following problem:
The script and my files run normally on one machine (4 processors); the speed is 5.21 ns/day, and the output file says "running Amber/MPI on 4 nodes".
When I run the script on several machines in parallel, the speed does not increase (it can even drop), even though the processes are active on the machines specified in myhostfile (verified by ssh-ing into each machine and running "top"). If I select 2 machines with 4 processors per machine in myhostfile, the output file says "running Amber/MPI on 8 nodes", and so on. The hostfile and command line are sketched below.
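Roughly, the setup looks like this (host names and file names here are generic placeholders, not the actual ones):

  # myhostfile: two machines, 4 slots (processors) each
  node01 slots=4
  node02 slots=4

  # launch across both machines with Open MPI
  mpirun -np 8 --hostfile myhostfile \
      $AMBERHOME/bin/pmemd.MPI -O -i mdin -o mdout -p prmtop -c inpcrd -r restrt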
I have tried different combinations of active machines and processors per machine, but I cannot get the run to scale. Is something wrong with the parallelization?
I read a recent thread reporting something similar (http://archive.ambermd.org/201212/0291.html), but I could not find a clear solution there.
I should add that exactly the same thing happened to me last week when trying to run this script with Amber11 and OpenMPI 1.4.2. We then reinstalled OpenMPI and Amber from scratch, which is the starting point of this thread.
Any ideas about what might be going on? Our system administrator is also a bit lost.
Thanks very much! and Merry Christmas :-)
________________________________________
From: David A Case [case.biomaps.rutgers.edu]
Sent: 24 December 2012 15:28
To: AMBER Mailing List
Subject: Re: [AMBER] problem with parallel installation of amber12 - MPI_COMM_WORLD
On Mon, Dec 24, 2012, Amparo Garcia Lopez wrote:
>
> I've been experiencing problems while trying to run pmemd.MPI in
> different nodes. We have installed openmpi (version 1.6.3) and amber
> (version 12).
>
>
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 1.
Look at your output file. The message above just means that the master
process quit with some error. The problem probably has nothing to do with
MPI. Since your test cases passed, it is also probably not a problem with
compilation.
A quick test (which you may have already done?) is to run a short version of
the same job with just pmemd (in serial mode). By "short version", I mean set
nstlim to some small number like 100.
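For example, something along these lines would do it (file names and the &cntrl settings below are generic placeholders; keep your own settings and just lower nstlim):

  &cntrl
    imin=0, irest=0, ntx=1,
    nstlim=100, dt=0.002,
    ntc=2, ntf=2, cut=8.0,
    ntt=3, gamma_ln=2.0, temp0=300.0,
    ntpr=10, ntwx=0,
  /

  $AMBERHOME/bin/pmemd -O -i mdin.short -o mdout.short -p prmtop -c inpcrd -r restrt.short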
...good luck...dac
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber