Re: AMBER: AMBER goes in a Loop from Robert Duke on 2005-10-14 (Amber Archive Oct 2005)

From: Robert Duke <rduke.email.unc.edu>
Date: Fri, 14 Oct 2005 13:58:50 -0400

Folks -
A wild guess here. Sometimes with mpich there can be deadlocks if too much memory is used for the given mpich configuration. This looks like an infinite loop, but it is really a hang on a deadlock in the kernel over buffer space. I think the problem is worse for sander than for pmemd because sander uses more MPI memory, but it can happen with either. The critical interplay is between P4_SOCKBUFSIZE (the mpich environment variable under mpich 1.2.x; I think it is MPICH_SOCKET_BUFFER_SIZE in mpich 2) and the kernel networking memory parameters. What I do is:

In /etc/rc.d/rc.local put the two lines:

echo 1048576 > /proc/sys/net/core/rmem_max
echo 1048576 > /proc/sys/net/core/wmem_max

This way, every time the machine reboots, the kernel allows socket send and receive buffers of up to 1048576 bytes (1 MB), leaving a substantial chunk of memory available for net buffers. Doing this, of course, requires root privileges.

Then set P4_SOCKBUFSIZE (MPICH_SOCKET_BUFFER_SIZE for mpich 2) to something like 131072 in your .cshrc or wherever makes sense for you.
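A quick way to confirm that both halves of this setup took effect is to compare the kernel limits with the buffer size requested through the environment. The Python sketch below is a hypothetical helper, not part of mpich or AMBER; it assumes the Linux /proc paths from the rc.local lines above and checks both variable names mentioned in this post. The 8x threshold is only the ratio of the working values used here (1048576 / 131072), not a known minimum.

```python
import os
from pathlib import Path

# The two kernel limits raised in rc.local (Linux-specific paths).
KERNEL_LIMITS = ("/proc/sys/net/core/rmem_max", "/proc/sys/net/core/wmem_max")
# Variable names mentioned in this post for mpich 1.2.x and mpich 2.
SOCKBUF_VARS = ("P4_SOCKBUFSIZE", "MPICH_SOCKET_BUFFER_SIZE")


def buffer_ratio(kernel_limit: int, sockbuf: int) -> float:
    """How many MPI socket buffers of size `sockbuf` fit under the kernel limit."""
    if sockbuf <= 0:
        raise ValueError("socket buffer size must be positive")
    return kernel_limit / sockbuf


def check_config(min_ratio: float = 8.0) -> list[str]:
    """Return human-readable warnings; an empty list means nothing looked wrong."""
    sockbuf = next((int(os.environ[v]) for v in SOCKBUF_VARS if v in os.environ), None)
    if sockbuf is None:
        return ["neither P4_SOCKBUFSIZE nor MPICH_SOCKET_BUFFER_SIZE is set"]
    warnings = []
    for path in KERNEL_LIMITS:
        limit = int(Path(path).read_text().strip())
        ratio = buffer_ratio(limit, sockbuf)
        if ratio < min_ratio:
            warnings.append(f"{path} = {limit} is only {ratio:.1f}x the MPI buffer ({sockbuf})")
    return warnings


if __name__ == "__main__":
    for w in check_config():
        print("warning:", w)
```

Run it from the same shell (and on each node) that will launch the MPI job, so it sees the environment the MPI processes will inherit.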

The critical point here is that enough memory must be set aside for a read and a write operation to be under way simultaneously in each MPI process, or things will deadlock. When you run mpich on dual-processor machines, the amount of net buffer space needed increases. That is why, above, I specify 8x as much memory in the kernel (1048576) as in P4_SOCKBUFSIZE (131072). I don't know the minimum "overage" required to prevent deadlocks, but this configuration works well on my machines.
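The buffer-space deadlock can be reproduced in miniature. The sketch below is a toy model, not mpich's code: it uses a local Unix socket pair instead of TCP, and the function names are made up. It shows that a send whose peer is not currently reading can only complete if the whole message fits in buffer space; if two processes both send before either one receives, each stalls at exactly that point, which from outside looks like an infinite loop.

```python
import socket


def bytes_buffered_without_reader(sndbuf: int, message_size: int) -> int:
    """Write up to `message_size` bytes into one end of a socket pair whose
    peer never reads, and return how many bytes the kernel accepted before
    the write would have blocked."""
    a, b = socket.socketpair()
    try:
        a.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
        a.setblocking(False)  # report "would block" instead of hanging
        payload = b"x" * 4096
        sent = 0
        while sent < message_size:
            try:
                sent += a.send(payload[: min(len(payload), message_size - sent)])
            except BlockingIOError:
                break  # buffers are full: a blocking send would stall here
        return sent
    finally:
        a.close()
        b.close()


def exchange_would_deadlock(sndbuf: int, message_size: int) -> bool:
    """Two processes that both send before receiving only make progress
    if each whole message fits in the available buffer space."""
    return bytes_buffered_without_reader(sndbuf, message_size) < message_size
```

A small message fits and the exchange completes; a message much larger than the buffers does not. Raising the kernel limits and P4_SOCKBUFSIZE moves that threshold up; it does not remove the underlying send-before-receive hazard.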

Now, with mpich you will also need a very large value for P4_GLOBMEMSIZE; I set my machines to something like 134217728 (128 MB) to be able to run the rt benchmark with sander, and pmemd requires only a fraction of this. If this value is too small, the run always dies with an obvious error message.

Another point: these large buffer sizes DO improve mpich/gigabit ethernet performance significantly. There are also issues about making sure the right number of processes start on the right machines, and that the MPI I/O goes over your server NICs (you did buy expensive but faster server NICs for your back end, didn't you, and you do have a separate local LAN interconnecting the machines, right?). The only way I have found to get the right number of processes on each machine, using the right interconnect, is with a "process group file" in which I can reference the interconnect; see the mpich documentation. All these things make a huge difference for gigabit ethernet LAN performance. I currently get the following throughput on 3.2 GHz dual-CPU P4s connected as described above, for factor ix at constant pressure (90906 atoms):
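A Python sketch for generating such a process group file, assuming the ch_p4 procgroup format described in the mpich 1.2.x documentation: a first line "local N" giving the number of extra processes on the launching machine, then one "<hostname> <#procs> <program>" line per remote machine. The helper name, the "-gige" host name, and the program path are hypothetical placeholders; naming each host by the address of its dedicated interconnect is what keeps the MPI traffic on the back-end LAN.

```python
def procgroup(local_extra: int, remote: list[tuple[str, int]], program: str) -> str:
    """Build the text of an mpich ch_p4 procgroup file (hypothetical helper).

    local_extra: processes to start on the launching machine in addition to
                 the one mpirun itself starts (1 on a dual-CPU box).
    remote:      (hostname, process count) pairs; use names that resolve to
                 the dedicated MPI network interfaces.
    program:     full path to the executable on the remote machines.
    """
    if local_extra < 0 or any(count < 1 for _, count in remote):
        raise ValueError("process counts must be non-negative (local) / positive (remote)")
    lines = [f"local {local_extra}"]
    lines += [f"{host} {count} {program}" for host, count in remote]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical 4-process run on two dual-CPU nodes: 2 local + 2 remote.
    print(procgroup(1, [("node2-gige", 2)], "/usr/local/amber8/exe/sander"), end="")
```

With mpich 1.2.x the resulting file is normally handed to mpirun through its -p4pg option; check mpirun -help for your build before relying on that.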

#proc   psec/day
    1        114
    2        182
    4        291

Note that this is current in-development code, not pmemd 8. Basically, you DON'T get linear scaling on something like factor ix on these small systems with gigabit ethernet, because the distributed FFT transposes are huge and overwhelm the interconnect bandwidth. This is not nearly as much of a problem on shared-memory machines or real supercomputers. (The drop in scaling from 1 to 2 processors is actually largely a cache-sharing issue on these small machines, since the NICs are not used; once you go to 4 processes, though, you do use the NICs.)
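To put numbers on the scaling, the short Python calculation below turns the throughput table into speedup and parallel efficiency relative to one process. It uses only the three measurements quoted in this post.

```python
# Throughput from the table: processes -> psec/day (factor ix, 90906 atoms).
THROUGHPUT = {1: 114, 2: 182, 4: 291}


def scaling(throughput: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map each process count to (speedup, parallel efficiency) relative to 1 process."""
    base = throughput[1]
    return {p: (rate / base, rate / base / p) for p, rate in sorted(throughput.items())}


if __name__ == "__main__":
    for p, (speedup, eff) in scaling(THROUGHPUT).items():
        print(f"{p} proc: speedup {speedup:.2f}, efficiency {eff:.0%}")
```

This prints speedups of 1.60 and 2.55 for 2 and 4 processes, i.e. parallel efficiencies of 80% and 64%.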

Okay, I may or may not have posted anything on this before; I don't remember. If I didn't, it was because these are machine-specific instructions that work with RedHat Linux and probably a variety of other Linux distributions (but probably not all), and that work with mpich. So you may have to poke around for your specific machine. If you have a canned vendor setup, like something from SGI or what have you, the vendor probably gets the base configuration right; the grief comes when you take a generic system and put your own MPI (mpich) on top of it. I have not looked at LAM, but there is no reason it would not also be susceptible to the problem. This sort of thing reflects a lack of deadlock-avoidance software down there somewhere.

Sorry if this is not at all your problem; in my case, though, this is the source of rt benchmark hangs with sander 8 or pmemd 8.

Regards - Bob Duke
  ----- Original Message -----
  From: Imran Khan
  To: amber.scripps.edu
  Sent: Friday, October 14, 2005 12:37 PM
  Subject: Re: AMBER: AMBER goes in a Loop

  Hi David,

  Yes, all the other benchmark tests, i.e. hb, jac, gb_alp, etc., run successfully on 8 processors. They also run on 1, 2 and 4 processors.

  The problem is only with the 8-processor rt run.

  Imran

  On 10/14/05, David A. Case <case.scripps.edu> wrote:
    On Thu, Oct 13, 2005, Imran Khan wrote:
>
> I am trying to run a benchmark called `rt` using Amber on 8 processors,
> where it sort of goes into an infinite loop. However, it runs successfully
> to completion on fewer than 8 processors. This is for sander runs.

    Does the system work for other benchmarks, e.g. "jac" or "hb"? I'm trying to
    find out if the problem is specific to the "rt" test case. Also, does the
    system pass all the tests at 8 processors?

    ...dac

    -----------------------------------------------------------------------
    The AMBER Mail Reflector
    To post, send mail to amber.scripps.edu
    To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu

Received on Fri Oct 14 2005 - 19:53:01 PDT