On Wed, Sep 25, 2013 at 11:26 AM, George Tzotzos <gtzotzos.me.com> wrote:
> Jason,
>
> Many thanks. We have Infiniband running on the cluster. Is there another
> diagnostic to achieve better scaling?
>
First thing to do is check that your MPI is actually using the available
InfiniBand connection rather than silently falling back to Ethernet/TCP.
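For example (a rough sketch assuming OpenMPI; MVAPICH2 and Intel MPI have
their own equivalent diagnostics), you could check that the openib transport
is present and actually selected at startup:

    ibstat                          # HCA port should report "State: Active"
    ompi_info | grep openib         # confirms OpenMPI was built with the openib BTL
    mpirun --mca btl_base_verbose 30 -np 2 ./pmemd.MPI ...   # logs which BTL gets chosen

If it is quietly falling back to TCP, forcing the transport list with
"--mca btl openib,self,sm" will make the job fail loudly instead, which at
least tells you where the problem is.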
It may be that even with InfiniBand there is not enough bandwidth to feed 12
MPI processes per node. Perhaps try leaving 6 cores idle on each node and see
if it continues to scale past 24 CPUs. If this improves scaling, your problem
is InfiniBand bandwidth and your options are to either improve the
interconnect, accept leaving a bunch of CPUs idle, or stick to 2 nodes.
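For instance (a hypothetical PBS/Torque + OpenMPI sketch; the exact flags
depend on your scheduler and MPI), you could still request whole nodes but
only start 6 ranks on each:

    #PBS -l nodes=4:ppn=12          # keep whole nodes so nothing else lands on the idle cores
    cd $PBS_O_WORKDIR
    mpirun -np 24 -npernode 6 pmemd.MPI -O -i md.in -p prmtop -c inpcrd -o md.out

Then compare the ns/day you get this way against the same 4 nodes fully
packed with 48 ranks.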
It is also worth checking how far 'separated' the nodes you are running on
are from one another in terms of connectivity. The farther apart 2 nodes
are (i.e., the more switch hops the traffic has to make to get between them),
the longer communications take.
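If the InfiniBand diagnostic utilities (infiniband-diags) are installed and
you are allowed to query the fabric, something like the following can show
the layout and the path between two ports (the LIDs here are placeholders
you would look up with ibstat):

    ibnetdiscover                   # dumps the switch/HCA topology of the fabric
    ibtracert <src-lid> <dst-lid>   # traces the route (switch hops) between two ports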
The only 'tricks' you can play to improve the scaling are via PBS resource
requests that minimize the number of switch hops between the nodes you are
using, or via options passed to mpirun/mpiexec to help it optimize process
placement (I'm very unfamiliar with the latter approach). Both of these
solutions depend on your particular system setup and MPI implementation and
may require lots of manpage and manual reading or conversations with your
resident expert sysadmin. (And neither may be possible...)
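Purely to give a flavor of what those two approaches can look like (the node
property below is made up -- such labels are defined by your sysadmin -- and
the mpirun flags are OpenMPI-specific):

    #PBS -l nodes=4:ppn=12:rack1    # ask for nodes sharing a site-defined property, e.g. one rack/switch
    mpirun -np 48 -bind-to-core -report-bindings pmemd.MPI -O -i md.in ...

On newer OpenMPI the binding syntax is "--bind-to core"; -report-bindings
just prints where each rank landed so you can confirm the placement.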
On the other hand, every system has a 'maximum' parallel efficiency that
cannot be broken through to achieve faster compute times, even if you throw
more processors at it. It may simply be that your system is small enough that
you see no further speedup beyond this point.
HTH,
Jason
--
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Sep 25 2013 - 09:00:04 PDT