Re: [AMBER] Fwd: Is it possible to do REMD for a 12000 atom molecule? from Jason Swails on 2011-08-23 (Amber Archive Aug 2011)

From: Jason Swails <jason.swails.gmail.com>
Date: Tue, 23 Aug 2011 17:18:28 -0400

I'll jump in here really quick (not that quick, probably).

Carlos and Adrian have already pointed out that the code is *not* set up to
do what you're proposing. As they also noted, the dynamic load balancer of
pmemd "learns" from its inefficient start and tries to balance the load
better, so if you keep cutting it off with short runs, you will destroy most
of the benefit of using pmemd in the first place. (sander doesn't suffer
from this to any great extent, but that's because it employs static load
balancing and never *improves* on its initial performance.)

However, you *can* do what you ultimately want (if I correctly interpret
your desires). Why not just overload your processors with threads? There's
nothing preventing you from running more threads than you have processors.
In a perfect world with ideal scaling, you can run 10 MPI threads on a
single-processor machine and get exactly the same throughput as running in
serial. This is obviously an upper limit, and imperfect scaling hurts your
overall performance for a number of reasons. However, because REMD is
trivially parallel, a 4-replica REMD run on 1 processor will complete in
about the same time whether you launch all the replicas at once, with each
one going at ~1/4 of the speed, or split up the simulation so that only 1
thread runs at a time, as you initially proposed.
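
As a back-of-the-envelope sketch of that comparison (an idealized model, not
a benchmark): assume every replica needs the same amount of processor time,
that overloaded processes share a processor evenly, and that overloading
costs some fractional penalty. The function names and the `overhead`
parameter are made up for illustration; nothing here comes from sander or
pmemd.

```python
def serial_wallclock(n_replicas, work):
    """Wall-clock time when replicas run one after another on one processor.

    `work` is the processor time one replica needs (hypothetical units).
    """
    return n_replicas * work


def overloaded_wallclock(n_replicas, work, n_procs=1, overhead=0.0):
    """Wall-clock time when all replicas run at once on n_procs processors.

    Each replica gets an even share of the available processors (at most
    one whole processor). `overhead` is an assumed fractional penalty
    standing in for context switching and imperfect scaling.
    """
    share = min(1.0, n_procs / n_replicas)
    return work / share * (1.0 + overhead)


if __name__ == "__main__":
    # 4 replicas on 1 processor, 100 time units of work each.
    print("one at a time:", serial_wallclock(4, 100.0))
    print("all at once, ideal:", overloaded_wallclock(4, 100.0))
    print("all at once, 10% penalty:",
          overloaded_wallclock(4, 100.0, overhead=0.1))
```

With zero penalty the two totals are identical, which is the ideal-scaling
upper limit; any real penalty makes the overloaded run somewhat slower.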

Note that once you start going off-node with multiple processors, your
communication bandwidth may start to cut into your performance a bit. If
you have enough nodes to make this a big issue, though, I'd have to wonder
why you'd need to overload, anyway.

In any case, when amber12 rolls around, pmemd will not be well-suited for
"small-scale" replica exchange simulations. That is, each replica will be
*required* to have at least 2 threads devoted exclusively to it, although
pmemd should be a bit faster than sander.

Just to be clear, I'm not advocating in favour of my advice, I'm just saying
that it's possible. REMD is not a method well-suited for serial execution
(it is so popular for exactly the opposite reason). To implement the
*exact* algorithm you're thinking about will require significant code
changes to sander and/or pmemd. You will have to be able to store numerous
systems/topologies in a single replica and do a lot of bookkeeping which is
not currently present and then switch between them on-the-fly. Another
approach is to use the existing code, but have "extra" threads just wait
while the active threads finish (so you'll still be "overloading" your
processors by binding more than 1 thread to them, but most will be idle).

Of course the easiest way to do this is just script it in
Python/Perl/Whatever Tcl's your fancy, but then you will have to re-run the
setup portion of your simulation (allocating the stack, reading the files,
etc.) *each* time you restart a simulation. If you follow the Roitberg
exchange scheme (try as often as you can), you will spend far more time in
setup than you will actually running calculations.
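
To make the scripted route a little more concrete, here is a minimal sketch
of the two pieces involved: the standard Metropolis test a script would
apply to the final energies of two temperature replicas, and the fraction of
wall-clock time lost to setup when every exchange attempt restarts the
program. The function names, the choice of kcal/mol for energies, and all
timings are assumptions for illustration only; none of this is taken from
sander or pmemd.

```python
import math

# Boltzmann constant in kcal/(mol K); assumes energies are in kcal/mol.
KB = 0.0019872041


def exchange_probability(e_i, t_i, e_j, t_j):
    """Metropolis acceptance probability for swapping two temperature replicas.

    e_i, e_j are the final potential energies of the replicas currently at
    temperatures t_i and t_j (in K).
    """
    beta_i = 1.0 / (KB * t_i)
    beta_j = 1.0 / (KB * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def setup_fraction(setup_seconds, md_seconds):
    """Fraction of wall-clock time spent in setup per restarted segment."""
    return setup_seconds / (setup_seconds + md_seconds)


if __name__ == "__main__":
    # Hypothetical energies and temperatures for two neighbouring replicas.
    print(exchange_probability(-110.0, 300.0, -100.0, 310.0))
    # Hypothetical timings: 2 s of setup for every 0.5 s of dynamics.
    print(setup_fraction(2.0, 0.5))
```

The shorter the stretch of dynamics between exchange attempts, the closer
the setup fraction gets to 1, which is the problem with frequent exchanges
in a restart-based script.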

This email for the most part echoes what Ross said while I was writing this
email, but perhaps over-emphasizes the challenges of doing this according to
your original designs due to the fact that sander/pmemd were not coded to
make this endeavour easy by any stretch of the imagination.

HTH,
Jason

On Tue, Aug 23, 2011 at 4:32 PM, Carlos Simmerling <
carlos.simmerling.gmail.com> wrote:

> I think the final answer is no the code is not set up to do this. You
> probably would have to write some scripts to take the final energies and
> then calculate exchanges, then create new input files. should not be
> difficult, but you'll need to do it yourself... maybe someone on the list
> has something similar already.
>
>
> On Tue, Aug 23, 2011 at 4:26 PM, Dmitry Mukha <dvmukha.gmail.com> wrote:
>
> > Yes it is what I mean. For me, I'd like to run REMD on 1 machine, but if
> > it was implemented for serial execution, it would be easy to add
> > scalability allowing to run, let say, 16 thread handling 32 replicas
> > (some integer multiplier of replicas, 2 or more). It would be certainly
> > faster than distributed computing may do this task.
> >
> > 2011/8/23 Carlos Simmerling <carlos.simmerling.gmail.com>
> >
> > > it's possible he means normal REMD, but run serially. no reason they
> > > need to run simultaneously- I think Pande may have done something
> > > similar. it's just going to take a very long time in wallclock. also
> > > the code is not set up for this at present.
> > >
> > >
> > --
> > Sincerely,
> > Dmitry Mukha
> > Institute of Bioorganic Chemistry, NAS, Minsk, Belarus
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
> >

-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032

Received on Tue Aug 23 2011 - 15:30:03 PDT