On Tue, 2013-11-12 at 14:36 +0100, Vlad Cojocaru wrote:
> Hi Jason,
>
> I do not have much control over the cluster, as it's not a local cluster ...
>
> I was indeed using the 256 cores as requested (at least to my knowledge
> it cannot be done differently on this machine) ... Well, it seems that I
> don't fully understand how MMPBSA deals with memory ... I was
> thinking that the memory usage per job should not change with the
> number of cores, since the number of frames analyzed per core
> decreases as the number of cores increases ...
MMPBSA.py analyzes frames sequentially. If you are running in serial,
there is never more than 1 frame being analyzed at a time (and therefore
only one frame in memory). So regardless of how many frames are being
analyzed in total, the memory consumption will not change.
In parallel with N threads, MMPBSA.py.MPI splits the whole trajectory
into N smaller trajectories of (as nearly as possible) equal size, each
of which is then analyzed sequentially. As a result, with N threads you
are analyzing N frames at a time, and therefore using N times the memory
used in serial.
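To make the arithmetic concrete, here is a rough back-of-the-envelope
sketch (plain Python, not MMPBSA.py source code; the 8 GB per-frame
figure is only an assumption you would replace with your own
measurement):

    # Illustrative sketch only (not MMPBSA.py source code): how peak memory
    # scales when the frames are split across N threads.

    def frames_per_thread(nframes, nthreads):
        """Divide nframes among nthreads as evenly as possible."""
        base, extra = divmod(nframes, nthreads)
        # The first 'extra' threads each get one additional frame.
        return [base + 1 if i < extra else base for i in range(nthreads)]

    def peak_memory_gb(nthreads, mem_per_frame_gb):
        """Each thread holds one frame in memory at a time."""
        return nthreads * mem_per_frame_gb

    nframes = 10000
    mem_per_frame_gb = 8.0   # assumed per-frame footprint; measure yours in serial
    for nthreads in (1, 16, 128, 256):
        chunks = frames_per_thread(nframes, nthreads)
        print('%4d threads: %d frames per thread, ~%.0f GB in use at once'
              % (nthreads, max(chunks),
                 peak_memory_gb(nthreads, mem_per_frame_gb)))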
The alternative, which would give far poorer scaling, would be to
parallelize each individual frame across all of the requested cores, in
which case scaling would depend on how parallelizable the requested
algorithm is. For GB this is OK, but for PB it is quite limiting.
Parallelizing over frames instead takes advantage of the embarrassingly
parallel nature of MM/PBSA calculations, which is why you can get nearly
ideal scaling up to ca. nframes/2 processors.
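If it helps, the nframes/2 figure falls out of a simple load-balance
argument: wall time is set by the thread holding the most frames, so a
rough (purely illustrative) model looks like this:

    # Rough load-balance model: wall time is set by the busiest thread,
    # i.e. the thread with the largest chunk of frames.
    import math

    def ideal_speedup(nframes, nthreads):
        largest_chunk = math.ceil(nframes / nthreads)
        return nframes / largest_chunk

    nframes = 100
    for nthreads in (25, 50, 75, 99, 100):
        print(nthreads, 'threads -> speedup', ideal_speedup(nframes, nthreads))
    # Between nframes/2 and nframes threads the largest chunk is still 2
    # frames, so the speedup stays pinned at nframes/2 until every thread
    # has exactly one frame.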
> Obviously, my thinking was flawed, as from what you are saying the memory
> requirements increase with the number of cores ...
>
> So, if I get the memory usage for a single frame on a single core, can I
> actually calculate how much memory I need for, let's say, 10000 frames on
> 128 cores?
>
> I will do some single-core, single-frame tests now ...
As I said above, the memory requirements depend on how many frames are
being analyzed concurrently, not on how many frames are being analyzed
in total. With 128 cores, you are analyzing 128 frames at once, so you
have to make sure you have enough memory for that. If each node has,
say, 32 GB of memory for 16 cores, you will need to ask for all 16
cores but run no more than 4 threads on that node (at roughly 8 GB per
thread, 4 threads will use all 32 GB of RAM). [I would actually err on
the side of caution and run only 3 threads per node, leaving about 8 GB
of headroom in case any thread uses more memory than expected.]
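Put another way, once you have measured the per-frame footprint from a
single-core, single-frame run, the threads-per-node arithmetic is just a
division. A minimal sketch with the hypothetical numbers above:

    # Minimal sketch of the per-node arithmetic (8 GB/frame is assumed;
    # substitute the footprint you measure in a single-frame serial run).

    def threads_per_node(node_mem_gb, mem_per_frame_gb, headroom_gb=0.0):
        """How many MMPBSA.py.MPI threads safely fit on one node."""
        return int((node_mem_gb - headroom_gb) // mem_per_frame_gb)

    print(threads_per_node(32.0, 8.0))                   # -> 4 (fills the node)
    print(threads_per_node(32.0, 8.0, headroom_gb=8.0))  # -> 3 (leaves headroom)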
Many queuing systems also allow memory to be requested as a resource,
which means you can specify how much memory you want made available to
your job per processor. Other clusters may require you to use a full
node, so setting per-process memory limits wouldn't make as much sense.
This is where the cluster documentation helps significantly.
Good luck,
Jason
--
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber