Re: [AMBER] cpptraj.MPI versus cpptraj

From: Debarati DasGupta <debarati_dasgupta.hotmail.com>
Date: Fri, 20 Dec 2019 13:26:26 +0000

Hi Daniel,
I just cross-checked. We have a 100-gigabit network switch, but yes, with 4 processes on a single node the I/O was getting saturated: a simple top command showed the I/O wait to be around 20-30%, whereas it normally stays below 1%.
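
For reference, the I/O wait figure is the "wa" field in top's %Cpu(s)
summary line; the line below is illustrative output only, not taken
from our system:

  %Cpu(s):  4.0 us,  1.5 sy,  0.0 ni, 66.5 id, 28.0 wa,  0.0 hi,  0.0 si,  0.0 st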


From: Daniel Roe<mailto:daniel.r.roe.gmail.com>
Sent: 20 December 2019 18:53
To: AMBER Mailing List<mailto:amber.ambermd.org>
Subject: Re: [AMBER] cpptraj.MPI versus cpptraj

Hi,

The answer is to benchmark, benchmark, benchmark.

How much speedup you can get with cpptraj.MPI depends a lot on how
many nodes you're using, what your IO and network bandwidth are, what
your underlying filesystem is, etc. On the systems I was testing, I
found that the IO became saturated at 2 processes per node (basically,
using more processes on a node than there were sockets led to less
efficiency); see the discussion in
https://onlinelibrary.wiley.com/doi/full/10.1002/jcc.25382 for more
details. Of course, that's not to say you can't use more processes per
node, just that the efficiency drops. When not using all available
cores, if you're using an OpenMP-enabled action (e.g. hbond or rdf)
you can use the remaining cores for OpenMP threads, although that
doesn't help in this particular case.
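
For example, assuming a cpptraj.MPI binary that was also built with
OpenMP support (an assumption - check your build), a hybrid run with
2 MPI processes per node might look like:

  export OMP_NUM_THREADS=8
  $MPI_HOME/bin/mpiexec -n 2 cpptraj.MPI -i analysis.in

Here analysis.in and the thread count of 8 are hypothetical; choose
the thread count so that processes x threads per node does not exceed
the number of physical cores.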

So maybe benchmark on a small subset of the trajectory (say, 100
frames) using 2, 4, and 6 processes, but be aware that if it's a
NetCDF trajectory, the second pass through it will likely be faster
due to file caching.
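
As a sketch, a benchmark input could look like the following, where
the topology, trajectory, and atom masks are placeholders for your
actual files:

  # bench.in - a representative distance command over 100 frames
  parm system.prmtop
  trajin traj.nc 1 100
  distance d1 :10 :25 out dist.dat
  run

Then time it at each process count:

  time $MPI_HOME/bin/mpiexec -n 2 cpptraj.MPI -i bench.in
  time $MPI_HOME/bin/mpiexec -n 4 cpptraj.MPI -i bench.in
  time $MPI_HOME/bin/mpiexec -n 6 cpptraj.MPI -i bench.in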

My recommendation if you don't want to benchmark is to start with 2
MPI processes per node, since that is where IO saturated in my tests.

Hope this helps,

-Dan

On Thu, Dec 19, 2019 at 2:52 PM Debarati DasGupta
<debarati_dasgupta.hotmail.com> wrote:
>
> Dear Users,
> I have ~30000 distance-based calculations to perform using the
> AMBER18 cpptraj package. I am definitely using the cpptraj.MPI
> version, as it runs in parallel and should be faster than a
> single-process cpptraj job.
> Any idea how many cores would work best, i.e. should I choose 8 or
> 12? Will choosing 12 make my calculations drastically faster? Is
> there a route that would work better? My trajectories are approx.
> 2 microseconds.
> $MPI_HOME/bin/mpiexec -n 8 cpptraj.MPI -i $input
>
> Thanks
>

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Dec 20 2019 - 05:30:04 PST