Re: [AMBER] cpptraj.MPI versus cpptraj

From: Debarati DasGupta <>
Date: Fri, 20 Dec 2019 13:18:43 +0000

Dear Prof Case,
Thanks once again for the reply.

Yes, I have 10 separate production runs, each 200 ns long, and I save a frame every 1 ns.
I tried what you said last night and was really amazed to see that serial cpptraj handles the job much more cleanly than the MPI version.
A single serial cpptraj run on one file takes 45 minutes at most, whereas cpptraj.MPI on 8 cores finishes in 22 minutes.
But the sad part is that the parallel I/O seems to interfere with everything else: my colleagues run GPU production jobs (everyone in our lab uses AMBER), and their runs came to a grinding halt whenever I had 10 cpptraj.MPI jobs going. Their mdinfo showed throughput dropping from 170 ns/day to 10-20 ns/day while my cpptraj.MPI runs were active!
So I will do a bit more testing, but I expect to run 100+ serial cpptraj jobs rather than use MPI and cause huge I/O wait times...
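A minimal sketch of that serial strategy, assuming 10 trajectories named prod_1.nc through prod_10.nc and one input file per run (all filenames here are hypothetical, not from the original post). It launches plain cpptraj jobs two at a time so the disk is never hit by all ten at once:

```shell
#!/bin/sh
# Run one serial cpptraj per trajectory instead of a single cpptraj.MPI job.
# CPPTRAJ defaults to a dry-run echo; set CPPTRAJ=cpptraj to really run it.
CPPTRAJ=${CPPTRAJ:-"echo cpptraj"}
for run in $(seq 1 10); do
    $CPPTRAJ -p system.prmtop -y prod_${run}.nc -i distances_${run}.in \
        > cpptraj_${run}.log 2>&1 &
    # Throttle: wait after every 2 background jobs to limit I/O pressure.
    if [ $((run % 2)) -eq 0 ]; then wait; fi
done
wait
```

Raising the throttle from 2 to 4 or more trades disk contention for wall-clock time; a couple of trial values should show where the GPU jobs start to suffer.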


From: David Case<>
Sent: 20 December 2019 08:30
To: AMBER Mailing List<>
Subject: Re: [AMBER] cpptraj.MPI versus cpptraj

On Thu, Dec 19, 2019, Debarati DasGupta wrote:
>I have ~30000 distance-based calculations I have to perform using the
>AMBER18 cpptraj package. I am definitely using the cpptraj.MPI version as
>it's multi-threaded and will be faster than single-processor cpptraj.
>Any idea as to how many cores should work best, i.e. should I choose
>8 or 12? Will choosing 12 drastically make my calculations faster? Is
>there any route that will work better? My trajectories are approx. 2
>microseconds long. I run:
>$MPI_HOME/bin/mpiexec -n 8 cpptraj.MPI -i $input

Do this in steps: try doing 300 distances on your trajectory with the
serial version of cpptraj. See how long it takes, estimate what the
"real" calculation would require. You may find that you don't need
cpptraj.MPI at all: just let the job run overnight and you'll be
done. (Also, by doing just 1% of the calculation first, you'll have a
chance to see if all your syntax is OK, if the results make sense, etc.)
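The 1% trial above can be scripted: generate a cpptraj input with 300 distance commands and time it with serial cpptraj. The topology name, trajectory name, and atom masks below are placeholders, not taken from the original post:

```shell
# Build a trial input with 300 distance commands (~1% of the 30000).
{
    echo "parm system.prmtop"
    echo "trajin prod_1.nc"
    for i in $(seq 1 300); do
        echo "distance d${i} :${i}@CA :$((i + 1))@CA out trial_dist.dat"
    done
    echo "run"
} > trial300.in
# time cpptraj -i trial300.in   # scale the wall time by ~100 for the full job
```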

I don't have enough experience with cpptraj.MPI to provide much advice
about performance in parallel. But this should scale pretty well, so
you might expect something like an order of magnitude decrease in
wall-clock time if you use 12 cores. You could again do a trial run,
where you only analyze every 20th frame (say), to get more reliable
timing estimates.
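The every-20th-frame trial maps directly onto cpptraj's trajin line, which accepts optional start, stop, and offset arguments; the filenames and masks here are again placeholders:

```shell
# Write a timing-trial input that reads only every 20th frame.
cat > stride.in <<'EOF'
parm system.prmtop
trajin prod_1.nc 1 last 20
distance d1 :10@CA :25@CA out stride_dist.dat
run
EOF
# time cpptraj -i stride.in
```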

It's also worth thinking in advance about how you will process the data.
You don't say how often you saved frames, but the number of frames is
more important than the fact that they cover 2 microseconds. So if you
saved a frame every nanosecond, you will end up with 30000 x 2000 =
60 million distances.

...good luck....dac

AMBER mailing list
Received on Fri Dec 20 2019 - 05:30:02 PST