Re: [AMBER] sieve keyword in cpptraj clustering

From: Daniel Roe <>
Date: Wed, 25 Mar 2020 15:16:43 -0400


On Tue, Mar 10, 2020 at 9:37 AM Debarati DasGupta
<> wrote:
> Any ideas on “sieve” and “clusters”?
> What is the optimum I should keep?
> If I have no idea what should be the number of clusters should I make it 50 or 80 (a higher number) and what is the functionality of sieve I could not clearly understand from the manual.. Any suggestions?

So the manual has this: "Perform clustering only for every <#> frame.
After clustering, all other frames will be added to clusters."

Admittedly that could be a bit more verbose, but the essential idea is
that since the most time-consuming/memory-hungry part of clustering
tends to be the calculation/storage of the pairwise distance matrix,
you do your clustering on a subset of frames to reduce this size.
Hopefully this gives you clustering that is more or less
representative of what the clustering would be if you used your entire
trajectory, and you can then add the frames you didn't cluster back in
afterwards. This idea is fleshed out much better in the Shao &
Cheatham et al. clustering paper.
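
To put rough numbers on why sieving helps, here is a small Python sketch
of how the pairwise distance matrix shrinks with the sieve value. It is
only an estimate under stated assumptions: the function names are made
up for illustration, the 200,000-frame trajectory is hypothetical, and I
assume one 4-byte value is stored per unique pair of frames (what
cpptraj actually stores may differ).

```python
import math

def sieved_frame_count(total_frames: int, sieve: int) -> int:
    """Frames kept when taking every `sieve`-th frame, starting at the first."""
    return math.ceil(total_frames / sieve)

def pair_count(n_frames: int) -> int:
    """Number of unique frame pairs, i.e. entries in a triangular matrix."""
    return n_frames * (n_frames - 1) // 2

def matrix_megabytes(n_frames: int, bytes_per_value: int = 4) -> float:
    """Approximate matrix size in MB (assumption: 4 bytes per stored pair)."""
    return pair_count(n_frames) * bytes_per_value / 1e6

if __name__ == "__main__":
    total = 200_000  # hypothetical trajectory length
    for sieve in (1, 10, 20):
        n = sieved_frame_count(total, sieve)
        print(f"sieve {sieve:>2}: {n:>7} frames, ~{matrix_megabytes(n):,.0f} MB")
```

The number of pairs grows with the square of the number of frames, so a
sieve of 10 cuts the matrix size by roughly a factor of 100.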

So there's no "right" answer to what sieve value you should use, but in
general I would recommend using a value that leaves you with the largest
number of frames your system can feasibly handle (10-20k frames is
probably a safe bet to start, and you can increase from there). I also
recommend using the "random" keyword, as it should make your sieving
less prone to any potential periodic (or harmonic) artifacts in the
trajectory data.
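
As a rough illustration of both points, the Python sketch below picks a
sieve value for a target frame count and compares regular with random
frame selection. It is not how cpptraj implements sieving: the function
names are hypothetical, and drawing a uniform random sample of frames is
only my assumption about what a random sieve could look like.

```python
import math
import random

def sieve_for_target(total_frames: int, target_frames: int) -> int:
    """Smallest sieve value that keeps at most `target_frames` frames."""
    return max(1, math.ceil(total_frames / target_frames))

def regular_sieve(total_frames: int, sieve: int) -> list:
    """Indices of every `sieve`-th frame, starting from frame 0."""
    return list(range(0, total_frames, sieve))

def random_sieve(total_frames: int, sieve: int, seed: int = 0) -> list:
    """Same number of frames as a regular sieve, but drawn at random
    (assumption: uniform sampling without replacement)."""
    rng = random.Random(seed)
    n_keep = math.ceil(total_frames / sieve)
    return sorted(rng.sample(range(total_frames), n_keep))

if __name__ == "__main__":
    # Hypothetical 200,000-frame trajectory, aiming for about 20k frames.
    print("sieve value:", sieve_for_target(200_000, 20_000))

    # Toy periodic artifact: pretend the data repeats every 10 frames.
    # A regular sieve of 10 only ever sees one phase of that period.
    period = 10
    regular_phases = {i % period for i in regular_sieve(1_000, 10)}
    random_phases = {i % period for i in random_sieve(1_000, 10)}
    print("phases seen, regular:", len(regular_phases))
    print("phases seen, random: ", len(random_phases))
```

When the sieve value happens to match a period in the data, the regular
selection samples a single phase, while the random selection spreads
over many of them.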


> Thanks everyone.
> Regards
> _______________________________________________
> AMBER mailing list

Received on Wed Mar 25 2020 - 12:30:02 PDT