Re: [AMBER] algorithm for clustering solvent molecules? from Hai Nguyen on 2015-08-03 (Amber Archive Aug 2015)

From: Hai Nguyen <nhai.qn.gmail.com>
Date: Mon, 3 Aug 2015 17:43:39 -0400

'10' is the total (desired) number of clusters. I made up this example by
adapting one from cpptraj's manual:

http://ambermd.org/doc12/Amber15.pdf (page 610, "Example: cluster on a
specific distance").

D. Roe can probably comment more on clustering in cpptraj.

The output looks something like this:

#Clustering: 10 clusters 401 frames
#Cluster 0 has average-distance-to-centroid 1.115112
#Cluster 1 has average-distance-to-centroid 1.058091
#Cluster 2 has average-distance-to-centroid 1.136221
#Cluster 3 has average-distance-to-centroid 0.769939
#Cluster 4 has average-distance-to-centroid 0.770699
#Cluster 5 has average-distance-to-centroid 0.825140
#Cluster 6 has average-distance-to-centroid 0.502964
#Cluster 7 has average-distance-to-centroid 0.622283
#Cluster 8 has average-distance-to-centroid 0.556346
#Cluster 9 has average-distance-to-centroid 0.594979
#DBI: 0.471646
#pSF: 4271.368183
#Algorithm: HierAgglo linkage average-linkage nclusters 10 epsilon 1.79769e+308
#Representative frames: 385 5 217 187 255 70 32 39 23 116
[0 1 8 ..., 0 4 0]
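To illustrate what "HierAgglo linkage average-linkage nclusters 10" in the output above means, here is a minimal pure-Python sketch of average-linkage agglomerative clustering on 1-D values that stops once a requested number of clusters remains. It is not cpptraj's implementation; the function name `average_linkage` and the toy data are hypothetical and only meant to show the idea.

```python
from itertools import combinations


def average_linkage(values, n_clusters):
    """Merge the two closest clusters until only n_clusters remain.

    Illustrative sketch only (O(n^3) or worse); not cpptraj's code.
    """
    # start with every point in its own cluster
    clusters = [[i] for i in range(len(values))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # average linkage: mean of all pairwise distances between members
            total = sum(abs(values[i] - values[j])
                        for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        # merge cluster b into cluster a (a < b, so indices stay valid)
        clusters[a].extend(clusters[b])
        del clusters[b]
    # convert the list of clusters into one label per input value
    labels = [0] * len(values)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# hypothetical x-coordinates of six atoms
x = [0.1, 0.3, 5.0, 5.2, 9.8, 0.2]
labels = average_linkage(x, 3)
print(labels)
```

As in the pytraj output, the result is one cluster index per atom; the requested count (3 here, 10 in the example above) is the stopping criterion, not a distance cutoff.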

Let us know if anything is unclear (so we can simplify the user interface).

Hai

On Mon, Aug 3, 2015 at 5:35 PM, Jose Borreguero <borreguero.gmail.com>
wrote:

> What is the '10' in result = pt.clustering_dataset(xyz[:, 0], 'clusters 10')?
> Is it the required final number of clusters, or a cutoff distance between
> oxygen atoms when calculating the contact map?
>
> On Mon, Aug 3, 2015 at 4:48 PM, Hai Nguyen <nhai.qn.gmail.com> wrote:
>
> > @Jose,
> > For the sake of completeness: you can use pytraj/cpptraj for this task (if
> > I understand your goal and Jason's idea correctly).
> >
> > Since pytraj is not well documented (yet), I wrote an example here as a demo:
> >
> > https://github.com/pytraj/pytraj/blob/master/examples/example_water_clustering.py
> >
> > import pytraj as pt
> >
> > # use `iterload` to save memory (it behaves like a Python generator,
> > # with fancy indexing); you can also use `load`, which is similar to `mdtraj`
> > traj = pt.iterload("../tests/data/tz2.ortho.nc",
> >                    "../tests/data/tz2.ortho.parm7")
> > # get some info
> > print(traj)
> >
> > # get a new trajectory for specific waters (oxygen atoms only)
> > wat_traj = traj[':100-500@O']
> >
> > # iterate over every frame and do clustering
> > for frame in wat_traj:
> >     xyz = frame.xyz
> >     # cluster on the x-coordinates
> >     result = pt.clustering_dataset(xyz[:, 0], 'clusters 10')
> >
> >     # cluster index for each atom
> >     print(result)
> >
> > Hai
> >
> > On Mon, Aug 3, 2015 at 3:21 PM, Jose Borreguero <borreguero.gmail.com>
> > wrote:
> >
> > > I have created a graph for every frame. Nodes in the graph are the solvent
> > > molecules, and two nodes are connected with an edge if the distance
> > > between the associated solvent molecules is below a cutoff I chose. I have
> > > systems with different solvation levels, some of them featuring "pockets"
> > > of solvent molecules. These pockets are the clusters I'm interested in.
> > > The algorithm networkx.connected_components
> > > <https://networkx.github.io/documentation/latest/reference/generated/networkx.algorithms.components.connected.connected_components.html>
> > > can find the connected clusters from a graph. To create the graph, I am using
> > > MDAnalysis to obtain the contact map between solvent molecules. Regarding
> > > time, it takes 2.2 seconds to create a contact map for 4132 solvent
> > > molecules, which I think is reasonable (unless you have many thousands of
> > > frames).
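The graph approach described above can be sketched in plain Python, without networkx or MDAnalysis, as a rough illustration: build a contact graph from a distance cutoff, then collect its connected components with a breadth-first search. The function names, the 3.5 Angstrom cutoff, and the toy coordinates are all hypothetical, and periodic boundaries are ignored here.

```python
from collections import deque
from math import dist  # Python 3.8+


def contact_graph(coords, cutoff):
    """Adjacency sets: i and j are linked if their distance is below cutoff."""
    neighbors = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if dist(coords[i], coords[j]) < cutoff:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def connected_components(neighbors):
    """Return each connected component as a sorted list of node indices."""
    seen = set()
    components = []
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            component.append(node)
            for nb in neighbors[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(sorted(component))
    return components


# hypothetical oxygen positions (Angstrom): two pockets and one isolated water
oxygens = [(0.0, 0.0, 0.0), (2.8, 0.0, 0.0), (5.5, 0.0, 0.0),
           (20.0, 20.0, 20.0), (22.5, 20.0, 20.0),
           (40.0, 0.0, 0.0)]
pockets = connected_components(contact_graph(oxygens, cutoff=3.5))
print(pockets)
```

Note that waters 0 and 2 are farther apart than the cutoff but still land in the same pocket, because both are in contact with water 1; that is the difference between connected components and a pairwise distance criterion.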
> > >
> > > On Mon, Aug 3, 2015 at 1:17 PM, Jason Swails <jason.swails.gmail.com>
> > > wrote:
> > >
> > > > On Mon, 2015-08-03 at 12:57 -0400, Jose Borreguero wrote:
> > > > > Thanks a lot, Jason. I'll go along with Python for the clustering step. I
> > > > > found the networkx module, which is very straightforward for clustering,
> > > > > and quite fast.
> > > >
> > > > How are you using networkx for this? I've used it to define a bond
> > > > graph in molecular topology before, but I don't see how networkx maps to
> > > > clustering here. Do you have a fully connected graph whose edge weights
> > > > are the distance between the nodes or something? That sounds like it
> > > > would be an expensive graph to create. Keep in mind that the PBC will
> > > > have an effect on the clusters, so how you pick the unit cell
> > > > representation is likely important.
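One common way to keep periodic boundaries from splitting a cluster is to measure distances with the minimum-image convention rather than picking a particular unit cell representation. The sketch below assumes an orthorhombic box; the function name and the box and coordinate values are hypothetical.

```python
from math import sqrt


def minimum_image_distance(a, b, box):
    """Distance between a and b in an orthorhombic periodic box.

    `box` holds the three edge lengths; each displacement component is
    wrapped to the nearest periodic image before computing the norm.
    """
    total = 0.0
    for xa, xb, length in zip(a, b, box):
        d = xa - xb
        d -= length * round(d / length)  # shift to the nearest image
        total += d * d
    return sqrt(total)


# two hypothetical waters sitting on opposite faces of a 30 Angstrom box
box = (30.0, 30.0, 30.0)
a = (0.5, 10.0, 10.0)
b = (29.0, 10.0, 10.0)

plain = sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
wrapped = minimum_image_distance(a, b, box)
print(plain, wrapped)
```

Without wrapping the two waters appear 28.5 Angstrom apart; with the minimum image they are 1.5 Angstrom apart and would fall within a typical contact cutoff, so they belong to the same cluster.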
> > > >
> > > > I've used sklearn to cluster in the past, and I've found it to be pretty
> > > > easy to use, for what that's worth.
> > > >
> > > > All the best,
> > > > Jason
> > > >
> > > > --
> > > > Jason M. Swails
> > > > BioMaPS,
> > > > Rutgers University
> > > > Postdoctoral Researcher
> > > >
> > > >
> > > > _______________________________________________
> > > > AMBER mailing list
> > > > AMBER.ambermd.org
> > > > http://lists.ambermd.org/mailman/listinfo/amber
> > > >
Received on Mon Aug 03 2015 - 15:00:03 PDT