Re: [AMBER] algorithm for clustering solvent molecules?

From: Jose Borreguero <>
Date: Mon, 3 Aug 2015 12:57:08 -0400

Thanks a lot, Jason. I'll go along with python for the clustering step. I
found the networkx module, which is very straightforward for clustering and
quite fast.
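
For reference, here is a minimal sketch of what a per-frame clustering step
could look like. It is not the networkx code itself: the message does not
say which networkx routine was used, so this assumes the simplest reading,
namely that two molecules belong to the same cluster when a chain of
neighbours closer than some cutoff connects them (the connected components
of a distance-cutoff graph). It uses only the standard library. The function
name, the cutoff value, and the sample coordinates are all hypothetical, and
periodic boundary conditions are ignored.

```python
import math


def cluster_by_cutoff(coords, cutoff):
    """Group points into connected components of a distance-cutoff graph.

    Two points end up in the same cluster if they are linked by a chain of
    points in which each consecutive pair is closer than `cutoff`.
    Periodic images are NOT considered (assumption: plain Euclidean distance).
    """
    parent = list(range(len(coords)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Join every pair of points that lies within the cutoff.
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    # Collect member indices per root.
    clusters = {}
    for i in range(len(coords)):
        clusters.setdefault(find(i), []).append(i)

    # Largest cluster first; ties broken by lowest member index.
    return sorted(clusters.values(), key=lambda c: (-len(c), c[0]))


# Hypothetical oxygen coordinates for a single frame.
frame = [
    (0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.5, 0.1, 0.0),
    (3.0, 3.0, 3.0), (3.2, 3.0, 3.0),
    (8.0, 8.0, 8.0),
]
clusters = cluster_by_cutoff(frame, cutoff=0.35)
print(clusters)
```

Running this once per frame keeps the frames independent of each other. The
double loop is quadratic in the number of molecules, so a neighbour search
or a graph library would likely be preferable for large systems.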

On Mon, Aug 3, 2015 at 10:00 AM, Jason Swails <> wrote:

> On Sat, 2015-08-01 at 21:23 -0400, Jose Borreguero wrote:
> > Dear AMBER users,
> >
> > My system is very inhomogeneous regarding the spatial distribution of
> > solvent molecules. I want to cluster these molecules with the algorithm
> > "hieragglo" of the "cluster" command in cpptraj. However, it seems this
> > command can only cluster frames because the distance metric is evaluated
> > between frames. What I want to do is to cluster the solvent molecules for
> > each frame, independent of other frames. Is this possible to do with
> > cpptraj? If not, do you know of another program that can do this?
> cpptraj can do this with some massaging. You can tell cpptraj to
> cluster an arbitrary data set, so what you would need to do is generate
> datasets for the water (I presume you just want the x, y, and
> z-coordinates of each oxygen atom). Basically you should create
> separate data sets for the X-, Y-, and Z-coordinates of every water
> oxygen atom (which is basically the same as the COM of the water
> molecule), and feed those 3 data sets to the "cluster" command in
> cpptraj for each frame.
> How you get the X-, Y-, and Z-coordinates of each water oxygen is up to
> you -- you can use the "vector" command in cpptraj, you can write a
> Python script to do it (perhaps with the help of a trajectory library
> like pytraj or mdtraj), etc.
> You can also implement the entire workflow in a couple lines of Python
> if you have the right libraries installed. For example, scikit-learn is
> a machine learning library written in Python that implements a wide
> array of clustering algorithms, and you can use pytraj or mdtraj to
> extract the X-, Y-, and Z-coordinates pretty easily. MDTraj is currently
> an easier package to install and start using immediately. So if you
> install anaconda or miniconda, you can use the following commands to
> install the necessary packages:
> conda install -c omnia mdtraj scikit-learn
> Then a simple Python script like the following should extract the
> information you want:
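> import mdtraj as md
> # The trajectory file name is missing from the archived message
> traj = md.load('', top='your_topology.prmtop')
> # Water oxygen coordinates, shape (n_frames, n_waters, 3)
> wat_xyz = traj.xyz[:, traj.top.select('resname HOH and name O'), :]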
> Then you can either write a dataset with that data and feed it to
> cpptraj:
> import numpy as np
> # Write one file of X, Y, Z columns per frame
> for i, frame in enumerate(wat_xyz):
>     np.savetxt('frame_%d.dat' % i, frame)
> Note, though, that if you have a lot of water molecules, the clustering
> can take a long time for each frame.
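
To illustrate why the per-frame cost grows quickly, here is a naive sketch
of bottom-up (hierarchical agglomerative) clustering on a single frame's
coordinates. It is not cpptraj's implementation. It assumes average linkage
and a fixed target number of clusters, uses only the standard library, and
ignores periodic boundaries. The function name and the sample points are
hypothetical.

```python
import math


def average_linkage(coords, n_clusters):
    """Naive agglomerative clustering with average linkage.

    Start with one cluster per point, then repeatedly merge the two
    clusters whose average inter-point distance is smallest, until only
    `n_clusters` remain. Returns lists of point indices.
    """
    clusters = [[i] for i in range(len(coords))]

    def avg_dist(a, b):
        # Mean distance over all cross-cluster point pairs.
        total = sum(math.dist(coords[i], coords[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Scan every pair of clusters for the closest one (the slow part).
        _, p, q = min(
            (avg_dist(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        # Merge cluster q into cluster p; q > p, so deleting q keeps p valid.
        clusters[p] = sorted(clusters[p] + clusters[q])
        del clusters[q]
    return clusters


# Hypothetical oxygen coordinates for a single frame.
points = [
    (0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.5, 0.0, 0.0),
    (4.0, 4.0, 4.0), (4.3, 4.0, 4.0),
]
result = average_linkage(points, n_clusters=2)
print(result)
```

Every merge rescans all cluster pairs and recomputes their distances, so
the work rises steeply with the number of water molecules, and it is
repeated for each frame.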
> HTH,
> Jason
> --
> Jason M. Swails
> BioMaPS,
> Rutgers University
> Postdoctoral Researcher
> _______________________________________________
> AMBER mailing list
Received on Mon Aug 03 2015 - 10:00:03 PDT