Re: [AMBER] algorithm for clustering solvent molecules?

From: Jason Swails <>
Date: Mon, 03 Aug 2015 10:00:18 -0400

On Sat, 2015-08-01 at 21:23 -0400, Jose Borreguero wrote:
> Dear AMBER users,
> My system is very inhomogeneous regarding the spatial distribution of
> solvent molecules. I want to cluster these molecules with the algorithm
> "hieragglo" of the "cluster" command in cpptraj. However, it seems this
> command can only cluster frames because the distance metric is evaluated
> between frames. What I want to do is to cluster the solvent molecules for
> each frame, independent of other frames. Is this possible to do with
> cpptraj? If not, do you know of another program that can do this?

cpptraj can do this with some massaging. You can tell cpptraj to
cluster an arbitrary data set, so you would need to generate data sets
for the water (I presume you just want the X-, Y-, and Z-coordinates of
each oxygen atom). That is, create separate data sets for the X-, Y-,
and Z-coordinates of every water oxygen atom (which is very close to
the COM of the water molecule), and feed those 3 data sets to the
"cluster" command in cpptraj for each frame.

How you get the X-, Y-, and Z-coordinates of each water oxygen is up to
you -- you can use the "vector" command in cpptraj, you can write a
Python script to do it (perhaps with the help of a trajectory library
like pytraj or mdtraj), etc.

You can also implement the entire workflow in a couple of lines of
Python if you have the right libraries installed. For example,
scikit-learn is a machine learning library written in Python that
implements a wide array of clustering algorithms, so if you use pytraj
or mdtraj to extract the X-, Y-, and Z-coordinates, you can do the
clustering pretty easily. MDTraj is currently an easier package to
install and start using immediately. So if you install anaconda or
miniconda, you can use the following command to install the necessary
packages:

conda install -c omnia mdtraj scikit-learn

Then a simple Python script like the following should extract the
information you want:

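import mdtraj as md

# First argument: path to your trajectory file
traj = md.load('', top='your_topology.prmtop')
# Select the water oxygen indices, then slice their coordinates for
# every frame; the result has shape (n_frames, n_waters, 3), in nm
wat_xyz = traj.xyz[:, traj.top.select('resname HOH and name O'), :]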

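Then you can either write a data set for each frame and feed it to the
"cluster" command in cpptraj, or cluster the coordinates directly in
Python with scikit-learn. To write one file per frame: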
import numpy as np
for i, frame in enumerate(wat_xyz):
    np.savetxt('frame_%d.dat' % i, frame)

Note, though, that if you have a lot of water molecules, the clustering
can take a long time for each frame.
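
If you would rather stay in Python, the scikit-learn clustering
mentioned above could be applied frame by frame roughly as sketched
below. This is only an illustration under several assumptions: it uses
AgglomerativeClustering (scikit-learn 0.21 or later, for the
distance_threshold argument) as a stand-in for "hieragglo", the
cluster_frames helper and the cutoff value are hypothetical, periodic
boundary conditions are ignored, and synthetic coordinates take the
place of the wat_xyz array so the snippet runs on its own.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def cluster_frames(xyz, cutoff):
    """Cluster positions independently for each frame.

    xyz    : array of shape (n_frames, n_atoms, 3), e.g. water oxygens in nm
    cutoff : linkage distance above which clusters are not merged
             (hypothetical value; tune it for your system)
    Returns an integer label array of shape (n_frames, n_atoms).
    """
    labels = []
    for frame in xyz:
        # A fresh model per frame, so no information leaks between frames
        model = AgglomerativeClustering(
            n_clusters=None,            # let the cutoff decide the cluster count
            distance_threshold=cutoff,
            linkage="single",
        )
        labels.append(model.fit_predict(frame))
    return np.array(labels)


# Synthetic stand-in for wat_xyz: 2 frames, two well-separated blobs of 10 points
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.05, size=(2, 10, 3))
blob_b = rng.normal(loc=2.0, scale=0.05, size=(2, 10, 3))
fake_xyz = np.concatenate([blob_a, blob_b], axis=1)

labels = cluster_frames(fake_xyz, cutoff=0.5)
for i, frame_labels in enumerate(labels):
    print("frame %d: %d clusters" % (i, len(set(frame_labels))))
```

With real data you would pass wat_xyz instead of fake_xyz. Note that
the cluster labels are assigned independently in each frame, so label 0
in one frame has no relation to label 0 in another.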


Jason M. Swails
Rutgers University
Postdoctoral Researcher
AMBER mailing list
Received on Mon Aug 03 2015 - 07:00:03 PDT