Dear Jianyin,
Firstly, sorry that I took some time to reply. I have gone through the recipe that you outline below once more, and again, by hand, I can reproduce the DBI for RMSDs but not for DMEs.
I tried a trajectory that I clustered into two clusters, so as to circumvent the max in fred(X), and even with a simple test like this I am unable to reproduce the DME numbers.
To calculate the DBI by hand I used the two numbers from the clustering output, i.e.:
#Cluster 0 has average-distance-to-centroid 1.556713
#Cluster 1 has average-distance-to-centroid 1.454760
and then computed the DME between centroid0 and centroid1, using the formula that you use. With only two clusters the DBI should then just be (1.55+1.45)/(the number I calculated); however, it is way off what is reported in the clustering output: DBI: 2.34419 versus DBI: 8.6.
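To make the hand calculation explicit, here is a minimal sketch of the DBI recipe quoted below, written so that any centroid-to-centroid distance (RMSD or DME) can be plugged in. The function names `davies_bouldin` and `implied_centroid_distance` are my own, not anything from ptraj, and the code is only an illustration of the formula, not of how ptraj implements it.

```python
from typing import Callable, Sequence


def davies_bouldin(avg_dist_to_centroid: Sequence[float],
                   centroid_distance: Callable[[int, int], float]) -> float:
    """DBI = average over clusters X of max over Y != X of (C(X) + C(Y)) / d(X, Y).

    avg_dist_to_centroid[k] is C(k), the average distance of the members of
    cluster k to its centroid; centroid_distance(x, y) is d(X, Y).
    """
    n = len(avg_dist_to_centroid)
    if n < 2:
        raise ValueError("the DBI needs at least two clusters")
    total = 0.0
    for x in range(n):
        # fred(X): worst-case ratio against every other cluster.
        # A zero centroid distance raises ZeroDivisionError here.
        total += max(
            (avg_dist_to_centroid[x] + avg_dist_to_centroid[y])
            / centroid_distance(x, y)
            for y in range(n) if y != x
        )
    return total / n


def implied_centroid_distance(c0: float, c1: float, dbi: float) -> float:
    """Two-cluster case only: DBI = (C0 + C1) / d, so d = (C0 + C1) / DBI."""
    return (c0 + c1) / dbi
```

With exactly two clusters the max disappears, so a reported DBI can be inverted with `implied_centroid_distance(1.556713, 1.454760, dbi)` to see which centroid-to-centroid distance it corresponds to. Comparing that distance with the hand-computed DME shows directly how far apart the two calculations are.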
Again, I reiterate that doing the same thing with RMSDs gives me no problem. This suggests that perhaps my DME formula is incorrect. However, I assure you that I am using the one from the 1994 Torda paper, which is the one you quote below.
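For concreteness, this is a small sketch of the DME as I read the formulas quoted below: the first form (squared differences divided by C(N,2)) and the third form (all N^2 ordered pairs, including the zero diagonal). The names `pair_distances` and `dme`, and the `variant` argument, are hypothetical and exist only for this illustration; coordinates are assumed to be plain lists of (x, y, z) triples. I left out the second quoted form because, taken literally, its signed sum can be negative under the square root.

```python
import math
from itertools import combinations
from typing import List, Sequence

Coords = Sequence[Sequence[float]]  # one (x, y, z) triple per atom


def pair_distances(coords: Coords) -> List[float]:
    """Intramolecular distances for all atom pairs i < j."""
    return [math.dist(coords[i], coords[j])
            for i, j in combinations(range(len(coords)), 2)]


def dme(x: Coords, y: Coords, variant: str = "squared") -> float:
    """Distance matrix error between two structures with the same atoms.

    variant="squared":   sqrt(sum_{i<j} (dX_ij - dY_ij)^2 / C(N,2))
    variant="all_pairs": sqrt(sum_{i,j} (dX_ij - dY_ij)^2 / N^2)
    """
    if len(x) != len(y):
        raise ValueError("structures must have the same number of atoms")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two atoms")
    sum_sq = sum((a - b) ** 2
                 for a, b in zip(pair_distances(x), pair_distances(y)))
    if variant == "squared":
        return math.sqrt(sum_sq / (n * (n - 1) / 2))
    if variant == "all_pairs":
        # Every i < j pair appears twice among ordered pairs; the diagonal adds 0.
        return math.sqrt(2.0 * sum_sq / (n * n))
    raise ValueError(f"unknown variant: {variant!r}")
```

The two forms differ only by a factor of sqrt((N-1)/N), which is close to 1 for any realistic number of atoms, so a choice between them would not explain a large discrepancy. Note also that the DME itself does not depend on how either structure is rotated or translated, since it uses only internal distances.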
Is there something obvious that I'm missing?
Thanks in advance,
Geoff.
On May 10, 2010, at 7:10 PM, Jianyin Shao wrote:
> Hi Geoffrey,
>
> The calculation of DBI when DME is used is the same as the calculation of DBI
> when rmsd is used. The DBI is defined as the average, for all clusters X, of
> fred, where fred(X) = max, across other clusters Y, of (Cx + Cy)/dXY. Here
> Cx is the average distance from points in X to the centroid, similarly Cy,
> and dXY is the distance between cluster centroids:
>
> DBI = average(fred(X))
> fred(X) = max((C(X)+C(Y))/distance(X,Y))
> C(X) = average(distance(x, x-centroid))
> C(Y) = average(distance(y, y-centroid))
> distance(X,Y) = distance(x-centroid, y-centroid)
>
> Since you can reproduce the DBI value with the rmsd metric, I guess the
> problem does not lie in the DBI calculation. Here are two possible reasons
> why the DME might go wrong.
> 1. With the rmsd metric, the structures are implicitly aligned when the
> centroid of a cluster is calculated. There is no alignment before
> calculating the centroid when the DME metric is used.
> 2. When calculating DME, we used the formula:
> DME = sqrt(sum((distance(X,i,j)-distance(Y,i,j))^2)/C(N,2)), in which
> distance(X,i,j) is the distance between atom i and atom j in structure X.
> C(N,2) is N*(N-1)/2. But I've also seen formulas like this:
> DME = sqrt(sum(distance(X,i,j)-distance(Y,i,j))/C(N,2)) (sum up the
> distance error, not the square of distance error) or
> DME = sqrt(sum((distance(X,i,j)-distance(Y,i,j))^2)/N^2) (sum up all the
> pairs, including (i,i), which should be 0, then divided by the number of
> total pairs, N^2)
>
> I don't know how you calculate the DME. Hope this explanation helps.
>
> Best,
>
> Jianyin Shao
>
>
> On Mon, May 3, 2010 at 2:48 PM, Geoffrey Wood <ge.ffw.d.gmail.com> wrote:
>
>> Dear Amber Users,
>>
>> I have been looking carefully at the clustering tools in ptraj, in particular
>> the cluster validity testing using the Davies-Bouldin index (DBI). If my
>> set measure is RMSD then I am able to reproduce the DBI when I calculate it
>> "by hand" using ptraj, i.e. taking the RMSD of each centroid with the
>> others and applying the DBI formula. However, when I take the set measure
>> to be the DME then I am unable to reproduce the values that ptraj
>> calculates. In this case ptraj doesn't have a DME command but it is easy
>> enough to use the distance command and calculate this matrix by hand. My
>> question is: how is ptraj calculating the DBI when the set measure is DME?
>>
>> Thanks in advance, Geoffrey Wood
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>>
Received on Mon Jul 05 2010 - 10:30:03 PDT