Re: [AMBER] pseudo F-statistic (pSF)

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Thu, 8 Jan 2015 09:56:02 -0700

Hi,

Sorry for the delay in replying - the original ptraj clustering code
is not mine, so I have far less insight into its use. I will answer
your questions as best I can.

On Sun, Jan 4, 2015 at 11:05 AM, Jonathan Gough
<jonathan.d.gough.gmail.com> wrote:
> Not knowing what the cluster count is, it seems that the best way to
> proceed is to:
> 1. run CobWeb (clusters -1) to get an idea of the number (range of
> numbers) for natural clusters in a given data set (where the atom masks are
> the same)
> 2. Then run averagelinkage, means, and/or SOM algorithms with cluster
> values around that implicated by CobWeb and use the pSF and DBI to assess
> efficacy and choose a specific cluster count. Then proceed with analysis...

This may be a good way to go, but my personal feeling is that it is
overkill. I usually just try to get a good idea of how many clusters
there are by generating and visualizing a 2D RMS plot.
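
For readers who have not made a 2D RMS plot before, here is a minimal
Python sketch of what goes into one: the best-fit RMSD between every
pair of frames, stored as a square matrix. This is only an illustration
and is not how ptraj or cpptraj compute it internally. It assumes NumPy
is available and that the frames have already been loaded as an array of
shape (n_frames, n_atoms, 3) for the atoms of interest. The function
names kabsch_rmsd and rmsd_matrix are made up for this example.

```python
import numpy as np


def kabsch_rmsd(a, b):
    """Best-fit RMSD between two (n_atoms, 3) coordinate sets.

    Both sets are centered, then the optimal rotation is accounted for
    through the singular values of the covariance matrix (Kabsch method).
    No mass weighting is applied (an assumption of this sketch).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)
    # Flip the smallest singular value if the best orthogonal transform
    # would be a reflection rather than a proper rotation.
    if np.linalg.det(u @ vt) < 0.0:
        s[-1] = -s[-1]
    msd = (np.sum(a * a) + np.sum(b * b) - 2.0 * s.sum()) / len(a)
    return float(np.sqrt(max(msd, 0.0)))  # guard against tiny negatives


def rmsd_matrix(frames):
    """Symmetric matrix of pairwise best-fit RMSDs for all frames."""
    frames = np.asarray(frames, dtype=float)
    n = len(frames)
    mat = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mat[i, j] = mat[j, i] = kabsch_rmsd(frames[i], frames[j])
    return mat
```

The resulting matrix can be drawn as a heat map with any plotting tool.
Square blocks of low RMSD along the diagonal usually correspond to
stretches of the trajectory that stay in one conformational family,
which gives a rough idea of how many clusters to ask for.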

> I did notice that in the implementation of the agglomerative algorithms
> the ClusterMerging.txt file gives pSF and DBI values for different numbers
> of clusters. It was noted that one can set clusters to 1 and then use
> ReadMerge to generate other numbers of clusters. I'm not sure exactly how
> to use this type of functionality, but I was wondering, can you just run
> cluster 1, then use the results generated in ClusterMerging.txt to decide
> on the number of clusters and critical distance (as per your paper - low
> DBI and high pSF) for an agglomerative algorithm?

I'm not sure what paper you mean. If you're talking about the Shao,
Cheatham et al. 2007 paper, I'm not an author on that :-). As far as I know the
'readmerge' functionality was designed so you could essentially
"restart" clustering for certain algorithms, e.g. if you stop at 20
clusters with hierarchical average linkage and want to continue to 10
you can without having to repeat the previous clustering steps. At any
rate, the best advice I can give here is to just "try it and see".

> If I am looking to test out and compare different clustering algorithms, is
> there a way to tell ptraj to NOT print out the full clusters? (if all I am
> trying to do is assess the representative structures and the clustering
> metrics)

I think if you omit the 'all' keyword then ptraj should not print out
cluster trajectories. Is this not the case?

-Dan

>
> Any insight would be appreciated.
>
> Thanks,
> Jonathan
>
>
>
> On Mon, Dec 22, 2014 at 9:59 PM, Jonathan Gough <jonathan.d.gough.gmail.com>
> wrote:
>
>> As Always, Very Helpful. Thanks Dan!
>>
>> On Mon, Dec 22, 2014 at 9:30 PM, Daniel Roe <daniel.r.roe.gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> The ptraj code is still there in Amber 14, it's just not built by default.
>>> You need to go to the AmberTools/src/ptraj subdirectory and type 'make
>>> install' (after configuring serial of course). Let me know if you hit any
>>> roadblocks.
>>>
>>> Hope this helps,
>>>
>>> -Dan
>>>
>>> On Monday, December 22, 2014, Jonathan Gough <jonathan.d.gough.gmail.com>
>>> wrote:
>>>
>>> > That makes sense.
>>> >
>>> > Have all the algorithms from your "J. Chem. Theory Comput., Vol. 3, No. 6,
>>> > 2007" paper been ported over to cpptraj?
>>> >
>>> > I just realized that ptraj is not in Amber14, do you have a recommendation
>>> > on how best to install/compile ptraj (AmberTools13) alongside of a
>>> > pre-existing Amber14 installation?
>>> >
>>> > Thanks,
>>> > Jonathan
>>> >
>>> > On Fri, Dec 19, 2014 at 11:52 AM, Daniel Roe <daniel.r.roe.gmail.com>
>>> > wrote:
>>> >
>>> > > Hi,
>>> > >
>>> > > The pseudo-F calculation is not yet in the released version of
>>> > > cpptraj, but it has been in the development version for some time. It
>>> > > might possibly be released as part of an update. You could *maybe*
>>> > > calculate it manually but it would take quite a bit of scripting. You
>>> > > would need to manually calculate the centroid for each cluster and a
>>> > > centroid for all clusters (doing rms-fitting as necessary if that is
>>> > > your distance metric), then calculate the between-group and
>>> > > within-group sum of squares for each cluster. If for some reason you
>>> > > really need the pseudo-F statistic right away you'll have to use
>>> > > clustering in ptraj for now.
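
The manual calculation described above can be sketched in a few lines
of Python. This is a minimal illustration under stated assumptions, not
the ptraj implementation: it uses the usual pseudo-F definition, i.e.
(between-group sum of squares / (k - 1)) divided by (within-group sum
of squares / (n - k)) for n points in k clusters, and it treats each
frame as a plain Euclidean coordinate vector, so any RMS fitting must
already have been done. The function name pseudo_f and its arguments
are made up for this example.

```python
def pseudo_f(points, labels):
    """Pseudo-F statistic for a clustering of Euclidean points.

    points: sequence of equal-length coordinate sequences, one per frame.
    labels: sequence of cluster ids, one per frame.
    """
    n = len(points)
    dim = len(points[0])

    # Group the points by cluster id.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 clusters and more points than clusters")

    def centroid(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sq_dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    overall = centroid(points)  # centroid of all frames together
    ss_between = 0.0
    ss_within = 0.0
    for members in clusters.values():
        c = centroid(members)
        ss_between += len(members) * sq_dist(c, overall)
        ss_within += sum(sq_dist(p, c) for p in members)

    if ss_within == 0.0:
        return float("inf")  # every point sits exactly on its centroid
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Higher values mean tighter, better separated clusters, so one would
compare the value across several cluster counts rather than read a
single number in isolation.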
>>> > >
>>> > > -Dan
>>> > >
>>> > > On Thu, Dec 18, 2014 at 3:54 PM, Jonathan Gough
>>> > > <jonathan.d.gough.gmail.com> wrote:
>>> > > > Dear All,
>>> > > >
>>> > > > Is there a way to compute the pseudo F-statistic
>>> > > > (pSF), as per J. Chem. Theory Comput., Vol. 3, No. 6, 2007,
>>> > > > when clustering in cpptraj?
>>> > > >
>>> > > > Thanks
>>> > > > Jonathan
>>> > > > _______________________________________________
>>> > > > AMBER mailing list
>>> > > > AMBER.ambermd.org
>>> > > > http://lists.ambermd.org/mailman/listinfo/amber
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > -------------------------
>>> > > Daniel R. Roe, PhD
>>> > > Department of Medicinal Chemistry
>>> > > University of Utah
>>> > > 30 South 2000 East, Room 307
>>> > > Salt Lake City, UT 84112-5820
>>> > > http://home.chpc.utah.edu/~cheatham/
>>> > > (801) 587-9652
>>> > > (801) 585-6208 (Fax)
>>> > >
>>> > >
>>> >
>>>
>>>
>>>
>>
>>



-- 
-------------------------
Daniel R. Roe, PhD
Department of Medicinal Chemistry
University of Utah
30 South 2000 East, Room 307
Salt Lake City, UT 84112-5820
http://home.chpc.utah.edu/~cheatham/
(801) 587-9652
(801) 585-6208 (Fax)
Received on Thu Jan 08 2015 - 09:00:04 PST