Re: [AMBER] Error with using dpeaks clustering in CPPTRAJ

From: Yinglong Miao <yinglong.miao.gmail.com>
Date: Fri, 17 Jan 2020 09:09:34 -0600

Hi Dan,

Thank you for looking into this. We have actually been using dbscan and got some unsatisfactory results, in this case for clustering ligand snapshots during binding to a protein. Many of the snapshots were assigned to a cluster “-1”, which seems to include widely distributed ligand conformations in both the bound and unbound states. Where does this “-1” cluster come from? Do you have any thoughts on how to assign these snapshots more accurately to the correct clusters?

Given what we have for dbscan, I wanted to try dpeaks, hoping it could resolve the issue. I will lower epsilon as you suggested and see if it can at least complete the calculation. From the initial outputs, it seems there is also a “-1” cluster, though. Since we have a very large number of simulation frames for clustering, the sieve option is essential to avoid memory problems. When might you be able to make that available for dpeaks?

Thanks again,
Yinglong


> On Jan 17, 2020, at 8:41 AM, Daniel Roe <daniel.r.roe.gmail.com> wrote:
>
> PPS - Also note that adding back sieved frames isn't yet implemented
> for 'dpeaks', so you may just want to use another clustering method.
> If you want to stick with density based there's 'dbscan'...
>
> On Fri, Jan 17, 2020 at 9:37 AM Daniel Roe <daniel.r.roe.gmail.com> wrote:
>>
>> PS - If you're really interested, prior to the clustering command you
>> can use 'debug <#>' (where <#> is greater than 0) to print more
>> potentially helpful information. In the output you will see 'DBG: Max
>> dist=' which will show the maximum distance observed between points;
>> epsilon should be less than this. I should probably have that printed
>> by default.
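A minimal sketch of the check described above (epsilon should be smaller than the maximum distance between points), assuming Euclidean distance on toy 1-D points rather than cpptraj's own distance metric. The helper names `max_pairwise_distance` and `epsilon_is_usable` are hypothetical and are not cpptraj functions.

```python
import itertools
import math


def max_pairwise_distance(points):
    """Largest distance between any two points (O(N^2) over all pairs)."""
    return max(math.dist(p, q) for p, q in itertools.combinations(points, 2))


def epsilon_is_usable(points, epsilon):
    """True if epsilon is smaller than the largest observed distance."""
    return epsilon < max_pairwise_distance(points)


# Toy data (hypothetical, not taken from the thread).
points = [(0.0,), (0.5,), (1.0,), (10.0,), (10.5,)]

max_dist = max_pairwise_distance(points)
too_large_ok = epsilon_is_usable(points, 20.0)  # epsilon exceeds every distance
smaller_ok = epsilon_is_usable(points, 4.0)     # epsilon below the maximum
```

If epsilon is at or above the maximum distance, every point lies within epsilon of every other point, so a density count based on epsilon cannot distinguish between points.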
>>
>> Thanks for the report by the way.
>>
>> On Fri, Jan 17, 2020 at 9:34 AM Daniel Roe <daniel.r.roe.gmail.com> wrote:
>>>
>>> OK - I've been looking at this for a bit. I think that the problem
>>> must be that all your points are too close, i.e. all points are within
>>> epsilon from each other. Your dvdfile backs that up - the first column
>>> is '#Density', which just means # of points that are within epsilon
>>> from that point. In each case the #Density is 1249, indicating that
>>> everyone is too tight. I think if you lower epsilon you'll start to
>>> get better results.
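A minimal sketch of the discrete density described above, assuming Euclidean distance on toy 1-D points and counting, for each point, the other points within epsilon. This illustrates the idea only and is not cpptraj's implementation; `discrete_density` and `nearest_higher_density` are hypothetical names, and the tie handling shown here is an assumption.

```python
import math


def discrete_density(points, epsilon):
    """For each point, count the other points lying within epsilon of it."""
    counts = []
    for i, p in enumerate(points):
        n = sum(1 for j, q in enumerate(points)
                if j != i and math.dist(p, q) <= epsilon)
        counts.append(n)
    return counts


def nearest_higher_density(points, density):
    """Index of the closest point with strictly higher density, or -1 if none."""
    result = []
    for i, p in enumerate(points):
        best, best_d = -1, math.inf
        for j, q in enumerate(points):
            if density[j] > density[i]:
                d = math.dist(p, q)
                if d < best_d:
                    best, best_d = j, d
        result.append(best)
    return result


# Toy data (hypothetical, not the poster's trajectory).
points = [(0.0,), (0.5,), (1.0,), (10.0,), (10.5,)]

# Epsilon larger than every pairwise distance: all densities are identical,
# so no point has a neighbor with higher density.
dens_large = discrete_density(points, epsilon=20.0)
nn_large = nearest_higher_density(points, dens_large)

# Smaller epsilon: densities differ, and most points find such a neighbor.
dens_small = discrete_density(points, epsilon=0.75)
nn_small = nearest_higher_density(points, dens_small)
```

In the thread, a #Density of 1249 for each of the 1250 sieved frames is consistent with the first case: every frame has all 1249 other frames within epsilon, so no frame has a higher-density neighbor.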
>>>
>>> This is probably a case that cpptraj should trap. In my (limited)
>>> defense, it does state that the 'dpeaks' implementation is under
>>> development...
>>>
>>> So in summary, try lowering epsilon and see if that helps. I'll work
>>> on an update to trap the case where epsilon is too large.
>>>
>>> Hope this helps,
>>>
>>> -Dan
>>>
>>> On Tue, Jan 14, 2020 at 12:34 PM <yinglong.miao.gmail.com> wrote:
>>>>
>>>> I have also tried the gauss option. It gave the following output:
>>>> ACTION OUTPUT:
>>>>
>>>> ANALYSIS: Performing 1 analyses:
>>>> 0: [cluster C0 dpeaks epsilon 4 dvdfile dvdfile choosepoints auto runavg
>>>> runavg.dat deltafile delta.dat sieve 200 gauss]
>>>> Starting clustering.
>>>> Mask [*] corresponds to 15 atoms.
>>>> Estimated pair-wise matrix memory usage: > 3.123 MB
>>>> Pair-wise matrix set up with sieve, 250000 frames, 1250 sieved frames.
>>>> Calculating pair-wise distances.
>>>> 0% 10% 20% 30% 40% 50% 60% 70% 80% 90%
>>>>
>>>> No error message was given but also no further output ...
>>>>
>>>> Thanks,
>>>> Yinglong
>>>>
>>>>
>>>> On Tue, Jan 14, 2020 at 9:47 AM Daniel Roe <daniel.r.roe.gmail.com> wrote:
>>>>
>>>>> Can you provide me (either in reply to this or off list) your entire
>>>>> cpptraj output and the contents of dvdfile?
>>>>>
>>>>> This could happen with very sparse density I think, although it's
>>>>> difficult to say without exactly replicating. You could potentially
>>>>> try the 'gauss' keyword for Gaussian density instead of discrete
>>>>> density.
>>>>>
>>>>> -Dan
>>>>>
>>>>> On Mon, Jan 13, 2020 at 8:10 PM Yinglong Miao <yinglong.miao.gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi Dan,
>>>>>>
>>>>>> It’s the latest version in the AMBER git repository.
>>>>>>
>>>>>> Thanks,
>>>>>> Yinglong
>>>>>>
>>>>>>
>>>>>>> On Jan 13, 2020, at 6:20 PM, Daniel Roe <daniel.r.roe.gmail.com> wrote:
>>>>>>>
>>>>>>> What version of cpptraj are you using?
>>>>>>>
>>>>>>> -Dan
>>>>>>>
>>>>>>> On Mon, Jan 13, 2020 at 6:51 PM <yinglong.miao.gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I tried to use the dpeaks algorithm for clustering with the following
>>>>>>>> command:
>>>>>>>> cluster C0 dpeaks epsilon 4 dvdfile dvdfile choosepoints auto runavg
>>>>>>>> runavg.dat deltafile delta.dat sieve 200
>>>>>>>>
>>>>>>>> But I keep getting the following output with an error:
>>>>>>>> ...
>>>>>>>> Finding closest neighbor point with higher density for each point.
>>>>>>>> 0% 10% 20% 30% 40% 50% 60% 70% 80% 90%
>>>>>>>> Internal Error: In Cluster_DPeaks::AssignClusterNum nearest neighbor is -1.
>>>>>>>> Segmentation fault (core dumped)
>>>>>>>>
>>>>>>>> I would appreciate any suggestions that would fix this ...
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Yinglong
>>>>>>>>
>>>>>>>> Yinglong Miao, Ph.D.
>>>>>>>> Assistant Professor
>>>>>>>> Center for Computational Biology and
>>>>>>>> Department of Molecular Biosciences
>>>>>>>> University of Kansas
>>>>>>>> http://miao.compbio.ku.edu
>>>>>>>> _______________________________________________
>>>>>>>> AMBER mailing list
>>>>>>>> AMBER.ambermd.org
>>>>>>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>>>>


Received on Fri Jan 17 2020 - 07:30:02 PST