Lund University Publications

LUND UNIVERSITY LIBRARIES

Searching Remote Homology with Spectral Clustering with Symmetry in Neighborhood Cluster Kernels

Maulik, Ujjwal and Sarkar, Anasua (2013) In PLoS ONE 8(2).
Abstract

Remote homology detection among proteins using only unlabelled sequences is a central problem in comparative genomics. Existing cluster kernel methods based on neighborhoods and profiles, together with Markov clustering algorithms, are currently the most popular methods for protein family recognition. In these methods, the deviation from random walks introduced by inflation, or the dependence on a hard threshold in the similarity measure, calls for an enhancement to detect homology among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity to increase sensitivity in detecting homology independently of "recent" paralogs. The spectral clustering approach with new combined local alignment kernels exploits the unlabelled protein sequences more effectively at the global level, reducing inter-cluster walks. When combined with corrections based on a modified symmetry-based proximity norm that de-emphasizes outliers, the technique proposed in this article outperforms the other state-of-the-art cluster kernels among all twelve implemented kernels. A comparison with the state-of-the-art string and mismatch kernels also shows superior performance scores for the proposed kernels. A similar performance improvement is also found on an existing large dataset. Therefore, the proposed spectral clustering framework over combined local alignment kernels with modified symmetry-based correction achieves superior performance for unsupervised remote homolog detection, even for multi-domain and promiscuous-domain proteins from Genolevures database families, with better biological relevance. Source code is available upon request. Contact: sarkar@labri.fr.
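The abstract combines two generic ingredients: a neighborhood cluster kernel, which scores two sequences by averaging a base similarity over their neighbors, and spectral clustering of the resulting kernel matrix. The sketch below illustrates only these textbook forms under stated assumptions. It is not the authors' implementation (their source code is available only on request), and it omits both the Markov similarity and the modified symmetry-based correction. The base similarity matrix, the neighbor threshold and the function names `neighborhood_kernel` and `spectral_bipartition` are hypothetical.

```python
import math


def neighborhood_kernel(base, threshold):
    """Average a base similarity over the neighbourhoods of two sequences.

    base: symmetric n x n similarity matrix with positive diagonal (assumed
    precomputed, e.g. normalized local-alignment scores).
    threshold: hypothetical cut-off; j is a neighbour of i when
    base[i][j] >= threshold, and i is always its own neighbour.
    """
    n = len(base)
    nbrs = [[j for j in range(n) if j == i or base[i][j] >= threshold]
            for i in range(n)]
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            total = sum(base[a][b] for a in nbrs[i] for b in nbrs[j])
            k[i][j] = k[j][i] = total / (len(nbrs[i]) * len(nbrs[j]))
    # Normalize so that every sequence has unit self-similarity.
    root = [math.sqrt(k[i][i]) for i in range(n)]
    return [[k[i][j] / (root[i] * root[j]) for j in range(n)]
            for i in range(n)]


def spectral_bipartition(w, iters=500):
    """Two-way spectral split of a symmetric, non-negative affinity matrix.

    Finds the second eigenvector of M = D^-1/2 W D^-1/2 (D = row sums) by
    power iteration and splits the items by its sign, as in normalized-cut
    bipartitioning. Assumes every row of w has a positive sum.
    """
    n = len(w)
    s = [math.sqrt(sum(row)) for row in w]
    m = [[w[i][j] / (s[i] * s[j]) for j in range(n)] for i in range(n)]
    # The leading eigenvector of M (eigenvalue 1) is proportional to sqrt(D).
    norm = math.sqrt(sum(x * x for x in s))
    u1 = [x / norm for x in s]
    v = [float(i + 1) for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        # Project out u1, then apply M + I: the shift makes every eigenvalue
        # non-negative, so the iteration converges to the second eigenvector.
        dot = sum(a * b for a, b in zip(v, u1))
        v = [a - dot * b for a, b in zip(v, u1)]
        v = [sum(m[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
    # D^-1/2 v has the same signs as v because all degrees are positive.
    return [0 if x >= 0 else 1 for x in v]


if __name__ == "__main__":
    # Toy data: sequences 0-2 and 3-5 form two hypothetical families.
    base = [[1.0 if i == j else (0.8 if (i < 3) == (j < 3) else 0.05)
             for j in range(6)] for i in range(6)]
    kernel = neighborhood_kernel(base, threshold=0.5)
    print(spectral_bipartition(kernel))
```

In this toy example the neighbor threshold already pulls each family together before clustering, so the sign split separates sequences 0 to 2 from sequences 3 to 5. A realistic pipeline would look for more than two clusters, for example by running k-means on several leading eigenvectors instead of splitting on the sign of one.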

author
Maulik, Ujjwal and Sarkar, Anasua
publishing date
2013-02
type
Contribution to journal
publication status
published
in
PLoS ONE
volume
8
issue
2
article number
e46468
publisher
Public Library of Science (PLoS)
external identifiers
  • pmid:23457439
  • scopus:84874025140
ISSN
1932-6203
DOI
10.1371/journal.pone.0046468
language
English
LU publication?
no
id
c4108cec-8d99-43e7-9e7d-0b74525c38ad
date added to LUP
2018-10-09 09:54:14
date last changed
2024-06-10 19:13:14
@article{c4108cec-8d99-43e7-9e7d-0b74525c38ad,
  abstract     = {{Remote homology detection among proteins using only unlabelled sequences is a central problem in comparative genomics. Existing cluster kernel methods based on neighborhoods and profiles, together with Markov clustering algorithms, are currently the most popular methods for protein family recognition. In these methods, the deviation from random walks introduced by inflation, or the dependence on a hard threshold in the similarity measure, calls for an enhancement to detect homology among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity to increase sensitivity in detecting homology independently of "recent" paralogs. The spectral clustering approach with new combined local alignment kernels exploits the unlabelled protein sequences more effectively at the global level, reducing inter-cluster walks. When combined with corrections based on a modified symmetry-based proximity norm that de-emphasizes outliers, the technique proposed in this article outperforms the other state-of-the-art cluster kernels among all twelve implemented kernels. A comparison with the state-of-the-art string and mismatch kernels also shows superior performance scores for the proposed kernels. A similar performance improvement is also found on an existing large dataset. Therefore, the proposed spectral clustering framework over combined local alignment kernels with modified symmetry-based correction achieves superior performance for unsupervised remote homolog detection, even for multi-domain and promiscuous-domain proteins from Genolevures database families, with better biological relevance. Source code is available upon request. Contact: sarkar@labri.fr.}},
  author       = {{Maulik, Ujjwal and Sarkar, Anasua}},
  issn         = {{1932-6203}},
  language     = {{eng}},
  month        = {{02}},
  number       = {{2}},
  publisher    = {{Public Library of Science (PLoS)}},
  journal      = {{PLoS ONE}},
  title        = {{Searching Remote Homology with Spectral Clustering with Symmetry in Neighborhood Cluster Kernels}},
  url          = {{http://dx.doi.org/10.1371/journal.pone.0046468}},
  doi          = {{10.1371/journal.pone.0046468}},
  volume       = {{8}},
  year         = {{2013}},
}