How to find simple and accurate rules for viral protease cleavage specificities

Rognvaldsson, Thorsteinn; Etchells, Terence A.; You, Liwen; Garwicz, Daniel; Jarman, Ian; Lisboa, Paulo J. G.

Rognvaldsson, Thorsteinn; Etchells, Terence A.; You, Liwen; Garwicz, Daniel; Jarman, Ian and Lisboa, Paulo J. G. (2009). In BMC Bioinformatics 10.

Abstract:

Background: Proteases of human pathogens are becoming increasingly important drug targets, hence it is necessary to understand their substrate specificity and to interpret this knowledge in practically useful ways. New methods are being developed that produce large amounts of cleavage information for individual proteases and some have been applied to extract cleavage rules from data. However, the hitherto proposed methods for extracting rules have been neither easy to understand nor very accurate. To be practically useful, cleavage rules should be accurate, compact, and expressed in an easily understandable way.

Results: A new method is presented for producing cleavage rules for viral proteases with seemingly complex cleavage profiles. The method is based on orthogonal search-based rule extraction (OSRE) combined with spectral clustering. It is demonstrated on substrate data sets for human immunodeficiency virus type 1 (HIV-1) protease and hepatitis C (HCV) NS3/4A protease, showing excellent prediction performance for both HIV-1 cleavage and HCV NS3/4A cleavage, agreeing with observed HCV genotype differences. New cleavage rules (consensus sequences) are suggested for HIV-1 and HCV NS3/4A cleavages. The practical usability of the method is also demonstrated by using it to predict the location of an internal cleavage site in the HCV NS3 protease and to correct the location of a previously reported internal cleavage site in the HCV NS3 protease. The method is fast to converge and yields accurate rules, on par with previous results for HIV-1 protease and better than previous state-of-the-art for HCV NS3/4A protease. Moreover, the rules are fewer and simpler than previously obtained with rule extraction methods.

Conclusion: A rule extraction methodology by searching for multivariate low-order predicates yields results that significantly outperform existing rule bases on out-of-sample data, but are more transparent to expert users. The approach yields rules that are easy to use and useful for interpreting experimental data.
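The abstract describes cleavage rules expressed as consensus sequences and their use for locating cleavage sites. As a rough illustration of how rules of that general shape could be applied, the sketch below scans a sequence with an eight-residue window (P4 to P4') and reports windows that satisfy any rule. It assumes a rule is a conjunction of allowed residues at selected positions; that representation, the names `matches` and `predict_sites`, the rule set `TOY_RULES`, and the sequence are all hypothetical and are not taken from the paper. It does not implement OSRE or spectral clustering.

```python
from typing import Dict, List, Set

# Octamer positions around the scissile bond, which lies between P1 and P1'.
POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]

# Assumed rule format: position -> set of allowed residues (all must hold).
Rule = Dict[str, Set[str]]


def matches(octamer: str, rule: Rule) -> bool:
    """True if every constrained position holds an allowed residue."""
    if len(octamer) != len(POSITIONS):
        raise ValueError("expected an 8-residue window")
    residues = dict(zip(POSITIONS, octamer))
    return all(residues[pos] in allowed for pos, allowed in rule.items())


def predict_sites(sequence: str, rules: List[Rule]) -> List[int]:
    """Return 1-based indices of the P1 residue for windows matching any rule."""
    sites = []
    for start in range(len(sequence) - len(POSITIONS) + 1):
        window = sequence[start:start + len(POSITIONS)]
        if any(matches(window, rule) for rule in rules):
            sites.append(start + 4)  # P1 is the 4th residue of the window
    return sites


# Toy rules and toy sequence for illustration only, NOT from the paper.
TOY_RULES: List[Rule] = [
    {"P1": {"C", "T"}, "P1'": {"S", "A"}},
    {"P2": {"E"}, "P1": {"L"}},
]
toy_sequence = "MKDEVVCSGAELQWTRPG"

print(predict_sites(toy_sequence, TOY_RULES))
```

Keeping each rule as a small dictionary means a rule stays readable by eye, which is the property the abstract emphasizes; real rules would have to come from the paper's results or from a rule extraction run on substrate data.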

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/1463422

author

Rognvaldsson, Thorsteinn; Etchells, Terence A.; You, Liwen; Garwicz, Daniel; Jarman, Ian and Lisboa, Paulo J. G.

organization

Computational Biology and Biological Physics

publishing date

2009

type

Contribution to journal

publication status

published

subject

Bioinformatics and Computational Biology

in

BMC Bioinformatics

volume

10

publisher

BioMed Central (BMC)

external identifiers

wos:000267595400003
scopus:67650914275
pmid:19445713

ISSN

1471-2105

DOI

10.1186/1471-2105-10-149

language

English

LU publication?

yes

id

4b683523-593e-4ac0-a11f-5bf60ada180a (old id 1463422)

date added to LUP

2016-04-01 13:28:59

date last changed

2025-10-14 11:45:27

@article{4b683523-593e-4ac0-a11f-5bf60ada180a,
  abstract     = {{Background: Proteases of human pathogens are becoming increasingly important drug targets, hence it is necessary to understand their substrate specificity and to interpret this knowledge in practically useful ways. New methods are being developed that produce large amounts of cleavage information for individual proteases and some have been applied to extract cleavage rules from data. However, the hitherto proposed methods for extracting rules have been neither easy to understand nor very accurate. To be practically useful, cleavage rules should be accurate, compact, and expressed in an easily understandable way. Results: A new method is presented for producing cleavage rules for viral proteases with seemingly complex cleavage profiles. The method is based on orthogonal search-based rule extraction (OSRE) combined with spectral clustering. It is demonstrated on substrate data sets for human immunodeficiency virus type 1 (HIV-1) protease and hepatitis C (HCV) NS3/4A protease, showing excellent prediction performance for both HIV-1 cleavage and HCV NS3/4A cleavage, agreeing with observed HCV genotype differences. New cleavage rules (consensus sequences) are suggested for HIV-1 and HCV NS3/4A cleavages. The practical usability of the method is also demonstrated by using it to predict the location of an internal cleavage site in the HCV NS3 protease and to correct the location of a previously reported internal cleavage site in the HCV NS3 protease. The method is fast to converge and yields accurate rules, on par with previous results for HIV-1 protease and better than previous state-of-the-art for HCV NS3/4A protease. Moreover, the rules are fewer and simpler than previously obtained with rule extraction methods. Conclusion: A rule extraction methodology by searching for multivariate low-order predicates yields results that significantly outperform existing rule bases on out-of-sample data, but are more transparent to expert users. 
The approach yields rules that are easy to use and useful for interpreting experimental data.}},
  author       = {{Rognvaldsson, Thorsteinn and Etchells, Terence A. and You, Liwen and Garwicz, Daniel and Jarman, Ian and Lisboa, Paulo J. G.}},
  issn         = {{1471-2105}},
  language     = {{eng}},
  publisher    = {{BioMed Central (BMC)}},
  journal      = {{BMC Bioinformatics}},
  title        = {{How to find simple and accurate rules for viral protease cleavage specificities}},
  url          = {{http://dx.doi.org/10.1186/1471-2105-10-149}},
  doi          = {{10.1186/1471-2105-10-149}},
  volume       = {{10}},
  year         = {{2009}},
}
