Physicochemical feature-based classification of amino acid mutations.

Shen, Bairong; Bai, Jinwei; Vihinen, Mauno

(2008) In Protein Engineering Design & Selection 21(1). p.37-44

Abstract: A huge quantity of gene and protein sequences have become available during the post-genomic era, and information about genetic variations, including amino acid substitutions and SNPs, is accumulating rapidly. To understand the effects of these changes, it is often essential to apply bioinformatics tools. Where there is a lack of homologous sequences or a three-dimensional structure, it becomes essential to predict the effects of mutations based solely on protein sequence information. Several computational methods utilizing machine learning techniques have been developed. These predictions generally use the 20-alphabet amino acid code to train the model. With limited available data, the 20-alphabet amino acid features may introduce so many parameters that the model becomes over-fitted. To decrease the number of parameters, we propose a physicochemical feature-based method to forecast the effects of amino acid substitutions on protein stability. Protein structure alterations caused by mutations can be classified as stabilizing or destabilizing. Based on experimental folding-unfolding free energy (ΔΔG) values, we trained a support vector machine with a cleaned data set. The physicochemical properties of the mutated residues, the number of neighboring residues in the primary sequence and the temperature and pH were used as input attributes. Different kernel functions, attributes and window sizes were optimized. An average accuracy of 80% was obtained in cross-validation experiments.
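
The input attributes named in the abstract (physicochemical properties of the mutated residues, a window of neighboring residues, temperature and pH) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not list the properties that were used, so the two properties below (the Kyte-Doolittle hydropathy index and a simplified side-chain charge), the function names, the zero padding at the termini and the ΔΔG sign convention are all assumptions made for illustration. The sketch also compares the vector length with a 20-letter one-hot encoding of the same window, which is the parameter reduction the abstract argues for.

```python
# Illustrative sketch only; property choice and names are assumptions,
# not the feature set used in the paper.

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge near neutral pH (assumption: His is neutral).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

PROPERTIES_PER_RESIDUE = 2


def residue_features(aa):
    """Return the physicochemical descriptor of one residue."""
    return [HYDROPATHY[aa], float(CHARGE.get(aa, 0))]


def encode_mutation(sequence, position, mutant, window, temperature, ph):
    """Encode one substitution as a numeric feature vector.

    position is 0-based; window is the number of neighbors taken on
    each side of the mutated position in the primary sequence.
    """
    features = residue_features(sequence[position]) + residue_features(mutant)
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # the mutated position itself is already encoded
        idx = position + offset
        if 0 <= idx < len(sequence):
            features += residue_features(sequence[idx])
        else:
            features += [0.0] * PROPERTIES_PER_RESIDUE  # pad past the termini
    return features + [temperature, ph]


def one_hot_length(window):
    """Vector length if every residue were a 20-letter one-hot code."""
    return 20 * (2 * window + 2) + 2


def label_from_ddg(ddg):
    """Assumed convention: ddG >= 0 is stabilizing, ddG < 0 destabilizing."""
    return "stabilizing" if ddg >= 0 else "destabilizing"


vec = encode_mutation("MKTAYIAKQR", 4, "F", window=2, temperature=25.0, ph=7.0)
print(len(vec), one_hot_length(2))
```

Vectors of this kind, paired with labels derived from measured ΔΔG values, are the sort of input a support vector machine could be trained and cross-validated on; the kernel, attributes and window size would then be tuned as the abstract describes.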

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/3635202

author

Shen, Bairong; Bai, Jinwei and Vihinen, Mauno

publishing date

2008

type

Contribution to journal

publication status

published

subject

Medical Genetics and Genomics (including Gene Therapy)

keywords

Amino Acid Substitution: genetics, Proteins: chemistry, Proteins: genetics

in

Protein Engineering Design & Selection

volume

21

issue

1

pages

37 - 44

publisher

Oxford University Press

external identifiers

pmid:18096555
scopus:38349113850

ISSN

1741-0126

DOI

10.1093/protein/gzm084

language

English

LU publication?

no

id

dcab4f47-6c10-4cda-bcf7-a6d85b56fc68 (old id 3635202)

alternative location

http://www.ncbi.nlm.nih.gov/pubmed/18096555?dopt=Abstract

date added to LUP

2016-04-04 09:32:33

date last changed

2025-10-14 11:00:45

@article{dcab4f47-6c10-4cda-bcf7-a6d85b56fc68,
  abstract     = {{A huge quantity of gene and protein sequences have become available during the post-genomic era, and information about genetic variations, including amino acid substitutions and SNPs, is accumulating rapidly. To understand the effects of these changes, it is often essential to apply bioinformatics tools. Where there is a lack of homologous sequences or a three-dimensional structure, it becomes essential to predict the effects of mutations based solely on protein sequence information. Several computational methods utilizing machine learning techniques have been developed. These predictions generally use the 20-alphabet amino acid code to train the model. With limited available data, the 20-alphabet amino acid features may introduce so many parameters that the model becomes over-fitted. To decrease the number of parameters, we propose a physicochemical feature-based method to forecast the effects of amino acid substitutions on protein stability. Protein structure alterations caused by mutations can be classified as stabilizing or destabilizing. Based on experimental folding-unfolding free energy (DeltaDeltaG) values, we trained a support vector machine with a cleaned data set. The physicochemical properties of the mutated residues, the number of neighboring residues in the primary sequence and the temperature and pH were used as input attributes. Different kernel functions, attributes and window sizes were optimized. An average accuracy of 80% was obtained in cross-validation experiments.}},
  author       = {{Shen, Bairong and Bai, Jinwei and Vihinen, Mauno}},
  issn         = {{1741-0126}},
  keywords     = {{Amino Acid Substitution: genetics; Proteins: chemistry; Proteins: genetics}},
  language     = {{eng}},
  number       = {{1}},
  pages        = {{37--44}},
  publisher    = {{Oxford University Press}},
  journal      = {{Protein Engineering Design \& Selection}},
  title        = {{Physicochemical feature-based classification of amino acid mutations.}},
  url          = {{http://dx.doi.org/10.1093/protein/gzm084}},
  doi          = {{10.1093/protein/gzm084}},
  volume       = {{21}},
  year         = {{2008}},
}
