Lund University Publications

Accurate prediction of substitution rates at protein sites with a mutation-selection model

André, Ingemar (2025). In Scientific Reports 15(1).
Abstract

The pattern of substitutions at sites in proteins provides invaluable information about the biophysical and functional importance of those sites and about the selection pressures acting on them. Amino acid site rates are typically estimated using phenomenological models in which sequence variability is described by rate factors that scale a protein's overall substitution rate to individual sites. In this study, we demonstrate that site rates can be calculated accurately from the amino acid sequences in a multiple sequence alignment using a mutation-selection model in combination with a simple nucleotide substitution model. The method performs better than the standard phylogenetic approach on sequences generated by structure-based evolutionary dynamics simulations, robustly estimates rates for shallow multiple sequence alignments, and can also be calculated rapidly on larger sequence alignments. On natural sequences, site rates from the mutation-selection model are strongly correlated with rates calculated with empirical Bayes methods. The model complements other work in providing a link between amino acid substitution rates and equilibrium frequency distributions at sites in proteins. We show how an ensemble of equilibrium frequency vectors can be used to represent the rate variation encoded in empirical amino acid substitution matrices. This study demonstrates that a rapid and simple method can be developed from the mutation-selection model to predict substitution rates from amino acid data, complementing the standard phylogenetic approach.

author
André, Ingemar
publishing date
2025
type
Contribution to journal
publication status
published
keywords
Amino acid site-rates, Mutation-selection model, Substitution matrices
in
Scientific Reports
volume
15
issue
1
article number
34315
publisher
Nature Publishing Group
external identifiers
  • scopus:105017704202
  • pmid:41039064
ISSN
2045-2322
DOI
10.1038/s41598-025-22516-y
language
English
LU publication?
yes
id
9502aca1-07e7-4722-ad1f-93820e9b9a92
date added to LUP
2025-11-21 12:32:26
date last changed
2025-11-22 03:00:02
@article{9502aca1-07e7-4722-ad1f-93820e9b9a92,
  abstract     = {{The pattern of substitutions at sites in proteins provides invaluable information about the biophysical and functional importance of those sites and about the selection pressures acting on them. Amino acid site rates are typically estimated using phenomenological models in which sequence variability is described by rate factors that scale a protein's overall substitution rate to individual sites. In this study, we demonstrate that site rates can be calculated accurately from the amino acid sequences in a multiple sequence alignment using a mutation-selection model in combination with a simple nucleotide substitution model. The method performs better than the standard phylogenetic approach on sequences generated by structure-based evolutionary dynamics simulations, robustly estimates rates for shallow multiple sequence alignments, and can also be calculated rapidly on larger sequence alignments. On natural sequences, site rates from the mutation-selection model are strongly correlated with rates calculated with empirical Bayes methods. The model complements other work in providing a link between amino acid substitution rates and equilibrium frequency distributions at sites in proteins. We show how an ensemble of equilibrium frequency vectors can be used to represent the rate variation encoded in empirical amino acid substitution matrices. This study demonstrates that a rapid and simple method can be developed from the mutation-selection model to predict substitution rates from amino acid data, complementing the standard phylogenetic approach.}},
  author       = {{André, Ingemar}},
  issn         = {{2045-2322}},
  keywords     = {{Amino acid site-rates; Mutation-selection model; Substitution matrices}},
  language     = {{eng}},
  number       = {{1}},
  publisher    = {{Nature Publishing Group}},
  series       = {{Scientific Reports}},
  title        = {{Accurate prediction of substitution rates at protein sites with a mutation-selection model}},
  url          = {{http://dx.doi.org/10.1038/s41598-025-22516-y}},
  doi          = {{10.1038/s41598-025-22516-y}},
  volume       = {{15}},
  year         = {{2025}},
}