Lund University Publications

Sequence-Based Prediction for Protein Solvent Accessibility

Yang, Yang; Chen, Mengqi; Liu, Congrui and Vihinen, Mauno (2025) In International Journal of Molecular Sciences 26(12).
Abstract

When globular proteins fold into their characteristic three-dimensional structures, some amino acids are located on the surface, while others are buried in the protein core, where they cannot interact with molecules in the environment. Predicting the degree of solvent accessibility of amino acids provides insight into the function and relevance of residues. Residue accessibility is crucial for several protein functions, including enzymatic activity, allostery, multimer formation, binding to other molecules, and immunogenicity. We developed a novel sequence-based predictor for amino acid accessibility with features derived from three-dimensional protein structures. Several machine learning algorithms were tested, and the long short-term memory (LSTM) deep learning method performed best; it was therefore used to develop the freely available SolAcc tool. In a blind test, SolAcc outperformed state-of-the-art predictors.
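
The abstract does not describe SolAcc's architecture or input features, so the following is only a minimal, hypothetical sketch of how an LSTM can turn a protein sequence into one accessibility score per residue. It assumes plain one-hot encoding of the 20 standard amino acids (not the structure-derived features used by SolAcc), a single unidirectional LSTM layer, and a sigmoid output head. The names `one_hot`, `TinyLSTM` and `predict` are invented for illustration, and the weights are random rather than trained.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (one-letter codes)


def one_hot(residue: str) -> list[float]:
    """Encode one residue as a 20-dim one-hot vector; unknown letters give all zeros."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Hypothetical single-layer, unidirectional LSTM with a per-residue sigmoid head.

    Weights are random: this shows the data flow only, it is NOT a trained predictor.
    """

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> list[list[float]]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # One weight matrix per gate (input i, forget f, candidate g, output o),
        # each applied to the concatenation [x_t, h_{t-1}].
        self.W = {gate: mat(hidden_size, input_size + hidden_size) for gate in "ifgo"}
        self.b = {gate: [0.0] * hidden_size for gate in "ifgo"}
        # Output head: hidden state -> one scalar score per residue.
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
        self.b_out = 0.0

    def _affine(self, gate: str, z: list[float]) -> list[float]:
        """Compute W_gate @ z + b_gate."""
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])]

    def predict(self, sequence: str) -> list[float]:
        """Return one score in (0, 1) per residue, reading the sequence N- to C-terminus."""
        h = [0.0] * self.hidden_size  # hidden state
        c = [0.0] * self.hidden_size  # cell state
        scores = []
        for residue in sequence.upper():
            z = one_hot(residue) + h  # concatenate input and previous hidden state
            i = [sigmoid(v) for v in self._affine("i", z)]
            f = [sigmoid(v) for v in self._affine("f", z)]
            g = [math.tanh(v) for v in self._affine("g", z)]
            o = [sigmoid(v) for v in self._affine("o", z)]
            c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]
            h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
            scores.append(sigmoid(sum(w * v for w, v in zip(self.w_out, h)) + self.b_out))
        return scores
```

For example, `TinyLSTM(len(AMINO_ACIDS), 8).predict("MKTAYIAKQR")` returns ten scores between 0 and 1, one per residue. After training against accessibility values computed from experimentally determined structures, such scores could be read as predicted relative accessibility. The recurrent state is what lets each residue's score depend on its sequence context; many sequence-based predictors also read the sequence in reverse (a bidirectional LSTM) so that downstream residues contribute as well.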

author
Yang, Yang; Chen, Mengqi; Liu, Congrui and Vihinen, Mauno
publishing date
2025
type
Contribution to journal
publication status
published
keywords
amino acid accessibility, machine learning, protein structure, sequence-based prediction, solubility
in
International Journal of Molecular Sciences
volume
26
issue
12
article number
5604
publisher
MDPI AG
external identifiers
  • pmid:40565067
  • scopus:105009002645
ISSN
1661-6596
DOI
10.3390/ijms26125604
language
English
LU publication?
yes
additional info
Publisher Copyright: © 2025 by the authors.
id
2089794d-fdb6-4e76-9451-1451034e22f2
date added to LUP
2025-12-17 09:39:32
date last changed
2025-12-18 03:00:15
@article{2089794d-fdb6-4e76-9451-1451034e22f2,
  abstract     = {{When globular proteins fold into their characteristic three-dimensional structures, some amino acids are located on the surface, while others are buried in the protein core, where they cannot interact with molecules in the environment. Predicting the degree of solvent accessibility of amino acids provides insight into the function and relevance of residues. Residue accessibility is crucial for several protein functions, including enzymatic activity, allostery, multimer formation, binding to other molecules, and immunogenicity. We developed a novel sequence-based predictor for amino acid accessibility with features derived from three-dimensional protein structures. Several machine learning algorithms were tested, and the long short-term memory (LSTM) deep learning method performed best; it was therefore used to develop the freely available SolAcc tool. In a blind test, SolAcc outperformed state-of-the-art predictors.}},
  author       = {{Yang, Yang and Chen, Mengqi and Liu, Congrui and Vihinen, Mauno}},
  issn         = {{1661-6596}},
  keywords     = {{amino acid accessibility; machine learning; protein structure; sequence-based prediction; solubility}},
  language     = {{eng}},
  number       = {{12}},
  publisher    = {{MDPI AG}},
  journal      = {{International Journal of Molecular Sciences}},
  title        = {{Sequence-Based Prediction for Protein Solvent Accessibility}},
  url          = {{https://doi.org/10.3390/ijms26125604}},
  doi          = {{10.3390/ijms26125604}},
  volume       = {{26}},
  year         = {{2025}},
}