Lund University Publications

Global pentapeptide statistics are far away from expected distributions

Poznański, Jarosław ; Topiński, Jan ; Muszewska, Anna ; Dębski, Konrad J. ; Hoffman-Sommer, Marta ; Pawłowski, Krzysztof LU and Grynberg, Marcin (2018) In Scientific Reports 8(1).
Abstract

The relationships between polypeptide composition, sequence, structure and function have been puzzling biologists ever since first protein sequences were determined. Here, we study the statistics of occurrence of all possible pentapeptide sequences in known proteins. To compensate for the non-uniform distribution of individual amino acid residues in protein sequences, we investigate separately all possible permutations of every given amino acid composition. For the majority of permutation groups we find that pentapeptide occurrences deviate strongly from the expected binomial distributions, and that the observed distributions are also characterized by high numbers of outlier sequences. An analysis of identified outliers shows they often contain known motifs and rare amino acids, suggesting that they represent important functional elements. We further compare the pentapeptide composition of regions known to correspond to protein domains with that of non-domain regions. We find that a substantial number of pentapeptides is clearly strongly favored in protein domains. Finally, we show that over-represented pentapeptides are significantly related to known functional motifs and to predicted ancient structural peptides.
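
The counting scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy sequences, and the null model (every distinct permutation of a composition equally likely, which gives a binomial expectation within each group) are assumptions made here for illustration only.

```python
from collections import Counter, defaultdict
from math import factorial, sqrt


def count_pentapeptides(sequences, k=5):
    """Count every overlapping k-residue window across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def group_by_composition(counts):
    """Group peptides that are permutations of the same amino acid composition.

    The group key is the peptide's residues in sorted order.
    """
    groups = defaultdict(dict)
    for peptide, n in counts.items():
        groups["".join(sorted(peptide))][peptide] = n
    return groups


def n_distinct_permutations(composition):
    """Number of distinct orderings of a multiset of residues."""
    total = factorial(len(composition))
    for multiplicity in Counter(composition).values():
        total //= factorial(multiplicity)
    return total


def binomial_z(observed, group_total, n_perms):
    """Z-score of one peptide's count under the assumed null model.

    Assumption: each of the n_perms orderings has probability 1/n_perms,
    so the count is Binomial(group_total, 1/n_perms).
    """
    p = 1.0 / n_perms
    variance = group_total * p * (1.0 - p)
    if variance == 0:
        # A single possible ordering (e.g. "AAAAA") cannot deviate.
        return 0.0
    return (observed - group_total * p) / sqrt(variance)


if __name__ == "__main__":
    # Toy input; real analyses would read a protein sequence database.
    toy_sequences = ["ACDEFACDEF", "FEDCA"]
    counts = count_pentapeptides(toy_sequences)
    for composition, members in group_by_composition(counts).items():
        total = sum(members.values())
        n_perms = n_distinct_permutations(composition)
        for peptide, observed in sorted(members.items()):
            print(composition, peptide, observed,
                  round(binomial_z(observed, total, n_perms), 2))
```

Grouping by sorted residues keeps the comparison within one amino acid composition, which is how the abstract says the non-uniform abundance of individual residues is compensated for. The z-score is only one possible way to flag outliers under this assumed null model.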

author
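Poznański, Jarosław ; Topiński, Jan ; Muszewska, Anna ; Dębski, Konrad J. ; Hoffman-Sommer, Marta ; Pawłowski, Krzysztof and Grynberg, Marcin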
organization
publishing date
type
Contribution to journal
publication status
published
subject
in
Scientific Reports
volume
8
issue
1
article number
15178
publisher
Nature Publishing Group
external identifiers
  • scopus:85054775034
  • pmid:30310110
ISSN
2045-2322
DOI
10.1038/s41598-018-33433-8
language
English
LU publication?
yes
id
2721d9a9-1613-4b4d-bc25-89e69e220319
date added to LUP
2018-10-29 13:10:43
date last changed
2024-06-11 23:47:57
@article{2721d9a9-1613-4b4d-bc25-89e69e220319,
  abstract     = {{The relationships between polypeptide composition, sequence, structure and function have been puzzling biologists ever since first protein sequences were determined. Here, we study the statistics of occurrence of all possible pentapeptide sequences in known proteins. To compensate for the non-uniform distribution of individual amino acid residues in protein sequences, we investigate separately all possible permutations of every given amino acid composition. For the majority of permutation groups we find that pentapeptide occurrences deviate strongly from the expected binomial distributions, and that the observed distributions are also characterized by high numbers of outlier sequences. An analysis of identified outliers shows they often contain known motifs and rare amino acids, suggesting that they represent important functional elements. We further compare the pentapeptide composition of regions known to correspond to protein domains with that of non-domain regions. We find that a substantial number of pentapeptides is clearly strongly favored in protein domains. Finally, we show that over-represented pentapeptides are significantly related to known functional motifs and to predicted ancient structural peptides.}},
  author       = {{Poznański, Jarosław and Topiński, Jan and Muszewska, Anna and Dębski, Konrad J. and Hoffman-Sommer, Marta and Pawłowski, Krzysztof and Grynberg, Marcin}},
  issn         = {{2045-2322}},
  language     = {{eng}},
  number       = {{1}},
  publisher    = {{Nature Publishing Group}},
  journal      = {{Scientific Reports}},
  title        = {{Global pentapeptide statistics are far away from expected distributions}},
  url          = {{http://dx.doi.org/10.1038/s41598-018-33433-8}},
  doi          = {{10.1038/s41598-018-33433-8}},
  volume       = {{8}},
  year         = {{2018}},
}