Lund University Publications

Uncovering new families and folds in the natural protein universe

Durairaj, Janani; Waterhouse, Andrew M; Mets, Toomas (LU); Brodiazhenko, Tetiana; Abdullah, Minhal (LU); Studer, Gabriel; Tauriello, Gerardo; Akdel, Mehmet; Andreeva, Antonina; Bateman, Alex, et al. (2023). In Nature 622(7983), p. 646-653
Abstract

We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database¹. These models cover nearly all proteins that are known, including those challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this "dark matter" of the natural protein universe at high predicted accuracy. We further describe the protein diversity that these models cover as an annotated interactive sequence similarity network, accessible at https://uniprot3d.org/atlas/AFDB90v4. By searching for novelties from sequence, structure, and semantic perspectives, we uncovered the β-flower fold, added multiple protein families to the Pfam database², and experimentally demonstrated that one of these belongs to a new superfamily of translation-targeting toxin-antitoxin systems, TumE-TumA. This work underscores the value of large-scale efforts in identifying, annotating, and prioritising novel protein families. By leveraging the recent deep learning revolution in protein bioinformatics, we can now shed light on uncharted areas of the protein universe at an unprecedented scale, paving the way to innovations in life sciences and biotechnology.

author
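Durairaj, Janani; Waterhouse, Andrew M; Mets, Toomas; Brodiazhenko, Tetiana; Abdullah, Minhal; Studer, Gabriel; Tauriello, Gerardo; Akdel, Mehmet; Andreeva, Antonina; Bateman, Alex; Tenson, Tanel; Hauryliuk, Vasili; Schwede, Torsten and Pereira, Joana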
publishing date
2023-09
type
Contribution to journal
publication status
published
in
Nature
volume
622
issue
7983
pages
646 - 653
publisher
Nature Publishing Group
external identifiers
  • scopus:85171531550
  • pmid:37704037
ISSN
0028-0836
DOI
10.1038/s41586-023-06622-3
language
English
LU publication?
yes
additional info
© 2023. The Author(s), under exclusive licence to Springer Nature Limited.
id
eeb34085-fc68-4ac8-9248-116bd8a598f1
date added to LUP
2023-09-14 20:59:17
date last changed
2024-04-19 14:31:28
@article{eeb34085-fc68-4ac8-9248-116bd8a598f1,
  abstract     = {{We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database¹. These models cover nearly all proteins that are known, including those challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this "dark matter" of the natural protein universe at high predicted accuracy. We further describe the protein diversity that these models cover as an annotated interactive sequence similarity network, accessible at https://uniprot3d.org/atlas/AFDB90v4. By searching for novelties from sequence, structure, and semantic perspectives, we uncovered the β-flower fold, added multiple protein families to Pfam database², and experimentally demonstrate that one of these belongs to a new superfamily of translation-targeting toxin-antitoxin systems, TumE-TumA. This work underscores the value of large-scale efforts in identifying, annotating, and prioritising novel protein families. By leveraging the recent deep learning revolution in protein bioinformatics, we can now shed light into uncharted areas of the protein universe at an unprecedented scale, paving the way to innovations in life sciences and biotechnology.}},
  author       = {{Durairaj, Janani and Waterhouse, Andrew M and Mets, Toomas and Brodiazhenko, Tetiana and Abdullah, Minhal and Studer, Gabriel and Tauriello, Gerardo and Akdel, Mehmet and Andreeva, Antonina and Bateman, Alex and Tenson, Tanel and Hauryliuk, Vasili and Schwede, Torsten and Pereira, Joana}},
  issn         = {{0028-0836}},
  language     = {{eng}},
  month        = {{09}},
  number       = {{7983}},
  pages        = {{646--653}},
  publisher    = {{Nature Publishing Group}},
  journal      = {{Nature}},
  title        = {{Uncovering new families and folds in the natural protein universe}},
  url          = {{https://doi.org/10.1038/s41586-023-06622-3}},
  doi          = {{10.1038/s41586-023-06622-3}},
  volume       = {{622}},
  year         = {{2023}},
}