
Lund University Publications

LUND UNIVERSITY LIBRARIES

uORF4u : a tool for annotation of conserved upstream open reading frames

Egorov, Artyom A. and Atkinson, Gemma C. (2023) In Bioinformatics 39(5).
Abstract

Summary: Upstream open reading frames (uORFs, often encoding so-called leader peptides) can regulate translation and transcription of downstream main ORFs (mORFs) in prokaryotes and eukaryotes. However, annotation of novel functional uORFs is challenging due to their short size of usually <100 codons. While transcription- and translation-level next-generation sequencing methods can be used for genome-wide functional uORF identification, these data are not available for the vast majority of species with sequenced genomes. At the same time, the exponentially increasing number of genome assemblies gives us the opportunity to take advantage of evolutionary conservation in our predictions of functional ORFs. Here, we present a tool for conserved uORF annotation in 5′ upstream sequences of a user-defined protein of interest or a set of protein homologs. It can also be used to find small conserved ORFs within a set of nucleotide sequences. The output includes publication-quality figures with multiple sequence alignments, sequence logos, and locus annotation of the predicted conserved uORFs in graphical vector format. Availability and implementation: uORF4u is written in Python3 and runs on Linux and macOS. The command-line interface covers most practical use cases, while the provided Python API allows usage within a Python program and additional customization. Source code is available from the GitHub page: github.com/GCA-VH-lab/uorf4u. Detailed documentation, including an example-driven guide, is available at the software home page: gca-vh-lab.github.io/uorf4u. A web version of uORF4u is available at server.atkinson-lab.com/uorf4u.
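The abstract notes that uORFs are hard to annotate partly because they are so short. As a rough illustration of the first step any uORF search needs, the sketch below lists candidate short ORFs (a start codon up to the first in-frame stop codon) in a single 5′ upstream sequence. It is a hypothetical, simplified example and is not uORF4u's implementation or its Python API: the function find_short_orfs, the ShortORF record, the ATG-only default start codon set, and the max_codons cutoff are assumptions made for illustration. It also does not assess conservation, which uORF4u addresses by comparing the upstream regions of a protein and its homologs.

```python
"""Hypothetical sketch: list short ORFs in one 5' upstream sequence.

Not uORF4u's algorithm or API; names and defaults are illustrative assumptions.
"""
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass(frozen=True)
class ShortORF:
    start: int   # 0-based index of the first base of the start codon
    end: int     # 0-based exclusive end, just past the stop codon
    codons: int  # length in codons, counting the stop codon


def find_short_orfs(seq, start_codons=("ATG",), max_codons=100, min_codons=2):
    """Return ORFs (start codon to first in-frame stop) shorter than max_codons."""
    seq = seq.upper()
    orfs = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] not in start_codons:
            continue
        # Walk codon by codon in the frame defined by this start codon.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                n = (j + 3 - i) // 3
                if min_codons <= n < max_codons:
                    orfs.append(ShortORF(i, j + 3, n))
                break  # stop at the first in-frame stop codon
        # If no stop codon is found, the candidate runs off the end and is skipped.
    return orfs


if __name__ == "__main__":
    upstream = "GGATGAAACCCTAAGG"  # toy sequence, not real data
    for orf in find_short_orfs(upstream):
        print(orf)  # prints ShortORF(start=2, end=14, codons=4)
```

In this sketch, candidates that run past the end of the input without an in-frame stop codon are skipped, and nested in-frame start codons that share a stop codon are reported as separate ORFs; a real annotation pipeline would need explicit rules for both cases, as well as for alternative start codons.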

author
Egorov, Artyom A. and Atkinson, Gemma C.
publishing date
2023
type
Contribution to journal
publication status
published
in
Bioinformatics
volume
39
issue
5
article number
btad323
publisher
Oxford University Press
external identifiers
  • pmid:37184890
  • scopus:85161197918
ISSN
1367-4803
DOI
10.1093/bioinformatics/btad323
language
English
LU publication?
yes
id
4f1d08b4-e4ed-4660-819f-2b857bffea76
date added to LUP
2023-08-22 10:50:05
date last changed
2024-04-20 01:12:54
@article{4f1d08b4-e4ed-4660-819f-2b857bffea76,
  abstract     = {{<p>Summary: Upstream open reading frames (uORFs, often encoding so-called leader peptides) can regulate translation and transcription of downstream main ORFs (mORFs) in prokaryotes and eukaryotes. However, annotation of novel functional uORFs is challenging due to their short size of usually &lt;100 codons. While transcription- and translation-level next-generation sequencing methods can be used for genome-wide functional uORF identification, these data are not available for the vast majority of species with sequenced genomes. At the same time, the exponentially increasing number of genome assemblies gives us the opportunity to take advantage of evolutionary conservation in our predictions of functional ORFs. Here, we present a tool for conserved uORF annotation in 5′ upstream sequences of a user-defined protein of interest or a set of protein homologs. It can also be used to find small conserved ORFs within a set of nucleotide sequences. The output includes publication-quality figures with multiple sequence alignments, sequence logos, and locus annotation of the predicted conserved uORFs in graphical vector format. Availability and implementation: uORF4u is written in Python3 and runs on Linux and macOS. The command-line interface covers most practical use cases, while the provided Python API allows usage within a Python program and additional customization. Source code is available from the GitHub page: github.com/GCA-VH-lab/uorf4u. Detailed documentation, including an example-driven guide, is available at the software home page: gca-vh-lab.github.io/uorf4u. A web version of uORF4u is available at server.atkinson-lab.com/uorf4u.</p>}},
  author       = {{Egorov, Artyom A. and Atkinson, Gemma C.}},
  issn         = {{1367-4803}},
  language     = {{eng}},
  number       = {{5}},
  publisher    = {{Oxford University Press}},
  journal      = {{Bioinformatics}},
  title        = {{uORF4u : a tool for annotation of conserved upstream open reading frames}},
  url          = {{http://dx.doi.org/10.1093/bioinformatics/btad323}},
  doi          = {{10.1093/bioinformatics/btad323}},
  volume       = {{39}},
  year         = {{2023}},
}