LUP Student Papers

LUND UNIVERSITY LIBRARIES

EPTA: A User-friendly Protein Phylogenetic Tree Generation and Annotating Toolkit

Zhao, Xuran (2022) BINP51 20222
Degree Projects in Bioinformatics
Abstract
Phylogenetic tree analysis is a powerful method for studying the evolution of gene families. Many software tools are available to produce and annotate phylogenetic trees, and they vary in ease of use and user autonomy. We developed a new pipeline tool, Easy Protein Tree Annotating (EPTA), to provide a user-friendly tool for phylogenetic tree generation and annotation. EPTA is a command-line tool distributed as a Conda package, and it runs on Windows, Linux and Mac platforms. EPTA comes in a lite version and a stand-alone version: the lite version runs all computational steps on public web servers, while the stand-alone version can run all steps locally and includes all functionality of the lite version. As a result, EPTA generates a phylogenetic tree from a FASTA file with a one-line command and at reasonable speed. We believe it will help researchers work more efficiently in protein studies.
Popular Abstract
A User-Friendly Tool for Protein Phylogenetic Tree Annotation

Bioinformatics pipelines are widely used across many fields of biological research to increase efficiency by running bioinformatics workflows automatically. However, for many analyses, such as the annotation of phylogenetic trees built from protein sequences, existing pipelines are not user-friendly. Protein phylogenetic trees show the evolutionary relationships among species or proteins based on similarities and differences in their protein sequences. With current software, users must either carry out complicated manual work or run a pipeline that offers only a few parameters and an incomplete workflow. We therefore think it is necessary to develop a user-friendly, fully automatic pipeline tool that covers the whole process.

Our tool, EPTA (short for Easy Protein Tree Annotating), is developed in Python and distributed as a Conda package. The goal of EPTA is to let users easily customize and automate phylogenetic tree construction and annotation. To achieve this goal, I integrated many powerful and popular tools and packages into EPTA, such as Pandas, Biopython, PfamScan, MAFFT and IQ-TREE. As a result, EPTA provides a complete pipeline from protein sequences (a FASTA file) to an annotated protein phylogenetic tree, with 40 parameters, various user-friendly design features and cross-platform capability.
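
To give a rough idea of what such a pipeline automates, the sketch below chains two of the tools named above: MAFFT for alignment and IQ-TREE for tree inference. It is not EPTA's own code or command-line interface. The function names, the output directory and the chosen options (`--auto`, ModelFinder via `-m MFP`, 1000 ultrafast bootstrap replicates) are assumptions made for illustration, and the annotation step is left out.

```python
import subprocess
from pathlib import Path


def build_steps(fasta, workdir="sketch_out", threads=2):
    """Describe an align-then-infer run as a list of steps, without running anything.

    Hypothetical helper for illustration only; it is not part of EPTA.
    """
    fasta = Path(fasta)
    workdir = Path(workdir)
    aligned = workdir / f"{fasta.stem}.aln.fasta"
    prefix = workdir / fasta.stem
    return [
        {
            "name": "align",
            # MAFFT prints the alignment to stdout, so the output is captured to a file.
            "argv": ["mafft", "--auto", "--thread", str(threads), str(fasta)],
            "stdout": aligned,
        },
        {
            "name": "tree",
            # IQ-TREE 2: -s alignment, -m MFP model selection, -B ultrafast bootstrap,
            # -T threads, --prefix for the output file names.
            "argv": ["iqtree2", "-s", str(aligned), "-m", "MFP", "-B", "1000",
                     "-T", str(threads), "--prefix", str(prefix)],
            "stdout": None,
        },
    ]


def run_steps(steps):
    """Run each step in order; requires mafft and iqtree2 to be installed and on PATH."""
    for step in steps:
        if step["stdout"] is not None:
            step["stdout"].parent.mkdir(parents=True, exist_ok=True)
            with open(step["stdout"], "w") as out:
                subprocess.run(step["argv"], stdout=out, check=True)
        else:
            subprocess.run(step["argv"], check=True)
```

Calling `run_steps(build_steps("proteins.fasta"))` would align the sequences and then infer a tree, where `proteins.fasta` is a placeholder input name. Keeping the step description separate from its execution makes it easy to log or inspect the exact commands before anything runs.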

EPTA generates an annotated protein phylogenetic tree like the example (FIG 1) with a single command. In addition to the stand-alone version, which runs locally, EPTA also has a lite version that runs all computational steps on web servers. With the lite version, even an ultraportable laptop can complete phylogenetic tree construction and annotation for 200 sequences in 1 hour.

Furthermore, EPTA has various user-friendly design features, for example log and configuration files for reproducibility and a manual that helps users get familiar with EPTA quickly. Because it is distributed as a Conda package, installation and environment setup are fast and easy.

We believe EPTA is a handy tool for those who are struggling with the lengthy workflow of protein phylogenetic tree construction and annotation, and we will keep improving EPTA in the future to make it more convenient and reliable.
Availability
The EPTA package is available on the Anaconda channel:
https://anaconda.org/phillip404/repo.
The source code of EPTA is available on GitHub:
https://github.com/Phillip404/easy_protein_tree_annotating.
The EPTA documentation is built with Sphinx and hosted on Read the Docs:
https://easy-protein-tree-annotating.readthedocs.io/en/latest/.

Master’s Degree Project in Biology/Molecular Biology/Bioinformatics 45 credits 2022
Department of Biology, Lund University

Supervisor: Courtney Stairs
Department of Biology, Lund University
Please use this url to cite or link to this publication:
author: Zhao, Xuran
supervisor: Courtney Stairs
organization: Department of Biology, Lund University
course: BINP51 20222
year: 2022
type: H2 - Master's Degree (Two Years)
language: English
id: 9112920
date added to LUP: 2023-03-27 14:34:46
date last changed: 2023-03-27 14:34:46
@misc{9112920,
  abstract     = {{Phylogenetic tree analysis is a powerful method for studying the evolution of gene families. Many software tools are available to produce and annotate phylogenetic trees, and they vary in ease of use and user autonomy. We developed a new pipeline tool, Easy Protein Tree Annotating (EPTA), to provide a user-friendly tool for phylogenetic tree generation and annotation. EPTA is a command-line tool distributed as a Conda package, and it runs on Windows, Linux and Mac platforms. EPTA comes in a lite version and a stand-alone version: the lite version runs all computational steps on public web servers, while the stand-alone version can run all steps locally and includes all functionality of the lite version. As a result, EPTA generates a phylogenetic tree from a FASTA file with a one-line command and at reasonable speed. We believe it will help researchers work more efficiently in protein studies.}},
  author       = {{Zhao, Xuran}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{EPTA: A User-friendly Protein Phylogenetic Tree Generation and Annotating Toolkit}},
  year         = {{2022}},
}