
THESES db - The Algae 18S rDNA Sequence-Structure Database for Inferring Phylogenies

Marin Rodrigues, Maria Valentina (2016) BINP30 20152
Degree Projects in Bioinformatics
Abstract
There is a long tradition, especially in phycology, of using 18S rDNA sequences to infer phylogenies, in particular for analyses at higher taxonomic levels. Like ITS2, 18S displays a conserved RNA secondary structure that could be used together with the sequence to increase the amount of information available when inferring phylogenetic relationships. In ITS2 research, sequence-structure phylogenetics is already established. Secondary structures no longer serve only to guide the alignment and tree reconstruction. Instead, they are used simultaneously with the sequence by encoding the sequence-structure information into a new 12-letter alphabet. In this study, we transfer the knowledge gathered from ITS2 sequence-structure phylogenetics to 18S rDNA and present THESES db - The Algae 18S rDNA Sequence-Structure Database (LINK), which contains sequences and structures for three major groups of algae (Chlorophyta, Rhodophyta, and Bacillariophyta). This database should be the starting point for future 18S rDNA sequence-structure based phylogenetic analyses, even beyond phycology. Furthermore, in this study one hundred pairs of phylogenetic trees generated from Rhodophyta 18S sequence-structure data and 18S sequence-only data were compared. In half of the pairs, the two trees have different topologies. Assuming that the lineage information listed in GenBank for each species is correct, in 5% of the cases the sequence-structure approach recovers a genus as monophyletic where the sequence-only approach does not. In 3% it is the other way around. Future work will use a larger sample size to test these tendencies at different taxonomic levels.
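The 12-letter alphabet described in the abstract can be pictured as every nucleotide combined with its structural state. The Python sketch below assumes the common construction of four nucleotides (A, C, G, U) times three dot-bracket states (`(` paired with a downstream base, `)` paired with an upstream base, `.` unpaired), which gives 4 × 3 = 12 symbols. The two-character tokens and the helper names `check_balanced` and `encode` are hypothetical choices made for illustration. They are not the representation used by THESES db or by any particular ITS2 tool.

```python
from itertools import product
from typing import List

# Assumed construction: 4 nucleotides x 3 dot-bracket states = 12 symbols.
NUCLEOTIDES = "ACGU"
STATES = "()."  # '(' pairs downstream, ')' pairs upstream, '.' unpaired
ALPHABET = [nt + st for nt, st in product(NUCLEOTIDES, STATES)]


def check_balanced(structure: str) -> None:
    """Raise ValueError unless the dot-bracket string is well formed."""
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ')' in structure")
        elif ch != ".":
            raise ValueError(f"unexpected structure character {ch!r}")
    if depth != 0:
        raise ValueError("unmatched '(' in structure")


def encode(sequence: str, structure: str) -> List[str]:
    """Turn a sequence plus its dot-bracket structure into 12-letter tokens."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    check_balanced(structure)
    rna = sequence.upper().replace("T", "U")  # DNA input is read as RNA
    tokens = [nt + st for nt, st in zip(rna, structure)]
    for token in tokens:
        if token not in ALPHABET:
            raise ValueError(f"symbol {token!r} is outside the 12-letter alphabet")
    return tokens


if __name__ == "__main__":
    print(len(ALPHABET))                # 12
    print(encode("GGATCC", "((..))"))  # ['G(', 'G(', 'A.', 'U.', 'C)', 'C)']
```

The sketch leaves out two steps a real sequence-structure analysis would typically also need: a symbol for alignment gaps, and a mapping from each token to a single character that alignment and tree software can read.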
Popular Abstract
THESES db

Phylogenetics is the study of evolutionary relationships. These relationships are inferred by conducting phylogenetic analysis. Researchers perform this analysis by comparing the DNA sequence of a genetic marker from one organism with the sequence of the same marker in other organisms. Commonly used markers include the Internal Transcribed Spacer 2 (ITS2) and the 18S ribosomal DNA (rDNA).

Studies revealed that ITS2 has a conserved secondary structure but a variable DNA sequence. Researchers therefore began to combine the ITS2 DNA sequence with its secondary structure when performing phylogenetic analyses. Studies have shown that phylogenetic analysis using the ITS2 DNA sequence and secondary structure together is more accurate than analysis using the ITS2 DNA sequence alone.

The benefits of using the DNA sequence together with the secondary structure to infer phylogenetic relationships are clear in the case of ITS2. In my master's thesis, I examine what happens when these methods are transferred to the 18S rDNA.

To this end, I present THESES db - The Algae 18S rDNA Sequence-Structure Database for Inferring Phylogenies. THESES db is a web service that contains more than 3,000 18S rDNA sequences from diatoms, green algae and red algae. Each sequence comes with its individual secondary structure, obtained by homology modeling, and is ready to use for sequence-structure based phylogenetic tree reconstruction. Testing the robustness and accuracy of the newly inferred phylogenetic trees is ongoing research.
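One accuracy check reported in the abstract asks whether a genus, as delimited by the GenBank lineage, is recovered as monophyletic in each tree of a sequence-structure/sequence-only pair. The Python sketch below shows one way to run that check. It is illustrative only and rests on several assumptions: trees are rooted and written as nested tuples, leaf labels follow a hypothetical `Genus_species` convention, and both trees of a pair share the same leaves. The function names are also hypothetical, and none of this is the thesis's actual pipeline. In the sketch, a genus counts as monophyletic when some clade contains exactly that genus's leaves and nothing else.

```python
from typing import FrozenSet, List, Tuple, Union

# A rooted tree: either a leaf label (str) or a tuple of child subtrees.
Tree = Union[str, tuple]


def clade_leaf_sets(tree: Tree) -> List[FrozenSet[str]]:
    """Collect the leaf set of every clade (leaves included) in post-order.

    The root clade is appended last.
    """
    collected: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        collected.append(leaves)
        return leaves

    walk(tree)
    return collected


def genus_of(label: str) -> str:
    # Hypothetical label convention: "Genus_species"
    return label.split("_")[0]


def is_monophyletic(tree: Tree, genus: str) -> bool:
    """True if some clade contains exactly the leaves assigned to `genus`."""
    clades = clade_leaf_sets(tree)
    members = frozenset(leaf for leaf in clades[-1] if genus_of(leaf) == genus)
    if not members:
        raise ValueError(f"genus {genus!r} not found in tree")
    return members in clades


def discordant_genera(tree_ss: Tree, tree_seq: Tree) -> Tuple[List[str], List[str]]:
    """Genera monophyletic only in the sequence-structure tree, and only in
    the sequence-only tree. Assumes both trees share the same leaf set."""
    genera = {genus_of(leaf) for leaf in clade_leaf_sets(tree_ss)[-1]}
    only_ss = sorted(g for g in genera
                     if is_monophyletic(tree_ss, g) and not is_monophyletic(tree_seq, g))
    only_seq = sorted(g for g in genera
                      if is_monophyletic(tree_seq, g) and not is_monophyletic(tree_ss, g))
    return only_ss, only_seq


if __name__ == "__main__":
    # Toy trees with placeholder labels; not data from the thesis.
    tree_ss = ((("A_1", "A_2"), "B_1"), ("C_1", "C_2"))
    tree_seq = ((("A_1", "B_1"), "A_2"), ("C_1", "C_2"))
    print(discordant_genera(tree_ss, tree_seq))  # (['A'], [])
```

Monophyly is only defined relative to a root, so unrooted trees would first need to be rooted, for example with an outgroup, before a check like this is meaningful.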

Advisor at Lund University: Dr Björn Canbäck
Advisor at Würzburg University: Dr Matthias Wolf
Master's Degree Project in Bioinformatics 30 credits 2016
author: Marin Rodrigues, Maria Valentina
course: BINP30 20152
year: 2016
type: H2 - Master's Degree (Two Years)
language: English
id: 8820771
date added to LUP: 2016-02-29 16:10:47
date last changed: 2016-02-29 16:10:47
@misc{8820771,
  abstract     = {There is a long tradition, especially in phycology, of using 18S rDNA sequences to infer phylogenies, in particular for analyses at higher taxonomic levels. Like ITS2, 18S displays a conserved RNA secondary structure that could be used together with the sequence to increase the amount of information available when inferring phylogenetic relationships. In ITS2 research, sequence-structure phylogenetics is already established. Secondary structures no longer serve only to guide the alignment and tree reconstruction. Instead, they are used simultaneously with the sequence by encoding the sequence-structure information into a new 12-letter alphabet. In this study, we transfer the knowledge gathered from ITS2 sequence-structure phylogenetics to 18S rDNA and present THESES db - The Algae 18S rDNA Sequence-Structure Database (LINK), which contains sequences and structures for three major groups of algae (Chlorophyta, Rhodophyta, and Bacillariophyta). This database should be the starting point for future 18S rDNA sequence-structure based phylogenetic analyses, even beyond phycology. Furthermore, in this study one hundred pairs of phylogenetic trees generated from Rhodophyta 18S sequence-structure data and 18S sequence-only data were compared. In half of the pairs, the two trees have different topologies. Assuming that the lineage information listed in GenBank for each species is correct, in 5% of the cases the sequence-structure approach recovers a genus as monophyletic where the sequence-only approach does not. In 3% it is the other way around. Future work will use a larger sample size to test these tendencies at different taxonomic levels.},
  author       = {Marin Rodrigues, Maria Valentina},
  language     = {eng},
  note         = {Student Paper},
  title        = {THESES db - The Algae 18S rDNA Sequence-Structure Database for Inferring Phylogenies},
  year         = {2016},
}