LUP Student Papers

LUND UNIVERSITY LIBRARIES

OrthoGroup SEARCH: Identification of Orthologous genes using Synteny

Mohanty, Madhuchhanda (2014) BINP32 20131
Degree Projects in Bioinformatics
Abstract

One problem in identifying orthologs is distinguishing them from paralogs. Finding orthologs in expanded gene families (with many paralogs) has proven notoriously difficult. Orthologs shared between two or more species are often identified by reciprocal BLAST (1) searches. When the sequences are derived from only two species, an ortholog pair is easily defined. With more than two species, the situation becomes more complicated. If four species are used, the four top hits should include one and only one sequence from each species, and this list has to be the same for all four species (though not necessarily in the same order). Software has been developed for this purpose. OrthoMCL (2) uses BLAST searches or other sequence-similarity-based tools and stores its data in a MySQL database. The advantage of this software is that many gene catalogs or proteomes can be analyzed in a single run; the disadvantage is that a database server must be set up. As in KOG (Eukaryotic Orthologous Groups of proteins) (3), two or more sequences from the same species may end up in the same orthologous group. InParanoid (4) is an alternative program that attempts to avoid including out-paralogs in the orthologous groups. It identifies orthologs and in-paralogs between two species. Ortholog clusters in InParanoid are seeded with a two-way best pairwise match using NCBI BLAST, after which an algorithm for adding in-paralogs is applied. Proteinortho (5) detects (co-)orthologs in large datasets. It uses an extended version of the reciprocal best alignment heuristic and identifies co-orthologs as well as true orthologs. The software developed here, “OrthoGroup SEARCH”, aims to identify true orthologs while avoiding the inclusion of paralogs by using synteny (gene order homology).
The software will assist in the automatic alignment of orthologous sequences and subsequent tests for positive selection, as well as in studies of gene/protein evolution, genome annotation, comparative genomics, phylogenetic analysis and the identification of candidates for drug and/or vaccine development.
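
The reciprocal best hit criterion for two species described in the abstract can be sketched as follows. This is a minimal illustration, not the code of any of the tools named above. The input layout (a dictionary of similarity scores keyed by pairs of `(species, gene)` tuples) and the function names `best_hits` and `reciprocal_best_hits` are assumptions made for the example.

```python
def best_hits(scores):
    """Return the best cross-species hit for every (query, target species).

    `scores` maps (query, subject) -> similarity score. Each sequence ID is
    a (species, gene) tuple. This layout is hypothetical, chosen only for
    the illustration.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query[0] == subject[0]:
            continue  # skip within-species hits (potential paralogs)
        key = (query, subject[0])
        # Keep the highest-scoring subject; on a tie the first one seen wins.
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return {key: hit for key, (hit, _) in best.items()}


def reciprocal_best_hits(scores):
    """Return the pairs of sequences that are each other's best hit."""
    best = best_hits(scores)
    pairs = set()
    for (query, _species), subject in best.items():
        # The pair is reciprocal if the subject's best hit in the query's
        # species is the query itself.
        if best.get((subject, query[0])) == query:
            pairs.add(frozenset((query, subject)))
    return pairs
```

With two paralogs in one species and a single gene in the other, only the higher-scoring paralog forms a reciprocal pair, which shows why sequence similarity alone can pick the wrong copy when the scores are close.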
Popular science summary

OrthoGroup SEARCH

A gene is a unit of heredity in a living organism. Genes are passed from one generation to the next by vertical descent. Two genes are homologous if they derive from a common ancestral gene. Homologous genes related by a duplication event within a genome are paralogous genes. Copies of the same ancestral gene that were separated by a speciation event (when one species diverges into two separate species) and are found in the resulting species are orthologous genes. Synteny is the occurrence of two or more genes in a specific order on the same chromosome in related species, inherited from a common ancestor. For example, if species A has genes 1, 2 and 5 while species B has genes 1, 2 and 8, both species are said to have syntenic genes 1 and 2, given that they are arranged in the same order.
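
The example with genes 1, 2 and 5 versus 1, 2 and 8 can be reproduced with a short sketch. It assumes that each chromosome is given as a list of gene labels in genomic order and that equal labels denote corresponding genes. The function name `longest_shared_block` is hypothetical, and strand and inversions are ignored for simplicity.

```python
def longest_shared_block(order_a, order_b):
    """Return the longest run of genes that is contiguous and in the same
    order in both gene lists (an empty list if nothing is shared)."""
    best_len, best_end = 0, 0
    # prev[j] holds the length of the shared run that ends at the previous
    # gene of order_a and at order_b[j - 1].
    prev = [0] * (len(order_b) + 1)
    for i, gene_a in enumerate(order_a, start=1):
        curr = [0] * (len(order_b) + 1)
        for j, gene_b in enumerate(order_b, start=1):
            if gene_a == gene_b:
                curr[j] = prev[j - 1] + 1  # extend the run by one gene
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return order_a[best_end - best_len:best_end]


species_a = [1, 2, 5]
species_b = [1, 2, 8]
print(longest_shared_block(species_a, species_b))  # prints [1, 2]
```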

Identifying orthologous genes is a critical goal in genomics. The approach taken here is to identify orthologous genes using synteny, in two steps. In the first step, the traditional sequence similarity method blastp is used to identify homologous sequences. In the second step, synteny is used to distinguish true orthologous genes from paralogous genes.
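
One way to picture the second step is to score each candidate homolog by how many of its neighboring genes are also homologous to the neighbors of the query gene. The sketch below is an illustration under stated assumptions and is not the algorithm implemented in OrthoGroup SEARCH: the `homologs` dictionary stands in for the result of the blastp step, gene order is a plain list per species, and the names `synteny_support`, `pick_ortholog` and `window` are hypothetical.

```python
def synteny_support(gene_a, gene_b, order_a, order_b, homologs, window=2):
    """Count neighbors of gene_a that have a homolog among the neighbors
    of gene_b, looking `window` genes to each side."""
    ia, ib = order_a.index(gene_a), order_b.index(gene_b)
    near_a = [g for g in order_a[max(0, ia - window):ia + window + 1]
              if g != gene_a]
    near_b = {g for g in order_b[max(0, ib - window):ib + window + 1]
              if g != gene_b}
    return sum(
        1 for g in near_a
        if any(h in near_b for h in homologs.get(g, ()))
    )


def pick_ortholog(gene_a, order_a, order_b, homologs, window=2):
    """Among the homologs of gene_a, return the one with the most conserved
    neighborhood, or None if gene_a has no homolog. Ties go to the first
    candidate listed."""
    candidates = homologs.get(gene_a, [])
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda gene_b: synteny_support(
            gene_a, gene_b, order_a, order_b, homologs, window
        ),
    )


# Hypothetical data: a2 has two homologs in species B, b2 and its paralog b2p.
order_a = ["a1", "a2", "a3"]
order_b = ["b1", "b2", "b3", "x", "y", "b2p"]
homologs = {"a1": ["b1"], "a2": ["b2p", "b2"], "a3": ["b3"]}
chosen = pick_ortholog("a2", order_a, order_b, homologs)
```

In this made-up example `b2` sits between the homologs of the genes flanking `a2`, whereas `b2p` has unrelated neighbors, so the conserved neighborhood favors `b2` even though both are sequence homologs of `a2`.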

Results
A program implementing this approach is written in the C++ programming language. It requires limited memory and run time and can be run on both Windows and Linux operating systems. The input to the program is the positions of the genes in the genome, the similarity between all the proteins encoded by these genes, and the amino acid sequence of each protein. It outputs the groups of orthologous genes and an evolutionary tree showing how the species are related to each other. The program is incorporated into a web application, “OrthoGroup SEARCH” (http://130.235.46.99/~student/OrthoGroup.html), which lets the user visualize groups of orthologous genes, with the genomic positions of conserved regions shown graphically, and view their evolutionary trees.

Applications
Orthologous genes are more likely than paralogous genes to share the same function. They are expected to reflect species evolution, and they highlight the divergence and conservation of gene families and biological processes. Orthologous genes have various applications, including tests for positive selection, genome annotation, studies of gene/protein evolution, comparative genomics, phylogenetic analysis and the identification of candidates for drug and/or vaccine development.


Advisor: Björn Canbäck
Master's Degree Project in Bioinformatics, 60 credits
Department of Biology, Lund University
author: Mohanty, Madhuchhanda
course: BINP32 20131
year: 2014
type: H2 - Master's Degree (Two Years)
language: English
id: 4438761
date added to LUP: 2014-05-19 14:18:52
date last changed: 2014-05-19 14:18:52
@misc{4438761,
  author       = {{Mohanty, Madhuchhanda}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{OrthoGroup SEARCH: Identification of Orthologous genes using Synteny}},
  year         = {{2014}},
}