Lund University Publications

Comparative testing of DNA segmentation algorithms using benchmark simulations

Elhaik, Eran; Graur, Dan and Josic, Kresimir (2010) In Molecular biology and evolution 27(5). p.1015-1024
Abstract

Numerous segmentation methods for the detection of compositionally homogeneous domains within genomic sequences have been proposed. Unfortunately, these methods yield inconsistent results. Here, we present a benchmark consisting of two sets of simulated genomic sequences for testing the performances of segmentation algorithms. Sequences in the first set are composed of fixed-sized homogeneous domains, distinct in their between-domain guanine and cytosine (GC) content variability. The sequences in the second set are composed of a mosaic of many short domains and a few long ones, distinguished by sharp GC content boundaries between neighboring domains. We use these sets to test the performance of seven segmentation algorithms in the literature. Our results show that recursive segmentation algorithms based on the Jensen-Shannon divergence outperform all other algorithms. However, even these algorithms perform poorly in certain instances because of the arbitrary choice of a segmentation-stopping criterion.
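
As an illustration of the ideas in the abstract, the following is a minimal sketch, not the authors' benchmark or any of the seven algorithms they tested. It simulates a sequence of fixed-size domains with different GC content and then splits it recursively at the point that maximizes the Jensen-Shannon divergence between the two halves. The function names, the two-symbol (GC versus AT) simplification, and the `threshold` and `min_len` values are all assumptions made for this example.

```python
import math
import random


def simulate_sequence(domain_gc, domain_len, seed=0):
    """Concatenate fixed-size domains; each base is G or C with the domain's GC probability."""
    rng = random.Random(seed)
    bases = []
    for gc in domain_gc:
        for _ in range(domain_len):
            bases.append(rng.choice("GC") if rng.random() < gc else rng.choice("AT"))
    return "".join(bases)


def binary_entropy(p):
    """Shannon entropy (bits) of a two-symbol source with probabilities p and 1 - p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_split(seq, start, end, min_len):
    """Return (cut, jsd) maximizing the JS divergence between seq[start:cut] and seq[cut:end]."""
    n = end - start
    total_gc = sum(1 for b in seq[start:end] if b in "GC")
    h_whole = binary_entropy(total_gc / n)
    best_cut, best_jsd = None, 0.0
    left_gc = 0
    for cut in range(start + 1, end):
        if seq[cut - 1] in "GC":
            left_gc += 1
        n_left, n_right = cut - start, end - cut
        if n_left < min_len or n_right < min_len:
            continue
        # Length-weighted JS divergence: entropy of the whole minus weighted entropies of the parts.
        jsd = (h_whole
               - (n_left / n) * binary_entropy(left_gc / n_left)
               - (n_right / n) * binary_entropy((total_gc - left_gc) / n_right))
        if jsd > best_jsd:
            best_cut, best_jsd = cut, jsd
    return best_cut, best_jsd


def segment(seq, start=0, end=None, threshold=0.02, min_len=200):
    """Recursively split seq[start:end]; stop when the best divergence falls below threshold."""
    if end is None:
        end = len(seq)
    if end - start < 2 * min_len:
        return []
    cut, jsd = best_split(seq, start, end, min_len)
    if cut is None or jsd < threshold:  # hypothetical, fixed stopping criterion
        return []
    return (segment(seq, start, cut, threshold, min_len)
            + [cut]
            + segment(seq, cut, end, threshold, min_len))


# Three 2000-bp domains with GC content 0.2, 0.8 and 0.2 (illustrative values).
seq = simulate_sequence([0.2, 0.8, 0.2], 2000, seed=1)
boundaries = segment(seq)
print(boundaries)
```

The fixed `threshold` here stands in for the segmentation-stopping criterion that the abstract identifies as a weak point: raising it merges domains with similar GC content, while lowering it splits homogeneous domains on random fluctuations.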

author
Elhaik, Eran; Graur, Dan and Josic, Kresimir
publishing date
2010
type
Contribution to journal
publication status
published
keywords
Algorithms; Base Composition/genetics; Base Pairing/genetics; Base Sequence; Chromosomes, Human, Pair 1/genetics; Computational Biology/methods; Computer Simulation; DNA/genetics; Databases, Nucleic Acid; Genome, Human/genetics; Humans; Sequence Analysis, DNA/methods; Time Factors
in
Molecular biology and evolution
volume
27
issue
5
pages
1015 - 1024
publisher
Oxford University Press
external identifiers
  • scopus:77951536244
  • pmid:20018981
ISSN
0737-4038
DOI
10.1093/molbev/msp307
language
English
LU publication?
no
id
b1780a06-f283-4149-8b28-59c41484a39e
date added to LUP
2019-11-10 16:50:19
date last changed
2024-10-02 16:10:57
@article{b1780a06-f283-4149-8b28-59c41484a39e,
  abstract     = {{Numerous segmentation methods for the detection of compositionally homogeneous domains within genomic sequences have been proposed. Unfortunately, these methods yield inconsistent results. Here, we present a benchmark consisting of two sets of simulated genomic sequences for testing the performances of segmentation algorithms. Sequences in the first set are composed of fixed-sized homogeneous domains, distinct in their between-domain guanine and cytosine (GC) content variability. The sequences in the second set are composed of a mosaic of many short domains and a few long ones, distinguished by sharp GC content boundaries between neighboring domains. We use these sets to test the performance of seven segmentation algorithms in the literature. Our results show that recursive segmentation algorithms based on the Jensen-Shannon divergence outperform all other algorithms. However, even these algorithms perform poorly in certain instances because of the arbitrary choice of a segmentation-stopping criterion.}},
  author       = {{Elhaik, Eran and Graur, Dan and Josic, Kresimir}},
  issn         = {{0737-4038}},
  keywords     = {{Algorithms; Base Composition/genetics; Base Pairing/genetics; Base Sequence; Chromosomes, Human, Pair 1/genetics; Computational Biology/methods; Computer Simulation; DNA/genetics; Databases, Nucleic Acid; Genome, Human/genetics; Humans; Sequence Analysis, DNA/methods; Time Factors}},
  language     = {{eng}},
  number       = {{5}},
  pages        = {{1015--1024}},
  publisher    = {{Oxford University Press}},
  journal      = {{Molecular biology and evolution}},
  title        = {{Comparative testing of DNA segmentation algorithms using benchmark simulations}},
  url          = {{http://dx.doi.org/10.1093/molbev/msp307}},
  doi          = {{10.1093/molbev/msp307}},
  volume       = {{27}},
  year         = {{2010}},
}