Methods for structural variation detection and improved theory prediction for densely labeled DNA barcodes

Stewart, Callum (2017) BINP32 20162
Degree Projects in Bioinformatics
Abstract
Optical DNA mapping offers an alternative or complementary method of sequencing and identification to the widely used next generation sequencing techniques. However, the computational tools for DNA optical maps are not as mature as the rich selection available for standard nucleotide sequencing. This thesis focuses on two tools for the analysis of densely labeled optical maps. First, the parameters of an existing tool that predicts dye binding along a known nucleotide sequence are improved. Second, a local alignment tool based on profile hidden Markov models is introduced and compared with other dynamic-programming-based methods for detecting structural variations in densely labeled optical DNA maps. The ability to annotate experimental optical maps using annotations from a related nucleotide sequence is also demonstrated.
Popular Abstract
Tools for DNA optical maps

The last couple of decades have seen revolutionary advances in DNA sequencing, but challenges remain, even for next generation sequencing techniques. As a complement, or even an alternative, to nucleotide-level sequencing, fluorescently dyed DNA can be imaged to produce a low-resolution but long-range sequence, known as an optical map.

One use for optical maps may be in genome assembly. Current nucleotide sequencing technologies have short read lengths, leaving segments, or contigs, that must be ordered. This is a difficult task, particularly for newly sequenced genomes, but it could be assisted by using a long-range optical map of the entire sequence as a guide. Additionally, an optical map could be used to identify DNA in cases where the exact nucleotide sequence is either already known or not important. This work concerns densely labeled optical maps, in which many small fluorescent dye molecules bind along a DNA molecule and the binding depends on the underlying nucleotide sequence. Different DNA sequences therefore produce different patterns of light intensity along the molecule, with each pixel covering roughly 500 nucleotides.
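To make the link between sequence-dependent binding and a pixel-level intensity profile concrete, here is a minimal sketch. It assumes a per-base array of dye binding probabilities is already available (a hypothetical input, for example the output of a binding-prediction step), blurs it with a Gaussian point-spread function whose width `psf_sigma_bp` is an illustrative assumption, and sums the blurred signal into pixels of roughly 500 nucleotides as stated above. The function name `theory_barcode` is hypothetical and is not the existing tool's API.

```python
import numpy as np


def theory_barcode(binding_prob, bp_per_pixel=500, psf_sigma_bp=300.0):
    """Blur per-base dye occupancy with a Gaussian PSF and sum it into pixels.

    binding_prob: probability that a dye is bound at each base pair
                  (hypothetical input; any per-base occupancy array works).
    bp_per_pixel: base pairs per camera pixel (~500, as stated in the text).
    psf_sigma_bp: PSF width in base pairs (illustrative value, an assumption).
    """
    p = np.asarray(binding_prob, dtype=float)
    # Gaussian point-spread function sampled at base-pair resolution, truncated at 4 sigma.
    half = int(4 * psf_sigma_bp)
    t = np.arange(-half, half + 1)
    psf = np.exp(-0.5 * (t / psf_sigma_bp) ** 2)
    psf /= psf.sum()
    # Full convolution, then keep the part centred on the original base positions.
    blurred = np.convolve(p, psf)[half : half + len(p)]
    # Sum the blurred signal within each pixel; an incomplete last pixel is dropped.
    n_pix = len(blurred) // bp_per_pixel
    return blurred[: n_pix * bp_per_pixel].reshape(n_pix, bp_per_pixel).sum(axis=1)


# A hypothetical 5 kb molecule with uniform occupancy 0.1: interior pixels come out
# close to 0.1 * 500 = 50, while edge pixels are lower because part of the blur
# falls outside the molecule.
barcode = theory_barcode(np.full(5000, 0.1))
print(barcode.round(1))
```

Real predictions would replace the uniform array with sequence-dependent probabilities, which is exactly where the binding parameters discussed next come in.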

Nucleotide sequences are backed by a rich set of computational tools, but tools for optical maps are less extensive and less developed. This thesis aims to improve two computational aspects: predicting how the fluorescent molecules bind along a known nucleotide sequence, and detecting structural variations between related but non-identical optical maps.

A method for predicting a theoretical optical map from a nucleotide sequence has been developed previously. However, the parameters describing the binding strengths of the different ligands were overly simplified. The binding strength of one of the ligands depends on the nucleotide sequence. Its binding site is 4 nucleotides long, and with 4 possible nucleotides at each position this gives 4^4 = 256 possible binding strengths that need to be accounted for, ignoring duplicates on the reverse strand. That is too many parameters to train from only a few experimental sequences, so several simplified models are put forward, including one that uses values derived from the literature. The parameters chosen as a result give a closer fit between experimental and theoretical optical maps, which should make detecting structural variations, among other uses, more accurate.
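The count of 256 binding strengths, and how far reverse-strand duplicates reduce it, can be checked by enumerating every 4-nucleotide site. This sketch assumes that a site and its reverse complement share a single binding strength, so each pair is represented by the lexicographically smaller string; the helper names are illustrative only.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(kmer: str) -> str:
    """The same binding site read on the opposite strand."""
    return "".join(COMPLEMENT[b] for b in reversed(kmer))


def canonical(kmer: str) -> str:
    """Represent a site and its reverse-strand twin by one string."""
    return min(kmer, reverse_complement(kmer))


all_sites = ["".join(p) for p in product("ACGT", repeat=4)]
canonical_sites = {canonical(s) for s in all_sites}
# Palindromic sites are their own reverse complement, so they are not halved.
palindromes = [s for s in all_sites if s == reverse_complement(s)]

print(len(all_sites))        # 256
print(len(canonical_sites))  # 136
print(len(palindromes))      # 16
```

Even after merging reverse-strand duplicates, (256 - 16) / 2 + 16 = 136 free parameters remain, which is still far more than a handful of experimental sequences can constrain.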

A structural variation is a rearrangement, deletion, or duplication of a segment of a sequence. Structural variations are common in bacterial DNA, where they can change pathogenicity or drug resistance. It is therefore important to be able to detect structural variations in optical maps when those maps are used to identify pathogens. To detect structural variations, we compare several algorithms that find either local or ‘glocal’ alignments. A local alignment between two sequences finds similar subsequences without regard to the overall alignment. A glocal alignment tries to align the entire length of a sequence while allowing rearrangements, and these rearrangements correspond to structural variations. On artificially generated structural variations, the most accurate method was one based on profile hidden Markov models, although there is still room for improvement, for example in assessing statistical significance or in rescaling optical maps differently.
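The profile hidden Markov model tool itself is not reproduced here. As a simpler illustration of what a local alignment of two optical maps means, the sketch below adapts the generic Smith-Waterman dynamic programming algorithm to real-valued intensity profiles. The scoring rule (`offset` minus the absolute intensity difference) and the linear `gap` penalty are illustrative assumptions, not parameters from the thesis, and both profiles are assumed to be normalised and on the same pixel scale.

```python
import numpy as np


def smith_waterman_profiles(x, y, gap=1.0, offset=0.5):
    """Local alignment of two intensity profiles (generic Smith-Waterman sketch).

    A pixel pair scores offset - |x_i - y_j|, so pixels whose intensities differ
    by less than `offset` contribute positively; every gap costs `gap`.
    Returns the best score and half-open pixel intervals in x and in y.
    """
    n, m = len(x), len(y)
    H = np.zeros((n + 1, m + 1))
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1, j - 1] + offset - abs(x[i - 1] - y[j - 1])
            # Clamping at zero is what makes the alignment local.
            H[i, j] = max(0.0, diag, H[i - 1, j] - gap, H[i, j - 1] - gap)
            if H[i, j] > best:
                best, best_pos = H[i, j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    end_i, end_j = best_pos
    while i > 0 and j > 0 and H[i, j] > 0:
        if np.isclose(H[i, j], H[i - 1, j - 1] + offset - abs(x[i - 1] - y[j - 1])):
            i, j = i - 1, j - 1
        elif np.isclose(H[i, j], H[i - 1, j] - gap):
            i -= 1
        else:
            j -= 1
    return float(best), (i, end_i), (j, end_j)


x = np.array([3.0, -3.0, 1.0, 2.0, -1.0, 0.5, -2.0, 4.0])
# y embeds pixels 2..5 of x between unrelated pixels.
y = np.concatenate([[-10.0, 10.0, -10.0], x[2:6], [10.0, -10.0]])
score, x_span, y_span = smith_waterman_profiles(x, y)
print(score, x_span, y_span)  # 2.0 (2, 6) (3, 7)
```

A local aligner like this reports only the single best-matching segment; detecting a structural variation then amounts to finding segments that align well locally but sit at inconsistent positions in the two maps, which is the gap that glocal and profile-HMM approaches aim to close.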

Master’s Degree Project in Bioinformatics 60 credits 2016-2017
Department of Biology, Lund University
Advisor: Tobias Ambjörnsson
Department of Theoretical Physics, Lund University
author: Stewart, Callum
course: BINP32 20162
year: 2017
type: H2 - Master's Degree (Two Years)
language: English
id: 8925394
date added to LUP: 2017-09-12 09:05:29
date last changed: 2017-09-12 09:05:29
@misc{8925394,
  abstract     = {Optical DNA mapping offers an alternative or complementary method of sequencing and identification to the widely used next generation sequencing techniques. However, the rich selection of computational tools found for standard nucleotide sequencing are not as mature for DNA optical maps. This thesis focuses on two tools for the analysis of densely labeled optical maps. Firstly, parameters for an existing tool that predicts dye binding along a known nucleotide sequence are improved upon. Secondly, a local alignment tool based on profile hidden Markov models is introduced and compared to other dynamic programming based methods for the purpose of detecting structural variations in densely labeled optical DNA maps. The ability to annotate experimental optical maps using annotations from a related nucleotide sequence is also demonstrated.},
  author       = {Stewart, Callum},
  language     = {eng},
  note         = {Student Paper},
  title        = {Methods for structural variation detection and improved theory prediction for densely labeled DNA barcodes},
  year         = {2017},
}