LUP Student Papers

LUND UNIVERSITY LIBRARIES

Hierarchical clustering matrix (HCM) method applied to DNA barcode assembly for bacterial chromosomes

Zhu, Wensi LU (2018) FYTM04 20181
Department of Astronomy and Theoretical Physics - Undergoing reorganization
Computational Biology and Biological Physics - Undergoing reorganization
Abstract
DNA barcodes carry coarse-grained genetic information about DNA sequences taken from a genome. Potential applications include bacteriology, medical diagnosis and taxonomy. However, current state-of-the-art tools for extracting DNA molecules from cells yield only fragmented pieces of chromosomal DNA. As a consequence, the DNA barcodes are also fragmented. This calls for complementary computational methods that piece the fragments together and so help restore the intact barcodes. Challenges for such methods are noise, the influence of DNA structural variation, and experimental errors.

This thesis presents a new method for assembling large DNA fragments (mean length 300 kilobase pairs). We develop a matrix-based hierarchical clustering algorithm that pieces together the DNA fragments by assembling their overlapping regions. Two barcodes are compared by sliding one along the other to find the best alignment position. We then average the overlapping regions and stitch the two barcodes together into an assembled barcode. By repeating this process, we can obtain a near-intact barcode of a whole chromosome. We demonstrate that the method works well for assembling fragments of theoretical barcodes with added noise. For the experimental barcodes, we obtain only several large pieces instead of an intact barcode. In the last section we discuss possible improvements to the method and future applications of assembling large DNA barcodes.
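
The slide, average and stitch steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not state the similarity measure or the stopping rule, so the Pearson correlation score, the `min_overlap` and `threshold` parameters, and the function names `best_overlap`, `stitch` and `assemble` are all assumptions made for illustration. Barcodes are treated as one-dimensional intensity arrays on a common pixel scale.

```python
import numpy as np


def best_overlap(a, b, min_overlap):
    """Slide b along a and return (score, offset) for the best alignment.

    offset is the index, in a's coordinates, where b[0] lands (may be negative).
    The score is the Pearson correlation of the overlapping samples, which is
    an assumed similarity measure, not necessarily the one used in the thesis.
    """
    best_score, best_offset = -np.inf, None
    for offset in range(min_overlap - len(b), len(a) - min_overlap + 1):
        lo = max(0, offset)
        hi = min(len(a), offset + len(b))
        if hi - lo < min_overlap:
            continue
        seg_a = a[lo:hi]
        seg_b = b[lo - offset:hi - offset]
        if seg_a.std() == 0 or seg_b.std() == 0:
            continue  # correlation is undefined for a flat segment
        score = np.corrcoef(seg_a, seg_b)[0, 1]
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


def stitch(a, b, offset):
    """Merge a and b at the given offset, averaging where they overlap."""
    start = min(0, offset)
    end = max(len(a), offset + len(b))
    total = np.zeros(end - start)
    count = np.zeros(end - start)
    total[-start:-start + len(a)] += a
    count[-start:-start + len(a)] += 1
    total[offset - start:offset - start + len(b)] += b
    count[offset - start:offset - start + len(b)] += 1
    return total / count


def assemble(fragments, min_overlap, threshold):
    """Greedy agglomerative assembly: repeatedly merge the best-scoring pair.

    Stops when no pair scores at least `threshold` (a hypothetical cut-off),
    so the result may be several large pieces rather than one barcode.
    """
    frags = [np.asarray(f, dtype=float) for f in fragments]
    while len(frags) > 1:
        best = (-np.inf, None, None, None)
        for i in range(len(frags)):
            for j in range(i + 1, len(frags)):
                score, offset = best_overlap(frags[i], frags[j], min_overlap)
                if score > best[0]:
                    best = (score, i, j, offset)
        score, i, j, offset = best
        if offset is None or score < threshold:
            break
        merged = stitch(frags[i], frags[j], offset)
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags
```

Calling `assemble(fragments, min_overlap=30, threshold=0.9)` on a list of noisy, overlapping pieces returns the list of assembled pieces that remain once no pair aligns well enough. The sketch recomputes all pairwise scores after every merge for clarity; a matrix-based method would more plausibly store the scores and update only the entries involving the merged barcode.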
Popular Abstract
DNA sequencing, the process of determining the order of nucleotides in a DNA molecule, plays a pivotal role in a wide range of fields such as medical diagnosis, forensic biology and biotechnology. DNA barcoding is a complementary tool that provides coarse-grained sequence information. It can be used, for instance, for species identification, in the same way that a scanner reads the black and white stripes of a barcode on goods in a supermarket.

How do DNA barcodes work? After staining with dye molecules, DNA molecules with different sequences show different fluorescent patterns under a fluorescence microscope. These patterns are converted into DNA barcodes, which serve as a sequence-dependent 'ID card' for each DNA molecule. The process takes from a few minutes to hours, depending on the number of molecules. This is faster than traditional approaches, which can take days or weeks from sample preparation to results, so the method has potential for rapid diagnostics.

However, several unsolved problems still hinder the wide use of DNA barcodes. For example, the fluorescent images are always affected by experimental noise, so patterns recorded from the same region of a DNA molecule differ slightly from one another. Furthermore, DNA molecules are fragmented during extraction from cells, so each collected fluorescent pattern reflects only a fragment of the molecule. Like pieces of a puzzle, the fragmented barcodes need to be joined together to obtain a barcode for the intact DNA molecule.

This thesis reports on the development of an algorithm that pieces together fragmented DNA barcodes of bacterial chromosomal DNA while handling the problems mentioned above. The work is a step toward better ways of assembling full chromosomal barcodes. In the future, chromosomal barcoding might open up new ways to profile genetic dynamics in areas such as the fast diagnosis of bacterial infections, embryonic diagnostics and cancer diagnosis.
author: Zhu, Wensi LU
course: FYTM04 20181
year: 2018
type: H2 - Master's Degree (Two Years)
language: English
id: 8963612
date added to LUP: 2018-11-29 12:06:57
date last changed: 2018-11-29 12:09:03
@misc{8963612,
  abstract     = {{DNA barcodes carry coarse-grained genetic information about DNA sequences taken from a genome. Potential applications include bacteriology, medical diagnosis and taxonomy. However, current state-of-the-art tools for extracting DNA molecules from cells yield only fragmented pieces of chromosomal DNA. As a consequence, the DNA barcodes are also fragmented. This calls for complementary computational methods that piece the fragments together and so help restore the intact barcodes. Challenges for such methods are noise, the influence of DNA structural variation, and experimental errors.

This thesis presents a new method for assembling large DNA fragments (mean length 300 kilobase pairs). We develop a matrix-based hierarchical clustering algorithm that pieces together the DNA fragments by assembling their overlapping regions. Two barcodes are compared by sliding one along the other to find the best alignment position. We then average the overlapping regions and stitch the two barcodes together into an assembled barcode. By repeating this process, we can obtain a near-intact barcode of a whole chromosome. We demonstrate that the method works well for assembling fragments of theoretical barcodes with added noise. For the experimental barcodes, we obtain only several large pieces instead of an intact barcode. In the last section we discuss possible improvements to the method and future applications of assembling large DNA barcodes.}},
  author       = {{Zhu, Wensi}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Hierarchical clustering matrix (HCM) method applied to DNA barcode assembly for bacterial chromosomes}},
  year         = {{2018}},
}