LUP Student Papers

LUND UNIVERSITY LIBRARIES

Chromosomal DNA Barcode Assembly Using Hierarchical Clustering Matrix Method: Including Elastic Matching

Clarkson, Erik (2020) In LU-TP FYTK02 20201
Computational Biology and Biological Physics - Undergoing reorganization
Abstract
Obtaining DNA sequences is a time-consuming task that typically takes one or several days to complete. One way to reduce analysis time is to settle for long-range sequence patterns on the scale of thousands of base pairs. DNA barcoding is a DNA-characterising technique that works according to this principle: it uses fluorescence microscopy to visualise long-range sequence patterns along fluorescently stained DNA molecules. The resulting light-intensity curve, called a DNA barcode, often works as a unique identifier. This would be sufficient for identifying many bacterial species and would also give faster results than other candidate methods, with possible applications in bacteriology, diagnosis and epidemiology.
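Because a barcode is a curve of intensity values, comparing two barcodes comes down to scoring how similar two curves are. The sketch below assumes the Pearson correlation coefficient as that score; the abstract does not name a specific measure, and the function name `pearson` is illustrative.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length intensity curves.

    Returns a value in [-1, 1]; 1 means the curves have the same shape up to
    an offset and a positive scale factor (e.g. a brighter or dimmer recording).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("curves must have equal length of at least 2")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat curve has no pattern to compare
    return cov / math.sqrt(var_a * var_b)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: same shape, different brightness
print(pearson([1, 2, 3], [3, 2, 1]))        # -1.0: reversed pattern
```

Insensitivity to overall brightness and offset is one reason a correlation-type score suits fluorescence data, where absolute intensities vary between recordings.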

When DNA is extracted from cells, it breaks at several points along the way, resulting in DNA fragments. This happens even with the most sophisticated extraction methods to date. A computational assembly step is therefore required to obtain an intact DNA barcode. This thesis explores adding stretching of the fragments to the assembly process, to see to what extent it improves assembly quality compared with a previous method [Wensi Zhu, Lund University, 2018]. Including stretching as a parameter is motivated by the fact that DNA fragments confined in nano-channels are not all equally stretched. In the assembly, we merge fragments based on their similarity at different overlaps and in a hierarchical order, always merging the best-matching pair first.
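The phrase "similarity at different overlaps" can be made concrete with a small scan over overlap lengths. This is a minimal sketch, not the thesis implementation: it assumes Pearson correlation as the similarity, a hypothetical minimum overlap `min_overlap`, and only one relative order and orientation of the two fragments; stretching is left out here.

```python
import math
from typing import List, Optional, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equal-length curves (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_overlap(
    a: List[float], b: List[float], min_overlap: int = 4
) -> Optional[Tuple[float, int]]:
    """Try every overlap length k with fragment b placed directly after fragment a.

    The last k points of a are compared with the first k points of b.
    Returns (best score, best k), or None if a fragment is shorter than min_overlap.
    """
    best: Optional[Tuple[float, int]] = None
    for k in range(min_overlap, min(len(a), len(b)) + 1):
        score = pearson(a[-k:], b[:k])
        if best is None or score > best[0]:
            best = (score, k)
    return best


a = [0, 1, 0, 2, 0, 3, 1, 4]
b = [0, 3, 1, 4, 2, 2, 9]  # the first 4 points repeat the last 4 points of a
print(best_overlap(a, b))  # (1.0, 4)
```

A minimum overlap matters in practice: very short overlaps contain too few points for the correlation to be meaningful and can score spuriously high.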

Comparing assembly with and without stretching, we found that both the number of merged fragments and the length of DNA they cover increase considerably when stretching is included in the assembly process. It is therefore well motivated to include stretching in further analyses of DNA barcode assembly, with the aim of developing DNA barcoding further.
Popular Abstract
Imagine that an ill person could be diagnosed within a couple of hours by identifying the bacteria causing the illness. Once the bacterium at the root of the illness has been identified, the appropriate measures could be taken directly at the hospital.

It turns out that the main features of bacterial DNA can be captured in a simple way: staining the DNA along the chain with two types of ligands (molecules that bind to DNA) that re-emit light differently can yield large-scale sequence information. In short, by photographing this chain, one obtains a fluorescence intensity pattern that is unique to each type of bacterium. This pattern is simply a curve, much like the graph of a share price on the stock market.

The technique described above is called DNA barcoding, and it has several benefits over its alternatives, including high-quality data on a large scale and speed. If such a fluorescence intensity pattern is intact, it is enough to identify a bacterium by comparison with a database. These intensity patterns therefore work as bacterial genomic 'fingerprints'.
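Identification by database comparison can be sketched as a search for the best-matching reference. All of the following are hypothetical assumptions: barcodes have already been rescaled to a common length, similarity is Pearson correlation, and, because bacterial chromosomes are typically circular, every circular shift and both reading directions of the query are tried. The species names and intensity values are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equal-length curves (0.0 if either is flat)."""
    if len(a) != len(b):
        raise ValueError("barcodes must be rescaled to the same length first")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_circular_match(query: List[float], ref: List[float]) -> float:
    """Highest correlation of query with ref over all circular shifts and both
    reading directions (circular chromosome, molecule may be imaged either way)."""
    best = -1.0
    for candidate in (query, query[::-1]):
        for shift in range(len(candidate)):
            rotated = candidate[shift:] + candidate[:shift]
            best = max(best, pearson(rotated, ref))
    return best


def identify(query: List[float], database: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the database entry whose barcode best matches query, with its score."""
    scores = {name: best_circular_match(query, ref) for name, ref in database.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


database = {
    "species_A": [0, 1, 5, 1, 0, 2, 0, 0],  # hypothetical reference barcodes
    "species_B": [3, 3, 0, 0, 3, 3, 0, 0],
}
query = [1, 0, 2, 0, 0, 0, 1, 5]  # species_A, read from a different start point
print(identify(query, database))
```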

Identifying bacterial DNA with current methods is rather cumbersome, requiring special techniques and several days. DNA barcoding, however, has an essential problem to overcome: when bacterial DNA is extracted, even with state-of-the-art techniques, it is unavoidably fragmented. Solving this problem requires computational methods.
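Why assembly is possible at all can be illustrated by simulating the fragmentation. The sketch assumes, hypothetically, that several copies of the same chromosome are extracted and that each copy breaks at random positions; pieces from different copies then typically overlap, and those overlaps are what an assembly algorithm exploits. For simplicity the barcode is treated as linear, and the function name `fragment_copies` is illustrative.

```python
import random
from typing import List


def fragment_copies(
    barcode: List[float], n_copies: int, n_breaks: int, seed: int = 0
) -> List[List[float]]:
    """Break several copies of the same (linear) barcode at random positions.

    Each copy breaks at n_breaks distinct interior points; different copies
    typically break in different places, so their pieces overlap.
    """
    rng = random.Random(seed)
    pieces: List[List[float]] = []
    for _ in range(n_copies):
        cuts = sorted(rng.sample(range(1, len(barcode)), n_breaks))
        bounds = [0] + cuts + [len(barcode)]
        pieces.extend(barcode[start:end] for start, end in zip(bounds, bounds[1:]))
    return pieces


barcode = [float((i * 5) % 11) for i in range(30)]  # stand-in intensity profile
for piece in fragment_copies(barcode, n_copies=2, n_breaks=2):
    print(len(piece), piece[:3])
```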

In this B.Sc. project, I have implemented a computer algorithm that works out how best to piece DNA barcode fragments together, recovering an intact fingerprint. Much as a person might reason when solving an old-fashioned jigsaw puzzle, the computer tries out every possible option, keeps the best fit and then repeats until all fragments are linked together.
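The "try every option, keep the best fit, repeat" schedule can be sketched as a greedy loop. This is a simplified illustration under stated assumptions, not the thesis code: similarity is Pearson correlation, merging averages the overlapping intensities, `min_overlap` and the acceptance `threshold` are hypothetical parameters, and fragment orientation and stretching are ignored. Unlike the idealised description, the loop stops early if no remaining pair scores above the threshold, so unrelated fragments are not forced together.

```python
import math
from typing import List, Optional, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equal-length curves (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_overlap(a: List[float], b: List[float], min_overlap: int) -> Optional[Tuple[float, int]]:
    """Best (score, overlap length) with fragment b placed directly after fragment a."""
    best = None
    for k in range(min_overlap, min(len(a), len(b)) + 1):
        score = pearson(a[-k:], b[:k])
        if best is None or score > best[0]:
            best = (score, k)
    return best


def merge(a: List[float], b: List[float], k: int) -> List[float]:
    """Join b after a, averaging the k overlapping points into a consensus."""
    overlap = [(x + y) / 2 for x, y in zip(a[-k:], b[:k])]
    return a[:-k] + overlap + b[k:]


def assemble(fragments: List[List[float]], min_overlap: int = 4, threshold: float = 0.9) -> List[List[float]]:
    """Greedy hierarchical assembly: score every ordered pair, merge the best, repeat."""
    frags = [list(f) for f in fragments]
    while len(frags) > 1:
        best = None  # (score, i, j, k): fragment j goes after fragment i, overlapping by k
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i == j:
                    continue
                hit = best_overlap(a, b, min_overlap)
                if hit is not None and (best is None or hit[0] > best[0]):
                    best = (hit[0], i, j, hit[1])
        if best is None or best[0] < threshold:
            break  # no remaining pair matches well enough to merge
        _, i, j, k = best
        merged = merge(frags[i], frags[j], k)
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)] + [merged]
    return frags


pieces = [[0, 1, 0, 2, 0, 3, 1, 4], [0, 3, 1, 4, 2, 2, 9]]
print(assemble(pieces))
```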

Just as a puzzle with twice as many pieces may take much more than twice as long to solve, the size of the DNA problem grows rapidly with the number of fragments. The iterative scheme for linking the fragments is nonetheless simple and well suited to a computer, which finishes the job efficiently.
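The rapid growth can be made concrete by counting comparisons. Assuming, hypothetically, a naive greedy scheme that rescores every ordered pair of fragments in each round and merges one pair per round, assembling n fragments costs sum over m = 2..n of m(m - 1) = (n^3 - n)/3 pair scorings, so doubling the number of fragments multiplies the work by roughly eight. Each pair scoring additionally scans all overlaps (and, with stretching, all stretch factors).

```python
def pair_scorings_in_round(m: int) -> int:
    """Ordered fragment pairs scored when m fragments remain."""
    return m * (m - 1)


def total_pair_scorings(n: int) -> int:
    """Naive greedy assembly of n fragments: every ordered pair is rescored in
    every round, and each round merges two fragments into one (n - 1 rounds)."""
    return sum(pair_scorings_in_round(m) for m in range(2, n + 1))


for n in (10, 20, 40):
    print(n, total_pair_scorings(n), (n**3 - n) // 3)
# prints:
# 10 330 330
# 20 2660 2660
# 40 21320 21320
```

Storing pair scores in a matrix and rescoring only the pairs that involve the newly merged fragment reduces the number of pair scorings to the order of n^2; the code above counts only the naive variant.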

Noise in the light-emission experiments, one of which must be performed for every bacterium, is however unavoidable. The remaining challenge is thus to take some of these noise effects into account, in order to reach a consensus barcode that is similar enough (say, 90 percent similarity) to the ‘real one’, i.e. the database counterpart.
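The 90 percent target can be phrased as a threshold on a similarity score. The sketch assumes, hypothetically, that similarity is measured by Pearson correlation, that the noise is independent and Gaussian, and that the replicate curves are already aligned; the helper names are illustrative. It shows why averaging several noisy measurements brings a consensus closer to the reference.

```python
import math
import random
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equal-length curves (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def add_noise(curve: Sequence[float], sigma: float, rng: random.Random) -> List[float]:
    """Add independent Gaussian noise with standard deviation sigma to every point."""
    return [x + rng.gauss(0.0, sigma) for x in curve]


def consensus(replicates: Sequence[Sequence[float]]) -> List[float]:
    """Point-wise mean of replicate curves that are assumed to be already aligned."""
    return [sum(values) / len(values) for values in zip(*replicates)]


def meets_target(candidate: Sequence[float], reference: Sequence[float], target: float = 0.9) -> bool:
    """True if the candidate's similarity to the reference reaches the target."""
    return pearson(candidate, reference) >= target


reference = [math.sin(i / 5) for i in range(200)]  # stand-in for a database barcode
rng = random.Random(0)
single = add_noise(reference, 1.0, rng)
averaged = consensus([add_noise(reference, 1.0, rng) for _ in range(25)])
for label, curve in (("single", single), ("consensus", averaged)):
    print(label, round(pearson(curve, reference), 3), meets_target(curve, reference))
```

For independent noise, averaging 25 replicates reduces the noise standard deviation by a factor of five, which is why the consensus scores much closer to the reference than any single recording.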

In a previous M.Sc. project [Wensi Zhu, Hierarchical clustering matrix method (HCM) applied to DNA barcode assembly for bacterial chromosomes, Lund University, 2018], Zhu considered an assembly method that assumed all DNA molecules were stretched identically. In that study, the method worked well without noise, but not well enough when noise was included. In this study, I extended Zhu's method to include stretching in the assembly process. We found that with this modification, both the number of merged fragments and the length of DNA they cover increase.
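Stretching can be added to an overlap scan by rescaling one fragment before comparing. A minimal sketch, assuming linear-interpolation resampling, a hypothetical grid of candidate stretch factors, a hypothetical minimum overlap and a made-up `signal` test profile; the thesis's actual elastic-matching scheme and parameter ranges are not reproduced here.

```python
import math
from typing import List, Optional, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equal-length curves (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def stretch(profile: Sequence[float], factor: float) -> List[float]:
    """Resample profile to round(len(profile) * factor) points by linear interpolation.

    factor > 1 stretches the curve, factor < 1 compresses it; both end points are kept.
    """
    n = len(profile)
    if n < 2:
        raise ValueError("need at least two points to interpolate")
    m = max(2, round(n * factor))
    out = []
    for i in range(m):
        x = i * (n - 1) / (m - 1)  # position of new sample i on the old index axis
        lo = int(x)
        hi = min(lo + 1, n - 1)
        t = x - lo
        out.append(profile[lo] * (1 - t) + profile[hi] * t)
    return out


def best_elastic_overlap(
    a: List[float], b: List[float], factors: Sequence[float], min_overlap: int = 10
) -> Optional[Tuple[float, float, int]]:
    """Overlap scan with b rescaled by each candidate factor first.

    Returns (best score, best stretch factor, best overlap length), or None.
    """
    best: Optional[Tuple[float, float, int]] = None
    for factor in factors:
        sb = stretch(b, factor)
        for k in range(min_overlap, min(len(a), len(sb)) + 1):
            score = pearson(a[-k:], sb[:k])
            if best is None or score > best[0]:
                best = (score, factor, k)
    return best


def signal(x: float) -> float:
    """Hypothetical smooth intensity profile along the molecule."""
    return math.sin(x / 3) + 0.5 * math.sin(x / 7)


a = [signal(x) for x in range(40)]                    # covers positions 0..39
b = stretch([signal(x) for x in range(30, 60)], 1.2)  # 30..59, recorded 20% more stretched
rigid = best_elastic_overlap(a, b, [1.0])
elastic = best_elastic_overlap(a, b, [0.8, 5 / 6, 0.9, 1.0, 1.1, 1.2])
print("rigid:  ", rigid)
print("elastic:", elastic)
```

Because the rigid scan is the special case `factors=[1.0]`, the elastic score can never be lower; the cost is an extra loop over stretch factors for every pair, which multiplies the number of comparisons.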
author: Clarkson, Erik
course: FYTK02 20201
year: 2020
type: M2 - Bachelor Degree
keywords: DNA barcoding
publication/series: LU-TP
report number: 20-25
language: English
id: 9022052
date added to LUP: 2020-06-24 19:13:45
date last changed: 2020-07-27 14:52:32
@misc{9022052,
  abstract     = {{Obtaining DNA sequences is a time-consuming task, which typically requires one or several days for completion. One way of reducing analysis times is to be satisfied with long-range sequence patterns on the order of thousands of base pairs. DNA barcoding is a DNA-characterising technique that works according to this principle. It does so by using fluorescence microscopy to visualise long-range sequence patterns along DNA molecules which are fluorescently stained. The resultant light intensity curve works as an often unique identifier and is called a DNA barcode. This would be sufficient for identifying many bacteria species and would also provide a faster result compared to other candidate methods, with possible implementations in bacteriology, diagnosis and epidemiology. 

When DNA is to be extracted from cells, it breaks at some points along the way, resulting in DNA fragments. This happens even with the most sophisticated methods to date. Therefore, a computational part of the assembly process is required in order to obtain an intact DNA barcode. This thesis explores the addition of stretching out the fragments in the assembly process, to see to what extent it increases the assembly quality, as compared to a previous method [Wensi Zhu, Lund University, 2018]. Stretching as a parameter is motivated by the fact that confined DNA fragments in nano-channels are not equally stretched. In the assembly, we merge the fragments based on their similarity at different overlap and in a hierarchical order, always merging the best matching pair first. 

Comparing stretching to non-stretching, we found that the number of merged fragments and the size of DNA that it covers increases considerably with stretching included in the assembly process. It is therefore well motivated to include stretching in further analyses of DNA barcode assembly, in the ambition of developing DNA barcoding further.}},
  author       = {{Clarkson, Erik}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{LU-TP}},
  title        = {{Chromosomal DNA Barcode Assembly Using Hierarchical Clustering Matrix Method: Including Elastic Matching}},
  year         = {{2020}},
}