Contig assembly and plasmid analysis using DNA barcodes

Pichler, Christoffer LU (2016) FYTM02 20152
Computational Biology and Biological Physics
Abstract
Two methods for the computational analysis of DNA barcodes are presented. A DNA barcode is formed by making GC-rich regions of a DNA molecule fluoresce while AT-rich regions remain dark. When the molecule is stretched in a nanochannel and viewed under a microscope, it therefore resembles a barcode with black and white stripes. Because of the point-spread function and pixelation, the resolution is roughly one data point per 200 nm (about 700 bp). This resolution is typically enough to distinguish between two different DNA molecules.

First, DNA barcodes are used to analyze an antibiotic resistance outbreak in which antibiotic-resistant bacteria infected newborn children at Sahlgrenska University Hospital. The bacteria were of different strains, and it was suspected that they passed the antibiotic resistance gene to bacteria lacking it through the exchange of plasmids. A plasmid is a short circular DNA molecule, typically between 2 kbp and 1 Mbp (base pairs) long, which bacteria use to store genes that benefit survival (such as antibiotic resistance genes).

The second method matches short pieces of DNA sequence, called contigs, to a long intact barcode (from the same molecule as the contigs) to determine the order of the pieces. To match a sequence to a barcode, the sequence is first converted into a theoretical barcode, which is then compared to the long barcode to find the optimal placement. Contigs are assumed not to overlap, an assumption used in the methods presented in section 7.
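
The conversion and placement steps can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names are hypothetical, the theoretical barcode is simply the GC fraction per 700 bp bin (the point-spread blur is ignored), and Pearson correlation is assumed as the match score.

```python
import math

def theoretical_barcode(sequence, bp_per_pixel=700):
    """GC fraction per pixel-sized bin; a trailing partial bin is dropped.

    Assumption: one pixel covers about 700 bp and optical blur is ignored.
    """
    seq = sequence.upper()
    barcode = []
    for start in range(0, len(seq) - bp_per_pixel + 1, bp_per_pixel):
        window = seq[start:start + bp_per_pixel]
        gc = sum(1 for base in window if base in "GC")
        barcode.append(gc / bp_per_pixel)
    return barcode

def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is flat)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def best_placement(contig_barcode, long_barcode):
    """Slide the contig barcode along the long barcode.

    Returns (offset, score) for the placement with the highest correlation.
    """
    m = len(contig_barcode)
    best_offset, best_score = None, -math.inf
    for offset in range(len(long_barcode) - m + 1):
        score = pearson(contig_barcode, long_barcode[offset:offset + m])
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score
```

Calling `best_placement(theoretical_barcode(contig), long_barcode)` gives the pixel offset at which the contig fits best. Handling several contigs under the no-overlap assumption would need an additional step that this sketch leaves out.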

The matching in both methods is supported by our new statistical tools, which reduce the number of false positives in the matching process. The results for the plasmid tracing method show that it can be used to trace plasmid spread. The results for the contig assembly show that the method has potential, but so far it has not succeeded in assembling real contigs into a full, correct sequence.
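
The abstract does not spell out the statistical tools, but the keywords mention phase randomization and p-values. One common way to combine the two is sketched below, as an assumption rather than the thesis procedure: surrogate barcodes keep the power spectrum of the query barcode but get random Fourier phases, and the p-value is the share of surrogates that match the long barcode at least as well as the real query. All function names are hypothetical, and the naive DFT is only meant for short barcodes.

```python
import cmath
import math
import random

def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (fine for short barcodes)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                    for t, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out

def phase_randomized(barcode, rng):
    """Surrogate with the same power spectrum but random Fourier phases."""
    n = len(barcode)
    spectrum = dft(barcode)
    for k in range(1, (n - 1) // 2 + 1):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        spectrum[k] = abs(spectrum[k]) * cmath.exp(1j * phase)
        spectrum[n - k] = spectrum[k].conjugate()  # keeps the signal real
    return [z.real for z in dft(spectrum, inverse=True)]

def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def max_correlation(query, target):
    """Best correlation over all placements of query along target."""
    m = len(query)
    return max(pearson(query, target[i:i + m])
               for i in range(len(target) - m + 1))

def placement_p_value(query, target, n_surrogates=200, seed=0):
    """Monte Carlo p-value: share of surrogates matching at least as well."""
    rng = random.Random(seed)
    observed = max_correlation(query, target)
    exceed = sum(
        max_correlation(phase_randomized(query, rng), target) >= observed
        for _ in range(n_surrogates)
    )
    # The +1 terms keep the estimate away from an impossible p-value of zero.
    return (exceed + 1) / (n_surrogates + 1)
```

A small p-value suggests that the best placement is unlikely to arise from a random barcode with similar smoothness, which is how such a test would filter false positives. Which barcode is randomized and how many surrogates are drawn are choices made for this sketch only.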
author
Pichler, Christoffer LU
supervisor
organization
course
FYTM02 20152
year
2016
type
H2 - Master's Degree (Two Years)
subject
keywords
DNA Barcode, Contig, Plasmid, Phase Randomization, Tree, Free Energy, P-value
language
English
id
8727977
date added to LUP
2016-05-13 09:32:17
date last changed
2017-10-06 16:10:55
@misc{8727977,
  author       = {Pichler, Christoffer},
  keyword      = {DNA Barcode,Contig,Plasmid,Phase Randomization,Tree,Free Energy,P-value},
  language     = {eng},
  note         = {Student Paper},
  title        = {Contig assembly and plasmid analysis using DNA barcodes},
  year         = {2016},
}