Contig assembly and plasmid analysis using DNA barcodes

Pichler, Christoffer

Pichler, Christoffer ^LU (2016) FYTM02 20152
Computational Biology and Biological Physics - Has been reorganised

Abstract: Two methods for the computational analysis of DNA barcodes are presented. A DNA barcode is formed by making GC-rich regions of a DNA molecule fluoresce while AT-rich regions remain dark. When the molecule is stretched in nanochannels and viewed under a microscope, it therefore resembles a barcode with black and white stripes. Because of the point-spread function and pixelation, the resolution is roughly one data point per 200 nm (roughly 700 bp). This resolution is typically enough to distinguish between two different DNA molecules.

First, DNA barcodes are used to analyze an antibiotic-resistance outbreak in which antibiotic-resistant bacteria infected newborn children at Sahlgrenska University Hospital. The bacteria were of different strains, and it was suspected that they passed the antibiotic resistance gene to bacteria lacking it through the exchange of plasmids. A plasmid is a short circular DNA molecule, typically between 2 kbp and 1 Mbp long (bp = base pairs), which bacteria use to store genes that benefit survival, such as antibiotic resistance genes.

The second method matches short pieces of DNA sequence, called contigs, to a long intact barcode (from the same molecule as the contigs) in order to determine the order of the pieces. To match a sequence to a barcode, the sequence must first be converted into a theoretical barcode. It is then compared to the long barcode to find the optimal placement. Contigs are not supposed to overlap, and the methods presented in section 7 of the thesis rely on that assumption.
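The conversion and placement steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the Gaussian blur standing in for the point-spread function, the choice to bin before blurring, and the use of Pearson correlation as the match score are all assumptions made for illustration. Only the figure of roughly 700 bp per data point is taken from the abstract.

```python
import math


def theoretical_barcode(seq, bp_per_pixel=700, psf_sigma_px=1.0):
    """Turn a DNA sequence into a coarse GC-content profile (hypothetical helper).

    Assumptions: GC bases count as bright (1.0) and all other symbols as dark (0.0);
    the sequence is binned into pixels first and then blurred with a Gaussian
    kernel that stands in for the microscope's point-spread function.
    """
    track = [1.0 if base in "GCgc" else 0.0 for base in seq]
    n_px = len(track) // bp_per_pixel
    pixels = [sum(track[i * bp_per_pixel:(i + 1) * bp_per_pixel]) / bp_per_pixel
              for i in range(n_px)]

    radius = max(1, int(3 * psf_sigma_px))
    kernel = [math.exp(-0.5 * (k / psf_sigma_px) ** 2)
              for k in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]

    blurred = []
    for i in range(n_px):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n_px - 1)  # clamp at the ends
            acc += w * pixels[idx]
        blurred.append(acc)
    return blurred


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_placement(contig_barcode, long_barcode):
    """Slide the contig barcode along the long barcode; return (offset, score)."""
    m = len(contig_barcode)
    best_offset, best_score = 0, -2.0
    for offset in range(len(long_barcode) - m + 1):
        score = pearson(contig_barcode, long_barcode[offset:offset + m])
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score
```

In this sketch each contig is placed independently. A full assembly would also have to enforce the non-overlap assumption mentioned above, which this example does not attempt.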

In both methods, the matching is supported by our new statistical tools, which reduce the number of false positives. The results for the plasmid tracing method show that it can be used to trace plasmid spread. The results for the contig assembly, on the other hand, show that the method has potential, but so far it has not succeeded in assembling real contigs into a full, correct sequence.
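The abstract does not describe the statistical tools in detail, but the record's keywords mention phase randomization and p-values. The sketch below shows one common way those two ideas combine, and it should be read as an assumption rather than as the thesis method: surrogate barcodes are generated by randomizing the Fourier phases of the long barcode, and the observed best match score is compared with the best scores obtained against the surrogates. All function names, the Pearson score, and the `(hits + 1) / (n + 1)` estimator are illustrative choices.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for short barcodes)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(X):
    """Inverse DFT that keeps only the real part of each sample."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def phase_randomize(x, rng):
    """Surrogate with the same amplitude spectrum as x but random phases."""
    n = len(x)
    X = dft(x)
    Y = [0j] * n
    Y[0] = X[0]                      # keep the mean unchanged
    for k in range(1, (n - 1) // 2 + 1):
        phi = rng.uniform(0.0, 2.0 * math.pi)
        Y[k] = abs(X[k]) * cmath.exp(1j * phi)
        Y[n - k] = Y[k].conjugate()  # conjugate symmetry keeps the surrogate real
    if n % 2 == 0:
        Y[n // 2] = X[n // 2]        # Nyquist term is left untouched
    return idft_real(Y)


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_score(short, long_bc):
    """Highest Pearson correlation of `short` over all placements on `long_bc`."""
    m = len(short)
    return max(pearson(short, long_bc[o:o + m])
               for o in range(len(long_bc) - m + 1))


def match_p_value(short, long_bc, n_surrogates=200, seed=0):
    """Empirical p-value: share of surrogates that match at least as well."""
    rng = random.Random(seed)
    observed = best_score(short, long_bc)
    hits = sum(best_score(short, phase_randomize(long_bc, rng)) >= observed
               for _ in range(n_surrogates))
    return (hits + 1) / (n_surrogates + 1)
```

Phase randomization preserves the power spectrum, and therefore the autocorrelation, of the barcode, so the surrogates resemble real barcodes in smoothness while carrying no sequence-specific signal. A small p-value then suggests that a match is unlikely to be a false positive under this null model.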

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/8727977

author

Pichler, Christoffer ^LU

supervisor

Tobias Ambjörnsson ^LU

organization

Computational Biology and Biological Physics - Has been reorganised

course

FYTM02 20152

year

2016

type

H2 - Master's Degree (Two Years)

subject

Physics and Astronomy

keywords

DNA Barcode, Contig, Plasmid, Phase Randomization, Tree, Free Energy, P-value

language

English

id

8727977

date added to LUP

2016-05-13 09:32:17

date last changed

2017-10-06 16:10:55

@misc{8727977,
  author       = {{Pichler, Christoffer}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Contig assembly and plasmid analysis using DNA barcodes}},
  year         = {{2016}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
