LUP Student Papers

LUND UNIVERSITY LIBRARIES

SeqCapAnalyst, a pipeline for analyzing Sequence Capture data in avian malaria

Hansson, Roland (2018) BINP32 20162
Degree Projects in Bioinformatics
Abstract
SeqCapAnalyst is a pipeline to process .fastq files, producing the standard files as well as statistics useful for singling out good targets for Sequence Capture probes. Rather than speed, it is optimized for flexibility and ease of modification. SeqCapAnalyst is predominantly written in bash, with one module written in Python.

The pipeline was tested on sequence capture data which was obtained for 1000 genes, based on probes designed from the genome of Haemoproteus tartakovskyi. For this study, data from sixteen samples was used, including four isolates of H. tartakovskyi from which the probes were designed, 11 samples from more distantly related species of Haemoproteus, and one sample of a Plasmodium parasite. (See appendix 2 for details.)

Out of the 1000 genes, 380 had an average coverage of 60% or better, indicating suitability for testing further samples. Out of the probes in these genes, 2206 were in the top 50% with regard to coverage, indicating that they were a factor in the sequence capture.
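
The two selection criteria above can be sketched in Python, the language of one of the pipeline's modules. This is a minimal illustration under stated assumptions, not the pipeline's actual code: the function names, the input dictionaries, and the reading of "top 50%" as "at or above the median of mean coverage among probes in the selected genes" are all hypothetical.

```python
from statistics import mean, median


def select_genes(gene_cov, threshold=0.60):
    """Return the genes whose mean coverage across samples meets the threshold.

    gene_cov maps a gene ID to a list of per-sample coverage fractions (0.0-1.0).
    The dictionary layout is an assumption made for this sketch.
    """
    return {gene for gene, covs in gene_cov.items() if mean(covs) >= threshold}


def select_probes(probe_cov, probe_gene, good_genes):
    """Among probes located in good genes, keep the top half by mean coverage.

    probe_cov maps a probe ID to per-sample coverage fractions;
    probe_gene maps a probe ID to the gene it belongs to.
    "Top half" is interpreted here as "at or above the median" (an assumption).
    """
    means = {
        probe: mean(covs)
        for probe, covs in probe_cov.items()
        if probe_gene[probe] in good_genes
    }
    if not means:
        return set()
    cutoff = median(means.values())
    return {probe for probe, m in means.items() if m >= cutoff}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    gene_cov = {"g1": [0.9, 0.7], "g2": [0.2, 0.4], "g3": [0.65, 0.75]}
    probe_cov = {"p1": [0.9], "p2": [0.3], "p3": [0.8], "p4": [0.5]}
    probe_gene = {"p1": "g1", "p2": "g1", "p3": "g2", "p4": "g3"}

    good = select_genes(gene_cov)
    print(sorted(good))
    print(sorted(select_probes(probe_cov, probe_gene, good)))
```

Keeping the gene filter and the probe filter as separate functions reflects the two-stage description in the abstract, and makes it easy to change either threshold independently.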

Successful usage as well as benchmark testing indicate the SeqCapAnalyst pipeline is potentially useful.
Popular Abstract
Making Sequence Capturing Easier

Avian malaria parasites are frequently used as model organisms to test hypotheses in the field of host-parasite ecology and evolution. However, the parasites are extracted from blood, and avian red blood cells, unlike mammalian ones, contain DNA, so samples are heavily contaminated with host DNA. This is usually compensated for with filtering techniques such as Sequence Capture.

Sequence Capture works by using 120-base-pair DNA probes that bind to the DNA you are interested in, allowing you to wash away non-binding DNA. The remaining DNA is then sequenced as usual.
Because the probe-binding sequence is only a small part of each DNA fragment, you also get sequence information about the regions close to the probe.

Sequence Capture, however, requires you to have these probes in the first place: probes that match the DNA of the species you want without binding the contaminant DNA.

SeqCapAnalyst
SeqCapAnalyst is a pipeline to process DNA sequencing files (in the .fastq format), producing the standard files as well as statistics useful for evaluating the performance of Sequence Capture probes. Rather than speed, it is optimized for flexibility and ease of modification.
It is predominantly written in bash, with one module written in Python.

The pipeline was tested on sequence capture data derived from 31,650 probes covering 1001 genes, with probes taken from the genome of Haemoproteus tartakovskyi. Sixteen samples were used, representing 9 species of malaria with varying genetic distance.

380 genes had good coverage (60% or better), and of the probes in these genes, 2206 had 50% average coverage or better, indicating that they are suitable for use in sequence capture.
Successful usage and benchmark testing indicate the SeqCapAnalyst pipeline is potentially useful, but more tests are needed to define threshold values for what constitutes a high-quality probe.

Master’s Degree Project in Bioinformatics 60 credits 2018
Department of Biology, Lund University

Advisors: Björn Canbäck, Staffan Bensch
author: Hansson, Roland
course: BINP32 20162
year: 2018
type: H2 - Master's Degree (Two Years)
language: English
id: 8937993
date added to LUP: 2018-03-23 16:02:50
date last changed: 2018-03-23 16:02:50
@misc{8937993,
  abstract     = {{SeqCapAnalyst is a pipeline to process .fastq files, producing the standard files as well as statistics useful for singling out good targets for Sequence Capture probes. Rather than speed, it is optimized for flexibility and ease of modification. SeqCapAnalyst is predominantly written in bash, with one module written in Python.

The pipeline was tested on sequence capture data which was obtained for 1000 genes, based on probes designed from the genome of Haemoproteus tartakovskyi. For this study, data from sixteen samples was used, including four isolates of H. tartakovskyi from which the probes were designed, 11 samples from more distantly related species of Haemoproteus, and one sample of a Plasmodium parasite. (See appendix 2 for details.)

Out of the 1000 genes, 380 had an average coverage of 60% or better, indicating suitability for testing further samples. Out of the probes in these genes, 2206 were in the top 50% with regards to coverage, indicating that they were a factor in the sequence capture.

Successful usage as well as benchmark testing indicate the SeqCapAnalyst pipeline is potentially useful.}},
  author       = {{Hansson, Roland}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{SeqCapAnalyst, a pipeline for analyzing Sequence Capture data in avian malaria}},
  year         = {{2018}},
}