LUP Student Papers

LUND UNIVERSITY LIBRARIES

SVenX: A highly parallelized pipeline for structural variation detection using linked read whole genome sequencing data

Börjesson, Vanja (2018) BINP32 20171
Degree Projects in Bioinformatics
Abstract
Genomic rearrangements larger than 50 bp are called structural variants. As a group, they affect the phenotypic diversity among humans and have been associated with many human disorders, including neurodevelopmental disorders and cancer. Recent advances in whole genome sequencing (WGS) technologies have made it possible to identify many more disease-causing genetic variants that are relevant in clinical diagnostics and sometimes affect treatment. Numerous approaches have been proposed to detect structural variants, but acquiring and filtering out the most significant information from the multitude of called variants in the sequencing data has proven to be a challenge. Another obstacle is the high computational cost of data analyses and the difficulty of configuring and operating the software and databases. Here, we present SVenX, a highly automated and parallelized pipeline that analyzes and calls structural variants using linked read WGS data. It performs variant calling using three different approaches, as well as annotation of variants and variant filtering. We also introduce a new tool, SVGenT, that reanalyzes the called structural variants by performing de novo assembly using the aligned reads at the identified breakpoint junctions. By comparing assembled contigs and analyzing the read coverage between the breakpoint junctions, SVGenT improves both variant and genotype classification and the breakpoint localization.
Popular Abstract
Tool for detection of genomic rearrangements in humans

Genomic rearrangements larger than 50 base pairs are referred to as structural variants (SVs) and contribute to phenotypic differences between humans. Some of these variants have been associated with human diseases such as cancer and neurodevelopmental disorders. Recent advances in whole genome sequencing (WGS) technologies have made it possible to analyze and identify many structural variants. Yet the existing tools for analyzing these data are not perfect, and they require a fair amount of bioinformatics knowledge to operate.

SVenX is a highly parallelized and automated pipeline, executing all steps from whole genome sequencing data to filtered SVs. This includes 1) verifying that all required data exist, 2) making sure no data duplications exist, 3) finding variants using different methods, and 4) annotating and filtering the detected SVs. SVenX performs 10 separate steps, including running 3 different variant detection tools (also known as variant callers).
Normally, these steps are performed one by one, waiting for the output of each before running the next. Not only does it take longer for the programs to run with this approach, it also requires an employee to execute the steps. Apart from the installation, SVenX takes at most a few minutes to set up and launch, and it can analyze multiple samples of WGS data at the same time. The whole pipeline takes about 4 to 5 days to complete, requiring minimal work effort and bioinformatics knowledge.
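
The contrast between running steps one by one and running independent steps side by side can be illustrated with a minimal Python sketch. This is not the SVenX implementation: the three caller functions, their names, and their outputs are hypothetical placeholders, and a real pipeline would launch external programs rather than return strings.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for three independent variant callers.
# Each takes a sample name and returns a list of calls.
def caller_a(sample):
    return [f"{sample}:caller_a"]

def caller_b(sample):
    return [f"{sample}:caller_b"]

def caller_c(sample):
    return [f"{sample}:caller_c"]

CALLERS = [caller_a, caller_b, caller_c]

def run_callers(sample):
    """Run all callers for one sample concurrently and merge their calls."""
    with ThreadPoolExecutor(max_workers=len(CALLERS)) as pool:
        futures = [pool.submit(caller, sample) for caller in CALLERS]
        merged = []
        for future in futures:  # collect in submission order
            merged.extend(future.result())
    return merged

def run_pipeline(samples):
    """Process several samples at the same time."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run_callers, samples))
    return dict(zip(samples, results))

if __name__ == "__main__":
    print(run_pipeline(["sample1", "sample2"]))
```

Because the callers do not depend on each other's output, none of them has to wait for another to finish; only the merge step needs all three results.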

Another challenge in SV research is not only detecting the variants but also being confident that the detected SVs are true calls. The performance of existing variant callers differs significantly. One tool can perform very well on one dataset and fail completely to detect SVs in another, while a second tool might be good at detecting only a single type of SV. Using multiple bioinformatics methods to detect SVs has been shown to result in a higher detection rate.
We have created a novel tool, SVGenT, that re-analyzes already detected SVs by doing de novo assembly. SVGenT classifies the SV type (deletion, duplication, inversion or break-end) and genotype (homozygous or heterozygous), and updates the genomic position of the SV breakpoints.
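
As a rough illustration of how read coverage can inform a genotype call, the sketch below classifies a candidate deletion by comparing the mean read depth between the breakpoints with the mean depth of the flanking regions. It is a generic heuristic, not SVGenT's actual algorithm, which this summary does not describe in detail; the function name and the thresholds 0.25 and 0.75 are assumptions chosen for the example.

```python
from statistics import mean

def genotype_deletion(inside_depths, flank_depths, hom_max=0.25, het_max=0.75):
    """Classify a candidate deletion from per-position read depths.

    inside_depths: depths between the two breakpoints
    flank_depths:  depths in the regions flanking the breakpoints
    hom_max, het_max: illustrative (assumed) thresholds on the depth ratio
    """
    flank = mean(flank_depths)
    if flank == 0:
        return "unknown"  # no flanking coverage to compare against
    ratio = mean(inside_depths) / flank
    if ratio < hom_max:
        return "homozygous"    # almost no reads span the region
    if ratio < het_max:
        return "heterozygous"  # roughly half the expected depth
    return "reference"         # depth looks normal, deletion not supported

if __name__ == "__main__":
    print(genotype_deletion([15, 16, 14], [30, 30, 30]))
```

The intuition is that a deletion on both chromosome copies leaves almost no reads in the deleted region, while a deletion on one copy leaves about half the usual depth.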

SVGenT has been tested using two datasets: one public large-scale WGS dataset and one simulated dataset with 4000 SVs. Three different variant callers were used to detect the variants before SVGenT was run on the output files. The detection rate was calculated before and after SVGenT was applied. In most cases, SVGenT improved the classification of both SV-type and SV-genotype.
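
A before-and-after comparison of this kind can be sketched as follows. The example assumes a truth set with known positions, types, and genotypes (as a simulated dataset would provide) and counts a true SV as detected when some call matches it in chromosome, type, and genotype, with both breakpoints within a tolerance. The `SV` record, the 100 bp tolerance, and the matching rule are assumptions for illustration, not the evaluation procedure used in the thesis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SV:
    chrom: str
    start: int
    end: int
    svtype: str    # e.g. "DEL", "DUP", "INV", "BND"
    genotype: str  # "hom" or "het"

def matches(call, truth, tol=100):
    """True if the call agrees with the truth SV within `tol` bp per breakpoint."""
    return (
        call.chrom == truth.chrom
        and abs(call.start - truth.start) <= tol
        and abs(call.end - truth.end) <= tol
        and call.svtype == truth.svtype
        and call.genotype == truth.genotype
    )

def detection_rate(calls, truth_set, tol=100):
    """Fraction of truth SVs matched by at least one call."""
    if not truth_set:
        return 0.0
    found = sum(any(matches(c, t, tol) for c in calls) for t in truth_set)
    return found / len(truth_set)

if __name__ == "__main__":
    truth = [SV("1", 1000, 2000, "DEL", "het"), SV("2", 5000, 6000, "DUP", "hom")]
    # Hypothetical calls: the first has the wrong genotype before re-analysis.
    before = [SV("1", 1010, 1990, "DEL", "hom"), SV("2", 5000, 6000, "DUP", "hom")]
    after = [SV("1", 1002, 1998, "DEL", "het"), SV("2", 5000, 6000, "DUP", "hom")]
    print(detection_rate(before, truth), detection_rate(after, truth))
```

Requiring type and genotype to match means that a call with correct coordinates but a wrong classification does not count, which is the kind of error a re-analysis step aims to reduce.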

Master’s Degree Project in Biology/Molecular Biology/Bioinformatics 60 credits 2017
Department of Biology, Lund University

Advisor: Anna Lindstrand M.D., Ph.D. Karolinska Institutet.
author
Börjesson, Vanja
supervisor
organization
course
BINP32 20171
year
type
H2 - Master's Degree (Two Years)
subject
language
English
id
8935588
date added to LUP
2018-02-12 12:17:42
date last changed
2018-02-12 12:17:42
@misc{8935588,
  abstract     = {{Genomic rearrangements larger than 50 bp are called structural variants. As a group, they affect the phenotypic diversity among humans and have been associated with many human disorders, including neurodevelopmental disorders and cancer. Recent advances in whole genome sequencing (WGS) technologies have made it possible to identify many more disease-causing genetic variants that are relevant in clinical diagnostics and sometimes affect treatment. Numerous approaches have been proposed to detect structural variants, but acquiring and filtering out the most significant information from the multitude of called variants in the sequencing data has proven to be a challenge. Another obstacle is the high computational cost of data analyses and the difficulty of configuring and operating the software and databases. Here, we present SVenX, a highly automated and parallelized pipeline that analyzes and calls structural variants using linked read WGS data. It performs variant calling using three different approaches, as well as annotation of variants and variant filtering. We also introduce a new tool, SVGenT, that reanalyzes the called structural variants by performing de novo assembly using the aligned reads at the identified breakpoint junctions. By comparing assembled contigs and analyzing the read coverage between the breakpoint junctions, SVGenT improves both variant and genotype classification and the breakpoint localization.}},
  author       = {{Börjesson, Vanja}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{SVenX: A highly parallelized pipeline for structural variation detection using linked read whole genome sequencing data}},
  year         = {{2018}},
}