LUP Student Papers

LUND UNIVERSITY LIBRARIES

Evaluating Variant Calling Pipelines for RNA-seq Data

Pathak, Ayushi (2024) BINP51 20231
Degree Projects in Bioinformatics
Abstract
RNA-seq has become a revolutionary tool for genome-wide gene expression analysis, opening new possibilities for research. This study focuses on using RNA-seq data to call variants at Single Nucleotide Polymorphisms (SNPs) in a non-model organism, the bank vole. Variant calling from RNA-seq can, for example, be used for Allele-Specific Expression (ASE) analysis, which sheds light on the impact of genetic variation on gene expression, specifically cis-regulatory variation. High-quality variant calling is important because it affects the downstream analyses. Here we evaluate the reliability of the tools (Freebayes and GATK) and analysis techniques in order to improve the variant calling pipeline. The project's ultimate objective is to establish a robust and adaptable variant calling and filtering pipeline for bank voles, contributing to the growing knowledge in non-model species research.

Here we compare two widely used variant callers, Freebayes and the Genome Analysis Toolkit (GATK), in terms of variant calling, the filters used in both analyses, and the resulting filtered variants. Additionally, we compare the variants identified from the transcriptome with those obtained from genomic data to validate the findings. When comparing the expressed genes with called variants from GATK and Freebayes, Freebayes exhibited higher precision and fewer false positives across three runs with increasingly stringent filters. The number of variants common to all three databases decreased from Run 1 to Run 3. Freebayes maintained better accuracy than GATK for further analysis, with precision increasing from 0.50 to 0.71, while GATK exhibited consistently low precision and sensitivity that declined from 0.19 to 0.03 over the three runs.

Despite challenges posed by limited resources and a lack of functional annotations for non-model species like the bank vole, RNA-seq data offers a cost-effective and targeted approach for variant calling. The comparative analysis of Freebayes and GATK helps establish a choice of tools for future studies and enhances our understanding of the strengths and limitations of each approach.
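
The precision and sensitivity figures quoted in the abstract can be illustrated with a minimal sketch. It assumes that variants called from genomic data serve as the reference set and that a variant is identified by its contig, position, reference allele and alternate allele; the abstract does not state how the metrics were actually computed, so the function name `precision_sensitivity` and the toy variants below are hypothetical and are not results from the thesis.

```python
def precision_sensitivity(called, truth):
    """Return (precision, sensitivity) of called variants against a reference set.

    Assumption: each variant is a hashable key such as (contig, pos, ref, alt).
    """
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)    # called and present in the reference set
    false_pos = len(called - truth)   # called but absent from the reference set
    false_neg = len(truth - called)   # in the reference set but not called
    precision = true_pos / (true_pos + false_pos) if called else 0.0
    sensitivity = true_pos / (true_pos + false_neg) if truth else 0.0
    return precision, sensitivity


# Made-up toy data, for illustration only.
genomic_variants = {
    ("contig1", 100, "A", "G"),
    ("contig1", 250, "C", "T"),
    ("contig2", 40, "G", "A"),
    ("contig2", 90, "T", "C"),
}
rna_seq_calls = {
    ("contig1", 100, "A", "G"),
    ("contig2", 40, "G", "A"),
    ("contig2", 77, "A", "T"),
}

precision, sensitivity = precision_sensitivity(rna_seq_calls, genomic_variants)
print(f"precision={precision:.2f} sensitivity={sensitivity:.2f}")
```

Stricter filters typically remove false positives, which raises precision, but they can also remove true variants, which lowers sensitivity; reporting both metrics per run makes that trade-off visible.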
Popular Abstract
In this study, we investigated how RNA-seq data, a cutting-edge technology for analyzing gene expression, can be used to identify genetic variations in a non-model organism called the bank vole. These variations, known as Single Nucleotide Polymorphisms (SNPs), are crucial for understanding how genetic differences impact gene expression and potentially influence traits and diseases. We compared two widely used software tools, Freebayes and Genome Analysis Toolkit (GATK), to assess their effectiveness in identifying these genetic variations. The analysis focused on the accuracy of variant calling and the reliability of the filtering techniques used to refine the results.

Overall, we found that Freebayes showed higher precision and fewer false positives than GATK, especially when more stringent filters were applied. This suggests that Freebayes may be a better choice for variant calling in similar studies. Importantly, the study highlights the value of RNA-seq data in exploring genetic diversity in non-model organisms like bank voles. Despite challenges such as limited resources and sparse annotations, RNA-seq offers a cost-effective and targeted approach to identifying genetic variations. By establishing a robust variant calling pipeline, the research contributes to the growing knowledge of genetic variation in non-model species. This information is essential for understanding the underlying mechanisms of evolution, adaptation, and disease susceptibility across different species.

In conclusion, this study provides valuable insights into the strengths and limitations of different variant calling approaches, paving the way for future research in non-model species and enhancing the overall understanding of genetic diversity and its implications.
author
Pathak, Ayushi
supervisor
organization
course
BINP51 20231
year
type
H2 - Master's Degree (Two Years)
subject
language
English
id
9149274
date added to LUP
2024-02-29 11:48:47
date last changed
2024-02-29 11:48:47
@misc{9149274,
  abstract     = {{RNA-seq has become a revolutionary tool for genome-wide gene expression analysis, opening new possibilities for research. This study focuses on using RNA-seq data for variant calling at Single Nucleotide Polymorphisms in a non-model organism, the bank vole. Variant calling from RNA-seq can for example be used for Allele-Specific Expression (ASE) analysis which sheds light on the impact of genetic variation on gene expression, specifically, cis-regulatory variation. Quality variant calling is important as it affects the downstream analyses. Here we evaluate the reliability of the tools (Freebayes and GATK) and analysis techniques to improve variant calling pipeline. The project's ultimate objective is to establish a robust and adaptable variant calling and filtering pipeline for bank voles, contributing to the growing knowledge in non-model species research.

Here we compare two widely used variant calling software, Freebayes and Genome Analysis Toolkit (GATK), in terms of variant calling, the filters used in both the analyses, and the resultant filtered variants. Additionally, we compare the variants identified from the transcriptome with those obtained from genomic data to validate the findings. While comparing the expressed genes with called variants from GATK and Freebayes, the results revealed that Freebayes exhibited higher precision and fewer false positives across three runs with increasingly stringent filters. The common variants in all three databases were reduced in Run 1 to Run 3, with Freebayes maintaining better accuracy than GATK for further analysis, showing an increase in precision from 0.50 to 0.71 while GATK exhibited constant low precision and declining sensitivity from 0.19 to 0.03 over the three runs.

Despite challenges posed by limited resources and a lack of functional annotations for non-model species like the bank vole, RNA-seq data offers a cost-effective and targeted approach for variant calling. The comparative analysis of Freebayes and GATK helps establish a choice of tools for future studies and enhances our understanding of the strengths and limitations of each approach.}},
  author       = {{Pathak, Ayushi}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Evaluating Variant Calling Pipelines for RNA-seq Data}},
  year         = {{2024}},
}