
LUP Student Papers

LUND UNIVERSITY LIBRARIES

A resource of validated and annotated gene fusions in cancer cell lines

Alamshahi, Arianna (2025) BINP52 20242
Degree Projects in Bioinformatics
Abstract
In cancer cells, when two different genes are rearranged and joined, a fusion gene results and can act as a tumour driver. While many programs to predict fusion transcripts based on RNA-sequencing data do exist, the predictions contain many false positives without underlying genomic rearrangements. With the release of a new fusion validation pipeline to address this shortcoming, we have created a large dataset of validated and annotated gene fusions predicted by Arriba and STAR-Fusion from 328 cell lines of the publicly available Cancer Cell Line Encyclopedia. This dataset can act as a resource for the development of bioinformatic tools for fusion transcript detection and for design of functional studies or drug development.
Popular Abstract
Fusions4U: A Resource of Validated Gene Fusions in Cancer

When we think of our own genes, we of course like to imagine “gene A” and “gene B” functioning normally. However, structural rearrangements in our genetic code (DNA) can lead to a gene fusion between two genes to form “gene AB”. This gene fusion could lead to a fusion protein which could drive the formation and growth of tumours. Gene fusions are a common occurrence in cancer and can be an important tool in its diagnosis and treatment. We have created a dataset of validated fusions in cancer cell lines to act as a resource in drug development, the design of functional studies, and in the development of new bioinformatic tools for fusion transcript prediction.

A gene fusion at the DNA level is transcribed into a fusion transcript, which can be observed at the RNA level. Specialized bioinformatic software can detect possible gene fusions from fusion transcripts in sequencing data, but many of the predictions do not correspond to true gene fusions. In this project, we used state-of-the-art fusion prediction tools to predict fusion transcripts from RNA data, then validated those predictions with a recently developed validation pipeline using DNA data.

We have assembled a dataset of validated fusions in 328 cancer cell lines spanning 22 tissue types from the publicly available sequencing data of the Cancer Cell Line Encyclopedia. We used the Arriba and STAR-Fusion software to predict the fusion transcripts and a recently developed pipeline to validate them. As we intend to make the dataset public, we annotated the validated predictions with features that could be of interest to users, such as the involvement of a kinase gene in a fusion. Kinases are enzymes which phosphorylate other proteins to control their function, and overactive kinases can help cancer cells to grow and divide faster. Kinase annotations could be useful when designing drug studies, as kinase inhibitors are used to treat some cancers by suppressing the abnormal kinase proteins that can form from gene fusions.
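As a rough illustration of the kinase annotation described above, the sketch below flags a fusion when either partner gene appears in a kinase gene list. It is not the thesis pipeline: the `Fusion` record, the `involves_kinase` helper and the short `KINASE_GENES` set are all hypothetical, and the example fusions are not taken from the dataset. A real annotation would presumably rely on a curated list of kinase genes.

```python
from dataclasses import dataclass

# Hypothetical, deliberately tiny kinase list for illustration only.
# A real annotation would load a curated list of kinase genes.
KINASE_GENES = frozenset({"ABL1", "ALK", "RET", "ROS1", "NTRK1"})


@dataclass(frozen=True)
class Fusion:
    """Hypothetical record for one fusion: cell line plus 5' and 3' partner genes."""
    cell_line: str
    gene5: str
    gene3: str


def involves_kinase(fusion, kinases=KINASE_GENES):
    """Return True if either fusion partner is in the kinase gene list."""
    return fusion.gene5 in kinases or fusion.gene3 in kinases


# Illustrative entries, not rows from the Fusions4U dataset.
fusions = [
    Fusion("K-562", "BCR", "ABL1"),
    Fusion("EXAMPLE-1", "GENEA", "GENEB"),
]

# Pair each fusion with its kinase flag.
annotated = [(fusion, involves_kinase(fusion)) for fusion in fusions]
```

Checking both partners keeps the flag independent of which side of the fusion the kinase gene sits on.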

To make the dataset easy to browse, we created a database app called Fusions4U with R Shiny. In the app, a user can select how to view and filter the 10,977 entries in our validated and annotated dataset. For example, they can filter by tissue or cancer type of the cell line, by specific gene partners involved in the fusion, or by annotations of interest. Notably, the user is able to download a TSV-formatted file of all validated gene fusions or of a subset of fusions based on their filtering parameters. This dataset can be used as a resource in the design of functional studies, drug development, and bioinformatic tool development.
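The same kind of filtering could also be applied offline to a downloaded TSV file. The sketch below uses only the Python standard library. The column names (`cell_line`, `tissue`, `gene5`, `gene3`, `kinase_fusion`) and the example rows are assumptions made for illustration; the actual export from Fusions4U may use different columns and values.

```python
import csv
import io

# Made-up rows with hypothetical column names; the real export may differ.
EXAMPLE_TSV = (
    "cell_line\ttissue\tgene5\tgene3\tkinase_fusion\n"
    "LINE-A\tlung\tGENE1\tGENE2\tyes\n"
    "LINE-B\tbreast\tGENE3\tGENE4\tno\n"
    "LINE-C\tlung\tGENE5\tGENE6\tno\n"
)


def read_fusions(handle):
    """Read tab-separated fusion rows into a list of dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


def filter_fusions(rows, tissue=None, gene=None, kinase_only=False):
    """Keep rows matching every filter that is set; unset filters are ignored."""
    selected = []
    for row in rows:
        if tissue is not None and row["tissue"] != tissue:
            continue
        if gene is not None and gene not in (row["gene5"], row["gene3"]):
            continue
        if kinase_only and row["kinase_fusion"] != "yes":
            continue
        selected.append(row)
    return selected


def write_tsv(rows, fieldnames):
    """Serialise rows back to TSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


rows = read_fusions(io.StringIO(EXAMPLE_TSV))
lung = filter_fusions(rows, tissue="lung")
lung_kinase = filter_fusions(rows, tissue="lung", kinase_only=True)
subset_tsv = write_tsv(lung_kinase, fieldnames=list(rows[0].keys()))
```

With a real download, `io.StringIO(EXAMPLE_TSV)` would be replaced by an open file handle, and the column names adjusted to match the header of the exported file.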

Master’s Degree Project in Bioinformatics 60 credits 2025
Department of Biology, Lund University

Advisor: Helena Persson
Department of Clinical Sciences Lund, Oncology
author: Alamshahi, Arianna
course: BINP52 20242
type: H2 - Master's Degree (Two Years)
language: English
id: 9192385
date added to LUP: 2025-06-03 11:15:14
date last changed: 2025-06-03 11:15:14
@misc{9192385,
  abstract     = {{In cancer cells, when two different genes are rearranged and joined, a fusion gene results and can act as a tumour driver. While many programs to predict fusion transcripts based on RNA-sequencing data do exist, the predictions contain many false positives without underlying genomic rearrangements. With the release of a new fusion validation pipeline to address this shortcoming, we have created a large dataset of validated and annotated gene fusions predicted by Arriba and STAR-Fusion from 328 cell lines of the publicly available Cancer Cell Line Encyclopedia. This dataset can act as a resource for the development of bioinformatic tools for fusion transcript detection and for design of functional studies or drug development.}},
  author       = {{Alamshahi, Arianna}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{A resource of validated and annotated gene fusions in cancer cell lines}},
  year         = {{2025}},
}