Development of an exome pipeline for the identification of single nucleotide variants and indels from ovarian cancer tissue

Burleigh, Stephen

Burleigh, Stephen (2016) BINP41 20161
Degree Projects in Bioinformatics

Abstract: Approximately a quarter of a million women are diagnosed with ovarian cancer annually worldwide. Identifying the genes involved may ultimately lead to the development of corrective and preventive measures to fight this type of cancer. Next Generation Sequencing is a technology that can reveal the genes central to cancer initiation and development; however, the analysis of the data is problematic due to the magnitude of information created and the diversity of the bioinformatic tools available. Here, a bioinformatic pipeline system was designed to test various components in their ability to process data collected from a rather rare but aggressive type of ovarian cancer called ovarian clear-cell cancer. It was shown that this flexible modular-based pipeline system, which allowed for individual components to be easily compared, identified a single, optimal pipeline for the clear-cell data from a total of approximately 24 different pipeline combinations. This optimized pipeline was used to identify a set of genes from three patients that were associated with this disease, along with other mutations not currently recognized as cancer-related.
Popular Abstract: An automated ‘pipeline’ of programs to process genetic ovarian cancer data

Ovarian cancer affects approximately a quarter of a million women yearly worldwide, with the average age of diagnosis in the mid-60s. One less common type is called clear-cell ovarian cancer (OCCC), named for its unique appearance when viewed under the microscope. OCCC is resistant to chemotherapy treatment and therefore patients have a poor prognosis. Studying the genes involved in this cancer can lead to the development of more effective treatments, potentially saving thousands of lives a year.

Information on the genes involved in this disease can be gained by ‘gene sequencing’ the tumor tissue and comparing the results to those from healthy tissue, to identify the genes from the tumor that have errors. However, this comparison involves extensive mathematical processing of the gene sequence data, using powerful computers and a number of programs. This ‘bioinformatic’ analysis needs to be tailor-made for specific sets of data, such as the OCCC data, and should be automated to make processing easier.

The project
This project optimized and automated a set of bioinformatic programs, called a ‘pipeline,’ for our OCCC data. The project involved three steps: 1) making a ‘master’ program to run a variety of different data processing programs, 2) choosing the set of programs most suitable for our OCCC data, and 3) using the optimized pipeline in a case study of three OCCC patients to identify genes thought to be involved in this disease. The resulting automated pipeline was shown to be suitable for our laboratory’s OCCC data and should help us in our research to understand and eventually beat this disease.

Advisor: Ingrid Hedenfalk
Degree Project, 30 credits, Bioinformatics, 2016
Department of Biology, Lund University

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/8891620

author

Burleigh, Stephen

supervisor

Ingrid Hedenfalk (LU)

organization

Degree Projects in Bioinformatics

course

BINP41 20161

year

2016

type

M2 - Bachelor Degree

subject

Biology and Life Sciences

language

English

id

8891620

date added to LUP

2016-09-13 15:07:58

date last changed

2016-09-13 15:07:58

@misc{8891620,
  abstract     = {{Approximately a quarter of a million women are diagnosed with ovarian cancer annually worldwide. Identifying the genes involved may ultimately lead to the development of corrective and preventive measures to fight this type of cancer. Next Generation Sequencing is a technology that can reveal the genes central to cancer initiation and development, however the analysis of the data is problematic due to the magnitude of information created and the diversity of the bioinformatic tools available. Here, a bioinformatic pipeline system was designed to test various components in their ability to process data collected from a rather rare, but aggressive type of ovarian cancer called ovarian clear-cell cancer. It was shown that this flexible modular-based pipeline system, which allowed for individual components to be easily compared, identified a single, optimal pipeline for the clear-cell data from a total of approximately 24 different pipeline combinations. This optimized pipeline was used to identify a set of genes from three patients that were associated with this disease, along with other mutations not currently recognized as cancer-related.}},
  author       = {{Burleigh, Stephen}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Development of an exome pipeline for the identification of single nucleotide variants and indels from ovarian cancer tissue}},
  year         = {{2016}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
