LUP Student Papers

LUND UNIVERSITY LIBRARIES

Identification of functional synonymous variants in familial breast cancer

Boffelli Castro, Arthur (2022) BINP52 20212
Degree Projects in Bioinformatics
Abstract
The genetic causes of familial breast cancer are still unclear: fewer than 20% of cases have evident germline mutations. Most studies focus on mutations that directly affect the protein and disregard synonymous variants, in which the codon changes but the encoded amino acid remains the same. Synonymous variants can affect mRNA stability and translation through differences in codon usage frequency, and can affect splicing and the binding of regulatory proteins by changing specific motifs. We created a pipeline for annotating synonymous variants in two datasets from studies of breast cancer patients enriched for family history: SWEA and BRIDGES. The pipeline filters the variants, annotates each variant, and generates a table containing the synonymous variants. For the SWEA dataset, 80% of the variants were known, and approximately 15% of the patients carried a reported pathogenic/likely pathogenic variant. These patients were removed from the analysis, and 1260 synonymous variants were found in the remaining samples. For BRIDGES, around 65% of the variants in both cases and controls were known, and 7% of the case samples contained pathogenic/likely pathogenic mutations, compared to 3.5% of the control samples. After removing samples with known pathogenic variants, an association analysis between cases and controls was performed to select significant synonymous variants. Thirty-eight variants were significant in the analysis of all remaining samples, and 26 in an analysis restricted to patients with family history. Several synonymous variants show interesting features, such as highly conserved sites, large differences in codon usage between alternative and reference codons, and effects on exonic splicing enhancer and silencer motifs. This pipeline is the initial step in selecting potentially functional synonymous variants that may be associated with familial breast cancer.
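The abstract does not say which statistical test was used for the case-control association analysis. As a minimal sketch only, assuming a two-sided Fisher's exact test on carrier counts (the function name and the example counts below are hypothetical, not taken from the thesis), the per-variant comparison could look like this:

```python
from math import comb


def fisher_exact_two_sided(case_carriers, case_noncarriers,
                           control_carriers, control_noncarriers):
    """Two-sided Fisher's exact test for a 2x2 carrier table.

    Assumption: this test is used for illustration only; the thesis
    abstract does not name the association test that was applied.
    """
    n_cases = case_carriers + case_noncarriers
    n_controls = control_carriers + control_noncarriers
    n_carriers = case_carriers + control_carriers
    n_total = n_cases + n_controls

    def table_probability(x):
        # Hypergeometric probability of seeing x carriers among the cases.
        return (comb(n_cases, x) * comb(n_controls, n_carriers - x)
                / comb(n_total, n_carriers))

    observed = table_probability(case_carriers)
    lowest = max(0, n_carriers - n_controls)
    highest = min(n_cases, n_carriers)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p_value = sum(
        p for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts: 3 of 4 cases and 1 of 4 controls carry the variant.
p = fisher_exact_two_sided(3, 1, 1, 3)
```

In practice, a p-value computed this way for each synonymous variant would also need correction for multiple testing before any variant is called significant.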
Popular Abstract
The sounds of silence: silent mutations in familial breast cancer


One out of ten women is likely to develop breast cancer during her lifetime. At least 10% of patients have families with a history of breast cancer, but less than half of those have a known disease-causing mutation (also called a genetic variant) that is passed down through generations. These mutations usually occur in genes responsible for repairing damage to the DNA. When a mutation deactivates one of these genes, cells with problematic DNA sequences can continue to multiply, causing a tumour.

When searching for such mutations, most studies focus on mutations that directly affect protein formation when the gene is decoded. In these cases, the protein sequence is changed, or a stop signal appears earlier than it should, producing a shorter protein. However, some mutations change the DNA while leaving the resulting protein sequence the same. These so-called synonymous mutations, which are often believed to be functionally silent, are the focus of this project. Although synonymous mutations, in theory, do not change the protein sequence, they can cause other problems that result in a problematic protein. In this project, we created tools for bioinformatic analysis to identify synonymous mutations that possibly contribute to an increased risk of breast cancer.
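The distinction between these mutation types can be illustrated with the standard genetic code. The following is a minimal sketch, not the project's own tool: the function name and category labels are hypothetical, and it looks only at a single codon, ignoring splicing and other context.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with amino acids listed in TCAG order for
# codon positions 1, 2 and 3 ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Classify a codon substitution by its effect on the encoded amino acid."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if ref_codon == alt_codon:
        return "unchanged"
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"   # DNA differs, protein sequence does not
    if alt_aa == "*":
        return "nonsense"     # premature stop signal, shorter protein
    if ref_aa == "*":
        return "stop_lost"    # original stop signal removed
    return "missense"         # a different amino acid is encoded


# CTG and CTA both encode leucine, so this change is synonymous.
effect = classify_codon_change("CTG", "CTA")
```

A change such as TGG to TGA would instead be labelled "nonsense", because TGA is a stop codon.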

Two different studies (SWEA and BRIDGES) retrieved the DNA sequence of the genes most commonly affected in breast cancer from several thousand patients. These sequences were analysed with software that identifies mutations and generates variant call format (VCF) files, which contain the position of each mutation found. Modern sequencing techniques make very few errors; however, errors still occur. The first step of our analysis removed the mutations that are likely to be sequencing errors. After making sure that all data were reliable, we added information about each variant, based on the human reference genome. This information helped us evaluate each variant: for example, which gene is affected, how common the mutation is in the population, and whether the mutation has already been reported by other studies.
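The text does not state which criteria were used to remove likely sequencing errors. As a rough sketch under that caveat, the filtering step could keep only VCF records that pass the caller's own filters and meet a quality threshold. The `min_qual` value of 30 and the function name are assumptions chosen for illustration, not settings from the project.

```python
def filter_vcf_lines(lines, min_qual=30.0):
    """Keep VCF records with FILTER of PASS (or unset) and QUAL >= min_qual.

    Assumption: min_qual=30.0 is a hypothetical threshold; the filtering
    criteria actually used in the project are not given in the text.
    """
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.split("\t")
        # The first seven fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER.
        chrom, pos, _variant_id, ref, alt, qual, filter_status = fields[:7]
        if filter_status not in ("PASS", "."):
            continue  # the variant caller flagged this record
        if qual == "." or float(qual) < min_qual:
            continue  # missing or low call quality
        kept.append({"chrom": chrom, "pos": int(pos), "ref": ref,
                     "alt": alt, "qual": float(qual)})
    return kept


# Hypothetical records, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]),
    "\t".join(["17", "100", ".", "G", "A", "55.0", "PASS", "."]),
    "\t".join(["17", "200", ".", "C", "T", "12.5", "PASS", "."]),
    "\t".join(["13", "300", ".", "T", "C", "80.0", "LowDepth", "."]),
]
reliable = filter_vcf_lines(example)
```

Only the first record survives here: the second falls below the quality threshold and the third carries a non-PASS filter flag.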

We then excluded all patients with known disease-causing mutations from further analysis, to focus on finding new mutations in the patients whose disease was still unexplained. In total, we found more than 1250 synonymous mutations, which will be closely evaluated based on their annotated information; the most interesting ones will then be selected for further laboratory testing. We believe that, instead of searching for new genes, we should dig deeper into the known genes to find problems that are still hidden, improving early diagnosis and survival of breast cancer patients.

Master’s Degree Project in Bioinformatics 60 credits 2022
Department of Biology, Lund University

Advisor: Helena Persson
Department of Clinical Sciences Lund, Oncology
author
Boffelli Castro, Arthur
course
BINP52 20212
year
2022
type
H2 - Master's Degree (Two Years)
language
English
id
9102718
date added to LUP
2022-11-04 11:15:19
date last changed
2022-11-04 11:15:19
@misc{9102718,
  abstract     = {{The genetic causes of familial breast cancer are still unclear, where only less than 20% have evident germline mutations. Most studies focus on mutations that cause a direct impact in the protein, disregarding the synonymous variants, where the codon is changed but the amino acid encoded remains the same. Synonymous variants can affect mRNA stability and translation, by differences in codon usage frequency, as well as splicing and binding of regulatory proteins by changing specific motifs. We created a pipeline for annotation of synonymous variants in two datasets from studies containing breast cancer patients enriched for family history: SWEA and BRIDGES. The pipeline filters the variants, annotates each variant, and generates a table containing the synonymous variants. For the SWEA dataset, 80% of the variants were known, and approximately 15% of the patients carried a reported pathogenic/likely pathogenic variant. These patients were removed from the analysis, and 1260 synonymous variants were found in the remaining samples. For BRIDGES, both cases and controls had around 65% of known variants, and 7% of the case samples contained pathogenic/likely pathogenic mutations, compared to 3.5% for control samples. After removing samples with known pathogenic variants, an association analysis between cases and controls was performed for selecting significant synonymous variants. Thirty-eight variants showed significance for all remaining samples, and 26 variants in an analysis with only patients with family history. Several synonymous variants show interesting features, such as highly conserved sites, large difference of codon usage between alternative and reference codons, and effects in exonic splicing enhancer and silencer motifs. This pipeline is the initial step for selection of potential functional synonymous variants that can be associated with familial breast cancer.}},
  author       = {{Boffelli Castro, Arthur}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Identification of functional synonymous variants in familial breast cancer}},
  year         = {{2022}},
}