LUP Student Papers

LUND UNIVERSITY LIBRARIES

Evaluation of haplotype imputation as a new method to improve genetic variant information from great apes low-coverage whole genome sequencing (lcWGS) data

Valenzuela, Alejandro (2019) BINP52 20182
Degree Projects in Bioinformatics
Abstract
The study of high-quality genomic sequences from great apes, our closest relatives and among the most relevant species for understanding human biological processes, is reaching a limit in terms of sample access. New methods are needed to extract the maximum variant information from lower-quality samples. In this scenario, imputation emerges as a good candidate for improving genotype information in poorly genotyped data. Here I focused on a population-based imputation approach that uses linkage-disequilibrium blocks shared between individuals. Such methods have been thoroughly developed in humans, where reference haplotype panels containing more than 2,000 individuals increase the power to genotype extremely low-frequency variants correctly. In this work, I show the feasibility of carrying out imputation analysis in chimpanzees using simulated low-coverage whole-genome sequencing (lcWGS) samples. To assess the efficacy of the method, I compared imputation results between chimpanzee and human genomic data. I find that high uncertainty in the typed markers of our study samples, together with the lack of a broad population-based great ape reference panel, may complicate the accurate imputation of missing positions in chimpanzee genomes. Further studies will focus on solving these limitations for non-human primate data.
Popular Abstract
Imputation and its performance in great ape genomics

The genetic variability of great apes is one of the main focuses of evolutionary genomic studies of primates. In recent years, access to high-quality DNA sequences from apes has improved our knowledge of this field. However, a limit has recently been reached, and innovative methods are needed to retrieve genomic information from these species. Here, I tested haplotype imputation, a statistical method used in humans to recover much of the genetic information lost in poorly genotyped samples.

In this project, I use a population-based approach to infer missing genotypes in chimpanzee samples using haplotypes. Haplotypes are sets of alleles ordered along a chromosome and inherited together from one parent. The method relies on a good-quality reference panel of haplotypes for the analyzed species. Our panel contained a total of 58 chimpanzees from 4 subspecies. For testing, I focused on one of them, the western chimpanzee, as it had the most available resources. Only chromosomes 21 and 22 were tested, as they are smaller than the others and more individuals had been genotyped for them.
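The core idea behind the population-based approach, copying the allele at a missing site from reference haplotypes that match the target at nearby typed sites, can be illustrated with a deliberately simplified sketch. This is not the model used by IMPUTE2 or Beagle, which rely on hidden Markov models over the reference panel. The function name `impute_haplotype`, the 0/1 allele coding, the `window` parameter standing in for a shared LD block, and the toy panel are all hypothetical assumptions made for illustration.

```python
from collections import Counter
from typing import List, Optional


def impute_haplotype(target: List[Optional[int]], panel: List[List[int]],
                     window: int = 2) -> List[int]:
    """Fill missing alleles (None) in a target haplotype by copying from reference haplotypes.

    Hypothetical toy nearest-haplotype method (not the IMPUTE2 or Beagle model): for each
    missing site, reference haplotypes are scored by mismatches at the observed target
    sites within `window` positions, and the best-matching ones vote on the missing allele.
    """
    imputed = list(target)
    for i, allele in enumerate(target):
        if allele is not None:
            continue  # typed site: keep the observed allele
        # Observed sites near position i act as a crude stand-in for a shared LD block.
        nearby = [j for j in range(max(0, i - window), min(len(target), i + window + 1))
                  if target[j] is not None]
        mismatches = [sum(hap[j] != target[j] for j in nearby) for hap in panel]
        best = min(mismatches)
        donor_alleles = [hap[i] for hap, m in zip(panel, mismatches) if m == best]
        # Majority vote among the best donors; ties go to the allele seen first in panel order.
        imputed[i] = Counter(donor_alleles).most_common(1)[0][0]
    return imputed


# Toy reference panel (hypothetical): one row per haplotype, one column per site.
panel = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 0, 0, 1],
]
print(impute_haplotype([0, None, 1, None, 0], panel))  # [0, 0, 1, 1, 0]
```

Because each missing site is filled from the closest-matching donors, the sketch also hints at why a small reference panel (58 chimpanzees here, versus more than 2,000 individuals in human panels) can limit accuracy: with few haplotypes, even the best local match may differ from the target.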

Established imputation software, such as IMPUTE2 and Beagle, was applied to our chimpanzee data. These tools look for haplotype blocks shared between individuals to infer missing sites. Two approaches were used to evaluate their performance. First, the most direct test of accuracy was to compare the imputed genotypes with the true genotypes of the same individual, where these were available. Second, when no reference genotypes were available for comparison, quality scores were used: each imputed position is assigned a quality value that reflects its statistical reliability.
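The first evaluation strategy, comparing imputed genotypes against known true genotypes, reduces to a concordance rate. A minimal sketch follows, assuming genotypes coded as alternative-allele counts (0, 1, 2), None for sites without a true genotype, and a per-site quality value (for example, the info score reported by IMPUTE2) used as an optional filter. The function name, the example values and the threshold are hypothetical.

```python
from typing import Optional, Sequence


def genotype_concordance(imputed: Sequence[int], truth: Sequence[Optional[int]],
                         quality: Sequence[float], min_quality: float = 0.0) -> float:
    """Fraction of compared sites where the imputed genotype equals the true genotype.

    Genotypes are coded as the number of alternative alleles (0, 1 or 2). Sites without
    a true genotype (None) or with a quality score below `min_quality` are skipped.
    """
    pairs = [(g, t) for g, t, q in zip(imputed, truth, quality)
             if t is not None and q >= min_quality]
    if not pairs:
        raise ValueError("no sites left to compare")
    return sum(g == t for g, t in pairs) / len(pairs)


# Hypothetical example values for five imputed sites.
imputed = [0, 1, 2, 1, 0]
truth = [0, 1, 1, None, 0]
quality = [0.99, 0.95, 0.40, 0.90, 0.97]
print(genotype_concordance(imputed, truth, quality))                   # 0.75
print(genotype_concordance(imputed, truth, quality, min_quality=0.5))  # 1.0
```

Raising `min_quality` links the two strategies, since it shows how accuracy changes when only high-quality imputed sites are kept. Overall concordance is easily dominated by common homozygous-reference sites, so it is often reported per allele-frequency bin or restricted to non-reference genotypes.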
Comparison with human imputation

Imputation tests were also carried out in humans using data from the 1000 Genomes Project, a dataset widely used in genome-wide association studies (GWAS) to impute missing positions in human populations. All imputation tests were performed on both chimpanzee and human data, and the human trials always gave better results. The final analyses used genotype likelihoods as input to the software instead of genotype calls, after it became clear that, for low-coverage sequencing, a probabilistic model performed better than a deterministic one.
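Why genotype likelihoods help at low coverage can be shown with a simple binomial read model. The sketch below assumes a biallelic site, independent reads and a single fixed sequencing error rate (real pipelines use per-base qualities), and compares the normalized likelihoods with the deterministic hard call that discards them. The function names and the error rate are hypothetical.

```python
from typing import List


def genotype_likelihoods(ref_reads: int, alt_reads: int, error: float = 0.01) -> List[float]:
    """Normalized likelihoods of genotypes 0/0, 0/1 and 1/1 from read counts at one site.

    Assumed model: each read independently shows the alternative allele with probability
    `error` (0/0), 0.5 (0/1) or 1 - `error` (1/1). The binomial coefficient is omitted
    because it is the same for all three genotypes and cancels on normalization.
    """
    p_alt = [error, 0.5, 1.0 - error]
    raw = [p ** alt_reads * (1.0 - p) ** ref_reads for p in p_alt]
    total = sum(raw)
    return [x / total for x in raw]


def hard_call(likelihoods: List[float]) -> int:
    """Deterministic call: keep only the most likely genotype (number of alternative alleles)."""
    return max(range(3), key=lambda g: likelihoods[g])


gl = genotype_likelihoods(ref_reads=0, alt_reads=1)
print(hard_call(gl))               # 2
print([round(x, 3) for x in gl])   # [0.007, 0.333, 0.66]
```

With a single alternative read, the hard call reports a homozygous genotype even though the model still gives the heterozygote about one third of the probability. Passing the likelihoods on lets the imputation software weigh that uncertainty against the reference haplotypes. At deeper coverage (for example, 15 reference and 15 alternative reads) nearly all of the weight falls on the heterozygote and the two approaches agree.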

Accurate results were obtained for low-coverage human samples. A binomial distribution of the quality scores of imputed sites across the genome validated the human analyses. However, the chimpanzee results were far from this distribution. These complications might be related to the characteristics of the chimpanzee reference panel. Further studies will try to solve this problem by improving the quality of haplotype panels for great ape populations.

Master’s Degree Project in Bioinformatics 60 credits - 2019
Department of Biology, Lund University

Advisor: Tomas Marquès Bonet
Advisor's Unit/Department: DCEXS (Department of Experimental Sciences), UPF
author
Valenzuela, Alejandro
course
BINP52 20182
year
2019
type
H2 - Master's Degree (Two Years)
language
English
id
8990059
date added to LUP
2019-07-04 15:12:48
date last changed
2019-07-04 15:12:48
@misc{8990059,
  abstract     = {{The study of high-quality genomic sequences from great apes, our closest relatives and among the most relevant species for understanding human biological processes, is reaching a limit in terms of sample access. New methods are needed to extract the maximum variant information from lower-quality samples. In this scenario, imputation emerges as a good candidate for improving genotype information in poorly genotyped data. Here I focused on a population-based imputation approach that uses linkage-disequilibrium blocks shared between individuals. Such methods have been thoroughly developed in humans, where reference haplotype panels containing more than 2,000 individuals increase the power to genotype extremely low-frequency variants correctly. In this work, I show the feasibility of carrying out imputation analysis in chimpanzees using simulated low-coverage whole-genome sequencing (lcWGS) samples. To assess the efficacy of the method, I compared imputation results between chimpanzee and human genomic data. I find that high uncertainty in the typed markers of our study samples, together with the lack of a broad population-based great ape reference panel, may complicate the accurate imputation of missing positions in chimpanzee genomes. Further studies will focus on solving these limitations for non-human primate data.}},
  author       = {{Valenzuela, Alejandro}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Evaluation of haplotype imputation as a new method to improve genetic variant information from great apes low-coverage whole genome sequencing (lcWGS) data}},
  year         = {{2019}},
}