
Lund University Publications


SVCurator: A Crowdsourcing app to visualize evidence of structural variants for the human genome

Chapman, Lesley M; Spies, Noah; Pai, Patrick; Lim, Chun Shen; Carroll, Andrew; Narzisi, Giuseppe; Watson, Christopher M; Proukakis, Christos; Clarke, Wayne E; Nariai, Naoki, et al. (2019)
Abstract
A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is yet to be defined. In this study, we manually curated 1235 SVs which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy.

SVCurator is a Python Flask-based web platform that displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. The crowdsourced results were highly concordant, with 37 of the 61 curators reaching at least 78% concordance with a set of ‘expert’ curators, among whom concordance was 93%. This produced high-confidence labels for 935 events. When compared to the heuristic-based draft benchmark SV callset from GIAB, the SVCurator crowdsourced labels were 94.5% concordant with the benchmark set. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.
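The concordance analysis described above (per-curator agreement against a consensus of expert labels) can be sketched in plain Python. This is an illustrative sketch only, not the authors' actual pipeline: the site identifiers, labels, and helper names (`expert_consensus`, `concordance`) are hypothetical, and the toy data is invented for demonstration.

```python
from collections import Counter

def expert_consensus(expert_labels):
    """Majority label per SV site across expert curators (hypothetical helper)."""
    return {site: Counter(labels).most_common(1)[0][0]
            for site, labels in expert_labels.items()}

def concordance(curator_labels, consensus):
    """Fraction of shared sites where a curator matches the expert consensus."""
    shared = [s for s in curator_labels if s in consensus]
    if not shared:
        return 0.0
    agree = sum(curator_labels[s] == consensus[s] for s in shared)
    return agree / len(shared)

# Toy data: genotype labels for three putative SV sites.
experts = {
    "chr1:1000-DEL": ["het", "het", "hom"],
    "chr2:5000-INS": ["hom", "hom", "hom"],
    "chr3:9000-DEL": ["het", "het", "het"],
}
consensus = expert_consensus(experts)
curator = {"chr1:1000-DEL": "het",
           "chr2:5000-INS": "het",
           "chr3:9000-DEL": "het"}
print(round(concordance(curator, consensus), 2))  # → 0.67 (agrees on 2 of 3 sites)
```

In the study, a curator would be counted toward the "37 of 61" figure if this fraction reached at least 0.78 against the expert consensus.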
Please use this url to cite or link to this publication:
@misc{85487d73-df1c-4ec2-88d0-8665f34b33ed,
  abstract     = {{A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is yet to be defined. In this study, we manually curated 1235 SVs which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy.<br/><br/>SVCurator is a Python Flask-based web platform that displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. The crowdsourced results were highly concordant, with 37 of the 61 curators reaching at least 78% concordance with a set of ‘expert’ curators, among whom concordance was 93%. This produced high-confidence labels for 935 events. When compared to the heuristic-based draft benchmark SV callset from GIAB, the SVCurator crowdsourced labels were 94.5% concordant with the benchmark set. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.}},
  author       = {{Chapman, Lesley M and Spies, Noah and Pai, Patrick and Lim, Chun Shen and Carroll, Andrew and Narzisi, Giuseppe and Watson, Christopher M and Proukakis, Christos and Clarke, Wayne E and Nariai, Naoki and Dawson, Eric and Jones, Garan and Blankenberg, Daniel and Brueffer, Christian and Xiao, Chunlin and Kolora, Sree Rohit Raj and Alexander, Noah and Wolujewicz, Paul and Ahmed, Azza E. and Smith, Graeme and Shehreen, Saadlee and Wenger, Aaron M and Salit, Marc and Zook, Justin M}},
  keywords     = {{Bioinformatics; Genomics; Structural variants; Benchmark datasets}},
  language     = {{eng}},
  month        = {{03}},
  note         = {{Preprint}},
  publisher    = {{bioRxiv}},
  title        = {{SVCurator: A Crowdsourcing app to visualize evidence of structural variants for the human genome}},
  url          = {{http://dx.doi.org/10.1101/581264}},
  doi          = {{10.1101/581264}},
  year         = {{2019}},
}