Lund University Publications

A novel scatterplot-based method to detect copy number variation (CNV)

Qiao, Jia Lu; Levinson, Rebecca T.; Chen, Bowang; Engelter, Stefan T.; Erhart, Philipp; Gaynor, Brady J.; McArdle, Patrick F.; Schlicht, Kristina; Krawczak, Michael and Stenman, Martin, et al. (2023) In Frontiers in Genetics 14.
Abstract

Objective: Most methods to detect copy number variation (CNV) have high false positive rates, especially for small CNVs and in real-life samples from clinical studies. In this study, we explored a novel scatterplot-based method to detect CNVs in microarray samples.

Methods: Illumina SNP microarray data from 13,254 individuals were analyzed with scatterplots and by PennCNV. The data were analyzed without the prior exclusion of low-quality samples. For CNV scatterplot visualization, the median signal intensity of all SNPs located within a CNV region was plotted against the median signal intensity of the flanking genomic region. Since a CNV causes a loss or gain of signal intensity, carriers of different CNV alleles appear as separate clusters. Moreover, SNPs within a deletion are not heterozygous, whereas heterozygous SNPs within a duplication show a typical 1:2 signal distribution between the alleles. Scatterplot-based CNV calls were compared with standard results of PennCNV analysis. All discordant calls as well as a random selection of 100 concordant calls were individually analyzed by visual inspection after noise reduction.

Results: An algorithm for the automated scatterplot visualization of CNVs was developed and used to analyze six known CNV regions. Use of scatterplots and PennCNV yielded 1019 concordant and 108 discordant CNV calls. All concordant calls were evaluated as true CNV findings. Among the 108 discordant calls, 7 were false positive findings by the scatterplot method, 80 were PennCNV false positives, and 21 were true CNVs detected by the scatterplot method but missed by PennCNV (i.e., false negative findings of PennCNV).

Conclusion: CNV visualization by scatterplots allows for a reliable and rapid detection of CNVs in large studies. This novel method may thus be used both to confirm the results of genome-wide CNV detection software and to identify known CNVs in hitherto untyped samples.
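The scatterplot coordinates described in the Methods can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name `scatter_coordinates`, the fixed flank width, the SNP positions, and the intensity values are all hypothetical, and the sketch assumes intensities on a log-ratio-like scale where values near 0 correspond to two copies. Each sample contributes one point, with the flank median as x and the CNV-region median as y.

```python
from statistics import median


def scatter_coordinates(positions, intensities, cnv_start, cnv_end, flank):
    """Return (flank_median, cnv_median) for one sample.

    positions   : SNP base-pair positions (hypothetical toy values below)
    intensities : per-SNP signal intensity for this sample
    flank       : width of the flanking window on each side (an assumption)
    """
    inside, flanking = [], []
    for pos, value in zip(positions, intensities):
        if cnv_start <= pos <= cnv_end:
            inside.append(value)
        elif cnv_start - flank <= pos <= cnv_end + flank:
            flanking.append(value)
    if not inside or not flanking:
        raise ValueError("need at least one SNP inside the region and one in the flanks")
    return median(flanking), median(inside)


# Toy data, made up for illustration only.
positions = [100, 200, 300, 400, 500, 600, 700, 800]
samples = {
    "normal":      [0.02, -0.01, 0.03, -0.02, 0.01, 0.00, -0.03, 0.02],
    "deletion":    [0.01, -0.02, -0.52, -0.48, -0.55, -0.50, 0.02, -0.01],
    "duplication": [0.00, 0.02, 0.31, 0.28, 0.33, 0.30, -0.01, 0.01],
}

# One (x, y) point per sample; carriers should separate along the y axis.
coords = {
    name: scatter_coordinates(positions, values, cnv_start=300, cnv_end=600, flank=200)
    for name, values in samples.items()
}

for name, (x, y) in coords.items():
    print(f"{name}: flank median = {x:.3f}, CNV-region median = {y:.3f}")
```

In this toy setup the flank medians of all three samples stay close to zero, while the CNV-region medians of the deletion and duplication carriers shift down and up respectively, which is what would make carriers form separate clusters when many samples are plotted together. The resulting coordinates can be handed to any plotting library.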

type
Contribution to journal
publication status
published
subject
keywords
copy number variation (CNV), filtering, pennCNV, quality control, scatterplot
in
Frontiers in Genetics
volume
14
article number
1166972
publisher
Frontiers Media S. A.
external identifiers
  • pmid:37485343
  • scopus:85165391974
ISSN
1664-8021
DOI
10.3389/fgene.2023.1166972
language
English
LU publication?
yes
id
7032f41e-c1c9-4f02-b2cc-421f943aac17
date added to LUP
2023-09-19 15:03:08
date last changed
2024-04-19 01:26:41
@article{7032f41e-c1c9-4f02-b2cc-421f943aac17,
  abstract     = {{Objective: Most methods to detect copy number variation (CNV) have high false positive rates, especially for small CNVs and in real-life samples from clinical studies. In this study, we explored a novel scatterplot-based method to detect CNVs in microarray samples. Methods: Illumina SNP microarray data from 13,254 individuals were analyzed with scatterplots and by PennCNV. The data were analyzed without the prior exclusion of low-quality samples. For CNV scatterplot visualization, the median signal intensity of all SNPs located within a CNV region was plotted against the median signal intensity of the flanking genomic region. Since CNV causes loss or gain of signal intensities, carriers of different CNV alleles pop up in clusters. Moreover, SNPs within a deletion are not heterozygous, whereas heterozygous SNPs within a duplication show typical 1:2 signal distribution between the alleles. Scatterplot-based CNV calls were compared with standard results of PennCNV analysis. All discordant calls as well as a random selection of 100 concordant calls were individually analyzed by visual inspection after noise-reduction. Results: An algorithm for the automated scatterplot visualization of CNVs was developed and used to analyze six known CNV regions. Use of scatterplots and PennCNV yielded 1019 concordant and 108 discordant CNV calls. All concordant calls were evaluated as true CNV-findings. Among the 108 discordant calls, 7 were false positive findings by the scatterplot method, 80 were PennCNV false positives, and 21 were true CNVs detected by the scatterplot method, but missed by PennCNV (i.e., false negative findings). Conclusion: CNV visualization by scatterplots allows for a reliable and rapid detection of CNVs in large studies. This novel method may thus be used both to confirm the results of genome-wide CNV detection software and to identify known CNVs in hitherto untyped samples.}},
  author       = {{Qiao, Jia Lu and Levinson, Rebecca T. and Chen, Bowang and Engelter, Stefan T. and Erhart, Philipp and Gaynor, Brady J. and McArdle, Patrick F. and Schlicht, Kristina and Krawczak, Michael and Stenman, Martin and Lindgren, Arne G. and Cole, John W. and Grond-Ginsbach, Caspar}},
  issn         = {{1664-8021}},
  keywords     = {{copy number variation (CNV); filtering; pennCNV; quality control; scatterplot}},
  language     = {{eng}},
  publisher    = {{Frontiers Media S. A.}},
  series       = {{Frontiers in Genetics}},
  title        = {{A novel scatterplot-based method to detect copy number variation (CNV)}},
  url          = {{http://dx.doi.org/10.3389/fgene.2023.1166972}},
  doi          = {{10.3389/fgene.2023.1166972}},
  volume       = {{14}},
  year         = {{2023}},
}