HapZipper: sharing HapMap populations just got easier

Chanda, Pritam; Elhaik, Eran and Bader, Joel S (2012) In Nucleic Acids Research 40(20).
Abstract

The rapidly growing amount of genomic sequence data being generated and made publicly available necessitates the development of new data storage and archiving methods. The vast amount of data being shared and manipulated also creates new challenges for network resources. Thus, developing advanced data compression techniques is becoming an integral part of data production and analysis. The HapMap project is one of the largest public resources of human single-nucleotide polymorphisms (SNPs), characterizing over 3 million SNPs genotyped in over 1000 individuals. The standard format and biological properties of HapMap data suggest that a dedicated genetic compression method can outperform generic compression tools. We propose a compression methodology for genetic data by introducing HapZipper, a lossless compression tool tailored to compress HapMap data beyond benchmarks defined by generic tools such as gzip, bzip2 and lzma. We demonstrate the usefulness of HapZipper by compressing HapMap 3 populations to <5% of their original sizes. HapZipper is freely downloadable from https://bitbucket.org/pchanda/hapzipper/downloads/HapZipper.tar.bz2.

author
Chanda, Pritam; Elhaik, Eran and Bader, Joel S
publishing date
2012-11
type
Contribution to journal
publication status
published
keywords
Data Compression, HapMap Project, Humans, Polymorphism, Single Nucleotide, Software
in
Nucleic Acids Research
volume
40
issue
20
article number
e159
pages
7 pages
publisher
Oxford University Press
external identifiers
  • pmid:22844100
  • scopus:84869052036
ISSN
1362-4962
DOI
10.1093/nar/gks709
language
English
LU publication?
no
id
f0fcc98e-b18c-4a66-a991-8d00bef37a37
date added to LUP
2019-11-10 16:52:14
date last changed
2021-04-20 05:13:25
@article{f0fcc98e-b18c-4a66-a991-8d00bef37a37,
  abstract     = {The rapidly growing amount of genomic sequence data being generated and made publicly available necessitates the development of new data storage and archiving methods. The vast amount of data being shared and manipulated also creates new challenges for network resources. Thus, developing advanced data compression techniques is becoming an integral part of data production and analysis. The HapMap project is one of the largest public resources of human single-nucleotide polymorphisms (SNPs), characterizing over 3 million SNPs genotyped in over 1000 individuals. The standard format and biological properties of HapMap data suggest that a dedicated genetic compression method can outperform generic compression tools. We propose a compression methodology for genetic data by introducing HapZipper, a lossless compression tool tailored to compress HapMap data beyond benchmarks defined by generic tools such as gzip, bzip2 and lzma. We demonstrate the usefulness of HapZipper by compressing HapMap 3 populations to <5% of their original sizes. HapZipper is freely downloadable from https://bitbucket.org/pchanda/hapzipper/downloads/HapZipper.tar.bz2.},
  author       = {Chanda, Pritam and Elhaik, Eran and Bader, Joel S},
  issn         = {1362-4962},
  language     = {eng},
  month        = {11},
  number       = {20},
  publisher    = {Oxford University Press},
  journal      = {Nucleic Acids Research},
  title        = {{HapZipper}: sharing {HapMap} populations just got easier},
  url          = {http://dx.doi.org/10.1093/nar/gks709},
  doi          = {10.1093/nar/gks709},
  volume       = {40},
  year         = {2012},
}