Lund University Publications

ExCAPE-DB : An integrated large scale dataset facilitating Big Data analysis in chemogenomics

Sun, Jiangming; Jeliazkova, Nina; Chupakin, Vladimir; Golib-Dzib, Jose Felipe; Engkvist, Ola; Carlsson, Lars; Wegner, Jörg; Ceulemans, Hugo; Georgiev, Ivan and Jeliazkov, Vedrin, et al. (2017) In Journal of Cheminformatics 9(1).
Abstract

Chemogenomics data generally refers to the activity data of chemical compounds on an array of protein targets and represents an important source of information for building in silico target prediction models. The increasing volume of chemogenomics data offers exciting opportunities to build models based on Big Data. Preparing a high quality data set is a vital step in realizing this goal and this work aims to compile such a comprehensive chemogenomics dataset. This dataset comprises over 70 million SAR data points from publicly available databases (PubChem and ChEMBL) including structure, target information and activity annotations. Our aspiration is to create a useful chemogenomics resource reflecting industry-scale data not only for building predictive models of in silico polypharmacology and off-target effects but also for the validation of cheminformatics approaches in general.

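author
Sun, Jiangming; Jeliazkova, Nina; Chupakin, Vladimir; Golib-Dzib, Jose Felipe; Engkvist, Ola; Carlsson, Lars; Wegner, Jörg; Ceulemans, Hugo; Georgiev, Ivan; Jeliazkov, Vedrin; Kochev, Nikolay; Ashby, Thomas J. and Chen, Hongming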
publishing date
2017-03
type
Contribution to journal
publication status
published
keywords
Big Data, Bioactivity, Chemical structure, Chemogenomics, Molecular fingerprints, QSAR, Search engine
in
Journal of Cheminformatics
volume
9
issue
1
article number
17
publisher
ChemistryCentral
external identifiers
  • scopus:85014532240
ISSN
1758-2946
DOI
10.1186/s13321-017-0203-5
language
English
LU publication?
no
additional info
Publisher Copyright: © 2017 The Author(s).
id
b2485c69-9f7f-4937-9fd3-dca955058ede
date added to LUP
2023-04-24 15:36:17
date last changed
2023-04-27 07:35:33
@article{b2485c69-9f7f-4937-9fd3-dca955058ede,
  abstract     = {{Chemogenomics data generally refers to the activity data of chemical compounds on an array of protein targets and represents an important source of information for building in silico target prediction models. The increasing volume of chemogenomics data offers exciting opportunities to build models based on Big Data. Preparing a high quality data set is a vital step in realizing this goal and this work aims to compile such a comprehensive chemogenomics dataset. This dataset comprises over 70 million SAR data points from publicly available databases (PubChem and ChEMBL) including structure, target information and activity annotations. Our aspiration is to create a useful chemogenomics resource reflecting industry-scale data not only for building predictive models of in silico polypharmacology and off-target effects but also for the validation of cheminformatics approaches in general.}},
  author       = {{Sun, Jiangming and Jeliazkova, Nina and Chupakin, Vladimir and Golib-Dzib, Jose Felipe and Engkvist, Ola and Carlsson, Lars and Wegner, Jörg and Ceulemans, Hugo and Georgiev, Ivan and Jeliazkov, Vedrin and Kochev, Nikolay and Ashby, Thomas J. and Chen, Hongming}},
  issn         = {{1758-2946}},
  keywords     = {{Big Data; Bioactivity; Chemical structure; Chemogenomics; Molecular fingerprints; QSAR; Search engine}},
  language     = {{eng}},
  month        = {{03}},
  number       = {{1}},
  publisher    = {{ChemistryCentral}},
  journal      = {{Journal of Cheminformatics}},
  title        = {{ExCAPE-DB : An integrated large scale dataset facilitating Big Data analysis in chemogenomics}},
  url          = {{http://dx.doi.org/10.1186/s13321-017-0203-5}},
  doi          = {{10.1186/s13321-017-0203-5}},
  volume       = {{9}},
  year         = {{2017}},
}