Lund University Publications

Application of Bioactivity Profile-Based Fingerprints for Building Machine Learning Models

Sturm, Noé ; Sun, Jiangming LU orcid ; Vandriessche, Yves ; Mayr, Andreas ; Klambauer, Günter ; Carlsson, Lars ; Engkvist, Ola and Chen, Hongming (2019) In Journal of Chemical Information and Modeling 59(3). p.962-972
Abstract

The volume of high-throughput screening data has considerably increased since the beginning of the era of automated biochemical and cell-based assays. This information-rich data source provides tremendous repurposing opportunities for data mining. It was recently shown that biochemical or cell-based assay results can be compiled into so-called high-throughput fingerprints (HTSFPs), a new type of descriptor that describes molecular bioactivity profiles and can be applied in virtual screening, iterative screening, and target deconvolution. However, so far, studies around HTSFPs and machine learning have mainly focused on predicting the outcome of molecules in single high-throughput assays, and no one has reported the modeling of compounds' biochemical assay activities toward a panel of target proteins. In this article, we compare how our in-house HTSFPs perform at this task when combined with multitask deep learning versus the single-task support vector machine method, in terms of both hit identification and scaffold-hopping potential. The performance of the two HTSFP models is reported relative to that of multitask deep learning and support vector machine models built with the structural descriptor ECFP. Moreover, we investigated the effect of high-throughput screening false positives and negatives on the performance of the generated models. Our results showed that the two fingerprints yielded similar performance and diverse hits with very little overlap, thus demonstrating the orthogonality of bioactivity profile-based descriptors to structural descriptors. Therefore, modeling compound activity data using ECFPs together with HTSFPs increases the scaffold-hopping potential of the predictive models.
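
As a rough illustration of two ideas in the abstract, the sketch below compiles per-assay results into a bioactivity profile vector (an HTSFP-like descriptor) and measures how much two hit lists overlap. The function names, the activity threshold, the binary encoding, and the toy data are all assumptions made for illustration; none of them is taken from the article, which may encode assay outcomes differently.

```python
from typing import Dict, List, Optional, Set


def build_htsfp(results: Dict[str, float],
                assays: List[str],
                threshold: float = 0.5) -> List[Optional[int]]:
    """Build a hypothetical HTSFP-like vector with one entry per assay.

    1 = active, 0 = inactive, None = compound not tested in that assay.
    The threshold is an arbitrary illustrative cutoff.
    """
    fingerprint: List[Optional[int]] = []
    for assay in assays:
        value = results.get(assay)
        if value is None:
            fingerprint.append(None)  # keep missing data explicit
        else:
            fingerprint.append(1 if value >= threshold else 0)
    return fingerprint


def hit_overlap(hits_a: Set[str], hits_b: Set[str]) -> float:
    """Jaccard overlap of two hit sets; 0.0 when both sets are empty."""
    union = hits_a | hits_b
    if not union:
        return 0.0
    return len(hits_a & hits_b) / len(union)


# Toy data: one compound tested in two of three assays.
assays = ["assay_1", "assay_2", "assay_3"]
fp = build_htsfp({"assay_1": 0.9, "assay_3": 0.1}, assays)

# Toy hit lists from two hypothetical models (e.g. HTSFP-based and ECFP-based).
overlap = hit_overlap({"cpd_A", "cpd_B"}, {"cpd_B", "cpd_C", "cpd_D"})
```

A low overlap value between the hit lists of two models is one simple way to express the kind of complementarity the abstract describes between bioactivity-based and structural descriptors.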

author
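Sturm, Noé ; Sun, Jiangming ; Vandriessche, Yves ; Mayr, Andreas ; Klambauer, Günter ; Carlsson, Lars ; Engkvist, Ola and Chen, Hongming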
publishing date
2019
type
Contribution to journal
publication status
published
subject
keywords
Deep Learning, Machine Learning
in
Journal of Chemical Information and Modeling
volume
59
issue
3
pages
962 - 972
publisher
The American Chemical Society (ACS)
external identifiers
  • pmid:30408959
  • scopus:85057545783
ISSN
1549-9596
DOI
10.1021/acs.jcim.8b00550
language
English
LU publication?
no
additional info
Publisher Copyright: © 2018 American Chemical Society.
id
e332d5e4-5df3-4a96-ba71-288b953c2a7d
date added to LUP
2023-04-24 15:37:04
date last changed
2024-04-05 17:10:23
@article{e332d5e4-5df3-4a96-ba71-288b953c2a7d,
  abstract     = {{The volume of high-throughput screening data has considerably increased since the beginning of the era of automated biochemical and cell-based assays. This information-rich data source provides tremendous repurposing opportunities for data mining. It was recently shown that biochemical or cell-based assay results can be compiled into so-called high-throughput fingerprints (HTSFPs), a new type of descriptor that describes molecular bioactivity profiles and can be applied in virtual screening, iterative screening, and target deconvolution. However, so far, studies around HTSFPs and machine learning have mainly focused on predicting the outcome of molecules in single high-throughput assays, and no one has reported the modeling of compounds' biochemical assay activities toward a panel of target proteins. In this article, we compare how our in-house HTSFPs perform at this task when combined with multitask deep learning versus the single-task support vector machine method, in terms of both hit identification and scaffold-hopping potential. The performance of the two HTSFP models is reported relative to that of multitask deep learning and support vector machine models built with the structural descriptor ECFP. Moreover, we investigated the effect of high-throughput screening false positives and negatives on the performance of the generated models. Our results showed that the two fingerprints yielded similar performance and diverse hits with very little overlap, thus demonstrating the orthogonality of bioactivity profile-based descriptors to structural descriptors. Therefore, modeling compound activity data using ECFPs together with HTSFPs increases the scaffold-hopping potential of the predictive models.}},
  author       = {{Sturm, Noé and Sun, Jiangming and Vandriessche, Yves and Mayr, Andreas and Klambauer, Günter and Carlsson, Lars and Engkvist, Ola and Chen, Hongming}},
  issn         = {{1549-9596}},
  keywords     = {{Deep Learning; Machine Learning}},
  language     = {{eng}},
  month        = {{03}},
  number       = {{3}},
  pages        = {{962--972}},
  publisher    = {{The American Chemical Society (ACS)}},
  journal      = {{Journal of Chemical Information and Modeling}},
  title        = {{Application of Bioactivity Profile-Based Fingerprints for Building Machine Learning Models}},
  url          = {{http://dx.doi.org/10.1021/acs.jcim.8b00550}},
  doi          = {{10.1021/acs.jcim.8b00550}},
  volume       = {{59}},
  year         = {{2019}},
}