Applicability Domain Dependent Predictive Uncertainty in QSAR Regressions

Sahlin, Ullrika; Jeliazkova, N. and Oberg, T. (2014) In Molecular Informatics 33(1). p.26-35
Abstract
Predictive models used in decision making, such as QSARs in chemical regulation or drug discovery, call for evaluated approaches to quantitatively assess associated uncertainty in predictions. Uncertainty in less reliable predictions may be captured by locally varying predictive errors. In the current study, model-based bootstrapping was combined with analogy reasoning to generate predictive distributions varying in magnitude over a model's domain of applicability. A resampling experiment based on PLS regressions on four QSAR data sets demonstrated that predictive errors assessed by k nearest neighbour or weighted PRedicted Error Sum of Squares (PRESS) on samples of external test data or by internal cross-validation improved the performance of the uncertainty assessment. Analogy using similarity defined by Euclidean distances, or differences in standard deviation in perturbed predictions, resulted in better performances than similarity defined by distance to, or density of, the training data. Locally assessed predictive distributions had on average at least as good coverage as Gaussian distribution with variance assessed from the PRESS. An R-code is provided that evaluates performances of the suggested algorithms to assess predictive error based on log likelihood scores and empirical coverage graphs, and which applies these to derive confidence intervals or samples from the predictive distributions of query compounds.
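As a rough illustration of the idea summarized in the abstract, the sketch below estimates a locally varying predictive error from the k nearest neighbours of a query compound and checks empirical coverage of the resulting intervals. It is not the authors' R code: the function names, the default `k=3`, the use of a Gaussian predictive distribution, and the 1.96 multiplier are all assumptions made for illustration, and the model-based bootstrapping step is omitted entirely.

```python
import math

def knn_local_sd(query, ref_X, ref_residuals, k=3):
    """Hypothetical helper: local predictive standard deviation for `query`,
    taken as the root mean square of the prediction residuals of its k
    nearest reference compounds (Euclidean distance in descriptor space)."""
    by_distance = sorted(
        (math.dist(query, x), r) for x, r in zip(ref_X, ref_residuals)
    )
    nearest = by_distance[:k]
    return math.sqrt(sum(r * r for _, r in nearest) / len(nearest))

def gaussian_interval(pred, sd, z=1.96):
    """Symmetric interval around a point prediction, assuming a Gaussian
    predictive distribution (an assumption of this sketch)."""
    return pred - z * sd, pred + z * sd

def empirical_coverage(y_true, preds, sds, z=1.96):
    """Fraction of observed values that fall inside their intervals."""
    hits = sum(1 for y, p, s in zip(y_true, preds, sds) if abs(y - p) <= z * s)
    return hits / len(y_true)
```

Here `ref_X` and `ref_residuals` would come from external test data or from cross-validation, the two sources of predictive error the abstract mentions. Comparing `empirical_coverage` against the nominal level over a range of `z` values gives the kind of coverage check the abstract refers to.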
author
Sahlin, Ullrika; Jeliazkova, N. and Oberg, T.
publishing date
2014
type
Contribution to journal
publication status
published
keywords
Predictive error, Variance, Reliability, Bootstrap, Risk assessment
in
Molecular Informatics
volume
33
issue
1
pages
26 - 35
publisher
John Wiley & Sons
external identifiers
  • wos:000346768100004
  • scopus:84895165560
ISSN
1868-1751
DOI
10.1002/minf.201200131
language
English
LU publication?
yes
id
5f7752f9-7c03-4785-a98f-b38d997bfdc2 (old id 4941519)
date added to LUP
2015-01-27 16:35:35
date last changed
2017-11-05 03:18:49
@article{5f7752f9-7c03-4785-a98f-b38d997bfdc2,
  abstract     = {Predictive models used in decision making, such as QSARs in chemical regulation or drug discovery, call for evaluated approaches to quantitatively assess associated uncertainty in predictions. Uncertainty in less reliable predictions may be captured by locally varying predictive errors. In the current study, model-based bootstrapping was combined with analogy reasoning to generate predictive distributions varying in magnitude over a model's domain of applicability. A resampling experiment based on PLS regressions on four QSAR data sets demonstrated that predictive errors assessed by k nearest neighbour or weighted PRedicted Error Sum of Squares (PRESS) on samples of external test data or by internal cross-validation improved the performance of the uncertainty assessment. Analogy using similarity defined by Euclidean distances, or differences in standard deviation in perturbed predictions, resulted in better performances than similarity defined by distance to, or density of, the training data. Locally assessed predictive distributions had on average at least as good coverage as Gaussian distribution with variance assessed from the PRESS. An R-code is provided that evaluates performances of the suggested algorithms to assess predictive error based on log likelihood scores and empirical coverage graphs, and which applies these to derive confidence intervals or samples from the predictive distributions of query compounds.},
  author       = {Sahlin, Ullrika and Jeliazkova, N. and Oberg, T.},
  issn         = {1868-1751},
  keywords     = {Predictive error,Variance,Reliability,Bootstrap,Risk assessment},
  language     = {eng},
  number       = {1},
  pages        = {26--35},
  publisher    = {John Wiley \& Sons},
  journal      = {Molecular Informatics},
  title        = {Applicability Domain Dependent Predictive Uncertainty in QSAR Regressions},
  doi          = {10.1002/minf.201200131},
  url          = {http://dx.doi.org/10.1002/minf.201200131},
  volume       = {33},
  year         = {2014},
}