PLS-Optimal: A Stepwise D-Optimal Design Based on Latent Variables

Brandmaier, Stefan; Sahlin, Ullrika; Tetko, Igor V. and Oberg, Tomas (2012). In Journal of Chemical Information and Modeling 52(4), p. 975-983
Abstract
Several applications, such as risk assessment within REACH or drug discovery, require reliable methods for the design of experiments and efficient testing strategies. Keeping the number of experiments as low as possible is important from both a financial and an ethical point of view, as exhaustive testing of compounds requires significant financial resources and animal lives. With a large initial set of compounds, experimental design techniques can be used to select a representative subset for testing. Once measured, these compounds can be used to develop quantitative structure-activity relationship models to predict properties of the remaining compounds. This reduces the required resources and time. D-Optimal design is frequently used to select an optimal set of compounds by analyzing data variance. We developed a new sequential approach that applies a D-Optimal design to latent variables derived from a partial least squares (PLS) model instead of principal components. The stepwise procedure selects a new set of molecules to be measured after each measurement cycle. We show that applying the D-Optimal selection in this way generates models with significantly improved performance on four different data sets with end points relevant for REACH. Compared to those derived from principal components, PLS models derived from the selection on latent variables had a lower root-mean-square error and a higher Q² and R². This improvement is statistically significant, especially when only a small number of compounds is selected.
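The stepwise idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy determinant-based selection stands in for whatever D-Optimal algorithm the paper uses, the initial set is chosen here on the centered descriptors rather than on principal components, and the names `pls1_weights`, `greedy_d_optimal`, `stepwise_pls_optimal`, and the `measure` callback are all hypothetical.

```python
import numpy as np

def pls1_weights(X, y, n_components):
    """NIPALS PLS1 for one response; returns W* so that scores = X_centered @ W*.
    Assumes X and y are not degenerate (no zero-covariance direction)."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    W, P = [], []
    for _ in range(n_components):
        w = X.T @ y
        w = w / np.linalg.norm(w)        # weight vector of this component
        t = X @ w                        # scores
        tt = t @ t
        p = X.T @ t / tt                 # X loadings
        q = (y @ t) / tt                 # y loading
        X = X - np.outer(t, p)           # deflate X
        y = y - q * t                    # deflate y
        W.append(w)
        P.append(p)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.inv(P.T @ W)

def greedy_d_optimal(T, n_select, already=(), ridge=1e-6):
    """Greedy stand-in for D-Optimal design: repeatedly add the row x of T that
    maximizes det(M + x x^T), i.e. the largest x^T M^-1 x (determinant lemma)."""
    chosen = list(already)
    M = ridge * np.eye(T.shape[1])       # small ridge keeps M invertible
    if chosen:
        M = M + T[chosen].T @ T[chosen]
    picked = []
    for _ in range(n_select):
        Minv = np.linalg.inv(M)
        gain = np.einsum("ij,jk,ik->i", T, Minv, T)
        gain[np.array(chosen + picked, dtype=int)] = -np.inf  # no repeats
        best = int(np.argmax(gain))
        picked.append(best)
        M = M + np.outer(T[best], T[best])
    return picked

def stepwise_pls_optimal(X, measure, n_initial, batch, n_rounds, n_components=2):
    """Alternate between measuring the selected compounds and selecting the next
    batch on PLS latent variables fitted to the measurements so far."""
    # Assumption: the first batch is picked on the centered raw descriptors.
    selected = greedy_d_optimal(X - X.mean(axis=0), n_initial)
    for _ in range(n_rounds):
        y = measure(selected)                        # hypothetical experiment
        Xs = X[selected]
        w_star = pls1_weights(Xs, y, n_components)
        T = (X - Xs.mean(axis=0)) @ w_star           # project all candidates
        selected += greedy_d_optimal(T, batch, already=selected)
    return selected
```

Calling `stepwise_pls_optimal(X, measure, n_initial=6, batch=3, n_rounds=3)` on a descriptor matrix `X` returns the indices of 15 distinct compounds, where `measure` is any function that maps a list of row indices to their measured end-point values.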
type
Contribution to journal
publication status
published
in
Journal of Chemical Information and Modeling
volume
52
issue
4
pages
975 - 983
publisher
The American Chemical Society
external identifiers
  • wos:000303038000011
  • scopus:84862021618
ISSN
1549-960X
DOI
10.1021/ci3000198
language
English
LU publication?
no
id
a7f4f441-086a-4a0c-aae6-961ebeac5507 (old id 3800111)
date added to LUP
2013-05-24 12:25:09
date last changed
2017-10-08 03:21:10
@article{a7f4f441-086a-4a0c-aae6-961ebeac5507,
  abstract     = {Several applications, such as risk assessment within REACH or drug discovery, require reliable methods for the design of experiments and efficient testing strategies. Keeping the number of experiments as low as possible is important from both a financial and an ethical point of view, as exhaustive testing of compounds requires significant financial resources and animal lives. With a large initial set of compounds, experimental design techniques can be used to select a representative subset for testing. Once measured, these compounds can be used to develop quantitative structure-activity relationship models to predict properties of the remaining compounds. This reduces the required resources and time. D-Optimal design is frequently used to select an optimal set of compounds by analyzing data variance. We developed a new sequential approach that applies a D-Optimal design to latent variables derived from a partial least squares (PLS) model instead of principal components. The stepwise procedure selects a new set of molecules to be measured after each measurement cycle. We show that applying the D-Optimal selection in this way generates models with significantly improved performance on four different data sets with end points relevant for REACH. Compared to those derived from principal components, PLS models derived from the selection on latent variables had a lower root-mean-square error and a higher Q2 and R2. This improvement is statistically significant, especially when only a small number of compounds is selected.},
  author       = {Brandmaier, Stefan and Sahlin, Ullrika and Tetko, Igor V. and Oberg, Tomas},
  issn         = {1549-960X},
  language     = {eng},
  number       = {4},
  pages        = {975--983},
  publisher    = {The American Chemical Society},
  journal      = {Journal of Chemical Information and Modeling},
  title        = {PLS-Optimal: A Stepwise D-Optimal Design Based on Latent Variables},
  url          = {http://dx.doi.org/10.1021/ci3000198},
  volume       = {52},
  year         = {2012},
}