A new pipeline for the normalization and pooling of metabolomics data

Viallon, Vivian; His, Mathilde; Rinaldi, Sabina; Breeur, Marie; Gicquiau, Audrey; Hemon, Bertrand; Overvad, Kim; Tjønneland, Anne; Rostgaard-Hansen, Agnetha Linn; Rothwell, Joseph A.; Lecuyer, Lucie; Severi, Gianluca; Kaaks, Rudolf; Johnson, Theron; Schulze, Matthias B.; Palli, Domenico; Agnoli, Claudia; Panico, Salvatore; Tumino, Rosario; Ricceri, Fulvio; Monique Verschuren, W. M.; Engelfriet, Peter; Onland-Moret, Charlotte; Vermeulen, Roel; Nøst, Therese Haugdahl; Urbarova, Ilona; Zamora-Ros, Raul; Rodriguez-Barranco, Miguel; Amiano, Pilar; Huerta, José Maria; Ardanaz, Eva; Melander, Olle; Ottoson, Filip; Vidman, Linda; Rentoft, Matilda; Schmidt, Julie A.; Travis, Ruth C.; Weiderpass, Elisabete; Johansson, Mattias; Dossus, Laure; Jenab, Mazda; Gunter, Marc J.; Bermejo, Justo Lorenzo; Scherer, Dominique; Salek, Reza M.; Keski-Rahkonen, Pekka; Ferrari, Pietro

Viallon, Vivian; His, Mathilde; Rinaldi, Sabina; Breeur, Marie; Gicquiau, Audrey; Hemon, Bertrand; Overvad, Kim; Tjønneland, Anne; Rostgaard-Hansen, Agnetha Linn; Rothwell, Joseph A.; et al. (2021) In Metabolites 11(9).

Abstract: Pooling metabolomics data across studies is often desirable to increase the statistical power of the analysis. However, this can raise methodological challenges as several preanalytical and analytical factors could introduce differences in measured concentrations and variability between datasets. Specifically, different studies may use variable sample types (e.g., serum versus plasma) collected, treated, and stored according to different protocols, and assayed in different laboratories using different instruments. To address these issues, a new pipeline was developed to normalize and pool metabolomics data through a set of sequential steps: (i) exclusions of the least informative observations and metabolites and removal of outliers; imputation of missing data; (ii) identification of the main sources of variability through principal component partial R-square (PC-PR2) analysis; (iii) application of linear mixed models to remove unwanted variability, including samples’ originating study and batch, and preserve biological variations while accounting for potential differences in the residual variances across studies.
This pipeline was applied to targeted metabolomics data acquired using Biocrates AbsoluteIDQ kits in eight case-control studies nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Comprehensive examination of metabolomics measurements indicated that the pipeline improved the comparability of data across the studies. Our pipeline can be adapted to normalize other molecular data, including biomarkers as well as proteomics data, and could be used for pooling molecular datasets, for example in international consortia, to limit biases introduced by inter-study variability. This versatility of the pipeline makes our work of potential interest to molecular epidemiologists.
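Step (i) of the pipeline described in the abstract (excluding the least informative metabolites, then imputing missing data) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the 50% missingness cut-off, the half-minimum imputation rule, and the toy values. Exclusion of observations and outlier removal are not shown.

```python
from typing import Dict, List, Optional

# Hypothetical layout: metabolite name -> one value per sample (None = missing).
Matrix = Dict[str, List[Optional[float]]]


def missing_fraction(values: List[Optional[float]]) -> float:
    """Share of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def filter_and_impute(data: Matrix, max_missing: float = 0.5) -> Matrix:
    """Drop metabolites with too many missing values, then impute the rest.

    Both max_missing and the half-minimum imputation rule are illustrative
    assumptions, not the thresholds or rules used by the authors.
    """
    cleaned: Matrix = {}
    for name, values in data.items():
        if missing_fraction(values) > max_missing:
            continue  # treated as a "least informative" metabolite: excluded
        observed = [v for v in values if v is not None]
        if not observed:
            continue  # nothing to impute from
        fill = min(observed) / 2.0  # half of the smallest observed value
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


# Made-up toy values for three metabolites measured in four samples.
toy: Matrix = {
    "C0": [40.0, None, 36.0, 44.0],
    "C2": [None, None, None, 5.0],
    "Gly": [250.0, 300.0, 275.0, 290.0],
}
result = filter_and_impute(toy, max_missing=0.5)
print(sorted(result))
print(result["C0"])
```

In this toy run, "C2" is dropped because three of its four values are missing, and the single gap in "C0" is filled with half of its smallest observed value. In a pooled analysis, such rules would presumably be applied before the PC-PR2 and mixed-model steps, as the ordering in the abstract suggests.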

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/13d41a4f-7812-4615-8ca3-76bee31b3125

author

Viallon, Vivian; His, Mathilde; Rinaldi, Sabina; Breeur, Marie; Gicquiau, Audrey; Hemon, Bertrand; Overvad, Kim; Tjønneland, Anne; Rostgaard-Hansen, Agnetha Linn; Rothwell, Joseph A.; Lecuyer, Lucie; Severi, Gianluca; Kaaks, Rudolf; Johnson, Theron; Schulze, Matthias B.; Palli, Domenico; Agnoli, Claudia; Panico, Salvatore; Tumino, Rosario; Ricceri, Fulvio; Monique Verschuren, W. M.; Engelfriet, Peter; Onland-Moret, Charlotte; Vermeulen, Roel; Nøst, Therese Haugdahl; Urbarova, Ilona; Zamora-Ros, Raul; Rodriguez-Barranco, Miguel; Amiano, Pilar; Huerta, José Maria; Ardanaz, Eva; Melander, Olle (LU); Ottoson, Filip; Vidman, Linda; Rentoft, Matilda; Schmidt, Julie A.; Travis, Ruth C.; Weiderpass, Elisabete; Johansson, Mattias; Dossus, Laure; Jenab, Mazda; Gunter, Marc J.; Bermejo, Justo Lorenzo; Scherer, Dominique; Salek, Reza M.; Keski-Rahkonen, Pekka and Ferrari, Pietro

publishing date

2021-09

type

Contribution to journal

publication status

published

subject

Bioinformatics (Computational Biology)

keywords

Cancer epidemiology, Metabolites, Metabolomics, Normalization, Pooling, Technical variability

in

Metabolites

volume

11

issue

9

article number

631

publisher

MDPI AG

external identifiers

pmid:34564446
scopus:85115861814

ISSN

2218-1989

DOI

10.3390/metabo11090631

language

English

LU publication?

yes

id

13d41a4f-7812-4615-8ca3-76bee31b3125

date added to LUP

2021-10-14 13:46:33

date last changed

2026-01-13 22:10:40

@article{13d41a4f-7812-4615-8ca3-76bee31b3125,
  abstract     = {{Pooling metabolomics data across studies is often desirable to increase the statistical power of the analysis. However, this can raise methodological challenges as several preanalytical and analytical factors could introduce differences in measured concentrations and variability between datasets. Specifically, different studies may use variable sample types (e.g., serum versus plasma) collected, treated, and stored according to different protocols, and assayed in different laboratories using different instruments. To address these issues, a new pipeline was developed to normalize and pool metabolomics data through a set of sequential steps: (i) exclusions of the least informative observations and metabolites and removal of outliers; imputation of missing data; (ii) identification of the main sources of variability through principal component partial R-square (PC-PR2) analysis; (iii) application of linear mixed models to remove unwanted variability, including samples’ originating study and batch, and preserve biological variations while accounting for potential differences in the residual variances across studies. This pipeline was applied to targeted metabolomics data acquired using Biocrates AbsoluteIDQ kits in eight case-control studies nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Comprehensive examination of metabolomics measurements indicated that the pipeline improved the comparability of data across the studies. Our pipeline can be adapted to normalize other molecular data, including biomarkers as well as proteomics data, and could be used for pooling molecular datasets, for example in international consortia, to limit biases introduced by inter-study variability. This versatility of the pipeline makes our work of potential interest to molecular epidemiologists.}},
  author       = {{Viallon, Vivian and His, Mathilde and Rinaldi, Sabina and Breeur, Marie and Gicquiau, Audrey and Hemon, Bertrand and Overvad, Kim and Tjønneland, Anne and Rostgaard-Hansen, Agnetha Linn and Rothwell, Joseph A. and Lecuyer, Lucie and Severi, Gianluca and Kaaks, Rudolf and Johnson, Theron and Schulze, Matthias B. and Palli, Domenico and Agnoli, Claudia and Panico, Salvatore and Tumino, Rosario and Ricceri, Fulvio and Monique Verschuren, W. M. and Engelfriet, Peter and Onland-Moret, Charlotte and Vermeulen, Roel and Nøst, Therese Haugdahl and Urbarova, Ilona and Zamora-Ros, Raul and Rodriguez-Barranco, Miguel and Amiano, Pilar and Huerta, José Maria and Ardanaz, Eva and Melander, Olle and Ottoson, Filip and Vidman, Linda and Rentoft, Matilda and Schmidt, Julie A. and Travis, Ruth C. and Weiderpass, Elisabete and Johansson, Mattias and Dossus, Laure and Jenab, Mazda and Gunter, Marc J. and Bermejo, Justo Lorenzo and Scherer, Dominique and Salek, Reza M. and Keski-Rahkonen, Pekka and Ferrari, Pietro}},
  issn         = {{2218-1989}},
  keywords     = {{Cancer epidemiology; Metabolites; Metabolomics; Normalization; Pooling; Technical variability}},
  language     = {{eng}},
  number       = {{9}},
  publisher    = {{MDPI AG}},
  journal      = {{Metabolites}},
  title        = {{A new pipeline for the normalization and pooling of metabolomics data}},
  url          = {{http://dx.doi.org/10.3390/metabo11090631}},
  doi          = {{10.3390/metabo11090631}},
  volume       = {{11}},
  year         = {{2021}},
}
