Predictive Modeling of Pipetting Dynamics. Multivariate Regression Analysis: PLS and ANN for Estimating Density and Volume from Pressure Recordings

Linard Pedersen, Lisa

Linard Pedersen, Lisa ^LU (2024) BMEM01 20241
Department of Biomedical Engineering

Abstract: Thermo Fisher Scientific manufactures automated pipetting instruments for diagnostic tests. These tests are sensitive to abnormalities, and changes in, e.g., volume or density could reduce precision or cause other issues in the pipetting workflow. Data collected from a pressure sensor inside the pipette could be used to automatically verify different aspects of the pipetting. Machine learning may be a powerful tool for continuously evaluating these aspects and keeping the operator notified of any changes.

This thesis investigates the feasibility of extracting useful insights from pipetting pressure recordings. The initial objective was to classify error causes, such as bubbles or foam in the pipette, and the data was collected with this in mind. This was, however, not successful, as these errors were not detectable in the pressure recordings. The thesis therefore focuses on the secondary objective: estimating pipetted volume and density from pressure sensor data.

The data was collected with the Thermo Fisher pipetting instrument Phadia 200. Three data sets were collected: D1, consisting of 4 groups of 80 observations each (water, 5% glycerol, 10% glycerol and 40% glycerol); D2, consisting of 3 groups of 50 observations each (three different human samples); and D3, consisting of 3 groups of 50 observations each (2.5% glycerol, 7.5% glycerol and 20% glycerol). Pressure recordings as well as estimated volumes were collected for each sample. A partial least squares (PLS) model and an artificial neural network (ANN) model were used for the regression problem.
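As a rough illustration of the PLS approach named above, here is a minimal sketch of single-response PLS regression (PLS1, fitted with the NIPALS algorithm) in NumPy. It rests on assumptions the abstract does not state: each pressure recording is taken to be resampled to a fixed number of time points so that it forms one row of `X`, one model is fitted per target (volume or density), and the function names `pls1_fit` and `pls1_predict` are hypothetical. The abstract does not say which implementation, preprocessing or number of components the thesis used; library implementations such as scikit-learn's `PLSRegression` also exist.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (PLS1) with the NIPALS algorithm.

    X: (n_samples, n_features) array, e.g. hypothetical resampled pressure curves.
    y: (n_samples,) array, e.g. volume or density.
    Returns (coef, x_mean, y_mean) so that y_hat = (X - x_mean) @ coef + y_mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr = X - x_mean  # centred predictors, deflated after each component
    yr = y - y_mean  # centred response, deflated after each component
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)   # unit-length weight vector
        t = Xr @ w               # scores of this component
        tt = t @ t
        p = Xr.T @ t / tt        # X loadings
        c = (yr @ t) / tt        # y loading (a scalar for PLS1)
        Xr = Xr - np.outer(t, p) # remove the explained part of X
        yr = yr - c * t          # remove the explained part of y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Map the component model back to coefficients on the original features.
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean


def pls1_predict(model, X):
    """Predict the response for new rows of X with a model from pls1_fit."""
    coef, x_mean, y_mean = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean
```

In practice the number of components would normally be chosen by cross-validation. Holding out a whole data set (for example, fitting on D1 and predicting D3) is a stricter check of generalization to unseen densities than a random split of pooled data.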

The regression results were not satisfactory, and it was concluded that the data was not well suited to the task. All models except those trained on all data sets yielded very poor R² scores, especially for the volume estimates. The best model was a PLS model with an R² of 0.96 for volume predictions and 0.54 for density predictions, and an RMSE of 0.9660 and 0.0140, respectively. However, since this model was trained on all data and did not predict any new densities, these scores say nothing about its generalizability to new, unseen data.
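The two scores reported above can be sketched as follows, assuming the usual definitions (the abstract does not define them). This plain-Python version is for illustration only; scikit-learn's `r2_score` and `mean_squared_error` compute the same quantities.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the target (e.g. volume or density)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    SS_tot is measured around the mean of y_true, so R^2 is 0 for a model that
    always predicts that mean and negative for any model that does worse.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Because R² is scaled by the spread of the targets in the evaluated set, while RMSE is not, the two scores can rank the same model quite differently when the evaluated set covers only a narrow range of values.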

The model with the best results on unseen data was a PLS model trained on the D1 data set and evaluated on the D3 data set. For density, it achieved an R² of 0.80 and an RMSE of 0.0093. For volume, however, R² was -40 and RMSE was 2.1135.
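Assuming R² is the usual coefficient of determination computed on the test set (the abstract does not define it), it can be rewritten in terms of the RMSE and the population standard deviation σ_y of the true test values. Taking the reported R² of -40 at face value, this shows how a moderate RMSE can coexist with a strongly negative R²:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
    = 1 - \frac{\mathrm{RMSE}^2}{\sigma_y^2},
\qquad
\mathrm{RMSE}^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
\qquad
\sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2

\sigma_y = \frac{\mathrm{RMSE}}{\sqrt{1 - R^2}} = \frac{2.1135}{\sqrt{41}} \approx 0.33
```

Under this assumption, the volume prediction errors for D3 were about √41 ≈ 6.4 times larger than the spread of the true D3 volumes (in the same, unstated, units as the RMSE). The predictions were therefore far worse than simply predicting the mean D3 volume.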

The data was collected with the primary objective, a classification problem, in mind. Since the data was ultimately used for a regression task, shortcomings in the experimental design were concluded to be a crucial factor affecting the results. However, it is not possible to say whether a better setup and data set would yield better results. There is a risk that the relationships between pressure, volume and density are simply not clear enough, or are too easily affected by external factors. Further investigation is therefore needed to evaluate the feasibility of the methods.
Popular Abstract (translated from Swedish): Your pipetting instrument could guide you and provide information about changes or problems in real time.

By continuously examining sensor data, machine learning algorithms can be a powerful tool for evaluating various aspects and keeping you informed about possible errors that arise during the work. This could help prevent hidden errors, such as volume or density deviations, and support whoever operates an instrument by providing information in case of problems or mishaps.

In this work, we have investigated the possibility of extracting useful insights from pressure readings during pipetting. The data was collected using Thermo Fisher's automated pipetting instrument Phadia 200. By utilizing this sensor data, the aim was to investigate whether an algorithm could provide information about things such as density, volume, or error sources, for example clogging in the pipette or foam in the sample. The error sources were investigated first, but since they were not detectable in the pressure readings, density and volume became the main areas of interest instead.

Volume and density in pipetting are of interest because diagnostic tests can be strongly affected by seemingly small errors or differences. An unknown deviation in the workflow could lead to misdiagnosis and thus pose a danger to a patient. An algorithm that continuously reports possible deviations and records aspects such as density would therefore be of great help. If something unpredictable happened during a workflow, the algorithm could give the instrument operator information about what the problem might be, which could make the work easier and more efficient.

The analysis ran into certain problems, and the data used is believed to have been a decisive factor in these setbacks. There is, however, still potential for the methods to give good results if the design and procedure of the experiment are improved. This depends on there being clear relationships between the sensor data and the pipetting aspects. For this reason, this report further discusses potential error causes and the development of the methods used.

Please use this URL to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9149493

author

Linard Pedersen, Lisa ^LU

supervisor

Frida Sandberg ^LU
Jan Ybrahim

organization

Department of Biomedical Engineering

alternative title

Prediktiv modellering av pipetteringsdynamik. Multivariat Regressionsanalys: PLS och ANN för estimering av densitet och volym från tryckdata.

course

BMEM01 20241

year

2024

type

H2 - Master's Degree (Two Years)

subject

Technology and Engineering

language

English

additional info

2024-03

id

9149493

date added to LUP

2024-03-19 11:27:09

date last changed

2024-03-19 11:27:09

@misc{9149493,
author = {{Linard Pedersen, Lisa}},
language = {{eng}},
note = {{Student Paper}},
title = {{Predictive Modeling of Pipetting Dynamics. Multivariate Regression Analysis: PLS and ANN for Estimating Density and Volume from Pressure Recordings}},
year = {{2024}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
