LUP Student Papers

LUND UNIVERSITY LIBRARIES

Goodness-of-fit Tests for Time Dependent Ensemble Averages

Lumsden, Jake LU (2022) FYTM04 20221
Computational Biology and Biological Physics - Undergoing reorganization
Department of Astronomy and Theoretical Physics - Undergoing reorganization
Abstract
Fitting a model to a time-dependent ensemble average is a process repeated frequently
throughout biophysics. A selected ensemble-averaged observable (⟨y(t)⟩) for a given system
can be predicted through the use of an estimated ensemble average, where the estimated
ensemble average is created via simulated or experimental data sets. Fitting a model to
this estimated ensemble average allows for estimations of ⟨y(t)⟩.

Often, one tests the quality of a fitted model through the use of a ‘goodness-of-fit’ (GOF)
procedure. The quality of the model is determined by the placement of a test statistic
(S) on its associated probability distribution (φ(S)). Traditional choices of S, such as
the normalised residual sum of squares (RSS), neglect correlations in fluctuations of the
ensemble average around the fitted model. Under this assumption, the normalised RSS
is distributed according to the χ²-distribution (φ_χ²(S)), with mean (μ) and variance (σ²)
proportional to the number of degrees of freedom (υ) of the fitted model. The inability of the traditional
χ²-GOF procedure to account for these correlations can lead to less reliable evaluations of
the quality of a fitted model.
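
As a minimal sketch of why neglected correlations matter, the following Python snippet samples the sum of squared normalised residuals S for independent and for correlated residuals. The AR(1) correlation model, the coefficient 0.8, the number of time points and all function names are illustrative assumptions, not taken from the thesis. No parameters are fitted, so the number of degrees of freedom equals the number of time points N, and the χ²-distribution predicts mean N and variance 2N.

```python
import random
import statistics


def residuals(n_points, rho, rng):
    """Return n_points unit-variance Gaussian residuals.

    rho = 0 gives independent residuals; rho > 0 gives AR(1)-correlated
    residuals (an illustrative correlation model, not taken from the thesis).
    """
    values = [rng.gauss(0.0, 1.0)]
    scale = (1.0 - rho * rho) ** 0.5  # keeps the marginal variance at 1
    for _ in range(n_points - 1):
        values.append(rho * values[-1] + scale * rng.gauss(0.0, 1.0))
    return values


def rss_moments(n_points, rho, n_repeats=5000, seed=1):
    """Sample mean and variance of S = sum of squared normalised residuals."""
    rng = random.Random(seed)
    samples = [
        sum(r * r for r in residuals(n_points, rho, rng))
        for _ in range(n_repeats)
    ]
    return statistics.fmean(samples), statistics.variance(samples)


N = 20  # time points; nothing is fitted here, so the degrees of freedom equal N
mean_ind, var_ind = rss_moments(N, rho=0.0)
mean_cor, var_cor = rss_moments(N, rho=0.8)

print(f"independent: mean={mean_ind:.1f}, variance={var_ind:.1f} (chi-squared: {N}, {2 * N})")
print(f"correlated:  mean={mean_cor:.1f}, variance={var_cor:.1f}")
```

In this toy setting the mean of S stays close to N in both cases, while the variance for correlated residuals is considerably larger than 2N, so a test that places S on the χ²-distribution would misjudge how unusual a given value of S is.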

The thesis covers the derivation and validation of the correct form of φ(S) when correlations
are considered, for use in a new GOF procedure. The new GOF procedure was tested
under varying parameters, correlation types and ensemble make-ups. Testing environments
included three ensemble-generating prototype models and three movies of noisified
simulations of vesicle movement. It is demonstrated that the new GOF procedure correctly
accepts well-fitting models and rejects poorly fitting ones, and is a valid indicator of
model quality. Furthermore, it is shown that, compared to the traditional χ²-GOF procedure,
the new GOF procedure is a more accurate measure of model quality under a variety
of correlation types, is reliable in a greater region of parameter space, and performs better
in all tested scenarios.
Popular Abstract
Studies of micro and nanoscopic particles and molecules have been of huge benefit to the
biophysics community. The ability to conduct experiments at the microscopic level has
generated new knowledge about once unseen biological processes from virus incubation to
the life cycle of bacteria.

At the microscopic level, the recorded trajectories of a given system of particles are often
grouped into what is referred to as a time dependent ensemble average. A time dependent
ensemble average describes a given system of particles as a single stream of time dependent
data: a chosen metric, averaged over the particles at each point in time. Time dependent
ensemble averages allow for easier and more reliable interpretation of a system of particles,
and for the extraction of certain system dependent parameters, such as how, and at what
rate, certain particles move under a set of predefined conditions.

Fitting a model to the time dependent ensemble average allows for predictions further in
time to be made, and further parameters to be extracted, such as the rate of diffusion.
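
As a minimal sketch of this fitting step, the Python snippet below builds an estimated ensemble average from simulated data and fits a model to it. It assumes free one-dimensional Brownian motion, for which the ensemble-averaged squared displacement is 2Dt with diffusion coefficient D. The model, the parameter values and the function names are illustrative assumptions, not taken from the thesis.

```python
import random


def simulate_msd(n_traj, n_steps, diff_coeff, dt, seed=0):
    """Estimated ensemble-averaged squared displacement at times dt, 2*dt, ..."""
    rng = random.Random(seed)
    step_sd = (2.0 * diff_coeff * dt) ** 0.5  # 1D Brownian step size
    sums = [0.0] * n_steps
    for _ in range(n_traj):
        x = 0.0
        for k in range(n_steps):
            x += rng.gauss(0.0, step_sd)
            sums[k] += x * x  # accumulate squared displacement at step k
    return [s / n_traj for s in sums]


def fit_diffusion(times, msd):
    """Least-squares fit of msd = 2*D*t (line through the origin); returns D."""
    slope = sum(t * y for t, y in zip(times, msd)) / sum(t * t for t in times)
    return slope / 2.0


D_TRUE, DT, N_STEPS = 0.5, 1.0, 50  # hypothetical values for illustration
times = [DT * (k + 1) for k in range(N_STEPS)]
msd = simulate_msd(n_traj=2000, n_steps=N_STEPS, diff_coeff=D_TRUE, dt=DT)
d_est = fit_diffusion(times, msd)
print(f"true D = {D_TRUE}, fitted D = {d_est:.3f}")
```

Because every time point of the estimated ensemble average is computed from the same set of trajectories, its fluctuations around the fitted line at different times are correlated.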

In practice, one often estimates a chosen ensemble-averaged observable (⟨y(t)⟩) through
the use of an estimated ensemble average made up of simulated or experimental data sets
containing ⟨y(t)⟩. Fitting a model to this estimated ensemble average then allows for the
extraction of a prediction of ⟨y(t)⟩.

For both predictions and extracted parameters to be accurate, one must make sure that the
model is well fitting. Goodness-of-fit (GOF) procedures are used frequently throughout
various scientific communities to ensure that models are well fitting to their ensemble
counterparts. Traditional GOF procedures, such as the χ2-GOF procedure, neglect any
correlation among the fluctuations of the ensemble average around the fitted model, leading
to less reliable determination of a fitted model’s quality.

This thesis fills this gap in traditional GOF procedures by developing a new GOF procedure
which includes the correlations in fluctuations of an ensemble average around a fitted
model. The new GOF procedure will allow scientists within the biophysics community to
make more reliable predictions and extract more accurate parameters from a given time
dependent ensemble average. The hope is that researchers will adopt this new
GOF procedure to further the knowledge of intricate biological particles and processes, for
example, virus structure and transmission, or the diffusion of molecules through the lipid
membrane.
author: Lumsden, Jake LU
course: FYTM04 20221
year: 2022
type: H2 - Master's Degree (Two Years)
language: English
id: 9092999
date added to LUP: 2022-06-28 09:04:38
date last changed: 2022-06-29 14:02:58
@misc{9092999,
  abstract     = {{Fitting a model to a time-dependent ensemble average is a process repeated frequently
throughout biophysics. A selected ensemble-averaged observable (⟨y(t)⟩) for a given system
can be predicted through the use of an estimated ensemble average, where the estimated
ensemble average is created via simulated or experimental data sets. Fitting a model to
this estimated ensemble average allows for estimations of ⟨y(t)⟩.

Often, one tests the quality of a fitted model through the use of a ’goodness-of-fit’ (GOF)
procedure. The quality of the model is determined by the placement of a test statistic
(S) on its associated probability distribution (φ(S)). Traditional choices of S, such as
the normalised residual sum of squares (RSS), neglect correlations in fluctuations of the
ensemble average around the fitted model. Under this assumption, the normalised RSS
is distributed according to the χ2-distribution (φχ2(S)), with mean (μ) and variance (σ2)
proportional to the degree of freedom (υ) of the fitted model. The inability of the traditional
χ2-GOF procedure to account for these correlations can lead to less reliable evaluations of
the quality of a fitted model.

The thesis covers the derivation and validation of the correct form of φ(S) when correlations
are considered, for use in a new GOF procedure. The new GOF procedure was tested
under varying parameters, correlation types and ensemble make-ups. Testing environments
included three ensemble generating prototype models, and three movies of noisified
simulations of vesicle movement. It is demonstrated that the new GOF procedure correctly
accepts and rejects well and poor fitting models respectively, and is a valid indicator of
model quality. Furthermore, it is shown that compared to the traditional χ2-GOF procedure,
the new GOF procedure is a more accurate measure of model quality under a variety
of correlation types, is reliable in a greater region of parameter space, and performs better
in all tested scenarios.}},
  author       = {{Lumsden, Jake}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Goodness-of-fit Tests for Time Dependent Ensemble Averages}},
  year         = {{2022}},
}