Deep Autoencoders for Compression in High Energy Physics

Wulff, Eric LU (2020) PHYM01 20192
Particle and nuclear physics
Department of Physics
Abstract
Current technological limitations make it impossible to store the enormous amount of data produced from proton-proton collisions by the ATLAS detector at CERN's Large Hadron Collider. Therefore, specialised hardware and software are used to decide which data, or which proton-proton collision events, to save and which to discard. A reduction in the storage size of each collision event is desirable, as it would allow more data to be saved and thereby make a larger set of physics analyses possible.

The focus of this thesis is to understand whether it is possible to reduce the storage size of these collision events using machine learning techniques for dimensionality reduction. This has never before been tried within the ATLAS experiment and is an interesting forward-looking study for future experiments. Specifically, autoencoder neural networks (AEs) are used to compress a number of variables into a smaller latent space that is used for storage. Different neural network architectures of varying width and depth are explored.

The AEs are trained and validated on experimental data from the ATLAS detector, and their performance is tested on an independent signal Monte Carlo sample. The AEs are shown to successfully compress and decompress simple hadron jet data, and preliminary results indicate that the reconstruction quality is good enough for certain applications where high precision is not paramount. The AEs are evaluated by their reconstruction error, the relative error of each compressed variable, and their ability to retain good resolution of a dijet mass signal (from the aforementioned Monte Carlo sample) after encoding and decoding hadron jet data.
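
To make the approach concrete, the sketch below shows what such an autoencoder could look like in PyTorch, together with the reconstruction-error training objective and the per-variable relative error mentioned above. The framework choice, layer widths, activation function, four input variables and three latent dimensions are illustrative assumptions, not the actual configuration used in the thesis.

# Minimal sketch of a fully connected autoencoder for compressing jet
# variables, in the spirit of the thesis. All dimensions, layer widths
# and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

class JetAutoencoder(nn.Module):
    def __init__(self, n_vars=4, latent_dim=3):
        super().__init__()
        # Encoder: compress n_vars jet variables into latent_dim numbers,
        # which is what would actually be written to storage.
        self.encoder = nn.Sequential(
            nn.Linear(n_vars, 200), nn.Tanh(),
            nn.Linear(200, 100), nn.Tanh(),
            nn.Linear(100, latent_dim),
        )
        # Decoder: reconstruct the original variables from the latent space.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 100), nn.Tanh(),
            nn.Linear(100, 200), nn.Tanh(),
            nn.Linear(200, n_vars),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = JetAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # reconstruction error

jets = torch.randn(1024, 4)  # stand-in for normalised jet data
for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(jets), jets)  # distance between reconstruction and original
    loss.backward()
    optimizer.step()

# Per-variable relative error, one of the evaluation measures mentioned
# in the abstract (well defined when the original values are not close
# to zero, as for typical normalised jet variables).
with torch.no_grad():
    rel_err = (model(jets) - jets) / jets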
Popular Abstract
Artificial intelligence could help scientists in their search for new fundamental particles

A neural network trained to find patterns in experimental data from the world's largest particle accelerator, the LHC, might be able to help scientists in their search for new fundamental particles. In a preliminary study, it has been able to reduce the storage needed to save the enormous amounts of data produced by particle detectors such as ATLAS, located at CERN in Geneva, Switzerland.

Located at the European Organisation for Nuclear Research, CERN, in Geneva, Switzerland, is the world's largest particle accelerator, the Large Hadron Collider (LHC). Most of the time it is used to accelerate protons to speeds of up to 99.999999 % of the speed of light. That's just about 10 km/h slower than light itself.

These protons are then smashed together at special collision points around the LHC. The proton-proton collisions transform the collision energy into a myriad of new particles. Giant detectors are then used to record information such as the mass, energy and paths of these new particles. This recorded information is what makes up the experimental data.

A special kind of neural network called an autoencoder was allowed to study some of the experimental data collected by ATLAS, the largest particle detector at CERN. The data used for this study consist of so-called jets, which are basically groups of particles that all travel in roughly the same direction and are a product of the highly energetic proton-proton collisions at the LHC. The task given to the neural network was to find patterns and correlations in the data that could be exploited to compress it. If, for instance, the network discovered that two variables are related by a multiplicative factor, it could save just one of the variables and remember that factor. Then, when the second variable is needed, it can easily be computed from the first. In reality, of course, the relationships between jet variables can be much more complex; otherwise we wouldn't need to ask artificial intelligence to give us a hand.
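
As a toy illustration of this idea (with hypothetical variable names and a made-up factor, purely for the example): if one quantity is always a fixed multiple of another, it is enough to store one column of numbers plus a single factor.

import numpy as np

# Toy example of compression by exploiting a correlation. Suppose a
# (hypothetical) variable energy_b is always 2.5 times energy_a; then
# storing energy_a and the single factor 2.5 is enough.
factor = 2.5
energy_a = np.array([10.0, 42.0, 7.3])
energy_b = factor * energy_a  # fully determined by energy_a

# "Compressed" representation: one column plus one number.
stored = {"energy_a": energy_a, "factor": factor}

# Decompression: recompute energy_b on demand.
energy_b_reconstructed = stored["factor"] * stored["energy_a"]
assert np.allclose(energy_b, energy_b_reconstructed)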

Although this may sound great, there are still drawbacks. The network is never able to reconstruct the original data perfectly once it has been compressed. It can get very close, but it will never reach perfection. In some applications, however, this doesn't have to be a problem. Certain analyses that scientists at CERN want to perform do not require absolute precision but would rather benefit from having more experimental data. For such analyses, this neural network could be a big help and may contribute to discoveries of new physics in the future.
author: Wulff, Eric LU
course: PHYM01 20192
year: 2020
type: H2 - Master's Degree (Two Years)
keywords: ATLAS, hadron jet, machine learning, autoencoder, dimensionality reduction, artificial intelligence, high energy physics, data compression
language: English
id: 9004751
date added to LUP: 2020-02-19 13:24:49
date last changed: 2020-02-19 13:24:49
@misc{9004751,
  abstract     = {{Current technological limitations make it impossible to store the enormous amount of data produced from proton-proton collisions by the ATLAS detector at CERN's Large Hadron Collider. Therefore, specialised hardware and software are used to decide which data, or which proton-proton \textit{collision events}, to save and which to discard. A reduction in the storage size of each collision event is desirable, as it would allow more data to be saved and thereby make a larger set of physics analyses possible.

The focus of this thesis is to understand whether it is possible to reduce the storage size of previously mentioned collision events using machine learning techniques for dimensionality reduction. This has never before been tried within the ATLAS experiment and is an interesting forward-looking study for future experiments. Specifically, autoencoder neural networks are used to compress a number of variables into a smaller latent space used for storage. Different neural network architectures with varying width and depth are explored.

The AEs are trained and validated on experimental data from the ATLAS detector and their performance is tested on an independent signal Monte-Carlo sample. The AEs are shown to successfully compress and decompress simple hadron jet data and preliminary results indicate that the reconstruction quality is good enough for certain applications where high precision is not paramount. The AEs are evaluated by their reconstruction error, the relative error of each compressed variable and the ability to retain good resolution of a dijet mass signal (from the previously mentioned Monte-Carlo sample) after encoding and decoding hadron jet data.}},
  author       = {{Wulff, Eric}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Deep Autoencoders for Compression in High Energy Physics}},
  year         = {{2020}},
}