Deep Autoencoders for Compression in High Energy Physics

Wulff, Eric LU (2020) PHYM01 20192
Particle and nuclear physics
Department of Physics
Abstract
Current technological limitations make it impossible to store the enormous amount of data produced from proton-proton collisions by the ATLAS detector at CERN's Large Hadron Collider. Therefore, specialised hardware and software are used to decide which data, or which proton-proton collision events, to save and which to discard. A reduction in the storage size of each collision event is desirable, as it would allow more data to be saved and thereby make a larger set of physics analyses possible.

The focus of this thesis is to understand whether it is possible to reduce the storage size of these collision events using machine learning techniques for dimensionality reduction. This has never before been tried within the ATLAS experiment and is an interesting forward-looking study for future experiments. Specifically, autoencoder neural networks (AEs) are used to compress a number of variables into a smaller latent space that is used for storage. Different neural network architectures of varying width and depth are explored.

The AEs are trained and validated on experimental data from the ATLAS detector, and their performance is tested on an independent signal Monte Carlo sample. The AEs are shown to successfully compress and decompress simple hadron jet data, and preliminary results indicate that the reconstruction quality is good enough for certain applications where high precision is not paramount. The AEs are evaluated by their reconstruction error, the relative error of each compressed variable, and their ability to retain good resolution of a dijet mass signal (from the aforementioned Monte Carlo sample) after encoding and decoding hadron jet data.
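
To make the approach concrete, the sketch below shows what such an autoencoder could look like in PyTorch, together with the reconstruction-error training objective and the per-variable relative error mentioned above. The framework choice, layer widths, activation function, four input variables and three latent dimensions are illustrative assumptions, not the actual configuration used in the thesis.

# Minimal sketch of a fully connected autoencoder for compressing jet
# variables, in the spirit of the thesis. All dimensions, layer widths
# and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

class JetAutoencoder(nn.Module):
    def __init__(self, n_vars=4, latent_dim=3):
        super().__init__()
        # Encoder: compress n_vars jet variables into latent_dim numbers,
        # which is what would actually be written to storage.
        self.encoder = nn.Sequential(
            nn.Linear(n_vars, 200), nn.Tanh(),
            nn.Linear(200, 100), nn.Tanh(),
            nn.Linear(100, latent_dim),
        )
        # Decoder: reconstruct the original variables from the latent space.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 100), nn.Tanh(),
            nn.Linear(100, 200), nn.Tanh(),
            nn.Linear(200, n_vars),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = JetAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # reconstruction error

jets = torch.randn(1024, 4)  # stand-in for normalised jet data
for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(jets), jets)  # distance between reconstruction and original
    loss.backward()
    optimizer.step()

# Per-variable relative error, one of the evaluation measures mentioned
# in the abstract (well defined when the original values are not close
# to zero, as for typical normalised jet variables).
with torch.no_grad():
    rel_err = (model(jets) - jets) / jets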
Popular Abstract
Artificial intelligence could help scientists in their search for new fundamental particles

A neural network trained to find patterns in experimental data from the world's largest particle accelerator, the LHC, might be able to help scientists in their search for new fundamental particles. In a preliminary study, it has been able to reduce the storage needed to save the enormous amounts of data produced by particle detectors such as ATLAS, located at CERN in Geneva, Switzerland.

Located at the European Organisation for Nuclear Research, CERN, in Geneva, Switzerland, is the world's largest particle accelerator, the Large Hadron Collider (LHC). Most of the time it is used to accelerate protons to speeds of up to 99.999999 % of the speed of light. That's just about 10 km/h slower than light itself.

These protons are then smashed together at special collision points around the LHC. The proton-proton collisions transform the collision energy into a myriad of new particles. Giant detectors are then used to record information such as the mass, energy and paths of these new particles. This recorded information is what makes up the experimental data.

A special kind of neural network called an autoencoder was allowed to study some of the experimental data collected by ATLAS, the largest particle detector at CERN. The data used for this study consist of so-called jets, which are basically groups of particles that all travel in roughly the same direction and are a product of the highly energetic proton-proton collisions at the LHC. The task given to the neural network was to find patterns and correlations in the data that could be exploited to compress it. If, for instance, the network discovered that two variables are related by a multiplicative factor, it could save just one of the variables and remember that factor. Then, when the second variable is needed, it can easily be computed from the first. In reality, of course, the relationships between jet variables can be much more complex; otherwise we wouldn't need to ask artificial intelligence to give us a hand.
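
As a toy illustration of this idea (with hypothetical variable names and a made-up factor, purely for the example): if one quantity is always a fixed multiple of another, it is enough to store one column of numbers plus a single factor.

import numpy as np

# Toy example of compression by exploiting a correlation. Suppose a
# (hypothetical) variable energy_b is always 2.5 times energy_a; then
# storing energy_a and the single factor 2.5 is enough.
factor = 2.5
energy_a = np.array([10.0, 42.0, 7.3])
energy_b = factor * energy_a  # fully determined by energy_a

# "Compressed" representation: one column plus one number.
stored = {"energy_a": energy_a, "factor": factor}

# Decompression: recompute energy_b on demand.
energy_b_reconstructed = stored["factor"] * stored["energy_a"]
assert np.allclose(energy_b, energy_b_reconstructed)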

Although this may sound great, there are still drawbacks. The network is never able to reconstruct the original data perfectly once it has been compressed. It can get very close, but it will never reach perfection. In some applications, however, this doesn't have to be a problem. Certain analyses that scientists at CERN want to perform do not require absolute precision but would rather benefit from having more experimental data. For such analyses, this neural network could be a big help and may contribute to discoveries of new physics in the future.
author: Wulff, Eric LU
course: PHYM01 20192
year: 2020
type: H2 - Master's Degree (Two Years)
keywords: ATLAS, hadron jet, machine learning, autoencoder, dimensionality reduction, artificial intelligence, high energy physics, data compression
language: English
id: 9004751
date added to LUP: 2020-02-19 13:24:49
date last changed: 2020-02-19 13:24:49
@misc{9004751,
  abstract     = {{Current technological limitations make it impossible to store the enormous amount of data produced from proton-proton collisions by the ATLAS detector at CERN's Large Hadron Collider. Therefore, specialised hardware and software are used to decide which data, or which proton-proton \textit{collision events}, to save and which to discard. A reduction in the storage size of each collision event is desirable, as it would allow more data to be saved and thereby make a larger set of physics analyses possible.

The focus of this thesis is to understand whether it is possible to reduce the storage size of previously mentioned collision events using machine learning techniques for dimensionality reduction. This has never before been tried within the ATLAS experiment and is an interesting forward-looking study for future experiments. Specifically, autoencoder neural networks are used to compress a number of variables into a smaller latent space used for storage. Different neural network architectures with varying width and depth are explored.

The AEs are trained and validated on experimental data from the ATLAS detector and their performance is tested on an independent signal Monte-Carlo sample. The AEs are shown to successfully compress and decompress simple hadron jet data and preliminary results indicate that the reconstruction quality is good enough for certain applications where high precision is not paramount. The AEs are evaluated by their reconstruction error, the relative error of each compressed variable and the ability to retain good resolution of a dijet mass signal (from the previously mentioned Monte-Carlo sample) after encoding and decoding hadron jet data.}},
  author       = {{Wulff, Eric}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Deep Autoencoders for Compression in High Energy Physics}},
  year         = {{2020}},
}