
Autoencoder Compression in High Energy Physics

Åstrand, Sten LU (2022) PHYM01 20212
Particle and nuclear physics
Department of Physics
Faculty of Engineering, LTH
Abstract
Situated in Geneva, Switzerland, the Large Hadron Collider is the largest particle accelerator in the world, and as such, its operation carries with it some of the greatest technical challenges ever faced. Among them are the huge demands that particle physics experiments place on data storage capacity, both in terms of the rate and the volume of data. Several systems are employed to manage and reduce the flow of data generated at the collider experiment stations. This comes at the cost of a reduced amount of material available for study.

This thesis analyses a relatively novel method of compressing – and thereby reducing the storage requirements of – data describing jets: showers of particles created in collisions between protons in the ATLAS experiment at the Large Hadron Collider. The main tool used for this compression is an artificial neural network of a type called an autoencoder. Such compression has previously been shown to be possible on single jets. As a continuation of that work, this thesis investigates whether groups of jets can be compressed with better results than when each jet is compressed individually.
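The abstract does not reproduce the network itself. Purely as an illustration, a minimal autoencoder of the kind described might look like the following sketch (PyTorch); the four input variables and the three-dimensional latent space are assumptions borrowed from earlier single-jet studies of this kind, not details taken from this page.

    import torch
    import torch.nn as nn

    class JetAutoencoder(nn.Module):
        """Toy jet autoencoder: squeeze each jet's variables into a
        smaller latent vector (compression) and reconstruct them
        (decompression). Sizes are illustrative assumptions."""

        def __init__(self, n_features=4, latent_dim=3):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(n_features, 200), nn.Tanh(),
                nn.Linear(200, latent_dim),
            )
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, 200), nn.Tanh(),
                nn.Linear(200, n_features),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    model = JetAutoencoder()
    jets = torch.randn(64, 4)          # stand-in for a batch of normalised jets
    loss = nn.functional.mse_loss(model(jets), jets)  # reconstruction error
    loss.backward()                    # gradients for one training step

Only the latent vectors (here 3 numbers per jet instead of 4) would be stored, giving a 4:3 compression ratio at the cost of the reconstruction error the network is trained to minimise.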

To that end, several autoencoder models are trained on jet groups of different configurations. These autoencoders are shown to be able to replicate the results of previous, single-jet studies, but the errors introduced during compression increase when jets are compressed in a group. This holds true both for jets from the same proton-proton collision and for jets randomly selected from a larger dataset. It is demonstrated that groups specifically made to contain jets with almost identical values of one variable can be compressed at a higher ratio than individual jets, with only slightly increased errors. However, this process requires access to a large dataset, a requirement that cannot be met in a particle physics experiment, where data is gathered detection by detection.
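The abstract does not specify how these similar-value groups are formed. One plausible reading, consistent with the sorting described in the popular abstract below, is to sort the dataset on a single jet variable and slice off consecutive runs; the sketch below (NumPy) is a hypothetical illustration of that idea, with the function name, sorted column, and group size all chosen for the example.

    import numpy as np

    def make_similar_groups(jets, sort_col=0, group_size=4):
        """Sort jets on one variable so that neighbours are nearly
        identical in it, then cut the sorted array into fixed-size
        groups, one row per group, ready for an autoencoder that
        compresses whole groups at once."""
        order = np.argsort(jets[:, sort_col])
        sorted_jets = jets[order]
        n_groups = len(jets) // group_size        # drop any remainder
        usable = sorted_jets[: n_groups * group_size]
        return usable.reshape(n_groups, group_size * jets.shape[1])

    jets = np.random.rand(10_000, 4)    # toy stand-in for real jet data
    groups = make_similar_groups(jets)  # shape (2500, 16)

Because the sort needs the whole dataset up front, this preprocessing is only available offline, which is exactly the limitation the abstract notes.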
Popular Abstract
Artificial intelligence on the path to understanding the building blocks of the universe

The use of artificially intelligent algorithms has recently emerged as a way to aid particle physics research. Scientists in this field, studying the smallest building blocks of the universe, have long faced the challenges of “Big Data”. Rare, exotic particles like the Higgs boson – finally discovered in 2012 after being theorised to exist for 50 years – do not appear alone. Instead, they are created together with billions of other particles, and the total amount of information – or data – coming out of an experiment is huge. The hope is that the use of AI to find patterns in this data will make it easier to manage, increasing the chances of future physics discoveries.

A particle physicist’s dream is to be able to isolate interesting particles, weighing and measuring them at leisure. In reality, this will never be possible, not least because exotic particles exist for only a very brief time before decaying into other particles, which in turn decay themselves. What flows out of an experimental chamber is therefore showers of particles, called jets, travelling in every direction and caught by the particle detectors surrounding the experiment. From the data collected about these jets, a picture of the original, exotic particle is pieced together. This is made harder by the fact that during each second of an experiment, a large number of jets are created, some interesting, most of them not. Taking in the information of all of those jets would be like watching millions of Netflix movies simultaneously – a rate of information that the detector systems simply cannot handle.

Several sophisticated systems are used to filter the flow of detector data, at the cost of reducing the amount of information available for study. Recently, a new method of reducing the size of this information has been tested, as an alternative or complement to filtering it. A type of artificially intelligent neural network called an autoencoder was trained to identify patterns within the data that describes a jet. Through Einstein’s famous E = mc², it is well known that the energy and mass of an object are related, and artificial intelligence is believed to be able to identify other such relationships or patterns. Through such a process, it has been shown that an autoencoder can compress the information describing a single jet at the cost of introducing relatively small inaccuracies.

This thesis project builds upon this use of autoencoders for data compression. The basic idea was that grouping similar jets and compressing their information together, as opposed to treating them individually as in previous work, could allow compression with even smaller inaccuracies. This was found to be possible if the dataset describing the jets was preprocessed so that similarities between different jets became more evident, for example by sorting the dataset. However, sorting requires access to a large dataset and is time-consuming. It is therefore likely not applicable in a particle physics experiment, where each jet has to be detected and processed very rapidly so that the system is ready for the next incoming jet. Instead, the method may be useful in situations where the size of a large, already-gathered collection of data needs to be reduced, for example for transmission over an internet connection. This is another way to aid particle physics research, albeit not in the manner originally proposed in the thesis.
author: Åstrand, Sten LU
alternative title: Investigation of the utilization of inter-jet correlation for collective compression
course: PHYM01 20212
year: 2022
type: H2 - Master's Degree (Two Years)
keywords: High energy physics, LHC, ATLAS, hadron jet, machine learning, autoencoder, dimensionality reduction, artificial intelligence, data compression
language: English
id: 9075881
date added to LUP: 2022-03-28 15:45:31
date last changed: 2022-03-28 15:45:31
@misc{9075881,
  abstract     = {{Situated in Geneva, Switzerland, the Large Hadron Collider is the largest particle accelerator in the world, and as such, its operation carries with it some of the greatest technical challenges ever faced. Among them are the huge demands put on data storage capacity by experiments in particle physics, both in terms of rate and volume of data. Several systems are employed to manage and reduce the flow of data generated at the collider experiment stations. This comes at the cost of a reduced amount of material available for study.

This thesis analyses a relatively novel method of compressing, and thereby reducing the storage requirements of, data describing jets - showers of particles created in collisions between protons in the ATLAS experiment at the Large Hadron Collider. The main tool used for this compression is an artificial neural network of a type called an autoencoder. Such compression has previously been shown to be possible on single jets. As a continuation of that work, this thesis investigates whether it is possible to compress groups of jets with better results than when compressing them individually.

To that end, several autoencoder models are trained on jet groups of different configurations. These autoencoders are shown to be able to replicate the results of previous, single-jet studies, but the errors introduced during compression increase when jets are compressed in a group. This holds true for jets from the same proton-proton collision as well as jets randomly selected from a larger dataset. It is demonstrated that groups specifically made to contain jets with almost identical values of one variable can be compressed at a higher ratio than individual jets, with only slightly increased errors. However, this process carries with it the requirement of access to a large dataset, which is not possible if applied in a particle physics experiment, where data is gathered detection by detection.}},
  author       = {{Åstrand, Sten}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Autoencoder Compression in High Energy Physics}},
  year         = {{2022}},
}