
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Improvement of the Rucio implementation for the LDCS platform and search for dark data

Yartsev, Piotr LU (2022) FYSK02 20211
Particle and nuclear physics
Department of Physics
Abstract
In this work we aim to implement a software package to detect and categorize dark data (data not accessible to or not known by the user) generated in the simulations of the Light Dark Matter eXperiment (LDMX). This will involve studying existing solutions for such problems, attempting to implement them for the Lightweight Distributed Computing System (LDCS), and developing our own Dark Data Search (DDS) toolkit to perform the detection and categorization of the dark data. The results provided by these tools will be examined further for clues as to why and how the dark data was created. Physics simulations of the LDMX detector were executed to create dark data, allowing us to study the conditions for its creation and to get a deeper understanding of the physics of a missing-momentum dark matter experiment. Based on the research done in this paper, a multitude of systematic problems was found that would require addressing for the LDCS.
Popular Abstract
Dark data from dark matter, Piotr Yartsev
One of the great unsolved mysteries of how our world works is dark matter. Everything we see and everything we touch, real matter, accounts for only one-sixth of the matter in the universe. The rest we call dark matter. Although some theories exist, so far we do not know what dark matter is. Studying it could give us a deeper fundamental understanding of how our universe works. An experiment designed to study dark matter is LDMX, which is planned to be constructed in San Francisco. Before the experiment is actually turned on, digital infrastructure has to be designed to handle an estimated 10 petabytes, that is 10 000 terabytes, of data generated by the experiment. To put that number in perspective, that is enough storage to hold 5 000 000 hours of HD video, enough to keep you occupied 24 hours a day for almost 600 years. Everyone knows how tedious it is to spend time looking for that one document you can't remember where you saved, so you can imagine how difficult it is to keep track of that much data stored at multiple locations all over the world. To solve this problem, the LDMX experiment chose to use Rucio as a data catalog. Rucio is software developed at the particle physics research center CERN in Switzerland to keep track of which file is located where. Currently, the software does not work perfectly for LDMX, because sometimes the location of a data file registered in Rucio does not match the file's actual location. This type of data is called dark data, and it takes up valuable storage space. My bachelor’s project aims to solve this issue by creating a program that compares the data files in storage with the Rucio catalog and finds all the inconsistencies.
Data files not registered correctly in Rucio would be reported to the data storage center, which would then have to make a decision: update the Rucio catalog with the actual location of the data file, or simply delete the file. The development team working on the Rucio software has already encountered this problem and come up with a solution, but we need to see whether it is applicable to LDCS. While research like this won't result in the newest iPhone, that doesn’t mean it is of no use to people, since it addresses larger-scope questions about the universe. These types of experiments often require the invention of new technology to make them possible, so working on experiment infrastructure could result in new and useful technological discoveries. An example of such an invention is the World Wide Web, something life is now hard to imagine without, which was created by Tim Berners-Lee at a particle physics laboratory, CERN.
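The comparison described above, checking the files found in storage against the entries in the catalog, can be illustrated with a small sketch. This is not the DDS toolkit from the thesis and it does not call the Rucio API; it only assumes that a list of file paths registered in the catalog and a list of file paths found on storage are already available. The names `ConsistencyReport` and `compare_catalog_with_storage`, and the example paths, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsistencyReport:
    """Result of comparing a catalog dump with a storage listing (hypothetical type)."""
    dark: frozenset        # present on storage, but not registered in the catalog
    missing: frozenset     # registered in the catalog, but not found on storage
    consistent: frozenset  # present in both


def compare_catalog_with_storage(catalog_paths, storage_paths):
    """Classify file paths by set difference.

    Assumption: both inputs are iterables of path strings written in the
    same form, so that equal files compare equal as strings.
    """
    catalog = frozenset(catalog_paths)
    storage = frozenset(storage_paths)
    return ConsistencyReport(
        dark=storage - catalog,
        missing=catalog - storage,
        consistent=catalog & storage,
    )


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    catalog_dump = ["/ldcs/run1/a.root", "/ldcs/run1/b.root"]
    storage_listing = ["/ldcs/run1/b.root", "/ldcs/run1/c.root"]

    report = compare_catalog_with_storage(catalog_dump, storage_listing)
    print("dark:", sorted(report.dark))
    print("missing:", sorted(report.missing))
    print("consistent:", sorted(report.consistent))
```

Files in the `dark` set are the candidates that would be reported to the storage center, which then decides whether to register them in the catalog or delete them. A real check would also have to handle files that are being written or registered while the two lists are collected, which this sketch ignores.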
author
Yartsev, Piotr LU
supervisor
organization
course
FYSK02 20211
year
type
M2 - Bachelor Degree
subject
keywords
Dark matter, rucio, digital infrastructure, Root, LDMX, LDCS, dark data, Python, CERN, particle physics, big data, Geant4, Light Dark Matter eXperiment, Lightweight Distributed Computing System, data storage
language
English
id
9091190
date added to LUP
2022-08-15 14:55:50
date last changed
2022-08-15 14:55:50
@misc{9091190,
  abstract     = {{In this work we aim to implement a software package to detect and categorize dark data, data not accessible or not known by the user, generated in the simulations of the Light Dark Matter eXperiment (LDMX). This will involve studying current existing solutions for such problems, attempting to implement them for the Lightweight Distributed Computing System (LDCS), and developing our own Dark Data Search (DDS)
toolkit to perform the detection and categorization of the dark data. The result provided by these tools will be examined further for clues as to why and how dark data was created. Physics simulations of the LDMX detector were executed to create dark data, allowing us to study the conditions for their creation, and to get a deeper understanding of the physics for a missing momentum dark matter experiment. Based on the research done in this paper a multitude of systematic problems was found that would require addressing for the LDCS.}},
  author       = {{Yartsev, Piotr}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Improvement of the Rucio implementation for the LDCS platform and search for dark data}},
  year         = {{2022}},
}