LUP Student Papers

LUND UNIVERSITY LIBRARIES

Efficient Discovery Of Binary Stars

Navarro Barrachina, Pablo LU (2020) In Lund Observatory Examensarbeten ASTM31 20201
Lund Observatory - Undergoing reorganization
Department of Astronomy and Theoretical Physics - Undergoing reorganization
Abstract
Purpose: even in the era of exponential increase in the amount of stellar data gathered, binaries are still often overlooked in observational data because of the special handling they require. The goal of this work is to develop a method capable of automatically and efficiently identifying and extracting double-lined spectroscopic binaries (SB2) from a spectroscopic survey, while being scalable and technically successful, and to identify and optimize the parameters that influence their detection.

Method: we combine two state-of-the-art machine learning algorithms. The first groups the spectra in the data-set into clusters based on their similarities and projects them in a human-readable manner (t-distributed Stochastic Neighbor Embedding, t-SNE). The second automatically identifies and retrieves those clusters that contain binary spectra (Density-Based Spatial Clustering of Applications with Noise, DBSCAN). These methods are then optimized for efficient recovery of binaries from a synthetic spectroscopic data-set, where we know exactly which stars are single and which are binaries.
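
The two-step method described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: the function names, the default values for `perplexity`, `eps` and `min_samples`, and the rule for selecting binary-rich clusters (a minimum fraction of known binaries per cluster) are all hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.manifold import TSNE


def embed_and_cluster(spectra, perplexity=30.0, eps=3.0, min_samples=5, seed=0):
    """Project spectra onto a 2D plane with t-SNE, then cluster the projection.

    spectra: array of shape (n_spectra, n_pixels), one normalised spectrum per row.
    The default parameter values are placeholders and would need tuning.
    """
    embedding = TSNE(
        n_components=2, perplexity=perplexity, init="pca", random_state=seed
    ).fit_transform(spectra)
    # DBSCAN labels low-density points as -1 (noise).
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(embedding)
    return embedding, labels


def binary_rich_clusters(labels, is_binary, min_fraction=0.5):
    """Return ids of clusters whose fraction of known binaries is high enough.

    Assumed selection rule for a synthetic data-set in which the true
    nature of every spectrum (is_binary) is known in advance.
    """
    selected = []
    for cluster_id in np.unique(labels):
        if cluster_id == -1:  # skip DBSCAN noise points
            continue
        members = labels == cluster_id
        if is_binary[members].mean() >= min_fraction:
            selected.append(int(cluster_id))
    return selected
```

Calling `embed_and_cluster` on an array of spectra returns the 2D coordinates and one cluster label per spectrum; `binary_rich_clusters` can then be applied to those labels together with the known single/binary flags of a synthetic sample.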

Results: we study the results following from 360 combinations of our method’s parameters and obtain a total average of recovered binaries of 57%. We show that under optimal conditions we are able to reach a recovery of 75%. We find that bluer spectral regions (450 nm - 600 nm) are better suited to identify binary stars than redder regions (600 nm - 900 nm) with our method. We also show that a moderate amount of noise can be beneficial and can improve the recovery of binary stars. Furthermore, we find that the stellar parameters that most influence the final recovery are the luminosity (or mass) ratio and the radial velocity difference between the two stellar components of the binary system, while some standard stellar parameters can play a major role as well.
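
To illustrate why the luminosity ratio and the radial velocity difference matter for an SB2, the following toy sketch builds a combined spectrum from two components that each contribute a single Gaussian absorption line. It is an assumption-laden illustration, not the spectrum synthesis used in the thesis: the line depth, line width, wavelength grid and all function names are hypothetical.

```python
import math

C_KM_S = 299_792.458  # speed of light in km/s


def line_profile(wavelength_nm, centre_nm, depth=0.6, sigma_nm=0.02):
    """Normalised flux of one Gaussian absorption line (continuum = 1)."""
    return 1.0 - depth * math.exp(-0.5 * ((wavelength_nm - centre_nm) / sigma_nm) ** 2)


def doppler_shift(rest_nm, rv_km_s):
    """Non-relativistic Doppler shift of a rest wavelength."""
    return rest_nm * (1.0 + rv_km_s / C_KM_S)


def sb2_spectrum(grid_nm, rest_nm, rv_primary, rv_secondary, lum_ratio):
    """Combined normalised spectrum of two stars.

    lum_ratio is L_secondary / L_primary; each component is weighted
    by its share of the total light.
    """
    w_primary = 1.0 / (1.0 + lum_ratio)
    w_secondary = lum_ratio / (1.0 + lum_ratio)
    c1 = doppler_shift(rest_nm, rv_primary)
    c2 = doppler_shift(rest_nm, rv_secondary)
    return [
        w_primary * line_profile(w, c1) + w_secondary * line_profile(w, c2)
        for w in grid_nm
    ]


def count_minima(flux):
    """Number of strict local minima, a crude proxy for resolved lines."""
    return sum(
        1
        for i in range(1, len(flux) - 1)
        if flux[i] < flux[i - 1] and flux[i] < flux[i + 1]
    )


if __name__ == "__main__":
    grid = [499.5 + 0.005 * i for i in range(201)]
    for delta_rv in (5.0, 100.0):
        flux = sb2_spectrum(grid, 500.0, 0.0, delta_rv, lum_ratio=0.8)
        print(delta_rv, "km/s ->", count_minima(flux), "resolved line(s)")
```

In this toy model a small velocity difference leaves the two lines blended into one feature, and a small luminosity ratio makes the secondary's line shallow, which is consistent with these two parameters driving how recognisable a binary spectrum is.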

Conclusions: we show that our method and the adopted combination of machine learning algorithms are successful at automatically detecting and retrieving binary stars from our synthetic spectroscopic data, and we provide a list of guidelines for its application to real spectroscopic surveys.
Popular Abstract
Contrary to the popular belief that most stars are single, around half of the stars we see in the galaxy are actually found in pairs called binary systems or simply "binaries". Their binary nature can be discovered or inferred in many different ways, such as through eclipses that occur when one star of the pair passes in front of the other, or through the characteristic features and behavior of their combined spectrum. Moreover, binaries play a major role in astrophysics. They offer scientists an insight into crucial stellar processes and enable accurate measurements of fundamental stellar parameters such as mass and radius through the gravitational interaction between the two components of the system.

In recent years, stellar surveys have increased exponentially in complexity and in the number of stars observed, and while binaries have been shown to be abundant, they are often missed in observational data. The reason is that, in order to reveal their true nature, binary stars require special handling beyond that given by traditional methods for the analysis of stellar data. This can, in most cases, be quite time consuming. However, new approaches for the discovery and characterization of binary stars have been made possible by advances in the field of machine learning and the increase in computational power. Machine learning is the name given to a set of algorithms and statistical tools used by computers to extract information from data by recognizing patterns without being explicitly programmed to do so. With it, it is possible not only to examine and study the large amounts of new data gathered by stellar surveys, but also to "revisit" older data-sets in order to extract insights and patterns that were overlooked in the past.

In this project we address the discovery efficiency of binary stars in an archetypical spectroscopic survey when using machine learning algorithms. By generating different types of spectra ourselves, we create a mock spectroscopic survey data-set for which the distribution of stellar parameters and the number of binary stars are known. Unlike in real surveys, by using self-generated data the nature of each star is known beforehand. This allows us to evaluate a series of machine learning algorithms with respect to their own input parameters and the ranges of stellar parameters present in the generated data. With this evaluation, we want to determine and constrain the efficiency and limits of the method in discovering and detecting binary stars.
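
Because the nature of every star in the mock survey is known, a run can be scored directly against the truth. The sketch below shows one plausible way to do so, assuming that "recovery" means the fraction of true binaries that end up in clusters flagged as binary; this definition and the function names are assumptions made for illustration, not taken from the thesis.

```python
def recovery_fraction(is_binary, labels, flagged_clusters):
    """Fraction of true binaries assigned to a cluster flagged as binary.

    is_binary: one boolean per spectrum (ground truth of the mock survey).
    labels: one cluster label per spectrum (-1 would mean unclustered).
    flagged_clusters: cluster labels selected as containing binaries.
    """
    flagged = set(flagged_clusters)
    total = sum(1 for b in is_binary if b)
    if total == 0:
        raise ValueError("the sample contains no binaries")
    recovered = sum(
        1 for b, label in zip(is_binary, labels) if b and label in flagged
    )
    return recovered / total


def mean_recovery(fractions):
    """Average recovery over several runs with different parameter choices."""
    return sum(fractions) / len(fractions)
```

Scoring each parameter combination with `recovery_fraction` and averaging with `mean_recovery` gives a single figure of the kind a parameter study could report.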

Our aim is to develop an automated method capable of maximizing the recovery and detection of binary systems from real spectroscopic data while being scalable and applicable to future surveys. To achieve this goal, we combined two well-known and readily available machine learning algorithms for the automatic analysis of spectroscopic data and the retrieval of binary stars from it. The first algorithm, t-SNE, projects the data onto a plane, grouping similar objects into clusters and separating dissimilar ones. The groups of data-points created by t-SNE, each point representing a spectrum, are then automatically recovered regardless of their morphology by the second algorithm, DBSCAN.
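
The ability of DBSCAN to recover groups regardless of their morphology follows from its density-based definition of a cluster. The following is a simplified pure-Python sketch of the general DBSCAN idea on 2D points, written for illustration only; it is not the implementation used in the thesis, and the toy data (a ring, a compact blob and one outlier) are hypothetical.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; NOISE marks low-density points."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point: keep growing
    return labels


def toy_points():
    """A ring of 40 points, a compact blob of 10 points, and one outlier."""
    ring = [
        (5 * math.cos(2 * math.pi * k / 40), 5 * math.sin(2 * math.pi * k / 40))
        for k in range(40)
    ]
    blob = [
        (0.3 * math.cos(2 * math.pi * k / 10), 0.3 * math.sin(2 * math.pi * k / 10))
        for k in range(10)
    ]
    return ring + blob + [(20.0, 20.0)]


if __name__ == "__main__":
    print(dbscan(toy_points(), eps=1.0, min_samples=3))
```

In the toy data the ring surrounds the blob, so a method that assumes compact, roughly spherical groups would struggle to separate them, whereas a density-based search follows the ring point by point.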

The combination of t-SNE and DBSCAN, whose individual implementations were carefully chosen to minimize the computation time, allowed us to obtain results that are easy to implement and understand. Results from our study are promising, showing a mean recovery of 57%, averaged over all 360 simulations we carried out. We find that the presence of moderate noise levels in the studied spectra can help improve the detection of spectroscopic binaries, because noise can smear out information in the spectra that might otherwise throw off the machine learning analysis. Furthermore, we show that bluer spectral regions (between 450 and 650 nm) are better suited than those in redder parts of the spectrum (between 650 and 900 nm) due to the increased amount of information in the form of spectral lines present in the analyzed spectroscopic data. Finally, we provide a table of stellar parameters for the binary stars in our synthetic sample that were successfully identified in more than 90% of our simulations, which can serve as a guide for future implementations of our method.
author
Navarro Barrachina, Pablo LU
supervisor
organization
course
ASTM31 20201
year
2020
type
H2 - Master's Degree (Two Years)
subject
keywords
Binary Stars, Spectroscopic binary stars, SB2, Machine Learning, Data Mining
publication/series
Lund Observatory Examensarbeten
report number
2020-EXA158
language
English
id
9012286
date added to LUP
2020-06-26 14:34:48
date last changed
2020-06-26 14:34:48
@misc{9012286,
  abstract     = {{Purpose: even in the era of exponential increase in the amount of stellar data gathered, binaries are still often overlooked in observational data because of the special handling they require. The goal of this work is to develop a method capable of automatically and efficiently identifying and extracting double-lined spectroscopic binaries (SB2) from a spectroscopic survey, while being scalable and technically successful, and to identify and optimize the parameters that influence their detection.

Method: we combine two state-of-the-art machine learning algorithms. The first groups the spectra in the data-set into clusters based on their similarities and projects them in a human-readable manner (t-distributed Stochastic Neighbor Embedding, t-SNE). The second automatically identifies and retrieves those clusters that contain binary spectra (Density-Based Spatial Clustering of Applications with Noise, DBSCAN). These methods are then optimized for efficient recovery of binaries from a synthetic spectroscopic data-set, where we know exactly which stars are single and which are binaries.

Results: we study the results following from 360 combinations of our method’s parameters and obtain a total average of recovered binaries of 57%. We show that under optimal conditions we are able to reach a recovery of 75%. We find that bluer spectral regions (450 nm - 600 nm) are better suited to identify binary stars than redder regions (600 nm - 900 nm) with our method. We also show that a moderate amount of noise can be beneficial and can improve the recovery of binary stars. Furthermore, we find that the stellar parameters that most influence the final recovery are the luminosity (or mass) ratio and the radial velocity difference between the two stellar components of the binary system, while some standard stellar parameters can play a major role as well.

Conclusions: we show that our method and the adopted combination of machine learning algorithms are successful at automatically detecting and retrieving binary stars from our synthetic spectroscopic data, and we provide a list of guidelines for its application to real spectroscopic surveys.}},
  author       = {{Navarro Barrachina, Pablo}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Lund Observatory Examensarbeten}},
  title        = {{Efficient Discovery Of Binary Stars}},
  year         = {{2020}},
}