LUP Student Papers

LUND UNIVERSITY LIBRARIES

Analysis of stellar spectra with machine learning

Segovia Otero, Alvaro LU (2020) In Lund Observatory Examensarbeten ASTM31 20201
Lund Observatory - Undergoing reorganization
Department of Astronomy and Theoretical Physics - Undergoing reorganization
Abstract
The field of Galactic Archaeology has entered its own industrial revolution. Upcoming surveys plan to observe tens of millions of stars, and high precision and accuracy must be ensured when deriving their stellar parameters and elemental abundances. Unconventional data-driven techniques hold the promise of dealing efficiently with these vast collections of data while still delivering results of astrophysical value.

The Cannon is a supervised machine learning algorithm that transfers stellar properties, or labels, from a reference dataset to any desired collection of stars. In this thesis, The Cannon is trained on a set of synthetic spectra generated ab initio and applied to a subset of 1410 FGK-type stars from the Gaia-ESO Survey (GES) for a label space of high dimensionality (Teff, log g, vmic, v sin i and 16 [X/H] abundances, where X is Mg, Na, Ca, Sc, Si, V, Ti, Mn, Fe, Ni, Cr, Co, Ba, Eu, O and Al). The synthetic training set is neither a grid of synthetic spectra nor a sub-sample of stars with well-studied properties. Instead, we have designed a sophisticated training set predominantly based on the Bensby catalogue of 714 stars with well-measured stellar parameters and elemental abundances.
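
The size of this label space can be made concrete with a short worked equation. The abstract does not state the functional form of the model, so the following assumes the quadratic-in-labels form of the original Cannon formulation; the symbols (f for flux, theta for the per-pixel coefficients, ell for the labels, K for the number of labels) are notation introduced here, not taken from the thesis.

```latex
% Assumed model: flux of star n at pixel \lambda as a quadratic function of its K labels
f_{n\lambda} = \boldsymbol{\theta}_\lambda^{\mathsf T}\,\boldsymbol{\ell}_n + \text{noise},
\qquad
\boldsymbol{\ell}_n = \bigl(1,\; \ell_{n1},\dots,\ell_{nK},\; \ell_{n1}^2,\; \ell_{n1}\ell_{n2},\dots,\ell_{nK}^2\bigr)

% Coefficients per pixel: one constant, K linear terms, K(K+1)/2 quadratic terms
N_\theta = 1 + K + \frac{K(K+1)}{2},
\qquad
K = 4 + 16 = 20 \;\Rightarrow\; N_\theta = 1 + 20 + 210 = 231
```

Under this assumption, each pixel of the model carries 231 coefficients for the 20 labels listed above (4 stellar parameters plus 16 abundances).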

The Cannon is indeed very fast, taking an average of 15 seconds to fit 20 labels simultaneously to a single spectrum once the model has been trained. It succeeds in recovering the stellar parameters Teff, log g and [Fe/H], with typical deviations of sigma_[Fe/H] = 0.08 dex, sigma_Teff = 88 K and sigma_log g = 0.14 dex in the label offsets with respect to the GES values, and in determining 15 elemental abundances over an SNR range spanning from 10 to 300.
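
As an illustration of how such label offsets might be summarised, the sketch below computes the mean offset (bias) and the sample standard deviation (scatter) of the differences between two sets of labels. The abstract does not define how its "typical deviations" were computed, so the use of a sample standard deviation is an assumption, the function name `offset_summary` is hypothetical, and the numbers are made-up placeholders rather than values from the thesis.

```python
from statistics import mean, stdev


def offset_summary(cannon_values, reference_values):
    """Return (bias, scatter) of the label offsets, cannon minus reference.

    bias    : mean of the offsets
    scatter : sample standard deviation of the offsets (an assumed definition)
    """
    offsets = [c - r for c, r in zip(cannon_values, reference_values)]
    return mean(offsets), stdev(offsets)


# Made-up illustrative Teff values in kelvin (NOT data from the thesis).
teff_cannon = [5800.0, 6010.0, 5495.0, 6120.0]
teff_reference = [5750.0, 5950.0, 5600.0, 6100.0]

bias, scatter = offset_summary(teff_cannon, teff_reference)
print(f"bias = {bias:.2f} K, scatter = {scatter:.2f} K")
```

The same function could be applied label by label (Teff, log g, [Fe/H], ...) to build a table of offsets against the reference values.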
Popular Abstract
Astronomers are time travellers and stars are their time capsules. Such a statement is particularly symbolic in the field of Galactic Archaeology, where the ultimate goal is to reconstruct the history of our galaxy by studying in great detail the light radiated from its stars.

The Milky Way, our galaxy, is a mixture of interacting gas and stars bound together by gravity, and so its evolution goes hand in hand with the evolution of these two components. A key point here is the idea of chemical enrichment. Primordial gas is mainly composed of hydrogen and helium, but as stars form, heavier chemical elements are produced in their interiors and injected into the surrounding gas when they die. This metal-rich gas will then clump into giant clouds, collapse and cool down to produce a new generation of stars, restarting the cycle. In addition, during its lifetime the Milky Way has collided with other galaxies in its vicinity. This makes the different enrichment channels even more complex to unravel, as gas and stars get violently mixed and scattered around. The result is a plethora of stellar populations with distinct dynamical and chemical imprints, some of which are preserved in stellar atmospheres and accessible to state-of-the-art observatories.

Our current understanding of the Milky Way classifies it as a spiral galaxy, with a distinguishable disk in which the Solar System is embedded. It is in this Galactic disk where most of the gas and stars are located, allowing us to pursue extensive and precise surveys of the properties of millions of these stars to disentangle the formation and evolution of the Milky Way. Among these observable properties we find the temperature of the stellar atmosphere, related to how bright its surface is, the surface gravity, which gives us an idea of its pressure structure, and the metallicity of the star, representing the abundance of heavier elements blocking the radiation emitted from the stellar atmosphere towards our telescope. These three main stellar parameters along with other element abundances fix the shape of stellar spectra, i.e. "stellar IDs" containing information about the chemistry and dynamics of stars.

At this point one might imagine the computational and human effort required to analyse data from these large stellar surveys characterising millions of objects. The methodologies designed for this task must therefore be effectively automated. They need to ensure both efficiency when examining stellar spectra for such large numbers of observed objects and accuracy high enough to distinguish among the various chemical enrichment channels. Another significant concern in the study of these massive catalogues is the set of theoretical assumptions made when modelling the transfer of radiation through stellar atmospheres. Stellar properties change at different depths of the atmosphere in very convoluted ways. Only simplified models of the real astrophysical phenomena can be constructed, and these sometimes disagree strongly with one another even when the same stars are examined.

Here, an algorithm called The Cannon is implemented to shed some light on these issues. The Cannon is a machine-learning code, meaning that it "learns" to relate spectra to stellar properties by optimizing purely mathematical functions on a reference dataset and then applies those functions to any test set of interest. This is a data-driven approach, as it does not contain any astrophysical assumptions, and it is extremely fast, as the optimized mathematical expressions can be computationally cheap to evaluate. In this work, we demonstrate the scientific value of this method when the reference set is replaced by artificially generated stellar spectra with known stellar properties and the method is applied to a test set of 1410 stars observed with a high-resolution instrument at the Very Large Telescope (VLT) of the European Southern Observatory (ESO, Chile).
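
The two steps described above, learning functions on a reference set and then applying them to a test spectrum, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the thesis code. It uses a single label, a quadratic function of that label at each pixel, plain unweighted least squares, and a brute-force grid search in place of a proper optimizer. All function names and the data are hypothetical.

```python
def label_vector(x):
    """Quadratic expansion [1, x, x^2] of a single scaled label x."""
    return [1.0, x, x * x]


def solve_linear(a, b):
    """Solve the small square system a @ t = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    t = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * t[c] for c in range(r + 1, n))
        t[r] = (m[r][n] - tail) / m[r][r]
    return t


def train(labels, spectra):
    """Training step: least-squares fit of one coefficient vector per pixel."""
    rows = [label_vector(x) for x in labels]
    k = len(rows[0])
    # The normal matrix A^T A depends only on the labels, so all pixels share it.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    coeffs = []
    for p in range(len(spectra[0])):
        atb = [sum(r[i] * s[p] for r, s in zip(rows, spectra)) for i in range(k)]
        coeffs.append(solve_linear(ata, atb))
    return coeffs


def model_flux(coeffs, x):
    """Evaluate the model at label x, returning one flux value per pixel."""
    vec = label_vector(x)
    return [sum(t * v for t, v in zip(theta, vec)) for theta in coeffs]


def infer_label(coeffs, flux, lo=-1.0, hi=1.0, steps=2000):
    """Test step: grid search for the label whose model spectrum best fits flux."""
    def chi2(x):
        return sum((f - m) ** 2 for f, m in zip(flux, model_flux(coeffs, x)))
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=chi2)


# Made-up, noise-free toy data: 5 pixels whose flux is exactly quadratic in x.
TRUE_COEFFS = [[1.0, -0.1 * (p + 1), 0.05 * p] for p in range(5)]
train_labels = [-1.0 + 0.2 * i for i in range(11)]
train_spectra = [model_flux(TRUE_COEFFS, x) for x in train_labels]

coeffs = train(train_labels, train_spectra)
test_flux = model_flux(TRUE_COEFFS, 0.3)  # "observed" spectrum with true label 0.3
estimate = infer_label(coeffs, test_flux)
print(f"recovered label: {estimate:.3f}")
```

The training step is cheap because it is a linear least-squares problem at each pixel. The test step is the nonlinear part, since the label enters the model quadratically. A realistic implementation would fit many labels at once, weight each pixel by its flux uncertainty, and use a nonlinear optimizer rather than a grid.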
author: Segovia Otero, Alvaro LU
course: ASTM31 20201
year: 2020
type: H2 - Master's Degree (Two Years)
keywords: Galactic Archaeology, machine learning, chemical abundances, FGK stars
publication/series: Lund Observatory Examensarbeten
report number: 2020-EXA159
language: English
id: 9027985
date added to LUP: 2020-09-07 17:04:36
date last changed: 2020-09-07 17:04:36
@misc{9027985,
  abstract     = {{Researchers in the field of Galactic Archaeology have entered the era of industrial revolution. Upcoming surveys are planning on observing tens of millions of stars and high precision and accuracy must be ensured when deriving their stellar parameters and elemental abundances. Unconventional data-driven techniques hold the promise of efficiently dealing with these vast collections of data while still rendering results of astrophysical value.

The Cannon is a supervised machine learning algorithm implemented to transfer stellar properties or labels from a dataset of reference to any desired collection of stars. In this thesis, The Cannon is trained on a set of synthetic spectra generated ab initio and applied to a sub-set of 1410 FGK-type stars from the Gaia-ESO Survey for a label space of high dimensionality (Teff , log g, vmic, v sin i and 16 [X/H] abundances, where X is Mg, Na, Ca, Sc, Si, V, Ti, Mn, Fe, Ni, Cr, Co, Ba, Eu, O and Al). The aforementioned synthetic training set does not represent a grid of synthetic spectra or a sub-sample of stars with well studied properties. Instead, we have designed a sophisticated training set predominantly based on the Bensby catalogue of 714 stars with well measured stellar parameters and elemental abundances.

The Cannon is indeed very fast, taking an average time of 15 seconds to simultaneously fit 20 labels on one single spectrum after having trained on the model. It succeeds in recovering the Teff , log g and [Fe/H] stellar parameters with typical deviations of sigma_[Fe/H] = 0.08 dex, sigma_Teff = 88 K and sigma_log g = 0.14 dex in the label offsets with respect to the GES values, as well as determine 15 elemental abundances within a SNR range spanning from 10 to 300.}},
  author       = {{Segovia Otero, Alvaro}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Lund Observatory Examensarbeten}},
  title        = {{Analysis of stellar spectra with machine learning}},
  year         = {{2020}},
}