Characterization of the Galactic disks using t-SNE and k-means clustering
(2025) FYSK04 20251
Department of Physics
Astrophysics
- Abstract
- Problem: The Milky Way disk consists of two distinct components, the thin and thick disk, each with unique kinematical, elemental, and age-related properties. Among all the properties considered, age stands out as the only one that does not exhibit significant overlap, thereby offering clearer separation between the populations. Stellar ages, determined through isochrone fitting, are reliable mainly for nearby dwarfs and subgiants, while giants, the only stars observable at greater distances, have poorly constrained ages. This motivates the search for alternative discriminators based on elemental abundances and kinematical properties.
Aims: This project aims to identify elemental and kinematical properties that effectively distinguish the two disk populations.
Methods: A t-SNE dimensionality reduction approach is applied to F and G dwarf and subgiant stars in the solar neighborhood from Bensby et al. (2014) and Battistini & Bensby (2015). Projections based on five input variables are generated and color-coded by stellar age, since age offers the clearest separation between thick and thin disk stars. Clusters are classified using k-means, and computations are implemented in Python within a parallelized framework for efficiency.
Conclusions and Results: The combination of t-SNE, k-means clustering, and statistical evaluation proved effective for astrophysical applications. The results are consistent with Sandell (2009), identifying [Cr/Ti] and [Ni/Ti] as useful discriminators, although [Ni/Ti] appears more affected by outliers. In contrast, based on this approach, kinematical data were found to be less effective as discriminators for the thick and thin disk.
- Popular Abstract
- Spanning tens of thousands of light-years, the Milky Way disk forms the backbone of our spiral galaxy, a striking structure visible in the night sky, yet still hiding many secrets about its formation and evolution.
For over a century, scientists have searched for answers in the stars themselves, which, much like hard drives, preserve traces of the Galaxy’s past. Our Sun acts as a light bulb in a dark room, illuminating the cold, dark, and vast expanse of space, as do countless other stars in our universe. Since spectroscopy was invented in the 1800s, light from stars has provided scientists with data on the elements hiding beneath the bright coronas. While this light has clarified which elements stars contain, it has also raised questions about why certain elements are present. Stellar models of nuclear fusion can explain the presence of some elements, but a further process is needed to explain the rest: cosmic recycling. In short, cosmic recycling is a model which predicts that the elemental properties of stars are directly correlated with the evolution of the surrounding interstellar medium. This follows from the deaths of some massive stars in supernova explosions, which supply the interstellar medium with elements that act as seeds for new stars. Thus, stars act as time capsules, storing the history of their surroundings. Alongside spectroscopy, the Doppler effect has helped scientists uncover the kinematical properties of stars.
Furthermore, the vertical stellar density distribution of our Galaxy has led to a major discovery: the traditional picture of the Milky Way as having a single disk is misleading. In fact, the Milky Way consists of two disks, the thin and the thick disk, each with stellar populations of different elemental and kinematic characteristics. However, although the vertical distribution of stars has revealed the existence of the two disks, the characteristics of their constituent stars are not fully clear. This is because plots of elemental and kinematical data show so-called "mix-regions", in which overlapping stellar populations are found between the distinct overdensities that represent thin and thick disk stars. The similarity of certain properties of thin and thick disk stars has also led scientists to speculate on the origin of these characteristics, linking them to possible past galactic events, such as mergers, which might have supplied our Galaxy with additional stars. This project aims to use modern machine learning, a field that has found applications in many disciplines in the 2000s, to clarify the characterization of the different populations of stars with respect to the two disks. By building high-dimensional combinations of elemental and kinematical data for stars in the solar neighborhood, the project used a machine learning technique known as t-distributed stochastic neighbor embedding (t-SNE) together with k-means clustering to identify characteristic elements, and possibly kinematics, of the respective populations.
Ultimately, the goal of this project is to provide clearer insight into what distinguishes thin disk stars from thick disk stars, thereby contributing to our understanding of the Galactic history that shaped the Milky Way we observe today.
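The approach summarized in the abstracts (a t-SNE projection of five input variables, followed by k-means on the projection and a comparison against stellar age) can be illustrated with a minimal Python sketch. This is not the thesis code: it assumes scikit-learn and NumPy, the helper name `embed_and_cluster` is hypothetical, and the data are synthetic stand-ins for two populations rather than the Bensby et al. (2014) sample. The choice of standardizing the inputs and of the perplexity value are also assumptions, and the parallelized framework mentioned in the Methods is not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def embed_and_cluster(features, n_clusters=2, perplexity=30.0, seed=0):
    """Standardize the inputs, project them to 2D with t-SNE,
    then run k-means on the 2D projection (hypothetical helper)."""
    scaled = StandardScaler().fit_transform(features)
    embedding = TSNE(
        n_components=2, perplexity=perplexity, init="pca", random_state=seed
    ).fit_transform(scaled)
    labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=seed
    ).fit_predict(embedding)
    return embedding, labels


# Synthetic stand-in data (NOT real stellar measurements): two populations,
# each described by five input variables, with made-up ages in Gyr.
rng = np.random.default_rng(42)
n_per_pop = 150
pop_a = rng.normal(loc=0.0, scale=0.5, size=(n_per_pop, 5))
pop_b = rng.normal(loc=3.0, scale=0.5, size=(n_per_pop, 5))
features = np.vstack([pop_a, pop_b])
ages = np.concatenate(
    [rng.normal(4.0, 1.0, n_per_pop), rng.normal(10.0, 1.0, n_per_pop)]
)

embedding, labels = embed_and_cluster(features)

# Age is not an input to the projection; it is only used afterwards to
# judge whether the clusters line up with younger and older stars.
mean_age_per_cluster = [ages[labels == k].mean() for k in range(2)]
print(mean_age_per_cluster)
```

In this sketch, a scatter plot of `embedding` colored by `ages` would be the analogue of the age color-coding described in the Methods, and a large gap between the two entries of `mean_age_per_cluster` would suggest that the chosen five variables separate the populations well.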
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9198884
- author
- Dimkovski Gottschalk, Nicolai LU
- supervisor
- organization
- course
- FYSK04 20251
- year
- 2025
- type
- M2 - Bachelor Degree
- subject
- keywords
- thick disk, thin disk, t-SNE, stars, galaxy, k-means
- report number
- 2025-EXA246
- other publication id
- 2025-EXA246
- language
- English
- id
- 9198884
- date added to LUP
- 2025-06-18 11:57:42
- date last changed
- 2025-06-18 11:57:42
@misc{9198884,
  abstract = {{Problem: The Milky Way disk consists of two distinct components, the thin and thick disk, each with unique kinematical, elemental, and age-related properties. Among all the properties considered, age stands out as the only one that does not exhibit significant overlap, thereby offering clearer separation between the populations. Stellar ages, determined through isochrone fitting, are reliable mainly for nearby dwarfs and subgiants, while giants, the only stars observable at greater distances, have poorly constrained ages. This motivates the search for alternative discriminators based on elemental abundances and kinematical properties. Aims: This project aims to identify elemental and kinematical properties that effectively distinguish the two disk populations. Methods: A t-SNE dimensionality reduction approach is applied to solar-near F and G dwarf and subgiant stars from Bensby et al. (2014) and Battistini \& Bensby (2015). Projections based on five input variables are generated and color-coded by stellar age, as it offers the clearest separation between thick and thin disk stars. Clusters are classified using k-means, and computations are implemented in Python within a parallelized framework for efficiency. Conclusions and Results: The combination of t-SNE, k-means clustering, and statistical evaluation proved effective for astrophysical applications. The results are consistent with Sandell (2009), identifying [Cr/Ti] and [Ni/Ti] as useful discriminators, although [Ni/Ti] appears more affected by outliers. In contrast, based on this approach, kinematical data were found to be less effective as discriminators for the thick and thin disk.}},
  author = {{Dimkovski Gottschalk, Nicolai}},
  language = {{eng}},
  note = {{Student Paper}},
  title = {{Characterization of the Galactic disks using t-SNE and k-means clustering}},
  year = {{2025}},
}