
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Semantic Similarity Analysis on English Translations of the Iliad

Bijkerk, Maria LU (2024) DABN01 20241
Department of Economics
Department of Statistics
Abstract
Studying translations gives us more insight into cultures and languages. Machine Translation is an application area of Natural Language Processing (NLP), used to transfer information from one language to another. Creating these tools requires a lot of data, including data about the semantic relationships of the texts, and for languages that are no longer spoken, like Ancient Greek, little (digital) data exists. In this study, we explore 16 different English translations of the first book of the Iliad, an Ancient Greek epic seen as one of the most influential literary works on modern Western literature. We use three different algorithms (GloVe, Word2Vec, and BERT) to create document embeddings for each translation. We then analyse how three features (publication year, genre, name versions) influence the cosine similarity scores between the documents. We also use hierarchical clustering to group the translations without needing a pre-determined number of clusters, to see how the full document embeddings relate to each other. We find that the publication year does not have a significant influence on the similarity scores, but the genre and name versions do seem to have a significant influence.
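As a rough illustration of the two steps named in the abstract (cosine similarity between document embeddings, then hierarchical clustering), here is a minimal sketch in plain Python. It is not the thesis code: the four 3-dimensional vectors and the names `doc_a` to `doc_d` are made-up stand-ins for real GloVe, Word2Vec, or BERT document embeddings, and average linkage on cosine distance is an assumption, since the abstract does not state which linkage was used.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_linkage(embeddings):
    """Agglomerative clustering on cosine distance (1 - similarity).

    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of document names. No cluster count
    is fixed in advance; merging continues until one cluster remains.
    """
    def distance(a, b):
        # Average linkage: mean pairwise distance between members.
        pairs = [(x, y) for x in a for y in b]
        total = sum(
            1.0 - cosine_similarity(embeddings[x], embeddings[y])
            for x, y in pairs
        )
        return total / len(pairs)

    clusters = [(name,) for name in embeddings]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: distance(*p))
        merges.append((a, b, distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical toy embeddings, for illustration only.
toy_embeddings = {
    "doc_a": [1.0, 0.1, 0.0],
    "doc_b": [0.9, 0.2, 0.0],
    "doc_c": [0.0, 1.0, 0.2],
    "doc_d": [0.1, 0.9, 0.3],
}

merges = average_linkage(toy_embeddings)
for left, right, dist in merges:
    print(left, right, round(dist, 4))
```

The order of the merges plays the role of a dendrogram: documents joined early at a small distance are the most similar, and cutting the sequence at any distance threshold yields a grouping without choosing the number of clusters up front.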
author: Bijkerk, Maria LU
course: DABN01 20241
year: 2024
type: H1 - Master's Degree (One Year)
keywords: natural language processing, textual analysis, document embedding, multidimensional scaling, hierarchical clustering
language: English
id: 9156845
date added to LUP: 2024-09-24 08:32:33
date last changed: 2024-09-24 08:32:33
@misc{9156845,
  abstract     = {{Studying translations gives us more insight into cultures and languages. Machine Translation is an application area of Natural Language Processing (NLP), used to transfer information from one language to another. Creating these tools requires a lot of data, including data about the semantic relationships of the texts, and for languages that are no longer spoken, like Ancient Greek, little (digital) data exists. In this study, we explore 16 different English translations of the first book of the Iliad, an Ancient Greek epic seen as one of the most influential literary works on modern Western literature. We use three different algorithms (GloVe, Word2Vec, and BERT) to create document embeddings for each translation. We then analyse how three features (publication year, genre, name versions) influence the cosine similarity scores between the documents. We also use hierarchical clustering to group the translations without needing a pre-determined number of clusters, to see how the full document embeddings relate to each other. We find that the publication year does not have a significant influence on the similarity scores, but the genre and name versions do seem to have a significant influence.}},
  author       = {{Bijkerk, Maria}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Semantic Similarity Analysis on English Translations of the Iliad}},
  year         = {{2024}},
}