Semantic Similarity Analysis on English Translations of the Iliad
(2024) DABN01 20241
Department of Economics
Department of Statistics
- Abstract
- Studying translations gives us more insight into cultures and languages. Machine translation, an application area of Natural Language Processing (NLP), is used to transfer information from one language to another. Creating these tools requires a lot of data, including data about the semantic relationships between texts, and for languages that are no longer spoken, such as Ancient Greek, little (digital) data exists. In this study, we explore 16 different English translations of the first book of the Iliad, an Ancient Greek epic regarded as one of the most influential literary works for modern Western literature. We use three different algorithms (GloVe, Word2Vec, and BERT) to create document embeddings for each translation. We then analyse how three features (publication year, genre, name versions) influence the cosine similarity scores between the documents. We also use hierarchical clustering to group the translations without needing a pre-determined number of clusters, to see how the full document embeddings relate to each other. We find that publication year does not have a significant influence on the similarity scores, whereas genre and name versions do appear to have a significant influence.
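The two operations named in the abstract, cosine similarity between document embeddings and hierarchical clustering without a pre-set number of clusters, can be illustrated with a minimal sketch. It is not the thesis's implementation: the three-dimensional vectors and the translation labels are hypothetical stand-ins for real GloVe, Word2Vec, or BERT document embeddings, and average linkage on cosine distance is an assumed choice, since the record does not state which linkage was used.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_distance(embeddings, a, b):
    """Average cosine distance (1 - similarity) over all cross-cluster pairs."""
    pairs = [(x, y) for x in a for y in b]
    total = sum(1 - cosine_similarity(embeddings[x], embeddings[y]) for x, y in pairs)
    return total / len(pairs)


def average_linkage(embeddings):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance).

    Every document starts in its own cluster and the two closest clusters are
    merged repeatedly, so no number of clusters is chosen in advance.
    """
    clusters = [(name,) for name in embeddings]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(embeddings, *pair),
        )
        merges.append((a, b, cluster_distance(embeddings, a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical toy embeddings; real ones would have hundreds of dimensions.
toy_embeddings = {
    "translation_A": [1.0, 0.0, 0.0],
    "translation_B": [0.9, 0.1, 0.0],
    "translation_C": [0.0, 1.0, 0.0],
}

merges = average_linkage(toy_embeddings)
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```

The merge list plays the role of a dendrogram: cutting it at a chosen distance yields the clusters, which is why no cluster count is needed up front.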
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9156845
- author
- Bijkerk, Maria LU
- supervisor
- organization
- course
- DABN01 20241
- year
- 2024
- type
- H1 - Master's Degree (One Year)
- subject
- keywords
- natural language processing, textual analysis, document embedding, multidimensional scaling, hierarchical clustering
- language
- English
- id
- 9156845
- date added to LUP
- 2024-09-24 08:32:33
- date last changed
- 2024-09-24 08:32:33
@misc{9156845,
  abstract = {{Studying translations gives us more insight into cultures and languages. Machine translation, an application area of Natural Language Processing (NLP), is used to transfer information from one language to another. Creating these tools requires a lot of data, including data about the semantic relationships between texts, and for languages that are no longer spoken, such as Ancient Greek, little (digital) data exists. In this study, we explore 16 different English translations of the first book of the Iliad, an Ancient Greek epic regarded as one of the most influential literary works for modern Western literature. We use three different algorithms (GloVe, Word2Vec, and BERT) to create document embeddings for each translation. We then analyse how three features (publication year, genre, name versions) influence the cosine similarity scores between the documents. We also use hierarchical clustering to group the translations without needing a pre-determined number of clusters, to see how the full document embeddings relate to each other. We find that publication year does not have a significant influence on the similarity scores, whereas genre and name versions do appear to have a significant influence.}},
  author = {{Bijkerk, Maria}},
  language = {{eng}},
  note = {{Student Paper}},
  title = {{Semantic Similarity Analysis on English Translations of the Iliad}},
  year = {{2024}},
}