Improved distance measures for “mixed-content miscellanies” : An adaptation for the collections of sayings of the desert fathers and mothers
(2023) In Digital Scholarship in the Humanities 38(1). p.127-150
- Abstract
- Collections of sayings of the desert fathers and mothers are extant in manuscripts in many languages and are organized differently. They are ‘fixed-content miscellanies’ (FCM): they include material that belongs to the same genre, but is variable both when it comes to appearance and order. Distance measurement methods are particularly suitable for large text traditions including variable content in the so-called mixed-content miscellanies, such as recipes, anthological compilations of shorter text passages, or catalogues, but can also be suitable for text genres like collections of sayings, that are equally variable in appearance and order of sayings, even though the genre is fixed; hence ‘fixed-content miscellanies’. In the article, collections of sayings in seven languages were compared using four distance measurement methods. Each segment of the sayings was given a unique ID to be comparable. The first method used, the Jaccard distance measure, disregards the linear order of items and instead considers each collection compared only as a ‘bag of stories’. In two other methods used (Birnbaum and Levenshtein methods), the order in which the narratives of each saying appear is compared. All three methods yielded interesting results, but the collections that were apparently closely related were clustered together so tightly that it was not possible to make more nuanced analyses. In order to remove false negatives, particulars concerning lacunae in the material were taken into account in the proposed modified Levenshtein method, the fixed-content miscellanies (FCM)-Levenshtein method. By applying the FCM-Levenshtein method, previously unknown relations between collections witnessed in different languages could be detected.
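As a rough illustration of the difference the abstract draws between an order-blind and an order-aware comparison, the sketch below computes the textbook Jaccard distance and the textbook Levenshtein distance over sequences of saying IDs. It is not the authors' implementation: the function names, the IDs and the three sample collections are invented for the example, and neither the Birnbaum method nor the FCM-Levenshtein modification is reproduced, since the abstract does not specify them.

```python
from typing import Hashable, Sequence


def jaccard_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B|, ignoring order and repeated ids."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty collections count as identical
    return 1.0 - len(set_a & set_b) / len(union)


def levenshtein_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Return the minimum number of id insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]


# Hypothetical collections: every saying is represented by an invented id.
collection_x = ["s1", "s2", "s3", "s4"]
collection_y = ["s4", "s3", "s2", "s1"]  # same sayings as x, reversed order
collection_z = ["s1", "s2", "s5"]        # shares only the first two sayings

print(jaccard_distance(collection_x, collection_y))
print(levenshtein_distance(collection_x, collection_y))
print(jaccard_distance(collection_x, collection_z))
print(levenshtein_distance(collection_x, collection_z))
```

With these invented inputs, the Jaccard distance treats `collection_x` and `collection_y` as identical because they hold the same sayings, whereas the Levenshtein distance registers the reordering. That is the contrast between the ‘bag of stories’ view and the order-sensitive methods described above.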
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/917fd111-3747-47b1-8db9-511daa609431
- author
- Göransson, Elisabet LU ; Maurits, Luke ; Dahlman, Britt LU ; Sarkisian, Karine Åkerman LU ; Rubenson, Samuel LU and Dunn, Michael
- organization
- publishing date
- 2023-04
- type
- Contribution to journal
- publication status
- published
- subject
- in
- Digital Scholarship in the Humanities
- volume
- 38
- issue
- 1
- pages
- 24 pages
- publisher
- Oxford University Press
- ISSN
- 2055-7671
- DOI
- 10.1093/llc/fqac025
- project
- Cultural evolution of texts
- language
- English
- LU publication?
- yes
- id
- 917fd111-3747-47b1-8db9-511daa609431
- date added to LUP
- 2021-06-02 19:05:19
- date last changed
- 2023-06-22 14:15:52
@article{917fd111-3747-47b1-8db9-511daa609431,
  abstract = {{Collections of sayings of the desert fathers and mothers are extant in manuscripts in many languages and are organized differently. They are ‘fixed-content miscellanies’ (FCM): they include material that belongs to the same genre, but is variable both when it comes to appearance and order. Distance measurement methods are particularly suitable for large text traditions including variable content in the so-called mixed-content miscellanies, such as recipes, anthological compilations of shorter text passages, or catalogues, but can also be suitable for text genres like collections of sayings, that are equally variable in appearance and order of sayings, even though the genre is fixed; hence ‘fixed-content miscellanies’. In the article, collections of sayings in seven languages were compared using four distance measurement methods. Each segment of the sayings was given a unique ID to be comparable. The first method used, the Jaccard distance measure, disregards the linear order of items and instead considers each collection compared only as a ‘bag of stories’. In two other methods used (Birnbaum and Levenshtein methods), the order in which the narratives of each saying appear is compared. All three methods yielded interesting results, but the collections that were apparently closely related were clustered together so tightly that it was not possible to make more nuanced analyses. In order to remove false negatives, particulars concerning lacunae in the material were taken into account in the proposed modified Levenshtein method, the fixed-content miscellanies (FCM)-Levenshtein method. By applying the FCM-Levenshtein method, previously unknown relations between collections witnessed in different languages could be detected.}},
  author = {{Göransson, Elisabet and Maurits, Luke and Dahlman, Britt and Sarkisian, Karine Åkerman and Rubenson, Samuel and Dunn, Michael}},
  issn = {{2055-7671}},
  language = {{eng}},
  number = {{1}},
  pages = {{127--150}},
  publisher = {{Oxford University Press}},
  journal = {{Digital Scholarship in the Humanities}},
  title = {{Improved distance measures for “mixed-content miscellanies” : An adaptation for the collections of sayings of the desert fathers and mothers}},
  url = {{http://dx.doi.org/10.1093/llc/fqac025}},
  doi = {{10.1093/llc/fqac025}},
  volume = {{38}},
  year = {{2023}},
}