Lund University Publications

Improved distance measures for “mixed-content miscellanies” : An adaptation for the collections of sayings of the desert fathers and mothers

Göransson, Elisabet LU ; Maurits, Luke ; Dahlman, Britt LU ; Sarkisian, Karine Åkerman LU ; Rubenson, Samuel LU and Dunn, Michael (2023) In Digital Scholarship in the Humanities 38(1). p.127-150
Abstract
Collections of sayings of the desert fathers and mothers are extant in manuscripts in many languages and are organized differently. They are ‘fixed-content miscellanies’ (FCM): they include material that belongs to the same genre, but is variable both when it comes to appearance and order. Distance measurement methods are particularly suitable for large text traditions including variable content in the so-called mixed-content miscellanies, such as recipes, anthological compilations of shorter text passages, or catalogues, but can also be suitable for text genres like collections of sayings, which are equally variable in appearance and order of sayings, even though the genre is fixed; hence ‘fixed-content miscellanies’. In the article, collections of sayings in seven languages were compared using four distance measure methods. Each segment of the sayings was given a unique id to be comparable. The first method used, the Jaccard distance measure, disregards the linear order of items and instead considers each collection compared only as a ‘bag of stories’. In two other methods used (Birnbaum and Levenshtein methods), the order in which the narratives of each saying appear is compared. All three methods yielded interesting results, but the collections that were apparently closely related were clustered together so tightly that it was not possible to make more nuanced analyses. In order to remove false negatives, particulars concerning lacunae in the material were taken into account in the proposed modified Levenshtein method, the fixed-content miscellanies (FCM)-Levenshtein method. By applying the FCM-Levenshtein method, previously unknown relations between collections witnessed in different languages could be detected.
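The contrast the abstract draws between an order-blind and an order-aware measure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and the saying ids (`s1`, `s2`, ...) are hypothetical, each collection is assumed to be a list of unique saying ids, and only the standard Jaccard and Levenshtein distances are shown. The Birnbaum and FCM-Levenshtein methods are not reproduced, since the abstract does not specify them.

```python
from typing import Hashable, Sequence


def jaccard_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Order-blind distance: 1 - |A ∩ B| / |A ∪ B| over the sets of ids."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here: two empty collections are identical
    return 1.0 - len(set_a & set_b) / len(union)


def levenshtein_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Order-aware distance: minimum number of insertions, deletions and
    substitutions of whole sayings needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # delete item_a
                            curr[j - 1] + 1,      # insert item_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical collections, each a list of saying ids in manuscript order.
collection_x = ["s1", "s2", "s3", "s4"]
collection_y = ["s2", "s1", "s3", "s5"]

print(jaccard_distance(collection_x, collection_y))
print(levenshtein_distance(collection_x, collection_y))
```

Two collections that contain the same sayings in a different order have a Jaccard distance of zero but a non-zero Levenshtein distance, which is the difference between treating a collection as a ‘bag of stories’ and comparing the order of its items.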
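author
Göransson, Elisabet ; Maurits, Luke ; Dahlman, Britt ; Sarkisian, Karine Åkerman ; Rubenson, Samuel and Dunn, Michael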
publishing date
2023
type
Contribution to journal
publication status
published
in
Digital Scholarship in the Humanities
volume
38
issue
1
pages
24 pages
publisher
Oxford University Press
ISSN
2055-7671
DOI
10.1093/llc/fqac025
project
Cultural evolution of texts
language
English
LU publication?
yes
id
917fd111-3747-47b1-8db9-511daa609431
date added to LUP
2021-06-02 19:05:19
date last changed
2023-06-22 14:15:52
@article{917fd111-3747-47b1-8db9-511daa609431,
  abstract     = {{Collections of sayings of the desert fathers and mothers are extant in manuscripts in many languages and are organized differently. They are ‘fixed-content miscellanies’ (FCM): they include material that belongs to the same genre, but is variable both when it comes to appearance and order. Distance measurement methods are particularly suitable for large text traditions including variable content in the so-called mixed-content miscellanies, such as recipes, anthological compilations of shorter text passages, or catalogues, but can also be suitable for text genres like collections of sayings, which are equally variable in appearance and order of sayings, even though the genre is fixed; hence ‘fixed-content miscellanies’. In the article, collections of sayings in seven languages were compared using four distance measure methods. Each segment of the sayings was given a unique id to be comparable. The first method used, the Jaccard distance measure, disregards the linear order of items and instead considers each collection compared only as a ‘bag of stories’. In two other methods used (Birnbaum and Levenshtein methods), the order in which the narratives of each saying appear is compared. All three methods yielded interesting results, but the collections that were apparently closely related were clustered together so tightly that it was not possible to make more nuanced analyses. In order to remove false negatives, particulars concerning lacunae in the material were taken into account in the proposed modified Levenshtein method, the fixed-content miscellanies (FCM)-Levenshtein method. By applying the FCM-Levenshtein method, previously unknown relations between collections witnessed in different languages could be detected.}},
  author       = {{Göransson, Elisabet and Maurits, Luke and Dahlman, Britt and Sarkisian, Karine Åkerman and Rubenson, Samuel and Dunn, Michael}},
  issn         = {{2055-7671}},
  language     = {{eng}},
  number       = {{1}},
  pages        = {{127--150}},
  publisher    = {{Oxford University Press}},
  journal      = {{Digital Scholarship in the Humanities}},
  title        = {{Improved distance measures for “mixed-content miscellanies” : An adaptation for the collections of sayings of the desert fathers and mothers}},
  url          = {{http://dx.doi.org/10.1093/llc/fqac025}},
  doi          = {{10.1093/llc/fqac025}},
  volume       = {{38}},
  year         = {{2023}},
}