Reading the ransom: Methodological advancements in extracting the Swedish Wealth Tax of 1571
(2023) In Explorations in Economic History 87.
- Abstract
- We describe a deep learning method to read handwritten records from the 16th century. The method consists of a combination of a segmentation module and a Handwritten Text Recognition (HTR) module. The transformer-based HTR module exploits both language and image features in reading, classifying and extracting the position of each word on the page. The method is demonstrated on a unique historical document: The Swedish Wealth Tax of 1571. Results suggest that the segmentation module performs significantly better than the layout analysis implemented in state-of-the-art programs, enabling us to trace many more text blocks correctly on each page. The HTR module has a low character error rate (CER), in addition to being able to classify words and help organize them into tabular formats. By demonstrating an automated process to transform loosely structured handwritten information from the 16th century into organized tables, our method should interest economic historians seeking to digitize and organize quantitative material from pre-industrial periods.
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/89e2ef25-8626-40da-9167-db8b5ec8fe29
- author
- Blomqvist, Christopher; Enflo, Kerstin (LU); Jakobsson, Andreas (LU) and Åström, Kalle (LU)
- organization
- Department of Economic History
- Growth, technological change, and inequality
- LTH Profile Area: AI and Digitalization
- LTH Profile Area: Engineering Health
- eSSENCE: The e-Science Collaboration
- Mathematical Statistics
- Biomedical Modelling and Computation (research group)
- Statistical Signal Processing Group (research group)
- Stroke Imaging Research group (research group)
- Mathematics (Faculty of Engineering)
- ELLIIT: the Linköping-Lund initiative on IT and mobile communication
- Mathematical Imaging Group (research group)
- publishing date
- 2023
- type
- Contribution to journal
- publication status
- published
- in
- Explorations in Economic History
- volume
- 87
- article number
- 101470
- publisher
- Elsevier
- external identifiers
- scopus:85135155196
- ISSN
- 0014-4983
- DOI
- 10.1016/j.eeh.2022.101470
- project
- Praise the people or praise the place: How culture and specialization drive long-term regional growth
- language
- English
- LU publication?
- yes
- id
- 89e2ef25-8626-40da-9167-db8b5ec8fe29
- date added to LUP
- 2022-08-15 13:42:35
- date last changed
- 2023-11-21 10:21:34
@article{89e2ef25-8626-40da-9167-db8b5ec8fe29,
  abstract  = {{We describe a deep learning method to read handwritten records from the 16th century. The method consists of a combination of a segmentation module and a Handwritten Text Recognition (HTR) module. The transformer-based HTR module exploits both language and image features in reading, classifying and extracting the position of each word on the page. The method is demonstrated on a unique historical document: The Swedish Wealth Tax of 1571. Results suggest that the segmentation module performs significantly better than the layout analysis implemented in state-of-the-art programs, enabling us to trace many more text blocks correctly on each page. The HTR module has a low character error rate (CER), in addition to being able to classify words and help organize them into tabular formats. By demonstrating an automated process to transform loosely structured handwritten information from the 16th century into organized tables, our method should interest economic historians seeking to digitize and organize quantitative material from pre-industrial periods.}},
  author    = {Blomqvist, Christopher and Enflo, Kerstin and Jakobsson, Andreas and Åström, Kalle},
  issn      = {{0014-4983}},
  language  = {{eng}},
  publisher = {{Elsevier}},
  journal   = {{Explorations in Economic History}},
  title     = {{Reading the ransom: Methodological advancements in extracting the Swedish Wealth Tax of 1571}},
  url       = {{http://dx.doi.org/10.1016/j.eeh.2022.101470}},
  doi       = {{10.1016/j.eeh.2022.101470}},
  volume    = {{87}},
  year      = {{2023}},
}