Joint Handwritten Text Recognition and Word Classification for Tabular Information Extraction

Blomqvist, Christopher; Enflo, Kerstin; Jakobsson, Andreas; Åström, Kalle (2022-11-29). Joint Handwritten Text Recognition and Word Classification for Tabular Information Extraction. 2022 26th International Conference on Pattern Recognition (ICPR), 1564-1570. 26th International Conference on Pattern Recognition, 2022. Montreal, Canada: IEEE - Institute of Electrical and Electronics Engineers Inc.
Conference Proceeding/Paper | Published | English
Authors:
Blomqvist, Christopher ; Enflo, Kerstin ; Jakobsson, Andreas ; Åström, Kalle
Department:
Department of Economic History
Growth, technological change, and inequality
LTH Profile Area: AI and Digitalization
eSSENCE: The e-Science Collaboration
Mathematical Statistics
Biomedical Modelling and Computation
Statistical Signal Processing Group
Stroke Imaging Research group
Mathematics (Faculty of Engineering)
ELLIIT: the Linköping-Lund initiative on IT and mobile communication
Mathematical Imaging Group
Project:
Praise the people or praise the place: How culture and specialization drive long-term regional growth
Research Group:
Biomedical Modelling and Computation
Statistical Signal Processing Group
Stroke Imaging Research group
Mathematical Imaging Group
Abstract:
In this paper, we present a system for extracting tabular information from loosely structured handwritten documents. The system consists of three parts: (i) a U-Net-like CNN-based method for text detection and segmentation, (ii) a new attention-based method for simultaneous text recognition and classification of word parts, and (iii) a method for matching the word parts into a tabular structure for each entry. A key contribution is the observation that the new attention-based recognition and classification module enables improved spatial analysis of the tabular information. The method is evaluated on a unique historical document: the Swedish Wealth Tax of 1571, consisting of 11,453 pages of handwritten tax records. The evaluation shows that the system provides a significant improvement over the state of the art on the problem of tabular extraction from loosely structured historical documents.
Keywords:
Histograms ; Image segmentation ; Text recognition ; Finance ; Writing ; Information retrieval ; Decoding ; Computer Vision and Robotics (Autonomous Systems) ; Economic History
ISBN:
978-1-6654-9063-4
LUP-ID:
b5f50e29-597f-474b-b687-ab45f476d11d | Link: https://lup.lub.lu.se/record/b5f50e29-597f-474b-b687-ab45f476d11d
