Lund University Publications

Record linkage in the Cape of Good Hope Panel

Rijpma, Auke; Cilliers, Jeanne and Fourie, Johan (2020). In Historical Methods 53(2), p. 112-129
Abstract

In this article, we describe the record linkage procedure to create a panel from Cape Colony census returns, or opgaafrolle, for 1787–1828, a dataset of 42,354 household-level observations. Based on a subset of manually linked records, we first evaluate statistical models and deterministic algorithms to best identify and match households over time. By using household-level characteristics in the linking process and near-annual data, we are able to create high-quality links for 84% of the dataset. We compare basic analyses on the linked panel dataset to the original cross-sectional data, evaluate the feasibility of the strategy when linking to supplementary sources, and discuss the scalability of our approach to the full Cape panel.

author
Rijpma, Auke; Cilliers, Jeanne and Fourie, Johan
publishing date
2020
type
Contribution to journal
publication status
published
subject
keywords
Census, machine learning, micro-data, panel data, record linkage, South Africa
in
Historical Methods
volume
53
issue
2
pages
18 pages
publisher
Heldref Publications
external identifiers
  • scopus:85061778651
ISSN
0161-5440
DOI
10.1080/01615440.2018.1517030
project
The Cape of the Good Hope Panel: Long-term studies of growth, inequality and labour coercion in the global south
language
English
LU publication?
yes
id
8fa82bee-43d2-4994-a3bd-eba9935092ec
date added to LUP
2019-03-04 11:59:15
date last changed
2022-04-25 21:26:33
@article{8fa82bee-43d2-4994-a3bd-eba9935092ec,
  abstract     = {{In this article, we describe the record linkage procedure to create a panel from Cape Colony census returns, or opgaafrolle, for 1787–1828, a dataset of 42,354 household-level observations. Based on a subset of manually linked records, we first evaluate statistical models and deterministic algorithms to best identify and match households over time. By using household-level characteristics in the linking process and near-annual data, we are able to create high-quality links for 84% of the dataset. We compare basic analyses on the linked panel dataset to the original cross-sectional data, evaluate the feasibility of the strategy when linking to supplementary sources, and discuss the scalability of our approach to the full Cape panel.}},
  author       = {{Rijpma, Auke and Cilliers, Jeanne and Fourie, Johan}},
  issn         = {{0161-5440}},
  keywords     = {{Census; machine learning; micro-data; panel data; record linkage; South Africa}},
  language     = {{eng}},
  number       = {{2}},
  pages        = {{112--129}},
  publisher    = {{Heldref Publications}},
  journal      = {{Historical Methods}},
  title        = {{Record linkage in the Cape of Good Hope Panel}},
  url          = {{http://dx.doi.org/10.1080/01615440.2018.1517030}},
  doi          = {{10.1080/01615440.2018.1517030}},
  volume       = {{53}},
  year         = {{2020}},
}