Lund University Publications

LUND UNIVERSITY LIBRARIES

Från chiffer till klartext? : Temamodellering av statliga offentliga utredningar 1945–1989

Snickars, Pelle LU (2022) In Scandia 88(1).
Abstract
In 2015 the National Library of Sweden finished digitising all Governmental Official Reports (SOU) from 1922 to 1999. Traditionally, SOU reports – and work performed within different governmental committees – were tasked with preparing the Swedish government for apt and rational decision-making. The range of subjects covered by governmental committees and SOU reports basically includes every area of the Swedish welfare state, from issues focused on migration and the environment to cultural and media policy.

The article takes as its starting point an analysis of all SOU reports from 1945–89 as one massive dataset: in total 3,154 reports containing 87 million tokens. The research was carried out in a Jupyter Lab environment, a web application in which executable Python code can be run to perform data analysis. The environment was developed at the digital humanities hub Humlab at Umeå University, and the research is related to the project Welfare State Analytics: Text Mining and Modelling Swedish Politics, Media & Culture, 1945–89. This is a digital humanities and digital history project that will digitise literature, curate already digitised collections, and perform research via probabilistic methods and text mining models.

If all SOU reports were considered one single text written by the state, which themes in this vast text is software able to read and perceive? Such a broad question can be addressed through topic modelling, a computational method for studying themes in texts by highlighting words that tend to co-occur and together form topics. Via co-occurrence, topic modelling produces topics in the form of clusters of related words; a word may belong to several topics with different degrees of probability. Topics also occur in relation to each other, and the resulting clusters and networks can be visualised with software such as Gephi.
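
As a rough illustration of how co-occurrence can yield topics in which one word carries different probabilities, the sketch below fits a toy topic model in plain Python. It assumes latent Dirichlet allocation (LDA) with a collapsed Gibbs sampler, which is one common approach and not necessarily the one used in the article. The four short documents, the function name `fit_lda`, and all parameter values are invented for illustration and are not taken from the SOU data.

```python
import random
from collections import defaultdict


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]  # tokens per (document, topic)
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # per (topic, word)
    topic_total = [0] * num_topics  # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        current = []
        for w in doc:
            k = rng.randrange(num_topics)
            current.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(current)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Convert counts into one smoothed word distribution per topic.
    topics = []
    for t in range(num_topics):
        denom = topic_total[t] + vocab_size * beta
        topics.append({w: (topic_word[t][w] + beta) / denom for w in vocab})
    return topics


# Invented toy documents (not SOU text).
docs = [
    "film censorship film board cinema".split(),
    "press subsidies daily press newspapers".split(),
    "film archive cinema film preservation".split(),
    "newspapers archive press subsidies".split(),
]
topics = fit_lda(docs, num_topics=2)

for t, dist in enumerate(topics):
    top = sorted(dist, key=dist.get, reverse=True)[:4]
    print(f"topic {t}:", ", ".join(top))
print("P(archive | topic):", [round(dist["archive"], 3) for dist in topics])
```

Each topic is a probability distribution over the whole vocabulary, so a word such as "archive" receives some probability in every topic. On a corpus this small the output is only indicative; the number of topics is a parameter chosen by the researcher.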

The article focuses on topics related to media and media policy. Different media topics can be detected depending on how many topics a model contains; the article uses models of 50, 100, 200, and 500 topics. The 50-topic model yielded one media topic, whereas the 500-topic model yielded several, with more specific traits such as film censorship or daily press subsidies. One finding is that film was the single medium to which the SOU genre devoted the most attention between 1945 and 1989. Another finding is that archival issues were closely linked to media topics during the same period. Governmental committees and SOU reports on media focused primarily on future-oriented policies, above all how media should be supported or regulated. Yet archiving those same media forms was also something in which the state was repeatedly interested.

In conclusion, the article explains what topic modelling is in general, how the method can be used in digital historical research – not least in relation to close reading – and how statistical analysis of the distribution of words in the form of topics can generate interesting results. The SOU data is rich, and topics can be traced across many different themes. As a researcher, however, one must learn to work with data: to load different models into the Jupyter Lab environment, to compute various input values, to change parameters, and often to curate outcomes in a way that differs from traditional historical research practices.
Abstract (Swedish)
This article contains an analysis of all reports from the Swedish Governmental Official Reports (Statens offentliga utredningar, SOU), 1945–89. The analysis applies methods from the digital humanities and includes research carried out in a Jupyter Lab environment developed at Humlab at Umeå University. The article is a result of the research project Welfare State Analytics (Välfärdsstaten analyserad), which aims to digitise literature and to curate already digitised collections, among other things.
alternative title
From cipher to plain text? : Topic modelling Swedish governmental reports, 1945–1989
type
Contribution to journal
publication status
published
keywords
digital humaniora, digital historia, temamodellering, media historia, Statens offentliga utredningar (SOU), digital humanities, digital history, topic modelling, media history, Swedish Governmental Official Reports (SOU)
in
Scandia
volume
88
issue
1
publisher
Stiftelsen Scandia
external identifiers
  • scopus:85135162255
ISSN
0036-5483
DOI
10.47868/scandia.v88i1.24206
project
Welfare State Analytics. Text Mining and Modeling Swedish Politics, Media & Culture, 1945-1989
language
Swedish
LU publication?
no
id
447c30fe-ff44-48e7-bac4-6d380b954406
date added to LUP
2022-05-30 11:31:35
date last changed
2022-09-20 04:03:35
@article{447c30fe-ff44-48e7-bac4-6d380b954406,
  abstract     = {{In 2015 the National Library of Sweden finished digitising all Governmental Official Reports (SOU) from 1922 to 1999. Traditionally, SOU reports – and work performed within different governmental committees – were tasked with preparing the Swedish government for apt and rational decision-making. The range of subjects covered by governmental committees and SOU reports basically includes every area of the Swedish welfare state, from issues focused on migration and the environment to cultural and media policy.

The article departs from an analysis of all SOU reports during 1945–89 as one massive dataset; in all 3,154 SOU reports that contain 87 million tokens. Research has been performed within a Jupyter Lab environment, a web application with executable Python code that can be run to perform data analysis. The Jupyter Lab environment has been developed at the digital humanities hub Humlab at Umeå University, and research is related to the project Welfare State Analytics: Text Mining and Modelling Swedish Politics, Media \& Culture, 1945–89. This is a digital humanities and digital history project that will digitise literature, curate already digitised collections, and perform research via probabilistic methods and text mining models.

If all SOU reports were to be considered one single text written by the state, which themes in this vast text is software able to read and perceive? It is possible to answer such a broad question by way of topic modelling, a computational method to study themes in texts by accentuating words that tend to co-occur and together create different topics. Via co-occurrence, topic modelling creates topics in the form of clusters of similar words (topics); a term or a word may be a part of several topics with different degrees of probability. Topics also occur in relation to each other, and clusters and networks can be visualised by using software such as Gephi.

The article focuses on topics related to media and media policy. Depending on how many topics a topic model displays – in the article models of 50, 100, 200, and 500 topics are used – different media topics can be detected. In the 50 model, one media topic was found, whereas in the 500 model, there were several, with more specific traits such as film censorship or daily press subsidies. One finding is that film was the single medium to which the SOU genre between 1945–89 devoted the most attention. Another finding is that archival issues were closely linked to media topics during the same period. Governmental committees and SOU reports on media were primarily focused on future-oriented policies, above all how media should be supported or regulated. Yet, archiving the same media forms was also something that the state was repeatedly interested in.

In conclusion, the article explains what topic modelling is in general, how the method can be used in digital historical research – not least in relation to close reading – and how statistical analysis of the distribution of words in the form of topics can generate interesting results. The SOU data is rich; topics can be traced with many different themes. As a researcher, however, one must learn to work with data: to load different models into the Jupyter Lab environment, to compute various input values, change parameters, and often curate outcomes in a way that differs from traditional historical research practices.}},
  author       = {{Snickars, Pelle}},
  issn         = {{0036-5483}},
  keywords     = {{digital humaniora; digital historia; temamodellering; media historia; Statens offentliga utredningar (SOU); digital humanities; digital history; topic modelling; media history; Swedish Governmental Official Reports (SOU)}},
  language     = {{swe}},
  number       = {{1}},
  publisher    = {{Stiftelsen Scandia}},
  series       = {{Scandia}},
  title        = {{Från chiffer till klartext? : Temamodellering av statliga offentliga utredningar 1945–1989}},
  url          = {{http://dx.doi.org/10.47868/scandia.v88i1.24206}},
  doi          = {{10.47868/scandia.v88i1.24206}},
  volume       = {{88}},
  year         = {{2022}},
}