Från chiffer till klartext? : Temamodellering av statliga offentliga utredningar 1945–1989

Snickars, Pelle

(2022) In Scandia 88(1).

Abstract: In 2015 the National Library of Sweden finished digitising all Governmental Official Reports (SOU) from 1922 to 1999. Traditionally, SOU reports – and work performed within different governmental committees – were tasked with preparing the Swedish government for apt and rational decision-making. The range of subjects covered by governmental committees and SOU reports basically includes every area of the Swedish welfare state, from issues focused on migration and the environment to cultural and media policy.

The article departs from an analysis of all SOU reports during 1945–89 as one massive dataset; in all 3,154 SOU reports that contain 87 million tokens. Research has been performed within a Jupyter Lab environment, a web application with executable Python code that can be run to perform data analysis. The Jupyter Lab environment has been developed at the digital humanities hub Humlab at Umeå University, and research is related to the project Welfare State Analytics: Text Mining and Modelling Swedish Politics, Media & Culture, 1945–89. This is a digital humanities and digital history project that will digitise literature, curate already digitised collections, and perform research via probabilistic methods and text mining models.

If all SOU reports were to be considered one single text written by the state, which themes in this vast text is software able to read and perceive? It is possible to answer such a broad question by way of topic modelling, a computational method to study themes in texts by accentuating words that tend to co-occur and together create different topics. Via co-occurrence, topic modelling creates topics in the form of clusters of similar words; a term or a word may be a part of several topics with different degrees of probability. Topics also occur in relation to each other, and clusters and networks can be visualised by using software such as Gephi.
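
The co-occurrence idea behind topic modelling can be illustrated with a minimal sketch. It is not the article's model: the abstract does not name the algorithm or library used, and a real topic model estimates probabilistic word clusters rather than raw pair counts. The toy documents and the function name `cooccurrence_counts` are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Toy documents, invented for illustration (not taken from the SOU corpus).
docs = [
    "film censorship film board",
    "press subsidies daily press",
    "film archive press archive",
    "film censorship archive",
]


def cooccurrence_counts(documents):
    """Count, for each unordered word pair, how many documents contain both words."""
    counts = Counter()
    for text in documents:
        # Use the set of distinct words so a pair counts once per document.
        vocab = sorted(set(text.split()))
        for pair in combinations(vocab, 2):
            counts[pair] += 1
    return counts


counts = cooccurrence_counts(docs)

# Pairs that share many documents are candidates for the same theme.
for pair, n in counts.most_common(3):
    print(pair, n)
```

Word pairs that appear together in many documents hint at a shared theme; a topic model generalises this by assigning each word a probability within every topic, so that one word can belong to several topics at once.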

The article focuses on topics related to media and media policy. Depending on how many topics a topic model displays – in the article models of 50, 100, 200, and 500 topics are used – different media topics can be detected. In the 50-topic model, one media topic was found, whereas in the 500-topic model, there were several, with more specific traits such as film censorship or daily press subsidies. One finding is that film was the single medium to which the SOU genre devoted the most attention between 1945 and 1989. Another finding is that archival issues were closely linked to media topics during the same period. Governmental committees and SOU reports on media were primarily focused on future-oriented policies, above all how media should be supported or regulated. Yet, archiving the same media forms was also something that the state was repeatedly interested in.

In conclusion, the article explains what topic modelling is in general, how the method can be used in digital historical research – not least in relation to close reading – and how statistical analysis of the distribution of words in the form of topics can generate interesting results. The SOU data is rich; topics can be traced across many different themes. As a researcher, however, one must learn to work with data: to load different models into the Jupyter Lab environment, to compute various input values, change parameters, and often curate outcomes in a way that differs from traditional historical research practices.

Abstract (Swedish, translated): This article contains an analysis of all reports from Statens offentliga utredningar (SOU), 1945–89. The analysis applies methods from digital humanities and includes research carried out in a Jupyter Lab environment developed at Humlab at Umeå University. The article is a result of the research project Välfärdsstaten analyserad (Welfare State Analytics), which aims to digitise literature and to curate already digitised collections, among other things.

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/447c30fe-ff44-48e7-bac4-6d380b954406

author

Snickars, Pelle

alternative title

From cipher to plain text? : Topic modelling Swedish governmental reports, 1945–1989

publishing date

2022

type

Contribution to journal

publication status

published

subject

History

keywords

digital humaniora, digital historia, temamodellering, media historia, Statens offentliga utredningar (SOU), digital humanities, digital history, topic modelling, media history, Swedish Governmental Official Reports (SOU)

in

Scandia

volume

88

issue

1

publisher

Statens Humanistiska Forskningsrad

external identifiers

scopus:85135162255

ISSN

0036-5483

DOI

10.47868/scandia.v88i1.24206

project

Welfare State Analytics. Text Mining and Modeling Swedish Politics, Media & Culture, 1945-1989

language

Swedish

LU publication?

no

id

447c30fe-ff44-48e7-bac4-6d380b954406

date added to LUP

2022-05-30 11:31:35

date last changed

2025-10-14 09:24:44

@article{447c30fe-ff44-48e7-bac4-6d380b954406,
  abstract     = {{In 2015 the National Library of Sweden finished digitising all Governmental Official Reports (SOU) from 1922 to 1999. Traditionally, SOU reports – and work performed within different governmental committees – were tasked with preparing the Swedish government for apt and rational decision-making. The range of subjects covered by governmental committees and SOU reports basically includes every area of the Swedish welfare state, from issues focused on migration and the environment to cultural and media policy.
The article departs from an analysis of all SOU reports during 1945–89 as one massive dataset; in all 3,154 SOU reports that contain 87 million tokens. Research has been performed within a Jupyter Lab environment, a web application with executable Python code that can be run to perform data analysis. The Jupyter Lab environment has been developed at the digital humanities hub Humlab at Umeå University, and research is related to the project Welfare State Analytics: Text Mining and Modelling Swedish Politics, Media \& Culture, 1945–89. This is a digital humanities and digital history project that will digitise literature, curate already digitised collections, and perform research via probabilistic methods and text mining models.
If all SOU reports were to be considered one single text written by the state, which themes in this vast text is software able to read and perceive? It is possible to answer such a broad question by way of topic modelling, a computational method to study themes in texts by accentuating words that tend to co-occur and together create different topics. Via co-occurrence, topic modelling creates topics in the form of clusters of similar words (topics); a term or a word may be a part of several topics with different degrees of probability. Topics also occur in relation to each other, and clusters and networks can be visualised by using software such as Gephi.
The article focuses on topics related to media and media policy. Depending on how many topics a topic model displays – in the article models of 50, 100, 200, and 500 topics are used – different media topics can be detected. In the 50 model, one media topic was found, whereas in the 500 model, there were several, with more specific traits such as film censorship or daily press subsidies. One finding is that film was the single medium to which the SOU genre between 1945–89 devoted the most attention. Another finding is that archival issues were closely linked to media topics during the same period. Governmental committees and SOU reports on media were primarily focused on future-oriented policies, above all how media should be supported or regulated. Yet, archiving the same media forms was also something that the state was repeatedly interested in.
In conclusion, the article explains what topic modelling is in general, how the method can be used in digital historical research – not least in relation to close reading – and how statistical analysis of the distribution of words in the form of topics can generate interesting results. The SOU data is rich; topics can be traced with many different themes. As a researcher, however, one must learn to work with data: to load different models into the Jupyter Lab environment, to compute various input values, change parameters, and often curate outcomes in a way that differs from traditional historical research practices.}},
  author       = {{Snickars, Pelle}},
  issn         = {{0036-5483}},
  keywords     = {{digital humaniora; digital historia; temamodellering; media historia; Statens offentliga utredningar (SOU); digital humanities; digital history; topic modelling; media history; Swedish Governmental Official Reports (SOU)}},
  language     = {{swe}},
  number       = {{1}},
  publisher    = {{Statens Humanistiska Forskningsrad}},
  series       = {{Scandia}},
  title        = {{Från chiffer till klartext? : Temamodellering av statliga offentliga utredningar 1945–1989}},
  url          = {{http://dx.doi.org/10.47868/scandia.v88i1.24206}},
  doi          = {{10.47868/scandia.v88i1.24206}},
  volume       = {{88}},
  year         = {{2022}},
}

Lund University Publications