Exploration of using Twitter data to predict Swedish political opinion polls with neural networks

Gren, Alexander; Lundgren, Klara

Gren, Alexander and Lundgren, Klara (2023). In Master's Theses in Mathematical Sciences FMSM01 20231
Mathematical Statistics

Abstract: This thesis aims to explore the possibility of using deep learning techniques to mine opinions on Twitter, with the objective of predicting the political opinion distribution in Sweden. Different methods of gathering and annotating training data are evaluated with the aim of achieving accurate and reliable predictions. The models are quite successful at predicting test data, achieving F1-scores in the range of 70 % to 85 %. Some party divisions are found to be more difficult to classify than others. It is hypothesized and validated that the context of the tweets can aid the classification process; in practice, this is done by exploiting the tweet thread structure. When the models were used to predict the general political discussion on Twitter, the results show that the predictions are subject to large variance. Different executions can yield wildly different results, and the predictions are therefore deemed not reliable enough to use as input to a regression relating the predicted opinion distributions to the opinion polls. The underlying cause of the large variance is investigated, and the results suggest that the training data is too small or of too low quality, which causes the model to overfit and makes patterns hard to recognize. A lexicon-based classification is carried out as a supplement, but no significant relation can be established between the predicted opinion and the opinion polls. Furthermore, it is argued that the insufficient results might stem from the method itself: the Swedish political discussion might not be polarized enough to allow good classifications, or the political discussion on Twitter might not be representative of the general opinion at all.
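The abstract reports classifier performance as F1-scores, the harmonic mean of precision and recall. As a minimal sketch of how such scores are computed for a multi-class party classifier, the code below derives per-class F1 and its unweighted (macro) average from plain label lists. It is an illustration only, not the thesis's evaluation code: the function name and the party labels "S", "M" and "SD" are hypothetical, and the abstract does not say whether macro or weighted averaging was used.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return per-class F1 scores and their unweighted (macro) mean."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # the predicted class gets a false positive
            fn[true] += 1  # the true class gets a false negative
    per_class = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        per_class[label] = 2 * precision * recall / denom if denom else 0.0
    if not per_class:
        return per_class, 0.0
    macro_f1 = sum(per_class.values()) / len(per_class)
    return per_class, macro_f1


# Hypothetical gold labels and model predictions for six tweets.
y_true = ["S", "S", "M", "SD", "M", "S"]
y_pred = ["S", "M", "M", "SD", "S", "S"]
per_class, macro_f1 = f1_scores(y_true, y_pred)
print(per_class)
print(round(macro_f1, 3))
```

In this toy example, S and M reach F1-scores of about 0.67 and 0.5 while SD is classified perfectly, giving a macro F1 of about 0.72; a single hard-to-separate pair of parties pulls the average down, which mirrors the observation that some party divisions are harder than others.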
Popular Abstract: This thesis explores the possibility of using machine learning techniques to predict the political affiliation of Twitter users and, from this, the political opinion distribution in Sweden. The core idea of a democratic society is that public opinion shapes the future. While the opinion of the people is only formally recorded at elections, changes in opinion between elections are of interest to a large number of stakeholders. Using machine learning to learn public opinion shows potential, but further research in the field is needed before accurate predictions of the opinion distribution can be generated from Twitter data.
In democratic governance, the power rests firmly in the hands of the people, and their collective opinions shape the future of society. While elections serve as important milestones, there is an increasing desire for more frequent updates on the ever-changing political landscape. With the rise of the internet, it is now possible to learn the thoughts and feelings of thousands of people in real time.
Much of the political discourse in Sweden is carried out on Twitter, which is a social media platform mainly intended for discussions. Since opinion polls aim to give an indication of how the people would vote, had they voted today, perhaps the opinions expressed on Twitter could be used to generate opinion polls? Currently, opinion polls are costly and labour-intensive to create and there is a demand for more frequent polls. This thesis explores the possibility of utilizing machine learning techniques to predict opinion polls.
The first step towards predicting opinion polls is to classify the political affiliation of individual tweets and then use these classifications to classify users. The classified users are then used to generate a political opinion distribution similar to an opinion poll. To classify the party affiliation of a tweet, Natural Language Processing (NLP) is used, a branch of machine learning concerned with building models that understand human language. By gathering real-time Twitter data and exploiting NLP techniques, we have all the ingredients needed to attempt to mine the opinions of people sharing political views on Twitter.
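The text does not specify how tweet-level classifications are turned into user labels and then into a poll-like distribution. One simple possibility, assumed here purely for illustration, is a majority vote over each user's predicted tweets followed by counting users per party. The function names, the data layout of (user, party) pairs, and the party codes are hypothetical.

```python
from collections import Counter, defaultdict


def classify_users(tweet_predictions):
    """Label each user with the party most often predicted for their tweets.

    tweet_predictions: iterable of (user_id, predicted_party) pairs.
    Ties go to the party first encountered for that user, because
    Counter.most_common keeps insertion order among equal counts.
    """
    per_user = defaultdict(Counter)
    for user_id, party in tweet_predictions:
        per_user[user_id][party] += 1
    return {user_id: counts.most_common(1)[0][0] for user_id, counts in per_user.items()}


def opinion_distribution(user_labels):
    """Share of classified users per party, analogous to an opinion poll."""
    counts = Counter(user_labels.values())
    total = sum(counts.values())
    return {party: n / total for party, n in counts.items()}


# Hypothetical tweet-level predictions from a trained classifier.
predictions = [
    ("u1", "S"), ("u1", "S"), ("u1", "M"),
    ("u2", "M"),
    ("u3", "SD"), ("u3", "SD"),
    ("u4", "S"),
]
users = classify_users(predictions)
print(users)                        # {'u1': 'S', 'u2': 'M', 'u3': 'SD', 'u4': 'S'}
print(opinion_distribution(users))  # {'S': 0.5, 'M': 0.25, 'SD': 0.25}
```

Counting users rather than tweets keeps a handful of very active accounts from dominating the distribution, which is closer in spirit to a poll of individuals; whether the thesis weights users this way is not stated.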
There are many ways to predict the sentiment of a tweet, but previous research suggests that neural networks are the most successful method for this task, and they are therefore used in this thesis. A neural network loosely mimics the structure of the human brain and is trained to predict the sentiment of a tweet by learning from a training dataset. On Twitter, people generally respond to other tweets, creating threads of related tweets. When these threads are used in the classification, the results show that the model classifies more tweets correctly. The model looks promising when predicting tweets hand-picked for testing. However, when applied to general political discussion, the results vary greatly: two predictions of the same tweet using the same model can yield wildly different results. This variance makes the method unpredictable and shows a need for further research in the field.
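A reply such as "Completely agree." carries no party signal on its own, which illustrates why thread context can help. The text does not describe exactly how the thread structure is exploited; the sketch below assumes one simple option, prepending a tweet's ancestors in the reply chain to its text before it is passed to the classifier. The `[SEP]` separator assumes a BERT-style tokenizer and, like the function name, the `max_depth` limit and the dictionary layout, is hypothetical.

```python
def thread_context(tweet_id, tweets, max_depth=3, sep=" [SEP] "):
    """Return a tweet's text preceded by up to max_depth ancestor tweets.

    tweets maps tweet_id -> (parent_id or None, text). Ancestors are
    ordered oldest first, so the result reads like the thread itself.
    The walk stops early if a parent is missing (e.g. a deleted tweet).
    """
    parent_id, text = tweets[tweet_id]
    ancestors = []
    while parent_id is not None and parent_id in tweets and len(ancestors) < max_depth:
        next_parent_id, parent_text = tweets[parent_id]
        ancestors.append(parent_text)
        parent_id = next_parent_id
    return sep.join(list(reversed(ancestors)) + [text])


# Hypothetical three-tweet thread where 3 replies to 2, which replies to 1.
tweets = {
    1: (None, "Fuel tax must be lowered."),
    2: (1, "Completely agree."),
    3: (2, "Exactly, it is about time."),
}
print(thread_context(3, tweets))
# Fuel tax must be lowered. [SEP] Completely agree. [SEP] Exactly, it is about time.
print(thread_context(3, tweets, max_depth=1))
# Completely agree. [SEP] Exactly, it is about time.
```

Limiting the depth keeps the input within a model's maximum sequence length; a deeper chain adds context but also more text unrelated to the reply itself.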
Since even humans sometimes struggle to understand the meaning of a text, it is no surprise that this is a difficult task for machines. A tweet may contain too little information to indicate party affiliation, even for a human reader. There is no question that mining political opinion on social media holds great potential in politics, but alternative ways of making sense of the opinions expressed on Twitter need to be explored further.

Please use this URL to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9126219

author

Gren, Alexander and Lundgren, Klara

supervisor

Andreas Jakobsson

organization

Mathematical Statistics

course

FMSM01 20231

year

2023

type

H2 - Master's Degree (Two Years)

subject

Mathematics and Statistics

publication/series

Master's Theses in Mathematical Sciences

report number

LUTFMS-3477-2023

ISSN

1404-6342

other publication id

2023:E36

language

English

id

9126219

date added to LUP

2023-06-22 10:08:55

date last changed

2023-07-03 13:47:14

@misc{9126219,
  abstract     = {{This thesis aims to explore the possibility of using deep learning techniques to mine opinions on Twitter, with the objective of predicting the
political opinion distribution in Sweden. Different methods of gathering and annotating training data are evaluated with the aim of achieving accurate and reliable predictions.
The models are quite successful at predicting test data, achieving F1-scores in the range of 70\% to 85\%. Some party divisions are found to be more difficult to classify than others.
It is hypothesized and validated that the context of the tweets can aid the classification process; in practice, this is done by exploiting the tweet thread structure.
When the models were used to predict the general political discussion on Twitter, the results show that the predictions are subject to large variance.
Different executions can yield wildly different results, and the predictions are therefore deemed not reliable enough to use as input to a regression relating the predicted opinion distributions to the opinion polls.
The underlying cause of the large variance is investigated, and the results suggest that the training data is too small or of too low quality, which causes the model to overfit and makes patterns hard to recognize.
A lexicon-based classification is carried out as a supplement, but no significant relation can be established between the predicted opinion and the opinion polls.
Furthermore, it is argued that the insufficient results might stem from the method itself: the Swedish political discussion might not be polarized enough to allow good classifications,
or the political discussion on Twitter might not be representative of the general opinion at all.}},
  author       = {{Gren, Alexander and Lundgren, Klara}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{Exploration of using Twitter data to predict Swedish political opinion polls with neural networks}},
  year         = {{2023}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
