Text classification of short messages

Lundborg, Anton LU (2017) In LU-CS-EX 2017-14 EDA920 20171
Department of Computer Science
Abstract
Almost every large Swedish online newspaper has disabled comments under its articles due to problems with hateful and offensive comments. In this Master's thesis, we explore different ways to detect toxic comments using machine learning. We compare classification algorithms and evaluate a number of different feature sets with the goal of optimizing accuracy for the classification of comments. The experiments use a manually labeled data set.

The best classifier was logistic regression, with an F-score of 0.47 and a recall of 0.50. We incorporated the classifier into a moderation tool for comments to help streamline the moderation process.
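
The abstract reports an F-score and a recall but does not say how they were computed. As a minimal sketch, assuming the F-score is the usual F1 (the harmonic mean of precision and recall) measured on the toxic class, the two metrics relate as shown below. The function name `precision_recall_f1` and the labels are hypothetical and are not taken from the thesis.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class labeled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up labels for illustration only: 1 = toxic comment, 0 = acceptable comment.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

If the reported figure is indeed F1, an F-score of 0.47 together with a recall of 0.50 would correspond to a precision of roughly 0.44.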
author: Lundborg, Anton LU
alternative title: Detecting inappropriate comments in online user debates
course: EDA920 20171
year: 2017
type: H3 - Professional qualifications (4 Years - )
keywords: text classification, Machine learning, hate speech, Linear classification, neural network, natural language processing, word2vec
publication/series: LU-CS-EX 2017-14
report number: LU-CS-EX 2017-14
ISSN: 1650-2884
language: English
id: 8928009
date added to LUP: 2017-11-01 12:55:14
date last changed: 2017-11-01 12:55:14
@misc{8928009,
  abstract     = {Almost every large Swedish online newspaper has disabled comments under its articles due to problems with hateful and offensive comments. In this Master's thesis, we explore different ways to detect toxic comments using machine learning. We compare classification algorithms and evaluate a number of different feature sets with the goal of optimizing accuracy for the classification of comments. The experiments use a manually labeled data set.

The best classifier was logistic regression, with an F-score of 0.47 and a recall of 0.50. We incorporated the classifier into a moderation tool for comments to help streamline the moderation process.},
  author       = {Lundborg, Anton},
  issn         = {1650-2884},
  keyword      = {text classification,Machine learning,hate speech,Linear classification,neural network,natural language processing,word2vec},
  language     = {eng},
  note         = {Student Paper},
  series       = {LU-CS-EX 2017-14},
  title        = {Text classification of short messages},
  year         = {2017},
}