Text classification of short messages

Lundborg, Anton

Lundborg, Anton (LU) (2017). In LU-CS-EX 2017-14, EDA920 20171
Department of Computer Science

Abstract: Almost every large Swedish online newspaper has disabled comments under its articles because of problems with hateful and offensive comments. In this Master's thesis, we explore different ways to detect toxic comments using machine learning. We compare classification algorithms and evaluate a number of feature sets with the goal of optimizing accuracy for the classification of comments. We run the experiments on a manually labeled data set.

The best classifier was logistic regression, with an F-score of 0.47 and a recall of 0.50. We incorporated the classifier into a moderation tool for comments to help streamline the moderation process.
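To make the reported figures easier to interpret, the sketch below shows how an F-score combines precision and recall, and which precision the two reported numbers would imply. It assumes the reported F-score is the balanced F1 (the abstract does not say which variant was used); the function names f_score and implied_precision are illustrative and not taken from the thesis.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given F1 and R."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("no positive precision satisfies these values")
    return f1 * recall / denom


# Assumption: the reported F-score of 0.47 is a balanced F1.
p = implied_precision(0.47, 0.50)
print(round(p, 3))                 # 0.443
print(round(f_score(p, 0.50), 2))  # 0.47
```

Under that assumption, the reported values would correspond to a precision of roughly 0.44. This figure is derived here for illustration and is not stated in the abstract.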

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/8928009

author

Lundborg, Anton (LU)

supervisor

Pierre Nugues (LU)

organization

Department of Computer Science

alternative title

Detecting inappropriate comments in online user debates

course

EDA920 20171

year

2017

type

H3 - Professional qualifications (4 Years - )

subject

Technology and Engineering

keywords

text classification, machine learning, hate speech, linear classification, neural network, natural language processing, word2vec

publication/series

LU-CS-EX 2017-14

report number

LU-CS-EX 2017-14

ISSN

1650-2884

language

English

id

8928009

date added to LUP

2017-11-01 12:55:14

date last changed

2017-11-01 12:55:14

@misc{8928009,
  abstract     = {{Almost every large Swedish online newspaper has disabled comments under its articles because of problems with hateful and offensive comments. In this Master's thesis, we explore different ways to detect toxic comments using machine learning. We compare classification algorithms and evaluate a number of feature sets with the goal of optimizing accuracy for the classification of comments. We run the experiments on a manually labeled data set.

The best classifier was logistic regression, with an F-score of 0.47 and a recall of 0.50. We incorporated the classifier into a moderation tool for comments to help streamline the moderation process.}},
  author       = {{Lundborg, Anton}},
  issn         = {{1650-2884}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{LU-CS-EX 2017-14}},
  title        = {{Text classification of short messages}},
  year         = {{2017}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
