Text classification of short messages
(2017) In LU-CS-EX 2017-14 EDA920 20171
Department of Computer Science
- Abstract
- Almost every large Swedish online newspaper has disabled comments under their articles due to problems with hateful and offensive comments. In this Master's thesis, we explore different ways to detect toxic comments using machine learning. We carry out a comparison of classification algorithms and evaluate a number of different feature sets with the goal of optimizing accuracy for the classification of comments. We carry out the experiment with a manually labeled data set.
The best classifier was logistic regression, with an F-score of 0.47 and a recall of 0.50. We incorporated the classifier into a moderation tool for comments to help streamline the moderation process.
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/8928009
- author
- Lundborg, Anton LU
- supervisor
- organization
- alternative title
- Detecting inappropriate comments in online user debates
- course
- EDA920 20171
- year
- 2017
- type
- H3 - Professional qualifications (4 Years - )
- subject
- keywords
- text classification, machine learning, hate speech, linear classification, neural network, natural language processing, word2vec
- publication/series
- LU-CS-EX 2017-14
- report number
- LU-CS-EX 2017-14
- ISSN
- 1650-2884
- language
- English
- id
- 8928009
- date added to LUP
- 2017-11-01 12:55:14
- date last changed
- 2017-11-01 12:55:14
@misc{8928009,
  abstract = {{Almost every large Swedish online newspaper has disabled comments under their articles due to problems with hateful and offensive comments. In this Master's thesis, we explore different ways to detect toxic comments using machine learning. We carry out a comparison of classification algorithms and evaluate a number of different feature sets with the goal of optimizing accuracy for the classification of comments. We carry out the experiment with a manually labeled data set. The best classifier was logistic regression with the f-score of 0.47 and recall of 0.50. We incorporated the classifier into a moderation tool for comments to help streamline the moderation process.}},
  author = {{Lundborg, Anton}},
  issn = {{1650-2884}},
  language = {{eng}},
  note = {{Student Paper}},
  series = {{LU-CS-EX 2017-14}},
  title = {{Text classification of short messages}},
  year = {{2017}},
}