LUP Student Papers

LUND UNIVERSITY LIBRARIES

Check me out im a 15 year old rapper

Falini, Victor LU and Karlberg, Viktor LU (2020) STAH11 20192
Department of Statistics
Abstract
Soundcloud is the world's third-largest music streaming platform. Despite this, the website is riddled with spam comments that degrade the user experience. This spam differs considerably in format and content from the email spam we are used to seeing, which most classic spam filters were developed to detect. On Soundcloud, 90% of spam comments are written by users who want to promote their own music, and these comments contain plenty of slang and spelling errors. Our aim is to develop a machine learning model that predicts spam in comments on this platform with high accuracy. We take several preprocessing steps, such as removing stopwords and emojis, to facilitate the analysis. We use Python packages from the Natural Language Toolkit (NLTK) to process the data and train our algorithms. We test a Naïve Bayes classifier and two Support Vector Machines (with RBF and linear kernels) to see which performs best. Our results show that all three algorithms remain highly effective even when faced with this different kind of data. The Support Vector Machine with an RBF kernel performs best, classifying 97.3% of comments correctly in our test, but it is slower to train than Naïve Bayes, and the difference in predictive precision between the two is only 2%.
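The preprocessing and Naïve Bayes steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it uses only the standard library instead of NLTK, a tiny hypothetical stopword list, a rough emoji regex over common Unicode emoji blocks, and made-up example comments rather than the paper's data. Only a multinomial Naïve Bayes classifier with add-one smoothing is shown; the two SVM variants are not reproduced.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (the paper used NLTK
# resources; this list is an illustrative assumption, not NLTK's).
STOPWORDS = {"a", "an", "and", "the", "is", "it", "to", "of",
             "my", "on", "this", "i", "in", "for", "so"}

# Rough emoji matcher over common Unicode emoji blocks (an assumption;
# not an exhaustive definition of emoji).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def preprocess(comment: str) -> list[str]:
    """Lowercase, strip emojis, tokenize, and drop stopwords."""
    text = EMOJI_RE.sub(" ", comment.lower())
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOPWORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        label_counts = Counter(labels)
        # Log prior: fraction of training comments in each class.
        self.log_prior = {c: math.log(label_counts[c] / len(labels))
                          for c in self.classes}
        # Per-class word frequencies.
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        # Vocabulary: every word seen in any class.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, doc: list[str]) -> str:
        """Return the class with the highest log-posterior score."""
        v = len(self.vocab)

        def score(c: str) -> float:
            s = self.log_prior[c]
            for tok in doc:
                if tok in self.vocab:  # words unseen in training are skipped
                    s += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
            return s

        return max(self.classes, key=score)


# Made-up example comments for illustration only (not the paper's data).
TRAIN = [
    ("check out my new track on my page", "spam"),
    ("follow me for more beats 🔥🔥", "spam"),
    ("new mixtape out now check my profile", "spam"),
    ("love the bassline in the second verse", "ham"),
    ("this melody is so relaxing", "ham"),
    ("great vocals, the chorus is beautiful", "ham"),
]

if __name__ == "__main__":
    model = NaiveBayes().fit([preprocess(t) for t, _ in TRAIN],
                             [label for _, label in TRAIN])
    print(model.predict(preprocess("Check out my new album 🎵")))       # spam
    print(model.predict(preprocess("beautiful melody, great vocals")))  # ham
```

Training here is a single counting pass over the data, which is one reason Naïve Bayes tends to train faster than a kernel SVM, whose training solves a quadratic optimization problem. A comparison with RBF and linear SVMs would typically rely on a library implementation, for example scikit-learn's `SVC(kernel="rbf")` and `SVC(kernel="linear")`.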
author
Falini, Victor LU and Karlberg, Viktor LU
organization
Department of Statistics
alternative title
Spamklassificering av Soundcloud-kommentarer med Naïve Bayes och SVM (English: Spam classification of Soundcloud comments with Naïve Bayes and SVM)
course
STAH11 20192
year
2020
type
M2 - Bachelor Degree
keywords
Spam Classification, Soundcloud, Naive Bayes, Support Vector Machine
language
Swedish
id
9002967
date added to LUP
2020-05-12 09:22:18
date last changed
2020-05-12 09:22:18
@misc{9002967,
  abstract     = {{Soundcloud is the world's third-largest music streaming platform. Despite this, the website is riddled with spam comments that degrade the user experience. This spam differs considerably in format and content from the email spam we are used to seeing, which most classic spam filters were developed to detect. On Soundcloud, 90% of spam comments are written by users who want to promote their own music, and these comments contain plenty of slang and spelling errors. Our aim is to develop a machine learning model that predicts spam in comments on this platform with high accuracy. We take several preprocessing steps, such as removing stopwords and emojis, to facilitate the analysis. We use Python packages from the Natural Language Toolkit (NLTK) to process the data and train our algorithms. We test a Naïve Bayes classifier and two Support Vector Machines (with RBF and linear kernels) to see which performs best. Our results show that all three algorithms remain highly effective even when faced with this different kind of data. The Support Vector Machine with an RBF kernel performs best, classifying 97.3% of comments correctly in our test, but it is slower to train than Naïve Bayes, and the difference in predictive precision between the two is only 2%.}},
  author       = {{Falini, Victor and Karlberg, Viktor}},
  language     = {{swe}},
  note         = {{Student Paper}},
  title        = {{Check me out im a 15 year old rapper}},
  year         = {{2020}},
}