Check me out im a 15 year old rapper
(2020) STAH11 20192, Department of Statistics
- Abstract
- Soundcloud is the world's third-largest streaming platform for music. Despite this,
the website is riddled with spam comments that disrupt the user experience. This spam
differs considerably in format and content from the email spam we are used to seeing,
which most classic spam filters were developed to detect. On Soundcloud, 90% of spam
comments are written by users who want to promote their own music, and these comments
contain plenty of slang and spelling errors. Our aim is to develop a machine learning
model that predicts spam with high accuracy for comments on this platform. To facilitate
analysis, we preprocess the data in several steps, such as removing stopwords and emojis.
We use Python packages from the Natural Language Processing toolkit to process the data
and train our algorithms. We test a Naïve Bayes algorithm and two Support Vector Machines
(with RBF and linear kernels) to see which performs best. Our results show that all three
algorithms remain highly effective even when detecting spam in this different kind of
data. The Support Vector Machine with the RBF kernel performs best, correctly classifying
97.3% of the comments in our test, but it is slower to train than Naïve Bayes, with a
difference of only 2% in predictive precision.
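The preprocessing and Naïve Bayes steps described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the stopword list, the emoji regular expression, the choice of a multinomial Naïve Bayes with add-one smoothing, and the toy "spam"/"ham" (non-spam) comments are all hypothetical, and the standard library stands in for the toolkit packages the authors used.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for illustration only; the paper used
# toolkit resources whose exact list is not reproduced here.
STOPWORDS = {"a", "an", "and", "the", "is", "on", "my", "to", "of", "this", "it", "i", "im"}

# Rough emoji filter (assumption): strips characters in common pictograph and
# symbol blocks rather than implementing the full Unicode emoji definition.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def preprocess(comment: str) -> list[str]:
    """Lowercase a comment, drop emojis, tokenize it, and remove stopwords."""
    text = EMOJI_RE.sub(" ", comment.lower())
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOPWORDS]


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (assumed variant)."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior: dict[str, float] = {}
        self.log_lik: dict[str, dict[str, float]] = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(tok for d in class_docs for tok in d)
            # Smoothed P(token | class); Counter returns 0 for tokens absent from class c.
            denom = sum(counts.values()) + len(self.vocab)
            self.log_lik[c] = {t: math.log((counts[t] + 1) / denom) for t in self.vocab}
        return self

    def predict(self, doc: list[str]) -> str:
        def log_posterior(c: str) -> float:
            # Tokens never seen during training carry no evidence and are skipped.
            return self.log_prior[c] + sum(
                self.log_lik[c][t] for t in doc if t in self.vocab
            )

        return max(self.classes, key=log_posterior)


# Tiny hypothetical training set; these comments are not data from the paper.
train = [
    ("check out my new track on my page 🔥", "spam"),
    ("follow me and listen to my mixtape", "spam"),
    ("sub to my channel for new beats", "spam"),
    ("this beat is so smooth, love the vocals", "ham"),
    ("the drop at 1:20 is insane", "ham"),
    ("love this song, great production", "ham"),
]
model = MultinomialNB().fit(
    [preprocess(text) for text, _ in train], [label for _, label in train]
)
print(model.predict(preprocess("check out my new mixtape")))  # prints "spam"
```

Replacing this classifier with a linear or RBF-kernel Support Vector Machine would first require turning the token lists into numeric feature vectors (for example, bag-of-words counts); that step is not shown here.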
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9002967
- author
- Falini, Victor and Karlberg, Viktor
- alternative title
- Spamklassificering av Soundcloud-kommentarer med Naïve Bayes och SVM (English: Spam classification of Soundcloud comments with Naïve Bayes and SVM)
- course
- STAH11 20192
- year
- 2020
- type
- M2 - Bachelor Degree
- keywords
- Spam Classification, Soundcloud, Naive Bayes, Support Vector Machine
- language
- Swedish
- id
- 9002967
- date added to LUP
- 2020-05-12 09:22:18
- date last changed
- 2020-05-12 09:22:18
@misc{9002967,
  abstract     = {{Soundcloud is the world's third-largest streaming platform for music. Despite this, the website is riddled with spam comments that disrupt the user experience. This spam differs considerably in format and content from the email spam we are used to seeing, which most classic spam filters were developed to detect. On Soundcloud, 90\% of spam comments are written by users who want to promote their own music, and these comments contain plenty of slang and spelling errors. Our aim is to develop a machine learning model that predicts spam with high accuracy for comments on this platform. To facilitate analysis, we preprocess the data in several steps, such as removing stopwords and emojis. We use Python packages from the Natural Language Processing toolkit to process the data and train our algorithms. We test a Naïve Bayes algorithm and two Support Vector Machines (with RBF and linear kernels) to see which performs best. Our results show that all three algorithms remain highly effective even when detecting spam in this different kind of data. The Support Vector Machine with the RBF kernel performs best, correctly classifying 97.3\% of the comments in our test, but it is slower to train than Naïve Bayes, with a difference of only 2\% in predictive precision.}},
  author       = {{Falini, Victor and Karlberg, Viktor}},
  language     = {{swe}},
  note         = {{Student Paper}},
  title        = {{Check me out im a 15 year old rapper}},
  year         = {{2020}},
}