
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Evolving Text Classifier Using Genetic Programming

Ngo, Hoang LU and Eminagic, Ema LU (2020) BMEM01 20202
Department of Biomedical Engineering
Abstract
Text classification is one of the main tasks within the field of natural language processing, which has been growing significantly during the last decade with applications in different industries. Despite different approaches to text classification showing good results, such as Machine Learning and Deep Learning, their shortcomings give substance to the need for further research on other approaches. In this thesis, we propose a genetic programming algorithm - a technique inspired by biological evolution, which is capable of producing text classification models by means of string matchers and character n-grams. This approach does not require domain-specific knowledge and manual feature engineering and can provide interpretability in the model. The performance of the classification models produced by the proposed algorithm gives promising results, especially on topic detection.
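To make the idea of a classifier built from string matchers and character n-grams more concrete, here is a minimal Python sketch. It is not the thesis's implementation: the rule-tree representation, the function names (`char_ngrams`, `matches`, `fitness`), and the toy data are all assumptions made for illustration. It shows how a small boolean tree of substring matchers can be evaluated and scored, which is the kind of candidate a genetic programming loop could evolve.

```python
from typing import List, Tuple, Union

# Hypothetical representation: a rule is either a character n-gram used as a
# substring matcher (leaf) or a tuple ("and" | "or" | "not", sub-rules...).
Rule = Union[str, tuple]


def char_ngrams(text: str, n: int) -> List[str]:
    """Return all contiguous character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def matches(rule: Rule, text: str) -> bool:
    """Evaluate a rule tree against a text."""
    if isinstance(rule, str):
        return rule in text  # leaf: plain substring match
    op = rule[0]
    if op == "and":
        return matches(rule[1], text) and matches(rule[2], text)
    if op == "or":
        return matches(rule[1], text) or matches(rule[2], text)
    if op == "not":
        return not matches(rule[1], text)
    raise ValueError(f"unknown operator: {op}")


def fitness(rule: Rule, data: List[Tuple[str, bool]]) -> float:
    """Fraction of labelled examples that the rule classifies correctly."""
    return sum(matches(rule, text) == label for text, label in data) / len(data)


# Toy labelled data (invented for this sketch): True marks spam.
data = [
    ("win free money now", True),
    ("free prize inside", True),
    ("meeting at noon", False),
    ("lunch is free today", False),
]

# A candidate rule that stays human-readable: contains "free" but not "lunch".
rule = ("and", "free", ("not", "lunch"))
```

In an evolutionary setting, leaves could be drawn from `char_ngrams` of the training texts, and trees such as `rule` would be mutated, recombined, and selected by `fitness`. Because the final model is a readable boolean expression over substrings, one can inspect why a given text was assigned to a class.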
Popular Abstract (translated from Swedish)
Evolving text classifiers using genetic programming

Genetic programming (GP) is an area of AI inspired by biological evolution. In this work, we use genetic programming to evolve text classifiers. This method offers many advantages that current technology lacks.

Text classification means categorizing text data according to predefined classes. Some common applications are spam detection in the email inbox, categorization of news articles, analysis of product reviews, and chatbots.

Text classification can be achieved with many different methods. Despite some successes, current technology has many shortcomings. The simpler methods require a lot of manual work. The more advanced methods, which are based on machine learning, require so-called features; features are numerical representations of a text. Producing high-quality features usually requires very good domain-specific knowledge. There are deep learning models that work without manually crafted features, but the complexity of these models makes them more or less black boxes. It is therefore not possible to interpret on what grounds the models have made their predictions.
author: Ngo, Hoang LU and Eminagic, Ema LU
course: BMEM01 20202
year: 2020
type: H2 - Master's Degree (Two Years)
keywords: genetic programming, evolutionary algorithm, evolutionary computation, text classification, n-grams, natural language processing
language: English
additional info: 2020-12
id: 9029205
date added to LUP: 2020-09-17 15:33:42
date last changed: 2020-09-17 15:33:42
@misc{9029205,
  abstract     = {{Text classification is one of the main tasks within the field of natural language processing, which has been growing significantly during the last decade with applications in different industries. Despite different approaches to text classification showing good results, such as Machine Learning and Deep Learning, their shortcomings give substance to the need for further research on other approaches. In this thesis, we propose a genetic programming algorithm - a technique inspired by biological evolution, which is capable of producing text classification models by means of string matchers and character n-grams. This approach does not require domain-specific knowledge and manual feature engineering and can provide interpretability in the model. The performance of the classification models produced by the proposed algorithm gives promising results, especially on topic detection.}},
  author       = {{Ngo, Hoang and Eminagic, Ema}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Evolving Text Classifier Using Genetic Programming}},
  year         = {{2020}},
}