Evolving Text Classifier Using Genetic Programming
(2020) BMEM01 20202 Department of Biomedical Engineering
- Abstract
- Text classification is one of the main tasks in natural language processing, a field that has grown significantly during the last decade and has applications across many industries. Although established approaches to text classification, such as machine learning and deep learning, show good results, their shortcomings motivate further research into alternative approaches. In this thesis, we propose a genetic programming algorithm (genetic programming being a technique inspired by biological evolution) that produces text classification models built from string matchers and character n-grams. The approach requires neither domain-specific knowledge nor manual feature engineering, and it can yield interpretable models. The classification models produced by the proposed algorithm show promising performance, especially on topic detection.
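The record does not describe how the evolved models are represented. As a rough illustration only, the sketch below assumes that an individual is a boolean tree whose leaves are string matchers testing whether a character n-gram occurs in the text, and whose internal nodes are AND, OR and NOT. All names (`char_ngrams`, `Match`, `random_tree`, `mutate`, `evolve`) are hypothetical, and the truncation selection and subtree mutation are generic genetic programming choices assumed here, not necessarily the thesis's method; crossover is omitted for brevity.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple, Union

def char_ngrams(text: str, n: int) -> List[str]:
    """Return all overlapping character n-grams of the lower-cased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# Hypothetical node types: a GP individual is a boolean tree of string matchers.
@dataclass
class Match:
    ngram: str  # leaf: true if this n-gram occurs anywhere in the text

    def eval(self, text: str) -> bool:
        return self.ngram in text.lower()

@dataclass
class And:
    left: "Node"
    right: "Node"

    def eval(self, text: str) -> bool:
        return self.left.eval(text) and self.right.eval(text)

@dataclass
class Or:
    left: "Node"
    right: "Node"

    def eval(self, text: str) -> bool:
        return self.left.eval(text) or self.right.eval(text)

@dataclass
class Not:
    child: "Node"

    def eval(self, text: str) -> bool:
        return not self.child.eval(text)

Node = Union[Match, And, Or, Not]
Dataset = List[Tuple[str, bool]]  # (text, belongs to the target class?)

def random_tree(terminals: List[str], depth: int, rng: random.Random) -> Node:
    """Grow a random tree whose leaves are n-grams drawn from the terminal set."""
    if depth == 0 or rng.random() < 0.3:
        return Match(rng.choice(terminals))
    op = rng.choice(["and", "or", "not"])
    if op == "not":
        return Not(random_tree(terminals, depth - 1, rng))
    cls = And if op == "and" else Or
    return cls(random_tree(terminals, depth - 1, rng),
               random_tree(terminals, depth - 1, rng))

def fitness(tree: Node, data: Dataset) -> float:
    """Fraction of examples classified correctly (training accuracy)."""
    return sum(tree.eval(text) == label for text, label in data) / len(data)

def mutate(tree: Node, terminals: List[str], rng: random.Random, depth: int = 2) -> Node:
    """Subtree mutation: replace one randomly chosen subtree with a new random tree."""
    if isinstance(tree, Match) or rng.random() < 0.3:
        return random_tree(terminals, depth, rng)
    if isinstance(tree, Not):
        return Not(mutate(tree.child, terminals, rng, depth))
    if rng.random() < 0.5:
        return type(tree)(mutate(tree.left, terminals, rng, depth), tree.right)
    return type(tree)(tree.left, mutate(tree.right, terminals, rng, depth))

def evolve(data: Dataset, n: int = 3, pop_size: int = 50,
           generations: int = 30, seed: int = 0) -> Node:
    rng = random.Random(seed)
    # Terminal set: every character n-gram seen in the training texts.
    terminals = sorted({g for text, _ in data for g in char_ngrams(text, n)})
    population = [random_tree(terminals, 3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, data), reverse=True)
        survivors = population[:pop_size // 2]  # truncation selection keeps the best half
        children = [mutate(rng.choice(survivors), terminals, rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=lambda t: fitness(t, data))

if __name__ == "__main__":
    data = [
        ("The striker scored a late goal", True),
        ("Goalkeeper saves penalty in the final", True),
        ("Central bank raises interest rates", False),
        ("Stock markets fall on inflation fears", False),
    ]
    best = evolve(data)
    print(best)  # the tree itself shows which n-grams drive a prediction
    print("training accuracy:", fitness(best, data))
```

Because the evolved model is just nested `Match`, `And`, `Or` and `Not` nodes, printing it shows which n-grams a prediction depends on, which illustrates the kind of interpretability the abstract refers to. Keeping the best half of each generation means the best training accuracy never decreases; a fuller system would likely also need crossover, a limit on tree size to curb bloat, and evaluation on held-out data.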
- Popular Abstract (translated from Swedish)
- Evolving text classifiers using genetic programming
Genetic programming (GP) is a field within AI that is inspired by biological evolution. In this work, we use genetic programming to evolve text classifiers. This method offers many advantages that current techniques lack.
Text classification means categorizing text data into predefined classes. Some common applications are detecting spam in email inboxes, categorizing news articles, analyzing product reviews, and chatbots.
Text classification can be achieved with many different methods. Despite some successes, current techniques have many shortcomings. The simpler methods require a lot of manual work. The more advanced methods, based on machine learning, require so-called features, which are numerical representations of a text. Producing high-quality features usually requires extensive domain-specific knowledge. There are deep learning models that work without manually engineered features, but their complexity makes them more or less black boxes: it is not possible to interpret on what grounds these models have made their predictions.
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9029205
- author
- Ngo, Hoang and Eminagic, Ema
- course
- BMEM01 20202
- year
- 2020
- type
- H2 - Master's Degree (Two Years)
- keywords
- genetic programming, evolutionary algorithm, evolutionary computation, text classification, n-grams, natural language processing
- language
- English
- additional info
- 2020-12
- id
- 9029205
- date added to LUP
- 2020-09-17 15:33:42
- date last changed
- 2020-09-17 15:33:42
@misc{9029205,
  abstract = {{Text classification is one of the main tasks within the field of natural language processing, which has been growing significantly during the last decade with applications in different industries. Despite different approaches to text classification showing good results, such as Machine Learning and Deep Learning, their shortcomings give substance to the need for further research on other approaches. In this thesis, we propose a genetic programming algorithm - a technique inspired by biological evolution, which is capable of producing text classification models by means of string matchers and character n-grams. This approach does not require domain-specific knowledge and manual feature engineering and can provide interpretability in the model. The performance of the classification models produced by the proposed algorithm gives promising results, especially on topic detection.}},
  author = {{Ngo, Hoang and Eminagic, Ema}},
  language = {{eng}},
  note = {{Student Paper}},
  title = {{Evolving Text Classifier Using Genetic Programming}},
  year = {{2020}},
}