Age Identification of Twitter Users: Classification Methods and Sociolinguistic Analysis
(2018) 17th International Conference on Intelligent Text Processing and Computational Linguistics. In Lecture Notes in Computer Science (LNCS) 9624, pp. 385–395.
- Abstract
- In this article, we address the problem of identifying the age of Twitter users from their online text. We used a set of text-mining, sociolinguistic and content-related text features, and we evaluated a number of well-known and widely used machine learning classification algorithms to examine their suitability for this task. The experimental results showed that the Random Forest algorithm performed best, achieving an accuracy of 61%. We ranked the classification features by informativeness using the ReliefF algorithm, and we analyzed the results in terms of the sociolinguistic principles of age-related linguistic variation.
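The abstract mentions ranking features with ReliefF but does not say which implementation or settings were used. The following is therefore only a minimal, generic sketch of ReliefF (Kononenko's extension of Relief: k nearest hits and misses per sampled instance, misses weighted by class priors, range-normalised Manhattan distance) in plain Python. The function name `relieff`, its parameters `k`, `m` and `seed`, and the toy data are hypothetical illustrations, not the paper's features or code.

```python
import random


def relieff(X, y, k=3, m=None, seed=0):
    """Generic ReliefF sketch (hypothetical helper, not the authors' code).

    X: list of numeric feature rows; y: list of class labels (e.g. age groups).
    Returns one weight per feature; higher means more informative.
    """
    n, f = len(X), len(X[0])
    m = n if m is None else m  # number of sampled instances (must be <= n)
    # Feature ranges, used to scale per-feature differences into [0, 1].
    lo = [min(row[a] for row in X) for a in range(f)]
    hi = [max(row[a] for row in X) for a in range(f)]
    span = [(h - l) or 1.0 for l, h in zip(lo, hi)]

    def diff(a, i, j):
        return abs(X[i][a] - X[j][a]) / span[a]

    def dist(i, j):
        # Manhattan distance over range-normalised features.
        return sum(diff(a, i, j) for a in range(f))

    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    rng = random.Random(seed)
    w = [0.0] * f
    for i in rng.sample(range(n), m):
        # k nearest neighbours of instance i within every class, excluding i itself.
        near = {}
        for c in classes:
            cands = [j for j in range(n) if j != i and y[j] == c]
            near[c] = sorted(cands, key=lambda j: dist(i, j))[:k]
        for a in range(f):
            hits = near[y[i]]
            # Penalise features that differ between i and its same-class neighbours.
            hit = sum(diff(a, i, j) for j in hits) / len(hits) if hits else 0.0
            miss = 0.0
            for c in classes:
                if c == y[i] or not near[c]:
                    continue
                # Reward features that differ from other classes' neighbours,
                # weighting each class by its prior among the other classes.
                scale = prior[c] / (1.0 - prior[y[i]])
                miss += scale * sum(diff(a, i, j) for j in near[c]) / len(near[c])
            w[a] += (miss - hit) / m
    return w


if __name__ == "__main__":
    # Toy data (hypothetical): feature 0 separates the two groups, feature 1 is noise.
    X = [[0.0, 5], [0.1, 1], [0.2, 9], [0.9, 2], [1.0, 8], [0.8, 4]]
    y = ["young", "young", "young", "old", "old", "old"]
    w = relieff(X, y, k=2)
    ranking = sorted(range(len(w)), key=lambda a: -w[a])
    print(ranking)  # → [0, 1]
```

A positive weight means a feature differs more between nearby users of different classes than between nearby users of the same class; sorting features by descending weight gives an informativeness ranking of the kind the abstract describes. The neighbour search here is brute force (quadratic in the number of users), which suits illustration but not large corpora.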
Please use this URL to cite or link to this publication:
https://lup.lub.lu.se/record/5163ea45-40d8-4baa-9004-b51a8cd3cb38
- author: Simaki, Vasiliki; Mporas, Iosif; Megalooikonomou, Vasileios
- publishing date: 2018
- type: Chapter in Book/Report/Conference proceeding
- publication status: published
- keywords: Text mining, Age identification, Text classification, Computational Sociolinguistics, Sociolinguistics
- host publication: Computational Linguistics and Intelligent Text Processing: 17th International Conference, CICLing 2016, Konya, Turkey, April 3–9, 2016, Revised Selected Papers, Part II
- series title: Lecture Notes in Computer Science (LNCS)
- editor: Gelbukh, Alexander
- volume: 9624
- pages: 385–395
- publisher: Springer
- conference name: 17th International Conference on Intelligent Text Processing and Computational Linguistics
- conference location: Konya, Turkey
- conference dates: 2016-04-03 to 2016-04-09
- external identifiers: scopus:85044433086
- ISBN: 978-3-319-75487-1, 978-3-319-75486-4
- DOI: 10.1007/978-3-319-75487-1_30
- language: English
- LU publication: no
- id: 5163ea45-40d8-4baa-9004-b51a8cd3cb38
- date added to LUP: 2017-06-02 17:29:04
- date last changed: 2024-09-17 02:02:38
@inproceedings{5163ea45-40d8-4baa-9004-b51a8cd3cb38,
  abstract  = {{In this article, we address the problem of identifying the age of Twitter users from their online text. We used a set of text-mining, sociolinguistic and content-related text features, and we evaluated a number of well-known and widely used machine learning classification algorithms to examine their suitability for this task. The experimental results showed that the Random Forest algorithm performed best, achieving an accuracy of 61\%. We ranked the classification features by informativeness using the ReliefF algorithm, and we analyzed the results in terms of the sociolinguistic principles of age-related linguistic variation.}},
  author    = {{Simaki, Vasiliki and Mporas, Iosif and Megalooikonomou, Vasileios}},
  booktitle = {{Computational Linguistics and Intelligent Text Processing: 17th International Conference, CICLing 2016, Konya, Turkey, April 3–9, 2016, Revised Selected Papers, Part II}},
  editor    = {{Gelbukh, Alexander}},
  isbn      = {{978-3-319-75487-1}},
  keywords  = {{Text mining; Age identification; Text classification; Computational Sociolinguistics; Sociolinguistics}},
  language  = {{eng}},
  pages     = {{385--395}},
  publisher = {{Springer}},
  series    = {{Lecture Notes in Computer Science (LNCS)}},
  title     = {{Age Identification of Twitter Users: Classification Methods and Sociolinguistic Analysis}},
  url       = {{http://dx.doi.org/10.1007/978-3-319-75487-1_30}},
  doi       = {{10.1007/978-3-319-75487-1_30}},
  volume    = {{9624}},
  year      = {{2018}},
}