
Age Identification of Twitter Users : Classification Methods and Sociolinguistic Analysis

Simaki, Vasiliki ; Mporas, Iosif and Megalooikonomou, Vasileios (2018) 17th International Conference on Intelligent Text Processing and Computational Linguistics. In Lecture Notes in Computer Science (LNCS) 9624, p. 385-395
Abstract
In this article, we address the problem of identifying the age of Twitter users from their online text. We used a set of text mining, sociolinguistic-based and content-related text features, and we evaluated a number of well-known and widely used machine learning classification algorithms to examine their suitability for this task. The experimental results showed that the Random Forest algorithm offered the best performance, achieving an accuracy of 61%. We ranked the classification features by their informativeness using the ReliefF algorithm, and we analyzed the results in terms of the sociolinguistic principles of age-related linguistic variation.
author
Simaki, Vasiliki ; Mporas, Iosif and Megalooikonomou, Vasileios
publishing date
2018
type
Chapter in Book/Report/Conference proceeding
publication status
published
keywords
Text mining, Age identification, Text classification, Computational Sociolinguistics, Sociolinguistics
host publication
Computational Linguistics and Intelligent Text Processing : 17th International Conference, CICLing 2016, Konya, Turkey, April 3–9, 2016, Revised Selected Papers, Part II
series title
Lecture Notes in Computer Science (LNCS)
editor
Gelbukh, Alexander
volume
9624
pages
385 - 395
publisher
Springer
conference name
17th International Conference on Intelligent Text Processing and Computational Linguistics
conference location
Konya, Turkey
conference dates
2016-04-03 - 2016-04-09
external identifiers
  • scopus:85044433086
ISBN
978-3-319-75486-4
978-3-319-75487-1
DOI
10.1007/978-3-319-75487-1_30
language
English
LU publication?
no
id
5163ea45-40d8-4baa-9004-b51a8cd3cb38
date added to LUP
2017-06-02 17:29:04
date last changed
2020-01-12 23:46:08
@inproceedings{5163ea45-40d8-4baa-9004-b51a8cd3cb38,
  abstract     = {In this article, we address the problem of identifying the age of Twitter users from their online text. We used a set of text mining, sociolinguistic-based and content-related text features, and we evaluated a number of well-known and widely used machine learning classification algorithms to examine their suitability for this task. The experimental results showed that the Random Forest algorithm offered the best performance, achieving an accuracy of 61%. We ranked the classification features by their informativeness using the ReliefF algorithm, and we analyzed the results in terms of the sociolinguistic principles of age-related linguistic variation.},
  author       = {Simaki, Vasiliki and Mporas, Iosif and Megalooikonomou, Vasileios},
  booktitle    = {Computational Linguistics and Intelligent Text Processing : 17th International Conference, CICLing 2016, Konya, Turkey, April 3–9, 2016, Revised Selected Papers, Part II},
  editor       = {Gelbukh, Alexander},
  isbn         = {978-3-319-75486-4},
  language     = {eng},
  pages        = {385--395},
  publisher    = {Springer},
  series       = {Lecture Notes in Computer Science (LNCS)},
  title        = {Age Identification of Twitter Users : Classification Methods and Sociolinguistic Analysis},
  url          = {http://dx.doi.org/10.1007/978-3-319-75487-1_30},
  doi          = {10.1007/978-3-319-75487-1_30},
  volume       = {9624},
  year         = {2018},
}