Age Identification of Twitter Users : Classification Methods and Sociolinguistic Analysis

Simaki, Vasiliki; Mporas, Iosif; Megalooikonomou, Vasileios

Simaki, Vasiliki; Mporas, Iosif and Megalooikonomou, Vasileios (2018). 17th International Conference on Intelligent Text Processing and Computational Linguistics. In Lecture Notes in Computer Science (LNCS) 9624, p. 385-395

Abstract: In this article, we address the problem of identifying the age of Twitter users from their online text. We used a set of text mining, sociolinguistic-based and content-related text features, and we evaluated a number of well-known and widely used machine learning classification algorithms to examine their suitability for this task. The experimental results showed that the Random Forest algorithm offered the best performance, achieving an accuracy of 61%. We ranked the classification features by their informativeness, using the ReliefF algorithm, and we analyzed the results in terms of the sociolinguistic principles of age-related linguistic variation.

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/5163ea45-40d8-4baa-9004-b51a8cd3cb38

author

Simaki, Vasiliki; Mporas, Iosif and Megalooikonomou, Vasileios

publishing date

2018

type

Chapter in Book/Report/Conference proceeding

publication status

published

subject

General Language Studies and Linguistics

keywords

Text mining, Age identification, Text classification, Computational Sociolinguistics, Sociolinguistics

host publication

Computational Linguistics and Intelligent Text Processing : 17th International Conference, CICLing 2016, Konya, Turkey, April 3–9, 2016, Revised Selected Papers, Part II

series title

Lecture Notes in Computer Science (LNCS)

editor

Gelbukh, Alexander

volume

9624

pages

385 - 395

publisher

Springer

conference name

17th International Conference on Intelligent Text Processing and Computational Linguistics

conference location

Konya, Turkey

conference dates

2016-04-03 - 2016-04-09

external identifiers

scopus:85044433086

ISBN

978-3-319-75487-1

978-3-319-75486-4

DOI

10.1007/978-3-319-75487-1_30

language

English

LU publication?

no

id

5163ea45-40d8-4baa-9004-b51a8cd3cb38

date added to LUP

2017-06-02 17:29:04

date last changed

2024-09-17 02:02:38

@inproceedings{5163ea45-40d8-4baa-9004-b51a8cd3cb38,
  abstract     = {{In this article, we address the problem of identifying the age of Twitter users from their online text. We used a set of text mining, sociolinguistic-based and content-related text features, and we evaluated a number of well-known and widely used machine learning classification algorithms to examine their suitability for this task. The experimental results showed that the Random Forest algorithm offered the best performance, achieving an accuracy of 61%. We ranked the classification features by their informativeness, using the ReliefF algorithm, and we analyzed the results in terms of the sociolinguistic principles of age-related linguistic variation.}},
  author       = {{Simaki, Vasiliki and Mporas, Iosif and Megalooikonomou, Vasileios}},
  booktitle    = {{Computational Linguistics and Intelligent Text Processing : 17th International Conference, CICLing 2016, Konya, Turkey, April 3–9, 2016, Revised Selected Papers, Part II}},
  editor       = {{Gelbukh, Alexander}},
  isbn         = {{978-3-319-75487-1}},
  keywords     = {{Text mining; Age identification; Text classification; Computational Sociolinguistics; Sociolinguistics}},
  language     = {{eng}},
  pages        = {{385--395}},
  publisher    = {{Springer}},
  series       = {{Lecture Notes in Computer Science (LNCS)}},
  title        = {{Age Identification of Twitter Users : Classification Methods and Sociolinguistic Analysis}},
  url          = {{http://dx.doi.org/10.1007/978-3-319-75487-1_30}},
  doi          = {{10.1007/978-3-319-75487-1_30}},
  volume       = {{9624}},
  year         = {{2018}},
}

Lund University Publications
