Lund University Publications

Using Sociolinguistic Inspired Features for Gender Classification of Web Authors

Simaki, Vasiliki LU; Aravantinou, Christina; Mporas, Iosif and Megalooikonomou, Vasileios (2015). In Lecture Notes in Computer Science 9302, p. 587-594
Abstract
In this article we present a methodology for classification of text from web authors, using sociolinguistic inspired text features. The proposed methodology uses a baseline text mining based feature set, which is combined with text features that quantify results from theoretical and sociolinguistic studies. Two combination approaches were evaluated and the evaluation results indicated a significant improvement in both combination cases. For the best performing combination approach the accuracy was 84.36%, in terms of percentage of correctly classified web posts.
author
Simaki, Vasiliki; Aravantinou, Christina; Mporas, Iosif and Megalooikonomou, Vasileios
publishing date
2015
type
Chapter in Book/Report/Conference proceeding
publication status
published
subject
keywords
text classification algorithms, sociolinguistics, gender identification
host publication
Text, Speech, and Dialogue : 18th International Conference, TSD 2015, Pilsen, Czech Republic, September 14-17, 2015, Proceedings
series title
Lecture Notes in Computer Science
editor
Král, Pavel and Matoušek, Václav
volume
9302
pages
587 - 594
publisher
Springer
external identifiers
  • scopus:84951770293
ISSN
0302-9743
1611-3349
ISBN
978-3-319-24032-9
978-3-319-24033-6
DOI
10.1007/978-3-319-24033-6_66
language
English
LU publication?
no
id
6b2f9200-be76-4c70-a0e8-aba4ad16b9a4
date added to LUP
2017-06-02 19:10:09
date last changed
2024-05-12 15:08:16
@inproceedings{6b2f9200-be76-4c70-a0e8-aba4ad16b9a4,
  abstract     = {{In this article we present a methodology for classification of text from web authors, using sociolinguistic inspired text features. The proposed methodology uses a baseline text mining based feature set, which is combined with text features that quantify results from theoretical and sociolinguistic studies. Two combination approaches were evaluated and the evaluation results indicated a significant improvement in both combination cases. For the best performing combination approach the accuracy was 84.36%, in terms of percentage of correctly classified web posts.}},
  author       = {{Simaki, Vasiliki and Aravantinou, Christina and Mporas, Iosif and Megalooikonomou, Vasileios}},
  booktitle    = {{Text, Speech, and Dialogue : 18th International Conference, TSD 2015, Pilsen, Czech Republic, September 14-17, 2015, Proceedings}},
  editor       = {{Král, Pavel and Matoušek, Václav}},
  isbn         = {{978-3-319-24032-9}},
  issn         = {{0302-9743}},
  keywords     = {{text classification algorithms; sociolinguistics; gender identification}},
  language     = {{eng}},
  pages        = {{587--594}},
  publisher    = {{Springer}},
  series       = {{Lecture Notes in Computer Science}},
  title        = {{Using Sociolinguistic Inspired Features for Gender Classification of Web Authors}},
  url          = {{http://dx.doi.org/10.1007/978-3-319-24033-6_66}},
  doi          = {{10.1007/978-3-319-24033-6_66}},
  volume       = {{9302}},
  year         = {{2015}},
}