
Identifying the Authors’ National Variety of English in Social Media Texts

Simaki, Vasiliki ; Simakis, Panagiotis ; Paradis, Carita and Kerren, Andreas (2017) The 11th Biennial Conference on Recent Advances In Natural Language Processing (RANLP '17), 2-8 September 2017, Varna, Bulgaria, p. 671-678
Abstract
In this paper, we present a study for the identification of authors’ national variety of English in texts from social media. In data from Facebook and Twitter, information about the author’s social profile is annotated, and the national English variety (US, UK, AUS, CAN, NNS) that each author uses is attributed. We tested four feature types: formal linguistic features, POS features, lexicon-based features related to the different varieties, and data-based features from each English variety. We used various machine learning algorithms for the classification experiments, and we implemented a feature selection process. The classification accuracy achieved, when the 31 highest ranked features were used, was up to 77.32%. The experimental results are evaluated, and the efficacy of the ranked features discussed.
author
Simaki, Vasiliki ; Simakis, Panagiotis ; Paradis, Carita and Kerren, Andreas
publishing date
2017
type
Chapter in Book/Report/Conference proceeding
publication status
published
host publication
Recent Advances in Natural Language Processing : Proceedings
editor
Angelova, Galia ; Bontcheva, Kalina ; Mitkov, Ruslan ; Nikolova, Ivelina and Temnikova, Irina
pages
671 - 678
publisher
Association for Computational Linguistics
conference name
The 11th Biennial Conference on Recent Advances In Natural Language Processing (RANLP '17), 2-8 September 2017, Varna, Bulgaria
conference location
Varna, Bulgaria
conference dates
2017-09-02 - 2017-09-08
ISBN
978-954-452-048-9
978-954-452-049-6
DOI
10.26615/978-954-452-049-6_086
language
English
LU publication?
yes
id
d6aee74b-cf44-485e-aaaf-9c7f9b1947d9
date added to LUP
2017-08-24 13:15:57
date last changed
2019-12-22 04:00:10
@inproceedings{d6aee74b-cf44-485e-aaaf-9c7f9b1947d9,
  abstract     = {In this paper, we present a study for the identification of authors’ national variety of English in texts from social media. In data from Facebook and Twitter, information about the author’s social profile is annotated, and the national English variety (US, UK, AUS, CAN, NNS) that each author uses is attributed. We tested four feature types: formal linguistic features, POS features, lexicon-based features related to the different varieties, and data-based features from each English variety. We used various machine learning algorithms for the classification experiments, and we implemented a feature selection process. The classification accuracy achieved, when the 31 highest ranked features were used, was up to 77.32%. The experimental results are evaluated, and the efficacy of the ranked features discussed.},
  author       = {Simaki, Vasiliki and Simakis, Panagiotis and Paradis, Carita and Kerren, Andreas},
  booktitle    = {Recent Advances in Natural Language Processing : Proceedings},
  editor       = {Angelova, Galia and Bontcheva, Kalina and Mitkov, Ruslan and Nikolova, Ivelina and Temnikova, Irina},
  isbn         = {978-954-452-048-9},
  language     = {eng},
  pages        = {671--678},
  publisher    = {Association for Computational Linguistics},
  title        = {Identifying the Authors’ National Variety of English in Social Media Texts},
  url          = {http://dx.doi.org/10.26615/978-954-452-049-6_086},
  doi          = {10.26615/978-954-452-049-6_086},
  year         = {2017},
}