Lund University Publications

The Text-Package : An R-Package for Analyzing and Visualizing Human Language Using Natural Language Processing and Transformers

Kjell, Oscar; Giorgi, Salvatore and Schwartz, H. Andrew (2023) In Psychological Methods 28(6). p. 1478-1498
Abstract

The language that individuals use for expressing themselves contains rich psychological information. Recent significant advances in Natural Language Processing (NLP) and Deep Learning (DL), namely transformers, have resulted in large performance gains in tasks related to understanding natural language. However, these state-of-the-art methods have not yet been made easily accessible for psychology researchers, nor designed to be optimal for human-level analyses. This tutorial introduces text (https://r-text.org/), a new R-package for analyzing and visualizing human language using transformers, the latest techniques from NLP and DL. The text-package is both a modular solution for accessing state-of-the-art language models and an end-to-end solution catered for human-level analyses. Hence, text provides user-friendly functions tailored to test hypotheses in social sciences for both relatively small and large data sets. The tutorial describes methods for analyzing text, providing functions with reliable defaults that can be used off-the-shelf as well as providing a framework for the advanced users to build on for novel pipelines. The reader learns about three core methods: (1) textEmbed(): to transform text to modern transformer-based word embeddings; (2) textTrain() and textPredict(): to train predictive models with embeddings as input, and use the models to predict from; (3) textSimilarity() and textDistance(): to compute semantic similarity/distance scores between texts.
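
As a conceptual illustration of the third method, one common way to score semantic similarity between two texts is the cosine of the angle between their embedding vectors, with distance taken as one minus that value. The sketch below is written in Python rather than R and is not the text-package's implementation. The function names `cosine_similarity` and `cosine_distance` are hypothetical, and the three-dimensional vectors are made-up stand-ins for real transformer embeddings, which typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Return one minus the cosine similarity (0 means same direction)."""
    return 1.0 - cosine_similarity(u, v)


# Toy stand-ins for text embeddings (assumed values, not model output).
happy = [0.9, 0.1, 0.3]
joyful = [0.8, 0.2, 0.4]
sad = [-0.7, 0.6, 0.1]

# Vectors pointing in similar directions score higher than dissimilar ones.
print(cosine_similarity(happy, joyful))
print(cosine_similarity(happy, sad))
print(cosine_distance(happy, sad))
```

In this toy setup the first pair should score higher than the second, which is the property that similarity-based analyses rely on.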

author: Kjell, Oscar; Giorgi, Salvatore and Schwartz, H. Andrew
publishing date: 2023
type: Contribution to journal
publication status: published
keywords: #Rtext, computational language assessments, machine learning, Natural Language Processing, transformers
in: Psychological Methods
volume: 28
issue: 6
pages: 21 pages
publisher: American Psychological Association (APA)
external identifiers:
  • scopus:85158155891
  • pmid:37126041
ISSN: 1082-989X
DOI: 10.1037/met0000542
language: English
LU publication?: yes
id: 5700244d-bda3-4f67-84a4-46ff5943a4e6
date added to LUP: 2023-08-16 08:25:01
date last changed: 2024-12-16 01:15:46
@article{5700244d-bda3-4f67-84a4-46ff5943a4e6,
  abstract     = {{The language that individuals use for expressing themselves contains rich psychological information. Recent significant advances in Natural Language Processing (NLP) and Deep Learning (DL), namely transformers, have resulted in large performance gains in tasks related to understanding natural language. However, these state-of-the-art methods have not yet been made easily accessible for psychology researchers, nor designed to be optimal for human-level analyses. This tutorial introduces text (https://r-text.org/), a new R-package for analyzing and visualizing human language using transformers, the latest techniques from NLP and DL. The text-package is both a modular solution for accessing state-of-the-art language models and an end-to-end solution catered for human-level analyses. Hence, text provides user-friendly functions tailored to test hypotheses in social sciences for both relatively small and large data sets. The tutorial describes methods for analyzing text, providing functions with reliable defaults that can be used off-the-shelf as well as providing a framework for the advanced users to build on for novel pipelines. The reader learns about three core methods: (1) textEmbed(): to transform text to modern transformer-based word embeddings; (2) textTrain() and textPredict(): to train predictive models with embeddings as input, and use the models to predict from; (3) textSimilarity() and textDistance(): to compute semantic similarity/distance scores between texts.}},
  author       = {{Kjell, Oscar and Giorgi, Salvatore and Schwartz, H. Andrew}},
  issn         = {{1082-989X}},
  keywords     = {{#Rtext; computational language assessments; machine learning; Natural Language Processing; transformers}},
  language     = {{eng}},
  number       = {{6}},
  pages        = {{1478--1498}},
  publisher    = {{American Psychological Association (APA)}},
  journal      = {{Psychological Methods}},
  title        = {{The Text-Package : An R-Package for Analyzing and Visualizing Human Language Using Natural Language Processing and Transformers}},
  url          = {{http://dx.doi.org/10.1037/met0000542}},
  doi          = {{10.1037/met0000542}},
  volume       = {{28}},
  year         = {{2023}},
}