Lund University Publications

Measuring novelty in science with word embedding

Shibayama, Sotaro (LU); Yin, Deyun and Matsumoto, Kuniko (2021) In PLoS ONE 16(7).
Abstract
Novelty is a core value in science, and a reliable measurement of novelty is crucial. This study proposes a new approach to measuring the novelty of scientific articles based on both citation data and text data. The proposed approach considers an article to be novel if it cites a combination of semantically distant references. To this end, we first assign a word embedding (a vector representation of each vocabulary) to each cited reference on the basis of text information included in the reference. With these vectors, a distance between every pair of references is computed. Finally, the novelty of a focal document is evaluated by summarizing the distances between all references. The approach draws on limited text information (the titles of references) and a publicly shared library for word embeddings, which minimizes the requirement of data access and computational cost. We share the code, with which one can compute the novelty score of a document of interest using only the focal document’s reference list. We validate the proposed measure through three exercises. First, we confirm that word embeddings can be used to quantify semantic distances between documents by comparing them with an established bibliometric distance measure. Second, we confirm the criterion-related validity of the proposed novelty measure with self-reported novelty scores collected from a questionnaire survey. Finally, as novelty is known to be correlated with future citation impact, we confirm that the proposed measure can predict future citations.
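The pipeline described in the abstract (embed each reference title, compute pairwise distances, summarize them) can be illustrated with a minimal sketch. This is not the authors' shared code: the toy vectors in `TOY_VECTORS`, the averaging of word vectors per title, the use of cosine distance, and the mean and maximum as summary statistics are all assumptions made for illustration. A real run would load pretrained word embeddings instead of the hypothetical table below.

```python
import math
from itertools import combinations
from statistics import mean

# Hypothetical toy word vectors (assumption); real use would load
# pretrained embeddings from a publicly shared library.
TOY_VECTORS = {
    "protein": (1.0, 0.0, 0.0),
    "folding": (0.9, 0.1, 0.0),
    "neural": (0.0, 1.0, 0.0),
    "network": (0.0, 0.9, 0.1),
    "market": (0.0, 0.0, 1.0),
}


def title_vector(title):
    """Average the vectors of the known words in a title (assumed scheme).

    Returns None when no word of the title is in the vocabulary.
    """
    vecs = [TOY_VECTORS[w] for w in title.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    return tuple(mean(dim) for dim in zip(*vecs))


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def novelty_scores(reference_titles):
    """Summarize pairwise distances between the references of one document."""
    vectors = [v for v in map(title_vector, reference_titles) if v is not None]
    distances = [cosine_distance(u, v) for u, v in combinations(vectors, 2)]
    if not distances:
        raise ValueError("need at least two references with known words")
    # Mean and maximum are illustrative summaries, not the paper's choice.
    return {"mean": mean(distances), "max": max(distances)}


if __name__ == "__main__":
    narrow = novelty_scores(["Protein folding", "Folding protein"])
    broad = novelty_scores(["Protein folding", "Neural network", "Market"])
    print("narrow reference list:", narrow)
    print("broad reference list:", broad)
```

Under these assumptions, a reference list that mixes semantically distant titles yields larger distance summaries than a list of near-identical titles, which is the intuition behind the proposed measure.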
author
Shibayama, Sotaro (LU); Yin, Deyun and Matsumoto, Kuniko
publishing date
2021
type
Contribution to journal
publication status
published
keywords
bibliometrics, novelty, scientific publishing
in
PLoS ONE
volume
16
issue
7
article number
e0254034
publisher
Public Library of Science (PLoS)
external identifiers
  • scopus:85109155683
  • pmid:34214135
ISSN
1932-6203
DOI
10.1371/journal.pone.0254034
language
English
LU publication?
yes
id
e4ffe3c0-a065-494c-adc6-0bd290ed4e0d
alternative location
https://dx.plos.org/10.1371/journal.pone.0254034
date added to LUP
2021-07-02 23:19:26
date last changed
2024-01-20 09:06:39
@article{e4ffe3c0-a065-494c-adc6-0bd290ed4e0d,
  abstract     = {{Novelty is a core value in science, and a reliable measurement of novelty is crucial. This study proposes a new approach to measuring the novelty of scientific articles based on both citation data and text data. The proposed approach considers an article to be novel if it cites a combination of semantically distant references. To this end, we first assign a word embedding (a vector representation of each vocabulary) to each cited reference on the basis of text information included in the reference. With these vectors, a distance between every pair of references is computed. Finally, the novelty of a focal document is evaluated by summarizing the distances between all references. The approach draws on limited text information (the titles of references) and a publicly shared library for word embeddings, which minimizes the requirement of data access and computational cost. We share the code, with which one can compute the novelty score of a document of interest using only the focal document’s reference list. We validate the proposed measure through three exercises. First, we confirm that word embeddings can be used to quantify semantic distances between documents by comparing them with an established bibliometric distance measure. Second, we confirm the criterion-related validity of the proposed novelty measure with self-reported novelty scores collected from a questionnaire survey. Finally, as novelty is known to be correlated with future citation impact, we confirm that the proposed measure can predict future citations.}},
  author       = {{Shibayama, Sotaro and Yin, Deyun and Matsumoto, Kuniko}},
  issn         = {{1932-6203}},
  keywords     = {{bibliometrics; novelty; scientific publishing}},
  language     = {{eng}},
  number       = {{7}},
  publisher    = {{Public Library of Science (PLoS)}},
  journal      = {{PLoS ONE}},
  title        = {{Measuring novelty in science with word embedding}},
  url          = {{http://dx.doi.org/10.1371/journal.pone.0254034}},
  doi          = {{10.1371/journal.pone.0254034}},
  volume       = {{16}},
  year         = {{2021}},
}