Lund University Publications

Identify novel elements of knowledge with word embedding

Yin, Deyun; Wu, Zhao; Yokota, Kazuki; Matsumoto, Kuniko and Shibayama, Sotaro (2023). In PLoS ONE 18(6).
Abstract

As novelty is a core value in science, a reliable approach to measuring the novelty of scientific documents is critical. Previous novelty measures, however, have a few limitations. First, most previous measures are based on the concept of recombinant novelty and attempt to identify a novel combination of knowledge elements, but insufficient effort has been made to identify a novel element itself (element novelty). Second, most previous measures have not been validated, and it is unclear what aspect of newness they measure. Third, some previous measures can be computed only in certain scientific fields because of technical constraints. This study thus aims to provide a validated and field-universal approach to computing element novelty. We drew on machine learning to develop a word embedding model, which allows us to extract semantic information from text data. Our validation analyses suggest that our word embedding model does convey semantic information. Based on the trained word embedding, we quantified the element novelty of a document by measuring its distance from the rest of the document universe. We then carried out a questionnaire survey to obtain self-reported novelty scores from 800 scientists. We found that our element novelty measure is significantly correlated with self-reported novelty in terms of discovering and identifying new phenomena, substances, molecules, etc., and that this correlation is observed across different scientific fields.
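
The idea of scoring a document by its distance from the rest of the document universe can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how documents are vectorized or which distance is used, so the choices below (averaging word vectors, cosine distance, averaging over the `k` nearest other documents) are assumptions, and the toy embedding, the documents, and the function names `document_vectors` and `element_novelty` are hypothetical.

```python
import numpy as np

# Hypothetical 3-dimensional word embedding (a trained model would supply this).
embedding = {
    "protein": np.array([1.0, 0.0, 0.0]),
    "enzyme": np.array([0.9, 0.1, 0.0]),
    "cell": np.array([0.8, 0.2, 0.0]),
    "graphene": np.array([0.0, 0.0, 1.0]),
    "lattice": np.array([0.0, 0.1, 0.9]),
}

# Hypothetical tokenized documents; the last one uses unrelated vocabulary.
docs = [
    ["protein", "enzyme"],
    ["enzyme", "cell"],
    ["protein", "cell"],
    ["graphene", "lattice"],
]


def document_vectors(docs, embedding):
    """Represent each document as the mean of its word vectors (an assumption)."""
    vectors = []
    for tokens in docs:
        known = [embedding[t] for t in tokens if t in embedding]
        if not known:
            raise ValueError("document has no words in the embedding vocabulary")
        vectors.append(np.mean(known, axis=0))
    return np.vstack(vectors)


def element_novelty(doc_vecs, k=2):
    """Mean cosine distance from each document to its k nearest other documents."""
    unit = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    dist = 1.0 - unit @ unit.T
    np.fill_diagonal(dist, np.inf)  # a document is not its own neighbour
    nearest = np.sort(dist, axis=1)[:, :k]
    return nearest.mean(axis=1)


scores = element_novelty(document_vectors(docs, embedding), k=2)
most_novel = int(np.argmax(scores))
print(scores, most_novel)
```

In this toy corpus the document built from `graphene` and `lattice` sits far from the three biology documents and therefore receives the highest score. A real analysis would replace the toy dictionary with a trained embedding and compare the scores against an external benchmark such as the self-reported novelty described above.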

author
Yin, Deyun; Wu, Zhao; Yokota, Kazuki; Matsumoto, Kuniko and Shibayama, Sotaro
publishing date
2023
type
Contribution to journal
publication status
published
keywords
Humans, Semantics, Machine Learning, Surveys and Questionnaires, Self Report
in
PLoS ONE
volume
18
issue
6
article number
e0284567
pages
16 pages
publisher
Public Library of Science (PLoS)
external identifiers
  • scopus:85163378653
  • pmid:37339138
ISSN
1932-6203
DOI
10.1371/journal.pone.0284567
language
English
LU publication?
yes
id
36f96c90-a9cb-4780-8863-63119dd12e1a
date added to LUP
2023-08-20 20:47:45
date last changed
2024-04-20 01:03:25
@article{36f96c90-a9cb-4780-8863-63119dd12e1a,
  abstract     = {{As novelty is a core value in science, a reliable approach to measuring the novelty of scientific documents is critical. Previous novelty measures, however, have a few limitations. First, most previous measures are based on the concept of recombinant novelty and attempt to identify a novel combination of knowledge elements, but insufficient effort has been made to identify a novel element itself (element novelty). Second, most previous measures have not been validated, and it is unclear what aspect of newness they measure. Third, some previous measures can be computed only in certain scientific fields because of technical constraints. This study thus aims to provide a validated and field-universal approach to computing element novelty. We drew on machine learning to develop a word embedding model, which allows us to extract semantic information from text data. Our validation analyses suggest that our word embedding model does convey semantic information. Based on the trained word embedding, we quantified the element novelty of a document by measuring its distance from the rest of the document universe. We then carried out a questionnaire survey to obtain self-reported novelty scores from 800 scientists. We found that our element novelty measure is significantly correlated with self-reported novelty in terms of discovering and identifying new phenomena, substances, molecules, etc., and that this correlation is observed across different scientific fields.}},
  author       = {{Yin, Deyun and Wu, Zhao and Yokota, Kazuki and Matsumoto, Kuniko and Shibayama, Sotaro}},
  issn         = {{1932-6203}},
  keywords     = {{Humans; Semantics; Machine Learning; Surveys and Questionnaires; Self Report}},
  language     = {{eng}},
  number       = {{6}},
  publisher    = {{Public Library of Science (PLoS)}},
  series       = {{PLoS ONE}},
  title        = {{Identify novel elements of knowledge with word embedding}},
  url          = {{http://dx.doi.org/10.1371/journal.pone.0284567}},
  doi          = {{10.1371/journal.pone.0284567}},
  volume       = {{18}},
  year         = {{2023}},
}