Lund University Publications

English dictionaries, gold and silver standard corpora for biomedical natural language processing related to SARS-CoV-2 and COVID-19

Kazemi Rashed, Salma LU ; Ahmed, Rafsan LU ; Frid, Johan LU and Aits, Sonja LU (2020)
Abstract
Here we present a toolbox for natural language processing tasks related to SARS-CoV-2. It comprises English dictionaries of synonyms for SARS-CoV-2 and COVID-19, a silver standard corpus generated with the dictionaries, and a gold standard corpus of 10 PubMed abstracts manually annotated for disease, virus, symptom, and protein/gene terms. The toolbox is freely available on GitHub and can be used for text analytics in a variety of settings related to the COVID-19 crisis. It will be expanded and applied in NLP tasks over the coming weeks, and the community is invited to contribute.
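As a rough illustration of how a synonym dictionary can be used to produce silver standard annotations, the following is a minimal Python sketch. It is not the authors' implementation: the dictionary entries, the `SYNONYMS` name, and the `build_pattern` and `annotate` helpers are all hypothetical, and the matching strategy (case-insensitive, whole-term regular expression matching) is an assumption.

```python
import re

# Hypothetical synonym dictionary. The entries are illustrative only and
# are not taken from the toolbox described in the abstract.
SYNONYMS = {
    "virus": [
        "SARS-CoV-2",
        "2019-nCoV",
        "severe acute respiratory syndrome coronavirus 2",
    ],
    "disease": [
        "COVID-19",
        "coronavirus disease 2019",
    ],
}


def build_pattern(terms):
    """Compile one case-insensitive pattern that matches any term as a whole token."""
    # Sort longest first so multi-word terms take priority over shorter ones.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    # The lookarounds reject matches embedded inside a longer word.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def annotate(text, synonyms=SYNONYMS):
    """Return sorted (start, end, label, matched_text) tuples for every dictionary hit."""
    spans = []
    for label, terms in synonyms.items():
        for match in build_pattern(terms).finditer(text):
            spans.append((match.start(), match.end(), label, match.group(0)))
    return sorted(spans)


if __name__ == "__main__":
    for span in annotate("COVID-19 is caused by SARS-CoV-2."):
        print(span)
```

Annotations produced this way are only as complete as the dictionary, which is why a manually annotated gold standard is useful for checking them.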
@misc{ec656521-c625-426d-8ef9-e93344481819,
  abstract     = {{Here we present a toolbox for natural language processing tasks related to SARS-CoV-2. It comprises English dictionaries of synonyms for SARS-CoV-2 and COVID-19, a silver standard corpus generated with the dictionaries, and a gold standard corpus of 10 PubMed abstracts manually annotated for disease, virus, symptom, and protein/gene terms. The toolbox is freely available on GitHub and can be used for text analytics in a variety of settings related to the COVID-19 crisis. It will be expanded and applied in NLP tasks over the coming weeks, and the community is invited to contribute.}},
  author       = {{Kazemi Rashed, Salma and Ahmed, Rafsan and Frid, Johan and Aits, Sonja}},
  keywords     = {{SARS-CoV-2; COVID-19; Text mining; BioNLP; Artificial Intelligence; natural language processing; information retrieval}},
  language     = {{eng}},
  month        = {{03}},
  note         = {{Preprint}},
  publisher    = {{arXiv.org}},
  title        = {{English dictionaries, gold and silver standard corpora for biomedical natural language processing related to SARS-CoV-2 and COVID-19}},
  url          = {{https://arxiv.org/abs/2003.09865}},
  year         = {{2020}},
}