English dictionaries, gold and silver standard corpora for biomedical natural language processing related to SARS-CoV-2 and COVID-19
(2020)
- abstract
- Here we present a toolbox for natural language processing tasks related to SARS-CoV-2. It comprises English dictionaries of synonyms for SARS-CoV-2 and COVID-19, a silver standard corpus generated with the dictionaries and a gold standard corpus of 10 PubMed abstracts manually annotated for disease, virus, symptom and protein/gene terms. This toolbox is freely available on GitHub and can be used for text analytics in a variety of settings related to the COVID-19 crisis. It will be expanded and applied in NLP tasks over the coming weeks and the community is invited to contribute.
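The abstract states that the silver standard corpus was generated with the synonym dictionaries. A minimal sketch of that kind of dictionary-based tagging follows, assuming a simple in-memory dictionary that maps an entity label to its synonyms. The labels, the synonym lists, and the helper names `build_patterns` and `annotate` are hypothetical illustrations, not the toolbox's published dictionaries, file format, or code.

```python
import re

# Hypothetical mini-dictionary: illustrative synonyms only, NOT the
# published dictionaries from the toolbox.
SYNONYMS = {
    "VIRUS": ["SARS-CoV-2", "2019-nCoV", "severe acute respiratory syndrome coronavirus 2"],
    "DISEASE": ["COVID-19", "COVID19", "coronavirus disease 2019"],
}


def build_patterns(synonyms):
    """Compile one case-insensitive regex per label, trying longer terms first."""
    patterns = {}
    for label, terms in synonyms.items():
        ordered = sorted(terms, key=len, reverse=True)
        alternation = "|".join(re.escape(t) for t in ordered)
        # Lookarounds stop a term from matching inside a longer token.
        patterns[label] = re.compile(
            rf"(?<![\w-])(?:{alternation})(?![\w-])", re.IGNORECASE
        )
    return patterns


def annotate(text, patterns):
    """Return silver-standard spans as (start, end, surface, label), sorted by offset."""
    spans = []
    for label, pattern in patterns.items():
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), m.group(0), label))
    return sorted(spans)


if __name__ == "__main__":
    patterns = build_patterns(SYNONYMS)
    for span in annotate("COVID-19 is caused by SARS-CoV-2, also called 2019-nCoV.", patterns):
        print(span)
```

Sorting the alternatives longest-first makes the regex prefer the longest synonym at a given position, and the lookarounds keep a term such as SARS-CoV-2 from matching inside a longer token. A real silver-standard pipeline would also need tokenization and overlap resolution across labels, which this sketch omits.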
Please use this URL to cite or link to this publication:
https://lup.lub.lu.se/record/ec656521-c625-426d-8ef9-e93344481819
- author
- Kazemi Rashed, Salma; Ahmed, Rafsan; Frid, Johan and Aits, Sonja
- publishing date
- 2020-03-22
- type
- Working paper/Preprint
- publication status
- published
- keywords
- SARS-CoV-2, COVID-19, Text mining, BioNLP, Artificial Intelligence, natural language processing, information retrieval
- publisher
- arXiv.org
- project
- Artificial intelligence-based text mining for COVID-19 and other areas of medicine
- Lund University AI Research
- Studying COVID-19 with artificial intelligence
- Biomedical text mining for systems biology
- language
- English
- LU publication?
- yes
- id
- ec656521-c625-426d-8ef9-e93344481819
- alternative location
- https://arxiv.org/abs/2003.09865
- date added to LUP
- 2020-03-27 13:30:56
- date last changed
- 2023-08-30 11:10:24
@misc{ec656521-c625-426d-8ef9-e93344481819,
  abstract  = {{Here we present a toolbox for natural language processing tasks related to SARS-CoV-2. It comprises English dictionaries of synonyms for SARS-CoV-2 and COVID-19, a silver standard corpus generated with the dictionaries and a gold standard corpus of 10 PubMed abstracts manually annotated for disease, virus, symptom and protein/gene terms. This toolbox is freely available on GitHub and can be used for text analytics in a variety of settings related to the COVID-19 crisis. It will be expanded and applied in NLP tasks over the coming weeks and the community is invited to contribute.}},
  author    = {Kazemi Rashed, Salma and Ahmed, Rafsan and Frid, Johan and Aits, Sonja},
  keywords  = {{SARS-CoV-2; COVID-19; Text mining; BioNLP; Artificial Intelligence; natural language processing; information retrieval}},
  language  = {{eng}},
  month     = {{03}},
  note      = {{Preprint}},
  publisher = {{arXiv.org}},
  title     = {{English dictionaries, gold and silver standard corpora for biomedical natural language processing related to SARS-CoV-2 and COVID-19}},
  url       = {{https://arxiv.org/abs/2003.09865}},
  year      = {{2020}},
}