Abstract
Motivation: In the age of big data, the amount of scientific
information available online dwarfs the ability of current tools to
support researchers in locating and securing access to the
necessary materials. Well-structured open data, and smart
systems that make appropriate use of it, are invaluable: they
can help health researchers and professionals find relevant
information by, e.g., configuring the monitoring of
information or refining a specific query on a disease.
Methods: We present an automated text classification approach
based on the MEDLINE/MeSH thesaurus, trained on more than
26 million expert-annotated scientific abstracts. The classifier
was tailored to domain experts in public health and health
research, in light of their specific challenges and needs. We
applied the proposed methodology to three health domains:
Coronavirus, Mental Health and Diabetes, chosen for the
pertinence of the first and its known relations with the other two.
Results: We trained a classifier on the MEDLINE dataset that can
automatically annotate text, such as scientific articles, news
articles or medical reports, with relevant concepts from the
MeSH thesaurus.
Conclusions: The proposed text classifier shows promising
results in the evaluation of health-related news. Applying the
classifier enables the exploration of news and the extraction of
health-related insights, based on the MeSH thesaurus, through
a workflow similar to that of PubMed, with which most health
researchers are familiar.
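
The abstract does not say which model or features the classifier uses, so the following is only a toy sketch of the general idea: multi-label annotation of free text with thesaurus-style headings learned from labeled abstracts. The training sentences, the TF-IDF centroid scoring, the `threshold` value and all function names are illustrative assumptions, not the authors' method or data.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus (invented for illustration): text plus MeSH-style labels.
TRAINING = [
    ("coronavirus infection spreads through respiratory droplets",
     {"Coronavirus Infections"}),
    ("insulin therapy improves glucose control in diabetes patients",
     {"Diabetes Mellitus"}),
    ("depression and anxiety rise during coronavirus lockdown",
     {"Coronavirus Infections", "Mental Health"}),
    ("diabetes patients report depression and anxiety symptoms",
     {"Diabetes Mellitus", "Mental Health"}),
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Return IDF weights and one TF-IDF centroid vector per label."""
    doc_freq = Counter()
    label_terms = defaultdict(Counter)
    for text, labels in examples:
        tokens = tokenize(text)
        doc_freq.update(set(tokens))
        for label in labels:
            label_terms[label].update(tokens)
    n = len(examples)
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    centroids = {
        label: {t: count * idf[t] for t, count in terms.items()}
        for label, terms in label_terms.items()
    }
    return idf, centroids


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def annotate(text, model, threshold=0.3):
    """Return labels whose centroid similarity reaches the threshold, best first."""
    idf, centroids = model
    counts = Counter(t for t in tokenize(text) if t in idf)
    vector = {t: c * idf[t] for t, c in counts.items()}
    scores = {label: cosine(vector, c) for label, c in centroids.items()}
    kept = [label for label, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda label: -scores[label])


model = train(TRAINING)
print(annotate("new insulin treatment for diabetes", model))
```

A news snippet can be passed to `annotate` in the same way as an abstract, which mirrors the use case described in the conclusions. A classifier trained on the full MEDLINE corpus would need a far larger label space and a stronger model than this centroid scheme.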
Original language | English |
---|---|
Article number | 102053 |
Pages (from-to) | 1-11 |
Number of pages | 12 |
Journal | Artificial Intelligence in Medicine |
Volume | 114 |
DOIs | |
Publication status | Published (in print/issue) - 13 Mar 2021 |
Bibliographical note
Funding Information: This work was supported by the European Commission H2020 project MIDAS (G.A. nr. 727721).
Publisher Copyright:
© 2021 Elsevier B.V.
Keywords
- Big Data
- Semantic Technologies
- Public Health
- Healthcare
- Text Mining
- MeSH Headings
- MEDLINE
- PubMed
- COVID-19
- Diabetes
- Mental Health