Abstract
Automatic text classification using machine learning is significantly affected by the text representation model. The structural information in text is necessary for natural language understanding, which is usually ignored in vector-based representations. In this paper, we present a graph kernel-based text classification framework which utilises the structural information in text effectively through the weighting and enrichment of a graph-based representation. We introduce weighted co-occurrence graphs to represent text documents, which weight the terms and their dependencies based on their relevance to text classification. We propose a novel method to automatically enrich the weighted graphs using semantic knowledge in the form of a word similarity matrix. The similarity between enriched graphs, knowledge-driven graph similarity, is calculated using a graph kernel. The semantic knowledge in the enriched graphs ensures that the graph kernel goes beyond exact matching of terms and patterns to compute the semantic similarity of documents. In the experiments on sentiment classification and topic classification tasks, our knowledge-driven similarity measure significantly outperforms the baseline text similarity measures on five benchmark text classification datasets.
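The pipeline described in the abstract (weighted co-occurrence graph, enrichment from a word similarity matrix, graph kernel) can be illustrated with a minimal sketch. This is not the authors' implementation: the sliding-window size, the `term_weight` scores, the enrichment rule (copying an edge to similar terms, scaled by similarity), the `threshold` value, and the simple edge-matching kernel are all assumptions chosen for illustration, and the similarity values in the example are made up.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2, term_weight=None):
    """Build an undirected weighted co-occurrence graph as {(u, v): weight}.

    `window` is the number of consecutive tokens in the sliding window.
    `term_weight` is a hypothetical per-term relevance score (default 1.0).
    """
    term_weight = term_weight or {}
    edges = defaultdict(float)
    for i, u in enumerate(tokens):
        for v in tokens[i + 1 : i + window]:
            if u == v:
                continue
            key = tuple(sorted((u, v)))
            edges[key] += term_weight.get(u, 1.0) * term_weight.get(v, 1.0)
    return dict(edges)


def enrich(edges, similarity, threshold=0.5):
    """Assumed enrichment rule: for each edge (a, b), add an edge (a2, b)
    for every term a2 similar to a, scaled by the similarity score."""
    enriched = defaultdict(float, edges)
    for (u, v), w in edges.items():
        for a, b in ((u, v), (v, u)):
            for a2, s in similarity.get(a, {}).items():
                if s >= threshold and a2 != b:
                    enriched[tuple(sorted((a2, b)))] += w * s
    return dict(enriched)


def edge_kernel(g1, g2):
    """Sum of weight products over edges present in both graphs."""
    return sum(w * g2[e] for e, w in g1.items() if e in g2)


def normalised_kernel(g1, g2):
    """Cosine-style normalisation so that identical graphs score 1.0."""
    denom = (edge_kernel(g1, g1) * edge_kernel(g2, g2)) ** 0.5
    return edge_kernel(g1, g2) / denom if denom else 0.0


# Illustrative data only: the similarity scores below are invented.
similarity = {
    "film": {"movie": 0.9}, "movie": {"film": 0.9},
    "great": {"excellent": 0.8}, "excellent": {"great": 0.8},
}
g1 = cooccurrence_graph("the film was great".split())
g2 = cooccurrence_graph("the movie was excellent".split())

plain = normalised_kernel(g1, g2)
enriched = normalised_kernel(enrich(g1, similarity), enrich(g2, similarity))
print(plain, enriched)
```

In this toy example the two documents share no exact edges, so the plain kernel is zero, while the enriched graphs overlap through the similar term pairs. That is the effect the abstract attributes to knowledge-driven graph similarity, shown here only under the assumptions listed above.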
Original language | English |
---|---|
Pages (from-to) | 1-15 |
Number of pages | 15 |
Journal | International Journal of Machine Learning and Cybernetics |
Volume | 0 |
Early online date | 19 Nov 2020 |
Publication status | Published online - 19 Nov 2020 |
Bibliographical note
Funding Information: The authors would like to acknowledge the support from Ulster University through the Vice Chancellor’s Research Scholarship (VCRS) Award.
Publisher Copyright:
© 2020, The Author(s).
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
Keywords
- automatic text classification
- document similarity measure
- graph-based text representation
- graph enrichment
- graph kernels
- supervised term weighting
- SVM