TY - JOUR
T1 - Knowledge-driven graph similarity for text classification
AU - Shanavas, Niloofer
AU - Wang, Hui
AU - Lin, Zhiwei
AU - Hawe, Glenn
N1 - Funding Information:
The authors would like to acknowledge the support from Ulster University through the Vice Chancellor’s Research Scholarship (VCRS) Award.
Publisher Copyright:
© 2020, The Author(s).
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2020/11/19
Y1 - 2020/11/19
N2 - Automatic text classification using machine learning is significantly affected by the text representation model. The structural information in text is necessary for natural language understanding, which is usually ignored in vector-based representations. In this paper, we present a graph kernel-based text classification framework which utilises the structural information in text effectively through the weighting and enrichment of a graph-based representation. We introduce weighted co-occurrence graphs to represent text documents, which weight the terms and their dependencies based on their relevance to text classification. We propose a novel method to automatically enrich the weighted graphs using semantic knowledge in the form of a word similarity matrix. The similarity between enriched graphs, knowledge-driven graph similarity, is calculated using a graph kernel. The semantic knowledge in the enriched graphs ensures that the graph kernel goes beyond exact matching of terms and patterns to compute the semantic similarity of documents. In the experiments on sentiment classification and topic classification tasks, our knowledge-driven similarity measure significantly outperforms the baseline text similarity measures on five benchmark text classification datasets.
AB - Automatic text classification using machine learning is significantly affected by the text representation model. The structural information in text is necessary for natural language understanding, which is usually ignored in vector-based representations. In this paper, we present a graph kernel-based text classification framework which utilises the structural information in text effectively through the weighting and enrichment of a graph-based representation. We introduce weighted co-occurrence graphs to represent text documents, which weight the terms and their dependencies based on their relevance to text classification. We propose a novel method to automatically enrich the weighted graphs using semantic knowledge in the form of a word similarity matrix. The similarity between enriched graphs, knowledge-driven graph similarity, is calculated using a graph kernel. The semantic knowledge in the enriched graphs ensures that the graph kernel goes beyond exact matching of terms and patterns to compute the semantic similarity of documents. In the experiments on sentiment classification and topic classification tasks, our knowledge-driven similarity measure significantly outperforms the baseline text similarity measures on five benchmark text classification datasets.
KW - automatic text classification
KW - document similarity measure
KW - graph-based text representation
KW - graph enrichment
KW - graph kernels
KW - supervised term weighting
KW - SVM
UR - https://pure.ulster.ac.uk/en/publications/knowledge-driven-graph-similarity-for-text-classification
UR - http://www.scopus.com/inward/record.url?scp=85096337911&partnerID=8YFLogxK
U2 - https://doi.org/10.1007/s13042-020-01221-4
DO - https://doi.org/10.1007/s13042-020-01221-4
M3 - Article
SP - 1
EP - 15
JO - International Journal of Machine Learning and Cybernetics
JF - International Journal of Machine Learning and Cybernetics
SN - 1868-8071
ER -