Knowledge-driven graph similarity for text classification

Niloofer Shanavas, Hui Wang, Zhiwei Lin, Glenn Hawe

Research output: Contribution to journal › Article › peer-review

Abstract

Automatic text classification using machine learning is significantly affected by the text representation model. The structural information in text is necessary for natural language understanding, which is usually ignored in vector-based representations. In this paper, we present a graph kernel-based text classification framework which utilises the structural information in text effectively through the weighting and enrichment of a graph-based representation. We introduce weighted co-occurrence graphs to represent text documents, which weight the terms and their dependencies based on their relevance to text classification. We propose a novel method to automatically enrich the weighted graphs using semantic knowledge in the form of a word similarity matrix. The similarity between enriched graphs, knowledge-driven graph similarity, is calculated using a graph kernel. The semantic knowledge in the enriched graphs ensures that the graph kernel goes beyond exact matching of terms and patterns to compute the semantic similarity of documents. In the experiments on sentiment classification and topic classification tasks, our knowledge-driven similarity measure significantly outperforms the baseline text similarity measures on five benchmark text classification datasets.
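
The pipeline outlined in the abstract (co-occurrence graph, enrichment with word similarities, graph kernel) can be illustrated with a small sketch. This is not the authors' implementation. The function names (`cooccurrence_graph`, `enrich`, `edge_kernel`), the sliding-window size, the similarity threshold, the enrichment rule and the toy similarity values are all assumptions made for illustration, and the supervised term weighting step is left out. The kernel here is a simple normalised edge-matching kernel, used as a stand-in for whichever graph kernel the paper applies.

```python
from collections import defaultdict
from math import sqrt


def cooccurrence_graph(tokens, window=2):
    """Build an undirected graph whose edge weights count how often two
    distinct terms co-occur within a sliding window (assumed size)."""
    graph = defaultdict(float)
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            if left != right:
                graph[frozenset((left, right))] += 1.0
    return dict(graph)


def enrich(graph, similar, threshold=0.5):
    """Hypothetical enrichment rule: for every edge (a, b), add an edge
    (a', b) for each term a' similar to a, scaled by the similarity score.
    `similar` maps a term to {related_term: similarity in [0, 1]}."""
    enriched = defaultdict(float, graph)
    for edge, weight in graph.items():
        a, b = tuple(edge)
        for src, dst in ((a, b), (b, a)):
            for related, score in similar.get(src, {}).items():
                if score >= threshold and related != dst:
                    enriched[frozenset((related, dst))] += weight * score
    return dict(enriched)


def edge_kernel(g1, g2):
    """Normalised edge-matching kernel: cosine of the edge-weight vectors."""
    dot = sum(w * g2[e] for e, w in g1.items() if e in g2)
    norm = sqrt(sum(w * w for w in g1.values())) * sqrt(
        sum(w * w for w in g2.values())
    )
    return dot / norm if norm else 0.0


# Toy documents with no shared content-bearing edges.
doc1 = "the film was great".split()
doc2 = "the movie was excellent".split()

# Toy word similarities (assumed values, not taken from the paper).
similar = {
    "film": {"movie": 0.9},
    "movie": {"film": 0.9},
    "great": {"excellent": 0.8},
    "excellent": {"great": 0.8},
}

g1, g2 = cooccurrence_graph(doc1), cooccurrence_graph(doc2)
plain = edge_kernel(g1, g2)
semantic = edge_kernel(enrich(g1, similar), enrich(g2, similar))
print(plain, semantic)
```

With exact edge matching the two toy documents share no edges, so the plain kernel value is zero, while the enriched graphs overlap through the added synonym edges. In a classification setting, a kernel matrix built this way could be passed to an SVM that accepts precomputed kernels.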
Original language: English
Pages (from-to): 1-15
Number of pages: 15
Journal: International Journal of Machine Learning and Cybernetics
Volume: 0
Early online date: 19 Nov 2020
Publication status: Published online - 19 Nov 2020

Bibliographical note

Funding Information:
The authors would like to acknowledge the support from Ulster University through the Vice Chancellor’s Research Scholarship (VCRS) Award.

Publisher Copyright:
© 2020, The Author(s).

Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.

Keywords

  • automatic text classification
  • document similarity measure
  • graph-based text representation
  • graph enrichment
  • graph kernels
  • supervised term weighting
  • SVM
