Knowledge-Driven Graph Similarity for Text Classification

Niloofer Shanavas, Hui Wang, Zhiwei Lin, Glenn Hawe

Research output: Contribution to journal, Article


Automatic text classification using machine learning is significantly
affected by the text representation model. The structural information in text
is necessary for natural language understanding, but it is usually ignored in
vector-based representations. In this paper, we present a graph kernel-based
text classification framework which utilises the structural information in text
effectively through the weighting and enrichment of a graph-based representation.
We introduce weighted co-occurrence graphs to represent text documents,
which weight the terms and their dependencies based on their relevance to text
classification. We propose a novel method to automatically enrich the weighted
graphs using semantic knowledge in the form of a word similarity matrix. The
similarity between enriched graphs, which we call knowledge-driven graph similarity,
is calculated using a graph kernel. The semantic knowledge in the enriched graphs
ensures that the graph kernel goes beyond exact matching of terms and patterns
to compute the semantic similarity of documents. In experiments
on sentiment classification and topic classification tasks, our knowledge-driven
similarity measure significantly outperforms the baseline text similarity measures
on five benchmark text classification datasets.
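
The pipeline described above (co-occurrence graph, enrichment with word similarities, graph kernel) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the raw co-occurrence counts used as edge weights, the sliding `window`, the multiplicative enrichment rule, the simple edge-matching kernel, and the toy similarity scores are all assumptions chosen for illustration. In particular, the paper weights terms by their relevance to classification, which this sketch does not attempt.

```python
from collections import defaultdict
from math import sqrt


def cooccurrence_graph(tokens, window=2):
    """Undirected graph as {frozenset({a, b}): weight}.

    Assumption: the weight is the number of times two distinct terms
    occur within `window` following tokens of each other.
    """
    edges = defaultdict(float)
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != term:
                edges[frozenset((term, other))] += 1.0
    return dict(edges)


def enrich(graph, sim):
    """Spread each edge's weight onto edges between similar terms.

    `sim` maps frozenset({a, b}) to a similarity score in [0, 1] and is a
    hypothetical stand-in for the word similarity matrix.
    """
    related = defaultdict(dict)
    for pair, score in sim.items():
        a, b = tuple(pair)
        related[a][b] = score
        related[b][a] = score

    enriched = defaultdict(float)
    for edge, weight in graph.items():
        a, b = tuple(edge)
        # Each endpoint may stay as it is (score 1.0) or be swapped for a similar term.
        for a2, sa in [(a, 1.0)] + list(related.get(a, {}).items()):
            for b2, sb in [(b, 1.0)] + list(related.get(b, {}).items()):
                if a2 != b2:
                    enriched[frozenset((a2, b2))] += weight * sa * sb
    return dict(enriched)


def edge_kernel(g1, g2):
    """Sum of weight products over edges present in both graphs."""
    return sum(w * g2[e] for e, w in g1.items() if e in g2)


def normalized_kernel(g1, g2):
    """Edge kernel scaled so that a graph compared with itself scores 1."""
    denom = sqrt(edge_kernel(g1, g1) * edge_kernel(g2, g2))
    return edge_kernel(g1, g2) / denom if denom else 0.0


# Toy similarity scores, invented for this example.
sim = {
    frozenset(("good", "great")): 0.8,
    frozenset(("film", "movie")): 0.9,
}
g1 = cooccurrence_graph("good film".split())
g2 = cooccurrence_graph("great movie".split())

plain = normalized_kernel(g1, g2)
knowledge_driven = normalized_kernel(enrich(g1, sim), enrich(g2, sim))
```

In this toy case the two documents share no terms, so the kernel on the plain graphs is zero, while the kernel on the enriched graphs is positive because the similar word pairs now contribute matching edges. This is the sense in which enrichment lets a kernel go beyond exact matching.
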
Original language: English
Number of pages: 27
Journal: International Journal of Machine Learning and Cybernetics
Publication status: Accepted/In press - 3 Oct 2020
