Centrality-based approach for supervised term weighting

  • Niloofer Shanavas
  • Glenn Hawe
  • Hui Wang
  • Zhiwei Lin
Research output: Contribution to conference › Paper › peer-review

5 Citations (Scopus)

Abstract

The huge volume of text documents has made the manual organisation of text data a tedious task. Automatic text classification handles large document collections by organising them into predefined classes, and its effectiveness and efficiency depend largely on how the text documents are represented. A text document is usually viewed as a bag of terms (or words) and represented as a vector in the vector space model, where terms are assumed to be unordered and independent and term frequencies (or weights) form the representation. Graphs are an alternative representation scheme that captures the structure of terms in a text document, which is important for natural language, and terms weighted on the basis of a graph representation improve the performance of text classification. In this paper, we present a novel approach to graph-based supervised term weighting that incorporates information relevant to the classification task, using node centrality in co-occurrence graphs built from the labelled training documents. Our experimental evaluation of the proposed term weighting scheme on four benchmark datasets shows that it consistently outperforms state-of-the-art term weighting methods for text classification.
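The idea sketched in the abstract can be illustrated with a minimal example. The snippet below is not the paper's exact method; it assumes a simple sliding-window co-occurrence graph per class and uses degree centrality as a stand-in for the centrality measures the paper evaluates:

```python
from collections import defaultdict

def cooccurrence_graph(docs, window=2):
    """Undirected co-occurrence graph over tokenised documents:
    nodes are terms; an edge links two distinct terms that appear
    within `window` positions of each other in some document."""
    adj = defaultdict(set)
    for tokens in docs:
        for i, term in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != term:
                    adj[term].add(other)
                    adj[other].add(term)
    return adj

def centrality_term_weights(labelled_docs, window=2):
    """Supervised term weights: one co-occurrence graph per class
    (built from that class's labelled training documents), with each
    term weighted by its normalised degree centrality in that graph.
    Degree centrality is an illustrative choice, not the paper's only one."""
    by_class = defaultdict(list)
    for tokens, label in labelled_docs:
        by_class[label].append(tokens)
    weights = {}
    for label, docs in by_class.items():
        graph = cooccurrence_graph(docs, window)
        n = max(len(graph) - 1, 1)  # normalise by max possible degree
        weights[label] = {t: len(nbrs) / n for t, nbrs in graph.items()}
    return weights

# Usage: terms central to a class's graph receive higher weights.
w = centrality_term_weights([(["x", "y", "z", "w"], "c")])
print(w["c"]["y"])  # "y" co-occurs with all other terms → weight 1.0
```

Because the graphs are built only from labelled training documents, the resulting weights are class-aware, which is what distinguishes supervised schemes like this from unsupervised ones such as TF-IDF.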
Original language: English
Pages: 1261-1268
DOIs
Publication status: Published (in print/issue) - 2 Feb 2017
Event: 2016 IEEE 16th International Conference on Data Mining Workshops - Barcelona, Spain
Duration: 12 Dec 2016 - 15 Dec 2016

Conference

Conference: 2016 IEEE 16th International Conference on Data Mining Workshops
Country/Territory: Spain
City: Barcelona
Period: 12/12/16 - 15/12/16

