An kNN Model-Based Approach and Its Application in Text Categorization. CICLing 2004: 559-570

Gongde Guo, Hui Wang, David A. Bell, Yaxin Bi, Kieran Greer

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

Abstract

We investigate two well-known similarity-based learning approaches to text categorization: the k-nearest neighbor (k-NN) classifier and the Rocchio classifier. After identifying the strengths and weaknesses of each technique, we propose a new classifier, the kNN model-based classifier, which unifies the strengths of the k-NN and Rocchio classifiers and adapts them to the characteristics of text categorization problems. A prototype text categorization system has been implemented and evaluated on two common document corpora: the 20-newsgroup collection and the ModApte version of the Reuters-21578 collection of news stories. The experimental results show that the kNN model-based approach outperforms both the k-NN and Rocchio classifiers.
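
For orientation, the two baseline classifiers named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' kNN model-based classifier or their experimental setup: the toy documents, the raw term-frequency weighting, the cosine similarity, and all function names are assumptions made for this example, and the Rocchio variant shown is the simplified centroid form (one averaged prototype per class, with no negative-example term).

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term-frequency vector (assumed weighting, not from the paper)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train, query, k=3):
    """k-NN: majority vote among the k training documents most similar to the query."""
    ranked = sorted(train, key=lambda ex: cosine(ex[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def rocchio_fit(train):
    """Simplified Rocchio: one centroid (mean term vector) per class."""
    sums = defaultdict(Counter)
    counts = Counter()
    for vec, label in train:
        sums[label].update(vec)  # accumulate term weights per class
        counts[label] += 1
    return {
        label: {t: w / counts[label] for t, w in total.items()}
        for label, total in sums.items()
    }


def rocchio_predict(centroids, query):
    """Assign the class whose centroid is most similar to the query."""
    return max(centroids, key=lambda label: cosine(centroids[label], query))


# Toy corpus (hypothetical; the paper uses 20-newsgroup and Reuters-21578).
docs = [
    ("stocks market shares rise", "finance"),
    ("bank raises interest rates", "finance"),
    ("market shares fall sharply", "finance"),
    ("team wins football match", "sport"),
    ("player scores goal in match", "sport"),
    ("coach praises team after match", "sport"),
]
train = [(vectorize(text), label) for text, label in docs]
centroids = rocchio_fit(train)

query = vectorize("football team scores goal")
print("k-NN:", knn_predict(train, query, k=3))
print("Rocchio:", rocchio_predict(centroids, query))
```

The sketch makes the trade-off visible: k-NN keeps every training document and compares the query against all of them, while the centroid form compresses each class into a single prototype. The abstract describes the proposed classifier as unifying the strengths of these two approaches; its construction is given in the chapter itself and is not reproduced here.
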
Language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science
Pages: 559-570
Publication status: Published - 2004

Cite this

@inbook{5d9d1ac5b87a4eefa2807839c58feb14,
title = "An kNN Model-Based Approach and Its Application in Text Categorization. CICLing 2004: 559-570",
abstract = "We investigate two well-known similarity-based learning approaches to text categorization: the k-nearest neighbor (k-NN) classifier and the Rocchio classifier. After identifying the strengths and weaknesses of each technique, we propose a new classifier, the kNN model-based classifier, which unifies the strengths of the k-NN and Rocchio classifiers and adapts them to the characteristics of text categorization problems. A prototype text categorization system has been implemented and evaluated on two common document corpora: the 20-newsgroup collection and the ModApte version of the Reuters-21578 collection of news stories. The experimental results show that the kNN model-based approach outperforms both the k-NN and Rocchio classifiers.",
author = "Gongde Guo and Hui Wang and Bell, {David A.} and Yaxin Bi and Kieran Greer",
year = "2004",
language = "English",
isbn = "978-3-540-21006-1",
pages = "559--570",
booktitle = "Computational Linguistics and Intelligent Text Processing Lecture Notes in Computer Science",

}
