Classification decision combination for text categorization: An experimental study

YX Bi, D Bell, H Wang, GD Guo, Werner Dubitzky

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Citations (Scopus)

Abstract

This study investigates, through experimental comparison, the combination of four classification methods for text categorization: the Support Vector Machine, k-nearest neighbours (kNN), the kNN model-based approach (kNNM), and the Rocchio method. We first review these learning methods and the method for combining the classifiers, and then present experimental results on the 20-newsgroup benchmark collection, with an emphasis on average group performance, that is, on the effectiveness of combining multiple classifiers on each category. To understand why the combination of the best and second-best classifiers can achieve better performance, we propose an empirical measure called closeness as a basis for our experiments. Our empirical study verifies the hypothesis that when a classifier has high closeness to the best classifier, their combination can achieve better performance.
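The abstract does not state which combination rule the study uses, so the following is only a minimal sketch of the general idea of combining classifier decisions per category. It assumes plain averaging of normalised scores, which may differ from the authors' method. The function names and the example scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# category -> non-negative score produced by one classifier (assumed format)
Scores = Dict[str, float]


def normalise(scores: Scores) -> Scores:
    """Scale scores so they sum to 1 (uniform if every score is zero)."""
    total = sum(scores.values())
    if total == 0:
        return {c: 1.0 / len(scores) for c in scores}
    return {c: s / total for c, s in scores.items()}


def combine(outputs: List[Scores]) -> Scores:
    """Average the normalised scores of several classifiers per category.

    Assumes every classifier scores the same set of categories.
    """
    normalised = [normalise(o) for o in outputs]
    categories = normalised[0].keys()
    return {c: sum(n[c] for n in normalised) / len(normalised) for c in categories}


def decide(outputs: List[Scores]) -> str:
    """Return the category with the highest combined score."""
    combined = combine(outputs)
    return max(combined, key=combined.get)


# Hypothetical scores for one document from two classifiers on different scales.
svm_scores = {"sci.space": 0.5, "rec.autos": 0.4, "talk.politics.misc": 0.1}
knn_scores = {"sci.space": 2.0, "rec.autos": 7.0, "talk.politics.misc": 1.0}

print(decide([svm_scores, knn_scores]))
```

In this toy case the first classifier alone would pick sci.space, while the combined decision follows the second classifier's stronger preference for rec.autos. Normalising first keeps a classifier with a larger score scale from dominating the average.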
Original language: English
Title of host publication: Unknown Host Publication
Place of Publication: HEIDELBERGER PLATZ 3, D-14197 BERLIN, GERMANY
Pages: 222-231
Number of pages: 10
Publication status: Published - 2004
Event: DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS - Zaragoza, Spain
Duration: 1 Jan 2004 → …

Publication series

Name: LECTURE NOTES IN COMPUTER SCIENCE

Conference

Conference: DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS
Period: 1/01/04 → …

Fingerprint

Classifiers
Support vector machines
Experiments

Cite this

Bi, YX., Bell, D., Wang, H., Guo, GD., & Dubitzky, W. (2004). Classification decision combination for text categorization: An experimental study. In Unknown Host Publication (pp. 222-231). (LECTURE NOTES IN COMPUTER SCIENCE). HEIDELBERGER PLATZ 3, D-14197 BERLIN, GERMANY.
@inproceedings{d2b0bf7f82894ee69ef10731bf275aa1,
title = "Classification decision combination for text categorization: An experimental study",
abstract = "This study investigates, through experimental comparison, the combination of four classification methods for text categorization: the Support Vector Machine, k-nearest neighbours (kNN), the kNN model-based approach (kNNM), and the Rocchio method. We first review these learning methods and the method for combining the classifiers, and then present experimental results on the 20-newsgroup benchmark collection, with an emphasis on average group performance, that is, on the effectiveness of combining multiple classifiers on each category. To understand why the combination of the best and second-best classifiers can achieve better performance, we propose an empirical measure called closeness as a basis for our experiments. Our empirical study verifies the hypothesis that when a classifier has high closeness to the best classifier, their combination can achieve better performance.",
author = "YX Bi and D Bell and H Wang and GD Guo and Werner Dubitzky",
note = "15th International Conference on Database and Expert Systems Applications (DEXA 2004), Zaragoza, SPAIN, AUG 30-SEP 03, 2004",
year = "2004",
language = "English",
series = "LECTURE NOTES IN COMPUTER SCIENCE",
pages = "222--231",
booktitle = "Unknown Host Publication",

}
