Multiple Sets of Rules for Text Categorization

Y Bi, TJ Anderson, SI McClean

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

12 Citations (Scopus)

Abstract

An important issue in text mining is how to make use of multiple pieces of discovered knowledge to improve future decisions. In this paper, we propose a new approach to combining multiple sets of rules for text categorization using Dempster's rule of combination. We develop a boosting-like technique for generating multiple sets of rules based on rough set theory, and we model the classification decisions from multiple sets of rules as pieces of evidence that can be combined by Dempster's rule of combination. We apply these methods, individually and in combination, to 10 of the 20 newsgroups in a benchmark data collection (Baker and McCallum 1998). Our experimental results show that the best combination of the multiple sets of rules on these 10 groups performs better than the best single set of rules, and that the improvement is statistically significant. A comparative analysis of the Dempster-Shafer and majority voting (MV) methods, together with an overfitting study, confirms the advantage and robustness of our approach.
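The abstract names Dempster's rule of combination without stating it. The following is a minimal sketch of the textbook rule in Python, assuming each rule set's decision has already been converted into a mass function over class labels; the abstract does not say how the paper performs that conversion, so the class names and masses used here are hypothetical. A plain majority vote is included only to contrast with the MV baseline mentioned above; it is not the paper's implementation of either method.

```python
from collections import Counter
from itertools import product

# A mass function maps focal elements (frozensets of class labels) to masses summing to 1.
MassFunction = dict[frozenset, float]


def dempster_combine(m1: MassFunction, m2: MassFunction) -> MassFunction:
    """Textbook Dempster's rule:
    (m1 (+) m2)(A) = sum_{B & C = A} m1(B) * m2(C) / (1 - K),
    where K = sum_{B & C = empty} m1(B) * m2(C) is the mass of conflict."""
    combined: MassFunction = {}
    conflict = 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        a = b & c  # intersection of the two focal elements
        if a:
            combined[a] = combined.get(a, 0.0) + mb * mc
        else:
            conflict += mb * mc  # this product falls on the empty set
    if not combined:
        raise ValueError("total conflict (K = 1): evidence cannot be combined")
    # Renormalise so the non-conflicting mass sums to 1 again.
    return {a: w / (1.0 - conflict) for a, w in combined.items()}


def best_singleton(m: MassFunction) -> str:
    """Return the single class label carrying the most combined mass."""
    singletons = {next(iter(s)): w for s, w in m.items() if len(s) == 1}
    return max(singletons, key=singletons.get)


def majority_vote(labels: list[str]) -> str:
    """Plain majority vote; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical frame of discernment and masses from two rule sets.
    theta = frozenset({"sport", "politics", "tech"})
    m1 = {frozenset({"sport"}): 0.6, frozenset({"politics"}): 0.1, theta: 0.3}
    m2 = {frozenset({"sport"}): 0.5, frozenset({"tech"}): 0.2, theta: 0.3}
    m12 = dempster_combine(m1, m2)
    print(best_singleton(m12), round(m12[frozenset({"sport"})], 3))
    print(majority_vote(["sport", "tech", "sport"]))
```

With these hypothetical masses the conflict K is 0.19, and the combined mass on {sport} becomes 0.63 / 0.81 ≈ 0.78, higher than either source gives it alone. Unlike a majority vote, which counts only each source's top label, the combination weighs each source's degree of support and keeps uncommitted mass on the whole frame.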
Language: English
Title of host publication: Unknown Host Publication
Pages: 263-272
Number of pages: 10
DOIs: 10.1007/s10462-007-9049-y
Publication status: Published - Oct 2004
Event: Advances in Information Systems 2004 - Izmir, Turkey
Duration: 1 Oct 2004 → …

Cite this

Bi, Y; Anderson, TJ; McClean, SI. Multiple Sets of Rules for Text Categorization. Unknown Host Publication. 2004. pp. 263-272

Bi, Y, Anderson, TJ & McClean, SI 2004, Multiple Sets of Rules for Text Categorization. in Unknown Host Publication. pp. 263-272, Advances in Information Systems 2004, 1/10/04. https://doi.org/10.1007/s10462-007-9049-y