Structure and Majority Classes in Decision Tree Learning

RJ Hickey

    Research output: Contribution to journal, Article, peer-review

    4 Citations (Scopus)

    Abstract

    To provide good classification accuracy on unseen examples, a decision tree, learned by an algorithm such as ID3, must have sufficient structure and also identify the correct majority class in each of its leaves. If there are inadequacies in respect of either of these, the tree will have a percentage classification rate below that of the maximum possible for the domain, namely (100 - Bayes error rate). An error decomposition is introduced which enables the relative contributions of deficiencies in structure and in incorrect determination of majority class to be isolated and quantified. A sub-decomposition of majority class error permits separation of the sampling error at the leaves from the possible bias introduced by the attribute selection method of the induction algorithm. It is shown that sampling error can extend to 25% when there are more than two classes. Decompositions are obtained from experiments on several data sets. For ID3, the effect of selection bias is shown to vary from being statistically non-significant to being quite substantial, with the latter appearing to be associated with a simple underlying model.
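The top-level decomposition described in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the two-attribute domain and its probabilities are invented, the names `JOINT`, `bayes_rate`, `leaf_masses` and `decompose` are hypothetical, and the definitions of the two error terms are one plausible reading of the abstract rather than the paper's exact formulation. The sub-decomposition into sampling error and selection bias is not modelled here.

```python
# Invented domain: two binary attributes (a, b) and two classes.
# JOINT[(a, b)] = (mass of class 0, mass of class 1), in percentage
# points that sum to 100. These numbers are not from the paper.
JOINT = {
    (0, 0): (20, 5),
    (0, 1): (10, 15),
    (1, 0): (20, 5),
    (1, 1): (15, 10),
}


def bayes_rate(joint):
    """Maximum classification rate: 100 - Bayes error rate."""
    return sum(max(masses) for masses in joint.values())


def leaf_masses(joint, leaf_of):
    """Pool class masses of all attribute vectors that reach the same leaf."""
    leaves = {}
    for x, masses in joint.items():
        leaf = leaf_of(x)
        prev = leaves.get(leaf, (0,) * len(masses))
        leaves[leaf] = tuple(p + m for p, m in zip(prev, masses))
    return leaves


def decompose(joint, leaf_of, labels):
    """Split the shortfall from the Bayes rate into two assumed components.

    structure_error: loss that remains even if every leaf predicts its
        true majority class (the tree lacks sufficient structure).
    majority_class_error: extra loss because some leaves predict a class
        other than their true majority class.
    """
    leaves = leaf_masses(joint, leaf_of)
    best = bayes_rate(joint)
    ideal = sum(max(m) for m in leaves.values())
    actual = sum(m[labels[leaf]] for leaf, m in leaves.items())
    return {
        "bayes_rate": best,
        "structure_error": best - ideal,
        "majority_class_error": ideal - actual,
        "rate": actual,
    }


# A tree that splits on attribute a only, and whose leaf a=0 is labelled
# with class 1 although class 0 is the true majority there.
result = decompose(JOINT, leaf_of=lambda x: x[0], labels={0: 1, 1: 0})
print(result)
```

In this toy domain the maximum rate is 70%. The one-split tree loses 5 points because it never tests `b`, and a further 10 points because one leaf carries the wrong class, giving a classification rate of 55%.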
    Original language: English
    Pages (from-to): 1747-1768
    Journal: Journal of Machine Learning Research
    Volume: 8
    Publication status: Published (in print/issue) - Aug 2007
