Structure and Majority Classes in Decision Tree Learning

RJ Hickey

    Research output: Contribution to journal › Article

    4 Citations (Scopus)

    Abstract

    To provide good classification accuracy on unseen examples, a decision tree, learned by an algorithm such as ID3, must have sufficient structure and also identify the correct majority class in each of its leaves. If there are inadequacies in respect of either of these, the tree will have a percentage classification rate below that of the maximum possible for the domain, namely (100 - Bayes error rate). An error decomposition is introduced which enables the relative contributions of deficiencies in structure and in incorrect determination of majority class to be isolated and quantified. A sub-decomposition of majority class error permits separation of the sampling error at the leaves from the possible bias introduced by the attribute selection method of the induction algorithm. It is shown that sampling error can extend to 25% when there are more than two classes. Decompositions are obtained from experiments on several data sets. For ID3, the effect of selection bias is shown to vary from being statistically non-significant to being quite substantial, with the latter appearing to be associated with a simple underlying model.
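
    The error decomposition can be made concrete with a small numerical sketch. The following Python fragment illustrates the general idea only, not the paper's own formulation: it assumes a toy domain with a known attribute distribution P(x) and conditional class distribution P(c|x), builds a two-leaf "tree" over it, and splits the tree's shortfall from the Bayes-optimal rate into a structure component (accuracy lost because a leaf merges attribute vectors with different optimal classes) and a majority-class component (accuracy lost because a leaf is labelled with a non-majority class). All numbers and the leaf partition are invented for illustration.

        import numpy as np

        # Toy domain: 4 attribute vectors, 3 classes (all values illustrative).
        p_x = np.array([0.3, 0.2, 0.3, 0.2])        # P(x)
        p_c_given_x = np.array([                    # P(c|x); rows sum to 1
            [0.7, 0.2, 0.1],
            [0.6, 0.3, 0.1],
            [0.1, 0.8, 0.1],
            [0.2, 0.3, 0.5],
        ])

        # Bayes-optimal accuracy: predict argmax_c P(c|x) at every x.
        bayes_acc = np.sum(p_x * p_c_given_x.max(axis=1))

        # A learned "tree": leaves partition the x's; each leaf predicts a class.
        # Leaf 2 is deliberately mislabelled (as could happen via sampling error).
        leaves = [([0, 1], 0),   # covers x in {0,1}, predicts class 0
                  ([2, 3], 2)]   # covers x in {2,3}, predicts a non-majority class

        tree_acc = best_leaf_acc = 0.0
        for xs, pred in leaves:
            p_c_leaf = p_x[xs] @ p_c_given_x[xs]    # unnormalised P(c, leaf)
            best_leaf_acc += p_c_leaf.max()         # best achievable with this structure
            tree_acc += p_c_leaf[pred]              # what the labelled tree achieves

        structure_error = bayes_acc - best_leaf_acc        # cost of merging x's
        majority_class_error = best_leaf_acc - tree_acc    # cost of wrong leaf labels
        print(structure_error, majority_class_error, bayes_acc - tree_acc)

    The claim that sampling error at the leaves can be substantial with more than two classes can likewise be probed by simulation. The sketch below (again illustrative, not the paper's analysis) estimates the expected accuracy lost at a single leaf when its label is taken from a small sample rather than the true class distribution:

        import numpy as np

        rng = np.random.default_rng(0)

        def expected_sampling_loss(p_leaf, n, trials=100_000):
            """Accuracy lost at a leaf when its label is the majority class
            of an n-example sample rather than the true majority class."""
            counts = rng.multinomial(n, p_leaf, size=trials)
            picked = counts.argmax(axis=1)               # sample majority class
            return p_leaf.max() - p_leaf[picked].mean()  # optimal minus achieved

        # Three nearly balanced classes: small leaves often crown the wrong class.
        print(expected_sampling_loss(np.array([0.4, 0.35, 0.25]), n=5))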
    Language: English
    Pages: 1747-1768
    Journal: Journal of Machine Learning Research (http://jmlr.csail.mit.edu/)
    Volume: 8
    Publication status: Published - Aug 2007


    Cite this

    @article{9755bd6cd8a14b97a5d5449ce4cd89ed,
    title = "Structure and Majority Classes in Decision Tree Learning",
    abstract = "To provide good classification accuracy on unseen examples, a decision tree, learned by an algorithm such as ID3, must have sufficient structure and also identify the correct majority class in each of its leaves. If there are inadequacies in respect of either of these, the tree will have a percentage classification rate below that of the maximum possible for the domain, namely (100 - Bayes error rate). An error decomposition is introduced which enables the relative contributions of deficiencies in structure and in incorrect determination of majority class to be isolated and quantified. A sub-decomposition of majority class error permits separation of the sampling error at the leaves from the possible bias introduced by the attribute selection method of the induction algorithm. It is shown that sampling error can extend to 25{\%} when there are more than two classes. Decompositions are obtained from experiments on several data sets. For ID3, the effect of selection bias is shown to vary from being statistically non-significant to being quite substantial, with the latter appearing to be associated with a simple underlying model.",
    author = "RJ Hickey",
    year = "2007",
    month = "8",
    language = "English",
    volume = "8",
    pages = "1747--1768",
    journal = "Journal of Machine Learning Research",
    issn = "1532-4435",

    }
