SOPHIA-TCBR: A knowledge discovery framework for textual case-based reasoning

David Patterson, Niall Rooney, Mykola Galushka, Vladimir Dobrynin, Elena Smirnova

    Research output: Contribution to journal > Article

    17 Citations (Scopus)

    Abstract

    In this paper, we present a novel textual case-based reasoning system called SOPHIA-TCBR, which clusters semantically related textual cases. Individual clusters are formed through the discovery of narrow themes, which then act as attractors for related cases. During this process, SOPHIA-TCBR automatically discovers appropriate case and similarity knowledge. It is then able to organize the cases within each cluster by forming a minimum spanning tree based on their semantic similarity. SOPHIA’s capability as a case-based text classifier is benchmarked against the well-known and widely utilised k-Means approach. Results show that SOPHIA either equals or outperforms k-Means on two different case-bases, and as such is an attractive approach for case-based classification. We demonstrate the quality of the knowledge discovery process by showing the high level of topic similarity between adjacent cases within the minimum spanning tree. We show that the formation of the minimum spanning tree makes it possible to identify a kernel region within the cluster, which has a higher level of similarity between cases than the cluster in its entirety, and that this corresponds directly to a higher level of topic homogeneity. We demonstrate that the topic homogeneity increases as the average semantic similarity between cases in the kernel increases. Finally, having empirically demonstrated the quality of the knowledge discovery process in SOPHIA, we show how it can be competently applied to case-based retrieval.
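The idea of organizing the cases of one cluster as a minimum spanning tree can be illustrated with a minimal sketch. It is not the SOPHIA-TCBR implementation: the abstract does not specify the similarity measure, so bag-of-words cosine similarity is used here as an assumed stand-in, and the function names (`cosine_similarity`, `minimum_spanning_tree`, `average_adjacent_similarity`) are hypothetical. The tree is built with Prim's algorithm over distance = 1 - similarity.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words vectors (assumed stand-in measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def minimum_spanning_tree(cases: list[str]) -> list[tuple[int, int, float]]:
    """Prim's algorithm over distance = 1 - similarity.

    Returns one edge (i, j, similarity) per added case, linking case j to
    the closest case i that is already in the tree.
    """
    vectors = [Counter(text.lower().split()) for text in cases]
    n = len(vectors)
    if n == 0:
        return []
    # best[j] = (distance from j to the tree, index of the nearest tree node)
    best = {
        j: (1.0 - cosine_similarity(vectors[0], vectors[j]), 0)
        for j in range(1, n)
    }
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        dist, i = best.pop(j)
        edges.append((i, j, 1.0 - dist))
        # Case j is now in the tree; see whether it is closer to the rest.
        for k in best:
            d = 1.0 - cosine_similarity(vectors[j], vectors[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges


def average_adjacent_similarity(edges: list[tuple[int, int, float]]) -> float:
    """Mean similarity between cases that are adjacent in the tree."""
    return sum(s for _, _, s in edges) / len(edges) if edges else 0.0


if __name__ == "__main__":
    # Toy cases, invented for illustration only.
    cases = [
        "printer paper jam",
        "printer jam error",
        "network cable unplugged",
        "network cable fault",
    ]
    for i, j, sim in minimum_spanning_tree(cases):
        print(f"case {i} -- case {j}: similarity {sim:.2f}")
```

In this sketch, cases about the same topic end up adjacent in the tree, which is the property the abstract reports as high topic similarity between adjacent cases. How the kernel region is delimited is not described in the abstract, so it is left out here.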
    Language: English
    Pages: 404-414
    Journal: Knowledge-Based Systems
    Volume: 21
    Issue number: 5
    DOI: https://doi.org/10.1016/j.knosys.2008.02.006
    Publication status: Published - Jul 2008

    Fingerprint

    Case based reasoning
    Data mining
    Semantics
    Classifiers
    Knowledge discovery
    Case-based reasoning
    Minimum spanning tree
    Semantic similarity
    Homogeneity
    Kernel
    K-means

    Cite this

    Patterson, D., Rooney, N., Galushka, M., Dobrynin, V., & Smirnova, E. (2008). SOPHIA-TCBR: A knowledge discovery framework for textual case-based reasoning. Knowledge-Based Systems, 21(5), 404-414. https://doi.org/10.1016/j.knosys.2008.02.006
    @article{80e208f837094a39a64602c3450072cd,
    title = "SOPHIA-TCBR: A knowledge discovery framework for textual case-based reasoning",
    author = "David Patterson and Niall Rooney and Mykola Galushka and Vladimir Dobrynin and Elena Smirnova",
    year = "2008",
    month = "7",
    doi = "10.1016/j.knosys.2008.02.006",
    language = "English",
    volume = "21",
    pages = "404--414",
    journal = "Knowledge-Based Systems",
    issn = "0950-7051",
    publisher = "Elsevier",
    number = "5",

    }
