Knowledge discovery by probabilistic clustering of distributed databases

Research output: Contribution to journal, Article

17 Citations (Scopus)

Abstract

Clustering of distributed databases facilitates knowledge discovery through learning of new concepts that characterise common features and differences between datasets. Hence, general patterns can be learned rather than restricting learning to specific databases from which rules may not be generalisable. We cluster databases that hold aggregate count data on categorical attributes that have been classified according to homogeneous or heterogeneous classification schemes. Clustering of datasets is carried out via the probability distributions that describe their respective aggregates. The homogeneous case is straightforward. For heterogeneous data we investigate a number of clustering strategies, of which the most efficient avoid the need to compute a dynamic shared ontology to homogenise the classification schemes prior to clustering.
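
As an illustration of the homogeneous case only, the sketch below clusters datasets whose aggregate counts share one classification scheme. It is not the algorithm from the paper: the Jensen-Shannon divergence, the greedy pairwise merging, the threshold value and the function names (`to_distribution`, `js_divergence`, `cluster_datasets`) are all assumptions chosen to keep the example short.

```python
import math


def to_distribution(counts):
    """Turn aggregate counts into a probability distribution (assumes a non-zero total)."""
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions over the same categories."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing; y > 0 wherever x > 0 because y is the mixture.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cluster_datasets(datasets, threshold):
    """Greedy agglomerative clustering (an assumed strategy, not the paper's).

    Repeatedly merges the two clusters whose pooled distributions are closest,
    and stops once the smallest divergence exceeds `threshold`.
    """
    clusters = [([name], list(counts)) for name, counts in datasets.items()]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = js_divergence(
                    to_distribution(clusters[i][1]),
                    to_distribution(clusters[j][1]),
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        names = clusters[i][0] + clusters[j][0]
        pooled = [a + b for a, b in zip(clusters[i][1], clusters[j][1])]
        del clusters[j]  # j > i, so index i is still valid
        clusters[i] = (names, pooled)
    return sorted(sorted(names) for names, _ in clusters)


# Hypothetical aggregate counts over three shared categories.
example = {
    "A": [50, 30, 20],
    "B": [48, 32, 20],
    "C": [10, 20, 70],
    "D": [12, 18, 70],
}
result = cluster_datasets(example, threshold=0.01)
print(result)
```

Pooling the counts of merged datasets means larger datasets weigh more in a cluster's distribution. That is one possible design choice, and the heterogeneous case described in the abstract would need a different treatment.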
Language: English
Pages: 189-210
Journal: Data and Knowledge Engineering
Volume: 54
Issue number: 2
DOI: 10.1016/j.datak.2004.12.001
Publication status: Published - Aug 2005

Fingerprint

Data mining
Probability distributions
Ontology

Keywords

  • Distributed databases
  • Probabilistic clustering
  • Aggregates
  • Dynamic shared ontology

Cite this

@article{de87a988040947c9badb73c802fc8408,
title = "Knowledge discovery by probabilistic clustering of distributed databases",
abstract = "Clustering of distributed databases facilitates knowledge discovery through learning of new concepts that characterise common features and differences between datasets. Hence, general patterns can be learned rather than restricting learning to specific databases from which rules may not be generalisable. We cluster databases that hold aggregate count data on categorical attributes that have been classified according to homogeneous or heterogeneous classification schemes. Clustering of datasets is carried out via the probability distributions that describe their respective aggregates. The homogeneous case is straightforward. For heterogeneous data we investigate a number of clustering strategies, of which the most efficient avoid the need to compute a dynamic shared ontology to homogenise the classification schemes prior to clustering.",
keywords = "Distributed databases, Probabilistic clustering, Aggregates, Dynamic shared ontology",
author = "SI McClean and BW Scotney and PJ Morrow and KRC Greer",
year = "2005",
month = "8",
doi = "10.1016/j.datak.2004.12.001",
language = "English",
volume = "54",
pages = "189--210",
journal = "Data and Knowledge Engineering",
issn = "0169-023X",
publisher = "Elsevier",
number = "2",

}

Knowledge discovery by probabilistic clustering of distributed databases. / McClean, SI; Scotney, BW; Morrow, PJ; Greer, KRC.

In: Data and Knowledge Engineering, Vol. 54, No. 2, 08.2005, p. 189-210.
