Integrating semantically heterogeneous aggregate views of distributed databases

Research output: Contribution to journal (Article)

6 Citations (Scopus)

Abstract

In statistical databases and data warehousing applications, aggregate views are commonly maintained as the underlying mechanism for summarising information. Where the databases or applications are distributed, or arise from independent data collections or system developments, they may suffer from incompatibility, heterogeneity, and data inconsistency. These challenges must be overcome if federations of aggregated databases are to be successfully incorporated into systems for database management, querying, retrieval, and knowledge discovery. In this paper we address the integration of aggregate views that have semantically heterogeneous classification schemes. In previous work we developed a methodology that is efficient but cannot easily handle data inconsistencies, so that approach is not particularly well suited to very large databases or to federations of many databases. We now address these scalability issues by introducing a methodology for heterogeneous aggregate view integration that constructs a dynamic shared ontology to which each of the aggregate views can be explicitly related. A maximum likelihood technique, implemented using the EM (Expectation Maximisation) algorithm, handles data inconsistencies inherently when computing integrated aggregates that are described in terms of the dynamic shared ontology.
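The integration idea in the abstract can be illustrated with a small sketch. It rests on assumptions of our own, not on the paper's exact construction: every source classifies the same base values into its own local categories; the shared ontology is taken to be the coarsest common refinement of those schemes; and all sources are treated as samples from one common distribution. It then applies the standard EM algorithm for multinomial data whose counts are known only for unions of shared categories. The function names `shared_ontology` and `em_integrate` and the example data are hypothetical.

```python
from collections import defaultdict


def shared_ontology(schemes):
    """Build shared categories as the coarsest common refinement of the
    local schemes (an assumption of this sketch, not the paper's method).

    schemes: list of dicts, one per source, mapping local category -> set of
    base values. Returns (atoms, mappings): atoms is a list of frozensets and
    mappings[j][cat] is the set of atom indices covered by category `cat`
    of source j.
    """
    groups = defaultdict(set)
    base_values = set().union(*(m for s in schemes for m in s.values()))
    for value in base_values:
        # Signature: the local category each source assigns this value to.
        sig = tuple(
            next((c for c, members in s.items() if value in members), None)
            for s in schemes
        )
        groups[sig].add(value)
    atoms = sorted((frozenset(g) for g in groups.values()), key=sorted)
    mappings = [
        {c: {i for i, a in enumerate(atoms) if a <= members}
         for c, members in s.items()}
        for s in schemes
    ]
    return atoms, mappings


def em_integrate(counts, mappings, n_atoms, max_iter=5000, tol=1e-12):
    """EM estimate of proportions over the shared categories.

    counts[j][cat] is the aggregate reported by source j for its local
    category `cat`; the reports need not be mutually consistent.
    """
    pi = [1.0 / n_atoms] * n_atoms  # uniform starting point
    for _ in range(max_iter):
        expected = [0.0] * n_atoms
        # E-step: apportion each reported count across the shared categories
        # it covers, in proportion to the current estimates.
        for source_counts, mapping in zip(counts, mappings):
            for cat, n in source_counts.items():
                cells = mapping[cat]
                mass = sum(pi[i] for i in cells)
                if mass == 0:
                    continue  # nothing to apportion to
                for i in cells:
                    expected[i] += n * pi[i] / mass
        # M-step: re-estimate the proportions from the expected counts.
        total = sum(expected)
        new_pi = [e / total for e in expected]
        converged = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


# Hypothetical example: two sources classify base values a-d differently.
schemes = [
    {"north": {"a", "b"}, "south": {"c", "d"}},
    {"x": {"a"}, "y": {"b", "c", "d"}},
]
atoms, mappings = shared_ontology(schemes)
counts = [{"north": 60, "south": 40}, {"x": 30, "y": 70}]
pi = em_integrate(counts, mappings, len(atoms))
```

Here the shared categories are {a}, {b} and {c, d}, and the estimates approach 0.3, 0.3 and 0.4. Each reported count constrains only the sum of the shared categories it covers. Reports that cannot all be matched exactly (for example, north 20 alongside x 50) therefore still yield a single proportion vector, one that EM moves towards the maximum of the joint likelihood under this model, instead of producing an error.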
Language: English
Pages: 73-94
Journal: Distributed and Parallel Databases
Volume: 24
Issue number: 1-3
DOI: 10.1007/s10619-008-7031-6
Publication status: Published - Dec 2008

Fingerprint

Ontology
Data warehouses
Maximum likelihood
Data mining
Scalability

Keywords

  • Distributed databases
  • Aggregate views
  • Heterogeneous data
  • Dynamic shared ontologies

Cite this

@article{ff792e02109b428c88273bd0c4447ca2,
title = "Integrating semantically heterogeneous aggregate views of distributed databases",
abstract = "In statistical databases and data warehousing applications it is commonly the case that aggregate views are maintained as an underlying mechanism for summarising information. Where the databases or applications are distributed, or arise from independent data collections or system developments, there may be incompatibility, heterogeneity, and data inconsistency. These challenges need to be overcome if federations of aggregated databases are to be successfully incorporated into systems for database management, querying, retrieval, and knowledge discovery. In this paper we address the issue of integrating aggregate views that have semantically heterogeneous classification schemes. In previous work we have developed a methodology that is efficient but that cannot easily handle data inconsistencies. Our previous approach is therefore not particularly well suited to very large databases or federations of large numbers of databases. We now address these scalability issues by introducing a methodology for heterogeneous aggregate view integration that constructs a dynamic shared ontology to which each of the aggregate views can be explicitly related. A maximum likelihood technique, implemented using the EM (Expectation Maximisation) algorithm, is used to inherently handle data inconsistencies in the computation of integrated aggregates that are described in terms of the dynamic shared ontology.",
keywords = "Distributed databases, Aggregate views, Heterogeneous data, Dynamic shared ontologies",
author = "SI McClean and BW Scotney and PJ Morrow and KRC Greer",
year = "2008",
month = "12",
doi = "10.1007/s10619-008-7031-6",
language = "English",
volume = "24",
pages = "73--94",
journal = "Distributed and Parallel Databases",
issn = "0926-8782",
number = "1-3",

}

Integrating semantically heterogeneous aggregate views of distributed databases. / McClean, SI; Scotney, BW; Morrow, PJ; Greer, KRC.

In: Distributed and Parallel Databases, Vol. 24, No. 1-3, 12.2008, p. 73-94.


TY - JOUR

T1 - Integrating semantically heterogeneous aggregate views of distributed databases

AU - McClean, SI

AU - Scotney, BW

AU - Morrow, PJ

AU - Greer, KRC

PY - 2008/12

Y1 - 2008/12

AB - In statistical databases and data warehousing applications it is commonly the case that aggregate views are maintained as an underlying mechanism for summarising information. Where the databases or applications are distributed, or arise from independent data collections or system developments, there may be incompatibility, heterogeneity, and data inconsistency. These challenges need to be overcome if federations of aggregated databases are to be successfully incorporated into systems for database management, querying, retrieval, and knowledge discovery. In this paper we address the issue of integrating aggregate views that have semantically heterogeneous classification schemes. In previous work we have developed a methodology that is efficient but that cannot easily handle data inconsistencies. Our previous approach is therefore not particularly well suited to very large databases or federations of large numbers of databases. We now address these scalability issues by introducing a methodology for heterogeneous aggregate view integration that constructs a dynamic shared ontology to which each of the aggregate views can be explicitly related. A maximum likelihood technique, implemented using the EM (Expectation Maximisation) algorithm, is used to inherently handle data inconsistencies in the computation of integrated aggregates that are described in terms of the dynamic shared ontology.

KW - Distributed databases

KW - Aggregate views

KW - Heterogeneous data

KW - Dynamic shared ontologies

UR - http://www.springerlink.com/content/w6x83p732036u73t/

U2 - 10.1007/s10619-008-7031-6

DO - 10.1007/s10619-008-7031-6

M3 - Article

VL - 24

SP - 73

EP - 94

JO - Distributed and Parallel Databases

T2 - Distributed and Parallel Databases

JF - Distributed and Parallel Databases

SN - 0926-8782

IS - 1-3

ER -