A Scalable Approach to Integrating Heterogeneous Aggregate Views of Distributed Databases

SI McClean; BW Scotney; KRC Greer

doi:10.1109/TKDE.2003.1161592

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)

Abstract

Aggregate views are commonly used for summarizing information held in very large databases such as those encountered in data warehousing, large scale transaction management, and statistical databases. Such applications often involve distributed databases that have developed independently and therefore may exhibit incompatibility, heterogeneity, and data inconsistency. We are here concerned with the integration of aggregates that have heterogeneous classification schemes where local ontologies, in the form of such classification schemes, may be mapped onto a common ontology. In previous work, we have developed a method for the integration of such aggregates; the method previously developed is efficient, but cannot handle innate data inconsistencies that are likely to arise when a large number of databases are being integrated. In this paper, we develop an approach that can handle data inconsistencies and is thus inherently much more scalable. In our new approach, we first construct a dynamic shared ontology by analyzing the correspondence graph that relates the heterogeneous classification schemes; the aggregates are then derived by minimization of the Kullback-Leibler information divergence using the EM (Expectation-Maximization) algorithm. Thus, we may assess whether global queries on such aggregates are answerable, partially answerable, or unanswerable in advance of computing the aggregates themselves.
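
The abstract names the estimation step (minimising the Kullback-Leibler information divergence with EM) but does not give the model. The following is a minimal sketch that assumes the textbook EM for coarsened categorical data; it is not the authors' code. Each local aggregate cell reports a count for a union of shared-ontology categories, and EM finds the maximum-likelihood distribution over those shared categories. Maximising this multinomial likelihood is equivalent to minimising a count-weighted sum of KL divergences from each source's observed cell proportions to the proportions the model implies. The function name `integrate_aggregates`, the input format, and the tolerance are hypothetical choices made for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical input format: each local aggregate cell is the set of
# shared-ontology categories it maps onto, paired with its reported count.
Cell = Tuple[FrozenSet[str], float]


def integrate_aggregates(cells: List[Cell], max_iter: int = 1000,
                         tol: float = 1e-12) -> Dict[str, float]:
    """EM estimate of a distribution over shared categories from coarse counts."""
    categories = sorted(set().union(*(cats for cats, _ in cells)))
    total = sum(count for _, count in cells)
    # Start from the uniform distribution over the shared categories.
    pi = {c: 1.0 / len(categories) for c in categories}
    for _ in range(max_iter):
        expected = {c: 0.0 for c in categories}
        # E-step: apportion each local count among the shared categories
        # it covers, in proportion to the current estimates.
        for cats, count in cells:
            if count <= 0:
                continue  # empty cells carry no evidence
            mass = sum(pi[c] for c in cats)
            for c in cats:
                expected[c] += count * pi[c] / mass
        # M-step: the new estimate is the expected counts, normalised.
        new_pi = {c: expected[c] / total for c in categories}
        change = max(abs(new_pi[c] - pi[c]) for c in categories)
        pi = new_pi
        if change < tol:
            break
    return pi


# Source 1 reports {a} and {b, c}; source 2 reports {a, b} and {c}.
consistent = integrate_aggregates([
    (frozenset({"a"}), 30), (frozenset({"b", "c"}), 70),
    (frozenset({"a", "b"}), 50), (frozenset({"c"}), 50),
])
# Sources 1 and 3 use the same scheme but disagree on the counts.
inconsistent = integrate_aggregates([
    (frozenset({"a"}), 30), (frozenset({"b", "c"}), 70),
    (frozenset({"a"}), 50), (frozenset({"b", "c"}), 50),
])
```

In the consistent case the two coarse views pin down a single distribution (a = 0.3, b = 0.2, c = 0.5). In the inconsistent case the sketch does not fail. It returns a pooled compromise, a = 0.4, with the remaining 0.6 split between b and c. No source can separate b from c, so that split simply keeps the ratio of the uniform starting point.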

Original language	English
Pages (from-to)	232-236
Journal	IEEE Transactions on Knowledge and Data Engineering
Volume	15
Issue number	1
DOIs	https://doi.org/10.1109/TKDE.2003.1161592
Publication status	Published (in print/issue) - 1 Jan 2003

Bibliographical note

Other Details
------------------------------------
This paper describes a scalable and efficient methodology for integrating aggregates from heterogeneous databases distributed over the Internet. The focus is on computing a dynamic shared ontology, which serves as the framework for integration carried out by mobile agents. The approach was developed and implemented as part of the EU-IST MISSION project and is ongoing work within the Information and Software Engineering research group in the area of distributed database management and knowledge discovery. The distributed data management concepts developed in this paper are currently being used in the SAP-funded PERSERVE project, which is concerned with service-oriented architectures.
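
The record states that a dynamic shared ontology is computed as the framework for integration, and the abstract says that queries can be classified as answerable, partially answerable, or unanswerable before any aggregate is computed. Neither gives the procedure. The sketch below is a hedged illustration under stated assumptions. It assumes each local category is described by the set of values it maps onto in a common reference ontology, which is a hypothetical representation. It groups reference values that no local scheme can tell apart, and it uses this simple value-signature grouping in place of the paper's correspondence-graph analysis. A query is then treated as answerable when it is an exact union of shared categories. The names `shared_ontology` and `answerability` and the age-band example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical representation: scheme name -> local category -> set of
# values in a common reference ontology that the category maps onto.
Schemes = Dict[str, Dict[str, Set[str]]]


def shared_ontology(schemes: Schemes) -> List[FrozenSet[str]]:
    """Group reference values that no local scheme can tell apart."""
    signature: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    # Visit schemes and categories in a fixed order so signatures are comparable.
    for scheme in sorted(schemes):
        for category, values in sorted(schemes[scheme].items()):
            for value in values:
                signature[value].append((scheme, category))
    groups: Dict[Tuple[Tuple[str, str], ...], Set[str]] = defaultdict(set)
    for value, sig in signature.items():
        groups[tuple(sig)].add(value)
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


def answerability(query: Set[str], ontology: List[FrozenSet[str]]) -> str:
    """Classify a global query (a set of reference values) before aggregation."""
    covered = set().union(*ontology)
    if not (query & covered):
        return "unanswerable"  # no source reports any requested value
    if query <= covered and all(cat <= query or not (cat & query) for cat in ontology):
        return "answerable"  # the query is an exact union of shared categories
    return "partially answerable"  # it splits a shared category or exceeds coverage


schemes = {
    "A": {"child": {"0-17"}, "adult": {"18-34", "35-64", "65+"}},
    "B": {"under35": {"0-17", "18-34"}, "35plus": {"35-64", "65+"}},
}
ontology = shared_ontology(schemes)
```

Here the shared categories are {0-17}, {18-34}, and {35-64, 65+}. A query over everyone aged 18 and over is answerable because it is a union of shared categories. A query for 65+ alone is only partially answerable, because no source separates 65+ from 35-64. The counts for the shared categories themselves would still have to be estimated, for example with an EM step like the one sketched after the abstract.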

Access to Document

10.1109/TKDE.2003.1161592

Cite this

@article{951ac0a99e23482c9f2303b4602a0bfb,
  title = "A Scalable Approach to Integrating Heterogeneous Aggregate Views of Distributed Databases",
  abstract = "Aggregate views are commonly used for summarizing information held in very large databases such as those encountered in data warehousing, large scale transaction management, and statistical databases. Such applications often involve distributed databases that have developed independently and therefore may exhibit incompatibility, heterogeneity, and data inconsistency. We are here concerned with the integration of aggregates that have heterogeneous classification schemes where local ontologies, in the form of such classification schemes, may be mapped onto a common ontology. In previous work, we have developed a method for the integration of such aggregates; the method previously developed is efficient, but cannot handle innate data inconsistencies that are likely to arise when a large number of databases are being integrated. In this paper, we develop an approach that can handle data inconsistencies and is thus inherently much more scalable. In our new approach, we first construct a dynamic shared ontology by analyzing the correspondence graph that relates the heterogeneous classification schemes; the aggregates are then derived by minimization of the Kullback-Leibler information divergence using the EM (Expectation-Maximization) algorithm. Thus, we may assess whether global queries on such aggregates are answerable, partially answerable, or unanswerable in advance of computing the aggregates themselves.",
  author = "SI McClean and BW Scotney and KRC Greer",
  note = "This paper describes a scalable and efficient methodology for integrating aggregates from heterogeneous databases distributed over the Internet. The focus is on computing a dynamic shared ontology, which serves as the framework for integration carried out by mobile agents. The approach was developed and implemented as part of the EU-IST MISSION project and is ongoing work within the Information and Software Engineering research group in the area of distributed database management and knowledge discovery. The distributed data management concepts developed in this paper are currently being used in the SAP-funded PERSERVE project, which is concerned with service-oriented architectures.",
  year = "2003",
  month = jan,
  day = "1",
  doi = "10.1109/TKDE.2003.1161592",
  language = "English",
  volume = "15",
  pages = "232--236",
  journal = "IEEE Transactions on Knowledge and Data Engineering",
  publisher = "IEEE",
  number = "1",
}

TY  - JOUR
T1  - A Scalable Approach to Integrating Heterogeneous Aggregate Views of Distributed Databases
AU  - McClean, SI
AU  - Scotney, BW
AU  - Greer, KRC
N1  - This paper describes a scalable and efficient methodology for integrating aggregates from heterogeneous databases distributed over the Internet. The focus is on computing a dynamic shared ontology, which serves as the framework for integration carried out by mobile agents. The approach was developed and implemented as part of the EU-IST MISSION project and is ongoing work within the Information and Software Engineering research group in the area of distributed database management and knowledge discovery. The distributed data management concepts developed in this paper are currently being used in the SAP-funded PERSERVE project, which is concerned with service-oriented architectures.
PY  - 2003/01/01
Y1  - 2003/01/01
AB  - Aggregate views are commonly used for summarizing information held in very large databases such as those encountered in data warehousing, large scale transaction management, and statistical databases. Such applications often involve distributed databases that have developed independently and therefore may exhibit incompatibility, heterogeneity, and data inconsistency. We are here concerned with the integration of aggregates that have heterogeneous classification schemes where local ontologies, in the form of such classification schemes, may be mapped onto a common ontology. In previous work, we have developed a method for the integration of such aggregates; the method previously developed is efficient, but cannot handle innate data inconsistencies that are likely to arise when a large number of databases are being integrated. In this paper, we develop an approach that can handle data inconsistencies and is thus inherently much more scalable. In our new approach, we first construct a dynamic shared ontology by analyzing the correspondence graph that relates the heterogeneous classification schemes; the aggregates are then derived by minimization of the Kullback-Leibler information divergence using the EM (Expectation-Maximization) algorithm. Thus, we may assess whether global queries on such aggregates are answerable, partially answerable, or unanswerable in advance of computing the aggregates themselves.
U2  - 10.1109/TKDE.2003.1161592
DO  - 10.1109/TKDE.2003.1161592
M3  - Article
VL  - 15
SP  - 232
EP  - 236
JO  - IEEE Transactions on Knowledge and Data Engineering
JF  - IEEE Transactions on Knowledge and Data Engineering
IS  - 1
ER  -