Information from which knowledge can be discovered is frequently distributed due to having been recorded at different times or to having arisen from different sources. Such information is often subject to both imprecision and uncertainty. The Dempster-Shafer representation of evidence offers a way of representing uncertainty in the presence of imprecision, and may therefore be used to provide a mechanism for storing imprecise and uncertain information in databases. We consider an extended relational data model that allows the imprecision and uncertainty associated with attribute values to be quantified using a mass function distribution. When a query is executed, it may be necessary to combine imprecise and uncertain data from distributed sources in order to answer that query. A mechanism is therefore required both for combining the data and for generating measures of uncertainty to be attached to the (imprecise) combined data. In this paper we provide such a mechanism based on aggregation of evidence. We show first how this mechanism can be used to resolve inconsistencies and hence provide an essential database capability to perform the operations necessary to respond to queries on imprecise and uncertain data. We go on to exploit the aggregation operator in an attribute-driven approach to provide information on properties of and patterns in the data. This is fundamental to rule discovery, and hence such an aggregation operator provides a facility that is a central requirement in providing a distributed information system with the capability to perform the operations necessary for Knowledge Discovery.
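As an illustration of the kind of evidence combination the abstract refers to, the sketch below applies Dempster's rule of combination, the standard combination operator of Dempster-Shafer theory, to two mass functions describing the same attribute value as reported by two sources. This is not necessarily the aggregation operator developed in the paper, which the abstract does not specify. The attribute domain (`clerk`, `manager`, `engineer`), the mass values, and the function names `combine`, `belief` and `plausibility` are all hypothetical and chosen only for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of candidate values (a focal
    element) to its mass; the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            # Disjoint sets contribute only to the conflict mass.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are totally conflicting; rule undefined")
    # Renormalise so the remaining masses again sum to 1.
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


def belief(m, hypothesis):
    """Total mass committed to subsets of the hypothesis."""
    return sum(mass for s, mass in m.items() if s <= hypothesis)


def plausibility(m, hypothesis):
    """Total mass of focal elements that do not contradict the hypothesis."""
    return sum(mass for s, mass in m.items() if s & hypothesis)


# Hypothetical attribute domain and two hypothetical sources.
DOMAIN = frozenset({"clerk", "manager", "engineer"})
source_a = {frozenset({"clerk"}): 0.6, frozenset({"clerk", "manager"}): 0.4}
source_b = {frozenset({"manager"}): 0.5, DOMAIN: 0.5}

fused = combine(source_a, source_b)
for focal, mass in sorted(fused.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(focal), round(mass, 4))
print("belief in clerk:", round(belief(fused, frozenset({"clerk"})), 4))
print("plausibility of clerk:", round(plausibility(fused, frozenset({"clerk"})), 4))
```

Imprecision shows up as mass assigned to a set of values rather than a single value, and uncertainty as the split of mass across several sets. The interval between belief and plausibility gives one possible measure of uncertainty to attach to a combined value.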
Journal: Information Sciences (Special Issue on Knowledge Discovery from Distributed Information Sources)
Publication status: Published (in print/issue) - 15 Oct 2003
Bibliographical note
Other Details
This paper appears in a special issue on Knowledge Discovery from Distributed Information Sources. It develops an approach to integrating data that are both uncertain and imprecise, and so addresses a fundamental issue in distributed database integration: enabling query-driven processing in distributed heterogeneous systems. The approach was developed as part of the EU-IST MISSION project on the integration of shared statistical information, in which a prototype system was built for answering statistical queries on distributed heterogeneous databases held by different EU national statistical institutes. The work is ongoing within the Information and Software Engineering group.
- data mining
- evidence theory
- rule induction