Measuring similarity for multidimensional sequences

Hui Wang, Zhiwei Lin, Sally McClean, Jun Liu

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    Abstract

    Multidimensional sequences are common, and measuring their similarity is a key to any analysis of such data. There is a wealth of similarity measures for sequences in the literature, but most of them are designed for a special type of sequence and later extended to more general types. These extensions are usually ad hoc, and the extended versions may lose the original conceptual interpretation of the measure. In this paper we consider the problem of how to measure similarity for the general type of multidimensional sequences effectively in a conceptually uniform way. We show that the subsequence concept behind longest common subsequence and all common subsequences can be extended from the temporal dimension to the spatial dimension, and we generalize the all common subsequences similarity to multidimensional sequences. The hard problem is how to compute the generalized similarity. We present a theorem that combines the temporal and spatial dimensions in a simple formula. This theorem suggests a dynamic programming algorithm to compute the generalized similarity. A preliminary experiment shows that this similarity produces competitive outcomes. However, this approach counts some subsequences multiple times when a sequence has repeated elements. We present a theorem that allows counting of distinct common subsequences.
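
    The over-counting issue mentioned in the abstract can be illustrated on ordinary one-dimensional sequences. The following is a minimal sketch, not the paper's generalized theorem or algorithm: it assumes the textbook dynamic programming recurrence that counts common subsequences by their matched index pairs (embeddings), and compares it with a brute-force count of distinct common subsequences. The function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def count_common_embeddings(a, b):
    """Count common subsequences of a and b by matched index pairs.

    c[i][j] is the number of pairs (I, J) of index sets, I within a[:i]
    and J within b[:j], that spell the same sequence. The empty
    subsequence is included. Repeated elements are counted once per
    embedding, so the result can exceed the number of distinct
    common subsequences.
    """
    n, m = len(a), len(b)
    # Base case: only the empty subsequence is shared with an empty prefix.
    c = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                # Inclusion-exclusion term -c[i-1][j-1] cancels against the
                # +c[i-1][j-1] pairs that end with the match (i, j).
                c[i][j] = c[i - 1][j] + c[i][j - 1]
            else:
                c[i][j] = c[i - 1][j] + c[i][j - 1] - c[i - 1][j - 1]
    return c[n][m]


def distinct_subsequences(s):
    """Return the set of distinct subsequences of s (exponential; tiny inputs only)."""
    return {
        tuple(s[i] for i in idx)
        for k in range(len(s) + 1)
        for idx in combinations(range(len(s)), k)
    }


def count_distinct_common(a, b):
    """Brute-force count of distinct common subsequences, for comparison."""
    return len(distinct_subsequences(a) & distinct_subsequences(b))
```

    Under these assumptions, the two counts agree for "ab" and "ab" (no repeated elements) but differ for "aa" and "a", where the single common element "a" can be embedded in "aa" in two ways.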

    Original language: English
    Title of host publication: Proceedings - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010
    Pages: 281-287
    Number of pages: 7
    Publication status: Published (in print/issue) - 2010
    Event: 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010 - Sydney, NSW, Australia
    Duration: 14 Dec 2010 to 17 Dec 2010

    Publication series

    Name: Proceedings - IEEE International Conference on Data Mining, ICDM
    ISSN (Print): 1550-4786

    Conference

    Conference: 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010
    Country/Territory: Australia
    City: Sydney, NSW
    Period: 14/12/10 to 17/12/10

    Keywords

    • All common subsequences
    • Dynamic time warping
    • Multidimensional sequences
    • Similarity
    • The longest common subsequence
