Transformation of XML Data Sources for Sequential Path Mining

Ruth McNerlan, Yaxin Bi, Gouge Zhao, Bing Hang

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

In recent years XML has become one of the most promising ways to define semi-structured data. Data mining techniques devised for detecting interesting patterns in semi-structured data have also grown in popularity, but applying such techniques to XML data can be problematic because of its hierarchical structure. It has therefore become necessary to transform XML into flattened path data so that data mining can be carried out efficiently. However, problems may arise when the XML tree needs to be reconstructed from the traversal paths. There are currently many transformation techniques for XML data, many of which take advantage of its tree-like hierarchical structure, but most of these approaches do not allow the XML tree to be reconstructed from the traversal paths. In this paper we propose a new approach to the transformation of XML data into path data. The new approach employs a five-step transformation process together with a new ‘Postorder Sequencing’ method of traversing the XML tree. On the one hand, the proposed method can be seen as an efficient and effective way of transforming XML data into collections of paths; on the other hand, it enables XML trees to be generated from the traversal paths.
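The abstract does not spell out the five transformation steps or the exact form of ‘Postorder Sequencing’. The following is therefore only a minimal sketch, built on assumptions of our own, of the general idea that a postorder traversal can produce paths from which the tree is recoverable. The functions `assign_postorder`, `to_paths` and `from_paths` and the (tag, postorder number) path encoding are hypothetical and are not the authors' method. Element text and attributes are ignored.

```python
import xml.etree.ElementTree as ET


def assign_postorder(root):
    """Number every element in postorder (children before parent), from 1."""
    order = {}

    def visit(elem):
        for child in elem:
            visit(child)
        order[id(elem)] = len(order) + 1  # assigned only after all children

    visit(root)
    return order


def to_paths(root):
    """Flatten an XML tree into root-to-node paths, emitted in postorder.

    Hypothetical encoding (not from the paper): each path step is a
    (tag, postorder_id) pair, so repeated sibling tags stay distinguishable.
    Text content and attributes are deliberately ignored in this sketch.
    """
    order = assign_postorder(root)
    paths = []

    def visit(elem, prefix):
        path = prefix + ((elem.tag, order[id(elem)]),)
        for child in elem:
            visit(child, path)
        paths.append(path)  # appended after the children, i.e. postorder

    visit(root, ())
    return paths


def from_paths(paths):
    """Rebuild the element tree from the flattened paths.

    The last step of a path names a node and the step before it names the
    parent. Siblings are re-ordered by postorder id, which preserves
    document order because a left sibling's subtree is numbered first.
    """
    nodes, children, root_id = {}, {}, None
    for path in paths:
        tag, node_id = path[-1]
        nodes[node_id] = ET.Element(tag)
        if len(path) == 1:
            root_id = node_id
        else:
            children.setdefault(path[-2][1], []).append(node_id)
    for parent_id, kids in children.items():
        for kid in sorted(kids):
            nodes[parent_id].append(nodes[kid])
    return nodes[root_id]
```

For `<a><b><c/></b><d/></a>`, `to_paths` numbers c=1, b=2, d=3, a=4 and returns four paths. The first is `(("a", 4), ("b", 2), ("c", 1))`, and passing all four to `from_paths` yields the same element structure. The postorder numbers in each step carry the reconstruction: without them, two sibling `<b>` elements would flatten to identical tag paths, and the original tree could no longer be recovered.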

Original language: English
Title of host publication: Unknown Host Publication
Publisher: Springer
Number of pages: 10
ISBN (Print): 978-3-319-69780-2
Publication status: Published online - 19 Oct 2017
Event: International workshop on graph data management and analysis (GDMA 2017) - Beijing, China
Duration: 19 Oct 2017 → …

Keywords

  • XML
  • Transformation
  • XPath
  • Sequential Data Mining
