Fast XML document filtering by sequencing twig patterns
Article written by: Kwon, Joonho; Rao, Praveen; Moon, Bongki; Lee, Sukho
Abstract: XML-enabled publish-subscribe (pub-sub) systems have emerged as an increasingly important tool for e-commerce and Internet applications. In a typical pub-sub system, subscribed users specify their interests in a profile expressed in the XPath language. Each new document is then matched against the user profiles so that the content is delivered only to the interested subscribers. Because the number of subscribed users and their profiles can grow very large, the scalability of the service is critical to the success of pub-sub systems. In this article, we propose a novel scalable filtering system called iFiST, which transforms user profiles, expressed in XPath as twig patterns, into sequences using Prüfer’s method. Consequently, instead of breaking a twig pattern into multiple linear paths and matching them separately, iFiST performs holistic matching of twig patterns against each incoming document in a bottom-up fashion. iFiST organizes the sequences into a dynamic hash-based index for efficient filtering, and exploits the commonality among user profiles to enable shared processing during the filtering phase. We demonstrate that the holistic matching approach reduces filtering cost and memory consumption, thereby improving the scalability of iFiST.
Language:
English
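The abstract's central step, turning a twig pattern into a sequence with Prüfer's method, can be illustrated with the classic Prüfer construction for a labeled tree: repeatedly remove the leaf with the smallest number and record its neighbour. This is a minimal sketch, not iFiST's implementation. The function name `prufer_sequence`, the postorder numbering of the twig, and the mapping of numbers back to element tags are illustrative assumptions; the exact sequence variant iFiST builds is not described in this record.

```python
import heapq
from collections import defaultdict


def prufer_sequence(edges, n):
    """Classic Prüfer sequence of a tree whose nodes are numbered 1..n.

    Repeatedly removes the leaf with the smallest number and records the
    number of its single neighbour, stopping when two nodes remain, so the
    result has n - 2 entries. (Hypothetical helper for illustration.)
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Min-heap of the current leaves (nodes of degree 1).
    leaves = [node for node in range(1, n + 1) if len(adj[node]) == 1]
    heapq.heapify(leaves)

    seq = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)
        (neighbour,) = adj[leaf]  # a leaf has exactly one neighbour
        seq.append(neighbour)
        adj[neighbour].discard(leaf)
        del adj[leaf]
        # Removing the leaf may turn its neighbour into a new leaf.
        if len(adj[neighbour]) == 1:
            heapq.heappush(leaves, neighbour)
    return seq


if __name__ == "__main__":
    # Hypothetical twig pattern /a[b/d]/c, nodes numbered in postorder:
    # d=1, b=2, c=3, a=4 (root).
    labels = {1: "d", 2: "b", 3: "c", 4: "a"}
    twig_edges = [(4, 2), (2, 1), (4, 3)]
    seq = prufer_sequence(twig_edges, 4)
    print(seq)                       # [2, 4]
    print([labels[v] for v in seq])  # ['b', 'a']
```

With postorder numbering the root has the largest number, so it is never removed early; for postorder-numbered trees, smallest-leaf removal visits nodes in postorder and each recorded neighbour is the removed node's parent, which is what makes the sequence a compact encoding of the twig's structure for bottom-up matching.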