Fast discovery of group lag correlations in streams
Article written by: Sakurai, Yasushi; Faloutsos, Christos; Papadimitriou, Spiros
Abstract: The study of data streams has received considerable attention in various communities (theory, databases, data mining, networking), due to several important applications, such as network analysis, sensor monitoring, financial data analysis, and moving object tracking. Our goal in this article is to monitor multiple numerical streams and determine which pairs are correlated with lags, as well as the value of each such lag. Lag correlations and anticorrelations are frequent and very interesting in practice. For example, a decrease in interest rates typically precedes an increase in house sales by a few months; higher amounts of fluoride in drinking water may lead to fewer dental cavities some years later. Other lag settings include network analysis, sensor monitoring, financial data analysis, and tracking of moving objects. Such data streams are often correlated or anticorrelated, but with unknown lag. We propose BRAID, a method of detecting lag correlations among data streams. BRAID can handle data streams of semi-infinite length incrementally, quickly, and with small resource
A preliminary version of this article appeared in Proceedings of the ACM SIGMOD International Conference on Management of Data [Sakurai et al. 2005b]. Part of this work was done while Y. Sakurai was at Carnegie Mellon University.
The work of C. Faloutsos was supported by the National Science Foundation under Grant Nos. IIS-0083148, IIS-0113089, IIS-0209107, IIS-0205224, INT-0318547, SENSOR-0329549, EF-0331657, IIS-0326322, and CNS-0433540, and by the Pennsylvania Infrastructure Technology Alliance (PTA) Grant No. 22-901-001. Additional funding was provided by Intel and Northrop-Grumman Corporation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, or other funding parties.
Authors' addresses: Y. Sakurai, NTT Communication Science Laboratories, 2-4 Hikaridai, Seika, Souraku, Kyoto, 619-0237, Japan; email: [email protected]; C. Faloutsos, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213; email: [email protected]; S. Papadimitriou, IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY 10532; email: [email protected].
© 2010 ACM 1556-4681/2010/12-ART5
Language:
English
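Illustration: The lag-correlation problem stated in the abstract (find whether two streams are correlated or anticorrelated, and at what lag) can be sketched with a brute-force scan over candidate lags. This is not BRAID: it stores both sequences in full and recomputes the Pearson correlation for every lag, which is exactly the cost an incremental streaming method would aim to avoid. The function names `pearson` and `best_lag` and the parameter `max_lag` are hypothetical and chosen only for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant sequence; report 0 here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_lag(x, y, max_lag):
    """Scan lags 0..max_lag and return (lag, r) with the largest |r|.

    At lag l, x[t] is compared with y[t + l], i.e. y is assumed to
    follow x by l time ticks. A negative r indicates anticorrelation.
    """
    best = (0, 0.0)
    length = min(len(x), len(y))
    for lag in range(max_lag + 1):
        n = length - lag
        if n < 2:
            break  # too few overlapping points to correlate
        r = pearson(x[:n], y[lag:lag + n])
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


if __name__ == "__main__":
    # Synthetic data: y repeats x three ticks later.
    x = [math.sin(0.3 * t) for t in range(200)]
    y = [0.0] * 3 + x[:-3]
    print(best_lag(x, y, max_lag=10))
```

Each call costs time proportional to the sequence length times `max_lag`, and memory proportional to the sequence length, so the sketch is suitable only for short, fully stored sequences.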