Statistical Analysis of Massive Data Streams: Overview of a CATS Workshop
Sallie Keller-McNulty (Los Alamos National Laboratory), email@example.com
The National Research Council's Committee on Applied and Theoretical Statistics (CATS) recently held a two-day workshop on methods for the statistical analysis of data streams, with the aim of stimulating further progress in this field. A "data stream" may be defined as "a sequence of digitally encoded signals used to represent information in transmission." The workshop focused on data streams that are too massive or too dynamic for batch processing. With such streams, massive amounts of data arrive continually, and analyses must be performed or repeated very frequently on the incoming data. Often there is so much data that only a short time window's worth can be stored economically, which necessitates summarization strategies. The workshop brought together a broad range of researchers from statistics, probability, and computer science who deal with massive data streams in different contexts. Sessions were held on the following topics: Atmospheric and Meteorological Data; High-Energy Physics; Integrated Data Streams; Network Traffic; and Mining Commercial Streams of Data. The workshop succeeded in starting the cross-fertilization of ideas among attendees. This talk will give an overview of the workshop's activities, with a focus on the recommendations made during discussion sessions, which ranged from exciting research problems to ideas for increased collaboration.