Wavelets for Efficient Querying of Large Multidimensional Data Sets
Cyrus Shahabi, (USC), email@example.com
New data-intensive applications operate on diverse types of data, signifying new characteristics in both querying the data and obtaining the results. In particular: 1) the data set is large and multidimensional; popular examples are spatial and temporal data, as well as sensor data streams, 2) the queries are complex, asking for trends or outliers in data, correlation between different dimensions, or aggregation of one or more (measure) attributes given a bounded multidimensional space (termed, range-aggregate queries), and 3) the results can be approximate and/or progressively become exact. These characteristics lead us to believe that wavelet transform will become a likely tool for future database query processing. Although a straightforward adoption of wavelet is to utilize it to reduce the data size at different resolutions, unfortunately data compression methods are only effective on datasets that compress well and on queries that require the reconstruction of the entire signal. Therefore, we are taking a fundamentally different approach. Our approach is a data independent approximation technique that is based on query approximation rather than data compression. Many common queries, including relational algebra expressions and a large class of aggregate queries, can be conceptualized as performing a linear transformation on the density distribution of an input relation to produce the density of an output relation. This observation leads us to the following generic progressive query evaluation plan: decompose the query transformation into a series of small, cheaply computable sub-transformations and evaluate the most important sub-transformations first. In this paper, we discuss the details of this evaluation plan for polynomial range-sum queries, and show how it can be extended to support a batch of several submitted queries. In addition, we show that typical queries on wavelet data require a distinct access pattern which we describe and exploit to design a disk placement strategy for wavelet data that yields best-possible I/O complexity for point and range query evaluation. We conclude by discussing some open problems in dealing with wavelets from a database perspective such as how to perform I/O efficient wavelet transformation.