Interface 2003

Database Technology for Statistical Data
Arie Shoshani (Lawrence Berkeley Laboratory)


Most statistical analysis is performed with statistical packages such as SAS or R. However, these systems offer little help in managing data before they are analyzed; the data are expected to be organized ahead of time in a format suitable for the package. Relational databases can be used to store statistical data, but the table structure of the relational model is not the most appropriate for them. In this talk, we will discuss advances in database technology designed for managing and manipulating statistical data. In particular, work on statistical databases and OLAP (On-Line Analytical Processing) has produced models suited to representing multi-dimensional data in which each dimension can be further organized as a hierarchy of categorical attributes. In addition, some database products have been built to optimize queries over such data. We will also discuss the concept of federating statistical databases with ordinary object databases, which typically contain other data or metadata associated with the statistical data. The purpose of the federation is to allow the member databases to be queried jointly while each remains independent. We will present a prototype system that supports such a federation.
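
As a rough illustration of the multi-dimensional model described above, the following Python sketch stores summary cells keyed by two dimensions (location and year) and rolls the location dimension up one level of a category hierarchy. It is not the prototype system from the talk: the names CITY_TO_STATE, CELLS, and roll_up, and all of the data values, are hypothetical and chosen only for this example. It also assumes the measure is an additive count, so that summing across cities is meaningful.

```python
from collections import defaultdict

# Hypothetical category hierarchy for the "location" dimension: city -> state.
CITY_TO_STATE = {
    "Berkeley": "CA",
    "Oakland": "CA",
    "Seattle": "WA",
}

# Hypothetical summary cells keyed by (city, year); each value is a count.
CELLS = {
    ("Berkeley", 2001): 10,
    ("Berkeley", 2002): 12,
    ("Oakland", 2001): 20,
    ("Oakland", 2002): 25,
    ("Seattle", 2001): 7,
    ("Seattle", 2002): 9,
}


def roll_up(cells, hierarchy):
    """Sum an additive measure from the city level up to its parent level."""
    totals = defaultdict(int)
    for (city, year), value in cells.items():
        # Replace the city with its parent category and accumulate the count.
        totals[(hierarchy[city], year)] += value
    return dict(totals)


by_state = roll_up(CELLS, CITY_TO_STATE)
```

Here by_state holds one cell per (state, year) pair. A flat relational table could express the same aggregation with a join and a GROUP BY, but the hierarchy would then live in a separate table rather than being part of the dimension itself.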

