The paper provides some perspective by raising a few general issues. One issue concerns the limits of human sensory input and the resulting conscious awareness. These limits are cause for humility in the face of overwhelming quantities of data. For example, one paper reports using a recursive partitioning algorithm on a problem with over two million variables. Even on much smaller problems, analysts who look at the data inevitably end up looking at caricatures of the data. Assuming that it is still beneficial to have analysts involved in the analysis process, thought and computational power could be devoted to producing and prioritizing caricatures that exploit analysts' visual processing strengths.
In terms of constructing trees, a dynamic example shows how the grand tour, brushing, alpha blending and graphical partitioning can be used to build trees. The visual approach uses linear combinations of predictor variables. When the data view allows partitioning on more than one predictor variable, the approach includes a type of look-ahead that a one-variable-at-a-time algorithm lacks. More generally, the views can be smoothed regression surfaces, and various approaches can be used to graphically define multivariate partitions. The view used for partitioning may not be well chosen, so projection pursuit or related algorithms can help analysts select views. Tree displays can use graphical representations to show the partition boundaries. This can be done for traditional as well as graphically defined partitions.
The paper emphasizes graphical possibilities and does not evaluate the quality of analyst-defined trees. Adjusting the significance tests of human-defined partitions for multiple comparisons is an open research question with some algorithm emulation possibilities. More generally, graphics can also be used in the evaluation process. One evaluation process generates different trees by weighted random selection of prioritized variables at each partitioning step. The paper closes by describing an approach to laying out trees based on their similarities.
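The weighted random selection step can be illustrated with a minimal sketch. It assumes that each candidate variable carries a non-negative priority score and that a variable is drawn with probability proportional to that score; the abstract does not say how priorities are computed or how they translate into probabilities, so the function names, the scores and the proportional rule are all hypothetical.

```python
import random


def pick_split_variable(priorities, rng):
    """Draw one variable name with probability proportional to its priority.

    `priorities` maps variable name -> non-negative score (an assumed input).
    """
    names = list(priorities)
    weights = [priorities[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def sample_split_sequences(priorities, depth, n_trees, seed=0):
    """Generate `n_trees` sequences of split variables, one draw per step.

    Each sequence stands in for the variable choices made while growing one
    tree; actually fitting the partitions is outside the scope of this sketch.
    """
    rng = random.Random(seed)
    return [
        [pick_split_variable(priorities, rng) for _ in range(depth)]
        for _ in range(n_trees)
    ]


if __name__ == "__main__":
    # Hypothetical priority scores for three predictors.
    scores = {"age": 5.0, "dose": 3.0, "site": 0.5}
    for sequence in sample_split_sequences(scores, depth=3, n_trees=4):
        print(sequence)
```

Fixing the seed makes the generated set of trees reproducible, which helps when the resulting trees are later compared or laid out by similarity.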
These explanations differ from common statistical analysis in that they are both based on ``non-generative'' models of the world. I will explain how generative and non-generative models differ and why the difference is important.
As an application, we show how MaxEnt can provide a practical tool for assessing the parameters of graphical models that are built in collaboration with domain experts.
In nested designs, a factor is nested in another and each factor has multiple levels. In these cases, a layer of correlation corresponds to modeling the correlation structure within a factor in the hierarchical structures. A layer may consist of multiple blocks, and blocks of the same layer have the same parameterization. These blocks represent the levels of the factor associated with the layer. This approach can be easily extended to unbalanced hierarchical structures and heterogeneous clusters. Algorithms based on the approach are embedded in GEE methods. A case study from prostate cancer and simulation studies show this approach is more efficient than the existing GEE methods and multivariate methods.
A better approach is to allow the author to create a document using the common authoring tools (e.g. {\LaTeX}, MS Word or HTML editors) and to conveniently insert dynamic and interactive components from other languages. The author focuses on the presentation and display of these components, including the usual multimedia elements such as text, images and sounds. She uses HTML form elements and Java components to provide interactive controls with which the reader can manipulate the contents of the document. Finally, she performs statistical computations and renders visual displays using the statistical software that is embedded within the reader's browser.
In this presentation, we describe how we have created an environment for interactive statistical documents. It allows the author to use HTML, JavaScript and R to create the content and the interactivity. The reader accesses the interactive and dynamic functionality of the document via a plug-in for Netscape that embeds R within the browser. The different languages are all reasonably standard tools, and each is used for the purpose for which it was designed. This makes it a reasonably straightforward environment in which to quickly create interfaces for various applications and audiences.
This paper introduces a new lazy learning algorithm, Lazy Option Trees, from which we derive a method for computing good class probability estimates.
Our algorithm builds on the basic ideas of the lazy decision tree classification algorithm introduced by Friedman et al. (1996). In order to compute good probability estimates, multiple tests are performed in each node on the query instance. The algorithm also allows tests on continuous attributes, and performs local smoothing in the leaf nodes. The class probability estimates are improved using Breiman's Bagging.
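Two of the ingredients named above, smoothing at the leaves and averaging over bagged models, can be sketched briefly. The abstract does not state which smoothing rule is used, so the Laplace correction below is an assumption, and both function names are hypothetical. The sketch does not build lazy option trees; it only shows how leaf counts might become probability estimates and how estimates from several bootstrap replicates might be combined.

```python
def laplace_estimate(class_counts):
    """Smooth leaf class counts with the Laplace correction (assumed rule).

    Returns (n_c + 1) / (N + K) for each class c, where N is the number of
    training instances in the leaf and K is the number of classes.
    """
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {c: (n + 1) / (total + n_classes) for c, n in class_counts.items()}


def average_estimates(estimates):
    """Average per-class probabilities over several models, as in bagging."""
    classes = estimates[0].keys()
    return {c: sum(e[c] for e in estimates) / len(estimates) for c in classes}


if __name__ == "__main__":
    # Hypothetical leaf counts reached by one query in two bootstrap replicates.
    leaf_a = laplace_estimate({"pos": 3, "neg": 0})
    leaf_b = laplace_estimate({"pos": 2, "neg": 1})
    print(average_estimates([leaf_a, leaf_b]))
```

Without smoothing, the first leaf would assign probability 1 to one class from only three instances; the correction pulls such extreme estimates toward the uniform distribution.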
One of the most important uses of accurate class probability estimates in machine learning and data mining is prediction in the presence of arbitrarily large costs associated with the different kinds of errors. We tested our method with different cost models and application domains from the UCI ML repository. For the majority of the tasks, its probability estimates improve over those of both decision trees and bagged Probability Estimation Trees (B-PETs; unpruned, uncollapsed, and smoothed decision trees), one of the best existing class probability estimators. To evaluate the quality of the predictions in a cost-sensitive context, we employed the paired BDeltaCost procedure introduced by Margineantu \& Dietterich (2000).
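The link between probability estimates and cost-sensitive prediction can be made concrete with a minimal sketch of the standard minimum-expected-cost decision rule. The cost matrix layout (`cost[predicted][actual]`), the class labels and the numbers are hypothetical; the sketch does not implement the BDeltaCost evaluation procedure.

```python
def expected_cost(prediction, probabilities, cost):
    """Expected cost of predicting `prediction` given class probabilities.

    `cost[predicted][actual]` is the cost of predicting `predicted` when the
    true class is `actual` (an assumed layout).
    """
    return sum(p * cost[prediction][actual] for actual, p in probabilities.items())


def min_cost_prediction(probabilities, cost):
    """Return the label whose expected cost is smallest."""
    return min(cost, key=lambda label: expected_cost(label, probabilities, cost))


if __name__ == "__main__":
    # Hypothetical two-class problem: missing a positive is ten times as
    # costly as raising a false alarm.
    cost = {
        "pos": {"pos": 0.0, "neg": 1.0},
        "neg": {"pos": 10.0, "neg": 0.0},
    }
    probabilities = {"pos": 0.2, "neg": 0.8}
    print(min_cost_prediction(probabilities, cost))
```

With these numbers the rule predicts the less probable class, because predicting "neg" has expected cost 0.2 × 10 = 2.0 while predicting "pos" has expected cost 0.8 × 1 = 0.8. This is why the accuracy of the probability estimates, and not only of the most likely label, matters when costs are unequal.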
Conditional on the fitted common shape model, it is possible to fit and test nonlinear mixed effects using standard methods. While the sieve parametric form of the model suggests that a conditional likelihood ratio test should be available for testing whether the shape varies with a time invariant covariate, the null distribution of the likelihood ratio test may not be chi-squared.
This paper describes the use of spatial statistics to compress the size of large 1-meter imagery data sets. The images were taken over locations in the United States using a CAMIS (Computerized Airborne Multispectral Imaging System) instrument flown in an airplane and registered by trained image analysts. Models of spatial variation are first computed on an entire image, then on subsampled sets of the image. Parameters of the models are used to compress the original image. Image analysis operations are then performed on the original and compressed images and performance is compared. In some cases it is possible to compress data several orders of magnitude without substantially degrading results of subsequent analysis.
In this paper we propose a tree-based scan statistic for database surveillance, to be used when the independent variable can be defined in the form of a hierarchical tree. The proposed method is illustrated by examining whether death from silicosis is particularly common among specific occupations as classified by the Census Bureau, without any preconceived idea of which specific occupation or group of occupations, if any, may be related to increased risk. While the method can be used for many different types of databases, it will be described in terms of `occupation' and `mortality'.
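A minimal sketch of how such a statistic might be scored on a hierarchy follows. It assumes a Poisson model conditioned on the total number of cases and uses the corresponding log-likelihood ratio for each node; the abstract does not give the formula, so this choice, the function names, and the occupation labels and counts are all assumptions for illustration. Significance assessment, for example by Monte Carlo replication under the null hypothesis, is not shown.

```python
import math


def poisson_llr(observed, expected, total):
    """Log-likelihood ratio for an excess of cases in one node (assumed form).

    Returns 0 when the node has no excess. `expected` must be positive, and
    expected counts are assumed to sum to `total` over the leaves.
    """
    if observed <= expected:
        return 0.0
    llr = observed * math.log(observed / expected)
    if total > observed:
        llr += (total - observed) * math.log((total - observed) / (total - expected))
    return llr


def node_totals(parent, leaf_values):
    """Sum leaf values up the tree; `parent` maps child -> parent node."""
    totals = {}
    for leaf, value in leaf_values.items():
        node = leaf
        while node is not None:
            totals[node] = totals.get(node, 0.0) + value
            node = parent.get(node)  # the root has no entry, which ends the walk
    return totals


def most_likely_cut(parent, observed, expected):
    """Return the node (leaf or branch) with the largest log-likelihood ratio."""
    total = sum(observed.values())
    obs = node_totals(parent, observed)
    exp = node_totals(parent, expected)
    scores = {node: poisson_llr(obs[node], exp[node], total) for node in obs}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical two-level occupation tree with made-up counts.
    parent = {"miner": "extraction", "driller": "extraction",
              "clerk": "office", "extraction": "all", "office": "all"}
    observed = {"miner": 8, "driller": 6, "clerk": 1}
    expected = {"miner": 3.0, "driller": 3.0, "clerk": 9.0}
    print(most_likely_cut(parent, observed, expected))
```

Scoring branches as well as leaves is what lets the scan flag a group of related occupations when no single occupation stands out on its own.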