tree {tree}    R Documentation

Fit a Classification or Regression Tree

Description

A tree is grown by binary recursive partitioning using the response in the specified formula and choosing splits from the terms of the right-hand-side.

Usage

tree(formula = formula(data), data = sys.frame(sys.parent()),
     weights, subset,
     na.action = na.pass, control = tree.control(nobs, ...),
     method = "recursive.partition",
     split = c("deviance", "gini"),
     model = NULL, x = FALSE, y = TRUE, wts = TRUE, ...)

Arguments

formula A formula expression. The left-hand-side (response) should be either a numerical vector, in which case a regression tree is fitted, or a factor, in which case a classification tree is produced. The right-hand-side should be a series of numeric, factor or ordered factor variables separated by +; there should be no interaction terms. Both . and - are allowed; regression trees can have offset terms.
data A data frame in which to preferentially interpret formula, weights and subset.
weights Vector of non-negative observational weights; fractional weights are allowed.
subset An expression specifying the subset of cases to be used.
na.action A function to filter missing data from the model frame. The default is na.pass (to do nothing) as tree handles missing values (by dropping them down the tree as far as possible).
control A list as returned by tree.control.
method character string giving the method to use. The only other useful value is "model.frame".
split Splitting criterion to use: either "deviance" (the default) or "gini".
model If this argument is itself a model frame, then the formula and data arguments are ignored, and model is used to define the model.
x logical. If true, the matrix of variables for each case is returned.
y logical. If true, the response variable is returned.
wts logical. If true, the weights are returned.
... Additional arguments that are passed to tree.control. Normally used for mincut, minsize or mindev.
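
As an illustration of how control parameters reach tree.control, the sketch below fits the cpus model from the Examples three ways: with the defaults, with mindev passed through ..., and with an explicit control list. The value mindev = 0.001 and the helper n.leaves are arbitrary choices made for this illustration, and nobs = nrow(cpus) assumes that no cases are dropped by subset or na.action.

```r
library(tree)
library(MASS)  # for the cpus data used in the Examples

form <- log10(perf) ~ syct + mmin + mmax + cach + chmin + chmax

# Default control parameters.
fit.default <- tree(form, cpus)

# mindev passed through '...'; a smaller value allows more splits.
fit.dots <- tree(form, cpus, mindev = 0.001)

# The same request made explicitly through the control argument.
fit.ctrl <- tree(form, cpus,
                 control = tree.control(nobs = nrow(cpus), mindev = 0.001))

# Hypothetical helper: count the terminal nodes recorded in the frame.
n.leaves <- function(fit) sum(fit$frame$var == "<leaf>")

c(default = n.leaves(fit.default),
  dots    = n.leaves(fit.dots),
  control = n.leaves(fit.ctrl))
```

The two non-default fits should report the same number of terminal nodes, and at least as many as the default fit.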

Details

A tree is grown by binary recursive partitioning using the response in the specified formula and choosing splits from the terms of the right-hand-side. Numeric variables and ordered factors are divided into X < a and X > a; the levels of an unordered factor are divided into two non-empty groups. The split which maximizes the reduction in impurity is chosen, the data set split and the process repeated. Splitting continues until the terminal nodes are too small or too few to be split.
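
The search for a single split on one numeric predictor can be sketched in base R as follows. This is a simplified illustration, not the code used by tree: it assumes a regression response with the residual sum of squares as the impurity measure, takes midpoints between adjacent distinct values as candidate cut points, and ignores weights, missing values and the size limits set by tree.control. The function names are hypothetical.

```r
# Deviance of a regression node: residual sum of squares about the mean.
node_deviance <- function(y) sum((y - mean(y))^2)

# Hypothetical helper: scan candidate cut points a for one numeric
# predictor x and return the cut giving the largest deviance reduction.
best_numeric_split <- function(x, y) {
  xs <- sort(unique(x))
  if (length(xs) < 2) return(NULL)             # nothing to split on
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2    # midpoints (an assumption)
  parent <- node_deviance(y)
  gain <- vapply(cuts, function(a) {
    left <- x < a                              # cases sent to X < a
    parent - node_deviance(y[left]) - node_deviance(y[!left])
  }, numeric(1))
  best <- which.max(gain)
  list(cut = cuts[best], reduction = gain[best])
}

x <- c(1, 2, 3, 10, 11, 12)
y <- c(1.0, 1.2, 0.9, 5.1, 4.8, 5.0)
sp <- best_numeric_split(x, y)
sp$cut        # 6.5, the gap between the two clusters
```

Growing a full tree would repeat this search over every predictor at every node and recurse on the two resulting subsets.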

Tree growth is limited to a depth of 31 by the use of integers to label nodes.
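
The arithmetic behind an integer labelling limit can be sketched as below. The numbering scheme shown (root labelled 1, children of node n labelled 2n and 2n + 1) is an assumption made for this illustration, as are the helper names; under it, the labels run out of R's integer range at a depth consistent with the stated limit if the root is counted as the first level.

```r
# Assumed numbering: root is node 1; children of node n are 2n and 2n + 1.
children <- function(n) c(left = 2 * n, right = 2 * n + 1)

children(1)   # nodes 2 and 3
children(3)   # nodes 6 and 7

# Label reached by always taking the right-hand branch for d splits.
rightmost <- function(d) 2^(d + 1) - 1

# After 30 splits the label equals the largest R integer;
# one more split would need a label that no longer fits.
rightmost(30) == .Machine$integer.max
rightmost(31) >  .Machine$integer.max
```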

Factor predictor variables can have up to 32 levels. This limit is imposed for ease of labelling, but since their use in a classification tree with three or more levels in a response involves a search over 2^(k-1) - 1 groupings for k levels, the practical limit is much less.
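
To give a sense of how quickly the 2^(k-1) - 1 count grows, the following base R snippet tabulates it for a few values of k. The function name n_groupings is introduced here for illustration only.

```r
# Number of ways to divide k levels into two non-empty groups.
n_groupings <- function(k) 2^(k - 1) - 1

k <- c(2, 3, 5, 10, 20, 32)
data.frame(levels = k, groupings = n_groupings(k))
```

For k = 3 there are only 3 groupings, but at the 32-level limit the count is 2147483647, which is why the practical limit is much lower.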

Author(s)

B. D. Ripley

References

Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984) Classification and Regression Trees. Wadsworth.

Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge. Chapter 7.

See Also

tree.control, prune.tree, predict.tree, snip.tree

Examples

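library(tree)
library(MASS)   # provides the cpus data
data(cpus)
# Regression tree: the response log10(perf) is numeric
cpus.ltr <- tree(log10(perf) ~ syct+mmin+mmax+cach+chmin+chmax, cpus)
cpus.ltr
summary(cpus.ltr)
plot(cpus.ltr);  text(cpus.ltr)

# Classification tree: the response Species is a factor
data(iris)
ir.tr <- tree(Species ~., iris)
ir.tr
summary(ir.tr)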
