Foundations and Trends® in Computer Graphics and Vision
Vol. 7, Nos. 2–3 (2011) 81–227
© 2012 A. Criminisi, J. Shotton and E. Konukoglu
DOI: 10.1561/0600000035
Decision Forests: A Unified Framework
for Classification, Regression, Density
Estimation, Manifold Learning and
Semi-Supervised Learning
By Antonio Criminisi, Jamie Shotton,
and Ender Konukoglu
Contents
1 Overview and Scope
1.1 A Chronological Literature Review
2 The Random Decision Forest Model
2.1 Decision Tree Basics
2.2 Mathematical Notation and Basic Definitions
2.3 Randomly Trained Decision Trees
2.4 Ensembles of Trees (Decision Forest)
3 Classification Forests
3.1 Classification Algorithms in the Literature
3.2 Specializing the Decision Forest Model for Classification
3.3 Effect of Model Parameters
3.4 Maximum-margin Properties
3.5 Comparisons with Alternative Algorithms
3.6 Human Body Tracking in Microsoft Kinect for XBox 360
4 Regression Forests
4.1 Nonlinear Regression in the Literature
4.2 Specializing the Decision Forest Model for Regression
4.3 Effect of Model Parameters
4.4 Comparison with Alternative Algorithms
4.5 Semantic Parsing of 3D Computed Tomography Scans
5 Density Forests
5.1 Literature on Density Estimation
5.2 Specializing the Forest Model for Density Estimation
5.3 Effect of Model Parameters
5.4 Comparison with Alternative Algorithms
5.5 Sampling from the Generative Model
5.6 Dealing with Non-function Relations
5.7 Quantitative Analysis
6 Manifold Forests
6.1 Literature on Manifold Learning
6.2 Specializing the Forest Model for Manifold Learning
6.3 Experiments and the Effect of Model Parameters
6.4 Learning Manifold of Object Shapes
6.5 Learning Manifold of Text Documents
6.6 Discussion
7 Semi-supervised Forests
7.1 Literature on Semi-supervised Learning
7.2 Specializing the Decision Forest Model for Semi-supervised Classification
7.3 Label Propagation in Transduction Forest
7.4 Induction from Transduction
7.5 Examples, Comparisons and Effect of Model Parameters
8 Random Ferns and Other Forest Variants
8.1 Extremely Randomized Trees
8.2 Random Ferns
8.3 Online Forest Training
8.4 Structured-output Forests
8.5 Further Forest Variants
9 Conclusions
Appendix A — Deriving the Regression Information Gain
Acknowledgements
References
Decision Forests: A Unified Framework
for Classification, Regression, Density
Estimation, Manifold Learning and
Semi-Supervised Learning
Antonio Criminisi¹, Jamie Shotton², and Ender Konukoglu³

¹ Microsoft Research Ltd., 7 J J Thomson Ave, Cambridge, CB3 0FB, UK, antcrim@microsoft.com
² Microsoft Research Ltd., 7 J J Thomson Ave, Cambridge, CB3 0FB, UK, jamiesho@microsoft.com
³ Microsoft Research Ltd., 7 J J Thomson Ave, Cambridge, CB3 0FB, UK, enderk@microsoft.com
Abstract
This review presents a unified, efficient model of random decision forests
which can be applied to a number of machine learning, computer vision,
and medical image analysis tasks.
Our model extends existing forest-based techniques as it unifies
classification, regression, density estimation, manifold learning, semi-
supervised learning, and active learning under the same decision forest
framework. This gives us the opportunity to write and optimize the core
implementation only once, with application to many diverse tasks.
The proposed model may be used in either a discriminative or a generative way, and may be applied to discrete or continuous, labeled or unlabeled data.
The main contributions of this review are: (1) Proposing a
unified, probabilistic and efficient model for a variety of learning
tasks; (2) Demonstrating margin-maximizing properties of classifica-
tion forests; (3) Discussing probabilistic regression forests in compari-
son with other nonlinear regression algorithms; (4) Introducing density
forests for estimating probability density functions; (5) Proposing an
efficient algorithm for sampling from a density forest; (6) Introducing
manifold forests for nonlinear dimensionality reduction; (7) Proposing
new algorithms for transductive learning and active learning. Finally,
we discuss how alternatives such as random ferns and extremely ran-
domized trees stem from our more general forest model.
This document is directed both at students who wish to learn the basics of decision forests and at researchers interested in the new contributions. It presents both fundamental and novel concepts in a structured way, with many illustrative examples and real-world applications. Thorough comparisons with state-of-the-art algorithms such as support vector machines, boosting and Gaussian processes are presented, and their relative advantages and disadvantages are discussed. The many synthetic examples and existing commercial applications demonstrate the validity of the proposed model and its flexibility.
1 Overview and Scope
This review presents a unified, efficient model of random decision forests
which can be used in a number of applications such as scene recognition
from photographs, object recognition in images, automatic diagnosis
from radiological scans, and semantic text parsing. Such applications have traditionally been addressed by distinct supervised or unsupervised machine learning techniques.
In this review, we formulate diverse learning tasks such as regres-
sion, classification and semi-supervised learning as instances of the
same general decision forest model. The unified framework further
extends to novel uses of forests in tasks such as density estimation
and manifold learning. The underlying unified framework gives us the
opportunity to implement and optimize the general algorithm for all
these tasks only once, and then adapt it to individual applications with
relatively small changes.
This review is directed at engineers and PhD students who wish to
learn the basics of decision forests as well as more senior researchers
interested in the new research contributions.
We begin by presenting a roughly chronological, non-exhaustive survey of decision trees and forests, and their use over the past two decades. Further references are provided in the relevant sections.
1.1 A Chronological Literature Review
One of the earliest works on decision trees is the seminal “Classification and Regression Trees (CART)” book by Breiman et al. [12], in which the authors describe the basics of decision trees and their use for both classification and regression problems. Following that publication, researchers focused on algorithms for constructing (learning) optimal decision trees for different tasks from available training data. One of the most popular such algorithms is Quinlan’s “C4.5” [81]. Although decision trees proved useful, their application remained limited to relatively low-dimensional data.
In the nineties, researchers discovered that using ensembles of learners (e.g., generic “weak” classifiers) yields greater accuracy and generalization. This seems particularly true for high-dimensional data, as often encountered in real-life applications. One of the earliest references to ensemble methods is the boosting algorithm of Schapire [87], where the author discusses how iterative re-weighting of the training data can be used to build “strong” classifiers as a linear combination of many weak ones.
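To make the re-weighting mechanism concrete, below is a minimal Python sketch (assuming NumPy) of a discrete AdaBoost-style loop over one-level decision stumps. It illustrates the idea described above rather than reproducing Schapire's original formulation; all function and parameter names are ours.

```python
import numpy as np

def train_stump(X, y, w):
    """Return the axis-aligned stump (error, feature, threshold, sign)
    with the lowest weighted error on labels y in {-1, +1}."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = s * np.sign(X[:, j] - t)
                pred[pred == 0] = s
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    return best

def adaboost(X, y, rounds=10):
    """Build a strong classifier as a weighted sum of weak stumps."""
    w = np.full(len(y), 1.0 / len(y))         # start with uniform weights
    ensemble = []
    for _ in range(rounds):
        err, j, t, s = train_stump(X, y, w)
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-10))
        pred = s * np.sign(X[:, j] - t)
        pred[pred == 0] = s
        w *= np.exp(-alpha * y * pred)         # up-weight misclassified points
        w /= w.sum()                           # re-normalize
        ensemble.append((alpha, j, t, s))
    return ensemble

def predict(ensemble, X):
    """Strong classifier: sign of the weighted vote of all weak learners."""
    f = np.zeros(len(X))
    for alpha, j, t, s in ensemble:
        pred = s * np.sign(X[:, j] - t)
        pred[pred == 0] = s
        f += alpha * pred
    return np.sign(f)
```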
Combining the ideas of decision trees and ensemble methods gave rise to decision forests, that is, ensembles of randomly trained decision trees. The idea of constructing and using ensembles of trees with randomly generated node tests was first introduced in the work of Amit and Geman [1, 2] for handwritten digit recognition. In that work the authors also propose using the mean of the tree probabilities as the output of the tree ensemble.
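The ensemble rule just mentioned is easy to state in code. The sketch below is ours, not from [1, 2]: it assumes a set of already-trained trees, each exposing a hypothetical predict_proba(x) method that returns the class distribution stored at the leaf reached by x.

```python
import numpy as np

def forest_posterior(trees, x):
    """Forest output = mean of the per-tree class posteriors."""
    return np.mean([tree.predict_proba(x) for tree in trees], axis=0)

def forest_classify(trees, x):
    """Predict the class with the highest averaged posterior."""
    return int(np.argmax(forest_posterior(trees, x)))
```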
In the subsequent work of Ho [47], tree training via randomized partitioning of the feature space is discussed further, and in [48] forests are shown to yield better generalization than both boosting and pruned C4.5-trained trees on some tasks. The author also compares different split functions for the tree nodes.
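In the spirit of randomized feature-space partitioning, the following sketch shows one common way of building a randomized node test: score a small random set of axis-aligned (feature, threshold) candidates by information gain and keep the best. This is an illustrative reconstruction under our own naming, not code from [47, 48].

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a discrete label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_random_split(X, y, n_candidates=10, seed=0):
    """Score a few random axis-aligned tests; keep the highest-gain one."""
    rng = np.random.default_rng(seed)
    best_gain, best_test = -np.inf, None
    for _ in range(n_candidates):
        j = rng.integers(X.shape[1])                    # random feature
        t = rng.uniform(X[:, j].min(), X[:, j].max())   # random threshold
        left = X[:, j] < t
        if left.all() or not left.any():
            continue                                    # degenerate split
        gain = entropy(y) - (left.mean() * entropy(y[left])
                             + (1.0 - left.mean()) * entropy(y[~left]))
        if gain > best_gain:
            best_gain, best_test = gain, (j, t)
    return best_test, best_gain
```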
Breiman’s later work [10, 11] further consolidated the role of random forests and popularized their use. There, the author introduces a different way of injecting randomness into the forest, by randomly sampling the labeled training data (namely “bagging”). The author also describes techniques for predicting the forest test error based on measures of tree strength and correlation.
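A minimal sketch of bagging as just described: each tree is trained on a bootstrap sample of the labeled training set, drawn with replacement, so every tree sees a slightly different dataset. Here train_tree is a hypothetical placeholder for any decision-tree learner.

```python
import numpy as np

def train_bagged_forest(X, y, train_tree, n_trees=100, seed=0):
    """Train each tree on an independent bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))  # sample with replacement
        forest.append(train_tree(X[idx], y[idx]))
    return forest
```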
In computer vision, ensemble methods became popular with the seminal face and pedestrian detection papers of Viola and Jones [107, 108]. Random decision forests were used in [63] for image classification and in [60] for keypoint tracking in videos. Recent years have seen an explosion of forest-based techniques in the machine learning, vision and medical imaging literature [9, 15, 25, 31, 35, 37, 58, 59, 65, 67, 68, 69, 74, 79, 89, 92, 97, 110]. Decision forests compare favorably with other techniques [15] and have led to one of the biggest success stories of computer vision in recent years: the Microsoft Kinect for XBox 360 [39, 66, 91].