Introduction to Boosted Trees
Tianqi Chen
Oct. 22, 2014
Outline
• Review of key concepts of supervised learning
• Regression Tree and Ensemble (What are we Learning)
• Gradient Boosting (How do we Learn)
• Summary
Elements in Supervised Learning
• Notation: x_i ∈ R^d, the i-th training example
• Model: how to make the prediction ŷ_i given x_i
Linear model: ŷ_i = Σ_j w_j x_ij (includes linear/logistic regression)
The prediction score ŷ_i can have different interpretations
depending on the task
Linear regression: ŷ_i is the predicted score
Logistic regression: 1/(1 + exp(−ŷ_i)) is the predicted probability
of the instance being positive
Others… for example, in ranking ŷ_i can be the rank score
• Parameters: the things we need to learn from data
Linear model: Θ = {w_j | j = 1, …, d}
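As a small illustration (not part of the slides), the linear model's prediction is just the inner product of the parameters w with the feature vector x_i:

```python
def predict(w, x):
    """Linear model score: y_hat_i = sum_j w_j * x_ij."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x))

w = [0.5, -1.0, 2.0]   # parameters learned from data
x = [1.0, 2.0, 0.5]    # one training example x_i
score = predict(w, x)  # 0.5 - 2.0 + 1.0 = -0.5
```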
Elements continued: Objective Function
• Objective function that is everywhere: Obj(Θ) = L(Θ) + Ω(Θ)
Training loss L measures how well
the model fits the training data
Regularization Ω measures the
complexity of the model
• Loss on training data: L = Σ_i l(y_i, ŷ_i)
Square loss: l(y_i, ŷ_i) = (y_i − ŷ_i)²
Logistic loss: l(y_i, ŷ_i) = y_i ln(1 + e^(−ŷ_i)) + (1 − y_i) ln(1 + e^(ŷ_i))
• Regularization: how complicated is the model?
L2 norm: Ω(w) = λ‖w‖²
L1 norm (lasso): Ω(w) = λ‖w‖₁
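A minimal sketch (not from the slides) of the losses and regularizers above, written out directly from their definitions:

```python
import math

def square_loss(y, y_hat):
    # l(y_i, y_hat_i) = (y_i - y_hat_i)^2
    return (y - y_hat) ** 2

def logistic_loss(y, y_hat):
    # l(y_i, y_hat_i) = y_i*ln(1+e^(-y_hat_i)) + (1-y_i)*ln(1+e^(y_hat_i)), y in {0, 1}
    return y * math.log(1 + math.exp(-y_hat)) + (1 - y) * math.log(1 + math.exp(y_hat))

def l2_penalty(w, lam):
    # Omega(w) = lambda * ||w||^2
    return lam * sum(w_j ** 2 for w_j in w)

def l1_penalty(w, lam):
    # Omega(w) = lambda * ||w||_1
    return lam * sum(abs(w_j) for w_j in w)
```

The full objective is then a training loss summed over examples plus one of these penalties.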
Putting known knowledge into context
• Ridge regression:
Linear model, square loss, L2 regularization
• Lasso:
Linear model, square loss, L1 regularization
• Logistic regression:
Linear model, logistic loss, L2 regularization
• The conceptual separation between model, parameters, and
objective also gives you engineering benefits.
Think of how you can implement SGD for both ridge regression
and logistic regression
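To make the engineering benefit concrete, here is a minimal sketch (names and hyperparameters are illustrative, not from the slides): one generic SGD loop serves both objectives, because the model (linear) and the regularizer (L2) are shared and only the gradient of the per-example loss differs.

```python
import math

def sgd(X, y, grad_loss, lam=0.1, lr=0.01, epochs=100):
    """Generic SGD for a linear model with L2 regularization.
    Only grad_loss (d loss / d y_hat) changes between objectives."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            y_hat = sum(wj * xj for wj, xj in zip(w, x_i))
            g = grad_loss(y_i, y_hat)
            # gradient of (loss + lam * ||w||^2) w.r.t. each w_j
            w = [wj - lr * (g * xj + 2 * lam * wj) for wj, xj in zip(w, x_i)]
    return w

# Ridge regression: gradient of the square loss
ridge_grad = lambda y, y_hat: 2 * (y_hat - y)
# Logistic regression: gradient of the logistic loss (y in {0, 1})
logit_grad = lambda y, y_hat: 1 / (1 + math.exp(-y_hat)) - y
```

Swapping `ridge_grad` for `logit_grad` switches the objective without touching the optimizer.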
Objective and Bias Variance Trade-off
Training loss measures how well
the model fits the training data
Regularization measures the
complexity of the model
• Why do we want two components in the objective?
• Optimizing the training loss encourages predictive models
Fitting the training data well at least gets you close to the training
distribution, which is hopefully close to the underlying distribution
• Optimizing the regularization encourages simple models
Simpler models tend to have smaller variance in future
predictions, making predictions stable
Outline
• Review of key concepts of supervised learning
• Regression Tree and Ensemble (What are we Learning)
• Gradient Boosting (How do we Learn)
• Summary
Regression Tree (CART)
• Regression tree (also known as CART, classification and
regression tree):
Decision rules are the same as in a decision tree
Contains one score in each leaf
Input: age, gender, occupation, …
Example: does the person like computer games?
[Figure: a regression tree. The root splits on "age < 15"; the Y branch
splits again on "is male?". Prediction score in each leaf: +2 (age < 15,
male), +0.1 (age < 15, not male), −1 (age ≥ 15).]
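The example tree above can be sketched as nested decision rules (a hypothetical illustration; the leaf scores +2, +0.1, and −1 come from the slide):

```python
def tree_score(age, is_male):
    """Prediction score for 'does this person like computer games?'"""
    if age < 15:
        return 2.0 if is_male else 0.1  # leaves: +2 (male), +0.1 (not male)
    return -1.0                          # leaf: -1 (age >= 15)

tree_score(10, True)    # young male -> +2.0
tree_score(40, False)   # age >= 15  -> -1.0
```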