Introduction to Boosted Trees
Tianqi Chen
Oct. 22, 2014
Outline
• Review of key concepts of supervised learning
• Regression Tree and Ensemble (What are we Learning)
• Gradient Boosting (How do we Learn)
• Summary
Elements in Supervised Learning
• Notations: $x_i \in \mathbb{R}^d$ denotes the i-th training example
• Model: how to make the prediction $\hat{y}_i$ given $x_i$
  - Linear model: $\hat{y}_i = \sum_j w_j x_{ij}$ (includes linear/logistic regression)
  - The prediction score $\hat{y}_i$ can have different interpretations depending on the task
  - Linear regression: $\hat{y}_i$ is the predicted score
  - Logistic regression: $1/(1 + e^{-\hat{y}_i})$ is the predicted probability of the instance being positive
  - Others… for example, in ranking $\hat{y}_i$ can be the rank score
• Parameters: the things we need to learn from data
  - Linear model: $\Theta = \{w_j \mid j = 1, \dots, d\}$
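As a concrete illustration (a minimal sketch, not from the slides; the data and weights below are made up), the same linear score can be read directly as a regression output or pushed through a sigmoid to get a probability:

```python
import numpy as np

def predict_score(X, w):
    """Linear model: one score per example, y_hat = X @ w."""
    return X @ w

def predict_proba(X, w):
    """Logistic regression reads the same score through a sigmoid."""
    return 1.0 / (1.0 + np.exp(-predict_score(X, w)))

# Toy data: 3 examples, 2 features; weights chosen arbitrarily.
X = np.array([[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]])
w = np.array([0.3, -0.7])
print(predict_score(X, w))  # raw scores (linear regression reading)
print(predict_proba(X, w))  # probabilities (logistic regression reading)
```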
Elements continued: Objective Function
• An objective function that is everywhere: $\text{Obj}(\Theta) = L(\Theta) + \Omega(\Theta)$
  - Training loss $L$ measures how well the model fits the training data
  - Regularization $\Omega$ measures the complexity of the model
• Loss on training data: $L = \sum_i l(y_i, \hat{y}_i)$
  - Square loss: $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$
  - Logistic loss: $l(y_i, \hat{y}_i) = y_i \ln(1 + e^{-\hat{y}_i}) + (1 - y_i)\ln(1 + e^{\hat{y}_i})$
• Regularization: how complicated is the model?
  - L2 norm: $\Omega(w) = \lambda \|w\|^2$
  - L1 norm (lasso): $\Omega(w) = \lambda \|w\|_1$
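The pieces of the objective translate directly into code. A hedged sketch matching the formulas above (function names are mine, not from the slides; labels y are assumed to be in {0, 1} for the logistic loss):

```python
import numpy as np

def square_loss(y, y_hat):
    # l(y, y_hat) = (y - y_hat)^2, summed over the training set
    return np.sum((y - y_hat) ** 2)

def logistic_loss(y, y_hat):
    # y in {0, 1}; y_hat is the raw score before the sigmoid
    return np.sum(y * np.log1p(np.exp(-y_hat)) +
                  (1 - y) * np.log1p(np.exp(y_hat)))

def l2_reg(w, lam):
    return lam * np.sum(w ** 2)

def l1_reg(w, lam):
    return lam * np.sum(np.abs(w))

# Full objective for, e.g., ridge regression:
# obj = square_loss(y, X @ w) + l2_reg(w, lam)
```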
Putting known knowledge into context
• Ridge regression:
  - Linear model, square loss, L2 regularization
• Lasso:
  - Linear model, square loss, L1 regularization
• Logistic regression:
  - Linear model, logistic loss, L2 regularization
• The conceptual separation between model, parameters, and objective also gives you engineering benefits.
  - Think of how you can implement SGD for both ridge regression and logistic regression (a sketch follows below)
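Here is one way that exercise could look (a sketch under my own assumptions about step size and epochs, not the lecture's code): a single SGD loop serves both models, because only the gradient of the loss differs.

```python
import numpy as np

def grad_square(x_i, y_i, w):
    # d/dw of (y - w.x)^2  ->  -2 * (y - w.x) * x
    return -2.0 * (y_i - x_i @ w) * x_i

def grad_logistic(x_i, y_i, w):
    # d/dw of the logistic loss with y in {0, 1} simplifies to (p - y) * x
    p = 1.0 / (1.0 + np.exp(-(x_i @ w)))
    return (p - y_i) * x_i

def sgd(X, y, grad_loss, lam=0.1, lr=0.01, epochs=10):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            # loss gradient plus L2 regularization gradient (2 * lam * w)
            w -= lr * (grad_loss(x_i, y_i, w) + 2 * lam * w)
    return w

# Swap the loss gradient, keep everything else:
# w_ridge    = sgd(X, y, grad_square)
# w_logistic = sgd(X, y, grad_logistic)
```

Only the loss gradient changes between the two models; the loop, the regularization term, and the parameter representation stay identical, which is exactly the engineering benefit the separation buys.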
Objective and Bias-Variance Trade-off
• Recall $\text{Obj}(\Theta) = L(\Theta) + \Omega(\Theta)$: the training loss measures how well the model fits the training data; the regularization measures the complexity of the model
• Why do we want two components in the objective?
• Optimizing the training loss encourages predictive models
  - Fitting the training data well at least gets you close to the training distribution, which is hopefully close to the underlying distribution
• Optimizing the regularization term encourages simple models
  - Simpler models tend to have smaller variance in future predictions, making the predictions stable
Outline
• Review of key concepts of supervised learning
• Regression Tree and Ensemble (What are we Learning)
• Gradient Boosting (How do we Learn)
• Summary
Regression Tree (CART)
• Regression tree (also known as classification and regression tree):
  - Decision rules are the same as in a decision tree
  - Contains one score in each leaf
• Input: age, gender, occupation, …
[Figure: example tree for "Does the person like computer games?". The root splits on age < 15: if yes, split on "is male?" (yes: leaf +2, no: leaf +0.1); if no: leaf −1. Each leaf stores a prediction score.]
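The example tree in the figure can be written as nested decision rules that route an input to a leaf, where each leaf holds a single score. A tiny sketch (field names mirror the figure; the structure is illustrative, not CART-learning code):

```python
def tree_score(person):
    # Decision rules route the input to exactly one leaf score.
    if person["age"] < 15:
        if person["is_male"]:
            return +2.0
        return +0.1
    return -1.0

print(tree_score({"age": 10, "is_male": True}))   # +2.0
print(tree_score({"age": 10, "is_male": False}))  # +0.1
print(tree_score({"age": 40, "is_male": True}))   # -1.0
```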