Winning Kaggle
Competitions
Hendrik Jacob van Veen - Nubank Brasil
About Kaggle
Biggest platform for competitive data science in the world
Currently 500k+ competitors
Great platform to learn about the latest techniques and to avoid overfitting
Great platform to share and meet up with other data freaks
Approach
Get a good score as fast as possible
Using versatile libraries
Model ensembling
Get a good score as fast as possible
Get the raw data into a universal format like SVMlight or NumPy arrays.
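As a sketch of this step using scikit-learn's built-in SVMlight helpers (the toy array stands in for real parsed competition data):

```python
import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Toy matrix standing in for parsed raw competition data.
X = np.array([[1.0, 0.0, 3.5],
              [0.0, 2.0, 0.0]])
y = np.array([1, 0])

# Persist once in SVMlight format; later experiments just load this file.
dump_svmlight_file(X, y, "train.svmlight")

# Comes back as a SciPy sparse matrix usable across the toolchain
# (scikit-learn, XGBoost, and VW all read this format).
X_loaded, y_loaded = load_svmlight_file("train.svmlight")
```

Converting once up front means every later iteration skips the parsing step, which supports the fail-fast loop described next.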
Failing fast and failing often / Agile sprint / Iteration
Sub-linear debugging:
“output enough intermediate information as a calculation is progressing to determine before it finishes whether you've injected a major defect or a significant improvement.” - Paul Mineiro
Using versatile libraries
Scikit-learn
Vowpal Wabbit
XGBoost
Keras
Other tools get Scikit-learn API wrappers
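The pattern behind such wrappers is small: inherit from `BaseEstimator` and expose `fit`/`predict`. A minimal sketch (the `MedianRegressor` below is a made-up stand-in for an external tool being wrapped):

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import cross_val_score

class MedianRegressor(BaseEstimator, RegressorMixin):
    """Toy estimator that predicts the training median everywhere.
    The same fit/predict skeleton is how external tools get wrapped."""

    def fit(self, X, y):
        self.median_ = np.median(y)
        return self

    def predict(self, X):
        return np.full(len(X), self.median_)

# Because it follows the scikit-learn API, it composes with the
# rest of the ecosystem, e.g. cross-validation:
X = np.arange(20).reshape(-1, 1)
y = np.arange(20, dtype=float)
scores = cross_val_score(MedianRegressor(), X, y, cv=2)
```

Once a tool speaks this interface, it plugs into pipelines, grid search, and the ensembling machinery below without special-casing.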
Model Ensembling
Voting
Averaging
Bagging
Boosting
Binning
Blending
Stacking
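Averaging and voting, the simplest of these, can be sketched in a few lines (the probability vectors below are toy stand-ins for real model outputs):

```python
import numpy as np

# Predicted class-1 probabilities from three hypothetical models,
# one row per model, one column per test sample.
preds = np.array([[0.9, 0.1, 0.8],
                  [0.7, 0.3, 0.6],
                  [0.8, 0.2, 0.9]])

# Averaging: mean of the predicted probabilities per sample.
avg = preds.mean(axis=0)

# Voting: threshold each model at 0.5, then take the majority class.
votes = (preds > 0.5).sum(axis=0)
majority = (votes > preds.shape[0] / 2).astype(int)
```

The other methods build on the same idea: bagging and boosting vary the training data or weights, while blending and stacking train a second-level model on these first-level predictions.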
General Strategy
Try to create “machine learning”-learning algorithms with optimized pipelines that are:
Data agnostic (Sparse, dense, missing values, larger than memory)
Problem agnostic (Classification, regression, clustering)
Solution agnostic (Production-ready, PoC, latency)
Automated (Turn on and go to bed)
Memory-friendly (Don’t want to pay for AWS)
Robust (Good generalization, concept drift, consistent)
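One concrete way to cover part of the data-agnostic requirement with scikit-learn is a pipeline that tolerates missing values and keeps sparse input sparse (a sketch on toy data; the component choices are illustrative, not the author's exact setup):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Toy data with missing values standing in for messy raw features.
X = np.array([[1.0, np.nan],
              [2.0, 0.5],
              [np.nan, 1.5],
              [3.0, 2.0]])
y = np.array([0, 0, 1, 1])

pipe = make_pipeline(
    SimpleImputer(strategy="median"),   # handles missing values
    StandardScaler(with_mean=False),    # with_mean=False avoids densifying sparse input
    LogisticRegression(),
)
pipe.fit(X, y)
```

Wrapping every preprocessing decision inside the pipeline is also what makes the “turn on and go to bed” automation feasible: one object to fit, tune, and persist.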
First Overview I
Classification? Regression?
Evaluation Metric
Description
Benchmark code
“Predict human activities based on their smartphone usage. Predict if a user is sitting, walking, etc.” - Smartphone User Activity Prediction
“Given the HTML of ~337k websites served to users of StumbleUpon, identify the paid content disguised as real content.” - Dato Truly Native?