Machine Learning Refined: Foundations, Algorithms, and Applications

Title page
Copyright
Table of contents
Preface
1 Introduction
1.1 Teaching a computer to distinguish cats from dogs
1.1.1 The pipeline of a typical machine learning problem
1.2 Predictive learning problems
1.2.1 Regression
1.2.2 Classification
1.3 Feature design
1.4 Numerical optimization
1.5 Summary
Part I: Fundamental tools and concepts
Overview of Part I
2 Fundamentals of numerical optimization
2.1 Calculus-defined optimality
2.1.1 Taylor series approximations
2.1.2 The first order condition for optimality
2.1.3 The convenience of convexity
2.2 Numerical methods for optimization
2.2.1 The big picture
2.2.2 Stopping condition
2.2.3 Gradient descent
2.2.4 Newton’s method
2.3 Summary
2.4 Exercises
3 Regression
3.1 The basics of linear regression
3.1.1 Notation and modeling
3.1.2 The Least Squares cost function for linear regression
3.1.3 Minimization of the Least Squares cost function
3.1.4 The efficacy of a learned model
3.1.5 Predicting the value of new input data
3.2 Knowledge-driven feature design for regression
3.2.1 General conclusions
3.3 Nonlinear regression and l2 regularization
3.3.1 Logistic regression
3.3.2 Non-convex cost functions and l2 regularization
3.4 Summary
3.5 Exercises
4 Classification
4.1 The perceptron cost functions
4.1.1 The basic perceptron model
4.1.2 The softmax cost function
4.1.3 The margin perceptron
4.1.4 Differentiable approximations to the margin perceptron
4.1.5 The accuracy of a learned classifier
4.1.6 Predicting the value of new input data
4.1.7 Which cost function produces the best results?
4.1.8 The connection between the perceptron and counting costs
4.2 The logistic regression perspective on the softmax cost
4.2.1 Step functions and classification
4.2.2 Convex logistic regression
4.3 The support vector machine perspective on the margin perceptron
4.3.1 A quest for the hyperplane with maximum margin
4.3.2 The hard-margin SVM problem
4.3.3 The soft-margin SVM problem
4.3.4 Support vector machines and logistic regression
4.4 Multiclass classification
4.4.1 One-versus-all multiclass classification
4.4.2 Multiclass softmax classification
4.4.3 The accuracy of a learned multiclass classifier
4.4.4 Which multiclass classification scheme works best?
4.5 Knowledge-driven feature design for classification
4.5.1 General conclusions
4.6 Histogram features for real data types
4.6.1 Histogram features for text data
4.6.2 Histogram features for image data
4.6.3 Histogram features for audio data
4.7 Summary
4.8 Exercises
Part II: Tools for fully data-driven machine learning
Overview of Part II
5 Automatic feature design for regression
5.1 Automatic feature design for the ideal regression scenario
5.1.1 Vector approximation
5.1.2 From vectors to continuous functions
5.1.3 Continuous function approximation
5.1.4 Common bases for continuous function approximation
5.1.5 Recovering weights
5.1.6 Graphical representation of a neural network
5.2 Automatic feature design for the real regression scenario
5.2.1 Approximation of discretized continuous functions
5.2.2 The real regression scenario
5.3 Cross-validation for regression
5.3.1 Diagnosing the problem of overfitting/underfitting
5.3.2 Hold out cross-validation
5.3.3 Hold out calculations
5.3.4 k-fold cross-validation
5.4 Which basis works best?
5.4.1 Understanding of the phenomenon underlying the data
5.4.2 Practical considerations
5.4.3 When the choice of basis is arbitrary
5.5 Summary
5.6 Exercises
5.7 Notes on continuous function approximation
6 Automatic feature design for classification
6.1 Automatic feature design for the ideal classification scenario
6.1.1 Approximation of piecewise continuous functions
6.1.2 The formal definition of an indicator function
6.1.3 Indicator function approximation
6.1.4 Recovering weights
6.2 Automatic feature design for the real classification scenario
6.2.1 Approximation of discretized indicator functions
6.2.2 The real classification scenario
6.2.3 Classifier accuracy and boundary definition
6.3 Multiclass classification
6.3.1 One-versus-all multiclass classification
6.3.2 Multiclass softmax classification
6.4 Cross-validation for classification
6.4.1 Hold out cross-validation
6.4.2 Hold out calculations
6.4.3 k-fold cross-validation
6.4.4 k-fold cross-validation for one-versus-all multiclass classification
6.5 Which basis works best?
6.6 Summary
6.7 Exercises
7 Kernels, backpropagation, and regularized cross-validation
7.1 Fixed feature kernels
7.1.1 The fundamental theorem of linear algebra
7.1.2 Kernelizing cost functions
7.1.3 The value of kernelization
7.1.4 Examples of kernels
7.1.5 Kernels as similarity matrices
7.2 The backpropagation algorithm
7.2.1 Computing the gradient of a two layer network cost function
7.2.2 Three layer neural network gradient calculations
7.2.3 Gradient descent with momentum
7.3 Cross-validation via l2 regularization
7.3.1 l2 regularization and cross-validation
7.3.2 Regularized k-fold cross-validation for regression
7.3.3 Regularized cross-validation for classification
7.4 Summary
7.5 Further kernel calculations
7.5.1 Kernelizing various cost functions
7.5.2 Fourier kernel calculations – scalar input
7.5.3 Fourier kernel calculations – vector input
Part III: Methods for large scale machine learning
Overview of Part III
8 Advanced gradient schemes
8.1 Fixed step length rules for gradient descent
8.1.1 Gradient descent and simple quadratic surrogates
8.1.2 Functions with bounded curvature and optimally conservative step length rules
8.1.3 How to use the conservative fixed step length rule
8.2 Adaptive step length rules for gradient descent
8.2.1 Adaptive step length rule via backtracking line search
8.2.2 How to use the adaptive step length rule
8.3 Stochastic gradient descent
8.3.1 Decomposing the gradient
8.3.2 The stochastic gradient descent iteration
8.3.3 The value of stochastic gradient descent
8.3.4 Step length rules for stochastic gradient descent
8.3.5 How to use the stochastic gradient method in practice
8.4 Convergence proofs for gradient descent schemes
8.4.1 Convergence of gradient descent with Lipschitz constant fixed step length
8.4.2 Convergence of gradient descent with backtracking line search
8.4.3 Convergence of the stochastic gradient method
8.4.4 Convergence rate of gradient descent for convex functions with fixed step length
8.5 Calculation of computable Lipschitz constants
8.6 Summary
8.7 Exercises
9 Dimension reduction techniques
9.1 Techniques for data dimension reduction
9.1.1 Random subsampling
9.1.2 K-means clustering
9.1.3 Optimization of the K-means problem
9.2 Principal component analysis
9.3 Recommender systems
9.3.1 Matrix completion setup
9.3.2 Optimization of the matrix completion model
9.4 Summary
9.5 Exercises
Part IV: Appendices
A: Basic vector and matrix operations
B: Basics of vector calculus
C: Fundamental matrix factorizations and the pseudo-inverse
D: Convex geometry
References
Index
Machine Learning Refined

Providing a unique approach to machine learning, this text contains fresh and intuitive, yet rigorous, descriptions of all fundamental concepts necessary to conduct research, build products, tinker, and play. By prioritizing geometric intuition, algorithmic thinking, and practical real-world applications in disciplines including computer vision, natural language processing, economics, neuroscience, recommender systems, physics, and biology, this text provides readers with both a lucid understanding of foundational material as well as the practical tools needed to solve real-world problems. With in-depth Python and MATLAB/OCTAVE-based computational exercises and a complete treatment of cutting edge numerical optimization techniques, this is an essential resource for students and an ideal reference for researchers and practitioners working in machine learning, computer science, electrical engineering, signal processing, and numerical optimization.

Key features:
• A presentation built on lucid geometric intuition
• A unique treatment of state-of-the-art numerical optimization techniques
• A fused introduction to logistic regression and support vector machines
• Inclusion of feature design and learning as major topics
• An unparalleled presentation of advanced topics through the lens of function approximation
• A refined description of deep neural networks and kernel methods

Jeremy Watt received his PhD in Computer Science and Electrical Engineering from Northwestern University. His research interests lie in machine learning and computer vision, as well as numerical optimization.

Reza Borhani received his PhD in Computer Science and Electrical Engineering from Northwestern University. His research interests lie in the design and analysis of algorithms for problems in machine learning and computer vision.

Aggelos K. Katsaggelos is a professor and holder of the Joseph Cummings chair in the Department of Electrical Engineering and Computer Science at Northwestern University, where he also heads the Image and Video Processing Laboratory.
Machine Learning Refined
Foundations, Algorithms, and Applications

JEREMY WATT, REZA BORHANI, AND AGGELOS K. KATSAGGELOS
Northwestern University
University Printing House, Cambridge CB2 8BS, United Kingdom

Cambridge University Press is part of the University of Cambridge. It furthers the University's mission by disseminating knowledge in the pursuit of education, learning and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781107123526

© Cambridge University Press 2016

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2016
Printed in the United Kingdom by Clays, St Ives plc
A catalog record for this publication is available from the British Library

Library of Congress Cataloging in Publication data
Names: Watt, Jeremy, author. | Borhani, Reza. | Katsaggelos, Aggelos Konstantinos, 1956-
Title: Machine learning refined : foundations, algorithms, and applications / Jeremy Watt, Reza Borhani, Aggelos Katsaggelos.
Description: New York : Cambridge University Press, 2016.
Identifiers: LCCN 2015041122 | ISBN 9781107123526 (hardback)
Subjects: LCSH: Machine learning.
Classification: LCC Q325.5 .W38 2016 | DDC 006.3/1–dc23
LC record available at http://lccn.loc.gov/2015041122

ISBN 978-1-107-12352-6 Hardback

Additional resources for this publication at www.cambridge.org/watt

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.