Title page
Copyright
Table of contents
Preface
1 Introduction
1.1 Teaching a computer to distinguish cats from dogs
1.1.1 The pipeline of a typical machine learning problem
1.2 Predictive learning problems
1.2.1 Regression
1.2.2 Classification
1.3 Feature design
1.4 Numerical optimization
1.5 Summary
Part I: Fundamental tools and concepts
Overview of Part I
2 Fundamentals of numerical optimization
2.1 Calculus-defined optimality
2.1.1 Taylor series approximations
2.1.2 The first order condition for optimality
2.1.3 The convenience of convexity
2.2 Numerical methods for optimization
2.2.1 The big picture
2.2.2 Stopping condition
2.2.3 Gradient descent
2.2.4 Newton’s method
2.3 Summary
2.4 Exercises
3 Regression
3.1 The basics of linear regression
3.1.1 Notation and modeling
3.1.2 The Least Squares cost function for linear regression
3.1.3 Minimization of the Least Squares cost function
3.1.4 The efficacy of a learned model
3.1.5 Predicting the value of new input data
3.2 Knowledge-driven feature design for regression
3.2.1 General conclusions
3.3 Nonlinear regression and ℓ2 regularization
3.3.1 Logistic regression
3.3.2 Non-convex cost functions and ℓ2 regularization
3.4 Summary
3.5 Exercises
4 Classification
4.1 The perceptron cost functions
4.1.1 The basic perceptron model
4.1.2 The softmax cost function
4.1.3 The margin perceptron
4.1.4 Differentiable approximations to the margin perceptron
4.1.5 The accuracy of a learned classifier
4.1.6 Predicting the value of new input data
4.1.7 Which cost function produces the best results?
4.1.8 The connection between the perceptron and counting costs
4.2 The logistic regression perspective on the softmax cost
4.2.1 Step functions and classification
4.2.2 Convex logistic regression
4.3 The support vector machine perspective on the margin perceptron
4.3.1 A quest for the hyperplane with maximum margin
4.3.2 The hard-margin SVM problem
4.3.3 The soft-margin SVM problem
4.3.4 Support vector machines and logistic regression
4.4 Multiclass classification
4.4.1 One-versus-all multiclass classification
4.4.2 Multiclass softmax classification
4.4.3 The accuracy of a learned multiclass classifier
4.4.4 Which multiclass classification scheme works best?
4.5 Knowledge-driven feature design for classification
4.5.1 General conclusions
4.6 Histogram features for real data types
4.6.1 Histogram features for text data
4.6.2 Histogram features for image data
4.6.3 Histogram features for audio data
4.7 Summary
4.8 Exercises
Part II: Tools for fully data-driven machine learning
Overview of Part II
5 Automatic feature design for regression
5.1 Automatic feature design for the ideal regression scenario
5.1.1 Vector approximation
5.1.2 From vectors to continuous functions
5.1.3 Continuous function approximation
5.1.4 Common bases for continuous function approximation
5.1.5 Recovering weights
5.1.6 Graphical representation of a neural network
5.2 Automatic feature design for the real regression scenario
5.2.1 Approximation of discretized continuous functions
5.2.2 The real regression scenario
5.3 Cross-validation for regression
5.3.1 Diagnosing the problem of overfitting/underfitting
5.3.2 Hold-out cross-validation
5.3.3 Hold-out calculations
5.3.4 k-fold cross-validation
5.4 Which basis works best?
5.4.1 Understanding of the phenomenon underlying the data
5.4.2 Practical considerations
5.4.3 When the choice of basis is arbitrary
5.5 Summary
5.6 Exercises
6 Automatic feature design for classification
6.1 Automatic feature design for the ideal classification scenario
6.1.1 Approximation of piecewise continuous functions
6.1.2 The formal definition of an indicator function
6.1.3 Indicator function approximation
6.1.4 Recovering weights
6.2 Automatic feature design for the real classification scenario
6.2.1 Approximation of discretized indicator functions
6.2.2 The real classification scenario
6.2.3 Classifier accuracy and boundary definition
6.3 Multiclass classification
6.3.1 One-versus-all multiclass classification
6.3.2 Multiclass softmax classification
6.4 Cross-validation for classification
6.4.1 Hold-out cross-validation
6.4.2 Hold-out calculations
6.4.3 k-fold cross-validation
6.4.4 k-fold cross-validation for one-versus-all multiclass classification
6.5 Which basis works best?
6.6 Summary
6.7 Exercises
7 Kernels, backpropagation, and regularized cross-validation
7.1 Fixed feature kernels
7.1.1 The fundamental theorem of linear algebra
7.1.2 Kernelizing cost functions
7.1.3 The value of kernelization
7.1.4 Examples of kernels
7.1.5 Kernels as similarity matrices
7.2 The backpropagation algorithm
7.2.1 Computing the gradient of a two-layer network cost function
7.2.2 Three-layer neural network gradient calculations
7.2.3 Gradient descent with momentum
7.3 Cross-validation via ℓ2 regularization
7.3.1 ℓ2 regularization and cross-validation
7.3.2 Regularized k-fold cross-validation for regression
7.3.3 Regularized cross-validation for classification
7.4 Summary
7.5 Further kernel calculations
7.5.1 Kernelizing various cost functions
7.5.2 Fourier kernel calculations – scalar input
7.5.3 Fourier kernel calculations – vector input
Part III: Methods for large scale machine learning
Overview of Part III
8 Advanced gradient schemes
8.1 Fixed step length rules for gradient descent
8.1.1 Gradient descent and simple quadratic surrogates
8.1.2 Functions with bounded curvature and optimally conservative step length rules
8.1.3 How to use the conservative fixed step length rule
8.2 Adaptive step length rules for gradient descent
8.2.1 Adaptive step length rule via backtracking line search
8.2.2 How to use the adaptive step length rule
8.3 Stochastic gradient descent
8.3.1 Decomposing the gradient
8.3.2 The stochastic gradient descent iteration
8.3.3 The value of stochastic gradient descent
8.3.4 Step length rules for stochastic gradient descent
8.3.5 How to use the stochastic gradient method in practice
8.4 Convergence proofs for gradient descent schemes
8.4.1 Convergence of gradient descent with Lipschitz constant fixed step length
8.4.2 Convergence of gradient descent with backtracking line search
8.4.3 Convergence of the stochastic gradient method
8.4.4 Convergence rate of gradient descent for convex functions with fixed step length
8.5 Calculation of computable Lipschitz constants
8.6 Summary
8.7 Exercises
9 Dimension reduction techniques
9.1 Techniques for data dimension reduction
9.1.1 Random subsampling
9.1.2 K-means clustering
9.1.3 Optimization of the K-means problem
9.2 Principal component analysis
9.3 Recommender systems
9.3.1 Matrix completion setup
9.3.2 Optimization of the matrix completion model
9.4 Summary
9.5 Exercises
Part IV: Appendices
A: Basic vector and matrix operations
B: Basics of vector calculus
C: Fundamental matrix factorizations and the pseudo-inverse
D: Convex geometry
References
Index