Preface
Festschrift
Credits
Contents
List of Contributors
Acronyms
Part I History of Statistical Learning Theory
Reference
1 In Hindsight: Doklady Akademii Nauk SSSR, 181(4), 1968
References
2 On the Uniform Convergence of the Frequencies of Occurrence of Events to Their Probabilities
2.1 Introduction
2.2 Statement of the Problem
2.3 Some Additional Definitions
2.4 A Property of the Growth Function
2.5 Sufficient Conditions for Uniform Convergence Not Depending on Properties of the Distribution
2.6 Necessary and Sufficient Conditions for the Uniform Convergence of Frequencies to Probabilities
Reference
3 Early History of Support Vector Machines
3.1 The "Generalised Portrait" Method for Pattern Recognition (1962)
3.1.1 Initial Heuristic Ideas
3.1.2 Reducing to a Convex (Quadratic) Programming Problem
3.1.3 Generalisation of the Idea
3.2 The Kuhn–Tucker Theorem and Its Application to the "Generalised Portrait" Method
3.3 Searching for Coefficients Instead of Coordinates
3.4 Optimum (Largest Margin) Hyperplane
3.5 Lagrangian Dual Function
3.6 Generalisation Properties
3.7 Support Vector Machines
References
Part II Theory and Practice of Statistical Learning Theory
4 Some Remarks on the Statistical Analysis of SVMs and Related Methods
4.1 Introduction
4.2 Mathematical Prerequisites
4.3 Universal Consistency
4.3.1 Binary Classification
4.3.2 Least Squares Regression
4.4 Learning Rates
4.4.1 Binary Classification
4.4.2 Least Squares Regression
References
5 Explaining AdaBoost
5.1 Introduction
5.2 Direct Application of VC Theory
5.3 The Margins Explanation
5.4 Loss Minimization
5.5 Regularization
5.6 Inherently Unpredictable Data
5.7 Conclusions
References
6 On the Relations and Differences Between Popper Dimension, Exclusion Dimension and VC-Dimension
6.1 Introduction
6.2 Setting
6.3 Definitions with Logical Quantifiers
6.4 Relations
6.5 Discussion
References
7 On Learnability, Complexity and Stability
7.1 Introduction
7.2 Supervised Learning, Consistency and Learnability
7.2.1 Consistency and Learnability
7.3 Learnability of a Hypothesis Space
7.3.1 Complexity and Learnability
7.3.2 Stability and Learnability
7.4 Learnability in the General Learning Setting
7.4.1 Vapnik's Approach and Non-trivial Consistency
7.4.2 Stability and Learnability for General Learning
7.5 Discussion
References
8 Loss Functions
8.1 Introduction
8.2 Proper Losses and Their Representations
8.3 Divergences and the Bridge to Proper Losses
8.4 Surrogate Losses
8.5 Extension to Multiclass Losses and Divergences
8.6 Parameterisations, Links and Convexity
8.7 Effect of Losses on Statistical Convergence
8.8 Conclusion
References
9 Statistical Learning Theory in Practice
9.1 Introduction
9.2 Practical Issues with the Classical Approach
9.3 A Short Review of Existing Learning Models
9.3.1 Linear Models
9.3.2 Embedding Models
9.3.3 Nearest Neighbour
9.3.4 SVMs and Kernel Methods
9.3.5 Neural Networks
9.3.6 Further Methods
9.4 Practical Challenges on Real Datasets
References
10 PAC-Bayesian Theory
10.1 Introduction
10.2 Basic PAC-Bayesian Theorems
10.3 Catoni's Basic Inequality
10.4 Catoni's Localization Theorem
10.5 Discussion
References
11 Kernel Ridge Regression
11.1 Introduction
11.2 Kernel Ridge Regression
11.3 Kernel Ridge Regression Without Probability
11.3.1 Ordinary Ridge Regression
11.3.2 Kernel Ridge Regression
11.4 Kernel Ridge Regression in Conformal Prediction
References
12 Multi-task Learning for Computational Biology: Overview and Outlook
12.1 Introduction
12.2 Multi-task Learning
12.2.1 Relation to Transfer Learning
12.2.2 Regularization-Based Multi-task Learning
12.2.2.1 Common Approaches
12.2.3 Learning Task Similarities
12.2.3.1 A Simple Approach
12.2.3.2 Multi-task Relationship Learning (MTRL)
12.2.3.3 Multi-task Multiple Kernel Learning (MT-MKL)
12.2.3.4 Hierarchical MT-MKL
12.2.4 When Does MTL Pay Off?
12.3 Application in Computational Biology
12.4 Conclusion
References
13 Semi-supervised Learning in Causal and Anticausal Settings
13.1 Introduction
13.2 Causal Inference
13.3 Semi-supervised Learning
13.4 Empirical Results
13.5 Conclusion
References
14 Strong Universal Consistent Estimate of the Minimum Mean Squared Error
14.1 Introduction
14.2 Strong Universal Consistency
14.3 Rate of Convergence
References
15 The Median Hypothesis
15.1 Introduction
15.2 Relation to Previous Work
15.3 The Hypothesis Depth: Definitions and Properties
15.3.1 Depth for Linear Classifiers
15.3.2 Breakdown Point
15.4 Measuring Depth
15.4.1 Finding the Median
15.4.2 Implementation Issues
15.5 Discussion
References
16 Efficient Transductive Online Learning via Randomized Rounding
16.1 Introduction
16.2 The Minimax Forecaster
16.3 The R2 Forecaster
16.4 Application 1: Transductive Online Learning
16.5 Application 2: Online Collaborative Filtering
Appendix: Derivation of the Minimax Forecaster
References
17 Pivotal Estimation in High-Dimensional Regression via Linear Programming
17.1 Introduction
17.2 Notation
17.3 The Estimator
17.4 Sensitivity Characteristics
17.5 Bounds on the Estimation and Prediction Errors
References
18 On Sparsity Inducing Regularization Methods for Machine Learning
18.1 Introduction
18.2 Fixed Point Algorithms Based on Proximity Operators
18.2.1 Notation and Problem Formulation
18.2.2 Computation of a Generalized Proximity Operator with a Fixed Point Method
18.2.3 Connection to the Forward-Backward Algorithm
18.3 Examples of Composite Functions
18.4 Application to Support Vector Machines
18.5 Conclusion
References
19 Sharp Oracle Inequalities in Low Rank Estimation
19.1 Main Result
19.2 Proof of Theorem 19.1
References
20 On the Consistency of the Bootstrap Approach for Support Vector Machines and Related Kernel-Based Methods
20.1 Introduction
20.2 Support Vector Machines
20.3 Consistency of Bootstrap SVMs
20.4 Proofs
20.4.1 Tools for the Proof of Theorem 20.1
20.4.2 Proof of Theorem 20.1
References
21 Kernels, Pre-images and Optimization
21.1 Introduction
21.2 The Kernel Trick
21.3 Kernel PCA
21.4 Pre-images
21.5 Pre-images for Gradient-Based Optimization
21.5.1 Example: Shape Optimization
21.5.2 Origin of the "Noise"
21.5.3 Optimization Constrained to the Data Manifold
21.5.4 Applications in Density Functional Theory
21.6 Conclusion
References
22 Efficient Learning of Sparse Ranking Functions
22.1 Introduction
22.2 Problem Setting
22.3 An Efficient Coordinate Descent Algorithm
22.4 Extensions
22.5 Experiments
22.6 Conclusions
References
23 Direct Approximation of Divergences Between Probability Distributions
23.1 Introduction
23.2 Divergence Measures
23.3 Direct Divergence Approximation
23.4 Usage of Divergence Estimators in Machine Learning
23.5 Conclusions
References
Index
Empirical Inference
Bernhard Schölkopf • Zhiyuan Luo • Vladimir Vovk (Editors)
Empirical Inference: Festschrift in Honor of Vladimir N. Vapnik
Editors:
Bernhard Schölkopf, Max Planck Institute for Intelligent Systems, Tübingen, Germany
Zhiyuan Luo and Vladimir Vovk, Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, United Kingdom

ISBN 978-3-642-41135-9
ISBN 978-3-642-41136-6 (eBook)
DOI 10.1007/978-3-642-41136-6
Springer Heidelberg New York Dordrecht London
Library of Congress Control Number: 2013954388

© Springer-Verlag Berlin Heidelberg 2013

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)
Fig. 1 Some participants in the Empirical Inference Symposium (Photo taken by Robert Williamson)
Fig. 2 Vladimir listening to a talk (Photo by Robert Williamson)
Fig. 3 Vladimir delivering a talk (Photo by Robert Williamson)
Preface

Vladimir Vapnik is a rare example of a scientist for whom the following three statements hold true simultaneously: his work has led to the inception of a whole new field of research, he has lived to see the field blossom and gain in popularity, and he is still as active as ever in his field.

His field, the theory of statistical learning and empirical inference, did not exist when he started his PhD in Moscow in the early 1960s. He was working at the Institute of Control Sciences of the Russian Academy of Sciences under the supervision of Professor Aleksandr Lerner, a cyberneticist and expert in control theory who was later to become a prominent "Refusenik" (a Soviet Jew who was denied permission to emigrate by the Soviet government). Vladimir Vapnik started analyzing learning algorithms and invented the first version of a pattern recognition algorithm termed the "Generalized Portrait", whose successor (the "Support Vector Machine", co-invented by him 30 years later) would become the method of choice in many pattern recognition problems ranging from computer vision to bioinformatics. Following this, he started to collaborate with Aleksey Chervonenkis, also a PhD student in Lerner's laboratory at the time, on the Generalized Portrait and the theory of the empirical risk minimization inductive principle.

Vapnik and Chervonenkis found a stimulating intellectual environment at the Institute of Control Sciences. In 1951 Vadim Trapeznikov was appointed as the institute's director, and it is largely due to his efforts that in the early 1960s it became a hub of new ideas and creativity. It included Mark Aizerman's laboratory, working on the theory of learning systems and having Emmanuel Braverman and Lev Rozonoer among its members, as well as Lerner's laboratory, where Vapnik and Chervonenkis carried out their work that would lead to profound changes in our understanding of machine learning.

The impact of Vapnik and Chervonenkis's work has been considerable in many areas of mathematics and computer science. In theoretical statistics and probability theory, their work is known for extensions of the Glivenko–Cantelli theorem and for uniform laws of large numbers. The latter can be considered as the starting point of an important branch of probability theory, the theory of empirical processes.