Preface
Festschrift
Credits
Contents
List of Contributors
Acronyms
Part I History of Statistical Learning Theory
Reference
1 In Hindsight: Doklady Akademii Nauk SSSR, 181(4), 1968
References
2 On the Uniform Convergence of the Frequencies of Occurrence of Events to Their Probabilities
2.1 Introduction
2.2 Statement of the Problem
2.3 Some Additional Definitions
2.4 A Property of the Growth Function
2.5 Sufficient Conditions for Uniform Convergence Not Depending on Properties of the Distribution
2.6 Necessary and Sufficient Conditions for the Uniform Convergence of Frequencies to Probabilities
Reference
3 Early History of Support Vector Machines
3.1 The "Generalised Portrait" Method for Pattern Recognition (1962)
3.1.1 Initial Heuristic Ideas
3.1.2 Reducing to a Convex (Quadratic) Programming Problem
3.1.3 Generalisation of the Idea
3.2 The Kuhn–Tucker Theorem and Its Application to the "Generalised Portrait" Method
3.3 Searching for Coefficients Instead of Coordinates
3.4 Optimum (Largest Margin) Hyperplane
3.5 Lagrangian Dual Function
3.6 Generalisation Properties
3.7 Support Vector Machines
References
Part II Theory and Practice of Statistical Learning Theory
4 Some Remarks on the Statistical Analysis of SVMs and Related Methods
4.1 Introduction
4.2 Mathematical Prerequisites
4.3 Universal Consistency
4.3.1 Binary Classification
4.3.2 Least Squares Regression
4.4 Learning Rates
4.4.1 Binary Classification
4.4.2 Least Squares Regression
References
5 Explaining AdaBoost
5.1 Introduction
5.2 Direct Application of VC Theory
5.3 The Margins Explanation
5.4 Loss Minimization
5.5 Regularization
5.6 Inherently Unpredictable Data
5.7 Conclusions
References
6 On the Relations and Differences Between Popper Dimension, Exclusion Dimension and VC-Dimension
6.1 Introduction
6.2 Setting
6.3 Definitions with Logical Quantifiers
6.4 Relations
6.5 Discussion
References
7 On Learnability, Complexity and Stability
7.1 Introduction
7.2 Supervised Learning, Consistency and Learnability
7.2.1 Consistency and Learnability
7.3 Learnability of a Hypothesis Space
7.3.1 Complexity and Learnability
7.3.2 Stability and Learnability
7.4 Learnability in the General Learning Setting
7.4.1 Vapnik's Approach and Non-trivial Consistency
7.4.2 Stability and Learnability for General Learning
7.5 Discussion
References
8 Loss Functions
8.1 Introduction
8.2 Proper Losses and Their Representations
8.3 Divergences and the Bridge to Proper Losses
8.4 Surrogate Losses
8.5 Extension to Multiclass Losses and Divergences
8.6 Parameterisations, Links and Convexity
8.7 Effect of Losses on Statistical Convergence
8.8 Conclusion
References
9 Statistical Learning Theory in Practice
9.1 Introduction
9.2 Practical Issues with the Classical Approach
9.3 A Short Review of Existing Learning Models
9.3.1 Linear Models
9.3.2 Embedding Models
9.3.3 Nearest Neighbour
9.3.4 SVMs and Kernel Methods
9.3.5 Neural Networks
9.3.6 Further Methods
9.4 Practical Challenges on Real Datasets
References
10 PAC-Bayesian Theory
10.1 Introduction
10.2 Basic PAC-Bayesian Theorems
10.3 Catoni's Basic Inequality
10.4 Catoni's Localization Theorem
10.5 Discussion
References
11 Kernel Ridge Regression
11.1 Introduction
11.2 Kernel Ridge Regression
11.3 Kernel Ridge Regression Without Probability
11.3.1 Ordinary Ridge Regression
11.3.2 Kernel Ridge Regression
11.4 Kernel Ridge Regression in Conformal Prediction
References
12 Multi-task Learning for Computational Biology: Overview and Outlook
12.1 Introduction
12.2 Multi-task Learning
12.2.1 Relation to Transfer Learning
12.2.2 Regularization-Based Multi-task Learning
12.2.2.1 Common Approaches
12.2.3 Learning Task Similarities
12.2.3.1 A Simple Approach
12.2.3.2 Multi-task Relationship Learning (MTRL)
12.2.3.3 Multi-task Multiple Kernel Learning (MT-MKL)
12.2.3.4 Hierarchical MT-MKL
12.2.4 When Does MTL Pay Off?
12.3 Application in Computational Biology
12.4 Conclusion
References
13 Semi-supervised Learning in Causal and Anticausal Settings
13.1 Introduction
13.2 Causal Inference
13.3 Semi-supervised Learning
13.4 Empirical Results
13.5 Conclusion
References
14 Strong Universal Consistent Estimate of the Minimum Mean Squared Error
14.1 Introduction
14.2 Strong Universal Consistency
14.3 Rate of Convergence
References
15 The Median Hypothesis
15.1 Introduction
15.2 Relation to Previous Work
15.3 The Hypothesis Depth: Definitions and Properties
15.3.1 Depth for Linear Classifiers
15.3.2 Breakdown Point
15.4 Measuring Depth
15.4.1 Finding the Median
15.4.2 Implementation Issues
15.5 Discussion
References
16 Efficient Transductive Online Learning via Randomized Rounding
16.1 Introduction
16.2 The Minimax Forecaster
16.3 The R2 Forecaster
16.4 Application 1: Transductive Online Learning
16.5 Application 2: Online Collaborative Filtering
Appendix: Derivation of the Minimax Forecaster
References
17 Pivotal Estimation in High-Dimensional Regression via Linear Programming
17.1 Introduction
17.2 Notation
17.3 The Estimator
17.4 Sensitivity Characteristics
17.5 Bounds on the Estimation and Prediction Errors
References
18 On Sparsity Inducing Regularization Methods for Machine Learning
18.1 Introduction
18.2 Fixed Point Algorithms Based on Proximity Operators
18.2.1 Notation and Problem Formulation
18.2.2 Computation of a Generalized Proximity Operator with a Fixed Point Method
18.2.3 Connection to the Forward-Backward Algorithm
18.3 Examples of Composite Functions
18.4 Application to Support Vector Machines
18.5 Conclusion
References
19 Sharp Oracle Inequalities in Low Rank Estimation
19.1 Main Result
19.2 Proof of Theorem 19.1
References
20 On the Consistency of the Bootstrap Approach for Support Vector Machines and Related Kernel-Based Methods
20.1 Introduction
20.2 Support Vector Machines
20.3 Consistency of Bootstrap SVMs
20.4 Proofs
20.4.1 Tools for the Proof of Theorem 20.1
20.4.2 Proof of Theorem 20.1
References
21 Kernels, Pre-images and Optimization
21.1 Introduction
21.2 The Kernel Trick
21.3 Kernel PCA
21.4 Pre-images
21.5 Pre-images for Gradient-Based Optimization
21.5.1 Example: Shape Optimization
21.5.2 Origin of the "Noise"
21.5.3 Optimization Constrained to the Data Manifold
21.5.4 Applications in Density Functional Theory
21.6 Conclusion
References
22 Efficient Learning of Sparse Ranking Functions
22.1 Introduction
22.2 Problem Setting
22.3 An Efficient Coordinate Descent Algorithm
22.4 Extensions
22.5 Experiments
22.6 Conclusions
References
23 Direct Approximation of Divergences Between Probability Distributions
23.1 Introduction
23.2 Divergence Measures
23.3 Direct Divergence Approximation
23.4 Usage of Divergence Estimators in Machine Learning
23.5 Conclusions
References
Index