Adaptive and Learning Systems for Signal Processing, Communications, and Control
E d t o ~ : Simon Huykin
Werbos / THE ROOTS OF BACKPROPAGATION: From Ordered Derivatives to Neural
Networks and political Forecasting
Krshc 5 Kanellakopoulos, and Kokotovic:
/ NONLINEAR AND ADAPTIVE CONTROL
DESIGN
Niluas and Shao / SIGNAL PROCESSING WITH ALPHA-STABLE DISTRIBUTIONS AND
APPLICATIONS
Diamantaras and Kung / PRINCIPAL COMPONENT NEURAL NETWORKS: THEORY AND
APPLICATIONS
Tao and Kokotovic:/ ADAPTIVE CONTROL OF SYSTEMS WITH ACTUATOR AND
SENSOR NONLINEARITIES
Tsoukalas / FUZZY AND NEURAL APPROACHES IN ENGINEERING Hrycej /
NEUROCONTROL: TOWARDS AN INDUSTRIAL CONTROL METHODOLOGY
Beckeman / ADAPTIVE COOPERATIVE SYSTEMS
Cherkassky and Muher / LEARNING FROM DATA: CONCEPTS, THEORY, AND METHODS
passisto and Burgess / STABILITY ANALYSIS OF DISCRETE EVENT SYSTEMS
Sinchez-~eiia and Sznaier / ROBUST SYSTEMS THEORY AND APPLICATIONS
Vapnik / STATISTICAL LEARNING THEORY
Statistical Learning Theory
A Wt LEY-INf ERSCIENCE PUBLICATfON
JOHN WLLfY & SONS, LNC,
NEWYORK 1 CCilCHESTER j WEfNHEIM I BRISBANE j StNGAPORE I TORONTO
Page iv
Disclaimer:
T h s book contains characters with chacntics. When the characters can be represented using the IS0
8859- 1 character set (http, netLibrary will represent them as
they appear in the orignal text, and most computers will be able to show the full characters
correctly. In order to keep the text searchable and readable on most computers, characters with
hacritics that are not part of the IS0 8859- 1 list will be represented without their chacritical marks.
T h s book is printed on acid-fiee paper. '
Copynght O 1998 by John Wiley & Sons, Inc. All rights reserved.
published simultaneously in Canada.
No part of t h ~ s publication may be reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopyng, recordmg, scanning or otherwise,
except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without
either the prior written permission of the publisher, or authorizahon through payment of the
appropriate per-copy fee to the Copynght Clearance Center, 222 Rosewood Drive, Danvers, MA
0 1923, (978) 7504400, fax (978) 7504744. Requests to the publisher for permission should be
addressed to the permissions Department, John Wiley & Sons, Inc., 605 Thrd Avenue, New York,
NY 10 1584012, (21 2) 85040 11, fax (212) 8504008, E-Mail: pERMREQ@WILEY.COM.
cm.--(Adaptive and learning systems for signal
Librmy of Congress Cataloging-in-Publieution Data:
Vapnik, V l a h r Naurnovich
Statistical learning theory / Vladmir N. Vapnik
p.
processing, communications, and control)
Includes bibliographcal references and index.
ISBN 0 4 7 143003-1 (cloth : alk. paper)
1 . Computational learning theory. I. Title. 11. Series.
Q325.7.V38 1998
008.3 1--dc21
97-37075
CIP
printed in the United States of America
1 0 9 8 7 6 5 4 3 2 1
Page v
In memory ofmy father
Page v i ~
CONTENTS
Preface
Introduction: The Prohlen~ of Induction and Statidical Inference
0 I Learning Paradigm in Statistics
0 2 Two Approaches to Statistical Inference Particular (I)arametnc Inference) and General (Nonparametnc Inference)
0 3 The Paradigm Created by the Parametric Approach
0 4 Shortcoming of the Parametric Paradigm
0 5 After the Classical Parahgm
0 6 The Renassance
0 7 The Generalization of the Gl~venko-Cantelll-Kolmogorov Theory
0 8 The Structural R~skM~nimization Pnnciple
0 9 The M a n Pnnciple of Inference from a S m d Sample Slze
0 10 What Thls Book is About
I Theory of Learning and Generalization
1 Two Approaches to the Leamlng Problem
1 I General Model of Learning from Examples
1 2 The Problem of hfinimizlng the RiskFunchond from Empincd Data
1 3 The Problem of Pattern Recognition
1 4 The Problem of Regression Estimation
1 5 Problem of Interpreting Results of Indirect Measuring
1 6 The Problem of Eensity Estimatlon (the Fisher-Wald Setting)
1 7 Induction Principles for Minimizing the Rlsk Functional on the Basis of Empirical Data
1 8 Classical Methods for Solving the Function Estimation Problems
1 9 Identlficatlon of Stochastic Objects Est~mation of the Densities and Conditional Densities
1 9 1 Problem of Dens~ty Est~mation Dlrect Setting
1 9 2 Problem of Condihanal Probablllty Est~mahon
1 9 3 Problem of Conditional Dens~ty Estimation
1 10 The Problem of Solving an Approximately Determined Integral Equation
1 1 I Ghvenko-Cantelti Theorem
1 11 1 Convergence in Probability and Almost Sure Convergence
1 11 2 Ghvenko-Cantel11 Thealem
1 11 3 Three Important Statist~cal Laws
1 12 nl-PosedProblems
1 13 The Structure of the Learnlng Theory
Appendix to Chapter 1 Methods for Solving I11-PosedProblems
A l I The Problem of Solvlng an Operator Equation
A1 2 Problems Well-Pased~n Tlkhanovs Sense
A l 3 The Regulx~zation Method
A l 3 1 Idea of Regularization Method
A1 3 2 M a n Theorems about the Regulxization Method
2 Estimation of the Probability Measure andProblem of L e m i n g
2 I P r a b a b ~ l ~ l y Model of aRandom Experiment
2 2 The B a s ~ c Problem of Stahshcs
2 2 1 The Basic Problems of Probability and Statistics
2 2 2 Uniform Convergence of Probability Measure Est~mates
2 3 Cnnrl~i~nns
for the TTn~form Cnmr~rgenre of Estlrnates in the TTnknnwn Prnhablllty Measure
2 3 1 Structure of D ~ s t n b u t ~ a n Function
2 3 2 Estlmatar that Provldes Uniform Convergence
2 4 Pmilal Uniform Convergence and Generalization of Glivenko-Cantelli Theorem
2 4 1 Definition of Partial Uniform Convergence
2 4 2 Generalization of the Glivenko-Cantelli Problem
2 5 Mmimizing the Risk Functional Under the Condition of Uniform Convergence of Probability Measure Estimates
2 6 Iv5nimizing the RiskFunctional under the Condition of PKtial Uniform Convergence of Probability Measure
Eshmates
2 7 Remarks about Modes of Convergence of the Probability Measure Estimates and Statements of the Learnrig Problem
3 Conditions for Consistency of Emplncal R ~ s k Minimizahon Principle
3 1 Classical Definition of Consistency
3 2 Definition of Stnct (Nontrivial) Consistency
3 2 1 Definltlon of Strict Conslstency for the Pattern Recagnltion andthe Regress~on Estimation Problems
3 2 2 Definltlon of Strict Conslstency for the Density Estlmat~an Problem
3 3 Empirical Processes
3 3 1 Remark on the Law of L x g e Numbers andIts Generalization
3 4 The Key Theorem of Leaming Theory (Theorem about Equivalence)
3 5 Proof of the Key Thearem
3 6 Stnct Cans~stency a f t h e M a x ~ m u m L ~ k e l ~ h a o d M e t h o d
3 7 Necessacy and Sufficient Conditions for Uniform Convergence of Frequencies to their Probabilities
3 7 1 Three Cases of Uniform Convergence
3 7 2 Conditions of Uniform Convergence in the Simplest Model
3 7 3 Entropy of a Set of Functions
3 7 4 Thearem about Umform Two-S~ded Convergence
3 8 Necessary and Sufficlent Candihons far Umform Convergence of Means to their Expectations for a Set of Real-
ValuedBounded Functions
3 8 1 Entropy of a Set of Real-ValuedFunctions
3 8 2 Theorem about Uniform Two-Sided Convergence
3 9 Necessary and Sufficlent Conditions far Umform Convergence of Means to thelr Expectations for Sets of Unbounded
Functions
3 9 1 Proof of Theorem 3 5