Information Science and Statistics
Series Editors:
M. Jordan
J. Kleinberg
B. Schölkopf
Information Science and Statistics
Akaike and Kitagawa: The Practice of Time Series Analysis.
Cowell, Dawid, Lauritzen, and Spiegelhalter: Probabilistic Networks and Expert
Systems.
Doucet, de Freitas, and Gordon: Sequential Monte Carlo Methods in Practice.
Fine: Feedforward Neural Network Methodology.
Hawkins and Olwell: Cumulative Sum Charts and Charting for Quality
Improvement.
Jensen: Bayesian Networks and Decision Graphs.
Marchette: Computer Intrusion Detection and Network Monitoring: A Statistical
Viewpoint.
Rubinstein and Kroese: The Cross-Entropy Method: A Unified Approach to
Combinatorial Optimization, Monte Carlo Simulation, and Machine Learning.
Studený: Probabilistic Conditional Independence Structures.
Vapnik: The Nature of Statistical Learning Theory, Second Edition.
Wallace: Statistical and Inductive Inference by Minimum Message Length.
Vladimir Vapnik
Estimation of
Dependences Based on
Empirical Data
Reprint of 1982 Edition
Empirical
Inference Science
Afterword of 2006
Vladimir Vapnik
NEC Labs America
4 Independence Way
Princeton, NJ 08540
vlad@nec-labs.com
Samuel Kotz (Translator)
Department of Engineering Management
and Systems Engineering
The George Washington University
Washington, D.C. 20052
Series Editors:
Michael Jordan
Division of Computer
Science and
Department of Statistics
University of California,
Berkeley
Berkeley, CA 94720
USA
Jon Kleinberg
Department of Computer
Science
Cornell University
Ithaca, NY 14853
USA
Bernhard Schölkopf
Max Planck Institute for
Biological Cybernetics
Spemannstrasse 38
72076 Tübingen
Germany
Library of Congress Control Number: 2005938355
ISBN-10: 0-387-30865-2
ISBN-13: 978-0387-30865-4
Printed on acid-free paper.
© 2006 Springer Science+Business Media, Inc.
All rights reserved. This work may not be translated or copied in whole or in part without the
written permission of the publisher (Springer Science+Business Media, Inc., 233 Spring Street,
New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly
analysis. Use in connection with any form of information storage and retrieval, electronic
adaptation, computer software, or by similar or dissimilar methodology now known or hereafter
developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if
they are not identified as such, is not to be taken as an expression of opinion as to whether or
not they are subject to proprietary rights.
Printed in the United States of America.
springer.com
Vladimir Vapnik
Estimation of
Dependences Based on
Empirical Data
Translated by Samuel Kotz
With 22 illustrations
To the students of my students in memory of my violin teacher
Ilia Shtein and PhD advisor Alexander Lerner, who taught me
several important things that are very difficult to learn from
books.
PREFACE
Twenty-five years have passed since the publication of the Russian version of the book
Estimation of Dependences Based on Empirical Data (EDBED for short). Twenty-five
years is a long period of time. During these years many things have happened.
Looking back, one can see how rapidly life and technology have changed, and how
slow and difficult it is to change the theoretical foundation of the technology and its
philosophy.
I pursued two goals in writing this Afterword: to update the technical results presented
in EDBED (the easy goal) and to describe a general picture of how the new ideas
developed over these years (a much more difficult goal).
The picture that I would like to present is a very personal (and therefore very
biased) account of the development of one particular branch of science, Empirical
Inference Science.
Such accounts are usually not included in technical publications. I
have followed this rule in all of my previous books. But this time I would like to violate
it for the following reasons. First of all, for me EDBED is an important milestone in
the development of empirical inference theory and I would like to explain why. Second,
during these years, there were many discussions between supporters of the new
paradigm (now called VC theory¹) and the old one (classical statistics). Having been
involved in these discussions from the very beginning, I feel that it is my obligation to
describe the main events.
The story related to the book, which I would like to tell, is the story of how
difficult it is to overcome existing prejudices (both scientific and social), and how
careful one should be when evaluating and interpreting new technical concepts.
This story can be split into three parts that reflect three main ideas in the development
of empirical inference science: from the purely technical (mathematical) elements
of the theory to a new paradigm in the philosophy of generalization.
¹ VC theory is an abbreviation for Vapnik–Chervonenkis theory. This name for the corresponding theory
appeared in the 1990s after EDBED was published.