ISBN: 0521780195
An Introduction to Support Vector Machines and Other Kernel-based Learning Methods
by Nello Cristianini and John Shawe-Taylor
Cambridge University Press ©2000 (190 pages)
This is the first comprehensive introduction to SVMs, a new generation learning system based on recent advances in statistical learning theory; it will help readers understand the theory and its real-world applications.
Table of Contents
An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods
Preface
Chapter 1 - The Learning Methodology
Chapter 2 - Linear Learning Machines
Chapter 3 - Kernel-Induced Feature Spaces
Chapter 4 - Generalisation Theory
Chapter 5 - Optimisation Theory
Chapter 6 - Support Vector Machines
Chapter 7 - Implementation Techniques
Chapter 8 - Applications of Support Vector Machines
Appendix A - Pseudocode for the SMO Algorithm
Appendix B - Background Mathematics
References
Index
List of Figures
List of Tables
List of Examples
An Introduction to Support Vector Machines and
Other Kernel-Based Learning Methods
Nello Cristianini
John Shawe-Taylor
CAMBRIDGE UNIVERSITY PRESS
PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE
The Pitt Building, Trumpington Street, Cambridge, United Kingdom
CAMBRIDGE UNIVERSITY PRESS
The Edinburgh Building, Cambridge CB2 2RU, UK
40 West 20th Street, New York, NY 10011-4211, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
Ruiz de Alarcón 13, 28014 Madrid, Spain
Dock House, The Waterfront, Cape Town 8001, South Africa
http://www.cambridge.org
Copyright © 2000 Cambridge University Press
This book is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing
agreements, no reproduction of any part may take place without the written permission of Cambridge
University Press.
First published 2000
Reprinted 2000 (with corrections), 2001 (twice), 2002 (with corrections), 2003
Typeface Times 10/12pt. System LaTeX 2ε [EPC]
A catalogue record for this book is available from the British Library
Library of Congress Cataloguing in Publication data available
ISBN 0521780195
An Introduction to Support Vector Machines
This book is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation learning system based on recent advances in statistical learning theory. SVMs deliver state-of-the-art performance in real-world applications such as text categorisation, hand-written character recognition, image classification, and biosequence analysis.
Their introduction in the early 1990s led to an explosion of applications and a deepening theoretical analysis that has now established Support Vector Machines as one of the standard tools for machine learning and data mining. Students will find the book both stimulating and accessible, while practitioners will be guided smoothly through the material required for a good grasp of the theory and application of these techniques. The concepts are introduced gradually in accessible and self-contained stages, while at each stage the presentation is rigorous and thorough. Pointers to relevant literature and web sites containing software ensure that the book forms an ideal starting point for further study. Equally, the book will equip the practitioner to apply the techniques, and its associated web site will provide pointers to updated literature, new applications, and on-line software.
Nello Cristianini was born in Gorizia, Italy. He has studied at the University of Trieste in Italy; Royal Holloway, University of London; the University of Bristol; and the University of California at Santa Cruz. He is an active young researcher in the theory and applications of Support Vector Machines and other learning systems, and has published in a number of key international conferences and journals in this area.
John Shawe-Taylor was born in Cheltenham, England. He studied at the University of Cambridge; the University of Ljubljana in Slovenia; Simon Fraser University in Canada; Imperial College; and Royal Holloway, University of London. He has published widely on the theoretical analysis of learning systems, in addition to other areas of discrete mathematics and computer science. He is Professor of Computing Science at Royal Holloway, University of London, and is currently the co-ordinator of a European-funded collaboration of sixteen universities involved in research on Neural and Computational Learning.
Preface
In the last few years there have been very significant developments in the theoretical understanding of Support
Vector Machines (SVMs) as well as algorithmic strategies for implementing them, and applications of the
approach to practical problems. We believe that the topic has reached the point at which it should perhaps be
viewed as its own subfield of machine learning, a subfield which promises much in both theoretical insights
and practical usefulness. Despite reaching this stage of development, we were aware that no organic
integrated introduction to the subject had yet been attempted. Presenting a comprehensive introduction to
SVMs requires the synthesis of a surprisingly wide range of material, including dual representations, feature
spaces, learning theory, optimisation theory, and algorithmics. Though active research is still being pursued in
all of these areas, there are stable foundations in each that together form the basis for the SVM concept. By
building from those stable foundations, this book attempts a measured and accessible introduction to the
subject of Support Vector Machines.
The book is intended for machine learning students and practitioners who want a gentle but rigorous
introduction to this new class of learning systems. It is organised as a textbook that can be used either as a
central text for a course on SVMs, or as an additional text in a neural networks, machine learning, or pattern
recognition class. Despite its organisation as a textbook, we have kept the presentation self-contained to
ensure that it is suitable for the interested scientific reader not necessarily working directly in machine
learning or computer science. In this way the book should give readers from other scientific disciplines a
practical introduction to Support Vector Machines enabling them to apply the approach to problems from their
own domain. We have attempted to provide the reader with a route map through the rigorous derivation of the
material. For this reason we have only included proofs or proof sketches where they are accessible and where
we feel that they enhance the understanding of the main ideas. Readers who are interested in the detailed
proofs of the quoted results are referred to the original articles.
Exercises are provided at the end of the chapters, as well as pointers to relevant literature and on-line software
and articles. Given the potential instability of on-line material, in some cases the book points to a dedicated
website, where the relevant links will be kept updated, hence ensuring that readers can continue to access
on-line software and articles. We have always endeavoured to make clear who is responsible for the material
even if the pointer to it is an indirect one. We hope that authors will not be offended by these occasional
indirect pointers to their work. Each chapter finishes with a section entitled Further Reading and Advanced
Topics, which fulfils two functions. First, by moving all the references into this section we have kept the main
text as uncluttered as possible. Again we ask for the indulgence of those who have contributed to this field
when we quote their work but delay giving a reference until this section. Secondly, the section is intended to
provide a starting point for readers who wish to delve further into the topics covered in that chapter. The
references will also be held and kept up to date on the website. A further motivation for moving the references
out of the main body of text is the fact that the field has now reached a stage of maturity which justifies our
unified presentation. The two exceptions we have made to this rule are firstly for theorems which are
generally known by the name of the original author, such as Mercer's theorem, and secondly in Chapter 8,
which describes specific experiments reported in the research literature.
The fundamental principle that guided the writing of the book is that it should be accessible to students and
practitioners who would prefer to avoid complicated proofs and definitions on their way to using SVMs. We
believe that by developing the material in intuitively appealing but rigorous stages, SVMs in fact appear as
simple and natural systems. Where possible we first introduce concepts in a simple example, only then
showing how they are used in more complex cases. The book is self-contained, with an appendix providing
any necessary mathematical tools beyond basic linear algebra and probability. This makes it suitable for a
very interdisciplinary audience.
Much of the material was presented in five hours of tutorials on SVMs and large margin generalisation held at
the University of California at Santa Cruz during 1999, and most of the feedback received from these was
incorporated into the book. Part of this book was written while Nello was visiting the University of California
at Santa Cruz, a wonderful place to work thanks to both his hosts and the environment of the campus. During
the writing of the book, Nello made frequent and long visits to Royal Holloway, University of London. Nello
would like to thank Lynda and her family for hosting him during these visits. Together with John he would
also like to thank Alex Gammerman, the technical and administrative staff, and academic colleagues of the
Department of Computer Science at Royal Holloway for providing a supportive and relaxed working
environment, allowing them the opportunity to concentrate on the writing.
Many people have contributed to the shape and substance of the book, both indirectly through discussions and
directly through comments on early versions of the manuscript. We would like to thank Kristin Bennett, Colin
Campbell, Nicolò Cesa-Bianchi, David Haussler, Ralf Herbrich, Ulrich Kockelkorn, John Platt, Tomaso
Poggio, Bernhard Schölkopf, Alex Smola, Chris Watkins, Manfred Warmuth, Chris Williams, and Bob
Williamson.
We would also like to thank David Tranah and Cambridge University Press for being so supportive and
helpful in the processing of the book. Alessio Cristianini assisted in the establishment of the website.
Konstantinos Veropoulos helped to create the pictures for Chapter 6, which were generated using his software
package at the University of Bristol. We would like to thank John Platt for providing the SMO pseudocode
included in Appendix A.
Nello would like to thank the EPSRC for supporting his research and Colin Campbell for being a very
understanding and helpful supervisor. John would like to thank the European Commission for support through
the NeuroCOLT2 Working Group, EP27150.
Since the first edition appeared a small number of errors have been brought to our attention, and we have
endeavoured to ensure that they were all corrected before reprinting. We would be grateful if anyone
discovering further problems would contact us through the feedback facility on the book's web page,
http://www.support-vector.net.
Nello Cristianini and John Shawe-Taylor
June, 2000
Notation
N - dimension of feature space
y ∈ Y - output and output space
x ∈ X - input and input space
F - feature space
ℱ - general class of real-valued functions
ℒ - class of linear functions
⟨x · z⟩ - inner product between x and z
φ : X → F - mapping to feature space
K(x, z) - kernel, K(x, z) = ⟨φ(x) · φ(z)⟩
f(x) - real-valued function before thresholding
n - dimension of input space
R - radius of the ball containing the data
ε-insensitive - loss function insensitive to errors less than ε
w - weight vector
b - bias
α - dual variables or Lagrange multipliers
L - primal Lagrangian
W - dual Lagrangian
||·||p - p-norm
ln - natural logarithm
e - base of the natural logarithm
log - logarithm to the base 2
x′, X′ - transpose of vector, matrix
ℕ, ℝ - natural, real numbers
S - training sample
ℓ - training set size
η - learning rate
ε - error probability
δ - confidence
γ - margin
ξ - slack variables
d - VC dimension
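As a quick illustration of how the kernel notation above is used (a worked example of our own, not taken from the book's notation list): for inputs x, z ∈ ℝ², the degree-2 polynomial kernel K(x, z) = ⟨x · z⟩² corresponds to the explicit feature map φ(x) = (x₁², x₂², √2 x₁x₂), since

\[
\langle \phi(x) \cdot \phi(z) \rangle
= x_1^2 z_1^2 + x_2^2 z_2^2 + 2\,x_1 x_2 z_1 z_2
= (x_1 z_1 + x_2 z_2)^2
= \langle x \cdot z \rangle^2 .
\]

The kernel thus evaluates the inner product in the three-dimensional feature space F at the cost of a single inner product in the two-dimensional input space; this observation is developed at length in Chapter 3.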