logo资料库

Kernel Methods for Pattern Analysis..pdf

第1页 / 共478页
第2页 / 共478页
第3页 / 共478页
第4页 / 共478页
第5页 / 共478页
第6页 / 共478页
第7页 / 共478页
第8页 / 共478页
资料共478页,剩余部分请下载后查看
Cover
Half-title
Title
Copyright
Contents
Code fragments
Preface
Part I Basic concepts
1 Pattern analysis
1.1 Patterns in data
1.1.1 Data
1.1.2 Patterns
1.2 Pattern analysis algorithms
1.2.1 Statistical stability of patterns
1.2.2 Detecting patterns by recoding
1.3 Exploiting patterns
1.3.1 The overall strategy
1.3.2 Common pattern analysis tasks
1.4 Summary
1.5 Further reading and advanced topics
2 Kernel methods: an overview
2.1 The overall picture
2.2 Linear regression in a feature space
2.2.1 Primal linear regression
2.2.2 Ridge regression: primal and dual
2.2.3 Kernel-defined nonlinear feature mappings
2.3 Other examples
2.3.1 Algorithms
2.3.2 Kernels
2.4 The modularity of kernel methods
2.5 Roadmap of the book
2.6 Summary
2.7 Further reading and advanced topics
3 Properties of kernels
3.1 Inner products and positive semi-definite matrices
3.1.1 Hilbert spaces
3.1.2 Gram matrix
3.2 Characterisation of kernels
3.3 The kernel matrix
3.4 Kernel construction
3.4.1 Operations on kernel functions
3.4.2 Operations on kernel matrices
3.5 Summary
3.6 Further reading and advanced topics
4 Detecting stable patterns
4.1 Concentration inequalities
4.2 Capacity and regularisation: Rademacher theory
4.3 Pattern stability for kernel-based classes
4.4 A pragmatic approach
4.5 Summary
4.6 Further reading and advanced topics
Part II Pattern analysis algorithms
5 Elementary algorithms in feature space
5.1 Means and distances
5.1.1 A simple algorithm for novelty-detection
5.1.2 A simple algorithm for classification
5.2 Computing projections: Gram–Schmidt, QR and Cholesky
5.3 Measuring the spread of the data
5.4 Fisher discriminant analysis I
5.5 Summary
5.6 Further reading and advanced topics
6 Pattern analysis using eigen-decompositions
6.1 Singular value decomposition
6.2 Principal components analysis
6.2.1 Kernel principal components analysis
6.2.2 Stability of principal components analysis
6.3 Directions of maximum covariance
6.4 The generalised eigenvector problem
6.5 Canonical correlation analysis
6.6 Fisher discriminant analysis II
6.7 Methods for linear regression
6.7.1 Partial least squares
6.7.2 Kernel partial least squares
6.8 Summary
6.9 Further reading and advanced topics
7 Pattern analysis using convex optimisation
7.1 The smallest enclosing hypersphere
7.1.1 The smallest hypersphere containing a set of points
7.1.2 Stability of novelty-detection
7.1.3 Hyperspheres containing most of the points
7.2 Support vector machines for classification
7.2.1 The maximal margin classifier
7.2.2 Soft margin classifiers
7.3 Support vector machines for regression
7.3.1 Stability of regression
7.3.2 Ridge regression
7.3.3 ε-insensitive regression
7.4 On-line classification and regression
7.5 Summary
7.6 Further reading and advanced topics
8 Ranking, clustering and data visualisation
8.1 Discovering rank relations
8.1.1 Batch ranking
8.1.2 On-line ranking
8.2 Discovering cluster structure in a feature space
8.2.1 Measuring cluster quality
8.2.2 Greedy solution: k-means
8.2.3 Relaxed solution: spectral methods
8.3 Data visualisation
8.4 Summary
8.5 Further reading and advanced topics
Part III Constructing kernels
9 Basic kernels and kernel types
9.1 Kernels in closed form
9.2 ANOVA kernels
9.3 Kernels from graphs
9.4 Diffusion kernels on graph nodes
9.5 Kernels on sets
9.6 Kernels on real numbers
9.7 Randomised kernels
9.8 Other kernel types
9.8.1 Kernels from successive embeddings
9.8.2 Kernels over general structures
9.8.3 Kernels from generative information
9.9 Summary
9.10 Further reading and advanced topics
10 Kernels for text
10.1 From bag of words to semantic space
10.1.1 Representing text
10.1.2 Semantic issues
10.2 Vector space kernels
10.2.1 Designing semantic kernels
10.2.2 Designing the proximity matrix
10.3 Summary
10.4 Further reading and advanced topics
11 Kernels for structured data: strings, trees, etc.
11.1 Comparing strings and sequences
11.2 Spectrum kernels
11.3 All-subsequences kernels
11.4 Fixed length subsequences kernels
11.5 Gap-weighted subsequences kernels
11.5.1 Naive implementation
11.5.2 Efficient implementation
11.5.3 Variations on the theme
11.6 Beyond dynamic programming: trie-based kernels
11.6.1 Trie computation of the p-spectrum kernels
11.6.2 Trie-based mismatch kernels
11.6.3 Trie-based restricted gap-weighted kernels
11.7 Kernels for structured data
11.7.1 Comparing trees
11.7.2 Structured data: a framework
11.8 Summary
11.9 Further reading and advanced topics
12 Kernels from generative models
12.1 P-kernels
12.1.1 Conditional-independence (CI) and marginalisation
12.1.2 Representing multivariate distributions
12.1.3 Fixed length strings generated by a hidden binomial model
12.1.4 Fixed length strings generated by a hidden markov model
12.1.5 Pair hidden Markov model kernels
12.1.6 Hidden tree model kernels
12.2 Fisher kernels
12.2.1 From probability to geometry
12.2.2 Fisher kernels for hidden Markov models
12.3 Summary
12.4 Further reading and advanced topics
Appendix A Proofs omitted from the main text
A.1 Proof of McDiarmid’s theorem
A.2 Stability of principal components analysis
A.3 Proofs of diffusion kernels
Appendix B Notational conventions
B.1 List of symbols
B.2 Notation for Tables
Appendix C List of pattern analysis methods
C.1 Pattern analysis computations
C.2 Pattern analysis algorithms
Appendix D List of kernels
D.1 Kernel definitions and computations
D.2 Kernel algorithms
References
Index
This page intentionally left blank
Kernel Methods for Pattern Analysis Pattern Analysis is the process of finding general relations in a set of data, and forms the core of many disciplines, from neural networks to so-called syn- tactical pattern recognition, from statistical pattern recognition to machine learning and data mining. Applications of pattern analysis range from bioin- formatics to document retrieval. The kernel methodology described here provides a powerful and unified framework for all of these disciplines, motivating algorithms that can act on general types of data (e.g. strings, vectors, text, etc.) and look for general types of relations (e.g. rankings, classifications, regressions, clusters, etc.). This book fulfils two major roles. Firstly it provides practitioners with a large toolkit of algorithms, kernels and solutions ready to be implemented, many given as Matlab code suitable for many pattern analysis tasks in fields such as bioinformatics, text analysis, and image analysis. Secondly it furnishes students and researchers with an easy introduction to the rapidly expanding field of kernel-based pattern analysis, demonstrating with examples how to handcraft an algorithm or a kernel for a new specific application, while covering the required conceptual and mathematical tools necessary to do so. The book is in three parts. The first provides the conceptual foundations of the field, both by giving an extended introductory example and by cov- ering the main theoretical underpinnings of the approach. The second part contains a number of kernel-based algorithms, from the simplest to sophis- ticated systems such as kernel partial least squares, canonical correlation analysis, support vector machines, principal components analysis, etc. The final part describes a number of kernel functions, from basic examples to advanced recursive kernels, kernels derived from generative models such as HMMs and string matching kernels based on dynamic programming, as well as special kernels designed to handle text documents. All those involved in pattern recognition, machine learning, neural net- works and their applications, from computational biology to text analysis will welcome this account.
Kernel Methods for Pattern Analysis John Shawe-Taylor University of Southampton Nello Cristianini University of California at Davis
cambridge university press Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo Cambridge University Press The Edinburgh Building, Cambridge cb2 2ru, UK Published in the United States of America by Cambridge University Press, New York www.cambridge.org Information on this title: www.cambridge.org/9780521813976 © Cambridge University Press 2004 This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press. First published in print format 2004 isbn-13 978-0-511-21060-0 isbn-10 0-511-21237-2 eBook (EBL) eBook (EBL) isbn-13 978-0-521-81397-6 isbn-10 0-521-81397-2 hardback hardback Cambridge University Press has no responsibility for the persistence or accuracy of urls for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents List of code fragments Preface Part I Basic concepts Pattern analysis Summary Kernel methods: an overview Linear regression in a feature space 1 1.1 Patterns in data 1.2 Pattern analysis algorithms 1.3 Exploiting patterns 1.4 1.5 Further reading and advanced topics 2 2.1 The overall picture 2.2 2.3 Other examples 2.4 The modularity of kernel methods 2.5 Roadmap of the book 2.6 2.7 Further reading and advanced topics 3 3.1 3.2 Characterisation of kernels 3.3 The kernel matrix 3.4 Kernel construction 3.5 3.6 Further reading and advanced topics 4 4.1 Concentration inequalities 4.2 Capacity and regularisation: Rademacher theory Properties of kernels Inner products and positive semi-definite matrices Detecting stable patterns Summary Summary v page viii xi 1 3 4 12 17 22 23 25 26 27 36 42 43 44 45 47 48 60 68 74 82 82 85 86 93
vi Contents 4.3 Pattern stability for kernel-based classes 4.4 A pragmatic approach 4.5 4.6 Further reading and advanced topics Summary Part II Pattern analysis algorithms Elementary algorithms in feature space Summary 5 5.1 Means and distances 5.2 Computing projections: Gram–Schmidt, QR and Cholesky 5.3 Measuring the spread of the data 5.4 Fisher discriminant analysis I 5.5 5.6 Further reading and advanced topics Pattern analysis using eigen-decompositions 6 6.1 Singular value decomposition 6.2 Principal components analysis 6.3 Directions of maximum covariance 6.4 The generalised eigenvector problem 6.5 Canonical correlation analysis 6.6 Fisher discriminant analysis II 6.7 Methods for linear regression 6.8 6.9 Further reading and advanced topics 7 7.1 The smallest enclosing hypersphere 7.2 7.3 7.4 On-line classification and regression 7.5 7.6 Further reading and advanced topics 8 8.1 Discovering rank relations 8.2 Discovering cluster structure in a feature space 8.3 Data visualisation 8.4 8.5 Further reading and advanced topics Support vector machines for classification Support vector machines for regression Pattern analysis using convex optimisation Ranking, clustering and data visualisation Summary Summary Summary Part III Constructing kernels Basic kernels and kernel types 9 9.1 Kernels in closed form 97 104 105 106 109 111 112 122 128 132 137 138 140 141 143 155 161 164 176 176 192 193 195 196 211 230 241 249 250 252 253 264 280 286 286 289 291 292
分享到:
收藏