Statistical Pattern Recognition
Contents
Preface
Notation
1 Introduction to Statistical Pattern Recognition
1.1 Statistical Pattern Recognition
1.1.1 Introduction
1.1.2 The Basic Model
1.2 Stages in a Pattern Recognition Problem
1.3 Issues
1.4 Approaches to Statistical Pattern Recognition
1.5 Elementary Decision Theory
1.5.1 Bayes’ Decision Rule for Minimum Error
1.5.2 Bayes’ Decision Rule for Minimum Error – Reject Option
1.5.3 Bayes’ Decision Rule for Minimum Risk
1.5.4 Bayes’ Decision Rule for Minimum Risk – Reject Option
1.5.5 Neyman–Pearson Decision Rule
1.5.6 Minimax Criterion
1.5.7 Discussion
1.6 Discriminant Functions
1.6.1 Introduction
1.6.2 Linear Discriminant Functions
1.6.3 Piecewise Linear Discriminant Functions
1.6.4 Generalised Linear Discriminant Function
1.6.5 Summary
1.7 Multiple Regression
1.8 Outline of Book
1.9 Notes and References
Exercises
2 Density Estimation – Parametric
2.1 Introduction
2.2 Estimating the Parameters of the Distributions
2.2.1 Estimative Approach
2.2.2 Predictive Approach
2.3 The Gaussian Classifier
2.3.1 Specification
2.3.2 Derivation of the Gaussian Classifier Plug-In Estimates
2.3.3 Example Application Study
2.4 Dealing with Singularities in the Gaussian Classifier
2.4.1 Introduction
2.4.2 Naïve Bayes
2.4.3 Projection onto a Subspace
2.4.4 Linear Discriminant Function
2.4.5 Regularised Discriminant Analysis
2.4.6 Example Application Study
2.4.7 Further Developments
2.4.8 Summary
2.5 Finite Mixture Models
2.5.1 Introduction
2.5.2 Mixture Models for Discrimination
2.5.3 Parameter Estimation for Normal Mixture Models
2.5.4 Normal Mixture Model Covariance Matrix Constraints
2.5.5 How Many Components?
2.5.6 Maximum Likelihood Estimation via EM
2.5.7 Example Application Study
2.5.8 Further Developments
2.5.9 Summary
2.6 Application Studies
2.7 Summary and Discussion
2.8 Recommendations
2.9 Notes and References
Exercises
3 Density Estimation – Bayesian
3.1 Introduction
3.1.1 Basics
3.1.2 Recursive Calculation
3.1.3 Proportionality
3.2 Analytic Solutions
3.2.1 Conjugate Priors
3.2.2 Estimating the Mean of a Normal Distribution with Known Variance
3.2.3 Estimating the Mean and the Covariance Matrix of a Multivariate Normal Distribution
3.2.4 Unknown Prior Class Probabilities
3.2.5 Summary
3.3 Bayesian Sampling Schemes
3.3.1 Introduction
3.3.2 Summarisation
3.3.3 Sampling Version of the Bayesian Classifier
3.3.4 Rejection Sampling
3.3.5 Ratio of Uniforms
3.3.6 Importance Sampling
3.4 Markov Chain Monte Carlo Methods
3.4.1 Introduction
3.4.2 The Gibbs Sampler
3.4.3 Metropolis–Hastings Algorithm
3.4.4 Data Augmentation
3.4.5 Reversible Jump Markov Chain Monte Carlo
3.4.6 Slice Sampling
3.4.7 MCMC Example – Estimation of Noisy Sinusoids
3.4.8 Summary
3.4.9 Notes and References
3.5 Bayesian Approaches to Discrimination
3.5.1 Labelled Training Data
3.5.2 Unlabelled Training Data
3.6 Sequential Monte Carlo Samplers
3.6.1 Introduction
3.6.2 Basic Methodology
3.6.3 Summary
3.7 Variational Bayes
3.7.1 Introduction
3.7.2 Description
3.7.3 Factorised Variational Approximation
3.7.4 Simple Example
3.7.5 Use of the Procedure for Model Selection
3.7.6 Further Developments and Applications
3.7.7 Summary
3.8 Approximate Bayesian Computation
3.8.1 Introduction
3.8.2 ABC Rejection Sampling
3.8.3 ABC MCMC Sampling
3.8.4 ABC Population Monte Carlo Sampling
3.8.5 Model Selection
3.8.6 Summary
3.9 Example Application Study
3.10 Application Studies
3.11 Summary and Discussion
3.12 Recommendations
3.13 Notes and References
Exercises
4 Density Estimation – Nonparametric
4.1 Introduction
4.1.1 Basic Properties of Density Estimators
4.2 k-Nearest-Neighbour Method
4.2.1 k-Nearest-Neighbour Classifier
4.2.2 Derivation
4.2.3 Choice of Distance Metric
4.2.4 Properties of the Nearest-Neighbour Rule
4.2.5 Linear Approximating and Eliminating Search Algorithm
4.2.6 Branch and Bound Search Algorithms: kd-Trees
4.2.7 Branch and Bound Search Algorithms: Ball-Trees
4.2.8 Editing Techniques
4.2.9 Example Application Study
4.2.10 Further Developments
4.2.11 Summary
4.3 Histogram Method
4.3.1 Data Adaptive Histograms
4.3.2 Independence Assumption (Naïve Bayes)
4.3.3 Lancaster Models
4.3.4 Maximum Weight Dependence Trees
4.3.5 Bayesian Networks
4.3.6 Example Application Study – Naïve Bayes Text Classification
4.3.7 Summary
4.4 Kernel Methods
4.4.1 Biasedness
4.4.2 Multivariate Extension
4.4.3 Choice of Smoothing Parameter
4.4.4 Choice of Kernel
4.4.5 Example Application Study
4.4.6 Further Developments
4.4.7 Summary
4.5 Expansion by Basis Functions
4.6 Copulas
4.6.1 Introduction
4.6.2 Mathematical Basis
4.6.3 Copula Functions
4.6.4 Estimating Copula Probability Density Functions
4.6.5 Simple Example
4.6.6 Summary
4.7 Application Studies
4.7.1 Comparative Studies
4.8 Summary and Discussion
4.9 Recommendations
4.10 Notes and References
Exercises
5 Linear Discriminant Analysis
5.1 Introduction
5.2 Two-Class Algorithms
5.2.1 General Ideas
5.2.2 Perceptron Criterion
5.2.3 Fisher’s Criterion
5.2.4 Least Mean-Squared-Error Procedures
5.2.5 Further Developments
5.2.6 Summary
5.3 Multiclass Algorithms
5.3.1 General Ideas
5.3.2 Error-Correction Procedure
5.3.3 Fisher’s Criterion – Linear Discriminant Analysis
5.3.4 Least Mean-Squared-Error Procedures
5.3.5 Regularisation
5.3.6 Example Application Study
5.3.7 Further Developments
5.3.8 Summary
5.4 Support Vector Machines
5.4.1 Introduction
5.4.2 Linearly Separable Two-Class Data
5.4.3 Linearly Nonseparable Two-Class Data
5.4.4 Multiclass SVMs
5.4.5 SVMs for Regression
5.4.6 Implementation
5.4.7 Example Application Study
5.4.8 Summary
5.5 Logistic Discrimination
5.5.1 Two-Class Case
5.5.2 Maximum Likelihood Estimation
5.5.3 Multiclass Logistic Discrimination
5.5.4 Example Application Study
5.5.5 Further Developments
5.5.6 Summary
5.6 Application Studies
5.7 Summary and Discussion
5.8 Recommendations
5.9 Notes and References
Exercises
6 Nonlinear Discriminant Analysis – Kernel and Projection Methods
6.1 Introduction
6.2 Radial Basis Functions
6.2.1 Introduction
6.2.2 Specifying the Model
6.2.3 Specifying the Functional Form
6.2.4 The Positions of the Centres
6.2.5 Smoothing Parameters
6.2.6 Calculation of the Weights
6.2.7 Model Order Selection
6.2.8 Simple RBF
6.2.9 Motivation
6.2.10 RBF Properties
6.2.11 Example Application Study
6.2.12 Further Developments
6.2.13 Summary
6.3 Nonlinear Support Vector Machines
6.3.1 Introduction
6.3.2 Binary Classification
6.3.3 Types of Kernel
6.3.4 Model Selection
6.3.5 Multiclass SVMs
6.3.6 Probability Estimates
6.3.7 Nonlinear Regression
6.3.8 Example Application Study
6.3.9 Further Developments
6.3.10 Summary
6.4 The Multilayer Perceptron
6.4.1 Introduction
6.4.2 Specifying the MLP Structure
6.4.3 Determining the MLP Weights
6.4.4 Modelling Capacity of the MLP
6.4.5 Logistic Classification
6.4.6 Example Application Study
6.4.7 Bayesian MLP Networks
6.4.8 Projection Pursuit
6.4.9 Summary
6.5 Application Studies
6.6 Summary and Discussion
6.7 Recommendations
6.8 Notes and References
Exercises
7 Rule and Decision Tree Induction
7.1 Introduction
7.2 Decision Trees
7.2.1 Introduction
7.2.2 Decision Tree Construction
7.2.3 Selection of the Splitting Rule
7.2.4 Terminating the Splitting Procedure
7.2.5 Assigning Class Labels to Terminal Nodes
7.2.6 Decision Tree Pruning – Worked Example
7.2.7 Decision Tree Construction Methods
7.2.8 Other Issues
7.2.9 Example Application Study
7.2.10 Further Developments
7.2.11 Summary
7.3 Rule Induction
7.3.1 Introduction
7.3.2 Generating Rules from a Decision Tree
7.3.3 Rule Induction Using a Sequential Covering Algorithm
7.3.4 Example Application Study
7.3.5 Further Developments
7.3.6 Summary
7.4 Multivariate Adaptive Regression Splines
7.4.1 Introduction
7.4.2 Recursive Partitioning Model
7.4.3 Example Application Study
7.4.4 Further Developments
7.4.5 Summary
7.5 Application Studies
7.6 Summary and Discussion
7.7 Recommendations
7.8 Notes and References
Exercises
8 Ensemble Methods
8.1 Introduction
8.2 Characterising a Classifier Combination Scheme
8.2.1 Feature Space
8.2.2 Level
8.2.3 Degree of Training
8.2.4 Form of Component Classifiers
8.2.5 Structure
8.2.6 Optimisation
8.3 Data Fusion
8.3.1 Architectures
8.3.2 Bayesian Approaches
8.3.3 Neyman–Pearson Formulation
8.3.4 Trainable Rules
8.3.5 Fixed Rules
8.4 Classifier Combination Methods
8.4.1 Product Rule
8.4.2 Sum Rule
8.4.3 Min, Max and Median Combiners
8.4.4 Majority Vote
8.4.5 Borda Count
8.4.6 Combiners Trained on Class Predictions
8.4.7 Stacked Generalisation
8.4.8 Mixture of Experts
8.4.9 Bagging
8.4.10 Boosting
8.4.11 Random Forests
8.4.12 Model Averaging
8.4.13 Summary of Methods
8.4.14 Example Application Study
8.4.15 Further Developments
8.5 Application Studies
8.6 Summary and Discussion
8.7 Recommendations
8.8 Notes and References
Exercises
9 Performance Assessment
9.1 Introduction
9.2 Performance Assessment
9.2.1 Performance Measures
9.2.2 Discriminability
9.2.3 Reliability
9.2.4 ROC Curves for Performance Assessment
9.2.5 Population and Sensor Drift
9.2.6 Example Application Study
9.2.7 Further Developments
9.2.8 Summary
9.3 Comparing Classifier Performance
9.3.1 Which Technique is Best?
9.3.2 Statistical Tests
9.3.3 Comparing Rules When Misclassification Costs are Uncertain
9.3.4 Example Application Study
9.3.5 Further Developments
9.3.6 Summary
9.4 Application Studies
9.5 Summary and Discussion
9.6 Recommendations
9.7 Notes and References
Exercises
10 Feature Selection and Extraction
10.1 Introduction
10.2 Feature Selection
10.2.1 Introduction
10.2.2 Characterisation of Feature Selection Approaches
10.2.3 Evaluation Measures
10.2.4 Search Algorithms for Feature Subset Selection
10.2.5 Complete Search – Branch and Bound
10.2.6 Sequential Search
10.2.7 Random Search
10.2.8 Markov Blanket
10.2.9 Stability of Feature Selection
10.2.10 Example Application Study
10.2.11 Further Developments
10.2.12 Summary
10.3 Linear Feature Extraction
10.3.1 Principal Components Analysis
10.3.2 Karhunen–Loève Transformation
10.3.3 Example Application Study
10.3.4 Further Developments
10.3.5 Summary
10.4 Multidimensional Scaling
10.4.1 Classical Scaling
10.4.2 Metric MDS
10.4.3 Ordinal Scaling
10.4.4 Algorithms
10.4.5 MDS for Feature Extraction
10.4.6 Example Application Study
10.4.7 Further Developments
10.4.8 Summary
10.5 Application Studies
10.6 Summary and Discussion
10.7 Recommendations
10.8 Notes and References
Exercises
11 Clustering
11.1 Introduction
11.2 Hierarchical Methods
11.2.1 Single-Link Method
11.2.2 Complete-Link Method
11.2.3 Sum-of-Squares Method
11.2.4 General Agglomerative Algorithm
11.2.5 Properties of a Hierarchical Classification
11.2.6 Example Application Study
11.2.7 Summary
11.3 Quick Partitions
11.4 Mixture Models
11.4.1 Model Description
11.4.2 Example Application Study
11.5 Sum-of-Squares Methods
11.5.1 Clustering Criteria
11.5.2 Clustering Algorithms
11.5.3 Vector Quantisation
11.5.4 Example Application Study
11.5.5 Further Developments
11.5.6 Summary
11.6 Spectral Clustering
11.6.1 Elementary Graph Theory
11.6.2 Similarity Matrices
11.6.3 Application to Clustering
11.6.4 Spectral Clustering Algorithm
11.6.5 Forms of Graph Laplacian
11.6.6 Example Application Study
11.6.7 Further Developments
11.6.8 Summary
11.7 Cluster Validity
11.7.1 Introduction
11.7.2 Statistical Tests
11.7.3 Absence of Class Structure
11.7.4 Validity of Individual Clusters
11.7.5 Hierarchical Clustering
11.7.6 Validation of Individual Clusterings
11.7.7 Partitions
11.7.8 Relative Criteria
11.7.9 Choosing the Number of Clusters
11.8 Application Studies
11.9 Summary and Discussion
11.10 Recommendations
11.11 Notes and References
Exercises
12 Complex Networks
12.1 Introduction
12.1.1 Characteristics
12.1.2 Properties
12.1.3 Questions to Address
12.1.4 Descriptive Features
12.1.5 Outline
12.2 Mathematics of Networks
12.2.1 Graph Matrices
12.2.2 Connectivity
12.2.3 Distance Measures
12.2.4 Weighted Networks
12.2.5 Centrality Measures
12.2.6 Random Graphs
12.3 Community Detection
12.3.1 Clustering Methods
12.3.2 Girvan–Newman Algorithm
12.3.3 Modularity Approaches
12.3.4 Local Modularity
12.3.5 Clique Percolation
12.3.6 Example Application Study
12.3.7 Further Developments
12.3.8 Summary
12.4 Link Prediction
12.4.1 Approaches to Link Prediction
12.4.2 Example Application Study
12.4.3 Further Developments
12.5 Application Studies
12.6 Summary and Discussion
12.7 Recommendations
12.8 Notes and References
Exercises
13 Additional Topics
13.1 Model Selection
13.1.1 Separate Training and Test Sets
13.1.2 Cross-Validation
13.1.3 The Bayesian Viewpoint
13.1.4 Akaike’s Information Criterion
13.1.5 Minimum Description Length
13.2 Missing Data
13.3 Outlier Detection and Robust Procedures
13.4 Mixed Continuous and Discrete Variables
13.5 Structural Risk Minimisation and the Vapnik–Chervonenkis Dimension
13.5.1 Bounds on the Expected Risk
13.5.2 The VC Dimension
References
Index
Statistical Pattern Recognition, Third Edition
Andrew R. Webb and Keith D. Copsey
Mathematics and Data Analysis Consultancy, Malvern, UK

Statistical pattern recognition relates to the use of statistical techniques for analysing data measurements in order to extract information and make justified decisions. It is a very active area of study and research, which has seen many advances in recent years. Applications such as data mining, web searching, multimedia data retrieval, face recognition and cursive handwriting recognition all require robust and efficient pattern recognition techniques.

This third edition provides an introduction to statistical pattern recognition theory and techniques, with material drawn from a wide range of fields, including engineering, statistics, computer science and the social sciences. The book has been updated to cover new methods and applications, and includes a wide range of techniques such as Bayesian methods, neural networks, support vector machines, feature selection and feature reduction techniques. Technical descriptions and motivations are provided, and the techniques are illustrated using real examples.

Statistical Pattern Recognition, Third Edition:
• Provides a self-contained introduction to statistical pattern recognition.
• Includes new material presenting the analysis of complex networks.
• Introduces readers to methods for Bayesian density estimation.
• Presents descriptions of new applications in biometrics, security, finance and condition monitoring.
• Provides descriptions and guidance for implementing techniques, which will be invaluable to software engineers and developers seeking to develop real applications.
• Describes mathematically the range of statistical pattern recognition techniques.
• Presents a variety of exercises, including more extensive computer projects.

The in-depth technical descriptions make this book suitable for senior undergraduate and graduate students in statistics, computer science and engineering. Statistical Pattern Recognition is also an excellent reference source for technical professionals. Chapters have been arranged to facilitate implementation of the techniques by software engineers and developers in non-statistical engineering fields.

www.wiley.com/go/statistical_pattern_recognition
Statistical Pattern Recognition
Third Edition

Andrew R. Webb • Keith D. Copsey
Mathematics and Data Analysis Consultancy, Malvern, UK

A John Wiley & Sons, Ltd., Publication
This edition first published 2011
© 2011 John Wiley & Sons, Ltd

Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data

Webb, A. R. (Andrew R.)
Statistical pattern recognition / Andrew R. Webb, Keith D. Copsey. – 3rd ed.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-68227-2 (hardback) – ISBN 978-0-470-68228-9 (paper)
1. Pattern perception – Statistical methods. I. Copsey, Keith D. II. Title.
Q327.W43 2011
006.4–dc23
2011024957

A catalogue record for this book is available from the British Library.

HB ISBN: 978-0-470-68227-2
PB ISBN: 978-0-470-68228-9
ePDF ISBN: 978-1-119-95296-1
oBook ISBN: 978-1-119-95295-4
ePub ISBN: 978-1-119-96140-6
Mobi ISBN: 978-1-119-96141-3

Typeset in 10/12pt Times by Aptara Inc., New Delhi, India
To Rosemary, Samuel, Miriam, Jacob and Ethan