Statistical Pattern Recognition
Contents
Preface
Notation
1 Introduction to Statistical Pattern Recognition
1.1 Statistical Pattern Recognition
1.1.1 Introduction
1.1.2 The Basic Model
1.2 Stages in a Pattern Recognition Problem
1.3 Issues
1.4 Approaches to Statistical Pattern Recognition
1.5 Elementary Decision Theory
1.5.1 Bayes’ Decision Rule for Minimum Error
1.5.2 Bayes’ Decision Rule for Minimum Error – Reject Option
1.5.3 Bayes’ Decision Rule for Minimum Risk
1.5.4 Bayes’ Decision Rule for Minimum Risk – Reject Option
1.5.5 Neyman–Pearson Decision Rule
1.5.6 Minimax Criterion
1.5.7 Discussion
1.6 Discriminant Functions
1.6.1 Introduction
1.6.2 Linear Discriminant Functions
1.6.3 Piecewise Linear Discriminant Functions
1.6.4 Generalised Linear Discriminant Function
1.6.5 Summary
1.7 Multiple Regression
1.8 Outline of Book
1.9 Notes and References
Exercises
2 Density Estimation – Parametric
2.1 Introduction
2.2 Estimating the Parameters of the Distributions
2.2.1 Estimative Approach
2.2.2 Predictive Approach
2.3 The Gaussian Classifier
2.3.1 Specification
2.3.2 Derivation of the Gaussian Classifier Plug-In Estimates
2.3.3 Example Application Study
2.4 Dealing with Singularities in the Gaussian Classifier
2.4.1 Introduction
2.4.2 Naïve Bayes
2.4.3 Projection onto a Subspace
2.4.4 Linear Discriminant Function
2.4.5 Regularised Discriminant Analysis
2.4.6 Example Application Study
2.4.7 Further Developments
2.4.8 Summary
2.5 Finite Mixture Models
2.5.1 Introduction
2.5.2 Mixture Models for Discrimination
2.5.3 Parameter Estimation for Normal Mixture Models
2.5.4 Normal Mixture Model Covariance Matrix Constraints
2.5.5 How Many Components?
2.5.6 Maximum Likelihood Estimation via EM
2.5.7 Example Application Study
2.5.8 Further Developments
2.5.9 Summary
2.6 Application Studies
2.7 Summary and Discussion
2.8 Recommendations
2.9 Notes and References
Exercises
3 Density Estimation – Bayesian
3.1 Introduction
3.1.1 Basics
3.1.2 Recursive Calculation
3.1.3 Proportionality
3.2 Analytic Solutions
3.2.1 Conjugate Priors
3.2.2 Estimating the Mean of a Normal Distribution with Known Variance
3.2.3 Estimating the Mean and the Covariance Matrix of a Multivariate Normal Distribution
3.2.4 Unknown Prior Class Probabilities
3.2.5 Summary
3.3 Bayesian Sampling Schemes
3.3.1 Introduction
3.3.2 Summarisation
3.3.3 Sampling Version of the Bayesian Classifier
3.3.4 Rejection Sampling
3.3.5 Ratio of Uniforms
3.3.6 Importance Sampling
3.4 Markov Chain Monte Carlo Methods
3.4.1 Introduction
3.4.2 The Gibbs Sampler
3.4.3 Metropolis–Hastings Algorithm
3.4.4 Data Augmentation
3.4.5 Reversible Jump Markov Chain Monte Carlo
3.4.6 Slice Sampling
3.4.7 MCMC Example – Estimation of Noisy Sinusoids
3.4.8 Summary
3.4.9 Notes and References
3.5 Bayesian Approaches to Discrimination
3.5.1 Labelled Training Data
3.5.2 Unlabelled Training Data
3.6 Sequential Monte Carlo Samplers
3.6.1 Introduction
3.6.2 Basic Methodology
3.6.3 Summary
3.7 Variational Bayes
3.7.1 Introduction
3.7.2 Description
3.7.3 Factorised Variational Approximation
3.7.4 Simple Example
3.7.5 Use of the Procedure for Model Selection
3.7.6 Further Developments and Applications
3.7.7 Summary
3.8 Approximate Bayesian Computation
3.8.1 Introduction
3.8.2 ABC Rejection Sampling
3.8.3 ABC MCMC Sampling
3.8.4 ABC Population Monte Carlo Sampling
3.8.5 Model Selection
3.8.6 Summary
3.9 Example Application Study
3.10 Application Studies
3.11 Summary and Discussion
3.12 Recommendations
3.13 Notes and References
Exercises
4 Density Estimation – Nonparametric
4.1 Introduction
4.1.1 Basic Properties of Density Estimators
4.2 k-Nearest-Neighbour Method
4.2.1 k-Nearest-Neighbour Classifier
4.2.2 Derivation
4.2.3 Choice of Distance Metric
4.2.4 Properties of the Nearest-Neighbour Rule
4.2.5 Linear Approximating and Eliminating Search Algorithm
4.2.6 Branch and Bound Search Algorithms: kd-Trees
4.2.7 Branch and Bound Search Algorithms: Ball-Trees
4.2.8 Editing Techniques
4.2.9 Example Application Study
4.2.10 Further Developments
4.2.11 Summary
4.3 Histogram Method
4.3.1 Data Adaptive Histograms
4.3.2 Independence Assumption (Naïve Bayes)
4.3.3 Lancaster Models
4.3.4 Maximum Weight Dependence Trees
4.3.5 Bayesian Networks
4.3.6 Example Application Study – Naïve Bayes Text Classification
4.3.7 Summary
4.4 Kernel Methods
4.4.1 Biasedness
4.4.2 Multivariate Extension
4.4.3 Choice of Smoothing Parameter
4.4.4 Choice of Kernel
4.4.5 Example Application Study
4.4.6 Further Developments
4.4.7 Summary
4.5 Expansion by Basis Functions
4.6 Copulas
4.6.1 Introduction
4.6.2 Mathematical Basis
4.6.3 Copula Functions
4.6.4 Estimating Copula Probability Density Functions
4.6.5 Simple Example
4.6.6 Summary
4.7 Application Studies
4.7.1 Comparative Studies
4.8 Summary and Discussion
4.9 Recommendations
4.10 Notes and References
Exercises
5 Linear Discriminant Analysis
5.1 Introduction
5.2 Two-Class Algorithms
5.2.1 General Ideas
5.2.2 Perceptron Criterion
5.2.3 Fisher’s Criterion
5.2.4 Least Mean-Squared-Error Procedures
5.2.5 Further Developments
5.2.6 Summary
5.3 Multiclass Algorithms
5.3.1 General Ideas
5.3.2 Error-Correction Procedure
5.3.3 Fisher’s Criterion – Linear Discriminant Analysis
5.3.4 Least Mean-Squared-Error Procedures
5.3.5 Regularisation
5.3.6 Example Application Study
5.3.7 Further Developments
5.3.8 Summary
5.4 Support Vector Machines
5.4.1 Introduction
5.4.2 Linearly Separable Two-Class Data
5.4.3 Linearly Nonseparable Two-Class Data
5.4.4 Multiclass SVMs
5.4.5 SVMs for Regression
5.4.6 Implementation
5.4.7 Example Application Study
5.4.8 Summary
5.5 Logistic Discrimination
5.5.1 Two-Class Case
5.5.2 Maximum Likelihood Estimation
5.5.3 Multiclass Logistic Discrimination
5.5.4 Example Application Study
5.5.5 Further Developments
5.5.6 Summary
5.6 Application Studies
5.7 Summary and Discussion
5.8 Recommendations
5.9 Notes and References
Exercises
6 Nonlinear Discriminant Analysis – Kernel and Projection Methods
6.1 Introduction
6.2 Radial Basis Functions
6.2.1 Introduction
6.2.2 Specifying the Model
6.2.3 Specifying the Functional Form
6.2.4 The Positions of the Centres
6.2.5 Smoothing Parameters
6.2.6 Calculation of the Weights
6.2.7 Model Order Selection
6.2.8 Simple RBF
6.2.9 Motivation
6.2.10 RBF Properties
6.2.11 Example Application Study
6.2.12 Further Developments
6.2.13 Summary
6.3 Nonlinear Support Vector Machines
6.3.1 Introduction
6.3.2 Binary Classification
6.3.3 Types of Kernel
6.3.4 Model Selection
6.3.5 Multiclass SVMs
6.3.6 Probability Estimates
6.3.7 Nonlinear Regression
6.3.8 Example Application Study
6.3.9 Further Developments
6.3.10 Summary
6.4 The Multilayer Perceptron
6.4.1 Introduction
6.4.2 Specifying the MLP Structure
6.4.3 Determining the MLP Weights
6.4.4 Modelling Capacity of the MLP
6.4.5 Logistic Classification
6.4.6 Example Application Study
6.4.7 Bayesian MLP Networks
6.4.8 Projection Pursuit
6.4.9 Summary
6.5 Application Studies
6.6 Summary and Discussion
6.7 Recommendations
6.8 Notes and References
Exercises
7 Rule and Decision Tree Induction
7.1 Introduction
7.2 Decision Trees
7.2.1 Introduction
7.2.2 Decision Tree Construction
7.2.3 Selection of the Splitting Rule
7.2.4 Terminating the Splitting Procedure
7.2.5 Assigning Class Labels to Terminal Nodes
7.2.6 Decision Tree Pruning – Worked Example
7.2.7 Decision Tree Construction Methods
7.2.8 Other Issues
7.2.9 Example Application Study
7.2.10 Further Developments
7.2.11 Summary
7.3 Rule Induction
7.3.1 Introduction
7.3.2 Generating Rules from a Decision Tree
7.3.3 Rule Induction Using a Sequential Covering Algorithm
7.3.4 Example Application Study
7.3.5 Further Developments
7.3.6 Summary
7.4 Multivariate Adaptive Regression Splines
7.4.1 Introduction
7.4.2 Recursive Partitioning Model
7.4.3 Example Application Study
7.4.4 Further Developments
7.4.5 Summary
7.5 Application Studies
7.6 Summary and Discussion
7.7 Recommendations
7.8 Notes and References
Exercises
8 Ensemble Methods
8.1 Introduction
8.2 Characterising a Classifier Combination Scheme
8.2.1 Feature Space
8.2.2 Level
8.2.3 Degree of Training
8.2.4 Form of Component Classifiers
8.2.5 Structure
8.2.6 Optimisation
8.3 Data Fusion
8.3.1 Architectures
8.3.2 Bayesian Approaches
8.3.3 Neyman–Pearson Formulation
8.3.4 Trainable Rules
8.3.5 Fixed Rules
8.4 Classifier Combination Methods
8.4.1 Product Rule
8.4.2 Sum Rule
8.4.3 Min, Max and Median Combiners
8.4.4 Majority Vote
8.4.5 Borda Count
8.4.6 Combiners Trained on Class Predictions
8.4.7 Stacked Generalisation
8.4.8 Mixture of Experts
8.4.9 Bagging
8.4.10 Boosting
8.4.11 Random Forests
8.4.12 Model Averaging
8.4.13 Summary of Methods
8.4.14 Example Application Study
8.4.15 Further Developments
8.5 Application Studies
8.6 Summary and Discussion
8.7 Recommendations
8.8 Notes and References
Exercises
9 Performance Assessment
9.1 Introduction
9.2 Performance Assessment
9.2.1 Performance Measures
9.2.2 Discriminability
9.2.3 Reliability
9.2.4 ROC Curves for Performance Assessment
9.2.5 Population and Sensor Drift
9.2.6 Example Application Study
9.2.7 Further Developments
9.2.8 Summary
9.3 Comparing Classifier Performance
9.3.1 Which Technique is Best?
9.3.2 Statistical Tests
9.3.3 Comparing Rules When Misclassification Costs are Uncertain
9.3.4 Example Application Study
9.3.5 Further Developments
9.3.6 Summary
9.4 Application Studies
9.5 Summary and Discussion
9.6 Recommendations
9.7 Notes and References
Exercises
10 Feature Selection and Extraction
10.1 Introduction
10.2 Feature Selection
10.2.1 Introduction
10.2.2 Characterisation of Feature Selection Approaches
10.2.3 Evaluation Measures
10.2.4 Search Algorithms for Feature Subset Selection
10.2.5 Complete Search – Branch and Bound
10.2.6 Sequential Search
10.2.7 Random Search
10.2.8 Markov Blanket
10.2.9 Stability of Feature Selection
10.2.10 Example Application Study
10.2.11 Further Developments
10.2.12 Summary
10.3 Linear Feature Extraction
10.3.1 Principal Components Analysis
10.3.2 Karhunen–Loève Transformation
10.3.3 Example Application Study
10.3.4 Further Developments
10.3.5 Summary
10.4 Multidimensional Scaling
10.4.1 Classical Scaling
10.4.2 Metric MDS
10.4.3 Ordinal Scaling
10.4.4 Algorithms
10.4.5 MDS for Feature Extraction
10.4.6 Example Application Study
10.4.7 Further Developments
10.4.8 Summary
10.5 Application Studies
10.6 Summary and Discussion
10.7 Recommendations
10.8 Notes and References
Exercises
11 Clustering
11.1 Introduction
11.2 Hierarchical Methods
11.2.1 Single-Link Method
11.2.2 Complete-Link Method
11.2.3 Sum-of-Squares Method
11.2.4 General Agglomerative Algorithm
11.2.5 Properties of a Hierarchical Classification
11.2.6 Example Application Study
11.2.7 Summary
11.3 Quick Partitions
11.4 Mixture Models
11.4.1 Model Description
11.4.2 Example Application Study
11.5 Sum-of-Squares Methods
11.5.1 Clustering Criteria
11.5.2 Clustering Algorithms
11.5.3 Vector Quantisation
11.5.4 Example Application Study
11.5.5 Further Developments
11.5.6 Summary
11.6 Spectral Clustering
11.6.1 Elementary Graph Theory
11.6.2 Similarity Matrices
11.6.3 Application to Clustering
11.6.4 Spectral Clustering Algorithm
11.6.5 Forms of Graph Laplacian
11.6.6 Example Application Study
11.6.7 Further Developments
11.6.8 Summary
11.7 Cluster Validity
11.7.1 Introduction
11.7.2 Statistical Tests
11.7.3 Absence of Class Structure
11.7.4 Validity of Individual Clusters
11.7.5 Hierarchical Clustering
11.7.6 Validation of Individual Clusterings
11.7.7 Partitions
11.7.8 Relative Criteria
11.7.9 Choosing the Number of Clusters
11.8 Application Studies
11.9 Summary and Discussion
11.10 Recommendations
11.11 Notes and References
Exercises
12 Complex Networks
12.1 Introduction
12.1.1 Characteristics
12.1.2 Properties
12.1.3 Questions to Address
12.1.4 Descriptive Features
12.1.5 Outline
12.2 Mathematics of Networks
12.2.1 Graph Matrices
12.2.2 Connectivity
12.2.3 Distance Measures
12.2.4 Weighted Networks
12.2.5 Centrality Measures
12.2.6 Random Graphs
12.3 Community Detection
12.3.1 Clustering Methods
12.3.2 Girvan–Newman Algorithm
12.3.3 Modularity Approaches
12.3.4 Local Modularity
12.3.5 Clique Percolation
12.3.6 Example Application Study
12.3.7 Further Developments
12.3.8 Summary
12.4 Link Prediction
12.4.1 Approaches to Link Prediction
12.4.2 Example Application Study
12.4.3 Further Developments
12.5 Application Studies
12.6 Summary and Discussion
12.7 Recommendations
12.8 Notes and References
Exercises
13 Additional Topics
13.1 Model Selection
13.1.1 Separate Training and Test Sets
13.1.2 Cross-Validation
13.1.3 The Bayesian Viewpoint
13.1.4 Akaike’s Information Criterion
13.1.5 Minimum Description Length
13.2 Missing Data
13.3 Outlier Detection and Robust Procedures
13.4 Mixed Continuous and Discrete Variables
13.5 Structural Risk Minimisation and the Vapnik–Chervonenkis Dimension
13.5.1 Bounds on the Expected Risk
13.5.2 The VC Dimension
References
Index