
Neural Networks and Learning Machines.pdf

Cover
Title Page
Copyright Page
Contents
Preface
Acknowledgments
GLOSSARY
Introduction
1. What is a Neural Network?
2. The Human Brain
3. Models of a Neuron
4. Neural Networks Viewed As Directed Graphs
5. Feedback
6. Network Architectures
7. Knowledge Representation
8. Learning Processes
9. Learning Tasks
10. Concluding Remarks
Notes and References
Chapter 1 Rosenblatt's Perceptron
1.1 Introduction
1.2 Perceptron
1.3 The Perceptron Convergence Theorem
1.4 Relation Between the Perceptron and Bayes Classifier for a Gaussian Environment
1.5 Computer Experiment: Pattern Classification
1.6 The Batch Perceptron Algorithm
1.7 Summary and Discussion
Notes and References
Problems
Chapter 2 Model Building through Regression
2.1 Introduction
2.2 Linear Regression Model: Preliminary Considerations
2.3 Maximum a Posteriori Estimation of the Parameter Vector
2.4 Relationship Between Regularized Least-Squares Estimation and MAP Estimation
2.5 Computer Experiment: Pattern Classification
2.6 The Minimum-Description-Length Principle
2.7 Finite Sample-Size Considerations
2.8 The Instrumental-Variables Method
2.9 Summary and Discussion
Notes and References
Problems
Chapter 3 The Least-Mean-Square Algorithm
3.1 Introduction
3.2 Filtering Structure of the LMS Algorithm
3.3 Unconstrained Optimization: a Review
3.4 The Wiener Filter
3.5 The Least-Mean-Square Algorithm
3.6 Markov Model Portraying the Deviation of the LMS Algorithm from the Wiener Filter
3.7 The Langevin Equation: Characterization of Brownian Motion
3.8 Kushner's Direct-Averaging Method
3.9 Statistical LMS Learning Theory for Small Learning-Rate Parameter
3.10 Computer Experiment I: Linear Prediction
3.11 Computer Experiment II: Pattern Classification
3.12 Virtues and Limitations of the LMS Algorithm
3.13 Learning-Rate Annealing Schedules
3.14 Summary and Discussion
Notes and References
Problems
Chapter 4 Multilayer Perceptrons
4.1 Introduction
4.2 Some Preliminaries
4.3 Batch Learning and On-Line Learning
4.4 The Back-Propagation Algorithm
4.5 XOR Problem
4.6 Heuristics for Making the Back-Propagation Algorithm Perform Better
4.7 Computer Experiment: Pattern Classification
4.8 Back Propagation and Differentiation
4.9 The Hessian and Its Role in On-Line Learning
4.10 Optimal Annealing and Adaptive Control of the Learning Rate
4.11 Generalization
4.12 Approximations of Functions
4.13 Cross-Validation
4.14 Complexity Regularization and Network Pruning
4.15 Virtues and Limitations of Back-Propagation Learning
4.16 Supervised Learning Viewed as an Optimization Problem
4.17 Convolutional Networks
4.18 Nonlinear Filtering
4.19 Small-Scale Versus Large-Scale Learning Problems
4.20 Summary and Discussion
Notes and References
Problems
Chapter 5 Kernel Methods and Radial-Basis Function Networks
5.1 Introduction
5.2 Cover's Theorem on the Separability of Patterns
5.3 The Interpolation Problem
5.4 Radial-Basis-Function Networks
5.5 K-Means Clustering
5.6 Recursive Least-Squares Estimation of the Weight Vector
5.7 Hybrid Learning Procedure for RBF Networks
5.8 Computer Experiment: Pattern Classification
5.9 Interpretations of the Gaussian Hidden Units
5.10 Kernel Regression and Its Relation to RBF Networks
5.11 Summary and Discussion
Notes and References
Problems
Chapter 6 Support Vector Machines
6.1 Introduction
6.2 Optimal Hyperplane for Linearly Separable Patterns
6.3 Optimal Hyperplane for Nonseparable Patterns
6.4 The Support Vector Machine Viewed as a Kernel Machine
6.5 Design of Support Vector Machines
6.6 XOR Problem
6.7 Computer Experiment: Pattern Classification
6.8 Regression: Robustness Considerations
6.9 Optimal Solution of the Linear Regression Problem
6.10 The Representer Theorem and Related Issues
6.11 Summary and Discussion
Notes and References
Problems
Chapter 7 Regularization Theory
7.1 Introduction
7.2 Hadamard's Conditions for Well-Posedness
7.3 Tikhonov's Regularization Theory
7.4 Regularization Networks
7.5 Generalized Radial-Basis-Function Networks
7.6 The Regularized Least-Squares Estimator: Revisited
7.7 Additional Notes of Interest on Regularization
7.8 Estimation of the Regularization Parameter
7.9 Semisupervised Learning
7.10 Manifold Regularization: Preliminary Considerations
7.11 Differentiable Manifolds
7.12 Generalized Regularization Theory
7.13 Spectral Graph Theory
7.14 Generalized Representer Theorem
7.15 Laplacian Regularized Least-Squares Algorithm
7.16 Experiments on Pattern Classification Using Semisupervised Learning
7.17 Summary and Discussion
Notes and References
Problems
Chapter 8 Principal-Components Analysis
8.1 Introduction
8.2 Principles of Self-Organization
8.3 Self-Organized Feature Analysis
8.4 Principal-Components Analysis: Perturbation Theory
8.5 Hebbian-Based Maximum Eigenfilter
8.6 Hebbian-Based Principal-Components Analysis
8.7 Case Study: Image Coding
8.8 Kernel Principal-Components Analysis
8.9 Basic Issues Involved in the Coding of Natural Images
8.10 Kernel Hebbian Algorithm
8.11 Summary and Discussion
Notes and References
Problems
Chapter 9 Self-Organizing Maps
9.1 Introduction
9.2 Two Basic Feature-Mapping Models
9.3 Self-Organizing Map
9.4 Properties of the Feature Map
9.5 Computer Experiments I: Disentangling Lattice Dynamics Using SOM
9.6 Contextual Maps
9.7 Hierarchical Vector Quantization
9.8 Kernel Self-Organizing Map
9.9 Computer Experiment II: Disentangling Lattice Dynamics Using Kernel SOM
9.10 Relationship Between Kernel SOM and Kullback–Leibler Divergence
9.11 Summary and Discussion
Notes and References
Problems
Chapter 10 Information-Theoretic Learning Models
10.1 Introduction
10.2 Entropy
10.3 Maximum-Entropy Principle
10.4 Mutual Information
10.5 Kullback–Leibler Divergence
10.6 Copulas
10.7 Mutual Information as an Objective Function to be Optimized
10.8 Maximum Mutual Information Principle
10.9 Infomax and Redundancy Reduction
10.10 Spatially Coherent Features
10.11 Spatially Incoherent Features
10.12 Independent-Components Analysis
10.13 Sparse Coding of Natural Images and Comparison with ICA Coding
10.14 Natural-Gradient Learning for Independent-Components Analysis
10.15 Maximum-Likelihood Estimation for Independent-Components Analysis
10.16 Maximum-Entropy Learning for Blind Source Separation
10.17 Maximization of Negentropy for Independent-Components Analysis
10.18 Coherent Independent-Components Analysis
10.19 Rate Distortion Theory and Information Bottleneck
10.20 Optimal Manifold Representation of Data
10.21 Computer Experiment: Pattern Classification
10.22 Summary and Discussion
Notes and References
Problems
Chapter 11 Stochastic Methods Rooted in Statistical Mechanics
11.1 Introduction
11.2 Statistical Mechanics
11.3 Markov Chains
11.4 Metropolis Algorithm
11.5 Simulated Annealing
11.6 Gibbs Sampling
11.7 Boltzmann Machine
11.8 Logistic Belief Nets
11.9 Deep Belief Nets
11.10 Deterministic Annealing
11.11 Analogy of Deterministic Annealing with Expectation-Maximization Algorithm
11.12 Summary and Discussion
Notes and References
Problems
Chapter 12 Dynamic Programming
12.1 Introduction
12.2 Markov Decision Process
12.3 Bellman's Optimality Criterion
12.4 Policy Iteration
12.5 Value Iteration
12.6 Approximate Dynamic Programming: Direct Methods
12.7 Temporal-Difference Learning
12.8 Q-Learning
12.9 Approximate Dynamic Programming: Indirect Methods
12.10 Least-Squares Policy Evaluation
12.11 Approximate Policy Iteration
12.12 Summary and Discussion
Notes and References
Problems
Chapter 13 Neurodynamics
13.1 Introduction
13.2 Dynamic Systems
13.3 Stability of Equilibrium States
13.4 Attractors
13.5 Neurodynamic Models
13.6 Manipulation of Attractors as a Recurrent Network Paradigm
13.7 Hopfield Model
13.8 The Cohen–Grossberg Theorem
13.9 Brain-State-In-A-Box Model
13.10 Strange Attractors and Chaos
13.11 Dynamic Reconstruction of a Chaotic Process
13.12 Summary and Discussion
Notes and References
Problems
Chapter 14 Bayesian Filtering for State Estimation of Dynamic Systems
14.1 Introduction
14.2 State-Space Models
14.3 Kalman Filters
14.4 The Divergence-Phenomenon and Square-Root Filtering
14.5 The Extended Kalman Filter
14.6 The Bayesian Filter
14.7 Cubature Kalman Filter: Building on the Kalman Filter
14.8 Particle Filters
14.9 Computer Experiment: Comparative Evaluation of Extended Kalman and Particle Filters
14.10 Kalman Filtering in Modeling of Brain Functions
14.11 Summary and Discussion
Notes and References
Problems
Chapter 15 Dynamically Driven Recurrent Networks
15.1 Introduction
15.2 Recurrent Network Architectures
15.3 Universal Approximation Theorem
15.4 Controllability and Observability
15.5 Computational Power of Recurrent Networks
15.6 Learning Algorithms
15.7 Back Propagation Through Time
15.8 Real-Time Recurrent Learning
15.9 Vanishing Gradients in Recurrent Networks
15.10 Supervised Training Framework for Recurrent Networks Using Nonlinear Sequential State Estimators
15.11 Computer Experiment: Dynamic Reconstruction of Mackey–Glass Attractor
15.12 Adaptivity Considerations
15.13 Case Study: Model Reference Applied to Neurocontrol
15.14 Summary and Discussion
Notes and References
Problems
Bibliography
Index
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
Z
Neural Networks and Learning Machines
Third Edition

Simon Haykin
McMaster University
Hamilton, Ontario, Canada

New York  Boston  San Francisco  London  Toronto  Sydney  Tokyo  Singapore  Madrid  Mexico City  Munich  Paris  Cape Town  Hong Kong  Montreal
Library of Congress Cataloging-in-Publication Data
Haykin, Simon
Neural networks and learning machines / Simon Haykin.—3rd ed.
p. cm.
Rev. ed. of: Neural networks. 2nd ed., 1999.
Includes bibliographical references and index.
ISBN-13: 978-0-13-147139-9
ISBN-10: 0-13-147139-2
1. Neural networks (Computer science) 2. Adaptive filters. I. Haykin, Simon. Neural networks. II. Title.
QA76.87.H39 2008
006.3--dc22
2008034079

Vice President and Editorial Director, ECS: Marcia J. Horton
Associate Editor: Alice Dworkin
Supervisor/Editorial Assistant: Dolores Mars
Editorial Assistant: William Opaluch
Director of Team-Based Project Management: Vince O'Brien
Senior Managing Editor: Scott Disanno
A/V Production Editor: Greg Dulles
Art Director: Jayne Conte
Cover Designer: Bruce Kenselaar
Manufacturing Manager: Alan Fischer
Manufacturing Buyer: Lisa McDowell
Marketing Manager: Tim Galligan

Copyright © 2009 by Pearson Education, Inc., Upper Saddle River, New Jersey 07458. Pearson Prentice Hall. All rights reserved. Printed in the United States of America. This publication is protected by Copyright and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permission(s), write to: Rights and Permissions Department.

Pearson® is a registered trademark of Pearson plc
Pearson Education Ltd.
Pearson Education Singapore Pte. Ltd.
Pearson Education Canada, Ltd.
Pearson Education–Japan
Pearson Education Australia Pty. Limited
Pearson Education North Asia Ltd.
Pearson Educación de Mexico, S.A. de C.V.
Pearson Education Malaysia Pte. Ltd.

10 9 8 7 6 5 4 3 2 1

ISBN-13: 978-0-13-147139-9
ISBN-10: 0-13-147139-2
To my wife, Nancy, for her patience and tolerance, and to the countless researchers in neural networks for their original contributions, the many reviewers for their critical inputs, and many of my graduate students for their keen interest.
Contents

Preface x

Introduction 1
1. What is a Neural Network? 1
2. The Human Brain 6
3. Models of a Neuron 10
4. Neural Networks Viewed As Directed Graphs 15
5. Feedback 18
6. Network Architectures 21
7. Knowledge Representation 24
8. Learning Processes 34
9. Learning Tasks 38
10. Concluding Remarks 45
Notes and References 46

Chapter 1 Rosenblatt's Perceptron 47
1.1 Introduction 47
1.2 Perceptron 48
1.3 The Perceptron Convergence Theorem 50
1.4 Relation Between the Perceptron and Bayes Classifier for a Gaussian Environment 55
1.5 Computer Experiment: Pattern Classification 60
1.6 The Batch Perceptron Algorithm 62
1.7 Summary and Discussion 65
Notes and References 66
Problems 66

Chapter 2 Model Building through Regression 68
2.1 Introduction 68
2.2 Linear Regression Model: Preliminary Considerations 69
2.3 Maximum a Posteriori Estimation of the Parameter Vector 71
2.4 Relationship Between Regularized Least-Squares Estimation and MAP Estimation 76
2.5 Computer Experiment: Pattern Classification 77
2.6 The Minimum-Description-Length Principle 79
2.7 Finite Sample-Size Considerations 82
2.8 The Instrumental-Variables Method 86
2.9 Summary and Discussion 88
Notes and References 89
Problems 89

Chapter 3 The Least-Mean-Square Algorithm 91
3.1 Introduction 91
3.2 Filtering Structure of the LMS Algorithm 92
3.3 Unconstrained Optimization: a Review 94
3.4 The Wiener Filter 100
3.5 The Least-Mean-Square Algorithm 102
3.6 Markov Model Portraying the Deviation of the LMS Algorithm from the Wiener Filter 104
3.7 The Langevin Equation: Characterization of Brownian Motion 106
3.8 Kushner's Direct-Averaging Method 107
3.9 Statistical LMS Learning Theory for Small Learning-Rate Parameter 108
3.10 Computer Experiment I: Linear Prediction 110
3.11 Computer Experiment II: Pattern Classification 112
3.12 Virtues and Limitations of the LMS Algorithm 113
3.13 Learning-Rate Annealing Schedules 115
3.14 Summary and Discussion 117
Notes and References 118
Problems 119

Chapter 4 Multilayer Perceptrons 122
4.1 Introduction 123
4.2 Some Preliminaries 124
4.3 Batch Learning and On-Line Learning 126
4.4 The Back-Propagation Algorithm 129
4.5 XOR Problem 141
4.6 Heuristics for Making the Back-Propagation Algorithm Perform Better 144
4.7 Computer Experiment: Pattern Classification 150
4.8 Back Propagation and Differentiation 153
4.9 The Hessian and Its Role in On-Line Learning 155
4.10 Optimal Annealing and Adaptive Control of the Learning Rate 157
4.11 Generalization 164
4.12 Approximations of Functions 166
4.13 Cross-Validation 171
4.14 Complexity Regularization and Network Pruning 175
4.15 Virtues and Limitations of Back-Propagation Learning 180
4.16 Supervised Learning Viewed as an Optimization Problem 186
4.17 Convolutional Networks 201
4.18 Nonlinear Filtering 203
4.19 Small-Scale Versus Large-Scale Learning Problems 209
4.20 Summary and Discussion 217
Notes and References 219
Problems 221

Chapter 5 Kernel Methods and Radial-Basis Function Networks 230
5.1 Introduction 230
5.2 Cover's Theorem on the Separability of Patterns 231
5.3 The Interpolation Problem 236
5.4 Radial-Basis-Function Networks 239
5.5 K-Means Clustering 242
5.6 Recursive Least-Squares Estimation of the Weight Vector 245
5.7 Hybrid Learning Procedure for RBF Networks 249
5.8 Computer Experiment: Pattern Classification 250
5.9 Interpretations of the Gaussian Hidden Units 252
5.10 Kernel Regression and Its Relation to RBF Networks 255
5.11 Summary and Discussion 259
Notes and References 261
Problems 263

Chapter 6 Support Vector Machines 268
6.1 Introduction 268
6.2 Optimal Hyperplane for Linearly Separable Patterns 269
6.3 Optimal Hyperplane for Nonseparable Patterns 276
6.4 The Support Vector Machine Viewed as a Kernel Machine 281
6.5 Design of Support Vector Machines 284
6.6 XOR Problem 286
6.7 Computer Experiment: Pattern Classification 289
6.8 Regression: Robustness Considerations 289
6.9 Optimal Solution of the Linear Regression Problem 293
6.10 The Representer Theorem and Related Issues 296
6.11 Summary and Discussion 302
Notes and References 304
Problems 307

Chapter 7 Regularization Theory 313
7.1 Introduction 313
7.2 Hadamard's Conditions for Well-Posedness 314
7.3 Tikhonov's Regularization Theory 315
7.4 Regularization Networks 326
7.5 Generalized Radial-Basis-Function Networks 327
7.6 The Regularized Least-Squares Estimator: Revisited 331
7.7 Additional Notes of Interest on Regularization 335
7.8 Estimation of the Regularization Parameter 336
7.9 Semisupervised Learning 342
7.10 Manifold Regularization: Preliminary Considerations 343
7.11 Differentiable Manifolds 345
7.12 Generalized Regularization Theory 348
7.13 Spectral Graph Theory 350
7.14 Generalized Representer Theorem 352
7.15 Laplacian Regularized Least-Squares Algorithm 354
7.16 Experiments on Pattern Classification Using Semisupervised Learning 356
7.17 Summary and Discussion 359
Notes and References 361
Problems 363

Chapter 8 Principal-Components Analysis 367
8.1 Introduction 367
8.2 Principles of Self-Organization 368
8.3 Self-Organized Feature Analysis 372
8.4 Principal-Components Analysis: Perturbation Theory 373
8.5 Hebbian-Based Maximum Eigenfilter 383
8.6 Hebbian-Based Principal-Components Analysis 392
8.7 Case Study: Image Coding 398
8.8 Kernel Principal-Components Analysis 401
8.9 Basic Issues Involved in the Coding of Natural Images 406
8.10 Kernel Hebbian Algorithm 407
8.11 Summary and Discussion 412
Notes and References 415
Problems 418