
Neural Networks and Learning Machines.pdf

Cover
Title Page
Copyright Page
Contents
Preface
Acknowledgments
GLOSSARY
Introduction
1. What is a Neural Network?
2. The Human Brain
3. Models of a Neuron
4. Neural Networks Viewed As Directed Graphs
5. Feedback
6. Network Architectures
7. Knowledge Representation
8. Learning Processes
9. Learning Tasks
10. Concluding Remarks
Notes and References
Chapter 1 Rosenblatt's Perceptron
1.1 Introduction
1.2 Perceptron
1.3 The Perceptron Convergence Theorem
1.4 Relation Between the Perceptron and Bayes Classifier for a Gaussian Environment
1.5 Computer Experiment: Pattern Classification
1.6 The Batch Perceptron Algorithm
1.7 Summary and Discussion
Notes and References
Problems
Chapter 2 Model Building through Regression
2.1 Introduction
2.2 Linear Regression Model: Preliminary Considerations
2.3 Maximum a Posteriori Estimation of the Parameter Vector
2.4 Relationship Between Regularized Least-Squares Estimation and MAP Estimation
2.5 Computer Experiment: Pattern Classification
2.6 The Minimum-Description-Length Principle
2.7 Finite Sample-Size Considerations
2.8 The Instrumental-Variables Method
2.9 Summary and Discussion
Notes and References
Problems
Chapter 3 The Least-Mean-Square Algorithm
3.1 Introduction
3.2 Filtering Structure of the LMS Algorithm
3.3 Unconstrained Optimization: a Review
3.4 The Wiener Filter
3.5 The Least-Mean-Square Algorithm
3.6 Markov Model Portraying the Deviation of the LMS Algorithm from the Wiener Filter
3.7 The Langevin Equation: Characterization of Brownian Motion
3.8 Kushner's Direct-Averaging Method
3.9 Statistical LMS Learning Theory for Small Learning-Rate Parameter
3.10 Computer Experiment I: Linear Prediction
3.11 Computer Experiment II: Pattern Classification
3.12 Virtues and Limitations of the LMS Algorithm
3.13 Learning-Rate Annealing Schedules
3.14 Summary and Discussion
Notes and References
Problems
Chapter 4 Multilayer Perceptrons
4.1 Introduction
4.2 Some Preliminaries
4.3 Batch Learning and On-Line Learning
4.4 The Back-Propagation Algorithm
4.5 XOR Problem
4.6 Heuristics for Making the Back-Propagation Algorithm Perform Better
4.7 Computer Experiment: Pattern Classification
4.8 Back Propagation and Differentiation
4.9 The Hessian and Its Role in On-Line Learning
4.10 Optimal Annealing and Adaptive Control of the Learning Rate
4.11 Generalization
4.12 Approximations of Functions
4.13 Cross-Validation
4.14 Complexity Regularization and Network Pruning
4.15 Virtues and Limitations of Back-Propagation Learning
4.16 Supervised Learning Viewed as an Optimization Problem
4.17 Convolutional Networks
4.18 Nonlinear Filtering
4.19 Small-Scale Versus Large-Scale Learning Problems
4.20 Summary and Discussion
Notes and References
Problems
Chapter 5 Kernel Methods and Radial-Basis Function Networks
5.1 Introduction
5.2 Cover's Theorem on the Separability of Patterns
5.3 The Interpolation Problem
5.4 Radial-Basis-Function Networks
5.5 K-Means Clustering
5.6 Recursive Least-Squares Estimation of the Weight Vector
5.7 Hybrid Learning Procedure for RBF Networks
5.8 Computer Experiment: Pattern Classification
5.9 Interpretations of the Gaussian Hidden Units
5.10 Kernel Regression and Its Relation to RBF Networks
5.11 Summary and Discussion
Notes and References
Problems
Chapter 6 Support Vector Machines
6.1 Introduction
6.2 Optimal Hyperplane for Linearly Separable Patterns
6.3 Optimal Hyperplane for Nonseparable Patterns
6.4 The Support Vector Machine Viewed as a Kernel Machine
6.5 Design of Support Vector Machines
6.6 XOR Problem
6.7 Computer Experiment: Pattern Classification
6.8 Regression: Robustness Considerations
6.9 Optimal Solution of the Linear Regression Problem
6.10 The Representer Theorem and Related Issues
6.11 Summary and Discussion
Notes and References
Problems
Chapter 7 Regularization Theory
7.1 Introduction
7.2 Hadamard's Conditions for Well-Posedness
7.3 Tikhonov's Regularization Theory
7.4 Regularization Networks
7.5 Generalized Radial-Basis-Function Networks
7.6 The Regularized Least-Squares Estimator: Revisited
7.7 Additional Notes of Interest on Regularization
7.8 Estimation of the Regularization Parameter
7.9 Semisupervised Learning
7.10 Manifold Regularization: Preliminary Considerations
7.11 Differentiable Manifolds
7.12 Generalized Regularization Theory
7.13 Spectral Graph Theory
7.14 Generalized Representer Theorem
7.15 Laplacian Regularized Least-Squares Algorithm
7.16 Experiments on Pattern Classification Using Semisupervised Learning
7.17 Summary and Discussion
Notes and References
Problems
Chapter 8 Principal-Components Analysis
8.1 Introduction
8.2 Principles of Self-Organization
8.3 Self-Organized Feature Analysis
8.4 Principal-Components Analysis: Perturbation Theory
8.5 Hebbian-Based Maximum Eigenfilter
8.6 Hebbian-Based Principal-Components Analysis
8.7 Case Study: Image Coding
8.8 Kernel Principal-Components Analysis
8.9 Basic Issues Involved in the Coding of Natural Images
8.10 Kernel Hebbian Algorithm
8.11 Summary and Discussion
Notes and References
Problems
Chapter 9 Self-Organizing Maps
9.1 Introduction
9.2 Two Basic Feature-Mapping Models
9.3 Self-Organizing Map
9.4 Properties of the Feature Map
9.5 Computer Experiments I: Disentangling Lattice Dynamics Using SOM
9.6 Contextual Maps
9.7 Hierarchical Vector Quantization
9.8 Kernel Self-Organizing Map
9.9 Computer Experiment II: Disentangling Lattice Dynamics Using Kernel SOM
9.10 Relationship Between Kernel SOM and Kullback–Leibler Divergence
9.11 Summary and Discussion
Notes and References
Problems
Chapter 10 Information-Theoretic Learning Models
10.1 Introduction
10.2 Entropy
10.3 Maximum-Entropy Principle
10.4 Mutual Information
10.5 Kullback–Leibler Divergence
10.6 Copulas
10.7 Mutual Information as an Objective Function to be Optimized
10.8 Maximum Mutual Information Principle
10.9 Infomax and Redundancy Reduction
10.10 Spatially Coherent Features
10.11 Spatially Incoherent Features
10.12 Independent-Components Analysis
10.13 Sparse Coding of Natural Images and Comparison with ICA Coding
10.14 Natural-Gradient Learning for Independent-Components Analysis
10.15 Maximum-Likelihood Estimation for Independent-Components Analysis
10.16 Maximum-Entropy Learning for Blind Source Separation
10.17 Maximization of Negentropy for Independent-Components Analysis
10.18 Coherent Independent-Components Analysis
10.19 Rate Distortion Theory and Information Bottleneck
10.20 Optimal Manifold Representation of Data
10.21 Computer Experiment: Pattern Classification
10.22 Summary and Discussion
Notes and References
Problems
Chapter 11 Stochastic Methods Rooted in Statistical Mechanics
11.1 Introduction
11.2 Statistical Mechanics
11.3 Markov Chains
11.4 Metropolis Algorithm
11.5 Simulated Annealing
11.6 Gibbs Sampling
11.7 Boltzmann Machine
11.8 Logistic Belief Nets
11.9 Deep Belief Nets
11.10 Deterministic Annealing
11.11 Analogy of Deterministic Annealing with Expectation-Maximization Algorithm
11.12 Summary and Discussion
Notes and References
Problems
Chapter 12 Dynamic Programming
12.1 Introduction
12.2 Markov Decision Process
12.3 Bellman's Optimality Criterion
12.4 Policy Iteration
12.5 Value Iteration
12.6 Approximate Dynamic Programming: Direct Methods
12.7 Temporal-Difference Learning
12.8 Q-Learning
12.9 Approximate Dynamic Programming: Indirect Methods
12.10 Least-Squares Policy Evaluation
12.11 Approximate Policy Iteration
12.12 Summary and Discussion
Notes and References
Problems
Chapter 13 Neurodynamics
13.1 Introduction
13.2 Dynamic Systems
13.3 Stability of Equilibrium States
13.4 Attractors
13.5 Neurodynamic Models
13.6 Manipulation of Attractors as a Recurrent Network Paradigm
13.7 Hopfield Model
13.8 The Cohen–Grossberg Theorem
13.9 Brain-State-In-A-Box Model
13.10 Strange Attractors and Chaos
13.11 Dynamic Reconstruction of a Chaotic Process
13.12 Summary and Discussion
Notes and References
Problems
Chapter 14 Bayesian Filtering for State Estimation of Dynamic Systems
14.1 Introduction
14.2 State-Space Models
14.3 Kalman Filters
14.4 The Divergence-Phenomenon and Square-Root Filtering
14.5 The Extended Kalman Filter
14.6 The Bayesian Filter
14.7 Cubature Kalman Filter: Building on the Kalman Filter
14.8 Particle Filters
14.9 Computer Experiment: Comparative Evaluation of Extended Kalman and Particle Filters
14.10 Kalman Filtering in Modeling of Brain Functions
14.11 Summary and Discussion
Notes and References
Problems
Chapter 15 Dynamically Driven Recurrent Networks
15.1 Introduction
15.2 Recurrent Network Architectures
15.3 Universal Approximation Theorem
15.4 Controllability and Observability
15.5 Computational Power of Recurrent Networks
15.6 Learning Algorithms
15.7 Back Propagation Through Time
15.8 Real-Time Recurrent Learning
15.9 Vanishing Gradients in Recurrent Networks
15.10 Supervised Training Framework for Recurrent Networks Using Nonlinear Sequential State Estimators
15.11 Computer Experiment: Dynamic Reconstruction of Mackey–Glass Attractor
15.12 Adaptivity Considerations
15.13 Case Study: Model Reference Applied to Neurocontrol
15.14 Summary and Discussion
Notes and References
Problems
Bibliography
Index
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
Z
Neural Networks and Learning Machines
Third Edition

Simon Haykin
McMaster University
Hamilton, Ontario, Canada

New York  Boston  San Francisco  London  Toronto  Sydney  Tokyo  Singapore  Madrid  Mexico City  Munich  Paris  Cape Town  Hong Kong  Montreal
Library of Congress Cataloging-in-Publication Data
Haykin, Simon
Neural networks and learning machines / Simon Haykin.—3rd ed.
p. cm.
Rev. ed. of: Neural networks. 2nd ed., 1999.
Includes bibliographical references and index.
ISBN-13: 978-0-13-147139-9
ISBN-10: 0-13-147139-2
1. Neural networks (Computer science) 2. Adaptive filters. I. Haykin, Simon. Neural networks. II. Title.
QA76.87.H39 2008
006.3--dc22
2008034079

Vice President and Editorial Director, ECS: Marcia J. Horton
Associate Editor: Alice Dworkin
Supervisor/Editorial Assistant: Dolores Mars
Editorial Assistant: William Opaluch
Director of Team-Based Project Management: Vince O'Brien
Senior Managing Editor: Scott Disanno
A/V Production Editor: Greg Dulles
Art Director: Jayne Conte
Cover Designer: Bruce Kenselaar
Manufacturing Manager: Alan Fischer
Manufacturing Buyer: Lisa McDowell
Marketing Manager: Tim Galligan

Copyright © 2009 by Pearson Education, Inc., Upper Saddle River, New Jersey 07458. Pearson Prentice Hall. All rights reserved. Printed in the United States of America. This publication is protected by Copyright and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permission(s), write to: Rights and Permissions Department.

Pearson® is a registered trademark of Pearson plc
Pearson Education Ltd.
Pearson Education Singapore Pte. Ltd.
Pearson Education Canada, Ltd.
Pearson Education–Japan
Pearson Education Australia Pty. Limited
Pearson Education North Asia Ltd.
Pearson Educación de Mexico, S.A. de C.V.
Pearson Education Malaysia Pte. Ltd.

10 9 8 7 6 5 4 3 2 1

ISBN-13: 978-0-13-147139-9
ISBN-10: 0-13-147139-2
To my wife, Nancy, for her patience and tolerance, and to the countless researchers in neural networks for their original contributions, the many reviewers for their critical inputs, and many of my graduate students for their keen interest.
Contents

Preface x

Introduction 1
1. What is a Neural Network? 1
2. The Human Brain 6
3. Models of a Neuron 10
4. Neural Networks Viewed As Directed Graphs 15
5. Feedback 18
6. Network Architectures 21
7. Knowledge Representation 24
8. Learning Processes 34
9. Learning Tasks 38
10. Concluding Remarks 45
Notes and References 46

Chapter 1 Rosenblatt's Perceptron 47
1.1 Introduction 47
1.2 Perceptron 48
1.3 The Perceptron Convergence Theorem 50
1.4 Relation Between the Perceptron and Bayes Classifier for a Gaussian Environment 55
1.5 Computer Experiment: Pattern Classification 60
1.6 The Batch Perceptron Algorithm 62
1.7 Summary and Discussion 65
Notes and References 66
Problems 66

Chapter 2 Model Building through Regression 68
2.1 Introduction 68
2.2 Linear Regression Model: Preliminary Considerations 69
2.3 Maximum a Posteriori Estimation of the Parameter Vector 71
2.4 Relationship Between Regularized Least-Squares Estimation and MAP Estimation 76
2.5 Computer Experiment: Pattern Classification 77
2.6 The Minimum-Description-Length Principle 79
2.7 Finite Sample-Size Considerations 82
2.8 The Instrumental-Variables Method 86
2.9 Summary and Discussion 88
Notes and References 89
Problems 89

Chapter 3 The Least-Mean-Square Algorithm 91
3.1 Introduction 91
3.2 Filtering Structure of the LMS Algorithm 92
3.3 Unconstrained Optimization: a Review 94
3.4 The Wiener Filter 100
3.5 The Least-Mean-Square Algorithm 102
3.6 Markov Model Portraying the Deviation of the LMS Algorithm from the Wiener Filter 104
3.7 The Langevin Equation: Characterization of Brownian Motion 106
3.8 Kushner's Direct-Averaging Method 107
3.9 Statistical LMS Learning Theory for Small Learning-Rate Parameter 108
3.10 Computer Experiment I: Linear Prediction 110
3.11 Computer Experiment II: Pattern Classification 112
3.12 Virtues and Limitations of the LMS Algorithm 113
3.13 Learning-Rate Annealing Schedules 115
3.14 Summary and Discussion 117
Notes and References 118
Problems 119

Chapter 4 Multilayer Perceptrons 122
4.1 Introduction 123
4.2 Some Preliminaries 124
4.3 Batch Learning and On-Line Learning 126
4.4 The Back-Propagation Algorithm 129
4.5 XOR Problem 141
4.6 Heuristics for Making the Back-Propagation Algorithm Perform Better 144
4.7 Computer Experiment: Pattern Classification 150
4.8 Back Propagation and Differentiation 153
4.9 The Hessian and Its Role in On-Line Learning 155
4.10 Optimal Annealing and Adaptive Control of the Learning Rate 157
4.11 Generalization 164
4.12 Approximations of Functions 166
4.13 Cross-Validation 171
4.14 Complexity Regularization and Network Pruning 175
4.15 Virtues and Limitations of Back-Propagation Learning 180
4.16 Supervised Learning Viewed as an Optimization Problem 186
4.17 Convolutional Networks 201
4.18 Nonlinear Filtering 203
4.19 Small-Scale Versus Large-Scale Learning Problems 209
4.20 Summary and Discussion 217
Notes and References 219
Problems 221

Chapter 5 Kernel Methods and Radial-Basis Function Networks 230
5.1 Introduction 230
5.2 Cover's Theorem on the Separability of Patterns 231
5.3 The Interpolation Problem 236
5.4 Radial-Basis-Function Networks 239
5.5 K-Means Clustering 242
5.6 Recursive Least-Squares Estimation of the Weight Vector 245
5.7 Hybrid Learning Procedure for RBF Networks 249
5.8 Computer Experiment: Pattern Classification 250
5.9 Interpretations of the Gaussian Hidden Units 252
5.10 Kernel Regression and Its Relation to RBF Networks 255
5.11 Summary and Discussion 259
Notes and References 261
Problems 263

Chapter 6 Support Vector Machines 268
6.1 Introduction 268
6.2 Optimal Hyperplane for Linearly Separable Patterns 269
6.3 Optimal Hyperplane for Nonseparable Patterns 276
6.4 The Support Vector Machine Viewed as a Kernel Machine 281
6.5 Design of Support Vector Machines 284
6.6 XOR Problem 286
6.7 Computer Experiment: Pattern Classification 289
6.8 Regression: Robustness Considerations 289
6.9 Optimal Solution of the Linear Regression Problem 293
6.10 The Representer Theorem and Related Issues 296
6.11 Summary and Discussion 302
Notes and References 304
Problems 307

Chapter 7 Regularization Theory 313
7.1 Introduction 313
7.2 Hadamard's Conditions for Well-Posedness 314
7.3 Tikhonov's Regularization Theory 315
7.4 Regularization Networks 326
7.5 Generalized Radial-Basis-Function Networks 327
7.6 The Regularized Least-Squares Estimator: Revisited 331
7.7 Additional Notes of Interest on Regularization 335
7.8 Estimation of the Regularization Parameter 336
7.9 Semisupervised Learning 342
7.10 Manifold Regularization: Preliminary Considerations 343
7.11 Differentiable Manifolds 345
7.12 Generalized Regularization Theory 348
7.13 Spectral Graph Theory 350
7.14 Generalized Representer Theorem 352
7.15 Laplacian Regularized Least-Squares Algorithm 354
7.16 Experiments on Pattern Classification Using Semisupervised Learning 356
7.17 Summary and Discussion 359
Notes and References 361
Problems 363

Chapter 8 Principal-Components Analysis 367
8.1 Introduction 367
8.2 Principles of Self-Organization 368
8.3 Self-Organized Feature Analysis 372
8.4 Principal-Components Analysis: Perturbation Theory 373
8.5 Hebbian-Based Maximum Eigenfilter 383
8.6 Hebbian-Based Principal-Components Analysis 392
8.7 Case Study: Image Coding 398
8.8 Kernel Principal-Components Analysis 401
8.9 Basic Issues Involved in the Coding of Natural Images 406
8.10 Kernel Hebbian Algorithm 407
8.11 Summary and Discussion 412
Notes and References 415
Problems 418