Independent Component Analysis
Aapo Hyvärinen
Juha Karhunen
Erkki Oja
A Wiley-Interscience Publication
JOHN WILEY & SONS, INC.
New York / Chichester / Weinheim / Brisbane / Singapore / Toronto
Designations used by companies to distinguish their products are often
claimed as trademarks. In all instances where John Wiley & Sons, Inc., is
aware of a claim, the product names appear in initial capital or ALL
CAPITAL LETTERS. Readers, however, should contact the appropriate
companies for more complete information regarding trademarks and
registration.
Copyright 2001 by John Wiley & Sons, Inc. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system
or transmitted in any form or by any means, electronic or mechanical,
including uploading, downloading, printing, decompiling, recording or
otherwise, except as permitted under Sections 107 or 108 of the 1976
United States Copyright Act, without the prior written permission of the
Publisher. Requests to the Publisher for permission should be addressed to
the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue,
New York, NY 10158-0012, (212) 850-6011, fax (212) 850-6008,
E-Mail: PERMREQ@WILEY.COM.
This publication is designed to provide accurate and authoritative
information in regard to the subject matter covered. It is sold with the
understanding that the publisher is not engaged in rendering professional
services. If professional advice or other expert assistance is required, the
services of a competent professional person should be sought.
ISBN 0-471-22131-7
This title is also available in print as ISBN 0-471-40540-X.
For more information about Wiley products, visit our web site at
www.Wiley.com.
Contents

Preface xvii

1 Introduction 1
  1.1 Linear representation of multivariate data 1
    1.1.1 The general statistical setting 1
    1.1.2 Dimension reduction methods 2
    1.1.3 Independence as a guiding principle 3
  1.2 Blind source separation 3
    1.2.1 Observing mixtures of unknown signals 4
    1.2.2 Source separation based on independence 5
  1.3 Independent component analysis 6
    1.3.1 Definition 6
    1.3.2 Applications 7
    1.3.3 How to find the independent components 7
  1.4 History of ICA 11
Part I MATHEMATICAL PRELIMINARIES
2 Random Vectors and Independence 15
  2.1 Probability distributions and densities 15
    2.1.1 Distribution of a random variable 15
    2.1.2 Distribution of a random vector 17
    2.1.3 Joint and marginal distributions 18
  2.2 Expectations and moments 19
    2.2.1 Definition and general properties 19
    2.2.2 Mean vector and correlation matrix 20
    2.2.3 Covariances and joint moments 22
    2.2.4 Estimation of expectations 24
  2.3 Uncorrelatedness and independence 24
    2.3.1 Uncorrelatedness and whiteness 24
    2.3.2 Statistical independence 27
  2.4 Conditional densities and Bayes’ rule 28
  2.5 The multivariate gaussian density 31
    2.5.1 Properties of the gaussian density 32
    2.5.2 Central limit theorem 34
  2.6 Density of a transformation 35
  2.7 Higher-order statistics 36
    2.7.1 Kurtosis and classification of densities 37
    2.7.2 Cumulants, moments, and their properties 40
  2.8 Stochastic processes * 43
    2.8.1 Introduction and definition 43
    2.8.2 Stationarity, mean, and autocorrelation 45
    2.8.3 Wide-sense stationary processes 46
    2.8.4 Time averages and ergodicity 48
    2.8.5 Power spectrum 49
    2.8.6 Stochastic signal models 50
  2.9 Concluding remarks and references 51
  Problems 52

3 Gradients and Optimization Methods 57
  3.1 Vector and matrix gradients 57
    3.1.1 Vector gradient 57
    3.1.2 Matrix gradient 59
    3.1.3 Examples of gradients 59
    3.1.4 Taylor series expansions 62
  3.2 Learning rules for unconstrained optimization 63
    3.2.1 Gradient descent 63
    3.2.2 Second-order learning 65
    3.2.3 The natural gradient and relative gradient 67
    3.2.4 Stochastic gradient descent 68
    3.2.5 Convergence of stochastic on-line algorithms * 71
  3.3 Learning rules for constrained optimization 73
    3.3.1 The Lagrange method 73
    3.3.2 Projection methods 73
  3.4 Concluding remarks and references 75
  Problems 75

4 Estimation Theory 77
  4.1 Basic concepts 78
  4.2 Properties of estimators 80
  4.3 Method of moments 84
  4.4 Least-squares estimation 86
    4.4.1 Linear least-squares method 86
    4.4.2 Nonlinear and generalized least squares * 88
  4.5 Maximum likelihood method 90
  4.6 Bayesian estimation * 94
    4.6.1 Minimum mean-square error estimator 94
    4.6.2 Wiener filtering 96
    4.6.3 Maximum a posteriori (MAP) estimator 97
  4.7 Concluding remarks and references 99
  Problems 101

5 Information Theory 105
  5.1 Entropy 105
    5.1.1 Definition of entropy 105
    5.1.2 Entropy and coding length 107
    5.1.3 Differential entropy 108
    5.1.4 Entropy of a transformation 109
  5.2 Mutual information 110
    5.2.1 Definition using entropy 110
    5.2.2 Definition using Kullback-Leibler divergence 110