Studies in Big Data 20
Oliver Kramer
Machine Learning for Evolution Strategies
Studies in Big Data
Volume 20
Series editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
e-mail: kacprzyk@ibspan.waw.pl
About this Series
The series “Studies in Big Data” (SBD) publishes new developments and advances in the various areas of Big Data, quickly and with high quality. The intent is to cover the theory, research, development, and applications of Big Data, as embedded in the fields of engineering, computer science, physics, economics and life sciences. The books of the series refer to the analysis and understanding of large, complex, and/or distributed data sets generated from recent digital sources coming from sensors or other physical instruments as well as simulations, crowd sourcing, social networks or other internet transactions, such as emails or video click streams, and others. The series contains monographs, lecture notes and edited volumes in Big Data spanning the areas of computational intelligence, including neural networks, evolutionary computation, soft computing, fuzzy systems, as well as artificial intelligence, data mining, modern statistics and operations research, as well as self-organizing systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output.
More information about this series at http://www.springer.com/series/11970
Oliver Kramer
Machine Learning for Evolution Strategies
Oliver Kramer
Informatik
Universität Oldenburg
Oldenburg
Germany
ISSN 2197-6503          ISSN 2197-6511 (electronic)
Studies in Big Data
ISBN 978-3-319-33381-6  ISBN 978-3-319-33383-0 (eBook)
DOI 10.1007/978-3-319-33383-0
Library of Congress Control Number: 2016938389
© Springer International Publishing Switzerland 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
Printed on acid-free paper
This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG Switzerland
Contents

1  Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    1
   1.1  Computational Intelligence . . . . . . . . . . . . . . . . . . . .    1
   1.2  Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . .    1
   1.3  Machine Learning and Big Data . . . . . . . . . . . . . . . . . .    3
   1.4  Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . .    5
   1.5  Benchmark Problems . . . . . . . . . . . . . . . . . . . . . . . .    5
   1.6  Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    6
   1.7  Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . .    8
   1.8  Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . .    8
   1.9  Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    9
   References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   10

Part I  Evolution Strategies

2  Evolution Strategies . . . . . . . . . . . . . . . . . . . . . . . . . .   13
   2.1  Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . .   13
   2.2  Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . .   14
   2.3  History . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   15
   2.4  Recombination . . . . . . . . . . . . . . . . . . . . . . . . . .   16
   2.5  Mutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   16
   2.6  Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . .   17
   2.7  Rechenberg’s 1/5th Success Rule . . . . . . . . . . . . . . . . .   18
   2.8  (1+1)-ES . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   19
   2.9  Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . .   20
   References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   21

3  Covariance Matrix Estimation . . . . . . . . . . . . . . . . . . . . . .   23
   3.1  Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . .   23
   3.2  Covariance Matrix Estimation . . . . . . . . . . . . . . . . . . .   24
   3.3  Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . .   25
   3.4  Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . .   26
   3.5  Experimental Analysis . . . . . . . . . . . . . . . . . . . . . .   27
   3.6  Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . .   30
   References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   31

Part II  Machine Learning

4  Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . .   35
   4.1  Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . .   35
   4.2  Prediction and Inference . . . . . . . . . . . . . . . . . . . . .   36
   4.3  Classification . . . . . . . . . . . . . . . . . . . . . . . . . .   37
   4.4  Model Selection . . . . . . . . . . . . . . . . . . . . . . . . .   38
   4.5  Curse of Dimensionality . . . . . . . . . . . . . . . . . . . . .   39
   4.6  Bias-Variance Trade-Off . . . . . . . . . . . . . . . . . . . . .   40
   4.7  Feature Selection and Extraction . . . . . . . . . . . . . . . . .   41
   4.8  Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . .   42
   References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   43

5  Scikit-Learn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   45
   5.1  Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . .   45
   5.2  Data Management . . . . . . . . . . . . . . . . . . . . . . . . .   46
   5.3  Supervised Learning . . . . . . . . . . . . . . . . . . . . . . .   47
   5.4  Pre-processing Methods . . . . . . . . . . . . . . . . . . . . . .   48
   5.5  Model Evaluation . . . . . . . . . . . . . . . . . . . . . . . . .   49
   5.6  Model Selection . . . . . . . . . . . . . . . . . . . . . . . . .   50
   5.7  Unsupervised Learning . . . . . . . . . . . . . . . . . . . . . .   51
   5.8  Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . .   52
   Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   53

Part III  Supervised Learning

6  Fitness Meta-Modeling . . . . . . . . . . . . . . . . . . . . . . . . .   57
   6.1  Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . .   57
   6.2  Nearest Neighbors . . . . . . . . . . . . . . . . . . . . . . . .   58
   6.3  Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . .   59
   6.4  Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . .   60
   6.5  Experimental Analysis . . . . . . . . . . . . . . . . . . . . . .   61
   6.6  Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . .   64
   References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   64

7  Constraint Meta-Modeling . . . . . . . . . . . . . . . . . . . . . . . .   67
   7.1  Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . .   67
   7.2  Support Vector Machines . . . . . . . . . . . . . . . . . . . . .   68
   7.3  Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . .   71
   7.4  Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . .   72
   7.5  Experimental Analysis . . . . . . . . . . . . . . . . . . . . . .   73
   7.6  Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . .   75
   References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   75

Part IV  Unsupervised Learning

8  Dimensionality Reduction Optimization . . . . . . . . . . . . . . . . .   79
   8.1  Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . .   79
   8.2  Dimensionality Reduction . . . . . . . . . . . . . . . . . . . . .   80
   8.3  Principal Component Analysis . . . . . . . . . . . . . . . . . . .   80
   8.4  Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . .   82
   8.5  Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . .   83
   8.6  Experimental Analysis . . . . . . . . . . . . . . . . . . . . . .   84
   8.7  Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . .   86
   References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   87

9  Solution Space Visualization . . . . . . . . . . . . . . . . . . . . . .   89
   9.1  Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . .   89
   9.2  Isometric Mapping . . . . . . . . . . . . . . . . . . . . . . . .   90
   9.3  Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . .   92
   9.4  Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . .   93
   9.5  Experimental Analysis . . . . . . . . . . . . . . . . . . . . . .   94
   9.6  Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . .   96
   References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   97
10 Clustering-Based Niching . . . . . . . . . . . . . . . . . . . . . . . .   99
   10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . .   99
   10.2 Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . .  100
   10.3 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . .  101
   10.4 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . .  102
   10.5 Experimental Analysis . . . . . . . . . . . . . . . . . . . . . .  103
   10.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . .  106
   References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  106
Part V Ending
11 Summary and Outlook . . . . . . . . . . . . . . . . . . . . . . . . . .  111
   11.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  111
   11.2 Evolutionary Computation for Machine Learning . . . . . . . . . .  113
   11.3 Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  115
   References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  116
Appendix A: Benchmark Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123