Volume 2
DEEP LEARNING:
From Basics
to Practice
Andrew Glassner
Deep Learning:
From Basics to Practice
Volume 2
Copyright (c) 2018 by Andrew Glassner
www.glassner.com / @AndrewGlassner
All rights reserved. No part of this book, except as noted below, may be reproduced,
stored in a retrieval system, or transmitted in any form or by any means, without
the prior written permission of the author, except in the case of brief quotations
embedded in critical articles or reviews.
The above reservation of rights does not apply to the program files associated with
this book (available on GitHub), or to the images and figures (also available on
GitHub), which are released under the MIT license. Any images or figures that are
not original to the author retain their original copyrights and protections, as noted
in the book and on the web pages where the images are provided.
All software in this book, or in its associated repositories, is provided “as is,” with-
out warranty of any kind, express or implied, including but not limited to the
warranties of merchantability, fitness for a particular purpose, and noninfringement. In no event shall the authors or copyright holders be liable for any claim,
damages or other liability, whether in an action of contract, tort, or otherwise,
arising from, out of or in connection with the software or the use or other dealings
in the software.
First published February 20, 2018
Version 1.0.1, March 3, 2018
Version 1.1, March 22, 2018
Published by The Imaginary Institute, Seattle, WA.
http://www.imaginary-institute.com
Contact: andrew@imaginary-institute.com
For Niko,
who’s always there
with a smile
and a wag.
Contents of Both Volumes
Volume 1
Preface ....................................................................i
Chapter 1: An Introduction ...................................1
1.1 Why This Chapter Is Here ...............................3
1.1.1 Extracting Meaning from Data ............................ 4
1.1.2 Expert Systems ..................................................... 6
1.2 Learning from Labeled Data ..........................9
1.2.1 A Learning Strategy .............................................. 10
1.2.2 A Computerized Learning Strategy ................... 12
1.2.3 Generalization ...................................................... 16
1.2.4 A Closer Look at Learning ................................... 18
1.3 Supervised Learning ........................................21
1.3.1 Classification ......................................................... 21
1.3.2 Regression ............................................................. 22
1.4 Unsupervised Learning ...................................25
1.4.1 Clustering .............................................................. 25
1.4.2 Noise Reduction ................................................... 26
1.4.3 Dimensionality Reduction .................................. 28
1.5 Generators ........................................................32
1.6 Reinforcement Learning .................................34
1.7 Deep Learning ..................................................37
1.8 What’s Coming Next .......................................43
References ..............................................................44
Image credits ................................................................. 45
Chapter 2: Randomness and Basic Statistics .....46
2.1 Why This Chapter Is Here ...............................48
2.2 Random Variables ...........................................49
2.2.1 Random Numbers in Practice............................. 57
2.3 Some Common Distributions ........................59
2.3.1 The Uniform Distribution ................................... 60
2.3.2 The Normal Distribution .................................... 61
2.3.3 The Bernoulli Distribution ................................. 67
2.3.4 The Multinoulli Distribution .............................. 69
2.3.5 Expected Value .................................................... 70
2.4 Dependence ....................................................70
2.4.1 i.i.d. Variables ........................................................ 71
2.5 Sampling and Replacement ...........................71
2.5.1 Selection With Replacement .............................. 73
2.5.2 Selection Without Replacement ....................... 74
2.5.3 Making Selections ............................................... 75
2.6 Bootstrapping .................................................76
2.7 High-Dimensional Spaces ..............................82
2.8 Covariance and Correlation ...........................85
2.8.1 Covariance ............................................................ 86
2.8.2 Correlation ........................................................... 88
2.9 Anscombe’s Quartet .......................................93
References ..............................................................95
Chapter 3: Probability ...........................................97
3.1 Why This Chapter Is Here ...............................99
3.2 Dart Throwing .................................................100
3.3 Simple Probability ..........................................103
3.4 Conditional Probability ..................................104
3.5 Joint Probability ..............................................109
3.6 Marginal Probability .......................................114
3.7 Measuring Correctness ..................................115
3.7.1 Classifying Samples .............................................. 116
3.7.2 The Confusion Matrix ......................................... 119
3.7.3 Interpreting the Confusion Matrix ................... 121
3.7.4 When Misclassification Is Okay ......................... 126
3.7.5 Accuracy ................................................................ 129
3.7.6 Precision ............................................................... 130
3.7.7 Recall ..................................................................... 132
3.7.8 About Precision and Recall ................................ 134
3.7.9 Other Measures ................................................... 137
3.7.10 Using Precision and Recall Together ............... 141
3.7.11 f1 Score ................................................................. 143
3.8 Applying the Confusion Matrix ....................144
References ..............................................................151
Chapter 4: Bayes Rule ...........................................153
4.1 Why This Chapter Is Here ..............................155
4.2 Frequentist and Bayesian Probability .........156
4.2.1 The Frequentist Approach .................................. 156
4.2.2 The Bayesian Approach ...................................... 157
4.2.3 Discussion ............................................................ 158
4.3 Coin Flipping ..................................................159
4.4 Is This a Fair Coin? ..........................................161
4.4.1 Bayes’ Rule ............................................................ 173
4.4.2 Notes on Bayes’ Rule .......................................... 175
4.5 Finding Life Out There ..................................178
4.6 Repeating Bayes’ Rule ....................................183
4.6.1 The Posterior-Prior Loop .................................... 184
4.6.2 Example: Which Coin Do We Have? ................. 186
4.7 Multiple Hypotheses ......................................194
References ..............................................................203
Chapter 5: Curves and Surfaces ...........................205
5.1 Why This Chapter Is Here ...............................207
5.2 Introduction ....................................................207
5.3 The Derivative .................................................210
5.4 The Gradient ...................................................222
References ..............................................................229
Chapter 6: Information Theory ............................231
6.1 Why This Chapter Is Here ..............................233
6.1.1 Information: One Word, Two Meanings ............ 233
6.2 Surprise and Context .....................................234
6.2.1 Surprise ................................................................. 234
6.2.2 Context ................................................................. 236
6.3 The Bit as Unit ................................................237
6.4 Measuring Information .................................238
6.5 The Size of an Event .......................................240
6.6 Adaptive Codes ...............................................241
6.7 Entropy ...........................................................250
6.8 Cross-Entropy .................................................253
6.8.1 Two Adaptive Codes ............................................ 253
6.8.2 Mixing Up the Codes ......................................... 257
6.9 KL Divergence .................................................260
References ..............................................................262
Chapter 7: Classification .......................................265
7.1 Why This Chapter Is Here ...............................267
7.2 2D Classification ..............................................268
7.2.1 2D Binary Classification ....................................... 269
7.3 2D Multi-class Classification ..........................275
7.4 Multiclass Binary Categorizing......................277
7.4.1 One-Versus-Rest ................................................. 278
7.4.2 One-Versus-One ................................................. 280
7.5 Clustering .........................................................286