Preface
Artificial Intelligence
Evolution from Expert Systems to Machine Learning
Machine Learning and Deep Learning
Applications and Research in Deep Learning
Intended Audience
Acknowledgements
About This Book
Contents
About the Author
1 Introduction to Machine Learning
1.1 Machine Learning
1.1.1 Difference Between Machine Learning and Statistics
1.1.2 Difference Between Machine Learning and Deep Learning
1.2 Bias and Variance
1.3 Bias–Variance Trade-off in Machine Learning
1.4 Addressing Bias and Variance in the Model
1.5 Underfitting and Overfitting
1.6 Loss Function
1.7 Regularization
1.8 Gradient Descent
1.9 Hyperparameter Tuning
1.9.1 Searching for Hyperparameters
1.10 Maximum Likelihood Estimation
1.11 Quantifying Loss
1.11.1 The Cross-Entropy Loss
1.11.2 Negative Log-Likelihood
1.11.3 Entropy
1.11.4 Cross-Entropy
1.11.5 Kullback–Leibler Divergence
1.11.6 Summarizing the Measurement of Loss
1.12 Conclusion
2 Introduction to Neural Networks
2.1 Introduction
2.2 Types of Neural Network Architectures
2.2.1 Feedforward Neural Networks (FFNNs)
2.2.2 Convolutional Neural Networks (ConvNets)
2.2.3 Recurrent Neural Networks (RNNs)
2.3 Forward Propagation
2.3.1 Notations
2.3.2 Input Matrix
2.3.3 Bias Matrix
2.3.4 Weight Matrix of Layer-1
2.3.5 Activation Function at Layer-1
2.3.6 Weight Matrix of Layer-2
2.3.7 Activation Function at Layer-2
2.3.8 Output Layer
2.3.9 Summary of Forward Propagation
2.4 Activation Functions
2.4.1 Sigmoid
2.4.2 Hyperbolic Tangent
2.4.3 Rectified Linear Unit
2.4.4 Leaky Rectified Linear Unit
2.4.5 Softmax
2.5 Derivatives of Activation Functions
2.5.1 Derivative of Sigmoid
2.5.2 Derivative of tanh
2.5.3 Derivative of Rectified Linear Unit
2.5.4 Derivative of Leaky Rectified Linear Unit
2.5.5 Derivative of Softmax
2.6 Cross-Entropy Loss
2.7 Derivative of the Cost Function
2.7.1 Derivative of Cross-Entropy Loss with Sigmoid
2.7.2 Derivative of Cross-Entropy Loss with Softmax
2.8 Back Propagation
2.8.1 Summary of Backward Propagation
2.9 Writing a Simple Neural Network Application
2.10 Conclusion
3 Deep Neural Networks-I
3.1 Writing a Deep Neural Network (DNN) Algorithm
3.2 Overview of Packages for Deep Learning in R
3.3 Introduction to keras
3.3.1 Installing keras
3.3.2 Pipe Operator in R
3.3.3 Defining a Model
3.3.4 Configuring the Model
3.3.5 Compile and Fit the Model
3.4 Conclusion
4 Initialization of Network Parameters
4.1 Initialization
4.1.1 Breaking Symmetry
4.1.2 Zero Initialization
4.1.3 Random Initialization
4.1.4 Xavier Initialization
4.1.5 He Initialization
4.2 Dealing with NaNs
4.2.1 Hyperparameters and Weight Initialization
4.2.2 Normalization
4.2.3 Using Different Activation Functions
4.2.4 Use of NanGuardMode, DebugMode, or MonitorMode
4.2.5 Numerical Stability
4.2.6 Algorithm Related
4.2.7 NaN Introduced by AllocEmpty
4.3 Conclusion
5 Optimization
5.1 Introduction
5.2 Gradient Descent
5.2.1 Gradient Descent or Batch Gradient Descent
5.2.2 Stochastic Gradient Descent
5.2.3 Mini-Batch Gradient Descent
5.3 Parameter Updates
5.3.1 Simple Update
5.3.2 Momentum Update
5.3.3 Nesterov Momentum Update
5.3.4 Annealing the Learning Rate
5.3.5 Second-Order Methods
5.3.6 Per-Parameter Adaptive Learning Rate Methods
5.4 Vanishing Gradient
5.5 Regularization
5.5.1 Dropout Regularization
5.5.2 ℓ2 Regularization
5.5.3 Combining Dropout and ℓ2 Regularization?
5.6 Gradient Checking
5.7 Conclusion
6 Deep Neural Networks-II
6.1 Revisiting DNNs
6.2 Modeling Using keras
6.2.1 Adjust Epochs
6.2.2 Add Batch Normalization
6.2.3 Add Dropout
6.2.4 Add Weight Regularization
6.2.5 Adjust Learning Rate
6.2.6 Prediction
6.3 Introduction to tensorflow
6.3.1 What is Tensor Flow?
6.3.2
6.3.3 Installing and Running tensorflow
6.4 Modeling Using tensorflow
6.4.1 Importing MNIST Data Set from tensorflow
6.4.2 Define
6.4.3 Training the Model
6.4.4 Instantiating a Session and Running the Model
6.4.5 Model Evaluation
6.5 Conclusion
7 Convolutional Neural Networks (ConvNets)
7.1 Building Blocks of a Convolution Operation
7.1.1 What is a Convolution Operation?
7.1.2 Edge Detection
7.1.3 Padding
7.1.4 Strided Convolutions
7.1.5 Convolutions over Volume
7.1.6 Pooling
7.2 Single-Layer Convolutional Network
7.2.1 Writing a ConvNet Application
7.3 Training a ConvNet on a Small Data Set Using keras
7.3.1 Data Augmentation
7.4 Specialized Neural Network Architectures
7.4.1 LeNet-5
7.4.2 AlexNet
7.4.3 VGG-16
7.4.4 GoogLeNet
7.4.5 Transfer Learning or Using Pretrained Models
7.4.6 Feature Extraction
7.5 What is the ConvNet Learning? A Visualization of Different Layers
7.6 Introduction to Neural Style Transfer
7.6.1 Content Loss
7.6.2 Style Loss
7.6.3 Generating Art Using Neural Style Transfer
7.7 Conclusion
8 Recurrent Neural Networks (RNN) or Sequence Models
8.1 Sequence Models or RNNs
8.2 Applications of Sequence Models
8.3 Sequence Model Architectures
8.4 Writing the Basic Sequence Model Architecture
8.4.1 Backpropagation in Basic RNN
8.5 Long Short-Term Memory (LSTM) Models
8.5.1 The Problem with Sequence Models
8.5.2 Walking Through LSTM
8.6 Writing the LSTM Architecture
8.7 Text Generation with LSTM
8.7.1 Working with Text Data
8.7.2 Generating Sequence Data
8.7.3 Sampling Strategy and the Importance of Softmax Diversity
8.7.4 Implementing LSTM Text Generation
8.8 Natural Language Processing
8.8.1 Word Embeddings
8.8.2 Transfer Learning and Word Embedding
8.8.3 Analyzing Word Similarity Using Word Vectors
8.8.4 Analyzing Word Analogies Using Word Vectors
8.8.5 Debiasing Word Vectors
8.9 Conclusion
9 Epilogue
9.1 Gathering Experience and Knowledge
9.1.1 Research Papers
9.2 Towards Lifelong Learning
9.2.1 Final Words
References