Preface
What’s in This Book?
Who Is “The Practitioner”?
Who Should Read This Book?
The Enterprise Machine Learning Practitioner
The Enterprise Executive
The Academic
Conventions Used in This Book
Using Code Examples
Administrative Notes
O’Reilly Safari
How to Contact Us
Acknowledgments
Josh
Adam
1. A Review of Machine Learning
The Learning Machines
How Can Machines Learn?
Biological Inspiration
What Is Deep Learning?
Going Down the Rabbit Hole
Framing the Questions
The Math Behind Machine Learning: Linear Algebra
Scalars
Vectors
Matrices
Tensors
Hyperplanes
Relevant Mathematical Operations
Converting Data Into Vectors
Solving Systems of Equations
The Math Behind Machine Learning: Statistics
Probability
Conditional Probabilities
Posterior Probability
Distributions
Samples Versus Population
Resampling Methods
Selection Bias
Likelihood
How Does Machine Learning Work?
Regression
Classification
Clustering
Underfitting and Overfitting
Optimization
Convex Optimization
Gradient Descent
Stochastic Gradient Descent
Quasi-Newton Optimization Methods
Generative Versus Discriminative Models
Logistic Regression
The Logistic Function
Understanding Logistic Regression Output
Evaluating Models
The Confusion Matrix
Building an Understanding of Machine Learning
2. Foundations of Neural Networks and Deep Learning
Neural Networks
The Biological Neuron
The Perceptron
Multilayer Feed-Forward Networks
Training Neural Networks
Backpropagation Learning
Activation Functions
Linear
Sigmoid
Tanh
Hard Tanh
Softmax
Rectified Linear
Loss Functions
Loss Function Notation
Loss Functions for Regression
Loss Functions for Classification
Loss Functions for Reconstruction
Hyperparameters
Learning Rate
Regularization
Momentum
Sparsity
3. Fundamentals of Deep Networks
Defining Deep Learning
What Is Deep Learning?
Organization of This Chapter
Common Architectural Principles of Deep Networks
Parameters
Layers
Activation Functions
Loss Functions
Optimization Algorithms
Hyperparameters
Summary
Building Blocks of Deep Networks
RBMs
Autoencoders
Variational Autoencoders
4. Major Architectures of Deep Networks
Unsupervised Pretrained Networks
Deep Belief Networks
Generative Adversarial Networks
Convolutional Neural Networks (CNNs)
Biological Inspiration
Intuition
CNN Architecture Overview
Input Layers
Convolutional Layers
Pooling Layers
Fully Connected Layers
Other Applications of CNNs
CNNs of Note
Summary
Recurrent Neural Networks
Modeling the Time Dimension
3D Volumetric Input
Why Not Markov Models?
General Recurrent Neural Network Architecture
LSTM Networks
Domain-Specific Applications and Blended Networks
Recursive Neural Networks
Network Architecture
Varieties of Recursive Neural Networks
Applications of Recursive Neural Networks
Summary and Discussion
Will Deep Learning Make Other Algorithms Obsolete?
Different Problems Have Different Best Methods
When Do I Need Deep Learning?
5. Building Deep Networks
Matching Deep Networks to the Right Problem
Columnar Data and Multilayer Perceptrons
Images and Convolutional Neural Networks
Time-series Sequences and Recurrent Neural Networks
Using Hybrid Networks
The DL4J Suite of Tools
Vectorization and DataVec
Runtimes and ND4J
Basic Concepts of the DL4J API
Loading and Saving Models
Getting Input for the Model
Setting Up Model Architecture
Training and Evaluation
Modeling CSV Data with Multilayer Perceptron Networks
Setting Up Input Data
Determining Network Architecture
Training the Model
Evaluating the Model
Modeling Handwritten Images Using CNNs
Java Code Listing for the LeNet CNN
Loading and Vectorizing the Input Images
Network Architecture for LeNet in DL4J
Training the CNN
Modeling Sequence Data by Using Recurrent Neural Networks
Generating Shakespeare via LSTMs
Classifying Sensor Time-series Sequences Using LSTMs
Using Autoencoders for Anomaly Detection
Java Code Listing for Autoencoder Example
Setting Up Input Data
Autoencoder Network Architecture and Training
Evaluating the Model
Using Variational Autoencoders to Reconstruct MNIST Digits
Code Listing to Reconstruct MNIST Digits
Examining the VAE Model
Applications of Deep Learning in Natural Language Processing
Learning Word Embedding Using Word2Vec
Distributed Representations of Sentences with Paragraph Vectors
Using Paragraph Vectors for Document Classification
6. Tuning Deep Networks
Basic Concepts in Tuning Deep Networks
An Intuition for Building Deep Networks
Building the Intuition as a Step-by-Step Process
Matching Input Data and Network Architectures
Summary
Relating Model Goal and Output Layers
Regression Model Output Layer
Classification Model Output Layer
Working with Layer Count, Parameter Count, and Memory
Feed-Forward Multilayer Neural Networks
Controlling Layer and Parameter Counts
Estimating Network Memory Requirements
Weight Initialization Strategies
Using Activation Functions
Summary Table for Activation Functions
Applying Loss Functions
Understanding Learning Rates
Using the Ratio of Updates-to-Parameters
Specific Recommendations for Learning Rates
How Sparsity Affects Learning
Applying Methods of Optimization
SGD Best Practices
Using Parallelization and GPUs for Faster Training
Online Learning and Parallel Iterative Algorithms
Parallelizing SGD in DL4J
GPUs
Controlling Epochs and Mini-Batch Size
Understanding Mini-Batch Size Trade-Offs
How to Use Regularization
Priors as Regularizers
Max-Norm Regularization
Dropout
Other Regularization Topics
Working with Class Imbalance
Methods for Sampling Classes
Weighted Loss Functions
Dealing with Overfitting
Using Network Statistics from the Tuning UI
Detecting Poor Weight Initialization
Detecting Nonshuffled Data
Detecting Issues with Regularization
7. Tuning Specific Deep Network Architectures
Convolutional Neural Networks (CNNs)
Common Convolutional Architectural Patterns
Configuring Convolutional Layers
Configuring Pooling Layers
Transfer Learning
Recurrent Neural Networks
Network Input Data and Input Layers
Output Layers and RnnOutputLayer
Training the Network
Debugging Common Issues with LSTMs
Padding and Masking
Evaluation and Scoring with Masking
Variants of Recurrent Network Architectures
Restricted Boltzmann Machines
Hidden Units and Modeling Available Information
Using Different Units
Using Regularization with RBMs
DBNs
Using Momentum
Using Regularization
Determining Hidden Unit Count
8. Vectorization
Introduction to Vectorization in Machine Learning
Why Do We Need to Vectorize Data?
Strategies for Dealing with Columnar Raw Data Attributes
Feature Engineering and Normalization Techniques
Using DataVec for ETL and Vectorization
Vectorizing Image Data
Image Data Representation in DL4J
Image Data and Vector Normalization with DataVec
Working with Sequential Data in Vectorization
Major Variations of Sequential Data Sources
Vectorizing Sequential Data with DataVec
Working with Text in Vectorization
Bag of Words
TF-IDF
Comparing Word2Vec and VSM
Working with Graphs
9. Using Deep Learning and DL4J on Spark
Introduction to Using DL4J with Spark and Hadoop
Operating Spark from the Command Line
Configuring and Tuning Spark Execution
Running Spark on Mesos
Running Spark on YARN
General Spark Tuning Guide
Tuning DL4J Jobs on Spark
Setting Up a Maven Project Object Model for Spark and DL4J
A pom.xml File Dependency Template
Setting Up a POM File for CDH 5.X
Setting Up a POM File for HDP 2.4
Troubleshooting Spark and Hadoop
Common Issues with ND4J
DL4J Parallel Execution on Spark
A Minimal Spark Training Example
DL4J API Best Practices for Spark
Multilayer Perceptron Spark Example
Setting Up MLP Network Architecture for Spark
Distributed Training and Model Evaluation
Building and Executing a DL4J Spark Job
Generating Shakespeare Text with Spark and Long Short-Term Memory
Setting Up the LSTM Network Architecture
Training, Tracking Progress, and Understanding Results
Modeling MNIST with a Convolutional Neural Network on Spark
Configuring the Spark Job and Loading MNIST Data
Setting Up the LeNet CNN Architecture and Training
A. What Is Artificial Intelligence?
The Story So Far
Defining Deep Learning
Defining Artificial Intelligence
What Is Driving Interest in AI Today?
Winter Is Coming
B. RL4J and Reinforcement Learning
Preliminaries
Markov Decision Process
Terminology
Different Settings
Model-Free
Observation Setting
Single-Player and Adversarial Games
Q-Learning
From Policy to Neural Networks
Policy Iteration
Exploration Versus Exploitation
Bellman Equation
Initial State Sampling
Q-Learning Implementation
Modeling Q(s,a)
Experience Replay
Convolutional Layers and Image Preprocessing
History Processing
Double Q-Learning
Clipping
Scaling Rewards
Prioritized Replay
Graph, Visualization, and Mean-Q
RL4J
Conclusion
C. Numbers Everyone Should Know
D. Neural Networks and Backpropagation: A Mathematical Approach
Introduction
Backpropagation in a Multilayer Perceptron
E. Using the ND4J API
Design and Basic Usage
Understanding NDArrays
ND4J General Syntax
The Basics of Working with NDArrays
Dataset
Creating Input Vectors
Basics of Vector Creation
Using MLLibUtil
Converting from INDArray to MLLib Vector
Converting from MLLib Vector to INDArray
Making Model Predictions with DL4J
Using DL4J and ND4J Together
F. Using DataVec
Loading Data for Machine Learning
Loading CSV Data for Multilayer Perceptrons
Loading Image Data for Convolutional Neural Networks
Loading Sequence Data for Recurrent Neural Networks
Transforming Data: Data Wrangling with DataVec
DataVec Transforms: Key Concepts
DataVec Transform Functionality: An Example
G. Working with DL4J from Source
Verifying Git Is Installed
Cloning Key DL4J GitHub Projects
Downloading Source via Zip File
Using Maven to Build Source Code
H. Setting Up DL4J Projects
Creating a New DL4J Project
Java
Working with Maven
IDEs
Setting Up Other Maven POMs
ND4J and Maven
I. Setting Up GPUs for DL4J Projects
Switching Backends to GPU
Picking a GPU
Training on a Multiple GPU System
CUDA on Different Platforms
Monitoring GPU Performance
NVIDIA System Management Interface
J. Troubleshooting DL4J Installations
Previous Installation
Memory Errors When Installing From Source
Older Versions of Maven
Maven and PATH Variables
Bad JDK Versions
C++ and Other Development Tools
Windows and Include Paths
Monitoring GPUs
Using the JVisualVM
Working with Clojure
OS X and Float Support
Fork-Join Bug in Java 7
Precautions
Other Local Repositories
Check Maven Dependencies
Reinstall Dependencies
If All Else Fails
Different Platforms
OS X
Windows
Linux
Index