
Deep Learning: A Practitioner’s Approach

Preface
What’s in This Book?
Who Is “The Practitioner”?
Who Should Read This Book?
The Enterprise Machine Learning Practitioner
The Enterprise Executive
The Academic
Conventions Used in This Book
Using Code Examples
Administrative Notes
O’Reilly Safari
How to Contact Us
Acknowledgments
Josh
Adam
1. A Review of Machine Learning
The Learning Machines
How Can Machines Learn?
Biological Inspiration
What Is Deep Learning?
Going Down the Rabbit Hole
Framing the Questions
The Math Behind Machine Learning: Linear Algebra
Scalars
Vectors
Matrices
Tensors
Hyperplanes
Relevant Mathematical Operations
Converting Data Into Vectors
Solving Systems of Equations
The Math Behind Machine Learning: Statistics
Probability
Conditional Probabilities
Posterior Probability
Distributions
Samples Versus Population
Resampling Methods
Selection Bias
Likelihood
How Does Machine Learning Work?
Regression
Classification
Clustering
Underfitting and Overfitting
Optimization
Convex Optimization
Gradient Descent
Stochastic Gradient Descent
Quasi-Newton Optimization Methods
Generative Versus Discriminative Models
Logistic Regression
The Logistic Function
Understanding Logistic Regression Output
Evaluating Models
The Confusion Matrix
Building an Understanding of Machine Learning
2. Foundations of Neural Networks and Deep Learning
Neural Networks
The Biological Neuron
The Perceptron
Multilayer Feed-Forward Networks
Training Neural Networks
Backpropagation Learning
Activation Functions
Linear
Sigmoid
Tanh
Hard Tanh
Softmax
Rectified Linear
Loss Functions
Loss Function Notation
Loss Functions for Regression
Loss Functions for Classification
Loss Functions for Reconstruction
Hyperparameters
Learning Rate
Regularization
Momentum
Sparsity
3. Fundamentals of Deep Networks
Defining Deep Learning
What Is Deep Learning?
Organization of This Chapter
Common Architectural Principles of Deep Networks
Parameters
Layers
Activation Functions
Loss Functions
Optimization Algorithms
Hyperparameters
Summary
Building Blocks of Deep Networks
RBMs
Autoencoders
Variational Autoencoders
4. Major Architectures of Deep Networks
Unsupervised Pretrained Networks
Deep Belief Networks
Generative Adversarial Networks
Convolutional Neural Networks (CNNs)
Biological Inspiration
Intuition
CNN Architecture Overview
Input Layers
Convolutional Layers
Pooling Layers
Fully Connected Layers
Other Applications of CNNs
CNNs of Note
Summary
Recurrent Neural Networks
Modeling the Time Dimension
3D Volumetric Input
Why Not Markov Models?
General Recurrent Neural Network Architecture
LSTM Networks
Domain-Specific Applications and Blended Networks
Recursive Neural Networks
Network Architecture
Varieties of Recursive Neural Networks
Applications of Recursive Neural Networks
Summary and Discussion
Will Deep Learning Make Other Algorithms Obsolete?
Different Problems Have Different Best Methods
When Do I Need Deep Learning?
5. Building Deep Networks
Matching Deep Networks to the Right Problem
Columnar Data and Multilayer Perceptrons
Images and Convolutional Neural Networks
Time-series Sequences and Recurrent Neural Networks
Using Hybrid Networks
The DL4J Suite of Tools
Vectorization and DataVec
Runtimes and ND4J
Basic Concepts of the DL4J API
Loading and Saving Models
Getting Input for the Model
Setting Up Model Architecture
Training and Evaluation
Modeling CSV Data with Multilayer Perceptron Networks
Setting Up Input Data
Determining Network Architecture
Training the Model
Evaluating the Model
Modeling Handwritten Images Using CNNs
Java Code Listing for the LeNet CNN
Loading and Vectorizing the Input Images
Network Architecture for LeNet in DL4J
Training the CNN
Modeling Sequence Data by Using Recurrent Neural Networks
Generating Shakespeare via LSTMs
Classifying Sensor Time-series Sequences Using LSTMs
Using Autoencoders for Anomaly Detection
Java Code Listing for Autoencoder Example
Setting Up Input Data
Autoencoder Network Architecture and Training
Evaluating the Model
Using Variational Autoencoders to Reconstruct MNIST Digits
Code Listing to Reconstruct MNIST Digits
Examining the VAE Model
Applications of Deep Learning in Natural Language Processing
Learning Word Embedding Using Word2Vec
Distributed Representations of Sentences with Paragraph Vectors
Using Paragraph Vectors for Document Classification
6. Tuning Deep Networks
Basic Concepts in Tuning Deep Networks
An Intuition for Building Deep Networks
Building the Intuition as a Step-by-Step Process
Matching Input Data and Network Architectures
Summary
Relating Model Goal and Output Layers
Regression Model Output Layer
Classification Model Output Layer
Working with Layer Count, Parameter Count, and Memory
Feed-Forward Multilayer Neural Networks
Controlling Layer and Parameter Counts
Estimating Network Memory Requirements
Weight Initialization Strategies
Using Activation Functions
Summary Table for Activation Functions
Applying Loss Functions
Understanding Learning Rates
Using the Ratio of Updates-to-Parameters
Specific Recommendations for Learning Rates
How Sparsity Affects Learning
Applying Methods of Optimization
SGD Best Practices
Using Parallelization and GPUs for Faster Training
Online Learning and Parallel Iterative Algorithms
Parallelizing SGD in DL4J
GPUs
Controlling Epochs and Mini-Batch Size
Understanding Mini-Batch Size Trade-Offs
How to Use Regularization
Priors as Regularizers
Max-Norm Regularization
Dropout
Other Regularization Topics
Working with Class Imbalance
Methods for Sampling Classes
Weighted Loss Functions
Dealing with Overfitting
Using Network Statistics from the Tuning UI
Detecting Poor Weight Initialization
Detecting Nonshuffled Data
Detecting Issues with Regularization
7. Tuning Specific Deep Network Architectures
Convolutional Neural Networks (CNNs)
Common Convolutional Architectural Patterns
Configuring Convolutional Layers
Configuring Pooling Layers
Transfer Learning
Recurrent Neural Networks
Network Input Data and Input Layers
Output Layers and RnnOutputLayer
Training the Network
Debugging Common Issues with LSTMs
Padding and Masking
Evaluation and Scoring With Masking
Variants of Recurrent Network Architectures
Restricted Boltzmann Machines
Hidden Units and Modeling Available Information
Using Different Units
Using Regularization with RBMs
DBNs
Using Momentum
Using Regularization
Determining Hidden Unit Count
8. Vectorization
Introduction to Vectorization in Machine Learning
Why Do We Need to Vectorize Data?
Strategies for Dealing with Columnar Raw Data Attributes
Feature Engineering and Normalization Techniques
Using DataVec for ETL and Vectorization
Vectorizing Image Data
Image Data Representation in DL4J
Image Data and Vector Normalization with DataVec
Working with Sequential Data in Vectorization
Major Variations of Sequential Data Sources
Vectorizing Sequential Data with DataVec
Working with Text in Vectorization
Bag of Words
TF-IDF
Comparing Word2Vec and VSM
Working with Graphs
9. Using Deep Learning and DL4J on Spark
Introduction to Using DL4J with Spark and Hadoop
Operating Spark from the Command Line
Configuring and Tuning Spark Execution
Running Spark on Mesos
Running Spark on YARN
General Spark Tuning Guide
Tuning DL4J Jobs on Spark
Setting Up a Maven Project Object Model for Spark and DL4J
A pom.xml File Dependency Template
Setting Up a POM File for CDH 5.X
Setting Up a POM File for HDP 2.4
Troubleshooting Spark and Hadoop
Common Issues with ND4J
DL4J Parallel Execution on Spark
A Minimal Spark Training Example
DL4J API Best Practices for Spark
Multilayer Perceptron Spark Example
Setting Up MLP Network Architecture for Spark
Distributed Training and Model Evaluation
Building and Executing a DL4J Spark Job
Generating Shakespeare Text with Spark and Long Short-Term Memory
Setting Up the LSTM Network Architecture
Training, Tracking Progress, and Understanding Results
Modeling MNIST with a Convolutional Neural Network on Spark
Configuring the Spark Job and Loading MNIST Data
Setting Up the LeNet CNN Architecture and Training
A. What Is Artificial Intelligence?
The Story So Far
Defining Deep Learning
Defining Artificial Intelligence
What Is Driving Interest in AI Today?
Winter Is Coming
B. RL4J and Reinforcement Learning
Preliminaries
Markov Decision Process
Terminology
Different Settings
Model-Free
Observation Setting
Single-Player and Adversarial Games
Q-Learning
From Policy to Neural Networks
Policy Iteration
Exploration Versus Exploitation
Bellman Equation
Initial State Sampling
Q-Learning Implementation
Modeling Q(s,a)
Experience Replay
Convolutional Layers and Image Preprocessing
History Processing
Double Q-Learning
Clipping
Scaling Rewards
Prioritized Replay
Graph, Visualization, and Mean-Q
RL4J
Conclusion
C. Numbers Everyone Should Know
D. Neural Networks and Backpropagation: A Mathematical Approach
Introduction
Backpropagation in a Multilayer Perceptron
E. Using the ND4J API
Design and Basic Usage
Understanding NDArrays
ND4J General Syntax
The Basics of Working with NDArrays
Dataset
Creating Input Vectors
Basics of Vector Creation
Using MLLibUtil
Converting from INDArray to MLLib Vector
Converting from MLLib Vector to INDArray
Making Model Predictions with DL4J
Using DL4J and ND4J Together
F. Using DataVec
Loading Data for Machine Learning
Loading CSV Data for Multilayer Perceptrons
Loading Image Data for Convolutional Neural Networks
Loading Sequence Data for Recurrent Neural Networks
Transforming Data: Data Wrangling with DataVec
DataVec Transforms: Key Concepts
DataVec Transform Functionality: An Example
G. Working with DL4J from Source
Verifying Git Is Installed
Cloning Key DL4J GitHub Projects
Downloading Source via Zip File
Using Maven to Build Source Code
H. Setting Up DL4J Projects
Creating a New DL4J Project
Java
Working with Maven
IDEs
Setting Up Other Maven POMs
ND4J and Maven
I. Setting Up GPUs for DL4J Projects
Switching Backends to GPU
Picking a GPU
Training on a Multiple GPU System
CUDA on Different Platforms
Monitoring GPU Performance
NVIDIA System Management Interface
J. Troubleshooting DL4J Installations
Previous Installation
Memory Errors When Installing From Source
Older Versions of Maven
Maven and PATH Variables
Bad JDK Versions
C++ and Other Development Tools
Windows and Include Paths
Monitoring GPUs
Using JVisualVM
Working with Clojure
OS X and Float Support
Fork-Join Bug in Java 7
Precautions
Other Local Repositories
Check Maven Dependencies
Reinstall Dependencies
If All Else Fails
Different Platforms
OS X
Windows
Linux
Index
Deep Learning: A Practitioner’s Approach
Josh Patterson and Adam Gibson
Deep Learning
by Josh Patterson and Adam Gibson

Copyright © 2017 Josh Patterson and Adam Gibson. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editors: Mike Loukides and Tim McGovern
Production Editor: Nicholas Adams
Copyeditor: Bob Russell, Octal Publishing, Inc.
Proofreader: Christina Edwards
Indexer: Judy McConville
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest

August 2017: First Edition

Revision History for the First Edition
2017-07-27: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491914250 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Deep Learning, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-491-91425-0 [M]
For my sons Ethan, Griffin, and Dane: Go forth, be persistent, be bold. —J. Patterson
Preface

What’s in This Book?

The first four chapters of this book are focused on enough theory and fundamentals to give you, the practitioner, a working foundation for the rest of the book. The last five chapters then work from these concepts to lead you through a series of practical paths in deep learning using DL4J:

Building deep networks
Advanced tuning techniques
Vectorization for different data types
Running deep learning workflows on Spark

DL4J AS SHORTHAND FOR DEEPLEARNING4J
We use the names DL4J and Deeplearning4j interchangeably in this book. Both terms refer to the suite of tools in the Deeplearning4j library.

We designed the book in this manner because we felt there was a need for a book covering “enough theory” while being practical enough to build production-class deep learning workflows. We feel that this hybrid approach to the book’s coverage fits this space well.

Chapter 1 is a review of machine learning concepts in general as well as deep learning in particular, to bring any reader up to speed on the basics needed to understand the rest of the book. We added this chapter because many beginners can use a refresher or primer on these concepts and we wanted to make the project accessible to the largest audience possible.

Chapter 2 builds on the concepts from Chapter 1 and gives you the foundations of neural networks. It is largely a chapter on neural network theory, but we aim to present the information in an accessible way.

Chapter 3 further builds on the first two chapters by bringing you up to speed on how deep networks evolved from the fundamentals of neural networks.

Chapter 4 then introduces the four major architectures of deep networks and provides you with the foundation for the rest of the book.

In Chapter 5, we take you through a number of Java code examples using the techniques from the first half of the book.

Chapters 6 and 7 examine the fundamentals of tuning general neural networks and then how to tune specific architectures of deep networks. These chapters are platform-agnostic and will be applicable to the practitioner of any deep learning library.

Chapter 8 is a review of the techniques of vectorization and the basics of how to use DataVec (DL4J’s ETL and vectorization workflow tool).

Chapter 9 concludes the main body of the book with a review of how to use DL4J natively on Spark and Hadoop, and illustrates three real examples that you can run on your own Spark clusters.

The book has many appendixes covering topics that were relevant yet didn’t fit directly into the main chapters. Topics include:

Artificial Intelligence
Using Maven with DL4J projects
Working with GPUs
Using the ND4J API
and more

Who Is “The Practitioner”?

Today, the term “data science” has no clean definition and often is used in many different ways. The world of data science and artificial intelligence (AI) is as broad and hazy as any term in computer science today. This is largely because the world of machine learning has become entangled in nearly all disciplines.

This widespread entanglement has historical parallels to when the World Wide Web (in the 1990s) wove HTML into every discipline and brought many new people into the land of technology. In the same way, all types—engineers, statisticians, analysts, artists—are entering the machine learning fray every day. With this book, our goal is to democratize deep learning (and machine learning) and bring it to the broadest audience possible. If you find the topic interesting and are reading this preface—you are the practitioner, and this book is for you.

Who Should Read This Book?

As opposed to starting out with toy examples and building around those, we chose to start the book with a series of fundamentals to take you on a full journey through deep learning. We feel that too many books leave out core topics that the enterprise practitioner often needs for a quick review. Based on our machine learning experiences in the field, we decided to lead off with the materials that entry-level practitioners often need to brush up on to better support their deep learning projects.

You might want to skip Chapters 1 and 2 and get right to the deep learning fundamentals. However, we expect that you will appreciate having the material up front so that you can have a smooth glide path into the more difficult topics in deep learning that build on these principles. In the following sections, we suggest some reading strategies for different backgrounds.

The Enterprise Machine Learning Practitioner

We split this category into two subgroups:

Practicing data scientist
Java engineer

The practicing data scientist

This group typically builds models already and is fluent in the realm of data science. If this is you, you can probably skip Chapter 1 and you’ll want to lightly skim Chapter 2. We suggest moving on to Chapter 3, because you’ll probably be ready to jump into the fundamentals of deep networks.

The Java engineer

Java engineers are typically tasked with integrating machine learning code with production systems. If this is you, starting with Chapter 1 will be interesting because it will give you a better understanding of the vernacular of data science. Appendix E should also be of keen interest to you, because integration code for model scoring will typically touch ND4J’s API directly.

The Enterprise Executive

Some of our reviewers were executives of large Fortune 500 companies and appreciated the content from the perspective of getting a better grasp on what is happening in deep learning. One executive commented that it had “been a minute” since college, and that Chapter 1 was a nice review of concepts. If you’re an executive, we suggest that you begin with a quick skim of Chapter 1 to reacclimate yourself to some terminology. You might want to skip the chapters that are heavy on APIs and examples, however.

The Academic

If you’re an academic, you likely will want to skip Chapters 1 and 2, because graduate school will have already covered these topics. The chapters on tuning neural networks in general, and then on architecture-specific tuning, will be of keen interest to you, because this information is based on research and transcends any specific deep learning implementation. The coverage of ND4J will also be of interest if you prefer to do high-performance linear algebra on the Java Virtual Machine (JVM).

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic