Deep Learning A Practitioners Approach.pdf

Published: 2022-05-31 | Uploaded by: admin | Category: Manuals | File size: 16.02M | Format: pdf

Preface

What’s in This Book?

Who Is “The Practitioner”?

Who Should Read This Book?

The Enterprise Machine Learning Practitioner

The Enterprise Executive

The Academic

Conventions Used in This Book

Using Code Examples

Administrative Notes

O’Reilly Safari

How to Contact Us

Acknowledgments

Josh

Adam

1. A Review of Machine Learning

The Learning Machines

How Can Machines Learn?

Biological Inspiration

What Is Deep Learning?

Going Down the Rabbit Hole

Framing the Questions

The Math Behind Machine Learning: Linear Algebra

Scalars

Vectors

Matrices

Tensors

Hyperplanes

Relevant Mathematical Operations

Converting Data Into Vectors

Solving Systems of Equations

The Math Behind Machine Learning: Statistics

Probability

Conditional Probabilities

Posterior Probability

Distributions

Samples Versus Population

Resampling Methods

Selection Bias

Likelihood

How Does Machine Learning Work?

Regression

Classification

Clustering

Underfitting and Overfitting

Optimization

Convex Optimization

Gradient Descent

Stochastic Gradient Descent

Quasi-Newton Optimization Methods

Generative Versus Discriminative Models

Logistic Regression

The Logistic Function

Understanding Logistic Regression Output

Evaluating Models

The Confusion Matrix

Building an Understanding of Machine Learning

2. Foundations of Neural Networks and Deep Learning

Neural Networks

The Biological Neuron

The Perceptron

Multilayer Feed-Forward Networks

Training Neural Networks

Backpropagation Learning

Activation Functions

Linear

Sigmoid

Tanh

Hard Tanh

Softmax

Rectified Linear

Loss Functions

Loss Function Notation

Loss Functions for Regression

Loss Functions for Classification

Loss Functions for Reconstruction

Hyperparameters

Learning Rate

Regularization

Momentum

Sparsity

3. Fundamentals of Deep Networks

Defining Deep Learning

What Is Deep Learning?

Organization of This Chapter

Common Architectural Principles of Deep Networks

Parameters

Layers

Activation Functions

Loss Functions

Optimization Algorithms

Hyperparameters

Summary

Building Blocks of Deep Networks

RBMs

Autoencoders

Variational Autoencoders

4. Major Architectures of Deep Networks

Unsupervised Pretrained Networks

Deep Belief Networks

Generative Adversarial Networks

Convolutional Neural Networks (CNNs)

Biological Inspiration

Intuition

CNN Architecture Overview

Input Layers

Convolutional Layers

Pooling Layers

Fully Connected Layers

Other Applications of CNNs

CNNs of Note

Summary

Recurrent Neural Networks

Modeling the Time Dimension

3D Volumetric Input

Why Not Markov Models?

General Recurrent Neural Network Architecture

LSTM Networks

Domain-Specific Applications and Blended Networks

Recursive Neural Networks

Network Architecture

Varieties of Recursive Neural Networks

Applications of Recursive Neural Networks

Summary and Discussion

Will Deep Learning Make Other Algorithms Obsolete?

Different Problems Have Different Best Methods

When Do I Need Deep Learning?

5. Building Deep Networks

Matching Deep Networks to the Right Problem

Columnar Data and Multilayer Perceptrons

Images and Convolutional Neural Networks

Time-series Sequences and Recurrent Neural Networks

Using Hybrid Networks

The DL4J Suite of Tools

Vectorization and DataVec

Runtimes and ND4J

Basic Concepts of the DL4J API

Loading and Saving Models

Getting Input for the Model

Setting Up Model Architecture

Training and Evaluation

Modeling CSV Data with Multilayer Perceptron Networks

Setting Up Input Data

Determining Network Architecture

Training the Model

Evaluating the Model

Modeling Handwritten Images Using CNNs

Java Code Listing for the LeNet CNN

Loading and Vectorizing the Input Images

Network Architecture for LeNet in DL4J

Training the CNN

Modeling Sequence Data by Using Recurrent Neural Networks

Generating Shakespeare via LSTMs

Classifying Sensor Time-series Sequences Using LSTMs

Using Autoencoders for Anomaly Detection

Java Code Listing for Autoencoder Example

Setting Up Input Data

Autoencoder Network Architecture and Training

Evaluating the Model

Using Variational Autoencoders to Reconstruct MNIST Digits

Code Listing to Reconstruct MNIST Digits

Examining the VAE Model

Applications of Deep Learning in Natural Language Processing

Learning Word Embedding Using Word2Vec

Distributed Representations of Sentences with Paragraph Vectors

Using Paragraph Vectors for Document Classification

6. Tuning Deep Networks

Basic Concepts in Tuning Deep Networks

An Intuition for Building Deep Networks

Building the Intuition as a Step-by-Step Process

Matching Input Data and Network Architectures

Summary

Relating Model Goal and Output Layers

Regression Model Output Layer

Classification Model Output Layer

Working with Layer Count, Parameter Count, and Memory

Feed-Forward Multilayer Neural Networks

Controlling Layer and Parameter Counts

Estimating Network Memory Requirements

Weight Initialization Strategies

Using Activation Functions

Summary Table for Activation Functions

Applying Loss Functions

Understanding Learning Rates

Using the Ratio of Updates-to-Parameters

Specific Recommendations for Learning Rates

How Sparsity Affects Learning

Applying Methods of Optimization

SGD Best Practices

Using Parallelization and GPUs for Faster Training

Online Learning and Parallel Iterative Algorithms

Parallelizing SGD in DL4J

GPUs

Controlling Epochs and Mini-Batch Size

Understanding Mini-Batch Size Trade-Offs

How to Use Regularization

Priors as Regularizers

Max-Norm Regularization

Dropout

Other Regularization Topics

Working with Class Imbalance

Methods for Sampling Classes

Weighted Loss Functions

Dealing with Overfitting

Using Network Statistics from the Tuning UI

Detecting Poor Weight Initialization

Detecting Nonshuffled Data

Detecting Issues with Regularization

7. Tuning Specific Deep Network Architectures

Convolutional Neural Networks (CNNs)

Common Convolutional Architectural Patterns

Configuring Convolutional Layers

Configuring Pooling Layers

Transfer Learning

Recurrent Neural Networks

Network Input Data and Input Layers

Output Layers and RnnOutputLayer

Training the Network

Debugging Common Issues with LSTMs

Padding and Masking

Evaluation and Scoring With Masking

Variants of Recurrent Network Architectures

Restricted Boltzmann Machines

Hidden Units and Modeling Available Information

Using Different Units

Using Regularization with RBMs

DBNs

Using Momentum

Using Regularization

Determining Hidden Unit Count

8. Vectorization

Introduction to Vectorization in Machine Learning

Why Do We Need to Vectorize Data?

Strategies for Dealing with Columnar Raw Data Attributes

Feature Engineering and Normalization Techniques

Using DataVec for ETL and Vectorization

Vectorizing Image Data

Image Data Representation in DL4J

Image Data and Vector Normalization with DataVec

Working with Sequential Data in Vectorization

Major Variations of Sequential Data Sources

Vectorizing Sequential Data with DataVec

Working with Text in Vectorization

Bag of Words

TF-IDF

Comparing Word2Vec and VSM

Working with Graphs

9. Using Deep Learning and DL4J on Spark

Introduction to Using DL4J with Spark and Hadoop

Operating Spark from the Command Line

Configuring and Tuning Spark Execution

Running Spark on Mesos

Running Spark on YARN

General Spark Tuning Guide

Tuning DL4J Jobs on Spark

Setting Up a Maven Project Object Model for Spark and DL4J

A pom.xml File Dependency Template

Setting Up a POM File for CDH 5.X

Setting Up a POM File for HDP 2.4

Troubleshooting Spark and Hadoop

Common Issues with ND4J

DL4J Parallel Execution on Spark

A Minimal Spark Training Example

DL4J API Best Practices for Spark

Multilayer Perceptron Spark Example

Setting Up MLP Network Architecture for Spark

Distributed Training and Model Evaluation

Building and Executing a DL4J Spark Job

Generating Shakespeare Text with Spark and Long Short-Term Memory

Setting Up the LSTM Network Architecture

Training, Tracking Progress, and Understanding Results

Modeling MNIST with a Convolutional Neural Network on Spark

Configuring the Spark Job and Loading MNIST Data

Setting Up the LeNet CNN Architecture and Training

A. What Is Artificial Intelligence?

The Story So Far

Defining Deep Learning

Defining Artificial Intelligence

What Is Driving Interest in AI Today?

Winter Is Coming

B. RL4J and Reinforcement Learning

Preliminaries

Markov Decision Process

Terminology

Different Settings

Model-Free

Observation Setting

Single-Player and Adversarial Games

Q-Learning

From Policy to Neural Networks

Policy Iteration

Exploration Versus Exploitation

Bellman Equation

Initial State Sampling

Q-Learning Implementation

Modeling Q(s,a)

Experience Replay

Convolutional Layers and Image Preprocessing

History Processing

Double Q-Learning

Clipping

Scaling Rewards

Prioritized Replay

Graph, Visualization, and Mean-Q

RL4J

Conclusion

C. Numbers Everyone Should Know

D. Neural Networks and Backpropagation: A Mathematical Approach

Introduction

Backpropagation in a Multilayer Perceptron

E. Using the ND4J API

Design and Basic Usage

Understanding NDArrays

ND4J General Syntax

The Basics of Working with NDArrays

Dataset

Creating Input Vectors

Basics of Vector Creation

Using MLLibUtil

Converting from INDArray to MLLib Vector

Converting from MLLib Vector to INDArray

Making Model Predictions with DL4J

Using DL4J and ND4J Together

F. Using DataVec

Loading Data for Machine Learning

Loading CSV Data for Multilayer Perceptrons

Loading Image Data for Convolutional Neural Networks

Loading Sequence Data for Recurrent Neural Networks

Transforming Data: Data Wrangling with DataVec

DataVec Transforms: Key Concepts

DataVec Transform Functionality: An Example

G. Working with DL4J from Source

Verifying Git Is Installed

Cloning Key DL4J GitHub Projects

Downloading Source via Zip File

Using Maven to Build Source Code

H. Setting Up DL4J Projects

Creating a New DL4J Project

Java

Working with Maven

IDEs

Setting Up Other Maven POMs

ND4J and Maven

I. Setting Up GPUs for DL4J Projects

Switching Backends to GPU

Picking a GPU

Training on a Multiple GPU System

CUDA on Different Platforms

Monitoring GPU Performance

NVIDIA System Management Interface

J. Troubleshooting DL4J Installations

Previous Installation

Memory Errors When Installing From Source

Older Versions of Maven

Maven and PATH Variables

Bad JDK Versions

C++ and Other Development Tools

Windows and Include Paths

Monitoring GPUs

Using the JVisualVM

Working with Clojure

OS X and Float Support

Fork-Join Bug in Java 7

Precautions

Other Local Repositories

Check Maven Dependencies

Reinstall Dependencies

If All Else Fails

Different Platforms

OS X

Windows

Linux

Index

Deep Learning: A Practitioner’s Approach

Josh Patterson and Adam Gibson

Deep Learning
by Josh Patterson and Adam Gibson

Copyright © 2017 Josh Patterson and Adam Gibson. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editors: Mike Loukides and Tim McGovern
Production Editor: Nicholas Adams
Copyeditor: Bob Russell, Octal Publishing, Inc.
Proofreader: Christina Edwards
Indexer: Judy McConville
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest

August 2017: First Edition

Revision History for the First Edition
2017-07-27: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491914250 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Deep Learning, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-91425-0 [M]

For my sons Ethan, Griffin, and Dane: Go forth, be persistent, be bold. —J. Patterson

Preface

What’s in This Book?

The first four chapters of this book are focused on enough theory and fundamentals to give you, the practitioner, a working foundation for the rest of the book. The last five chapters then work from these concepts to lead you through a series of practical paths in deep learning using DL4J:

Building deep networks
Advanced tuning techniques
Vectorization for different data types
Running deep learning workflows on Spark

DL4J AS SHORTHAND FOR DEEPLEARNING4J

We use the names DL4J and Deeplearning4j interchangeably in this book. Both terms refer to the suite of tools in the Deeplearning4j library.

We designed the book in this manner because we felt there was a need for a book covering “enough theory” while being practical enough to build production-class deep learning workflows. We feel that this hybrid approach to the book’s coverage fits this space well.

Chapter 1 is a review of machine learning concepts in general as well as deep learning in particular, to bring any reader up to speed on the basics needed to understand the rest of the book. We added this chapter because many beginners can use a refresher or primer on these concepts and we wanted to make the project accessible to the largest audience possible.

Chapter 2 builds on the concepts from Chapter 1 and gives you the foundations of neural networks. It is largely a chapter in neural network theory but we aim to present the information in an accessible way. Chapter 3 further builds on the first two chapters by bringing you up to speed on how deep networks evolved from the fundamentals of neural networks. Chapter 4 then introduces the four major architectures of deep networks and provides you with the foundation for the rest of the book.

In Chapter 5, we take you through a number of Java code examples using the techniques from the first half of the book.
Chapters 6 and 7 examine the fundamentals of tuning general neural networks and then how to tune specific architectures of deep networks. These chapters are platform-agnostic and will be applicable to the practitioner of any deep learning library. Chapter 8 is a review of the techniques of vectorization and the basics of how to use DataVec (DL4J’s ETL and vectorization workflow tool). Chapter 9 concludes the main body of the book with a review of how to use DL4J natively on Spark and Hadoop and illustrates three real examples that you can run on your own Spark clusters.

The book has many appendix chapters for topics that were relevant yet didn’t fit directly in the main chapters. Topics include:

Artificial Intelligence
Using Maven with DL4J projects
Working with GPUs
Using the ND4J API
and more

Who Is “The Practitioner”?

Today, the term “data science” has no clean definition and is often used in many different ways. The world of data science and artificial intelligence (AI) is as broad and hazy as any term in computer science today. This is largely because the world of machine learning has become entangled in nearly all disciplines.

This widespread entanglement has a historical parallel in the 1990s, when the World Wide Web wove HTML into every discipline and brought many new people into the land of technology. In the same way, all types (engineers, statisticians, analysts, artists) are entering the machine learning fray every day. With this book, our goal is to democratize deep learning (and machine learning) and bring it to the broadest audience possible.

If you find the topic interesting and are reading this preface, you are the practitioner, and this book is for you.

Who Should Read This Book?

As opposed to starting out with toy examples and building around those, we chose to start the book with a series of fundamentals to take you on a full journey through deep learning.

We feel that too many books leave out core topics that the enterprise practitioner often needs for a quick review. Based on our machine learning experiences in the field, we decided to lead off with the materials that entry-level practitioners often need to brush up on to better support their deep learning projects.

You might want to skip Chapters 1 and 2 and get right to the deep learning fundamentals. However, we expect that you will appreciate having the material up front so that you can have a smooth glide path into the more difficult topics in deep learning that build on these principles. In the following sections, we suggest some reading strategies for different backgrounds.

The Enterprise Machine Learning Practitioner

We split this category into two subgroups:

Practicing data scientist
Java engineer

The practicing data scientist

This group typically builds models already and is fluent in the realm of data science. If this is you, you can probably skip Chapter 1 and you’ll want to lightly skim Chapter 2. We suggest moving on to Chapter 3 because you’ll probably be ready to jump into the fundamentals of deep networks.

The Java engineer

Java engineers are typically tasked with integrating machine learning code with production systems. If this is you, starting with Chapter 1 will be interesting for you because it will give you a better understanding of the vernacular of data science. Appendix E should also be of keen interest to you because integration code for model scoring will typically touch ND4J’s API directly.

The Enterprise Executive

Some of our reviewers were executives of large Fortune 500 companies and appreciated the content from the perspective of getting a better grasp on what is happening in deep learning. One executive commented that it had “been a minute” since college, and Chapter 1 was a nice review of concepts. If you’re an executive, we suggest that you begin with a quick skim of Chapter 1 to reacclimate yourself to some terminology. You might want to skip the chapters that are heavy on APIs and examples, however.

The Academic

If you’re an academic, you likely will want to skip Chapters 1 and 2 because graduate school will have already covered these topics. The chapters on tuning neural networks in general and then architecture-specific tuning will be of keen interest to you because this information is based on research and transcends any specific deep learning implementation. The coverage of ND4J will also be of interest to you if you prefer to do high-performance linear algebra on the Java Virtual Machine (JVM).

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
