
Deep Learning with R

Preface
Artificial Intelligence
Evolution of Expert Systems to Machine Learning
Machine Learning and Deep Learning
Applications and Research in Deep Learning
Intended Audience
Acknowledgements
About This Book
Contents
About the Author
1 Introduction to Machine Learning
1.1 Machine Learning
1.1.1 Difference Between Machine Learning and Statistics
1.1.2 Difference Between Machine Learning and Deep Learning
1.2 Bias and Variance
1.3 Bias–Variance Trade-off in Machine Learning
1.4 Addressing Bias and Variance in the Model
1.5 Underfitting and Overfitting
1.6 Loss Function
1.7 Regularization
1.8 Gradient Descent
1.9 Hyperparameter Tuning
1.9.1 Searching for Hyperparameters
1.10 Maximum Likelihood Estimation
1.11 Quantifying Loss
1.11.1 The Cross-Entropy Loss
1.11.2 Negative Log-Likelihood
1.11.3 Entropy
1.11.4 Cross-Entropy
1.11.5 Kullback–Leibler Divergence
1.11.6 Summarizing the Measurement of Loss
1.12 Conclusion
2 Introduction to Neural Networks
2.1 Introduction
2.2 Types of Neural Network Architectures
2.2.1 Feedforward Neural Networks (FFNNs)
2.2.2 Convolutional Neural Networks (ConvNets)
2.2.3 Recurrent Neural Networks (RNNs)
2.3 Forward Propagation
2.3.1 Notations
2.3.2 Input Matrix
2.3.3 Bias Matrix
2.3.4 Weight Matrix of Layer-1
2.3.5 Activation Function at Layer-1
2.3.6 Weights Matrix of Layer-2
2.3.7 Activation Function at Layer-2
2.3.8 Output Layer
2.3.9 Summary of Forward Propagation
2.4 Activation Functions
2.4.1 Sigmoid
2.4.2 Hyperbolic Tangent
2.4.3 Rectified Linear Unit
2.4.4 Leaky Rectified Linear Unit
2.4.5 Softmax
2.5 Derivatives of Activation Functions
2.5.1 Derivative of Sigmoid
2.5.2 Derivative of tanh
2.5.3 Derivative of Rectified Linear Unit
2.5.4 Derivative of Leaky Rectified Linear Unit
2.5.5 Derivative of Softmax
2.6 Cross-Entropy Loss
2.7 Derivative of the Cost Function
2.7.1 Derivative of Cross-Entropy Loss with Sigmoid
2.7.2 Derivative of Cross-Entropy Loss with Softmax
2.8 Back Propagation
2.8.1 Summary of Backward Propagation
2.9 Writing a Simple Neural Network Application
2.10 Conclusion
3 Deep Neural Networks-I
3.1 Writing a Deep Neural Network (DNN) Algorithm
3.2 Overview of Packages for Deep Learning in R
3.3 Introduction to keras
3.3.1 Installing keras
3.3.2 Pipe Operator in R
3.3.3 Defining a Model
3.3.4 Configuring the Model
3.3.5 Compile and Fit the Model
3.4 Conclusion
4 Initialization of Network Parameters
4.1 Initialization
4.1.1 Breaking Symmetry
4.1.2 Zero Initialization
4.1.3 Random Initialization
4.1.4 Initialization
4.1.5 Initialization
4.2 Dealing with NaNs
4.2.1 Hyperparameters and Weight Initialization
4.2.2 Normalization
4.2.3 Using Different Activation Functions
4.2.4 Use of NanGuardMode, DebugMode, or MonitorMode
4.2.5 Numerical Stability
4.2.6 Algorithm Related
4.2.7 NaN Introduced by AllocEmpty
4.3 Conclusion
5 Optimization
5.1 Introduction
5.2 Gradient Descent
5.2.1 Gradient Descent or Batch Gradient Descent
5.2.2 Stochastic Gradient Descent
5.2.3 Mini-Batch Gradient Descent
5.3 Parameter Updates
5.3.1 Simple Update
5.3.2 Momentum Update
5.3.3 Nesterov Momentum Update
5.3.4 Annealing the Learning Rate
5.3.5 Second-Order Methods
5.3.6 Per-Parameter Adaptive Learning Rate Methods
5.4 Vanishing Gradient
5.5 Regularization
5.5.1 Dropout Regularization
5.5.2 ℓ2 Regularization
5.5.3 Combining Dropout and ℓ2 Regularization?
5.6 Gradient Checking
5.7 Conclusion
6 Deep Neural Networks-II
6.1 Revisiting DNNs
6.2 Modeling Using keras
6.2.1 Adjust Epochs
6.2.2 Add Batch Normalization
6.2.3 Add Dropout
6.2.4 Add Weight Regularization
6.2.5 Adjust Learning Rate
6.2.6 Prediction
6.3 Introduction to TensorFlow
6.3.1 What is Flow?
6.3.2
6.3.3 Installing and Running TensorFlow
6.4 Modeling Using TensorFlow
6.4.1 Importing MNIST Data Set from
6.4.2 Define
6.4.3 Training the Model
6.4.4 Instantiating a Session and Running the Model
6.4.5 Model Evaluation
6.5 Conclusion
7 Convolutional Neural Networks (ConvNets)
7.1 Building Blocks of a Convolution Operation
7.1.1 What is a Convolution Operation?
7.1.2 Edge Detection
7.1.3 Padding
7.1.4 Strided Convolutions
7.1.5 Convolutions over Volume
7.1.6 Pooling
7.2 Single-Layer Convolutional Network
7.2.1 Writing a ConvNet Application
7.3 Training a ConvNet on a Small DataSet Using keras
7.3.1 Data Augmentation
7.4 Specialized Neural Network Architectures
7.4.1 LeNet-5
7.4.2 AlexNet
7.4.3 VGG-16
7.4.4 GoogLeNet
7.4.5 Transfer Learning or Using Pretrained Models
7.4.6 Feature Extraction
7.5 What is the ConvNet Learning? A Visualization of Different Layers
7.6 Introduction to Neural Style Transfer
7.6.1 Content Loss
7.6.2 Style Loss
7.6.3 Generating Art Using Neural Style Transfer
7.7 Conclusion
8 Recurrent Neural Networks (RNN) or Sequence Models
8.1 Sequence Models or RNNs
8.2 Applications of Sequence Models
8.3 Sequence Model Architectures
8.4 Writing the Basic Sequence Model Architecture
8.4.1 Backpropagation in Basic RNN
8.5 Long Short-Term Memory (LSTM) Models
8.5.1 The Problem with Sequence Models
8.5.2 Walking Through LSTM
8.6 Writing the LSTM Architecture
8.7 Text Generation with LSTM
8.7.1 Working with Text Data
8.7.2 Generating Sequence Data
8.7.3 Sampling Strategy and the Importance of Softmax Diversity
8.7.4 Implementing LSTM Text Generation
8.8 Natural Language Processing
8.8.1 Word Embeddings
8.8.2 Transfer Learning and Word Embedding
8.8.3 Analyzing Word Similarity Using Word Vectors
8.8.4 Analyzing Word Analogies Using Word Vectors
8.8.5 Debiasing Word Vectors
8.9 Conclusion
9 Epilogue
9.1 Gathering Experience and Knowledge
9.1.1 Research Papers
9.2 Towards Lifelong Learning
9.2.1 Final Words
References
Abhijit Ghatak

Deep Learning with R
Abhijit Ghatak
Kolkata, India

ISBN 978-981-13-5849-4
ISBN 978-981-13-5850-0 (eBook)
https://doi.org/10.1007/978-981-13-5850-0

Library of Congress Control Number: 2019933713

© Springer Nature Singapore Pte Ltd. 2019

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
I dedicate this book to the deep learning fraternity at large, who are trying their best to get systems to reason over long time horizons.
Preface

Artificial Intelligence

The term 'Artificial Intelligence' (AI) was coined by John McCarthy in 1956, but the journey to understand whether machines can truly think began much before that. Vannevar Bush [1], in his seminal work As We May Think,¹ proposed a system which amplifies people's own knowledge and understanding. Alan Turing was a pioneer in bringing AI from the realm of philosophical prediction to reality. He wrote a paper on the notion of machines being able to simulate human beings and to do intelligent things. He also realized in the 1950s that a greater understanding of human intelligence would be needed before we could hope to build machines that would "think" like humans. His paper titled "Computing Machinery and Intelligence," published in 1950 in a philosophical journal called Mind, opened the doors to the field that would be called AI, much before the term was actually adopted. The paper defined what would come to be known as the Turing test,² a model for measuring "intelligence."

¹ https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/
² https://www.turing.org.uk/scrapbook/test.html

Significant AI breakthroughs have been promised "in the next 10 years" for the past 60 years. One of the proponents of AI, Marvin Minsky, claimed in 1967, "Within a generation …, the problem of creating 'artificial intelligence' will substantially be solved," and in 1970 he quantified his earlier prediction by stating, "In from three to eight years we will have a machine with the general intelligence of a human being." In the 1960s and early 1970s, several other experts believed it to be right around the corner. When it did not happen, funding dried up and research activity declined, resulting in what we term the first AI winter.

During the 1980s, interest in an approach to AI known as expert systems started gathering momentum, and a significant amount of money was being spent on research and development.
By the beginning of the 1990s, due to the limited scope of expert systems, interest waned, and this resulted in the second AI winter. Somehow, it appeared that expectations in AI always outpaced the results.

Evolution of Expert Systems to Machine Learning

An expert system (ES) is a program designed to solve problems in a specific domain, in place of a human expert. By mimicking the thinking of human experts, the expert system was envisaged to analyze information and make decisions. The knowledge base of an ES contains both factual knowledge and heuristic knowledge. The ES inference engine was supposed to provide a methodology for reasoning over the information present in the knowledge base. Its goal was to come up with a recommendation, and to do so, it combined the facts of a specific case (input data) with the knowledge contained in the knowledge base (rules), resulting in a particular recommendation (answers). Though ES was suitable for solving some well-defined logical problems, it proved otherwise for other types of complex problems like image classification and natural language processing (NLP). As a result, ES did not live up to its expectations and gave rise to a shift from the rule-based approach to a data-driven approach. This paved the way to a new era in AI: machine learning.

Research over the past 60 years has resulted in significant advances in search algorithms, machine learning algorithms, and the integration of statistical analysis to understand the world at large. In machine learning, the system is trained rather than explicitly programmed (unlike in an ES). By exposing large quantities of known facts (input data and answers) to a learning mechanism and performing tuning sessions, we get a system that can make predictions or classifications on unseen input data. It does this by finding the statistical structure of the input data (and the answers) and coming up with rules for automating the task. Starting in the 1990s, machine learning quickly became the most popular subfield of AI. This trend has also been driven by the availability of faster computing and of diverse data sets.

A machine learning algorithm transforms its input data into meaningful outputs by learning representations. Representations are transformations of the input data that bring it closer to the expected output. "Learning," in the context of machine learning, is an automatic search process for better representations of the data. Machine learning algorithms find these representations by searching through a predefined set of operations.

To summarize, machine learning searches for useful representations of the input data within a predefined space, using the loss function (the difference between the actual output and the estimated output) as feedback to modify the parameters of the model.
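As a toy illustration of this feedback loop, here is a minimal R sketch (not taken from the book; the data, the single parameter w, and the learning rate are assumptions made purely for illustration) that fits one parameter by repeatedly using the loss to adjust it:

```r
# Minimal sketch of loss-guided parameter search for y ~ w * x
# (illustrative only; the data and settings below are made up)
set.seed(1)
x <- runif(100)
y <- 3 * x + rnorm(100, sd = 0.1)     # known facts: inputs and their answers

w  <- 0                               # initial value of the parameter
lr <- 0.1                             # learning rate

for (step in 1:500) {
  y_hat <- w * x                      # estimated output
  loss  <- mean((y - y_hat)^2)        # difference between actual and estimated output
  grad  <- mean(-2 * x * (y - y_hat)) # how the loss changes as w changes
  w     <- w - lr * grad              # feedback: modify the parameter
}
w                                     # ends up close to 3
```

Deep learning, discussed next, applies the same idea to models with many successive layers of parameters rather than a single one.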
Machine Learning and Deep Learning

It turns out that machine learning focuses on learning only one or two layers of representations of the input data. This proved intractable for solving human perception problems like image classification, text-to-speech translation, handwriting transcription, etc. It therefore gave way to a new take on learning representations, which put an emphasis on learning multiple successive layers of representations, resulting in deep learning. The word deep in deep learning refers only to the number of layers used in a deep learning model.

In deep learning, we deal with layers. A layer is a data transformation function which transforms the data that passes through it. These transformations are parametrized by a set of weights and biases, which determine the transformation behavior at that layer. Deep learning is a specific subfield of machine learning which makes use of tens or hundreds of successive layers of representations. The specification of what a layer does to its input is stored in the layer's parameters. Learning, in deep learning, can thus be defined as finding a set of values for the parameters of each layer of a deep learning model which results in the appropriate mapping of the inputs to the associated answers (outputs). Deep learning has been proven to be better than conventional machine learning algorithms for these "perceptual" tasks, but it has not yet been proven to be better in other domains as well.

Applications and Research in Deep Learning

Deep learning has been gaining traction in many fields, some of which are listed below. Although most of the work to date is proof-of-concept (PoC), some of the results have actually provided new physical insight.

Engineering: Signal processing techniques using traditional machine learning exploit shallow architectures, often containing a single layer of nonlinear feature transformation. Examples of shallow-architecture models are conventional hidden Markov models (HMMs), linear or nonlinear dynamical systems, conditional random fields (CRFs), maximum entropy (MaxEnt) models, support vector machines (SVMs), kernel regression, multilayer perceptrons (MLPs) with a single hidden layer, etc. Signal processing using machine learning also depends heavily on handcrafted features. Deep learning can help in getting task-specific feature representations, in learning how to deal with noise in the signal, and in working with long-term sequential behaviors. Vision and speech signals require deep architectures for extracting complex structures, and deep learning can provide the necessary architecture. Specific signal processing areas where deep