Deep Learning for NLP and Speech Recognition
Foreword
Preface
Why This Book?
Who Is This Book for?
What Does This Book Cover?
Acknowledgments
Contents
Notation
Part I Machine Learning, NLP, and Speech Introduction
1 Introduction
1.1 Machine Learning
1.1.1 Supervised Learning
1.1.2 Unsupervised Learning
1.1.3 Semi-Supervised Learning and Active Learning
1.1.4 Transfer Learning and Multitask Learning
1.1.5 Reinforcement Learning
1.2 History
1.2.1 Deep Learning: A Brief History
1.2.2 Natural Language Processing: A Brief History
1.2.3 Automatic Speech Recognition: A Brief History
1.3 Tools, Libraries, Datasets, and Resources for the Practitioners
1.3.1 Deep Learning
1.3.2 Natural Language Processing
1.3.3 Speech Recognition
1.3.3.1 Frameworks
1.3.3.2 Audio Processing
1.3.3.3 Additional Tools and Libraries
1.3.4 Books
1.3.5 Online Courses and Resources
1.3.6 Datasets
1.4 Case Studies and Implementation Details
References
2 Basics of Machine Learning
2.1 Introduction
2.2 Supervised Learning: Framework and Formal Definitions
2.2.1 Input Space and Samples
2.2.2 Target Function and Labels
2.2.3 Training and Prediction
2.3 The Learning Process
2.4 Machine Learning Theory
2.4.1 Generalization–Approximation Trade-Off via the Vapnik–Chervonenkis Analysis
2.4.2 Generalization–Approximation Trade-Off via the Bias–Variance Analysis
2.4.3 Model Performance and Evaluation Metrics
2.4.3.1 Classification Evaluation Metrics
2.4.3.2 Regression Evaluation Metrics
2.4.4 Model Validation
2.4.5 Model Estimation and Comparisons
2.4.6 Practical Tips for Machine Learning
2.5 Linear Algorithms
2.5.1 Linear Regression
2.5.1.1 Discussion Points
2.5.2 Perceptron
2.5.2.1 Discussion Points
2.5.3 Regularization
2.5.3.1 Ridge Regularization: L2 Norm
2.5.3.2 Lasso Regularization: L1 Norm
2.5.4 Logistic Regression
2.5.4.1 Gradient Descent
2.5.4.2 Stochastic Gradient Descent
2.5.5 Generative Classifiers
2.5.5.1 Naive Bayes
2.5.5.2 Linear Discriminant Analysis
2.5.6 Practical Tips for Linear Algorithms
2.6 Non-linear Algorithms
2.6.1 Support Vector Machines
2.6.2 Other Non-linear Algorithms
2.7 Feature Transformation, Selection, and Dimensionality Reduction
2.7.1 Feature Transformation
2.7.1.1 Centering or Zero Mean
2.7.1.2 Unit Range
2.7.1.3 Standardization
2.7.1.4 Discretization
2.7.2 Feature Selection and Reduction
2.7.2.1 Principal Component Analysis
2.8 Sequence Data and Modeling
2.8.1 Discrete Time Markov Chains
2.8.2 Generative Approach: Hidden Markov Models
2.8.3 Discriminative Approach: Conditional Random Fields
2.8.3.1 Feature Functions
2.8.3.2 CRF Distribution
2.8.3.3 CRF Training
2.9 Case Study
2.9.1 Software Tools and Libraries
2.9.2 Exploratory Data Analysis (EDA)
2.9.3 Model Training and Hyperparameter Search
2.9.3.1 Feature Transformation and Reduction Impact
2.9.3.2 Hyperparameter Search and Validation
2.9.3.3 Learning Curves
2.9.4 Final Training and Testing Models
2.9.5 Exercises for Readers and Practitioners
References
3 Text and Speech Basics
3.1 Introduction
3.1.1 Computational Linguistics
3.1.2 Natural Language
3.1.3 Model of Language
3.2 Morphological Analysis
3.2.1 Stemming
3.2.2 Lemmatization
3.3 Lexical Representations
3.3.1 Tokens
3.3.2 Stop Words
3.3.3 N-Grams
3.3.4 Documents
3.3.4.1 Document-Term Matrix
3.3.4.2 Bag-of-Words
3.3.4.3 TFIDF
3.4 Syntactic Representations
3.4.1 Part-of-Speech
3.4.1.1 Rules Based
3.4.1.2 Hidden Markov Models
3.4.2 Dependency Parsing
3.4.2.1 Context-Free Grammars
3.4.2.2 Chunking
3.4.2.3 Treebanks
3.5 Semantic Representations
3.5.1 Named Entity Recognition
3.5.2 Relation Extraction
3.5.3 Event Extraction
3.5.4 Semantic Role Labeling
3.6 Discourse Representations
3.6.1 Cohesion
3.6.2 Coherence
3.6.3 Anaphora/Cataphora
3.6.4 Local and Global Coreference
3.7 Language Models
3.7.1 N-Gram Model
3.7.2 Laplace Smoothing
3.7.3 Out-of-Vocabulary
3.7.4 Perplexity
3.8 Text Classification
3.8.1 Machine Learning Approach
3.8.2 Sentiment Analysis
3.8.2.1 Emotional State Model
3.8.2.2 Subjectivity and Objectivity Detection
3.8.3 Entailment
3.9 Text Clustering
3.9.1 Lexical Chains
3.9.2 Topic Modeling
3.9.2.1 LSA
3.9.2.2 LDA
3.10 Machine Translation
3.10.1 Dictionary Based
3.10.2 Statistical Translation
3.11 Question Answering
3.11.1 Information Retrieval Based
3.11.2 Knowledge-Based QA
3.11.3 Automated Reasoning
3.12 Automatic Summarization
3.12.1 Extraction Based
3.12.2 Abstraction Based
3.13 Automated Speech Recognition
3.13.1 Acoustic Model
3.13.1.1 Spectrograms
3.13.1.2 MFCC
3.14 Case Study
3.14.1 Software Tools and Libraries
3.14.2 EDA
3.14.3 Text Clustering
3.14.4 Topic Modeling
3.14.4.1 LSA
3.14.4.2 LDA
3.14.5 Text Classification
3.14.6 Exercises for Readers and Practitioners
References
Part II Deep Learning Basics
4 Basics of Deep Learning
4.1 Introduction
4.2 Perceptron Algorithm Explained
4.2.1 Bias
4.2.2 Linear and Non-linear Separability
4.3 Multilayer Perceptron (Neural Networks)
4.3.1 Training an MLP
4.3.2 Forward Propagation
4.3.3 Error Computation
4.3.4 Backpropagation
4.3.5 Parameter Update
4.3.6 Universal Approximation Theorem
4.4 Deep Learning
4.4.1 Activation Functions
4.4.1.1 Sigmoid
4.4.1.2 Tanh
4.4.1.3 ReLU
4.4.1.4 Other Activation Functions
4.4.1.5 Softmax
4.4.1.6 Hierarchical Softmax
4.4.2 Loss Functions
4.4.2.1 Mean Squared (L2) Error
4.4.2.2 Mean Absolute (L1) Error
4.4.2.3 Negative Log Likelihood
4.4.2.4 Hinge Loss
4.4.2.5 Kullback–Leibler (KL) Loss
4.4.3 Optimization Methods
4.4.3.1 Stochastic Gradient Descent
4.4.3.2 Momentum
4.4.3.3 Adagrad
4.4.3.4 RMS-Prop
4.4.3.5 ADAM
4.5 Model Training
4.5.1 Early Stopping
4.5.2 Vanishing/Exploding Gradients
4.5.3 Full-Batch and Mini-Batch Gradient Descent
4.5.4 Regularization
4.5.4.1 L2 Regularization: Weight Decay
4.5.4.2 L1 Regularization
4.5.4.3 Dropout
4.5.4.4 Multitask Learning
4.5.4.5 Parameter Sharing
4.5.4.6 Batch Normalization
4.5.5 Hyperparameter Selection
4.5.5.1 Manual Tuning
4.5.5.2 Automated Tuning
4.5.6 Data Availability and Quality
4.5.6.1 Data Augmentation
4.5.6.2 Bagging
4.5.6.3 Adversarial Training
4.5.7 Discussion
4.5.7.1 Computation and Memory Constraints
4.6 Unsupervised Deep Learning
4.6.1 Energy-Based Models
4.6.2 Restricted Boltzmann Machines
4.6.3 Deep Belief Networks
4.6.4 Autoencoders
4.6.4.1 Undercomplete Autoencoders
4.6.4.2 Denoising Autoencoders
4.6.4.3 Sparse Autoencoders
4.6.4.4 Variational Autoencoders
4.6.5 Sparse Coding
4.6.6 Generative Adversarial Networks
4.7 Framework Considerations
4.7.1 Layer Abstraction
4.7.2 Computational Graphs
4.7.3 Reverse-Mode Automatic Differentiation
4.7.4 Static Computational Graphs
4.7.5 Dynamic Computational Graphs
4.8 Case Study
4.8.1 Software Tools and Libraries
4.8.2 Exploratory Data Analysis (EDA)
4.8.3 Supervised Learning
4.8.4 Unsupervised Learning
4.8.5 Classifying with Unsupervised Features
4.8.6 Results
4.8.7 Exercises for Readers and Practitioners
References
5 Distributed Representations
5.1 Introduction
5.2 Distributional Semantics
5.2.1 Vector Space Model
5.2.1.1 Curse of Dimensionality
5.2.2 Word Representations
5.2.2.1 Co-occurrence
5.2.2.2 LSA
5.2.3 Neural Language Models
5.2.3.1 Bengio
5.2.3.2 Collobert and Weston
5.2.4 word2vec
5.2.4.1 CBOW
5.2.4.2 Skip-Gram
5.2.4.3 Hierarchical Softmax
5.2.4.4 Negative Sampling
5.2.4.5 Phrase Representations
5.2.4.6 word2vec CBOW: Forward and Backward Propagation
5.2.4.7 word2vec Skip-gram: Forward and Backward Propagation
5.2.5 GloVe
5.2.6 Spectral Word Embeddings
5.2.7 Multilingual Word Embeddings
5.3 Limitations of Word Embeddings
5.3.1 Out of Vocabulary
5.3.2 Antonymy
5.3.3 Polysemy
5.3.3.1 Clustering-Weighted Context Embeddings
5.3.3.2 Sense2vec
5.3.4 Biased Embeddings
5.3.5 Other Limitations
5.4 Beyond Word Embeddings
5.4.1 Subword Embeddings
5.4.2 Word Vector Quantization
5.4.3 Sentence Embeddings
5.4.4 Concept Embeddings
5.4.5 Retrofitting with Semantic Lexicons
5.4.6 Gaussian Embeddings
5.4.6.1 Word2Gauss
5.4.6.2 Bayesian Skip-Gram
5.4.7 Hyperbolic Embeddings
5.5 Applications
5.5.1 Classification
5.5.2 Document Clustering
5.5.3 Language Modeling
5.5.4 Text Anomaly Detection
5.5.5 Contextualized Embeddings
5.6 Case Study
5.6.1 Software Tools and Libraries
5.6.2 Exploratory Data Analysis
5.6.3 Learning Word Embeddings
5.6.3.1 Word2Vec
5.6.3.2 Negative Sampling
5.6.3.3 Training the Model
5.6.3.4 Visualize Embeddings
5.6.3.5 Using the Gensim Package
5.6.3.6 Similarity
5.6.3.7 GloVe Embeddings
5.6.3.8 Co-occurrence Matrix
5.6.3.9 GloVe Training
5.6.3.10 GloVe Vector Similarity
5.6.3.11 Using the GloVe Package
5.6.4 Document Clustering
5.6.4.1 Document Vectors
5.6.5 Word Sense Disambiguation
5.6.5.1 Supervised Disambiguation Annotations
5.6.5.2 Training with word2vec
5.6.6 Exercises for Readers and Practitioners
References
6 Convolutional Neural Networks
6.1 Introduction
6.2 Basic Building Blocks of CNN
6.2.1 Convolution and Correlation in Linear Time-Invariant Systems
6.2.1.1 Linear Time-Invariant Systems
6.2.1.2 The Convolution Operator and Its Properties
6.2.1.3 Cross-Correlation and Its Properties
6.2.2 Local Connectivity or Sparse Interactions
6.2.3 Parameter Sharing
6.2.4 Spatial Arrangement
6.2.5 Detector Using Nonlinearity
6.2.6 Pooling and Subsampling
6.2.6.1 Max Pooling
6.2.6.2 Average Pooling
6.2.6.3 L2-Norm Pooling
6.2.6.4 Stochastic Pooling
6.2.6.5 Spectral Pooling
6.3 Forward and Backpropagation in CNN
6.3.1 Gradient with Respect to the Weights ∂E/∂W
6.3.2 Gradient with Respect to the Inputs ∂E/∂X
6.3.3 Max Pooling Layer
6.4 Text Inputs and CNNs
6.4.1 Word Embeddings and CNN
6.4.2 Character-Based Representation and CNN
6.5 Classic CNN Architectures
6.5.1 LeNet-5
6.5.2 AlexNet
6.5.3 VGG-16
6.6 Modern CNN Architectures
6.6.1 Stacked or Hierarchical CNN
6.6.2 Dilated CNN
6.6.3 Inception Networks
6.6.4 Other CNN Structures
6.7 Applications of CNN in NLP
6.7.1 Text Classification and Categorization
6.7.2 Text Clustering and Topic Mining
6.7.3 Syntactic Parsing
6.7.4 Information Extraction
6.7.5 Machine Translation
6.7.6 Summarizations
6.7.7 Question and Answers
6.8 Fast Algorithms for Convolutions
6.8.1 Convolution Theorem and Fast Fourier Transform
6.8.2 Fast Filtering Algorithm
6.9 Case Study
6.9.1 Software Tools and Libraries
6.9.2 Exploratory Data Analysis
6.9.3 Data Preprocessing and Data Splits
6.9.4 CNN Model Experiments
6.9.5 Understanding and Improving the Models
6.9.6 Exercises for Readers and Practitioners
6.10 Discussion
References
7 Recurrent Neural Networks
7.1 Introduction
7.2 Basic Building Blocks of RNNs
7.2.1 Recurrence and Memory
7.2.2 PyTorch Example
7.3 RNNs and Properties
7.3.1 Forward and Backpropagation in RNNs
7.3.1.1 Output Weights (V)
7.3.1.2 Recurrent Weights (W)
7.3.1.3 Input Weights (U)
7.3.1.4 Aggregate Gradient
7.3.2 Vanishing Gradient Problem and Regularization
7.3.2.1 Long Short-Term Memory
7.3.2.2 Gated Recurrent Unit
7.3.2.3 Gradient Clipping
7.3.2.4 BPTT Sequence Length
7.3.2.5 Recurrent Dropout
7.4 Deep RNN Architectures
7.4.1 Deep RNNs
7.4.2 Residual LSTM
7.4.3 Recurrent Highway Networks
7.4.4 Bidirectional RNNs
7.4.5 SRU and Quasi-RNN
7.4.6 Recursive Neural Networks
7.5 Extensions of Recurrent Networks
7.5.1 Sequence-to-Sequence
7.5.2 Attention
7.5.3 Pointer Networks
7.5.4 Transformer Networks
7.6 Applications of RNNs in NLP
7.6.1 Text Classification
7.6.2 Part-of-Speech Tagging and Named Entity Recognition
7.6.3 Dependency Parsing
7.6.4 Topic Modeling and Summarization
7.6.5 Question Answering
7.6.6 Multi-Modal
7.6.7 Language Models
7.6.7.1 Perplexity
7.6.7.2 Recurrent Variational Autoencoder
7.6.8 Neural Machine Translation
7.6.8.1 BLEU
7.6.9 Prediction/Sampling Output
7.6.9.1 Greedy Search
7.6.9.2 Random Sampling and Temperature Sampling
7.6.9.3 Optimizing Output: Beam Search Decoding
7.7 Case Study
7.7.1 Software Tools and Libraries
7.7.2 Exploratory Data Analysis
7.7.2.1 Sequence Length Filtering
7.7.2.2 Vocabulary Inspection
7.7.3 Model Training
7.7.3.1 RNN Baseline
7.7.3.2 RNN, LSTM, and GRU Comparison
7.7.3.3 RNN, LSTM, and GRU Layer Depth Comparison
7.7.3.4 Bidirectional RNN, LSTM, and GRU Comparison
7.7.3.5 Deep Bidirectional Comparison
7.7.3.6 Transformer Network
7.7.3.7 Comparison of Experiments
7.7.4 Results
7.7.5 Exercises for Readers and Practitioners
7.8 Discussion
7.8.1 Memorization or Generalization
7.8.2 Future of RNNs
References
8 Automatic Speech Recognition
8.1 Introduction
8.2 Acoustic Features
8.2.1 Speech Production
8.2.2 Raw Waveform
8.2.3 MFCC
8.2.3.1 Pre-emphasis
8.2.3.2 Framing
8.2.3.3 Windowing
8.2.3.4 Fast Fourier Transform
8.2.3.5 Mel Filter Bank
8.2.3.6 Discrete Cosine Transform
8.2.3.7 Delta Energy and Delta Spectrum
8.2.4 Other Feature Types
8.2.4.1 Automatically Learned
8.3 Phones
8.4 Statistical Speech Recognition
8.4.1 Acoustic Model: P(X|W)
8.4.1.1 Lexicon Model: P(S|W)
8.4.2 Language Model: P(W)
8.4.3 HMM Decoding
8.5 Error Metrics
8.6 DNN/HMM Hybrid Model
8.7 Case Study
8.7.1 Dataset: Common Voice
8.7.2 Software Tools and Libraries
8.7.3 Sphinx
8.7.3.1 Data Preparation
8.7.3.2 Model Training
8.7.4 Kaldi
8.7.4.1 Data Preparation
8.7.4.2 Model Training
8.7.5 Results
8.7.6 Exercises for Readers and Practitioners
References
Part III Advanced Deep Learning Techniques for Text and Speech
9 Attention and Memory Augmented Networks
9.1 Introduction
9.2 Attention Mechanism
9.2.1 The Need for Attention Mechanism
9.2.2 Soft Attention
9.2.3 Scores-Based Attention
9.2.4 Soft vs. Hard Attention
9.2.5 Local vs. Global Attention
9.2.6 Self-Attention
9.2.7 Key-Value Attention
9.2.8 Multi-Head Self-Attention
9.2.9 Hierarchical Attention
9.2.10 Applications of Attention Mechanism in Text and Speech
9.3 Memory Augmented Networks
9.3.1 Memory Networks
9.3.2 End-to-End Memory Networks
9.3.2.1 Single Layer MemN2N
9.3.2.2 Input and Query
9.3.2.3 Controller and Memory
9.3.2.4 Controller and Output
9.3.2.5 Final Prediction and Learning
9.3.2.6 Multiple Layers
9.3.3 Neural Turing Machines
9.3.3.1 Read Operations
9.3.3.2 Write Operations
9.3.3.3 Addressing Mechanism
9.3.4 Differentiable Neural Computer
9.3.4.1 Input and Outputs
9.3.4.2 Memory Reads and Writes
9.3.4.3 Selective Attention
9.3.5 Dynamic Memory Networks
9.3.5.1 Input Module
9.3.5.2 Question Module
9.3.5.3 Episodic Memory Module
9.3.5.4 Answer Module
9.3.5.5 Training
9.3.6 Neural Stack, Queues, and Deques
9.3.6.1 Neural Stack
9.3.6.2 Recurrent Networks, Controller, and Training
9.3.7 Recurrent Entity Networks
9.3.7.1 Input Encoder
9.3.7.2 Dynamic Memory
9.3.7.3 Output Module and Training
9.3.8 Applications of Memory Augmented Networks in Text and Speech
9.4 Case Study
9.4.1 Attention-Based NMT
9.4.2 Exploratory Data Analysis
9.4.2.1 Software Tools and Libraries
9.4.2.2 Model Training
9.4.2.3 Bahdanau Attention
9.4.2.4 Results
9.4.3 Question and Answering
9.4.3.1 Software Tools and Libraries
9.4.3.2 Exploratory Data Analysis
9.4.3.3 LSTM Baseline
9.4.3.4 End-to-End Memory Network
9.4.4 Dynamic Memory Network
9.4.4.1 Differentiable Neural Computer
9.4.4.2 Recurrent Entity Network
9.4.5 Exercises for Readers and Practitioners
References
10 Transfer Learning: Scenarios, Self-Taught Learning, and Multitask Learning
10.1 Introduction
10.2 Transfer Learning: Definition, Scenarios, and Categorization
10.2.1 Definition
10.2.2 Transfer Learning Scenarios
10.2.3 Transfer Learning Categories
10.3 Self-Taught Learning
10.3.1 Techniques
10.3.1.1 Unsupervised Pre-training and Supervised Fine-Tuning
10.3.2 Theory
10.3.3 Applications in NLP
10.3.4 Applications in Speech
10.4 Multitask Learning
10.4.1 Techniques
10.4.1.1 Multilinear Relationship Network
10.4.1.2 Fully Adaptive Feature Sharing Network
10.4.1.3 Cross-Stitch Networks
10.4.1.4 A Joint Many-Task Network
10.4.1.5 Sluice Networks
10.4.2 Theory
10.4.3 Applications in NLP
10.4.4 Applications in Speech Recognition
10.5 Case Study
10.5.1 Software Tools and Libraries
10.5.2 Exploratory Data Analysis
10.5.3 Multitask Learning Experiments and Analysis
10.5.4 Exercises for Readers and Practitioners
References
11 Transfer Learning: Domain Adaptation
11.1 Introduction
11.1.1 Techniques
11.1.1.1 Stacked Autoencoders
11.1.1.2 Deep Interpolation Between Source and Target
11.1.1.3 Deep Domain Confusion
11.1.1.4 Deep Adaptation Network
11.1.1.5 Domain-Invariant Representation
11.1.1.6 Domain Confusion and Invariant Representation
11.1.1.7 Domain-Adversarial Neural Network
11.1.1.8 Adversarial Discriminative Domain Adaptation
11.1.1.9 Coupled Generative Adversarial Networks
11.1.1.10 Cycle Generative Adversarial Networks
11.1.1.11 Domain Separation Networks
11.1.2 Theory
11.1.2.1 Siamese Networks Based Domain Adaptations
11.1.2.2 Optimal Transport
11.1.3 Applications in NLP
11.1.4 Applications in Speech Recognition
11.2 Zero-Shot, One-Shot, and Few-Shot Learning
11.2.1 Zero-Shot Learning
11.2.1.1 Techniques
11.2.2 One-Shot Learning
11.2.2.1 Techniques
11.2.3 Few-Shot Learning
11.2.3.1 Techniques
11.2.4 Theory
11.2.5 Applications in NLP and Speech Recognition
11.3 Case Study
11.3.1 Software Tools and Libraries
11.3.2 Exploratory Data Analysis
11.3.3 Domain Adaptation Experiments
11.3.3.1 Preprocessing
11.3.3.2 Experiments
11.3.3.3 Results and Analysis
11.3.4 Exercises for Readers and Practitioners
References
12 End-to-End Speech Recognition
12.1 Introduction
12.2 Connectionist Temporal Classification (CTC)
12.2.1 End-to-End Phoneme Recognition
12.2.2 Deep Speech
12.2.2.1 GPU Parallelism
12.2.3 Deep Speech 2
12.2.4 Wav2Letter
12.2.5 Extensions of CTC
12.2.5.1 Gram-CTC
12.2.5.2 RNN Transducer
12.3 Seq-to-Seq
12.3.0.1 Content-Based Attention
12.3.0.2 Location-Aware Attention
12.3.1 Early Seq-to-Seq ASR
12.3.2 Listen, Attend, and Spell (LAS)
12.4 Multitask Learning
12.5 End-to-End Decoding
12.5.1 Language Models for ASR
12.5.1.1 N-gram
12.5.1.2 RNN Language Models
12.5.2 CTC Decoding
12.5.3 Attention Decoding
12.5.3.1 Shallow Fusion
12.5.4 Combined Language Model Training
12.5.4.1 Deep Fusion
12.5.4.2 Cold Fusion
12.5.5 Combined CTC–Attention Decoding
12.5.5.1 Rescoring
12.5.6 One-Pass Decoding
12.6 Speech Embeddings and Unsupervised Speech Recognition
12.6.1 Speech Embeddings
12.6.2 Unspeech
12.6.3 Audio Word2Vec
12.7 Case Study
12.7.1 Software Tools and Libraries
12.7.2 Deep Speech 2
12.7.2.1 Data Preparation
12.7.2.2 Acoustic Model Training
12.7.3 Language Model Training
12.7.4 ESPnet
12.7.4.1 Data Preparation
12.7.4.2 Model Training
12.7.5 Results
12.7.6 Exercises for Readers and Practitioners
References
13 Deep Reinforcement Learning for Text and Speech
13.1 Introduction
13.2 RL Fundamentals
13.2.1 Markov Decision Processes
13.2.2 Value, Q, and Advantage Functions
13.2.3 Bellman Equations
13.2.4 Optimality
13.2.5 Dynamic Programming Methods
13.2.5.1 Policy Evaluation
13.2.5.2 Policy Improvement
13.2.5.3 Value Iteration
13.2.5.4 Bootstrapping
13.2.5.5 Asynchronous DP
13.2.6 Monte Carlo
13.2.6.1 Importance Sampling
13.2.7 Temporal Difference Learning
13.2.7.1 SARSA
13.2.8 Policy Gradient
13.2.9 Q-Learning
13.2.10 Actor-Critic
13.2.10.1 Advantage Actor Critic A2C
13.2.10.2 Asynchronous Advantage Actor Critic A3C
13.3 Deep Reinforcement Learning Algorithms
13.3.1 Why RL for Seq2seq
13.3.2 Deep Policy Gradient
13.3.3 Deep Q-Learning
13.3.3.1 DQN
13.3.3.2 Double DQN
13.3.3.3 Dueling Networks
13.3.4 Deep Advantage Actor-Critic
13.4 DRL for Text
13.4.1 Information Extraction
13.4.1.1 Entity Extraction
13.4.1.2 Relation Extraction
13.4.1.3 Action Extraction
13.4.1.4 Joint Entity/Relation Extraction
13.4.2 Text Classification
13.4.3 Dialogue Systems
13.4.4 Text Summarization
13.4.5 Machine Translation
13.5 DRL for Speech
13.5.1 Automatic Speech Recognition
13.5.2 Speech Enhancement and Noise Suppression
13.6 Case Study
13.6.1 Software Tools and Libraries
13.6.2 Text Summarization
13.6.3 Exploratory Data Analysis
13.6.3.1 Seq2Seq Model
13.6.3.2 Policy Gradient
13.6.3.3 DDQN
13.6.4 Exercises for Readers and Practitioners
References
Future Outlook
End-to-End Architecture Prevalence
Transition to AI-Centric
Specialized Hardware
Transition Away from Supervised Learning
Explainable AI
Model Development and Deployment Process
Democratization of AI
NLP Trends
Speech Trends
Closing Remarks
Index
Uday Kamath · John Liu · James Whitaker
Deep Learning for NLP and Speech Recognition
Uday Kamath
Digital Reasoning Systems Inc.
McLean, VA, USA

John Liu
Intelluron Corporation
Nashville, TN, USA

James Whitaker
Digital Reasoning Systems Inc.
McLean, VA, USA

ISBN 978-3-030-14595-8    ISBN 978-3-030-14596-5 (eBook)
https://doi.org/10.1007/978-3-030-14596-5

© Springer Nature Switzerland AG 2019

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To my parents Krishna and Bharathi, my wife Pratibha, the kids Aaroh and Brandy, my family and friends for their support.
–Uday Kamath

To Catherine, Gabrielle Kaili-May, Eugene and Tina for inspiring me always.
–John Liu

To my mother Nancy for her constant support, my family, and my friends who have blessed my life with love.
–James Whitaker
Foreword

The publication of this book is perfectly timed. Existing books on deep learning either focus on theoretical aspects or are largely manuals for tools. But this book presents an unprecedented analysis and comparison of deep learning techniques for natural language and speech processing, closing the substantial gap between theory and practice. Each chapter discusses the theory underpinning the topics, and an exceptional collection of 13 case studies in different application areas is presented. They include classification via distributed representation, summarization, machine translation, sentiment analysis, transfer learning, multitask NLP, end-to-end speech, and question answering. Each case study includes the implementation and comparison of state-of-the-art techniques, and the accompanying website provides source code and data. This is extraordinarily valuable for practitioners, who can experiment firsthand with the methods and deepen their understanding by applying them to real-world scenarios.

This book offers comprehensive coverage of deep learning, from its foundations to advanced and recent topics, including word embedding, convolutional neural networks, recurrent neural networks, attention mechanisms, memory-augmented networks, multitask learning, domain adaptation, and reinforcement learning. The book is a great resource for practitioners and researchers both in industry and academia, and the discussed case studies and associated material can serve as inspiration for a variety of projects and hands-on assignments in a classroom setting.

Fairfax, VA, USA
February 2019
Carlotta Domeniconi, PhD
Associate Professor at GMU

Natural language and speech processing applications such as virtual assistants and smart speakers play an important and ever-growing role in our lives. At the same time, amid an increasing number of publications, it is becoming harder to identify the most promising approaches. As the Chief Analytics Officer at Digital Reasoning and with a PhD in Big Data Machine Learning, Uday has access to both the practical and research aspects of this rapidly growing field. Having authored Mastering Java Machine Learning, he is uniquely suited to break down both practical and cutting-edge approaches. This book combines both theoretical and practical aspects of machine learning in a rare blend. It consists of an introduction that makes it accessible to people starting in the field, an overview of state-of-the-art methods that should be interesting even to people working in research, and a selection of hands-on examples that ground the material in real-world applications and demonstrate its usefulness to industry practitioners.

London, UK
February 2019
Sebastian Ruder, PhD
Research Scientist at DeepMind

A few years ago, I picked up a few textbooks to study topics related to artificial intelligence, such as natural language processing and computer vision. My memory of reading these textbooks largely consisted of staring helplessly out of the window. Whenever I attempted to implement the described concepts and math, I wouldn't know where to start. This is fairly common in books written for academic purposes; they mockingly leave the actual implementation "as an exercise to the reader." There are a few exceptional books that try to bridge this gap, written by people who know the importance of going beyond the math all the way to a working system. This book is one of those exceptions: with its discussions, case studies, code snippets, and comprehensive references, it delightfully bridges the gap between learning and doing.

I especially like the use of Python and open-source tools out there. It's an opinionated take on implementing machine learning systems; one might ask the following question: "Why not X," where X could be Java, C++, or Matlab? However, I find solace in the fact that it's the most popular opinion, which gives the readers an immense support structure as they implement their own ideas. In the modern Internet-connected world, joining a popular ecosystem is equivalent to having thousands of humans connecting together to help each other, from Stack Overflow posts solving an error message to GitHub repositories implementing high-quality systems. To give you perspective, I've seen the other side, supporting a niche community of enthusiasts in machine learning using the programming language Lua for several years. It was a daily struggle to do new things, even basic things such as making a bar chart, precisely because our community of people was a few orders of magnitude smaller than Python's.

Overall, I hope the reader enjoys a modern, practical take on deep learning systems, leveraging open-source machine learning systems heavily, and being taught a lot of "tricks of the trade" by the incredibly talented authors, one of whom I've known for years and have seen build robust speech recognition systems.

New York, NY, USA
February 2019
Soumith Chintala, PhD
Research Engineer at Facebook AI Research (FAIR)
Preface

Why This Book?

With the widespread adoption of deep learning, natural language processing (NLP), and speech applications in various domains such as finance, healthcare, and government, and across our daily lives, there is a growing need for one comprehensive resource that maps deep learning techniques to NLP and speech and provides insights into using the tools and libraries for real-world applications. Many books focus on deep learning theory or deep learning for NLP-specific tasks, while others are cookbooks for tools and libraries. But the constant flux of new algorithms, tools, frameworks, and libraries in a rapidly evolving landscape means that there are few available texts that contain explanations of recent deep learning methods and state-of-the-art approaches applicable to NLP and speech, as well as real-world case studies with code to provide hands-on experience. As an example, you would find it difficult to find a single source that explains the impact of neural attention techniques applied to a real-world NLP task such as machine translation across a range of approaches, from the basic to the state-of-the-art. Likewise, it would be difficult to find a source that includes accompanying code based on well-known libraries with comparisons and analysis across these techniques. This book provides the following, all in one place:

• A comprehensive resource that builds up from elementary deep learning, text, and speech principles to advanced state-of-the-art neural architectures
• A ready reference for deep learning techniques applicable to common NLP and speech recognition applications
• A useful resource on successful architectures and algorithms with essential mathematical insights explained in detail
• An in-depth reference and comparison of the latest end-to-end neural speech processing approaches