Supervised Sequence Labelling with Recurrent Neural Networks

Alex Graves
Contents

List of Tables
List of Figures
List of Algorithms

1 Introduction
  1.1 Structure of the Book

2 Supervised Sequence Labelling
  2.1 Supervised Learning
  2.2 Pattern Classification
    2.2.1 Probabilistic Classification
    2.2.2 Training Probabilistic Classifiers
    2.2.3 Generative and Discriminative Methods
  2.3 Sequence Labelling
    2.3.1 Sequence Classification
    2.3.2 Segment Classification
    2.3.3 Temporal Classification

3 Neural Networks
  3.1 Multilayer Perceptrons
    3.1.1 Forward Pass
    3.1.2 Output Layers
    3.1.3 Loss Functions
    3.1.4 Backward Pass
  3.2 Recurrent Neural Networks
    3.2.1 Forward Pass
    3.2.2 Backward Pass
    3.2.3 Unfolding
    3.2.4 Bidirectional Networks
    3.2.5 Sequential Jacobian
  3.3 Network Training
    3.3.1 Gradient Descent Algorithms
    3.3.2 Generalisation
    3.3.3 Input Representation
    3.3.4 Weight Initialisation

4 Long Short-Term Memory
  4.1 Network Architecture
  4.2 Influence of Preprocessing
  4.3 Gradient Calculation
  4.4 Architectural Variants
  4.5 Bidirectional Long Short-Term Memory
  4.6 Network Equations
    4.6.1 Forward Pass
    4.6.2 Backward Pass

5 A Comparison of Network Architectures
  5.1 Experimental Setup
  5.2 Network Architectures
    5.2.1 Computational Complexity
    5.2.2 Range of Context
    5.2.3 Output Layers
  5.3 Network Training
    5.3.1 Retraining
  5.4 Results
    5.4.1 Previous Work
    5.4.2 Effect of Increased Context
    5.4.3 Weighted Error

6 Hidden Markov Model Hybrids
  6.1 Background
  6.2 Experiment: Phoneme Recognition
    6.2.1 Experimental Setup
    6.2.2 Results

7 Connectionist Temporal Classification
  7.1 Background
  7.2 From Outputs to Labellings
    7.2.1 Role of the Blank Labels
    7.2.2 Bidirectional and Unidirectional Networks
  7.3 Forward-Backward Algorithm
    7.3.1 Log Scale
  7.4 Loss Function
    7.4.1 Loss Gradient
  7.5 Decoding
    7.5.1 Best Path Decoding
    7.5.2 Prefix Search Decoding
    7.5.3 Constrained Decoding
  7.6 Experiments
    7.6.1 Phoneme Recognition 1
    7.6.2 Phoneme Recognition 2
    7.6.3 Keyword Spotting
    7.6.4 Online Handwriting Recognition
    7.6.5 Offline Handwriting Recognition
  7.7 Discussion

8 Multidimensional Networks
  8.1 Background
  8.2 Network Architecture
    8.2.1 Multidirectional Networks
    8.2.2 Multidimensional Long Short-Term Memory
  8.3 Experiments
    8.3.1 Air Freight Data
    8.3.2 MNIST Data
    8.3.3 Analysis

9 Hierarchical Subsampling Networks
  9.1 Network Architecture
    9.1.1 Subsampling Window Sizes
    9.1.2 Hidden Layer Sizes
    9.1.3 Number of Levels
    9.1.4 Multidimensional Networks
    9.1.5 Output Layers
    9.1.6 Complete System
  9.2 Experiments
    9.2.1 Offline Arabic Handwriting Recognition
    9.2.2 Online Arabic Handwriting Recognition
    9.2.3 French Handwriting Recognition
    9.2.4 Farsi/Arabic Character Classification
    9.2.5 Phoneme Recognition

Bibliography
Acknowledgements
List of Tables

5.1 Framewise phoneme classification results on TIMIT
5.2 Comparison of BLSTM with previous network
6.1 Phoneme recognition results on TIMIT
7.1 Phoneme recognition results on TIMIT with 61 phonemes
7.2 Folding the 61 phonemes in TIMIT onto 39 categories
7.3 Phoneme recognition results on TIMIT with 39 phonemes
7.4 Keyword spotting results on Verbmobil
7.5 Character recognition results on IAM-OnDB
7.6 Word recognition on IAM-OnDB
7.7 Word recognition results on IAM-DB
8.1 Classification results on MNIST
9.1 Networks for offline Arabic handwriting recognition
9.2 Offline Arabic handwriting recognition competition results
9.3 Networks for online Arabic handwriting recognition
9.4 Online Arabic handwriting recognition competition results
9.5 Network for French handwriting recognition
9.6 French handwriting recognition competition results
9.7 Networks for Farsi/Arabic handwriting recognition
9.8 Farsi/Arabic handwriting recognition competition results
9.9 Networks for phoneme recognition on TIMIT
9.10 Phoneme recognition results on TIMIT
List of Figures

2.1 Sequence labelling
2.2 Three classes of sequence labelling task
2.3 Importance of context in segment classification
3.1 A multilayer perceptron
3.2 Neural network activation functions
3.3 A recurrent neural network
3.4 An unfolded recurrent network
3.5 An unfolded bidirectional network
3.6 Sequential Jacobian for a bidirectional network
3.7 Overfitting on training data
3.8 Different Kinds of Input Perturbation
4.1 The vanishing gradient problem for RNNs
4.2 LSTM memory block with one cell
4.3 An LSTM network
4.4 Preservation of gradient information by LSTM
5.1 Various networks classifying an excerpt from TIMIT
5.2 Framewise phoneme classification results on TIMIT
5.3 Learning curves on TIMIT
5.4 BLSTM network classifying the utterance “one oh five”
7.1 CTC and framewise classification
7.2 Unidirectional and Bidirectional CTC Networks Phonetically Transcribing an Excerpt from TIMIT
7.3 CTC forward-backward algorithm
7.4 Evolution of the CTC error signal during training
7.5 Problem with best path decoding
7.6 Prefix search decoding
7.7 CTC outputs for keyword spotting on Verbmobil
7.8 Sequential Jacobian for keyword spotting on Verbmobil
7.9 BLSTM-CTC network labelling an excerpt from IAM-OnDB
7.10 BLSTM-CTC Sequential Jacobian from IAM-OnDB with raw inputs
7.11 BLSTM-CTC Sequential Jacobian from IAM-OnDB with preprocessed inputs
8.1 MDRNN forward pass
8.2 MDRNN backward pass
8.3 Sequence ordering of 2D data
8.4 Context available to a unidirectional two dimensional RNN
8.5 Axes used by the hidden layers in a multidirectional MDRNN
8.6 Context available to a multidirectional MDRNN
8.7 Frame from the Air Freight database
8.8 MNIST image before and after deformation
8.9 MDRNN applied to an image from the Air Freight database
8.10 Sequential Jacobian of an MDRNN for an image from MNIST
9.1 Information flow through an HSRNN
9.2 An unfolded HSRNN
9.3 Information flow through a multidirectional HSRNN
9.4 HSRNN applied to offline Arabic handwriting recognition
9.5 Offline Arabic word images
9.6 Offline Arabic error curves
9.7 Online Arabic input sequences
9.8 French word images
9.9 Farsi character images
9.10 Three representations of a TIMIT utterance
List of Algorithms

3.1 BRNN Forward Pass
3.2 BRNN Backward Pass
3.3 Online Learning with Gradient Descent
3.4 Online Learning with Gradient Descent and Weight Noise
7.1 Prefix Search Decoding
7.2 CTC Token Passing
8.1 MDRNN Forward Pass
8.2 MDRNN Backward Pass
8.3 Multidirectional MDRNN Forward Pass
8.4 Multidirectional MDRNN Backward Pass