Supervised Sequence Labelling with Recurrent Neural Networks.pdf

发布时间：2022-05-31 发布人：admin 分类：说明书资料大小：2.89M 资料格式：pdf 举报版权申诉

yebo1986-10242856-4744302542862915795.pdf-第1页.png

第1页 / 共147页

yebo1986-10242856-4744302542862915795.pdf-第2页.png

第2页 / 共147页

yebo1986-10242856-4744302542862915795.pdf-第3页.png

第3页 / 共147页

yebo1986-10242856-4744302542862915795.pdf-第4页.png

第4页 / 共147页

yebo1986-10242856-4744302542862915795.pdf-第5页.png

第5页 / 共147页

yebo1986-10242856-4744302542862915795.pdf-第6页.png

第6页 / 共147页

yebo1986-10242856-4744302542862915795.pdf-第7页.png

第7页 / 共147页

yebo1986-10242856-4744302542862915795.pdf-第8页.png

第8页 / 共147页

Title

Contents

Introduction

Structure of the Book

Supervised Sequence Labelling

Supervised Learning

Pattern Classification

Probabilistic Classification

Training Probabilistic Classifiers

Generative and Discriminative Methods

Sequence Labelling

Sequence Classification

Segment Classification

Temporal Classification

Neural Networks

Multilayer Perceptrons

Forward Pass

Output Layers

Loss Functions

Backward Pass

Recurrent Neural Networks

Forward Pass

Backward Pass

Unfolding

Bidirectional Networks

Sequential Jacobian

Network Training

Gradient Descent Algorithms

Generalisation

Input Representation

Weight Initialisation

Long Short-Term Memory

Network Architecture

Influence of Preprocessing

Gradient Calculation

Architectural Variants

Bidirectional Long Short-Term Memory

Network Equations

Forward Pass

Backward Pass

A Comparison of Network Architectures

Experimental Setup

Network Architectures

Computational Complexity

Range of Context

Output Layers

Network Training

Retraining

Results

Previous Work

Effect of Increased Context

Weighted Error

Hidden Markov Model Hybrids

Background

Experiment: Phoneme Recognition

Experimental Setup

Results

Connectionist Temporal Classification

Background

From Outputs to Labellings

Role of the Blank Labels

Bidirectional and Unidirectional Networks

Forward-Backward Algorithm

Log Scale

Loss Function

Loss Gradient

Decoding

Best Path Decoding

Prefix Search Decoding

Constrained Decoding

Experiments

Phoneme Recognition 1

Phoneme Recognition 2

Keyword Spotting

Online Handwriting Recognition

Offline Handwriting Recognition

Discussion

Multidimensional Networks

Background

Network Architecture

Multidirectional Networks

Multidimensional Long Short-Term Memory

Experiments

Air Freight Data

MNIST Data

Analysis

Hierarchical Subsampling Networks

Network Architecture

Subsampling Window Sizes

Hidden Layer Sizes

Number of Levels

Multidimensional Networks

Output Layers

Complete System

Experiments

Offline Arabic Handwriting Recognition

Online Arabic Handwriting Recognition

French Handwriting Recognition

Farsi/Arabic Character Classification

Phoneme Recognition

References

Acknowledgements

Alex Graves Supervised Sequence Labelling with Recurrent Neural Networks

Studies in Computational Intelligence,Volume 385 Editor-in-Chief Prof. Janusz Kacprzyk Systems Research Institute Polish Academy of Sciences ul. Newelska 6 01-447 Warsaw Poland E-mail: kacprzyk@ibspan.waw.pl Further volumes of this series can be found on our homepage: springer.com Vol. 362. Pascal Bouvry, Horacio González-Vélez, and Joanna Kolodziej (Eds.) Intelligent Decision Systems in Large-Scale Distributed Environments, 2011 ISBN 978-3-642-21270-3 Vol. 363. Kishan G. Mehrotra, Chilukuri Mohan, Jae C. Oh, Pramod K.Varshney, and Moonis Ali (Eds.) Developing Concepts in Applied Intelligence, 2011 ISBN 978-3-642-21331-1 Vol. 364. Roger Lee (Ed.) Computer and Information Science, 2011 ISBN 978-3-642-21377-9 Vol. 365. Roger Lee (Ed.) Computers, Networks, Systems, and Industrial Engineering 2011, 2011 ISBN 978-3-642-21374-8 Vol. 366. Mario Köppen, Gerald Schaefer, and Ajith Abraham (Eds.) Intelligent Computational Optimization in Engineering, 2011 ISBN 978-3-642-21704-3 Vol. 367. Gabriel Luque and Enrique Alba Parallel Genetic Algorithms, 2011 ISBN 978-3-642-22083-8 Vol. 368. Roger Lee (Ed.) Software Engineering, Artiﬁcial Intelligence, Networking and Parallel/Distributed Computing 2011, 2011 ISBN 978-3-642-22287-0 Vol. 369. Dominik Ry_zko, Piotr Gawrysiak, Henryk Rybinski, and Marzena Kryszkiewicz (Eds.) Emerging Intelligent Technologies in Industry, 2011 ISBN 978-3-642-22731-8 Vol. 370. Alexander Mehler, Kai-Uwe Kühnberger, Henning Lobin, Harald Lüngen, Angelika Storrer, and Andreas Witt (Eds.) Modeling, Learning, and Processing of Text Technological Data Structures, 2011 ISBN 978-3-642-22612-0 Vol. 371. Leonid Perlovsky, Ross Deming, and Roman Ilin (Eds.) Emotional Cognitive Neural Algorithms with Engineering Applications, 2011 ISBN 978-3-642-22829-2 Vol. 372. Ant´onio E. Ruano and Annam´aria R. V´arkonyi-K´oczy (Eds.) New Advances in Intelligent Signal Processing, 2011 ISBN 978-3-642-11738-1 Vol. 373. Oleg Okun, Giorgio Valentini, and Matteo Re (Eds.) Ensembles in Machine Learning Applications, 2011 ISBN 978-3-642-22909-1 Vol. 374. Dimitri Plemenos and Georgios Miaoulis (Eds.) Intelligent Computer Graphics 2011, 2011 ISBN 978-3-642-22906-0 Vol. 375. Marenglen Biba and Fatos Xhafa (Eds.) Learning Structure and Schemas from Documents, 2011 ISBN 978-3-642-22912-1 Vol. 376. Toyohide Watanabe and Lakhmi C. Jain (Eds.) Innovations in Intelligent Machines – 2, 2012 ISBN 978-3-642-23189-6 Vol. 377. Roger Lee (Ed.) Software Engineering Research, Management and Applications 2011, 2011 ISBN 978-3-642-23201-5 Vol. 378. János Fodor, Ryszard Klempous, and Carmen Paz Suárez Araujo (Eds.) Recent Advances in Intelligent Engineering Systems, 2011 ISBN 978-3-642-23228-2 Vol. 379. Ferrante Neri, Carlos Cotta, and Pablo Moscato (Eds.) Handbook of Memetic Algorithms, 2012 ISBN 978-3-642-23246-6 Vol. 380. Anthony Brabazon, Michael O’Neill, and Dietmar Maringer (Eds.) Natural Computing in Computational Finance, 2011 ISBN 978-3-642-23335-7 Vol. 381. Rados Chao-Fu Hong, and Ngoc Thanh Nguyen (Eds.) Semantic Methods for Knowledge Management and Communication, 2011 ISBN 978-3-642-23417-0 Vol. 382. F.M.T. Brazier, Kees Nieuwenhuis, Gregor Pavlin, Martijn Warnier, and Costin Badica (Eds.) Intelligent Distributed Computing V, 2011 ISBN 978-3-642-24012-6 Vol. 383. Takayuki Ito, Minjie Zhang, Valentin Robu, Shaheen Fatima, and Tokuro Matsuo(Eds.) New Trends in Agent-based Complex Automated Negotiations, 2011 ISBN 978-3-642-24695-1 Vol. 384. Daphna Weinshall, J¨orn Anem¨uller, and Luc van Gool (Eds.) Detection and Identiﬁcation of Rare Audiovisual Cues, 2012 ISBN 978-3-642-24033-1 Vol. 385. Alex Graves Supervised Sequence Labelling with Recurrent Neural Networks, 2012 ISBN 978-3-642-24796-5 law Katarzyniak, Tzu-Fu Chiu,

Alex Graves Supervised Sequence Labelling with Recurrent Neural Networks 123

Author Dr. Alex Graves University of Toronto Department of Computer Science Toronto, Ontario Canada ISSN 1860-949X ISBN 978-3-642-24796-5 DOI 10.1007/978-3-642-24797-2 Springer Heidelberg New York Dordrecht London e-ISSN 1860-9503 e-ISBN 978-3-642-24797-2 Library of Congress Control Number: 2011940011 c Springer-Verlag Berlin Heidelberg 2012 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, speciﬁcally the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microﬁlms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief ex- cerpts in connection with reviews or scholarly analysis or material supplied speciﬁcally for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publi- cation does not imply, even in the absence of a speciﬁc statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)

Contents List of Tables List of Figures List of Algorithms 1 Introduction 1.1 Structure of the Book . . . . . . . . . . . . . . . . . . . . . . 2 Supervised Sequence Labelling 2.1 Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . 2.2 Pattern Classiﬁcation . . . . . . . . . . . . . . . . . . . . . . 2.2.1 Probabilistic Classiﬁcation . . . . . . . . . . . . . . . . 2.2.2 Training Probabilistic Classiﬁers . . . . . . . . . . . . 2.2.3 Generative and Discriminative Methods . . . . . . . . 2.3 Sequence Labelling . . . . . . . . . . . . . . . . . . . . . . . . Sequence Classiﬁcation . . . . . . . . . . . . . . . . . . 2.3.1 2.3.2 Segment Classiﬁcation . . . . . . . . . . . . . . . . . . 2.3.3 Temporal Classiﬁcation . . . . . . . . . . . . . . . . . 3 Neural Networks 3.2 Recurrent Neural Networks 3.1 Multilayer Perceptrons . . . . . . . . . . . . . . . . . . . . . . 3.1.1 Forward Pass . . . . . . . . . . . . . . . . . . . . . . . 3.1.2 Output Layers . . . . . . . . . . . . . . . . . . . . . . 3.1.3 Loss Functions . . . . . . . . . . . . . . . . . . . . . . 3.1.4 Backward Pass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.1 Forward Pass . . . . . . . . . . . . . . . . . . . . . . . 3.2.2 Backward Pass . . . . . . . . . . . . . . . . . . . . . . 3.2.3 Unfolding . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.4 Bidirectional Networks . . . . . . . . . . . . . . . . . . Sequential Jacobian . . . . . . . . . . . . . . . . . . . 3.2.5 3.3 Network Training . . . . . . . . . . . . . . . . . . . . . . . . . 3.3.1 Gradient Descent Algorithms . . . . . . . . . . . . . . 3.3.2 Generalisation . . . . . . . . . . . . . . . . . . . . . . IX XI XIII 1 3 5 5 6 6 7 8 9 11 11 12 15 15 16 18 19 20 22 22 23 24 24 27 29 29 30

VI Contents 3.3.3 Input Representation . . . . . . . . . . . . . . . . . . . 3.3.4 Weight Initialisation . . . . . . . . . . . . . . . . . . . 4 Long Short-Term Memory . . . . . . . . . . . . . . . . . . . . . . 4.1 Network Architecture 4.2 Inﬂuence of Preprocessing . . . . . . . . . . . . . . . . . . . . 4.3 Gradient Calculation . . . . . . . . . . . . . . . . . . . . . . . 4.4 Architectural Variants . . . . . . . . . . . . . . . . . . . . . . 4.5 Bidirectional Long Short-Term Memory . . . . . . . . . . . . 4.6 Network Equations . . . . . . . . . . . . . . . . . . . . . . . . 4.6.1 Forward Pass . . . . . . . . . . . . . . . . . . . . . . . 4.6.2 Backward Pass . . . . . . . . . . . . . . . . . . . . . . 5 A Comparison of Network Architectures 5.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . 5.2 Network Architectures . . . . . . . . . . . . . . . . . . . . . . 5.2.1 Computational Complexity . . . . . . . . . . . . . . . 5.2.2 Range of Context . . . . . . . . . . . . . . . . . . . . . 5.2.3 Output Layers . . . . . . . . . . . . . . . . . . . . . . 5.3 Network Training . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.1 Retraining . . . . . . . . . . . . . . . . . . . . . . . . . 5.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4.1 Previous Work . . . . . . . . . . . . . . . . . . . . . . 5.4.2 Eﬀect of Increased Context . . . . . . . . . . . . . . . 5.4.3 Weighted Error . . . . . . . . . . . . . . . . . . . . . . 6 Hidden Markov Model Hybrids 6.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2 Experiment: Phoneme Recognition . . . . . . . . . . . . . . . 6.2.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2.2 Results 7 Connectionist Temporal Classiﬁcation 7.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.2 From Outputs to Labellings . . . . . . . . . . . . . . . . . . . 7.2.1 Role of the Blank Labels . . . . . . . . . . . . . . . . . 7.2.2 Bidirectional and Unidirectional Networks . . . . . . . 7.3 Forward-Backward Algorithm . . . . . . . . . . . . . . . . . . 7.3.1 Log Scale . . . . . . . . . . . . . . . . . . . . . . . . . 7.4 Loss Function . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.4.1 Loss Gradient . . . . . . . . . . . . . . . . . . . . . . . 7.5 Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.5.1 Best Path Decoding . . . . . . . . . . . . . . . . . . . 7.5.2 Preﬁx Search Decoding . . . . . . . . . . . . . . . . . 7.5.3 Constrained Decoding . . . . . . . . . . . . . . . . . . 7.6 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 35 37 37 41 42 42 43 43 44 44 47 47 48 49 49 49 51 51 52 54 54 56 57 57 58 59 59 61 61 63 64 64 64 68 68 69 71 71 72 73 78

Contents 7.6.1 Phoneme Recognition 1 . . . . . . . . . . . . . . . . . 7.6.2 Phoneme Recognition 2 . . . . . . . . . . . . . . . . . 7.6.3 Keyword Spotting . . . . . . . . . . . . . . . . . . . . 7.6.4 Online Handwriting Recognition . . . . . . . . . . . . 7.6.5 Oﬄine Handwriting Recognition . . . . . . . . . . . . 7.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . VII 79 80 81 85 88 91 8 Multidimensional Networks 8.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2 Network Architecture 95 95 97 8.2.1 Multidirectional Networks . . . . . . . . . . . . . . . . 100 8.2.2 Multidimensional Long Short-Term Memory . . . . . . 102 8.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 8.3.1 Air Freight Data . . . . . . . . . . . . . . . . . . . . . 104 8.3.2 MNIST Data . . . . . . . . . . . . . . . . . . . . . . . 105 8.3.3 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 107 9 Hierarchical Subsampling Networks 9.1 Network Architecture 109 . . . . . . . . . . . . . . . . . . . . . . 110 9.1.1 Subsampling Window Sizes . . . . . . . . . . . . . . . 112 9.1.2 Hidden Layer Sizes . . . . . . . . . . . . . . . . . . . . 112 9.1.3 Number of Levels . . . . . . . . . . . . . . . . . . . . . 113 9.1.4 Multidimensional Networks . . . . . . . . . . . . . . . 113 9.1.5 Output Layers . . . . . . . . . . . . . . . . . . . . . . 115 9.1.6 Complete System . . . . . . . . . . . . . . . . . . . . . 116 9.2 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 9.2.1 Oﬄine Arabic Handwriting Recognition . . . . . . . . 119 9.2.2 Online Arabic Handwriting Recognition . . . . . . . . 122 9.2.3 French Handwriting Recognition . . . . . . . . . . . . 124 9.2.4 Farsi/Arabic Character Classiﬁcation . . . . . . . . . 126 9.2.5 Phoneme Recognition . . . . . . . . . . . . . . . . . . 127 References Acknowledgements 133 141

List of Tables 5.1 Framewise phoneme classiﬁcation results on TIMIT . . . . . . 5.2 Comparison of BLSTM with previous network . . . . . . . . . 6.1 Phoneme recognition results on TIMIT . . . . . . . . . . . . . 7.1 Phoneme recognition results on TIMIT with 61 phonemes . . 7.2 Folding the 61 phonemes in TIMIT onto 39 categories . . . . 7.3 Phoneme recognition results on TIMIT with 39 phonemes . . 7.4 Keyword spotting results on Verbmobil . . . . . . . . . . . . 7.5 Character recognition results on IAM-OnDB . . . . . . . . . 7.6 Word recognition on IAM-OnDB . . . . . . . . . . . . . . . . 7.7 Word recognition results on IAM-DB . . . . . . . . . . . . . . 52 54 60 79 80 82 83 86 88 92 8.1 Classiﬁcation results on MNIST . . . . . . . . . . . . . . . . . 106 9.1 Networks for oﬄine Arabic handwriting recognition . . . . . . 121 9.2 Oﬄine Arabic handwriting recognition competition results . . 121 9.3 Networks for online Arabic handwriting recognition . . . . . . 124 9.4 Online Arabic handwriting recognition competition results . . 125 9.5 Network for French handwriting recognition . . . . . . . . . . 126 9.6 French handwriting recognition competition results . . . . . . 127 9.7 Networks for Farsi/Arabic handwriting recognition . . . . . . 129 9.8 Farsi/Arabic handwriting recognition competition results . . 129 9.9 Networks for phoneme recognition on TIMIT . . . . . . . . . 129 9.10 Phoneme recognition results on TIMIT . . . . . . . . . . . . . 130

分享到：

赞收藏

资料库

Supervised Sequence Labelling with Recurrent Neural Networks.pdf

相关推荐

人工智能

热门标签

最新资料