logo资料库

Supervised Sequence Labelling with Recurrent Neural Networks.pdf

第1页 / 共147页
第2页 / 共147页
第3页 / 共147页
第4页 / 共147页
第5页 / 共147页
第6页 / 共147页
第7页 / 共147页
第8页 / 共147页
资料共147页,剩余部分请下载后查看
Title
Contents
Introduction
Structure of the Book
Supervised Sequence Labelling
Supervised Learning
Pattern Classification
Probabilistic Classification
Training Probabilistic Classifiers
Generative and Discriminative Methods
Sequence Labelling
Sequence Classification
Segment Classification
Temporal Classification
Neural Networks
Multilayer Perceptrons
Forward Pass
Output Layers
Loss Functions
Backward Pass
Recurrent Neural Networks
Forward Pass
Backward Pass
Unfolding
Bidirectional Networks
Sequential Jacobian
Network Training
Gradient Descent Algorithms
Generalisation
Input Representation
Weight Initialisation
Long Short-Term Memory
Network Architecture
Influence of Preprocessing
Gradient Calculation
Architectural Variants
Bidirectional Long Short-Term Memory
Network Equations
Forward Pass
Backward Pass
A Comparison of Network Architectures
Experimental Setup
Network Architectures
Computational Complexity
Range of Context
Output Layers
Network Training
Retraining
Results
Previous Work
Effect of Increased Context
Weighted Error
Hidden Markov Model Hybrids
Background
Experiment: Phoneme Recognition
Experimental Setup
Results
Connectionist Temporal Classification
Background
From Outputs to Labellings
Role of the Blank Labels
Bidirectional and Unidirectional Networks
Forward-Backward Algorithm
Log Scale
Loss Function
Loss Gradient
Decoding
Best Path Decoding
Prefix Search Decoding
Constrained Decoding
Experiments
Phoneme Recognition 1
Phoneme Recognition 2
Keyword Spotting
Online Handwriting Recognition
Offline Handwriting Recognition
Discussion
Multidimensional Networks
Background
Network Architecture
Multidirectional Networks
Multidimensional Long Short-Term Memory
Experiments
Air Freight Data
MNIST Data
Analysis
Hierarchical Subsampling Networks
Network Architecture
Subsampling Window Sizes
Hidden Layer Sizes
Number of Levels
Multidimensional Networks
Output Layers
Complete System
Experiments
Offline Arabic Handwriting Recognition
Online Arabic Handwriting Recognition
French Handwriting Recognition
Farsi/Arabic Character Classification
Phoneme Recognition
References
Acknowledgements
Alex Graves Supervised Sequence Labelling with Recurrent Neural Networks
Studies in Computational Intelligence,Volume 385 Editor-in-Chief Prof. Janusz Kacprzyk Systems Research Institute Polish Academy of Sciences ul. Newelska 6 01-447 Warsaw Poland E-mail: kacprzyk@ibspan.waw.pl Further volumes of this series can be found on our homepage: springer.com Vol. 362. Pascal Bouvry, Horacio González-Vélez, and Joanna Kolodziej (Eds.) Intelligent Decision Systems in Large-Scale Distributed Environments, 2011 ISBN 978-3-642-21270-3 Vol. 363. Kishan G. Mehrotra, Chilukuri Mohan, Jae C. Oh, Pramod K.Varshney, and Moonis Ali (Eds.) Developing Concepts in Applied Intelligence, 2011 ISBN 978-3-642-21331-1 Vol. 364. Roger Lee (Ed.) Computer and Information Science, 2011 ISBN 978-3-642-21377-9 Vol. 365. Roger Lee (Ed.) Computers, Networks, Systems, and Industrial Engineering 2011, 2011 ISBN 978-3-642-21374-8 Vol. 366. Mario Köppen, Gerald Schaefer, and Ajith Abraham (Eds.) Intelligent Computational Optimization in Engineering, 2011 ISBN 978-3-642-21704-3 Vol. 367. Gabriel Luque and Enrique Alba Parallel Genetic Algorithms, 2011 ISBN 978-3-642-22083-8 Vol. 368. Roger Lee (Ed.) Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing 2011, 2011 ISBN 978-3-642-22287-0 Vol. 369. Dominik Ry_zko, Piotr Gawrysiak, Henryk Rybinski, and Marzena Kryszkiewicz (Eds.) Emerging Intelligent Technologies in Industry, 2011 ISBN 978-3-642-22731-8 Vol. 370. Alexander Mehler, Kai-Uwe Kühnberger, Henning Lobin, Harald Lüngen, Angelika Storrer, and Andreas Witt (Eds.) Modeling, Learning, and Processing of Text Technological Data Structures, 2011 ISBN 978-3-642-22612-0 Vol. 371. Leonid Perlovsky, Ross Deming, and Roman Ilin (Eds.) Emotional Cognitive Neural Algorithms with Engineering Applications, 2011 ISBN 978-3-642-22829-2 Vol. 372. Ant´onio E. Ruano and Annam´aria R. V´arkonyi-K´oczy (Eds.) New Advances in Intelligent Signal Processing, 2011 ISBN 978-3-642-11738-1 Vol. 373. Oleg Okun, Giorgio Valentini, and Matteo Re (Eds.) Ensembles in Machine Learning Applications, 2011 ISBN 978-3-642-22909-1 Vol. 374. Dimitri Plemenos and Georgios Miaoulis (Eds.) Intelligent Computer Graphics 2011, 2011 ISBN 978-3-642-22906-0 Vol. 375. Marenglen Biba and Fatos Xhafa (Eds.) Learning Structure and Schemas from Documents, 2011 ISBN 978-3-642-22912-1 Vol. 376. Toyohide Watanabe and Lakhmi C. Jain (Eds.) Innovations in Intelligent Machines – 2, 2012 ISBN 978-3-642-23189-6 Vol. 377. Roger Lee (Ed.) Software Engineering Research, Management and Applications 2011, 2011 ISBN 978-3-642-23201-5 Vol. 378. János Fodor, Ryszard Klempous, and Carmen Paz Suárez Araujo (Eds.) Recent Advances in Intelligent Engineering Systems, 2011 ISBN 978-3-642-23228-2 Vol. 379. Ferrante Neri, Carlos Cotta, and Pablo Moscato (Eds.) Handbook of Memetic Algorithms, 2012 ISBN 978-3-642-23246-6 Vol. 380. Anthony Brabazon, Michael O’Neill, and Dietmar Maringer (Eds.) Natural Computing in Computational Finance, 2011 ISBN 978-3-642-23335-7 Vol. 381. Rados Chao-Fu Hong, and Ngoc Thanh Nguyen (Eds.) Semantic Methods for Knowledge Management and Communication, 2011 ISBN 978-3-642-23417-0 Vol. 382. F.M.T. Brazier, Kees Nieuwenhuis, Gregor Pavlin, Martijn Warnier, and Costin Badica (Eds.) Intelligent Distributed Computing V, 2011 ISBN 978-3-642-24012-6 Vol. 383. Takayuki Ito, Minjie Zhang, Valentin Robu, Shaheen Fatima, and Tokuro Matsuo(Eds.) New Trends in Agent-based Complex Automated Negotiations, 2011 ISBN 978-3-642-24695-1 Vol. 384. Daphna Weinshall, J¨orn Anem¨uller, and Luc van Gool (Eds.) Detection and Identification of Rare Audiovisual Cues, 2012 ISBN 978-3-642-24033-1 Vol. 385. Alex Graves Supervised Sequence Labelling with Recurrent Neural Networks, 2012 ISBN 978-3-642-24796-5 law Katarzyniak, Tzu-Fu Chiu,
Alex Graves Supervised Sequence Labelling with Recurrent Neural Networks 123
Author Dr. Alex Graves University of Toronto Department of Computer Science Toronto, Ontario Canada ISSN 1860-949X ISBN 978-3-642-24796-5 DOI 10.1007/978-3-642-24797-2 Springer Heidelberg New York Dordrecht London e-ISSN 1860-9503 e-ISBN 978-3-642-24797-2 Library of Congress Control Number: 2011940011 c Springer-Verlag Berlin Heidelberg 2012 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief ex- cerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publi- cation does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Contents List of Tables List of Figures List of Algorithms 1 Introduction 1.1 Structure of the Book . . . . . . . . . . . . . . . . . . . . . . 2 Supervised Sequence Labelling 2.1 Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . 2.2 Pattern Classification . . . . . . . . . . . . . . . . . . . . . . 2.2.1 Probabilistic Classification . . . . . . . . . . . . . . . . 2.2.2 Training Probabilistic Classifiers . . . . . . . . . . . . 2.2.3 Generative and Discriminative Methods . . . . . . . . 2.3 Sequence Labelling . . . . . . . . . . . . . . . . . . . . . . . . Sequence Classification . . . . . . . . . . . . . . . . . . 2.3.1 2.3.2 Segment Classification . . . . . . . . . . . . . . . . . . 2.3.3 Temporal Classification . . . . . . . . . . . . . . . . . 3 Neural Networks 3.2 Recurrent Neural Networks 3.1 Multilayer Perceptrons . . . . . . . . . . . . . . . . . . . . . . 3.1.1 Forward Pass . . . . . . . . . . . . . . . . . . . . . . . 3.1.2 Output Layers . . . . . . . . . . . . . . . . . . . . . . 3.1.3 Loss Functions . . . . . . . . . . . . . . . . . . . . . . 3.1.4 Backward Pass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.1 Forward Pass . . . . . . . . . . . . . . . . . . . . . . . 3.2.2 Backward Pass . . . . . . . . . . . . . . . . . . . . . . 3.2.3 Unfolding . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.4 Bidirectional Networks . . . . . . . . . . . . . . . . . . Sequential Jacobian . . . . . . . . . . . . . . . . . . . 3.2.5 3.3 Network Training . . . . . . . . . . . . . . . . . . . . . . . . . 3.3.1 Gradient Descent Algorithms . . . . . . . . . . . . . . 3.3.2 Generalisation . . . . . . . . . . . . . . . . . . . . . . IX XI XIII 1 3 5 5 6 6 7 8 9 11 11 12 15 15 16 18 19 20 22 22 23 24 24 27 29 29 30
VI Contents 3.3.3 Input Representation . . . . . . . . . . . . . . . . . . . 3.3.4 Weight Initialisation . . . . . . . . . . . . . . . . . . . 4 Long Short-Term Memory . . . . . . . . . . . . . . . . . . . . . . 4.1 Network Architecture 4.2 Influence of Preprocessing . . . . . . . . . . . . . . . . . . . . 4.3 Gradient Calculation . . . . . . . . . . . . . . . . . . . . . . . 4.4 Architectural Variants . . . . . . . . . . . . . . . . . . . . . . 4.5 Bidirectional Long Short-Term Memory . . . . . . . . . . . . 4.6 Network Equations . . . . . . . . . . . . . . . . . . . . . . . . 4.6.1 Forward Pass . . . . . . . . . . . . . . . . . . . . . . . 4.6.2 Backward Pass . . . . . . . . . . . . . . . . . . . . . . 5 A Comparison of Network Architectures 5.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . 5.2 Network Architectures . . . . . . . . . . . . . . . . . . . . . . 5.2.1 Computational Complexity . . . . . . . . . . . . . . . 5.2.2 Range of Context . . . . . . . . . . . . . . . . . . . . . 5.2.3 Output Layers . . . . . . . . . . . . . . . . . . . . . . 5.3 Network Training . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.1 Retraining . . . . . . . . . . . . . . . . . . . . . . . . . 5.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4.1 Previous Work . . . . . . . . . . . . . . . . . . . . . . 5.4.2 Effect of Increased Context . . . . . . . . . . . . . . . 5.4.3 Weighted Error . . . . . . . . . . . . . . . . . . . . . . 6 Hidden Markov Model Hybrids 6.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2 Experiment: Phoneme Recognition . . . . . . . . . . . . . . . 6.2.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2.2 Results 7 Connectionist Temporal Classification 7.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.2 From Outputs to Labellings . . . . . . . . . . . . . . . . . . . 7.2.1 Role of the Blank Labels . . . . . . . . . . . . . . . . . 7.2.2 Bidirectional and Unidirectional Networks . . . . . . . 7.3 Forward-Backward Algorithm . . . . . . . . . . . . . . . . . . 7.3.1 Log Scale . . . . . . . . . . . . . . . . . . . . . . . . . 7.4 Loss Function . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.4.1 Loss Gradient . . . . . . . . . . . . . . . . . . . . . . . 7.5 Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.5.1 Best Path Decoding . . . . . . . . . . . . . . . . . . . 7.5.2 Prefix Search Decoding . . . . . . . . . . . . . . . . . 7.5.3 Constrained Decoding . . . . . . . . . . . . . . . . . . 7.6 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 35 37 37 41 42 42 43 43 44 44 47 47 48 49 49 49 51 51 52 54 54 56 57 57 58 59 59 61 61 63 64 64 64 68 68 69 71 71 72 73 78
Contents 7.6.1 Phoneme Recognition 1 . . . . . . . . . . . . . . . . . 7.6.2 Phoneme Recognition 2 . . . . . . . . . . . . . . . . . 7.6.3 Keyword Spotting . . . . . . . . . . . . . . . . . . . . 7.6.4 Online Handwriting Recognition . . . . . . . . . . . . 7.6.5 Offline Handwriting Recognition . . . . . . . . . . . . 7.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . VII 79 80 81 85 88 91 8 Multidimensional Networks 8.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2 Network Architecture 95 95 97 8.2.1 Multidirectional Networks . . . . . . . . . . . . . . . . 100 8.2.2 Multidimensional Long Short-Term Memory . . . . . . 102 8.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 8.3.1 Air Freight Data . . . . . . . . . . . . . . . . . . . . . 104 8.3.2 MNIST Data . . . . . . . . . . . . . . . . . . . . . . . 105 8.3.3 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 107 9 Hierarchical Subsampling Networks 9.1 Network Architecture 109 . . . . . . . . . . . . . . . . . . . . . . 110 9.1.1 Subsampling Window Sizes . . . . . . . . . . . . . . . 112 9.1.2 Hidden Layer Sizes . . . . . . . . . . . . . . . . . . . . 112 9.1.3 Number of Levels . . . . . . . . . . . . . . . . . . . . . 113 9.1.4 Multidimensional Networks . . . . . . . . . . . . . . . 113 9.1.5 Output Layers . . . . . . . . . . . . . . . . . . . . . . 115 9.1.6 Complete System . . . . . . . . . . . . . . . . . . . . . 116 9.2 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 9.2.1 Offline Arabic Handwriting Recognition . . . . . . . . 119 9.2.2 Online Arabic Handwriting Recognition . . . . . . . . 122 9.2.3 French Handwriting Recognition . . . . . . . . . . . . 124 9.2.4 Farsi/Arabic Character Classification . . . . . . . . . 126 9.2.5 Phoneme Recognition . . . . . . . . . . . . . . . . . . 127 References Acknowledgements 133 141
List of Tables 5.1 Framewise phoneme classification results on TIMIT . . . . . . 5.2 Comparison of BLSTM with previous network . . . . . . . . . 6.1 Phoneme recognition results on TIMIT . . . . . . . . . . . . . 7.1 Phoneme recognition results on TIMIT with 61 phonemes . . 7.2 Folding the 61 phonemes in TIMIT onto 39 categories . . . . 7.3 Phoneme recognition results on TIMIT with 39 phonemes . . 7.4 Keyword spotting results on Verbmobil . . . . . . . . . . . . 7.5 Character recognition results on IAM-OnDB . . . . . . . . . 7.6 Word recognition on IAM-OnDB . . . . . . . . . . . . . . . . 7.7 Word recognition results on IAM-DB . . . . . . . . . . . . . . 52 54 60 79 80 82 83 86 88 92 8.1 Classification results on MNIST . . . . . . . . . . . . . . . . . 106 9.1 Networks for offline Arabic handwriting recognition . . . . . . 121 9.2 Offline Arabic handwriting recognition competition results . . 121 9.3 Networks for online Arabic handwriting recognition . . . . . . 124 9.4 Online Arabic handwriting recognition competition results . . 125 9.5 Network for French handwriting recognition . . . . . . . . . . 126 9.6 French handwriting recognition competition results . . . . . . 127 9.7 Networks for Farsi/Arabic handwriting recognition . . . . . . 129 9.8 Farsi/Arabic handwriting recognition competition results . . 129 9.9 Networks for phoneme recognition on TIMIT . . . . . . . . . 129 9.10 Phoneme recognition results on TIMIT . . . . . . . . . . . . . 130
分享到:
收藏