Contents
List of Figures
List of Tables
Editors
Board of Reviewers
Contributors
Preface
Part I: Classical Approaches
1. Classical Approaches to Natural Language Processing
1.1 Context
1.2 The Classical Toolkit
Text Preprocessing
Lexical Analysis
Syntactic Parsing
Semantic Analysis
Natural Language Generation
1.3 Conclusions
Reference
2. Text Preprocessing
2.1 Introduction
2.2 Challenges of Text Preprocessing
Character-Set Dependence
Language Dependence
Corpus Dependence
Application Dependence
2.3 Tokenization
Tokenization in Space-Delimited Languages
Tokenization in Unsegmented Languages
2.4 Sentence Segmentation
Sentence Boundary Punctuation
The Importance of Context
Traditional Rule-Based Approaches
Robustness and Trainability
Trainable Algorithms
2.5 Conclusion
References
3. Lexical Analysis
3.1 Introduction
3.2 Finite State Morphonology
Closing Remarks on Finite State Morphonology
3.3 Finite State Morphology
Disjunctive Affixes, Inflectional Classes, and Exceptionality
Further Remarks on Finite State Lexical Analysis
3.4 "Difficult" Morphology and Lexical Analysis
Isomorphism Problems
Contiguity Problems
3.5 Paradigm-Based Lexical Analysis
Paradigmatic Relations and Generalization
The Role of Defaults
Paradigm-Based Accounts of Difficult Morphology
Further Remarks on Paradigm-Based Approaches
3.6 Concluding Remarks
Acknowledgments
References
4. Syntactic Parsing
4.1 Introduction
4.2 Background
Context-Free Grammars
Example Grammar
Syntax Trees
Other Grammar Formalisms
Basic Concepts in Parsing
4.3 The Cocke–Kasami–Younger Algorithm
Handling Unary Rules
Example Session
Handling Long Right-Hand Sides
4.4 Parsing as Deduction
Deduction Systems
The CKY Algorithm
Chart Parsing
Bottom-Up Left-Corner Parsing
Top-Down Earley-Style Parsing
Example Session
Dynamic Filtering
4.5 Implementing Deductive Parsing
Agenda-Driven Chart Parsing
Storing and Retrieving Parse Results
4.6 LR Parsing
The LR(0) Table
Deterministic LR Parsing
Generalized LR Parsing
Optimized GLR Parsing
4.7 Constraint-Based Grammars
Overview
Unification
Tabular Parsing with Unification
4.8 Issues in Parsing
Robustness
Disambiguation
Efficiency
4.9 Historical Notes and Outlook
Acknowledgments
References
5. Semantic Analysis
5.1 Basic Concepts and Issues in Natural Language Semantics
5.2 Theories and Approaches to Semantic Representation
Logical Approaches
Discourse Representation Theory
Pustejovsky's Generative Lexicon
Natural Semantic Metalanguage
Object-Oriented Semantics
5.3 Relational Issues in Lexical Semantics
Sense Relations and Ontologies
Roles
5.4 Fine-Grained Lexical-Semantic Analysis: Three Case Studies
Emotional Meanings: "Sadness" and "Worry" in English and Chinese
Ethnogeographical Categories: "Rivers" and "Creeks"
Functional Macro-Categories
5.5 Prospectus and "Hard Problems"
Acknowledgments
References
6. Natural Language Generation
6.1 Introduction
6.2 Examples of Generated Texts: From Complex to Simple and Back Again
Complex
Simple
Today
6.3 The Components of a Generator
Components and Levels of Representation
6.4 Approaches to Text Planning
The Function of the Speaker
Desiderata for Text Planning
Pushing vs. Pulling
Planning by Progressive Refinement of the Speaker's Message
Planning Using Rhetorical Operators
Text Schemas
6.5 The Linguistic Component
Surface Realization Components
Relationship to Linguistic Theory
Chunk Size
Assembling vs. Navigating
Systemic Grammars
Functional Unification Grammars
6.6 The Cutting Edge
Story Generation
Personality-Sensitive Generation
6.7 Conclusions
References
Part II: Empirical and Statistical Approaches
7. Corpus Creation
7.1 Introduction
7.2 Corpus Size
7.3 Balance, Representativeness, and Sampling
7.4 Data Capture and Copyright
7.5 Corpus Markup and Annotation
7.6 Multilingual Corpora
7.7 Multimodal Corpora
7.8 Conclusions
References
8. Treebank Annotation
8.1 Introduction
8.2 Corpus Annotation Types
8.3 Morphosyntactic Annotation
8.4 Treebanks: Syntactic, Semantic, and Discourse Annotation
Motivation and Definition
An Example: The Penn Treebank
Annotation and Linguistic Theory
Going Beyond the Surface Shape of the Sentence
8.5 The Process of Building Treebanks
8.6 Applications of Treebanks
8.7 Searching Treebanks
8.8 Conclusions
Acknowledgments
References
9. Fundamental Statistical Techniques
9.1 Binary Linear Classification
9.2 One-versus-All Method for Multi-Category Classification
9.3 Maximum Likelihood Estimation
9.4 Generative and Discriminative Models
Naive Bayes
Logistic Regression
9.5 Mixture Model and EM
9.6 Sequence Prediction Models
Hidden Markov Model
Local Discriminative Model for Sequence Prediction
Global Discriminative Model for Sequence Prediction
References
10. Part-of-Speech Tagging
10.1 Introduction
Parts of Speech
Part-of-Speech Problem
10.2 The General Framework
10.3 Part-of-Speech Tagging Approaches
Rule-Based Approaches
Markov Model Approaches
Maximum Entropy Approaches
10.4 Other Statistical and Machine Learning Approaches
Methods and Relevant Work
Combining Taggers
10.5 POS Tagging in Languages Other Than English
Chinese
Korean
Other Languages
10.6 Conclusion
References
11. Statistical Parsing
11.1 Introduction
11.2 Basic Concepts and Terminology
Syntactic Representations
Statistical Parsing Models
Parser Evaluation
11.3 Probabilistic Context-Free Grammars
Basic Definitions
PCFGs as Statistical Parsing Models
Learning and Inference
11.4 Generative Models
History-Based Models
PCFG Transformations
Data-Oriented Parsing
11.5 Discriminative Models
Local Discriminative Models
Global Discriminative Models
11.6 Beyond Supervised Parsing
Weakly Supervised Parsing
Unsupervised Parsing
11.7 Summary and Conclusions
Acknowledgments
References
12. Multiword Expressions
12.1 Introduction
12.2 Linguistic Properties of MWEs
Idiomaticity
Other Properties of MWEs
Testing an Expression for MWEhood
Collocations and MWEs
A Word on Terminology and Related Fields
12.3 Types of MWEs
Nominal MWEs
Verbal MWEs
Prepositional MWEs
12.4 MWE Classification
12.5 Research Issues
Identification
Extraction
Internal Syntactic Disambiguation
MWE Interpretation
12.6 Summary
Acknowledgments
References
13. Normalized Web Distance and Word Similarity
13.1 Introduction
13.2 Some Methods for Word Similarity
Association Measures
Attributes
Relational Word Similarity
Latent Semantic Analysis
13.3 Background of the NWD Method
13.4 Brief Introduction to Kolmogorov Complexity
13.5 Information Distance
Normalized Information Distance
Normalized Compression Distance
13.6 Word Similarity: Normalized Web Distance
13.7 Applications and Experiments
Hierarchical Clustering
Classification
Matching the Meaning
Systematic Comparison with WordNet Semantics
13.8 Conclusion
References
14. Word Sense Disambiguation
14.1 Introduction
14.2 Word Sense Inventories and Problem Characteristics
Treatment of Part of Speech
Sources of Sense Inventories
Granularity of Sense Partitions
Hierarchical vs. Flat Sense Partitions
Idioms and Specialized Collocational Meanings
Regular Polysemy
Related Problems
14.3 Applications of Word Sense Disambiguation
Applications in Information Retrieval
Applications in Machine Translation
Other Applications
14.4 Early Approaches to Sense Disambiguation
Bar-Hillel: An Early Perspective on WSD
Early AI Systems: Word Experts
Dictionary-Based Methods
Kelly and Stone: An Early Corpus-Based Approach
14.5 Supervised Approaches to Sense Disambiguation
Training Data for Supervised WSD Algorithms
Features for WSD Algorithms
Supervised WSD Algorithms
14.6 Lightly Supervised Approaches to WSD
WSD via Word-Class Disambiguation
WSD via Monosemous Relatives
Hierarchical Class Models Using Selectional Restriction
Graph-Based Algorithms for WSD
Iterative Bootstrapping Algorithms
14.7 Unsupervised WSD and Sense Discovery
14.8 Conclusion
References
15. An Overview of Modern Speech Recognition
15.1 Introduction
15.2 Major Architectural Components
Acoustic Models
Language Models
Decoding
15.3 Major Historical Developments in Speech Recognition
15.4 Speech-Recognition Applications
IVR Applications
Appliance—"Response Point"
Mobile Applications
15.5 Technical Challenges and Future Research Directions
Robustness against Acoustic Environments and a Multitude of Other Factors
Capitalizing on Data Deluge for Speech Recognition
Self-Learning and Adaptation for Speech Recognition
Developing Speech Recognizers beyond the Language Barrier
Detection of Unknown Events in Speech Recognition
Learning from Human Speech Perception and Production
Capitalizing on New Trends in Computational Architectures for Speech Recognition
Embedding Knowledge and Parallelism into Speech-Recognition Decoding
15.6 Summary
References
16. Alignment
16.1 Introduction
16.2 Definitions and Concepts
Alignment
Constraints and Correlations
Classes of Algorithms
16.3 Sentence Alignment
Length-Based Sentence Alignment
Lexical Sentence Alignment
Cognate-Based Sentence Alignment
Multifeature Sentence Alignment
Comments on Sentence Alignment
16.4 Character, Word, and Phrase Alignment
Monotonic Alignment for Words
Non-Monotonic Alignment for Single-Token Words
Non-Monotonic Alignment for Multitoken Words and Phrases
16.5 Structure and Tree Alignment
Cost Functions
Algorithms
Strengths and Weaknesses of Structure and Tree Alignment Techniques
16.6 Biparsing and ITG Tree Alignment
Syntax-Directed Transduction Grammars (or Synchronous CFGs)
Inversion Transduction Grammars
Cost Functions
Algorithms
Grammars for Biparsing
Strengths and Weaknesses of Biparsing and ITG Tree Alignment Techniques
16.7 Conclusion
Acknowledgments
References
17. Statistical Machine Translation
17.1 Introduction
17.2 Approaches
17.3 Language Models
17.4 Parallel Corpora
17.5 Word Alignment
17.6 Phrase Library
17.7 Translation Models
IBM Models
Phrase-Based Systems
Syntax-Based Systems for Machine Translation
Direct Translation Models
17.8 Search Strategies
17.9 Research Areas
Acknowledgment
References
Part III: Applications
18. Chinese Machine Translation
18.1 Introduction
18.2 Preprocessing—What Is a Chinese Word?
The Maximum Entropy Framework for Word Segmentation
Translation-Driven Word Segmentation
18.3 Phrase-Based SMT—From Words to Phrases
18.4 Example-Based MT—Translation by Analogy
18.5 Syntax-Based MT—Structural Transfer
18.6 Semantics-Based SMT and Interlingua
Word Sense Translation
Semantic Role Labels
18.7 Applications
Chinese Term and Named Entity Translation
Chinese Spoken Language Translation
Crosslingual Information Retrieval Using Machine Translation
18.8 Conclusion and Discussion
Acknowledgments
References
19. Information Retrieval
19.1 Introduction
19.2 Indexing
Indexing Dimensions
Indexing Process
19.3 IR Models
Classical Boolean Model
Vector-Space Models
Probabilistic Models
Query Expansion and Relevance Feedback
Advanced Models
19.4 Evaluation and Failure Analysis
Evaluation Campaigns
Evaluation Measures
Failure Analysis
19.5 Natural Language Processing and Information Retrieval
Morphology
Orthographic Variation and Spelling Errors
Syntax
Semantics
Related Applications
19.6 Conclusion
Acknowledgments
References
20. Question Answering
20.1 Introduction
20.2 Historical Context
20.3 A Generic Question-Answering System
Question Analysis
Document or Passage Selection
Answer Extraction
Variations of the General Architecture
20.4 Evaluation of QA Systems
Evolution of the TREC QA Track
Evaluation Metrics
20.5 Multilinguality in Question Answering
20.6 Question Answering on Restricted Domains
20.7 Recent Trends and Related Work
Acknowledgment
References
21. Information Extraction
21.1 Introduction
21.2 Diversity of IE Tasks
Unstructured vs. Semi-Structured Text
Single-Document vs. Multi-Document IE
Assumptions about Incoming Documents
21.3 IE with Cascaded Finite-State Transducers
Complex Words
Basic Phrases
Complex Phrases
Domain Events
Template Generation: Merging Structures
21.4 Learning-Based Approaches to IE
Supervised Learning of Extraction Patterns and Rules
Supervised Learning of Sequential Classifier Models
Weakly Supervised and Unsupervised Approaches
Discourse-Oriented Approaches to IE
21.5 How Good Is Information Extraction?
Acknowledgments
References
22. Report Generation
22.1 Introduction
22.2 What Makes Report Generation a Distinct Task?
What Makes a Text a Report?
Report as Text Genre
Characteristic Features of Report Generation
22.3 What Does Report Generation Start From?
Data and Knowledge Sources
Data Assessment and Interpretation
22.4 Text Planning for Report Generation
Content Selection
Discourse Planning
22.5 Linguistic Realization for Report Generation
Input and Levels of Linguistic Representation
Tasks of Linguistic Realization
22.6 Sample Report Generators
22.7 Evaluation in Report Generation
22.8 Conclusions: The Present and the Future of RG
Acknowledgments
References
23. Emerging Applications of Natural Language Generation in Information Visualization, Education, and Health Care
23.1 Introduction
23.2 Multimedia Presentation Generation
23.3 Language Interfaces for Intelligent Tutoring Systems
CIRCSIM-Tutor
AUTOTUTOR
ATLAS-ANDES, WHY2-ATLAS, and WHY2-AUTOTUTOR
Briefly Noted
23.4 Argumentation for Health-Care Consumers
Acknowledgments
References
24. Ontology Construction
24.1 Introduction
24.2 Ontology and Ontologies
Anatomy of an Ontology
Types of Ontologies
Ontology Languages
Ontologies and Natural Language
24.3 Ontology Engineering
Principles
Methodologies
24.4 Ontology Learning
State of the Art
24.5 Summary
Acknowledgments
References
25. BioNLP: Biomedical Text Mining
25.1 Introduction
25.2 The Two Basic Domains of BioNLP: Medical and Biological
Medical/Clinical NLP
Biological/Genomic NLP
Model Organism Databases, and the Database Curator as a Canonical User
BioNLP in the Context of Natural Language Processing and Computational Linguistics
25.3 Just Enough Biology
25.4 Biomedical Text Mining
Named Entity Recognition
Named Entity Normalization
Abbreviation Disambiguation
25.5 Getting Up to Speed: Ten Papers and Resources That Will Let You Read Most Other Work in BioNLP
Named Entity Recognition 1: KeX
Named Entity Recognition 2: Collier, Nobata, and GENIA
Information Extraction 1: Blaschke et al. (1999)
Information Extraction 2: Craven and Kumlein (1999)
Information Extraction 3: MedLEE, BioMedLEE, and GENIES
Corpora 1: PubMed/MEDLINE
Corpora 2: GENIA
Lexical Resources 1: The Gene Ontology
Lexical Resources 2: Entrez Gene
Lexical Resources 3: Unified Medical Language System
25.6 Tools
Tokenization
Named Entity Recognition (Genes/Proteins)
Named Entity Recognition (Medical Concepts)
Full Parsers
25.7 Special Considerations in BioNLP
Code Coverage, Testing, and Corpora
User Interface Design
Portability
Proposition Banks and Semantic Role Labeling in BioNLP
The Difference between an Application That Will Relieve Pain and Suffering and an Application That Will Get You Published at ACL
Acknowledgments
References
26. Sentiment Analysis and Subjectivity
26.1 The Problem of Sentiment Analysis
26.2 Sentiment and Subjectivity Classification
Document-Level Sentiment Classification
Sentence-Level Subjectivity and Sentiment Classification
Opinion Lexicon Generation
26.3 Feature-Based Sentiment Analysis
Feature Extraction
Opinion Orientation Identification
Basic Rules of Opinions
26.4 Sentiment Analysis of Comparative Sentences
Problem Definition
Comparative Sentence Identification
Object and Feature Extraction
Preferred Object Identification
26.5 Opinion Search and Retrieval
26.6 Opinion Spam and Utility of Opinions
Opinion Spam
Utility of Reviews
26.7 Conclusions
Acknowledgments
References
HANDBOOK OF NATURAL LANGUAGE PROCESSING SECOND EDITION
Chapman & Hall/CRC Machine Learning & Pattern Recognition Series

SERIES EDITORS
Ralf Herbrich and Thore Graepel
Microsoft Research Ltd., Cambridge, UK

AIMS AND SCOPE
This series reflects the latest advances and applications in machine learning and pattern recognition through the publication of a broad range of reference works, textbooks, and handbooks. The inclusion of concrete examples, applications, and methods is highly encouraged. The scope of the series includes, but is not limited to, titles in the areas of machine learning, pattern recognition, computational intelligence, robotics, computational/statistical learning theory, natural language processing, computer vision, game AI, game theory, neural networks, computational neuroscience, and other relevant topics, such as machine learning applied to bioinformatics or cognitive science, which might be proposed by potential contributors.

PUBLISHED TITLES
MACHINE LEARNING: An Algorithmic Perspective, Stephen Marsland
HANDBOOK OF NATURAL LANGUAGE PROCESSING, Second Edition, Nitin Indurkhya and Fred J. Damerau
Chapman & Hall/CRC Machine Learning & Pattern Recognition Series
HANDBOOK OF NATURAL LANGUAGE PROCESSING, SECOND EDITION
Edited by Nitin Indurkhya and Fred J. Damerau
Chapman & Hall/CRC
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2010 by Taylor and Francis Group, LLC
Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number-13: 978-1-4200-8593-8 (Ebook-PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com
To Fred Damerau
born December 25, 1931; died January 27, 2009

Some enduring publications:

Damerau, F. 1964. A technique for computer detection and correction of spelling errors. Commun. ACM 7, 3 (Mar. 1964), 171–176.
Damerau, F. 1971. Markov Models and Linguistic Theory: An Experimental Study of a Model for English. The Hague, the Netherlands: Mouton.
Damerau, F. 1985. Problems and some solutions in customization of natural language database front ends. ACM Trans. Inf. Syst. 3, 2 (Apr. 1985), 165–184.
Apté, C., Damerau, F., and Weiss, S. 1994. Automated learning of decision rules for text categorization. ACM Trans. Inf. Syst. 12, 3 (Jul. 1994), 233–251.
Weiss, S., Indurkhya, N., Zhang, T., and Damerau, F. 2005. Text Mining: Predictive Methods for Analyzing Unstructured Information. New York: Springer.
Contents

List of Figures .......................................................... ix
List of Tables ........................................................... xiii
Editors .................................................................. xv
Board of Reviewers ....................................................... xvii
Contributors ............................................................. xix
Preface .................................................................. xxi

PART I Classical Approaches
1 Classical Approaches to Natural Language Processing, Robert Dale ........ 3
2 Text Preprocessing, David D. Palmer ..................................... 9
3 Lexical Analysis, Andrew Hippisley ...................................... 31
4 Syntactic Parsing, Peter Ljunglöf and Mats Wirén ........................ 59
5 Semantic Analysis, Cliff Goddard and Andrea C. Schalley ................. 93
6 Natural Language Generation, David D. McDonald .......................... 121

PART II Empirical and Statistical Approaches
7 Corpus Creation, Richard Xiao ........................................... 147
8 Treebank Annotation, Eva Hajičová, Anne Abeillé, Jan Hajič, Jiří Mírovský, and Zdeňka Urešová ... 167
9 Fundamental Statistical Techniques, Tong Zhang .......................... 189
10 Part-of-Speech Tagging, Tunga Güngör ................................... 205
11 Statistical Parsing, Joakim Nivre ...................................... 237
12 Multiword Expressions, Timothy Baldwin and Su Nam Kim .................. 267
13 Normalized Web Distance and Word Similarity, Paul M.B. Vitányi and Rudi L. Cilibrasi ... 293
14 Word Sense Disambiguation, David Yarowsky .............................. 315
15 An Overview of Modern Speech Recognition, Xuedong Huang and Li Deng ... 339
16 Alignment, Dekai Wu .................................................... 367
17 Statistical Machine Translation, Abraham Ittycheriah ................... 409

PART III Applications
18 Chinese Machine Translation, Pascale Fung .............................. 425
19 Information Retrieval, Jacques Savoy and Eric Gaussier ................. 455
20 Question Answering, Diego Mollá-Aliod and José-Luis Vicedo ............. 485
21 Information Extraction, Jerry R. Hobbs and Ellen Riloff ................ 511
22 Report Generation, Leo Wanner .......................................... 533
23 Emerging Applications of Natural Language Generation in Information Visualization, Education, and Health Care, Barbara Di Eugenio and Nancy L. Green ... 557
24 Ontology Construction, Philipp Cimiano, Johanna Völker, and Paul Buitelaar ... 577
25 BioNLP: Biomedical Text Mining, K. Bretonnel Cohen ..................... 605
26 Sentiment Analysis and Subjectivity, Bing Liu .......................... 627

Index .................................................................... 667
List of Figures

Figure 1.1 The stages of analysis in processing natural language ......... 4
Figure 3.1 A spelling rule FST for glasses ............................... 34
Figure 3.2 A spelling rule FST for flies ................................. 34
Figure 3.3 An FST with symbol classes .................................... 36
Figure 3.4 Russian noun classes as an inheritance hierarchy .............. 48
Figure 4.1 Example grammar ............................................... 62
Figure 4.2 Syntax tree of the sentence "the old man a ship" .............. 62
Figure 4.3 CKY matrix after parsing the sentence "the old man a ship" .... 67
Figure 4.4 Final chart after bottom-up parsing of the sentence "the old man a ship." The dotted edges are inferred but useless ... 71
Figure 4.5 Final chart after top-down parsing of the sentence "the old man a ship." The dotted edges are inferred but useless ... 71
Figure 4.6 Example LR(0) table for the grammar in Figure 4.1 ............. 74
Figure 5.1 The lexical representation for the English verb build ......... 99
Figure 5.2 UER diagrammatic modeling for transitive verb wake up ......... 103
Figure 8.1 A scheme of annotation types (layers) ......................... 169
Figure 8.2 Example of a Penn treebank sentence ........................... 171
Figure 8.3 A simplified constituency-based tree structure for the sentence John wants to eat cakes ... 171
Figure 8.4 A simplified dependency-based tree structure for the sentence John wants to eat cakes ... 171
Figure 8.5 A sample tree from the PDT for the sentence: Česká opozice se nijak netají tím, že pokud se dostane k moci, nebude se deficitnímu rozpočtu nijak bránit. (English translation: The Czech opposition does not keep back that if they come into power, they will not oppose the deficit budget.) ... 173
Figure 8.6 A sample French tree. (English translation: It is understood that the public functions remain open to all the citizens.) ... 174
Figure 8.7 Example from the Tiger corpus: complex syntactic and semantic dependency annotation. (English translation: It develops and prints packaging materials and labels.) ... 174
Figure 9.1 Effect of regularization ...................................... 191
Figure 9.2 Margin and linear separating hyperplane ....................... 192
Figure 9.3 Multi-class linear classifier decision boundary ............... 193
Figure 9.4 Graphical representation of generative model .................. 195
Figure 9.5 Graphical representation of discriminative model .............. 196