Handbook of Natural Language Processing (自然语言处理手册.pdf)

Published: 2022-06-09 | Uploaded by: admin | Category: Manuals | File size: 8.34M | Format: pdf

Preview: pages 1 to 8 of 676.

Contents

List of Figures

List of Tables

Editors

Board of Reviewers

Contributors

Preface

Part I: Classical Approaches

1. Classical Approaches to Natural Language Processing

1.1 Context

1.2 The Classical Toolkit

Text Preprocessing

Lexical Analysis

Syntactic Parsing

Semantic Analysis

Natural Language Generation

1.3 Conclusions

Reference

2. Text Preprocessing

2.1 Introduction

2.2 Challenges of Text Preprocessing

Character-Set Dependence

Language Dependence

Corpus Dependence

Application Dependence

2.3 Tokenization

Tokenization in Space-Delimited Languages

Tokenization in Unsegmented Languages

2.4 Sentence Segmentation

Sentence Boundary Punctuation

The Importance of Context

Traditional Rule-Based Approaches

Robustness and Trainability

Trainable Algorithms

2.5 Conclusion

References

3. Lexical Analysis

3.1 Introduction

3.2 Finite State Morphonology

Closing Remarks on Finite State Morphonology

3.3 Finite State Morphology

Disjunctive Affixes, Inflectional Classes, and Exceptionality

Further Remarks on Finite State Lexical Analysis

3.4 "Difficult" Morphology and Lexical Analysis

Isomorphism Problems

Contiguity Problems

3.5 Paradigm-Based Lexical Analysis

Paradigmatic Relations and Generalization

The Role of Defaults

Paradigm-Based Accounts of Difficult Morphology

Further Remarks on Paradigm-Based Approaches

3.6 Concluding Remarks

Acknowledgments

References

4. Syntactic Parsing

4.1 Introduction

4.2 Background

Context-Free Grammars

Example Grammar

Syntax Trees

Other Grammar Formalisms

Basic Concepts in Parsing

4.3 The Cocke–Kasami–Younger Algorithm

Handling Unary Rules

Example Session

Handling Long Right-Hand Sides

4.4 Parsing as Deduction

Deduction Systems

The CKY Algorithm

Chart Parsing

Bottom-Up Left-Corner Parsing

Top-Down Earley-Style Parsing

Example Session

Dynamic Filtering

4.5 Implementing Deductive Parsing

Agenda-Driven Chart Parsing

Storing and Retrieving Parse Results

4.6 LR Parsing

The LR(0) Table

Deterministic LR Parsing

Generalized LR Parsing

Optimized GLR Parsing

4.7 Constraint-Based Grammars

Overview

Unification

Tabular Parsing with Unification

4.8 Issues in Parsing

Robustness

Disambiguation

Efficiency

4.9 Historical Notes and Outlook

Acknowledgments

References

5. Semantic Analysis

5.1 Basic Concepts and Issues in Natural Language Semantics

5.2 Theories and Approaches to Semantic Representation

Logical Approaches

Discourse Representation Theory

Pustejovsky's Generative Lexicon

Natural Semantic Metalanguage

Object-Oriented Semantics

5.3 Relational Issues in Lexical Semantics

Sense Relations and Ontologies

Roles

5.4 Fine-Grained Lexical-Semantic Analysis: Three Case Studies

Emotional Meanings: "Sadness" and "Worry" in English and Chinese

Ethnogeographical Categories: "Rivers" and "Creeks"

Functional Macro-Categories

5.5 Prospectus and "Hard Problems"

Acknowledgments

References

6. Natural Language Generation

6.1 Introduction

6.2 Examples of Generated Texts: From Complex to Simple and Back Again

Complex

Simple

Today

6.3 The Components of a Generator

Components and Levels of Representation

6.4 Approaches to Text Planning

The Function of the Speaker

Desiderata for Text Planning

Pushing vs. Pulling

Planning by Progressive Refinement of the Speaker's Message

Planning Using Rhetorical Operators

Text Schemas

6.5 The Linguistic Component

Surface Realization Components

Relationship to Linguistic Theory

Chunk Size

Assembling vs. Navigating

Systemic Grammars

Functional Unification Grammars

6.6 The Cutting Edge

Story Generation

Personality-Sensitive Generation

6.7 Conclusions

References

Part II: Empirical and Statistical Approaches

7. Corpus Creation

7.1 Introduction

7.2 Corpus Size

7.3 Balance, Representativeness, and Sampling

7.4 Data Capture and Copyright

7.5 Corpus Markup and Annotation

7.6 Multilingual Corpora

7.7 Multimodal Corpora

7.8 Conclusions

References

8. Treebank Annotation

8.1 Introduction

8.2 Corpus Annotation Types

8.3 Morphosyntactic Annotation

8.4 Treebanks: Syntactic, Semantic, and Discourse Annotation

Motivation and Definition

An Example: The Penn Treebank

Annotation and Linguistic Theory

Going Beyond the Surface Shape of the Sentence

8.5 The Process of Building Treebanks

8.6 Applications of Treebanks

8.7 Searching Treebanks

8.8 Conclusions

Acknowledgments

References

9. Fundamental Statistical Techniques

9.1 Binary Linear Classification

9.2 One-versus-All Method for Multi-Category Classification

9.3 Maximum Likelihood Estimation

9.4 Generative and Discriminative Models

Naive Bayes

Logistic Regression

9.5 Mixture Model and EM

9.6 Sequence Prediction Models

Hidden Markov Model

Local Discriminative Model for Sequence Prediction

Global Discriminative Model for Sequence Prediction

References

10. Part-of-Speech Tagging

10.1 Introduction

Parts of Speech

Part-of-Speech Problem

10.2 The General Framework

10.3 Part-of-Speech Tagging Approaches

Rule-Based Approaches

Markov Model Approaches

Maximum Entropy Approaches

10.4 Other Statistical and Machine Learning Approaches

Methods and Relevant Work

Combining Taggers

10.5 POS Tagging in Languages Other Than English

Chinese

Korean

Other Languages

10.6 Conclusion

References

11. Statistical Parsing

11.1 Introduction

11.2 Basic Concepts and Terminology

Syntactic Representations

Statistical Parsing Models

Parser Evaluation

11.3 Probabilistic Context-Free Grammars

Basic Definitions

PCFGs as Statistical Parsing Models

Learning and Inference

11.4 Generative Models

History-Based Models

PCFG Transformations

Data-Oriented Parsing

11.5 Discriminative Models

Local Discriminative Models

Global Discriminative Models

11.6 Beyond Supervised Parsing

Weakly Supervised Parsing

Unsupervised Parsing

11.7 Summary and Conclusions

Acknowledgments

References

12. Multiword Expressions

12.1 Introduction

12.2 Linguistic Properties of MWEs

Idiomaticity

Other Properties of MWEs

Testing an Expression for MWEhood

Collocations and MWEs

A Word on Terminology and Related Fields

12.3 Types of MWEs

Nominal MWEs

Verbal MWEs

Prepositional MWEs

12.4 MWE Classification

12.5 Research Issues

Identification

Extraction

Internal Syntactic Disambiguation

MWE Interpretation

12.6 Summary

Acknowledgments

References

13. Normalized Web Distance and Word Similarity

13.1 Introduction

13.2 Some Methods for Word Similarity

Association Measures

Attributes

Relational Word Similarity

Latent Semantic Analysis

13.3 Background of the NWD Method

13.4 Brief Introduction to Kolmogorov Complexity

13.5 Information Distance

Normalized Information Distance

Normalized Compression Distance

13.6 Word Similarity: Normalized Web Distance

13.7 Applications and Experiments

Hierarchical Clustering

Classification

Matching the Meaning

Systematic Comparison with WordNet Semantics

13.8 Conclusion

References

14. Word Sense Disambiguation

14.1 Introduction

14.2 Word Sense Inventories and Problem Characteristics

Treatment of Part of Speech

Sources of Sense Inventories

Granularity of Sense Partitions

Hierarchical vs. Flat Sense Partitions

Idioms and Specialized Collocational Meanings

Regular Polysemy

Related Problems

14.3 Applications of Word Sense Disambiguation

Applications in Information Retrieval

Applications in Machine Translation

Other Applications

14.4 Early Approaches to Sense Disambiguation

Bar-Hillel: An Early Perspective on WSD

Early AI Systems: Word Experts

Dictionary-Based Methods

Kelly and Stone: An Early Corpus-Based Approach

14.5 Supervised Approaches to Sense Disambiguation

Training Data for Supervised WSD Algorithms

Features for WSD Algorithms

Supervised WSD Algorithms

14.6 Lightly Supervised Approaches to WSD

WSD via Word-Class Disambiguation

WSD via Monosemous Relatives

Hierarchical Class Models Using Selectional Restriction

Graph-Based Algorithms for WSD

Iterative Bootstrapping Algorithms

14.7 Unsupervised WSD and Sense Discovery

14.8 Conclusion

References

15. An Overview of Modern Speech Recognition

15.1 Introduction

15.2 Major Architectural Components

Acoustic Models

Language Models

Decoding

15.3 Major Historical Developments in Speech Recognition

15.4 Speech-Recognition Applications

IVR Applications

Appliance—"Response Point"

Mobile Applications

15.5 Technical Challenges and Future Research Directions

Robustness against Acoustic Environments and a Multitude of Other Factors

Capitalizing on Data Deluge for Speech Recognition

Self-Learning and Adaptation for Speech Recognition

Developing Speech Recognizers beyond the Language Barrier

Detection of Unknown Events in Speech Recognition

Learning from Human Speech Perception and Production

Capitalizing on New Trends in Computational Architectures for Speech Recognition

Embedding Knowledge and Parallelism into Speech-Recognition Decoding

15.6 Summary

References

16. Alignment

16.1 Introduction

16.2 Definitions and Concepts

Alignment

Constraints and Correlations

Classes of Algorithms

16.3 Sentence Alignment

Length-Based Sentence Alignment

Lexical Sentence Alignment

Cognate-Based Sentence Alignment

Multifeature Sentence Alignment

Comments on Sentence Alignment

16.4 Character, Word, and Phrase Alignment

Monotonic Alignment for Words

Non-Monotonic Alignment for Single-Token Words

Non-Monotonic Alignment for Multitoken Words and Phrases

16.5 Structure and Tree Alignment

Cost Functions

Algorithms

Strengths and Weaknesses of Structure and Tree Alignment Techniques

16.6 Biparsing and ITG Tree Alignment

Syntax-Directed Transduction Grammars (or Synchronous CFGs)

Inversion Transduction Grammars

Cost Functions

Algorithms

Grammars for Biparsing

Strengths and Weaknesses of Biparsing and ITG Tree Alignment Techniques

16.7 Conclusion

Acknowledgments

References

17. Statistical Machine Translation

17.1 Introduction

17.2 Approaches

17.3 Language Models

17.4 Parallel Corpora

17.5 Word Alignment

17.6 Phrase Library

17.7 Translation Models

IBM Models

Phrase-Based Systems

Syntax-Based Systems for Machine Translation

Direct Translation Models

17.8 Search Strategies

17.9 Research Areas

Acknowledgment

References

Part III: Applications

18. Chinese Machine Translation

18.1 Introduction

18.2 Preprocessing—What Is a Chinese Word?

The Maximum Entropy Framework for Word Segmentation

Translation-Driven Word Segmentation

18.3 Phrase-Based SMT—From Words to Phrases

18.4 Example-Based MT—Translation by Analogy

18.5 Syntax-Based MT—Structural Transfer

18.6 Semantics-Based SMT and Interlingua

Word Sense Translation

Semantic Role Labels

18.7 Applications

Chinese Term and Named Entity Translation

Chinese Spoken Language Translation

Crosslingual Information Retrieval Using Machine Translation

18.8 Conclusion and Discussion

Acknowledgments

References

19. Information Retrieval

19.1 Introduction

19.2 Indexing

Indexing Dimensions

Indexing Process

19.3 IR Models

Classical Boolean Model

Vector-Space Models

Probabilistic Models

Query Expansion and Relevance Feedback

Advanced Models

19.4 Evaluation and Failure Analysis

Evaluation Campaigns

Evaluation Measures

Failure Analysis

19.5 Natural Language Processing and Information Retrieval

Morphology

Orthographic Variation and Spelling Errors

Syntax

Semantics

Related Applications

19.6 Conclusion

Acknowledgments

References

20. Question Answering

20.1 Introduction

20.2 Historical Context

20.3 A Generic Question-Answering System

Question Analysis

Document or Passage Selection

Answer Extraction

Variations of the General Architecture

20.4 Evaluation of QA Systems

Evolution of the TREC QA Track

Evaluation Metrics

20.5 Multilinguality in Question Answering

20.6 Question Answering on Restricted Domains

20.7 Recent Trends and Related Work

Acknowledgment

References

21. Information Extraction

21.1 Introduction

21.2 Diversity of IE Tasks

Unstructured vs. Semi-Structured Text

Single-Document vs. Multi-Document IE

Assumptions about Incoming Documents

21.3 IE with Cascaded Finite-State Transducers

Complex Words

Basic Phrases

Complex Phrases

Domain Events

Template Generation: Merging Structures

21.4 Learning-Based Approaches to IE

Supervised Learning of Extraction Patterns and Rules

Supervised Learning of Sequential Classifier Models

Weakly Supervised and Unsupervised Approaches

Discourse-Oriented Approaches to IE

21.5 How Good Is Information Extraction?

Acknowledgments

References

22. Report Generation

22.1 Introduction

22.2 What Makes Report Generation a Distinct Task?

What Makes a Text a Report?

Report as Text Genre

Characteristic Features of Report Generation

22.3 What Does Report Generation Start From?

Data and Knowledge Sources

Data Assessment and Interpretation

22.4 Text Planning for Report Generation

Content Selection

Discourse Planning

22.5 Linguistic Realization for Report Generation

Input and Levels of Linguistic Representation

Tasks of Linguistic Realization

22.6 Sample Report Generators

22.7 Evaluation in Report Generation

22.8 Conclusions: The Present and the Future of RG

Acknowledgments

References

23. Emerging Applications of Natural Language Generation in Information Visualization, Education, and Health Care

23.1 Introduction

23.2 Multimedia Presentation Generation

23.3 Language Interfaces for Intelligent Tutoring Systems

CIRCSIM-Tutor

AUTOTUTOR

ATLAS-ANDES, WHY2-ATLAS, and WHY2-AUTOTUTOR

Briefly Noted

23.4 Argumentation for Health-Care Consumers

Acknowledgments

References

24. Ontology Construction

24.1 Introduction

24.2 Ontology and Ontologies

Anatomy of an Ontology

Types of Ontologies

Ontology Languages

Ontologies and Natural Language

24.3 Ontology Engineering

Principles

Methodologies

24.4 Ontology Learning

State of the Art

24.5 Summary

Acknowledgments

References

25. BioNLP: Biomedical Text Mining

25.1 Introduction

25.2 The Two Basic Domains of BioNLP: Medical and Biological

Medical/Clinical NLP

Biological/Genomic NLP

Model Organism Databases, and the Database Curator as a Canonical User

BioNLP in the Context of Natural Language Processing and Computational Linguistics

25.3 Just Enough Biology

25.4 Biomedical Text Mining

Named Entity Recognition

Named Entity Normalization

Abbreviation Disambiguation

25.5 Getting Up to Speed: Ten Papers and Resources That Will Let You Read Most Other Work in BioNLP

Named Entity Recognition 1: KeX

Named Entity Recognition 2: Collier, Nobata, and GENIA

Information Extraction 1: Blaschke et al. (1999)

Information Extraction 2: Craven and Kumlein (1999)

Information Extraction 3: MedLEE, BioMedLEE, and GENIES

Corpora 1: PubMed/MEDLINE

Corpora 2: GENIA

Lexical Resources 1: The Gene Ontology

Lexical Resources 2: Entrez Gene

Lexical Resources 3: Unified Medical Language System

25.6 Tools

Tokenization

Named Entity Recognition (Genes/Proteins)

Named Entity Recognition (Medical Concepts)

Full Parsers

25.7 Special Considerations in BioNLP

Code Coverage, Testing, and Corpora

User Interface Design

Portability

Proposition Banks and Semantic Role Labeling in BioNLP

The Difference between an Application That Will Relieve Pain and Suffering and an Application That Will Get You Published at ACL

Acknowledgments

References

26. Sentiment Analysis and Subjectivity

26.1 The Problem of Sentiment Analysis

26.2 Sentiment and Subjectivity Classification

Document-Level Sentiment Classification

Sentence-Level Subjectivity and Sentiment Classification

Opinion Lexicon Generation

26.3 Feature-Based Sentiment Analysis

Feature Extraction

Opinion Orientation Identification

Basic Rules of Opinions

26.4 Sentiment Analysis of Comparative Sentences

Problem Definition

Comparative Sentence Identification

Object and Feature Extraction

Preferred Object Identification

26.5 Opinion Search and Retrieval

26.6 Opinion Spam and Utility of Opinions

Opinion Spam

Utility of Reviews

26.7 Conclusions

Acknowledgments

References

HANDBOOK OF NATURAL LANGUAGE PROCESSING SECOND EDITION

Chapman & Hall/CRC Machine Learning & Pattern Recognition Series

SERIES EDITORS

Ralf Herbrich and Thore Graepel
Microsoft Research Ltd.
Cambridge, UK

AIMS AND SCOPE

This series reflects the latest advances and applications in machine learning and pattern recognition through the publication of a broad range of reference works, textbooks, and handbooks. The inclusion of concrete examples, applications, and methods is highly encouraged. The scope of the series includes, but is not limited to, titles in the areas of machine learning, pattern recognition, computational intelligence, robotics, computational/statistical learning theory, natural language processing, computer vision, game AI, game theory, neural networks, computational neuroscience, and other relevant topics, such as machine learning applied to bioinformatics or cognitive science, which might be proposed by potential contributors.

PUBLISHED TITLES

MACHINE LEARNING: An Algorithmic Perspective
Stephen Marsland

HANDBOOK OF NATURAL LANGUAGE PROCESSING, Second Edition
Nitin Indurkhya and Fred J. Damerau

Chapman & Hall/CRC Machine Learning & Pattern Recognition Series

HANDBOOK OF NATURAL LANGUAGE PROCESSING
SECOND EDITION

Edited by
NITIN INDURKHYA
FRED J. DAMERAU

Chapman & Hall/CRC
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2010 by Taylor and Francis Group, LLC
Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number-13: 978-1-4200-8593-8 (Ebook-PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

To Fred Damerau
born December 25, 1931; died January 27, 2009

Some enduring publications:

Damerau, F. 1964. A technique for computer detection and correction of spelling errors. Commun. ACM 7, 3 (Mar. 1964), 171–176.

Damerau, F. 1971. Markov Models and Linguistic Theory: An Experimental Study of a Model for English. The Hague, the Netherlands: Mouton.

Damerau, F. 1985. Problems and some solutions in customization of natural language database front ends. ACM Trans. Inf. Syst. 3, 2 (Apr. 1985), 165–184.

Apté, C., Damerau, F., and Weiss, S. 1994. Automated learning of decision rules for text categorization. ACM Trans. Inf. Syst. 12, 3 (Jul. 1994), 233–251.

Weiss, S., Indurkhya, N., Zhang, T., and Damerau, F. 2005. Text Mining: Predictive Methods for Analyzing Unstructured Information. New York: Springer.

Contents (with chapter authors)

List of Figures
List of Tables
Editors
Board of Reviewers
Contributors
Preface

Part I: Classical Approaches

1. Classical Approaches to Natural Language Processing, Robert Dale
2. Text Preprocessing, David D. Palmer
3. Lexical Analysis, Andrew Hippisley
4. Syntactic Parsing, Peter Ljunglöf and Mats Wirén
5. Semantic Analysis, Cliff Goddard and Andrea C. Schalley
6. Natural Language Generation, David D. McDonald

Part II: Empirical and Statistical Approaches

7. Corpus Creation, Richard Xiao
8. Treebank Annotation, Eva Hajičová, Anne Abeillé, Jan Hajič, Jiří Mírovský, and Zdeňka Urešová
9. Fundamental Statistical Techniques, Tong Zhang
10. Part-of-Speech Tagging, Tunga Güngör
11. Statistical Parsing, Joakim Nivre
12. Multiword Expressions, Timothy Baldwin and Su Nam Kim
13. Normalized Web Distance and Word Similarity, Paul M.B. Vitányi and Rudi L. Cilibrasi
14. Word Sense Disambiguation, David Yarowsky
15. An Overview of Modern Speech Recognition, Xuedong Huang and Li Deng
16. Alignment, Dekai Wu
17. Statistical Machine Translation, Abraham Ittycheriah

Part III: Applications

18. Chinese Machine Translation, Pascale Fung
19. Information Retrieval, Jacques Savoy and Eric Gaussier
20. Question Answering, Diego Mollá-Aliod and José-Luis Vicedo
21. Information Extraction, Jerry R. Hobbs and Ellen Riloff
22. Report Generation, Leo Wanner
23. Emerging Applications of Natural Language Generation in Information Visualization, Education, and Health Care, Barbara Di Eugenio and Nancy L. Green
24. Ontology Construction, Philipp Cimiano, Johanna Völker, and Paul Buitelaar
25. BioNLP: Biomedical Text Mining, K. Bretonnel Cohen
26. Sentiment Analysis and Subjectivity, Bing Liu

Index

List of Figures

Figure 1.1 The stages of analysis in processing natural language
Figure 3.1 A spelling rule FST for glasses
Figure 3.2 A spelling rule FST for flies
Figure 3.3 An FST with symbol classes
Figure 3.4 Russian nouns classes as an inheritance hierarchy
Figure 4.1 Example grammar
Figure 4.2 Syntax tree of the sentence “the old man a ship”
Figure 4.3 CKY matrix after parsing the sentence “the old man a ship”
Figure 4.4 Final chart after bottom-up parsing of the sentence “the old man a ship.” The dotted edges are inferred but useless.
Figure 4.5 Final chart after top-down parsing of the sentence “the old man a ship.” The dotted edges are inferred but useless.
Figure 4.6 Example LR(0) table for the grammar in Figure 4.1
Figure 5.1 The lexical representation for the English verb build
Figure 5.2 UER diagrammatic modeling for transitive verb wake up
Figure 8.1 A scheme of annotation types (layers)
Figure 8.2 Example of a Penn treebank sentence
Figure 8.3 A simplified constituency-based tree structure for the sentence John wants to eat cakes
Figure 8.4 A simplified dependency-based tree structure for the sentence John wants to eat cakes
Figure 8.5 A sample tree from the PDT for the sentence: Česká opozice se nijak netají tím, že pokud se dostane k moci, nebude se deficitnímu rozpočtu nijak bránit. (lit.: Czech opposition Refl. in-no-way keeps-back the-fact that in-so-far-as [it] will-come into power, [it] will-not Refl. deficit budget in-no-way oppose. English translation: The Czech opposition does not keep back that if they come into power, they will not oppose the deficit budget.)
Figure 8.6 A sample French tree. (English translation: It is understood that the public functions remain open to all the citizens.)
Figure 8.7 Example from the Tiger corpus: complex syntactic and semantic dependency annotation. (English translation: It develops and prints packaging materials and labels.)
Figure 9.1 Effect of regularization
Figure 9.2 Margin and linear separating hyperplane
Figure 9.3 Multi-class linear classifier decision boundary
Figure 9.4 Graphical representation of generative model
Figure 9.5 Graphical representation of discriminative model
