Natural Language Processing with Python

Table of Contents
Preface
Audience
Emphasis
What You Will Learn
Organization
Why Python?
Software Requirements
Natural Language Toolkit (NLTK)
For Instructors
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
Acknowledgments
Royalties
Chapter 1. Language Processing and Python
1.1  Computing with Language: Texts and Words
Getting Started with Python
Getting Started with NLTK
Searching Text
Counting Vocabulary
1.2  A Closer Look at Python: Texts as Lists of Words
Lists
Indexing Lists
Variables
Strings
1.3  Computing with Language: Simple Statistics
Frequency Distributions
Fine-Grained Selection of Words
Collocations and Bigrams
Counting Other Things
1.4  Back to Python: Making Decisions and Taking Control
Conditionals
Operating on Every Element
Nested Code Blocks
Looping with Conditions
1.5  Automatic Natural Language Understanding
Word Sense Disambiguation
Pronoun Resolution
Generating Language Output
Machine Translation
Spoken Dialogue Systems
Textual Entailment
Limitations of NLP
1.6  Summary
1.7  Further Reading
1.8  Exercises
Chapter 2. Accessing Text Corpora and Lexical Resources
2.1  Accessing Text Corpora
Gutenberg Corpus
Web and Chat Text
Brown Corpus
Reuters Corpus
Inaugural Address Corpus
Annotated Text Corpora
Corpora in Other Languages
Text Corpus Structure
Loading Your Own Corpus
2.2  Conditional Frequency Distributions
Conditions and Events
Counting Words by Genre
Plotting and Tabulating Distributions
Generating Random Text with Bigrams
2.3  More Python: Reusing Code
Creating Programs with a Text Editor
Functions
Modules
2.4  Lexical Resources
Wordlist Corpora
A Pronouncing Dictionary
Comparative Wordlists
Shoebox and Toolbox Lexicons
2.5  WordNet
Senses and Synonyms
The WordNet Hierarchy
More Lexical Relations
Semantic Similarity
2.6  Summary
2.7  Further Reading
2.8  Exercises
Chapter 3. Processing Raw Text
3.1  Accessing Text from the Web and from Disk
Electronic Books
Dealing with HTML
Processing Search Engine Results
Processing RSS Feeds
Reading Local Files
Extracting Text from PDF, MSWord, and Other Binary Formats
Capturing User Input
The NLP Pipeline
3.2  Strings: Text Processing at the Lowest Level
Basic Operations with Strings
Printing Strings
Accessing Individual Characters
Accessing Substrings
More Operations on Strings
The Difference Between Lists and Strings
3.3  Text Processing with Unicode
What Is Unicode?
Extracting Encoded Text from Files
Using Your Local Encoding in Python
3.4  Regular Expressions for Detecting Word Patterns
Using Basic Metacharacters
Ranges and Closures
3.5  Useful Applications of Regular Expressions
Extracting Word Pieces
Doing More with Word Pieces
Finding Word Stems
Searching Tokenized Text
3.6  Normalizing Text
Stemmers
Lemmatization
3.7  Regular Expressions for Tokenizing Text
Simple Approaches to Tokenization
NLTK’s Regular Expression Tokenizer
Further Issues with Tokenization
3.8  Segmentation
Sentence Segmentation
Word Segmentation
3.9  Formatting: From Lists to Strings
From Lists to Strings
Strings and Formats
Lining Things Up
Writing Results to a File
Text Wrapping
3.10  Summary
3.11  Further Reading
3.12  Exercises
Chapter 4. Writing Structured Programs
4.1  Back to the Basics
Assignment
Equality
Conditionals
4.2  Sequences
Operating on Sequence Types
Combining Different Sequence Types
Generator Expressions
4.3  Questions of Style
Python Coding Style
Procedural Versus Declarative Style
Some Legitimate Uses for Counters
4.4  Functions: The Foundation of Structured Programming
Function Inputs and Outputs
Parameter Passing
Variable Scope
Checking Parameter Types
Functional Decomposition
Documenting Functions
4.5  Doing More with Functions
Functions As Arguments
Accumulative Functions
Higher-Order Functions
Named Arguments
4.6  Program Development
Structure of a Python Module
Multimodule Programs
Sources of Error
Debugging Techniques
Defensive Programming
4.7  Algorithm Design
Recursion
Space-Time Trade-offs
Dynamic Programming
4.8  A Sample of Python Libraries
Matplotlib
NetworkX
csv
NumPy
Other Python Libraries
4.9  Summary
4.10  Further Reading
4.11  Exercises
Chapter 5. Categorizing and Tagging Words
5.1  Using a Tagger
5.2  Tagged Corpora
Representing Tagged Tokens
Reading Tagged Corpora
A Simplified Part-of-Speech Tagset
Nouns
Verbs
Adjectives and Adverbs
Unsimplified Tags
Exploring Tagged Corpora
5.3  Mapping Words to Properties Using Python Dictionaries
Indexing Lists Versus Dictionaries
Dictionaries in Python
Defining Dictionaries
Default Dictionaries
Incrementally Updating a Dictionary
Complex Keys and Values
Inverting a Dictionary
5.4  Automatic Tagging
The Default Tagger
The Regular Expression Tagger
The Lookup Tagger
Evaluation
5.5  N-Gram Tagging
Unigram Tagging
Separating the Training and Testing Data
General N-Gram Tagging
Combining Taggers
Tagging Unknown Words
Storing Taggers
Performance Limitations
Tagging Across Sentence Boundaries
5.6  Transformation-Based Tagging
5.7  How to Determine the Category of a Word
Morphological Clues
Syntactic Clues
Semantic Clues
New Words
Morphology in Part-of-Speech Tagsets
5.8  Summary
5.9  Further Reading
5.10  Exercises
Chapter 6. Learning to Classify Text
6.1  Supervised Classification
Gender Identification
Choosing the Right Features
Document Classification
Part-of-Speech Tagging
Exploiting Context
Sequence Classification
Other Methods for Sequence Classification
6.2  Further Examples of Supervised Classification
Sentence Segmentation
Identifying Dialogue Act Types
Recognizing Textual Entailment
Scaling Up to Large Datasets
6.3  Evaluation
The Test Set
Accuracy
Precision and Recall
Confusion Matrices
Cross-Validation
6.4  Decision Trees
Entropy and Information Gain
6.5  Naive Bayes Classifiers
Underlying Probabilistic Model
Zero Counts and Smoothing
Non-Binary Features
The Naivete of Independence
The Cause of Double-Counting
6.6  Maximum Entropy Classifiers
The Maximum Entropy Model
Maximizing Entropy
Generative Versus Conditional Classifiers
6.7  Modeling Linguistic Patterns
What Do Models Tell Us?
6.8  Summary
6.9  Further Reading
6.10  Exercises
Chapter 7. Extracting Information from Text
7.1  Information Extraction
Information Extraction Architecture
7.2  Chunking
Noun Phrase Chunking
Tag Patterns
Chunking with Regular Expressions
Exploring Text Corpora
Chinking
Representing Chunks: Tags Versus Trees
7.3  Developing and Evaluating Chunkers
Reading IOB Format and the CoNLL-2000 Chunking Corpus
Simple Evaluation and Baselines
Training Classifier-Based Chunkers
7.4  Recursion in Linguistic Structure
Building Nested Structure with Cascaded Chunkers
Trees
Tree Traversal
7.5  Named Entity Recognition
7.6  Relation Extraction
7.7  Summary
7.8  Further Reading
7.9  Exercises
Chapter 8. Analyzing Sentence Structure
8.1  Some Grammatical Dilemmas
Linguistic Data and Unlimited Possibilities
Ubiquitous Ambiguity
8.2  What’s the Use of Syntax?
Beyond n-grams
8.3  Context-Free Grammar
A Simple Grammar
Writing Your Own Grammars
Recursion in Syntactic Structure
8.4  Parsing with Context-Free Grammar
Recursive Descent Parsing
Shift-Reduce Parsing
The Left-Corner Parser
Well-Formed Substring Tables
8.5  Dependencies and Dependency Grammar
Valency and the Lexicon
Scaling Up
8.6  Grammar Development
Treebanks and Grammars
Pernicious Ambiguity
Weighted Grammar
8.7  Summary
8.8  Further Reading
8.9  Exercises
Chapter 9. Building Feature-Based Grammars
9.1  Grammatical Features
Syntactic Agreement
Using Attributes and Constraints
Terminology
9.2  Processing Feature Structures
Subsumption and Unification
9.3  Extending a Feature-Based Grammar
Subcategorization
Heads Revisited
Auxiliary Verbs and Inversion
Unbounded Dependency Constructions
Case and Gender in German
9.4  Summary
9.5  Further Reading
9.6  Exercises
Chapter 10. Analyzing the Meaning of Sentences
10.1  Natural Language Understanding
Querying a Database
Natural Language, Semantics, and Logic
10.2  Propositional Logic
10.3  First-Order Logic
Syntax
First-Order Theorem Proving
Summarizing the Language of First-Order Logic
Truth in Model
Individual Variables and Assignments
Quantification
Quantifier Scope Ambiguity
Model Building
10.4  The Semantics of English Sentences
Compositional Semantics in Feature-Based Grammar
The λ-Calculus
Quantified NPs
Transitive Verbs
Quantifier Ambiguity Revisited
10.5  Discourse Semantics
Discourse Representation Theory
Discourse Processing
10.6  Summary
10.7  Further Reading
10.8  Exercises
Chapter 11. Managing Linguistic Data
11.1  Corpus Structure: A Case Study
The Structure of TIMIT
Notable Design Features
Fundamental Data Types
11.2  The Life Cycle of a Corpus
Three Corpus Creation Scenarios
Quality Control
Curation Versus Evolution
11.3  Acquiring Data
Obtaining Data from the Web
Obtaining Data from Word Processor Files
Obtaining Data from Spreadsheets and Databases
Converting Data Formats
Deciding Which Layers of Annotation to Include
Standards and Tools
Special Considerations When Working with Endangered Languages
11.4  Working with XML
Using XML for Linguistic Structures
The Role of XML
The ElementTree Interface
Using ElementTree for Accessing Toolbox Data
Formatting Entries
11.5  Working with Toolbox Data
Adding a Field to Each Entry
Validating a Toolbox Lexicon
11.6  Describing Language Resources Using OLAC Metadata
What Is Metadata?
OLAC: Open Language Archives Community
11.7  Summary
11.8  Further Reading
11.9  Exercises
Afterword: The Language Challenge
Language Processing Versus Symbol Processing
Contemporary Philosophical Divides
NLTK Roadmap
Envoi...
Bibliography
NLTK Index
General Index
Natural Language Processing with Python
Natural Language Processing with Python

Steven Bird, Ewan Klein, and Edward Loper

Beijing • Cambridge • Farnham • Köln • Sebastopol • Taipei • Tokyo
Natural Language Processing with Python
by Steven Bird, Ewan Klein, and Edward Loper

Copyright © 2009 Steven Bird, Ewan Klein, and Edward Loper. All rights reserved.
Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Editor: Julie Steele
Production Editor: Loranah Dimant
Copyeditor: Genevieve d’Entremont
Proofreader: Loranah Dimant
Indexer: Ellen Troutman Zaig
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano

Printing History:
June 2009: First Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Natural Language Processing with Python, the image of a right whale, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-0-596-51649-9
Table of Contents

Preface  ix

1. Language Processing and Python  1
   1.1 Computing with Language: Texts and Words  1
   1.2 A Closer Look at Python: Texts as Lists of Words  10
   1.3 Computing with Language: Simple Statistics  16
   1.4 Back to Python: Making Decisions and Taking Control  22
   1.5 Automatic Natural Language Understanding  27
   1.6 Summary  33
   1.7 Further Reading  34
   1.8 Exercises  35

2. Accessing Text Corpora and Lexical Resources  39
   2.1 Accessing Text Corpora  39
   2.2 Conditional Frequency Distributions  52
   2.3 More Python: Reusing Code  56
   2.4 Lexical Resources  59
   2.5 WordNet  67
   2.6 Summary  73
   2.7 Further Reading  73
   2.8 Exercises  74

3. Processing Raw Text  79
   3.1 Accessing Text from the Web and from Disk  80
   3.2 Strings: Text Processing at the Lowest Level  87
   3.3 Text Processing with Unicode  93
   3.4 Regular Expressions for Detecting Word Patterns  97
   3.5 Useful Applications of Regular Expressions  102
   3.6 Normalizing Text  107
   3.7 Regular Expressions for Tokenizing Text  109
   3.8 Segmentation  112
   3.9 Formatting: From Lists to Strings  116
   3.10 Summary  121
   3.11 Further Reading  122
   3.12 Exercises  123

4. Writing Structured Programs  129
   4.1 Back to the Basics  130
   4.2 Sequences  133
   4.3 Questions of Style  138
   4.4 Functions: The Foundation of Structured Programming  142
   4.5 Doing More with Functions  149
   4.6 Program Development  154
   4.7 Algorithm Design  160
   4.8 A Sample of Python Libraries  167
   4.9 Summary  172
   4.10 Further Reading  173
   4.11 Exercises  173

5. Categorizing and Tagging Words  179
   5.1 Using a Tagger  179
   5.2 Tagged Corpora  181
   5.3 Mapping Words to Properties Using Python Dictionaries  189
   5.4 Automatic Tagging  198
   5.5 N-Gram Tagging  202
   5.6 Transformation-Based Tagging  208
   5.7 How to Determine the Category of a Word  210
   5.8 Summary  213
   5.9 Further Reading  214
   5.10 Exercises  215

6. Learning to Classify Text  221
   6.1 Supervised Classification  221
   6.2 Further Examples of Supervised Classification  233
   6.3 Evaluation  237
   6.4 Decision Trees  242
   6.5 Naive Bayes Classifiers  245
   6.6 Maximum Entropy Classifiers  250
   6.7 Modeling Linguistic Patterns  254
   6.8 Summary  256
   6.9 Further Reading  256
   6.10 Exercises  257

7. Extracting Information from Text  261
   7.1 Information Extraction  261