PARSING TECHNIQUES --A Pratical Guide .pdf

发布时间：2022-05-30 发布人：admin 分类：说明书资料大小：3.85M 资料格式：pdf 举报版权申诉

yanmiao0715-10036168-4744302543413317531.pdf-第1页.png

第1页 / 共677页

yanmiao0715-10036168-4744302543413317531.pdf-第2页.png

第2页 / 共677页

yanmiao0715-10036168-4744302543413317531.pdf-第3页.png

第3页 / 共677页

yanmiao0715-10036168-4744302543413317531.pdf-第4页.png

第4页 / 共677页

yanmiao0715-10036168-4744302543413317531.pdf-第5页.png

第5页 / 共677页

yanmiao0715-10036168-4744302543413317531.pdf-第6页.png

第6页 / 共677页

yanmiao0715-10036168-4744302543413317531.pdf-第7页.png

第7页 / 共677页

yanmiao0715-10036168-4744302543413317531.pdf-第8页.png

第8页 / 共677页

Contents

Preface to the Second Edition

Preface to the First Edition

1. Introduction

1.1 Parsing as a Craft

1.2 The Approach Used

1.3 Outline of the Contents

1.4 The Annotated Bibliography

2. Grammars as a Generating Device

2.1 Languages as Infinite Sets

2.1.1 Language

2.1.2 Grammars

2.1.3 Problems with Infinite Sets

2.1.4 Describing a Language through a Finite Recipe

2.2 Formal Grammars

2.2.1 The Formalism of Formal Grammars

2.2.2 Generating Sentences from a Formal Grammar

2.2.3 The Expressive Power of Formal Grammars

2.3 The Chomsky Hierarchy of Grammars and Languages

2.3.1 Type 1 Grammars

2.3.2 Type 2 Grammars

2.3.3 Type 3 Grammars

2.3.4 Type 4 Grammars

2.3.5 Conclusion

2.4 Actually Generating Sentences froma Grammar

2.4.1 The Phrase-Structure Case

2.4.2 The CS Case

2.4.3 The CF Case

2.5 To Shrink or Not To Shrink

2.6 Grammars that Produce the Empty Language

2.7 The Limitations of CF and FS Grammars

2.7.1 The uvwxy Theorem

2.7.2 The uvw Theorem

2.8 CF and FS Grammars as Transition Graphs

2.9 Hygiene in Context-Free Grammars

2.9.1 Undefined Non-Terminals

2.9.2 Unreachable Non-Terminals

2.9.3 Non-Productive Rules and Non-Terminals

2.9.4 Loops

2.9.5 Cleaning up a Context-Free Grammar

2.10 Set Properties of Context-Free and Regular Languages

2.11 The Semantic Connection

2.11.1 Attribute Grammars

2.11.2 Transduction Grammars

2.11.3 Augmented Transition Networks

2.12 A Metaphorical Comparison of Grammar Types

2.13 Conclusion

3. Introduction to Parsing

3.1 The Parse Tree

3.1.1 The Size of a Parse Tree

3.1.2 Various Kinds of Ambiguity

3.1.3 Linearization of the Parse Tree

3.2 Two Ways to Parse a Sentence

3.2.1 Top-Down Parsing

3.2.2 Bottom-Up Parsing

3.2.3 Applicability

3.3 Non-Deterministic Automata

3.3.1 Constructing the NDA

3.3.2 Constructing the Control Mechanism

3.4 Recognition and Parsing for Type 0 to Type 4 Grammars

3.4.1 Time Requirements

3.4.2 Type 0 and Type 1 Grammars

3.4.3 Type 2 Grammars

3.4.4 Type 3 Grammars

3.4.5 Type 4 Grammars

3.5 An Overview of Context-Free Parsing Methods

3.5.1 Directionality

3.5.2 Search Techniques

3.5.3 General Directional Methods

3.5.4 Linear Methods

3.5.5 Deterministic Top-Down and Bottom-Up Methods

3.5.6 Non-Canonical Methods

3.5.7 Generalized Linear Methods

3.5.8 Conclusion

3.6 The "Strength" of a Parsing Technique

3.7 Representations of Parse Trees

3.7.1 Parse Trees in the Producer-Consumer Model

3.7.2 Parse Trees in the Data Structure Model

3.7.3 Parse Forests

3.7.4 Parse-Forest Grammars

3.8 When are we done Parsing?

3.9 Transitive Closure

3.10 The Relation between Parsing and Boolean Matrix Multiplication

3.11 Conclusion

4. General Non-Directional Parsing

4.1 Unger's Parsing Method

4.1.1 Unger's Method without ε-Rules or Loops

4.1.2 Unger's Method with ε-Rules

4.1.3 Getting Parse-Forest Grammars from Unger Parsing

4.2 The CYK Parsing Method

4.2.1 CYK Recognition with General CF Grammars

4.2.2 CYK Recognition with a Grammar in Chomsky Normal Form

4.2.3 Transforming a CF Grammar into Chomsky Normal Form

4.2.4 The Example Revisited

4.2.5 CYK Parsing with Chomsky Normal Form

4.2.6 Undoing the Effect of the CNF Transformation

4.2.7 A Short Retrospective of CYK

4.2.8 Getting Parse-Forest Grammars from CYK Parsing

4.3 Tabular Parsing

4.3.1 Top-Down Tabular Parsing

4.3.2 Bottom-Up Tabular Parsing

4.4 Conclusion

5. Regular Grammars and Finite-State Automata

5.1 Applications of Regular Grammars

5.1.1 Regular Languages in CF Parsing

5.1.2 Systems with Finite Memory

5.1.3 Pattern Searching

5.1.4 SGML and XML Validation

5.2 Producing from a Regular Grammar

5.3 Parsing with a Regular Grammar

5.3.1 Replacing Sets by States

5.3.2 ε-Transitions and Non-Standard Notation

5.4 Manipulating Regular Grammars and Regular Expressions

5.4.1 Regular Grammars from Regular Expressions

5.4.2 Regular Expressions from Regular Grammars

5.5 Manipulating Regular Languages

5.6 Left-Regular Grammars

5.7 Minimizing Finite-State Automata

5.8 Top-Down Regular Expression Recognition

5.8.1 The Recognizer

5.8.2 Evaluation

5.9 Semantics in FS Systems

5.10 Fast Text Search Using Finite-State Automata

5.11 Conclusion

6. General Directional Top-Down Parsing

6.1 Imitating Leftmost Derivations

6.2 The Pushdown Automaton

6.3 Breadth-First Top-Down Parsing

6.3.1 An Example

6.3.2 A Counterexample: Left Recursion

6.4 Eliminating Left Recursion

6.5 Depth-First (Backtracking) Parsers

6.6 Recursive Descent

6.6.1 A Naive Approach

6.6.2 Exhaustive Backtracking Recursive Descent

6.6.3 Breadth-First Recursive Descent

6.7 Definite Clause Grammars

6.7.1 Prolog

6.7.2 The DCG Format

6.7.3 Getting Parse Tree Information

6.7.4 Running Definite Clause Grammar Programs

6.8 Cancellation Parsing

6.8.1 Cancellation Sets

6.8.2 The Transformation Scheme

6.8.3 Cancellation Parsing with ε-Rules

6.9 Conclusion

7. General Directional Bottom-Up Parsing

7.1 Parsing by Searching

7.1.1 Depth-First (Backtracking) Parsing

7.1.2 Breadth-First (On-Line) Parsing

7.1.3 A Combined Representation

7.1.4 A Slightly More Realistic Example

7.2 The Earley Parser

7.2.1 The Basic Earley Parser

7.2.2 The Relation between the Earley and CYK Algorithms

7.2.3 Handling ε-Rules

7.2.4 Exploiting Look-Ahead

7.2.5 Left and Right Recursion

7.3 Chart Parsing

7.3.1 Inference Rules

7.3.2 A Transitive Closure Algorithm

7.3.3 Completion

7.3.4 Bottom-Up (Actually Left-Corner)

7.3.5 The Agenda

7.3.6 Top-Down

7.3.7 Conclusion

7.4 Conclusion

8. Deterministic Top-Down Parsing

8.1 Replacing Search by Table Look-Up

8.2 LL(1) Parsing

8.2.1 LL(1) Parsing without ε-Rules

8.2.2 LL(1) Parsing with ε-Rules

8.2.3 LL(1) versus Strong-LL(1)

8.2.4 Full LL(1) Parsing

8.2.5 Solving LL(1) Conflicts

8.2.6 LL(1) and Recursive Descent

8.3 Increasing the Power of Deterministic LL Parsing

8.3.1 LL(k) Grammars

8.3.2 Linear-Approximate LL(k)

8.3.3 LL-Regular

8.4 Getting a Parse Tree Grammar from LL(1) Parsing

8.5 Extended LL(1) Grammars

8.6 Conclusion

9. Deterministic Bottom-Up Parsing

9.1 Simple Handle-Finding Techniques

9.2 Precedence Parsing

9.2.1 Parenthesis Generators

9.2.2 Constructing the Operator-Precedence Table

9.2.3 Precedence Functions

9.2.4 Further Precedence Methods

9.3 Bounded-Right-Context Parsing

9.3.1 Bounded-Context Techniques

9.3.2 Floyd Productions

9.4 LR Methods

9.5 LR(0)

9.5.1 The LR(0) Automaton

9.5.2 Using the LR(0) Automaton

9.5.3 LR(0) Conflicts

9.5.4 ε-LR(0) Parsing

9.5.5 Practical LR Parse Table Construction

9.6 LR(1)

9.6.1 LR(1) with ε-Rules

9.6.2 LR(k > 1) Parsing

9.6.3 Some Properties of LR(k) Parsing

9.7 LALR(1)

9.7.1 Constructing the LALR(1) Parsing Tables

9.7.2 Identifying LALR(1) Conflicts

9.8 SLR(1)

9.9 Conflict Resolvers

9.10 Further Developments of LR Methods

9.10.1 Elimination of Unit Rules

9.10.2 Reducing the Stack Activity

9.10.3 Regular Right Part Grammars

9.10.4 Incremental Parsing

9.10.5 Incremental Parser Generation

9.10.6 Recursive Ascent

9.10.7 Regular Expressions of LR Languages

9.11 Getting a Parse Tree Grammar from LR Parsing

9.12 Left and Right Contexts of Parsing Decisions

9.12.1 The Left Context of a State

9.12.2 The Right Context of an Item

9.13 Exploiting the Left and Right Contexts

9.13.1 Discriminating-Reverse (DR) Parsing

9.13.2 LR-Regular

9.13.3 LAR(m) Parsing

9.14 LR(k) as an Ambiguity Test

9.15 Conclusion

10. Non-Canonical Parsers

10.1 Top-Down Non-Canonical Parsing

10.1.1 Left-Corner Parsing

10.1.2 Deterministic Cancellation Parsing

10.1.3 Partitioned LL

10.1.4 Discussion

10.2 Bottom-Up Non-Canonical Parsing

10.2.1 Total Precedence

10.2.2 NSLR(1)

10.2.3 LR(k,∞)

10.2.4 Partitioned LR

10.3 General Non-Canonical Parsing

10.4 Conclusion

11. Generalized Deterministic Parsers

11.1 Generalized LR Parsing

11.1.1 The Basic GLR Parsing Algorithm

11.1.2 Necessary Optimizations

11.1.3 Hidden Left Recursion and Loops

11.1.4 Extensions and Improvements

11.2 Generalized LL Parsing

11.2.1 Simple Generalized LL Parsing

11.2.2 Generalized LL Parsing with Left-Recursion

11.2.3 Generalized LL Parsing with ε-Rules

11.2.4 Generalized Cancellation and LC Parsing

11.3 Conclusion

12. Substring Parsing

12.1 The Suffix Grammar

12.2 General (Non-Linear) Methods

12.2.1 A Non-Directional Method

12.2.2 A Directional Method

12.3 Linear-Time Methods for LL and LR Grammars

12.3.1 Linear-Time Suffix Parsing for LL(1) Grammars

12.3.2 Linear-Time Suffix Parsing for LR(1) Grammars

12.3.3 Tabular Methods

12.3.4 Discussion

12.4 Conclusion

13. Parsing as Intersection

13.1 The Intersection Algorithm

13.1.1 The Rule Sets I[sub(rules)], I[sub(rough)], and I

13.1.2 The Languages of [sub(rules)], I[sub(rough)], and I

13.1.3 An Example: Parsing Arithmetic Expressions

13.2 The Parsing of FSAs

13.2.1 Unknown Tokens

13.2.2 Substring Parsing by Intersection

13.2.3 Filtering

13.3 Time and Space Requirements

13.4 Reducing the Intermediate Size: Earley's Algorithm on FSAs

13.5 Error Handling Using Intersection Parsing

13.6 Conclusion

14. Parallel Parsing

14.1 The Reasons for Parallel Parsing

14.2 Multiple Serial Parsers

14.3 Process-Configuration Parsers

14.3.1 A Parallel Bottom-up GLR Parser

14.3.2 Some Other Process-Configuration Parsers

14.4 Connectionist Parsers

14.4.1 Boolean Circuits

14.4.2 A CYK Recognizer on a Boolean Circuit

14.4.3 Rytter's Algorithm

14.5 Conclusion

15. Non-Chomsky Grammars and Their Parsers

15.1 The Unsuitability of Context-Sensitive Grammars

15.1.1 Understanding Context-Sensitive Grammars

15.1.2 Parsing with Context-Sensitive Grammars

15.1.3 Expressing Semantics in Context-Sensitive Grammars

15.1.4 Error Handling in Context-Sensitive Grammars

15.1.5 Alternatives

15.2 Two-Level Grammars

15.2.1 VW Grammars

15.2.2 Expressing Semantics in a VW Grammar

15.2.3 Parsing with VW Grammars

15.2.4 Error Handling in VW Grammars

15.2.5 Infinite Symbol Sets

15.3 Attribute and Affix Grammars

15.3.1 Attribute Grammars

15.3.2 Affix Grammars

15.4 Tree-Adjoining Grammars

15.4.1 Cross-Dependencies

15.4.2 Parsing with TAGs

15.5 Coupled Grammars

15.5.1 Parsing with Coupled Grammars

15.6 Ordered Grammars

15.6.1 Rule Ordering by Control Grammar

15.6.2 Parsing with Rule-Ordered Grammars

15.6.3 Marked Ordered Grammars

15.6.4 Parsing with Marked Ordered Grammars

15.7 Recognition Systems

15.7.1 Properties of a Recognition System

15.7.2 Implementing a Recognition System

15.7.3 Parsing with Recognition Systems

15.7.4 Expressing Semantics in Recognition Systems

15.7.5 Error Handling in Recognition Systems

15.8 Boolean Grammars

15.8.1 Expressing Context Checks in Boolean Grammars

15.8.2 Parsing with Boolean Grammars

15.8.3 §-Calculus

15.9 Conclusion

16. Error Handling

16.1 Detection versus Recovery versus Correction

16.2 Parsing Techniques and Error Detection

16.2.1 Error Detection in Non-Directional Parsing Methods

16.2.2 Error Detection in Finite-State Automata

16.2.3 Error Detection in General Directional Top-Down Parsers

16.2.4 Error Detection in General Directional Bottom-Up Parsers

16.2.5 Error Detection in Deterministic Top-Down Parsers

16.2.6 Error Detection in Deterministic Bottom-Up Parsers

16.3 Recovering from Errors

16.4 Global Error Handling

16.5 Regional Error Handling

16.5.1 Backward/Forward Move Error Recovery

16.5.2 Error Recovery with Bounded-Context Grammars

16.6 Local Error Handling

16.6.1 Panic Mode

16.6.2 FOLLOW-Set Error Recovery

16.6.3 Acceptable-Sets Derived from Continuations

16.6.4 Insertion-Only Error Correction

16.6.5 Locally Least-Cost Error Recovery

16.7 Non-Correcting Error Recovery

16.7.1 Detection and Recovery

16.7.2 Locating the Error

16.8 Ad Hoc Methods

16.8.1 Error Productions

16.8.2 Empty Table Slots

16.8.3 Error Tokens

16.9 Conclusion

17. Practical Parser Writing and Usage

17.1 A Comparative Survey

17.1.1 Considerations

17.1.2 General Parsers

17.1.3 General Substring Parsers

17.1.4 Linear-Time Parsers

17.1.5 Linear-Time Substring Parsers

17.1.6 Obtaining and Using a Parser Generator

17.2 Parser Construction

17.2.1 Interpretive, Table-Based, and Compiled Parsers

17.2.2 Parsing Methods and Implementations

17.3 A Simple General Context-Free Parser

17.3.1 Principles of the Parser

17.3.2 The Program

17.3.3 Handling Left Recursion

17.3.4 Parsing in Polynomial Time

17.4 Programming Language Paradigms

17.4.1 Imperative and Object-Oriented Programming

17.4.2 Functional Programming

17.4.3 Logic Programming

17.5 Alternative Uses of Parsing

17.5.1 Data Compression

17.5.2 Machine Code Generation

17.5.3 Support of Logic Languages

17.6 Conclusion

18. Annotated Bibliography

18.1 Major Parsing Subjects

18.1.1 Unrestricted PS and CS Grammars

18.1.2 General Context-Free Parsing

18.1.3 LL Parsing

18.1.4 LR Parsing

18.1.5 Left-Corner Parsing

18.1.6 Precedence and Bounded-Right-Context Parsing

18.1.7 Finite-State Automata

18.1.8 General Books and Papers on Parsing

18.2 Advanced Parsing Subjects

18.2.1 Generalized Deterministic Parsing

18.2.2 Non-Canonical Parsing

18.2.3 Substring Parsing

18.2.4 Parsing as Intersection

18.2.5 Parallel Parsing Techniques

18.2.6 Non-Chomsky Systems

18.2.7 Error Handling

18.2.8 Incremental Parsing

18.3 Parsers and Applications

18.3.1 Parser Writing

18.3.2 Parser-Generating Systems

18.3.3 Applications

18.3.4 Parsing and Deduction

18.3.5 Parsing Issues in Natural Language Handling

18.4 Support Material

18.4.1 Formal Languages

18.4.2 Approximation Techniques

18.4.3 Transformations on Grammars

18.4.4 Miscellaneous Literature

A. Hints and Solutions to Selected Problems

Author Index

Subject Index

Monographs in Computer Science Editors David Gries Fred B. Schneider

Monographs in Computer Science Abadi and Cardelli, A Theory of Objects Benosman and Kang [editors], Panoramic Vision: Sensors, Theory, and Applications Bhanu, Lin, Krawiec, Evolutionary Synthesis of Pattern Recognition Systems Broy and Stølen, Specification and Development of Interactive Systems: FOCUS on Streams, Interfaces, and Refinement Brzozowski and Seger, Asynchronous Circuits Burgin, Super-Recursive Algorithms Cantone, Omodeo, and Policriti, Set Theory for Computing: From Decision Procedures to Declarative Programming with Sets Castillo, Gutiérrez, and Hadi, Expert Systems and Probabilistic Network Models Downey and Fellows, Parameterized Complexity Feijen and van Gasteren, On a Method of Multiprogramming Grune and Jacobs, Parsing Techniques: A Practical Guide, Second Edition Herbert and Spärck Jones [editors], Computer Systems: Theory, Technology, and Applications Leiss, Language Equations Levin, Heydon, Mann, and Yu, Software Configuration Management Using VESTA Mclver and Morgan [editors], Programming Methodology Mclver and Morgan [editors], Abstraction, Refinement and Proof for Probabilistic Systems Misra, A Discipline of Multiprogramming: Programming Theory for Distributed Applications Nielson [editor], ML with Concurrency Paton [editor], Active Rules in Database Systems Poernomo, Crossley, and Wirsing, Adapting Proof-as-Programs: The Curry-Howard Protocol Selig, Geometrical Methods in Robotics Selig, Geometric Fundamentals of Robotics, Second Edition Shasha and Zhu, High Performance Discovery in Time Series: Techniques and Case Studies Tonella and Potrich, Reverse Engineering of Object Oriented Code

Dick Grune Ceriel J.H. Jacobs Parsing Techniques A Practical Guide Second Edition

Dick Grune and Ceriel J.H. Jacobs Faculteit Exacte Wetenschappen Vrije Universiteit De Boelelaan 1081 1081 HV Amsterdam The Netherlands Series Editors David Gries Department of Computer Science Cornell University 4130 Upson Hall Ithaca, NY 14853-7501 USA Fred P. Schneider Department of Computer Science Cornell University 4130 Upson Hall Ithaca, NY 14853-7501 USA ISBN-13: 978-0-387-20248-8 e-ISBN-13: 978-0-387-68954-8 Library of Congress Control Number: 2007936901 ©2008 Springer Science+Business Media, LLC ©1990 Ellis Horwood Ltd. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed on acid-free paper. 9 8 7 6 5 4 3 2 1 springer.com (SB)

Preface to the Second Edition As is ﬁt, this second edition arose out of our readers’ demands to read about new developments and our desire to write about them. Although parsing techniques is not a fast moving ﬁeld, it does move. When the ﬁrst edition went to press in 1990, there was only one tentative and fairly restrictive algorithm for linear-time substring parsing. Now there are several powerful ones, covering all deterministic languages; we describe them in Chapter 12. In 1990 Theorem 8.1 from a 1961 paper by Bar- Hillel, Perles, and Shamir lay gathering dust; in the last decade it has been used to create new algorithms, and to obtain insight into existing ones. We report on this in Chapter 13. More and more non-Chomsky systems are used, especially in linguistics. None except two-level grammars had any prominence 20 years ago; we now describe six of them in Chapter 15. Non-canonical parsers were considered oddities for a very long time; now they are among the most powerful linear-time parsers we have; see Chapter 10. Although still not very practical, marvelous algorithms for parallel parsing have been designed that shed new light on the principles; see Chapter 14. In 1990 a gen- eralized LL parser was deemed impossible; now we describe two in Chapter 11. Traditionally, and unsurprisingly, parsers have been used for parsing; more re- cently they are also being used for code generation, data compression and logic language implementation, as shown in Section 17.5. Enough. The reader can ﬁnd more developments in many places in the book and in the Annotated Bibliography in Chapter 18. Kees van Reeuwijk has — only half in jest — called our book “a reservation for endangered parsers”. We agree — partly; it is more than that — and we make no apologies. Several algorithms in this book have very limited or just no practical value. We have included them because we feel they embody interesting ideas and offer food for thought; they might also grow and acquire practical value. But we also include many algorithms that do have practical value but are sorely underused; describing them here might raise their status in the world.

vi Preface to the Second Edition Exercises and Problems This book is not a textbook in the school sense of the word. Few universities have a course in Parsing Techniques, and, as stated in the Preface to the First Edition, read- ers will have very different motivations to use this book. We have therefore included hardly any questions or tasks that exercise the material contained within this book; readers can no doubt make up such tasks for themselves. The questions posed in the problem sections at the end of each chapter usually require the reader to step outside the bounds of the covered material. The problems have been divided into three not too well-deﬁned classes: • not marked — probably doable in a few minutes to a couple of hours. • marked Project — probably a lot of work, but almost certainly doable. • marked Research Project — almost certainly a lot of work, but hopefully doable. We make no claims as to the relevance of any of these problems; we hope that some readers will ﬁnd some of them enlightening, interesting, or perhaps even useful. Ideas, hints, and partial or complete solutions to a number of the problems can be found in Chapter A. There are also a few questions on formal language that were not answered eas- ily in the existing literature but have some importance to parsing. These have been marked accordingly in the problem sections. Annotated Bibliography For the ﬁrst edition, we, the authors, read and summarized all papers on parsing that we could lay our hands on. Seventeen years later, with the increase in publica- tions and easier access thanks to the Internet, that is no longer possible, much to our chagrin. In the ﬁrst edition we included all relevant summaries. Again that is not pos- sible now, since doing so would have greatly exceeded the number of pages allotted to this book. The printed version of this second edition includes only those refer- ences to the literature and their summaries that are actually referred to in this book. The complete bibliography with summaries as far as available can be found on the web site of this book; it includes its own authors index and subject index. This setup also allows us to list without hesitation technical reports and other material of possi- bly low accessibility. Often references to sections from Chapter 18 refer to the Web version of those sections; attention is drawn to this by calling them “(Web)Sections”. We do not supply URLs in this book, for two reasons: they are ephemeral and may be incorrect next year, tomorrow, or even before the book is printed; and, es- pecially for software, better URLs may be available by the time you read this book. The best URL is a few well-chosen search terms submitted to a good Web search engine. Even in the last ten years we have seen a number of Ph.D theses written in lan- guages other than English, speciﬁcally German, French, Spanish and Estonian. This choice of language has the regrettable but predictable consequence that their con- tents have been left out of the main stream of science. This is a loss, both to the authors and to the scientiﬁc community. Whether we like it or not, English is the de facto standard language of present-day science. The time that a scientiﬁcally in-

Preface to the Second Edition vii terested gentleman of leisure could be expected to read French, German, English, Greek, Latin and a tad of Sanskrit is 150 years in the past; today, students and sci- entists need the room in their heads and the time in their schedules for the vastly increased amount of knowledge. Although we, the authors, can still read most (but not all) of the above languages and have done our best to represent the contents of the non-English theses adequately, this will not sufﬁce to give them the international attention they deserve. The Future of Parsing, aka The Crystal Ball If there will ever be a third edition of this book, we expect it to be substantially thinner (except for the bibliography section!). The reason is that the more parsing algorithms one studies the more they seem similar, and there seems to be great op- portunity for uniﬁcation. Basically almost all parsing is done by top-down search with left-recursion protection; this is true even for traditional bottom-up techniques like LR(1), where the top-down search is built into the LR(1) parse tables. In this respect it is signiﬁcant that Earley’s method is classiﬁed as top-down by some and as bottom-up by others. The general memoizing mechanism of tabular parsing takes the exponential sting out of the search. And it seems likely that transforming the usual depth-ﬁrst search into breadth-ﬁrst search will yield many of the generalized deterministic algorithms; in this respect we point to Sikkel’s Ph.D thesis [158]. To- gether this seems to cover almost all algorithms in this book, including parsing by intersection. Pure bottom-up parsers without a top-down component are rare and not very powerful. So in the theoretical future of parsing we see considerable simpliﬁcation through uniﬁcation of algorithms; the role that parsing by intersection can play in this is not clear. The simpliﬁcation does not seem to extend to formal languages: it is still as difﬁcult to prove the intuitively obvious fact that all LL(1) grammars are LR(1) as it was 35 years ago. The practical future of parsing may lie in advanced pattern recognition, in addi- tion to its traditional tasks; the practical contributions of parsing by intersection are again not clear. Amsterdam, Amstelveen June 2007 Dick Grune Ceriel J.H. Jacobs

分享到：

赞收藏

资料库

PARSING TECHNIQUES --A Pratical Guide .pdf

相关推荐

开发技术

热门标签

最新资料