logo资料库

CUP 的使用方法 java 大全.doc

第1页 / 共24页
第2页 / 共24页
第3页 / 共24页
第4页 / 共24页
第5页 / 共24页
第6页 / 共24页
第7页 / 共24页
第8页 / 共24页
资料共24页,剩余部分请下载后查看
CUP User's Manual Scott E. Hudson Graphics Visualization and Usability Center Georgia Institute of Technology Modified by Frank Flannery, C. Scott Ananian, Dan Wang with advice from Andrew W. Appel Last updated July 1999 (v0.10j) Table of Contents i. 1. 2. 3. 4. 5. 6. 7. A. B. About CUP Version 0.10 Introduction and Example Specification Syntax Running CUP Customizing the Parser Scanner interface Error Recovery Conclusion References Grammar for CUP Specification Files A Very Simple Example Scanner
C. D. E. Incompatibilites between CUP 0.9 and CUP 0.10 Bugs Change log i. About CUP Version 0.10 Version 0.10 of CUP adds many new changes and features over the previous releases of version 0.9. These changes attempt to make CUP more like its predecessor, YACC. As a result, the old 0.9 parser specifications for CUP are not compatible and a reading of appendix C of the new manual will be necessary to write new specifications. The new version, however, gives the user more power and options, making parser specifications easier to write. 1. Introduction and Example This manual describes the basic operation and use of the Java(tm) Based Constructor of Useful Parsers (CUP for short). CUP is a system for generating LALR parsers from simple specifications. It serves the same role as the widely used program YACC [1] and in fact offers most of the features of YACC. However, CUP is written in Java, uses specifications including embedded Java code, and produces parsers which are implemented in Java. Although this manual covers all aspects of the CUP system, it is relatively brief, and assumes you have at least a little bit of knowledge of LR parsing. A working knowledge of YACC is also very helpful in understanding how CUP specifications work. A number of compiler construction textbooks (such as [2,3]) cover this material, and discuss the YACC system (which is quite similar to this one) as a specific example. Using CUP involves creating a simple specification based on the grammar for which a parser is needed, along with construction of a scanner capable of breaking characters up into meaningful tokens (such as keywords, numbers, and special symbols). As a simple example, consider a system for evaluating simple arithmetic expressions over integers. This system would read expressions from standard input (each terminated with a semicolon), evaluate them, and print the result on standard output. A grammar for the input to such a system might look like: expr_list ::= expr_list expr_part | expr_part expr_part ::= expr ';' expr ::= expr '+' expr | expr '-' expr | expr '*' expr | expr '/' expr | expr '%' expr | '(' expr ')' | '-' expr | number To specify a parser based on this grammar, our first step is to identify and name the set of terminal symbols that will appear on input, and the set of non-terminal symbols. In this case, the non-terminals are: expr_list, expr_part and expr. For terminal names we might choose: SEMI, PLUS, MINUS, TIMES, DIVIDE, MOD, NUMBER, LPAREN,
and RPAREN The experienced user will note a problem with the above grammar. It is ambiguous. An ambiguous grammar is a grammar which, given a certain input, can reduce the parts of the input in two different ways such as to give two different answers. Take the above grammar, for example. given the following input: 3 + 4 * 6 The grammar can either evaluate the 3 + 4 and then multiply seven by six, or it can evaluate 4 * 6 and then add three. Older versions of CUP forced the user to write unambiguous grammars, but now there is a construct allowing the user to specify precedences and associativities for terminals. This means that the above ambiguous grammar can be used, after specifying precedences and associativities. There is more explanation later. Based on these namings we can construct a small CUP specification as follows: // CUP specification for a simple expression evaluator (no actions) import java_cup.runtime.*; /* Preliminaries to set up and use the scanner. */ init with {: scanner.init(); :}; scan with {: return scanner.next_token(); :}; /* Terminals (tokens returned by the scanner). */ terminal terminal terminal Integer NUMBER; SEMI, PLUS, MINUS, TIMES, DIVIDE, MOD; UMINUS, LPAREN, RPAREN; /* Non terminals */ non terminal non terminal Integer expr_list, expr_part; expr, term, factor; /* Precedences */ precedence left PLUS, MINUS; precedence left TIMES, DIVIDE, MOD; precedence left UMINUS; /* The grammar */ expr_list ::= expr_list expr_part | expr_part; expr_part ::= expr SEMI; expr ::= expr PLUS expr | expr MINUS expr | expr TIMES expr | expr DIVIDE expr | expr MOD expr
| MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ; We will consider each part of the specification syntax in detail later. However, here we can quickly see that the specification contains four main parts. The first part provides preliminary and miscellaneous declarations to specify how the parser is to be generated, and supply parts of the runtime code. In this case we indicate that the java_cup.runtime classes should be imported, then supply a small bit of initialization code, and some code for invoking the scanner to retrieve the next input token. The second part of the specification declares terminals and non-terminals, and associates object classes with each. In this case, the terminals are declared as either with no type, or of type Integer. The specified type of the terminal or non-terminal is the type of the value of those terminals or non-terminals. If no type is specified, the terminal or non-terminal carries no value. Here, no type indicates that these terminals and non-terminals hold no value. The third part specifies the precedence and associativity of terminals. The last precedence declaration give its terminals the highest precedence. The final part of the specification contains the grammar. To produce a parser from this specification we use the CUP generator. If this specification were stored in a file parser.cup, then (on a Unix system at least) we might invoke CUP using a command like: java java_cup.Main < parser.cup In this case, the system will produce two Java source files containing parts of the generated parser: sym.java and parser.java. As you might expect, these two files contain declarations for the classes sym and parser. The sym class contains a series of constant declarations, one for each terminal symbol. This is typically used by the scanner to refer to symbols (e.g. with code such as "return new Symbol(sym.SEMI);" ). The parser class implements the parser itself. The specification above, while constructing a full parser, does not perform any semantic actions &emdash; it will only indicate success or failure of a parse. To calculate and print values of each expression, we must embed Java code within the parser to carry out actions at various points. In CUP, actions are contained in code strings which are surrounded by delimiters of the form {: and :} (we can see examples of this in the init with and scan with clauses above). In general, the system records all characters within the delimiters, but does not try to check that it contains valid Java code. A more complete CUP specification for our example system (with actions embedded at various points in the grammar) is shown below: // CUP specification for a simple expression evaluator (w/ actions) import java_cup.runtime.*; /* Preliminaries to set up and use the scanner. */
init with {: scanner.init(); :}; scan with {: return scanner.next_token(); :}; /* Terminals (tokens returned by the scanner). */ terminal terminal terminal Integer NUMBER; SEMI, PLUS, MINUS, TIMES, DIVIDE, MOD; UMINUS, LPAREN, RPAREN; /* Non-terminals */ non terminal non terminal Integer expr_list, expr_part; expr; /* Precedences */ precedence left PLUS, MINUS; precedence left TIMES, DIVIDE, MOD; precedence left UMINUS; /* The grammar */ expr_list ::= expr_list expr_part | expr_part; expr_part ::= expr:e {: System.out.println("= " + e); :} SEMI ; expr ::= expr:e1 PLUS expr:e2 {: RESULT = new Integer(e1.intValue() + e2.intValue()); :} | expr:e1 MINUS expr:e2 {: RESULT = new Integer(e1.intValue() - e2.intValue()); :} | expr:e1 TIMES expr:e2 {: RESULT = new Integer(e1.intValue() * e2.intValue()); :} | expr:e1 DIVIDE expr:e2 {: RESULT = new Integer(e1.intValue() / e2.intValue()); :} | expr:e1 MOD expr:e2 {: RESULT = new Integer(e1.intValue() % e2.intValue()); :} | NUMBER:n {: RESULT = n; :}
| MINUS expr:e {: RESULT = new Integer(0 - e.intValue()); :} %prec UMINUS | LPAREN expr:e RPAREN {: RESULT = e; :} ; Here we can see several changes. Most importantly, code to be executed at various points in the parse is included inside code strings delimited by {: and:}. In addition, labels have been placed on various symbols in the right hand side of productions. For example in: expr:e1 PLUS expr:e2 {: RESULT = new Integer(e1.intValue() + e2.intValue()); :} the first non-terminal expr has been labeled with e1, and the second with e2. The left hand side value of each production is always implicitly labeled asRESULT.Each symbol appearing in a production is represented at runtime by an object of type Symbol on the parse stack. The labels refer to the instance variablevalue in those objects. In the expression expr:e1 PLUS expr:e2, e1 and e2 refer to objects of type Integer. These objects are in the value fields of the objects of type Symbol representing those non-terminals on the parse stack. RESULT is of type Integer as well, since the resulting non-terminal expr was declared as of type Integer. This object becomes the value instance variable of a new Symbol object. For each label, two more variables accessible to the user are declared. A left and right value labels are passed to the code string, so that the user can find out where the left and right side of each terminal or non-terminal is in the input stream. The name of these variables is the label name, plus left orright. for example, given the right hand side of a production expr:e1 PLUS expr:e2 the user could not only access variables e1 and e2, but also e1left, e1right, e2left and e2right. these variables are of type int. The final step in creating a working parser is to create a scanner (also known as a lexical analyzer or simply a lexer). This routine is responsible for reading individual characters, removing things things like white space and comments, recognizing which terminal symbols from the grammar each group of characters represents, then returning Symbol objects representing these symbols to the parser. The terminals will be retrieved with a call to the scanner function. In the example, the should parser will type java_cup.runtime.Symbol. This than older versions of CUP's java_cup.runtime.symbol. These Symbol objects contains the instance variable value of type Object, which should be set by the lexer. This variable refers to the value of that symbol, and the type of object in value should be of the same type as declared in the terminal andnon terminal declarations. In the above example, if the lexer wished to pass a NUMBER token, it should create a Symbol with the value instance variable filled with an object of type Integer. Symbol objects corresponding to terminals and non-terminals with no value have a null value field. scanner is very different scanner.next_token(). The type call return objects of
The code contained in the init with clause of the specification will be executed before any tokens are requested. Each token will be requested using whatever code is found in the scan with clause. Beyond this, the exact form the scanner takes is up to you; however note that each call to the scanner function should return a new instance of java_cup.runtime.Symbol (or a subclass). These symbol objects are annotated with parser information and pushed onto a stack; reusing objects will result in the parser annotations being scrambled. As of CUP 0.10j, Symbol reuse should be detected if it occurs; the parser will throw an Errortelling you to fix your scanner. In the next section a more detailed and formal explanation of all parts of a CUP specification will be given. Section 3 describes options for running the CUP system. Section 4 discusses the details of how to customize a CUP parser, while section 5 discusses the scanner interface added in CUP 0.10j. Section 6considers error recovery. Finally, Section 7 provides a conclusion. 2. Specification Syntax Now that we have seen a small example, we present a complete description of all parts of a CUP specification. A specification has four sections with a total of eight specific parts (however, most of these are optional). A specification consists of:      package and import specifications, user code components, symbol (terminal and non-terminal) lists, precedence declarations, and the grammar. Each of these parts must appear in the order presented here. (A complete grammar for the specification language is given in Appendix A.) The particulars of each part of the specification are described in the subsections below. Package and Import Specifications A specification begins with optional package and import declarations. These have the same syntax, and play the same role, as the package and import declarations found in a normal Java program. A package declaration is of the form: package name; where name name is a Java package identifier, possibly in several parts separated by ".". In general, CUP employs Java lexical conventions. So for example, both styles of Java comments are supported, and identifiers are constructed beginning with a letter, dollar sign ($), or underscore (_), which can then be followed by zero or more letters, numbers, dollar signs, and underscores. After an optional package declaration, there can be zero or more import declarations. As in a Java program these have the form: import package_name.class_name; or import package_name.*; The package declaration indicates what package the sym and parser classes that are generated by the system will be in. Any import declarations that appear in the specification will also appear in
the source file for the parser class allowing various names from that package to be used directly in user supplied action code. User Code Components Following the optional package and import declarations are a series of optional declarations that allow user code to be included as part of the generated parser (see Section 4 for a full description of how the parser uses this code). As a part of the parser file, a separate non-public class to contain all embedded user actions is produced. The first action code declaration section allows code to be included in this class. Routines and variables for use by the code embedded in the grammar would normally be placed in this section (a typical example might be symbol table manipulation routines). This declaration takes the form: action code {: ... :}; where {: ... :} is a code string whose contents will be placed directly within the action class class declaration. After the action code declaration is an optional parser code declaration. This declaration allows methods and variable to be placed directly within the generated parser class. Although this is less common, it can be helpful when customizing the parser &emdash; it is possible for example, to include scanning methods inside the parser and/or override the default error reporting routines. This declaration is very similar to the action code declaration and takes the form: parser code {: ... :}; Again, code from the code string is placed directly into the generated parser class definition. Next in the specification is the optional init declaration which has the form: init with {: ... :}; This declaration provides code that will be executed by the parser before it asks for the first token. Typically, this is used to initialize the scanner as well as various tables and other data structures that might be needed by semantic actions. In this case, the code given in the code string forms the body of a void method inside the parser class. The final (optional) user code section of the specification indicates how the parser should ask for the next token from the scanner. This has the form: scan with {: ... :}; As with the init clause, the contents of the code string forms the body of a method in the generated parser. However, in this case the method returns an object of type java_cup.runtime.Symbol. Consequently the code found in the scan with clause should return such a value. See section 5 for information on the default behavior if the scan with section is omitted. As of CUP 0.10j the action code, parser code, init code, and scan with sections may appear in any order. They must, however, precede the symbol lists. Symbol Lists Following user supplied code comes the first required part of the specification: the symbol lists. These declarations are responsible for naming and supplying a type for each terminal and
分享到:
收藏