SmPL Grammar Documentation.pdf

Program
Metavariables for transformations
Metavariables for scripts
Control Flow
Basic dots
Dot variants
An example
Transformation
Basic transformations
Advanced transformations
Types
Function declarations
Declarations
Statements
Expressions
Constants, Identifiers and Types for Transformations
Comments and preprocessor directives
Command-line semantic match
Iteration
.cocciconfig support
Examples
Function renaming
Removing a function argument
Introduction of a macro
Look for NULL dereference
Reference counter: the of_xxx API
Filtering identifiers, declarers or iterators with regular expressions
Tips and Tricks
How to remove useless parentheses?
The SmPL Grammar (version 1.0.6)
Research group on Coccinelle
September 27, 2016

This document presents the grammar of the SmPL language used by the Coccinelle tool. For the most part, the grammar is written using standard notation. In some rules, however, the left-hand side is in all uppercase letters. These are macros, which take one or more grammar rule right-hand sides as arguments. The grammar also uses some unspecified nonterminals, such as id, const, etc. These refer to the sets suggested by the name, i.e., id refers to the set of possible C-language identifiers, while const refers to the set of possible C-language constants. A square bracket that is surrounded by spaces in the description of a term should appear explicitly in the term, as in an array reference. On the other hand, square brackets that surround some other term indicate that the presence of that term is optional.

An HTML version of this documentation is available online at http://coccinelle.lip6.fr/docs/main_grammar.html.

1 Program

    program       ::= include_cocci∗ changeset+
    include_cocci ::= include string
                    | using string
                    | using pathToIsoFile
                    | virtual id (, id)∗
    changeset     ::= metavariables transformation
                    | script_metavariables script_code

script_code is any code in the chosen scripting language. Parsing of the semantic patch does not check the validity of this code; any errors are first detected when the code is executed. Furthermore, @ should not be used in this code. Spatch scans the script code for the next @ and considers that to be the beginning of the next rule, even if @ occurs within, e.g., a comment.

The virtual keyword is used to declare virtual rules. Virtual rules may subsequently be used as dependencies for the rules in the SmPL file. Whether a virtual rule is defined or not is controlled by the -D option on the command line.
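As a minimal sketch of how a virtual rule can gate a changeset, the semantic patch below declares a virtual rule and makes one transformation rule depend on it. The rule name patch and the functions old_api and new_api are hypothetical, chosen only for illustration; the depends on clause it uses is part of the rulename syntax described in Section 2.

```smpl
// 'patch' is a hypothetical virtual rule name.
// It counts as matched only when -D patch is given on the command line.
virtual patch

// This rule is considered only if the virtual rule 'patch' is defined.
// old_api and new_api are placeholder function names.
@depends on patch@
expression E;
@@

- old_api(E);
+ new_api(E);
```

Under these assumptions, running spatch with -D patch enables the rule, and omitting the option leaves the C code untouched.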
2 Metavariables for transformations

The rulename portion of the metavariable declaration can specify properties of a rule such as its name, the names of the rules that it depends on, the isomorphisms to be used in processing the rule, and whether quantification over paths should be universal or existential. The optional annotation expression indicates that the pattern is to be considered as matching an expression, and thus can be used to avoid some parsing problems.

The metadecl portion of the metavariable declaration defines various types of metavariables that will be used for matching in the transformation section.
    metavariables ::= @@ metadecl∗ @@
                    | @ rulename @ metadecl∗ @@
    rulename      ::= id [extends id] [depends on dep] [iso] [disable-iso] [exists] [expression]
    dep           ::= id
                    | !id
                    | !(dep)
                    | ever id
                    | never id
                    | dep && dep
                    | dep || dep
                    | file in string
                    | (dep)
    iso           ::= using string (, string)∗
    disable-iso   ::= disable COMMA_LIST(id)
    exists        ::= exists | forall
    COMMA_LIST(elem) ::= elem (, elem)∗

The keyword disable is normally used with the names of isomorphisms defined in standard.iso or whatever isomorphism file has been included. There are, however, some other isomorphisms that are built into the implementation of Coccinelle and that can be disabled as well. Their names are given below. In each case, the text describes the standard behavior. Using disable-iso with the given name disables this behavior.

• optional_storage: A SmPL function definition that does not specify any visibility (i.e., static or extern), or a SmPL variable declaration that does not specify any storage (i.e., auto, static, register, or extern), matches a function declaration or variable declaration with any visibility or storage, respectively.
• optional_qualifier: This is similar to optional_storage, except that here it is the qualifier (i.e., const or volatile) that does not have to be specified in the SmPL code, but may be present in the C code.
• optional_attributes: This is similar to optional_storage, except that here it is an attribute (e.g., __init) that does not have to be specified in the SmPL code, but may be present in the C code. Note that this isomorphism is currently useless, because matching of attributes is not supported, due to the difficulty of parsing attributes in C code.
• value_format: Integers in various formats, e.g., 1 and 0x1, are considered to be equivalent in the matching process.
• optional_declarer_semicolon: Some declarers (top-level terms that look like function calls but serve to declare some variable) don't require a semicolon.
This isomorphism allows a SmPL declarer with a semicolon to match such a C declarer, if no transformation is specified on the SmPL semicolon.
• comm_assoc: An expression of the form exp bin_op ..., where bin_op is commutative and associative, is considered to match any top-level sequence of bin_op operators containing exp as the top-level argument.
• prototypes: A rule for transforming a function prototype is generated when a function header changes.

The depends on clause indicates conditions under which a semantic patch rule should be applied. Most of these conditions relate to the success or failure of other rules, which may be virtual rules. Giving the name of a rule implies that the current rule is applied if the named rule has succeeded in matching in the current environment. Giving ever followed by a rule name implies that the current rule is applied if the named rule has succeeded in matching in any environment. Analogously, never means that the named rule should have succeeded in matching in no environment. The boolean and, or and negation operators combine these declarations in the usual way. The declaration file in checks that the code being processed comes from the mentioned file, or from a subdirectory. The declaration file
in is only allowed on SmPL code-matching rules. Script rules are not applied to any code in particular, and thus it doesn't make sense to check on the file being considered.

The possible types of metavariable declarations are defined by the grammar rule below. Metavariables should occur at least once in the transformation code immediately following their declaration. Fresh identifier metavariables must only be used in + code. These properties are not expressed in the grammar, but are checked by a subsequent analysis. The metavariables are designated according to the kind of terms they can match, such as a statement, an identifier, or an expression. An expression metavariable can be further constrained by its type. A declaration metavariable matches the declaration of one or more variables, all sharing the same type specification (e.g., int a,b,c=3;). A field metavariable does the same, but for structure fields. In the minus code, a statement list metavariable can only appear as a complete function body or as the complete body of a sequence statement. In the plus code, a statement list metavariable can occur anywhere a statement list is allowed, i.e., including as an element of another statement list.
    metadecl ::= metavariable ids ;
               | fresh identifier ids ;
               | identifier COMMA_LIST(pmid_with_regexp) ;
               | identifier COMMA_LIST(pmid_with_virt_or_not_eq) ;
               | parameter [list] ids ;
               | parameter list [ id ] ids ;
               | parameter list [ const ] ids ;
               | identifier [list] ids ;
               | identifier list [ id ] ids ;
               | identifier list [ const ] ids ;
               | type ids ;
               | statement [list] ids ;
               | declaration ids ;
               | field [list] ids ;
               | typedef ids ;
               | attribute ids ;
               | declarer name ids ;
               | declarer COMMA_LIST(pmid_with_regexp) ;
               | declarer COMMA_LIST(pmid_with_not_eq) ;
               | iterator name ids ;
               | iterator COMMA_LIST(pmid_with_regexp) ;
               | iterator COMMA_LIST(pmid_with_not_eq) ;
               | [local | global] idexpression [ctype] COMMA_LIST(pmid_with_not_eq) ;
               | [local | global] idexpression [{ctypes} *∗] COMMA_LIST(pmid_with_not_eq) ;
               | [local | global] idexpression *+ COMMA_LIST(pmid_with_not_eq) ;
               | expression list ids ;
               | expression *+ COMMA_LIST(pmid_with_not_eq) ;
               | expression enum *∗ COMMA_LIST(pmid_with_not_eq) ;
               | expression struct *∗ COMMA_LIST(pmid_with_not_eq) ;
               | expression union *∗ COMMA_LIST(pmid_with_not_eq) ;
               | expression COMMA_LIST(pmid_with_not_ceq) ;
               | expression list [ id ] ids ;
               | expression list [ const ] ids ;
               | ctype [ ] COMMA_LIST(pmid_with_not_eq) ;
               | ctype COMMA_LIST(pmid_with_not_ceq) ;
               | {ctypes} *∗ COMMA_LIST(pmid_with_not_ceq) ;
               | {ctypes} *∗ [ ] COMMA_LIST(pmid_with_not_eq) ;
               | constant [ctype] COMMA_LIST(pmid_with_not_eq) ;
               | constant [{ctypes} *∗] COMMA_LIST(pmid_with_not_eq) ;
               | position [any] COMMA_LIST(pmid_with_not_eq_mid) ;
               | symbol ids;
               | format ids;
               | format list [ id ] ids ;
               | format list [ const ] ids ;
               | assignment operator COMMA_LIST(assignopdecl) ;
               | binary operator COMMA_LIST(binopdecl) ;
    assignopdecl       ::= id [ = assignop_contraint]
    assignop_contraint ::= {COMMA_LIST(assign_op)} | assign_op
    binopdecl          ::= id [ = binop_contraint]
    binop_contraint    ::= {COMMA_LIST(bin_op)} | bin_op

In these rules, * is a literal asterisk, while a following ∗ or + is the usual repetition operator.
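To make a few of these declaration forms concrete, here is a small sketch of a match-only rule. The rule name r, the metavariable names, and the regular expression ^of_ are illustrative assumptions, not taken from the grammar itself.

```smpl
// Sketch only: rule and metavariable names are hypothetical.
// fn   : an identifier constrained by a regular expression (pmid_with_regexp)
// args : an expression list of fixed length 2 (expression list [ const ])
// p    : a position, attached to the function name with @
@r@
identifier fn =~ "^of_";
expression list[2] args;
position p;
@@

fn@p(args)
```

Under these assumptions, the rule matches any call with exactly two arguments to a function whose name starts with of_, and records where the function name occurs.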
A metavariable declaration local idexpression v means that v is restricted to be a local variable. If it should just be a variable, but not necessarily a local one, then drop local. A more complex description of a location, such as a->b is considered to be an expression, not an idexpression. Constant is for constants, such as 27. But it also considers an identifier that is all capital letters (possibly containing numbers) as a constant as well, because the names given to macros in Linux usually have this form. An identifier is the name of a structure field, a macro, a function, or a variable. It is the name of something rather than an expression that has a value. But an identifier can be used in the position of an expression as well, where it represents a variable. It is possible to specify that an expression list or a parameter list metavariable should match a specific number of expressions or parameters. An identifier list is only used for the parameter list of a macro. It is possible to specify its length. It is possible to specify some information about the definition of a fresh identifier. See the wiki. A symbol declaration specifies that the provided identifiers should be considered C identifiers when encountered in the body of the rule. Identifiers in the body of the rule that are not declared explicitly are by default considered symbols, thus symbol declarations are optional. It is not required, but it will not cause a parse error, to redeclare a name as a symbol. A name declared as a symbol can, however, be redeclared as another metavariable. It will be considered to be a metavariable in such rules, and will revert to being a symbol in subsequent rules. These conditions also apply to iterator names and declarer names. An attribute declaration indicates a name that should be considered to be an attribute. It is not possible to match or remove an attribute, only to add one. 
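The distinction between idexpression, expression, and constant can be sketched with a match-only rule such as the following. The metavariable names i and C are arbitrary, and the C statements mentioned in the comments are hypothetical inputs.

```smpl
// Sketch only. Intended to match assignments of a constant to a local
// int variable, e.g. "i = 27;" or "i = MAX_LEN;" (an all-capitals name
// is treated as a constant). It is not intended to match "p->len = 27;",
// because p->len is an expression rather than an idexpression, nor an
// assignment to a global variable, because of the local keyword.
@@
local idexpression int i;
constant C;
@@

i = C;
```

Dropping local would let the same pattern match assignments to any int variable, local or not.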
A position metavariable is used by attaching it using @ to any token, including another metavariable. Its value is the position (file, line number, etc.) of the code matched by the token. It is also possible to attach expression, declaration, type, initialiser, and statement metavariables in this manner. In that case, the metavariable is bound to the closest enclosing expression, declaration, etc. If such a metavariable is itself followed by a position metavariable, the position metavariable applies to the metavariable that it follows, and not to the attached token. This makes it possible to get, e.g., the starting and ending position of f(...), by writing f(...)@E@p, for expression metavariable E and position metavariable p. This attachment notation for metavariables of type other than position can also be expressed with a conjunction, but the @ notation may be more concise.

When used, a format or format list metavariable must be enclosed by a pair of @s. A format metavariable matches the format descriptor part, i.e., 2x in %2x. A format list metavariable matches a sequence of format descriptors as well as the text between them. Any text around them is matched as well, if it is not matched by the surrounding text in the semantic patch. Such text is not partially matched. If the length of the format list is specified, that indicates the number of matched format descriptors. It is also possible to use ... in a format string, to match a sequence of text fragments and format descriptors. This only takes effect if the format string contains format descriptors. Note that this makes it impossible to require ... to match exactly in a string, if the semantic patch string contains format descriptors. If that is needed, some processing with a scripting language would be required. An example of the use of string format metavariables is found in demos/format.cocci.

Assignment (resp. binary) operator metavariables match any assignment (resp. binary) operator.
The list of operators that can be matched can be restricted by adding an operator constraint, i.e., a list of accepted operators.

Other kinds of metavariables can also be attached using @ to any token. In this case, the metavariable floats up to the enclosing appropriate expression. For example, 3 +@E 4, where E is an expression metavariable, binds E to 3 + 4. A particular case is Ps@Es, where Ps is a parameter list and Es is an expression list. This pattern matches a parameter list, and then matches Es to the list of expressions, i.e., a possible argument list, represented by the names of the parameters. Another particular case is E@S, where E is any expression and S is a statement metavariable. S matches the closest enclosing statement, which may be more than what is matched by the semantic match pattern itself.

Matching of various kinds of format strings within strings is supported. With the -ibm option, matching of decimal format declarations is supported, but the length and precision arguments are not interpreted. Thus it is not possible to match metavariables in these fields. Instead, the entire format is matched as a single string.
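The f(...)@E@p notation described above can be sketched as follows. The choice of kmalloc as the matched function and the rule name r are assumptions made for illustration, and the script rule relies on the script syntax described in Section 3. The sketch also assumes that the Python interface presents a position as a list whose elements have file and line attributes.

```smpl
// Sketch only: kmalloc and the rule name r are illustrative choices.
// E is bound to the whole call expression; p follows E, so it gives the
// position of that expression rather than of the closing parenthesis.
@r@
expression E;
position p;
@@

kmalloc(...)@E@p

// Report each match; E is received as a string.
@script:python@
E << r.E;
p << r.p;
@@

print("%s:%s: %s" % (p[0].file, p[0].line, E))
```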
    ids                      ::= COMMA_LIST(pmid)
    pmid                     ::= id | mid
    mid                      ::= rulename_id.id
    pmid_with_regexp         ::= pmid =~ regexp
                               | pmid !~ regexp
    pmid_with_not_eq         ::= pmid [!= id_or_meta]
                               | pmid [!= { COMMA_LIST(id_or_meta) }]
    pmid_with_virt_or_not_eq ::= virtual.id
                               | pmid_with_not_eq
    pmid_with_not_ceq        ::= pmid [!= id_or_cst]
                               | pmid [!= { COMMA_LIST(id_or_cst) }]
    id_or_cst                ::= id | integer
    id_or_meta               ::= id | rulename_id.id
    pmid_with_not_eq_mid     ::= pmid [ANDAND_LIST(pos_constraint)]
    pos_constraint           ::= != mid
                               | != { COMMA_LIST(mid) }
                               | : script:ocaml (COMMA_LIST(mid)) {expr}
    ANDAND_LIST(X)           ::= X | X && ANDAND_LIST(X)

Subsequently, we refer to arbitrary metavariables as metaidty, where ty indicates the metakind used in the declaration of the variable. For example, metaidType refers to a metavariable that was declared using type and stands for any type.

metavariable declares a metavariable for which the parser tries to figure out the metavariable type based on the usage context. Such a metavariable must be used consistently. These metavariables cannot be used in all contexts; specifically, they cannot be used in contexts that would make the parsing ambiguous. Some examples are the leftmost term of an expression, such as the left-hand side of an assignment, or the type in a variable declaration. These restrictions may seem somewhat arbitrary from the user's point of view. Thus, it is better to use metavariables with metavariable types. If Coccinelle is given the argument -parse_cocci, it will print information about the type that is inferred for each metavariable.

The ctype and ctypes nonterminals are used by both the grammar of metavariable declarations and the grammar of transformations, and are defined on page 15.

An identifier metavariable with virtual as its "rule name" is given a value on the command line. For example, if a semantic patch contains a rule that declares an identifier metavariable with the name virtual.alloc, then the command line could contain -D alloc=kmalloc.
There should be no space around the =. An example is in demos/vm.cocci and demos/vm.c.

It is possible to give an identifier metavariable a list of constraints that it should or should not be equal to. If the constraint is a list of (unquoted) strings, then the value of the metavariable should be the same as one of the strings, in the case of an equality constraint, or different from all of the strings, in the case of an inequality constraint. It is also possible to include inherited identifier metavariables among the constraints. In the case of a positive constraint, things work in the same way, but not with respect to the inherited value of the metavariable. On the other hand, an inequality constraint does not work so well, because the only value available is the one available in the current environment. If the proposed value is different from the one in the current environment, but perhaps the same as the one in some other environment, the match will still succeed.

Position metavariables can be associated with constraints implemented as OCaml script code. The code must have the form of a single C expression, typically a function call with a tuple of arguments. This expression must have type bool. The script code can be parameterized by any inherited metavariables. It is implicitly parameterized by the metavariable being declared. In the script, the inherited variable parameters are referred to by their variable names, without the associated rule name. The variable being declared is also referenced by its name. All parameters, except
position variables, have their string representation. An example is in demos/poscon.cocci.

A declaration of a name as a typedef extends through the rest of the semantic patch. It is not required, but it will not cause a parse error, to redeclare a name as a typedef. A name declared as a typedef can, however, be redeclared as another metavariable. It will be considered to be a metavariable in such rules, and will revert to being a typedef in subsequent rules.

Warning: Each metavariable declaration causes the declared metavariables to be immediately usable, without any inheritance indication. Thus the following are correct:

    @@
    type r.T;
    T x;
    @@

    [...] // some semantic patch code

    @@
    r.T x;
    type r.T;
    @@

    [...] // some semantic patch code

But the following is not correct:

    @@
    type r.T;
    r.T x;
    @@

    [...] // some semantic patch code

This applies to position variables, type metavariables, identifier metavariables that may be used in specifying a structure type, and metavariables used in the initialization of a fresh identifier. In the case of a structure type, any identifier metavariable indeed has to be declared as an identifier metavariable in advance. The syntax does not permit r.n as the name of a structure or union type in such a declaration.

3 Metavariables for scripts

Metavariables for scripts can only be inherited from transformation rules. In the spirit of scripting languages such as Python that use dynamic typing, metavariables for scripts do not include type declarations.

    script_metavariables ::= @ script:language [rulename] [depends on dep] @ script_metadecl∗ @@
                           | @ initialize:language [depends on dep] @ script_virt_metadecl∗ @@
                           | @ finalize:language [depends on dep] @ script_virt_metadecl∗ @@
    language             ::= python | ocaml
    script_metadecl      ::= id << rulename_id.id ;
                           | id << rulename_id.id = "..." ;
                           | id << rulename_id.id = [] ;
                           | id ;
    script_virt_metadecl ::= id << virtual.id ;
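The three kinds of script rule in this grammar can be sketched together as follows. The matched function kfree, the rule name r, and the Python variable hits are assumptions chosen for illustration. The sketch also assumes that all Python script rules share one namespace and that a position is presented as a list whose elements have file and line attributes.

```smpl
// Sketch only: kfree, r and hits are illustrative names.
// Set up shared state before any file is processed.
@initialize:python@
@@
hits = []

// Match-only rule that records the position of each call.
@r@
position p;
@@

kfree@p(...)

// Runs once per match of r; p is inherited using <<.
@script:python@
p << r.p;
@@
hits.append((p[0].file, p[0].line))

// Runs after all files have been processed.
@finalize:python@
@@
print("kfree calls seen: %d" % len(hits))
```

A list is used rather than a counter so that the script rule only calls a method on the shared object and never rebinds the name.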
Currently, the only scripting languages that are supported are Python and OCaml, indicated using python and ocaml, respectively. The set of available scripting languages may be extended at some point.

Script rules declared with initialize are run before the treatment of any file. Script rules declared with finalize are run when the treatment of all of the files has completed. There can be at most one of each per scripting language (thus currently at most one of each). Initialize and finalize script rules do not have access to SmPL metavariables. Nevertheless, a finalize script rule can access any variables initialized by the other script rules, allowing information to be transmitted from the matching process to the finalize rule.

Initialize and finalize rules do have access to virtual metavariables, using the usual syntax. As for other scripting language rules, the rule is not run (and essentially does not exist) if some of the required virtual metavariables are not bound. In OCaml, a warning is printed in this case. An example is found in demos/initvirt.cocci.

A script metavariable that does not specify an origin, using <<, is newly declared by the script. This metavariable should be assigned to a string and can be inherited by subsequent rules as an identifier. In Python, the assignment of such a metavariable x should refer to the metavariable as coccinelle.x. Examples are in the files demos/pythontococci.cocci and demos/camltococci.cocci.

In an OCaml script, the following extended form of script_metadecl may be used:

    script_metadecl' ::= (id,id) << rulename_id.id ;
                       | id << rulename_id.id ;
                       | id ;

In a declaration of the form (id,id) << rulename_id.id ;, the left component of (id,id) receives a string representation of the value of the inherited metavariable while the right component receives its abstract syntax tree. The file parsing_c/ast_c.ml in the Coccinelle implementation gives some information about the structure of the abstract syntax tree.
Either the left or right component may be replaced by _, indicating that the string representation or the abstract syntax tree representation is not wanted, respectively. The abstract syntax tree of a metavariable declared using metavariable is not available.

Script metavariables can have default values. This is only allowed if the abstract syntax tree of the metavariable is not requested. The default value of a position metavariable is written as []. The default value of any other kind of metavariable is a string. There is no check that the string actually represents the kind of term represented by the metavariable. Normally, a script rule is only applied if all of the metavariables have values. If default values are provided, then the script rule is only applied if all of the metavariables for which there are no default values have values. See demos/defaultscript.cocci for examples of the use of this feature.

4 Control Flow

Rules describe a property that Coccinelle must match, and when the property described is matched the rule is considered successful. One aspect that is taken into account in determining a match is the program control flow. A control flow describes a possible run time path taken by a program.

4.1 Basic dots

When using Coccinelle, it is possible to express matches of certain code within certain types of control flows. Ellipses ("...") can be used to indicate to Coccinelle that anything can be present between consecutive statements. For instance, the following SmPL patch tells Coccinelle that rule r0 wishes to remove all calls to function c().

    @r0@
    @@

    -c();

The context of the rule provides no other guidelines to Coccinelle about any possible control flow other than that this is a statement, and that c() must be called. We can modify the control flow required for this rule by providing