Synthesized attributes in Lark

Synthesized attributes in Lark#

Attributes are values attached to the grammar symbols, and synthesized attributes are attributes of non-terminal symbols which are computed at the moment of a reduction introducing that symbol (so they are computed bottom-up).

In the grammar they are written as python expressions between {{ and }} at the end of a rule.

For instance the classic arithmetic grammar with synthesized attributes computing the value of the expression:

from lark import Lark

synthesized_arith_grammar = """
%import common.NUMBER
%import common.WS_INLINE
%ignore WS_INLINE

?start: sum     {{ syn[1] }}

?sum: product           {{ syn[1] }} 
| sum "+" product       {{ syn[1] + syn[3] }}
| sum "-" product       {{ syn[1] - syn[3] }}

?product: atom           {{syn[1]}}
| product "*" atom       {{syn[1] * syn[3]}}
| product "/" atom       {{syn[1] / syn[3]}}

?atom: NUMBER         {{int(syn[1])}}
| "-" atom            {{-syn[2]}}
| "(" sum ")"         {{syn[2]}}
"""

parser = Lark(synthesized_arith_grammar, parser="lalr")
interactive_parser = parser.parse_interactive(start="start")
for t in parser.lex("5 * 3 - 7"):
    interactive_parser.feed_token(t)
interactive_parser.feed_eof()
interactive_parser.parser_state.attribute
8

In these attributes, syn is a special variable referring to the synthesized attributes of the symbols of the expansion it annotates. The indexing starts at 1 (like in yacc/bison), and terminal symbols are automatically given a synthesized attribute of their matching value.

Inside the parser the attribute expressions are stored as python AST objects attached to the rules. A stack of attributes is maintained alongside the stack of values and the stack of states, and these AST objects are transformed so that the syn slices point to the correct element of the attributes stack.

Let’s look at the internals of one of the rules deriving product for illustration:

import ast
rule = parser.rules[5]
print(f"{rule.origin} : {rule.expansion}, {{{{ {ast.unparse(rule.ast)} }}}}   ")
NonTerminal(Token('RULE', 'product'),'') : [NonTerminal('product',''), Terminal('STAR'), NonTerminal('atom','')], {{ stack[-3] * stack[-1] }}   

The AST expression is evaluated at the reduction step, when the top of the stack of attributes is filled with the attributes of the symbols product, STAR and atom.

The environment of its evaluation is set so that stack points to attributes stack, effectively computing the product of the attribute of product with the attribute of atom, and pushing the result into the attributes stack.