Inherited attributes and the marker trick#
Inherited attributes are attributes of non-terminal symbols which are defined in terms of the synthesized attributes of the siblings of the symbol (only those on its left) and of the inherited attribute of its parent.
They are computed before entering a derivation for a non-terminal symbol, not after its reduction like a synthesized attributes. It is possible to translate inherited attributes into synthesized ones following the marker trick explained in the dragon book (section 5.5.4).
Marker trick#
The inherited attributes are python expressions attached directly to the non-terminal symbols (again between {{
and }}
).
The marker trick works as follows, every time an inherited attributes is defined in a rule:
insert a new non-terminal symbol (the marker) just before
add a new rule to the grammar in which the marker derives to an empty string, and with the inherited attribute expression attached to it as a synthesized attribute
a: b c{{expr}} d
is replaced witha: b m c d
and a new rulem: {{expr}}
is added to the grammar
Declared variables with inherited attributes#
Thanks to the inherited attributes, we can define let ... in ...
statements, and pass on a context of declared variables.
We’ll go with an example to illustrate the feature. So, here is the (simplified) arithmetic grammar with let ... in ...
statements:
from lark import Lark
inherited_arith_grammar = """
%import common.NUMBER
%import common.WS_INLINE
%import common.CNAME
%ignore WS_INLINE
%python_header {{
from typing import NamedTuple
Ctx = NamedTuple('Ctx', [('ctx', dict[str, any]), ('value', any)])
}}
start: "let" declar_var{{ Ctx(dict(), None) }} "in" expr{{ Ctx(syn[2].ctx, None) }} {{ syn[4].value }}
declar_var: CNAME "=" expr{{ inh }} _declar_var{{ Ctx({ syn[1]: syn[3].value, **inh.ctx }, None) }} {{ syn[4] }}
_declar_var: {{ inh }}
| "," declar_var{{ inh }} {{ syn[2] }}
expr: atom{{ inh }} _sum{{ syn[1] }} {{ syn[2] }}
_sum: {{ inh }}
| "+" expr{{ inh }} {{ Ctx(syn[2].ctx, syn[2].value + inh.value) }}
atom: NUMBER {{ Ctx(inh.ctx, int(syn[1])) }}
| CNAME {{ Ctx(inh.ctx, inh.ctx[syn[1]]) }}
| "-" atom{{ inh }} {{ Ctx(inh.ctx, -syn[2].value) }}
"""
parser = Lark(inherited_arith_grammar, parser="lalr")
interactive_parser = parser.parse_interactive(start="start")
for t in parser.lex("let x = 5, y = 10 in 5 + x + -y"):
interactive_parser.feed_token(t)
interactive_parser.feed_eof()
interactive_parser.parser_state.attribute
0
First note that we have defined a python header; it will be stored in the parser as an AST module and executed every time before the evaluation of the attributes.
A new special variable inh
is also introduced; it will point to the inherited attribute of the rule’s origin symbol.
Let’s have a look at the rule for declar_var
:
import ast
from lark.grammar import Terminal
rules = parser.rules[3:6]
for r in rules:
print(f"{r.origin.name} : {[t.name if isinstance(t, Terminal) else t.name.value for t in r.expansion]},\t{{{{ {ast.unparse(r.ast)} }}}}")
declar_var : ['CNAME', 'EQUAL', '_#declar_var#0#2', 'expr', '_#declar_var#0#3', '_declar_var'], {{ stack[-1] }}
_#declar_var#0#2 : [], {{ stack[-3] }}
_#declar_var#0#3 : [], {{ Ctx({stack[-4]: stack[-1].value, **stack[-5].ctx}, None) }}
Here two new non-terminal symbols with empty derivation have been introducted at the compilation of the grammar, one for each inherited attribute defined in the rule:
_#declar_var#0#2
with attribute{{ stack[-3] }}
_#declar_var#0#3
with attribute{{ Ctx({stack[-4]: stack[-1].value, **stack[-5].ctx}, None) }}
If we pay attention to the slice indices, inh
(pushed into the new _declar_var#0#2
rule) has been
replaced by a pointer to the element of the stack
just before the attributes of the symbols appearing in the rule, which is where the inherited attribute of
the origin symbol (declar_var
) is stored.
During the running of the parser, the stack of attributes ends up being an intertwining of synthesized and inherited attributes. The indices of the synthesized attributes are updated in consequences.
In order to pass on the declaration context, an inherited attribute needs to be added at every non-terminal symbol. And a clear disadvantage of abusing inherited attributes is that it generates a lot of conflicts in the parser – hence the cumbersome rules.
This issue can easily be solved with the introduction of a global variable to keep track of the context.
Inherited attributes and global variable#
Let’s introduce another special variable to use in attribute expressions (besides syn
and inh
): GLOBAL
.
It is simply an empty object (so, mutable), whose reference is passed into the environment of evaluation of attribute expressions.
Thanks to that global variable, inherited attributes are now needed only at the entrance and exit of the declaration
context (to add and remove variables from the context). Here again the arithmetic grammar with let ... in ...
statements, this
time using the global variable:
global_arith_grammar = """
%import common.NUMBER
%import common.CNAME
%import common.WS_INLINE
%ignore WS_INLINE
%python_header {{
def init_ctx(obj):
obj.ctx = dict()
pop_list = lambda ctx, declar_var: [ctx.pop(k) for k,v in declar_var]
}}
start: sum{{ init_ctx(GLOBAL) }} {{ syn[1] }}
| "let" declar_var{{ init_ctx(GLOBAL) }} "in" sum{{ GLOBAL.ctx.update(syn[2]) }} {{ [pop_list(GLOBAL.ctx, syn[2]), syn[4]][1] }}
declar_var: CNAME "=" sum _declar_var {{ [(syn[1], syn[3])] + syn[4] }}
_declar_var: {{ [] }}
| "," declar_var {{ syn[2] }}
?sum: product {{ syn[1] }}
| sum "+" product {{ syn[1] + syn[3] }}
| sum "-" product {{ syn[1] - syn[3] }}
?product: atom {{ syn[1] }}
| product "*" atom {{ syn[1] * syn[3] }}
| product "/" atom {{ syn[1] / syn[3] }}
?atom: NUMBER {{ int(syn[1]) }}
| CNAME {{ GLOBAL.ctx[syn[1]] }}
| "-" atom {{ -syn[2] }}
| "(" sum ")" {{ syn[2] }}
"""
parser = Lark(global_arith_grammar, parser="lalr")
interactive_parser = parser.parse_interactive(start="start")
for t in parser.lex("let x = 5, y = 10 in 5 * x + y"):
interactive_parser.feed_token(t)
interactive_parser.feed_eof()
interactive_parser.parser_state.attribute
35