This week the following question was posed (and not properly answered) on the Stratego/XT mailing list and came up independently in a discussion with Peter Mosses about the syntax of OCaml. I had seen the problem and a solution to it before. I think it was Martin Bravenboer who used it in the syntax of AspectJ; but I can’t find the source.
The expression
x + if y then z + a
should be parsed to the following abstract syntax tree:
Plus(Var("x"), If(Var("y"), Plus(Var("z"), Var("a"))))
The problem is that this cannot be solved directly using ambiguous binary expressions and just priorities and associativity declarations in SDF. But that does not mean there is no solution.
To make the problem a bit more interesting we throw in the left-associative function application operator from ML, which is just juxtaposition of expressions without parentheses. The following seems to be the obvious SDF grammar to describe the language of expressions:
module Ambiguous
imports Common
exports
context-free syntax
"(" Exp ")" -> Exp {bracket}
ID -> Exp {"Var"}
Exp "+" Exp -> Exp {"Plus", left}
Exp Exp -> Exp {"App", left}
"if" Exp "then" Exp -> Exp {"If"}
context-free priorities
"if" Exp "then" Exp -> Exp
> Exp Exp -> Exp
> Exp "+" Exp -> Exp
However, this does not work since the higher priority of the If
causes the following parse:
Plus(Var("x"), Plus(If(Var("y"), Var("z")), Var("a")))
Making the If
have lower priority than Plus
is not a solution either since the expression would be rejected altogether.
The solution is to observe that an If
not surrounded by parentheses can only occur at the right-hand side of a Plus
or App
. We can describe this by means of two extra non-terminals that represent the left (L
) and right (R
) context. A top-level expression is now a right context. And an If
can only occur in a right context or as top-level expression (which includes an expression between parentheses. This is encoded in the following grammar:
module Exp
imports Common
exports
context-free syntax
R -> Exp
"(" Exp ")" -> L {bracket}
"(" Exp ")" -> R {bracket}
ID -> L {"VarL"}
ID -> R {"VarR"}
L "+" R -> R {"PlusR", left}
L R -> R {"AppR", left}
L "+" L -> L {"PlusL", left}
L L -> L {"AppL", left}
"if" Exp "then" R -> R {"If"}
context-free priorities
{L R -> R
L L -> L}
> {L "+" R -> R
L "+" L -> L}
This naturally ensures that Plus
and App
can only occur above If
in the tree if they are in its left context.
What’s not so nice about the solution above is that it duplicates each production for binary operators. But this can be avoided in SDF2 by means of parameterized modules. The following module defines the syntax of expressions using the parameter non-terminals E
, L
, and R
, only defining the right-context rules:
module BinExp[E L R]
imports Common
exports
context-free syntax
"(" E ")" -> R {bracket}
ID -> R {"Var"}
L "+" R -> R {"Plus", left}
L R -> R {"App", left}
context-free priorities
{L R -> R
L L -> L}
> {L "+" R -> R
L "+" L -> L}
(I haven’t figured out how to avoid duplicating the productions in the priority rules.)
By instantiating this module twice we obtain the two sets of rules, without having to write them twice.
module Expressions
imports Common BinExp[Exp ExpL ExpR] BinExp[Exp ExpL ExpL]
exports
context-free syntax
ExpR -> Exp
"if" Exp "then" ExpR -> ExpR {"If"}
This is a relatively clean solution to the problem that does not rely on filtering the parse forest during or after parsing.
Note that it is crucial that the productions for If
and the injection of ExpR
into Exp
are not defined in the BinExp
module.
This solution may work, but it is not optimal. I would really like a declarative disambiguation method that properly disambiguates the non-solution grammar. Is there a generic grammar transformation that safely transforms grammars with prefix operators (such as If
) to a grammar with productions for left and right context expressions? Where ‘safe’ is defined as producing a grammar that accepts the same set of sentences (but with fewer derivations). (And of course I mean the non-solution grammar without the priority rule for If
.)
The following papers are related to the topic of this post. In particular, the first three are not well known.