December 13, 2012

This week the following question was posed (and not properly answered) on the Stratego/XT mailing list and came up independently in a discussion with Peter Mosses about the syntax of OCaml. I had seen the problem and a solution to it before. I think it was Martin Bravenboer who used it in the syntax of AspectJ; but I can’t find the source.

The expression

```
x + if y then z + a
```

should be parsed to the following abstract syntax tree:

```
Plus(Var("x"), If(Var("y"), Plus(Var("z"), Var("a"))))
```

The problem is that this cannot be solved directly using ambiguous binary expressions and just priorities and associativity declarations in SDF. But that does not mean there is no solution.

To make the problem a bit more interesting we throw in the left-associative function application operator from ML, which is just juxtaposition of expressions without parentheses. The following seems to be the obvious SDF grammar to describe the language of expressions:

```
module Ambiguous
imports Common
exports
context-free syntax
"(" Exp ")" -> Exp {bracket}
ID -> Exp {"Var"}
Exp "+" Exp -> Exp {"Plus", left}
Exp Exp -> Exp {"App", left}
"if" Exp "then" Exp -> Exp {"If"}
context-free priorities
"if" Exp "then" Exp -> Exp
> Exp Exp -> Exp
> Exp "+" Exp -> Exp
```

However, this does not work since the higher priority of the `If`

causes the following parse:

```
Plus(Var("x"), Plus(If(Var("y"), Var("z")), Var("a")))
```

Making the `If`

have lower priority than `Plus`

is not a solution either since the expression would be rejected altogether.

The solution is to observe that an `If`

not surrounded by parentheses can only occur at the right-hand side of a `Plus`

or `App`

. We can describe this by means of two extra non-terminals that represent the left (`L`

) and right (`R`

) context. A top-level expression is now a right context. And an `If`

can only occur in a right context *or* as top-level expression (which includes an expression between parentheses. This is encoded in the following grammar:

```
module Exp
imports Common
exports
context-free syntax
R -> Exp
"(" Exp ")" -> L {bracket}
"(" Exp ")" -> R {bracket}
ID -> L {"VarL"}
ID -> R {"VarR"}
L "+" R -> R {"PlusR", left}
L R -> R {"AppR", left}
L "+" L -> L {"PlusL", left}
L L -> L {"AppL", left}
"if" Exp "then" R -> R {"If"}
context-free priorities
{L R -> R
L L -> L}
> {L "+" R -> R
L "+" L -> L}
```

This naturally ensures that `Plus`

and `App`

can only occur above `If`

in the tree if they are in its left context.

What’s not so nice about the solution above is that it duplicates each production for binary operators. But this can be avoided in SDF2 by means of parameterized modules. The following module defines the syntax of expressions using the parameter non-terminals `E`

, `L`

, and `R`

, only defining the right-context rules:

```
module BinExp[E L R]
imports Common
exports
context-free syntax
"(" E ")" -> R {bracket}
ID -> R {"Var"}
L "+" R -> R {"Plus", left}
L R -> R {"App", left}
context-free priorities
{L R -> R
L L -> L}
> {L "+" R -> R
L "+" L -> L}
```

(I haven’t figured out how to avoid duplicating the productions in the priority rules.)

By instantiating this module twice we obtain the two sets of rules, without having to write them twice.

```
module Expressions
imports Common BinExp[Exp ExpL ExpR] BinExp[Exp ExpL ExpL]
exports
context-free syntax
ExpR -> Exp
"if" Exp "then" ExpR -> ExpR {"If"}
```

This is a relatively clean solution to the problem that does not rely on filtering the parse forest during or after parsing.

Note that it is crucial that the productions for `If`

and the injection of `ExpR`

into `Exp`

are not defined in the `BinExp`

module.

This solution may work, but it is not optimal. I would really like a declarative disambiguation method that properly disambiguates the non-solution grammar. Is there a generic grammar transformation that safely transforms grammars with prefix operators (such as `If`

) to a grammar with productions for left and right context expressions? Where ‘safe’ is defined as producing a grammar that accepts the same set of sentences (but with fewer derivations). (And of course I mean the non-solution grammar *without* the priority rule for `If`

.)

The following papers are related to the topic of this post. In particular, the first three are not well known.

- Grammar Engineering Support for Precedence Rule Recovery and Compatibility Checking
- From Context-free Grammars with Priorities to Character Class Grammars
- A Case Study in Optimizing Parsing Schemata by Disambiguation Filters
- Using Filters for the Disambiguation of Context-free Grammars
- A Family of Syntax Definition Formalisms
- Disambiguation Filters for Scannerless Generalized LR Parsers
- Safe Specification of Operator Precedence Rules