Disambiguating If Expressions in SDF

December 13, 2012

This week the following question was posed (and not properly answered) on the Stratego/XT mailing list and came up independently in a discussion with Peter Mosses about the syntax of OCaml. I had seen the problem and a solution to it before. I think it was Martin Bravenboer who used it in the syntax of AspectJ; but I can’t find the source.

The Problem

The expression

x + if y then z + a

should be parsed to the following abstract syntax tree:

Plus(Var("x"), If(Var("y"), Plus(Var("z"), Var("a"))))

The problem is that this cannot be solved directly using ambiguous binary expressions and just priorities and associativity declarations in SDF. But that does not mean there is no solution.

The Non-Solution

To make the problem a bit more interesting we throw in the left-associative function application operator from ML, which is just juxtaposition of expressions without parentheses. The following seems to be the obvious SDF grammar to describe the language of expressions:

module Ambiguous
imports Common
  context-free syntax 
    "(" Exp ")"         -> Exp {bracket}
    ID                  -> Exp {"Var"}
    Exp "+" Exp         -> Exp {"Plus", left}
    Exp Exp             -> Exp {"App", left}
    "if" Exp "then" Exp -> Exp {"If"}
  context-free priorities
    "if" Exp "then" Exp -> Exp
    > Exp Exp           -> Exp
    > Exp "+" Exp       -> Exp

However, this does not work since the higher priority of the If causes the following parse:

Plus(Var("x"), Plus(If(Var("y"), Var("z")), Var("a")))

Making the If have lower priority than Plus is not a solution either since the expression would be rejected altogether.

The Solution (with Duplicate Productions)

The solution is to observe that an If not surrounded by parentheses can only occur at the right-hand side of a Plus or App. We can describe this by means of two extra non-terminals that represent the left (L) and right (R) context. A top-level expression is now a right context. And an If can only occur in a right context or as top-level expression (which includes an expression between parentheses. This is encoded in the following grammar:

module Exp
imports Common
  context-free syntax   
    R               -> Exp
    "(" Exp ")"     -> L {bracket} 
    "(" Exp ")"     -> R {bracket} 
    ID              -> L {"VarL"}
    ID              -> R {"VarR"}
    L "+" R         -> R {"PlusR", left}
    L R             -> R {"AppR", left}
    L "+" L         -> L {"PlusL", left}
    L L             -> L {"AppL", left}    
    "if" Exp "then" R -> R {"If"}  
  context-free priorities
    {L R       -> R
     L L       -> L}
    > {L "+" R -> R
       L "+" L -> L}

This naturally ensures that Plus and App can only occur above If in the tree if they are in its left context.

The Solution (with Parameterized Modules)

What’s not so nice about the solution above is that it duplicates each production for binary operators. But this can be avoided in SDF2 by means of parameterized modules. The following module defines the syntax of expressions using the parameter non-terminals E, L, and R, only defining the right-context rules:

module BinExp[E L R]
imports Common
  context-free syntax 
    "(" E ")" -> R {bracket}
    ID        -> R {"Var"}
    L "+" R   -> R {"Plus", left}
    L R       -> R {"App", left}
  context-free priorities
    {L R       -> R
     L L       -> L}
    > {L "+" R -> R
       L "+" L -> L}

(I haven’t figured out how to avoid duplicating the productions in the priority rules.)

By instantiating this module twice we obtain the two sets of rules, without having to write them twice.

module Expressions
imports Common BinExp[Exp ExpL ExpR] BinExp[Exp ExpL ExpL]
  context-free syntax      
    ExpR                 -> Exp
    "if" Exp "then" ExpR -> ExpR {"If"}  

This is a relatively clean solution to the problem that does not rely on filtering the parse forest during or after parsing.

Note that it is crucial that the productions for If and the injection of ExpR into Exp are not defined in the BinExp module.

Update: Analysis

This solution may work, but it is not optimal. I would really like a declarative disambiguation method that properly disambiguates the non-solution grammar. Is there a generic grammar transformation that safely transforms grammars with prefix operators (such as If) to a grammar with productions for left and right context expressions? Where ‘safe’ is defined as producing a grammar that accepts the same set of sentences (but with fewer derivations). (And of course I mean the non-solution grammar without the priority rule for If.)

Update: References

The following papers are related to the topic of this post. In particular, the first three are not well known.