Talking about Parsing

Last summer I attended the Code Generation 2008 conference in Cambridge to give a tutorial on WebDSL, as a case study in domain-specific language engineering. The conference was an interesting change from the usual academic conferences I visit, in that the majority of the audience was from industry. It was good to see the interest in code generation in industry, but also disconcerting to observe the gap between academic research and industrial practice; more about that some other time.

During the conference I was interviewed by Laurence Tratt for Software Engineering Radio about parsing. The interview podcast recently appeared as Episode 118.

It was a long time ago (1997) that I defended my PhD thesis, which was mostly about syntax definition and parsing. In particular, I introduced SDF2, which radically integrates lexical and context-free syntax, and the SGLR parsing algorithm for parsing arbitrary ‘character-level’ context-free grammars. Since finishing my thesis I have done quite a bit of ‘applied parsing research’, using SDF and SGLR for applications such as meta-programming with concrete object syntax and DSL embedding, but I don’t consider myself a hard-core parsing researcher any more. So I had to dig deep in my memory to talk about Noam Chomsky’s language hierarchy, grammars as string rewrite systems, and parsing algorithms. I find the result a bit awkward to listen to, but people assure me that is because it is my own voice I’m listening to.

In the meantime my relation to parsing is changing again. While SDF/SGLR still provides the best approach to declarative definition of composite languages (in my opinion at least), it has some fundamental limitations that have never been addressed. A first step in addressing these limitations was taken in the SLE 2008 paper with Martin Bravenboer on parse table composition (see upcoming blog post), which provides separate compilation for grammars. With a new PhD student starting in the new year, I hope to address other limitations such as the lack of error recovery.