jan heering

These days I am writing a book on ‘domain-specific language engineering’ for use in a master’s course at Delft University. The book is about the design and implementation of domain-specific languages, i.e. the definition of their syntax, static semantics, and code generators. But it also contains a dose of linguistic reflection, by studying the phenomenon of defining languages, and of defining languages for defining languages (which is called meta-modelling these days).

Writing chapters on syntax definition and modeling of languages takes me back to my days as a PhD student at the University of Amsterdam. Our quarters were in the university building at the Watergraafsmeer, which was connected to the CWI building via a bridge. Since the ASF+SDF group of Paul Klint was divided over the two locations, meetings required a walk to the other end of the building. So, I would regularly wander to the CWI part of the building to chat.

[While the third application we learned to use in our Unix course in 1989 was talk, with which one could synchronously chat with someone else on the internet (the first application was probably csh and the second email), face-to-face meetings were still the primary mode of communication, as opposed to using IRC to talk to one's officemate.]

Often I would look into Jan Heering’s office to say hi, and more often than not would end up spending the rest of the afternoon discussing research and meta-research.

One of the recurring topics in these conversations was the importance of examples. Jan was fascinated by the notion of ‘programming by example’, i.e. deriving a program from a bunch of examples of its expected behaviour, instead of a rigorous and complete definition for all cases. But the other use of examples was for validation, a word I didn’t learn until long after writing my thesis.

The culture of the day (and probably location?) was heavily influenced by mathematics and theoretical computer science. The game was the definition, preferably algebraic, of the artifacts of interest, and then, possibly, proving interesting properties about them. The application-minded would actually implement stuff. As a language engineer I was mostly interested in making languages with cool features. The motivation for these features was often highly abstract. The main test example driving much of the work on the ASF+SDF Meta-Environment was creating an interactive environment for the Pico language (While with variable declarations). The idea was that once an environment for Pico was realized, creating one for a more realistic language would be a matter of scaling up the Pico definition (mere engineering). Actually making a language (implementation) and using that to write programs would be a real test. To be fair, specifications of larger languages were undertaken, such as ones of (mini-)ML [8] and Pascal [6]. As a student I had developed a specification of the syntax and static semantics of the object-oriented programming language Eiffel [16], but it was so big that it was not usable on the Sun workstations we had at the time.

Time and again, Jan Heering would stress the importance of real examples to show the relevance of a technique and/or to discover the requirements for a design. While I thought it was a cool idea, I didn’t have examples. At least not to sell the design and implementation of SDF2, the syntax definition formalism that turned out to be the main contribution of my PhD thesis [20].

Read more

language engineers

If you're doing research into domain-specific languages, model-driven engineering, or program generation, your agenda for the coming months is set. In early October, the three main conferences on these topics are co-located in Denver. The deadlines are somewhat spread, so you should be able to submit a paper to each conference:

May 10: Model Driven Engineering Languages and Systems (MODELS'09)

May 18: Generative Programming and Component Engineering (GPCE'09)

July 10: Software Language Engineering (SLE 2009)

I'm looking forward to your submission, and to meeting you in Denver.

2009

Research challenge for 2009: trust.

As mentioned before, we've been doing some real parsing research to better support parsers for extensible languages. Parse table composition provides separate compilation for syntax components, such that syntax extensions can be provided as plugins to a compiler for a base language. Due to various distractions last summer, I seem to have forgotten to blog about the paper that Martin Bravenboer and I got accepted at the First International Conference on Software Language Engineering (which Martin was looking forward to).

M. Bravenboer and E. Visser. Parse Table Composition: Separate Compilation and Binary Extensibility of Grammars. In D. Gasevic and E. van Wyk, editors, First International Conference on Software Language Engineering (SLE 2008). To appear in Lecture Notes in Computer Science, Heidelberg, 2009. Springer. [pdf]


Abstract: Module systems, separate compilation, deployment of binary components, and dynamic linking have enjoyed wide acceptance in programming languages and systems. In contrast, the syntax of languages is usually defined in a non-modular way, cannot be compiled separately, cannot easily be combined with the syntax of other languages, and cannot be deployed as a component for later composition. Grammar formalisms that do support modules use whole-program compilation.

Current extensible compilers focus on source-level extensibility, which requires users to compile the compiler with a specific configuration of extensions. A compound parser needs to be generated for every combination of extensions. The generation of parse tables is expensive, which is a particular problem when the composition configuration is not fixed, i.e. when users are free to choose language extensions.

In this paper we introduce an algorithm for parse table composition to support separate compilation of grammars to parse table components. Parse table components can be composed (linked) efficiently at runtime, i.e. just before parsing. While the worst-case time complexity of parse table composition is exponential (like the complexity of parse table generation itself), for realistic language combination scenarios involving grammars for real languages, our parse table composition algorithm is an order of magnitude faster than computation of the parse table for the combined grammars.

The experimental parser generator is available online.
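To give a feel for the idea of composing syntax components, here is a toy sketch in Python. To be clear: this is not the paper's algorithm, which composes (S)LR parse table components; the sketch merely merges the rule sets of a base grammar and an extension and then parses with a textbook Earley recognizer. All grammars and names here are invented for illustration.

```python
# Toy illustration of syntax composition: a base grammar and an
# extension are "linked" by merging their production sets, and the
# combined language is parsed with a small Earley recognizer.
BASE = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["int"]],
}
EXTENSION = {  # a syntax extension adding parentheses to the base language
    "T": [["(", "E", ")"]],
}

def compose(*grammars):
    """Link grammar components by merging their production sets."""
    merged = {}
    for g in grammars:
        for nt, prods in g.items():
            merged.setdefault(nt, []).extend(prods)
    return merged

def recognize(grammar, start, tokens):
    """Earley recognizer for a context-free grammar (no epsilon rules,
    for brevity). A state is (nonterminal, production, dot, origin)."""
    chart = [set() for _ in range(len(tokens) + 1)]
    for prod in grammar[start]:
        chart[0].add((start, tuple(prod), 0, 0))
    for i in range(len(tokens) + 1):
        agenda = list(chart[i])
        while agenda:
            nt, prod, dot, origin = agenda.pop()
            if dot < len(prod):
                sym = prod[dot]
                if sym in grammar:  # predict
                    for p in grammar[sym]:
                        s = (sym, tuple(p), 0, i)
                        if s not in chart[i]:
                            chart[i].add(s)
                            agenda.append(s)
                elif i < len(tokens) and tokens[i] == sym:  # scan
                    chart[i + 1].add((nt, prod, dot + 1, origin))
            else:  # complete
                for nt2, prod2, dot2, origin2 in list(chart[origin]):
                    if dot2 < len(prod2) and prod2[dot2] == nt:
                        s = (nt2, prod2, dot2 + 1, origin2)
                        if s not in chart[i]:
                            chart[i].add(s)
                            agenda.append(s)
    return any((start, tuple(p), len(p), 0) in chart[len(tokens)]
               for p in grammar[start])

g = compose(BASE, EXTENSION)
print(recognize(g, "E", ["int", "+", "(", "int", "+", "int", ")"]))  # True
```

The point of the real algorithm is that the composition (linking) step is cheap compared to regenerating the parse table for the combined grammar from scratch.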

Last summer I attended the Code Generation 2008 conference in Cambridge to give a tutorial on WebDSL as a case study in domain-specific language engineering. The conference was an interesting change from the usual academic conferences I visit, in that the majority of the audience was from industry. It was good to see the interest in code generation in industry, but also disconcerting to observe the gap between academic research and industrial practice; but more about that some other time.

agent tratt

During the conference I was interviewed by Laurence Tratt for Software Engineering Radio about parsing. The interview podcast recently appeared as Episode 118.

It was a long time ago (1997) that I defended my PhD thesis, which was mostly about syntax definition and parsing. In particular, I introduced SDF2, which radically integrates lexical and context-free syntax, and the SGLR parsing algorithm for parsing arbitrary 'character-level' context-free grammars. Since finishing my thesis I have done quite a bit of 'applied parsing research', using SDF and SGLR for applications such as meta-programming with concrete object syntax and DSL embedding, but I don't consider myself a hard-core parsing researcher any more. So I had to dig deep in my memory to talk about Noam Chomsky's language hierarchy, grammars as string rewrite systems, and parsing algorithms. I find the result a bit awkward to listen to, but people assure me that that is because it is my own voice I'm listening to.

In the meantime my relation to parsing is changing again. While SDF/SGLR still provides the best approach to declarative definition of composite languages (in my opinion at least), it has some fundamental limitations which have never been addressed. A first step in addressing these limitations was taken in the SLE 2008 paper with Martin Bravenboer on parse table composition (see upcoming blog) to provide separate compilation for grammars. With a new PhD student starting in the new year, I hope to address other limitations such as the lack of error recovery.