In 2008, error recovery for generalized (scannerless) LR parsing appeared to be an unsolvable problem. Now, users of Spoofax get an error recovering parser for free, based on the SDF syntax definition of a language. The integration of our earlier OOPSLA 2009 and SLE 2009 papers into a journal article has now been accepted for publication in ACM TOPLAS. (It may take a while before the paper is actually published; we'll provide a pre-print as soon as possible.)

Maartje de Jonge, Lennart C. L. Kats, Emma Soderberg, Eelco Visser. Natural and Flexible Error Recovery for Generated Modular Language Environments. ACM Transactions on Programming Languages and Systems, 2013 (to appear).

Abstract: Integrated development environments (IDEs) increase programmer productivity, providing rapid, interactive feedback based on the syntax and semantics of a language. Unlike conventional parsing algorithms, scannerless generalized-LR parsing supports the full set of context-free grammars, which is closed under composition, and hence can parse languages composed from separate grammar modules. To apply this algorithm in an interactive environment, this paper introduces a novel error recovery mechanism. Our approach is language-independent, and relies on automatic derivation of recovery rules from grammars. By taking layout information into consideration it can efficiently suggest natural recovery suggestions.

Update: The publication process went faster than expected and the article was published in December 2012.

Today I gave a mini-tutorial at the 5th International Conference on Software Language Engineering (SLE 2012) attempting to explain grammarware to meta-modeling researchers. Since grammarware is a huge area, I chose to discuss a selection of memes from grammarware, illustrated with examples from the Spoofax 'technological space' (as they say). Here are the slides. A recording of the talk may be published as well. Note that the slides do not have much text explaining what they are about. We plan to elaborate the tutorial material in an associated paper for the SLE 2012 post-proceedings.

Read more

Yesterday we deployed a new release of researchr. The release fixes a bug that prevented creation of personal libraries. We have recently also addressed a number of performance bugs that became manifest at scale. But the most significant change in functionality is the advanced support for faceted search. The self implemented support for faceted search allowed selection of publications in a publication list for a single facet at a time. Now you can explore the intersection of multiple facet categories and taking the union or intersection of multiple facets within a category. Try it out. For example, explore the publications on domain-specific language design and find the publications from 2009 tagged 'grammar', or find the publications in the 'oopsla' venue tagged 'language workbench'.

The custom implementation of faceted search lacked features, was slow, not always correct, and took considerable effort to implement. The new faceted search is no longer a custom implementation specific for researchr, but is based on the search DSL for WebDSL that Elmer van Chastelet has been working on for his Master's thesis project. The DSL provides abstractions for search indexing with Apache Lucene and Hibernate Search and a query language for searching objects. With these features, any WebDSL application can use these powerful libraries with a fraction of the effort it took my custom implementation. A full description of the DSL is under construction.

Triggered by the bridge parsing paper that Emma Nilsson-Nyman presented at SLE 2008, we started working in 2009 on error recovery for SGLR parsing in order to make Spoofax editors robust in the presence of syntactic errors. Most editor services, from syntax highlighting to code completion, depend on an abstract syntax tree. Since the programs are frequently in a syntactically incorrect state during editing, many editor services would break without parse error recovery.

Because of the parallel forking nature of GLR parsing, error recovery looked like an impossible problem to solve. We ended up developing an interesting mix of techniques consisting of permissive grammars, a back-tracking extension of SGLR, and layout-sensitive error region discovery that produce good error recovery without language designer intervention.

However, evaluating the quality of error recovery turned to be a laborious process with lots of pitfalls. In an ASE 2012 short paper that Maartje presented last week at the conference, a solution to this problem is presented. By generating programs with errors from correct programs, we cheaply get a large collection of test programs for which we know a good recovery. The generators randomly insert errors guided by rules about typical types of errors that occur during programming.

Maartje de Jonge, Eelco Visser. Automated Evaluation of Syntax Error Recovery. In 27th IEEE/ACM International Conference on Automated Software Engineering (ASE 2012), September 3-7, Essen, Germany. pages 322-325, ACM, 2012.

Abstract: Evaluation of parse error recovery techniques is an open problem. The community lacks objective standards and methods to measure the quality of recovery results. This paper proposes an automated technique for recovery evaluation that offers a solution for two main problems in this area. First, a representative testset is generated by a mutation based fuzzing technique that applies knowledge about common syntax errors. Secondly, the quality of the recovery results is automatically measured using an oracle-based evaluation technique. We evaluate the validity of our approach by comparing results obtained by automated evaluation with results obtained by manual inspection. The evaluation shows a clear correspondence between our quality metric and human judgement.