Adding Error Recovery to Scannerless Generalized-LR Parsing

We just got the notification that our submission to OOPSLA 2009 has been accepted. The paper presents a solution to error recovery for the SGLR parsing algorithm. Here’s the full citation and abstract (pre-print will follow later):

Lennart C. L. Kats, Maartje de Jonge, Emma Nilsson-Nyman, and Eelco Visser. “Providing Rapid Feedback in Generated Modular Language Environments. Adding Error Recovery to Scannerless Generalized-LR Parsing” In Gary T. Leavens, editor, Proceedings of the 24th ACM SIGPLAN Conference on Object-Oriented Programing, Systems, Languages, and Applications (OOPSLA 2009), New York, NY, USA, October 2009. ACM. (to appear).

Abstract: Integrated Development Environments (IDEs) increase programmer productivity, providing rapid, interactive feedback based on the syntax and semantics of a language. A heavy burden lies on developers of new languages to provide adequate IDE support. Code generation techniques provide a viable, efficient approach to semi-automatically produce IDE plugins. Key components for the realization of plugins are the language’s grammar and parser. For embedded languages and language extensions, constituent IDE plugin modules and their grammars can be combined. Unlike conventional parsing algorithms, scannerless generalized-LR parsing supports the full set of context-free grammars, which is closed under composition, and hence can parse language embeddings and extensions composed from separate grammar modules. To apply this algorithm in an interactive environment, this paper introduces a novel error recovery mechanism, which allows it to be used with files with syntax errors – common in interactive editing. Error recovery is vital for providing rapid feedback in case of syntax errors, as most IDE services depend on the parser – from syntax highlighting to semantic analysis and cross-referencing. We base our approach on the principles of island grammars, and automatically generate new productions for existing grammars, making them more permissive of their inputs. To cope with the added complexity of these grammars, we adapt the parser to support backtracking. We evaluate the recovery quality and performance of our approach using a set of composed languages, based on Java and Stratego.