Automated Evaluation of Syntax Error Recovery

September 12, 2012

Triggered by the bridge parsing paper that Emma Nilsson-Nyman presented at SLE 2008, we started working in 2009 on error recovery for SGLR parsing, in order to make Spoofax editors robust in the presence of syntactic errors. Most editor services, from syntax highlighting to code completion, depend on an abstract syntax tree. Since programs are frequently in a syntactically incorrect state during editing, many editor services would break without parse error recovery.

Because of the parallel, forking nature of GLR parsing, error recovery looked like an impossible problem to solve. We ended up developing an interesting mix of techniques, combining permissive grammars, a back-tracking extension of SGLR, and layout-sensitive discovery of error regions, which together produce good error recovery without intervention from the language designer.

However, evaluating the quality of error recovery turned out to be a laborious process with many pitfalls. An ASE 2012 short paper, which Maartje presented at the conference last week, offers a solution to this problem. By generating programs with errors from correct programs, we cheaply obtain a large collection of test programs for which a good recovery is known. The generators insert errors at random, guided by rules that model the kinds of syntax errors that typically occur during programming.
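To give a flavor of the approach, here is a rough sketch of what such an error-seeding generator could look like. The concrete rules and implementation are described in the paper; the three mutation rules below (dropping a closing bracket, truncating a line, deleting a token) and all names are merely illustrative stand-ins for typical editing errors.

    import random

    # Illustrative mutation rules (not the paper's actual rule set) that mimic
    # typical editing errors in a syntactically correct input program.

    def drop_closing_bracket(lines):
        # Remove one closing bracket, producing an unbalanced construct.
        candidates = [i for i, line in enumerate(lines) if any(c in line for c in ")]}")]
        if not candidates:
            return None
        i = random.choice(candidates)
        pos = max(lines[i].rfind(c) for c in ")]}")
        lines[i] = lines[i][:pos] + lines[i][pos + 1:]
        return lines

    def truncate_line(lines):
        # Cut a line short, simulating an incomplete construct while typing.
        candidates = [i for i, line in enumerate(lines) if len(line.strip()) > 3]
        if not candidates:
            return None
        i = random.choice(candidates)
        lines[i] = lines[i][:random.randint(1, len(lines[i]) - 1)]
        return lines

    def delete_token(lines):
        # Delete one token, e.g. a keyword or separator, from a random line.
        candidates = [i for i, line in enumerate(lines) if len(line.split()) > 1]
        if not candidates:
            return None
        i = random.choice(candidates)
        indent = lines[i][:len(lines[i]) - len(lines[i].lstrip())]
        tokens = lines[i].split()
        tokens.pop(random.randrange(len(tokens)))
        lines[i] = indent + " ".join(tokens)
        return lines

    MUTATIONS = [drop_closing_bracket, truncate_line, delete_token]

    def seed_error(program):
        # Apply one randomly chosen, applicable mutation to a correct program.
        lines = program.splitlines()
        for mutate in random.sample(MUTATIONS, len(MUTATIONS)):
            mutated = mutate(list(lines))
            if mutated is not None:
                return "\n".join(mutated)
        return program  # no rule applicable; return the program unchanged

Pairing each mutated program with its correct original is what makes automated measurement possible: the parse of the original serves as a baseline against which the recovered parse can be judged.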

Maartje de Jonge, Eelco Visser. Automated Evaluation of Syntax Error Recovery. In 27th IEEE/ACM International Conference on Automated Software Engineering (ASE 2012), September 3-7, 2012, Essen, Germany, pages 322-325. ACM, 2012.

Abstract: Evaluation of parse error recovery techniques is an open problem. The community lacks objective standards and methods to measure the quality of recovery results. This paper proposes an automated technique for recovery evaluation that offers a solution for two main problems in this area. First, a representative testset is generated by a mutation based fuzzing technique that applies knowledge about common syntax errors. Secondly, the quality of the recovery results is automatically measured using an oracle-based evaluation technique. We evaluate the validity of our approach by comparing results obtained by automated evaluation with results obtained by manual inspection. The evaluation shows a clear correspondence between our quality metric and human judgement.
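As an illustration of what an oracle-based comparison can look like, one could count the nodes on which the recovered AST deviates from the AST of the original, error-free program. This is a simplified stand-in, not the quality metric defined in the paper; the tuple-based term representation and the names below are assumptions of this sketch.

    def size(term):
        # Number of nodes in a term tree, represented as (constructor, child, ...).
        if isinstance(term, tuple):
            return 1 + sum(size(child) for child in term[1:])
        return 1

    def diff(oracle, recovered):
        # Count mismatching nodes between the oracle term and the recovered term.
        if not (isinstance(oracle, tuple) and isinstance(recovered, tuple)):
            return 0 if oracle == recovered else max(size(oracle), size(recovered))
        mismatches = 0 if oracle[0] == recovered[0] else 1
        for a, b in zip(oracle[1:], recovered[1:]):
            mismatches += diff(a, b)
        # Children present in only one of the two trees count fully as differences.
        longer, shorter = (oracle, recovered) if len(oracle) > len(recovered) else (recovered, oracle)
        mismatches += sum(size(child) for child in longer[len(shorter):])
        return mismatches

    def recovery_quality(oracle_ast, recovered_ast):
        # 1.0 means the recovered tree equals the oracle; lower means more damage.
        return max(0.0, 1.0 - diff(oracle_ast, recovered_ast) / size(oracle_ast))

    # Example: the recovery replaced an empty else-block with an error node.
    oracle    = ("If", ("Var", "x"), ("Block", ("Call", "f")), ("Block",))
    recovered = ("If", ("Var", "x"), ("Block", ("Call", "f")), ("Error",))
    print(recovery_quality(oracle, recovered))  # ~0.86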