In September 2018, Marieke Huisman and I organized a workshop in the Lorentz Center in Leiden titled “A Research Agenda for Formal Methods in the Netherlands”. The extended abstracts contributed to the workshop were bundled by Wouter Swierstra in a Utrecht University Technical Report. Below is my contribution.

Fast and Safe Linguistic Abstraction for the Masses

Abstract

Language workbenches support the high-level definition of (domain-specific) programming languages and the automatic derivation of implementations from such definitions. The mission of language workbench research is to increase the level of abstraction of language definitions and expand the range of tools that can be generated automatically from language definitions. In this note, I give an overview of research into language workbenches at TU Delft and the perspective of future research.

Linguistic Abstraction

Software engineering is the act of encoding intent into programs. A mismatch between the concepts being encoded and the intent to be expressed leads to unnecessary software complexity. Linguistic abstraction is the process of turning software design patterns into language constructs in order to support expressing intent at the right level of abstraction [15]. A linguistic abstraction provides notation, static checking, and implementation strategies that take the understanding of the domain into account. For example, in the IceDust language for data modeling, a programmer directly expresses how derived values are computed from base values, without being concerned about implementation techniques for caching computed values and incrementally recomputing values upon data changes; an implementation of an efficient incremental re-computation strategy is generated automatically from an IceDust program [3–6]. While many DSLs exist, the process of developing DSLs should become more systematic and requires further industrial case studies.

Linguistic Abstraction for the Masses

A domain-specific programming language can be beneficial for its programmers, but can be expensive to develop. Language workbenches are tools to reduce the cost of DSL development by generating language implementation from language definitions [1, 2]. Spoofax is a language workbench developed in the Programming Languages group at TU Delft [7, 16]. Spoofax has been applied to DSLs in industry (Oracle, Océ) and academia. Linguistic Abstraction for Linguistic Abstraction Rather than implememting languages in a general purpose language, language workbenches provide domain-specific abstractions for language development, i.e. linguistic abstractions for linguistic abstraction. Spoofax is a test bed for the exploration of the design of high-level declarative meta-languages for various aspects of language definition. A recent example is the development of the Statix DSL for the specification of type systems [11, 12]. The DSL is based on a novel approach to the formalization of name binding in programming languages using scope graphs [8]. A scope graph is an abstraction of a program that represents its binding facts. This abstraction allows the definition of a language independent calculus to define name resolution in programs, giving rise to a range of language independent tools.

Fast Linguistic Abstraction

Traditionally, language workbenches define code generators to realize the implementation of DSLs programs. The disadvantage of this approach is that the semantics of the language is defined through a translation relation in terms of a target language. We are investigating whether it is possible to define directly the semantics of languages and derive an implementation from such definitions. The DynSem meta-language is designed for the high-level declarative specification of the operational semantics of programming languages [13]. DynSem specifications can be executed directly using a meta-interpreter that interprets a DynSem specification with respect to an object program, which gives rise to two levels of interpretation and significant overhead. A promising direction of research is to use partial evaluation techniques to specialize the application of a meta-interpreter [14] to provide both a fast turn around time for language design experiments and at the same time get a fast run time for the language under specification.

Safe Linguistic Abstraction

In addition to producing implementations, the proper design of a programming language requires proofs that a design satisfies properties such as type soundness and semantics preservation of transformations. Traditionally, proving such properties is separated from language implementation, leading to proofs for only subsets of a language and a divergence of specification and implementation. We are investigating meta languages that enable the automatic verification of such properties. A first proof of concept automates the verification of type soundness. A complication in type soundness proofs is the alignment of the binding of names statically and at run time. We have developed a systematic approach to formalizing this alignment by relating scopes in scope graphs to frames in frame heaps [9]. By encoding a scope graph based type system as an intrinsically typed abstract syntax signature, type checking the evaluation rules of a definition interpreter against such a signature entails type soundness of the language under definition [10].

Future Work

In this abstract I have described several proof of concept components of a development environment for creating new (domain-specific) programming languages from declarative specifications that are safe by construction and that have fast runtimes. Fully realizing the promise of these proofs of concept requires scaling up the techniques to more advanced programming languages and large code bases.

References

[1] Sebastian Erdweg, Tijs van der Storm, Markus Völter, Meinte Boersma, Remi Bosman, William R. Cook, Albert Gerritsen, Angelo Hulshout, Steven Kelly, Alex Loh, Gabriël Konat, Pedro J. Molina, Martin Palatnik, Risto Pohjonen, Eugen Schindler, Klemens Schindler, Riccardo Solmi, Vlad A. Vergu, Eelco Visser, Kevin van der Vlist, Guido Wachsmuth, and Jimi van der Woning. The state of the art in language workbenches - conclusions from the language workbench challenge. In Martin Erwig, Richard F. Paige, and Eric Van Wyk, editors, Software Language Engineering - 6th International Conference, SLE 2013, Indianapolis, IN, USA, October 26-28, 2013. Proceedings, volume 8225 of Lecture Notes in Computer Science, pages 197–217. Springer, 2013.

[2] Sebastian Erdweg, Tijs van der Storm, Markus VÃűlter, Laurence Tratt, Remi Bosman, William R. Cook, Albert Gerritsen, Angelo Hulshout, Steven Kelly, Alex Loh, Gabriël Konat, Pedro J. Molina, Martin Palatnik, Risto Pohjonen, Eugen Schindler, Klemens Schindler, Riccardo Solmi, Vlad A. Vergu, Eelco Visser, Kevin van der Vlist, Guido Wachsmuth, and Jimi van der Woning. Evaluating and comparing language workbenches: Existing results and benchmarks for the future. Computer Languages, Systems & Structures, 44:24–47, 2015.

[3] Daco Harkes, Danny M. Groenewegen, and Eelco Visser. Icedust: Incremental and eventual computation of derived values in persistent object graphs. In Shriram Krishnamurthi and Benjamin S. Lerner, editors, 30th European Conference on Object-Oriented Programming, ECOOP 2016, July 18-22, 2016, Rome, Italy, volume 56 of LIPIcs. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016.

[4] Daco Harkes and Eelco Visser. Unifying and generalizing relations in role-based data modeling and navigation. In Benoît Combemale, David J. Pearce, Olivier Barais, and Jurgen J. Vinju, editors, Software Language Engineering - 7th International Conference, SLE 2014, Västeras, Sweden, September 15-16, 2014. Proceedings, volume 8706 of Lecture Notes in Computer Science, pages 241–260. Springer, 2014.

[5] Daco Harkes and Eelco Visser. Icedust 2: Derived bidirectional relations and calculation strategy composition. In Peter Müller, editor, 31st European Conference on Object-Oriented Programming, ECOOP 2017, June 19-23, 2017, Barcelona, Spain, volume 74 of LIPIcs. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2017.

[6] Daco C. Harkes, Elmer van Chastelet, and Eelco Visser. Migrating business logic to an incremental computing dsl: a case study. In David Pearce 0005, Tanja Mayerhofer, and Friedrich Steimann, editors, Proceedings of the 11th ACM SIGPLAN International Conference on Software Language Engineering, SLE 2018, Boston, MA, USA, November 05-06, 2018, pages 83–96. ACM, 2018.

[7] Lennart C. L. Kats and Eelco Visser. The Spoofax language workbench: rules for declarative specification of languages and IDEs. In William R. Cook, SiobhÃąn Clarke, and Martin C. Rinard, editors, Proceedings of the 25th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2010, pages 444–463, Reno/Tahoe, Nevada, 2010. ACM.

[8] Pierre NÃľron, Andrew P. Tolmach, Eelco Visser, and Guido Wachsmuth. A theory of name resolution. In Jan Vitek, editor, Programming Languages and Systems - 24th European Symposium on Programming, ESOP 2015, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2015, London, UK, April 11-18, 2015. Proceedings, volume 9032 of Lecture Notes in Computer Science, pages 205–231. Springer, 2015.

[9] Casper Bach Poulsen, Pierre Néron, Andrew P. Tolmach, and Eelco Visser. Scopes describe frames: A uniform model for memory layout in dynamic semantics. In Shriram Krishnamurthi and Benjamin S. Lerner, editors, 30th European Conference on Object-Oriented Programming, ECOOP 2016, July 18-22, 2016, Rome, Italy, volume 56 of LIPIcs. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016.

[10] Casper Bach Poulsen, Arjen Rouvoet, Andrew P. Tolmach, Robbert Krebbers, and Eelco Visser. Intrinsically-typed definitional interpreters for imperative languages. Proceedings of the ACM on Programming Languages, 2(POPL), 2018.

[11] Hendrik van Antwerpen, Pierre Néron, Andrew P. Tolmach, Eelco Visser, and Guido Wachsmuth. A constraint language for static semantic analysis based on scope graphs. In Martin Erwig and Tiark Rompf, editors, Proceedings of the 2016 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, PEPM 2016, St. Petersburg, FL, USA, January 20 - 22, 2016, pages 49–60. ACM, 2016.

[12] Hendrik van Antwerpen, Casper Bach Poulsen, Arjen Rouvoet, and Eelco Visser. Scopes as types. Proceedings of the ACM on Programming Languages, 2(OOPSLA), 2018.

[13] Vlad A. Vergu, Pierre Néron, and Eelco Visser. Dynsem: A dsl for dynamic semantics specification. In Maribel FernÃąndez, editor, 26th International Conference on Rewriting Techniques and Applications, RTA 2015, June 29 to July 1, 2015, Warsaw, Poland, volume 36 of LIPIcs, pages 365–378. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2015.

[14] Vlad A. Vergu and Eelco Visser. Specializing a meta-interpreter: Jit compilation of Dynsem specifications on the Graal vm. In Eli Tilevich and Hanspeter MÃűssenbÃűck, editors, Proceedings of the 15th International Conference on Managed Languages & Runtimes, ManLang 2018, Linz, Austria, September 12-14, 2018. ACM, 2018.

[15] Eelco Visser. Understanding software through linguistic abstraction. Science of Computer Programming, 97:11–16, 2015.

[16] Guido Wachsmuth, Gabriël Konat, and Eelco Visser. Language design with the spoofax language workbench. IEEE Software, 31(5):35–43, 2014.

I gave a talk in the PL seminar at TU Delft about our work in progress on formalizing the semantics of disambiguation by associativity and priority rules.

A Direct Semantics of Declarative Disambiguation Rules from Eelco Visser

These are the slides of my talk at the IFIP WG2.16 (Language Design) meeting in Portland in February 2019. The code of the slides can be found in the Statix Sandbox repository on github.

Abstract: In this talk I present the design of Statix, a new constraint-based language for the executable specification of type systems. Statix specifications consist of predicates that define the well-formedness of language constructs in terms of built-in and user-defined constraints. Statix has a declarative semantics that defines whether a model satisfies a constraint. The operational semantics of Statix is defined as a sound constraint solving algorithm that searches for a solution for a constraint. The aim of the design is that Statix users can ignore the execution order of constraint solving and think in terms of the declarative semantics.

A distinctive feature of Statix is its use of scope graphs, a language parametric framework for the representation and querying of the name binding facts in programs. Since types depend on name resolution and name resolution may depend on types, it is typically not possible to construct the entire scope graph of a program before type constraint resolution. In (algorithmic) type system specifications this leads to explicit staging of the construction and querying of the type environment (class table, symbol table). Statix automatically stages the construction of the scope graph of a program such that queries are never executed when their answers may be affected by future scope graph extension. In the talk, I will explain the design of Statix by means of examples.

At PEPM 2016 we presented the design of a constraint language for type checking based on scope graphs. That design was the basis for the NaBL2 name and type analysis language that is currently integrated into the Spoofax language workbench. While the approach deals elegantly with a large variety of name binding patterns, including type-dependent name resolution, it was limited to `simple’ type systems. In particular, NaBL2 does not support type systems with structural (anonymous) types and generic (parameterized) types. In our OOPSLA 2018 paper we show that these limitations are not intrinsic to the scope graph model. We show that by using scopes as types, we can model such types. The picture shows the scope graph for a program with structural record types.

To be presented next week at SPLASH 2018 in Boston.

Hendrik van Antwerpen, Casper Bach Poulsen, Arjen Rouvoet, Eelco Visser. Scopes as Types. Proceedings of the ACM on Programming Languages, Volume 2 Issue OOPSLA, November 2018, Article No. 114. 10.1145/3276484

Abstract: Scope graphs are a promising generic framework for modeling the binding structures of programming languages, briding formalization and implementation, supporting the definition of type checkers and the automation of type safety proofs. However, previous work on scope graphs has been limited to simple, nominal type systems. In this paper, we show that viewing scopes as types enables us to model the internal structure of types in a range of non-simple type systems (including structural records and generic classes) using the generic representation of scopes. Further, we show that relations between such types can be expressed in terms of generalized scope graph queries. We extend scope graphs with scoped relations and queries. We introduce Statix, a new domain-specific meta-language for the specification of static semantics, based on scope graphs and constraints. We evaluate the scopes as types approach and the Statix design in case studies of STLCREC, System F, and Featherweight Generic Java.

Following up on our paper about the design of the PIE build DSL, a paper at the ASE 2018 conference, to be presented by Gabriël Konat next week, deals with the poor scalability of the Pluto algorithm we used for the implementation of PIE. Since the algorithm inspects the entire dependency graph, even if few nodes are affected by a change, there is a large overhead that becomes significant in large projects. This paper extends the Pluto algorithm with a bottom-up traversal of the dependency graph that only visits nodes affected by a change, dramatically reducing build times in most scenarios.

Gabriël Konat, Sebastian Erdweg, and Eelco Visser. 2018. Scalable incremental building with dynamic task dependencies. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE 2018). ACM, New York, NY, USA, 76-86. DOI: https://doi.org/10.1145/3238147.3238196

Abstract:

Incremental build systems are essential for fast, reproducible software builds. Incremental build systems enable short feedback cycles when they capture dependencies precisely and selectively execute build tasks efficiently. A much overlooked feature of build systems is the expressiveness of the scripting language, which directly influences the maintainability of build scripts. In this paper, we present a new incremental build algorithm that allows build engineers to use a full-fledged programming language with explicit task invocation, value and file inspection facilities, and conditional and iterative language constructs. In contrast to prior work on incrementality for such programmable builds, our algorithm scales with the number of tasks affected by a change and is independent of the size of the software project being built. Specifically, our algorithm accepts a set of changed files, transitively detects and re-executes affected build tasks, but also accounts for new task dependencies discovered during building. We have evaluated the performance of our algorithm in a real-world case study and confirm its scalability.