A Growable Language Manifesto
Warning: this is by far the largest and most action-packed post I’ve ever made. Grab a cup of [insert favored beverage here], sit back, and enjoy. If you get headcramp from reading this in narrow-column blog format, there’s another full-screen version here — but please return to this post to leave your comments!
I’ve posted recently about the dynamic versus static flamewar and about recent research in extensible languages. The more I think about these ideas, the more compelling they get. It seems clear to me that these ideas are synergistic and point in the direction of a new kind of programming environment — one which could potentially offer the ease of dynamic languages, the safety of static languages, and an altogether new level of extensibility (both for enhancing the type system and for allowing safe metaprogramming.)
So I want to lay out a manifesto here for a growable programming language. Or perhaps it’s more like a toolbox for language construction. Or perhaps it’s a framework for experimenting with extensible syntax and extensible analysis. In any case, here is the vision. At the end, I include extensive references to the research that’s inspired these ideas.
- A growable language should be opt-in. It must support a graceful spectrum of usage styles. A growable language will necessarily be multi-paradigmatic, since it will be extensible with features from various fields of programming. The language should implement a dynamically-typed core on which all static analysis is built, to allow programmers a straightforward and lightweight syntax for prototyping.
- A growable language should be increasingly powerful. It should support a rich and expanding variety of static analyses and dynamic checking, to give maximal benefit to the programmer who wishes to leverage these features. Over time, as more program analyses become possible, a growable language should gracefully integrate them. A growable language must continually increase the expressive power of the programmer, both in terms of what the programmer can say and in terms of what the environment can help the programmer prove about their program. A growable language should have a powerful analysis framework as a foundational component, such that other analyses can share that basic capability; wherever possible, additional analyses should require only incremental effort by leveraging infrastructure already created.
- A growable language needs a declarative metalanguage. Current languages do not define inherent mechanisms for syntax extension and for type extension. The syntax and the type system are both baked into the language’s compiler. A growable language needs to lift the specification level of the language itself to be both declarative and modularly defined, so that syntax and type analytics can both be expressed and implemented as layered extensions.
- A growable language needs partial typing. Certain analyses are necessarily incomplete or even undecidable. A growable language should permit such type systems, and should be able to fall back on dynamic typing whenever analysis is inconclusive. This lets programmers work freely without having to continually meet the demands of the type system, yet supports graceful type annotation to enhance static safety at the programmer’s discretion. (Without partial typing, the opt-in goal cannot be met.) The language should ideally provide clear feedback as to when its analyses are conclusive or inconclusive, and ideally should identify the sources of inconclusiveness so the programmer can either annotate them appropriately or deliberately ignore them.
- A growable language needs layered types. A growable language should be able to extend itself with primitive types, dependent types (e.g. data-dependent subrange types), traits or other parametric or abstraction-based genericity mechanisms, and multiple flavors of type qualifiers. Integers should be as much of a language extension as non-null types, tainted types, or range types. A growable language requires an extensible subtype lattice.
- A growable language needs inferrable types. To avoid drowning in explicit type declarations, the language should be able to infer types wherever possible, and the programming environment should support controllable visibility for the inferred type information. Without inferred types and environmental support for controlling analysis visibility, a growable language cannot scale for users; being able to selectively ignore some or all of the (many) analyses is critical.
- A growable language needs explicit, optional type annotations. An extensible analysis framework will be able to infer or deduce a great deal about the actual behavior of the program. But the actual behavior of the program may or may not correspond to the programmer’s intent. The programmer’s explicit annotations express the programmer’s expectations. A programmer might look at the analyzed annotations and make some of them explicit — perhaps they reflect properties of the code that the programmer considers important after the fact, and that the programmer now wants to enforce. Or a programmer might add explicit annotations during development, such that the system can confirm they are valid and warn when they are violated. Explicit annotations at module boundaries — whether programmer-generated or system-generated — are likely to aid in separate compilation and in module documentation.
- A growable language must be efficiently implementable. As more language features are added — more qualifiers, more types, more syntax — the language must still be efficient and usable. This applies both to programs written in the language (which should have performance competitive at least with Java or the CLR) and to programming environments for the language. The latter requires aggressive attention to analytical optimizations and to multi-threaded analysis frameworks. As the language’s analysis structure grows, the language’s programming environments must be able to leverage multicore architectures to ensure consistent responsiveness for users. Moreover, the language should be continuously analyzing in the background; incremental feedback as the user edits should be available for all language features and extensions.
- A growable language must have a unified internal representation. Concrete syntax, abstract syntax, type and name bindings, type qualifier propagation, type parameterization, dependent range type constraints, and logical queries over the program’s alias structure should all leverage a single internal representation of the program. This maximizes the reusability of internal language implementation code and ensures consistency in analytical derivations. Where multiple representations are necessary, the derivation rules must be clear and consistent.
- A growable language must promote static metaprogramming. Fully dynamic metaprogramming — runtime extension of classes and objects with arbitrary code, or even unrestricted dynamic reflective access to arbitrary methods — is almost impossible to analyze effectively. To give a growable language’s extended type systems maximum exposure to the actual behavior of the code, static metaprogramming techniques must be definable, extensible, and compatible with the rest of the language’s structure. One would hope that the very extensibility techniques that implement the language itself would be usable for static metaprogramming.
- A growable language must support diverse analyses. Some analyses are most naturally expressed as systems of recursive equations over an abstract syntax tree. Others can best be expressed as logical queries over a program’s alias graph. Ideally these could both be expressed naturally in the metalanguage.
- A growable language must be analytically composable. This is likely the single most technically ambitious goal. Traditional compiler development involves subtle and explicit scheduling tradeoffs, layering multiple analyses through manual interleavings that can be fragile or non-optimal. A growable language with a declarative metalanguage needs an analysis engine that can automatically schedule multiple, potentially interleaved analyses, leveraging parallelism where possible to optimize non-dependent analyses. Achieving this goal will be immensely difficult, but proportionately valuable. Interestingly, here is where the metalanguage itself will require many of the same extensibility properties as the language it describes; meta-level closure — implementing the metalanguage using the language itself — will be a holy grail for this language design.
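To make the opt-in, partial typing, and optional annotation goals above a bit more concrete, here is a minimal sketch in Python. Everything in it (the tiny expression classes, the `infer` function, the `DYNAMIC` marker) is hypothetical and invented for illustration; it is not taken from any of the systems referenced in this post. The idea is only that an analysis can return a static type where it can decide, fall back to a dynamic type where it can't, and report which names caused the fallback.

```python
from dataclasses import dataclass
from typing import Union

DYNAMIC = "?"  # hypothetical marker for "unknown statically, check at run time"


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Var, Add]


def infer(expr, env, inconclusive):
    """Return a static type name, or DYNAMIC when the analysis cannot decide.

    env maps variable names to optional annotations (None means unannotated).
    Unannotated names are appended to `inconclusive` so that a tool could
    show the programmer where an annotation would help.
    """
    if isinstance(expr, Lit):
        return type(expr.value).__name__
    if isinstance(expr, Var):
        annotation = env.get(expr.name)
        if annotation is None:
            inconclusive.append(expr.name)
            return DYNAMIC
        return annotation
    if isinstance(expr, Add):
        left = infer(expr.left, env, inconclusive)
        right = infer(expr.right, env, inconclusive)
        if DYNAMIC in (left, right):
            return DYNAMIC  # inconclusive: defer to a run-time check
        if left != right:
            raise TypeError(f"cannot add {left} and {right}")
        return left
    raise ValueError(f"unknown expression: {expr!r}")


env = {"x": "int", "y": None}  # x is annotated, y is left dynamic
unknown = []
print(infer(Add(Var("x"), Lit(1)), env, unknown))             # int
print(infer(Add(Var("x"), Var("y")), env, unknown), unknown)  # ? ['y']
```

Annotating `y` as `"int"` would make the second expression conclusive, while annotating it as `"str"` would turn the silent fallback into a static error. That is the graceful spectrum I have in mind, though a real system would of course need far more than equality on type names.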
Is the above language even possible? Is it too much of a stretch — particularly the “unified internal representation” and “analytically composable” goals? Maybe so. I’m definitely not an expert at programming language implementation; the only compiler I ever wrote was back in high school. So this may be ridiculously unrealistic in whole or in part. I welcome feedback on which areas of this manifesto are more or less plausible.
Overall, consider this my stab at answering Paul Graham’s challenge to ponder the hundred-year language. Given current trends in programming language development, it seems that languages of the future will transcend being “languages” as we know them and will become more like unified environments for language creation and extension. Arguably, this vision has a lot in common with intentional programming, which doesn’t bode well, since the intentional guys have been in stealth mode for almost fifteen years and nothing has publicly come of it. But that doesn’t mean the general direction isn’t interesting, any more than the slow progress of Chandler means that a unified and flexible personal information manager isn’t worth pursuing.
I promised references. Here they are:
- opt-in — Gilad Bracha’s pluggable type systems paper is the origin of this goal. Bracha forcefully posits that static type systems are all necessarily incomplete and that dynamically typed languages have necessary flexibility. Meijer makes a similar point in his static where possible, dynamic where necessary paper. I’m not clear that Bracha’s extreme view — that all static analysis must be kept out of the language kernel — is the correct one, given the potential performance cost, but I suppose RPython provides an encouraging counterpoint.
- increasingly powerful — A growing variety of static analyses is being developed, primarily for Java. One recent paper (http://www.cs.umd.edu/~jfoster/papers/oopsla07-uno.pdf) points out that its framework could straightforwardly be implemented on top of other analyses with similar intraprocedural resolution. The BDDBDDB framework has already been used for implementing taint analysis, and the JastAdd system for non-null Java type inference. In general it seems there is a lot of opportunity for shared infrastructure here. Also note that support for excellent error messages and great visibility into analysis results (and analysis failures) will be critical for usability. See Grimm’s paper titled Systems Need Languages Need Systems for some forceful advocacy here.
- a declarative metalanguage — Some good examples of metalanguages for syntactic language description are the OMeta pattern-matching language for executable grammars, Gilad’s executable grammars in Newspeak, and the Rats! extensible parsing system. A good example of an extensible language for static analysis is the JastAdd extensible Java compiler, with its support for defining rewritable circular reference-attributed grammars. They implemented Java 1.5 generics as a modular declarative compiler extension, which proves their point to me, anyway!
- partial typing — The two best examples of this that I know of are the gradual typing work of Siek and Taha, and the hybrid typing for undecidable type systems work of Flanagan, Freund, et al. In both cases, a static type system is enhanced with a generalized type (named Dynamic or “?”), which is inferred to be a more specific static type where possible, and otherwise cast at runtime to preserve dynamic type safety.
- layered types — The already-mentioned JastAdd system is perhaps the best example of a structure which permits layering additional analyses. The extensible Java compiler Polyglot is another.
- inferrable types — My original thinking about this entire line of research originated in a blog post from a couple of months ago where I realized that some implementations of type qualifiers — for example, a “tainted” type qualifier in C++ — would ripple through the whole program due to mandatory explicit static typing everywhere. It’s noteworthy that many type qualifier analyses for Java are not based on explicit syntax. For example, the taint analysis based on the BDDBDDB framework does not require explicit propagation of tainted or untainted declarations, yet it derives such information throughout the program’s structure. An environment which made the results of that analysis visible — and traceable — at every program point would let the programmer see the flow of tainted values without having to explicitly declare them.
- explicit, optional type annotations — Programmers must also be able to add explicit qualifiers, since the programmer’s intent may or may not match the analysis. The analysis may be incomplete, in which case the programmer needs to provide more information; or the analysis may be consistent, and the programmer wants to confirm it and have it enforced at that location (i.e. if the program changes such that the property is no longer true there, the language would signal an error). It should be possible to cross-check the programmer’s explicit intent against the analyser’s implicit understanding. I’m not aware of any inference systems that support this fully; they seem to be either purely inference-based (e.g. JQual) or purely annotation-based (e.g. a C++-based type qualifier system discussed in the “Extending Type Systems in a Library” paper from LCSD ’06).
- efficiently implementable — This is obviously enormously difficult, insofar as analyses can in general be interdependent. There is a great tension between layering analyses (for separability) and weaving them (for mutual dependence and synergy). See the “analytically composable” goal below. In general, I wouldn’t be surprised if aggressive parallelization of a language analysis / compilation framework required software transactions to support optimistic and incremental analysis by multiple threads.
- a unified internal representation — I mean something along the lines of Grimm’s declarative extensible syntax trees, a concept proven by his weaving of Java and C typechecking into a single system. The JastAdd framework is another example; there is no separate symbol table in JastAdd, since name bindings become reference links in the abstract syntax tree (which is really more of an abstract syntax graph). Note that JastAdd’s own declarative language for extending the syntax tree is fundamentally similar to open classes, in that subsequent extensions can directly modify the structure of already-defined syntax node classes. This seems to inhibit modular development of language extensions, but modular language extension development is really hard anyway.
- promote static metaprogramming — This goal is about ensuring the entire program text remains analyzable, and about permitting domain-specific languages to be implemented in the same structure used for extending the base language. See OMeta’s SQL extensions or C# 3.0’s language-integrated queries, which reify some expressions as syntax trees visible to the runtime system. The Google Web Toolkit’s support for extensible code generation is another example, as is the RapidMind system for creating parallel compute kernels from C++ code. Finally, there’s the RPython system which creates a “metaprogramming startup phase” for Python programs, followed by a static compilation phase yielding 100x speedups. Interestingly, this whole goal contradicts Paul Graham’s 2001-era view that “true macro systems aren’t compatible with static type systems.”
- support diverse analyses — The best two already-cited examples here are the JastAdd grammars formalism and the BDDBDDB Datalog-based analysis specification. These are radically different but potentially very synergistic. I’d be fascinated to see whether there’s some deeper commonality between them….
- analytically composable — The JastAdd framework seems the best example of a declarative structure that supports automatic weaving of multiple analyses. For evidence to this effect, consider that the JastAdd folks claim that the Polyglot team is reimplementing their framework in JastAdd to avoid the difficulties of scheduling dozens of analysis passes.
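As a rough illustration of the inference-without-declarations idea from the "inferrable types" reference above, here is a small Python sketch of a Datalog-style taint analysis. To be clear about assumptions: this is a naive set-based fixpoint I made up for this post, not BDDBDDB's actual input language or API, and the names `infer_tainted`, `sources`, and `flows` are hypothetical.

```python
def infer_tainted(sources, flows):
    """Compute every variable reachable from a tainted source.

    sources: set of variable names known to hold untrusted input.
    flows:   set of (src, dst) pairs meaning "src's value is assigned to dst".

    Encodes two Datalog-style rules and iterates them to a fixpoint:
        tainted(V) :- source(V).
        tainted(D) :- flow(S, D), tainted(S).
    """
    tainted = set(sources)
    changed = True
    while changed:
        changed = False
        for src, dst in flows:
            if src in tainted and dst not in tainted:
                tainted.add(dst)
                changed = True
    return tainted


# Only "request" is marked by hand; everything else is derived.
flows = {("request", "name"), ("name", "query"), ("config", "timeout")}
print(sorted(infer_tainted({"request"}, flows)))  # ['name', 'query', 'request']
```

Nothing in the program had to be declared tainted apart from the one source, yet the analysis knows that `query` is tainted and `timeout` is not. An environment that surfaced this at every program point is what I mean by controllable visibility.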
I plan to start experimenting with some prototypes of an Eclipse plugin for an extensible language framework along these lines, likely starting with something much like OMeta and extending it with a JastAdd-like rewritable grammar formalism. This will be open source all the way. I would enthusiastically welcome all pointers to similar projects, all interest in helping with such a framework, and all critical comments on all or part of this manifesto!
(Disclaimer: our family will also be moving out of our house in the next three months, so progress may well be slow 🙂)