Polyglot Compiler Tutorial
Introduction
Polyglot is a highly extensible compiler front-end for the Java programming
language. For more than ten years, researchers have used Polyglot to
develop Java language extensions. While Polyglot originally targeted Java 1.4,
its own extension mechanisms have recently been used to add support for
modern Java features such as generics and annotations. Polyglot has proved
to be a very useful tool for experimenting with new language features and for
building other language-processing tools.
One particularly useful Polyglot extension is the Accrue interprocedural
analysis framework. The Accrue framework simplifies implementation of
interprocedural analyses of programs written in either Java or extensions
to Java. These analyses can be used for program understanding or as part of the
implementation of the language.
In this tutorial, we explore how to use Polyglot and the Accrue
framework to build language extensions and program analyses.
Design philosophy
Polyglot has been successful because of its design philosophy:
- It is designed to support building complex language extensions that significantly modify the behavior of the base language, Java.
- Polyglot is built in Java and does not require coding in a specialized language.
- Its design patterns support modular extensibility in which the extended languages can be implemented while using the original Polyglot code as an unmodified library.
- Further, its extensibility is scalable: coding effort is proportional to the amount of functionality added.
- Compiler extensions can be layered on top of previous extensions, allowing languages to be built up incrementally.
Support for complex language extensions
Polyglot is not just a preprocessor; it supports the development
of complex language extensions that add new features to the Java language,
including to its type system. The base Polyglot framework implements an
extensible compiler for the base language Java 1.4. This framework, also
written in Java, is by default simply a semantic checker for Java. An
implementation of a language extension may extend the framework to define any
necessary changes to the compilation process, such as extending the abstract
syntax tree (AST) and adding new compiler passes that
analyze and transform the program.
Polyglot has been used to build extensions that change Java in very significant
ways, such as supporting information flow labels (Jif), aspects (abc) and
distributed computation (X10, Fabric). In fact, Java 5 and Java 7 are
implemented as successive extensions to the base compiler. Because Polyglot
implements Java 7, it is able to compile itself.
The standard back end of Polyglot generates pretty-printed Java code. It
is also possible to add new back ends. In the usual mode of use, all static
checking is performed by Polyglot and the extension code, and the back end
compiler is handed fully correct Java code. Error messages are generated
with respect to the original source code rather than relying on the back end
compiler to generate messages, which would likely be less understandable.
Developing in Java with design patterns
Unlike some other recent extensible compilers, Polyglot is
built using a standard programming language, Java. It's not necessary to
learn a new programming language, and existing libraries and IDEs can be
used to develop a Polyglot-based compiler. Polyglot was originally implemented
using Java 1.4, but it has been able to grow along with Java and to
take advantage of the new language features added in later versions of
the language, such as generics.
Avoiding domain-specific language support does have a price, however. Even
though object-oriented languages like Java are designed to support
extensibility, a compiler is a particularly challenging kind of software to
build in an extensible way. Compilers contain both complex data structures and
complex algorithms, both of which may need to be extended. To make this possible
without relying on support from specialized language features, Polyglot is
implemented using a distinctive set of design patterns.
Patterns for modular, scalable extensibility
Extending both types and the procedures that manipulate them in a type-safe
way is difficult. This difficulty was observed early by Reynolds and is often
called the “Expression problem”. This problem
is encountered when extending compilers: the types to be extended are the
abstract syntax tree nodes used to represent the program, and the procedures to
be extended are the compiler passes that traverse and transform this AST.
Many solutions to the Expression problem are
not scalable, in the sense that the amount of code needed to construct
an extension is proportional to the size of the code base being extended, rather
than to the size of the change being made.
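The tension can be seen in a minimal sketch built on the plain Visitor pattern. This is not Polyglot code: the names Expr, ExprVisitor, Num, Add, Evaluator, and Printer are hypothetical and chosen only for illustration. Adding a new pass (Printer) is cheap, but adding a new node type would force a change to the visitor interface and to every existing pass.

```java
// Hypothetical mini-AST for illustration; none of these names come from Polyglot.
interface Expr {
    <R> R accept(ExprVisitor<R> v);
}

// Each compiler "pass" is a visitor with one method per node type.
interface ExprVisitor<R> {
    R visitNum(Num n);
    R visitAdd(Add a);
    // A new node type (say, Neg) would need a new method here, which in
    // turn would force an edit to every existing visitor implementation.
}

final class Num implements Expr {
    final int value;
    Num(int value) { this.value = value; }
    public <R> R accept(ExprVisitor<R> v) { return v.visitNum(this); }
}

final class Add implements Expr {
    final Expr left;
    final Expr right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    public <R> R accept(ExprVisitor<R> v) { return v.visitAdd(this); }
}

// Pass 1: evaluate the expression.
final class Evaluator implements ExprVisitor<Integer> {
    public Integer visitNum(Num n) { return n.value; }
    public Integer visitAdd(Add a) {
        return a.left.accept(this) + a.right.accept(this);
    }
}

// Pass 2: pretty-print, added later without touching Num or Add.
final class Printer implements ExprVisitor<String> {
    public String visitNum(Num n) { return Integer.toString(n.value); }
    public String visitAdd(Add a) {
        return "(" + a.left.accept(this) + " + " + a.right.accept(this) + ")";
    }
}
```

In this sketch the cost of a new pass is proportional to the pass itself, while the cost of a new node type grows with the number of passes already written. That second cost is the scalability problem described above.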
To provide modular, scalable extensibility, Polyglot uses several design
patterns:
- Careful separation of interfaces and implementations. For example, all AST nodes (e.g., Node) are represented by interfaces with a standard implementation (e.g., Node_c) that can be replaced.
- A modified version of the Visitor pattern supports incremental, functional-style (side-effect-free) translation of ASTs. This pattern makes it convenient to split the work into a sequence of small compiler passes that each do only a small, modular task.
- The Abstract Factory pattern is used to avoid binding the syntax of the language to specific classes representing abstract syntax nodes. Instead, these objects are created using a NodeFactory object.
- Extension objects allow Polyglot to mix in additional state and operations to existing abstract syntax tree nodes. Each layer of extension may add its own layer of extension objects to each node in the language. They are created by ExtFactory factory objects.
- Language dispatcher objects handle the dispatching of AST node operations to the appropriate extension object, performing the transformation that is appropriate to the current language extension.
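As a rough illustration of how three of these patterns fit together (interface/implementation separation, Abstract Factory, and extension objects), consider the simplified sketch below. It does not reproduce Polyglot's actual API: MiniNode, MiniNode_c, MiniExt, MiniNodeFactory, and the other names are hypothetical stand-ins, and the real Node, NodeFactory, and ExtFactory interfaces are considerably richer.

```java
// Simplified, hypothetical stand-ins; these are NOT Polyglot's real interfaces.

// Clients program against the node interface, never the implementing class.
interface MiniNode {
    String name();
    MiniExt ext();               // extension object attached to this node
    MiniNode ext(MiniExt ext);   // functional update: returns a modified copy
}

// An extension object mixes extra state and operations into a node.
interface MiniExt {
    String describe(MiniNode n);
}

// Standard, replaceable implementation of the node interface.
final class MiniNode_c implements MiniNode {
    private final String name;
    private final MiniExt ext;
    MiniNode_c(String name, MiniExt ext) { this.name = name; this.ext = ext; }
    public String name() { return name; }
    public MiniExt ext() { return ext; }
    public MiniNode ext(MiniExt e) { return new MiniNode_c(name, e); }
}

// Abstract factory: the parser would call this instead of `new MiniNode_c`.
interface MiniNodeFactory {
    MiniNode local(String name);
}

// Base language: behavior supplied by a base extension object.
final class BaseExt implements MiniExt {
    public String describe(MiniNode n) { return "local " + n.name(); }
}

class BaseNodeFactory implements MiniNodeFactory {
    public MiniNode local(String name) {
        return new MiniNode_c(name, new BaseExt());
    }
}

// Extended language: carries extra state (a label) without editing MiniNode_c.
final class LabeledExt implements MiniExt {
    private final String label;
    LabeledExt(String label) { this.label = label; }
    public String describe(MiniNode n) {
        return "local " + n.name() + " {" + label + "}";
    }
}

// The extension reuses the base factory and swaps in its own extension object.
class LabeledNodeFactory extends BaseNodeFactory {
    @Override
    public MiniNode local(String name) {
        return super.local(name).ext(new LabeledExt("public"));
    }
}
```

Under these assumptions, calling `new LabeledNodeFactory().local("x")` yields a node whose `ext().describe(...)` reports the label, while code that uses `BaseNodeFactory` is unaffected. The base classes are used as an unmodified library, and the extension's code size tracks only what it adds.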