Overview of the Polyglot Architecture
A Polyglot-based compiler for an extended language (“Ext”) translates code in a series of steps that are illustrated in this figure:
Parsing
The first step in compilation is parsing the input files (in UTF-8 format by
default) into an abstract syntax tree (AST) representing the source program. Parsing is
done by a parser that, in general, extends the base Java parser. Polyglot comes
with PPG (the Polyglot Parser Generator), an extended version of the CUP parser
generator. PPG adds features for inheriting and extending a
grammar with new symbols, nonterminals, and productions, and also for removing
grammar features. Support for parsing is found in the parser
subdirectory of Polyglot and (ordinarily) of its extensions.
The parser constructs the AST for the parsed code by using a NodeFactory
object associated with the current language extension to build the representation of
each AST node. The NodeFactory class has one method for each AST node type
in the base language. Node factories for extended languages may add
methods to be called from the parser, and may also override how base-language
nodes are constructed.
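The factory pattern described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: SketchNode, BaseNodeFactory, ExtNodeFactory, and the "unless" statement are hypothetical names invented here, not Polyglot's actual classes or method signatures.

```java
// Simplified stand-ins for illustration only; these are NOT Polyglot's real classes.
interface SketchNode {
    String describe();
}

final class IntLitNode implements SketchNode {
    final int value;
    IntLitNode(int value) { this.value = value; }
    public String describe() { return "IntLit(" + value + ")"; }
}

final class IfNode implements SketchNode {
    final SketchNode cond, body;
    IfNode(SketchNode cond, SketchNode body) { this.cond = cond; this.body = body; }
    public String describe() { return "If(" + cond.describe() + ", " + body.describe() + ")"; }
}

// Base-language factory: one method per base AST node type.
class BaseNodeFactory {
    SketchNode intLit(int value) { return new IntLitNode(value); }
    SketchNode ifStmt(SketchNode cond, SketchNode body) { return new IfNode(cond, body); }
}

// Node types that exist only in the (hypothetical) extended language.
final class UnlessNode implements SketchNode {
    final SketchNode cond, body;
    UnlessNode(SketchNode cond, SketchNode body) { this.cond = cond; this.body = body; }
    public String describe() { return "Unless(" + cond.describe() + ", " + body.describe() + ")"; }
}

final class ExtIntLitNode implements SketchNode {
    final int value;
    ExtIntLitNode(int value) { this.value = value; }
    public String describe() { return "ExtIntLit(" + value + ")"; }
}

// Extended factory: adds a method for a new node type and overrides
// how one base-language node is constructed.
class ExtNodeFactory extends BaseNodeFactory {
    SketchNode unless(SketchNode cond, SketchNode body) { return new UnlessNode(cond, body); }

    @Override
    SketchNode intLit(int value) { return new ExtIntLitNode(value); }
}
```

Because the parser only ever calls factory methods, code written against BaseNodeFactory picks up the extension's node representation without change when it is handed an ExtNodeFactory.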
Nodes are represented by a small data structure that, in general, includes both
Node objects and extension objects (implementing Ext).
Extension objects contain the state and operations associated with the
particular extension layer.
For example, code to implement compiler passes added by an extension will be found
in the corresponding extension object.
In general, extensions may be layered, so multiple extension objects may be part of a
single node. In the case of the Java 7 extension, which is layered on top of the Java
5 extension object, the data structure representing an AST node looks like the following:
Extension objects are stored in a doubly-linked list, and each extension object has a reference to the root Node object.
The job of finding the appropriate extension object that implements a given compiler pass is not performed by the extension objects themselves; instead, a separate language dispatcher object does this. This design allows the same AST to be implicitly transformed between languages during compilation by simply changing the language dispatcher being used, without building new AST nodes.
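A minimal sketch of this arrangement follows. The classes AstNode, ExtObject, and LangDispatcher are hypothetical simplifications written for this illustration; Polyglot's real types have different names, fields, and methods. The sketch shows only the two ideas from the text: extension objects form a doubly-linked list with a reference back to the node, and a separate dispatcher decides which layer's implementation of an operation runs.

```java
// Hypothetical, simplified classes for illustration; not Polyglot's real types.
class AstNode {
    final String kind;
    ExtObject firstExt;   // head of the extension list, or null if there are no layers

    AstNode(String kind) { this.kind = kind; }

    // Base-language implementation of a pass operation.
    String typeCheck() { return "base rules for " + kind; }

    // Appends an extension object to the tail of this node's doubly-linked list.
    void addExt(ExtObject ext) {
        ext.node = this;
        if (firstExt == null) {
            firstExt = ext;
            return;
        }
        ExtObject tail = firstExt;
        while (tail.next != null) tail = tail.next;
        tail.next = ext;
        ext.prev = tail;
    }
}

class ExtObject {
    final String layer;   // which extension layer this object belongs to
    AstNode node;         // reference back to the root node
    ExtObject prev, next; // doubly-linked list of layers

    ExtObject(String layer) { this.layer = layer; }

    // This layer's implementation of the same pass operation.
    String typeCheck() { return layer + " rules for " + node.kind; }
}

class LangDispatcher {
    private final String layer;   // null selects the base language

    LangDispatcher(String layer) { this.layer = layer; }

    // Finds the extension object for this dispatcher's layer and delegates to it,
    // falling back to the node's base-language implementation.
    String typeCheck(AstNode n) {
        for (ExtObject e = n.firstExt; e != null; e = e.next) {
            if (e.layer.equals(layer)) return e.typeCheck();
        }
        return n.typeCheck();
    }
}
```

In this sketch, swapping one LangDispatcher for another changes which rules apply to an existing node without allocating any new nodes, which is the property the design is after.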
Compiler passes
The main work of the compiler is done by a series of compiler passes. Each pass traverses the AST and transforms it into a new AST. A pass can also record information to be used elsewhere in the compiler. For example, the type checking pass computes the type of each language expression.
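To make the idea of a pass concrete, here is a toy type-checking pass over a tiny expression language. It is a sketch under stated assumptions: Expr, IntLit, BoolLit, Add, and TypeCheckPassSketch are hypothetical names, and the typing rule (both operands of an addition must be int) is invented for the example. The pass leaves the input AST untouched and returns a new AST whose nodes record their computed types.

```java
// Hypothetical immutable expression nodes; the type field is null until type checking.
abstract class Expr {
    final String type;
    Expr(String type) { this.type = type; }
}

final class IntLit extends Expr {
    final int value;
    IntLit(int value, String type) { super(type); this.value = value; }
}

final class BoolLit extends Expr {
    final boolean value;
    BoolLit(boolean value, String type) { super(type); this.value = value; }
}

final class Add extends Expr {
    final Expr left, right;
    Add(Expr left, Expr right, String type) { super(type); this.left = left; this.right = right; }
}

final class TypeCheckPassSketch {
    // Traverses the AST bottom-up and returns a new AST annotated with types.
    static Expr run(Expr e) {
        if (e instanceof IntLit) {
            return new IntLit(((IntLit) e).value, "int");
        }
        if (e instanceof BoolLit) {
            return new BoolLit(((BoolLit) e).value, "boolean");
        }
        if (e instanceof Add) {
            Add a = (Add) e;
            Expr l = run(a.left);
            Expr r = run(a.right);
            // Invented rule for this sketch: addition is defined on int operands only.
            if (!"int".equals(l.type) || !"int".equals(r.type)) {
                throw new IllegalStateException("operands of + must be int");
            }
            return new Add(l, r, "int");
        }
        throw new IllegalArgumentException("unknown node: " + e);
    }
}
```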
Compiler passes are scheduled in order to complete goals defined by the language extension. Goals declare which other goals they depend upon, and the Polyglot framework automatically computes the correct order in which to run passes to achieve the goals while satisfying prerequisite dependencies. Thus, language extensions can easily add new passes or dependencies that augment the base goals and dependencies of Java.
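Ordering goals so that every prerequisite runs first amounts to a topological sort of the dependency graph. The following sketch shows one way to do that with a depth-first traversal. It is not Polyglot's scheduler: GoalSchedulerSketch is a hypothetical class, goals are represented as plain strings, and any dependency edges passed to it are supplied by the caller.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class GoalSchedulerSketch {
    // Returns an order in which every goal appears after all of its prerequisites.
    // The map goes from a goal to the goals it depends on.
    static List<String> schedule(Map<String, List<String>> prereqs) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String goal : prereqs.keySet()) {
            visit(goal, prereqs, done, inProgress, order);
        }
        return order;
    }

    private static void visit(String goal, Map<String, List<String>> prereqs,
                              Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(goal)) return;
        // Seeing a goal again while it is still being processed means a cycle.
        if (!inProgress.add(goal)) {
            throw new IllegalStateException("cyclic dependency at " + goal);
        }
        List<String> deps = prereqs.get(goal);
        if (deps == null) deps = Collections.emptyList();
        for (String dep : deps) {
            visit(dep, prereqs, done, inProgress, order);
        }
        inProgress.remove(goal);
        done.add(goal);
        order.add(goal);   // emitted only after all prerequisites
    }
}
```

For example, an extension could register a hypothetical goal "ExtChecked" that depends on "TypeChecked". If "TypeChecked" is in turn declared to depend on "Disambiguated", the schedule places "Disambiguated", "TypeChecked", and "ExtChecked" in that relative order regardless of the order in which the goals were registered.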
The end result of running the compiler passes is a Java AST that describes a correct Java program. In addition, the target code is augmented with static variables whose values encode the source-level (Ext) type information necessary to support separate compilation of Ext class files.
The following is the list of standard compiler goals built into Polyglot to implement Java 1.4 (and, in parentheses, the names of the visitors/passes that achieve them).
- Parsed: Parses source files and constructs the initial AST.
- TypesInitialized (TypeBuilder): Constructs a type object representing each type declared in the source file and stores it in a Resolver associated with the source file. Resolvers are used to look up types by name. Populates type objects representing classes or interfaces with their members.
- Disambiguated (AmbiguityRemover): Uses the scoping rules of the language to resolve ambiguous names, including those found in the declarations of the supertypes of declared types, those found in the signatures of class and interface members, and those found in the bodies of methods.
- TypeChecked (TypeChecker): Performs static type checking.
- ExceptionsChecked (ExceptionChecker): Checks that exceptions are properly declared and caught.
- ReachabilityChecked (ReachChecker): Checks that all statements in each method are reachable from the method entry.
- ExitPathsChecked (ExitChecker): Checks that all paths through methods that should return a value do so.
- InitializationsChecked (InitChecker): Checks that local variables are initialized before use, and final fields are definitely assigned within constructors.
- ConstructorCallsChecked (ConstructorCallChecker): Checks that constructor calls are acyclic.
- ForwardReferencesChecked (ForwardReferenceChecker): Checks that forward references of fields are correct.
- Serialized (ClassSerializer): Serializes type information about a compiled class and injects it into the class as a static string constant.
- CodeGenerated (Translator): Generates Java target code.
Code generation
Polyglot is primarily a front end. Language extensions ordinarily rely on some Java compiler such as javac to generate bytecode, though it is possible to add new back ends supporting other target languages. Code generation usually proceeds by using Polyglot's built-in support to generate a pretty-printed ASCII representation of the final Java AST. Polyglot then automatically invokes the Java compiler javac internally to generate bytecode. It is also possible to invoke an external compiler or to skip generation of bytecode.
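The two back-end steps, emitting Java source text and then compiling it in-process, can be sketched with the standard javax.tools API. This is an illustration only and makes no claim about how Polyglot itself invokes the compiler: BackEndSketch and its methods are hypothetical, and prettyPrint is a stand-in that emits a fixed class shape rather than walking a real AST. It also assumes the message contains no quotes or backslashes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

final class BackEndSketch {
    // Stand-in for pretty-printing a final Java AST: emits source for a trivial class.
    // Assumes message needs no escaping inside a Java string literal.
    static String prettyPrint(String className, String message) {
        return "public class " + className + " {\n"
             + "    public static void main(String[] args) {\n"
             + "        System.out.println(\"" + message + "\");\n"
             + "    }\n"
             + "}\n";
    }

    // Writes the source into outDir and compiles it with the system Java compiler.
    // Returns false if no compiler is available (for example, when running on a
    // runtime without compiler support) or if compilation reports errors.
    static boolean compile(Path outDir, String className, String source) throws IOException {
        Path file = outDir.resolve(className + ".java");
        Files.write(file, source.getBytes(StandardCharsets.UTF_8));
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) return false;
        // Null streams mean "use System.in, System.out, and System.err".
        return javac.run(null, null, null, file.toString()) == 0;
    }
}
```

Skipping bytecode generation corresponds to calling only prettyPrint and writing the result to disk. Using an external compiler would replace the body of compile with a process launch.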