Polyglot is an extensible Java compiler front end. The base polyglot compiler, jlc (“Java language compiler”), is a mostly-complete Java front end; that is, it parses and performs semantic checking on Java source code. The compiler outputs Java source code. Thus, the base compiler implements the identity translation.
Language extensions are implemented on top of the base compiler by extending the concrete and abstract syntax and the type system, and by defining new code transformations. The end product is a Java abstract syntax tree (AST) that is output into a Java source file, which is compiled with javac. (For historical reasons, some extensions just override some portions of the Java output code to handle the extended syntax of the particular language extension being compiled rather than rewriting the AST.)
The Polyglot compiler is structured as a set of passes over source files that ends with the output of Java source code. The passes parse the original source language and create an AST, rewrite the AST to eliminate any ambiguities, type check the AST, possibly rewrite the AST to another AST, then output the AST as Java source code.
When the compiler is invoked (through polyglot.main.Main.main()), it parses the command line (setting options in polyglot.main.Options), then creates a compiler object (an instance of polyglot.frontend.Compiler) to manage the compilation process. An important job of the command line parser is to identify the language extension (specified on the command line with -ext L, where L is the name of the extension) and load the extension (from polyglot.ext.L.ExtensionInfo). The compiler uses the extension to determine several important features of the language, including its source file extension, AST node factory, type system, and pass schedule.
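The extension lookup can be sketched as follows. This is a simplified, hypothetical stand-in for the option handling in polyglot.main.Options, not Polyglot's actual code: the class ExtensionLookup and its methods are invented for illustration. It scans for -ext, builds the polyglot.ext.L.ExtensionInfo class name, and falls back to the "jl" extension when none is given (an assumption based on jl being the default extension).

```java
/** Hypothetical sketch of how "-ext L" could select an ExtensionInfo class. */
class ExtensionLookup {
    static final String DEFAULT_EXT = "jl"; // the base Java extension

    /** Returns the fully qualified name of the ExtensionInfo class to load. */
    static String extensionInfoClassName(String[] args) {
        String ext = DEFAULT_EXT;
        // Stop one short of the end so that args[i + 1] always exists.
        for (int i = 0; i < args.length - 1; i++) {
            if (args[i].equals("-ext")) {
                ext = args[i + 1];
            }
        }
        return "polyglot.ext." + ext + ".ExtensionInfo";
    }

    /** Loads and instantiates the extension reflectively (needs the class on the classpath). */
    static Object loadExtension(String[] args) throws ReflectiveOperationException {
        Class<?> c = Class.forName(extensionInfoClassName(args));
        return c.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) {
        String[] cmd = {"-ext", "pao", "Foo.pao"};
        System.out.println(extensionInfoClassName(cmd)); // polyglot.ext.pao.ExtensionInfo
    }
}
```

Loading by a naming convention means that adding a language requires no change to the driver itself; the extension only has to place its ExtensionInfo in the expected package.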
Parsing is done with the Java CUP parser generator and a Polyglot extension to CUP called PPG. PPG allows CUP files to be selectively extended to create parsers for extension languages by providing operations on a CUP grammar, including adding, dropping, and renaming of productions. The JFlex lexer generator is used to create a lexer for the source language. The semantic actions in the parser create an AST through a NodeFactory, which is a class containing factory methods for creating AST nodes.
After the AST has been created by the parser, a series of passes is performed upon it. The passes for a language extension, as well as the order in which they should be run, are defined in the extension's ExtensionInfo class. The compiler object runs the passes in the order specified so that dependencies between compilation units are satisfied. Most passes are implemented using a modified version of the Visitor design pattern, described later (see also TR 2002-1871). The default set of passes is:
- TypeBuilder: Constructs a Type object representing each type in the source file and stores it in a Resolver associated with the source file. Resolvers are used to look up types by name.
- AmbiguityRemover: Removes any ambiguities found in the declaration of the supertypes of a type (e.g., the extends clause).
- AmbiguityRemover: Removes any ambiguities found in the signatures of class or interface members.
- AddMemberVisitor: Adds the members of a class or interface to its type object.
- AmbiguityRemover: Removes any ambiguities found in the bodies of methods, constructors, or initializers.
- TypeChecker: Performs semantic analysis for Java.
- ExceptionChecker: Performs semantic analysis upon exception declaration and propagation.
- ReachChecker: Checks that all statements in each method are reachable from the method entry.
- ExitChecker: Checks that all paths through methods that should return a value do so.
- InitChecker: Checks that local variables are initialized before use.
- PrettyPrinter: Optional. A debugging pass that outputs the AST.
- ClassSerializer: Optional. Serializes information about a compiled class and injects it into the class for output during translation. This enables separate compilation of extension languages.
- Translator: Transforms each AST node to a String and writes it to an output file.
Many of these passes are followed by “barrier” passes (implemented by BarrierPass). A barrier pass compiles all source files on which a given source depends up to the same barrier. This ensures that enough of the type information of dependent sources has been computed before the compilation continues.
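To see why barriers matter, here is a toy scheduler. It is a hypothetical simplification, not the polyglot.frontend implementation: passes are plain names, a "|" entry marks a barrier, and every source file is treated as depending on every other, whereas a real BarrierPass only waits for the sources a given file depends on. Every file reaches a barrier before any file continues past it, so types built for one file are available to later passes over the others.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy pass scheduler: every job reaches each barrier before any job continues. */
class Scheduler {
    static final String BARRIER = "|"; // hypothetical barrier marker

    /** Runs the pass list over all jobs and returns a log of "job:pass" steps. */
    static List<String> run(List<String> passes, List<String> jobs) {
        List<String> log = new ArrayList<>();
        List<String> segment = new ArrayList<>();
        for (String p : passes) {
            if (p.equals(BARRIER)) {
                runSegment(segment, jobs, log); // all jobs catch up to the barrier
                segment.clear();
            } else {
                segment.add(p);
            }
        }
        runSegment(segment, jobs, log); // passes after the last barrier
        return log;
    }

    // Runs one barrier-delimited group of passes over each job in turn.
    private static void runSegment(List<String> segment, List<String> jobs, List<String> log) {
        for (String job : jobs) {
            for (String pass : segment) {
                log.add(job + ":" + pass);
            }
        }
    }

    public static void main(String[] args) {
        List<String> passes = List.of("TypeBuilder", BARRIER, "TypeChecker");
        System.out.println(run(passes, List.of("A.java", "B.java")));
        // [A.java:TypeBuilder, B.java:TypeBuilder, A.java:TypeChecker, B.java:TypeChecker]
    }
}
```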
The ambiguities referred to in the above passes are ambiguities resulting from classification of names in Java. Some names are syntactically ambiguous because their meaning cannot be determined without some semantic analysis (see JLS2 6.5.2). Extensions may also introduce new ambiguities that require resolution.
Extensions will usually insert passes before type checking to perform some initial semantic analysis, after it to do some final semantic analysis, and between exception checking and translation to rewrite the AST.
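A minimal sketch of splicing extension passes into the default schedule, assuming the schedule is just a list of pass names. Polyglot's ExtensionInfo works with its own pass objects; the PassList class, insertBefore method, and the pass names MyAnalysis and MyRewriter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class PassList {
    /** Returns a copy of passes with newPass inserted immediately before anchor. */
    static List<String> insertBefore(List<String> passes, String anchor, String newPass) {
        List<String> result = new ArrayList<>(passes);
        int i = result.indexOf(anchor);
        if (i < 0) {
            throw new IllegalArgumentException("no such pass: " + anchor);
        }
        result.add(i, newPass);
        return result;
    }

    public static void main(String[] args) {
        List<String> base = List.of("TypeBuilder", "TypeChecker", "ExceptionChecker", "Translator");
        // An initial-analysis pass before type checking...
        List<String> ext = insertBefore(base, "TypeChecker", "MyAnalysis");
        // ...and an AST-rewriting pass between exception checking and translation.
        ext = insertBefore(ext, "Translator", "MyRewriter");
        System.out.println(ext);
        // [TypeBuilder, MyAnalysis, TypeChecker, ExceptionChecker, MyRewriter, Translator]
    }
}
```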
All Polyglot code is in the package polyglot. The subpackages are as follows:
ast - the AST node interface files. All AST nodes implement the polyglot.ast.Node interface.
types - the type system interface.
types/reflect - class file parsing code.
visit - visitor classes which iterate over abstract syntax trees.
frontend - compiler pass scheduling code.
main - code for the main method of the compiler in the class polyglot.main.Main. It includes code for parsing command line options and for debug output.
util - utility code. This includes the parser generator in util/ppg.
lex - lexer utility code.
parse - parser utility code.
ext - code for language extensions. Source code for a language extension lives in the package polyglot.ext.<ext-name>. The default language extension is the "jl" extension, which implements Java parsing and type checking. Extensions are usually implemented by inheriting from the "jl" extension code. Extensions usually have the following subpackages:
  ext.<ext-name>.ast - AST nodes specific to the extension
  ext.<ext-name>.extension - new extension and delegate objects specific to the extension
  ext.<ext-name>.types - type objects and typing judgments specific to the extension
  ext.<ext-name>.visit - visitors specific to the extension
  ext.<ext-name>.parse - the parser and lexer for the language extension
In addition, an extension must define the class ext.<ext-name>.ExtensionInfo, which contains the objects that define how the language is to be parsed and type checked. There should also be a class ext.<ext-name>.Version defined, which specifies the version number of the extension. The Version class is used as a check when extracting extension-specific type information from .class files.
To allow for greater flexibility in overriding the behavior of an AST node, each node has a pointer to a delegate object and a (possibly null) list of extension objects. Extension objects are useful for adding a field or a method to many different AST nodes. They provide functionality similar to mixins. Their purpose is to allow a uniform extension of many AST nodes, not to be the primary vehicle through which a language extension is implemented. Delegate objects are similar to extension objects and are used for overriding existing methods of many different AST nodes. For more details, see the tech report (Cornell CS-TR 2002-1883).
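A small sketch of the mixin-like role of extension objects: one extension class adds the same field to unrelated node classes without subclassing either of them. Everything here (ExtObjects, Ext, ConstantExt, Node, Local, IntLit) is a simplified, hypothetical stand-in, not Polyglot's polyglot.ast API.

```java
/** Sketch: an extension object adds the same field to unrelated AST node classes. */
class ExtObjects {
    /** Marker interface for extension objects. */
    interface Ext {}

    /** Hypothetical extension adding a "constant" flag to whatever node it is attached to. */
    static class ConstantExt implements Ext {
        boolean isConstant;
    }

    /** Base node with a (possibly null) extension slot. */
    abstract static class Node {
        private Ext ext;
        Ext ext() { return ext; }
        Node ext(Ext e) { this.ext = e; return this; }
    }

    static class Local extends Node {
        final String name;
        Local(String name) { this.name = name; }
    }

    static class IntLit extends Node {
        final int value;
        IntLit(int value) { this.value = value; }
    }

    /** Reads the flag through the extension object; false if none is attached. */
    static boolean isConstant(Node n) {
        return n.ext() instanceof ConstantExt && ((ConstantExt) n.ext()).isConstant;
    }

    public static void main(String[] args) {
        Node x = new Local("x").ext(new ConstantExt());
        Node three = new IntLit(3).ext(new ConstantExt());
        ((ConstantExt) three.ext()).isConstant = true;
        System.out.println(isConstant(x) + " " + isConstant(three)); // false true
    }
}
```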
In order for the delegates to override the AST node, most calls to the AST node object should be dispatched through the delegate object. The default delegate of every AST node just calls the corresponding method in the AST node.
So, for instance, to invoke the typeCheck() method on an AST node n, we do:
    n.del().typeCheck(type_checker);
instead of directly calling:
    n.typeCheck(type_checker);
To reduce the proliferation of classes, all nodes in the base compiler use the same delegate class. For each compiler pass, the delegate invokes a method in the AST node that implements the pass. Thus, in the base compiler, passes are implemented in the AST nodes themselves. Besides reducing the number of classes, this approach also permits more convenient access to the instance variables of the nodes, whereas a delegate must access the instance variables of its associated node through accessor methods.
When writing a language extension, however, the designer should avoid this approach and put the pass implementation in the delegates themselves; this reduces the number of AST node classes that need to be extended.
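The dispatch convention can be sketched in miniature. Delegates, JLDel, DefaultDel, ExtDel, and the String-returning typeCheck() are hypothetical simplifications (in Polyglot the pass method takes a TypeChecker and returns a Node). The sketch shows only the shape: callers go through n.del(), the default delegate forwards to the node, and an extension swaps in a delegate that overrides the pass without subclassing the node.

```java
/** Sketch of delegate dispatch: callers invoke n.del().typeCheck(), not n.typeCheck(). */
class Delegates {
    /** Delegate interface: one method per compiler pass (only one pass shown). */
    interface JLDel {
        String typeCheck();
    }

    static class Node {
        private JLDel del = new DefaultDel(this); // default: forward back to this node

        JLDel del() { return del; }
        Node del(JLDel d) { this.del = d; return this; }

        /** In the base compiler, the pass itself is implemented in the node. */
        String typeCheck() { return "Java rules"; }
    }

    /** Default delegate: just calls the corresponding method in the AST node. */
    static class DefaultDel implements JLDel {
        private final Node node;
        DefaultDel(Node node) { this.node = node; }
        public String typeCheck() { return node.typeCheck(); }
    }

    /** Hypothetical extension delegate: overrides the pass, then reuses the node's version. */
    static class ExtDel implements JLDel {
        private final Node node;
        ExtDel(Node node) { this.node = node; }
        public String typeCheck() { return "extension rules, then " + node.typeCheck(); }
    }

    public static void main(String[] args) {
        Node plain = new Node();
        Node extended = new Node();
        extended.del(new ExtDel(extended));
        System.out.println(plain.del().typeCheck());    // Java rules
        System.out.println(extended.del().typeCheck()); // extension rules, then Java rules
    }
}
```

Because the override lives in ExtDel, one delegate class can be attached to many node classes, which is why putting extension passes in delegates keeps the number of extended node classes down.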
In deciding whether to add functionality via inheritance, an extension object, or a delegate object, use the following guidelines:
If the designer chooses to use delegates or extensions, delegate factories and extension factories simplify the task of instantiating appropriate delegate and extension objects respectively. See below for more information on node, extension and delegate factories.
Suppose you want to create language L that extends the Java language. First, you need to design L. Your design process should include the following tasks:
Next, you can implement L by creating a Polyglot extension. Implementing the extension will require the following tasks.
Modify build.xml to add a target for the new extension. This can usually be done by copying and modifying the skel target.
(Optionally) Begin with the skeleton extension found in polyglot/ext/skel. Run the customization script found at polyglot/ext/newext, which will copy the skeleton to polyglot/ext/L, and substitute your language's name at all the appropriate places in the skeleton.
Semantic changes that are localized to an AST node will probably be implemented by overriding that node's typeCheck() method.
Semantic changes that affect more fundamental properties of the Java type system will probably be implemented by overriding appropriate methods in polyglot/ext/L/types/LTypeSystem_c.java.
Let's make this more concrete by introducing an actual extension. We'll use the “Primitives as Objects” (Pao) extension, which extends Java 1.4 with the ability to use primitive types (e.g., int, float) as Objects via autoboxing. For example, in Pao we can write:
    Map m = new HashMap();
    m.put(1, 2);             // int values used where Objects are expected
    int x = (int) m.get(1);  // casting an Object back to int unboxes it
The changes to Java needed to support this feature are quite minimal.
The Java grammar contains the production:
    relational_expression ::= ...
        |   relational_expression:a INSTANCEOF reference_type:b
        ;
In order to allow primitives, we should change this to:
    relational_expression ::= ...
        |   relational_expression:a INSTANCEOF type:b
        ;
Primitive types should be treated as subtypes of Object. That means for all primitive types P where P != void, P <: Object (Polyglot defines void as a primitive type, but void has no values). We'll want to use this relationship in assignments and casting, as shown in the example above. Also, we'll need to allow primitive types to appear inside an instanceof operator.
When a primitive value is used as an Object, we should box the value and insert a cast to Object. We also need to unbox primitives when casting from Object to a primitive type. For completeness, we also rewrite the == operator so that it compares boxed values by value rather than by pointer. This gives the illusion that all primitives with the same value are boxed into the same object.
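In plain Java, these translation rules amount to roughly the following. This sketch uses the standard wrapper classes (java.lang.Integer and friends) purely for illustration; the Pao extension supplies its own runtime boxing classes, and boxedEquals is a hypothetical stand-in for the Primitive.equals(o, p) helper that the translated == calls.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** What the Pao example might look like after translation, using java.lang wrappers as boxes. */
class PaoTranslation {
    private static final Set<Class<?>> BOXES = Set.of(Integer.class, Long.class, Short.class,
            Byte.class, Float.class, Double.class, Character.class, Boolean.class);

    /**
     * Translated ==: boxed primitives compare by value, other references by identity.
     * Unlike Java's == on primitives, boxes of different types (Integer 1, Long 1L) are unequal here.
     */
    static boolean boxedEquals(Object o, Object p) {
        if (o != null && p != null && BOXES.contains(o.getClass()) && BOXES.contains(p.getClass())) {
            return o.equals(p);
        }
        return o == p;
    }

    /** Pao: Map m = new HashMap(); m.put(1, 2); int x = (int) m.get(1); */
    static int example() {
        Map<Object, Object> m = new HashMap<>();
        // Each primitive used as an Object is boxed and cast to Object.
        m.put((Object) Integer.valueOf(1), (Object) Integer.valueOf(2));
        // The cast from Object to int becomes an unboxing.
        return ((Integer) m.get((Object) Integer.valueOf(1))).intValue();
    }

    public static void main(String[] args) {
        System.out.println(example()); // 2
        System.out.println(boxedEquals(Integer.valueOf(1000), Integer.valueOf(1000))); // true
    }
}
```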
We create the extension as follows. The complete extension is in the Polyglot distribution at polyglot/ext/pao.
    $ cd $POLYGLOT/polyglot/ext/pao
    $ sh ./newext pao Pao pao
    $ cd $POLYGLOT/polyglot/ext/pao
We modified the instanceof production to allow any type to be used in an instanceof expression. This required only appending the following code to pao.ppg:
    extend relational_expression ::=
            relational_expression:a INSTANCEOF type:b
            {: RESULT = parser.nf.Instanceof(parser.util.pos(a), a, b); :}
        ;
    drop {
        relational_expression ::= relational_expression:a INSTANCEOF reference_type:b;
    }
The remainder of the file is boilerplate code.
    $ cd $POLYGLOT/polyglot/ext/pao/types
We edit PaoTypeSystem_c.java to override the factory methods for primitive types and top-level class types. We also insert methods to provide access to the runtime boxing classes. We next create a subclass of PrimitiveType that overrides the methods descendsFrom(), isImplicitCastValid(), and isCastValid() to allow primitives to be used as Objects.
We also create a subclass of ParsedClassType to allow primitives to be cast to Object.
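The subtyping rule (P <: Object for every primitive P other than void) can be modeled with a toy type representation. PaoTypes, Type, ClassType, and PrimType are hypothetical; descendsFrom and isImplicitCastValid only mirror the names of the PrimitiveType methods above. Polyglot's real versions work with its own Type objects and consult the TypeSystem, and primitive widening is omitted here.

```java
/** Toy model of the Pao rule: every primitive except void descends from Object. */
class PaoTypes {
    interface Type {
        boolean descendsFrom(Type t);        // strict subtype
        boolean isImplicitCastValid(Type t); // assignment compatibility
    }

    static final class ClassType implements Type {
        final String name;
        final ClassType superType; // null for java.lang.Object
        ClassType(String name, ClassType superType) { this.name = name; this.superType = superType; }

        public boolean descendsFrom(Type t) {
            // Walk the superclass chain (interfaces are ignored in this toy).
            for (ClassType c = superType; c != null; c = c.superType) {
                if (c == t) return true;
            }
            return false;
        }

        public boolean isImplicitCastValid(Type t) { return this == t || descendsFrom(t); }
    }

    static final ClassType OBJECT = new ClassType("java.lang.Object", null);

    static final class PrimType implements Type {
        final String name;
        PrimType(String name) { this.name = name; }

        public boolean descendsFrom(Type t) {
            // Pao rule: P <: Object for every primitive P other than void.
            return t == OBJECT && !name.equals("void");
        }

        public boolean isImplicitCastValid(Type t) {
            // Identity, or boxing into Object (widening between primitives omitted).
            return this == t || descendsFrom(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(new PrimType("int").isImplicitCastValid(OBJECT));  // true
        System.out.println(new PrimType("void").isImplicitCastValid(OBJECT)); // false
    }
}
```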
We create an extension interface, PaoExt, that extends the Ext interface. This extension interface has the signature for a new method, rewrite(), which we will use to rewrite the new Pao code into valid Java code. We also create a class PaoExt_c, which extends Ext_c and implements PaoExt. The default action for the rewrite() method is to return the node unchanged, which is the behavior that is desired for most nodes.
Next, we change the type checking of the instanceof operation. To do so, we create a new delegate, PaoInstanceofDel_c, that subclasses the JL_c class in the base compiler. In it, we override the typeCheck() method to allow primitive types to occur in the instanceof expression. JL_c implements all other methods of the JL interface by dispatching back to the node.
We define the translation that will take our Pao language to standard Java by defining the implementation of the rewrite() method.
By the translation rules that we have defined, three things will need to be rewritten: casts, instanceof operations, and the == and != operations.
In PaoInstanceofExt_c, we override the rewrite() method to allow for instanceof operations on primitive types.
We also create a PaoCastExt_c, which extends PaoExt_c, in which we override the rewrite() method to box and unbox primitives appropriately to allow casting to and from primitive types.
In addition, we create a PaoBinaryExt_c that also extends PaoExt_c and overrides the rewrite() method to rewrite == and != expressions to call Primitive.equals(o, p) when comparing two Objects or boxed primitives. This allows boxed primitives to be compared by value using == and !=.
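A toy version of the rewrite() convention: the default returns the node unchanged, and the binary-expression override turns == and != on two Object operands into a call to Primitive.equals. PaoRewrite, Expr, Local, Call, Not, Binary, and the string-valued type field are hypothetical simplifications of Polyglot's AST and type objects; only the shape of the rewrite follows the description above.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class PaoRewrite {
    /** Simplified expression; type is the static type name assigned by the type checker. */
    abstract static class Expr {
        final String type;
        Expr(String type) { this.type = type; }

        /** Default rewrite: return the node unchanged (right for most nodes). */
        Expr rewrite() { return this; }
    }

    static class Local extends Expr {
        final String name;
        Local(String name, String type) { super(type); this.name = name; }
        @Override public String toString() { return name; }
    }

    static class Call extends Expr {
        final String method;
        final Expr[] args;
        Call(String method, String type, Expr... args) { super(type); this.method = method; this.args = args; }
        @Override public String toString() {
            return method + "(" + Arrays.stream(args).map(Object::toString).collect(Collectors.joining(", ")) + ")";
        }
    }

    static class Not extends Expr {
        final Expr operand;
        Not(Expr operand) { super("boolean"); this.operand = operand; }
        @Override public String toString() { return "!" + operand; }
    }

    /** Only equality operators are modeled, so the result type is always boolean. */
    static class Binary extends Expr {
        final Expr left, right;
        final String op;
        Binary(Expr left, String op, Expr right) { super("boolean"); this.left = left; this.op = op; this.right = right; }

        @Override Expr rewrite() {
            boolean equality = op.equals("==") || op.equals("!=");
            boolean objects = left.type.equals("Object") && right.type.equals("Object");
            if (!equality || !objects) {
                return this; // primitive comparisons stay as they are
            }
            Expr eq = new Call("Primitive.equals", "boolean", left, right);
            return op.equals("==") ? eq : new Not(eq);
        }

        @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
    }

    public static void main(String[] args) {
        Expr o = new Local("o", "Object"), p = new Local("p", "Object");
        System.out.println(new Binary(o, "!=", p).rewrite()); // !Primitive.equals(o, p)
        Expr i = new Local("i", "int"), j = new Local("j", "int");
        System.out.println(new Binary(i, "==", j).rewrite()); // (i == j)
    }
}
```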
We add a pass to insert explicit casts to Object when assigning a primitive to an object. We call this pass PaoBoxer and implement it as a visitor.
PaoBoxer is a subclass of AscriptionVisitor, which contains code to locate places where expressions are used. The ascribe() method is called for each expression and is passed the type at which the expression is used, rather than the type the type checker assigns to it.
For instance, with the following Pao code:
    Object o = 3;
ascribe() will be called with expression 3 and type Object.
We override ascribe() to insert casts when assigning a primitive to an Object. We override the visitor's leaveCall() method to call the rewrite() method if the node's delegate is an instance of PaoDel. This makes sure that all the appropriate nodes are rewritten to ensure a proper translation.
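The decision the ascribe() override makes can be sketched on a single kind of declaration. BoxerSketch, its node classes, and the ascribe(Expr, String) signature are hypothetical: Polyglot's AscriptionVisitor works out the expected type itself and passes real Type objects, and the rewrite() calls made from leaveCall() are omitted here. The rule shown is that an expression of primitive type used where Object is expected gets wrapped in an explicit cast.

```java
import java.util.Set;

class BoxerSketch {
    static final Set<String> PRIMITIVES =
            Set.of("boolean", "byte", "char", "short", "int", "long", "float", "double");

    abstract static class Expr {
        final String type; // type assigned by the type checker
        Expr(String type) { this.type = type; }
    }

    static class IntLit extends Expr {
        final int value;
        IntLit(int value) { super("int"); this.value = value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    static class Cast extends Expr {
        final Expr expr;
        Cast(String type, Expr expr) { super(type); this.expr = expr; }
        @Override public String toString() { return "((" + type + ") " + expr + ")"; }
    }

    static class LocalDecl {
        final String type, name;
        final Expr init;
        LocalDecl(String type, String name, Expr init) { this.type = type; this.name = name; this.init = init; }
        @Override public String toString() { return type + " " + name + " = " + init + ";"; }
    }

    /** e is used at type toType; insert a cast when a primitive is used as an Object. */
    static Expr ascribe(Expr e, String toType) {
        if (PRIMITIVES.contains(e.type) && toType.equals("Object")) {
            return new Cast("Object", e);
        }
        return e;
    }

    /** A declaration's initializer is used at the declared type, not at its own type. */
    static LocalDecl visit(LocalDecl d) {
        return new LocalDecl(d.type, d.name, ascribe(d.init, d.type));
    }

    public static void main(String[] args) {
        System.out.println(visit(new LocalDecl("Object", "o", new IntLit(3)))); // Object o = ((Object) 3);
        System.out.println(visit(new LocalDecl("int", "i", new IntLit(3))));    // int i = 3;
    }
}
```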
We create a new NodeFactory, PaoNodeFactory_c, that extends NodeFactory_c. In this new NodeFactory we override the defaultExt() method to make the default delegate the PaoDel_c, and also override the Instanceof, Cast, and Binary methods to return instantiations of the nodes with the PaoInstanceofDel_c, PaoCastDel_c, and PaoBinaryDel_c delegates.
Finally, we edit the ExtensionInfo that defines our extension.
    $ cd $POLYGLOT/polyglot/ext/pao
The skeleton generator created most of the necessary code. We modify the passes() method to add our new boxing pass. We also create a Version class that defines the version of Pao that is being worked on.
Node factories are used to create instances of AST nodes. Extension and delegate factories simplify the task of instantiating appropriate delegate and extension objects for the AST nodes.
Language extensions will typically implement node factories by extending the NodeFactory_c class in the package polyglot.ext.jl.ast. The NodeFactory_c class can be given a delegate factory and/or an extension factory to use. The classes AbstractDelFactory_c and AbstractExtFactory_c in the same package provide convenient base classes for language extensions to extend.
For any AST node type <node>, the node factory typically has one or more methods called <node> to create instances of <node>. The implementation of these methods in NodeFactory_c has the following form:
    public <node> <node>(Position pos, ...) {
        <node> n = new <node>_c(pos, ...);
        n = (<node>) n.ext(extFactory.ext<node>());
        n = (<node>) n.del(delFactory.del<node>());
        return n;
    }
Note that first an object that implements the interface <node> is created: <node>_c. An extension object for the newly created AST node is obtained by calling the appropriate method on the extension factory. A delegate object is obtained by a similar call to the delegate factory. The extension object and/or the delegate object returned by these calls may be null.
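The template can be instantiated for a single node type as a runnable sketch. FactorySketch, IntLit, IntLit_c, ExtFactory, DelFactory, and their methods are hypothetical stand-ins that follow the naming pattern above rather than Polyglot's actual classes; Position is reduced to a line number, and the ext/del setters return the node type directly, so no casts are needed.

```java
class FactorySketch {
    interface Ext {}
    interface Del {}

    /** Node interface for a hypothetical integer-literal node. */
    interface IntLit {
        Ext ext();
        IntLit ext(Ext e);
        Del del();
        IntLit del(Del d);
        int value();
        int line();
    }

    /** Concrete node class, following the <node>_c naming convention. */
    static class IntLit_c implements IntLit {
        private final int line, value;
        private Ext ext; // may stay null
        private Del del; // may stay null
        IntLit_c(int line, int value) { this.line = line; this.value = value; }
        public Ext ext() { return ext; }
        public IntLit ext(Ext e) { this.ext = e; return this; }
        public Del del() { return del; }
        public IntLit del(Del d) { this.del = d; return this; }
        public int value() { return value; }
        public int line() { return line; }
    }

    interface ExtFactory { Ext extIntLit(); }
    interface DelFactory { Del delIntLit(); }

    static class NodeFactory_c {
        private final ExtFactory extFactory;
        private final DelFactory delFactory;
        NodeFactory_c(ExtFactory extFactory, DelFactory delFactory) {
            this.extFactory = extFactory;
            this.delFactory = delFactory;
        }

        /** Build the node, then attach whatever extension and delegate the factories supply. */
        IntLit IntLit(int line, int value) {
            IntLit n = new IntLit_c(line, value);
            n = n.ext(extFactory.extIntLit());
            n = n.del(delFactory.delIntLit());
            return n;
        }
    }

    public static void main(String[] args) {
        Ext marker = new Ext() {};
        NodeFactory_c nf = new NodeFactory_c(() -> marker, () -> null);
        IntLit n = nf.IntLit(1, 42);
        System.out.println(n.value() + " " + (n.ext() == marker) + " " + (n.del() == null)); // 42 true true
    }
}
```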
The AbstractExtFactory_c class implements the ext<node> methods and provides convenient hooks for language extensions to override. The implementation of the ext<node> method in AbstractExtFactory_c has the following form:
    public final Ext ext<node>() {
        Ext e = ext<node>Impl();
        return postExt<node>(e);
    }
The ext<node>Impl() method is responsible for creating an appropriate Ext object. The default implementation of these methods in AbstractExtFactory_c is simply to call the ext<super>Impl() method, where <super> is the superclass of <node>. Thus, for example, the implementation of extArrayAccessImpl in AbstractExtFactory_c is:
    protected Ext extArrayAccessImpl() {
        return extExprImpl();
    }
For example, a language extension that needs to provide extension objects for all expressions and also for class declarations would thus need to override only two methods of AbstractExtFactory_c: extExprImpl() and extClassDeclImpl(). Another example: if a language extension needs to use a single Ext class for all AST nodes, then only the single method extNodeImpl() needs to be overridden.
The postExt<node>(Ext) methods provide hooks for subclasses to manipulate Ext objects after they have been created. The default implementation of these methods in AbstractExtFactory_c is simply to call the postExt<super>(Ext) method, where <super> is the superclass of <node>.
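The Impl and post-hook chains can be modeled for a three-level slice of the hierarchy (Node, Expr, ArrayAccess). The method names follow the pattern above, but ExtFactorySketch, ExprExt, and MyExtFactory are hypothetical, and the root extNodeImpl() returning null is an assumption of this sketch. A single override of extExprImpl() is enough to give every expression, including array accesses, an extension object.

```java
class ExtFactorySketch {
    interface Ext {}

    /** Hypothetical extension object for expressions. */
    static class ExprExt implements Ext {}

    abstract static class AbstractExtFactory_c {
        // Public entry points: create via the Impl chain, then run the post hook.
        public final Ext extNode() { return postExtNode(extNodeImpl()); }
        public final Ext extExpr() { return postExtExpr(extExprImpl()); }
        public final Ext extArrayAccess() { return postExtArrayAccess(extArrayAccessImpl()); }

        // Each Impl defaults to its superclass's Impl; the root returns null (assumption: no extension).
        protected Ext extNodeImpl() { return null; }
        protected Ext extExprImpl() { return extNodeImpl(); }
        protected Ext extArrayAccessImpl() { return extExprImpl(); }

        // Post hooks also default to the superclass's hook; the root returns e unchanged.
        protected Ext postExtNode(Ext e) { return e; }
        protected Ext postExtExpr(Ext e) { return postExtNode(e); }
        protected Ext postExtArrayAccess(Ext e) { return postExtExpr(e); }
    }

    /** Hypothetical extension factory: one override covers every expression node. */
    static class MyExtFactory extends AbstractExtFactory_c {
        @Override protected Ext extExprImpl() { return new ExprExt(); }
    }

    public static void main(String[] args) {
        MyExtFactory f = new MyExtFactory();
        System.out.println(f.extArrayAccess() instanceof ExprExt); // true
        System.out.println(f.extNode());                           // null
    }
}
```

Separating creation (the Impl methods) from post-processing (the postExt methods) lets a subclass adjust every Ext object in a subtree of the hierarchy, for example by overriding postExtExpr(), without touching how those objects are created.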
The structure of the delegate factory AbstractDelFactory_c class is
analogous to that of AbstractExtFactory_c.