Extending the Abstract Syntax Tree
Programming language extensions usually introduce at least one new syntactic
construct that requires a representation as a node in the AST. For example,
in the
CArray
extension, the new syntactic construct is the
const
keyword.
Extending the parser in the last section emphasizes this need, as the AST node
factory for CArray
must supply the parser with a new factory method
that creates a type node for constant arrays. This tutorial section describes
the steps required to have the compiler recognize and use the new construct
properly.
In general, adding a new AST node involves the following modifications to the
existing compiler:
- Create a class whose instance represents the new AST node.
- Extend the node factory to provide a method that returns an instance of the AST class created.
- Extend the extension factory to provide a hook for future language extensions to override functionality defined in this extension.
CArray
.
Creating a new AST class
In Polyglot, every AST node is an instance of interface
Node
in
package polyglot.ast
. Every interface defining AST components and
operations on an AST node ultimately extends Node
. Every AST class
is a subclass of abstract class Node_c
, which implements
Node
to provide default AST operations and define common AST
components, such as the textual position of the AST node in the source file.
In general, a new AST class can be defined by extending an existing AST class
that is most similar to the new class.
For example,
CArray
requires a new AST class that represents a type
node for constant arrays. The most similar AST class already defined in
Polyglot is ArrayTypeNode_c
, which implements interface
ArrayTypeNode
.
A good practice when creating a new AST class is to define the corresponding
interface, where the suffix
_c
is stripped off the class name.
Hence, to create a class for constant-array type nodes, we first create
interface ConstArrayTypeNode
in package carray.ast
,
extending ArrayTypeNode
:
It turns out that this new type node need not define any new components or operations because
ArrayTypeNode
already provides enough, so the
interface body is empty.
Next, we create the actual class
ConstArrayTypeNode_c
that extends
ArrayTypeNode_c
and implements ConstArrayTypeNode
:
package carray.ast; import polyglot.ast.ArrayTypeNode_c; // TODO: Add more imports. /** * A {@code ConstArrayTypeNode} is a type node for a non-canonical * const array type. */ public class ConstArrayTypeNode_c extends ArrayTypeNode_c implements ConstArrayTypeNode { // TODO: Implement this class. }Two items are worth noticing:
- Since
ConstArrayTypeNode_c
extendsArrayTypeNode_c
, we need to define a constructor forConstArrayTypeNode_c
that invokes a constructor inArrayTypeNode_c
. Each array type node has a textual position and the type node for the base type of the array type. For example, the type node for the base type ofint[]
is a canonical type node for typeint
, and the type node for the base type ofString[][]
is an array type node for typeString[]
. We don't need to add any additional child nodes to ourConstArrayTypeNode
s, so we simply add the following constructor:/* Also add these imports: * import polyglot.ast.TypeNode; * import polyglot.util.Position; */ public ConstArrayTypeNode_c(Position pos, TypeNode base) { super(pos, base); }
-
The documentation for
ConstArrayTypeNode
suggests that aConstArrayTypeNode
represents a non-canonical type. In Polyglot, a type is canonical if it does not contain an ambiguous name. For example, immediately after parsing, the type node for the base type ofString[]
is an ambiguous type node for a type whose name isString
. These ambiguous names are disambiguated by a resolver that assigns the names with appropriate entities. For example, Polyglot'sImportTable
might assign the nameString
with typejava.lang.String
for the type node above, or, in rare circumstances, some other classString
in the context of the above type node.If a type is not ambiguous, it is canonical. A canonical type is represented in an AST by aCanonicalTypeNode
. The disambiguation pass will automatically replace non-canonical type nodes with aCanonicalTypeNode
once the type represented by the type node is canonical. This means that during disambiguation, anyConstArrayTypeNode
nodes in the AST will be replaced withCanonicalTypeNode
nodes once thebase
AST node represents a canonical type. SeeArrayTypeNode_c.disambiguate(AmbiguityRemover)
for details.The next section covers how to add a new type in a type system, and completes the initial implementation of ourCArray
extension.
AST nodes are serializable, so we also add a
serialVersionUID
generated by a Polyglot utility class:
private static final long serialVersionUID = SerialVersionUID.generate();
Exercise
Override the
toString
method in ConstArrayTypeNode_c
to display the type name properly. As a guide, consider the implementation of
toString
in ArrayTypeNode_c
:
1 @Override 2 public String toString() { 3 return base.toString() + "[]"; 4 }where
base
is the TypeNode
of the base type, which has
one fewer dimension than the type being displayed. For example,
int[]
is the result of displaying the base type of
int[][]
, and int
is the result of displaying the base
type of int[]
. On the other hand, for constant-array type nodes,
int const[]
is the result of displaying the base type of
int const[][]
, and int
is the result of displaying the
base type of int const[]
. Remember that the base type node of a
multidimensional constant-array type node is also a constant-array type node.
Solution:
+ Reveal...
1 @Override 2 public String toString() { 3 String result = base.toString(); 4 if (base instanceof ConstArrayTypeNode) { 5 // If base is also a ConstArrayTypeNode, the keyword "const" would 6 // have been displayed, so just add additional dimension here. 7 result += "[]"; 8 } 9 else result += " const[]"; 10 return result; 11 }
Extending the node factory
In Polyglot, every node factory is an instance of interface
NodeFactory
in package polyglot.ast
. Language
extensions extend NodeFactory
to define new factory methods for new
AST constructs. CArray
needs a new factory method for creating an
instance of ConstArrayTypeNode
. Again, a good practice is to use
the _c
suffix to name the class implementing a given node factory
interface.
The extension generator already created a template node factory, which looks as
follows:
The signature of each factory method in the node factory follows that of the constructor for the corresponding AST class. Therefore, adding a factory method for
ConstArrayTypeNode
results in the following completed
interface:
On the implementation side, the extension generator also created this template:
To implement
ConstArrayTypeNode
factory method, we need to return
an instance of ConstArrayTypeNode
. At first glance, the following
implementation appears to suffice:
1 @Override 2 public ConstArrayTypeNode ConstArrayTypeNode(Position pos, TypeNode base) { 3 // Create the AST node to be returned. 4 ConstArrayTypeNode_c n = new ConstArrayTypeNode_c(pos, base); 5 return n; 6 }However, this implementation is not extensible. For the purposes of this tutorial, we will make the
CArray
extension
extensible. This is done by providing a hook for
future extensions to override compiler passes available in the language we are
implementing. This hook is provided by
extension objects.
Extension objects are created by a factory method in an
extension factory. Each AST node has a corresponding
method in the
extension factory to create extension object(s) for the AST node. For
the
CArray
language, the extension factory is
CArrayExtFactory
. Assume for the moment that
CArrayExtFactory
has a method
extConstArrayTypeNode()
that creates extension objects
for ConstArrayTypeNode
nodes. We use this method to
create appropriate extension objects for the newly created
ConstArrayTypeNode
. Thus, the correct implementation of
the node factory method is the following:
@Override public ConstArrayTypeNode ConstArrayTypeNode(Position pos, TypeNode base) { // Create the AST node to be returned. ConstArrayTypeNode_c n = new ConstArrayTypeNode_c(pos, base); n = ext(n, extFactory().extConstArrayTypeNode()); return n; }
An AST node, along with attached extension objects, is a complete representation
for syntactic construct in a language. For
CArray
,
the final look of the completed node factory implementation is as follows:
Next, we will extend the extension factory to supply the node factory with the
factory method, whose presence we assumed above.
Extending the extension factory
Extension factories enable languages to easily modify and extend the
functionality of existing languages.
In Polyglot, every extension factory is an instance of interface
polyglot.ast.ExtFactory
. Language
extensions extend ExtFactory
to define new factory methods to
create extension objects for new AST constructs. CArray
needs a
new factory methods for creating extension objects for
ConstArrayTypeNode
.
The extension generator already created a template extension factory. Each
factory method in the extension factory takes no argument. The naming
convention is that the prefix
ext
names the factory method for a
given AST construct. Therefore, adding a factory method for
ConstArrayTypeNode
results in the following completed interface:
Once again, a good practice is to use the
_c
suffix to name an
implementation of a given extension factory interface. To make the extension
extensible, however, two classes implement the factory interface, both of which
were generated by the extension generator script
bin/newext.sh
:
CArrayAbstractExtFactory_c
: This class primarily exists to make it easy for further extensions of ourCArray
language. The extension factory of these future extensions will extend this class to reuse its factory methods. The abstract extension factory provides an infrastructure for creating extension objects for new kinds of AST nodes (i.e., in our case, forConstArrayTypeNode
).CArrayExtFactory_c
: This class extendsCArrayAbstractExtFactory_c
and defines extension factory methods specifically forCArray
, i.e., to create extension objects to provide functionality for theCArray
language. This class is final, and future languages that extendCArray
do not reuse it.
CArray
to deal with
semantic checking and source-to-source translation.
Implementing methods in the abstract extension factory requires understanding a
few prerequisites:
- The role of extension objects.
- The organization of a collection of extension objects.
- The structure of extension factories.
Role of extension objects
Extension objects are designed for overriding existing operations or defining
new operations on existing AST nodes. The
bin/newext.sh
script that we used to create the
skeleton of the CArray
extension created
CArrayExt
, a class for extension objects for
CArray
. This class simply forwards all the
existing operations to the language being extended. Thus, when
CArrayExt
is subclassed to provide extension objects
for specific kinds of AST nodes, there is no need to provide
implementations for operations whose behavior is not modified.
Organization of extension objects
An AST node in Polyglot will have zero or more extension objects.
The AST node and extension objects form a linked list, where the first object
in the list is the AST node for the base language. The next object in the list
is the extension object for the next extension, and the last object is the
extension object for the deepest extension. The following are some examples of
the structure of AST nodes in Polyglot for different languages:
+----------+ Java 1.4 | Java 1.4 | | (base) | +----------+ +----------+ +--------+ Java 5 | Java 1.4 |--->| Java 5 | | (base) | | (ext) | +----------+ +--------+ +----------+ +--------+ +--------+ Java 7 | Java 1.4 |--->| Java 5 |--->| Java 7 | | (base) | | (ext) | | (ext) | +----------+ +--------+ +--------+ +----------+ +--------+ +--------+ +--------+ Java 8 | Java 1.4 |--->| Java 5 |--->| Java 7 |--->| Java 8 | | (base) | | (ext) | | (ext) | | (ext) | +----------+ +--------+ +--------+ +--------+ +----------+ +--------+ CArray | Java 1.4 |--->| CArray | | (base) | | (ext) | +----------+ +--------+Polyglot uses a factory pattern to create extension objects for AST nodes. The extension factory is responsible for creating all of the extension objects for an AST node, and forming them into a list in the correct order. Typically, each language extension has its own extension factory, and the extension factories are composed together, as described in the next section.
Structure of extension factories
Extension factories may be composed together to form a linked list. In
contrast with the list of extension objects for AST nodes,
the first factory in the list is one for the deepest
language extension, and the last for the shallowest language extension. For the base language (Java 1.4),
no extension factory exists. The following are examples of the
structure of extension factories for the languages shown
above:
Java 1.4 N/A +---------------+ Java 5 | JL5ExtFactory | +---------------+ +---------------+ +---------------+ Java 7 | JL7ExtFactory |--->| JL5ExtFactory | +---------------+ +---------------+ +---------------+ +---------------+ +---------------+ Java 8 | JL8ExtFactory |--->| JL7ExtFactory |--->| JL5ExtFactory | +---------------+ +---------------+ +---------------+ +------------------+ CArray | CArrayExtFactory | +------------------+Extension factories are traversed from head to tail so that the extension object of the deepest extension is created first. The tail of the factory chain is used to create the remaining extension objects. All of these extension objects are finally chained together in the correct order.
Implementing abstract extension factory
Let's consider the method in an extension factory to create extension
objects for a specific kind of AST node, say integer literals.
First, the extension object for the current language extension is
created. Second, the remaining extension objects (for the other
language extensions) are created. Third, all the
extension objects are composed. Finally, any last-minute modification to the
composed extension object can be made.
Language extensions typically only modify the first and the last steps, so these
steps are refactored into separate methods. For example, here is the
method to create extension objects for field assignment expressions
(i.e.,
e1.f = e2
) from the
class AbstractExtFactory_c
, the superclass of all
extension factories.
@Override public final Ext extFieldAssign() { Ext e = extFieldAssignImpl(); if (nextExtFactory != null) { Ext e2 = nextExtFactory.extFieldAssign(); e = composeExts(e, e2); } return postExtFieldAssign(e); }Note that the method
extFieldAssign
is final. The extension factory for a specific language extension may override
extFieldAssignImpl
or postExtFieldAssign
to implement language-specific behavior. The utility
method composeExts(Ext,Ext)
is used to
compose extension objects in the correct order. The
default implementation of extFieldAssignImpl
is
simply to invoke extAssignImpl
, since the
class FieldAssign
is a subclass of
Assign
. That is, a language extension could
provide an extension object that modifies the behavior of
all assignment expressions (field assignments, local
assignments, and array access assignments) by overriding
the method extAssignImpl
. Similarly, the
default implementation of postExtFieldAssign
is to invoke postFieldAssign
.
In general,
the default implementation in
AbstractExtFactory_c
of a method to create an
extension object is to simply invoke the corresponding method of the
next most general kind of AST node. This makes it easy for
a language extension to override the behavior of many
kinds of AST nodes.
Now we are ready to implement the
CArray
extension factory. Because CArray
adds a new kind of
AST node (i.e, ConstArrayTypeNode
), we need to define a
method in CArrayAbstractExtFactory_c
for creating
extension objects for this kind of AST node. The following code
follows the same pattern as the code for field assignment
expressions above.
@Override public final Ext extConstArrayTypeNode() { // First, create the extension object for this extension. Ext e = extConstArrayTypeNodeImpl(); // Now, create the extension objects for the remaining extensions. // TODO: e2 stores remaining extension objects. // Chain all extension objects together. e = composeExts(e, e2); // Perform any last-minute modification to the extension object chain. return postExtConstArrayTypeNode(e); }The default implementation of
extConstArrayTypeNodeImpl
and
postExtConstArrayTypeNode
simply delegate to the next
most general kind of AST node: ArrayTypeNode
.
protected Ext extConstArrayTypeNodeImpl() { // By default, the extension does not override any operations, so we // simply ask for the extension object of the superclass of // ConstArrayTypeNode, which may override some operations. return extArrayTypeNodeImpl(); } protected Ext postExtConstArrayTypeNode(Ext e) { // Again, nothing is done by default, so pass the argument up the AST // hierarchy in case some ancestors do something else. return postExtArrayTypeNode(e); }Creating the remaining extension objects is the most sophisticated step, but the intuition explained above helps. The next extension factory in the chain is examined to determine whether it supports
ConstArrayTypeNode
s. If so, use the extension
factory method for them (i.e., extConstArrayTypeNode
); otherwise, use one for the superclass of that
AST kind (i.e., extArrayTypeNode
).
// Now, create the extension objects for the remaining extensions. ExtFactory nextEF = nextExtFactory(); Ext e2; if (nextEF instanceof CArrayExtFactory) { // If the next extension is also a CArray extension factory, use // the most specific factory method, which is a recursive call to // this method. e2 = ((CArrayExtFactory) nextEF).extConstArrayTypeNode(); } else { // Otherwise, the next most specific factory method is one for // ArrayTypeNode, so invoke that instead. e2 = nextEF.extArrayTypeNode(); }
We have implemented all the steps. The following is the full implementation of
CArrayAbstractExtFactory_c
.
Note that it is possible to leave the generated abstract extension factory class
unmodified, as long as the language extension does not define any new
kind of AST nodes.
The concrete extension factory class (
CArrayExtFactory_c
) will override some of the
auxiliary methods to differentiate operations on ASTs. We will do this in later
sections.