Extending the Abstract Syntax Tree
Programming language extensions usually introduce at least one new syntactic
construct that requires a representation as a node in the AST. For example,
in the
CArray
extension, the new syntactic construct is the
const
keyword.
Extending the parser in the last section emphasizes this need, as the AST node
factory for CArray
must supply the parser with a new factory method
that creates a type node for constant arrays. This tutorial section describes
the steps required to have the compiler recognize and use the new construct
properly.
In general, adding a new AST node involves the following modifications to the
existing compiler:
- Create a class whose instance represents the new AST node.
- Extend the node factory to provide a method that returns an instance of the AST class created.
- Extend the extension factory to provide a hook for future language extensions to override functionality defined in this extension.
CArray
.
Creating a new AST class
In Polyglot, every AST node is an instance of interface
Node
in
package polyglot.ast
. Every interface defining AST components and
operations on an AST node ultimately extends Node
. Every AST class
is a subclass of abstract class Node_c
, which implements
Node
to provide default AST operations and define common AST
components, such as the textual position of the AST node in the source file.
In general, a new AST class can be defined by extending an existing AST class
that is most similar to the new class.
For example,
CArray
requires a new AST class that represents a type
node for constant arrays. The most similar AST class already defined in
Polyglot is ArrayTypeNode_c
, which implements interface
ArrayTypeNode
.
A good practice when creating a new AST class is to define the corresponding
interface, where the suffix
_c
is stripped off the class name.
Hence, to create a class for constant-array type nodes, we first create
interface ConstArrayTypeNode
in package carray.ast
,
extending ArrayTypeNode
:
ConstArrayTypeNode.javaIt turns out that this new type node need not define any new components or operations because
ArrayTypeNode
already provides enough, so the
interface body is empty.
Next, we create the actual class
ConstArrayTypeNode_c
that extends
ArrayTypeNode_c
and implements ConstArrayTypeNode
:
package carray.ast; import polyglot.ast.ArrayTypeNode_c; // TODO: Add more imports. /** * A {@code ConstArrayTypeNode} is a type node for a non-canonical * const array type. */ public class ConstArrayTypeNode_c extends ArrayTypeNode_c implements ConstArrayTypeNode { // TODO: Implement this class. }Two items are worth noticing:
- Since
ConstArrayTypeNode_c
extendsArrayTypeNode_c
, we need to define a constructor forConstArrayTypeNode_c
that invokes a constructor inArrayTypeNode_c
. Each array type node has a textual position and the type node for the base type of the array type. For example, the type node for the base type ofint[]
is a canonical type node for typeint
, and the type node for the base type ofString[][]
is an array type node for typeString[]
. We don't need to add any additional child nodes to ourConstArrayTypeNode
s, so we simply add the following constructor:/* Also add these imports: * import polyglot.ast.TypeNode; * import polyglot.util.Position; */ public ConstArrayTypeNode_c(Position pos, TypeNode base) { super(pos, base); }
-
The documentation for
ConstArrayTypeNode
suggests that aConstArrayTypeNode
represents a non-canonical type. In Polyglot, a type is canonical if it does not contain an ambiguous name. For example, immediately after parsing, the type node for the base type ofString[]
is an ambiguous type node for a type whose name isString
. These ambiguous names are disambiguated by a resolver that assigns the names with appropriate entities. For example, Polyglot'sImportTable
might assign the nameString
with typejava.lang.String
for the type node above, or, in rare circumstances, some other classString
in the context of the above type node.If a type is not ambiguous, it is canonical. A canonical type is represented in an AST by aCanonicalTypeNode
. The disambiguation pass will automatically replace non-canonical type nodes with aCanonicalTypeNode
once the type represented by the type node is canonical. This means that during disambiguation, anyConstArrayTypeNode
nodes in the AST will be replaced withCanonicalTypeNode
nodes once thebase
AST node represents a canonical type. SeeArrayTypeNode_c.disambiguate(AmbiguityRemover)
for details.The next section covers how to add a new type in a type system, and completes the initial implementation of ourCArray
extension.AST nodes are serializable, so we also add aserialVersionUID
generated by a Polyglot utility class:private static final long serialVersionUID = SerialVersionUID.generate();
Exercise
Override thetoString
method inConstArrayTypeNode_c
to display the type name properly. As a guide, consider the implementation oftoString
inArrayTypeNode_c
:@Override public String toString() { return base.toString() + "[]"; }
wherebase
is theTypeNode
of the base type, which has one fewer dimension than the type being displayed. For example,int[]
is the result of displaying the base type ofint[][]
, andint
is the result of displaying the base type ofint[]
. On the other hand, for constant-array type nodes,int const[]
is the result of displaying the base type ofint const[][]
, andint
is the result of displaying the base type ofint const[]
. Remember that the base type node of a multidimensional constant-array type node is also a constant-array type node.Solution: + Reveal...@Override public String toString() { String result = base.toString(); if (base instanceof ConstArrayTypeNode) { // If base is also a ConstArrayTypeNode, the keyword "const" would // have been displayed, so just add additional dimension here. result += "[]"; } else result += " const[]"; return result; }
Extending the node factory
In Polyglot, every node factory is an instance of interfaceNodeFactory
in packagepolyglot.ast
. Language extensions extendNodeFactory
to define new factory methods for new AST constructs.CArray
needs a new factory method for creating an instance ofConstArrayTypeNode
. Again, a good practice is to use the_c
suffix to name the class implementing a given node factory interface.The extension generator already created a template node factory, which looks as follows:CArrayNodeFactory.java
The signature of each factory method in the node factory follows that of the constructor for the corresponding AST class. Therefore, adding a factory method forConstArrayTypeNode
results in the following completed interface:CArrayNodeFactory.java
On the implementation side, the extension generator also created this template:CArrayNodeFactory_c.java
To implementConstArrayTypeNode
factory method, we need to return an instance ofConstArrayTypeNode
. At first glance, the following implementation appears to suffice:@Override public ConstArrayTypeNode ConstArrayTypeNode(Position pos, TypeNode base) { // Create the AST node to be returned. ConstArrayTypeNode_c n = new ConstArrayTypeNode_c(pos, base); return n; }
However, this implementation is not extensible. For the purposes of this tutorial, we will make theCArray
extension extensible. This is done by providing a hook for future extensions to override compiler passes available in the language we are implementing. This hook is provided by extension objects.Extension objects are created by a factory method in an extension factory. Each AST node has a corresponding method in the extension factory to create extension object(s) for the AST node. For theCArray
language, the extension factory isCArrayExtFactory
. Assume for the moment thatCArrayExtFactory
has a methodextConstArrayTypeNode()
that creates extension objects forConstArrayTypeNode
nodes. We use this method to create appropriate extension objects for the newly createdConstArrayTypeNode
. Thus, the correct implementation of the node factory method is the following:@Override public ConstArrayTypeNode ConstArrayTypeNode(Position pos, TypeNode base) { // Create the AST node to be returned. ConstArrayTypeNode_c n = new ConstArrayTypeNode_c(pos, base); n = ext(n, extFactory().extConstArrayTypeNode()); return n; }
An AST node, along with attached extension objects, is a complete representation for syntactic construct in a language. ForCArray
, the final look of the completed node factory implementation is as follows:CArrayNodeFactory_c.java
Next, we will extend the extension factory to supply the node factory with the factory method, whose presence we assumed above.Extending the extension factory
Extension factories enable languages to easily modify and extend the functionality of existing languages. In Polyglot, every extension factory is an instance of interfacepolyglot.ast.ExtFactory
. Language extensions extendExtFactory
to define new factory methods to create extension objects for new AST constructs.CArray
needs a new factory methods for creating extension objects forConstArrayTypeNode
.The extension generator already created a template extension factory. Each factory method in the extension factory takes no argument. The naming convention is that the prefixext
names the factory method for a given AST construct. Therefore, adding a factory method forConstArrayTypeNode
results in the following completed interface:CArrayExtFactory.java
Once again, a good practice is to use the_c
suffix to name an implementation of a given extension factory interface. To make the extension extensible, however, two classes implement the factory interface, both of which were generated by the extension generator scriptbin/newext.sh
:CArrayAbstractExtFactory_c
: This class primarily exists to make it easy for further extensions of ourCArray
language. The extension factory of these future extensions will extend this class to reuse its factory methods. The abstract extension factory provides an infrastructure for creating extension objects for new kinds of AST nodes (i.e., in our case, forConstArrayTypeNode
).CArrayExtFactory_c
: This class extendsCArrayAbstractExtFactory_c
and defines extension factory methods specifically forCArray
, i.e., to create extension objects to provide functionality for theCArray
language. This class is final, and future languages that extendCArray
do not reuse it.
CArray
to deal with semantic checking and source-to-source translation.Implementing methods in the abstract extension factory requires understanding a few prerequisites:- The role of extension objects.
- The organization of a collection of extension objects.
- The structure of extension factories.
Role of extension objects
Extension objects are designed for overriding existing operations or defining new operations on existing AST nodes. Thebin/newext.sh
script that we used to create the skeleton of theCArray
extension createdCArrayExt
, a class for extension objects forCArray
. This class simply forwards all the existing operations to the language being extended. Thus, whenCArrayExt
is subclassed to provide extension objects for specific kinds of AST nodes, there is no need to provide implementations for operations whose behavior is not modified.Organization of extension objects
An AST node in Polyglot will have zero or more extension objects. The AST node and extension objects form a linked list, where the first object in the list is the AST node for the base language. The next object in the list is the extension object for the next extension, and the last object is the extension object for the deepest extension. The following are some examples of the structure of AST nodes in Polyglot for different languages:+----------+ Java 1.4 | Java 1.4 | | (base) | +----------+ +----------+ +--------+ Java 5 | Java 1.4 |--->| Java 5 | | (base) | | (ext) | +----------+ +--------+ +----------+ +--------+ +--------+ Java 7 | Java 1.4 |--->| Java 5 |--->| Java 7 | | (base) | | (ext) | | (ext) | +----------+ +--------+ +--------+ +----------+ +--------+ +--------+ +--------+ Java 8 | Java 1.4 |--->| Java 5 |--->| Java 7 |--->| Java 8 | | (base) | | (ext) | | (ext) | | (ext) | +----------+ +--------+ +--------+ +--------+ +----------+ +--------+ CArray | Java 1.4 |--->| CArray | | (base) | | (ext) | +----------+ +--------+
Polyglot uses a factory pattern to create extension objects for AST nodes. The extension factory is responsible for creating all of the extension objects for an AST node, and forming them into a list in the correct order. Typically, each language extension has its own extension factory, and the extension factories are composed together, as described in the next section.Structure of extension factories
Extension factories may be composed together to form a linked list. In contrast with the list of extension objects for AST nodes, the first factory in the list is one for the deepest language extension, and the last for the shallowest language extension. For the base language (Java 1.4), no extension factory exists. The following are examples of the structure of extension factories for the languages shown above:Java 1.4 N/A +---------------+ Java 5 | JL5ExtFactory | +---------------+ +---------------+ +---------------+ Java 7 | JL7ExtFactory |--->| JL5ExtFactory | +---------------+ +---------------+ +---------------+ +---------------+ +---------------+ Java 8 | JL8ExtFactory |--->| JL7ExtFactory |--->| JL5ExtFactory | +---------------+ +---------------+ +---------------+ +------------------+ CArray | CArrayExtFactory | +------------------+
Extension factories are traversed from head to tail so that the extension object of the deepest extension is created first. The tail of the factory chain is used to create the remaining extension objects. All of these extension objects are finally chained together in the correct order.Implementing abstract extension factory
Let's consider the method in an extension factory to create extension objects for a specific kind of AST node, say integer literals. First, the extension object for the current language extension is created. Second, the remaining extension objects (for the other language extensions) are created. Third, all the extension objects are composed. Finally, any last-minute modification to the composed extension object can be made.Language extensions typically only modify the first and the last steps, so these steps are refactored into separate methods. For example, here is the method to create extension objects for field assignment expressions (i.e.,e1.f = e2
) from the classAbstractExtFactory_c
, the superclass of all extension factories.@Override public final Ext extFieldAssign() { Ext e = extFieldAssignImpl(); if (nextExtFactory != null) { Ext e2 = nextExtFactory.extFieldAssign(); e = composeExts(e, e2); } return postExtFieldAssign(e); }
Note that the methodextFieldAssign
is final. The extension factory for a specific language extension may overrideextFieldAssignImpl
orpostExtFieldAssign
to implement language-specific behavior. The utility methodcomposeExts(Ext,Ext)
is used to compose extension objects in the correct order. The default implementation ofextFieldAssignImpl
is simply to invokeextAssignImpl
, since the classFieldAssign
is a subclass ofAssign
. That is, a language extension could provide an extension object that modifies the behavior of all assignment expressions (field assignments, local assignments, and array access assignments) by overriding the methodextAssignImpl
. Similarly, the default implementation ofpostExtFieldAssign
is to invokepostFieldAssign
. In general, the default implementation inAbstractExtFactory_c
of a method to create an extension object is to simply invoke the corresponding method of the next most general kind of AST node. This makes it easy for a language extension to override the behavior of many kinds of AST nodes.Now we are ready to implement theCArray
extension factory. BecauseCArray
adds a new kind of AST node (i.e,ConstArrayTypeNode
), we need to define a method inCArrayAbstractExtFactory_c
for creating extension objects for this kind of AST node. The following code follows the same pattern as the code for field assignment expressions above.@Override public final Ext extConstArrayTypeNode() { // First, create the extension object for this extension. Ext e = extConstArrayTypeNodeImpl(); // Now, create the extension objects for the remaining extensions. // TODO: e2 stores remaining extension objects. // Chain all extension objects together. e = composeExts(e, e2); // Perform any last-minute modification to the extension object chain. return postExtConstArrayTypeNode(e); }
The default implementation ofextConstArrayTypeNodeImpl
andpostExtConstArrayTypeNode
simply delegate to the next most general kind of AST node:ArrayTypeNode
.protected Ext extConstArrayTypeNodeImpl() { // By default, the extension does not override any operations, so we // simply ask for the extension object of the superclass of // ConstArrayTypeNode, which may override some operations. return extArrayTypeNodeImpl(); } protected Ext postExtConstArrayTypeNode(Ext e) { // Again, nothing is done by default, so pass the argument up the AST // hierarchy in case some ancestors do something else. return postExtArrayTypeNode(e); }
Creating the remaining extension objects is the most sophisticated step, but the intuition explained above helps. The next extension factory in the chain is examined to determine whether it supportsConstArrayTypeNode
s. If so, use the extension factory method for them (i.e.,extConstArrayTypeNode
); otherwise, use one for the superclass of that AST kind (i.e.,extArrayTypeNode
).// Now, create the extension objects for the remaining extensions. ExtFactory nextEF = nextExtFactory(); Ext e2; if (nextEF instanceof CArrayExtFactory) { // If the next extension is also a CArray extension factory, use // the most specific factory method, which is a recursive call to // this method. e2 = ((CArrayExtFactory) nextEF).extConstArrayTypeNode(); } else { // Otherwise, the next most specific factory method is one for // ArrayTypeNode, so invoke that instead. e2 = nextEF.extArrayTypeNode(); }
We have implemented all the steps. The following is the full implementation ofCArrayAbstractExtFactory_c
.CArrayAbstractExtFactory_c.java
Note that it is possible to leave the generated abstract extension factory class unmodified, as long as the language extension does not define any new kind of AST nodes. The concrete extension factory class (CArrayExtFactory_c
) will override some of the auxiliary methods to differentiate operations on ASTs. We will do this in later sections.