Extending the Abstract Syntax Tree
Programming language extensions usually introduce at least one new syntactic
construct that requires a representation as a node in the AST. For example,
in the
CArray extension, the new syntactic construct is the
const keyword.
Extending the parser in the last section emphasizes this need, as the AST node
factory for CArray must supply the parser with a new factory method
that creates a type node for constant arrays. This tutorial section describes
the steps required to have the compiler recognize and use the new construct
properly.
In general, adding a new AST node involves the following modifications to the
existing compiler:
- Create a class whose instance represents the new AST node.
- Extend the node factory to provide a method that returns an instance of the AST class created.
- Extend the extension factory to provide a hook for future language extensions to override functionality defined in this extension.
CArray.
Creating a new AST class
In Polyglot, every AST node is an instance of interface
Node in
package polyglot.ast. Every interface defining AST components and
operations on an AST node ultimately extends Node. Every AST class
is a subclass of abstract class Node_c, which implements
Node to provide default AST operations and define common AST
components, such as the textual position of the AST node in the source file.
In general, a new AST class can be defined by extending an existing AST class
that is most similar to the new class.
For example,
CArray requires a new AST class that represents a type
node for constant arrays. The most similar AST class already defined in
Polyglot is ArrayTypeNode_c, which implements interface
ArrayTypeNode.
A good practice when creating a new AST class is to define the corresponding
interface, where the suffix
_c is stripped off the class name.
Hence, to create a class for constant-array type nodes, we first create
interface ConstArrayTypeNode in package carray.ast,
extending ArrayTypeNode:
ConstArrayTypeNode.javaIt turns out that this new type node need not define any new components or operations because
ArrayTypeNode already provides enough, so the
interface body is empty.
Next, we create the actual class
ConstArrayTypeNode_c that extends
ArrayTypeNode_c and implements ConstArrayTypeNode:
package carray.ast;
import polyglot.ast.ArrayTypeNode_c;
// TODO: Add more imports.
/**
* A {@code ConstArrayTypeNode} is a type node for a non-canonical
* const array type.
*/
public class ConstArrayTypeNode_c extends ArrayTypeNode_c implements
ConstArrayTypeNode {
// TODO: Implement this class.
}
Two items are worth noticing:
- Since
ConstArrayTypeNode_cextendsArrayTypeNode_c, we need to define a constructor forConstArrayTypeNode_cthat invokes a constructor inArrayTypeNode_c. Each array type node has a textual position and the type node for the base type of the array type. For example, the type node for the base type ofint[]is a canonical type node for typeint, and the type node for the base type ofString[][]is an array type node for typeString[]. We don't need to add any additional child nodes to ourConstArrayTypeNodes, so we simply add the following constructor:/* Also add these imports: * import polyglot.ast.TypeNode; * import polyglot.util.Position; */ public ConstArrayTypeNode_c(Position pos, TypeNode base) { super(pos, base); } -
The documentation for
ConstArrayTypeNodesuggests that aConstArrayTypeNoderepresents a non-canonical type. In Polyglot, a type is canonical if it does not contain an ambiguous name. For example, immediately after parsing, the type node for the base type ofString[]is an ambiguous type node for a type whose name isString. These ambiguous names are disambiguated by a resolver that assigns the names with appropriate entities. For example, Polyglot'sImportTablemight assign the nameStringwith typejava.lang.Stringfor the type node above, or, in rare circumstances, some other classStringin the context of the above type node.If a type is not ambiguous, it is canonical. A canonical type is represented in an AST by aCanonicalTypeNode. The disambiguation pass will automatically replace non-canonical type nodes with aCanonicalTypeNodeonce the type represented by the type node is canonical. This means that during disambiguation, anyConstArrayTypeNodenodes in the AST will be replaced withCanonicalTypeNodenodes once thebaseAST node represents a canonical type. SeeArrayTypeNode_c.disambiguate(AmbiguityRemover)for details.The next section covers how to add a new type in a type system, and completes the initial implementation of ourCArrayextension.AST nodes are serializable, so we also add aserialVersionUIDgenerated by a Polyglot utility class:private static final long serialVersionUID = SerialVersionUID.generate();Exercise
Override thetoStringmethod inConstArrayTypeNode_cto display the type name properly. As a guide, consider the implementation oftoStringinArrayTypeNode_c:@Override public String toString() { return base.toString() + "[]"; }wherebaseis theTypeNodeof the base type, which has one fewer dimension than the type being displayed. For example,int[]is the result of displaying the base type ofint[][], andintis the result of displaying the base type ofint[]. On the other hand, for constant-array type nodes,int const[]is the result of displaying the base type ofint const[][], andintis the result of displaying the base type ofint const[]. Remember that the base type node of a multidimensional constant-array type node is also a constant-array type node.Solution: + Reveal...@Override public String toString() { String result = base.toString(); if (base instanceof ConstArrayTypeNode) { // If base is also a ConstArrayTypeNode, the keyword "const" would // have been displayed, so just add additional dimension here. result += "[]"; } else result += " const[]"; return result; }Extending the node factory
In Polyglot, every node factory is an instance of interfaceNodeFactoryin packagepolyglot.ast. Language extensions extendNodeFactoryto define new factory methods for new AST constructs.CArrayneeds a new factory method for creating an instance ofConstArrayTypeNode. Again, a good practice is to use the_csuffix to name the class implementing a given node factory interface.The extension generator already created a template node factory, which looks as follows:CArrayNodeFactory.java
The signature of each factory method in the node factory follows that of the constructor for the corresponding AST class. Therefore, adding a factory method forConstArrayTypeNoderesults in the following completed interface:CArrayNodeFactory.java
On the implementation side, the extension generator also created this template:CArrayNodeFactory_c.java
To implementConstArrayTypeNodefactory method, we need to return an instance ofConstArrayTypeNode. At first glance, the following implementation appears to suffice:@Override public ConstArrayTypeNode ConstArrayTypeNode(Position pos, TypeNode base) { // Create the AST node to be returned. ConstArrayTypeNode_c n = new ConstArrayTypeNode_c(pos, base); return n; }However, this implementation is not extensible. For the purposes of this tutorial, we will make theCArrayextension extensible. This is done by providing a hook for future extensions to override compiler passes available in the language we are implementing. This hook is provided by extension objects.Extension objects are created by a factory method in an extension factory. Each AST node has a corresponding method in the extension factory to create extension object(s) for the AST node. For theCArraylanguage, the extension factory isCArrayExtFactory. Assume for the moment thatCArrayExtFactoryhas a methodextConstArrayTypeNode()that creates extension objects forConstArrayTypeNodenodes. We use this method to create appropriate extension objects for the newly createdConstArrayTypeNode. Thus, the correct implementation of the node factory method is the following:@Override public ConstArrayTypeNode ConstArrayTypeNode(Position pos, TypeNode base) { // Create the AST node to be returned. ConstArrayTypeNode_c n = new ConstArrayTypeNode_c(pos, base); n = ext(n, extFactory().extConstArrayTypeNode()); return n; }An AST node, along with attached extension objects, is a complete representation for syntactic construct in a language. ForCArray, the final look of the completed node factory implementation is as follows:CArrayNodeFactory_c.java
Next, we will extend the extension factory to supply the node factory with the factory method, whose presence we assumed above.Extending the extension factory
Extension factories enable languages to easily modify and extend the functionality of existing languages. In Polyglot, every extension factory is an instance of interfacepolyglot.ast.ExtFactory. Language extensions extendExtFactoryto define new factory methods to create extension objects for new AST constructs.CArrayneeds a new factory methods for creating extension objects forConstArrayTypeNode.The extension generator already created a template extension factory. Each factory method in the extension factory takes no argument. The naming convention is that the prefixextnames the factory method for a given AST construct. Therefore, adding a factory method forConstArrayTypeNoderesults in the following completed interface:CArrayExtFactory.java
Once again, a good practice is to use the_csuffix to name an implementation of a given extension factory interface. To make the extension extensible, however, two classes implement the factory interface, both of which were generated by the extension generator scriptbin/newext.sh:CArrayAbstractExtFactory_c: This class primarily exists to make it easy for further extensions of ourCArraylanguage. The extension factory of these future extensions will extend this class to reuse its factory methods. The abstract extension factory provides an infrastructure for creating extension objects for new kinds of AST nodes (i.e., in our case, forConstArrayTypeNode).CArrayExtFactory_c: This class extendsCArrayAbstractExtFactory_cand defines extension factory methods specifically forCArray, i.e., to create extension objects to provide functionality for theCArraylanguage. This class is final, and future languages that extendCArraydo not reuse it.
CArrayto deal with semantic checking and source-to-source translation.Implementing methods in the abstract extension factory requires understanding a few prerequisites:- The role of extension objects.
- The organization of a collection of extension objects.
- The structure of extension factories.
Role of extension objects
Extension objects are designed for overriding existing operations or defining new operations on existing AST nodes. Thebin/newext.shscript that we used to create the skeleton of theCArrayextension createdCArrayExt, a class for extension objects forCArray. This class simply forwards all the existing operations to the language being extended. Thus, whenCArrayExtis subclassed to provide extension objects for specific kinds of AST nodes, there is no need to provide implementations for operations whose behavior is not modified.Organization of extension objects
An AST node in Polyglot will have zero or more extension objects. The AST node and extension objects form a linked list, where the first object in the list is the AST node for the base language. The next object in the list is the extension object for the next extension, and the last object is the extension object for the deepest extension. The following are some examples of the structure of AST nodes in Polyglot for different languages:+----------+ Java 1.4 | Java 1.4 | | (base) | +----------+ +----------+ +--------+ Java 5 | Java 1.4 |--->| Java 5 | | (base) | | (ext) | +----------+ +--------+ +----------+ +--------+ +--------+ Java 7 | Java 1.4 |--->| Java 5 |--->| Java 7 | | (base) | | (ext) | | (ext) | +----------+ +--------+ +--------+ +----------+ +--------+ +--------+ +--------+ Java 8 | Java 1.4 |--->| Java 5 |--->| Java 7 |--->| Java 8 | | (base) | | (ext) | | (ext) | | (ext) | +----------+ +--------+ +--------+ +--------+ +----------+ +--------+ CArray | Java 1.4 |--->| CArray | | (base) | | (ext) | +----------+ +--------+Polyglot uses a factory pattern to create extension objects for AST nodes. The extension factory is responsible for creating all of the extension objects for an AST node, and forming them into a list in the correct order. Typically, each language extension has its own extension factory, and the extension factories are composed together, as described in the next section.Structure of extension factories
Extension factories may be composed together to form a linked list. In contrast with the list of extension objects for AST nodes, the first factory in the list is one for the deepest language extension, and the last for the shallowest language extension. For the base language (Java 1.4), no extension factory exists. The following are examples of the structure of extension factories for the languages shown above:Java 1.4 N/A +---------------+ Java 5 | JL5ExtFactory | +---------------+ +---------------+ +---------------+ Java 7 | JL7ExtFactory |--->| JL5ExtFactory | +---------------+ +---------------+ +---------------+ +---------------+ +---------------+ Java 8 | JL8ExtFactory |--->| JL7ExtFactory |--->| JL5ExtFactory | +---------------+ +---------------+ +---------------+ +------------------+ CArray | CArrayExtFactory | +------------------+Extension factories are traversed from head to tail so that the extension object of the deepest extension is created first. The tail of the factory chain is used to create the remaining extension objects. All of these extension objects are finally chained together in the correct order.Implementing abstract extension factory
Let's consider the method in an extension factory to create extension objects for a specific kind of AST node, say integer literals. First, the extension object for the current language extension is created. Second, the remaining extension objects (for the other language extensions) are created. Third, all the extension objects are composed. Finally, any last-minute modification to the composed extension object can be made.Language extensions typically only modify the first and the last steps, so these steps are refactored into separate methods. For example, here is the method to create extension objects for field assignment expressions (i.e.,e1.f = e2) from the classAbstractExtFactory_c, the superclass of all extension factories.@Override public final Ext extFieldAssign() { Ext e = extFieldAssignImpl(); if (nextExtFactory != null) { Ext e2 = nextExtFactory.extFieldAssign(); e = composeExts(e, e2); } return postExtFieldAssign(e); }Note that the methodextFieldAssignis final. The extension factory for a specific language extension may overrideextFieldAssignImplorpostExtFieldAssignto implement language-specific behavior. The utility methodcomposeExts(Ext,Ext)is used to compose extension objects in the correct order. The default implementation ofextFieldAssignImplis simply to invokeextAssignImpl, since the classFieldAssignis a subclass ofAssign. That is, a language extension could provide an extension object that modifies the behavior of all assignment expressions (field assignments, local assignments, and array access assignments) by overriding the methodextAssignImpl. Similarly, the default implementation ofpostExtFieldAssignis to invokepostFieldAssign. In general, the default implementation inAbstractExtFactory_cof a method to create an extension object is to simply invoke the corresponding method of the next most general kind of AST node. This makes it easy for a language extension to override the behavior of many kinds of AST nodes.Now we are ready to implement theCArrayextension factory. BecauseCArrayadds a new kind of AST node (i.e,ConstArrayTypeNode), we need to define a method inCArrayAbstractExtFactory_cfor creating extension objects for this kind of AST node. The following code follows the same pattern as the code for field assignment expressions above.@Override public final Ext extConstArrayTypeNode() { // First, create the extension object for this extension. Ext e = extConstArrayTypeNodeImpl(); // Now, create the extension objects for the remaining extensions. // TODO: e2 stores remaining extension objects. // Chain all extension objects together. e = composeExts(e, e2); // Perform any last-minute modification to the extension object chain. return postExtConstArrayTypeNode(e); }The default implementation ofextConstArrayTypeNodeImplandpostExtConstArrayTypeNodesimply delegate to the next most general kind of AST node:ArrayTypeNode.protected Ext extConstArrayTypeNodeImpl() { // By default, the extension does not override any operations, so we // simply ask for the extension object of the superclass of // ConstArrayTypeNode, which may override some operations. return extArrayTypeNodeImpl(); } protected Ext postExtConstArrayTypeNode(Ext e) { // Again, nothing is done by default, so pass the argument up the AST // hierarchy in case some ancestors do something else. return postExtArrayTypeNode(e); }Creating the remaining extension objects is the most sophisticated step, but the intuition explained above helps. The next extension factory in the chain is examined to determine whether it supportsConstArrayTypeNodes. If so, use the extension factory method for them (i.e.,extConstArrayTypeNode); otherwise, use one for the superclass of that AST kind (i.e.,extArrayTypeNode).// Now, create the extension objects for the remaining extensions. ExtFactory nextEF = nextExtFactory(); Ext e2; if (nextEF instanceof CArrayExtFactory) { // If the next extension is also a CArray extension factory, use // the most specific factory method, which is a recursive call to // this method. e2 = ((CArrayExtFactory) nextEF).extConstArrayTypeNode(); } else { // Otherwise, the next most specific factory method is one for // ArrayTypeNode, so invoke that instead. e2 = nextEF.extArrayTypeNode(); }We have implemented all the steps. The following is the full implementation ofCArrayAbstractExtFactory_c.CArrayAbstractExtFactory_c.java
Note that it is possible to leave the generated abstract extension factory class unmodified, as long as the language extension does not define any new kind of AST nodes. The concrete extension factory class (CArrayExtFactory_c) will override some of the auxiliary methods to differentiate operations on ASTs. We will do this in later sections.