Extending the Abstract Syntax Tree

Programming language extensions usually introduce at least one new syntactic construct that requires a representation as a node in the AST. For example, in the CArray extension, the new syntactic construct is the const keyword. Extending the parser in the last section emphasizes this need, as the AST node factory for CArray must supply the parser with a new factory method that creates a type node for constant arrays. This tutorial section describes the steps required to have the compiler recognize and use the new construct properly.
In general, adding a new AST node involves the following modifications to the existing compiler:
  1. Create a class whose instance represents the new AST node.
  2. Extend the node factory to provide a method that returns an instance of the AST class created.
  3. Extend the extension factory to provide a hook for future language extensions to override functionality defined in this extension.
The parser invokes the factory methods defined in the node factory. In turn, these factory methods return an instance of an AST class in the language. Before returning the instance, the node factory method attaches the instance with zero or more extension objects, provided by extensions of the language being implemented to override operations defined in the current language. The guide below illustrates these modifications for CArray.

Creating a new AST class

In Polyglot, every AST node is an instance of interface Node in package polyglot.ast. Every interface defining AST components and operations on an AST node ultimately extends Node. Every AST class is a subclass of abstract class Node_c, which implements Node to provide default AST operations and define common AST components, such as the textual position of the AST node in the source file.
In general, a new AST class can be defined by extending an existing AST class that is most similar to the new class. For example, CArray requires a new AST class that represents a type node for constant arrays. The most similar AST class already defined in Polyglot is ArrayTypeNode_c, which implements interface ArrayTypeNode.
A good practice when creating a new AST class is to define the corresponding interface, where the suffix _c is stripped off the class name. Hence, to create a class for constant-array type nodes, we first create interface ConstArrayTypeNode in package carray.ast, extending ArrayTypeNode:
ConstArrayTypeNode.java
It turns out that this new type node need not define any new components or operations because ArrayTypeNode already provides enough, so the interface body is empty.
Next, we create the actual class ConstArrayTypeNode_c that extends ArrayTypeNode_c and implements ConstArrayTypeNode:
package carray.ast;

import polyglot.ast.ArrayTypeNode_c;
// TODO: Add more imports.

/**
 * A {@code ConstArrayTypeNode} is a type node for a non-canonical
 * const array type.
 */
public class ConstArrayTypeNode_c extends ArrayTypeNode_c implements
        ConstArrayTypeNode {
    // TODO: Implement this class.
}
Two items are worth noticing:
AST nodes are serializable, so we also add a serialVersionUID generated by a Polyglot utility class:
    private static final long serialVersionUID = SerialVersionUID.generate();

Exercise

Override the toString method in ConstArrayTypeNode_c to display the type name properly. As a guide, consider the implementation of toString in ArrayTypeNode_c:
    @Override
    public String toString() {
        return base.toString() + "[]";
    }
where base is the TypeNode of the base type, which has one fewer dimension than the type being displayed. For example, int[] is the result of displaying the base type of int[][], and int is the result of displaying the base type of int[]. On the other hand, for constant-array type nodes, int const[] is the result of displaying the base type of int const[][], and int is the result of displaying the base type of int const[]. Remember that the base type node of a multidimensional constant-array type node is also a constant-array type node.
Solution: + Reveal...

Extending the node factory

In Polyglot, every node factory is an instance of interface NodeFactory in package polyglot.ast. Language extensions extend NodeFactory to define new factory methods for new AST constructs. CArray needs a new factory method for creating an instance of ConstArrayTypeNode. Again, a good practice is to use the _c suffix to name the class implementing a given node factory interface.
The extension generator already created a template node factory, which looks as follows:
CArrayNodeFactory.java
The signature of each factory method in the node factory follows that of the constructor for the corresponding AST class. Therefore, adding a factory method for ConstArrayTypeNode results in the following completed interface:
CArrayNodeFactory.java
On the implementation side, the extension generator also created this template:
CArrayNodeFactory_c.java
To implement ConstArrayTypeNode factory method, we need to return an instance of ConstArrayTypeNode. At first glance, the following implementation appears to suffice:
    @Override
    public ConstArrayTypeNode ConstArrayTypeNode(Position pos, TypeNode base) {
        // Create the AST node to be returned.
        ConstArrayTypeNode_c n = new ConstArrayTypeNode_c(pos, base);
        return n;
    }
However, this implementation is not extensible. For the purposes of this tutorial, we will make the CArray extension extensible. This is done by providing a hook for future extensions to override compiler passes available in the language we are implementing. This hook is provided by extension objects.
Extension objects are created by a factory method in an extension factory. Each AST node has a corresponding method in the extension factory to create extension object(s) for the AST node. For the CArray language, the extension factory is CArrayExtFactory. Assume for the moment that CArrayExtFactory has a method extConstArrayTypeNode() that creates extension objects for ConstArrayTypeNode nodes. We use this method to create appropriate extension objects for the newly created ConstArrayTypeNode. Thus, the correct implementation of the node factory method is the following:
    @Override
    public ConstArrayTypeNode ConstArrayTypeNode(Position pos, TypeNode base) {
        // Create the AST node to be returned.
        ConstArrayTypeNode_c n = new ConstArrayTypeNode_c(pos, base);
        n = ext(n, extFactory().extConstArrayTypeNode());
        return n;
    }
An AST node, along with attached extension objects, is a complete representation for syntactic construct in a language. For CArray, the final look of the completed node factory implementation is as follows:
CArrayNodeFactory_c.java
Next, we will extend the extension factory to supply the node factory with the factory method, whose presence we assumed above.

Extending the extension factory

Extension factories enable languages to easily modify and extend the functionality of existing languages. In Polyglot, every extension factory is an instance of interface polyglot.ast.ExtFactory. Language extensions extend ExtFactory to define new factory methods to create extension objects for new AST constructs. CArray needs a new factory methods for creating extension objects for ConstArrayTypeNode.
The extension generator already created a template extension factory. Each factory method in the extension factory takes no argument. The naming convention is that the prefix ext names the factory method for a given AST construct. Therefore, adding a factory method for ConstArrayTypeNode results in the following completed interface:
CArrayExtFactory.java
Once again, a good practice is to use the _c suffix to name an implementation of a given extension factory interface. To make the extension extensible, however, two classes implement the factory interface, both of which were generated by the extension generator script bin/newext.sh: For now, we will focus on implementing the abstract extension factory to complete a hook for future extensions. Later on, we will modify the final extension factory class specifically for CArray to deal with semantic checking and source-to-source translation.
Implementing methods in the abstract extension factory requires understanding a few prerequisites:

Role of extension objects

Extension objects are designed for overriding existing operations or defining new operations on existing AST nodes. The bin/newext.sh script that we used to create the skeleton of the CArray extension created CArrayExt, a class for extension objects for CArray. This class simply forwards all the existing operations to the language being extended. Thus, when CArrayExt is subclassed to provide extension objects for specific kinds of AST nodes, there is no need to provide implementations for operations whose behavior is not modified.

Organization of extension objects

An AST node in Polyglot will have zero or more extension objects. The AST node and extension objects form a linked list, where the first object in the list is the AST node for the base language. The next object in the list is the extension object for the next extension, and the last object is the extension object for the deepest extension. The following are some examples of the structure of AST nodes in Polyglot for different languages:
            +----------+
Java 1.4    | Java 1.4 |
            |  (base)  |
            +----------+

            +----------+    +--------+
Java 5      | Java 1.4 |--->| Java 5 |
            |  (base)  |    | (ext)  |
            +----------+    +--------+

            +----------+    +--------+    +--------+
Java 7      | Java 1.4 |--->| Java 5 |--->| Java 7 |
            |  (base)  |    | (ext)  |    | (ext)  |
            +----------+    +--------+    +--------+

            +----------+    +--------+    +--------+    +--------+
Java 8      | Java 1.4 |--->| Java 5 |--->| Java 7 |--->| Java 8 |
            |  (base)  |    | (ext)  |    | (ext)  |    | (ext)  |
            +----------+    +--------+    +--------+    +--------+

            +----------+    +--------+
CArray      | Java 1.4 |--->| CArray |
            |  (base)  |    | (ext)  |
            +----------+    +--------+
Polyglot uses a factory pattern to create extension objects for AST nodes. The extension factory is responsible for creating all of the extension objects for an AST node, and forming them into a list in the correct order. Typically, each language extension has its own extension factory, and the extension factories are composed together, as described in the next section.

Structure of extension factories

Extension factories may be composed together to form a linked list. In contrast with the list of extension objects for AST nodes, the first factory in the list is one for the deepest language extension, and the last for the shallowest language extension. For the base language (Java 1.4), no extension factory exists. The following are examples of the structure of extension factories for the languages shown above:
Java 1.4    N/A

            +---------------+
Java 5      | JL5ExtFactory |
            +---------------+

            +---------------+    +---------------+
Java 7      | JL7ExtFactory |--->| JL5ExtFactory |
            +---------------+    +---------------+

            +---------------+    +---------------+    +---------------+
Java 8      | JL8ExtFactory |--->| JL7ExtFactory |--->| JL5ExtFactory |
            +---------------+    +---------------+    +---------------+

            +------------------+
CArray      | CArrayExtFactory |
            +------------------+
Extension factories are traversed from head to tail so that the extension object of the deepest extension is created first. The tail of the factory chain is used to create the remaining extension objects. All of these extension objects are finally chained together in the correct order.

Implementing abstract extension factory

Let's consider the method in an extension factory to create extension objects for a specific kind of AST node, say integer literals. First, the extension object for the current language extension is created. Second, the remaining extension objects (for the other language extensions) are created. Third, all the extension objects are composed. Finally, any last-minute modification to the composed extension object can be made.
Language extensions typically only modify the first and the last steps, so these steps are refactored into separate methods. For example, here is the method to create extension objects for field assignment expressions (i.e., e1.f = e2) from the class AbstractExtFactory_c, the superclass of all extension factories.
    @Override
    public final Ext extFieldAssign() {
        Ext e = extFieldAssignImpl();

        if (nextExtFactory != null) {
            Ext e2 = nextExtFactory.extFieldAssign();
            e = composeExts(e, e2);
        }
        return postExtFieldAssign(e);
    }
Note that the method extFieldAssign is final. The extension factory for a specific language extension may override extFieldAssignImpl or postExtFieldAssign to implement language-specific behavior. The utility method composeExts(Ext,Ext) is used to compose extension objects in the correct order. The default implementation of extFieldAssignImpl is simply to invoke extAssignImpl, since the class FieldAssign is a subclass of Assign. That is, a language extension could provide an extension object that modifies the behavior of all assignment expressions (field assignments, local assignments, and array access assignments) by overriding the method extAssignImpl. Similarly, the default implementation of postExtFieldAssign is to invoke postFieldAssign. In general, the default implementation in AbstractExtFactory_c of a method to create an extension object is to simply invoke the corresponding method of the next most general kind of AST node. This makes it easy for a language extension to override the behavior of many kinds of AST nodes.
Now we are ready to implement the CArray extension factory. Because CArray adds a new kind of AST node (i.e, ConstArrayTypeNode), we need to define a method in CArrayAbstractExtFactory_c for creating extension objects for this kind of AST node. The following code follows the same pattern as the code for field assignment expressions above.
    @Override
    public final Ext extConstArrayTypeNode() {
        // First, create the extension object for this extension.
        Ext e = extConstArrayTypeNodeImpl();

        // Now, create the extension objects for the remaining extensions.
        // TODO: e2 stores remaining extension objects.

        // Chain all extension objects together.
        e = composeExts(e, e2);
        // Perform any last-minute modification to the extension object chain.
        return postExtConstArrayTypeNode(e);
    }
The default implementation of extConstArrayTypeNodeImpl and postExtConstArrayTypeNode simply delegate to the next most general kind of AST node: ArrayTypeNode.
    protected Ext extConstArrayTypeNodeImpl() {
        // By default, the extension does not override any operations, so we
        // simply ask for the extension object of the superclass of
        // ConstArrayTypeNode, which may override some operations.
        return extArrayTypeNodeImpl();
    }

    protected Ext postExtConstArrayTypeNode(Ext e) {
        // Again, nothing is done by default, so pass the argument up the AST
        // hierarchy in case some ancestors do something else.
        return postExtArrayTypeNode(e);
    }
Creating the remaining extension objects is the most sophisticated step, but the intuition explained above helps. The next extension factory in the chain is examined to determine whether it supports ConstArrayTypeNodes. If so, use the extension factory method for them (i.e., extConstArrayTypeNode); otherwise, use one for the superclass of that AST kind (i.e., extArrayTypeNode).
        // Now, create the extension objects for the remaining extensions.
        ExtFactory nextEF = nextExtFactory();
        Ext e2;
        if (nextEF instanceof CArrayExtFactory) {
            // If the next extension is also a CArray extension factory, use
            // the most specific factory method, which is a recursive call to
            // this method.
            e2 = ((CArrayExtFactory) nextEF).extConstArrayTypeNode();
        }
        else {
            // Otherwise, the next most specific factory method is one for
            // ArrayTypeNode, so invoke that instead.
            e2 = nextEF.extArrayTypeNode();
        }
We have implemented all the steps. The following is the full implementation of CArrayAbstractExtFactory_c.
CArrayAbstractExtFactory_c.java
Note that it is possible to leave the generated abstract extension factory class unmodified, as long as the language extension does not define any new kind of AST nodes. The concrete extension factory class (CArrayExtFactory_c) will override some of the auxiliary methods to differentiate operations on ASTs. We will do this in later sections.