Extending the Type System

In addition to new syntactic constructs, programming language extensions often introduce new types; our CArray extension is one such. Fortunately, Polyglot allows extensions to make extensive changes to the way the type system works.

Representing types

Polyglot represents types in two different ways: first, as ASTs representing how they are written as text in the program (interface polyglot.ast.TypeNode and its subtypes), and second, as a compile-time interpretation of those ASTs to type objects (interface polyglot.types.Type and its subtypes). For example, a single class A.B might be referred to as A.B or just B at different points in the code. However, both references will be resolved to the same type object.

Taking this a step farther, suppose the program includes class declarations for two classes A and B that refer to each other:

class A {
    B x;
}
class B {
    A f();
}

The AST generated by the parser represents each class as a tree structure, including a ClassDecl object that references a ClassBody containing a list of ClassMembers. The class member A.x (an ast.FieldDecl node) contains a reference to a TypeNode containing the name B, and similarly with B.f, represented by an ast.MethodDecl node. By contrast, the type objects representing classes A and B (types.ClassType) contain types.FieldInstance and types.MethodInstance objects for x and f that respectively point directly to the corresponding type objects, without any need to consult the AST or any kind of lookup structure to resolve names.

Notice that the resulting type objects form a graph containing a cycle, which results from the mutual recursion between the two classes. ASTs never contain cycles, but the type objects created from ASTs do when types are recursive, as is frequently the case in Java programs. This means that type objects are transformed during compilation mainly by imperatively updating them to create these cycles. Once type checking begins, however, type objects are immutable.

Once type nodes in the AST are interpreted to find the appropriate type object, their type attribute is (nondestructively) updated to refer to that type object.

Construction of type objects

Operations on types needed during type checking, such as deciding whether two types are in a subtype relationship, are performed by the TypeSystem object. It operates on type objects rather than on type nodes. Therefore, all type nodes must be resolved to type objects before type checking begins. Polyglot assigns type objects to AST nodes in three compiler passes:

During parsing, primitive type declarations are parsed into canonical type nodes, each attached with an instance of PrimitiveType.
During buildTypes, the remaining type objects are initially constructed to begin the process of removing ambiguities in the AST. In this phase, the TypeBuilder visitor traverses the AST to identify the type objects that need to be constructed, create these type objects, and populate them when their members. These type objects could still be ambiguous, since the necessary type objects may not exist yet. Ambiguous type objects will be replaced with canonical type objects in the next pass.
During disambiguate, the AmbiguityRemover visitor traverses the AST to replace ambiguous type objects with canonical ones. In particular, each type nodes in the AST is replaced, if not already, with a CanonicalTypeNode node object that wraps a pointer to the corresponding type object. For example, this pass determines when an expression such as A.B.C is seen, whether the name B refers to the name of a package, a class, or a field. This pass may need to be run more than once to resolve all type names; Polyglot automatically takes care of running passes until every ambiguity is removed, or until an unresolvable ambiguity is found, resulting in a compile-time error.

Adding new types

When adding a new kind of type to the language, it is typically necessary to create both a new AST node and new corresponding kind of type object. For example, in the CArray extension, the constant array type is the new type. Extending the AST in the last section emphasizes this need, as each ConstArrayTypeNode must have a type object associated with it. In general, AST nodes representing an expression or a type must be associated with a type object. We now see the steps required to have the compiler recognize and use the new type properly.

Similarly to adding a new AST node type, adding a new type involves the following modifications to the existing compiler:

Create a class whose instance represents the new type.
Extend the type system to provide a method that returns an instance of the type class created.
Modify the appropriate AST node class to use the new type.

Notice that, unlike the AST counterpart, there is no factory class for types. The type system object is the intermediary for everything related to types; it is responsible for creating type objects and for performing operations on types. The first two steps are similar to the AST counterpart, while the last step connects the AST with the type system to make semantic checking easier in later stages of the compilation.

Creating a new type class

In Polyglot, every type is an instance of interface Type in package polyglot.types. Every interface defining a type and its operations ultimately extends Type. Every type class is a subclass of abstract class Type_c, which implements Type to provide default operations on types.

Typically, a new type class can be defined by extending an existing type class that is most similar to the new class. For example, CArray requires a new type class that represents constant array types. The most similar type class already defined in Polyglot is ArrayType_c, which implements interface ArrayType.

Once again, a good practice is to use the _c suffix to name an implementation of a given type. Hence, to create a class for constant-array types, we first create interface ConstArrayType in package carray.types, extending ArrayType:

ConstArrayType.java

It turns out that this new type needs not define any new operations because ArrayType already provides enough, so the interface body is empty.

Next, we create the actual class ConstArrayType_c that extends ArrayType_c and implements ConstArrayType:

package carray.types;

import polyglot.types.ArrayType_c;

/**
 * A {@code ConstArrayType} represents an array of base java types,
 * whose elements cannot change after initialization.
 */
public class ConstArrayType_c extends ArrayType_c implements ConstArrayType {
    // TODO: Implement this class.
}

We need to define a constructor that invokes a constructor in ArrayType_c. Each array type has a corresponding type system, a position of the occurrence of the type in the AST, and the base type of the array type. For example, the base type of int[] is the type int, and the base type of String[][] is the array type String[]. These components suffices for constant-array types, so we simply add the following constructor:

    /* Also add these imports:
     * import polyglot.types.Type;
     * import polyglot.types.TypeSystem;
     * import polyglot.util.Position;
     */
    public ConstArrayType_c(TypeSystem ts, Position pos, Type base) {
        super(ts, pos, base);
    }

Polyglot compiles a source file into Java bytecode, which requires that non-Java features be erased from the source code prior to the bytecode translation. In order to use features from language extensions post-compilation, these features must be stored in a format different from a plain text. Polyglot uses serialization of objects for the conversion, and requires that types be serializable. To fully support serialization, the class-private constant serialVersionUID must be defined properly to avoid incompatibility, and a nullary constructor must be defined for deserialization.

Polyglot provides the utility class SerialVersionUID in package polyglot.util that supplies appropriate values of serialVersionUID, making the constant declaration easy. The nullary constructor has protected visibility to prevent use from compiler code:

    /* Also add these imports:
     * import polyglot.util.SerialVersionUID;
     */
    private static final long serialVersionUID = SerialVersionUID.generate();

    /** Used for deserializing types. */
    protected ConstArrayType_c() {
    }

At this point, ConstArrayType_c looks as follows:

package carray.types;

import polyglot.types.ArrayType_c;
import polyglot.types.Type;
import polyglot.types.TypeSystem;
import polyglot.util.Position;
import polyglot.util.SerialVersionUID;

/**
 * A {@code ConstArrayType} represents an array of base java types,
 * whose elements cannot change after initialization.
 */
public class ConstArrayType_c extends ArrayType_c implements ConstArrayType {
    private static final long serialVersionUID = SerialVersionUID.generate();

    /** Used for deserializing types. */
    protected ConstArrayType_c() {
    }

    public ConstArrayType_c(TypeSystem ts, Position pos, Type base) {
        super(ts, pos, base);
    }
}

Constant-array types inherit operations from array types, so both types are semantically equivalent at this time. Later on, some of the operations will be overridden to distinguish the two types.

Exercise

Override the toString method in ConstArrayType_c to display the type name properly. As a guide, consider the implementation of toString in ArrayType_c:

    @Override
    public String toString() {
        return base.toString() + "[]";
    }

where base is the Type of the base type, which has one fewer dimension than the type being displayed. For example, int[] is the result of displaying the base type of int[][], and int is the result of displaying the base type of int[]. On the other hand, for constant-array type nodes, int const[] is the result of displaying the base type of int const[][], and int is the result of displaying the base type of int const[]. Remember that the base type of a multidimensional constant-array type is also a constant-array type.

Hint: + Reveal...

Is there any difference between this method and one in class ConstArrayTypeNode_c?

Solution: + Reveal...

    @Override
    public String toString() {
        String result = base.toString();
        if (base instanceof ConstArrayType) {
            // If base is also a ConstArrayType, the keyword "const" would
            // have been displayed, so just add additional dimension here.
            result += "[]";
        }
        else result += " const[]";
        return result;
    }

Extending the type system with factory methods

In Polyglot, every type system is an instance of interface TypeSystem in package polyglot.types. Language extensions extends TypeSystem to define new factory methods for new types and to override existing operation on types. CArray needs a new factory method for creating an instance of ConstArrayType. Again, a good practice is to use the _c suffix the name the class implementing a given type system interface.

The extension generator already created a template type system, which looks as follows:

CArrayTypeSystem.java

The signature of each factory method in the type system follows that of the constructor for the corresponding type class, except that the type system is this and hence omitted from the parameter. Therefore, adding a factory method for ConstArrayType results in the following completed interface:

CArrayTypeSystem.java

On the implementation side, the extension generator also created this template:

CArrayTypeSystem_c.java

To implement constArrayOf factory method, we simply return an instance of ConstArrayType. Unlike with the node factory, there is no complication of attaching an extension object to the type object. At this point, CArrayTypeSystem_c looks as follows:

package carray.types;

import polyglot.types.Type;
import polyglot.types.TypeSystem_c;
import polyglot.util.Position;

public class CArrayTypeSystem_c extends TypeSystem_c implements
        CArrayTypeSystem {
    @Override
    public ConstArrayType constArrayOf(Position pos, Type base) {
        return new ConstArrayType_c(this, pos, base);
    }
}

CArray's type system will be further modified to complete semantic checking in later sections.

Using the new type in the AST

Recall that Polyglot creates type objects during the TypeBuilder pass and installs resolved type objects into AST nodes during the AmbiguityRemover pass.

Therefore, constant-array types are introduced into the AST during type building, when each ConstArrayTypeNode, which represents an ambiguous constant-array type node, is attached with an ambiguous ConstArrayType. The type builder has access to the type system, which we can use to instantiate the type object. The base type of the constant-array type is one attached to the base type node of this constant-array type node:

    @Override
    public Node buildTypes(TypeBuilder tb) throws SemanticException {
        // Ask the type system to create a type object for constant array
        // of the base type.  At this point, the base type might not be
        // canonical.  The disambiguation phase will canonicalize the base type.
        CArrayTypeSystem ts = (CArrayTypeSystem) tb.typeSystem();
        return type(ts.constArrayOf(position(), base().type()));
    }

The outermost method type in the return statement modifies the node's type with the specified argument, making a copy of the node if necessary to maintain the immutability of AST nodes.

A constant-array type is canonical when its base type is. During the ambiguity removal, the ambiguous ConstArrayTypeNode is transformed into a CanonicalTypeNode when the underlying constant-array type is canonical. Because the ambiguity remover might take several passes to remove an ambiguity, the base type might still be ambiguous by the time the node is disambiguated. In this case, nothing is done for this pass, as the node will be disambiguated again in a later pass; otherwise, the transformation is done:

    @Override
    public Node disambiguate(AmbiguityRemover ar) throws SemanticException {
        Type baseType = base.type();
        if (!baseType.isCanonical()) {
            // It is possible that the base type is still ambiguous because
            // additional disambiguation passes are necessary to resolve more
            // supertypes and method signatures.  In this case, there is nothing
            // to do for now; this node will be visited and this method called
            // again in a subsequent disambiguation pass.
            return this;
        }

        // Otherwise, the base type is canonical, so we are ready to turn this
        // ambiguous type node into a canonical type node.  Replace the base
        // type of the underlying type object with the canonical base.
        CArrayTypeSystem ts = (CArrayTypeSystem) ar.typeSystem();
        NodeFactory nf = ar.nodeFactory();
        return nf.CanonicalTypeNode(position(),
                                    ts.constArrayOf(position(), baseType));
    }

The ambiguity remover has access to the type system so that a canonical constant-array type object can be created. It also has access to the node factory so that a canonical type node can be instantiated.

These two methods complete the implementation of ConstArrayTypeNode_c.

ConstArrayTypeNode_c.java

Now, CArray can be successfully built again, but the test harness will show that many test cases still fail. This is because we have only duplicated the implementation of an AST node (ConstArrayTypeNode) and a type (ConstArrayType) without changing their semantics. We will do this in the next two sections to make our implementation of CArray correct.