Extending the Type System
In addition to new syntactic constructs, programming language extensions often
introduce new types; our CArray extension is one such. Fortunately, Polyglot
allows extensions to make extensive changes to the way the type system works.
Representing types
Polyglot represents types in two different ways: first, as ASTs representing
how they are written as text in the program (interface
polyglot.ast.TypeNode
and its subtypes), and second, as a
compile-time interpretation of those ASTs to type objects (interface
polyglot.types.Type
and its subtypes). For example, a single class
A.B
might be referred to as A.B
or just
B
at different points in the code. However, both references will
be resolved to the same type object.
Taking this a step farther, suppose the program includes class
declarations for two classes
A
and B
that refer to
each other:
class A { B x; } class B { A f(); }
The AST generated by the parser represents each class as a tree structure,
including a
ClassDecl
object that references a
ClassBody
containing a list of ClassMembers
. The
class member A.x
(an ast.FieldDecl
node) contains a
reference to a TypeNode
containing the name B
, and
similarly with B.f
, represented by an ast.MethodDecl
node. By contrast, the type objects representing classes A
and
B
(types.ClassType
) contain
types.FieldInstance
and types.MethodInstance
objects
for x
and f
that respectively point directly to the
corresponding type objects, without any need to consult the AST or any kind of
lookup structure to resolve names.
Notice that the resulting type objects form a graph containing a cycle, which
results from the mutual recursion between the two classes. ASTs never contain
cycles, but the type objects created from ASTs do when types are recursive, as
is frequently the case in Java programs. This means that type objects are
transformed during compilation mainly by imperatively updating them to create
these cycles. Once type checking begins, however, type objects are immutable.
Once type nodes in the AST are interpreted to find the appropriate type
object, their type attribute is (nondestructively) updated to refer to that
type object.
Construction of type objects
Operations on types needed during type checking, such as deciding whether two
types are in a subtype relationship, are performed by the
TypeSystem
object. It operates on type objects rather than on type
nodes. Therefore, all type nodes must be resolved to type objects before type checking
begins.
Polyglot assigns type objects to AST nodes in three compiler passes:
- During parsing, primitive type declarations are parsed into canonical
type nodes, each attached with an instance of
PrimitiveType
. - During buildTypes, the remaining type objects are initially
constructed to begin the process of removing ambiguities in the AST.
In this phase, the
TypeBuilder
visitor traverses the AST to identify the type objects that need to be constructed, create these type objects, and populate them when their members. These type objects could still be ambiguous, since the necessary type objects may not exist yet. Ambiguous type objects will be replaced with canonical type objects in the next pass. - During disambiguate, the
AmbiguityRemover
visitor traverses the AST to replace ambiguous type objects with canonical ones. In particular, each type nodes in the AST is replaced, if not already, with aCanonicalTypeNode
node object that wraps a pointer to the corresponding type object. For example, this pass determines when an expression such asA.B.C
is seen, whether the nameB
refers to the name of a package, a class, or a field. This pass may need to be run more than once to resolve all type names; Polyglot automatically takes care of running passes until every ambiguity is removed, or until an unresolvable ambiguity is found, resulting in a compile-time error.
Adding new types
When adding a new kind of type to the language, it is typically necessary to create
both a new AST node and new corresponding kind of type object.
For example, in the
CArray
extension, the constant array type is the new type. Extending the AST in the last
section emphasizes this need, as each ConstArrayTypeNode
must have
a type object associated with it. In general, AST nodes representing an
expression or a type must be associated with a type object. We now
see the steps required to have the compiler recognize and use the
new type properly.
Similarly to adding a new AST node type, adding a new type involves the following
modifications to the existing compiler:
- Create a class whose instance represents the new type.
- Extend the type system to provide a method that returns an instance of the type class created.
- Modify the appropriate AST node class to use the new type.
Creating a new type class
In Polyglot, every type is an instance of interface
Type
in package
polyglot.types
. Every interface defining a type and its operations
ultimately extends Type
. Every type class is a subclass of
abstract class Type_c
, which implements Type
to
provide default operations on types.
Typically, a new type class can be defined by extending an existing type class
that is most similar to the new class. For example,
CArray
requires a new type class that represents constant array types. The most
similar type class already defined in Polyglot is ArrayType_c
,
which implements interface ArrayType
.
Once again, a good practice is to use the
_c
suffix to name an
implementation of a given type. Hence, to create a class for constant-array
types, we first create interface ConstArrayType
in package
carray.types
, extending ArrayType
:
ConstArrayType.javaIt turns out that this new type needs not define any new operations because
ArrayType
already provides enough, so the interface body is empty.
Next, we create the actual class
ConstArrayType_c
that extends
ArrayType_c
and implements ConstArrayType
:
package carray.types; import polyglot.types.ArrayType_c; /** * A {@code ConstArrayType} represents an array of base java types, * whose elements cannot change after initialization. */ public class ConstArrayType_c extends ArrayType_c implements ConstArrayType { // TODO: Implement this class. }We need to define a constructor that invokes a constructor in
ArrayType_c
. Each array type has a corresponding type system, a
position of the occurrence of the type in the AST, and the base type of the
array type. For example, the base type of int[]
is the type
int
, and the base type of String[][]
is the array
type String[]
. These components suffices for constant-array types,
so we simply add the following constructor:
/* Also add these imports: * import polyglot.types.Type; * import polyglot.types.TypeSystem; * import polyglot.util.Position; */ public ConstArrayType_c(TypeSystem ts, Position pos, Type base) { super(ts, pos, base); }
Polyglot compiles a source file into Java bytecode, which requires that non-Java
features be erased from the source code prior to the bytecode translation. In
order to use features from language extensions post-compilation, these features
must be stored in a format different from a plain text. Polyglot uses
serialization of objects for the conversion, and requires that types be
serializable. To fully support serialization, the class-private constant
serialVersionUID
must be defined properly to avoid incompatibility,
and a nullary constructor must be defined for deserialization.
Polyglot provides the utility class
SerialVersionUID
in package
polyglot.util
that supplies appropriate values of
serialVersionUID
, making the constant declaration easy. The
nullary constructor has protected
visibility to prevent use from
compiler code:
/* Also add these imports: * import polyglot.util.SerialVersionUID; */ private static final long serialVersionUID = SerialVersionUID.generate(); /** Used for deserializing types. */ protected ConstArrayType_c() { }
At this point,
ConstArrayType_c
looks as follows:
package carray.types; import polyglot.types.ArrayType_c; import polyglot.types.Type; import polyglot.types.TypeSystem; import polyglot.util.Position; import polyglot.util.SerialVersionUID; /** * A {@code ConstArrayType} represents an array of base java types, * whose elements cannot change after initialization. */ public class ConstArrayType_c extends ArrayType_c implements ConstArrayType { private static final long serialVersionUID = SerialVersionUID.generate(); /** Used for deserializing types. */ protected ConstArrayType_c() { } public ConstArrayType_c(TypeSystem ts, Position pos, Type base) { super(ts, pos, base); } }Constant-array types inherit operations from array types, so both types are semantically equivalent at this time. Later on, some of the operations will be overridden to distinguish the two types.
Exercise
Override the
toString
method in ConstArrayType_c
to
display the type name properly. As a guide, consider the implementation of
toString
in ArrayType_c
:
@Override public String toString() { return base.toString() + "[]"; }where
base
is the Type
of the base type, which has one
fewer dimension than the type being displayed. For example, int[]
is the result of displaying the base type of int[][]
, and
int
is the result of displaying the base type of
int[]
. On the other hand, for constant-array type nodes,
int const[]
is the result of displaying the base type of
int const[][]
, and int
is the result of displaying the
base type of int const[]
. Remember that the base type of a
multidimensional constant-array type is also a constant-array type.
Hint:
+ Reveal...
Is there any difference between this method and one in class
ConstArrayTypeNode_c
?
Solution:
+ Reveal...
@Override public String toString() { String result = base.toString(); if (base instanceof ConstArrayType) { // If base is also a ConstArrayType, the keyword "const" would // have been displayed, so just add additional dimension here. result += "[]"; } else result += " const[]"; return result; }
Extending the type system with factory methods
In Polyglot, every type system is an instance of interface
TypeSystem
in package polyglot.types
. Language
extensions extends TypeSystem
to define new factory methods for new
types and to override existing operation on types. CArray
needs a
new factory method for creating an instance of ConstArrayType
.
Again, a good practice is to use the _c
suffix the name the class
implementing a given type system interface.
The extension generator already created a template type system, which looks as
follows:
CArrayTypeSystem.javaThe signature of each factory method in the type system follows that of the constructor for the corresponding type class, except that the type system is
this
and hence omitted from the parameter. Therefore, adding a
factory method for ConstArrayType
results in the following
completed interface:
CArrayTypeSystem.java
On the implementation side, the extension generator also created this template:
CArrayTypeSystem_c.javaTo implement
constArrayOf
factory method, we simply return an
instance of ConstArrayType
. Unlike with the node factory, there
is no complication of attaching an extension object to the type object.
At this point, CArrayTypeSystem_c
looks as follows:
package carray.types; import polyglot.types.Type; import polyglot.types.TypeSystem_c; import polyglot.util.Position; public class CArrayTypeSystem_c extends TypeSystem_c implements CArrayTypeSystem { @Override public ConstArrayType constArrayOf(Position pos, Type base) { return new ConstArrayType_c(this, pos, base); } }
CArray
's type system will be further modified to complete semantic
checking in later sections.
Using the new type in the AST
Recall that Polyglot creates type objects during the
TypeBuilder
pass and installs resolved type objects into AST nodes during the
AmbiguityRemover
pass.
Therefore, constant-array types are introduced into the AST during type building, when each
ConstArrayTypeNode
, which represents an ambiguous constant-array
type node, is attached with an ambiguous ConstArrayType
. The
type builder has access to the type system, which we can use to instantiate the
type object.
The base type of the constant-array type is one attached to the
base type node of this constant-array type node:
@Override public Node buildTypes(TypeBuilder tb) throws SemanticException { // Ask the type system to create a type object for constant array // of the base type. At this point, the base type might not be // canonical. The disambiguation phase will canonicalize the base type. CArrayTypeSystem ts = (CArrayTypeSystem) tb.typeSystem(); return type(ts.constArrayOf(position(), base().type())); }The outermost method
type
in the return
statement
modifies the node's type with the specified argument, making a copy of the node
if necessary to maintain the immutability of AST nodes.
A constant-array type is canonical when its base type is. During the ambiguity
removal, the ambiguous
ConstArrayTypeNode
is transformed into a
CanonicalTypeNode
when the underlying constant-array type is
canonical. Because the ambiguity remover might take several passes to remove
an ambiguity, the base type might still be ambiguous by the time the node is
disambiguated. In this case, nothing is done for this pass, as the node will
be disambiguated again in a later pass; otherwise, the transformation is done:
@Override public Node disambiguate(AmbiguityRemover ar) throws SemanticException { Type baseType = base.type(); if (!baseType.isCanonical()) { // It is possible that the base type is still ambiguous because // additional disambiguation passes are necessary to resolve more // supertypes and method signatures. In this case, there is nothing // to do for now; this node will be visited and this method called // again in a subsequent disambiguation pass. return this; } // Otherwise, the base type is canonical, so we are ready to turn this // ambiguous type node into a canonical type node. Replace the base // type of the underlying type object with the canonical base. CArrayTypeSystem ts = (CArrayTypeSystem) ar.typeSystem(); NodeFactory nf = ar.nodeFactory(); return nf.CanonicalTypeNode(position(), ts.constArrayOf(position(), baseType)); }The ambiguity remover has access to the type system so that a canonical constant-array type object can be created. It also has access to the node factory so that a canonical type node can be instantiated.
These two methods complete the implementation of
ConstArrayTypeNode_c
.
ConstArrayTypeNode_c.javaNow,
CArray
can be successfully built again, but the test harness
will show that many test cases still fail. This is because we have only
duplicated the implementation of an AST node (ConstArrayTypeNode
)
and a type (ConstArrayType
) without changing their semantics. We
will do this in the next two sections to make our implementation of
CArray
correct.