Structuring Extension Code

This tutorial section shows how to create an empty extension using our provided script, and goes over the organization of the generated extension code and how it corresponds to code in the Polyglot library.

Creating an empty extension

Every language extension incurs a fixed startup cost: several packages and files are required before the extension can be used with the base compiler. Creating this skeleton by hand is tedious, so Polyglot provides a script that generates it automatically. The skeleton template is located in the skel directory of the Polyglot code base, but simply copying that directory is not enough, because its files and directories must also be renamed for the new extension. The script copies and renames them in one step.
The script is bin/newext.sh and has the following usage:
Usage: newext.sh dir package LanguageName ext
  where dir          - name to use for the top-level directory
                       and for the compiler script
        package      - name to use for the Java package
        LanguageName - full name of the language
        ext          - file extension for source files

package and LanguageName must be legal Java identifiers.
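The identifier requirement can be checked before running the script. The following is a minimal sketch using the JDK's javax.lang.model.SourceVersion; the class NewExtArgCheck and its method are hypothetical helpers, not part of Polyglot or of newext.sh.

```java
import javax.lang.model.SourceVersion;

// Hypothetical helper (not part of Polyglot): pre-checks newext.sh arguments.
final class NewExtArgCheck {
    // True if s is a single legal Java identifier that is not a reserved word.
    static boolean isLegalIdentifier(String s) {
        return SourceVersion.isIdentifier(s) && !SourceVersion.isKeyword(s);
    }

    public static void main(String[] args) {
        // The two names from the CArray example, plus two that would be rejected.
        for (String name : new String[] {"carray", "CArray", "c-array", "class"}) {
            System.out.println(name + " -> " + isLegalIdentifier(name));
        }
    }
}
```

Here carray and CArray pass, while c-array (contains a hyphen) and class (a reserved word) do not.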
For our CArray extension, we will use this command:
newext.sh carray carray CArray car
The first token, newext.sh, may need to be prefixed with the appropriate directory if the bin directory is not on the PATH environment variable.

Extension structure

Several directories will be generated for the empty extension. We will focus on directory carray/compiler/src, which contains the extension code. The remainder of this section provides short descriptions of these extension components. Later, we will explore some of these components in more detail while we implement the CArray extension in full.

Package carray

This package contains general specifications for the extension.

Package carray.parse

This package contains lexical specifications for the extension.

Package carray.ast

This package defines the machinery for abstract syntax trees (ASTs) of programs in the extension. By convention, a plain name such as CArrayNodeFactory denotes an interface, and the same name with the suffix _c, as in CArrayNodeFactory_c, denotes the implementation of that interface.
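The interface/_c naming convention can be sketched as follows. The types below are self-contained stand-ins with hypothetical names (SketchNode, SketchNodeFactory); they are not Polyglot's real AST classes and only illustrate how an interface and its _c implementation pair up.

```java
// Stand-in types for illustration; they are NOT Polyglot's real AST classes.

// Interface: the name carries no suffix.
interface SketchNode {
    String describe();
}

// Implementation: the same name plus the _c suffix.
final class SketchNode_c implements SketchNode {
    private final String label;

    SketchNode_c(String label) {
        this.label = label;
    }

    @Override
    public String describe() {
        return "node(" + label + ")";
    }
}

// Factory interface, analogous in naming to CArrayNodeFactory.
interface SketchNodeFactory {
    SketchNode node(String label);
}

// Factory implementation, analogous in naming to CArrayNodeFactory_c.
final class SketchNodeFactory_c implements SketchNodeFactory {
    @Override
    public SketchNode node(String label) {
        return new SketchNode_c(label);
    }
}

final class NamingDemo {
    public static void main(String[] args) {
        // Client code depends only on the interfaces.
        SketchNodeFactory nf = new SketchNodeFactory_c();
        SketchNode n = nf.node("x");
        System.out.println(n.describe()); // prints node(x)
    }
}
```

Keeping client code on the interface side means an implementation class can be swapped without touching its callers.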

Package carray.types

This package contains the type objects and defines the type system for the language.

Package carray.visit

This package hosts any visitor classes that iterate over abstract syntax trees specific to the extension.
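As a rough picture of what a visitor does, here is a toy tree walk. ToyNode, ToyNode_c, and LabelCounter are hypothetical names invented for this sketch; Polyglot's own visitor classes have a richer interface that is not reproduced here.

```java
import java.util.List;

// Toy AST node, standing in for a real AST interface.
interface ToyNode {
    String label();
    List<ToyNode> children();
}

final class ToyNode_c implements ToyNode {
    private final String label;
    private final List<ToyNode> children;

    ToyNode_c(String label, ToyNode... children) {
        this.label = label;
        this.children = List.of(children);
    }

    @Override
    public String label() {
        return label;
    }

    @Override
    public List<ToyNode> children() {
        return children;
    }
}

// A visitor that counts nodes carrying a given label while walking the tree.
final class LabelCounter {
    private final String target;
    private int count;

    LabelCounter(String target) {
        this.target = target;
    }

    // Visits n first, then each of its children in order (pre-order).
    void visit(ToyNode n) {
        if (n.label().equals(target)) {
            count++;
        }
        for (ToyNode child : n.children()) {
            visit(child);
        }
    }

    int count() {
        return count;
    }
}

final class VisitorDemo {
    public static void main(String[] args) {
        ToyNode tree = new ToyNode_c("Block",
            new ToyNode_c("ArrayAccess"),
            new ToyNode_c("Block", new ToyNode_c("ArrayAccess"), new ToyNode_c("Local")));
        LabelCounter counter = new LabelCounter("ArrayAccess");
        counter.visit(tree);
        System.out.println(counter.count()); // prints 2
    }
}
```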

Building the extension

Like Polyglot itself, extensions can be built using the Ant build tool. The generated build script, located at carray/build.xml, needs no modifications to build the empty extension. In fact, even the full implementation of CArray requires no changes to the build script.

Polyglot Test Harness

Polyglot provides a framework called the Polyglot Test Harness (pth) for conveniently testing the extension implementation. The generated directory carray/tests contains a template test script called pthScript used by pth. The test script has the following grammar:
#      ScriptFile   ::= CompilerTest+
#      CompilerTest ::= ExtClassName ["CmdLineArgs"] { FileTest [; FileTest]* }
#      FileTest     ::= CompilationUnits [Description] [FailureSet]
#  CompilationUnits ::= Filenames [, Filenames]*
#      Filenames    ::= Filename [Filename]*
#      Description  ::= LitString
#      FailureSet   ::= Failure [, Failure]*
#      Failure      ::= ( ErrorKind )
#                    |  ( ErrorKind, "RegExp" )
#                    |  ( "RegExp" )
#                    |  ( )
#      ErrorKind    :   one of, or a unique prefix of one of the following 
#                       strings: "Warning", "Internal Error", "I/O Error", 
#                       "Lexical Error", "Syntax Error", "Semantic Error"
#                       or "Post-compiler Error".
#      Filename     :   the name of a file. Is interpreted from the 
#                       directory where pth is run.
#      LitString    :   a literal string, enclosed in quotes.
#      RegExp       :   a regular expression, as in java.util.regex; 
#                       is always enclosed in quotes.
#      CmdLineArgs  :   additional command line args for the Polyglot 
#                       compiler; is always enclosed in quotes.
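The ErrorKind rule in the grammar above ("one of, or a unique prefix of one of the following strings") can be illustrated with a short sketch. This is not pth's actual implementation: the class ErrorKindPrefix is hypothetical, and case-sensitive matching is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of unique-prefix resolution; not pth's own code.
final class ErrorKindPrefix {
    // The error kinds listed in the grammar comment.
    static final List<String> KINDS = List.of(
        "Warning", "Internal Error", "I/O Error", "Lexical Error",
        "Syntax Error", "Semantic Error", "Post-compiler Error");

    // Returns the kind identified by prefix, or empty if the prefix matches
    // no kind or more than one. Matching is case-sensitive (an assumption).
    static Optional<String> resolve(String prefix) {
        List<String> matches = new ArrayList<>();
        for (String kind : KINDS) {
            if (kind.startsWith(prefix)) {
                matches.add(kind);
            }
        }
        return matches.size() == 1 ? Optional.of(matches.get(0)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(resolve("Se")); // prints Optional[Semantic Error]
        System.out.println(resolve("S"));  // prints Optional.empty (Syntax or Semantic)
        System.out.println(resolve("I"));  // prints Optional.empty (Internal or I/O)
    }
}
```

Under this reading, a test script could abbreviate "Semantic Error" as "Se" but not as "S".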
The generated test script contains one compiler test, which contains one file test that invokes the currently empty CArray extension on the generated file Hello.car:
carray.ExtensionInfo "-d out" {
    Hello.car;
}
This compiler test also specifies the Polyglot flag -d, which designates the output directory of .class files generated by the compiler.
The entry point for pth is class polyglot.pth.Main, located in directory tools/pth/src of the Polyglot code base. To run pth, simply pass in the test script file name as the argument. For each file test in the test script, pth will invoke the specified compiler and report whether the compiler succeeds or fails as expected. For Hello.car, no failures are listed, so this file test is expected to compile. Running pth with the generated pthScript yields the following result:
Test script pthScript
  Hello.car: OK
pthScript: 1 out of 1 tests succeeded.
We will populate this test script with more test cases as we implement CArray.
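When pth is run from an automated build, the summary line can be checked mechanically. The sketch below assumes the summary always has the form shown in the sample run above ("pthScript: 1 out of 1 tests succeeded."); the class PthSummary is a hypothetical helper, not part of pth.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of pth): interprets the final summary line.
final class PthSummary {
    // Assumed format, taken from the sample run:
    // "<script>: N out of M tests succeeded."
    private static final Pattern SUMMARY =
        Pattern.compile("(.+): (\\d+) out of (\\d+) tests succeeded\\.");

    // True if every test in the script succeeded.
    static boolean allPassed(String line) {
        Matcher m = SUMMARY.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a pth summary line: " + line);
        }
        return Integer.parseInt(m.group(2)) == Integer.parseInt(m.group(3));
    }

    public static void main(String[] args) {
        System.out.println(allPassed("pthScript: 1 out of 1 tests succeeded.")); // prints true
        System.out.println(allPassed("pthScript: 3 out of 5 tests succeeded.")); // prints false
    }
}
```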