A Space-Optimized Java Virtual Machine for Resource Constrained Mobile Nodes


Matthew Wachs
Prof. Emin Gün Sirer

{mw223, egs}@cs.cornell.edu

Introduction

    A current hot topic of research in operating systems is sensor networks, which exploit the capabilities of inexpensive wireless networking and low-cost, low-power CPUs. Imagine building one thousand tiny circuit boards, each with a simple processor, a low-power radio for transmitting data, and an environmental sensor such as a thermometer. If you were to scatter these nodes in a field or lake, they could each take temperature readings at their respective locations. Ideally, you should be able to collect and analyze that data at a central computer to yield an overall picture of the temperature differences throughout the area covered. There are many challenges in accomplishing this task, however:
  1. Nodes out of range of the central data collector cannot communicate with it, but can still talk to nearby nodes. It should be possible to relay messages, across a complex and possibly changing network, to nodes far away.
  2. If the radios are a big power drain, it may be prohibitive to relay complete information from every node to a central source. It should be possible to aggregate and summarize data "within the network," while it's being relayed, to save on transmission costs.
  3. If not every node is running the same software (e.g. one node is summarizing the data), there may be bad and good places to locate the nodes with extra responsibilities. A node that is providing important services, for instance, might be best placed in the center, rather than at the fringe, where several hops will be required for most packets to reach it. Determining an ideal placement, however, cannot always be done in advance, for instance if the nodes are moving around. It is preferable to have the network track the traffic and the locations of each node, and migrate responsibilities from node to node to reduce battery drain.
  4. Mobile nodes that have severely limited memory capacity cannot run even simple programs written in popular languages like C++ or Java because often the runtime environment for this code is several megabytes in size.

    Wireless routing protocols such as DSR, DSDV, AODV, and TORA seek to address the first problem. Research into how to best accomplish the second goal without central coordination is still ongoing. The third goal is being addressed by the MagnetOS project at Cornell. The fourth concern is the focus of this paper.

    The MagnetOS project has created a runtime environment for Java programs that analyzes network traffic and migrates code from node to node as appropriate to reduce power consumption by decreasing transmission range. One benefit of the MagnetOS runtime is that it can optimize the performance of any Java code, without requiring the program to be custom-tailored for the MagnetOS system.

    Unfortunately, while Java is an ideal platform for developing MagnetOS and the programs that run on it, the memory footprint of Java can be enormous: far beyond the capacity of even well-equipped sensor nodes. To demonstrate the applicability of MagnetOS to low-cost, resource-constrained computing devices, it was necessary to build a Java interpreter with special space optimizations. While other compact JVMs exist, they do not support the wide range of services, such as reflection, object serialization, and JRMI (Java Remote Method Invocation), that MagnetOS requires.

Approach

    Back when computers had under a megabyte of memory and programs fit on floppy disks, developers faced similar constraints in designing their applications. They observed that many parts of their programs were infrequently used and did not have to occupy memory the entire time the program was running. For instance, a user might spend hours writing a paper in a word processor and only a minute using the print function. Thus, programmers designed their software to leave the printing code unloaded until it was called for. When the user chose to print a document, the printing instructions were loaded off the disk and executed. When the job was finished, they could be unloaded to make room for something else.

    This approach can be applied to a Java interpreter to accomplish similar memory savings. Conventional Java interpreters keep a class in memory after it has been loaded for the first time; this is unnecessary and should be avoided in a resource-constrained environment.


Information in Memory

    When a Java class is loaded by a JVM, several pieces are kept in RAM: the constant pool (the table of strings holding the name and signature of every method and field, plus any string literals used in the code), the bytecode of each of the class's methods, the bookkeeping structures describing its fields and methods, and the values of its static fields.

    The largest single part of the class is generally the constant pool. The other pieces, however, also take up considerable RAM.
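
    As a rough illustration, the resident data for one class might be organized along the following lines. The structure and field names here are hypothetical and meant only to make the pieces concrete; they are not Kimera's actual definitions.

        /* Illustrative sketch of the per-class data a JVM keeps resident;
         * these are hypothetical structures, not Kimera's actual ones. */
        typedef struct constant_pool {
            unsigned short   entry_count;
            struct cp_entry *entries;      /* method/field names, signatures, literals */
        } constant_pool;

        typedef struct method_info {
            const char    *name;
            const char    *signature;
            unsigned char *bytecode;       /* the method body itself */
            unsigned int   bytecode_len;
        } method_info;

        typedef struct class_info {
            const char        *name;
            constant_pool      pool;            /* usually the largest single piece */
            method_info       *methods;
            unsigned short     method_count;
            struct field_info *fields;          /* field descriptors */
            unsigned short     field_count;
            void             **static_values;   /* storage for static fields */
        } class_info;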


JVM Modifications

    This project began with the Kimera JVM, a third-party Java interpreter developed at the University of Washington, for which we had source code. Kimera supported nearly all Java features necessary to run the MagnetOS runtime; support for the remainder was added and bugs were fixed. The interpreter, originally designed for Java 1.1, was extensively modified to achieve compatibility with the standard Java J2SE 1.4.1 runtime libraries.

    The first space optimization performed was to defer loading of a Java method's bytecode until the particular method is executed. A cache of recently used methods is maintained, and once a method's bytecode leaves the cache, it must be reloaded from disk. This modification was performed first because it was simple to intercept method calls within the interpreter and add the additional loading and unloading code.
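
    A minimal sketch of this interception is shown below, assuming hypothetical names for the class-file parser and the cache hash function; the real interpreter code is more involved.

        #include <stdlib.h>

        /* Sketch of deferred bytecode loading with a small cache of recently
         * executed methods.  Names, the cache size, and the direct-mapped
         * eviction policy are illustrative assumptions. */
        #define BYTECODE_CACHE_SLOTS 64

        struct method_info;                                       /* interpreter's method record */
        unsigned int   method_cache_hash(struct method_info *m);  /* assumed helper */
        unsigned char *load_bytecode_from_classfile(struct method_info *m,
                                                    unsigned int *len_out);  /* assumed helper */

        typedef struct cached_method {
            struct method_info *owner;     /* NULL while the slot is empty */
            unsigned char      *bytecode;
            unsigned int        length;
        } cached_method;

        static cached_method bytecode_cache[BYTECODE_CACHE_SLOTS];

        /* Called from the interpreter's method-invocation path instead of
         * reading the bytecode pointer out of the method structure directly. */
        unsigned char *get_method_bytecode(struct method_info *method)
        {
            cached_method *slot =
                &bytecode_cache[method_cache_hash(method) % BYTECODE_CACHE_SLOTS];

            if (slot->owner != method) {
                /* Evict whatever occupied the slot, then reload this method's
                 * bytecode from the class file on disk. */
                free(slot->bytecode);
                slot->owner    = method;
                slot->bytecode = load_bytecode_from_classfile(method, &slot->length);
            }
            return slot->bytecode;
        }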

    The second modification was to unload constant pools during program execution. The table of strings used by a Java class can be quite large because it contains the name and signature of each method and field, plus string literals used in the code. The structures used in the JVM were changed to no longer store the constant pool as a transparent object, but rather to maintain an opaque reference to it, with access still effected through the preexisting accessor functions. The accessor functions were then modified to return values from the cache of recently used constant pools, if available, or else to load the constant pool from disk to provide the required information.

    As an additional optimization, strings are pooled across constant pools. Without such pooling, different constant pools may each contain their own copy of frequently used strings, such as "toString" or "hashCode". To avoid this duplication, one global pool of constant pool strings is maintained, with pointers from the individual constant pools into this global structure. This presents a challenge, however, in that it is difficult to determine when all the constant pools using a particular string have been unloaded, at which point the string can be removed from the global pool and freed. Detecting this case is simplified through the use of a garbage collector: strings are allocated in a GC heap, and when no references to a string remain, the string is automatically freed and removed from the pool.
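
    A sketch of the opaque reference and its accessor path, again using hypothetical names, follows; the global string pool is assumed to be maintained by the class-file loader when a pool is (re)loaded.

        #include <stddef.h>

        /* Sketch of an opaque constant-pool reference resolved on demand through
         * the accessor functions; all structure and function names here are
         * illustrative assumptions, not the interpreter's actual API. */
        struct constant_pool;                 /* definition is private to the loader */
        struct class_info;

        struct constant_pool *load_constant_pool(struct class_info *cls);     /* assumed helper */
        const char *constant_pool_utf8(struct constant_pool *cp,
                                       unsigned short index);                 /* assumed helper */

        typedef struct cp_handle {
            struct class_info    *owner;      /* class whose pool this is */
            struct constant_pool *cached;     /* NULL once the pool has been evicted */
        } cp_handle;

        /* Every preexisting accessor goes through this instead of touching the
         * pool directly.  Reloading is assumed to intern each string into one
         * global, garbage-collected pool, so names like "toString" exist only once. */
        static struct constant_pool *cp_deref(cp_handle *h)
        {
            if (h->cached == NULL)
                h->cached = load_constant_pool(h->owner);
            return h->cached;
        }

        /* Example accessor: fetch the UTF-8 string at a given pool index. */
        const char *cp_get_utf8(cp_handle *h, unsigned short index)
        {
            return constant_pool_utf8(cp_deref(h), index);
        }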

    The third modification was to unload entire classes. Doing so removes all memory-resident information about a class, as if it had never been loaded in the first place, and allows for the greatest memory savings. Because class loading is deterministic, reloading a class places all of its contents at the same offsets they occupied when the class was originally loaded, making it possible to unload and reload a class without adjusting instantiated objects. It is necessary, however, to preserve the values of the class's static fields even after the class has been unloaded, or else they will revert to their default values when the class is reloaded.

    Class unloading was made possible by introducing a level of indirection throughout the entire program. Class handles, which are opaque references to a class, were introduced, and a boundary was established within the interpreter between functions that access the internal structures of a class directly and those that only hold the indirect handle. When a function that holds only a handle needs details about a class, it calls a function to dereference the handle. This invokes the class unloading/reloading code, which searches its cache of recently used classes for a memory-resident copy of the class. If no cached copy exists, the code loads the class from disk, restores the correct values of the static fields, and returns the newly loaded copy.

    Classes and their internal structures are allocated using a garbage collector. As long as some code maintains a reference to a particular class, it remains in memory. Once no code needs to access the internal structures of the class, the references to it go away and the garbage collector can reclaim the space the class occupies. When the class is next needed, subsequent code dereferences the class handle again, causing the class to be reloaded from disk.
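
    The dereference path might be sketched as follows, with illustrative names; the weak-reference and garbage-collection machinery is only hinted at in the comments.

        #include <stddef.h>

        /* Sketch of the class-handle indirection: code on the "indirect" side of
         * the boundary holds only a handle and dereferences it on demand.  The
         * names and fields here are illustrative assumptions. */
        struct class_info;                        /* full class structure, GC-allocated */

        struct class_info *load_class_from_disk(const char *name);            /* assumed helper */
        void restore_static_fields(struct class_info *cls, void **saved);     /* assumed helper */

        typedef struct class_handle {
            const char        *class_name;        /* enough information to reload the class */
            struct class_info *resident;          /* weak reference; NULL after the GC reclaims it */
            void             **static_values;     /* static fields preserved across unloading */
        } class_handle;

        /* Functions that hold only a handle call this before touching any of the
         * class's internal structures. */
        struct class_info *class_deref(class_handle *h)
        {
            struct class_info *cls = h->resident;

            if (cls == NULL) {
                /* The real interpreter first searches a cache of recently used
                 * classes; here the handle itself plays that role.  Loading is
                 * deterministic, so the reloaded copy has the same layout as
                 * the original; only the static field values must be restored. */
                cls = load_class_from_disk(h->class_name);
                restore_static_fields(cls, h->static_values);
                h->resident = cls;
            }
            return cls;
        }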

    Other modifications were made to support the memory-saving objectives. Extraneous fields in bookkeeping structures were eliminated. The garbage collector was overhauled to move its speed-compactness tradeoff more toward compactness and add notify-on-free support for correct cache maintenance. A custom memory allocator was designed to improve the efficiency of the JVM's memory allocations by moving them closer to the hardware level, and support for compactly allocating a group of small objects was enhanced. Memory allocation functions were instrumented and a simulator was used to tune allocation block sizes.
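
    As one example of the small-object support, a fixed-size pool of the following sort can carve many small allocations out of a single larger block. The sizes and names below are illustrative assumptions, not the allocator actually built for this project.

        #include <stdlib.h>

        #define POOL_OBJECT_SIZE   32      /* bytes per small object (assumed)   */
        #define POOL_OBJECTS       1024    /* objects carved from one big block  */

        typedef struct small_object_pool {
            unsigned char *block;          /* one large allocation from the OS   */
            void          *free_list;      /* singly linked list of free slots   */
        } small_object_pool;

        int pool_init(small_object_pool *p)
        {
            size_t i;

            p->block = malloc((size_t)POOL_OBJECT_SIZE * POOL_OBJECTS);
            if (p->block == NULL)
                return -1;

            /* Thread every slot onto the free list: each free slot's first
             * word points at the next free slot. */
            p->free_list = NULL;
            for (i = 0; i < POOL_OBJECTS; i++) {
                void **slot = (void **)(p->block + i * POOL_OBJECT_SIZE);
                *slot = p->free_list;
                p->free_list = slot;
            }
            return 0;
        }

        void *pool_alloc(small_object_pool *p)
        {
            void **slot = p->free_list;
            if (slot == NULL)
                return NULL;               /* pool exhausted */
            p->free_list = *slot;
            return slot;
        }

        void pool_free(small_object_pool *p, void *obj)
        {
            void **slot = obj;
            *slot = p->free_list;
            p->free_list = slot;
        }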

    Finally, as an additional optimization, an optional compression feature was added to our custom memory allocation framework. A working set of recently used virtual memory pages is maintained; when a page has not been used for a while, its contents are compressed using run-length encoding and Lempel-Ziv compression, and the page's protection status is changed from "read/write" to "no access." When the interpreter next references a value stored within the page, a page fault occurs, the contents are decompressed, the protection level is restored to "read/write," and the access obtains the correct value.
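
    On a POSIX system, the page-protection side of such a scheme could be wired up roughly as follows. The compression and bookkeeping helpers are assumed, and a production implementation would need far more care inside the signal handler than this sketch shows.

        #include <signal.h>
        #include <stdint.h>
        #include <string.h>
        #include <sys/mman.h>
        #include <unistd.h>

        /* Assumed helpers: RLE + Lempel-Ziv compression and a table mapping a
         * page address to its stashed compressed copy. */
        void *compress_page(const void *page, size_t len);
        void  decompress_page(void *page, const void *compressed);
        void  stash_compressed_copy(void *page, void *compressed);
        void *lookup_compressed_copy(void *page);

        /* Evict a cold page: save a compressed copy, then make the page
         * inaccessible so any later reference traps into the handler below. */
        void evict_page(void *page)
        {
            size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);

            stash_compressed_copy(page, compress_page(page, pagesize));
            mprotect(page, pagesize, PROT_NONE);
        }

        /* On a fault, re-enable access and decompress in place so the faulting
         * instruction sees the correct value when it is retried. */
        static void fault_handler(int sig, siginfo_t *info, void *ctx)
        {
            size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
            void  *page = (void *)((uintptr_t)info->si_addr & ~(uintptr_t)(pagesize - 1));

            (void)sig; (void)ctx;
            mprotect(page, pagesize, PROT_READ | PROT_WRITE);
            decompress_page(page, lookup_compressed_copy(page));
        }

        void install_compression_handler(void)
        {
            struct sigaction sa;

            memset(&sa, 0, sizeof(sa));
            sa.sa_sigaction = fault_handler;
            sa.sa_flags     = SA_SIGINFO;
            sigaction(SIGSEGV, &sa, NULL);
        }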


Memory Savings

    Before these modifications were made, the interpreter required over 9 MB of memory to load and run the MagnetOS runtime. With all of the space optimizations turned on, it is possible to run exactly the same code in only 1.3 MB of memory. No modifications to the MagnetOS sources or to the Java runtime library are necessary to achieve these savings, and the full functionality of the JVM remains available, including reflection, object serialization, network communication, and JRMI.

    For a simpler HelloWorld program, the memory requirements are as follows:

Optimizations                                                                  Memory Usage
None (baseline)                                                                1620 KB
Improved garbage collector                                                     1258 KB
Method bytecode unloading                                                      1206 KB
Constant pool unloading                                                        1031 KB
Class unloading and finer-grained method bytecode unloading                     558 KB
Class unloading and finer-grained method bytecode and constant pool unloading   522 KB

    Savings for HelloWorld are less substantial than those for the MagnetOS runtime environment because the latter contains many more classes, which can all be unloaded. In the HelloWorld example, only the main class and classes comprising the Java runtime library can be unloaded.


Posted: February 18, 2004
Last Modified: February 18, 2004