A Space-Optimized Java Virtual Machine
for Resource Constrained Mobile Nodes
Introduction
A current hot topic of research in the field of
Operating Systems is that of sensor
networks, which exploit the capabilities of inexpensive wireless
networking and low-cost, low-power CPUs. Imagine building one thousand
tiny circuitboards with a simple processor and a low-power radio for
transmitting data, and an environmental sensor such as a thermometer.
If you were to scatter these nodes in a field or lake, they could each
take temperature readings at their respective locations. Ideally, you
should be able to collect and analyze that data at a central computer
to yield an overall picture of the temperature differences throughout
the area covered. There are many challenges in accomplishing this task,
however:
- Nodes out of range of the central data collector cannot
communicate with it, but can still talk to nearby nodes. It should be
possible to relay messages, across a complex and possibly changing
network, to nodes far away.
- If the radios are a big power drain, it may be prohibitive to
relay complete information from every node to a central source. It
should be possible to aggregate and summarize data "within the
network," while it's being relayed, to save on transmission costs.
- If not every node is running the same software (e.g. one node is
summarizing the data), there may be bad and good places to locate the
nodes with extra responsibilities. A node that is providing important
services, for instance, might be best placed in the center, rather than
at the fringe, where several hops will be required for most packets to
reach it. Determining an ideal placement, however, cannot always be
done in advance, for instance if the nodes are moving around. It is
preferable to have the network track the traffic and the
locations of each node, and migrate responsibilities from node to node
to reduce battery drain.
- Mobile nodes that have severely limited memory capacity cannot
run even simple programs written in popular languages like C++ or Java
because often the runtime environment for this code is several
megabytes in size.
Wireless routing protocols such as DSR,
DSDV,
AODV,
and TORA
seek to address the first problem. Research into how to
best accomplish the second goal without central coordination is still
ongoing. The third goal is being addressed by the MagnetOS
project at Cornell. The fourth concern is the focus of this paper.
The MagnetOS project has created a runtime
environment for Java programs that analyzes network traffic and
migrates code from node to node as appropriate to reduce power
consumption by decreasing transmission range. One benefit of the
MagnetOS runtime is that it can optimize the performance of any Java
code, without requiring the program to be custom-tailored for the
MagnetOS system.
Unfortunately, while Java is an ideal platform
for developing MagnetOS and programs to run on it, the memory footprint
of Java can be enormous: far beyond the capacity of even well-equipped
sensor nodes. To demonstrate the applicability of MagnetOS for
low-cost, resource constrained computing devices, it was necessary to
build a Java interpreter with special space optimizations. While other
compact JVM's exist, they do not support the wide range of services,
such as reflection,
object
serialization, and JRMI
(Java Remote Method
Invocation) required by MagnetOS.
Approach
Back when computers had under a megabyte of
memory and programs fit on floppy disks, developers faced similar
constraints in designing their applications. They observed that many
parts of their programs were infrequently used and did not need to be
occupying space in memory unless they were needed. For instance, a user
might spend hours writing a paper in a word processor and only a minute
using the print function. Thus, programmers designed their software to
leave the printing code unloaded until called for. When the user chose
to print their document, the printing instructions were loaded off the
disk and executed. When the job was finished, they could be unloaded to
make room for something else.
This approach can be applied to a Java
interpreter to accomplish similar memory savings. Conventional Java
interpreters keep a class in memory after it has been loaded for the
first time; this is unnecessary and should be avoided in a
resource-constrained environment.
Information in Memory
When a Java class is loaded by a JVM, several
pieces are present in RAM:
- The bytecode comprising each of the class's methods
- The Java constant pool
which contains string literals, the names and signatures of each
method, and other constants used in the code
- Additional details about the class, such as the list of fields
and methods
The largest single part of the class is generally
the constant
pool. The other pieces, however, also take up considerable RAM.
JVM Modifications
This project began with the Kimera JVM, a
third-party Java interpreter developed at the University of Washington,
for which we had source code. Kimera supported nearly all Java features
necessary to run the MagnetOS runtime; support for the remainder was
added and bugs were fixed. The interpreter, originally designed for
Java 1.1, was extensively modified to achieve compatibility with the
standard Java J2SE 1.4.1 runtime libraries.
The first space optimization performed was
to defer loading of a Java method's bytecode until the particular
method is executed. A cache of recently used methods is maintained, and
once
a method's bytecode leaves the cache, it must be reloaded from disk.
This modification was performed first because it was simple to
intercept method calls within the interpreter and add the additional
loading and unloading code.
The second modification made was to unload constant
pools during program execution. The table of strings used by a Java
class can be quite large because it contains the name and signature of
each method and field, plus string literals used in the code. The
structures used in the JVM were changed to no longer store the constant
pool as a transparent object, but rather to maintain an opaque
reference to it, with access still effected through the preexisting
accessor functions. The accessor functions were then modified to return
values from the cache of recently used constant pools, if available, or
else load the constant pool from disk to provide the required
information. As an additional optimization, strings are pooled between
constant pools. In other words, different constant pools may contain
separate copies of frequently used strings, such as "toString" or
"hashCode". To avoid this, one global pool of constant pool strings is
maintained,
with pointers from the individual constant pools into this global
structure. This
presents a challenge, however, in that it is difficult to determine
when all the constant pools using a particular string have been
unloaded, which should allow the string to be removed from the global
pool and freed. Detecting this case, however, is simplified through the
use of a garbage collector. Strings are allocated in a GC heap and when
no references to a string exist any longer, the string is automatically
freed and removed from the pool.
The third modification was to actually unload entire
classes. Doing so removes all memory-resident information about a
class, as if it had never been loaded in the first place. This allows
for the greatest memory savings. Because class loading is
deterministic, reloading a class will result in all of its contents
being present at the same offsets as when it was originally loaded,
making it possible to unload and reload a class without adjustments to
instantiated objects. It is necessary, however, to maintain the values
of the static fields of the class even after the class has been
unloaded, or else they will revert to their default values when
reloading occurs. Class unloading was made possible by introducing a
level of indirection throughout the entire program. Class handles were introduced,
which are opaque handles to a class. A boundary was established
throughout the interpreter between functions that access the internal
structures of a class directly, and those that only have access to the
indirect handle. When a function needs to access details about a class
but only maintains a handle to it, it calls a function to dereference the handle. This
invokes the class unloading/reloading code, which searches its cache of
recently used classes for a memory-resident copy of the class. If no
such cached copy exists, the code loads the class from disk, restores
the correct values of the static fields, and returns the newly-loaded
copy. Classes and their internal structures are allocated using a
garbage collector. As long as code maintains a reference to a
particular class, it remains in memory. Once it no longer needs to
access the internal structures of the class, however, its reference to
the class will go away and the garbage collector will be able to
reclaim the space occupied by the class. When it is next needed,
subsequent code will dereference the class handle again, causing the
class to be reloaded from disk.
Other modifications were made to support the
memory-saving objectives. Extraneous fields in bookkeeping structures
were eliminated. The garbage collector was overhauled to move its
speed-compactness tradeoff more toward compactness and add
notify-on-free support for correct cache maintenance. A custom memory
allocator was designed to improve the efficiency of the JVM's memory
allocations by moving them closer to the hardware level, and support
for compactly allocating a group of small objects was enhanced. Memory
allocation functions were instrumented and a simulator was used to tune
allocation block sizes.
Finally, as an additional optimization, an optional
compression feature was added to our custom memory allocation
framework. A working set of recently used virtual memory pages is
maintained, and when a page has not been used for a while, the data is
compressed using run-length encoding and Lempel-Ziv
compression, and the page's
protection status is changed from "read/write" to "no access." When the
interpreter next references a value stored within
the page, a page fault occurs, the contents are decompressed and the
protection level is restored to "read/write," and the access obtains
the correct value.
Memory Savings
Before performing these modifications to the JVM,
the interpreter required over 9 MB of memory to load and run the
MagnetOS runtime. With all of the space optimizations turned on, it is
possible to run the exact same code with only 1.3 MB of memory. No
modifications to the MagnetOS sources nor to the Java runtime library
are necessary to achieve this savings, and the full functionality of
the JVM is available, including reflection, object serialization,
network communication, and JRMI.
For a simpler HelloWorld program, the following
memory requirements exist:
Optimizations
|
Memory
Usage
|
None (baseline)
|
1620 KB
|
Improved garbage collector
|
1258 KB
|
Method bytecode unloading
|
1206 KB
|
Constant pool unloading
|
1031 KB
|
Class unloading and
finer-grained method bytecode unloading
|
558 KB
|
Class unloading and
finer-grained method bytecode and constant pool unloading |
522 KB
|
Savings for HelloWorld are less substantial than
those for the MagnetOS runtime environment because the latter contains
many more classes, which can all be unloaded. In the HelloWorld
example, only the main class and classes comprising the Java runtime
library can be unloaded.
Posted:
February 18, 2004
Last Modified: February 18, 2004