Prism Working Paper 2000-02-07 01:12:35 PM
The Prism Project: Vision and Focus
Over the past several months we have successfully started a number of research efforts within the context of Prism:
The refinement of FEDORA to make a stable platform for future experimentation.
The initial prototypes with POET in the context of FEDORA and the general movement in the direction of “policy enforcement”.
Work in the library to define requirements for our policy enforcement work.
Work to study the dynamics of the collaboration within our group.
With the beginning of the new year, it is time to move from this “collection of research efforts” to a more coordinated vision and vehicle for our research activities. This short working paper is an attempt to establish that vision for the project and define the work that needs to be done in the context of that vision.
During our meeting prior to the holiday break, Geri's group presented some results that indicated general agreement on the goals of the project. We need to move beyond general agreement to more specific agreement on the nature of the beast - a digital library - on which we are working:
A digital library is a managed collection of digital objects (content) and services (mechanisms) associated with the storage, discovery, retrieval, and preservation of those objects. The task of management is three-fold.
This definition contains the notion of a digital library as management layered on top of infrastructure. In particular: a digital library infrastructure provides the service definitions, protocols, and digital object model; digital library instances exploit this infrastructure by selecting services and content and administering policies on those selections.
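The split between infrastructure and instances can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not a description of Dienst or FEDORA: the class names (`Infrastructure`, `LibraryInstance`), the `title_search` service, the identifiers, and the campus-address rule are all hypothetical, and a policy is reduced to a simple predicate over a user and an object.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class DigitalObject:
    pid: str    # persistent identifier (hypothetical format)
    title: str
    genre: str  # e.g. "technical-report", "journal-article"


# A service maps (candidate objects, query) to the identifiers it selects.
Service = Callable[[List[DigitalObject], str], List[str]]
# A policy is modeled, very simply, as a predicate over (user, object).
Policy = Callable[[str, DigitalObject], bool]


@dataclass
class Infrastructure:
    """Shared layer: stored objects plus service definitions."""
    objects: Dict[str, DigitalObject] = field(default_factory=dict)
    services: Dict[str, Service] = field(default_factory=dict)

    def deposit(self, obj: DigitalObject) -> None:
        self.objects[obj.pid] = obj


@dataclass
class LibraryInstance:
    """Management layer: selected content and services, plus policies."""
    name: str
    infra: Infrastructure
    content: Set[str]
    services: Set[str]
    policies: List[Policy] = field(default_factory=list)

    def visible(self, user: str) -> List[DigitalObject]:
        # Selected objects that exist in the infrastructure and pass every policy.
        selected = [self.infra.objects[p] for p in sorted(self.content)
                    if p in self.infra.objects]
        return [o for o in selected if all(rule(user, o) for rule in self.policies)]

    def invoke(self, service: str, user: str, query: str) -> List[str]:
        if service not in self.services:
            raise PermissionError(f"{self.name} does not offer {service!r}")
        return self.infra.services[service](self.visible(user), query)


def title_search(objs: List[DigitalObject], query: str) -> List[str]:
    """A trivial shared search service: case-insensitive title match."""
    return [o.pid for o in objs if query.lower() in o.title.lower()]


def campus_only_for_articles(user: str, obj: DigitalObject) -> bool:
    """Hypothetical rule: journal articles only for users with a campus address."""
    return obj.genre != "journal-article" or user.endswith("@cornell.edu")


infra = Infrastructure()
infra.services["search"] = title_search
infra.deposit(DigitalObject("tr-1", "Retrieval Experiments", "technical-report"))
infra.deposit(DigitalObject("acm-7", "Retrieval Models", "journal-article"))

# Two instances share one infrastructure but select and govern differently.
dept = LibraryInstance("CS department", infra, {"tr-1", "acm-7"}, {"search"},
                       [campus_only_for_articles])
open_portal = LibraryInstance("open portal", infra, {"tr-1"}, {"search"})

print(dept.invoke("search", "guest@example.org", "retrieval"))      # ['tr-1']
print(dept.invoke("search", "ann@cornell.edu", "retrieval"))        # ['acm-7', 'tr-1']
print(open_portal.invoke("search", "guest@example.org", "models"))  # []
```

Both instances draw on the same stored objects and the same search service; only the selection and the policy differ, which is the sense in which management is layered on top of infrastructure.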
It is illustrative to compare these digital library instances to the current notion of a portal on the Web, for example Lycos, Yahoo, and AltaVista. These portals offer access to distributed content and provide some services over this content (e.g., searching). Yet, there is little argument that these current portals do not provide the level of curatorial responsibility undertaken by existing libraries. The model of a digital library instance proposed here can be thought of as a hybrid portal. It not only expands the traditional portal concept with enhanced services and content (as described in Sections 2.1 and 2.2), but also recognizes the need for integrity maintenance (curatorial responsibility) ranging from casual (a distributed e-print archive) to strict and broad-ranging (a research library). The effect then is that content and mechanisms may be shared among multiple digital libraries, but the policies applied to that content and those services are tailored to the requirements of the organization (e.g., research library, academic department, etc.) administering the digital library instance.
Traditionally, libraries have primarily asserted policies over objects through full control (ownership and containment) of the (physical) artifacts. With the arrival of digitized content, policy enforcement has become more complex and has mainly been dealt with in two models:
Full control of the bits - e.g., as in many of the digitization projects that largely extend standard library practices to a new genre of materials.
Contractual agreements - which allow the bits to be under someone else's control but assume a high level of cooperation with one or more trusted parties.
The hybrid portal model described above presents significant challenges to these traditional practices. Bits (content and mechanisms) are scattered across the Internet. These bits are controlled by parties with varying levels of integrity and management, cooperative agreements with them may be impractical or impossible, and the "importance" of the bits (from the perspective of a specific portal) bears no relation to their location or level of management (e.g., my research may consider a working paper on somebody's personal web page more important than a publication in an established scholarly journal).
The general questions we are exploring in Prism are then:
Given this new context (the distribution of bits and of control over them), what new policies (for security and preservation) need to be formulated?
Given these policies and the context (distribution) in which they must operate, what are the mechanisms that are necessary to enforce them? These mechanisms must operate across a number of dimensions including variations in resource (content) types, variations in policy requirements, and variations in actual control over the resource by the party trying to enforce the policy.
Note that this research raises important questions from both the library (information management) direction and the computer science (CS) direction. Clearly, policies must adapt to the new distributed digital context, and mechanisms must be developed to enforce these policies.
This section distills these broader questions into a set of working hypotheses and corresponding work and research tasks (which are in bracketed italics beneath the hypotheses).
Individual digital libraries (portals) will provide access to a mixture of shared and unshared content and services.
[Continue to refine the infrastructure elements we are developing (e.g., Dienst, FEDORA) to enable research and demonstration of policy enforcement capabilities.]
Integrity (security and preservation) requirements will vary across digital library instances (even though they may share content and services).
In the context of access management, for example, the Cornell library may establish a campus-wide license for free access to all ACM content, whereas the digital library for researchers at Xerox PARC may allow access on a "pay per view" basis.
In the context of preservation, for example, a digital library for the Cornell computer science department may consider the preservation of Gerry Salton's technical reports central to its intellectual mission, but may not feel the same about older content in the ACM digital library. At the same time, the research library may decide that all technical reports (TRs) and all legacy content in the ACM digital library are mission critical and must be preserved.
[Undertake a study to characterize security and preservation issues in the context of sample communities (portals) and the range of resources (local, distributed, formal, multiple genre) that they use.]
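The access and preservation examples above can be written down as explicit per-instance policies. The Python sketch below assumes a simple ordered-rule notation (the first matching rule decides, otherwise a default applies); the record fields, decision strings, sample identifiers and dates, and the 1990 cutoff for "older" ACM content are hypothetical choices for illustration, not a proposed Prism notation.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Resource:
    pid: str
    publisher: str  # e.g. "ACM", "Cornell CS"
    genre: str      # e.g. "article", "technical-report"
    author: str
    year: int


# A rule pairs a condition on a resource with the decision it yields.
Rule = Tuple[Callable[[Resource], bool], str]


@dataclass(frozen=True)
class Policy:
    """Ordered rule list: the first matching rule decides, else the default."""
    rules: Tuple[Rule, ...]
    default: str

    def decide(self, r: Resource) -> str:
        for matches, decision in self.rules:
            if matches(r):
                return decision
        return self.default


LEGACY_CUTOFF = 1990  # hypothetical year separating "older" ACM content

# Access management: the same ACM content, different terms per instance.
cornell_library_access = Policy(((lambda r: r.publisher == "ACM", "free"),), "unspecified")
parc_access = Policy(((lambda r: r.publisher == "ACM", "pay-per-view"),), "unspecified")

# Preservation: the CS department and the research library differ in scope.
cs_dept_preservation = Policy(
    ((lambda r: r.genre == "technical-report" and r.author == "Salton", "preserve"),),
    "no-commitment",
)
research_library_preservation = Policy(
    (
        (lambda r: r.genre == "technical-report", "preserve"),
        (lambda r: r.publisher == "ACM" and r.year < LEGACY_CUTOFF, "preserve"),
    ),
    "no-commitment",
)

# Illustrative records with made-up identifiers and dates, not catalog entries.
salton_tr = Resource("cs-tr-0001", "Cornell CS", "technical-report", "Salton", 1970)
old_acm_article = Resource("acm-0042", "ACM", "article", "Doe", 1975)

print(cornell_library_access.decide(old_acm_article))         # free
print(parc_access.decide(old_acm_article))                    # pay-per-view
print(cs_dept_preservation.decide(old_acm_article))           # no-commitment
print(research_library_preservation.decide(old_acm_article))  # preserve
```

The resources are identical across instances; only the rule lists differ, which is what the hypothesis above predicts.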
Policies are the means of formally stating these integrity requirements. Using this terminology, then, each digital library (portal) will have policies that express its integrity requirements on distributed content and services.
[Investigate various integrity requirements for diverse digital library portals and translate these into policies.]
[Investigate notations for expressing policies, and develop methods and tools for translating these diverse security and preservation requirements into enforceable policies.]
Policy enforcement mechanisms need to be constructed that permit enforcement regardless of the level of control the digital library (the policy formulator and enforcement agent) has over the objects and services.
[Investigate the implementation of a "policy layer" through which each portal can enforce its policies on the range of content and services and the range of control over that content and those services.]
[Investigate the implementation of this "policy layer" through the creation of object surrogates using the FEDORA digital object model and reference monitors.]
[Investigate the tension between levels of control over content and services and the types of policies that need to be enforced (e.g., it may be necessary to "assert" control in certain situations - suck down the bits - in order to achieve certain policies).]
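One way to picture the proposed policy layer is a surrogate object for each resource, with every access mediated by a reference monitor that consults the portal's policies. The Python sketch below assumes that design; `Surrogate`, `ReferenceMonitor`, the caching rule, and the in-memory stand-in for a remote server are hypothetical and are not FEDORA interfaces. It also illustrates the tension named in the last task: when a preservation policy applies to bits the portal does not control, the monitor "asserts" control by keeping its own copy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Surrogate:
    """Portal-side stand-in for a resource whose bits may live elsewhere."""
    pid: str
    location: str                       # where the bits live (hypothetical URL)
    controlled: bool                    # does this portal control the original bits?
    local_copy: Optional[bytes] = None  # set once the portal "asserts" control


class ReferenceMonitor:
    """Mediates every dissemination of a surrogate and checks it against policy."""

    def __init__(self,
                 fetch: Callable[[str], bytes],
                 may_read: Callable[[str, Surrogate], bool],
                 must_preserve: Callable[[Surrogate], bool]) -> None:
        self.fetch = fetch
        self.may_read = may_read
        self.must_preserve = must_preserve
        self.audit: List[Tuple[str, str, str]] = []  # (user, pid, outcome)

    def disseminate(self, user: str, s: Surrogate) -> bytes:
        if not self.may_read(user, s):
            self.audit.append((user, s.pid, "denied"))
            raise PermissionError(f"{user} may not read {s.pid}")
        self.audit.append((user, s.pid, "granted"))
        if s.local_copy is not None:
            return s.local_copy  # serve the copy the portal now controls
        bits = self.fetch(s.location)
        if self.must_preserve(s) and not s.controlled:
            # Preservation policy on uncontrolled bits: keep our own copy.
            s.local_copy = bits
        return bits


# A dictionary stands in for a remote web server the portal does not control.
remote: Dict[str, bytes] = {"http://example.org/wp.pdf": b"working paper v1"}

monitor = ReferenceMonitor(
    fetch=lambda loc: remote[loc],
    may_read=lambda user, s: user != "anonymous",
    must_preserve=lambda s: s.pid.startswith("keep:"),
)
paper = Surrogate("keep:wp-1", "http://example.org/wp.pdf", controlled=False)

first = monitor.disseminate("ann", paper)
del remote["http://example.org/wp.pdf"]     # the author takes the page down
second = monitor.disseminate("ann", paper)  # still served from the local copy
print(first == second)                      # True
```

Routing every access through a single monitor is what allows the same policy to apply whether the bits are local or remote; the copy taken on first access is one concrete form of "sucking down the bits".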
The ideas outlined above need to be exercised in well-defined test environments. The testbed(s) should provide us with a sufficiently rich set of policy contexts to provide interesting test cases.
If we believe in this hybrid portal model of digital libraries, then we should focus substantial effort on the prototyping of a limited number of portals - investigating, formulating, implementing, and testing policy enforcement in the context of those portals. Given our resource limitations, we should focus on two.
There will certainly be interesting relationships between these two focuses at the policy and mechanism level.
One testbed focus will be the traditional research library model, which can be distinguished in a number of ways by its content, clientele, and services:
content
clientele
services
The research library acts as both a service provider (meeting current users' needs) and a cultural repository (safeguarding important cultural and scholarly resources for their own sake and the sake of future, unknown users).
This dual focus on users and collections means that the library is equally driven to provide collection development/identification, access, collection management, and preservation functions, both for material it owns and for material to which it provides only access.
The library tends toward impartiality in providing content and services, serving a broad range of users at fairly basic, untailored levels.
The research library in the digital realm may be characterized as a complex, emerging, and fairly amorphous entity motivated by multiple goals that compete with one another. There is a built-in tension between serving current customers and safeguarding information resources. Because it serves a broad community and a broad mission, the policy requirements of the research library are necessarily complex and multi-conditional, and it will provide a test environment in which the tensions between a customer focus and a collections focus play out in policies and mechanisms. It will also serve as a “moving target” in which new means are defined to provide content and services and to meet the needs of clients.
A concurrent testbed focus will be the investigation of a single portal and its policy requirements - a digital library serving researchers in the CS department at Cornell. This library would be a portal into the information space (both local and distributed materials) relevant to the needs of CS researchers. This will include some fairly conventional materials (textual technical reports) and some radically new resources (the lecture browser materials that are coming out of Brian Smith's group).
Content
Researchers in the computer science department use an extremely diverse variety of digital resources - print-only materials, multimedia, licensed materials (e.g., ACM), informal web pages, software, etc. This diversity of resources and of their integrity requirements will provide a rich testbed for our work. Because the testbed supports resources critical to research, teaching, and scholarship, the policy issues it raises are multi-dimensional.
Clientele
Primary clientele are faculty, students, and staff of the CS department, who are on the leading edge in their use of electronic resources. In the future, use may be extended to students and faculty from other departments and researchers and distance learners outside of Cornell.
Services
The creation of such a testbed with the involvement of the library will provide sufficient balance to prevent this testbed from addressing only the "current" requirements of the CS researchers. The library will add to the effort the same element that it has traditionally added to information management: curatorial responsibility that goes beyond the immediate needs of its patrons. The library knows better than current patrons what serves the collection as a whole and future patrons. This will raise important issues arising from the tension between the policy needs of the immediate patron (the individual CS researcher) and the broader policy needs as determined by the library.
Therefore, we think we should devote a substantial part of our testbed efforts (including requirements analysis, collection selection, collection prototyping, and evaluation) to building a digital library for the CS community at Cornell. This will serve as an alternative model to the research library, inviting comparative analysis.