
Motivation

One of the most popular version control systems in use today is CVS. Developers often complain of several shortcomings in the design of CVS; for example, files cannot be moved or renamed without losing version metadata. DVS addresses weaknesses in CVS while maintaining backwards compatibility as much as possible.

The DVS Model

The DVS Model improves on CVS by tracking directory metadata in addition to file metadata. Each directory is represented in the repository as a contents file, which contains a list of the files in the directory and their versions. Committing changes to a directory is simply committing a new version of the directory's contents file.

This metadata is sufficient to restore any previous state of a source tree. If a file is removed, the prior version of the directory records the presence of the file. When a file is renamed, the prior version of the directory records the prior name.

Additionally, when a directory is checked out from DVS, everything from that directory down in the file system tree forms a consistent snapshot of that set of data files.

Three types of operations benefit from keeping a record of the sequential changes made to directories. First, consider renaming or moving a file. A rename operation only requires changing the local name in the contents file for the appropriate directory (See "Contents Files" below). A move operation only requires removing the file's entry from one contents file and placing it in another contents file.
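The rename and move operations described above can be sketched with a small in-memory model of a contents file. This is only an illustration: the class name DirContents, its methods, and the choice of a map keyed by local name are assumptions made for this sketch, not part of the DVS implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of one directory's contents file.
final class DirContents {
    // One entry: repository UID, type ("file" or "dir"), and version.
    static final class Entry {
        final String uid;
        final String type;
        final String version;

        Entry(String uid, String type, String version) {
            this.uid = uid;
            this.type = type;
            this.version = version;
        }
    }

    // Keyed by local name; insertion order is kept so listings are stable.
    private final Map<String, Entry> entries = new LinkedHashMap<>();

    void put(String localName, Entry e) { entries.put(localName, e); }
    Entry get(String localName) { return entries.get(localName); }
    boolean has(String localName) { return entries.containsKey(localName); }

    // Rename: only the local name changes; UID and version are untouched.
    void rename(String oldName, String newName) {
        if (!entries.containsKey(oldName)) {
            throw new IllegalArgumentException("no such entry: " + oldName);
        }
        if (entries.containsKey(newName)) {
            throw new IllegalArgumentException("name already in use: " + newName);
        }
        entries.put(newName, entries.remove(oldName));
    }

    // Move: remove the entry here and place it in another contents file.
    void moveTo(DirContents target, String localName) {
        if (!entries.containsKey(localName)) {
            throw new IllegalArgumentException("no such entry: " + localName);
        }
        if (target.entries.containsKey(localName)) {
            throw new IllegalArgumentException("name already in use: " + localName);
        }
        target.entries.put(localName, entries.remove(localName));
    }
}
```

In both operations the entry object itself is carried over unchanged, which is why the version history attached to the UID survives.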

Second, consider deletion and restoration of a file. When a file is deleted, it is removed from the contents file but not from the repository. The contents file versions show precisely when the file was deleted. A file can be restored by updating the contents file. There is no data or metadata that is lost or that needs to be located.

Third, consider the restoration of old software configurations. If a file in a branch is deleted in CVS, its status as part of that set may be lost. In contrast, DVS records all configurations -- including which files are present and their versions -- so recreating an old configuration is possible even if files have been moved or deleted.

Another benefit from this model is that it easily accounts for the common development scenario in which different subsystems are concurrently developed, perhaps by different people. DVS makes it easy to use different versions of subsystems as independent entities. For example, each developer could use the latest versions of her own files but the latest stable versions of all of the subsystems on which her project depends.

It is also possible to make other subsystems appear as subdirectories of the current project, explicitly capturing the dependence relation between the projects. CVS has no mechanism for this.

DVS is designed to retain as much backward compatibility with CVS as possible, and the interface presented to the user mimics that of CVS (see "Command-based Interface" below). DVS does not require any migration process to use an existing CVS repository; it can use such a repository directly, adding its own features as an overlay on top of CVS. Further, CVS users can still use the repository even after some developers have started using DVS, because DVS only adds functionality; nothing is removed or replaced. DVS' features are, of course, not available to CVS users.

DVS does not use a database other than plain text RCS files, because if for any reason the administrator needs to modify the repository directly, the task is much more difficult for having used a database. Further, DVS is designed with simplicity in mind - rather than being a full software configuration management system, DVS handles versioning of projects and leaves the developer free to use any suitable build tools. DVS can, of course, version any files used to control the build process, such as makefiles.

Related Work

DVS Structure

The DVS repository is a directed acyclic graph (DAG) with two kinds of nodes: directory nodes, which have children and metadata but no data of their own, and file nodes, which have data but no children or metadata. A subdirectory appears as a data file in its parent directory and has a version like other data files, but it also has its own set of data files that it contains.

Directories as a Graph

A collection of files and directories in DVS is represented as a DAG rather than the traditional, more restrictive tree model. Each file and directory is represented as a node in the graph, and the graph has an edge from node d to node f, if directory d contains f in the filesystem tree.

While the file system where checked out files are placed will usually still impose use of a tree for the directory structure, a subdirectory in a project may appear in arbitrarily many places provided that no cycles are introduced into the graph. This does not replace branches, which are still required for concurrent development.
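The no-cycles condition can be checked before a directory is placed somewhere new. The following sketch assumes nodes are identified by strings and that edges are kept in memory; the names DirGraph, reaches, and addEdge are hypothetical and chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical containment graph: an edge d -> f means "directory d contains f".
final class DirGraph {
    private final Map<String, Set<String>> children = new HashMap<>();

    // True if 'to' can be reached from 'from' by following containment edges.
    boolean reaches(String from, String to) {
        Deque<String> stack = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        stack.push(from);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            if (node.equals(to)) {
                return true;
            }
            if (seen.add(node)) {
                Set<String> next = children.getOrDefault(node, Collections.<String>emptySet());
                for (String child : next) {
                    stack.push(child);
                }
            }
        }
        return false;
    }

    // Adds parent -> child unless the edge would close a cycle.
    // Returns false (and changes nothing) when the edge is rejected.
    boolean addEdge(String parent, String child) {
        if (reaches(child, parent)) {
            return false; // child already leads back to parent, or parent == child
        }
        children.computeIfAbsent(parent, k -> new HashSet<>()).add(child);
        return true;
    }
}
```

A node may gain any number of parents under this check, which matches the model: sharing is allowed, cycles are not.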

Relaxing the model from a tree to a DAG allows for subsystems to conveniently and efficiently depend on one another, as any directory appearing in the project in multiple places is stored only once in the repository. Additionally, this facilitates dependencies on different versions of a subsystem without sacrificing consistency. For example, two different projects might use different versions of the same library, which can be accomplished by making the library a subdirectory of both.

Contents Files

The following grammar describes the syntax of DVS contents files:

         contents    ::= entry_list
         entry_list  ::= <empty> | entry_list entry
         entry       ::= LOCAL_NAME UID TYPE VERSION <newline>

LOCAL_NAME is the name used in the working directory. This name is not required to be unique. For example, two files may each be named foo.c if they're in different directories. For each of these files, foo.c is the local name.

UID is the file's unique identifier in the repository; it is never removed and never reused. Metadata is tracked for each UID regardless of how many places the file appears in the source tree that DVS constructs. See below for a more detailed discussion of this field and its use.

TYPE is one of "file," "dir," or "add." The first two indicate the file type in the obvious sense. "Add" indicates to DVS that the file is a recent addition and thus not yet in the repository. All add operations in DVS are local until committed, at which time all additions are processed in a batch. At commit time, the type of such a contents entry is changed to either "file" or "dir." Contents files in the repository use only these two types; the "add" type appears only in the local version of a directory before commit.

VERSION is the version number of the file. These version numbers are chosen in the same way that RCS version numbers are chosen [RCS].
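A parser for this format can be quite small. The sketch below assumes that the four fields are separated by runs of whitespace and that no field contains whitespace itself; the grammar does not state this, so treat it as an assumption. The class and method names are hypothetical, and blank lines are skipped as a leniency that the grammar does not require.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for contents files: one entry per line, four fields.
final class ContentsParser {
    static final class Entry {
        final String localName;
        final String uid;
        final String type;
        final String version;

        Entry(String localName, String uid, String type, String version) {
            this.localName = localName;
            this.uid = uid;
            this.type = type;
            this.version = version;
        }
    }

    static List<Entry> parse(String text) {
        List<Entry> entries = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // leniency: ignore blank lines
            }
            // Assumption: fields are whitespace-separated and contain no spaces.
            String[] fields = trimmed.split("\\s+");
            if (fields.length != 4) {
                throw new IllegalArgumentException("expected 4 fields: " + line);
            }
            String type = fields[2];
            if (!type.equals("file") && !type.equals("dir") && !type.equals("add")) {
                throw new IllegalArgumentException("unknown type: " + type);
            }
            entries.add(new Entry(fields[0], fields[1], type, fields[3]));
        }
        return entries;
    }
}
```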

Example Contents File:

           file1.c src/subsys1/test.c    file 1.3
           file2.c src/subsys2/util.c    file 1.8
           file3.c src/subsys2/util0.c   file 1.2
           libfoo  src/libs/foo          dir  1.2
           newfile src/subsys1/newtest.c add  1.1
This example shows several properties of contents files:

UIDs

As mentioned above, UIDs record a file's true identity in the repository. These identifiers must be unique, of course, and once a file is added to DVS with a particular UID, that UID is never removed or reused, and the corresponding file is never deleted from the repository. This presents a layer of indirection in that the file names DVS shows to the user need not be globally unique, and after a file is deleted, a new and unrelated one may be created with the same name without creating a name conflict with the previous file.

Because DVS is implemented as a layer on top of CVS, the expected UID is a relative path in the CVS sandbox. Using a back end other than CVS may suggest or require a different scheme for UIDs.

Choosing names

In order to ensure that names are unique but still comprehensible, in case a repository administrator wishes to work directly with the repository, the first choice for a UID is the LOCAL_NAME prefixed by the path from the top of the project tree in the filesystem. In the absence of any name conflicts, this is clearly the best choice and makes the repository easy to maintain. However, given that files may be deleted, moved, renamed, etc., it is expected that name conflicts will arise. In such a case, a natural number is appended to the name to make it unique. These numbers are chosen sequentially beginning with 1, and the lowest unused number is chosen. Because files are not removed from the repository, a used number will never become available.
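The naming rule above might look like the following sketch. The text does not say exactly where the number is placed within the name, so appending it directly to the end of the string is an assumption here, as are the class name UidChooser and the in-memory set standing in for the repository's record of issued UIDs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical UID allocator following the "path + local name, then number" rule.
final class UidChooser {
    // Every UID ever issued; nothing is ever removed from this set.
    private final Set<String> used = new HashSet<>();

    String choose(String pathFromProjectTop, String localName) {
        String base = pathFromProjectTop.isEmpty()
                ? localName
                : pathFromProjectTop + "/" + localName;
        String candidate = base;
        // On conflict, try base1, base2, ... and take the lowest unused number.
        for (int n = 1; used.contains(candidate); n++) {
            candidate = base + n; // assumption: number goes at the very end
        }
        used.add(candidate);
        return candidate;
    }
}
```

Because the set only grows, a number that has been handed out once is never offered again, which mirrors the rule that UIDs are never reused.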

A special value ("_") is used for files that are scheduled for addition to the repository (those that have type "add"). This is necessary because in the general case choosing a valid name is not possible without contacting the (possibly remote) repository, so the correct UID is not known.

The indirection afforded through this mechanism facilitates the preservation of metadata across operations such as remove and restore because metadata is attached to the UID rather than the LOCAL_NAME for each file DVS tracks, allowing the LOCAL_NAMEs seen by the user to change arbitrarily as suits the project at hand.

Operations on Contents Files

Overview of Behaviors

initialize: Generates contents files for any directories already present and performs an "add" operation for each directory or normal file already present. This is used for creating a new project; once the repository has been created, initialize is not used again.
checkout <obj>: Reads the contents file to find the UID for <obj>, then fetches <obj> from the repository. This is a top-down recursive procedure. If <obj> is a normal file, no recursion takes place.
remove <obj>: Removes <obj> from its parent directory's contents file. This is never a recursive procedure.
update <obj>: Fetches the latest version of the specified object from the repository for comparison with the current instance. If the object doesn't exist locally, it is created. Otherwise, DVS notes whether it is modified or unmodified. If the specified object is a directory, this is a top-down recursive procedure beginning with the directory's contents file. After updating the contents file, each node in the DAG listed in the contents file is updated.
add <obj>: Adds an entry to the contents file for the containing directory. Note that this entry will have a special type and a special UID to mark it as a new addition, and both of these fields will be modified when the containing directory is committed.
link <src> <dest>: Creates a contents entry for <dest> having the same UID and version as <src>. No new file is created, and <src> must already exist.
commit <obj>: Compares <obj> to the latest version of <obj> in the repository. If there are conflicts (which can only happen in concurrent development where <obj> has been changed and committed to the repository since the local copy was checked out), the conflict is noted and must be resolved before the local changes are reflected in the repository.
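The effect of add, link, and remove on a local contents file, and the rewriting of "add" entries at commit time, can be sketched as follows. This is an illustrative model only: the class WorkingContents and its methods are hypothetical, link is limited to a single directory for brevity, and the initial version "1.1" for a pending addition is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the local (working) copy of one contents file.
final class WorkingContents {
    static final String PENDING_UID = "_"; // placeholder UID for type "add"

    static final class Entry {
        final String uid;
        final String type;
        final String version;

        Entry(String uid, String type, String version) {
            this.uid = uid;
            this.type = type;
            this.version = version;
        }
    }

    private final Map<String, Entry> entries = new LinkedHashMap<>();

    void put(String localName, Entry e) { entries.put(localName, e); }
    Entry get(String localName) { return entries.get(localName); }

    // add: purely local; the real UID and type are unknown until commit.
    void add(String localName) {
        entries.put(localName, new Entry(PENDING_UID, "add", "1.1"));
    }

    // link: a second name for the same UID and version; no new file is created.
    void link(String src, String dest) {
        Entry e = entries.get(src);
        if (e == null) {
            throw new IllegalArgumentException("no such entry: " + src);
        }
        entries.put(dest, new Entry(e.uid, e.type, e.version));
    }

    // remove: drops the entry only; the repository file itself is untouched.
    void remove(String localName) {
        entries.remove(localName);
    }

    // At commit, a pending addition receives its real UID and final type.
    void resolveAdd(String localName, String assignedUid, boolean isDirectory) {
        Entry e = entries.get(localName);
        if (e == null || !e.type.equals("add")) {
            throw new IllegalStateException("not a pending addition: " + localName);
        }
        entries.put(localName, new Entry(assignedUid, isDirectory ? "dir" : "file", e.version));
    }
}
```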

Examples

1.3 is the latest committed version of directory d.

In the sample repository, the left tree in the repository represents the contents files, while the right tree represents latest versions, and the "sample source tree" represents the files checked out. Thus an operation such as "update" from within directory b would fetch d1.2, e1.1, and so on down the tree based on the contents files. But "update d" would fetch d1.3 and then follow its contents file.

In the source tree, the files f1, f2, and f3 may be different from the versions in the repository. Assume that they were checked out as the versions specified for them, but do not assume that they have been changed or that they have not.

Examples of how contents files change

Let NCC denote checking for conflicts and notifying the user if any are found.

Suppose we're in directory e from the above examples.
Suppose we wish to commit new versions to e. This is an example of how a contents file changes when a directory is committed:

DVS Implementation

Repository API

Structures used:

Contents: Represents a single contents file
ContentsEntry: Represents one entry in a contents file
Repository: Represents an entire DVS source tree

Only the Repository API is exposed. The Contents and ContentsEntry instances are hidden and they are mutually recursive.

Public Members: none
Public Methods:
Private Methods and Members

CVS Back end

Command Structure

DVS uses a structure, class Command, which contains a command, a vector of arguments, and a directory. Execution of a command uses Java's exec method, passing all of this information and using a null environment (thus using the environment passed to DVS), since CVS often relies on environment variables. A simple threaded class is used to obtain output from the command during execution and relay it to DVS' output streams.
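A minimal sketch of this arrangement is shown below. It is not the DVS source: the names CommandSketch, StreamRelay, argv, and execute are hypothetical. It relies on Runtime.exec(String[], String[], File), where a null environment array makes the child process inherit the parent's environment.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Copies everything from one stream to another on its own thread.
final class StreamRelay extends Thread {
    private final InputStream in;
    private final OutputStream out;

    StreamRelay(InputStream in, OutputStream out) {
        this.in = in;
        this.out = out;
    }

    @Override
    public void run() {
        byte[] buffer = new byte[4096];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            out.flush();
        } catch (IOException e) {
            // The stream was closed underneath us; nothing more to relay.
        }
    }
}

// Hypothetical stand-in for the Command class described in the text.
final class CommandSketch {
    private final String command;
    private final List<String> args;
    private final File directory;

    CommandSketch(String command, List<String> args, File directory) {
        this.command = command;
        this.args = new ArrayList<>(args);
        this.directory = directory;
    }

    // The command followed by its arguments, in the form exec expects.
    String[] argv() {
        List<String> all = new ArrayList<>();
        all.add(command);
        all.addAll(args);
        return all.toArray(new String[0]);
    }

    // Runs the command in the given directory and returns its exit status.
    int execute() throws IOException, InterruptedException {
        // null environment: the child inherits this process's environment.
        Process p = Runtime.getRuntime().exec(argv(), null, directory);
        StreamRelay stdout = new StreamRelay(p.getInputStream(), System.out);
        StreamRelay stderr = new StreamRelay(p.getErrorStream(), System.err);
        stdout.start();
        stderr.start();
        int status = p.waitFor();
        stdout.join();
        stderr.join();
        return status;
    }
}
```

Draining both output streams on separate threads keeps the child from blocking when one of its pipes fills up while the parent is waiting on the other.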

Command API

Public Members: none
Public Methods:
Private Methods:

User Interface

In general, UI implementations will use the public API of a repository object. The DVS UI API presented here is used to acquire a repository object.

UI API

Command-based Interface

In the spirit of CVS, DVS' general command syntax is the following.

Formats are specified below in "Formats for Arguments."


Formats for Arguments

[1] Versions

v        only v
v1:v2    inclusive range
v1:      v1 and later
:v2      not later than v2

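The four version forms can be handled by one small parser. The sketch below is an assumption-laden illustration: the class name VersionRange is hypothetical, and versions are compared component by component as dotted numbers, which is adequate for trunk revisions such as 1.9 and 1.10 but does not try to model RCS branch numbering.

```java
// Hypothetical parser for the version argument forms "v", "v1:v2", "v1:", ":v2".
final class VersionRange {
    private final String low;  // null means no lower bound
    private final String high; // null means no upper bound

    private VersionRange(String low, String high) {
        this.low = low;
        this.high = high;
    }

    static VersionRange parse(String spec) {
        if (spec.isEmpty()) {
            throw new IllegalArgumentException("empty version argument");
        }
        int colon = spec.indexOf(':');
        if (colon < 0) {
            return new VersionRange(spec, spec); // "v": only v
        }
        String lo = spec.substring(0, colon);
        String hi = spec.substring(colon + 1);
        return new VersionRange(lo.isEmpty() ? null : lo, hi.isEmpty() ? null : hi);
    }

    // Compares dotted version numbers numerically, component by component.
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.min(x.length, y.length); i++) {
            int c = Integer.compare(Integer.parseInt(x[i]), Integer.parseInt(y[i]));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(x.length, y.length);
    }

    // Both bounds are inclusive, as in the "v1:v2" form.
    boolean contains(String version) {
        return (low == null || compare(low, version) <= 0)
            && (high == null || compare(version, high) <= 0);
    }
}
```

Numeric comparison matters here: as plain strings, "1.10" would sort before "1.9".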
[2] Dates

Dates are specified in the format YYYYMMDD
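A date argument in this format could be validated as in the following sketch. The class name DateArg is hypothetical; the code uses java.time's DateTimeFormatter.BASIC_ISO_DATE, which parses the yyyyMMdd form, and adds a length check because that formatter would otherwise also accept a trailing zone offset.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper that validates a YYYYMMDD date argument.
final class DateArg {
    static LocalDate parse(String text) {
        if (text.length() != 8) {
            throw new IllegalArgumentException("not a YYYYMMDD date: " + text);
        }
        try {
            // Rejects impossible dates such as February 29 in a non-leap year.
            return LocalDate.parse(text, DateTimeFormatter.BASIC_ISO_DATE);
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("not a YYYYMMDD date: " + text, e);
        }
    }
}
```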

[3] Modules

A module may have a suffix specifying a subdirectory within the module to check out. In such a case, only that part of the module is checked out from the repository. For example, one could check out subdirectory sub of module mod with dvs checkout mod/sub.

References