One of the most popular version control systems in use today is CVS. Developers often complain of several shortcomings in the design of CVS; for example, files cannot be moved or renamed without losing version metadata. DVS addresses weaknesses in CVS while maintaining backwards compatibility as much as possible.
The DVS Model improves on CVS by tracking directory metadata in addition to file metadata. Each directory is represented in the repository as a contents file, which contains a list of the files in the directory and their versions. Committing changes to a directory is simply committing a new version of the directory's contents file.
This metadata is sufficient to restore any previous state of a source tree. If a file is removed, the prior version of the directory records the presence of the file. When a file is renamed, the prior version of the directory records the prior name.
Additionally, when a directory is checked out from DVS, everything from that directory down in the file system tree forms a consistent snapshot of that set of data files.
Three types of operations benefit from keeping a record of the sequential changes made to directories. First, consider renaming or moving a file. A rename operation only requires changing the local name in the contents file for the appropriate directory (See "Contents Files" below). A move operation only requires removing the file's entry from one contents file and placing it in another contents file.
Second, consider deletion and restoration of a file. When a file is deleted, it is removed from the contents file but not from the repository. The contents file versions show precisely when the file was deleted. A file can be restored by updating the contents file. There is no data or metadata that is lost or that needs to be located.
Third, consider the restoration of old software configurations. If a file in a branch is deleted in CVS, its status as part of that set may be lost. In contrast, DVS records all configurations -- including which files are present and their versions -- so recreating an old configuration is possible even if files have been moved or deleted.
Another benefit from this model is that it easily accounts for the common development scenario in which different subsystems are concurrently developed, perhaps by different people. DVS makes it easy to use different versions of subsystems as independent entities. For example, each developer could use the latest versions of her own files but the latest stable versions of all of the subsystems on which her project depends.
It is also possible to make other subsystems appear as subdirectories of the current project, explicitly capturing the dependence relation between the projects. CVS has no mechanism for this.
DVS is designed to retain as much backward compatibility with CVS as possible, and the interface presented to the user mimics that of CVS (see "Command-based Interface" below). DVS does not require any migration process to use an existing CVS repository; it can use one directly, adding its own features as an overlay on top of CVS. Further, CVS users can still use the repository even after some developers have started using DVS, because DVS only adds functionality; nothing is removed or replaced. DVS' features are, of course, not available to CVS users.
DVS does not use a database other than plain text RCS files, because a database would make it much harder for an administrator to modify the repository directly should the need arise. Further, DVS is designed with simplicity in mind: rather than being a full software configuration management system, DVS handles versioning of projects and leaves the developer free to use any suitable build tools. DVS can, of course, version any files used to control the build process, such as makefiles.
ClearCase integrates version control directly into the file system by providing a custom file system for storing files to be versioned. Working directories are presented as views, which must be customized by the developer to select the desired files and versions to display. All files are kept remotely on the ClearCase server and before a file is modified it is copied to the local machine. When changes are committed they are propagated back to the ClearCase server.
Like DVS, ClearCase versions directories. After modifying files, the containing directory needs to be committed as well so that ClearCase knows which file versions go together.
While the view shows the current version of each file, other versions may be selected explicitly using a hierarchical naming system. For example, test.cc@@/main/bugfix/3 represents version 3 of test.cc on the bugfix branch.
Checking files out typically requires the files to be locked and they are unlocked when the developer either releases them or commits changes. ClearCase also supports the copy-modify-merge process used in systems such as CVS and its derivatives.
Although DVS and ClearCase have similar models, their implementations differ greatly. One drawback to ClearCase is the potentially high network overhead. Files often reside on a remote machine even when being used locally (namely files that are read but not modified). When developers are spread across a large geographic area this can lead to very poor performance, as it requires a great deal of communication between the client and the server.
CVS is based on RCS which uses plain text files to store data in a repository. CVS versions files but not directories.
One major shortcoming of CVS is that directory version metadata is not captured. When files are deleted, CVS can't reconstruct a prior source tree that included that file because it doesn't record versions of trees. This limits the usefulness of the repository as a whole.
Another shortcoming is that moving or renaming files requires the administrator to manually edit the repository to avoid losing all version history and metadata for the file being moved or renamed. This inhibits project growth and evolution, because very often data files change in use over time or are renamed or moved to a new location. Requiring the administrator to manually edit the repository decreases productivity, but being restricted to a source layout that no longer reflects the organization of the project also decreases productivity.
Like CVS, Perforce uses RCS files for storing revision updates. It uses a different mechanism than RCS for branching however [PFIFB]. Perforce does not version directories but has some other means for restoring previous directory states.
The white paper for Perforce's branch model, Inter-File Branching [PFIFB], presents a difficulty in data representations used by version control systems. In particular it stresses a difference in treatment of variant names and file names, pointing out that deltas are only merged between variants of single files and that only one variant is presented to the user at a time (such as in the filesystem). This results in two independent naming systems for the data stored in the repository. However, this rarely presents a problem in practice.
Sourcesafe's model maps projects and subprojects to directories. Every project corresponds to a directory, but not all directories are projects. Data files are stored in a SQL database. Directories do not have versions, but sufficient version metadata is recorded to allow reconstruction of prior source trees.
The combination of this model and its implementation is very restrictive: files may only be moved from a project to a parent project; directories can't be moved at all, and moving projects loses metadata.
The implementation imposes some arbitrary limits such as upper bounds of 8000 files in a project and 15 levels of nested subprojects. While many projects will not exceed these limits, large projects will be inhibited in their growth, and reorganization will be difficult.
Further, storing data files in a SQL database limits the ability of developers and administrators to fix problems in the repository should they arise, and special tools are needed to extract the data from the database, for example for backups.
Subversion presents a model in which the repository has versions rather than individual files. This allows easy reconstruction of any previous source tree. Subversion's repository uses a Berkeley DataBase system for storing data files.
This is counterintuitive because when a file changes, the latest version for all files is incremented. For example, rather than having version 4 of foo.c, one can have foo.c as it appeared in version 4 of the repository. But files are usually developed as independent entities, and developers thus think of different versions of their files rather than different versions of the whole repository.
Vesta is a complete software configuration management system. Access to files is provided through NFS, and a scripting language is provided for defining how software is built including dependencies, compilers, etc. Vesta uses reserved checkouts - the next version is locked until the developer who checked it out releases it or commits changes. Vesta stores immutable snapshots of the working directory, and committing changes is committing the latest snapshot, at which point the reservation is removed and the new versions are available for other developers to use. Directories are not versioned, but prior source trees can be reconstructed.
The DVS repository is a directed acyclic graph (DAG) in which directory nodes have children and metadata but no data of their own, while file nodes have data but no children or metadata. A subdirectory appears as a data file in its parent directory and has a version like other data files, but it also has its own set of data files that it contains.
A collection of files and directories in DVS is represented as a DAG rather than the traditional, more restrictive tree model. Each file and directory is represented as a node in the graph, and the graph has an edge from node d to node f, if directory d contains f in the filesystem tree.
While the file system where checked out files are placed will usually still impose use of a tree for the directory structure, a subdirectory in a project may appear in arbitrarily many places provided that no cycles are introduced into the graph. This does not replace branches, which are still required for concurrent development.
Relaxing the model from a tree to a DAG allows for subsystems to conveniently and efficiently depend on one another, as any directory appearing in the project in multiple places is stored only once in the repository. Additionally, this facilitates dependencies on different versions of a subsystem without sacrificing consistency. For example, two different projects might use different versions of the same library, which can be accomplished by making the library a subdirectory of both.
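The no-cycles requirement can be illustrated with a small sketch. The class and method names here (`DagSketch`, `link`, `reaches`) are hypothetical and are not part of DVS; the sketch assumes nodes are identified by UID strings and simply refuses any link that would make a directory its own descendant.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of a directory DAG keyed by UID; not DVS source code.
class DagSketch {
    // parent UID -> UIDs of the nodes it contains
    private final Map<String, Set<String>> children = new HashMap<>();

    // Returns true if 'target' is reachable from 'start' (or equal to it).
    boolean reaches(String start, String target) {
        Deque<String> stack = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            if (node.equals(target)) return true;
            if (seen.add(node)) {
                for (String c : children.getOrDefault(node, Set.of())) stack.push(c);
            }
        }
        return false;
    }

    // Adds the edge parent -> child unless doing so would close a cycle.
    boolean link(String parent, String child) {
        if (reaches(child, parent)) return false; // child already leads back to parent
        children.computeIfAbsent(parent, k -> new LinkedHashSet<>()).add(child);
        return true;
    }
}
```

With this shape, two projects can both link the same library node, while an attempt to link a project underneath one of its own descendants is rejected.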
The following grammar describes the syntax of DVS contents files:
contents ::= entry_list
entry_list ::= <empty> | entry entry_list
entry ::= LOCAL_NAME UID TYPE VERSION <newline>
LOCAL_NAME is the name used in the working directory. This name is not required to be unique. For example, two files may each be named foo.c if they're in different directories. For each of these files, foo.c is the local name.
UID is a file's unique identifier used in the repository. It is never removed and never reused. Metadata is tracked for each UID regardless of how many places the file appears in the source tree that DVS constructs. See below for a more detailed discussion of this field and its use.
TYPE is one of "file," "dir," or "add." The first two indicate the file type in the obvious sense. "Add" indicates to DVS that the file is a recent addition and thus not yet in the repository. All add operations in DVS are local until committed, at which time all additions are processed in a batch. At commit time, the type of such a contents entry is changed to either "file" or "dir." Contents files in the repository use only these two types; the "add" type appears only in the local version of a directory before commit.
VERSION is the version number of the file. These version numbers are chosen in the same way that RCS version numbers are chosen [RCS].
Example Contents File:
file1.c src/subsys1/test.c file 1.3
file2.c src/subsys2/util.c file 1.8
file3.c src/subsys2/util0.c file 1.2
libfoo src/libs/foo dir 1.2
newfile src/subsys1/newtest.c add 1.1
This example shows several properties of contents files:
Because DVS is implemented as a layer on top of CVS, the expected UID is a relative path in the CVS sandbox. Using a back end other than CVS may suggest or require a different scheme for UIDs.
Choosing names
To ensure that names are unique but still comprehensible, in case a repository administrator wishes to work directly with the repository, the first choice for a UID is the LOCAL_NAME prefixed by the path from the top of the project tree in the filesystem. In the absence of any name conflicts, this is clearly the best choice and makes the repository easy to maintain. However, given that files may be deleted, moved, renamed, etc., it is expected that name conflicts will arise. In such a case, a natural number is appended to the name to make it unique. These numbers are chosen sequentially beginning with 1, and the lowest unused number is chosen. Because files are not removed from the repository, a used number will never become available.
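The naming rule above might be sketched as follows. `UidChooser` is a hypothetical name, and placing the number at the very end of the name (after any extension) is an assumption; the text only says that a number is appended.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of UID selection; the exact suffix placement is an assumption.
class UidChooser {
    private final Set<String> used = new HashSet<>(); // UIDs ever issued; never shrinks

    // Returns a fresh UID for a file with the given local name under the given project path.
    String choose(String pathFromRoot, String localName) {
        String base = pathFromRoot.isEmpty() ? localName : pathFromRoot + "/" + localName;
        String candidate = base;
        // On conflict, try base1, base2, ... and take the lowest unused number.
        for (int n = 1; used.contains(candidate); n++) {
            candidate = base + n;
        }
        used.add(candidate);
        return candidate;
    }
}
```

Because the set of used UIDs only grows, a number handed out once is never handed out again, which mirrors the rule that files are never removed from the repository.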
A special value ("_") is used for files that are scheduled for addition to the repository (those that have type "add"). This is necessary because in the general case choosing a valid name is not possible without contacting the (possibly remote) repository, so the correct UID is not known.
The indirection afforded through this mechanism facilitates the preservation of metadata across operations such as remove and restore because metadata is attached to the UID rather than the LOCAL_NAME for each file DVS tracks, allowing the LOCAL_NAMEs seen by the user to change arbitrarily as suits the project at hand.
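As a rough illustration of this indirection, the sketch below models a contents file as a map from LOCAL_NAME to the remaining fields. The class `ContentsOpsSketch` and its methods are hypothetical; they only show that renaming and moving touch the local name or the containing contents file, never the UID.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a contents file as a map from LOCAL_NAME to "UID TYPE VERSION".
class ContentsOpsSketch {
    final Map<String, String> entries = new LinkedHashMap<>();

    // Rename: only the local name changes; UID, type, and version travel with the entry.
    void rename(String oldName, String newName) {
        if (!entries.containsKey(oldName)) throw new IllegalArgumentException("no entry: " + oldName);
        if (entries.containsKey(newName)) throw new IllegalArgumentException("name in use: " + newName);
        entries.put(newName, entries.remove(oldName));
    }

    // Move: the entry leaves this contents file and is placed in another one.
    void moveTo(ContentsOpsSketch dest, String name) {
        if (!entries.containsKey(name)) throw new IllegalArgumentException("no entry: " + name);
        if (dest.entries.containsKey(name)) throw new IllegalArgumentException("name in use: " + name);
        dest.entries.put(name, entries.remove(name));
    }
}
```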
initialize | Generate contents files for any directories already present, perform "add" operation for each directory or normal file already present. This is used for creating a new project, and once the repository has been created, initialize is not used. |
Read contents file to find UID for <obj>, then fetch <obj> from the repository. This is a top-down recursive procedure. If <obj> is a normal file, no recursion takes place. | |
remove <obj> | Removes <obj> from its parent directory's contents file. This is never a recursive procedure. |
update <obj> | The latest version of the specified object is fetched from the repository for comparison with the current instance. If the object doesn't exist locally, it is created. Otherwise, DVS notes whether it is modified or unmodified. If the specified object is a directory, this is a top-down recursive procedure beginning with the directory's contents file. After updating the contents file, each node in the DAG listed in the contents file is updated. |
add <obj> | Adds an entry to the contents file for the containing directory. Note that this entry will have a special type and a special UID to mark it as a new addition, and both of these fields will be modified when the containing directory is committed. |
Creates a contents entry for <dest> having the same UID and version as <src>. No new file is created and <src> must already exist. | |
commit <obj> | Compares <obj> to the latest version of <obj> in the repository. If there are conflicts (which can only happen in concurrent development where <obj> has been changed and committed to the repository since the local copy was checked out), the conflict is noted and must be resolved before the local changes are reflected in the repository. |
Sample repository:
b
|__d1.2 . . . . . . d1.3
|  |___e1.1 . . . . |__e1.2
|     |___f1 1.1 . . . |___f1 1.1
|     |___f2 1.1 . . . |___f2 1.2
|     |___f3 1.1 . . . |___f3 1.3
|__c1.7
|  |___m1.1
|  |___n1.3
|  ...

Sample Source Tree:
b
|__d1.2
|  |__e1.1
|     |___f1 1.1
|     |___f2 1.1
|     |___f3 1.1
|  ...
In the sample repository, the left tree in the repository represents the contents files, while the right tree represents latest versions, and the "sample source tree" represents the files checked out. Thus an operation such as "update" from within directory b would fetch d1.2, e1.1, and so on down the tree based on the contents files. But "update d" would fetch d1.3 and then follow its contents file.
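The way a directory version pins the versions beneath it can be sketched as a top-down walk over contents files. `SnapshotSketch` is a hypothetical class, and keying each node by a "name version" string is an assumption made for brevity; DVS itself keys entries by UID.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: resolving a directory version into the exact versions it pins.
class SnapshotSketch {
    // "name version" of a directory -> the "name version" entries its contents file lists
    final Map<String, List<String>> contents = new HashMap<>();

    // Walks contents files top-down and collects every node reached, in visit order.
    List<String> resolve(String node) {
        List<String> out = new ArrayList<>();
        collect(node, out);
        return out;
    }

    private void collect(String node, List<String> out) {
        out.add(node);
        for (String child : contents.getOrDefault(node, List.of())) {
            collect(child, out); // plain files have no contents entry, so recursion stops
        }
    }
}
```

Loaded with the sample repository, resolving d at version 1.2 yields e 1.1 and the 1.1 versions of f1, f2, and f3, while resolving d at version 1.3 yields e 1.2 with f2 1.2 and f3 1.3.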
In the source tree, the files f1, f2, and f3 may be different from the versions in the repository. Assume that they were checked out as the versions specified for them, but do not assume that they have been changed or that they have not.
Operation | Actions | |
update d | NCC; d1.3 brought to sandbox | |
update f2 | NCC; f2 1.2 brought to sandbox | |
update | NCC; d1.2, e1.1, ... used for update | |
checkout e | e1.1 checked out | |
checkout e 1.2 | e1.2 checked out | |
Note that none of these operations makes changes to any contents file. |
move f1 f5 (requires modifying e1.1 contents)
  f1 /path/to/f1 file 1.1
  f2 /path/to/f2 file 1.1
  f3 /path/to/f3 file 1.1
==>
  f2 /path/to/f2 file 1.1
  f3 /path/to/f3 file 1.1
  f5 /path/to/f1 file 1.1

modify f1 and commit:
  f1 /path/to/f1 file 1.1
  f2 /path/to/f2 file 1.1
  f3 /path/to/f3 file 1.1
==>
  f1 /path/to/f1 file 1.2
  f2 /path/to/f2 file 1.1
  f3 /path/to/f3 file 1.1

remove f1:
  f1 /path/to/f1 file 1.2
  f2 /path/to/f2 file 1.1
  f3 /path/to/f3 file 1.1
==>
  f1 /path/to/f1 removed 1.2
  f2 /path/to/f2 file 1.1
  f3 /path/to/f3 file 1.1

link ../c/m .
  f1 /path/to/f1 file 1.2
  f2 /path/to/f2 file 1.1
  f3 /path/to/f3 file 1.1
==>
c: no change
e:
  f1 /path/to/f1 file 1.2
  f2 /path/to/f2 file 1.1
  f3 /path/to/f3 file 1.1
  m /path/to/m file 1.1

d's contents:
  e /path/to/e dir 1.1
  ...
==>
  e /path/to/e dir 1.2
  ...
Structures used: | ||
Contents | Represents a single contents file | |
ContentsEntry | Represents one entry in a contents file | |
Repository | Represents an entire DVS source tree | |
Only the Repository API is exposed. The Contents and ContentsEntry instances are hidden and they are mutually recursive. |
Initializes a directory as a new DVS source tree. Argument rdir is the location in the file system where files will be stored when checked out of the (possibly remote) repository. Files are created under this directory using their UIDs, and no file occurs twice in the file system under rdir. Argument ddir is the directory in the file system where DVS will create the files that appear in a project. Directories are created per contents file and data files are created by local name under this directory.
DVS is currently implemented as a layer on top of CVS. As such, rdir is a CVS sandbox and every file in the CVS repository corresponds to precisely one UID listed in the contents files DVS keeps. One such file may occur multiple times under ddir depending on the DVS operations performed. CVS's metadata directories, called "CVS" and stored in each directory in a CVS module, are never recreated by DVS; they exist only in the CVS sandbox.
AddFile adds a file to the repository. The specified file must be present in the source tree. An entry is created in the appropriate contents file (having type "add").
LinkFile links src, which must already be tracked by DVS, into the new location dest. This consists of duplicating the contents entry for src into the contents for dest. If src and dest specify different file names, the name for dest is used in the entry added to dest's contents file. The UID, TYPE and VERSION fields of src's contents entry are always used in the new entry for dest.
RemoveFile removes a file from DVS' revision tracking. The file must be present in the source tree. Its contents entry is removed and future checkout operations that don't explicitly name this file will not retrieve it. If the file noted by the UID is present in other locations, those locations are unaffected.
UpdateFile updates the specified file from the DVS repository. This entails fetching the file for comparison. If no changes have been made in the repository, the process stops. Otherwise, if no changes have been made locally, the local version is modified to reflect all changes from the version in the repository. If both have changed and CVS can merge them, it does, otherwise markers are placed in the file showing where the two versions conflict.
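The three cases described for UpdateFile can be summarized in a small decision function. The names below are hypothetical, and the sketch assumes that "changed in the repository" can be detected by comparing the checked-out version number with the latest one; the actual merge work is left to CVS.

```java
// Hypothetical sketch of the decision made by an update; names are illustrative only.
class UpdateDecisionSketch {
    enum Action { NOTHING, FAST_FORWARD, MERGE }

    // base: version checked out; repo: latest version in the repository;
    // locallyModified: whether the working copy differs from 'base'.
    static Action decide(String base, String repo, boolean locallyModified) {
        if (base.equals(repo)) return Action.NOTHING;      // repository unchanged: stop
        if (!locallyModified) return Action.FAST_FORWARD;  // take the repository's changes
        return Action.MERGE;                               // both changed: merge or mark conflicts
    }
}
```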
Because DVS is implemented on top of CVS, DVS depends on CVS for most of this work. CVS' update operation is performed on the UID file, and changes in the repository are propagated to the local DVS version. Conflicts detected by CVS are noted by DVS.
TagFile tags the specified file with the given symbolic tag. A CVS tag operation is performed on the file. It is an error to attempt to tag a file which has been modified from the version that was checked out.
Each Contents object corresponds to a directory node in the DAG for the project.
This is a vector of ContentsEntry objects, each corresponding to a single file node in the DAG for the project.
ReadContentsTree walks the directory tree rooted at root reading contents files for each directory it encounters. A DAG is constructed representing the project files and directories and the current Contents object corresponds to a directory node in the DAG. Because there are no cycles allowed in a DVS DAG, this node can be thought of as a root node, and the rest of the project is represented below this node.
ReadContentsFile reads the contents file connected to the specified BufferedReader. The entries are returned in a Contents object.
WriteContentsTree writes contents files for all of the directories represented in the current Contents object's DAG. All directories are assumed to exist already in the filesystem.
WriteContentsFile writes the current Contents object's entries to the file connected to the specified PrintWriter.
MakeContentsTree walks a directory tree rooted at root and creates a tree of Contents and ContentsEntry objects corresponding to the directories and files it encounters, respectively. This method is intended to be used to create a set of contents files for a directory tree that does not have them, for example when initializing a new DVS project.
DVSToCVS propagates changes made to the local DVS tree to the CVS sandbox used for checking files into the CVS repository. This includes contents files for all directories being committed as well as the data files.
CVSToDVS moves data files from the CVS sandbox used as the DVS repository into the DVS working tree. Each directory's contents file is used to determine which files appear in the DVS working tree.
These are static types defining the states which a file may take in the repository.
These hold the values for the fields in the contents file.
This vector represents the subdirectories in the source tree below the directory represented by the current ContentsEntry. If the current entry's type is not typeDir, then this member is not used when traversing the contents tree.
SplitEntry parses a string looking for the fields in a ContentsEntry and returns a ContentsEntry object whose members are initialized to the values in the string.
ToString returns a string representation of this ContentsEntry. This is the reverse of splitEntry in that splitEntry(C.toString()) will create a ContentsEntry whose members are equivalent to those of the ContentsEntry C.
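A minimal sketch of this round trip is shown here. `EntrySketch` is a hypothetical stand-in for ContentsEntry, and it assumes that the four fields are separated by whitespace and contain no embedded whitespace, which the grammar does not state explicitly.

```java
// Hypothetical sketch of a contents entry; field separators are assumed to be whitespace.
class EntrySketch {
    final String localName, uid, type, version;

    EntrySketch(String localName, String uid, String type, String version) {
        this.localName = localName;
        this.uid = uid;
        this.type = type;
        this.version = version;
    }

    // Parses one line of the form: LOCAL_NAME UID TYPE VERSION
    static EntrySketch splitEntry(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 4) {
            throw new IllegalArgumentException("expected 4 fields: " + line);
        }
        return new EntrySketch(f[0], f[1], f[2], f[3]);
    }

    // Inverse of splitEntry for entries whose fields contain no whitespace.
    @Override
    public String toString() {
        return localName + " " + uid + " " + type + " " + version;
    }
}
```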
AddSubDir adds the specified Contents object to this ContentsEntry's vector of subdirectories. This only makes sense if the type of the current ContentsEntry is typeDir, and during traversal of the DAG the subdirectory vector of a ContentsEntry is only considered if the ContentsEntry is of type typeDir.
DVS uses a structure, class Command, which contains a command, a vector of arguments, and a directory. Execution of a command uses Java's exec method, passing all of this information and a null environment (so that the command inherits the environment passed to DVS), since CVS often relies on environment variables. A simple threaded class is used to obtain output from the command during execution and relay it to DVS' output streams.
Execute executes the command. This entails creation of two StreamGobbler objects, one for each output stream, executing the command, and waiting for termination. Execute's return value is the exit status of the command executed.
StreamGobbler is a class which simply reads all data from an output stream (its input stream) and passes it to its own output stream [STRGBL].
ToString returns a string representation of this command. This is the concatenation of all of the arguments to the command, separated by space characters.
AddArg appends the given string to this command's vector of arguments to be passed at execution time.
ToArray returns an array representation of this command. This is the format used by Java's exec() method: each element of the returned array is one token of the command to be executed. Exec() is not given a string representation (such as that returned by toString()) because Java would split the string at whitespace characters, whereas the tokens are permitted to have embedded whitespace.
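The Command structure described above might look roughly like the following. `CommandSketch` is a hypothetical name, and the inline relay threads stand in for StreamGobbler; only `Runtime.exec(String[], String[], File)`, `Process.waitFor()`, and `InputStream.transferTo()` are real library calls.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the Command structure; not DVS source code.
class CommandSketch {
    private final String command;
    private final List<String> args = new ArrayList<>();
    private final File dir;

    CommandSketch(String command, File dir) {
        this.command = command;
        this.dir = dir;
    }

    void addArg(String arg) { args.add(arg); }

    // One array element per token, so embedded whitespace survives.
    String[] toArray() {
        List<String> tokens = new ArrayList<>();
        tokens.add(command);
        tokens.addAll(args);
        return tokens.toArray(new String[0]);
    }

    @Override
    public String toString() { return String.join(" ", toArray()); }

    // Runs the command with the inherited environment (envp == null) and relays its output.
    int execute() throws IOException, InterruptedException {
        Process p = Runtime.getRuntime().exec(toArray(), null, dir);
        Thread out = relay(p.getInputStream(), System.out);
        Thread err = relay(p.getErrorStream(), System.err);
        int status = p.waitFor();
        out.join();
        err.join();
        return status;
    }

    // Starts a thread that copies everything from one stream to the other.
    private static Thread relay(InputStream from, OutputStream to) {
        Thread t = new Thread(() -> {
            try {
                from.transferTo(to);
                to.flush();
            } catch (IOException e) {
                // Stream closed; nothing more to relay in this sketch.
            }
        });
        t.start();
        return t;
    }
}
```

Draining both output streams on separate threads matters because a child process can block once its output pipe fills up, which is the pitfall discussed in [STRGBL].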
In general, UI implementations will use the public API of a repository object. The DVS UI API presented here is used to acquire a repository object.
GetRepository returns a repository object representing the DVS source tree rooted at root. It is an error if there is no repository rooted there.
In the spirit of CVS, DVS' general command syntax is the following.
Formats are specified below in "Formats for Arguments."
DVS Arguments | |
---|---|
--help | Displays a summary of DVS use |
--version | Displays version of DVS |
--repository <where> | Specifies the location of the repository |
--pending | Show pending operations deferred until next commit |
DVS Commands and their Arguments | |||
---|---|---|---|
add | Schedules a file or directory for addition to the DVS repository | ||
dvs add <files> | |||
files | List of files to add | ||
checkout | Retrieves files from the repository for viewing or editing | ||
dvs checkout [-r revision] [-D date] [-d dir] [-j rev] modules | |||
-r revision | Get the specified revision | ||
-D date | Get latest version committed as of the specified date [2] | ||
-d dir | Root working directory at given directory instead of module name | ||
modules | A list of modules to retrieve [3] | ||
commit | Commit to the repository changes between the current versions of files and the versions checked out | ||
dvs commit [-F logfile] [-m message] [-r revision] [files] | |||
-F logfile | Read commit log message from the specified file | ||
-m message | Use the specified message in the commit log | ||
-r revision | Commit to the specified branch | ||
files | Which files' changes to commit. If none are specified, the current directory is used. | ||
diff | Show differences between versions | ||
dvs diff [-r r1 r2] [-D date1 date2] [files] | |||
-r r1 r2 | Compare the two specified versions | ||
-D date1 date2 | Compare the latest versions committed as of the specified dates [2] | ||
files | Which files to compare. If none are specified, the current directory is used. | ||
import | Import source tree into a DVS repository | ||
dvs import [directory] | |||
directory | Top directory to consider for import. If not specified, the current directory is used. | ||
log | Display log information for files | ||
dvs log [-r Revisions] [files] | |||
-r Revisions | Display logs for the specified revision(s) [1] | ||
files | Files whose logs will be displayed. If none is specified, the current directory is used. | ||
move | Renames a file or moves it to another directory | ||
dvs move <source> <dest> | |||
source | File to move (possibly including a path) | ||
dest | New name for source (possibly including a path) | ||
Note that the move procedure is only permitted on files which are synchronized with the DVS repository. This restriction is necessary because when a file is checked out, modified, then moved, it's not always clear which versions should be listed in the contents files after the move operation. | |||
remove | Remove a file from DVS | ||
dvs remove <files> | |||
files | Which files to remove. The files are removed from the appropriate contents files but are not removed from the DVS repository. | ||
tag | Add a symbolic tag to the currently checked-out files | ||
dvs tag [-b] [-r revision] tag [files] | |||
-b | Create a branch and apply the specified tag to all files in the branch | ||
-r revision | Tag the specified version | ||
tag | The tag to apply | ||
files | Which files to tag. If none are specified, the current directory is used. | ||
update | Synchronize current files with those in the repository. | ||
dvs update [-r revision] [files] | |||
-r revision | Use the specified version in the repository as source for update operation | ||
files | Which files to update |
v | only v |
v1:v2 | inclusive range |
v1: | v1 and later |
:v2 | not later than v2 |
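The four revision formats above could be matched along these lines. `RevisionRangeSketch` is hypothetical, and it makes two simplifying assumptions: revisions are dotted decimal numbers compared component by component, and a missing component counts as zero. Branch revisions would need more careful handling.

```java
// Hypothetical sketch of matching a revision against the range formats above.
class RevisionRangeSketch {
    // Compares dotted revision numbers component by component, so 1.9 sorts before 1.10.
    static int compare(String a, String b) {
        String[] x = a.split("\\."), y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) return Integer.compare(xi, yi);
        }
        return 0;
    }

    // Returns true if 'rev' falls inside 'spec' (v, v1:v2, v1:, or :v2).
    static boolean matches(String spec, String rev) {
        int colon = spec.indexOf(':');
        if (colon < 0) return compare(spec, rev) == 0;
        String lo = spec.substring(0, colon);
        String hi = spec.substring(colon + 1);
        boolean aboveLo = lo.isEmpty() || compare(lo, rev) <= 0;
        boolean belowHi = hi.isEmpty() || compare(rev, hi) <= 0;
        return aboveLo && belowHi;
    }
}
```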
Dates are specified in the format YYYYMMDD
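In Java, a date argument in this format can be read with the standard `DateTimeFormatter.BASIC_ISO_DATE` formatter. The helper class `DateArgSketch` is hypothetical; how DVS itself parses dates is not shown in the text.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; DVS's own date handling is not described in the text.
class DateArgSketch {
    // BASIC_ISO_DATE accepts dates written as YYYYMMDD, for example 20010315.
    static LocalDate parse(String arg) {
        return LocalDate.parse(arg, DateTimeFormatter.BASIC_ISO_DATE);
    }
}
```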
A module may have a suffix specifying a subdirectory within
the module to checkout. In such a case, only that part of the
module is checked out from the repository. For example, one could
check out subdirectory sub of module mod with
[CLEARCASE] | Rational ClearCase http://www.rational.com/products/clearcase/index.jsp |
[CVS] | CVS http://www.cvshome.org |
[PERFORCE] | Perforce http://www.perforce.com |
[PFIFB] | Perforce Inter-File Branching white paper http://www.perforce.com/perforce/barnch.html |
[RCS] | RCS manual page rcsfile(5) |
[SOURCESAFE] | Visual SourceSafe http://msdn.microsoft.com/ssafe |
[STRGBL] | StreamGobbler http://www.javaworld.com/javaworld/jw-12-2000/jw-1229-traps.html |
[SUBVERSION] | Subversion http://subversion.tigris.org |
[VESTA] | Vesta Configuration Management System http://www.vestasys.org |