RRRRCCCCSSSS--------AAAA SSSSyyyysssstttteeeemmmm ffffoooorrrr VVVVeeeerrrrssssiiiioooonnnn CCCCoooonnnnttttrrrroooollll Walter F. Tichy Department of Computer Sciences Purdue University West Lafayette, Indiana 47907 _A_B_S_T_R_A_C_T An important problem in program development and maintenance is version control, i.e., the task of keeping a software system consisting of many versions and configurations well organized. The Revision Control System (RCS) is a software tool that assists with that task. RCS manages revi- sions of text documents, in particular source pro- grams, documentation, and test data. It automates the storing, retrieval, logging and identification of revisions, and it provides selection mechanisms for composing configurations. This paper intro- duces basic version control concepts and discusses the practice of version control using RCS. For conserving space, RCS stores deltas, i.e., differ- ences between successive revisions. Several delta storage methods are discussed. Usage statistics show that RCS's delta storage method is space and time efficient. The paper concludes with a detailed survey of version control tools. KKKKeeeeyyyywwwwoooorrrrddddssss: configuration management, history management, version control, revisions, deltas. 1995/06/01 RRRRCCCCSSSS--------AAAA SSSSyyyysssstttteeeemmmm ffffoooorrrr VVVVeeeerrrrssssiiiioooonnnn CCCCoooonnnnttttrrrroooollll Walter F. Tichy Department of Computer Sciences Purdue University West Lafayette, Indiana 47907 _1. _I_n_t_r_o_d_u_c_t_i_o_n Version control is the task of keeping software systems consisting of many versions and configurations well organ- ized. The Revision Control System (RCS) is a set of UNIX commands that assist with that task. RCS' primary function is to manage _r_e_v_i_s_i_o_n _g_r_o_u_p_s. A revision group is a set of text documents, called _r_e_v_i_s_i_o_n_s, that evolved from each other. A new revision is created by manually editing an existing one. RCS organizes the revi- sions into an ancestral tree. The initial revision is the root of the tree, and the tree edges indicate from which revision a given one evolved. Besides managing individual revision groups, RCS provides flexible selection functions for composing configurations. RCS may be combined with MAKE1, resulting in a powerful package for version control. RCS also offers facilities for merging updates with customer modifications, for distributed software develop- ment, and for automatic identification. Identification is the `stamping' of revisions and configurations with unique markers. These markers are akin to serial numbers, telling software maintainers unambiguously which configuration is before them. RCS is designed for both production and experimental environments. In production environments, access controls detect update conflicts and prevent overlapping changes. In experimental environments, where strong controls are coun- terproductive, it is possible to loosen the controls. _________________________ An earlier version of this paper was published in _S_o_f_t_w_a_r_e--_P_r_a_c_t_i_c_e & _E_x_p_e_r_i_e_n_c_e _1_5, 7 (July 1985), 637-654. - 2 - Although RCS was originally intended for programs, it is useful for any text that is revised frequently and whose previous revisions must be preserved. RCS has been applied successfully to store the source text for drawings, VLSI layouts, documentation, specifications, test data, form letters and articles. This paper discusses the practice of version control using RCS. It also introduces basic version control con- cepts, useful for clarifying current practice and designing similar systems. Revision groups of individual components are treated in the next three sections, and the extensions to configurations follow. Because of its size, a survey of version control tools appears at the end of the paper. _2. _G_e_t_t_i_n_g _s_t_a_r_t_e_d _w_i_t_h _R_C_S Suppose a text file _f._c is to be placed under control of RCS. Invoking the check-in command _c_i _f._c creates a new revision group with the contents of _f._c as the initial revision (numbered 1.1) and stores the group into the file _f._c,_v. Unless told otherwise, the command deletes _f._c. It also asks for a description of the group. The description should state the common purpose of all revisions in the group, and becomes part of the group's documentation. All later check-in commands will ask for a log entry, which should summarize the changes made. (The first revision is assigned a default log message, which just records the fact that it is the initial revision.) Files ending in ,_v are called _R_C_S _f_i_l_e_s (_v stands for _versions); the others are called working files. To get back the working file _f._c in the previous example, execute the check-out command: co f.c This command extracts the latest revision from the revision group _f._c,_v and writes it into _f._c. The file _f._c can now be edited and, when finished, checked back in with _c_i: ci f.c _C_i assigns number 1.2 to the new revision. If _c_i complains with the message ci error: no lock set by then the system administrator has decided to configure RCS for a production environment by enabling the `strict locking feature'. If this feature is enabled, all RCS files are - 3 - initialized such that check-in operations require a lock on the previous revision (the one from which the current one evolved). Locking prevents overlapping modifications if several people work on the same file. If locking is required, the revision should have been locked during the check-out by using the option -_l: co -l f.c Of course it is too late now for the check-out with locking, because _f._c has already been changed; checking out the file again would overwrite the modifications. (To prevent accidental overwrites, _c_o senses the presence of a working file and asks whether the user really intended to overwrite it. The overwriting check-out is sometimes useful for back- ing up to the previous revision.) To be able to proceed with the check-in in the present case, first execute rcs -l f.c This command retroactively locks the latest revision, unless someone else locked it in the meantime. In this case, the two programmers involved have to negotiate whose modifica- tions should take precedence. If an RCS file is private, i.e., if only the owner of the file is expected to deposit revisions into it, the strict locking feature is unnecessary and may be disabled. If strict locking is disabled, the owner of the RCS file need not have a lock for check-in. For safety reasons, all others still do. Turning strict locking off and on is done with the commands: rcs -U f.c and rcs -L f.c These commands enable or disable the strict locking feature for each RCS file individually. The system administrator only decides whether strict locking is enabled initially. To reduce the clutter in a working directory, all RCS files can be moved to a subdirectory with the name _R_C_S. RCS commands look first into that directory for RCS files. All the commands presented above work with the _R_C_S subdirectory without change.|- It may be undesirable that _c_i deletes the working file. For instance, sometimes one would like to save the current _________________________ |- Pairs of RCS and working files can actually be specified in 3 ways: a) both are given, b) only the working file is given, c) only the RCS file is given. If a pair is given, both files may have arbitrary path prefixes; RCS commands pair them up intelligently. - 4 - revision, but continue editing. Invoking ci -l f.c checks in _f._c as usual, but performs an additional check-out with locking afterwards. Thus, the working file does not disappear after the check-in. Similarly, the option -_u does a check-in followed by a check-out without locking. This option is useful if the file is needed for compilation after the check-in. Both options update the identification mark- ers in the working file (see below). Besides the operations _c_i and _c_o, RCS provides the fol- lowing commands: _i_d_e_n_t extract identification markers _r_c_s change RCS file attributes _r_c_s_c_l_e_a_n remove unchanged working files (optional) _r_c_s_d_i_f_f compare revisions _r_c_s_f_r_e_e_z_e record a configuration (optional) _r_c_s_m_e_r_g_e merge revisions _r_l_o_g read log messages and other information in RCS files A synopsis of these commands appears in the Appendix. _2._1. _A_u_t_o_m_a_t_i_c _I_d_e_n_t_i_f_i_c_a_t_i_o_n RCS can stamp source and object code with special iden- tification strings, similar to product and serial numbers. To obtain such identification, place the marker $Id$ into the text of a revision, for instance inside a comment. The check-out operation will replace this marker with a string of the form $Id: filename revisionnumber date time author state locker $ This string need never be touched, because _c_o keeps it up to date automatically. To propagate the marker into object code, simply put it into a literal character string. In C, this is done as follows: static char rcsid[] = "$Id$"; The command _i_d_e_n_t extracts such markers from any file, in particular from object code. _I_d_e_n_t helps to find out which revisions of which modules were used in a given program. It returns a complete and unambiguous component list, from which a copy of the program can be reconstructed. This facility is invaluable for program maintenance. There are several additional identification markers, - 5 - one for each component of $Id$. The marker $Log$ has a similar function. It accumulates the log messages that are requested during check-in. Thus, one can maintain the complete history of a revision directly inside it, by enclosing it in a comment. Figure 1 is an edited version of a log contained in revision 4.1 of the file _c_i._c. The log appears at the beginning of the file, and makes it easy to determine what the recent modifications were. /* * $Log: ci.c,v $ * Revision 4.1 1983/05/10 17:03:06 wft * Added option -d and -w, and updated assignment of date, etc. to new delta. * Added handling of default branches. * * Revision 3.9 1983/02/15 15:25:44 wft * Added call to fastcopy() to copy remainder of RCS file. * * Revision 3.8 1983/01/14 15:34:05 wft * Added ignoring of interrupts while new RCS file is renamed; * avoids deletion of RCS files by interrupts. * * Revision 3.7 1982/12/10 16:09:20 wft * Corrected checking of return code from diff. * An RCS file now inherits its mode during the first ci from the working file, * except that write permission is removed. */ Figure 1. Log entries produced by the marker $Log$. Since revisions are stored in the form of differences, each log message is physically stored once, independent of the number of revisions present. Thus, the $Log$ marker incurs negligible space overhead. _3. _T_h_e _R_C_S _R_e_v_i_s_i_o_n _T_r_e_e RCS arranges revisions in an ancestral tree. The _c_i command builds this tree; the auxiliary command _r_c_s prunes it. The tree has a root revision, normally numbered 1.1, and successive revisions are numbered 1.2, 1.3, etc. The first field of a revision number is called the _r_e_l_e_a_s_e _n_u_m_b_e_r and the second one the _l_e_v_e_l _n_u_m_b_e_r. Unless given explicitly, the _c_i command assigns a new revision number by incrementing the level number of the previous revision. The release number must be incremented explicitly, using the -_r option of _c_i. Assuming there are revisions 1.1, 1.2, and 1.3 in the RCS file f.c,v, the command ci -r2.1 f.c or ci -r2 f.c assigns the number 2.1 to the new revision. Later check-ins - 6 - without the -_r option will assign the numbers 2.2, 2.3, and so on. The release number should be incremented only at major transition points in the development, for instance when a new release of a software product has been completed. _3._1. _W_h_e_n _a_r_e _b_r_a_n_c_h_e_s _n_e_e_d_e_d? A young revision tree is slender: It consists of only one branch, called the trunk. As the tree ages, side branches may form. Branches are needed in the following 4 situations. _T_e_m_p_o_r_a_r_y _f_i_x_e_s Suppose a tree has 5 revisions grouped in 2 releases, as illustrated in Figure 2. Revision 1.3, the last one of release 1, is in operation at customer sites, while release 2 is in active development. 1.1 ---> 1.2 ---> 1.3 ---> 2.1 ---> 2.2 - -> Figure 2. A slender revision tree. Now imagine a customer requesting a fix of a problem in revision 1.3, although actual development has moved on to release 2. RCS does not permit an extra revision to be spliced in between 1.3 and 2.1, since that would not reflect the actual development history. Instead, create a branch at revision 1.3, and check in the fix on that branch. The first branch starting at 1.3 has number 1.3.1, and the revisions on that branch are num- bered 1.3.1.1, 1.3.1.2, etc. The double numbering is needed to allow for another branch at 1.3, say 1.3.2. Revisions on the second branch would be numbered 1.3.2.1, 1.3.2.2, and so on. The following steps create branch 1.3.1 and add revision 1.3.1.1: co -r1.3 f.c -- check out revision 1.3 edit f.c -- change it ci -r1.3.1 f.c -- check it in on branch 1.3.1 This sequence of commands transforms the tree of Figure 2 into the one in Figure 3. Note that it may be neces- sary to incorporate the differences between 1.3 and 1.3.1.1 into a revision at level 2. The operation _r_c_s_m_e_r_g_e automates this process (see the Appendix). 1.1 ---> 1.2 ---> 1.3 ---> 2.1 ---> 2.2 - -> \ --->1.3.1.1 - -> Figure 3. A revision tree with one side branch - 7 - _D_i_s_t_r_i_b_u_t_e_d _d_e_v_e_l_o_p_m_e_n_t _a_n_d _c_u_s_t_o_m_e_r _m_o_d_i_f_i_c_a_t_i_o_n_s Assume a situation as in Figure 2, where revision 1.3 is in operation at several customer sites, while release 2 is in development. Customer sites should use RCS to store the distributed software. However, custo- mer modifications should not be placed on the same branch as the distributed source; instead, they should be placed on a side branch. When the next software distribution arrives, it should be appended to the trunk of the customer's RCS file, and the customer can then merge the local modifications back into the new release. In the above example, a customer's RCS file would contain the following tree, assuming that the customer has received revision 1.3, added his local modifications as revision 1.3.1.1, then received revi- sion 2.4, and merged 2.4 and 1.3.1.1, resulting in 2.4.1.1. 1.3 ----------------> 2.4 \ \ -->1.3.1.1 --> 2.4.1.1 Figure 4. A customer's revision tree with local modifications. This approach is actually practiced in the CSNET pro- ject, where several universities and a company cooperate in developing a national computer network. _P_a_r_a_l_l_e_l _d_e_v_e_l_o_p_m_e_n_t Sometimes it is desirable to explore an alternate design or a different implementation technique in parallel with the main line development. Such develop- ment should be carried out on a side branch. The experimental changes may later be moved into the main line, or abandoned. _C_o_n_f_l_i_c_t_i_n_g _u_p_d_a_t_e_s A common occurrence is that one programmer has checked out a revision, but cannot complete the assignment for some reason. In the meantime, another person must per- form another modification immediately. In that case, the second person should check-out the same revision, modify it, and check it in on a side branch, for later merging. Every node in a revision tree consists of the following attributes: a revision number, a check-in date and time, the author's identification, a log entry, a state and the - 8 - actual text. All these attributes are determined at the time the revision is checked in. The state attribute indi- cates the status of a revision. It is set automatically to `experimental' during check-in. A revision can later be promoted to a higher status, for example `stable' or `released'. The set of states is user-defined. _3._2. _R_e_v_i_s_i_o_n_s _a_r_e _r_e_p_r_e_s_e_n_t_e_d _a_s _d_e_l_t_a_s For conserving space, RCS stores revisions in the form of deltas, i.e., as differences between revisions. The user interface completely hides this fact. A delta is a sequence of edit commands that transforms one string into another. The deltas employed by RCS are line-based, which means that the only edit commands allowed are insertion and deletion of lines. If a single character in a line is changed, the edit scripts consider the entire line changed. The program _d_i_f_f2 produces a small, line- based delta between pairs of text files. A character-based edit script would take much longer to compute, and would not be significantly shorter. Using deltas is a classical space-time tradeoff: deltas reduce the space consumed, but increase access time. How- ever, a version control tool should impose as little delay as possible on programmers. Excessive delays discourage the use of version controls, or induce programmers to take shortcuts that compromise system integrity. To gain reason- ably fast access time for both editing and compiling, RCS arranges deltas in the following way. The most recent revi- sion on the trunk is stored intact. All other revisions on the trunk are stored as reverse deltas. A reverse delta describes how to go backward in the development history: it produces the desired revision if applied to the successor of that revision. This implementation has the advantage that extraction of the latest revision is a simple and fast copy operation. Adding a new revision to the trunk is also fast: _c_i simply adds the new revision intact, replaces the previ- ous revision with a reverse delta, and keeps the rest of the old deltas. Thus, _c_i requires the computation of only one new delta. Branches need special treatment. The naive solution would be to store complete copies for the tips of all branches. Clearly, this approach would cost too much space. Instead, RCS uses _f_o_r_w_a_r_d deltas for branches. Regenerating a revision on a side branch proceeds as follows. First, extract the latest revision on the trunk; secondly, apply reverse deltas until the fork revision for the branch is obtained; thirdly, apply forward deltas until the desired branch revision is reached. Figure 5 illustrates a tree with one side branch. Triangles pointing to the left and - 9 - right represent reverse and forward deltas, respectively. / | / | / | / | +---+ <1.1| ---> <1.2| ---> <1.3| ---> <2.1| ---> |2.2| \ | \ | \ | \ | +---+ \ \ | \ | \ \|1.3.1.1> ---> |1.3.1.2> | / | / Figure 5. A revision tree with reverse and forward deltas. Although implementing fast check-out for the latest trunk revision, this arrangement has the disadvantage that generation of other revisions takes time proportional to the number of deltas applied. For example, regenerating the branch tip in Figure 5 requires application of five deltas (including the initial one). Since usage statistics show that the latest trunk revision is the one that is retrieved in 95 per cent of all cases (see the section on usage statistics), biasing check-out time in favor of that revi- sion results in significant savings. However, careful implementation of the delta application process is necessary to provide low retrieval overhead for other revisions, in particular for branch tips. There are several techniques for delta application. The naive one is to pass each delta to a general-purpose text editor. A prototype of RCS invoked the UNIX editor _e_d both for applying deltas and for expanding the identifica- tion markers. Although easy to implement, performance was poor, owing to the high start-up costs and excess generality of _e_d. An intermediate version of RCS used a special- purpose, stream-oriented editor. This technique reduced the cost of applying a delta to the cost of checking out the latest trunk revision. The reason for this behavior is that each delta application involves a complete pass over the preceding revision. However, there is a much better algorithm. Note that the deltas are line oriented and that most of the work of a stream editor involves copying unchanged lines from one revision to the next. A faster algorithm avoids unnecessary copying of character strings by using a _p_i_e_c_e _t_a_b_l_e. A piece table is a one-dimensional array, specifying how a given revision is `pieced together' from lines in the RCS file. Suppose piece table _P_T represents revision _r. Then _P_T [_i] contains the starting p_orsition of line _i of revision _r._r Application of the next delta transforms piece table _P_T _r - 10 - into _P_T . For instance, a delete command removes a series of ent_rr+i_e1s from the piece table. An insertion command inserts new entries, moving the entries following the inser- tion point further down the array. The inserted entries point to the text lines in the delta. Thus, no I/O is involved except for reading the delta itself. When all del- tas have been applied to the piece table, a sequential pass through the table looks up each line in the RCS file and copies it to the output file, updating identification mark- ers at the same time. Of course, the RCS file must permit random access, since the copied lines are scattered throughout that file. Figure 6 illustrates an RCS file with two revisions and the corresponding piece tables. _F_i_g_u_r_e _6 _i_s _n_o_t _a_v_a_i_l_a_b_l_e. Figure 6. An RCS file and its piece tables The piece table approach has the property that the time for applying a single delta is roughly determined by the size of the delta, and not by the size of the revision. For example, if a delta is 10 per cent of the size of a revi- sion, then applying it takes only 10 per cent of the time to generate the latest trunk revision. (The stream editor would take 100 per cent.) There is an important alternative for representing del- tas that affects performance. SCCS3, a precursor of RCS, uses _i_n_t_e_r_l_e_a_v_e_d deltas. A file containing interleaved del- tas is partitioned into blocks of lines. Each block has a header that specifies to which revision(s) the block belongs. The blocks are sorted out in such a way that a single pass over the file can pick up all the lines belong- ing to a given revision. Thus, the regeneration time for all revisions is the same: all headers must be inspected, and the associated blocks either copied or skipped. As the number of revisions increases, the cost of retrieving any revision is much higher than the cost of checking out the latest trunk revision with reverse deltas. A detailed com- parison of SCCS's interleaved deltas and RCS's reverse del- tas can be found in Reference 4. This reference considers the version of RCS with the stream editor only. The piece table method improves performance further, so that RCS is always faster than SCCS, except if 10 or more deltas are applied. - 11 - Additional speed-up for both delta methods can be obtained by caching the most recently generated revision, as has been implemented in DSEE.5 With caching, access time to frequently used revisions can approach normal file access time, at the cost of some additional space. _4. _L_o_c_k_i_n_g: _A _C_o_n_t_r_o_v_e_r_s_i_a_l _I_s_s_u_e The locking mechanism for RCS was difficult to design. The problem and its solution are first presented in their `pure' form, followed by a discussion of the complications caused by `real-world' considerations. RCS must prevent two or more persons from depositing competing changes of the same revision. Suppose two pro- grammers check out revision 2.4 and modify it. Programmer A checks in a revision before programmer B. Unfortunately, programmer B has not seen A's changes, so the effect is that A's changes are covered up by B's deposit. A's changes are not lost since all revisions are saved, but they are con- fined to a single revision.|= This conflict is prevented in RCS by locking. Whenever someone intends to edit a revision (as opposed to reading or compiling it), the revision should be checked out and locked, using the -_l option on _c_o. On subsequent check-in, _c_i tests the lock and then removes it. At most one program- mer at a time may lock a particular revision, and only this programmer may check in the succeeding revision. Thus, while a revision is locked, it is the exclusive responsibil- ity of the locker. An important maxim for software tools like RCS is that they must not stand in the way of making progress with a project. This consideration leads to several weakenings of the locking mechanism. First of all, even if a revision is locked, it can still be checked out. This is necessary if other people wish to compile or inspect the locked revision while the next one is in preparation. The only operations they cannot do are to lock the revision or to check in the succeeding one. Secondly, check-in operations on other branches in the RCS file are still possible; the locking of one revision does not affect any other revision. Thirdly, revisions are occasionally locked for a long period of time because a programmer is absent or otherwise unable to com- plete the assignment. If another programmer has to make a _________________________ |= Note that this problem is entirely different from the atomicity problem. Atomicity means that concurrent update operations on the same RCS file cannot be permitted, because that may result in inconsistent data. Atomic updates are essential (and implemented in RCS), but do not solve the conflict discussed here. - 12 - pressing change, there are the following three alternatives for making progress: a) find out who is holding the lock and ask that person to release it; b) check out the locked revision, modify it, check it in on a branch, and merge the changes later; c) break the lock. Breaking a lock leaves a highly visible trace, namely an electronic mail message that is sent automatically to the holder of the lock, recording the breaker and a commentary requested from him. Thus, breaking locks is tolerated under certain circumstances, but will not go unnoticed. Experience has shown that the automatic mail message attaches a high enough stigma to lock breaking, such that programmers break locks only in real emergencies, or when a co-worker resigns and leaves locked revisions behind. If an RCS file is private, i.e., when a programmer owns an RCS file and does not expect anyone else to perform check-in operations, locking is an unnecessary nuisance. In this case, the `strict locking feature' discussed earlier may be disabled, provided that file protection is set such that only the owner may write the RCS file. This has the effect that only the owner can check-in revisions, and that no lock is needed for doing so. As added protection, each RCS file contains an access list that specifies the users who may execute update opera- tions. If an access list is empty, only normal UNIX file protection applies. Thus, the access list is useful for restricting the set of people who would otherwise have update permission. Just as with locking, the access list has no effect on read-only operations such as _c_o. This approach is consistent with the UNIX philosophy of openness, which contributes to a productive software development environment. _5. _C_o_n_f_i_g_u_r_a_t_i_o_n _M_a_n_a_g_e_m_e_n_t The preceding sections described how RCS deals with revisions of individual components; this section discusses how to handle configurations. A configuration is a set of revisions, where each revision comes from a different revi- sion group, and the revisions are selected according to a certain criterion. For example, in order to build a func- tioning compiler, the `right' revisions from the scanner, the parser, the optimizer and the code generator must be combined. RCS, in conjunction with MAKE, provides a number of facilities to effect a smooth selection. _5._1. _R_C_S _S_e_l_e_c_t_i_o_n _F_u_n_c_t_i_o_n_s _D_e_f_a_u_l_t _s_e_l_e_c_t_i_o_n During development, the usual selection criterion is to choose the latest revision of all components. The _c_o - 13 - command makes this selection by default. For example, the command co *,v retrieves the latest revision on the default branch of each RCS file in the current directory. The default branch is usually the trunk, but may be set to be a side branch. Side branches as defaults are needed in distributed software development, as discussed in the section on the RCS revision tree. _R_e_l_e_a_s_e _b_a_s_e_d _s_e_l_e_c_t_i_o_n Specifying a release or branch number selects the latest revision in that release or branch. For instance, co -r2 *,v retrieves the latest revision with release number 2 from each RCS file. This selection is convenient if a release has been completed and development has moved on to the next release. _S_t_a_t_e _a_n_d _a_u_t_h_o_r _b_a_s_e_d _s_e_l_e_c_t_i_o_n If the highest level number within a given release number is not the desired one, the state attribute can help. For example, co -r2 -sReleased *,v retrieves the latest revision with release number 2 whose state attribute is `Released'. Of course, the state attribute has to be set appropriately, using the _c_i or _r_c_s commands. Another alternative is to select a revision by its author, using the -_w option. _D_a_t_e _b_a_s_e_d _s_e_l_e_c_t_i_o_n Revisions may also be selected by date. Suppose a release of an entire system was completed and current on March 4, at 1:00 p.m. local time. Then the command co -d'March 4, 1:00 pm LT' *,v checks out all the components of that release, indepen- dent of the numbering. The -_d option specifies a `cut- off date', i.e., the revision selected has a check-in date that is closest to, but not after the date given. _N_a_m_e _b_a_s_e_d _s_e_l_e_c_t_i_o_n The most powerful selection function is based on - 14 - assigning symbolic names to revisions and branches. In large systems, a single release number or date is not sufficient to collect the appropriate revisions from all groups. For example, suppose one wishes to combine release 2 of one subsystem and release 15 of another. Most likely, the creation dates of those releases differ also. Thus, a single revision number or date passed to the _c_o command will not suffice to select the right revisions. Symbolic revision numbers solve this problem. Each RCS file may contain a set of symbolic names that are mapped to numeric revision numbers. For example, assume the symbol _V_3 is bound to release number 2 in file _s,_v, and to revision number 15.9 in _t,_v. Then the single command co -rV3 s,v t,v retrieves the latest revision of release 2 from _s,_v, and revision 15.9 from _t,_v. In a large system with many modules, checking out all revisions with one com- mand greatly simplifies configuration management. Judicious use of symbolic revision numbers helps with organizing large configurations. A special command, _r_c_s_f_r_e_e_z_e, assigns a symbolic revision number to a selected revision in every RCS file. _R_c_s_f_r_e_e_z_e effectively freezes a configuration. The assigned symbolic revision number selects all components of the configuration. If necessary, symbolic numbers may even be intermixed with numeric ones. Thus, _V_3._5 in the above example would select revision 2.5 in _s,_v and branch 15.9.5 in _t,_v. The options -_r, -_s, -_w and -_d may be combined. If a branch is given, the latest revision on that branch satisfy- ing all conditions is retrieved; otherwise, the default branch is used. _5._2. _C_o_m_b_i_n_i_n_g _M_A_K_E _a_n_d _R_C_S MAKE1 is a program that processes configurations. It is driven by configuration specifications recorded in a spe- cial file, called a `Makefile'. MAKE avoids redundant pro- cessing steps by comparing creation dates of source and pro- cessed objects. For example, when instructed to compile all modules of a given system, it only recompiles those source modules that were changed since they were processed last. MAKE has been extended with an auto-checkout feature for RCS.* When a certain file to be processed is not present, MAKE attempts a check-out operation. If _________________________ * This auto-checkout extension is available only in some versions of MAKE, e.g. GNU MAKE. - 15 - successful, MAKE performs the required processing, and then deletes the checked out file to conserve space. The selec- tion parameters discussed above can be passed to MAKE either as parameters, or directly embedded in the Makefile. MAKE has also been extended to search the subdirectory named _R_C_S for needed files, rather than just the current working directory. However, if a working file is present, MAKE totally ignores the corresponding RCS file and uses the working file. (In newer versions of MAKE distributed by AT&T and others, auto-checkout can be achieved with the rule DEFAULT, instead of a special extension of MAKE. However, a file checked out by the rule DEFAULT will not be deleted after processing. _R_c_s_c_l_e_a_n can be used for that purpose.) With auto-checkout, RCS/MAKE can effect a selection rule especially tuned for multi-person software development and maintenance. In these situations, programmers should obtain configurations that consist of the revisions they have personally checked out plus the latest checked in revi- sion of all other revision groups. This schema can be set up as follows. Each programmer chooses a working directory and places into it a symbolic link, named _R_C_S, to the directory con- taining the relevant RCS files. The symbolic link makes sure that _c_o and _c_i operations need only specify the working files, and that the Makefile need not be changed. The pro- grammer then checks out the needed files and modifies them. If MAKE is invoked, it composes configurations by selecting those revisions that are checked out, and the rest from the subdirectory _R_C_S. The latter selection may be controlled by a symbolic revision number or any of the other selection criteria. If there are several programmers editing in separate working directories, they are insulated from each other's changes until checking in their modifications. Similarly, a maintainer can recreate an older confi- guration by starting to work in an empty working directory. During the initial MAKE invocation, all revisions are selected from RCS files. As the maintainer checks out files and modifies them, a new configuration is gradually built up. Every time MAKE is invoked, it substitutes the modified revisions into the configuration being manipulated. A final application of RCS is to use it for storing Makefiles. Revision groups of Makefiles represent multiple versions of configurations. Whenever a configuration is baselined or distributed, the best approach is to unambigu- ously fix the configuration with a symbolic revision number by calling _r_c_s_f_r_e_e_z_e, to embed that symbol into the Makefile, and to check in the Makefile (using the same sym- bolic revision number). With this approach, old configura- tions can be regenerated easily and reliably. - 16 - _6. _U_s_a_g_e _S_t_a_t_i_s_t_i_c_s The following usage statistics were collected on two DEC VAX-11/780 computers of the Purdue Computer Science Department. Both machines are mainly used for research pur- poses. Thus, the data reflect an environment in which the majority of projects involve prototyping and advanced software development, but relatively little long-term maintenance. For the first experiment, the _c_i and _c_o operations were instrumented to log the number of backward and forward del- tas applied. The data were collected during a 13 month period from Dec. 1982 to Dec. 1983. Table I summarizes the results. ___________________________________________________________________________________ |Operation| Total | Total deltas| Mean deltas| Operations | Branch | _|__________|__o_p_e_r_a_t_i_o_n_s___|____a_p_p_l_i_e_d_____|____a_p_p_l_i_e_d____|__w_i_t_h__>_1__d_e_l_t_a__|__o_p_e_r_a_t_i_o_n_s__| |co | 7867 | 9320 | 1.18 | 509 (6%)| 203 (3%)| |ci | 3468 | 2207 | 0.64 | 85 (2%)| 75 (2%)| |ci & co | 11335 | 11527 | 1.02 | 594 (5%)| 278 (2%)| _|__________|______________|_______________|______________|________________|_____________| Table I. Statistics for _c_o and _c_i operations. The first two lines show statistics for check-out and check-in; the third line shows the combination. Recall that _c_i performs an implicit check-out to obtain a revision for computing the delta. In all measures presented, the most recent revision (stored intact) counts as one delta. The number of deltas applied represents the number of passes necessary, where the first `pass' is a copying step. Note that the check-out operation is executed more than twice as frequently as the check-in operation. The fourth column gives the mean number of deltas applied in all three cases. For _c_i, the mean number of deltas applied is less than one. The reasons are that the initial check-in requires no delta at all, and that the only time _c_i requires more than one delta is for branches. Column 5 shows the actual number of operations that applied more than one delta. The last column indicates that branches were not used often. The last three columns demonstrate that the most recent trunk revision is by far the most frequently accessed. For RCS, check-out of this revision is a simple copy operation, which is the absolute minimum given the copy-semantics of _c_o. Access to older revisions and branches is more common in non-academic environments, yet even if access to older deltas were an order of magnitude more frequent, the com- bined average number of deltas applied would still be below 1.2. Since RCS is faster than SCCS until up to 10 delta applications, reverse deltas are clearly the method of - 17 - choice. The second experiment, conducted in March of 1984, involved surveying the existing RCS files on our two machines. The goal was to determine the mean number of revisions per RCS file, as well as the space consumed by them. Table II shows the results. (Tables I and II were produced at different times and are unrelated.) _________________________________________________________________________________________ | | Total RCS| Total | Mean | Mean size of| Mean size of| Overhead| _|___________|____f_i_l_e_s____|__r_e_v_i_s_i_o_n_s__|__r_e_v_i_s_i_o_n_s__|___R_C_S__f_i_l_e_s____|___r_e_v_i_s_i_o_n_s____|___________| |All files | 8033 | 11133 | 1.39 | 6156 | 5585 | 1.10 | |Files with| 1477 | 4578 | 3.10 | 8074 | 6041 | 1.34 | |_> 2 deltas| | | | | | | _|___________|____________|____________|____________|_______________|_______________|___________| Table II. Statistics for RCS files. The mean number of revisions per RCS file is 1.39. Columns 5 and 6 show the mean sizes (in bytes) of an RCS file and of the latest revision of each RCS file, respec- tively. The `overhead' column contains the ratio of the mean sizes. Assuming that all revisions in an RCS file are approximately the same size, this ratio gives a measure of the space consumed by the extra revisions. In our sample, over 80 per cent of the RCS files con- tained only a single revision. The reason is that our sys- tems programmers routinely check in all source files on the distribution tapes, even though they may never touch them again. To get a better indication of how much space savings are possible with deltas, all measures with those files that contained 2 or more revisions were recomputed. Only for those files is RCS necessary. As shown in the second line, the average number of revisions for those files is 3.10, with an overhead of 1.34. This means that the extra 2.10 deltas require 34 per cent e3xtra space, or 16 per cent per extra revision. Rochkind measured the space consumed by SCCS, and reported an average of 5 revisions per group and an overhead of 1.37 (or about69 per cent per extra revi- sion). In a later paper, Glasser observed an average of 7 revisions per group in a single, large pro5ject, but provided no overhead figure. In his paper on DSEE , Leblang reported that delta storage combined with blank compression results in an overhead of a mere 1-2 per cent per revision. Since leading blanks accounted for about 20 per cent of the sur- veyed Pascal programs, a revision group with 5-10 members was smaller than a single cleartext copy. The above observations demonstrate clearly that the space needed for extra revisions is small. With delta storage, the luxury of keeping multiple revisions online is certainly affordable. In fact, introducing a system with delta storage may reduce storage requirements, because - 18 - programmers often save back-up copies anyway. Since back-up copies are stored much more efficiently with deltas, intro- ducing a system such as RCS may actually free a considerable amount of space. _7. _S_u_r_v_e_y _o_f _V_e_r_s_i_o_n _C_o_n_t_r_o_l _T_o_o_l_s The need to keep back-up copies of software arose when programs and data were no longer stored on paper media, but were entered from terminals and stored on disk. Back-up copies are desirable for reliability, and many modern edi- tors automatically save a back-up copy for every file touched. This strategy is valuable for short-term back-ups, but not suitable for long-term version control, since an existing back-up copy is overwritten whenever the corresponding file is edited. Tape archives are suitable for long-term, offline storage. If all changed files are dumped on a back-up tape once per day, old revisions remain accessible. However, tape archives are unsatisfactory for version control in several ways. First, backing up the file system every 24 hours does not capture intermediate revisions. Secondly, the old revisions are not online, and accessing them is tedious and time-consuming. In particular, it is impracti- cal to compare several old revisions of a group, because that may require mounting and searching several tapes. Tape archives are important fail-safe tools in the event of catastrophic disk failures or accidental deletions, but they are ill-suited for version control. Conversely, version control tools do not obviate the need for tape archives. A natural technique for keeping several old revisions online is to never delete a file. Editing a file simply creates a new file with the same name, but with a different sequence number. This technique, available as an option in DEC's VMS operating system, turns out to be inadequate for version control. First, it is prohibitively expensive in terms of storage costs, especially since no data compression techniques are employed. Secondly, indiscriminately storing every change produces too many revisions, and programmers have difficulties distinguishing them. The proliferation of revisions forces programmers to spend much time on finding and deleting useless files. Thirdly, most of the support functions like locking, logging, revision selection, and identification described in this paper are not available. An alternative approach is to separate editing from revision control. The user may repeatedly edit a given revision, until freezing it with an explicit command. Once a revision is frozen, it is stored permanently and can no longer be modified. (In RCS, freezing a revisions is done with _c_i.) Editing a frozen revision implicitly creates a new one, which can again be changed repeatedly until it is - 19 - frozen itself. This approach saves exactly those revisions that the user considers important, and keeps the number of revisions manageable. IBM's CLEAR/CASTER7, AT&T's SCCS3, CMU's SDC8 and DEC's CMS9, are examples of version control systems using this approach. CLEAR/CASTER maintains a data base of programs, specifications, documentation and mes- sages, using deltas. Its goal is to provide control over the development process from a management viewpoint. SCCS stores multiple revisions of source text in an ancestral tree, records a log entry for each revision, provides access control, and has facilities for uniquely identifying each revision. An efficient delta technique reduces the space consumed by each revision group. SDC is much simpler than SCCS because it stores not more than two revisions. How- ever, it maintains a complete log for all old revisions, some of which may be on back-up tape. CMS, like SCCS, manages tree-structured revision groups, but offers no iden- tification mechanism. Tools for dealing with configurations are still in a state of flux. SCCS, SDC and CMS can be combined with MAKE or MAKE-like programs. Since flexible selection rules are missing from all these tools, it is sometimes difficult to specify precisely which revision of each group should be passed to MAKE for building a desired configuration. The Xerox Cedar system10 provides a `System Modeller' that can rebuild a configuration from an arbitrary set of module revisions. The revisions of a module are only distinguished by creation time, and there is no tool for managing groups. Since the selection rules are primitive, the System Modeller appears to be somewhat tedious to use. Apollo's DSEE5 is a sophisticated software engineering environment. It manages revision groups in a way similar to SCCS and CMS. Confi- gurations are built using `configuration threads'. A confi- guration thread states which revision of each group named in a configuration should be chosen. A configuration thread may contain dynamic specifiers (e.g., `choose the revisions I am currently working on, and the most recent revisions otherwise'), which are bound automatically at build time. It also provides a notification mechanism for alerting main- tainers about the need to rebuild a system after a change. RCS is based on a general model for describing multi- version/multi-configuration systems11. The model describes systems using AND/OR graphs, where AND nodes represent con- figurations, and OR nodes represent version groups. The model gives rise to a suit of selection rules for composing configurations, almost all of which are implemented in RCS. The revisions selected by RCS are passed to MAKE for confi- guration building. Revision group management is modelled after SCCS. RCS retains SCCS's best features, but offers a significantly simpler user interface, flexible selection rules, adequate integration with MAKE and improved identifi- cation. A detailed comparison of RCS and SCCS appears in - 20 - Reference 4. An important component of all revision control systems is a program for computing deltas. SCCS and RCS use the program _d_i_f_f2, which first computes the longest common sub- string of two revisions, and then produces the delta from that substring. The delta is simply an edit script consist- ing of deletion and insertion commands that generate one revision from the other. A delta based on a longest common substring is not necessarily minimal, because it does not take advantage of crossing block moves. Crossing block moves arise if two or more blocks of lines (e.g., procedures) appear in a dif- ferent order in two revisions. An edit script derived from a longest common substring first deletes the shorter of the two blocks, and then reinserts it. Heckel12 proposed an algorithm for detecting block moves, but since the algorithm is based on heuristics, there are conditions under which the generated delta is far from minimal. DSEE uses this algo- rithm combined with blank compression, apparently with satisfactory overall results. A new algorithm that is guaranteed to produce a minimal delta based on block moves appears in Reference 13. A future release of RCS will use this algorithm. _A_c_k_n_o_w_l_e_d_g_e_m_e_n_t_s: Many people have helped make RCS a success by contributed criticisms, suggestions, corrections, and even whole new commands (including manual pages). The list of people is too long to be reproduced here, but my sincere thanks for their help and goodwill goes to all of them. _A_p_p_e_n_d_i_x: _S_y_n_o_p_s_i_s _o_f _R_C_S _O_p_e_r_a_t_i_o_n_s _c_i - cccchhhheeeecccckkkk iiiinnnn rrrreeeevvvviiiissssiiiioooonnnnssss _C_i stores the contents of a working file into the corresponding RCS file as a new revision. If the RCS file doesn't exist, _c_i creates it. _C_i removes the working file, unless one of the options -_u or -_l is present. For each check-in, _c_i asks for a commentary describing the changes relative to the previous revi- sion. _C_i assigns the revision number given by the -_r option; if that option is missing, it derives the number from the lock held by the user; if there is no lock and locking is not strict, _c_i increments the number of the latest revision on the trunk. A side branch can only be started by explicitly specifying its number with the -_r option during check-in. - 21 - _C_i also determines whether the revision to be checked in is different from the previous one, and asks whether to proceed if not. This facility simplifies check-in operations for large systems, because one need not remember which files were changed. The option -_k searches the checked in file for identif- ication markers containing the attributes revision number, check-in date, author and state, and assigns these to the new revision rather than computing them. This option is useful for software distribution: Reci- pients of distributed software using RCS should check in updates with the -_k option. This convention guaran- tees that revision numbers, check-in dates, etc., are the same at all sites. _c_o - cccchhhheeeecccckkkk oooouuuutttt rrrreeeevvvviiiissssiiiioooonnnnssss _C_o retrieves revisions according to revision number, date, author and state attributes. It either places the revision into the working file, or prints it on the standard output. _C_o always expands the identification markers. _i_d_e_n_t - eeeexxxxttttrrrraaaacccctttt iiiiddddeeeennnnttttiiiiffffiiiiccccaaaattttiiiioooonnnn mmmmaaaarrrrkkkkeeeerrrrssss _I_d_e_n_t extracts the identification markers expanded by _c_o from any file and prints them. _r_c_s - cccchhhhaaaannnnggggeeee RRRRCCCCSSSS ffffiiiilllleeee aaaattttttttrrrriiiibbbbuuuutttteeeessss _R_c_s is an administrative operation that changes access lists, locks, unlocks, breaks locks, toggles the strict-locking feature, sets state attributes and sym- bolic revision numbers, changes the description, and deletes revisions. A revision can only be deleted if it is not the fork of a side branch. _r_c_s_c_l_e_a_n - cccclllleeeeaaaannnn wwwwoooorrrrkkkkiiiinnnngggg ddddiiiirrrreeeeccccttttoooorrrryyyy _R_c_s_c_l_e_a_n removes working files that were checked out but never changed.* _r_c_s_d_i_f_f - ccccoooommmmppppaaaarrrreeee rrrreeeevvvviiiissssiiiioooonnnnssss _R_c_s_d_i_f_f compares two revisions and prints their differ- ence, using the UNIX tool _d_i_f_f. One of the revisions compared may be checked out. This command is useful for finding out about changes. _r_c_s_f_r_e_e_z_e - ffffrrrreeeeeeeezzzzeeee aaaa ccccoooonnnnffffiiiigggguuuurrrraaaattttiiiioooonnnn _R_c_s_f_r_e_e_z_e assigns the same symbolic revision number to a given revision in all RCS files. This command is useful for accurately recording a configuration.* _________________________ * The _r_c_s_c_l_e_a_n and _r_c_s_f_r_e_e_z_e commands are optional and are not always installed. - 22 - _r_c_s_m_e_r_g_e - mmmmeeeerrrrggggeeee rrrreeeevvvviiiissssiiiioooonnnnssss _R_c_s_m_e_r_g_e merges two revisions, _r_e_v_1 and _r_e_v_2, with respect to a common ancestor. A 3-way file comparison determines the segments of lines that are (a) the same in all three revisions, or (b) the same in 2 revisions, or (c) different in all three. For all segments of type (b) where _r_e_v_1 is the differing revision, the seg- ment in _r_e_v_1 replaces the corresponding segment of _r_e_v_2. Type (c) indicates an overlapping change, is flagged as an error, and requires user intervention to select the correct alternative. _r_l_o_g - rrrreeeeaaaadddd lllloooogggg mmmmeeeessssssssaaaaggggeeeessss _R_l_o_g prints the log messages and other information in an RCS file. - 23 - _R_e_f_e_r_e_n_c_e_s 1. Feldman, Stuart I., "Make--A Program for Maintaining Computer Programs," _S_o_f_t_w_a_r_e--_P_r_a_c_t_i_c_e & _E_x_p_e_r_i_e_n_c_e, vol. 9, no. 3, pp. 255-265, March 1979. 2. Hunt, James W. and McIlroy, M. D., "An Algorithm for Differential File Comparison," 41, Computing Science Technical Report, Bell Laboratories, June 1976. 3. Rochkind, Marc J., "The Source Code Control System," _I_E_E_E _T_r_a_n_s_a_c_t_i_o_n_s _o_n _S_o_f_t_w_a_r_e _E_n_g_i_n_e_e_r_i_n_g, vol. SE-1, no. 4, pp. 364-370, Dec. 1975. 4. Tichy, Walter F., "Design, Implementation, and Evalua- tion of a Revision Control System," in _P_r_o_c_e_e_d_i_n_g_s _o_f _t_h_e _6_t_h _I_n_t_e_r_n_a_t_i_o_n_a_l _C_o_n_f_e_r_e_n_c_e _o_n _S_o_f_t_w_a_r_e _E_n_g_i_n_e_e_r_- _i_n_g, pp. 58-67, ACM, IEEE, IPS, NBS, September 1982. 5. Leblang, David B. and Chase, Robert P., "Computer-Aided Software Engineering in a Distributed Workstation Environment," _S_I_G_P_L_A_N _N_o_t_i_c_e_s, vol. 19, no. 5, pp. 104-112, May 1984. Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Prac- tical Software Development Environments. 6. Glasser, Alan L., "The Evolution of a Source Code Con- trol System," _S_o_f_t_w_a_r_e _E_n_g_i_n_e_e_r_i_n_g _N_o_t_e_s, vol. 3, no. 5, pp. 122-125, Nov. 1978. Proceedings of the Software Quality and Assurance Workshop. 7. Brown, H.B., "The Clear/Caster System," _N_a_t_o _C_o_n_f_e_r_e_n_c_e _o_n _S_o_f_t_w_a_r_e _E_n_g_i_n_e_e_r_i_n_g, _R_o_m_e, 1970. 8. Habermann, A. Nico, _A _S_o_f_t_w_a_r_e _D_e_v_e_l_o_p_m_e_n_t _C_o_n_t_r_o_l _S_y_s_- _t_e_m, Technical Report, Carnegie-Mellon University, Department of Computer Science, Jan. 1979. 9. DEC, _C_o_d_e _M_a_n_a_g_e_m_e_n_t _S_y_s_t_e_m, Digital Equipment Corpora- tion, 1982. Document No. EA-23134-82 10. Lampson, Butler W. and Schmidt, Eric E., "Practical Use of a Polymorphic Applicative Language," in _P_r_o_c_e_e_d_i_n_g_s _o_f _t_h_e _1_0_t_h _S_y_m_p_o_s_i_u_m _o_n _P_r_i_n_c_i_p_l_e_s _o_f _P_r_o_g_r_a_m_m_i_n_g _L_a_n_g_u_a_g_e_s, pp. 237-255, ACM, January 1983. 11. Tichy, Walter F., "A Data Model for Programming Support Environments and its Application," in _A_u_t_o_m_a_t_e_d _T_o_o_l_s _f_o_r _I_n_f_o_r_m_a_t_i_o_n _S_y_s_t_e_m _D_e_s_i_g_n _a_n_d _D_e_v_e_l_o_p_m_e_n_t, ed. by Hans-Jochen Schneider and Anthony I. Wasserman, North- Holland Publishing Company, Amsterdam, 1982. 12. Heckel, Paul, "A Technique for Isolating Differences - 24 - Between Files," _C_o_m_m_u_n_i_c_a_t_i_o_n_s _o_f _t_h_e _A_C_M, vol. 21, no. 4, pp. 264-268, April 1978. 13. Tichy, Walter F., "The String-to-String Correction Problem with Block Moves," _A_C_M _T_r_a_n_s_a_c_t_i_o_n_s _o_n _C_o_m_p_u_t_e_r _S_y_s_t_e_m_s, vol. 2, no. 4, pp. 309-321, Nov. 1984.