svnlook is a tool provided by Subversion for examining the various revisions and transactions in a repository. No part of this program attempts to change the repository—it's a “read-only” tool. svnlook is typically used by the repository hooks for reporting the changes that are about to be committed (in the case of the pre-commit hook) or that were just committed (in the case of the post-commit hook) to the repository. A repository administrator may use this tool for diagnostic purposes.
svnlook has a straightforward syntax:
$ svnlook help
general usage: svnlook SUBCOMMAND REPOS_PATH [ARGS & OPTIONS ...]
Note: any subcommand which takes the '--revision' and '--transaction'
      options will, if invoked without one of those options, act on
      the repository's youngest revision.
Type "svnlook help <subcommand>" for help on a specific subcommand.
…
Nearly every one of svnlook's subcommands can operate on either a revision or a transaction tree, printing information about the tree itself, or how it differs from the previous revision of the repository. You use the --revision and --transaction options to specify which revision or transaction, respectively, to examine. Note that while revision numbers appear as natural numbers, transaction names are alphanumeric strings. Keep in mind that the filesystem only allows browsing of uncommitted transactions (transactions that have not resulted in a new revision). Most repositories will have no such transactions, because transactions are usually either committed (which disqualifies them from viewing) or aborted and removed.
In the absence of both the --revision and --transaction options, svnlook will examine the youngest (or “HEAD”) revision in the repository. So the following two commands do exactly the same thing when 19 is the youngest revision in the repository located at /path/to/repos:
$ svnlook info /path/to/repos
$ svnlook info /path/to/repos --revision 19
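The option handling described above can be sketched as a small helper that builds an svnlook command line. This is only an illustration: the function name `svnlook_argv` is hypothetical, and treating the two options as mutually exclusive is an assumption of the sketch, based on the text's description of choosing either a revision or a transaction.

```python
def svnlook_argv(subcommand, repos_path, revision=None, transaction=None):
    """Build an svnlook command line as a list of arguments.

    With neither option given, svnlook examines the youngest revision,
    so the bare command is returned unchanged.
    """
    # Assumption of this sketch: a caller picks one tree to examine.
    if revision is not None and transaction is not None:
        raise ValueError("pass a revision or a transaction, not both")
    argv = ["svnlook", subcommand, repos_path]
    if revision is not None:
        argv += ["--revision", str(revision)]
    elif transaction is not None:
        argv += ["--transaction", transaction]
    return argv
```

The resulting list can be handed to `subprocess.run` on a machine where svnlook is installed.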
The only exception to these rules about subcommands is the svnlook youngest subcommand, which takes no options, and simply prints out the HEAD revision number.
$ svnlook youngest /path/to/repos
19
Output from svnlook is designed to be both human- and machine-parsable. Take as an example the output of the info subcommand:
$ svnlook info path/to/repos
sally
2002-11-04 09:29:13 -0600 (Mon, 04 Nov 2002)
27
Added the usual Greek tree.
The output of the info subcommand is defined as:
The author, followed by a newline.
The date, followed by a newline.
The number of characters in the log message, followed by a newline.
The log message itself, followed by a newline.
This output is human-readable, meaning items like the datestamp are displayed using a textual representation instead of something more obscure (such as the number of nanoseconds since the Tasty Freeze guy drove by). But this output is also machine-parsable—because the log message can contain multiple lines and be unbounded in length, svnlook provides the length of that message before the message itself. This allows scripts and other wrappers around this command to make intelligent decisions about the log message, such as how much memory to allocate for the message, or at least how many bytes to skip in the event that this output is not the last bit of data in the stream.
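To make the machine-parsable side concrete, here is a minimal Python sketch of a reader for the four-part info output. The names `RevisionInfo` and `parse_info` are hypothetical, and the sketch counts the log message length in characters of the decoded text, as the description above words it; for non-ASCII messages you should verify whether the count is in bytes or characters.

```python
from collections import namedtuple

RevisionInfo = namedtuple(
    "RevisionInfo", "author date log_length log_message rest")


def parse_info(text):
    """Split `svnlook info` output into its four documented parts.

    `rest` holds whatever follows the log message, which matters when
    this output is not the last bit of data in the stream.
    """
    # The first three items each end at a newline.
    author, date, length_line, remainder = text.split("\n", 3)
    log_length = int(length_line)
    # The message may span several lines, so rely on the stated length
    # rather than on line boundaries.
    log_message = remainder[:log_length]
    rest = remainder[log_length:]
    if rest.startswith("\n"):
        rest = rest[1:]  # drop the newline that terminates the message
    return RevisionInfo(author, date, log_length, log_message, rest)
```

Because the length comes first, a multi-line log message is read correctly without guessing where it ends.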
Another common use of svnlook is to view the contents of a revision or transaction tree. Examining the output of the svnlook tree command, which displays the directories and files in the requested tree (optionally showing the filesystem node revision IDs for each of those paths), can be extremely helpful to administrators deciding whether it is safe to remove a seemingly dead transaction. It is also quite useful for Subversion developers who are diagnosing filesystem-related problems when they arise.
$ svnlook tree path/to/repos --show-ids
/ <0.0.1>
 A/ <2.0.1>
  B/ <4.0.1>
   lambda <5.0.1>
   E/ <6.0.1>
    alpha <7.0.1>
    beta <8.0.1>
   F/ <9.0.1>
  mu <3.0.1>
  C/ <a.0.1>
  D/ <b.0.1>
   gamma <c.0.1>
   G/ <d.0.1>
    pi <e.0.1>
    rho <f.0.1>
    tau <g.0.1>
   H/ <h.0.1>
    chi <i.0.1>
    omega <k.0.1>
    psi <j.0.1>
 iota <1.0.1>
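A script that inspects a tree listing usually wants full paths rather than indented names. The following Python sketch turns such a listing into (path, node revision ID) pairs. It rests on assumptions that are not stated in the text: each nesting level is indented by exactly one space, directory names end in "/", and names contain no leading spaces or angle brackets. The function name `parse_tree` is hypothetical.

```python
import re

# Leading spaces, then the name, then an optional " <node-rev-id>".
_LINE = re.compile(r"^( *)(.+?)(?: <([^<>]+)>)?$")


def parse_tree(listing):
    """Return a list of (full_path, node_rev_id_or_None) tuples."""
    entries = []
    ancestors = []  # directory names leading to the current entry
    for line in listing.splitlines():
        if not line.strip():
            continue
        match = _LINE.match(line)
        depth = len(match.group(1))  # assumed: one space per level
        name, node_id = match.group(2), match.group(3)
        if name == "/":
            ancestors = []
            entries.append(("/", node_id))
            continue
        # Entries at depth 1 are children of the root directory.
        del ancestors[max(depth - 1, 0):]
        entries.append(("/" + "".join(ancestors) + name, node_id))
        if name.endswith("/"):
            ancestors.append(name)
    return entries
```

The same function accepts listings produced without --show-ids; the second element of each pair is then None.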
svnlook can perform a variety of other queries, displaying subsets of bits of information we've mentioned previously, reporting which paths were modified in a given revision or transaction, showing textual and property differences made to files and directories, and so on. The following is a brief description of the current list of subcommands accepted by svnlook, and the output of those subcommands:
author: Print the tree's author.
cat: Print the contents of a file in the tree.
date: Print the tree's datestamp.
changed: List all files and directories that changed in the tree.
diff: Print unified diffs of changed files.
dirs-changed: List the directories in the tree that were themselves changed, or whose file children were changed.
info: Print the tree's author, datestamp, log message character count, and log message.
log: Print the tree's log message.
proplist: Print the names and values of properties set on paths in the tree.
tree: Print the tree listing, optionally revealing the filesystem node revision IDs associated with each path.
youngest: Print the youngest revision number.
The svnadmin program is the repository administrator's best friend. Besides providing the ability to create Subversion repositories, this program allows you to perform several maintenance operations on those repositories. The syntax of svnadmin is similar to that of svnlook:
$ svnadmin help
general usage: svnadmin SUBCOMMAND REPOS_PATH [ARGS & OPTIONS ...]
Type "svnadmin help <subcommand>" for help on a specific subcommand.

Available subcommands:
   archive
   create
   createtxn
…
We've already mentioned svnadmin's create subcommand (see the section called “Repository Creation and Configuration”). Most of the others we will cover in more detail later in this chapter. For now, let's just take a quick glance at what each of the available subcommands offers.
archive: Displays the paths of database log files which can be safely archived away (and removed) from the repository.
create: Creates a new Subversion repository.
createtxn: Creates a transaction in the repository based on a given existing revision.
dump: Dumps the contents of the repository, bounded by a given set of revisions, using a portable dump format.
list-dblogs: Lists the paths of Berkeley DB log files associated with the repository. This list includes all logfiles: those still in use by Subversion, as well as those no longer in use.
list-unused-dblogs: Lists the paths of Berkeley DB log files associated with, but no longer used by, the repository. You may safely remove these logfiles from the repository layout, possibly archiving them for use in the event that you ever need to perform a catastrophic recovery of the repository.
load: Loads a set of revisions into a repository from a stream of data that uses the same portable dump format generated by the dump subcommand.
lscr: Lists the revisions in which a given path in the repository was modified.
lstxns: Lists the names of uncommitted Subversion transactions which currently exist in the repository.
recover: Performs recovery steps on a repository that is in need of such, generally after a fatal error has occurred which prevented a process from cleanly shutting down its communication with the repository.
rmtxns: Cleanly removes Subversion transactions from the repository (conveniently fed by output from the lstxns subcommand).
setlog: Replaces the current value of the svn:log (commit log message) property on a given revision in the repository with a new value.
verify: Verifies the contents of the repository. This includes, among other things, checksum comparisons of the versioned data stored in the repository.
The Subversion source tree also comes with a shell-like interface to the repository. The svnshell.py Python script (located in tools/examples/ in the source tree) uses Subversion's language bindings (so you have to have those properly compiled and installed in order for this script to work) to connect to the repository and filesystem libraries.
Once started, the program behaves similarly to a shell program, allowing you to browse the various directories in your repository. Initially, you are “positioned” in the root directory of the HEAD revision of the repository, and presented with a command prompt. You can use the help command at any time to display a list of available commands and what they do.
$ svnshell.py /path/to/repos
<rev: 2 />$ help
Available commands:
  cat FILE     : dump the contents of FILE
  cd DIR       : change the current working directory to DIR
  exit         : exit the shell
  ls [PATH]    : list the contents of the current directory
  lstxns       : list the transactions available for browsing
  setrev REV   : set the current revision to browse
  settxn TXN   : set the current transaction to browse
  youngest     : list the youngest browsable revision number
<rev: 2 />$
Navigating the directory structure of your repository is done in the same way you would navigate a regular Unix or Windows shell—using the cd command. At all times, the command prompt will show you what revision (prefixed by rev:) or transaction (prefixed by txn:) you are currently examining, and at what path location in that revision or transaction. You can change your current revision or transaction with the setrev and settxn commands, respectively. As in a Unix shell, you can use the ls command to display the contents of the current directory, and you can use the cat command to display the contents of a file.
Example 5.1. Using svnshell to Navigate the Repository
<rev: 2 />$ ls
   REV   AUTHOR  NODE-REV-ID     SIZE         DATE NAME
----------------------------------------------------------------------------
     1    sally <     2.0.1>          Nov 15 11:50 A/
     2    harry <     1.0.2>       56 Nov 19 08:19 iota
<rev: 2 />$ cd A
<rev: 2 /A>$ ls
   REV   AUTHOR  NODE-REV-ID     SIZE         DATE NAME
----------------------------------------------------------------------------
     1    sally <     4.0.1>          Nov 15 11:50 B/
     1    sally <     a.0.1>          Nov 15 11:50 C/
     1    sally <     b.0.1>          Nov 15 11:50 D/
     1    sally <     3.0.1>       23 Nov 15 11:50 mu
<rev: 2 /A>$ cd D/G
<rev: 2 /A/D/G>$ ls
   REV   AUTHOR  NODE-REV-ID     SIZE         DATE NAME
----------------------------------------------------------------------------
     1    sally <     e.0.1>       23 Nov 15 11:50 pi
     1    sally <     f.0.1>       24 Nov 15 11:50 rho
     1    sally <     g.0.1>       24 Nov 15 11:50 tau
<rev: 2 /A/D/G>$ cd ../../..
<rev: 2 />$ cat iota
This is the file 'iota'.
Added this text in revision 2.

<rev: 2 />$ setrev 1; cat iota
This is the file 'iota'.

<rev: 1 />$ exit
$
As you can see in the previous example, multiple commands may be specified at a single command prompt, separated by a semicolon. Also, the shell understands the notions of relative and absolute paths, and will properly handle the "." and ".." special path components.
The youngest command displays the youngest revision. This is useful for determining the range of valid revisions you can use as arguments to the setrev command—you are allowed to browse all the revisions (recalling that they are named with integers) between 0 and the youngest, inclusively. Determining the valid browsable transactions isn't quite as pretty. Use the lstxns command to list the transactions that you are able to browse. The list of browsable transactions is the same list that svnadmin lstxns returns, and the same list that is valid for use with svnlook's --transaction option.
Once you've finished using the shell, you can exit cleanly by using the exit command. Alternatively, you can supply an end-of-file character—Control-D (though some Win32 Python distributions use the Windows Control-Z convention instead).
Currently, the Subversion repository has only one database back-end: Berkeley DB. All of your filesystem's structure and data live in a set of tables within the db subdirectory of your repository. This subdirectory is a regular Berkeley DB environment directory, and can therefore be used in conjunction with any of Berkeley's database tools (you can see the documentation for these tools at Sleepycat's website, http://www.sleepycat.com/). For day-to-day Subversion use, these tools are unnecessary; however, they do provide some important functionality that is currently not provided by Subversion itself.
For example, because Subversion uses Berkeley DB's logging facilities, the database first writes out a description of any modifications it is about to make, and then makes the modification itself. This is to ensure that if something goes wrong, the database system can back up to a previous checkpoint—a location in the log files known not to be corrupt—and replay transactions until the data is restored to a usable state. This functionality is one of the main reasons why Berkeley DB was chosen as Subversion's initial database back-end.
Over time, these log files can accumulate. That is actually a feature of the database system—you should be able to recreate your entire database using nothing but the log files, so these files are important for catastrophic database recovery. But typically, you'll want to archive the log files that are no longer in use by Berkeley DB, and then remove them from disk to conserve space. Berkeley DB provides a db_archive utility for, among other things, listing the log files that are associated with a given database and which are no longer in use. That way, you know which files to archive and remove. The svnadmin utility provides a convenient wrapper around this Berkeley DB tool:
$ svnadmin archive /path/to/repos
/path/to/repos/log.0000000031
/path/to/repos/log.0000000032
/path/to/repos/log.0000000033
$ svnadmin archive /path/to/repos | xargs rm
## disk space reclaimed!
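The pipeline above deletes the unused logfiles outright. If you would rather archive them first, as suggested earlier, the step can be sketched in Python along the following lines. The function name `archive_logfiles` and the archive directory are hypothetical; the input is assumed to be the list of paths printed by the command above, one per line.

```python
import os
import shutil


def archive_logfiles(logfile_paths, archive_dir):
    """Move unused logfiles into archive_dir; return the new paths."""
    os.makedirs(archive_dir, exist_ok=True)
    moved = []
    for path in logfile_paths:
        path = path.strip()  # tolerate trailing newlines from a pipe
        if not path:
            continue
        target = os.path.join(archive_dir, os.path.basename(path))
        # Refuse to overwrite a logfile that was archived earlier.
        if os.path.exists(target):
            raise FileExistsError(target)
        shutil.move(path, target)
        moved.append(target)
    return moved
```

Moving rather than copying frees the space in the repository while keeping the files available for a catastrophic recovery.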
Subversion's own repository uses a post-commit hook script, which, after performing a “hot backup” of the repository, removes these excess logfiles. In the Subversion source tree, the script tools/backup/hot-backup.py illustrates the safe way to perform a backup of a Berkeley DB database environment while it's being actively accessed: recursively copy the entire repository directory, then re-copy the logfiles listed by db_archive -l.
Generally speaking, only the truly paranoid would need to back up their entire repository every time a commit occurred. However, assuming that a given repository has some other redundancy mechanism in place with relatively fine granularity (like per-commit emails), a hot backup of the database might be something that a repository administrator would want to include as part of a system-wide nightly backup. For most repositories, archived commit emails alone are sufficient restoration sources, at least for the last few commits. But it's your data; protect it as much as you'd like.
Berkeley DB also comes with a pair of utilities for converting the database tables to and from flat ASCII text files. The db_dump and db_load programs write and read, respectively, a custom file format which describes the keys and values in a Berkeley DB database. Since Berkeley databases are not portable across machine architectures, this format is a useful way to transfer those databases from machine to machine, irrespective of architecture or operating system.
Your Subversion repository will generally require very little attention once it is configured to your liking. However, there are times when some manual assistance from an administrator might be in order. The svnadmin utility provides some helpful functionality to assist you in performing such tasks as
modifying commit log messages,
removing dead transactions,
recovering “wedged” repositories, and
migrating repository contents to a different repository.
Perhaps the most commonly used of svnadmin's subcommands is setlog. When a transaction is committed to the repository and promoted to a revision, the descriptive log message associated with that new revision (and provided by the user) is stored as an unversioned property attached to the revision itself. In other words, the repository remembers only the latest value of the property, and discards previous ones.
Sometimes a user will have an error in her log message (a misspelling or some misinformation, perhaps). If the repository is configured (using the pre-revprop-change and post-revprop-change hooks; see the section called “Hook Scripts”) to accept changes to this log message after the commit is finished, then the user can “fix” her log message remotely using the svn program's propset command (see Chapter 8, Subversion Complete Reference). However, because of the potential to lose information forever, Subversion repositories are not, by default, configured to allow changes to unversioned properties, except by an administrator.
If a log message needs to be changed by an administrator, this can be done using svnadmin setlog. This command changes the log message (the svn:log property) on a given revision of a repository, reading the new value from a provided file.
$ echo "Here is the new, correct log message" > newlog.txt
$ svnadmin setlog myrepos newlog.txt -r 388
Another common use of svnadmin is to query the repository for outstanding (possibly dead) Subversion transactions. In the event that a commit should fail, the transaction is usually cleaned up. That is, the transaction itself is removed from the repository, and any data associated with (and only with) that transaction is removed as well. Occasionally, though, a failure occurs in such a way that the cleanup of the transaction never happens. This could happen for several reasons: perhaps the client operation was inelegantly terminated by the user, or a network failure occurred in the middle of an operation. Regardless of the reason, these dead transactions serve only to clutter the repository and consume resources.
You can use svnadmin's lstxns command to list the names of the currently outstanding transactions.
$ svnadmin lstxns myrepos
19
3a1
a45
$
Each item in the resultant output can then be used with svnlook (and its --transaction option) to determine who created the transaction, when it was created, what types of changes were made in the transaction—in other words, whether or not the transaction is a safe candidate for removal! If so, the transaction's name can be passed to svnadmin rmtxns, which will perform the cleanup of the transaction. In fact, the rmtxns subcommand can take its input directly from the output of lstxns!
$ svnadmin rmtxns myrepos `svnadmin lstxns myrepos`
$
If you use these two subcommands like this, you should consider making your repository temporarily inaccessible to clients. That way, no one can begin a legitimate transaction before you start your cleanup. The following is a little bit of shell-scripting that can quickly generate information about each outstanding transaction in your repository:
Example 5.2. txn-info.sh (Reporting Outstanding Transactions)
#!/bin/sh

### Generate informational output for all outstanding transactions in
### a Subversion repository.

SVNADMIN=/usr/local/bin/svnadmin
SVNLOOK=/usr/local/bin/svnlook

REPOS="${1}"
if [ "x$REPOS" = x ] ; then
  echo "usage: $0 REPOS_PATH"
  # Exit with a non-zero status so callers can detect the usage error.
  exit 1
fi

for TXN in `${SVNADMIN} lstxns "${REPOS}"`; do
  echo "---[ Transaction ${TXN} ]-------------------------------------------"
  ${SVNLOOK} info "${REPOS}" --transaction "${TXN}"
done
You can run the previous script using /path/to/txn-info.sh /path/to/repos. The output is basically a concatenation of several chunks of svnlook info output (see the section called “svnlook”), and will look something like:
$ txn-info.sh myrepos
---[ Transaction 19 ]-------------------------------------------
sally
2001-09-04 11:57:19 -0500 (Tue, 04 Sep 2001)
0
---[ Transaction 3a1 ]-------------------------------------------
harry
2001-09-10 16:50:30 -0500 (Mon, 10 Sep 2001)
39
Trying to commit over a faulty network.
---[ Transaction a45 ]-------------------------------------------
sally
2001-09-12 11:09:28 -0500 (Wed, 12 Sep 2001)
0
$
Usually, if you see a dead transaction that has no log message attached to it, this is the result of a failed update (or update-like) operation. These operations use Subversion transactions under the hood to mimic working copy state. Since they are never intended to be committed, Subversion doesn't require a log message for those transactions. Transactions that do have log messages attached are almost certainly failed commits of some sort. Also, a transaction's datestamp can provide interesting information—for example, how likely is it that an operation begun nine months ago is still active?
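These rules of thumb can be turned into a rough triage helper. The Python sketch below reads one chunk of svnlook info output for a transaction and reports the likely origin and whether the transaction looks stale. The function name `classify_transaction` and the 30-day threshold are hypothetical choices, and the result is a hint for a human, not a decision to remove anything.

```python
from datetime import datetime, timedelta


def classify_transaction(info_text, now, stale_after=timedelta(days=30)):
    """Guess what kind of operation left a transaction behind.

    `now` must be a timezone-aware datetime, because the datestamp in
    the info output carries a UTC offset.
    """
    author, date_line, length_line = info_text.split("\n", 3)[:3]
    # The first 25 characters look like "2001-09-04 11:57:19 -0500".
    started = datetime.strptime(date_line[:25], "%Y-%m-%d %H:%M:%S %z")
    # No log message suggests an update-like operation; a log message
    # suggests a failed commit (a heuristic, per the text above).
    kind = "commit" if int(length_line) > 0 else "update-like"
    return {
        "author": author,
        "kind": kind,
        "stale": now - started > stale_after,
    }
```

A transaction flagged as stale is still worth confirming with its author before it is removed.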
In short, transaction cleanup decisions need not be made unwisely. Various sources of information—including Apache's error and access logs, the logs of successful Subversion commits, and so on—can be employed in the decision-making process. Finally, an administrator can often simply communicate with a seemingly dead transaction's owner (via email, for example) to verify that the transaction is, in fact, in a zombie state.
In order to protect the data in your repository, the database back-end uses a locking mechanism. This mechanism ensures that portions of the database are not simultaneously modified by multiple database accessors, and that each process sees the data in the correct state when that data is being read from the database. When a process needs to change something in the database, it first checks for the existence of a lock on the target data. If the data is not locked, the process locks the data, makes the change it wants to make, and then unlocks the data. Other processes are forced to wait until that lock is removed before they are permitted to continue accessing that section of the database.
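As an analogy only, the lock-change-unlock cycle can be illustrated with threads inside a single Python process. This is not how Berkeley DB implements its locking, which works across separate processes; the class name `LockedTable` is hypothetical and exists just to show the pattern.

```python
import threading


class LockedTable:
    """A toy table whose accessors take a lock around every access."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}

    def update(self, key, change):
        # Wait for the lock, make the change, then release the lock.
        # The `with` block releases it even if `change` raises.
        with self._lock:
            self._rows[key] = change(self._rows.get(key, 0))

    def read(self, key):
        # Readers also wait, so they never see a half-made change.
        with self._lock:
            return self._rows.get(key)
```

If a holder could vanish without ever releasing the lock, every later call to update or read would wait forever.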
In the course of using your Subversion repository, fatal errors (such as running out of disk space or available memory) or interruptions can prevent a process from having the chance to remove the locks it has placed in the database. The result is that the back-end database system gets “wedged”. When this happens, any attempts to access the repository hang indefinitely (since each new accessor is waiting for a lock to go away—which isn't going to happen).
First, if this happens to your repository, don't panic. Subversion's filesystem takes advantage of database transactions and checkpoints and pre-write journaling to ensure that only the most catastrophic of events [10] can permanently destroy a database environment. A sufficiently paranoid repository administrator will be making off-site backups of the repository data in some fashion, but don't call your system administrator to restore a backup tape just yet.
Secondly, use the following recipe to attempt to “unwedge” your repository:
Make sure that there are no processes accessing (or attempting to access) the repository. For networked repositories, this means shutting down the Apache HTTP Server, too.
Become the user who owns and manages the repository.
Run the command svnadmin recover /path/to/repos. You should see output like this:
Acquiring exclusive lock on repository db, and running recovery procedures.
Please stand by...
Recovery completed.
The latest repos revision is 19.
Restart the Subversion server.
This procedure fixes almost every case of repository lock-up. Make sure that you run this command as the user that owns and manages the database, not just as root. Part of the recovery process might involve recreating from scratch various database files (shared memory regions, for example). Recovering as root will create those files such that they are owned by root, which means that even after you restore connectivity to your repository, regular users will be unable to access it.
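A wrapper script can guard against recovering as the wrong user. The following is a minimal sketch for Unix-like systems, assuming that ownership of the db subdirectory is a good enough indicator of who owns and manages the database; the function name `is_repository_owner` is hypothetical.

```python
import os


def is_repository_owner(repos_path):
    """Return True if the current user owns the repository's db directory.

    Unix-only: relies on os.geteuid() and numeric file ownership.
    """
    db_dir = os.path.join(repos_path, "db")
    return os.stat(db_dir).st_uid == os.geteuid()
```

A script might refuse to run the recovery command when this returns False, since files recreated during recovery would otherwise end up with the wrong owner.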
If the previous procedure, for some reason, does not successfully unwedge your repository, you should do two things. First, move your broken repository out of the way and restore your latest backup of it. Then, send an email to the Subversion developer list (at <dev@subversion.tigris.org>) describing your problem in detail. Data integrity is an extremely high priority to the Subversion developers.
A Subversion filesystem has its data spread throughout various database tables in a fashion generally understood by (and of interest to) only the Subversion developers themselves. However, circumstances may arise that call for all, or some subset, of that data to be collected into a single, portable, flat file format. Subversion provides such a mechanism, implemented in a pair of svnadmin subcommands: dump and load.
The most common reason to dump and load a Subversion repository is a change in Subversion itself. As Subversion matures, there are times when certain changes made to the back-end database schema cause Subversion to be incompatible with previous versions of the repository. The recommended course of action when you are upgrading across one of those compatibility boundaries is a relatively simple process:
Using your current version of svnadmin, dump your repositories to dump files.
Upgrade to the new version of Subversion.
Move your old repositories out of the way, and create new empty ones in their place using your new svnadmin.
Again using your new svnadmin, load your dump files into their respective, just-created repositories.
Finally, be sure to copy any customizations from your old repositories to the new ones, including DB_CONFIG files and hook scripts. You'll want to pay attention to the release notes for the new release of Subversion to see if any changes since your last upgrade affect those hooks or configuration options.
svnadmin dump will output a range of repository revisions that are formatted using Subversion's custom filesystem dump format. The dump format is printed to the standard output stream, while informative messages are printed to the standard error stream. This allows you to redirect the output stream to a file while watching the status output in your terminal window. For example:
$ svnlook youngest myrepos
26
$ svnadmin dump myrepos > dumpfile
* Dumped revision 0.
* Dumped revision 1.
* Dumped revision 2.
…
* Dumped revision 25.
* Dumped revision 26.
At the end of the process, you will have a single file (dumpfile in the previous example) that contains all the data stored in your repository in the requested range of revisions.
The other subcommand in the pair, svnadmin load, parses the standard input stream as a Subversion repository dump file, and effectively replays those dumped revisions into the target repository for that operation. It also gives informative feedback, this time using the standard output stream:
$ svnadmin load newrepos < dumpfile
<<< Started new txn, based on original revision 1
     * adding path : A ... done.
     * adding path : A/B ... done.
     …
------- Committed new rev 1 (loaded from original rev 1) >>>

<<< Started new txn, based on original revision 2
     * editing path : A/mu ... done.
     * editing path : A/D/G/rho ... done.

------- Committed new rev 2 (loaded from original rev 2) >>>

…

<<< Started new txn, based on original revision 25
     * editing path : A/D/gamma ... done.

------- Committed new rev 25 (loaded from original rev 25) >>>

<<< Started new txn, based on original revision 26
     * adding path : A/Z/zeta ... done.
     * editing path : A/mu ... done.

------- Committed new rev 26 (loaded from original rev 26) >>>
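Since each committed revision is reported on a line of its own, a script can record how original revision numbers map to new ones. The Python sketch below assumes the feedback lines look exactly like those in the listing above; the function name `revision_map` is hypothetical.

```python
import re

# Matches lines such as:
#   ------- Committed new rev 2 (loaded from original rev 2) >>>
_COMMITTED = re.compile(
    r"Committed new rev (\d+) \(loaded from original rev (\d+)\)")


def revision_map(load_output):
    """Map each original revision number to its new revision number."""
    return {int(old): int(new)
            for new, old in _COMMITTED.findall(load_output)}
```

Such a map is mostly of interest when the target repository already contains revisions, in which case the new numbers differ from the original ones.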
Note that because svnadmin uses standard input and output streams for the repository dump and load process, people who are feeling especially saucy can try things like this (perhaps even using different versions of svnadmin on each side of the pipe):
$ svnadmin create newrepos
$ svnadmin dump myrepos | svnadmin load newrepos
We mentioned previously that svnadmin dump outputs a range of revisions. Use the --revision option to specify a single revision to dump, or a range of revisions. If you omit this option, all the existing repository revisions will be dumped.
$ svnadmin dump myrepos --revision 23 > rev-23.dumpfile
$ svnadmin dump myrepos --revision 100:200 > revs-100-200.dumpfile
As Subversion dumps each new revision, it outputs only enough information to allow a future loader to re-create that revision based on the previous one. In other words, for any given revision in the dump file, only the items that were changed in that revision will appear in the dump. The only exception to this rule is the first revision that is dumped with the current svnadmin dump command.
By default, Subversion will not express the first dumped revision as merely differences to be applied to the previous revision. For one thing, there is no previous revision in the dump file! And secondly, Subversion cannot know the state of the repository into which the dump data will be loaded (if it ever, in fact, occurs). To ensure that the output of each execution of svnadmin dump is self-sufficient, the first dumped revision is by default a full representation of every directory, file, and property in that revision of the repository.
However, you can change this default behavior. If you add the --incremental option when you dump your repository, svnadmin will compare the first dumped revision against the previous revision in the repository, the same way it treats every other revision that gets dumped. It will then output the first revision exactly as it does the rest of the revisions in the dump range—mentioning only the changes that occurred in that revision. The benefit of this is that you can create several small dump files that can be loaded in succession, instead of one large one, like so:
$ svnadmin dump myrepos --revision 0:1000 > dumpfile1
$ svnadmin dump myrepos --revision 1001:2000 --incremental > dumpfile2
$ svnadmin dump myrepos --revision 2001:3000 --incremental > dumpfile3
These dump files could be loaded into a new repository with the following command sequence:
$ svnadmin load newrepos < dumpfile1
$ svnadmin load newrepos < dumpfile2
$ svnadmin load newrepos < dumpfile3
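For a repository with many revisions, the list of chunked dump commands can be generated rather than typed. The Python sketch below follows the pattern of the example (a full first chunk, then incremental chunks) and uses the --revision option described earlier. The function name `dump_plan` and the dumpfile naming scheme are hypothetical.

```python
def dump_plan(repos_path, youngest, chunk_size):
    """Return (argv, output_filename) pairs for a chunked dump.

    Chunks end on multiples of chunk_size, so a chunk size of 1000
    yields the ranges 0:1000, 1001:2000, 2001:3000, and so on.
    """
    plan = []
    start = 0
    while start <= youngest:
        end = min((start // chunk_size + 1) * chunk_size, youngest)
        argv = ["svnadmin", "dump", repos_path,
                "--revision", "%d:%d" % (start, end)]
        # Only the first chunk is a full representation; the others
        # record differences against the preceding revision.
        if start > 0:
            argv.append("--incremental")
        plan.append((argv, "dumpfile%d" % (len(plan) + 1)))
        start = end + 1
    return plan
```

Each command's standard output would be redirected to the paired filename, and the files loaded in the same order.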
Another neat trick you can perform with this --incremental option involves appending to an existing dump file a new range of dumped revisions. For example, you might have a post-commit hook that simply appends the repository dump of the single revision that triggered the hook. Or you might have a script like the following that runs nightly to append dump file data for all the revisions that were added to the repository since the last time the script ran.
Example 5.3. Using Incremental Repository Dumps
#!/usr/bin/perl

$repos_path  = '/path/to/repos';
$dumpfile    = '/usr/backup/svn-dumpfile';
$last_dumped = '/var/log/svn-last-dumped';

# Figure out the starting revision (0 if we cannot read the last-dumped file,
# else use the revision in that file incremented by 1).
if (open LASTDUMPED, "$last_dumped")
{
    $new_start = <LASTDUMPED>;
    chomp $new_start;
    $new_start++;
    close LASTDUMPED;
}
else
{
    $new_start = 0;
}

# Query the youngest revision in the repos.
$youngest = `svnlook youngest $repos_path`;
chomp $youngest;

# Nothing to do if no revisions were added since the last run.
exit 0 if $new_start > $youngest;

# Do the backup.
`svnadmin dump $repos_path --revision ${new_start}:${youngest} --incremental >> $dumpfile`;

# Store a new last-dumped revision
open LASTDUMPED, "> $last_dumped" or die;
print LASTDUMPED "$youngest\n";
close LASTDUMPED;

# All done!
Used like this, svnadmin's dump and load commands can be a valuable means by which to back up changes to your repository over time in case of a system crash or some other catastrophic event.
Finally, another possible use of the Subversion repository dump file format is conversion from a different storage mechanism or version control system altogether. Because the dump file format is, for the most part, human-readable, [11] it should be relatively easy to describe generic sets of changes—each of which should be treated as a new revision—using this file format.
Despite numerous advances in technology since the birth of the modern computer, one thing unfortunately rings true with crystalline clarity: sometimes, things go very, very awry. Power outages, network connectivity dropouts, corrupt RAM, and crashed hard drives are but a taste of the evil that Fate is poised to unleash on even the most conscientious administrator. And so we arrive at a very important topic: how to make backup copies of your repository data.
There are generally two types of backup methods available for Subversion repository administrators: incremental and full. We discussed in an earlier section of this chapter how to use svnadmin dump --incremental to perform an incremental backup (see the section called “Migrating a Repository”). Essentially, the idea is to back up, at a given time, only the changes made to the repository since the last time you made a backup.
A full backup of the repository is quite literally a duplication of the entire repository directory (which includes the Berkeley database environment). Now, unless you temporarily disable all other access to your repository, simply doing a recursive directory copy runs the risk of generating a defunct backup, since someone might be currently writing to the database.
Fortunately, Sleepycat's Berkeley DB documents describe a certain order in which database files can be copied that will guarantee a valid backup copy. And better still, you don't have to implement that algorithm yourself, because the Subversion development team has already done so. The hot-backup.py script is found in the tools/backup/ directory of the Subversion source distribution. Given a repository path and a backup location, hot-backup.py will perform the necessary steps for backing up your live repository, without requiring that you bar public repository access at all, and then will clean out the dead Berkeley logfiles from your live repository.
Even if you also have an incremental backup, you might want to run this program on a regular basis. For example, you might consider adding hot-backup.py to a program scheduler (such as crond on Unix systems). Or, if you prefer fine-grained backup solutions, you could have your post-commit hook script call hot-backup.py (see the section called “Hook Scripts”), which will then cause a new backup of your repository to occur with every new revision created. Simply add the following to the hooks/post-commit script in your live repository directory:
(cd /path/to/hook/scripts; ./hot-backup.py ${REPOS} /path/to/backups &)
The resulting backup is a fully functional Subversion repository, able to be dropped in as a replacement for your live repository should something go horribly wrong.
There are benefits to both types of backup methods. The easiest is by far the full backup, which will always result in a perfect working replica of your repository. This again means that should something bad happen to your live repository, you can restore from the backup with a simple recursive directory copy. Unfortunately, if you are maintaining multiple backups of your repository, these full copies will each eat up just as much disk space as your live repository.
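One way to keep that disk usage bounded is to retain only the most recent few full backups. The Python sketch below picks which backup directories to delete, assuming (hypothetically) that each backup is a directory named `repos-REV`, where REV is the youngest revision at backup time; check the names your copy of hot-backup.py actually produces before adapting it. The function name and its `prefix` and `keep` parameters are illustrative and not part of any Subversion tool.

```python
import re


def backups_to_prune(names, prefix="repos", keep=3):
    """Return the backup names to delete, oldest first, keeping the `keep` newest.

    Assumed naming scheme: "<prefix>-<revision>", for example "repos-42".
    """
    pattern = re.compile(r"^%s-(\d+)$" % re.escape(prefix))
    found = []
    for name in names:
        match = pattern.match(name)
        if match:  # ignore anything that does not look like a backup
            found.append((int(match.group(1)), name))
    found.sort()  # ascending by revision number
    if keep <= 0:
        return [name for _, name in found]
    return [name for _, name in found[:-keep]]
```

In practice the input might come from `os.listdir()` on the backup location, with each returned name removed via `shutil.rmtree()` once you are satisfied the newer backups are sound.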
Incremental backups using the repository dump format are excellent to have on hand if the database schema changes between successive versions of Subversion itself. Since a full repository dump and load are generally required to upgrade your repository to the new schema, it's very convenient to already have half of that process (the dump part) finished. Unfortunately, the creation of—and restoration from—incremental backups takes longer, as each commit is effectively replayed into either the dumpfile or the repository.
In either backup scenario, repository administrators need to be aware of how modifications to unversioned revision properties affect their backups. Since these changes do not themselves generate new revisions, they will not trigger post-commit hooks, and may not even trigger the pre-revprop-change and post-revprop-change hooks. [12] And since you can change revision properties without respect to chronological order—you can change any revision's properties at any time—an incremental backup of the latest few revisions might not catch a property modification to a revision that was included as part of a previous backup.
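One possible way to notice such out-of-order property edits is to fingerprint every revision at backup time and compare the result against the fingerprints saved by the previous run. The Python sketch below hashes the output of `svnlook info`, which reports a revision's author, date, and log message; changes to any other revision properties would go unnoticed. The function names and the dictionary-of-fingerprints layout are assumptions made for illustration.

```python
import hashlib
import subprocess


def revprop_fingerprint(repos_path, rev):
    """Hash the author/date/log text that `svnlook info` prints for one revision."""
    result = subprocess.run(
        ["svnlook", "info", repos_path, "--revision", str(rev)],
        check=True,
        stdout=subprocess.PIPE,
    )
    return hashlib.sha1(result.stdout).hexdigest()


def changed_revisions(previous, current):
    """Return revisions present in both snapshots whose fingerprints differ.

    Both arguments map revision numbers to fingerprint strings.
    """
    return sorted(
        rev for rev in previous
        if rev in current and previous[rev] != current[rev]
    )
```

Any revision reported by `changed_revisions` was already covered by an earlier incremental backup, so its properties would need to be captured again separately.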
Often, the best approach to repository backups is a diversified one. You can leverage combinations of full and incremental backups, plus archives of commit emails. The Subversion developers, for example, back up the Subversion source code repository after every new revision is created, and keep an archive of all the commit and property change notification emails. Your solution might be similar, but should be tailored to your needs and that delicate balance of convenience with paranoia. And while all of this might not save your hardware from the iron fist of Fate, [13] it should certainly help you recover from those trying times.
[10] e.g.: hard drive + huge electromagnet = disaster
[11] The Subversion repository dump file format resembles an RFC-822 format, the same type of format used for most email.
[12] svnadmin setlog, for example, bypasses the hook interface altogether.
[13] You know—the collective term for all of her “fickle fingers”.