A Subversion repository can be accessed simultaneously by clients running on the same machine on which the repository resides. But the typical Subversion setup involves a single server machine being accessed from clients on computers all over the office—or, perhaps, all over the world.
This section describes how to get your Subversion repository exposed outside its host machine for use by remote clients. We will cover each of Subversion's currently available server mechanisms, discussing the configuration and use of each one. After reading this section, you should be able to decide which networking setup is right for your needs, and understand how to enable such a setup on your host computer.
Subversion's primary network server is the Apache HTTP Server (httpd), speaking the WebDAV/deltaV protocol. This protocol (an extension to HTTP 1.1; see http://www.webdav.org/) takes the ubiquitous HTTP protocol that is the core of the World Wide Web and adds writing capabilities (specifically, versioned writing). The result is a standardized, robust system that is conveniently packaged as part of the Apache 2.0 software, is supported by numerous operating systems and third-party products, and doesn't require network administrators to open up yet another custom port. [14]
Much of the following discussion includes references to Apache configuration directives. While some examples are given of the use of these directives, describing them in full is outside the scope of this chapter. The Apache team maintains excellent documentation, publicly available on their website at http://httpd.apache.org. For example, a general reference for the configuration directives is located at http://httpd.apache.org/docs-2.0/mod/directives.html.
Also, as you make changes to your Apache setup, it is likely that somewhere along the way a mistake will be made. If you are not already familiar with Apache's logging subsystem, you should become aware of it. In your httpd.conf file are directives which specify the on-disk locations of the access and error logs generated by Apache (the CustomLog and ErrorLog directives, respectively). Subversion's mod_dav_svn uses Apache's error logging interface as well. You can always browse the contents of those files for information that might reveal the source of a problem which is not clearly noticeable otherwise.
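As a quick way to locate those log files, the following sketch prints the ErrorLog and CustomLog directives from a configuration file. The function name show_log_directives and the default path are assumptions for illustration only; pass the path to your own httpd.conf as the first argument.

```shell
# Print the ErrorLog and CustomLog directives found in an Apache config file.
# The default path is an assumption; commented-out directives are skipped.
show_log_directives() {
    conf="${1:-/usr/local/apache2/conf/httpd.conf}"
    grep -E '^[[:space:]]*(ErrorLog|CustomLog)[[:space:]]' "$conf"
}
```

The paths it prints may be relative, in which case Apache resolves them against its install location.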
To network your repository over HTTP, you basically need four components, available in two packages. You'll need Apache httpd 2.0, the mod_dav DAV module that comes with it, Subversion, and the mod_dav_svn filesystem provider module distributed with Subversion. Once you have all of those components, the process of networking your repository is as simple as:
getting httpd 2.0 up and running with the mod_dav module,
installing the mod_dav_svn plugin to mod_dav, which uses Subversion's libraries to access the repository, and
configuring your httpd.conf file to export (or expose) the repository.
You can accomplish the first two items either by compiling httpd and Subversion from source code, or by installing pre-built binary packages of them on your system. For the most up-to-date information on how to compile Subversion for use with Apache HTTP Server, as well as how to compile and configure Apache itself for this purpose, see the INSTALL file in the top level of the Subversion source code tree.
Once you have all the necessary components installed on your system, all that remains is the configuration of Apache via its httpd.conf file. Instruct Apache to load the mod_dav_svn module using the LoadModule directive. This directive must precede any other Subversion-related configuration items. If your Apache was installed using the default layout, your mod_dav_svn module should have been installed in the modules subdirectory of the Apache install location (often /usr/local/apache2). The LoadModule directive has a simple syntax, mapping a named module to the location of a shared library on disk:
LoadModule dav_svn_module modules/mod_dav_svn.so
Note that if mod_dav was compiled as a shared object (instead of being linked statically into the httpd binary), you'll need a similar LoadModule statement for it, too.
At a later location in your configuration file, you now need to tell Apache where you keep your Subversion repository (or repositories). The Location directive has an XML-like notation, starting with an opening tag, and ending with a closing tag, with various other configuration directives in the middle. The purpose of the Location directive is to instruct Apache to do something special when handling requests that are directed at a given URL or one of its children. In the case of Subversion, you want Apache to simply hand off support for URLs that point at versioned resources to the DAV layer. You can instruct Apache to delegate the handling of all URLs whose path portions (the part of the URL that follows the server's name and the optional port number) begin with /repos/ to a DAV provider whose repository is located at /absolute/path/to/repository using the following httpd.conf syntax:
<Location /repos>
  DAV svn
  SVNPath /absolute/path/to/repository
</Location>
If you plan to support multiple Subversion repositories that will reside in the same parent directory on your local disk, you can use an alternative directive, the SVNParentPath directive, to indicate that common parent directory. For example, if you know you will be creating multiple Subversion repositories in a directory /usr/local/svn that would be accessed via URLs like http://my.server.com/svn/repos1, http://my.server.com/svn/repos2, and so on, you could use the httpd.conf configuration syntax in the following example:
<Location /svn>
  DAV svn
  SVNParentPath /usr/local/svn
</Location>
Using the previous syntax, Apache will delegate the handling of all URLs whose path portions begin with /svn/ to the Subversion DAV provider, which will then assume that any items in the directory specified by the SVNParentPath directive are actually Subversion repositories. This is a particularly convenient syntax in that, unlike the use of the SVNPath directive, you don't have to restart Apache in order to create and network new repositories.
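To see which URLs such a setup would expose, a minimal sketch like the following lists one URL per subdirectory of the parent directory. The function name list_repo_urls is hypothetical, and the sketch does not verify that each subdirectory really is a Subversion repository.

```shell
# List the URL that each subdirectory of an SVNParentPath directory would map to.
# $1: directory named by SVNParentPath; $2: URL corresponding to the Location path.
list_repo_urls() {
    parent="$1"
    base_url="$2"
    for dir in "$parent"/*/; do
        [ -d "$dir" ] || continue          # skip the unexpanded pattern when empty
        printf '%s/%s\n' "$base_url" "$(basename "$dir")"
    done
}
```

For example, list_repo_urls /usr/local/svn http://my.server.com/svn would print one line for each repository directory created there.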
At this stage, you should strongly consider the question of permissions. If you've been running Apache for some time now as your regular web server, you probably already have a collection of content—web pages, scripts and such. These items have already been configured with a set of permissions that allows them to work with Apache, or more appropriately, that allows Apache to work with those files. Apache, when used as a Subversion server, will also need the correct permissions to read and write to your Subversion repository.
You will need to determine a permission system setup that satisfies Subversion's requirements without messing up any previously existing web page or script installations. This might mean changing the permissions on your Subversion repository to match those in use by the other content that Apache serves for you, or it could mean using the User and Group directives in httpd.conf to specify that Apache should run as the user and group that owns your Subversion repository. There is no single correct way to set up your permissions, and each administrator will have different reasons for doing things a certain way. Just be aware that permission-related problems are perhaps the most common oversight when configuring a Subversion repository for use with Apache.
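One simple sanity check, sketched below, is to test whether a given user can read and write the repository directory. The function name check_repo_access is hypothetical; the check is only meaningful when run as the user Apache runs as, and it looks only at the top-level directory, not at every file beneath it.

```shell
# Report whether the current user can read and write a repository directory.
# Only the directory itself is tested; files inside it are not examined.
check_repo_access() {
    repo="$1"
    if [ -d "$repo" ] && [ -r "$repo" ] && [ -w "$repo" ]; then
        echo "ok: $repo"
    else
        echo "no access: $repo"
        return 1
    fi
}
```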
And while we are speaking about permissions, we should address how the authorization and authentication mechanisms provided by Apache fit into the scheme of things. Unless you have some system-wide configuration of these things, the Subversion repositories you make available via the Location directives will be generally accessible to everyone. In other words,
anyone can use their Subversion client to check out a working copy of a repository URL (or any of its subdirectories),
anyone can interactively browse the repository's latest revision simply by pointing their web browser to the repository URL, and
anyone can commit to the repository.
If you want to restrict either read or write access to a repository as a whole, you can use Apache's built-in access control features. The easiest such feature is the Basic authentication mechanism, which simply uses a username and password to verify that a user is who she says she is. Apache provides an htpasswd utility for managing the list of usernames and passwords of the people to whom you wish to grant special access to your Subversion repository. Let's grant commit access to Sally and Harry. First, we need to add them to the password file.
$ ### First time: use -c to create the file
$ htpasswd -c /etc/svn-auth-file harry
New password: *****
Re-type new password: *****
Adding password for user harry
$ htpasswd /etc/svn-auth-file sally
New password: *******
Re-type new password: *******
Adding password for user sally
$
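The password file stores one username:hashed-password entry per line, so a small sketch like the following can confirm that a user was added. The function name has_user is hypothetical and is not part of htpasswd.

```shell
# Succeed if the given username has an entry in an htpasswd-style file.
# Each line of the file has the form "username:hashed-password".
has_user() {
    file="$1"
    user="$2"
    cut -d: -f1 "$file" | grep -Fxq "$user"
}
```

For example, has_user /etc/svn-auth-file sally succeeds once the second htpasswd command above has been run.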
Next, you need to add some more httpd.conf directives inside your Location block to tell Apache what to do with your new password file. The AuthType directive specifies the type of authentication system to use. In this case, we want to specify the Basic authentication system. AuthName is an arbitrary name that you give for the authentication domain. Most browsers will display this name in the pop-up dialog box when the browser is querying the user for his name and password. Finally, use the AuthUserFile directive to specify the location of the password file you created using htpasswd.
After adding these three directives, your <Location> block should look something like this:
<Location /svn>
  DAV svn
  SVNParentPath /usr/local/svn
  AuthType Basic
  AuthName "Subversion repository"
  AuthUserFile /path/to/users/file
</Location>
Now, at this stage, if you were to restart Apache, any Subversion operations which required authentication would harvest a username and password from the Subversion client, which would either provide previously cached values for these things, or prompt the user for the information. All that remains is to tell Apache which operations actually require that authentication.
You can restrict access on all repository operations by adding the Require valid-user directive to your <Location> block. Using our previous example, this would mean that only clients that claimed to be either harry or sally, and which provided the correct password for their respective username, would be allowed to do anything with the Subversion repository.
Sometimes you don't need to run such a tight ship. The repository at http://svn.collab.net/repos/svn which holds the Subversion source code, for example, allows anyone in the world to perform read-only repository tasks (like checking out working copies and browsing the repository with a web browser), but restricts all write operations to authenticated users. To do this type of selective restriction, you can use the Limit and LimitExcept configuration directives. Like the Location directive, these blocks have starting and ending tags, and you would nest them inside your <Location> block.
The parameters present on the Limit and LimitExcept directives are HTTP request types that are affected by that block. For example, if you wanted to disallow all access to your repository except the currently supported read-only operations, you would use the LimitExcept directive, passing the GET, PROPFIND, OPTIONS, and REPORT request type parameters. Then the previously mentioned Require valid-user directive would be placed inside the <LimitExcept> block instead of just inside the <Location> block.
<Location /svn>
  DAV svn
  SVNParentPath /usr/local/svn
  AuthType Basic
  AuthName "Subversion repository"
  AuthUserFile /path/to/users/file
  <LimitExcept GET PROPFIND OPTIONS REPORT>
    Require valid-user
  </LimitExcept>
</Location>
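The effect of this LimitExcept block can be summarized in a few lines of shell. This is only an illustration of the rule, not how Apache evaluates it, and the function name needs_auth is hypothetical.

```shell
# Succeed (exit 0) if a request method would require a valid user under the
# LimitExcept block above; fail (exit 1) for the exempted read-only methods.
needs_auth() {
    case "$1" in
        GET|PROPFIND|OPTIONS|REPORT) return 1 ;;  # listed in LimitExcept
        *) return 0 ;;                            # every other method
    esac
}
```

Any method not named on the LimitExcept line, including the COPY requests used for server-side copies, falls into the authenticated branch.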
These are only a few simple examples. For more in-depth information about Apache access control, take a look at the Security section of the Apache documentation's tutorials collection at http://httpd.apache.org/docs-2.0/misc/tutorials.html.
Subversion makes use of the COPY request type to perform server-side copies of files and directories. As part of the sanity checking done by the Apache modules, the source of the copy is expected to be located on the same machine as the destination of the copy. To satisfy this requirement, you might need to tell mod_dav the name you use as the hostname of your server. Generally, you can use the ServerName directive in httpd.conf to accomplish this.
ServerName svn.red-bean.com
If you are using Apache's virtual hosting support via the NameVirtualHost directive, you may need to use the ServerAlias directive to specify additional names that your server is known by. Again, refer to the Apache documentation for full details.
One of the most useful benefits of an Apache/WebDAV configuration for your Subversion repository is that the youngest revisions of your versioned files and directories are immediately available for viewing via a regular web browser. Since Subversion uses URLs to identify versioned resources, those URLs used for HTTP-based repository access can be typed directly into a Web browser. Your browser will issue a GET request for that URL, and based on whether that URL represents a versioned directory or file, mod_dav_svn will respond with a directory listing or with file contents.
Since the URLs do not contain any information about which version of the resource you wish to see, mod_dav_svn will always answer with the youngest version. This functionality has the wonderful side-effect that you can pass around Subversion URLs to your peers as references to documents, and those URLs will always point at the latest manifestation of that document. Of course, you can even use the URLs as hyperlinks from other web sites, too.
You generally will get more use out of URLs to versioned files—after all, that's where the interesting content tends to lie. But you might have occasion to browse a Subversion directory listing, where you'll quickly note that the generated HTML used to display that listing is very basic, and certainly not intended to be aesthetically pleasing (or even interesting). To enable customization of these directory displays, Subversion provides an XML index feature. A single SVNIndexXSLT directive in your repository's Location block of httpd.conf will instruct mod_dav_svn to generate XML output when displaying a directory listing, and to reference the XSLT stylesheet of your choice:
<Location /svn>
  DAV svn
  SVNParentPath /usr/local/svn
  SVNIndexXSLT "/svnindex.xsl"
  …
</Location>
Using the SVNIndexXSLT directive and a creative XSLT stylesheet, you can make your directory listings match the color schemes and imagery used in other parts of your website. Or, if you'd prefer, you can use the sample stylesheets provided in the Subversion source distribution's tools/xslt/ directory. Keep in mind that the path provided to the SVNIndexXSLT directive is actually a URL path: browsers need to be able to read your stylesheets in order to make use of them!
Several of the features already provided by Apache in its role as a robust Web server can be leveraged for increased functionality or security in Subversion as well. Subversion communicates with Apache using Neon, which is a generic HTTP/WebDAV library with support for such mechanisms as SSL (the Secure Sockets Layer) and Deflate compression (the same algorithm used by the gzip and PKZIP programs to “shrink” files into smaller chunks of data). You need only to compile support for the features you desire into Subversion and Apache, and properly configure the programs to use those features.
This means that SSL-enabled Subversion clients can access SSL-enabled Apache servers and perform all communication using an encrypted protocol, all by using https: URLs with their Subversion clients instead of http: ones. Businesses that need to expose their repositories for access outside the company firewall should be conscious of the possibility that unauthorized parties could be “sniffing” their network traffic. SSL makes that kind of unwanted attention less likely to result in sensitive data leaks. Apache can be configured such that only SSL-enabled Subversion clients can communicate with the repository.
Deflate compression places a small burden on the client and server to compress and decompress network transmissions as a way to minimize the size of the actual transmission. In cases where network bandwidth is in short supply, this kind of compression can greatly increase the speed of communication between server and client. In extreme cases, this minimized network transmission could be the difference between an operation timing out and completing successfully.
Less interesting, but equally useful, are other features of the Apache and Subversion relationship, such as the ability to specify a custom port (instead of the default HTTP port 80) or a virtual domain name by which the Subversion repository should be accessed, or the ability to access the repository through a proxy. These things are all supported by Neon, so Subversion gets that support for free.
As an alternative to Apache, Subversion also provides a stand-alone server program, svnserve. This program is considerably more lightweight than Apache, and much easier to configure. It speaks a custom protocol with Subversion clients over an ordinary TCP/IP connection.
There are two basic ways that svnserve can be used:
In the first scenario, an svnserve process runs on the server as a daemon, listening for incoming connections. The svn client connects using a custom svn:// URL scheme. The client connection is accepted unconditionally, and the repository is accessed with no authenticated username. Most often, administrators configure the daemon to allow read-only operations.
In the second scenario, the svn client uses a custom svn+ssh:// URL scheme; this initiates a local Secure Shell (SSH) process which connects to the server and authenticates itself. The user must have some sort of system account on the server for this to happen. After authentication is complete, the SSH process launches a temporary, private svnserve process on the server, running as the authenticated user. The server and client communicate over the encrypted SSH tunnel.
Note that these methods of svnserve usage aren't mutually exclusive; you can easily use both techniques on your server at the same time.
When run with no options, svnserve writes data to stdout and reads data from stdin, attempting to negotiate a session with an svn client:
$ svnserve
( success ( 1 1 ( ANONYMOUS ) ( ) ) )
This isn't immediately useful to anyone; svnserve behaves this way so it can be run by the inetd daemon. But there are a couple of different ways to run svnserve as a daemon.
One option is to register an svn service with the server machine's inetd. Then, when a client attempts to connect to that machine on port 3690, [15] inetd will launch a “one-off” svnserve process to handle that client's request.
When configuring this type of setup, keep in mind that you might not want to launch the svnserve process as the user root (or as any other user with unlimited permissions). Depending on the ownership and permissions of the repositories you're exporting, a different—perhaps custom—user might be more appropriate. For example, you might wish to create a new user named svn, grant that user exclusive rights to the Subversion repositories, and configure your svnserve processes to run as that user.
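As a rough sketch of what such a registration might look like, the function below prints one line for /etc/services and one for /etc/inetd.conf. The function name print_inetd_setup, the user name svn, and the binary path are assumptions for illustration; the exact file layout differs between systems (xinetd, for instance, uses a different format), so check your platform's documentation before copying these lines.

```shell
# Print example entries for launching svnserve from inetd.
# $1: user to run svnserve as (assumed default: svn)
# $2: path to the svnserve binary (assumed default: /usr/local/bin/svnserve)
print_inetd_setup() {
    user="${1:-svn}"
    binary="${2:-/usr/local/bin/svnserve}"
    echo "# /etc/services"
    echo "svn 3690/tcp"
    echo "# /etc/inetd.conf"
    echo "svn stream tcp nowait $user $binary svnserve"
}
```

No options are passed to svnserve here because, as described above, it reads from stdin and writes to stdout when run without options.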
Of course, this first method is only available on machines which have an inetd (or inetd-like) daemon. This will generally be limited to Unix platforms. The alternative is to run svnserve as a standalone daemon. When started with the -d option, svnserve will immediately detach from the current shell process, and will execute as a background process which runs indefinitely, again waiting for incoming requests on port 3690.
$ svnserve -d
$ # svnserve is still running, but the user is returned to the prompt
When a client makes a network connection to the svnserve process (running either as a daemon, or as a “one-off” handler), no authentication takes place. The server process accesses the repository as whatever user it's running as, and if the client performs a commit, the new revision has no svn:author property at all.
Once the svnserve program is running, it makes every repository on your system available to the network. In other words, if a client tries to check out svn://example.com/usr/local/repos/project, an svnserve process running on example.com would look for a repository at the absolute path /usr/local/repos/project. To increase security, you can pass the -r option to svnserve, which restricts it to exporting only repositories below that path:
$ svnserve -d -r /usr/local
…
Using the -r option effectively modifies the location that the program treats as the root of the remote filesystem space. Clients then use URLs that have that path portion removed from them, leaving much shorter (and much less revealing) URLs:
$ svn checkout svn://example.com/repos/project
…
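The relationship between the -r root and the URL path can be sketched as a simple concatenation. The function name resolve_repo_path is hypothetical, and this illustrates only the mapping described above, not svnserve's actual lookup code.

```shell
# Combine the directory given to -r with the path portion of an svn:// URL
# to get the filesystem path where the repository is expected.
resolve_repo_path() {
    root="${1%/}"     # -r value with any trailing slash removed
    url_path="$2"     # path portion of the URL, starting with /
    printf '%s%s\n' "$root" "$url_path"
}
```

With the values from the examples above, resolve_repo_path /usr/local /repos/project prints /usr/local/repos/project, the same location a client would otherwise have had to spell out in full.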
Finally, to disable write access to your repositories, start svnserve with the -R option. This will permit only read operations on the data in your repository.
It is generally interesting to know which user (of a potentially unbounded number of users) is responsible for a set of changes. It's even more important to be able to limit repository write access to a select group of people. [16] To accomplish both of these goals, clients using libsvn_ra_svn can tunnel their network sessions over SSH.
In this scenario, each time the client attempts to contact the Subversion repository, a new svnserve process is started on the server machine by a local SSH process. There's no need to run a standalone svnserve daemon—the client's authenticated SSH connection launches a private svnserve process on the server machine, running as the authenticated user. (This is, in fact, the same way that CVS works when using its :ext: access method and SSH.) SSH itself requires an authenticated user, so there is no anonymity. And since the access is not anonymous, the authenticated username is stored as the author of any repository changes.
A Subversion client can perform SSH tunneling by using the svn+ssh:// scheme, and specifying the absolute path to the repository, like so:
$ svn checkout svn+ssh://example.com/usr/local/repos/project
Password:
…
By default, the svn client will attempt to launch a local program named ssh, found somewhere in the user's $PATH. The name of the SSH program can be overridden, however, in one of two ways. You can either set the SVN_SSH environment variable to the new name, or you can set the value of the ssh variable within the [tunnels] section of your client's run-time config file.
For example, the ssh process will use your current username when attempting to authenticate with the server machine. You may want ssh to use a different username instead. To do this, you can either run the command
$ export SVN_SSH="ssh -l username"
… or you can edit your run-time config to contain
[tunnels]
ssh = ssh -l username
For more information about changing the run-time config file, see the section called “Config”.
When choosing between Apache HTTP Server and the custom svnserve program, there is no single correct decision. Depending on your particular requirements, one of the available solutions might seem more right for you. And in fact, these servers can run in parallel, each accessing your repositories in its own way, and each without hindering the other. To recap the previous two sections, here's a brief list of the highlights of the two available Subversion servers—choose whatever works best for you and your users.
Apache HTTP Server (with mod_dav_svn):
Authentication: HTTP Basic/Digest auth, certificates, LDAP, etc. Apache has a large collection of authentication modules. No need to create real system accounts for users.
Authorization: read/write access can be restricted per-repository (using httpd.conf directives), or per-directory (using mod_authz_svn).
No need to open a new firewall port.
Built-in (limited) repository web browsing.
Limited interoperability with other WebDAV clients.
Ability to use caching HTTP proxies.
Time-tested scalability—Apache is the web server of choice for countless large-volume corporate websites. It takes a lickin' and keeps traffickin'.
Vastly expandable: access to numerous existing authentication and authorization methods, encryption, compression, etc.
svnserve:
Authentication: only via SSH tunnel. Users require system accounts.
Authorization: via shared ownership/permissions on repository DB files.
Significantly more lightweight than Apache, and faster for most operations.
Much easier to configure and run than Apache.
Can use existing SSH security infrastructure.