The recent trouble at GitHub, both cultural changes within the company and criticism from the community, reminded me how unstable the whole "free as in beer code hosting for the public good" idea really is. The good part is that it motivated me to finally look into setting up personal hosting for my own projects, because how hard can it be, really?

History shows: code hosting is unreliable

GitHub isn't unique in its problems, and switching away to a competitor with fewer problems isn't going to help in the long run. Besides, dormant projects will be irretrievably lost if GitHub ever shuts down unless someone happens to have a recent clone. To show that the problem is bigger than GitHub, let's look at some events that I remember:

  • Ages ago in internet time, in 2005, the Dutch government set up a forge website to foster open source usage within the government. In 2009, it went offline. Most projects crawled off to SourceForge to die (including mine), but some survive to this day.
  • The year 2012 is actually not that long ago, and I clearly remember when BerliOS shut down. I had used it for a project or two back when Subversion was brand new. They offered Subversion hosting when SourceForge and Savannah only offered CVS.
  • In 2014, I found out the hard way that RubyForge had shut down a few months earlier. I lost the complete commit history for a maintenance-mode project at work.
  • In 2015, Gitorious got assimilated and subsequently shut down by GitLab. At least projects that didn't opt in to a GitLab migration are still available read-only from their archive.
  • Also in 2015, it became known that SourceForge was adding malware to popular free software downloads and lost all remaining goodwill from the community. It's probably a matter of time before it dies completely, taking down with it an Alexandrian wealth of source code.
  • At the beginning of 2015, Google Code shut down. Tarballs of archived projects will stay available until the end of the year, but after that the code will probably be gone forever.
  • In 2017, Gna! shut down. It was lesser-known but still relatively popular in some circles (especially in France).
  • In 2020, Bitbucket axed Mercurial support, simply deleting all Mercurial repositories (instead of, say, converting them to git). Maintenance-mode projects whose authors had moved on to other hosting sites had their public code (and issue tracker!) removed.

Except for Gitorious/GitLab, I have used and relied on every single one of these code hosting sites, either for personal or work-related projects, or as a contributor to someone else's project.

Is your code for the public good?

As a community, we take too much for granted: code hosting, free of charge, is regarded almost as a public utility. But in reality, it's far from that: we are relying on untrustworthy companies, assuming that they won't tamper with our code and that they will keep their servers up and running forever, free of cost.

And to top it all off, there's the irony, or should I say hypocrisy, of the free software community's dependence on proprietary software for critical project infrastructure. At the same time, some of us are trying to explain to people why proprietary software is harmful to society. Most people just choose what's most convenient when deciding where to host code. We need to realise that this decision can be a political, philosophical and ethical choice. This is my main motivation to move all my personal projects away from Bitbucket.

There are basically two ways to achieve code hosting freedom. The first is to entrust your code to a nonprofit organisation which is committed to supporting free software projects without commercial interference. For instance, the Free Software Foundation offers Savannah. There are more specific hosting sites, like those from the Debian, Apache, GNOME and KDE foundations, but they don't accept all projects. The second is to host the code yourself, which is probably the easier route for small and personal projects.

Now, I know I can't completely avoid proprietary code hosting sites due to the network effect of contributing to free software projects, but for projects I control, I can at least do better. With CHICKEN we're already hosting our own code (with mailing lists provided by Savannah). I decided to host my personal projects on a VPS of my own, which is a good dogfooding opportunity: I found and fixed a bug in the spiffy-cgi-handlers egg while setting this up.

The rest of this blog post will explain how I set up code.more-magic.net. It's not difficult, so hopefully I can inspire you to consider hosting your own code too.

Installing Git, cgit and CHICKEN

I knew right away that I didn't need all the bells and whistles that GitLab or Phabricator provide. I just want to host a few small personal projects, and if ever one really did become popular (ha ha), it would make sense to set up a dedicated server like we have for CHICKEN.

I also decided to convert my Mercurial repositories to Git, to consolidate my VCS usage: At work we're using it, CHICKEN is using it, and so are other projects I contribute to. I'm tired of context-switching all the time, and I'm finally acclimated to magit.

Since I'm only using Git, I don't need to worry about VCS independence of the code hosting tool. Preferably it shouldn't need much RAM, to keep hosting costs down. I narrowed it down to gitweb or cgit. I chose the latter because its UI is less messy and confusing than the former (to me, at least).

Installing cgit is as easy as 1, 2, 3:

$ sudo apt-get install git cgit

It is possible to install CHICKEN from its Debian package, but as a core developer I always want the latest version. Besides, CHICKEN only depends on libc, so it's no big deal:

$ sudo apt-get install gcc make libc-dev
$ wget https://code.call-cc.org/releases/4.10.0/chicken-4.10.0.tar.gz
$ tar xzf chicken-4.10.0.tar.gz
$ cd chicken-4.10.0

By installing it into /usr/local/chickens/4.10.0, you can have multiple versions of CHICKEN installed at the same time:

$ make PLATFORM=linux PREFIX=/usr/local/chickens/4.10.0
$ sudo make PLATFORM=linux PREFIX=/usr/local/chickens/4.10.0 install

A nice trick to help us remember which CHICKEN is being used for Spiffy is to symlink it by usage:

$ sudo ln -s /usr/local/chickens/4.10.0 /usr/local/chickens/spiffy
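
The payoff of the symlink comes when a newer CHICKEN is installed next to the old one: only the link has to be repointed. Here is a minimal sketch of that switch, played out in a throwaway directory rather than under /usr/local. The version number 4.11.0 is purely an illustrative assumption.

```shell
# Sketch: repoint a "by usage" symlink from one version directory to another.
# A temporary directory stands in for /usr/local, so no sudo is needed.
root=$(mktemp -d)
mkdir -p "$root/chickens/4.10.0" "$root/chickens/4.11.0"   # 4.11.0 is hypothetical

# Initial link, mirroring the command above.
ln -s "$root/chickens/4.10.0" "$root/chickens/spiffy"

# Later: switch Spiffy to the newer install.
# -f replaces the existing link; -n stops ln from following the old link
# and creating the new one inside the 4.10.0 directory instead.
ln -sfn "$root/chickens/4.11.0" "$root/chickens/spiffy"

current=$(readlink "$root/chickens/spiffy")
echo "spiffy now points at: $current"
```

On the real server the same ln -sfn call would need sudo, and Spiffy would have to be restarted before it picks up the new interpreter.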

Setting up Spiffy under systemd

Let's start by installing the Spiffy egg. We'll also need a CGI handler to use cgit from Spiffy:

$ /usr/local/chickens/spiffy/bin/chicken-install -s spiffy spiffy-cgi-handlers

First, we must create a small script to run Spiffy. Put this in /usr/local/libexec/spiffy.scm and make it executable:

#!/usr/local/chickens/spiffy/bin/csi -s

(use data-structures spiffy uri-common intarweb cgi-handler)

(spiffy-user "www-data")
(spiffy-group "www-data")
(server-port 80)

(root-path "/usr/share/cgit")
(error-log "/var/log/spiffy/error.log")
(access-log "/var/log/spiffy/access.log")
;(debug-log "/var/log/spiffy/debug.log")

(define cgit (cgi-handler* "/usr/lib/cgit/cgit.cgi"))

;; cgit expects its PATH_INFO to contain the full request URI path.
;; However, this is a 404 handler, so we haven't resolved the path
;; to a final file.  This means we don't know what part of the URI
;; is the "script path" and which is the remainder (the pathinfo).
(handle-not-found
  (lambda (p)
    (let* ((uri (request-uri (current-request)))
           (uri-path-rest (cdr (uri-path uri)))
           (path (string-intersperse uri-path-rest "/")))
      (parameterize ((current-pathinfo uri-path-rest))
        (cgit path)))))

;; For the root request (otherwise you'll get 403 forbidden)
(handle-directory cgit)

(start-server)

Now, teach logrotate about the log files we configured, by saving this as /etc/logrotate.d/spiffy:

/var/log/spiffy/access.log
/var/log/spiffy/error.log
/var/log/spiffy/debug.log {
    daily
    missingok
    rotate 10
    compress
    delaycompress
    notifempty
    # If you're in the adm group, you can read logs without sudo
    create 640 www-data adm
}

This rotates logs daily, going back 10 days. Spiffy won't create the directory, and it needs to be able to write to the files as www-data, so let's create the directory and the files:

$ sudo mkdir /var/log/spiffy
$ sudo touch /var/log/spiffy/{access,error,debug}.log
$ sudo chown -R www-data:adm /var/log/spiffy
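
One caveat: the {access,error,debug} brace expansion in the touch command is a bash feature. In a plain POSIX sh such as dash, it would create a single file literally named {access,error,debug}.log. A portable variant could look like the sketch below, which writes to a temporary directory instead of /var/log so it can be tried without sudo.

```shell
# Sketch: create the three log files without relying on bash brace expansion.
# A temporary directory stands in for /var/log.
logdir="$(mktemp -d)/spiffy"
mkdir -p "$logdir"

for name in access error debug; do
    touch "$logdir/$name.log"
done

ls "$logdir"
```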

The systemd script from our wiki is a bit too complicated, so I based mine on a simpler example from Python's Gunicorn documentation.

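Put the following in /etc/systemd/system/spiffy.service, the usual location for locally administered unit files (running sudo systemctl enable spiffy afterwards creates the symlink under multi-user.target.wants that starts it at boot):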

[Unit]
Description=Spiffy the web server
After=network-online.target

[Service]
User=root
Group=www-data
WorkingDirectory=/usr/share/cgit/
ExecStart=/usr/local/libexec/spiffy.scm
ExecStop=/bin/kill -s TERM $MAINPID

[Install]
WantedBy=multi-user.target

Note that we need to run it as root so that it can bind to port 80. It will drop the privileges itself. To register this unit file immediately, you'll need to reload systemd:

$ sudo systemctl daemon-reload

Now you can start Spiffy simply by typing:

$ sudo systemctl start spiffy

If you visit the website with a browser, you'll notice that the styling doesn't work yet. To fix that, we'll turn to the cgit configuration.

Configuring cgit

The default configuration puts cgit's assets at /cgit-css. You could set up Spiffy so that this is handled from /usr/share/cgit, but it's much simpler to remove the prefix from the configuration. While we're at it, let's add some Git repositories as well. Open up /etc/cgitrc and put this in there:

# cgit config, see cgitrc(5) for details

css=/cgit.css
logo=/cgit.png

repo.url=testrepo1
repo.path=/srv/git/test1
repo.desc=This is my first git test repository

section=A section for repo 2
repo.url=testrepo2
repo.path=/srv/git/test2
repo.desc=This is my second git test repository

Let's make sure the git repos exist:

$ sudo mkdir /srv/git
$ sudo chown user:user /srv/git
$ sudo chmod 755 /srv/git
$ git init --bare /srv/git/test1
$ git init --bare /srv/git/test2
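
A bare repository stays empty until something is pushed into it. As a rough sketch of that first push, the following runs entirely in temporary directories, with a local path standing in for the server. The commit identity and file contents are placeholders, and on the real machine the remote would be an SSH URL to /srv/git/test1 instead.

```shell
# Sketch: create a bare repository and push a first commit into it.
# A temporary directory stands in for /srv/git.
srv=$(mktemp -d)
git init -q --bare "$srv/test1"
# Point the bare repo's HEAD at master so the example does not depend on
# whatever init.defaultBranch happens to be configured.
git --git-dir="$srv/test1" symbolic-ref HEAD refs/heads/master

work=$(mktemp -d)
git init -q "$work"
cd "$work"
git symbolic-ref HEAD refs/heads/master
echo "hello" > README
git add README
# Placeholder identity, passed inline so no global git config is required.
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "Initial commit"

# Local path used as the remote here; over the network this would be an ssh:// URL.
git remote add origin "$srv/test1"
git push -q origin master

pushed=$(git --git-dir="$srv/test1" rev-parse master)
local_head=$(git rev-parse HEAD)
echo "server has $pushed"
```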

If you now visit the web site (no reload/restart necessary), you should see a fully functional cgit installation.

Improving the cgit configuration

The above is a simple configuration. My configuration currently looks more like the following:

# cgit config, see cgitrc(5) for details

css=/cgit.css
logo=/cgit.png

root-title=My repositories
root-desc=This is my repo browser. There are many like it, but this one is mine.

# This will show a "clone" section at the bottom of each repo.
clone-prefix=http://code.example.com ssh://code.example.com

# If you don't want clones to be made over HTTP, you must disable it!
#enable-http-clone=0

# When you want to serve eggs from cgit, snapshot links are helpful.
# Note that snapshots can be downloaded even when links are not shown!
snapshots=tar.gz

# Show readme files in "about" tab.  The colon tells cgit to take the
# file from the default branch (usually master). IMPORTANT: See below!
readme=:README
readme=:readme
readme=:readme.txt
readme=:README.txt
readme=:readme.md
readme=:README.md

# Process readme files with a file extension-specific formatter.
# Be *very* careful with this!  The default filter allows arbitrary
# HTML which means XSS, cookie hijacking and other tricks, so either
# run this on a sand-boxed domain or be careful who gets commit access.
about-filter=/usr/lib/cgit/filters/about-formatting.sh

# Highlight source files.  This requires the "python-pygments" package.
# For maximum dogfooding, I should use the colorize egg here :)
source-filter=/usr/lib/cgit/filters/syntax-highlighting.py

# Automatically scan /srv/git for repos.  If you want to de-list some,
# simply make them unreadable for the www-data user.  Important: this
# must be the last statement: everything after it is ignored!
section-from-path=1
scan-path=/srv/git
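
De-listing a repository by permissions, as the scan-path comment suggests, might look like the sketch below. It assumes the repositories are owned by your own account and that www-data is neither the owner nor in the owning group. The repository names are hypothetical, and a temporary directory stands in for /srv/git.

```shell
# Sketch: hide one repository from cgit's scan by removing access for
# everyone except the owner.
srv=$(mktemp -d)
git init -q --bare "$srv/public"
git init -q --bare "$srv/private"   # hypothetical repo that should not be listed

# Owner keeps full access; group and others (www-data included, under the
# assumption stated above) can no longer enter or read the directory.
chmod 700 "$srv/private"

# First ten characters of the ls -ld output are the file type and mode bits.
private_mode=$(ls -ld "$srv/private" | cut -c1-10)
echo "private repo mode: $private_mode"

# On the real server, something like this would confirm the effect:
#   sudo -u www-data test -r /srv/git/private || echo "hidden from cgit"
```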

Many thanks to this guide for pointing out the paths and configuration settings that cgit uses on Debian. There are two follow-up posts that are useful too. There's one with tips on how to use cgit in practice and one about how to tweak the layout.

Now go forth and host your own code!