About this blog

Hi, I'm Peter Bex, a Scheme and free software enthusiast from the Netherlands. See my user page on the CHICKEN wiki for some of my projects.


Are you in need of support for CHICKEN Scheme? Or maybe you want clear tech writing like on this very blog? Then you're in luck! I am now available as a freelance consultant!

The 3 most recent posts

The recent trouble at GitHub, both cultural changes within the company and criticism from the community, reminded me how unstable the whole "free as in beer code hosting for the public good" idea really is. The good part is that it motivated me to finally look into setting up personal hosting for my own projects, because how hard can it be, really?

History shows: code hosting is unreliable

GitHub isn't unique in its problems, and switching to a competitor with fewer problems isn't going to help in the long run. Besides, dormant projects will be irretrievably lost if GitHub ever shuts down, unless someone happens to have a recent clone. To show that the problem is bigger than GitHub, let's look at some events that I remember:

  • Ages ago in internet time, in 2005, the Dutch government set up a forge website to foster open source usage within the government. In 2009, it went offline. Most projects crawled off to SourceForge to die (including mine), but some survive to this day.
  • The year 2012 is actually not that long ago, and I clearly remember when BerliOS shut down. I had used it for a project or two back when Subversion was brand new. They offered Subversion hosting when SourceForge and Savannah only offered CVS.
  • In 2014, I found out the hard way that RubyForge had shut down a few months earlier. I lost the complete commit history for a maintenance-mode project at work.
  • In 2015, Gitorious got assimilated and subsequently shut down by GitLab. At least projects that didn't opt in to a GitLab migration remain available read-only from their archive.
  • Also in 2015, it became known that SourceForge was adding malware to popular free software downloads and lost all remaining goodwill from the community. It's probably a matter of time before it dies completely, taking down with it an Alexandrian wealth of source code.
  • At work, we experienced some frustration when Bitbucket was having availability issues last year. They had issues several times in a short time span.
  • At the beginning of this year, Google Code shut down. Tarballs of archived projects will stay available until the end of the year, but after that the code will probably be gone forever.

Except for Gitorious/GitLab, I have used and relied on every single one of these code hosting sites, either for personal or work-related projects, or as a contributor to someone else's project.

Is your code for the public good?

As a community, we take too much for granted: code hosting, free of charge, is regarded almost as a public utility. But in reality it's far from that: we are relying on untrustworthy companies, assuming they won't tamper with our code and will keep their servers up and running forever, free of cost.

And to top it all off, there's the irony, or should I say hypocrisy, of the free software community's dependence on proprietary software for critical project infrastructure. At the same time, some of us are trying to explain to people why proprietary software is harmful to society. Most people just choose what's most convenient when deciding where to host code. We need to realise that this decision can be a political, philosophical and ethical choice. This is my main motivation to move all my personal projects away from Bitbucket.

There are basically two ways to achieve code hosting freedom. The first is to entrust your code to a nonprofit organisation which is committed to supporting free software projects without commercial interference; for instance, the Free Software Foundation offers Savannah. There are also more specific hosting sites, like those run by the Debian, Apache, GNOME and KDE foundations, but they don't accept all projects. The second way is to self-host, which for small and personal projects is probably the easier option anyway.

Now, I know I can't completely avoid proprietary code hosting sites due to the network effect of contributing to free software projects, but for projects I control, I can at least do better. With CHICKEN we're already hosting our own code (with mailing lists provided by Savannah). I decided to host my personal projects on a VPS of my own, which is a good dog fooding opportunity: I found and fixed a bug in the spiffy-cgi-handlers egg while setting this up.

The rest of this blog post will explain how I set up code.more-magic.net. It's not difficult, so hopefully I can inspire you to consider hosting your own code too.

Installing Git, cgit and CHICKEN

I knew right away that I didn't need all the bells and whistles that GitLab or Phabricator provide. I just want to host a few small personal projects, and if ever one really did become popular (ha ha), it would make sense to set up a dedicated server like we have for CHICKEN.

I also decided to convert my Mercurial repositories to Git, to consolidate my VCS usage: At work we're using it, CHICKEN is using it, and so are other projects I contribute to. I'm tired of context-switching all the time, and I'm finally acclimated to magit.
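
The conversion itself is outside the scope of this post, and there are several tools for it. A minimal sketch of one way to do it, using the hg-fast-export script from the fast-export project (assuming the script is on your PATH and the old Mercurial repository lives in ~/src/myproject-hg):

$ mkdir myproject && cd myproject
$ git init
$ hg-fast-export.sh -r ~/src/myproject-hg
$ git checkout HEAD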

Since I'm only using Git, I don't need to worry about VCS independence of the code hosting tool. Preferably it shouldn't need much RAM, to keep hosting costs down. I narrowed it down to gitweb or cgit. I chose the latter because its UI is less messy and confusing than the former (to me, at least).

Installing cgit is easy as 1, 2, 3:

$ sudo apt-get install git cgit

It is possible to install CHICKEN from its Debian package, but as a core developer I always want the latest version. Besides, CHICKEN only depends on libc, so it's no big deal:

$ sudo apt-get install gcc make libc-dev
$ wget https://code.call-cc.org/releases/4.10.0/chicken-4.10.0.tar.gz
$ tar xzf chicken-4.10.0.tar.gz
$ cd chicken-4.10.0

By installing it into /usr/local/chickens/4.10.0, you can have multiple versions of CHICKEN installed at the same time:

$ make PLATFORM=linux PREFIX=/usr/local/chickens/4.10.0
$ sudo make PLATFORM=linux PREFIX=/usr/local/chickens/4.10.0 install

A nice trick to help us remember which CHICKEN is being used for Spiffy is to symlink it by usage:

$ sudo ln -s /usr/local/chickens/4.10.0 /usr/local/chickens/spiffy

Setting up Spiffy under systemd

Let's start by installing the Spiffy egg. We'll also need a CGI handler to use cgit from Spiffy:

$ /usr/local/chickens/spiffy/bin/chicken-install -s spiffy spiffy-cgi-handlers

Next, create a small script to run Spiffy. Put this in /usr/local/libexec/spiffy.scm and make it executable:

#!/usr/local/chickens/spiffy/bin/csi -s

(use data-structures spiffy uri-common intarweb cgi-handler)

(spiffy-user "www-data")
(spiffy-group "www-data")
(server-port 80)

(root-path "/usr/share/cgit")
(error-log "/var/log/spiffy/error.log")
(access-log "/var/log/spiffy/access.log")
;(debug-log "/var/log/spiffy/debug.log")

(define cgit (cgi-handler* "/usr/lib/cgit/cgit.cgi"))

;; cgit expects its PATHINFO to contain the full request URI path.
;; However, this is a 404 handler, so we haven't resolved the path
;; to a final file.  This means we don't know what part of the URI
;; is the "script path" and which is the remainder (the pathinfo).
(handle-not-found
  (lambda (p)
    (let* ((uri (request-uri (current-request)))
           (path (string-intersperse (cdr (uri-path uri)) "/")))
      (parameterize ((current-pathinfo (string-split path "/")))
        (cgit path)))))

;; For the root request (otherwise you'll get 403 forbidden)
(handle-directory cgit)

(start-server)
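
To spell out the "make it executable" part (otherwise systemd won't be able to start it later):

$ sudo chmod +x /usr/local/libexec/spiffy.scm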

Now, teach logrotate about the log files we configured, by saving this as /etc/logrotate.d/spiffy:

/var/log/spiffy/access.log
/var/log/spiffy/error.log
/var/log/spiffy/debug.log {
    daily
    missingok
    rotate 10
    compress
    delaycompress
    notifempty
    # If you're in the adm group, you can read logs without sudo
    create 640 www-data adm
}

This rotates the logs daily, keeping 10 days of history. Spiffy won't create the log directory, and it needs to be able to write to the files as www-data, so let's create the directory and the files up front:

$ sudo mkdir /var/log/spiffy
$ sudo touch /var/log/spiffy/{access,error,debug}.log
$ sudo chown -R www-data:adm /var/log/spiffy

The systemd script from our wiki is a bit too complicated, so I based mine on a simpler example from Python's Gunicorn documentation.

Put the following in /etc/systemd/system/multi-user.target.wants/spiffy.service:

[Unit]
Description=Spiffy the web server
After=network-online.target

[Service]
User=root
Group=www-data
WorkingDirectory=/usr/share/cgit/
ExecStart=/usr/local/libexec/spiffy.scm
ExecStop=/bin/kill -s TERM $MAINPID

[Install]
WantedBy=multi-user.target

Note that we need to run it as root so that it can bind to port 80; Spiffy will drop privileges itself (that's what the spiffy-user and spiffy-group settings in the script are for). To register this unit file immediately, you'll need to reload systemd:

$ sudo systemctl daemon-reload

Now you can start Spiffy simply by typing:

$ sudo systemctl start spiffy
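
If it doesn't come up, systemd can tell you what went wrong. These are plain systemd commands, nothing Spiffy-specific:

$ sudo systemctl status spiffy
$ sudo journalctl -u spiffy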

If you visit the website with a browser, you'll notice that the styling doesn't work yet. To fix that, we'll turn to the cgit configuration.

Configuring cgit

The default configuration puts cgit's assets at /cgit-css. You could set up Spiffy so that this is handled from /usr/share/cgit, but it's much simpler to remove the prefix from the configuration. While we're at it, let's add some Git repositories as well. Open up /etc/cgitrc and put this in there:

# cgit config, see cgitrc(5) for details

css=/cgit.css
logo=/cgit.png

repo.url=testrepo1
repo.path=/srv/git/test1
repo.desc=This is my first git test repository

section=A section for repo 2
repo.url=testrepo2
repo.path=/srv/git/test2
repo.desc=This is my second git test repository

Let's make sure the git repos exist:

$ sudo mkdir /srv/git
$ sudo chown user:user /srv/git
$ sudo chmod 755 /srv/git
$ git init --bare /srv/git/test1
$ git init --bare /srv/git/test2

If you now visit the website (no reload or restart necessary), you should see a fully functional cgit installation.
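
To actually get some code in there, you can push an existing project to one of the bare repositories over SSH. A sketch, with a hypothetical project and placeholder host and user names (use whatever matches your server):

$ cd ~/projects/myproject
$ git remote add origin ssh://user@code.example.com/srv/git/test1
$ git push origin master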

Improving the cgit configuration

The above is a simple configuration. My configuration currently looks more like the following:

# cgit config, see cgitrc(5) for details

css=/cgit.css
logo=/cgit.png

root-title=My repositories
root-desc=This is my repo browser. There are many like it, but this one is mine.

# This will show a "clone" section at the bottom of each repo.
clone-prefix=http://code.example.com ssh://code.example.com

# If you don't want clones to be made over HTTP, you must disable it!
#enable-http-clone=0

# When you want to serve eggs from cgit, snapshot links are helpful.
# Note that snapshots can be downloaded even when links are not shown!
snapshots=tar.gz

# Show readme files in "about" tab.  The colon tells cgit to take the
# file from the default branch (usually master). IMPORTANT: See below!
readme=:README
readme=:readme
readme=:readme.txt
readme=:README.txt
readme=:readme.md
readme=:README.md

# Process readme files with a file extension-specific formatter.
# Be *very* careful with this!  The default filter allows arbitrary
# HTML which means XSS, cookie hijacking and other tricks, so either
# run this on a sand-boxed domain or be careful who gets commit access.
about-filter=/usr/lib/cgit/filters/about-formatting.sh

# Highlight source files.  This requires the "python-pygments" package.
# For maximum dog fooding, I should use the colorize egg here :)
source-filter=/usr/lib/cgit/filters/syntax-highlighting.py

# Automatically scan /srv/git for repos.  If you want to de-list some,
# simply make them unreadable for the www-data user.  Important: this
# must be the last statement: everything after it is ignored!
section-from-path=1
scan-path=/srv/git
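
As the comment above says, de-listing a repository under scan-path is just a matter of making it unreadable to the www-data user. For example, with a hypothetical private repository:

$ chmod 700 /srv/git/private-repo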

Many thanks to this guide for pointing out the paths and configuration settings that cgit uses on Debian. There are two follow-up posts that are useful too. There's one with tips on how to use cgit in practice and one about how to tweak the layout.

Now go forth and host your own code!

Flattr this!  (why?)

I just submitted a patch to add a statistical profiler to CHICKEN. In this post, I'll explain how it works. It's easier than you'd think!

Instrumentation-based profiling

There are two major ways to profile a program. The first way has been supported by CHICKEN as long as I can remember: You add instrumentation to each procedure. This counts how often the procedure is called, and how much time is spent in it.

You've probably done this by hand: You check the clock before and after calling a procedure, and print the difference. This can be useful when whittling down a specific procedure's run time. But when you want to know where the bottlenecks are in the first place, it's less practical. You don't want to manually add this stuff to all your procedures!

In order to easily instrument each procedure, you'll need language support, either in the compiler or in the run time. Unfortunately, the instrumentation itself will cause your program to slow down: all this tracking takes some time! That's why CHICKEN's profiler is part of the compiler: Instrumentation is emitted only when you compile with -profile. This option adds wrappers around each procedure, which look like this:

;; Original source code:
(define (foo a b c)
  (print (+ a b) c))

;; Instrumented version created by -profile:
(define foo
  (lambda args
    (dynamic-wind  ; Explained later
      (lambda () (##sys#profile-entry 0 profile-info-1234.4567))
      (lambda () (apply (lambda (a b c) (print (+ a b) c)) args))
      (lambda () (##sys#profile-exit 0 profile-info-1234.4567)) ) ) )

In the above code, ##sys#profile-entry starts the clock and increments the call count for this procedure, and ##sys#profile-exit stops the clock. The profile-info-1234.4567 is a global vector in which each procedure of this compilation unit is assigned a unique position.

To create the vector and assign the positions, a prelude is added to the compilation unit. This defines the vector and registers a position for each procedure:

;; Prelude for entire program:
(define profile-info-1234.4567 (##sys#register-profile-info 1 #t))
(##sys#set-profile-info-vector! profile-info-1234.4567 0 'foo)

;; Not exactly true, but let's pretend because it's close enough:
(on-exit ##sys#finish-profile)

This simply creates a vector of size one, and assigns the foo procedure to position zero. Then it requests ##sys#finish-profile to run when the program exits. This will write profile information to disk on exit.

We need dynamic-wind, but it creates problems

If you're not familiar with it, dynamic-wind is a sort of "guard" or "try/finally". Whenever the second lambda is entered, the first lambda ("before") is invoked, and when it is left, the third lambda ("after") is invoked. To understand why we need a dynamic-wind in the code presented earlier, consider the naive, incorrect implementation:

;; INCORRECT expansion:
(define foo
  (lambda args
    (begin
      (##sys#profile-entry 0 profile-info-1234.4567)
      (apply (lambda (a b c) (print (+ a b) c)) args)
      ;; Not reached if + throws an exception.  This
      ;; will happen when a or b are not numbers!
      (##sys#profile-exit 0 profile-info-1234.4567) ) ) )

Here, the "after" part will be skipped if the procedure raises an exception. Furthermore, a continuation might be captured or called in the procedure, even multiple times. This would cause the code to jump in and out of the procedure without neatly going over the before/after bits every time. Because of this, the profiler would miss these exits and re-entries, and hence it would not be able to accurately keep track of the time actually spent in this procedure.

The code with dynamic-wind will take care of non-local exits, stopping the clock whenever we jump out of the procedure, and starting it when we jump in again via a captured continuation.

While dynamic-wind is necessary, it also implies quite a heavy hit on performance: dynamic-wind isn't cheap. Even worse is the fact that it prevents the compiler from inlining small procedures in larger ones. Furthermore, the use of apply implies we'll need to cons up an argument list. Normally, arguments aren't put into a Scheme list, because doing so results in more garbage being created. This means that the performance shape of the profiled application can be quite different from the original, non-profiled application!

Statistical profiling

I've always wanted to look into fixing the profiler, but never had the energy to do so. Now that Felix wrote an entire graphical debugger for CHICKEN, I thought maybe I could lean on the debugger's infrastructure to make a better profiler. But it turned out I didn't have to!

First, I should explain statistical profiling. The basic idea is that the process is periodically sampled while it's running. These samples are taken by inspecting the instruction pointer and mapping it to a procedure. If you do this often enough (every 10 ms or so), you can get a pretty good idea of where the program is spending most of its time.

CHICKEN's trace buffer

Looking at the instruction pointer or C function is not very useful in CHICKEN, unless you like to pore over endless piles of machine-generated C code and to mentally map it back to Scheme. It can be educational and even fun, just like it can be fun and educational to read the assembly output of a C compiler, but it is generally unproductive and gets frustrating quickly.

So how can we take a snapshot of what a Scheme program is doing at any given time? It turns out that CHICKEN already does this: when a Scheme program raises an exception, the interpreter will show you a call trace. This is a bit like a stack trace in a "traditional" language, only it shows a trace of the execution flow that led to the error. Let's look at a contrived example:

(define (fib n)
  (if (< n 2)
      n
      (+ (fib (- n 1)) (fib (- n 2))) ) )

(define (run-fib)
  (let* ((n (string->number (car (command-line-arguments))))
         (result (fib n)))
    ;; This line is wrong: It tries to append a number to a string
    (print (string-append "Result: " result)) ) )

(run-fib)

When you compile the program and run it, you'll see the output:

 $ csc -O3 fib.scm
 $ ./fib 3
 
 Error: (string-append) bad argument type - not a string: 2
 
         Call history:
         
         fib.scm:12: run-fib
         fib.scm:7: command-line-arguments
         fib.scm:8: fib
         fib.scm:4: fib
         fib.scm:4: fib
         fib.scm:4: fib
         fib.scm:4: fib
         fib.scm:10: string-append               <--

In a stack trace you would see only two lines, mentioning run-fib and string-append. Here we can see the trace of execution through the program, where it entered run-fib, then called command-line-arguments, proceeded to invoke fib five times in total, and finally it invoked string-append with the result.

There are advantages and disadvantages to either approach, but a trace makes more sense in languages with full continuations. It is the natural thing to do in a compiler that converts all programs to continuation-passing style. It does tend to confuse beginners, though!

So, these trace points are already inserted into every program by default. This happens even in programs optimised with -O3, because trace points have very little overhead: It's just a pointer into a ring buffer that gets updated to point to the procedure's name. You can choose to omit traces completely via -no-trace or -d0, but that's not the default.
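
If you want to play with this yourself, the flags mentioned above are regular csc options (hypothetical file name):

$ csc -O3 program.scm             # trace points included (the default)
$ csc -O3 -no-trace program.scm   # trace points omitted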

Trace points are a good fit for our profiler: when taking a sample, we can simply take a look at the newest entry in the trace buffer, which will always reflect the procedure that's currently running!

Setting up a sampler

So how can we interrupt the program at a point in time, without messing with the program's state? This is simple: we ask for a signal to be delivered periodically. There's even a dedicated signal reserved for this very task: SIGPROF.

We'll use setitimer() to set up the timer which causes the signal to be delivered, even though POSIX says it's obsolete. It is much more convenient and more widely supported than the alternative, timer_create() plus timer_settime(). We can always switch when setitimer() is removed from an actual POSIX implementation.

The following setup code is simplified a bit (most importantly, error handling is omitted):

C_word CHICKEN_run(void *toplevel) { /* Initialisation code for CHICKEN */
  /* ... a lot more code ... */

  if (profiling) {
    struct itimerval itv;
    time_t freq = 10000;                   /* 10 msec (in usec) */

    itv.it_value.tv_sec = freq / 1000000;
    itv.it_value.tv_usec = freq % 1000000;
    itv.it_interval.tv_sec = itv.it_value.tv_sec;
    itv.it_interval.tv_usec = itv.it_value.tv_usec;

    setitimer(ITIMER_PROF, &itv, NULL);
    signal(SIGPROF, profile_signal_handler);
  }

  /* ... a lot more code ... */
}

This sets up a profile timer. When such a timer expires, the kernel will send SIGPROF to the process. The code also registers a signal handler that will be invoked when this happens. It looks like this, much simplified:

struct profile_item           /* Item in our profiling hash table */
{
  char *key;                  /* Procedure name, taken from trace buffer */
  unsigned int sample_count;  /* Times this procedure was seen */
  struct profile_item *next;  /* Next bucket chain */
};

void profile_signal_handler(int signum)
{
  struct profile_item *pi;
  char *procedure_name;

  procedure_name = get_topmost_trace_entry();

  pi = profile_table_lookup(procedure_name);

  if (pi == NULL) {
    pi = profile_table_insert(procedure_name);
    pi->sample_count = 1;
  } else {
    pi->sample_count++;
  }
}

Maybe you are wondering why we're doing this in C rather than Scheme. After all, Scheme code is more readable and easier to maintain. There are a few reasons for that:

  • Scheme signal handlers are blocked in some critical sections of the run time library. This would delay profiling until the program is back in user code, skewing the results.
  • Core is compiled with -no-trace by default, but this can be turned off. Doing so can be useful when profiling core procedures, but it wouldn't work with a signal handler written in Scheme: the handler would see its own code in the trace buffer, instead of what we want to trace!
  • The profiling code should be as low-overhead as possible, to avoid affecting the results. Remember, this is one of the main problems with the instrumenting profiler! While CHICKEN produces fast code, it is faster if we do it directly in C, and it will trigger no garbage collections.

After the program has completed its run, we must write the profile to a file. I won't show it here, but the CHICKEN implementation simply writes each key in the hash table with its sample count (which shows up as the call count in the reports) and the time spent in that procedure. The time is estimated by multiplying the sample count by the sampling interval of 10 ms.

This means we'll miss some calls, and therefore we'll under-represent the time taken by those procedures. On the other hand, some procedures are over-represented: if a sample is taken for a very fast procedure, we'll assign 10ms to it, even if it runs in a fraction of that. This is the essence of the statistical approach: if a program runs long enough, these measurement errors should balance out.

Comparing the two profilers

It's hard to come up with a small but representative example which is self-contained, so I'll use a few benchmarks from Mario's collection.

Low-level vs high-level

The first benchmark we'll profile is the kernwyk-array benchmark. It's taken from a historical set of benchmarks by Kernighan and Van Wyk, designed to compare the performance of various "scripting languages". This particular benchmark creates an array containing a million numbers by destructively initialising it. After that, it creates a second array into which the first is copied. This is repeated 100 times.
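
For reference, the workflow for the two profilers looks roughly like this, assuming the benchmark is saved as kernwyk-array.scm (in both cases a PROFILE.<pid> file is written on exit, which the chicken-profile program formats into the tables shown below):

$ csc -O3 -profile kernwyk-array.scm   # instrumentation-based profiler
$ ./kernwyk-array
$ chicken-profile

$ csc -O3 kernwyk-array.scm            # statistical profiler
$ ./kernwyk-array -:p
$ chicken-profile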

If we compile this with csc -O3 -profile and run it, the original profiler gives us the following breakdown:

procedure   calls  seconds  average  percent
my-try        100    4.008    0.040  100.000
go              1    4.008    4.008  100.000
create-y      100    2.311    0.023   57.684
create-x      100    1.696    0.016   42.315

If we compile this with csc -O3 and run it with -:p under the new profiler, we'll get a radically different result, even though the total run time did not change much:

procedure                         calls  seconds  average  percent
kernwyk-array.scm:12:make-vector    100    2.400    0.024   58.679
kernwyk-array.scm:5:make-vector     100    1.690    0.016   41.320

The statistical profiler is a bit more "low-level": It tells you exactly the procedure call and line that is taking the most time. On the other hand, the instrumentation-based profiler shows us a breakdown in which procedure the most time is spent. The percentages and "seconds" column are also different: the original profiler shows us the cumulative time each procedure takes up. Thus, a main entry point will always be at 100% at the top.

But the most significant difference is in what this tells us about where the time is spent: the original profiler tells us that create-y is a little slower than create-x. Reading such output would lead me to think that vector-ref and vector-set! probably take the most time. But if we remove all calls to those, the program still takes 2.6 seconds and the profiler output looks more or less the same, so they're not the biggest contributor to the total run time. Instead, make-vector is: it allocates, which triggers garbage collections. And garbage collections are the real time consumers in this benchmark!

Precision of the two profilers

The next benchmark we'll look at is the nfa benchmark. I'm not sure about the origins of this benchmark. It emulates a depth-first NFA search for the regular expression ((a | c)* b c d) | (a* b c). This is matched against a string containing 133 "a" characters followed by "bc".

The output of the instrumenting profiler is completely useless, because only toplevel procedures are instrumented:

procedure       calls  seconds  average  percent
recursive-nfa  150000    8.071    0.000  100.000

This is another reason for wanting to "fix" the profiler: it doesn't give an inside view of where large procedures or closures are spending their time. You can manually tweak the programs by lifting all the inner procedures up to the toplevel. If these procedures close over some variables, you must turn those into extra arguments and pass them along when calling these procedures. It's a bit tedious, but if we do this for the benchmark, we'll get output that is more useful:

procedure          calls  seconds     average  percent
recursive-nfa     150000   30.687  0.00020458  100.000
state0            150000   30.559  0.00020373   99.582
state1          20100000   23.372  0.00000116   76.160
state2          20100000    8.423  0.00000041   27.450
state3          20100000    7.080  0.00000035   23.070
state4            150000    0.132  0.00000088    0.430

Now, if we take the original, untweaked program, and run it through the statistical profiler, we'll immediately get useful output:

procedure                       calls  seconds  average  percent
nfa.scm:14:state1                 146    5.830    0.039   72.602
nfa.scm:31:state3                 141    2.160    0.015   26.899
nfa.scm:9:state1                    2    0.020    0.010    0.249
nfa.scm:44:##sys#display-times      1    0.010    0.010    0.124
nfa.scm:9:state3                    1    0.010    0.010    0.124

This also shows a disadvantage of the statistical profiler: the call counts are all wrong! That's because the state procedures are extremely fast: in the original profiler you can see that they run 20 million times in 8 seconds or so. Because they're so fast, the average time per call is close to zero, and the 10 ms sampling timer fires far too infrequently to catch more than a tiny fraction of these calls while they are running.

Nevertheless, we can clearly tell that most time is spent in state1 and state3. The calls for state2 are not even registering, because this state will return much sooner: there are almost no b, c or d characters in the input pattern, so it will just quickly "fall through" this procedure without a match. The reason it shows up in the original profiler is because the instrumentation itself is interfering with an accurate reading of time spent in the procedure.

The total run time of the instrumented and tweaked version is almost 31 seconds, while the total run time of the version with statistical profiling is less than 8 seconds on my laptop! Let's take a closer look at that overhead.

Instrumentation overhead

Having two distinct ways of gathering profile data opens up a really cool opportunity. We can measure the overhead introduced by the instrumentation profiler, by running it under the statistical profiler!

Let's do that on the "tweaked" NFA benchmark again:

 $ csc -O3 -profile nfa.scm
 $ ./nfa -:p
 30.88s CPU time, 2.404s GC time (major),
 163050000/149475510 mutations (total/tracked),
 5972/113664 GCs (major/minor)

And let's look at the result:

procedure                 calls  seconds  average  percent
##sys#profile-entry         701   17.000    0.024   55.051
##sys#profile-exit          633   10.710    0.016   34.682
##sys#dynamic-wind          145    1.660    0.011    5.375
nfa.scm:15:state2            61    0.670    0.010    2.169
nfa.scm:29:state3            41    0.440    0.010    1.424
nfa.scm:12:state1            31    0.340    0.010    1.101
nfa.scm:41:state0             3    0.040    0.013    0.129
nfa.scm:49:recursive-nfa      2    0.020    0.010    0.064

This clearly shows that the instrumentation machinery completely dominates the profile: A stunning 27 seconds are being soaked up by ##sys#profile-entry and ##sys#profile-exit!

Of course, this is just a contrived example of profiling a benchmark. An experienced Chickeneer would have been able to tell just from the output of the benchmark with and without profiling:

 $ csc -O3 nfa.scm
 $ ./nfa
 8.156s CPU time, 0.012s GC time (major), 32/6849 GCs (major/minor)
 $ csc -O3 -profile nfa.scm
 $ ./nfa
 30.628s CPU time, 2.316s GC time (major),
 163050000/149475510 mutations (total/tracked),
 5970/113666 GCs (major/minor)

Aside from the obvious increase in CPU time, you can see that the number of mutations went from zero to more than a hundred million. The number of garbage collections is also many times higher. This would spell certain doom if you saw it in a real program!

Conclusion

It's too early to be sure, but it looks like a statistical profiler is a useful alternative to instrumentation. On the other hand, for some programs the situation is reversed:

  • Programs blocking SIGPROF offer fewer sampling opportunities, resulting in incomplete profiles.
  • If you need the exact number of procedure calls, instrumentation is your only real option.
  • When there are lots of smallish procedures called by a few toplevel procedures, the noisy low-level output of the statistical profiler can drown out the useful information.

I think if we can reduce the performance overhead caused by the instrumentation, it'll become a lot more useful again. This won't be easy, because some of the overhead is fundamental to its use of dynamic-wind and the way that prevents inlining. In the meantime, please test the statistical profiler and let me know how it works for you!

Flattr this!  (why?)

Maybe you've noticed the little Flattr icons that I've added at the bottom of each post, and now you're wondering if, and why, you should support my blogging efforts with a donation.

Let me state this first and foremost: I enjoy writing posts about subjects that interest me and I learn a lot from the research I put into my posts. So, even without donations I'd continue doing so. But, it's also a fact that many of my posts demand a lot of effort. Researching, writing and illustrating an in-depth post can take weeks, depending on my energy levels and other activities.

I hope you now understand why this blog is updated so infrequently. I'm convinced that a few long, informative posts are better than frequent 100-word status updates about my projects; those would be a waste of your time! But because it takes so much time and effort, I sometimes struggle to find the motivation to write new posts.

This is where Flattr comes in: a small donation is a wonderful way to express how much you liked a post, and it can really lift my spirits and motivate me to write more!

As an added advantage, by seeing which posts get Flattr'd the most I can get a better idea of what you, the readership of my blog, like and what you don't like. This will help me choose which topics to write about more.

Why I refuse to put ads on my blog

Of course, I know that placing ads on my blog would be the easiest way to get some returns from the hard work that I put into my posts. This would easily repay my hosting costs. However, in my opinion, the advertising industry is the most toxic influence on the web today. It is completely absurd that it's considered normal for companies (and individuals with blogs...) to be rewarded for filling up our screens with obnoxious garbage, and for eroding our privacy through various tracking mechanisms. They don't care that all this digital pollution is only going to be acted on by a tiny fraction of all visitors (if any). It is downright disrespectful; personally I think it's as bad as spam e-mail.

In my opinion, Flattr is a much more benign way of receiving money for my efforts. The company was created by a founder of The Pirate Bay, as a better and more direct alternative for rewarding creators than the commercial exploitation of copyright law by middlemen. It's also unobtrusive: just a small icon in each post. The integration that I'm using is a simple hyperlink, so there are no nasty tracking cookies or web beacons; I'm even hosting the icon image myself.

Besides that, Flattr is completely voluntary on the visitor's part. This means that any rewards are directly linked to people's enjoyment of my posts. This encourages informative posts of substance rather than click-bait titles with disappointing content which would draw more traffic, just to get more ad impressions.

These are my views on the matter. If you're using ads on your blog, I hope I've managed to convince you to switch away from ads to visitor-supported donations. And if you like my posts, I hope you'll consider making a small donation.

Integrating Flattr into Hyde

To make this post at least a little bit on-topic, I'd like to share how I added the Flattr button to my Hyde configuration. I added the following definition to hyde.scm (slightly modified):

(use hyde uri-common)

(define (flattr-button #!optional (page (current-page)))
  (let* ((base-uri (uri-reference "https://flattr.com/submit/auto"))
         (attrs `((user_id . "YOUR FLATTR ACCOUNT NAME")
                  ;; A bit ugly, but Hyde does it like this, too
                  (url . ,(string-append ($ 'base-uri page) (page-path page)))
                  (category . "text")
                  (language . ,(or ($ 'lang page) "en_GB"))
                  (title . ,($ 'title page))))
         ;; Flattr doesn't grok semicolon as query param separator...
         (flattr-page-uri
          (parameterize ((form-urlencoded-separator "&"))
            (uri->string (update-uri base-uri query: attrs)))))

    `(a (@ (class "flattr-button")
           (href ,flattr-page-uri))
        (img (@ (src "/pics/flattr-button.svg")
                (alt "Flattr this!")
                (title "Flattr this post!"))))))

Then, in my SXML definition for article layouts, which I call (surprise, surprise!) article.sxml:

;; -*- Scheme -*-
()

`(div (@ (class "article"))

        ,@(prev/next-navigation)

        (h1 (@ (class "article-title"))
            ,($ 'title) " "
            (small (@ (class "date"))
                   "Posted on " ,(format-seconds (page-updated))))

        (div (@ (class "article-body"))
             (inject ,contents)
             ,(flattr-button))

        ,@(prev/next-navigation))

The button immediately follows the contents of the blog post, which makes sense: only after reading a post will you know whether it was worth something to you.

Flattr this!  (why?)

Older articles...
Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 3.0 License. All code fragments on this site are hereby put in the public domain.