garbage collection.
by Jamie Zawinski

``Any sufficiently complicated C or Fortran program contains an ad hoc informally-specified bug-ridden slow implementation of half of Common Lisp.''
-- Philip Greenspun

In a conversation at work about the architecture of a new product, someone made an assertion along the lines of, ``then we'd need to use a garbage collector, and that would hurt performance.''

This is wrong.

It's a common belief that garbage collection means inferior performance, because everyone who has gotten into programming in the last decade regards manual storage management as a fact of life, and totally discounts the effort and performance impact of doing everything by hand.

In a large application, a good garbage collector is more efficient than malloc/free.

Now, you can argue over whether there are good C++ garbage collectors available -- but that's an implementation detail, which is a whole different level. In a large application, a good implementation of GC will be more efficient than an equivalently-good implementation of malloc/free. This is because large applications have nontrivial object life cycles, and so you end up spending all of your time trying to figure out what exactly those life cycles are. With GC, you'll get your program written faster, and it'll be more efficient to boot.

The absolute speed of your CPU has nothing to do with it; for a large, complex application, a good GC will be more efficient than a zillion pieces of hand-tuned, randomly micro-optimized storage management. It's a relative measure; on a slow CPU, a good GC will still be more efficient.

Note that I said a good garbage collector. Don't blame the concept of GC just because you've never seen a good GC that interfaces well with your favorite language. (Especially if your favorite language is C or C++, which are really just overgrown PDP-11 assemblers, despite attempts to drag them kicking and screaming into the 1980s.)

If you're like most people, you've never seen a good garbage collector, because, while all of the commercial Lisp and Smalltalk systems had high quality GC, just about all of the open source garbage collectors are junk (e.g., Perl, Python, and Emacs.) Perl's GC is an especially bad joke: if you have circular references, the objects won't get collected until the program exits! Give me a break! (Update: I'm told that Python's so-called GC is just as bad, since it is also merely reference-counting instead of a real GC.)
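The cycle problem is easy to demonstrate. Here's a small Python sketch: CPython's baseline scheme is reference counting, and its supplementary cycle collector can be switched off, which simulates the pure-refcounting behavior described above.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.ref = None

# Disabling the cycle collector leaves CPython with pure reference
# counting -- the scheme being criticized here.
gc.disable()

a, b = Node(), Node()
a.ref, b.ref = b, a          # circular reference
alive = weakref.ref(a)       # lets us observe whether `a` is ever freed

del a, b                     # the cycle keeps both refcounts above zero
assert alive() is not None   # leaked: refcounting alone never frees it

gc.collect()                 # a real (tracing) collection breaks the cycle
assert alive() is None
```

Under pure reference counting the two objects keep each other alive forever; only a tracing collection pass actually reclaims them.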

Another point that often gets overlooked is that existence of a GC doesn't mean that the programmer has to play a totally hands-off role in the allocation of objects; as with any coding, there are usually a few bottlenecks that deserve special care. A system with a good GC will provide opportunities to tune; for example, to make hints to the GC about lifetime and locality and so on.
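One classic example of such hands-on tuning is a free list: instead of letting a hot allocation site churn out garbage, you recycle objects yourself and let the GC handle everything else. This is a minimal sketch; the names (`Particle`, `alloc_particle`, `free_particle`) are made up for illustration.

```python
# A free-list sketch: even under GC, a hot allocation site can be
# tuned by recycling objects instead of allocating fresh ones.
class Particle:
    __slots__ = ("x", "y")   # fixed layout; no per-instance dict

_pool = []

def alloc_particle(x, y):
    # Reuse a recycled object if one is available, else allocate.
    p = _pool.pop() if _pool else Particle()
    p.x, p.y = x, y
    return p

def free_particle(p):
    # Hand the object back to the pool rather than to the collector.
    _pool.append(p)

p = alloc_particle(1, 2)
free_particle(p)
q = alloc_particle(3, 4)
assert p is q   # same object came back: recycled, not reallocated
```

The point is not that you should do this everywhere; it's that a GC'd environment still leaves room for exactly this kind of special care at the few bottlenecks that deserve it.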

Java makes this really hard because (largely as a result of its security and type-safety requirements) it doesn't expose any low-level places where one can get down and dirty with the allocator and play all the usual carve-out-a-chunk-of-memory tricks that one can play in other languages like C (or even in the historical production-quality GC-oriented environments, the Lisp Machines.)

Java and Emacs are really bad examples of GC-oriented environments. They're primitive, badly tuned, and give the user very few opportunities for optimization. One way to work around this is by throwing faster and faster CPUs at the problem, but that's not the only way, and the fact that people seem to like to solve the problem that way reflects inappropriately badly on the notion of GC itself.

Based on my experience using both kinds of languages, for years at a stretch, I claim that a good GC always beats doing explicit malloc/free in both computational efficiency and programmer time.

However, I also claim that, because of the amount of programmer time that is saved by using GC rather than explicit malloc/free, as well as the dramatic reduction in hard-to-debug storage-management problems, even using a mediocre garbage collector will still result in your ending up with better software faster.

Most of the time, throwing memory and CPU at the problem is still cheaper than throwing programmer time at the problem, even when you multiply the CPUs/memory by the number of users. This isn't true all the time, but it's probably true more often than you think, because Worse is Better.
