garbage collection.
by Jamie Zawinski
<jwz@jwz.org>
1-Feb-1998.
``Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.'' -- Philip Greenspun
In a conversation at work about the architecture of a new product, someone made an assertion along the lines of, ``then we'd need to use a garbage collector, and that would hurt performance.''
This is wrong.
It's a common belief that garbage collection means inferior performance, because everyone who has gotten into programming in the last decade regards manual storage management as a fact of life, and totally discounts the effort and performance impact of doing everything by hand.
In a large application, a good garbage collector is more efficient than malloc/free.
Now, you can argue over whether there are good garbage collectors
available for C++, but that's a separate question. The absolute speed
of your CPU has nothing to do with it: for a large, complex application,
a good GC will be more efficient than a zillion pieces of hand-tuned,
randomly micro-optimized storage management. It's a relative measure;
even on a slow CPU, a good GC still comes out ahead of doing it by hand.
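To see why, here's a minimal sketch (not anyone's actual collector) of the allocation fast path that a compacting GC buys you: a pointer bump into a big arena, versus per-object bookkeeping for every malloc/free pair. The Arena type and the sizes are made up purely for illustration.

    // A minimal sketch (not a real GC): the allocation fast path of a
    // compacting collector is just a pointer bump, whereas every
    // malloc/free (or new/delete) pair pays per-object bookkeeping.
    #include <cstddef>
    #include <vector>

    struct Arena {
        std::vector<char> buf;
        std::size_t used = 0;
        explicit Arena(std::size_t n) : buf(n) {}

        void* alloc(std::size_t n) {
            if (used + n > buf.size())
                return nullptr;          // a real GC would collect/compact here
            void* p = buf.data() + used;
            used += n;                   // one addition and a bounds check
            return p;
        }
    };

    int main() {
        Arena a(1 << 20);
        for (int i = 0; i < 10000; ++i)
            a.alloc(16);   // thousands of "objects", one underlying allocation
        return 0;          // the whole arena goes away at once, like a collected heap
    }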
Note that I said a good garbage collector. Don't blame the concept of GC just because you've never seen a good GC that interfaces well with your favorite language. (Especially if your favorite language is C or C++, which are really just overgrown PDP-11 assemblers, despite attempts to drag them kicking and screaming into the 1980s.)
If you're like most people, you've never seen a good garbage collector, because, while all of the commercial Lisp and Smalltalk systems had high quality GC, just about all of the open source garbage collectors are junk (e.g., Perl, Python, and Emacs.) Perl's GC is an especially bad joke: if you have circular references, the objects won't get collected until the program exits! Give me a break! (Update: I'm told that Python's so-called GC is just as bad, since it is also merely reference-counting instead of a real GC.)
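If you want to see that failure mode in miniature, C++'s std::shared_ptr makes a decent stand-in for Perl-style reference counting; the Node type below is just an illustration, not anything from Perl's internals.

    // Pure reference counting can never reclaim a cycle: once two objects
    // point at each other, their counts never reach zero and the
    // destructors never run.
    #include <cstdio>
    #include <memory>

    struct Node {
        std::shared_ptr<Node> other;
        ~Node() { std::puts("collected"); }
    };

    int main() {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->other = b;
        b->other = a;    // circular reference: each keeps the other's count at 1
        a.reset();
        b.reset();       // both external handles gone, yet "collected" never prints
        return 0;        // a tracing GC would have reclaimed the cycle
    }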
Another point that often gets overlooked is that existence of a GC doesn't mean that the programmer has to play a totally hands-off role in the allocation of objects; as with any coding, there are usually a few bottlenecks that deserve special care. A system with a good GC will provide opportunities to tune; for example, to make hints to the GC about lifetime and locality and so on.
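For one concrete example of such a hint, the Boehm-Demers-Weiser conservative collector for C/C++ lets you promise that a block contains no pointers, so the collector never has to scan it. The sketch below assumes that collector is installed (link with -lgc); it's an illustration of the kind of tuning I mean, not a recommendation of that particular GC.

    // Hinting to a GC instead of managing storage by hand: GC_MALLOC_ATOMIC
    // tells the Boehm collector that a block is pointer-free, so it can skip
    // scanning it. There is no explicit free() anywhere in this program.
    #include <gc.h>
    #include <cstring>

    int main() {
        GC_INIT();

        // Ordinary collected allocation: the GC will scan this block for pointers.
        void** table = static_cast<void**>(GC_MALLOC(64 * sizeof(void*)));

        // A pointer-free buffer: the "atomic" hint says there is nothing to scan.
        char* text = static_cast<char*>(GC_MALLOC_ATOMIC(4096));
        std::strcpy(text, "reclaimed by the collector, not by hand");

        table[0] = text;   // still reachable through the table, so it stays alive
        return 0;
    }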
Java makes this really hard because
(largely as a result of its security and type-safety requirements) it
doesn't expose any low-level places where one can get down and dirty
with the allocator and play all the usual tricks of carving up memory
yourself.
Java and Emacs are really bad examples of GC-oriented environments.
They're primitive, badly tuned, and give the user very few
opportunities for optimization. One way to work around this is by
throwing faster and faster CPUs at the problem, but that's not the only
way, and the fact that people seem to like to solve the problem that way
reflects inappropriately badly on the notion of GC itself.
Based on my experience using both kinds of languages, for years at
a stretch, I claim that a good GC always beats doing explicit malloc/free
in both computational efficiency and programmer time.
However, I also claim that, because of the amount of programmer time
that is saved by using GC rather than explicit malloc/free, as well as
the dramatic reduction in hard-to-debug storage-management problems,
even using a mediocre garbage collector will still result in your
ending up with better software faster.
Most of the time, throwing memory and CPU at the problem is
still cheaper than throwing programmer time at the problem,
even when you multiply the CPUs/memory by the number of users.
This isn't true all the time, but it's probably true more often
than you think, because Worse
is Better.