leaks, leaks, leaks.

Ever since upgrading to 10.6, some of the xscreensaver modules leak like crazy. Not all of them, just some. But with those, it is so bad that if those savers have run for only a few hours, the machine takes minutes to come back to life as it swaps madly. The ScreenSaverEngine process is becoming truly gargantuan.

As this didn't happen in 10.5, and the relevant code in xscreensaver hasn't changed, it's pretty tempting to point the finger at the fact that 10.6 switched from reference counting to garbage collection. But that doesn't really help me figure out how to fix it.

The problem (or at least, one particularly bad problem) seems to be under XCopyArea. Specifically, running the "Leaks" tool in the Xcode "Instruments" app on the "Moire2" saver shows me a lot of leaking happening here:

    malloc ← CGDataProviderCreateWithData ← CGDataProviderCreateWithCopyOfData ← CGBitmapContextCreateImage ← XCopyArea

The code in question looks roughly like this (jwxyz.m around line 590):

    cgi = CGBitmapContextCreateImage (src->cgc);
    CGContextDrawImage (dst->cgc, dst_rect, cgi);
    CGImageRelease (cgi);

So this suggests that CGImageRelease() is not properly releasing the DataProvider inside the CGImage. But when I do

    CGDataProviderRelease (CGImageGetDataProvider (cgi));

before the CGImageRelease(), I get warnings about reference counts going negative, so clearly that's wrong too.
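
The negative-refcount warning is at least consistent with the Core Foundation ownership convention: a "Create" function hands the caller a reference, while a "Get" function such as CGImageGetDataProvider() does not. Below is a toy model of that convention in plain C. The `Toy*` types and functions are hypothetical stand-ins, not CoreGraphics, and the sketch only illustrates why the extra release over-releases; it says nothing about where the real leak comes from.

```c
#include <assert.h>

/* Hypothetical stand-ins for CGDataProvider and CGImage. */
typedef struct { int refs; int over_released; } ToyProvider;
typedef struct { int refs; ToyProvider *provider; } ToyImage;

/* Drop one reference; record an over-release instead of going negative. */
static void toy_provider_release(ToyProvider *p) {
    if (p->refs <= 0) { p->over_released = 1; return; }
    p->refs--;
}

/* "Create": the caller owns the image; the image owns its provider. */
static void toy_image_create(ToyImage *img, ToyProvider *p) {
    p->refs = 1;
    p->over_released = 0;
    img->refs = 1;
    img->provider = p;
}

/* "Get": returns the provider WITHOUT giving the caller a reference. */
static ToyProvider *toy_image_get_provider(const ToyImage *img) {
    return img->provider;
}

/* Releasing the last image reference releases the provider it owns. */
static void toy_image_release(ToyImage *img) {
    assert(img->refs > 0);
    if (--img->refs == 0)
        toy_provider_release(img->provider);
}

/* Returns 1 if the provider ended up over-released, else 0. */
static int toy_run(int extra_release) {
    ToyProvider p;
    ToyImage img;
    toy_image_create(&img, &p);
    if (extra_release)                       /* the attempted "fix" */
        toy_provider_release(toy_image_get_provider(&img));
    toy_image_release(&img);
    return p.over_released;
}
```

In this model `toy_run(0)` balances cleanly, and `toy_run(1)` releases the provider once by hand and once more when the image goes away, which is the same shape as the warning above.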

I am at a loss about how to figure out what's actually going wrong here, or how to fix it. Any ideas?


23 Responses:

  1. fnivramd says:

    Do toy examples using XCopyArea() also leak?

  2. ts4z says:

    One possible workaround is to get the current size of the heap in the main loop. When the heap has grown too large, stop doing stuff. (I'd say exit, but I don't know if that's kosher on a Mac.)

    The best way I know to do this is just to call sbrk(0) when the saver starts up, and somewhere in the main loop; when sbrk(0) reports it's using, say, 20MB more than when you started, fail safe.

    Lame, but at least we won't be swapping.
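
A minimal sketch of that sbrk(0) check in C, assuming a platform where the program break actually tracks heap growth. That assumption is not safe everywhere: allocators that serve large requests through mmap() never move the break. The names `heap_watch_init`, `heap_too_big`, and `grew_past` are hypothetical, and the 20MB limit is the figure suggested above.

```c
#define _DEFAULT_SOURCE            /* exposes sbrk() on glibc */
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define HEAP_GROWTH_LIMIT ((size_t) 20 * 1024 * 1024)   /* "say, 20MB" */

static uintptr_t heap_baseline;

/* Call once when the saver starts up. */
static void heap_watch_init(void) {
    heap_baseline = (uintptr_t) sbrk(0);
}

/* Pure comparison, split out so it can be checked with made-up values. */
static int grew_past(uintptr_t base, uintptr_t now, size_t limit) {
    return now > base && (now - base) > limit;
}

/* Call from the main loop; nonzero means "fail safe now". */
static int heap_too_big(void) {
    return grew_past(heap_baseline, (uintptr_t) sbrk(0), HEAP_GROWTH_LIMIT);
}
```

The `now > base` guard means a break that moves backwards is treated as "no growth" rather than wrapping around to a huge unsigned difference.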

    • dasht says:

      "workaround" is not obviously the right description. A lot of apps should do that. That should be default in many areas. That's good "loose coupling between simple parts" practice.

      In the case at hand, the leaks would still be a bug, giving rise to the jitteriness of those restarts, but there would be more breathing room to fix it, so to speak. A more graceful play-out of the failure mode.

    • lionsphil says:

      IIRC, MacOS has an idiotic screensaver system, whereby individual hacks are libraries loaded into the process space of the screensaver daemon. Exiting this will unlock/unblank the user's screen. (It will reblank after their timeout, if any, but that's clearly nowhere near acceptable.)

      Yes, Apple basically wrote xlock.

      (Disclaimer: information dates from 10.4. But since jwz had 32/64-bit transition issues, it would seem that this is still current for 10.6.)

    • jwz says:

      Actually, sbrk() returns crazy, nonsensical values on MacOS. I don't know what the hell is going on in their implementation of malloc(), but sometimes sbrk() will return a pointer enormously lower than the address of an early malloc(), or a gigabyte higher. I never saw this behavior with malloc() on Linux, and they claim to be using GNU malloc. See xscreensaver/hacks/memscroller.c line 318.

      Maybe getrusage() would work. But that's not a workaround, except in the way that suicide is a weight loss technique.

      • ts4z says:

        I like the analogy. It's a hack, but at least you won't have to wait for the universe to page back in.

        We tried this with getrusage in a process-pool server; IIRC, it worked OK, but on FreeBSD, sbrk was found to be better.

      • drbrain says:

        setrlimit() for at least RSS, STACK and DATA is not implemented on OS X but getrusage() seems to return valid values.
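
A rough sketch of the getrusage() variant in C. The helper names `peak_rss_bytes` and `rss_over_limit` are hypothetical. One real portability wrinkle: `ru_maxrss` is reported in bytes on OS X but in kilobytes on Linux. It is also a peak value, so once the check trips it stays tripped.

```c
#include <sys/time.h>
#include <sys/resource.h>

/* Peak resident set size of this process in bytes, or -1 on failure. */
static long long peak_rss_bytes(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
#ifdef __APPLE__
    return (long long) ru.ru_maxrss;           /* already in bytes */
#else
    return (long long) ru.ru_maxrss * 1024;    /* kilobytes on Linux */
#endif
}

/* Nonzero if the peak RSS is known and exceeds limit_bytes. */
static int rss_over_limit(long long limit_bytes) {
    long long rss = peak_rss_bytes();
    return rss >= 0 && rss > limit_bytes;
}
```

A main loop could call `rss_over_limit()` every so often and stop drawing once it returns nonzero, on the assumption that a frozen saver is preferable to a swapping machine.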

  3. spc78 says:

    I can almost guarantee it's not the switch to GC, since anything compiled to use reference counting, last I checked, will still use reference counting. GC only kicks in by default on new Xcode projects, and even there, can be turned off.

    • jwz says:

      I'm pretty sure that when GC is turned on in the runtime, release becomes a no-op.

      • spc78 says:

        Yes, but I believe those runtime settings are per-app, and not system wide. At least that's how the Xcode project settings, and Apple's documentation make it look. But that may be different for screen savers, since they're closer to a system resource.

        • jwz says:

          Screen savers are not apps, they are dynamically loaded into the address space of ScreenSaverEngine, which (as of 10.6) was compiled with mandatory GC.

          • spc78 says:

            Yeah, I kind of started figuring it was something like that the more I thought about it.

            • eminence_gris says:

              I was thinking along these lines too. Could it be something like the module is compiled with GC, but some library or other linked resource is still compiled with reference counts? I don't know if that's possible, but it could certainly lead to madness if it is.

              • jwz says:

                I'm pretty sure a gc-enabled process can't load in a non-gc-enabled code segment at all.

  4. pmb7777 says:

    Sigh. Nothing like a technology whose sole reason for existing is to let programmers do less work leading directly to them having to do more.

    Assuming it is GC causing the problem, there are probably pointers that need to be hinted as 'weak' references. "Release" is a no-op in a GC world.

    See! Now programmers don't need to balance retain counts, they just need to properly annotate the retain semantics of every memory reference! Much easier!

  5. mhoye says:

    Vaguely relatedly, XJack: Is it expected that XJack would periodically tell me that all work and no play makes NFS server overlook not responding, still trying, NFS overlook ok?

    I doubt it, but it happens.

    • mhoye says:

      Ah, I see that it might - sorry.

    • jwz says:

      "The Enrichment Center reminds you that the Companion Cube cannot speak. In the event that the Companion Cube does speak, the Enrichment Center urges you to disregard its advice."

      • mhoye says:

        "You euthanised your faithful companion cube more quickly than any test subject on record. Congratulations."

  6. legolas says:

    Any chance that whatever you are doing in these savers blocks the garbage collector from starting? (I hadn't heard about garbage collection on 10.6 before, so I can't provide more ideas, except to say that IIRC java had some way to explicitly start the garbage collection, maybe 10.6 has that too?)

  7. owyn says:

    Hmm. When playing around with iPhone development last year I ran across some mentions of clang: http://clang-analyzer.llvm.org/ which can be used to find memory leaks in Objective-C projects. It worked on my rudimentary iPhone stuff. At least it's a different tool than Leaks, and I didn't see anyone else mention it...