XScreenSaver 5.11 out now

XScreenSaver 5.11 is out now. Mostly bug fixes and minor tweaks this time, the most notable fix being that it uses a lot less memory on MacOS 10.6.

My current theory as to why the memory usage got so much worse between 10.5 and 10.6 is that the 10.6 garbage collector sucks in the following way:

It only does a collection once the outstanding collectible allocations pass some threshold. However, CoreGraphics creates lots of small collectible objects that point to very large non-collectible allocations: a small, collectible CG object references a large malloc'd buffer (not collectible) holding bitmap data. That large buffer doesn't get freed until GC collects the small object, whose finalizer then frees the buffer. So GC decides that it doesn't really need to run, even though the process has gotten enormous. GC eventually runs once page-outs have happened, but by then it's too late, and the machine's resident set has been sodomized.
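A toy model of that failure mode may help. The sketch below is a hypothetical collector, not Apple's: it triggers only on the bytes of small collectible objects it knows about, while each object also owns a large external buffer that only its "finalizer" frees. The sizes (300-byte objects, 10 MB bitmaps, a 64 KB trigger) are made-up illustrative numbers, not measurements from 10.6.

```python
class ToyCollector:
    """Hypothetical collector that only counts bytes it allocated itself."""

    def __init__(self, threshold_bytes):
        self.threshold = threshold_bytes
        self.tracked_bytes = 0   # small, collectible allocations (visible to GC)
        self.external_bytes = 0  # large malloc'd buffers (invisible to GC)
        self.garbage = []        # unreachable objects awaiting collection
        self.peak_external = 0

    def allocate(self, small_size, external_size):
        self.tracked_bytes += small_size
        self.external_bytes += external_size
        self.peak_external = max(self.peak_external, self.external_bytes)
        # Simplification: every object becomes unreachable right away,
        # like a short-lived bitmap dropped after one frame.
        self.garbage.append((small_size, external_size))
        # The trigger looks only at tracked_bytes; external_bytes is ignored.
        if self.tracked_bytes >= self.threshold:
            self.collect()

    def collect(self):
        for small, external in self.garbage:
            self.tracked_bytes -= small
            self.external_bytes -= external  # the "finalizer" frees the buffer
        self.garbage.clear()


gc_model = ToyCollector(threshold_bytes=64 * 1024)
for frame in range(1000):
    gc_model.allocate(small_size=300, external_size=10_000_000)
print(gc_model.peak_external)  # 2190000000
```

With those numbers the collector waits for 219 small objects before it runs, so about 2.2 GB of bitmap data is outstanding at the peak even though only about 64 KB of collectible memory is what finally triggered a collection.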

So I fixed this by forcing an exhaustive garbage collection in the ScreenSaverEngine process approximately every 5 seconds whether the system thinks it needs one or not.
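A minimal sketch of that workaround, in Python rather than Cocoa: a background thread forces a full collection on a fixed interval, whether or not the collector's own heuristics think one is due. `gc.collect()` stands in for whatever collector call the real code makes, and the `PeriodicCollector` name is hypothetical. (CPython frees most objects by reference counting and its `gc` module only deals with cycles, so this is an analogy, not something a Python program normally needs.)

```python
import gc
import threading


class PeriodicCollector:
    """Hypothetical helper: force a full collection every `interval` seconds."""

    def __init__(self, interval=5.0):
        self.interval = interval
        self.runs = 0
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        self._thread.join()

    def _loop(self):
        # Event.wait() returns False on timeout and True once stop() is called.
        while not self._stop_event.wait(self.interval):
            gc.collect()  # full collection of all generations, needed or not
            self.runs += 1


collector = PeriodicCollector(interval=5.0)
collector.start()
# ... run the screen saver ...
collector.stop()
```

A fixed timer is crude, since it costs a collection even when nothing needs freeing, but it bounds how long large finalizer-owned buffers can pile up.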

There still seem to be some leaks happening, but I think this improved matters a lot. (The X11 version doesn't leak; these are Cocoa-specific problems.)

Good times.

Tags: , , , ,

22 Responses:

  1. I thought one of the lessons anyone working in a GC environment with finalizers had to internalize right away was to not use finalizers for resource reclamation. Someone at Apple needs a smack in the back of the head.

    • jwz says:

      Seriously. Apple's own documentation even says never ever do this.

      • A good question to ask, though, is how you could implement this without a finalizer. You'd pretty much need to do your own reference counting... but how do you detect a reference going out of scope so you can decrement that counter? That's what the finalizer does!

        Using a finalizer is wrong, but at least it's simple, so you can reasonably expect people to implement that incorrect technique correctly. That's no excuse, of course: it's wrong in the sense that it doesn't actually behave correctly in the real world, which is how we got here in the first place. But some cobbled-together manual reference counting would be complicated enough to get implemented wrong anyway, and then it would actually leak memory instead of just effectively leaking it. Not an improvement.

        I suspect that in a lazy-GC environment, the forced GC solution is the best of a set of bad choices, in this grim GC-hook future that we live in.

        • lionsphil says:

          Hell, just be thankful this isn't Java. Finalizers under the JVM aren't even guaranteed to be run at all.

          Rather than nice, straightforward destructor semantics, which are useful when you might leave a block via any number of means (hello, exceptions), you are expected to carefully ensure that you call whichever class-specific release method was defined on any class that holds non-GC resources.
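For comparison, here is a sketch of that explicit-release style in Python, where a `with` block plays roughly the role of Java's try/finally: the resource is released on every exit path, exceptions included, without waiting on a finalizer. `BitmapBuffer` is a hypothetical class, and `live_bytes` is just bookkeeping so the effect is visible.

```python
class BitmapBuffer:
    """Hypothetical wrapper that owns a large non-GC resource."""

    live_bytes = 0  # total bytes currently held by all buffers

    def __init__(self, size):
        self.size = size
        self.closed = False
        BitmapBuffer.live_bytes += size

    def close(self):
        # Explicit, idempotent release; does not wait for any collector.
        if not self.closed:
            self.closed = True
            BitmapBuffer.live_bytes -= self.size

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow the exception


try:
    with BitmapBuffer(10_000_000) as buf:
        raise RuntimeError("rendering failed")
except RuntimeError:
    pass
print(BitmapBuffer.live_bytes)  # 0: released even though an exception escaped
```

The catch is the one described above: it only works if every caller remembers to use `with` (or call `close()`), which is exactly the discipline a destructor would have enforced automatically.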

          • scullin says:

            Don't even get me started. There's this crazy language called "C" where not only are there no finalizers, forcing you to release various resources in a manner that could best be described as "ad hoc", you have to even manually free your own friggin' memory. True caveman stuff, that.

        • jwz says:

          Well, the proximate problem here is that the GC doesn't know the size of the objects, so it's making poor decisions about memory usage. If there isn't a way to inform the GC, "I am a 300 byte object from boxed storage tightly coupled to 10MB of unboxed storage", there needs to be.

          Alternately, you could shake a finger elsewhere and say that the problem is that you shouldn't be allocating two halves of the same object out of different allocators. I don't know how the OSX GC actually carves out memory, but surely it's got a concept of array-headers, where the front part of the block has a boxed-storage-compatible array header with type and length, and the rest of it is unboxed bits of arbitrary size. You allocate your objects boxed and collectible, and when hysterical code needs a malloc'd buffer, you just hand it an interior pointer. Then you need locking to ensure that GC doesn't relocate the object until the C code is done with it, but I'm pretty sure OSX's GC only runs from the user-input idle-loop anyway.

        • vordark says:

          Maybe I'm misreading you, but a big part of my brain wants to jump up and scream "Why do people always think 'reference counting' when we're talking about garbage collection!?!"

          • Because they don't know their lore properly.

            One day a student came to Moon and said: "I understand how to make a better garbage collector. We must keep a reference count of the pointers to each cons."

            Moon patiently told the student the following story:

            "One day a student came to Moon and said: `I understand how to make a better garbage collector...
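The joke is that the story refers to itself, which is exactly the kind of cycle plain reference counting can't reclaim. A toy sketch (the `RcObject`, `incref`, and `decref` names are hypothetical; no real runtime works exactly like this): a chain is freed as soon as its last reference is dropped, but two objects pointing at each other keep each other's counts above zero forever.

```python
class RcObject:
    """Hypothetical reference-counted object."""

    freed = []  # names of objects whose count reached zero

    def __init__(self, name):
        self.name = name
        self.count = 0
        self.refs = []

    def point_to(self, other):
        self.refs.append(other)
        other.count += 1


def incref(obj):
    obj.count += 1


def decref(obj):
    obj.count -= 1
    if obj.count == 0:
        RcObject.freed.append(obj.name)
        for child in obj.refs:  # releasing obj drops its outgoing references
            decref(child)
        obj.refs.clear()


a, b = RcObject("a"), RcObject("b")
incref(a)        # a local variable holds a
a.point_to(b)
decref(a)        # drop the local: a is freed, then b

c, d = RcObject("c"), RcObject("d")
incref(c)
c.point_to(d)
d.point_to(c)    # cycle: c -> d -> c
decref(c)        # c is still held by d, so nothing is freed

print(RcObject.freed)  # ['a', 'b']
```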

          • Well, the problem at hand is that the existing GC scheme is not behaving properly for these objects managing huge chunks of outside-the-GC-heap memory, so you have to roll your own memory management strategy somehow. Reference counting or some ad-hoc object lifetime detection are pretty much your choices, although the idea jwz mentions above of being able to tell the GC "this object is actually 9000k, not twenty bytes like you think it is" would be a nice approach if it were possible.

  2. Bouncing Cow is glorious on a 27" iMac. It truly underscores the beast/machine duality from a marxist perspective.

  3. barrkel says:

    Sounds like it could use the equivalent of .NET's GC.AddMemoryPressure().
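A toy sketch of that idea: the collector's trigger counts not just the small collectible objects but also the external bytes each object reports, in the spirit of .NET's `GC.AddMemoryPressure()` and `GC.RemoveMemoryPressure()`. The Python class and method names here are hypothetical, and the sizes (300-byte objects, 10 MB bitmaps, a 64 MB trigger) are illustrative assumptions.

```python
class PressureAwareCollector:
    """Hypothetical collector whose trigger includes reported external bytes."""

    def __init__(self, threshold_bytes):
        self.threshold = threshold_bytes
        self.pressure = 0        # tracked bytes plus reported external bytes
        self.external_bytes = 0  # large buffers freed only by finalizers
        self.garbage = []
        self.peak_external = 0

    def add_memory_pressure(self, nbytes):
        self.pressure += nbytes

    def remove_memory_pressure(self, nbytes):
        self.pressure -= nbytes

    def allocate(self, small_size, external_size):
        self.pressure += small_size
        self.external_bytes += external_size
        # The object tells the collector its true cost.
        self.add_memory_pressure(external_size)
        self.peak_external = max(self.peak_external, self.external_bytes)
        self.garbage.append((small_size, external_size))  # unreachable at once
        if self.pressure >= self.threshold:
            self.collect()

    def collect(self):
        for small, external in self.garbage:
            self.pressure -= small
            self.external_bytes -= external  # the "finalizer" frees the buffer
            self.remove_memory_pressure(external)
        self.garbage.clear()


gc_model = PressureAwareCollector(threshold_bytes=64 * 1024 * 1024)
for frame in range(1000):
    gc_model.allocate(small_size=300, external_size=10_000_000)
print(gc_model.peak_external)  # 70000000
```

Because each object's reported 10 MB counts toward the trigger, collection runs after the seventh allocation, so peak bitmap memory stays at 70 MB. The trade-off is more frequent collections.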

  4. holywar says:

    I just think it's awesome that you told the computing industry in general to FOAD, yet still throw down things like this. I hope I can be like you when I grow up, and by "grow up", I mean before I reach the "climbing-a-belltower-with-a-high-powered-rifle" stage.

  5. Also, Pipes has some entertaining new gadgetry.

  6. rane500 says:

    XMatrix keeps locking up my Preference Pane and I have to Force Quit, but other than that these are fun on my Mac. (Versus the crappy, under-powered Linux desktops I was always given at various jobs.)

    • jwz says:

      Sigh, it works for me, of course. Try selecting System Preferences in Activity Monitor, click the "i" button and press "Sample". Hopefully that will show you a stack trace.

      • rane500 says:

        Hopefully this is helpful and isn't totally mangled by LJ.

        EDIT: Also, this is the only one I can capture this way - there's another result but the Prefs app crashes silently and I'm not familiar with GUI debugging on OS X. (I'm used to just pushing things through gdb.)

        Steps to reproduce:
        1) System Prefs -> Screensaver
        2) Select XMatrix
        3) Test, allow it to run until the "trace" starts. Screen freezes without moving past the initial number spread.
        4) Cancel test
        5) Hit "Options" for XMatrix

        Result: Blank panel, unable to close without killing System Prefs.

        Sample:
        Sampling process 77614 for 1 seconds with 1 millisecond of run time between samples
        Sampling completed, processing symbols...
        Analysis of sampling System Preferences (pid 77614) every 1 millisecond
        Call graph:
        845 Thread_1807050 DispatchQueue_1: com.apple.main-thread (serial)
        845 0x2921
        845 NSApplicationMain
        845 -[NSApplication run]
        845 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
        845 _DPSNextEvent
        845 BlockUntilNextEventMatchingListInMode
        845 ReceiveNextEventCommon
        845 RunCurrentEventLoopInMode
        845 CFRunLoopRunInMode
        845 CFRunLoopRunSpecific
        845 __CFRunLoopRun
        845 mach_msg
        845 mach_msg_trap
        845 Thread_1807054 DispatchQueue_2: com.apple.libdispatch-manager (serial)
        845 start_wqthread
        845 _pthread_wqthread
        845 _dispatch_worker_thread2
        845 _dispatch_queue_invoke
        845 _dispatch_mgr_invoke
        844 kevent
        1 _dispatch_run_timers
        845 Thread_1807634
        845 thread_start
        845 _pthread_start
        845 __NSThread__main__
        845 -[NSThread main]
        845 +[NSURLConnection(NSURLConnectionReallyInternal) _resourceLoadLoop:]
        845 CFRunLoopRunInMode
        845 CFRunLoopRunSpecific
        845 __CFRunLoopRun
        845 mach_msg
        845 mach_msg_trap
        845 Thread_1807878
        845 start_wqthread
        844 _pthread_wqthread
        844 __workq_kernreturn
        1 start_wqthread
        844 Thread_1807607
        844 start_wqthread
        840 _pthread_wqthread
        840 __workq_kernreturn
        4 _pthread_exit
        3 _pthread_tsd_cleanup
        1 _CFRelease
        1 __CFRunLoopDeallocate
        1 _CFRelease
        1 __CFBasicHashDrain
        1 __CFBasicHashStandardCallback
        1 _CFRelease
        1 __CFRunLoopModeDeallocate
        1 mach_port_get_set_status
        1 mach_msg
        1 mach_msg_trap
        1 __CFFinalizeRunLoop
        1 __CFRunLoopRemoveAllSources
        1 CFSetApplyFunction
        1 CFBasicHashApply
        1 __CFSetApplyFunction_block_invoke_1
        1 __CFRunLoopRemoveSourceFromMode
        1 CFRunLoopRemoveSource
        1 __CFRunLoopSourceCancel
        1 mach_port_extract_member
        1 mach_msg
        1 mach_msg_trap
        1 __NSFinalizeThreadData
        1 _NSThreadGet0
        1 __spin_lock
        1 _pthread_exit
        844 Thread_1807953
        844 start_wqthread
        844 _pthread_wqthread
        844 __workq_kernreturn
        1 Thread_1807953 DispatchQueue_53: com.apple.iLifeMediaBrowser_spotlightGCD (serial)
        1 start_wqthread
        1 _pthread_wqthread
        1 _dispatch_worker_thread2
        1 _dispatch_queue_invoke
        1 _dispatch_queue_drain
        1 _dispatch_queue_invoke

        Total number in stack (recursive counted multiple, when >=5):
        5 _pthread_wqthread
        5 start_wqthread

        Sort by top of stack, same collapsed (when >= 5):
        __workq_kernreturn 2528
        mach_msg_trap 1692
        kevent 844
        Sample analysis of process 77614 written to file /dev/stdout

      • rane500 says:

        Also, this might help:

        Model Name: iMac
        Model Identifier: iMac4,1
        Processor Name: Intel Core Duo
        Processor Speed: 1.83 GHz
        Number Of Processors: 1
        Total Number Of Cores: 2
        L2 Cache: 2 MB
        Memory: 2 GB
        Bus Speed: 667 MHz
        Boot ROM Version: IM41.0055.B08
        SMC Version (system): 1.1f5

        System Version: Mac OS X 10.6.3 (10D573)
        Kernel Version: Darwin 10.3.0
        Boot Mode: Normal
        Secure Virtual Memory: Not Enabled
        64-bit Kernel and Extensions: No
        Time since boot: 4 days 12 minutes

        Developer Information:

        Version: 3.2 (10M2020)
        Location: /Developer
        Applications:
        Xcode: 3.2.1 (1613)
        Interface Builder: 3.2.1 (740)
        Instruments: 2.0.1 (1096)
        Dashcode: 3.0 (328)
        SDKs:
        Mac OS X:
        10.5: (9J61)
        10.6: (10A432)

        Desktop & Screen Saver:

        Version: 3.0.1
        Supported By: Apple
        Visible: Yes
        Identifier: com.apple.preference.desktopscreeneffect
        Location: /System/Library/PreferencePanes/DesktopScreenEffectsPref.prefPane

  7. pmb7777 says:

    I'm so glad that garbage collection has freed programmers from having to understand memory management! Or was it introduced for some other reason?

    • A language designer, confronted with the problem of memory management, thinks "I know! I'll use garbage collection!" Now, he has two problems.

    • fnivramd says:

      No, abstractions are leaky. You aren't freed from knowing, you are freed from constantly worrying. The knowledge is good, the worry is bad.