address spaces and bad craziness

I have a .saver bundle. It gets dynamically loaded and run by either my test harness, or by the actual screen saver framework. In the former case, it works fine; in the latter case it blows up. This seems to have something to do with static variables getting reset to their default state instead of the values they had been set to. I'm very puzzled, and I don't really understand how dynamic loading of bundles works on this system. Here's what I've got:

ScreenSaverView.m: init() calls yarandom.c: ya_rand_init(). Then it calls dangerball.c: screenhack(), which later calls yarandom.c: ya_random(). ScreenSaverView.m is ObjC, everything else is C.

So I've sprinkled some test code around. In yarandom.c I've got:

    #include <stdlib.h>

    static unsigned int bad_craziness = 0xABCD1234;

    void TEST (void) {
      if (bad_craziness != 0xDEADBEEF) abort();
    }

    void ya_rand_init (unsigned int seed) {
      bad_craziness = 0xDEADBEEF;
      /* ...the existing seeding code... */
    }

Every call to TEST() made from ScreenSaverView.m after calling ya_rand_init() succeeds. In particular: it calls TEST() then immediately calls screenhack(). The very first line in screenhack() is a call to TEST() which fails, because bad_craziness is 0xABCD1234 instead of 0xDEADBEEF. The value appears to change back to its previous value halfway through setting up the stack frame.

So maybe what's going on is not that it's getting reset, but that somehow there are two copies of the yarandom.o code loaded, and different ones are being seen by the two files? (Update: yes.) But they were all loaded together from the same .saver bundle.
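One way to picture the "two copies" answer: each time the loader maps the bundle, that mapping gets its own private copy of yarandom.o's static data, starting from the static initializer. The C below is a toy model of that situation, not the real loader. The struct yarandom_image type and the image_* functions are hypothetical stand-ins for "yarandom.o's data as reached through one loaded image".

```c
#include <stdio.h>

/* Toy model only: one struct per loaded image of the bundle, standing in
   for that image's private copy of yarandom.o's static data. */
struct yarandom_image {
  unsigned int bad_craziness;
};

/* Mapping an image starts its copy of the static at the initializer. */
static void image_load (struct yarandom_image *img) {
  img->bad_craziness = 0xABCD1234;
}

/* ya_rand_init() as reached through one particular image. */
static void image_ya_rand_init (struct yarandom_image *img, unsigned int seed) {
  (void) seed;                       /* seeding itself is not modeled */
  img->bad_craziness = 0xDEADBEEF;
}

/* TEST() as reached through one particular image: 1 = pass, 0 = abort. */
static int image_test (const struct yarandom_image *img) {
  return img->bad_craziness == 0xDEADBEEF;
}

/* Replays the failure: ScreenSaverView.m's calls bind to copy A,
   screenhack()'s calls bind to copy B. Returns 1 if the model
   reproduces "ObjC side passes, C side fails". */
static int replay_failure (void) {
  struct yarandom_image a, b;
  image_load (&a);
  image_load (&b);
  image_ya_rand_init (&a, 42);
  printf ("ObjC side TEST(): %s\n", image_test (&a) ? "pass" : "abort");
  printf ("C side TEST():    %s\n", image_test (&b) ? "pass" : "abort");
  return image_test (&a) && !image_test (&b);
}
```

In the model, nothing ever "resets" the variable: copy B simply never had ya_rand_init() called on it.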

W, I must ask, TF?


18 Responses:

  1. bodyfour says:

    At the top of TEST() add something like:

      FILE *fp = fopen("/tmp/crazy.log", "a");
      if (fp != NULL) {
        fprintf(fp, "Value: 0x%08X, address=%p\n", bad_craziness, (void *) &bad_craziness);
        fclose(fp);
      }
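A variant of that logging that also asks the dynamic loader which image the address belongs to. dladdr() exists on both Mac OS X and glibc-based Linux, though it is not part of ISO C; on glibc it needs _GNU_SOURCE, and older glibc versions also need -ldl at link time. If two copies of the bundle were mapped, the two addresses should report different base addresses. The function name report_craziness is made up for this sketch.

```c
#define _GNU_SOURCE     /* glibc needs this for dladdr(); Mac OS X does not */
#include <dlfcn.h>
#include <stdio.h>

static unsigned int bad_craziness = 0xABCD1234;

/* Log the value, its address, and which loaded image (and at what base
   address) that address falls inside. Returns 0 if dladdr() could not
   attribute the address to any loaded image. */
static int report_craziness (FILE *out) {
  Dl_info info;
  if (dladdr ((void *) &bad_craziness, &info) == 0)
    return 0;
  fprintf (out, "Value: 0x%08X, address=%p, image=%s, base=%p\n",
           bad_craziness, (void *) &bad_craziness,
           info.dli_fname ? info.dli_fname : "?", info.dli_fbase);
  return 1;
}
```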

    • jwz says:

      Yup, two addresses, alternating back and forth. Freaky:

        void TEST (void) {
          static int count = 0;
          fprintf(stderr, "TEST: %4d: 0x%08X @ 0x%08lX\n", count++, bad_craziness, (unsigned long) &bad_craziness);
        }

        TEST: 0: 0xABCD1234 @ 0x0605E148
        TEST: 1: 0xDEADBEEF @ 0x0605E148
        TEST: 2: 0xDEADBEEF @ 0x0605E148
        TEST: 3: 0xDEADBEEF @ 0x0605E148
        TEST: 4: 0xDEADBEEF @ 0x0605E148
        TEST: 0: 0xABCD1234 @ 0x060B9148
        TEST: 1: 0xABCD1234 @ 0x060B9148
        TEST: 2: 0xABCD1234 @ 0x060B9148
        TEST: 3: 0xABCD1234 @ 0x060B9148
        ...770 lines omitted...
        TEST: 774: 0xABCD1234 @ 0x060B9148
        TEST: 775: 0xABCD1234 @ 0x060B9148
        TEST: 776: 0xABCD1234 @ 0x060B9148
        TEST: 5: 0xDEADBEEF @ 0x0605E148
        TEST: 6: 0xDEADBEEF @ 0x0605E148
        TEST: 777: 0xABCD1234 @ 0x060B9148
        TEST: 778: 0xABCD1234 @ 0x060B9148
        TEST: 779: 0xABCD1234 @ 0x060B9148
        TEST: 780: 0xABCD1234 @ 0x060B9148
        TEST: 781: 0xABCD1234 @ 0x060B9148
        TEST: 782: 0xABCD1234 @ 0x060B9148
        TEST: 7: 0xDEADBEEF @ 0x0605E148
        TEST: 8: 0xDEADBEEF @ 0x0605E148
        TEST: 783: 0xABCD1234 @ 0x060B9148
        TEST: 784: 0xABCD1234 @ 0x060B9148
        TEST: 785: 0xABCD1234 @ 0x060B9148
        TEST: 786: 0xABCD1234 @ 0x060B9148

      • bodyfour says:

        Well, at least we know the cause then...

        Couple ideas come to mind:

        • So these are all getting linked into the same binary, right? Can you control the order that they're passed to ld in Xcode? You might try putting yarandom.o first or last and see if that makes a difference.
        • My personal guess is that the calling conventions are different in ObjC vs. C, and because of that you're somehow getting two copies of the .o emitted. I don't really know enough about ObjC to do much but hand-wave, but I'm thinking about how C++ does that name-mangling thing in order to support function overloading. That's why, when you're linking C++ to C, you need to wrap the prototype in extern "C" { ... } to tell the compiler "Hey, this really is the name of the function I'm calling; don't mangle it." I wonder if ObjC has something similar and it's screwing you. Maybe running "nm foo.saver" would give you some hints.
        • jwz says:

          Mangling only applies to C++ functions, though; the extern "C" thing says something along the lines of "this is a C function, not a method in the current context", right? Even in a C++ executable, C function names are what you expect; it's only the methods that are wacky. Anyway, nm says there's only one copy of ya_random() and bad_craziness in the actual executable...

          • bodyfour says:

            > "this is a C function, not a method in the current context", right?

            No, because C++ allows you to have overloaded global functions. For instance, this is valid C++:

              int foo(int val) {
                return val;
              }
              int foo(char *a, char *b) {
                return *a == *b;
              }

            ...which generates:

              % objdump -t a.o | grep foo
              00000000 g F .text 00000008 _Z3fooi
              00000008 g F .text 00000017 _Z3fooPcS_

            So when C++ code calls an external function, it normally links against the type-mangled version of the name. If the prototype was declared extern "C", it links against the unmangled name instead.
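For reference, the usual way to make a C header safe for C++ callers is to wrap its prototypes in an extern "C" block that only C++ compilers see. Plain Objective-C already uses C linkage for C functions, so this guard matters only for C++ or Objective-C++ callers. The two prototypes are assumptions modeled on the calls in this post (ya_rand_init() takes an unsigned int seed per the test code; the ya_random() signature is a guess), so check them against the real yarandom.h.

```c
/* yarandom.h (sketch; prototypes assumed, not copied from the real file) */
#ifndef YARANDOM_H
#define YARANDOM_H

#ifdef __cplusplus
extern "C" {            /* C++ callers: use the unmangled C symbol names */
#endif

void ya_rand_init (unsigned int seed);
unsigned int ya_random (void);

#ifdef __cplusplus
}
#endif

#endif /* YARANDOM_H */
```

Compiled as C, the guard disappears entirely and the header is just two prototypes.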

            > Anyway, nm says there's only one copy of ya_random() and bad_craziness in the actual executable...

            Yeah, but clearly it's lying, right? Maybe adding the -m option will provide more enlightenment?

            • jwz says:

              nm -m says:

              00021148 (__DATA,__data) non-external _bad_craziness
              0000b060 (__TEXT,__text) external _ya_random
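Those numbers line up with the TEST log. Assuming the nm value 0x21148 is bad_craziness's offset from the bundle's load address (nm reads the file on disk, which holds only one copy), subtracting it from each runtime address gives the base each copy was loaded at. Both results come out page-aligned, which is consistent with the same image being mapped twice, 0x5B000 bytes apart. A small C check of that arithmetic; the macro and function names are made up:

```c
#include <stdint.h>

/* Numbers copied from the nm -m output and the TEST log in this thread.
   Assumption: runtime address = load base + nm offset. */
#define NM_OFFSET 0x00021148u   /* _bad_craziness in the file on disk */
#define ADDR_ONE  0x0605E148u   /* the copy that saw 0xDEADBEEF */
#define ADDR_TWO  0x060B9148u   /* the copy stuck at 0xABCD1234 */
#define PAGE_4K   0x1000u       /* assumed page size */

/* Base address the image must have been loaded at for runtime_addr to
   hold a variable living NM_OFFSET bytes into it. */
static uint32_t load_base (uint32_t runtime_addr) {
  return runtime_addr - NM_OFFSET;
}

/* 1 if addr sits exactly on a 4 KiB page boundary. */
static int is_page_aligned (uint32_t addr) {
  return (addr % PAGE_4K) == 0;
}
```

Under that assumption the two bases are 0x0603D000 and 0x06098000.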

              Incidentally, ObjC "mangling" looks like this, which maybe means "nm" is silently de-mangling it for me, or maybe means that's really how they're stored:

              0000c1d4 (__TEXT,__text) non-external -[XScreenSaverView animateOneFrame]

              It looks like the two "worlds" are in fact divided along language lines, which means I guess I can work around this by never calling some function X from both C and ObjC, where X has side effects on global data. (Specifically, call ya_rand_init() from C code instead of ObjC.) That's pretty sketchy, though, and I wish I understood why this is happening.

              • duskwuff says:

                ObjC is mangled somewhat, but not heavily. -[XScreenSaverView animateOneFrame] would mangle to _XScreenSaverView:animateOneFrame, for example.

                Your problem isn't related to cross-calling between C and ObjC, though. That much is very well defined: namely, nothing untoward happens, or should happen. What I'd be more concerned about is that more than one instance of your screensaver may be instantiated at once - especially if you have a multiheaded machine - and you might be running into race conditions.
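If concurrent instances were racing on setup, the standard POSIX tool for one-time initialization is pthread_once(). A minimal sketch, assuming a hypothetical yarandom_ensure_init() that every instance calls before using the generator. The pthread_once_t is itself static data, so it guarantees "exactly once" per loaded copy of the file, not across two separately loaded copies. Older glibc needs -pthread at link time.

```c
#include <pthread.h>

/* Hypothetical stand-ins for yarandom.c's state. */
static unsigned int bad_craziness = 0xABCD1234;
static int init_calls = 0;          /* counts how often setup really ran */

static pthread_once_t yarandom_once = PTHREAD_ONCE_INIT;

/* The one-time setup; pthread_once() guarantees a single run. */
static void yarandom_init_once (void) {
  init_calls++;
  bad_craziness = 0xDEADBEEF;
}

/* Hypothetical entry point each saver instance would call first.
   Safe from any thread; the setup runs once per loaded copy of
   this file, because yarandom_once is itself per-copy static data. */
static void yarandom_ensure_init (void) {
  pthread_once (&yarandom_once, yarandom_init_once);
}
```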

                • jwz says:

                  It is definitely the case that more than one instance of each saver will get created. In fact, my test harness (where this problem does not occur) makes two of them, to test that. But when I fire it up in System Preferences, this problem occurs -- and in that case, there's only one instance of the saver (the small preview window), though eventually there will be 3 (preview + 2 monitors).

                  This problem can't be because there are multiple instances of ScreenSaverView, because the data that is being duplicated is in C code. Also, as far as I know, it's not possible to load the same bundle twice: the API prevents that:

                    NSBundle *bundle = [NSBundle bundleWithPath:path];
                    Class new_class = [bundle principalClass];
                    id instance = [new_class alloc];

                  It is also a mystery why this problem happens in System Preferences but not in my test harness, both of which are dynamically loading the same .saver bundle.

                  • duskwuff says:

                    In the absence of any better advice, may I suggest you inquire at cocoa-dev? There's certain to be someone there who'd know better than I what your problem might be, and what to do about it.

                  • pphaneuf says:

                    I don't really see how that snippet shows that the API prevents loading the same bundle twice. For all you know, calling that [NSBundle bundleWithPath] method twice does load it twice and gives you different NSBundles.

                    As a component hacker (I used to fiddle around in XPCOM, many years ago, eww!), I consider it a problem that the Linux and Solaris ELF loaders (dlopen) will "nicely" notice that you are opening the same shared library twice and hand you a reference to the already-loaded one. If two pieces of unrelated code (say, different libraries) each call dlopen() and expect the library's global state to be in its initial state, the second one is in for a surprise.

                    So, in short, one person's feature is another's bug, it would seem...
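pphaneuf's point about dlopen() is easy to check on Linux: the glibc dlopen(3) man page says that opening an already-loaded object again returns the same handle and bumps a reference count. A minimal sketch; LIBNAME ("libm.so.6") is an assumption that holds on glibc-based Linux, and older glibc needs -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Assumed library name: the math library on glibc-based Linux. */
#define LIBNAME "libm.so.6"

/* Open the same shared library twice. Returns 1 if the loader handed
   back the same handle both times, 0 if not, -1 if dlopen() failed. */
static int same_handle_twice (void) {
  void *h1 = dlopen (LIBNAME, RTLD_NOW | RTLD_LOCAL);
  void *h2 = dlopen (LIBNAME, RTLD_NOW | RTLD_LOCAL);
  int same;
  if (h1 == NULL || h2 == NULL) {
    fprintf (stderr, "dlopen: %s\n", dlerror ());
    if (h1) dlclose (h1);
    if (h2) dlclose (h2);
    return -1;
  }
  same = (h1 == h2);
  /* Each successful dlopen() took a reference, so close both. */
  dlclose (h2);
  dlclose (h1);
  return same;
}
```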

  2. gen_witt says:

    Make bad_craziness volatile? You may be running into a faulty compiler optimization.
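Applied to the test code from the post, that suggestion is a one-word change. volatile makes the compiler perform every access to bad_craziness as written instead of caching it in a register, but it is still a property of one object: it would not make two separately loaded copies of the variable share a value. The functions mirror only the lines shown in the post; the real seeding code is omitted.

```c
#include <stdlib.h>

/* volatile: every read and write below really touches memory. */
static volatile unsigned int bad_craziness = 0xABCD1234;

void TEST (void) {
  if (bad_craziness != 0xDEADBEEF) abort ();
}

void ya_rand_init (unsigned int seed) {
  (void) seed;                /* the real seeding code is omitted */
  bad_craziness = 0xDEADBEEF;
}
```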

  3. bodyfour says:

    Programming is hard. Let's go shopping.

    And by "shopping" I mean "drinking". It's 11:30pm on Saturday night.

  4. pphaneuf says:

    Could your bundle be loaded twice by the screen saver framework? If I recall correctly, the Mac OS X linker has a feature specifically for loading the same bundle more than once, independently, so that each copy is well insulated from the other. But this seems like only half-assed insulation to me, where you get one copy, and then the other.

  5. danomite55 says:

    Out of curiosity, how many displays do you have? If you have multiple displays on your computer, there will be an instance of your ScreenSaverView per display. Not sure if that would impact this or not...

  6. awkward_2 says:

    It's sadly not very helpful, but I found recently that using NSAddImage() to load one of my own shared libraries was giving me the sort of symptoms you describe. Switching to dlopen with the RTLD_LOCAL flag fixed it.

    In that case I controlled 'both ends', so the fix was easier, but frankly I think it's a bug that this sort of thing can go on in such a weird way regardless of the API used.
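For comparison, the shape of a dlopen() call with RTLD_LOCAL, which per the dlopen man page keeps the library's symbols from being used to resolve references in objects loaded later; its symbols are then reached explicitly through dlsym() on the handle. This is not awkward_2's code: the library name "libm.so.6", the symbol cos, and call_cos_locally() are assumptions chosen so the sketch runs on glibc-based Linux (older glibc needs -ldl).

```c
#include <dlfcn.h>
#include <stdio.h>

typedef double (*cos_fn) (double);

/* Open the library privately (RTLD_LOCAL), look cos() up through the
   handle, call it, and close the handle again. Returns 1 on success. */
static int call_cos_locally (double x, double *result) {
  void *handle = dlopen ("libm.so.6", RTLD_NOW | RTLD_LOCAL);
  cos_fn f;
  if (handle == NULL) {
    fprintf (stderr, "dlopen: %s\n", dlerror ());
    return 0;
  }
  /* A common idiom for turning dlsym()'s void * into a function pointer. */
  *(void **) (&f) = dlsym (handle, "cos");
  if (f == NULL) {
    fprintf (stderr, "dlsym: %s\n", dlerror ());
    dlclose (handle);
    return 0;
  }
  *result = f (x);
  dlclose (handle);
  return 1;
}
```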