OpenGL testing: epilogue

So, that worked. But it turned out not to help after all!

When loading images, I used to do this:

  • Get a Pixmap as an XImage in whatever form the X server hands us;
  • Convert that XImage to 32-bit RGBA in client-local endianness;
  • Create an OpenGL texture from it using GL_RGBA / GL_UNSIGNED_BYTE.
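
Roughly, the conversion step amounted to this (a sketch, not the actual xscreensaver code; red_shift and friends stand for values derived from the visual's masks, and it assumes 8 bits per channel):

    /* Flatten whatever the server handed us into client-endian RGBA.
       XGetPixel is the slow, fully generic way to read an XImage. */
    unsigned char *rgba = malloc (ximage->width * ximage->height * 4);
    int x, y, i = 0;
    for (y = 0; y < ximage->height; y++)
      for (x = 0; x < ximage->width; x++)
        {
          unsigned long p = XGetPixel (ximage, x, y);
          rgba[i++] = (p & visual->red_mask)   >> red_shift;
          rgba[i++] = (p & visual->green_mask) >> green_shift;
          rgba[i++] = (p & visual->blue_mask)  >> blue_shift;
          rgba[i++] = 0xFF;
        }
    glTexImage2D (GL_TEXTURE_2D, 0, GL_RGBA, ximage->width, ximage->height,
                  0, GL_RGBA, GL_UNSIGNED_BYTE, rgba);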

That had been working for some time, but it was slow: copying and converting the image cost me about 0.1 second per image. I figured I could speed that up by cutting out that "conversion" phase and doing it like this instead:

  • Get a Pixmap as an XImage in whatever form the X server hands us;
  • Figure out the way to express that form to OpenGL, and create the texture from the raw data, using, e.g., GL_BGRA / GL_UNSIGNED_INT_8_8_8_8_REV.
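
The idea, sketched for the common 32-bit case (the real logic has to look at depth, bits_per_pixel, byte order, and the channel masks, and the packed types need OpenGL 1.2):

    GLenum format = GL_RGBA;
    GLenum type   = GL_UNSIGNED_BYTE;
    if (ximage->bits_per_pixel == 32 &&
        ximage->red_mask   == 0x00FF0000 &&
        ximage->green_mask == 0x0000FF00 &&
        ximage->blue_mask  == 0x000000FF)
      {
        format = GL_BGRA;                      /* each pixel is 0xAARRGGBB */
        type   = GL_UNSIGNED_INT_8_8_8_8_REV;  /* packed, host byte order */
      }
    glPixelStorei (GL_UNPACK_ALIGNMENT, 4);
    glTexImage2D (GL_TEXTURE_2D, 0, GL_RGBA, ximage->width, ximage->height,
                  0, format, type, ximage->data);   /* no conversion pass */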

strangehours had the inspiration for a sensible way to compute the GL format/type values to use, and with that last test, it was working.

But then when I plugged it in to the real code, I found that it had gotten slower instead of faster! Apparently if you use a "packed" type, instead of using GL_UNSIGNED_BYTE and passing each color component in separately, GL takes six times longer to construct the texture. I guess the way packed types are implemented internally is by converting them to something else first, and that conversion is even slower than the conversion I had been doing originally.

So, yay, that was a big waste of time.

The actual problem I'm trying to solve here is that when you run the glslideshow screensaver (the one that pans/zooms through a series of images, in a direct ripoff of the MacOS X slideshow screen saver) there's a visible glitch every time a new image loads. In the currently-released version of xscreensaver, that glitch could freeze the animation for up to a couple seconds, since it was waiting for the image to be loaded from disk and everything.

I've fixed most of that problem by loading the image file in the background. Once the image data is in memory, I get signalled, and only then have to stop and convert it to a texture. So that glitch is down to about 0.1 or 0.2 seconds now. But I was trying to shave some more off that time (which was the point of that whole exercise earlier.)

In glslideshow, image loading happens in four stages:

  1. Fork a process and run xscreensaver-getimage in the background. This writes image data to a server-side X pixmap.
  2. When that completes, a callback informs us that the pixmap is ready. Then we download the pixmap data from the server with XGetImage (or XShmGetImage), as sketched below.
  3. Convert the XImage data to a form OpenGL can use.
  4. Finally, construct a texture.
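
Step 2, roughly (the non-SHM variant; dpy, pixmap, width, and height are whatever we already have on hand):

    /* Pull the pixmap's contents down from the server once the
       callback fires. */
    XImage *ximage = XGetImage (dpy, pixmap, 0, 0, width, height,
                                ~0L, ZPixmap);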

So, the speed of step 1 doesn't really matter, since that happens in the background. But steps 2, 3, and 4 happen in this process, and cause the visible glitch.

Step 2 can't be moved to another process without opening a second connection to the X server, which is pretty heavy-weight. (That would be possible, though; the other process could open an X connection, retrieve the pixmap, and feed it back to us through a pipe or something.)

Step 3 is what I spent the last few days trying to optimize, and failed.

Step 4 is also hard. I can't just fork() and load the texture in another process, because glXCreateContext says:

An arbitrary number of contexts can share a single display-list space. However, all rendering contexts that share a single display-list space must themselves exist in the same address space. Two rendering contexts share an address space if both are nondirect using the same server, or if both are direct and owned by a single process. Note that in the nondirect case, it is not necessary for the calling threads to share an address space, only for their related rendering contexts to share an address space.

So I think that means that the only way two processes can share GL state is if you turn "direct" off, which I think means that they run unaccelerated (or perhaps only "less accelerated"?), because they're going through the GLX protocol instead of talking to the hardware directly.
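
For reference, the display-list sharing and the "direct" flag are the last two arguments to glXCreateContext; a minimal sketch, with dpy and visinfo assumed:

    GLXContext a = glXCreateContext (dpy, visinfo, NULL, False);  /* indirect */
    GLXContext b = glXCreateContext (dpy, visinfo, a,    False);  /* shares with a */

so indirect-plus-shared is the only combination the man page leaves open across address spaces.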

I think that maybe threads running in the same process might be able to share accelerated GL contexts, but xscreensaver doesn't use threads now, and I really don't want to deal with the portability hassle of adding them.

I guess the Apple saver must be doing this by loading the textures in a shared-address-space thread. I think that's probably the only way to make this work.
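
Roughly, a texture-loading thread would look something like this (a sketch only: synchronization, error handling, and the worker's drawable are hand-waved; main_context, worker_drawable, texture_data, and announce_texture_ready are placeholders, and XInitThreads() would be needed too):

    static void *
    texture_loader_thread (void *arg)
    {
      /* Both contexts are direct and owned by one process, so per the
         man page above they may share texture objects. */
      GLXContext worker = glXCreateContext (dpy, visinfo, main_context, True);
      glXMakeCurrent (dpy, worker_drawable, worker);

      GLuint tex;
      glGenTextures (1, &tex);
      glBindTexture (GL_TEXTURE_2D, tex);
      glTexImage2D (GL_TEXTURE_2D, 0, GL_RGBA, width, height,
                    0, GL_RGBA, GL_UNSIGNED_BYTE, texture_data);
      glFinish ();                       /* make sure the upload really happened */

      announce_texture_ready (tex);      /* hand the texture id to the main thread */

      glXMakeCurrent (dpy, None, NULL);
      glXDestroyContext (dpy, worker);
      return 0;
    }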


Also, the API for shared-memory XImages is just stupid. There's so much book-keeping you need to do around them that I'm pretty sure I'm leaking shared memory segments, but fuck if I know what to do about it... Seriously, go read the code in xscreensaver/utils/xshm.c and feel the pain! The XShmSegmentInfo data has to have "at least" the lifetime of the XImage itself, so you've got two things you need to pass around to every user. There is a destroy hook on XImages themselves, but (I think) the hook on SHM images frees only the server side of the shared segment, not the client side; you have to do that explicitly.
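
For the record, the dance is roughly this (the standard XShm pattern, not a paste from xshm.c; error handling and the XSync() you need before touching the segment are omitted):

    #include <sys/ipc.h>
    #include <sys/shm.h>
    #include <X11/extensions/XShm.h>

    /* Creation: the XShmSegmentInfo has to outlive the XImage, so both
       get passed around together. */
    XShmSegmentInfo shminfo;
    XImage *img = XShmCreateImage (dpy, visual, depth, ZPixmap,
                                   NULL, &shminfo, width, height);
    shminfo.shmid    = shmget (IPC_PRIVATE,
                               img->bytes_per_line * img->height,
                               IPC_CREAT | 0777);
    shminfo.shmaddr  = img->data = shmat (shminfo.shmid, 0, 0);
    shminfo.readOnly = False;
    XShmAttach (dpy, &shminfo);

    /* ... XShmGetImage (dpy, pixmap, img, 0, 0, ~0L), etc. ... */

    /* Teardown: the destroy hook does not do all of this for you. */
    XShmDetach (dpy, &shminfo);             /* server lets go of the segment */
    XDestroyImage (img);                    /* frees the XImage struct */
    shmdt (shminfo.shmaddr);                /* unmap our side */
    shmctl (shminfo.shmid, IPC_RMID, 0);    /* finally delete the segment */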

I'm tempted to just turn off XSHM in xscreensaver under the assumption that on modern machines, the speed advantage isn't worth the hassle.

14 Responses:

  1. duskwuff says:

    First of all, is there any reason you HAVE to go through the X server (or SHM madness) and use XImages to pass around images instead of using "normal" shared memory? If the API's so stupid, then...

    Also, is there any reason step 3 can't take place in the helper app?

    Finally, I can't be sure, but one possibility is that Apple's slideshow saver just loads all of the images into (texture?) memory when it loads. The default slideshows are around a dozen 1024x768 images each - that'd be 36 MB of unpacked data. My machine's got 64 MB of VRAM, so that's a distinct possibility.

    On the other hand, I just ran gdb on Apple's slideshow[1], and it shows up with two threads. I can't tell exactly what the second thread is doing without some more work, but you're probably correct that it's loading images.

    [1]: cd /System/Library/Frameworks/ScreenSaver.framework; cd Versions/Current/Resources/; gdb --args Contents/MacOS/ScreenSaverEngine -debug -module Forest (or whatever -- and no, the CWD doesn't matter)

  2. nothings says:

    Your speed problem is probably because you're generating mipmaps, which is going to be slow with the packed formats since it's done in software and is either unpacking to bytes and repacking, or using some really inefficient general purpose mask-and-shift code. You could check whether just using glTexImage2D() is faster. (I could have sworn I mentioned this the first time around, but I may have spaced. Also, higher-end hardware can do the mipmap generation in hardware, but I don't know how this shakes out, or whether the glu interface takes advantage of it.)

    However, if you need mipmaps for visual quality reasons, you can just generate them in software across multiple frames, and then finally download them all at once when they're ready. If the downloading of them is slow, you can download them in chunks with glTexSubImage2D().
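
    Roughly, the alternatives look like this (w, h, rgba, and the strip bookkeeping stand in for your data):

      /* Slow: gluBuild2DMipmaps() filters and uploads every mipmap
         level, all in software. */
      gluBuild2DMipmaps (GL_TEXTURE_2D, GL_RGBA, w, h,
                         GL_RGBA, GL_UNSIGNED_BYTE, rgba);

      /* Fast: upload only the base level, with a non-mipmap filter. */
      glTexParameteri (GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
      glTexImage2D (GL_TEXTURE_2D, 0, GL_RGBA, w, h,
                    0, GL_RGBA, GL_UNSIGNED_BYTE, rgba);

      /* Or, after allocating the texture once (glTexImage2D with a NULL
         pointer), feed it in horizontal strips, one per frame: */
      glTexSubImage2D (GL_TEXTURE_2D, 0, 0, y_offset, w, strip_height,
                       GL_RGBA, GL_UNSIGNED_BYTE, rgba + y_offset * w * 4);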

    • jwz says:

      Yeah, I pretty much need mipmaps in the places they are being used; the aliasing is pretty jarring otherwise.

      • nothings says:

        Actually, I guess the other big speed hit comes from not using texture maps whose dimensions are powers of two; that's all the hardware can handle, so gluBuild2DMipmaps() will automatically rescale to the nearest power of two, which is (a) slow and (b) not so great since it induces double-sampling. glTexImage2D() won't rescale.

        1. If your textures are not power-of-two in dimension, see if you can switch. If you can't switch, pad them out to a power-of-two, and adjust your texture coordinates to only use the defined part of it (see the sketch after this list).

        2. Once your texture is power-of-two, switch to calling glTexImage2D() and see if your glitch improves (even though the visual quality will suck without mipmaps).

        3. If it does, here's some never-compiled code to generate mipmaps reasonably efficiently without dropping to assembly: mipmap.txt, although since it's never been compiled or tested you might want to find some other source that has been.
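
        For item 1, the padding amounts to something like this (a sketch; w, h, and rgba stand in for your image, 4 bytes per pixel):

          /* Round up to the next power of two, copy the image into the
             corner, and only map texture coordinates over the used part. */
          int tw = 1, th = 1;
          while (tw < w) tw <<= 1;
          while (th < h) th <<= 1;
          unsigned char *padded = calloc (tw * th, 4);
          int y;
          for (y = 0; y < h; y++)
            memcpy (padded + y * tw * 4, rgba + y * w * 4, w * 4);
          glTexImage2D (GL_TEXTURE_2D, 0, GL_RGBA, tw, th,
                        0, GL_RGBA, GL_UNSIGNED_BYTE, padded);
          /* When drawing, use s in [0, (float) w / tw] and
             t in [0, (float) h / th]. */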

  3. edge_walker says:

    I was wondering about the same thing as zetawoof. Why not load the image and convert it to something GL can use in the helper process, without involving X at all, then feed the whole shebang back to the main process with a pipe? That would leave only step 4 in the main process, which you really can't do much about. (Though nothings' suggestions might help take the spike out of that? I've written a lot of blitting code but I know jack about GL.)

  4. gen_witt says:

    Forgive me, but why are you transmitting the image to the server and then back again? Both the loader and the screensaver are on the (same) client. Couldn't you just pass the image data (say, through an unnamed pipe)? That would cut down on some of the conversions at least. Also, couldn't you do the background loading in one thread by doing some event-pump voodoo (non-blocking I/O, then wait on either an X event or an I/O event, etc.)? Although in truth this sounds like a hell of a lot of work.

    • gen_witt says:

      It would be smart of me to read the other comments first, wouldn't it.

    • jwz says:

      Yes, it's kind of dumb that the image is making three trips over the pipe before getting displayed, but that's because the framework the OpenGL programs use is kind of an afterthought.

      With a normal X program, this makes more sense: you load images by calling xscreensaver-getimage, which hands you a server-side pixmap; the X clients operate on that pixmap and no round-trips are necessary. But in the GL case, turning a pixmap into a texture necessitates another round trip.

      In the case where xscreensaver-getimage is grabbing an image of the desktop, it doesn't move any data between client and server; it just copies the desktop to a pixmap on the server side, and it's done. So in that case, normal X programs do no round trips at all.

      • nothings says:

        Can you draw the pixmap onto the GL-accessible framebuffer without using GL, and then glCopyTexImage2D() to grab it? (This still leaves mipmaps, but there are solutions to that too.) I guess there are size issues for the desktop unless you do it in little blocks or something.

        • jwz says:

          I don't think there's any way to do that. I think that glXCreateGLXPixmap() creates a shared area that you can draw into with GL commands but not with X commands. But, I couldn't even get that far (glXMakeCurrent just core dumped on me.)

          • gen_witt says:

            You need to pass glXCreateGLXPixmap an X Pixmap, which from what I can tell you can draw into provided you use wait judiciously. From the way the fourth edition of the red book is written, it looks like you can:

            glxpixmap = glXCreateGLXPixmap (dpy, visinfo, pixmap);

            I'm sure there's a bunch of GLContext shit I'm missing or misunderstanding, because basically I hate/suck at X and GLX. Of course, who knows, a particular GL/GLX driver may just default to software emulation and loads of message passing when you start doing things like this.

            Although, as an aside, I've used X drawing and GL drawing commands side by side in an X window, with FLTK doing all the real work for me; which is to say, I know it can be done.

            • Did you really just state that it works... "as long as you use wait judiciously"?

              Aren't things like OpenGL supposed to get us the fuck away from the old happy horseshit of timing loops to wait for hardware to behave?

              • gen_witt says:

                By wait I meant glXWaitX and glXWaitGL, which synchronise GL and X drawing calls, because when you make a GL call the drawing doesn't happen until the driver/hardware feels that it would be efficient.
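
                For instance (dpy, window, and gc being whatever you're already drawing with):

                  glClear (GL_COLOR_BUFFER_BIT);       /* GL drawing */
                  glXWaitGL ();    /* don't let X overtake queued GL work */
                  XDrawRectangle (dpy, window, gc,
                                  10, 10, 100, 100);   /* X drawing */
                  glXWaitX ();     /* don't let GL overtake queued X work */
                  glRectf (-0.5, -0.5, 0.5, 0.5);      /* more GL drawing */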

            • jwz says:

              I tried that a few days ago, and I can't make it work. I just get core dumps in the second call to glXCreateContext (if the first one had used "direct") or no data (if neither uses "direct".) The code I tried is in test-texture4.c