So, that worked. But it turned out not to help after all!
When loading images, I used to do this:
- Get a Pixmap as an XImage in whatever form the X server hands us;
- Convert that XImage to 32-bit RGBA in client-local endianness;
- Create an OpenGL texture from it using GL_RGBA / GL_UNSIGNED_BYTE.
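In code, the old path looked roughly like this. This is a minimal sketch, not the actual xscreensaver code: texture_from_ximage_slow is a made-up name, and the pixel unpacking assumes a common 24-bit TrueColor visual (the real code has to cope with many more):

    #include <X11/Xlib.h>
    #include <X11/Xutil.h>
    #include <GL/gl.h>
    #include <stdlib.h>

    static GLuint
    texture_from_ximage_slow (XImage *in)
    {
      int x, y;
      GLuint tex;
      unsigned char *rgba = malloc (in->width * in->height * 4);
      unsigned char *out = rgba;

      /* Phase 1: convert whatever the server handed us to RGBA.
         XGetPixel is the generic (and slow) per-pixel accessor. */
      for (y = 0; y < in->height; y++)
        for (x = 0; x < in->width; x++)
          {
            unsigned long p = XGetPixel (in, x, y);
            *out++ = (p >> 16) & 0xFF;   /* R -- assumes 0xFF0000 red mask */
            *out++ = (p >>  8) & 0xFF;   /* G */
            *out++ =  p        & 0xFF;   /* B */
            *out++ = 0xFF;               /* A */
          }

      /* Phase 2: hand the converted buffer to GL. */
      glGenTextures (1, &tex);
      glBindTexture (GL_TEXTURE_2D, tex);
      glTexImage2D (GL_TEXTURE_2D, 0, GL_RGBA, in->width, in->height,
                    0, GL_RGBA, GL_UNSIGNED_BYTE, rgba);
      free (rgba);
      return tex;
    }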
That had been working for some time, but it was slow: copying and converting the image cost me about 0.1 seconds per image. I figured I could speed that up by cutting out the conversion phase and doing it like this instead:
- Get a Pixmap as an XImage in whatever form the X server hands us;
- Figure out the way to express that form to OpenGL, and create the texture from the raw data, using, e.g., GL_BGRA / GL_UNSIGNED_INT_8_8_8_8_REV.
strangehours provided the insight for the sensible way to compute the GL format/type values to use, and with that last test, it was working.
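The gist of that computation, assuming a 32-bits-per-pixel TrueColor XImage on a little-endian client (a big-endian client would swap the REV and non-REV cases, the real code has to check many more mask/depth combinations, and gl_format_of_ximage is a made-up name; note that GL_BGRA and the packed 8_8_8_8 types need OpenGL 1.2):

    #include <X11/Xlib.h>
    #include <GL/gl.h>

    static int
    gl_format_of_ximage (XImage *in, GLenum *format, GLenum *type)
    {
      if (in->bits_per_pixel == 32 &&
          in->red_mask   == 0x00FF0000 &&
          in->green_mask == 0x0000FF00 &&
          in->blue_mask  == 0x000000FF)
        {
          *format = GL_BGRA;
          /* LSBFirst data is B,G,R,X in memory, which is what the
             "reversed" packed type describes on a little-endian host. */
          *type = (in->byte_order == LSBFirst
                   ? GL_UNSIGNED_INT_8_8_8_8_REV
                   : GL_UNSIGNED_INT_8_8_8_8);
          return 1;
        }
      return 0;   /* unrecognized layout: fall back to converting */
    }

With that, the texture comes straight from the raw XImage data: glTexImage2D (GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0, format, type, in->data), with no conversion pass at all.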
But then when I plugged it into the real code, I found that it had gotten slower instead of faster! Apparently if you use a "packed" type, instead of using GL_UNSIGNED_BYTE and passing each color component in separately, GL takes six times longer to construct the texture. I guess packed types are implemented internally by converting them to something else first, and that conversion is even slower than the conversion I had been doing originally.
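For what it's worth, measuring this kind of thing needs a glFinish() before reading the clock, since glTexImage2D may return before the driver has actually done the work. A hypothetical harness, not the actual test (time_texture_upload is a made-up name):

    #include <sys/time.h>
    #include <GL/gl.h>

    static double
    time_texture_upload (int w, int h, GLenum format, GLenum type,
                         const void *data)
    {
      struct timeval start, end;
      gettimeofday (&start, 0);
      glTexImage2D (GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0,
                    format, type, data);
      glFinish ();   /* wait for the driver to actually do the work */
      gettimeofday (&end, 0);
      return ((end.tv_sec  - start.tv_sec) +
              (end.tv_usec - start.tv_usec) / 1000000.0);
    }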
So, yay, that was a big waste of time.
The actual problem I'm trying to solve here is that when you run the glslideshow screensaver (the one that pans/zooms through a series of images, in a direct ripoff of the MacOS X slideshow screen saver) there's a visible glitch every time a new image loads. In the currently-released version of xscreensaver, that glitch could freeze the animation for up to a couple seconds, since it was waiting for the image to be loaded from disk and everything.
I've fixed most of that problem by loading the image file in the background. Once the image data is in memory, I get signalled, and only then have to stop and convert it to a texture. So that glitch is down to about 0.1 or 0.2 seconds now. But I was trying to shave some more off that time (which was the point of that whole exercise earlier.)
In glslideshow, image loading happens in four stages:
1. Fork a process and run xscreensaver-getimage in the background; this writes image data to a server-side X pixmap (sketched just after this list).
2. When that completes, a callback informs us that the pixmap is ready. Then we download the pixmap data from the server with XGetImage (or XShmGetImage).
3. Convert the XImage data to a form OpenGL can use.
4. Finally, construct a texture.
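Stage 1 looks something like this (a sketch with error handling stripped; start_image_loader is a made-up name, and the window-ID argument convention is approximate):

    #include <X11/Xlib.h>
    #include <sys/types.h>
    #include <stdio.h>
    #include <unistd.h>

    static pid_t
    start_image_loader (Window window)
    {
      pid_t pid = fork ();
      if (pid == 0)
        {
          /* Child: xscreensaver-getimage picks an image source and
             renders it onto the given window/pixmap, then exits. */
          char id[32];
          sprintf (id, "0x%lx", (unsigned long) window);
          execlp ("xscreensaver-getimage", "xscreensaver-getimage",
                  id, (char *) 0);
          _exit (1);   /* exec failed */
        }
      /* Parent: return immediately; a SIGCHLD handler fires the
         "pixmap is ready" callback later. */
      return pid;
    }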
So, the speed of step 1 doesn't really matter, since that happens in the background. But steps 2, 3, and 4 happen in this process, and cause the visible glitch.
Step 2 can't be moved to another process without opening a second connection to the X server, which is pretty heavy-weight. (That would be possible, though; the other process could open an X connection, retrieve the pixmap, and feed it back to us through a pipe or something.)
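Something like this, if I ever bothered (a sketch of that hypothetical helper process, error handling omitted; pixmap_to_pipe is a made-up name):

    #include <X11/Xlib.h>
    #include <unistd.h>

    static void
    pixmap_to_pipe (const char *display_name, Pixmap p,
                    int width, int height, int fd)
    {
      /* The whole point: this is a *second* connection to the
         server, owned by the helper process. */
      Display *dpy = XOpenDisplay (display_name);
      XImage *im = XGetImage (dpy, p, 0, 0, width, height,
                              ~0L, ZPixmap);
      write (fd, im->data, im->bytes_per_line * im->height);
      XDestroyImage (im);
      XCloseDisplay (dpy);
    }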
Step 3 is what I spent the last few days trying to optimize, and failed.
Step 4 is also hard. I can't just fork() and load the texture in another process, because the glXCreateContext man page says:
An arbitrary number of contexts can share a single display-list space. However, all rendering contexts that share a single display-list space must themselves exist in the same address space. Two rendering contexts share an address space if both are nondirect using the same server, or if both are direct and owned by a single process. Note that in the nondirect case, it is not necessary for the calling threads to share an address space, only for their related rendering contexts to share an address space.
So I think that means that the only way two processes can share GL state is if you turn "direct" off, which I think means that they run unaccelerated (or perhaps only "less accelerated"?), because they're going through the GLX protocol instead of talking to the hardware directly.
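In other words, the only arrangement the man page permits across address spaces is something like this (a sketch; make_shared_indirect_context is a made-up name, and actually handing a context from one process to another would additionally need something like the GLX_EXT_import_context extension):

    #include <GL/glx.h>

    static GLXContext
    make_shared_indirect_context (Display *dpy, XVisualInfo *vi,
                                  GLXContext share)
    {
      /* direct = False: render through the GLX protocol, so the
         context lives server-side and can share a display-list
         space with another nondirect context -- at the cost of
         giving up direct hardware access. */
      return glXCreateContext (dpy, vi, share, False);
    }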
I think that maybe threads running in the same process might be able to share accelerated GL contexts, but xscreensaver doesn't use threads now, and I really don't want to deal with the portability hassle of adding them.
I guess the Apple saver must be doing this by loading the textures in a shared-address-space thread. I think that's probably the only way to make this work.
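If I did do it with threads, I gather it would look roughly like this (a sketch only, with made-up names, glossing over XInitThreads, locking, and how the main loop learns that the texture is done):

    #include <GL/glx.h>
    #include <pthread.h>

    typedef struct {
      Display *dpy;
      GLXDrawable drawable;
      GLXContext loader_ctx;   /* created with the main context as `share' */
      XImage *image;
      GLuint texture;          /* result */
    } loader_args;

    static void *
    texture_loader_thread (void *closure)
    {
      loader_args *a = (loader_args *) closure;
      glXMakeCurrent (a->dpy, a->drawable, a->loader_ctx);
      glGenTextures (1, &a->texture);
      glBindTexture (GL_TEXTURE_2D, a->texture);
      glTexImage2D (GL_TEXTURE_2D, 0, GL_RGBA,
                    a->image->width, a->image->height, 0,
                    GL_RGBA, GL_UNSIGNED_BYTE, a->image->data);
      glFinish ();   /* make the texture visible to the sharing context */
      return 0;
    }

    /* In the main thread, once:
         loader_ctx = glXCreateContext (dpy, vi, main_ctx, True);
       then per image:
         pthread_create (&tid, 0, texture_loader_thread, &args);   */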
Blah.
Also, the API for shared-memory XImages is just stupid. There's so much book-keeping you need to do around them that I'm pretty sure I'm leaking shared memory segments, but fuck if I know what to do about it... Seriously, go read the code in xscreensaver/utils/xshm.c and feel the pain! The XShmSegmentInfo data has to have "at least" the lifetime of the XImage itself, so you've got two things you need to pass around to every user. There is a destroy hook on XImages themselves, but (I think) the hook on SHM images frees only the server side of the shared segment, not the client side; you have to do that explicitly.
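For the record, the whole dance, compressed (compare xscreensaver/utils/xshm.c; the helper names here are made up and error checking is omitted):

    #include <X11/Xlib.h>
    #include <X11/extensions/XShm.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    static XImage *
    create_shm_image (Display *dpy, Visual *v, int depth,
                      int width, int height, XShmSegmentInfo *info)
    {
      XImage *im = XShmCreateImage (dpy, v, depth, ZPixmap, 0,
                                    info, width, height);
      info->shmid = shmget (IPC_PRIVATE,
                            im->bytes_per_line * im->height,
                            IPC_CREAT | 0777);
      info->shmaddr = im->data = shmat (info->shmid, 0, 0);
      info->readOnly = False;
      XShmAttach (dpy, info);   /* tell the server to map the segment */
      /* Note: both `im' and `info' must now be carried around
         together until teardown -- the two-object problem. */
      return im;
    }

    static void
    destroy_shm_image (Display *dpy, XImage *im, XShmSegmentInfo *info)
    {
      XShmDetach (dpy, info);              /* server-side detach */
      XDestroyImage (im);                  /* frees the XImage struct only */
      shmdt (info->shmaddr);               /* client-side detach */
      shmctl (info->shmid, IPC_RMID, 0);   /* this is the leak if skipped */
    }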
I'm tempted to just turn off XSHM in xscreensaver under the assumption that on modern machines, the speed advantage isn't worth the hassle.