I think that #1 can be solved without using a toolkit. For a dialog as simple as the password entry box, the "look" of the toolkit really just comes down to colors, fonts, and border widths. Personally, I think the current unlock dialog looks very much like the default GTK theme, but if you disagree, it's easy to tweak it by simply editing the colors and border sizes in the app-defaults file. (If you think you have changes that make it look more "conventional", please send them to me. There are a few examples of other looks in XScreenSaver.ad.)
Reason #2 is much more valid; it's true that it is very hard to make accessibility tools work without using a toolkit.
However, converting that dialog to use a toolkit turns out to be a very difficult thing to do securely.
And if the screen locker is not secure, then it's better to not lock the screen at all: giving the impression of security when there is no actual security is far worse than having no security at all. It's a matter of expectations: if people don't expect to be able to lock their screens, they'll log out. But if they expect to be able to lock their screens and it doesn't actually work, then they're screwed.
Minimal library usage by the xscreensaver daemon.
Let's suppose that down in the bowels of some particular version of some particular toolkit library, there lurks a bug. Let's suppose that the nature of this bug is something relatively obscure: say that it's something like, if you hold down 5 keys on the keyboard for 10 seconds then drag the middle mouse button, the text entry widget gets a SEGV. (In fact, I'm not making this up: I saw this very bug once, years ago.)
Now, that's the sort of bug that is not likely to be noticed or fixed, because it's the sort of thing that people "never" do. If that bug were reported against, say, a web browser, nobody would much care: User: "I can crash my web browser by doing this crazy thing!" Developer: "Uh, don't do that then." And that's not a totally unreasonable response.
However, in the context of security software, it matters, because then it's not merely a cute trick that crashes the program: now it's a backdoor password that unlocks the screen.
Bugs like that will exist in GUI libraries; it's inevitable. The libraries are big, and do many different things. So one way to protect against that problem is to keep the number of libraries used by the xscreensaver daemon to an absolute minimum.
Today, the only libraries that are actually linked into the xscreensaver daemon are Xlib and dependent libraries; and the few crypto-related libraries needed to determine whether a typed password is, in fact, the right one.
That's why I implemented the unlock dialog using only Xlib: not because I think Xlib is a good way to write user interfaces, but because I think this was the safest way. The amount of code in Xlib is very small, and has been extensively security audited. It is very unlikely that there are crashing bugs lurking in Xlib itself. The same cannot be said for larger, more featureful libraries. So, by making minimal use of Xlib (the dialog box is drawn using only the lowest level text-printing and rectangle-drawing routines) we can keep the code path short and auditable.
I am as close to certain as I can be that there is no action a user can take on their input devices that will cause the current Xlib-based lock dialog in xscreensaver to unlock. That's because it's a small amount of code that I have stared at and tested for a very long time. It is a small enough piece of code that I (believe I) know every possible path through it.
Introduce N layers of widget library, general text field handling, compose processing, input methods, I18N... and all bets are off. Who knows what bugs wait lurking in there; who knows which particular combinations of which libraries are a security-bug timebomb.
Let me put that another way:
The GTK and GNOME libraries have never been security-audited to the extent that their maintainers would be willing to make the claim, "under no circumstances will this library ever crash."
One can, within a reasonable doubt, make that claim about libc, or even about Xlib, but not about anything the size of GTK. It's just too big to be sure. This is not a criticism of GTK or GNOME or their authors: it's simply a truth about any piece of software of that size.
All password boxes are not alike.
What happens if a user finds a way to crash gdm by typing noise at the login box? Nothing much: if gdm crashes, there's still nobody logged in (and gdm is probably just automatically restarted.)
But if xscreensaver crashes, the screen is unlocked, and our attacker is now logged in as the person who locked their screen.
Segregation of library usage.
So that suggests one way of making the lock dialog use a toolkit: move the lock dialog out of the xscreensaver daemon process altogether. Make it be a separate program that the daemon invokes in such a way that, if the lock dialog crashes, the daemon neither crashes nor unlocks the screen. If that can be accomplished, then the presence of a crashing bug in the toolkit is no longer a critical security problem.
Splitting out the lock dialog.
In fact, this approach would actually reduce the number of libraries (and thus, lines of code) in the daemon itself, since the daemon would not need to link against things like PAM and crypto. That's a good thing.
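To make the idea concrete, here is a minimal sketch in C of the "crash must not unlock" rule, using only POSIX `fork()` and `waitpid()`. It is not the actual xscreensaver code: the names `run_dialog_and_check`, `dialog_fn`, `dialog_ok`, `dialog_denied` and `dialog_crash` are all hypothetical, and a real daemon would exec a separate dialog binary rather than call a function in the child. It also assumes the convention that exit status 0 means "password accepted".

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the dialog program.  A real daemon would
   exec a separate binary that links the toolkit, PAM, and so on. */
typedef int (*dialog_fn) (void);

/* Run the dialog in a child process.  Return true only if the child
   exited normally with status 0 (assumed here to mean "password OK").
   A crash, a nonzero exit, or any failure to fork or reap the child
   leaves the screen locked. */
static bool
run_dialog_and_check (dialog_fn dialog)
{
  pid_t pid = fork ();
  if (pid < 0)
    return false;                 /* could not fork: stay locked */

  if (pid == 0)
    _exit (dialog ());            /* child: report result as exit status */

  int status = 0;
  while (waitpid (pid, &status, 0) < 0)
    {
      if (errno != EINTR)
        return false;             /* could not reap child: stay locked */
    }

  /* Death by signal (e.g. SIGSEGV) makes WIFEXITED false. */
  return WIFEXITED (status) && WEXITSTATUS (status) == 0;
}

/* Three hypothetical dialog behaviors, for demonstration only. */
static int dialog_ok (void)     { return 0; }
static int dialog_denied (void) { return 1; }
static int dialog_crash (void)  { raise (SIGSEGV); return 1; }
```

With this shape, `run_dialog_and_check (dialog_crash)` is false, so the daemon stays locked when the dialog dies. The only path to an unlock is a clean exit with the agreed status, which keeps the toolkit's bugs on the far side of a process boundary.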
So that doesn't sound hard so far, except that the xscreensaver daemon has the keyboard grabbed. It's pretty important that it hold that grab, because otherwise keystrokes tend to go "through" the xscreensaver window and reach random desktop windows underneath.
This raises the question: how do the keystrokes get to the unlock dialog at all? That's a difficult question. Understanding how to do that right requires a lot of knowledge about X (which I have) but also probably a lot of knowledge about foreign-language input methods and screen readers and other accessibility-ware (which I do not have.)
Since xscreensaver blanks the screen using an override-redirect window (that is, a window that is not under window manager control) it is necessary for it to grab the keyboard and mouse. Without that, the window manager would continue to send events to whatever window last had focus. That's obviously a bad thing, as it results in people typing their passwords into IRC channels.
In the current system, where the same process is the creator of both the screen-blanking window and the unlock dialog, this is not a problem: that process gets all the events it wants. But when they are in different processes, we need a way for the keyboard and mouse events to get to the process driving the unlock dialog. So you'd like to transfer the grabs from the xscreensaver daemon to the unlock dialog, and then transfer them back afterward. Unfortunately, there is no way to transfer grabs atomically in X. The best you could do is this:
| |daemon process:|dialog process:|
|---|---|---|
|1:|spawns dialog process| |
|2:| |loops, repeatedly trying to grab kbd and mouse|
|3:|ungrabs kbd and mouse| |
|4:| |grab succeeds|
|5:|waits for dialog process to exit| |
|6:| |exits (or crashes) thus ungrabbing|
|7:|re-grabs kbd and mouse| |
There are two race conditions here: between steps 3 and 4, and between steps 6 and 7. In those periods the keyboard and mouse are (briefly) not grabbed at all. There are two bad things that could happen there. First, the keyboard or mouse might become grabbed by some third program (a popup window from something else on the desktop, for example.) Or second, the user might type a keystroke at just the wrong time. That keystroke would then go to neither xscreensaver nor the unlock dialog, but instead to some other window on the desktop.
So, that'd be bad.
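The two gaps can be seen in a toy model of that handoff, sketched here in C. This is only a timeline of who holds the grabs, not real Xlib code. It assumes that step 3 is the daemon releasing its grabs and step 4 is the dialog's grab attempt succeeding, which is what the description of the race implies. The names `grab_owner`, `owner_after` and `count_ungrabbed_gaps` are made up for illustration.

```c
#include <stddef.h>

/* Who holds the keyboard and mouse grabs at a given moment. */
enum grab_owner { GRAB_NONE, GRAB_DAEMON, GRAB_DIALOG };

#define LAST_STEP 7

/* Grab owner just after each step of the handoff.
   Index 0 is the state before step 1. */
static const enum grab_owner owner_after_step[LAST_STEP + 1] = {
  GRAB_DAEMON,  /* 0: before the handoff, daemon holds the grabs   */
  GRAB_DAEMON,  /* 1: daemon spawns the dialog process             */
  GRAB_DAEMON,  /* 2: dialog loops trying to grab, and keeps failing */
  GRAB_NONE,    /* 3: daemon ungrabs (assumed)                     */
  GRAB_DIALOG,  /* 4: dialog's grab attempt succeeds (assumed)     */
  GRAB_DIALOG,  /* 5: daemon waits for the dialog to exit          */
  GRAB_NONE,    /* 6: dialog exits or crashes, releasing the grabs */
  GRAB_DAEMON,  /* 7: daemon re-grabs                              */
};

/* Return who holds the grabs just after `step`.  Out-of-range steps
   are clamped to the final state. */
static enum grab_owner
owner_after (size_t step)
{
  if (step > LAST_STEP)
    step = LAST_STEP;
  return owner_after_step[step];
}

/* Count the moments at which nobody holds the grabs. */
static size_t
count_ungrabbed_gaps (void)
{
  size_t gaps = 0;
  for (size_t step = 0; step <= LAST_STEP; step++)
    if (owner_after (step) == GRAB_NONE)
      gaps++;
  return gaps;
}
```

In this model `owner_after (3)` and `owner_after (6)` are both `GRAB_NONE`. A keystroke or a third program's grab request arriving at either of those moments goes somewhere other than the daemon or the dialog, and no reordering of the steps removes both gaps, because X offers no atomic transfer.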
Another possibility is for the xscreensaver daemon to keep its grabs, meaning that all keyboard and mouse events would go to it; but then for it to use XSendEvent() to generate synthetic events on the lock dialog window. That is, the xscreensaver daemon would read a KeyPress, and then would simulate an exact duplicate of that KeyPress on the lock dialog window.
There are a few potential problems with this approach:
Remember, when the screen is locked, the window manager is not involved at all: it doesn't see any of these windows. The windows do not have title bars. You can't click and drag them around. You can't click to change focus. Your favorite window-management keystrokes don't work.
If it's necessary to have full window management of arbitrary windows in order for the accessibility tools to work, well, that's a big problem...
As of XScreenSaver version 6, released in 2021, the unlock dialog has in fact been split out into its own process. How? It turns out that by using the "XInput2" server extension, one can snoop the keyboard even while another app has the keyboard grabbed! This does not strike me as necessarily a good design decision on the part of the XInput2 developers, but that's how it is, so XScreenSaver now takes advantage of it. The dialog process does not need to grab the keyboard at all, so the daemon process can retain that grab the whole time, avoiding any race conditions.
The XInput2 extension was first released in 2009, five years after this document was written, and seventeen years after XScreenSaver was first released. XInput2 only became ubiquitous quite some number of years later, but as of 2021 it is enabled by default on all currently-maintained X11 systems. Splitting out the unlock dialog necessitated dropping support for older X11 systems that don't have XInput2.
However, choosing to make XInput2 a requirement allowed a lot of complicated code to be removed. The "critical section" of XScreenSaver -- the part of XScreenSaver where a crash would cause the screen to unlock -- was reduced by roughly 87%, down to around 1,800 lines.
Making the unlock dialog also be able to take advantage of accessibility tools is probably a lot harder. I don't know how much harder, because I'm not an accessibility expert. But anyone intending to implement that had better be both an expert on accessibility, and well versed in secure X11 programming, because the security implications of getting it wrong would be dire indeed.
Epilogue, 2016: I told you so.
If you are not running xscreensaver on Linux, then it is safe to assume that your screen does not lock. Once is happenstance. Twice is coincidence. Three times is enemy action. Four times is Official GNOME Policy.
Also I remind you that the Turing Police say you have more than two problems, and we live in a magical future where "strings" is exploitable.
It's amazing that anything works at all.
Update, 2019: Almost two decades ago, the engineers at Sun read all of the above, and then said to themselves, "Well that's all very interesting, but we really, really want to link the entire GNOME library stack into xscreensaver, so we're gonna go ahead and do that anyway." Surprise, the thing that I said would happen happened, and in 2019 a privilege escalation was discovered in their forked version of xscreensaver.
Update, 2021: Cinnamon-screensaver got popped, again.