Stupid Linux Crap!

Hey kids! It's been a while since I've posted questions about stupid Linux crap! That's because I've been ignoring Linux for a very, very long time.

Well, I decided (for a number of individually not very good reasons that, in my head, seemed to add up) to try and upgrade the OS on the DNA Lounge kiosks.

<LJ-CUT text="Dear Lazyweb... --More--(13%) ">

So, I went from vintage-2003 Red Hat 9 to Ubuntu 7.10. (I'm sure thousands of you think a different OS would have been a better choice for some reason, no matter which OS I picked. So let's just skip that part.) It's mostly working now, but it was a typically-monumental pain in the ass (most of which I attribute to changes in LTSP rather than the distro itself).

Here, then, are the problems I'm having, which you will now solve for me:

  1. I'm using LTSP, meaning the kiosks themselves are thin clients, and all the action is happening on the one big server machine. The kiosks themselves keep crashing. As in, freezing up, becoming unpingable. I don't know why. Maybe it's X, maybe it's just random. All I know is, I walk away, come back ten minutes later, and I have to hit the reset button. If something were logged at all, I assume it was to a ramdisk, so I don't even know how to go about diagnosing this. Oh, and the screen is always black when I discover them being all dead (not a frozen screen saver, as you'd expect).
  2. I have a bunch of scripts that reset the kiosks to a known good state. One of the things they do is notice when a kiosk is in a bad state (e.g., that certain critical programs are no longer running, like the window manager or the dock) and, when it appears that things are "bad", log the guest user out, so that it will automatically log in again and reset things.

    The way this used to work was, "kill off all processes owned by user guest01, and guest01's X session will terminate, and go back to gdm." But now what happens is, X just keeps running with no applications at all. This is suboptimal. In this modern gdm/XDMCP world, how do I forcibly log a user out and go back to the greeter? ("Kill the X server" is a bad answer, because that's running on a different machine.)

  3. I tried to replace the Ubuntu logo in "Usplash" (the boot-up progress bar thingy) with my own. I failed. I did this:

    • Create an 800x600 16-color XPM with magic color palette ordering.
    • convert usplash-dnalounge.xpm usplash-dnalounge.png
    • pngtousplash usplash-dnalounge.png usplash-dnalounge.png.c
    • gcc -g -c -fPIC -o usplash-dnalounge.o usplash-dnalounge.png.c
    • gcc -g -fPIC -shared -o usplash-dnalounge.o
    • mv /usr/lib/usplash/
    • Run the startupmanager GUI
      • Add ""
      • Select ""

    What happens when I reboot is, I get the scrolling-text non-usplash mode. Which I guess means something is missing and it is falling back to that in confusion. Nothing is logged.

  4. On Red Hat 9, I was able to make the old Netscape a.out binaries from the early 1990s fully functional by doing this:

    • /etc/modules.conf: alias binfmt-0064 binfmt_aout
    • Install aout-libs-1.4-9 RPM from Red Hat 5.2
    • Install RPM from Red Hat 6.2
    • Install libc-5.3.12-31 RPM from Red Hat 6.2

    The binfmt module is already there, but how do I acquire the corresponding old ld and libc libraries for Ubuntu?

      Update: Ok, this one I figured out: I copied the contents of the ancient Red Hat RPMs onto Ubuntu, adjusted /etc/ to find them, ran ldconfig, and it works. Whee.
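For problem 2, the "is this kiosk in a bad state?" check might look roughly like the sketch below. This is not the actual kiosk script: the function names `missing_programs` and `force_logout` are made up for illustration, and the program list is just the window manager mentioned in the comments. It only covers spotting the bad state and killing the user's processes, which by itself no longer gets you back to the greeter.

```shell
#!/bin/sh
# Hypothetical watchdog helpers; the names are illustrative only.

# Print each required program that does NOT appear in the process list
# read from stdin (one command name per line, e.g. from "ps -o comm=").
missing_programs() {
    procs=$(cat)
    for prog in "$@"; do
        # -x matches the whole line, -F treats the name as a fixed string
        printf '%s\n' "$procs" | grep -qxF "$prog" || printf '%s\n' "$prog"
    done
}

# Kill every process owned by the given user (the old "log out" method).
force_logout() {
    pkill -KILL -u "$1"
}

# Example wiring (commented out so that sourcing this file is harmless):
#   bad=$(ps -u guest01 -o comm= | missing_programs metacity)
#   [ -n "$bad" ] && force_logout guest01
```

Reading the process list from stdin keeps the check separate from how the list is gathered, so the same function could be fed output collected on the server or on the client.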


35 Responses:

  1. evan says:

    2) killall x-session-manager

    (Can find this by running "pstree" and seeing which process owns everything under gdm.)

    • jwz says:

      I believe that "kill x-session-manager" really means "kill the process that was launched by xinit", which in my case is the window manager, metacity (since I'm using something simpler than gnome-session). Either way, I'm killing every pid owned by user guest02 (and they are definitely all dying) so... that's not the answer.

      • evan says:

        Huh! I even tried it locally so I wouldn't have left a "here's a guess" comment.

        I suppose the complication here is XDMCP. Guessing now, but have you checked for stray processes on both the server and client? Maybe the machine with the X server launched some session manager locally before starting all of its X clients off your management machine. (Argh, hate wording reversals when talking about X servers and clients.)

  2. thargol says:

    I walk away, come back ten minutes later, and I have to hit the reset button. If something were logged at all, I assume it was to a ramdisk, so I don't even know how to go about diagnosing this.

    SystemTap is great for looking at this sort of system wide wtf-is-going-on type problem. I'd start with a simple list of each call to execve() and see what was the last thing to run just before the crash. The problem, of course, is getting to see the output. I'd try and coerce it into sending output to syslog (maybe even just piping it into /usr/bin/logger would be enough), so that you can send the logs to a machine that isn't going to hang.
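A rough sketch of the pipe-it-into-logger idea. The `stap` one-liner in the trailing comment is an assumption about the SystemTap syscall tapset (`syscall.execve`, `execname()`, `argstr`) and is untested here. `stream_to_syslog` and the `kiosk-exec` tag are made-up names. Forwarding syslog to another machine is a separate configuration step that isn't shown.

```shell
#!/bin/sh
# Hypothetical helper: forward whatever arrives on stdin to syslog,
# one message per line, under the given tag.
# LOGGER can be overridden (e.g. for testing); it defaults to logger(1).
stream_to_syslog() {
    tag=$1
    "${LOGGER:-logger}" -t "$tag" -p daemon.info
}

# Example (assumed SystemTap probe; needs root and kernel debug info):
#   stap -e 'probe syscall.execve { printf("%s: %s\n", execname(), argstr) }' \
#       | stream_to_syslog kiosk-exec
```

If the box hangs hard, the last few lines that made it to the remote log are the interesting ones.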

  3. pdkl95 says:

    I tried to replace the Ubuntu logo in "Usplash" (the boot-up progress bar thingy) with my own.

    Wow. All that to replace a splash screen? You have to compile it to a .so? I mean, I can (almost) understand having to make an XPM with a magic color palette, but converting it to C? Compiling to a shared object? Just so you can show a picture on the screen?

    This is pretty big brain-damage even for what I've come to expect from the gnome/kde/whatever kids.

    As for help - I wish I could... you're being entirely sane here.

    The only thing that comes to mind is stupid ACPI shit (boot with "acpi=off" in lilo), which seems to have become more unstable in recent years. (It likes to conflict with video cards, which MAYBE sounds like your black screen issue.)

    As for a.out binaries - I've had great success with "alien" in the past to at least convert .rpm -> .deb, and force installing those compatibility packages. Unfortunately, while this worked great for some other older libraries, I don't think I've tried to get a.out binaries to work since... well... around that Red Hat 6.2 or so.

    Good luck...

    Your terminals are always appreciated no matter what they look like!

    • evan says:

      Usplash has nothing to do with Gnome or KDE.

    • volkris says:

      No, it's not just a picture on the screen. It's a program that displays something that may or may not move.

      • mark242 says:

        "pngtousplash usplash-dnalounge.png usplash-dnalounge.png.c" is about the most ridiculous command-line entry I've seen in a while.

        I would love to know what was going through the original coder's brain when he decided that reading a config file would be, you know, too hard, and thought that translating a (deliberately-formed) image to C would be easier.

        • jwz says:

          The best part is that an XPM image is already C source code!

          • mark242 says:

            You're right, I did miss that.

            Of course, Google:

            "Your search - xpmtousplash - did not match any documents."

            (oh yeah? It has now!)

          • edouardp says:

            > The best part is that an XPM image is already C source code!

            Pure comedy gold!

            (Admittedly for small values of "pure", "comedy" and "gold", but *I* laughed out loud at least...)

        • volkris says:

          You're assuming the developer was looking to display images; I'm willing to bet that he wasn't.

          What if the user wanted to display a video during startup? Maybe even something interactive?

          The developer could start including the necessary code to decode and display every type of image, video, or other program that could likely be attempted, or he could assert that only certain formats would be allowed. He'd still have to build in the codecs, and some would still criticize him for bloat there.

          But the developer went a different way: he got out of worrying about codecs and additionally gave the users more flexibility by just having the user compile his display into a program. The developer decided that startscreens would be handled the same as themes in most programs, where it's ok if it's not the most user friendly process since end users can apply packages created by third parties.

          This option, like the others, has its benefits and drawbacks, but it's perfectly reasonable. That you and I may disagree with the decision doesn't mean it's wrong.

          • mark242 says:

            I'm sorry-- your reply has caused a buffer overflow in the section of my brain that responds to ridiculous Linux functionality.

            ("Who gives a rat's ass about time to desktop, I've got DEANNA_TROI.mpg.c running on startup!")

            • volkris says:

              Tradeoffs everywhere.

              In this particular case, using a program over a video or image file might actually IMPROVE time to desktop.

  4. bifrosty2k says:

    My usual unhelpful advice is "screw that loonix crap, FreeBSD 4tw".

    It lets you run the BSD/OS binaries of old stuff, while running only the tested new stuff AKA you can not run the janky loonix shit.

  5. jwm says:

    1. Lock ups are usually caused by video drivers, these days. What video hardware are you using?

      Also, could DPMS be the trigger? I'm guessing that you'd turn that off, but perhaps something else is switching it on.

    2. Short of restarting gdm, I'll have to pass.
    3. Consider splashy which is in the Universe repository, if no other advice suffices.
    4. libc5 appears to have vanished several releases ago, and isn't even in the current Debian repositories (or Fedora for that matter). It might be possible to resurrect them from an earlier version, though. I'll give it a go.

    • bodyfour says:

      > libc5 appears to have vanished several releases ago, and isn't even in the current Debian repositories (or Fedora for that matter).

      Really? Wow.. I guess it has been awhile.

      Jamie's actually got two separate issues here:

      1. The really old a.out binaries, which actually need the shared libc4. This is what the RH5.2 "aout-libs" package was being used for.
      2. The libc5 binaries — these are ELF binaries but are compiled against an older version of glibc (what was known in the linux world as "libc5").

      For the latter, probably all that's needed is to install the right Debian library package (from whatever the last version of Debian was that shipped them for compatibility). They should co-exist peacefully with libc6.

      Things might be slightly trickier for the a.out libraries. As a first stab I'd just try copying the same files from those first two RPMs onto the box (you can extract the raw contents using rpm2cpio on a redhat box and then unpack them under Ubuntu).

      • backrowbass says:

        I've had success sending the old libc5 stuff from a Fedora box to a Ubuntu user who was running an ancient binary. Dig the stuff out of the RPM and try installing...

        (You don't need an RPM box to extract the files. Use rpm2cpio from here, which just needs perl and cpio.)
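Roughly what the "dig the stuff out of the RPM" step looks like, as a sketch. `extract_rpm` is a made-up helper, the file and directory names in the example are placeholders, and it assumes some rpm2cpio and cpio are on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: unpack an RPM's payload into a staging directory
# without installing it. Requires rpm2cpio and cpio on the PATH.
# RPM2CPIO can be overridden to point at a specific rpm2cpio script.
extract_rpm() {
    rpm=$1
    dest=$2
    # Make the RPM path absolute before changing directory.
    case $rpm in
        /*) ;;
        *)  rpm=$PWD/$rpm ;;
    esac
    mkdir -p "$dest" || return 1
    # -i extract, -d create leading directories, -m keep modification times
    ( cd "$dest" && "${RPM2CPIO:-rpm2cpio}" "$rpm" | cpio -idm --quiet )
}

# Example (file and directory names are placeholders):
#   extract_rpm ./some-old-libs.rpm /tmp/oldlibs
#   ls -R /tmp/oldlibs     # inspect before copying anything into place
#   ldconfig               # after copying the libraries, refresh the cache
```

Unpacking into a scratch directory first means nothing on the live system gets overwritten until you've looked at what the package actually contains.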

    • jwz says:

      Mostly they are onboard VIA video on ~1GHz-ish boards.

      DPMS could be a culprit, I hadn't thought of that. Now I have to figure out how one turns that off. This thin client world has made X configuration even more insane than normal since they try and auto-generate the config for you (thank you master may I have another).

      Oh, and when you set the magic env variables to override things, they generate config files with syntax errors. Joy.

      • blasdelf says:

        xset -display :0 dpms force on

        I never thought to set it in the X config, I wonder if that works...
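Note that `dpms force on` wakes the monitor up right now; it doesn't stop DPMS from kicking in again later. A sketch of actually switching it off at runtime, assuming `xset` can reach the kiosk's display (`disable_blanking` is a made-up name, and `:0` is just the default used here):

```shell
#!/bin/sh
# Hypothetical helper: turn off DPMS power saving and the built-in
# screen blanker on the given display. XSET can be overridden for testing.
disable_blanking() {
    display=${1:-:0}
    # Disable DPMS (Energy Star) features entirely.
    "${XSET:-xset}" -display "$display" -dpms || return 1
    # Disable the X server's own screen saver / blanker.
    "${XSET:-xset}" -display "$display" s off
}

# Example:
#   disable_blanking :0
#   xset -display :0 q    # shows the current DPMS and screen saver state
```

This has to be re-run for each new X server, so it would belong wherever the session startup happens.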

      • cheesedaemon says:

        There are several VIA chipsets which will lock the system hard if you attempt DRI on them with X11.

        Here's a relevant bug report:

        I myself have experienced the same behavior with the CN700 chipset, which is not mentioned in that report.

        If you have such a chipset, the most expedient solution is to disable DRI, full stop. There are positively no drivers, not from VIA or anyone else, that will allow you to use DRI along with X11 with these chipsets.

        • jwz says:

          Well, turns out that turning both DRI and DPMS off does not make the crashing stop. I'm getting syslogs on another machine, but predictably, nothing is logged.

          I think that at some point in the past I stopped crashes by just running the vesa driver on all the kiosks, but that doesn't want to run any more. So I'm still using the via driver, which is perhaps a mistake.

          • discogravy says:

            Have you tried running the X startup commands with stderr redirected to a network file? Or perhaps using "tee" will help with getting logs (or output...).
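A minimal sketch of the redirect-and-tee idea. `run_logged` is a made-up name and the log path in the example is a placeholder for something on a network mount. One caveat: the function's exit status is that of `tee`, not of the command being run.

```shell
#!/bin/sh
# Hypothetical helper: run a command, appending its stdout and stderr
# to a log file while still passing them through to the caller.
run_logged() {
    logfile=$1
    shift
    # Merge stderr into stdout, then copy the stream to the log file.
    "$@" 2>&1 | tee -a "$logfile"
}

# Example (log path and command are placeholders):
#   run_logged /mnt/logs/kiosk-x.log some-x-startup-command
```

Because the file lives off the kiosk, whatever was written before the freeze should survive the reset button.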

  6. I'm using LTSP, meaning the kiosks themselves are thin clients, and all the action is happening on the one big server machine. The kiosks themselves keep crashing. As in, freezing up, becoming unpingable. I don't know why. Maybe it's X, maybe it's just random. All I know is, I walk away, come back ten minutes later, and I have to hit the reset button

    You've got the kiosks set up as "thin clients". Are you remotely mounting the (swap) partition from the server? I have seen that configuration used in the past, where everything was mounted from the server, and the performance can be dire to say the least. The problem is that the amount of data sent to and from the server, multiplied by the number of similar clients, adds up to poor response times.

    As it's a thin client, you probably don't have any local disk to use for caching, right? The trick then is probably to prune down the procedural memory footprint loaded from the server and/or increase the amount of real memory.

    • blasdelf says:

      You don't *need* swap you know.

      I run netbooted LAN parties in the computer lab I administrate. We boot a kernel off an ISOLINUX cd (you can eject it as soon as the kernel is read from the disk), and a readonly NFS filesystem is set on the kernel command line. Works awesomely, especially with gigabit ethernet. Never have problems with no swap, even when there isn't that much RAM.

  7. allartburns says:

    I have a machine stuck on RedHat 7.2 (STFU, not my fault). Did you pick Ubuntu 10.7 for any particular reason? RH7.2 is so different than some of the modern distros I've looked at that I'm considering migrating to BSD just because it's closer to OSX.

  8. idcmp says:

    Using memtest86 will rule out memory problems super easily.

  9. perligata says:

    Debian packages for old and libc5, but they're not exact matches. It'll fail on installation because the package tries to overwrite libc6's ldd, so you might have to run it in a chroot.

    Doing that with software this old is kind of a pain in the ass, but it can be done: I believe you can use debootstrap with the old release tarballs, such as base1_3.tgz here, but I'm not sure about the actual implementation details.

  10. perligata says:

    Did you try this?: update-alternatives --install /usr/lib/usplash/ /usr/lib/usplash/ 1000


    Actually, there's a lot of info here.

    • jwz says:

      I believe all that crud is what the "startupmanager" gui did.

      Where what I did differs from what that URL says is because it disagreed with what I found in /usr/share/doc/libusplash-dev/examples/Makefile

  11. rodgerd says:

    ...a fucking annoying problem I hit when upgrading an Ubuntu 6.10 box to 7.10 (which is, I assume, what you actually meant by 10.7).

    Running apps remotely from the box would either lock up or blackscreen the remote X server. Every time. About, oh, 10 minutes or so after starting the app. A bit of footling suggested that Ubuntu was starting a bunch of gnome stuff I hadn't asked for, which fucked up the remote X server. It all worked fine if run locally to the Ubuntu box on its X server, of course.

    Unfortunately my answer was to reformat and install Fedora 7, which Just Worked using the same apps. I assume there's at least one person at Red Hat who's aware X is supposed to work over a network, not lock the fucking remote server up. And tests before release.

    • nester says:

      I have not been overly happy with Ubuntu, having just inherited a bunch of Ubuntu boxes in my new job. It seems like they took Debian, added a bunch of hackery to make it "ours" and boom.. market.

      This is not "my distro is better than x" just.. fuck them all. I hate linux, but only marginally less than everything else.

      • rodgerd says:

        Digressing frantically from Jamie's question, I found Ubuntu pretty good through to 6.10 with the whole Debian++ promise. 7.10 has been shit enough to drive me back to Fedora, though. There's a fine line between "cool new stuff that works" and "cool new stuff that's broken shit".

        Of course, the standard disclaimer applies: all systems suck.

        • discogravy says:

          FWIW, they did this knowing a bunch of stuff would break. Much like the RedHat 8/9 change where they implemented their own version of GCC 2.95 because no one wanted to be the first distro to switch to GCC 3.x, ubuntu maintainers decided that a bunch of video/desktop upgrades were being held up by everyone playing chicken. So a lot of changes (some cosmetic, like compiz, and some more low-level like the ones above) were made under the umbrella of "it'll sting a bit when we tear the band-aid off but then it'll be all better".