Unicode Consortium: 214 characters from legacy computers and teletext that were proposed by the Terminals Working Group were just accepted:



32 Responses:

  1. This is fantastic. I was actually going to petition Unicode for 16 characters that look just like row 1FB7.

    I am still hoping to petition them for inclusion of the Andorsand later this year.

  2. MattyJ says:

    Five emoji should be enough for anyone.

  3. James says:

    Wow, I haven't seen those 63 on the left since TRS-80 Level II BASIC.

  4. Brad says:

    And yet the Atari ST 'Bob Dobbs' characters aren't in Unicode yet.

    I'll settle for a Bob Dobbs emoji.

    • James C. says:

      Section 7 of the proposal:

      Not all characters identified in the target platforms were deemed suitable for encoding. For example, the character set for Atari 16-bit machines included two characters for the left and right halves of the Atari logo, and four which could be arranged to form an image of the fictional character J.R. “Bob” Dobbs (see Wikipedia article). Both of these symbols, like the existing Apple logo, were determined to be IP-encumbered and thus are not proposed here.

    • Doug Ewell says:

      It's copyrighted; we couldn't encode it. Same for the Atari logo.

      • Apple and Linux distros use the private use areas for that. That's why you have Apple logos on Macs, Ubuntu, Debian and Fedora (at least).

      • Doctor Memory says:

        Is there a process for getting licenses for copyrighted glyphs? I can't imagine that the SubGenius Foundation would object to a permanent unicode home for "Bob".

        Or is this one of those "you'd have to do the magic licensing invocation ritual in every country/treaty area in the world and therefore only multinational corps with 8-figure legal budgets are empowered to contribute to the public domain" deals?

  5. Karellen says:

    I'm confused about U+1FBF0 - U+1FBF9. Don't they already exist at U+30 - U+39?

    Or have Unicode started including existing characters in different typefaces now? Are we going to get separate serif and sans-serif codepoints for the existing U+41 - U+5A and U+61 - U+7A?

    If not, and typefaces are staying distinct from codepoints, I'm wondering if the Unicode Consortium has given any thought to what serif versions of U+1FBF0 - U+1FBF9 might look like.

    Also, that table has its axes the wrong way round. Grrrr!

    • Arabic numbers are also in:

      * Superscript 0-9 (U+00B2, U+00B3, U+00B9, U+2070, U+2074 - U+2079)
      * Subscript 0-9 (U+2080 - U+2089)
      * Circled digits 1-20 (U+2460 - U+2473)
      * Parenthesized digits 1-20 (U+2474 - U+2487)
      * Digits with full stop 0-20 (U+1F100, U+2488 - U+249B)
      * Double circled digits 1-10 (U+24F5 - U+24FE)
      * Negative circled digits 0, 11-20 (U+24FF, U+24EB - U+24F4)
      * Dingbat negative circled digits 1-10 (U+2776 - U+277F)
      * Dingbat circled sans-serif digits 0-10 (U+1F10B, U+2780 - U+2789)
      * Dingbat negative circled sans-serif digits 0-10 (U+1F10C, U+278A - U+2793)
      * A whole bunch of circled digits in Enclosed CJK
      * Fullwidth 0-9 (U+FF10 - U+FF19)
      * Math bold 0-9 (U+1D7CE - U+1D7D7)
      * Math double-struck 0-9 (U+1D7D8 - U+1D7E1)
      * Math Sans-Serif 0-9 (U+1D7E2 - U+1D7EB)
      * Math Sans-Serif Bold 0-9 (U+1D7EC - U+1D7F5)
      * Math Monospace 0-9 (U+1D7F6 - U+1D7FF)
      * Digit comma 0-9 (U+1F101 - U+1F10A)
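
A few of the ranges in the list above can be checked against the Unicode Character Database that ships with Python. This is only a minimal sketch: the `DIGIT_RUNS` table and the `run_values` helper are names made up for illustration, and it covers just the contiguous 0-9 runs plus the scattered superscripts.

```python
import unicodedata

# Starting code points of some contiguous 0-9 runs from the list above.
# (Illustrative table; not an exhaustive inventory of digit variants.)
DIGIT_RUNS = {
    "subscript": 0x2080,
    "fullwidth": 0xFF10,
    "math bold": 0x1D7CE,
    "math double-struck": 0x1D7D8,
    "math sans-serif": 0x1D7E2,
    "math sans-serif bold": 0x1D7EC,
    "math monospace": 0x1D7F6,
}

# Superscripts are not contiguous: 1, 2 and 3 live in Latin-1.
SUPERSCRIPTS = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"


def run_values(start):
    """Return the digit value of each of the 10 code points from `start`."""
    return [unicodedata.digit(chr(start + i)) for i in range(10)]


for label, start in DIGIT_RUNS.items():
    print(f"{label}: U+{start:04X}..U+{start + 9:04X} -> {run_values(start)}")

print("superscript ->", [unicodedata.digit(c) for c in SUPERSCRIPTS])
```

Every one of these code points carries the same digit value as its ASCII counterpart, which is why they look redundant until you consider round-tripping or mathematical notation.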

      • Karellen says:

        Good points.

        Although, I can imagine serif and sans-serif variants of the super/subscript numerals; and I can see a semantic distinction there too.

        Also, the dingbats characters exist to enable lossless round-tripping between pre-existing legacy character sets and unicode, which I can understand the argument for.

        Some of those other groups though - yup, I'm baffled. I suppose there are probably rationale documents out there I could look up if I were really interested, but I'm quite enjoying doing the "old man yells at cloud" bit here, so whatever.

        • Nick Lamb says:

          Round-tripping, which you accepted as a rationale for dingbats, is the usual reason for other weird variants that you'd expect to only exist as new typefaces/fonts, and in particular U+1FBF0 to U+1FBF9 which existed (as well as "normal" digits) on some Atari systems.

          The other big cause has been mathematics as you can see. Mathematicians need a lot of symbols, they will take any symbols left lying about that weren't nailed down and assign them to mean some particular mathematical thing (or more than one). Presumably it has taken great restraint on their part to not choose the "Danger, biological hazard" symbol to also mean "Member of a set of Martin-Löf random sequences" or something. So, coding just one set of integers, letters, etcetera wasn't enough for the mathematicians.

      • Karellen says:

        I wonder if a future unicode update will include codepoints for Nixie tube digits 0-9, to go alongside those 7-segment LED digits 0-9? And the E13-B MICR digits 0-9 (I note that MICR control characters are already in the standard from U+2440 onwards, but not the digits?). And maybe codepoints for 5×7 dot matrix digits 0-9 too?

        • Doug Ewell says:

          We were able to make a strong case for these. They were used specially in Atari ST apps, and they are a small, closed set. I doubt they set any sort of precedent for anything.

        • tahrey says:

          Well, if you can show us an electronic character set where those particular things are specifically encoded and aren't themselves meant to be direct representations of standard ASCII characters (which I expect numbers in Nixie tubes, the MICR numbers - control codes are something special - and almost everything shown on a 5x7 LCD other than manufacturer specific glyphs would count as), then sure, by all means. Otherwise, I don't believe unicode has anything to do with the medium through which a machine's internal character set is translated to a user-viewable form, or the particular typeface used (for example, an OCR-A-ish one for turning regular numbers + unicode points for the control codes into MICR printout).

  6. Nicholas Riley says:

    MouseText! Including the original two-part running guy. That is amazing.

  7. This is really exciting for everyone working on bridging 1980s home computers and terminals, like videotex, Minitel or the German Bildschirmtext, to modern systems.

  8. Nate says:

    I'd love to see PETSCII completed. There are so many boxes here that aren't quite right:


    • Can you be more specific? We had very long discussions on this before and after submitting it. A lot of it is already covered in other places (box drawing, 2x2 bricks, etc).

      • Nate says:

        My fault for not reading the linked doc and just referring to the preview on this blog post. I was concerned about things like the line art (PETSCII 0x64-0x68, 0x70, 0x72, 0x74, 0x79 etc.) But I think you've covered it all in the doc. Thanks

    • Doug Ewell says:

      What Ricardo said. You need to define "not quite right." If you mean that the glyphs in the code charts don't look exactly like the Commodore screen or emulator, that's a font issue. Glyphs in code charts are not normative.

  9. Finally, I can break out all my ATASCII BBS load screens!

  10. Darren Embry says:

    Page 31 of that PDF says teletext is still a thing in Romania. Then I learned that it's still a thing in a few other countries too. (Wikipedia)

  11. Doodpants says:

    This is great and all, but even with the block- and box-drawing characters that already exist in Unicode, I haven't found a monospaced font in which they will all draw in uniform width, consistent alignment, and without gaps in between pairs of certain block characters that are supposed to go to the edge of the character space. So when I want to output illustrative diagrams as text, I often end up resorting to crude ASCII art instead.

  12. The Liquid says:

    Can't not love U+1FBC5 to U+1FBC9. They were so missed.

  13. Tinus says:

    Finally Marlboro gets their own emoji!

  14. thielges says:

    Room to expand row 1FBF to support full hex 7 segment LEDs.

    • If you can find an old computer that has that built into its default fonts, it's possible.

      The 7-segment digits are there because the Atari ST computers had distinct codes for them, separate from the ASCII 0-9 range, and files that used them (text files) would otherwise not be properly represented in Unicode.
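
The round-tripping argument can be sketched in a few lines. This is a toy illustration, not a real Atari ST codec: it assumes the segmented digits sit at legacy bytes 0x10-0x19, handles only those plus printable ASCII, and the `decode`/`encode` names are hypothetical.

```python
SEGMENT_BASE = 0x1FBF0        # U+1FBF0 SEGMENTED DIGIT ZERO
LEGACY_SEGMENT_FIRST = 0x10   # assumed legacy position of segmented "0"


def decode(data: bytes) -> str:
    """Map legacy bytes to Unicode, keeping segmented digits distinct."""
    out = []
    for b in data:
        if LEGACY_SEGMENT_FIRST <= b < LEGACY_SEGMENT_FIRST + 10:
            out.append(chr(SEGMENT_BASE + (b - LEGACY_SEGMENT_FIRST)))
        elif 0x20 <= b <= 0x7E:
            out.append(chr(b))  # printable ASCII maps to itself
        else:
            raise ValueError(f"byte 0x{b:02X} not covered by this sketch")
    return "".join(out)


def encode(text: str) -> bytes:
    """Inverse of decode(): Unicode back to the legacy bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if SEGMENT_BASE <= cp < SEGMENT_BASE + 10:
            out.append(LEGACY_SEGMENT_FIRST + (cp - SEGMENT_BASE))
        elif 0x20 <= cp <= 0x7E:
            out.append(cp)
        else:
            raise ValueError(f"U+{cp:04X} not covered by this sketch")
    return bytes(out)


legacy = bytes([0x31, 0x11])  # ASCII "1" followed by a segmented "1"
text = decode(legacy)
print([f"U+{ord(c):04X}" for c in text])
print(encode(text) == legacy)
```

Had both bytes been folded onto U+0031, the decoded text would be "11" and there would be no way to recover which byte was which on the way back.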

  15. aaronfranke says:

    Hey, thanks for adding column 1FBF! I was annoyed that I couldn't write my fours the way I wanted to, so I created L2/18-323, which was rejected. But your proposal is even better!
