Just gonna leave this regexp here

How to handle emoji:

Where other methods are not available, you can use the following regex (for Unicode 11.0 emoji). For clarity, it escapes all characters that can be invisible or are non-spacing -- otherwise you see some odd constructions like ([♀♂])?+ that are really (\\x{200D}[♀♂]\\x{FE0F})?+.

([©®‼⁉™ℹ↔-↙↩-↪⌨⏏⏭-⏯⏱-⏲⏸-⏺Ⓜ▪-▫▶◀◻-◼☀-☄☎☑☘☠☢-☣☦☪☮-☯☸-☺♀♂♟-♠♣♥-♦♨♻♾⚒⚔-⚗⚙⚛-⚜⚠⚰-⚱⛈⛏⛑⛓⛩⛰-⛱⛴⛷-⛸✂✈-✉✏✒✔✖✝✡✳-✴❄❇❣-❤➡⤴-⤵⬅-⬇〰〽㊗㊙🅰-🅱🅾-🅿🈂🈷🌡🌤-🌬🌶🍽🎖-🎗🎙-🎛🎞-🎟🏍-🏎🏔-🏟🏵🏷🐿📽🕉-🕊🕯-🕰🕳🕶-🕹🖇🖊-🖍🖥🖨🖱-🖲🖼🗂-🗄🗑-🗓🗜-🗞🗡🗣🗨🗯🗳🗺🛋🛍-🛏🛠-🛥🛩🛰🛳]\\x{FE0F}|[☝✌-✍🕴🖐][\\x{FE0F}🏻-🏿]|[✊-✋🎅🏂🏇👂-👃👆-👐👦-👧👰👲👴-👶👸👼💃💅💪🕺🖕-🖖🙌🙏🛀🛌🤘-🤜🤞-🤟🤰-🤶🦵-🦶🧑-🧕]([🏻-🏿])?+|🇦[🇨-🇬🇮🇱-🇲🇴🇶-🇺🇼-🇽🇿]|🇧[🇦-🇧🇩-🇯🇱-🇴🇶-🇹🇻-🇼🇾-🇿]|🇨[🇦🇨-🇩🇫-🇮🇰-🇵🇷🇺-🇿]|🇩[🇪🇬🇯-🇰🇲🇴🇿]|🇪[🇦🇨🇪🇬-🇭🇷-🇺]|🇫[🇮-🇰🇲🇴🇷]|🇬[🇦-🇧🇩-🇮🇱-🇳🇵-🇺🇼🇾]|🇭[🇰🇲-🇳🇷🇹-🇺]|🇮[🇨-🇪🇱-🇴🇶-🇹]|🇯[🇪🇲🇴-🇵]|🇰[🇪🇬-🇮🇲-🇳🇵🇷🇼🇾-🇿]|🇱[🇦-🇨🇮🇰🇷-🇻🇾]|🇲[🇦🇨-🇭🇰-🇿]|🇳[🇦🇨🇪-🇬🇮🇱🇴-🇵🇷🇺🇿]|🇴🇲|🇵[🇦🇪-🇭🇰-🇳🇷-🇹🇼🇾]|🇶🇦|🇷[🇪🇴🇸🇺🇼]|🇸[🇦-🇪🇬-🇴🇷-🇹🇻🇽-🇿]|🇹[🇦🇨-🇩🇫-🇭🇯-🇴🇷🇹🇻-🇼🇿]|🇺[🇦🇬🇲-🇳🇸🇾-🇿]|🇻[🇦🇨🇪🇬🇮🇳🇺]|🇼[🇫🇸]|🇽🇰|🇾[🇪🇹]|🇿[🇦🇲🇼]|[\\#\\*0-9]\\x{FE0F}\\x{20E3}|🏳\\x{FE0F}(\\x{200D}🌈)?+|[👯🤼🧞-🧟](\\x{200D}[♀♂]\\x{FE0F})?+|[⛹🏋-🏌🕵][\\x{FE0F}🏻-🏿](\\x{200D}[♀♂]\\x{FE0F})?+|[🏃-🏄🏊👮👱👳👷💁-💂💆-💇🙅-🙇🙋🙍-🙎🚣🚴-🚶🤦🤷-🤹🤽-🤾🦸-🦹🧖-🧝]((\\x{200D}[♀♂]\\x{FE0F}|[🏻-🏿](\\x{200D}[♀♂]\\x{FE0F})?+))?+|👁\\x{FE0F}(\\x{200D}🗨\\x{FE0F})?+|🏴((\\x{200D}☠\\x{FE0F}|\\x{E0067}\\x{E0062}((\\x{E0065}\\x{E006E}\\x{E0067}\\x{E007F}|\\x{E0073}\\x{E0063}\\x{E0074}\\x{E007F}|\\x{E0077}\\x{E006C}\\x{E0073}\\x{E007F}))))?+|👨(([🏻-🏿](\\x{200D}(([⚕-⚖✈]\\x{FE0F}|[🌾🍳🎓🎤🎨🏫🏭💻-💼🔧🔬🚀🚒🦰-🦳])))?+|\\x{200D}(([⚕-⚖✈]\\x{FE0F}|👦(\\x{200D}👦)?+|👧(\\x{200D}[👦-👧])?+|[👨-👩]\\x{200D}((👦(\\x{200D}👦)?+|👧(\\x{200D}[👦-👧])?+))|❤\\x{FE0F}\\x{200D}((💋\\x{200D}👨|👨))|[🌾🍳🎓🎤🎨🏫🏭💻-💼🔧🔬🚀🚒🦰-🦳]))))?+|👩(([🏻-🏿](\\x{200D}(([⚕-⚖✈]\\x{FE0F}|[🌾🍳🎓🎤🎨🏫🏭💻-💼🔧🔬🚀🚒🦰-🦳])))?+|\\x{200D}(([⚕-⚖✈]\\x{FE0F}|👦(\\x{200D}👦)?+|👧(\\x{200D}[👦-👧])?+|👩\\x{200D}((👦(\\x{200D}👦)?+|👧(\\x{200D}[👦-👧])?+))|❤\\x{FE0F}\\x{200D}((💋\\x{200D}[👨-👩]|[👨-👩]))|[🌾🍳🎓🎤🎨🏫🏭💻-💼🔧🔬🚀🚒🦰-🦳]))))?+|[⌚-⌛⏩-⏬⏰⏳◽-◾☔-☕♈-♓♿⚓⚡⚪-⚫⚽-⚾⛄-⛅⛎⛔⛪⛲-⛳⛵⛺⛽✅✨❌❎❓-❕❗➕-➗➰➿⬛-⬜⭐⭕🀄🃏🆎🆑-🆚🈁🈚🈯🈲-🈶🈸-🈺🉐-🉑🌀-🌠🌭-🌵🌷-🍼🍾-🎄🎆-🎓🎠-🏁🏅-🏆🏈-🏉🏏-🏓🏠-🏰🏸-🐾👀👄-👅👑-👥👪-👭👹-👻👽-💀💄💈-💩💫-📼📿-🔽🕋-🕎🕐-🕧🖤🗻-🙄🙈-🙊🚀-🚢🚤-🚳🚷-🚿🛁-🛅🛐-🛒🛫-🛬🛴-🛹🤐-🤗🤝🤠-🤥🤧-🤯🤺🥀-🥅🥇-🥰🥳-🥶🥺🥼-🦢🦰-🦴🦷🧀-🧂🧐🧠-🧿])
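The escaped pieces are easier to see in isolation. Here is a minimal sketch, assuming Python 3, that hand-transcribes just one branch of the regex above: the 👯 🤼 🧞-🧟 branch with its optional gender suffix. The names `ZWJ_GENDER` and `code_points` are made up for illustration. The regex's hex escapes for U+200D (ZERO WIDTH JOINER) and U+FE0F (VARIATION SELECTOR-16) become `\u200D` and `\uFE0F` in a Python string. The possessive `?+` is relaxed to a plain `?` because Python's `re` only accepts possessive quantifiers from 3.11 on. For this standalone fragment that change makes no difference.

```python
import re

# Hand-transcribed from one branch of the Unicode 11.0 regex:
#   one of [👯🤼🧞-🧟], then an optional ZWJ + (♀ or ♂) + VS16.
# Non-raw string literals, so Python turns the \U / \u escapes into the
# actual code points before the regex engine sees them.
ZWJ_GENDER = re.compile(
    "[\U0001F46F\U0001F93C\U0001F9DE-\U0001F9DF]"  # 👯 🤼 🧞 🧟
    "(?:\u200D[\u2640\u2642]\uFE0F)?"             # ZWJ, ♀/♂, VARIATION SELECTOR-16
)

def code_points(s: str) -> list[str]:
    """List the code points of s as U+XXXX, making invisible ones visible."""
    return [f"U+{ord(c):04X}" for c in s]

woman_zombie = "\U0001F9DF\u200D\u2640\uFE0F"
print(code_points(woman_zombie))                 # ['U+1F9DF', 'U+200D', 'U+2640', 'U+FE0F']
print(bool(ZWJ_GENDER.fullmatch(woman_zombie)))  # True
```

The ZWJ and the variation selector render as nothing on their own. That is why the quoted regex spells them out as escapes rather than writing them literally. A bare 🧟 also matches, because the suffix is optional. 🧟 followed by ZWJ and ♀ without the trailing U+FE0F does not fully match.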

Dooming us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane:

Different programs may not identify emoji as being the same, which can clearly cause problems. For example, hashtags are commonly treated as identical even if they differ by case (#foo = #FOO = #Foo = #fOO). But hashtags #🎅🏿 and #🎅🏻 and #🎅 may be treated the same on some systems (≅ #santa, but language-neutral), but treated differently on others.
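The quoted text leaves the choice of policy open. As one hypothetical policy, here is a minimal Python sketch that treats two hashtags as equal when they differ only by case or by skin-tone modifier (the five emoji modifiers U+1F3FB to U+1F3FF). `hashtag_key` is an illustrative name, not an API from any spec. A system that wants #🎅🏿 and #🎅 to stay distinct would simply skip the modifier-stripping step.

```python
import re

# The five Fitzpatrick skin-tone modifiers, U+1F3FB..U+1F3FF.
SKIN_TONE = re.compile("[\U0001F3FB-\U0001F3FF]")

def hashtag_key(tag: str) -> str:
    """Hypothetical comparison key: drop skin-tone modifiers, then fold case."""
    return SKIN_TONE.sub("", tag).casefold()

santa = "#\U0001F385"  # #🎅 (FATHER CHRISTMAS)
print(hashtag_key(santa + "\U0001F3FF") == hashtag_key(santa + "\U0001F3FB"))  # True
print(hashtag_key("#fOO") == hashtag_key("#foo"))  # True
```

`str.casefold()` is Python's caseless-matching variant of `lower()`. It leaves emoji untouched, so only the modifier stripping affects #🎅🏿 versus #🎅.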



13 Responses:

  1. J. Peterson says:

    This post is live for a full half hour without a "...now you have two problems." response?

    Fixing that.

  2. ctag says:

    Still waiting for a system-wide solution that replaces all emoji with bb-code style descriptions :thinking-face:

    • MattyJ says:

      I'd settle for an iOS-wide herp-derp toggle button.

    • Karellen says:

      So, take the Unicode character name for each code point, convert it to lower case, replace spaces with hyphens, and render that as the glyph?

      That sounds like it actually ought to be doable, somewhere in the font rendering guts.

  3. thielges says:

    “For clarity...”

    Hmmmph.

    In any case I’ll cease bragging about a regexp that can “parse” verilog netlists.

  4. wtf says:

    This is reasonable. 2

  5. James says:

    (╯°□°)╯︵ ┻━┻)

    • jwz says:

      Unmatched ) in regex; marked by <-- HERE in m/(╯°□°)╯︵ ┻━┻) <-- HERE / at -e line 1.

  6. Steve Allen says:

    Second footnote in the link: "The ZWJ stands for Zero-Width Joiner."
    ZWJ. JWZ.
    Now I need someone to tell me "I honestly think you ought to sit down calmly, take a stress pill and think things over."

  7. When I got to this bit of glyph translation...

    woman, dark skin, ZWJ,
    man, light skin, ZWJ,
    boy, medium skin, ZWJ,
    girl, dark skin

    ...I began to suspect this might be a deliberate JWZ banishing ritual.
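Karellen's suggestion is easy to prototype outside the font renderer. Here is a minimal Python sketch that assumes the character names from the standard library's `unicodedata` module. Those names track whichever Unicode version that Python ships with. Wrapping each name in colons, in the style of ctag's `:thinking-face:`, is also an assumption. `describe` is a hypothetical helper name. As input, the sketch builds the family sequence from the last comment out of its individual code points.

```python
import unicodedata

ZWJ = "\u200D"
WOMAN, MAN, BOY, GIRL = "\U0001F469", "\U0001F468", "\U0001F466", "\U0001F467"
# Skin-tone modifiers: light = Fitzpatrick type 1-2, medium = type 4, dark = type 6.
LIGHT, MEDIUM, DARK = "\U0001F3FB", "\U0001F3FD", "\U0001F3FF"

def describe(text: str) -> str:
    """Replace each code point with :its-unicode-name: (lowercase, hyphenated)."""
    out = []
    for ch in text:
        # Code points without a name (e.g. control characters) fall back to
        # their hex code point instead of raising ValueError.
        name = unicodedata.name(ch, f"U+{ord(ch):04X}")
        out.append(":" + name.lower().replace(" ", "-") + ":")
    return "".join(out)

family = ZWJ.join([WOMAN + DARK, MAN + LIGHT, BOY + MEDIUM, GIRL + DARK])
print(len(family))  # 11
print(describe(family))
```

The family comes to 11 code points. Its description starts with `:woman::emoji-modifier-fitzpatrick-type-6::zero-width-joiner::man:` and keeps going from there, which may or may not be an improvement over the glyph.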
