Where other methods are not available, you can use the following regex (for Unicode 11.0 emoji). For clarity, it escapes all characters that can be invisible or are non-spacing -- otherwise you see some odd constructions like ([♀♂])?+ that are really (\\x{200D}[♀♂]\\x{FE0F})?+.
([©®‼⁉™ℹ↔-↙↩-↪⌨⏏⏭-⏯⏱-⏲⏸-⏺Ⓜ▪-▫▶◀◻-◼☀-☄☎☑☘☠☢-☣☦☪☮-☯☸-☺♀♂♟-♠♣♥-♦♨♻♾⚒⚔-⚗⚙⚛-⚜⚠⚰-⚱⛈⛏⛑⛓⛩⛰-⛱⛴⛷-⛸✂✈-✉✏✒✔✖✝✡✳-✴❄❇❣-❤➡⤴-⤵⬅-⬇〰〽㊗㊙🅰-🅱🅾-🅿🈂🈷🌡🌤-🌬🌶🍽🎖-🎗🎙-🎛🎞-🎟🏍-🏎🏔-🏟🏵🏷🐿📽🕉-🕊🕯-🕰🕳🕶-🕹🖇🖊-🖍🖥🖨🖱-🖲🖼🗂-🗄🗑-🗓🗜-🗞🗡🗣🗨🗯🗳🗺🛋🛍-🛏🛠-🛥🛩🛰🛳]\\x{FE0F}|[☝✌-✍🕴🖐][\\x{FE0F}🏻-🏿]|[✊-✋🎅🏂🏇👂-👃👆-👐👦-👧👰👲👴-👶👸👼💃💅💪🕺🖕-🖖🙌🙏🛀🛌🤘-🤜🤞-🤟🤰-🤶🦵-🦶🧑-🧕]([🏻-🏿])?+|🇦[🇨-🇬🇮🇱-🇲🇴🇶-🇺🇼-🇽🇿]|🇧[🇦-🇧🇩-🇯🇱-🇴🇶-🇹🇻-🇼🇾-🇿]|🇨[🇦🇨-🇩🇫-🇮🇰-🇵🇷🇺-🇿]|🇩[🇪🇬🇯-🇰🇲🇴🇿]|🇪[🇦🇨🇪🇬-🇭🇷-🇺]|🇫[🇮-🇰🇲🇴🇷]|🇬[🇦-🇧🇩-🇮🇱-🇳🇵-🇺🇼🇾]|🇭[🇰🇲-🇳🇷🇹-🇺]|🇮[🇨-🇪🇱-🇴🇶-🇹]|🇯[🇪🇲🇴-🇵]|🇰[🇪🇬-🇮🇲-🇳🇵🇷🇼🇾-🇿]|🇱[🇦-🇨🇮🇰🇷-🇻🇾]|🇲[🇦🇨-🇭🇰-🇿]|🇳[🇦🇨🇪-🇬🇮🇱🇴-🇵🇷🇺🇿]|🇴🇲|🇵[🇦🇪-🇭🇰-🇳🇷-🇹🇼🇾]|🇶🇦|🇷[🇪🇴🇸🇺🇼]|🇸[🇦-🇪🇬-🇴🇷-🇹🇻🇽-🇿]|🇹[🇦🇨-🇩🇫-🇭🇯-🇴🇷🇹🇻-🇼🇿]|🇺[🇦🇬🇲-🇳🇸🇾-🇿]|🇻[🇦🇨🇪🇬🇮🇳🇺]|🇼[🇫🇸]|🇽🇰|🇾[🇪🇹]|🇿[🇦🇲🇼]|[\\#\\*0-9] \\x{FE0F} \\x{20E3}|🏳 \\x{FE0F}(\\x{200D}🌈)?+|[👯🤼🧞-🧟](\\x{200D}[♀♂]\\x{FE0F})?+|[⛹🏋-🏌🕵][\\x{FE0F}🏻-🏿](\\x{200D}[♀♂] \\x{FE0F})?+|[🏃-🏄🏊👮👱👳👷💁-💂💆-💇🙅-🙇🙋🙍-🙎🚣🚴-🚶🤦🤷-🤹🤽-🤾🦸-🦹🧖-🧝]((\\x{200D}[♀♂]\\x{FE0F}|[🏻-🏿](\\x{200D}[♀♂]\\x{FE0F})?+))?+|👁\\x{FE0F}(\\x{200D}🗨\\x{FE0F})?+|🏴((\\x{200D}☠\\x{FE0F}|\\x{E0067}\\x{E0062}((\\x{E0065} \\x{E006E}\\x{E0067} \\x{E007F}|\\x{E0073} \\x{E0063} \\x{E0074} \\x{E007F}| \\x{E0077} \\x{E006C} \\x{E0073} \\x{E007F}))))?+|👨(([🏻-🏿](\\x{200D}(([⚕-⚖✈]\\x{FE0F}|[🌾🍳🎓🎤🎨🏫🏭💻-💼🔧🔬🚀🚒🦰-🦳])))?+|\\x{200D}(([⚕-⚖✈] \\x{FE0F}|👦(\\x{200D}👦)?+|👧(\\x{200D}[👦-👧])?+|[👨-👩] \\x{200D}((👦(\\x{200D}👦)?+|👧(\\x{200D}[👦-👧])?+))|❤\\x{FE0F}\\x{200D}((💋\\x{200D}👨|👨))|[🌾🍳🎓🎤🎨🏫🏭💻-💼🔧🔬🚀🚒🦰-🦳]))))?+|👩(([🏻-🏿](\\x{200D}(([⚕-⚖✈]\\x{FE0F}|[🌾🍳🎓🎤🎨🏫🏭💻-💼🔧🔬🚀🚒🦰-🦳])))?+|\\x{200D}(([⚕-⚖✈]\\x{FE0F}|👦(\\x{200D}👦)?+|👧(\\x{200D}[👦-👧])?+|👩\\x{200D}((👦(\\x{200D}👦)?+|👧(\\x{200D}[👦-👧])?+))|❤\\x{FE0F}\\x{200D}((💋\\x{200D}[👨-👩]|[👨-👩]))|[🌾🍳🎓🎤🎨🏫🏭💻-💼🔧🔬🚀🚒🦰-🦳]))))?+|[⌚-⌛⏩-⏬⏰⏳◽-◾☔-☕♈-♓♿⚓⚡⚪-⚫⚽-⚾⛄-⛅⛎⛔⛪⛲-⛳⛵⛺⛽✅✨❌❎❓-❕❗➕-➗➰➿⬛-⬜⭐⭕🀄🃏🆎🆑-🆚🈁🈚🈯🈲-🈶🈸-🈺🉐-🉑🌀-🌠🌭-🌵🌷-🍼🍾-🎄🎆-🎓🎠-🏁🏅-🏆🏈-🏉🏏-🏓🏠-🏰🏸-🐾👀👄-👅👑-👥👪-👭👹-👻👽-💀💄💈-💩💫-📼📿-🔽🕋-🕎🕐-🕧🖤🗻-🙄🙈-🙊🚀-🚢🚤-🚳🚷-🚿🛁-🛅🛐-🛒🛫-🛬🛴-🛹🤐-🤗🤝🤠-🤥🤧-🤯🤺🥀-🥅🥇-🥰🥳-🥶🥺🥼-🦢🦰-🦴🦷🧀-🧂🧐🧠-🧿])
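To make the shape of that pattern a little easier to see, here is a minimal sketch of three of its alternatives translated for Python's `re` module. It is a small hand-picked subset, not the full Unicode 11.0 pattern: the names `KEYCAP`, `MODIFIABLE`, `GENDERED` and `find_emoji` are made up for illustration, the `\x{...}` escapes are rewritten as `\u....` / `\U........`, and the possessive `?+` is replaced by a plain `?`.

```python
import re

# Keycap sequences: a base character, VS16 (U+FE0F), then the enclosing keycap (U+20E3).
KEYCAP = r"[#*0-9]\uFE0F\u20E3"

# Two of the modifier bases (Santa, flexed biceps) plus an optional skin-tone modifier.
MODIFIABLE = r"[\U0001F385\U0001F4AA][\U0001F3FB-\U0001F3FF]?"

# Two of the bases that take an optional ZWJ + gender sign + VS16 suffix.
GENDERED = r"[\U0001F46F\U0001F93C](?:\u200D[\u2640\u2642]\uFE0F)?"

# Only non-capturing groups are used, so findall() returns whole matches.
EMOJI_SUBSET = re.compile("|".join([KEYCAP, MODIFIABLE, GENDERED]))


def find_emoji(text: str) -> list[str]:
    """Return every substring matched by the illustrative subset pattern."""
    return EMOJI_SUBSET.findall(text)
```

Each match is one complete sequence, so a skin-toned Santa or a ZWJ-joined pair comes back as a single string rather than as separate code points.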
Dooming us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane:
Different programs may not identify emoji as being the same, which can clearly cause problems. For example, hashtags are commonly treated as identical even if they differ by case (#foo = #FOO = #Foo = #fOO). But hashtags #🎅🏿 and #🎅🏻 and #🎅 may be treated the same on some systems (≅ #santa, but language-neutral), but treated differently on others.
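One way a system could end up treating those three hashtags as the same is to strip the skin-tone modifiers (U+1F3FB through U+1F3FF) before comparing, in the same spirit as case folding. A minimal sketch, assuming that policy; `hashtag_key` is a hypothetical helper, not something any particular platform is known to use:

```python
import re

# The five Fitzpatrick skin-tone modifiers occupy U+1F3FB..U+1F3FF.
SKIN_TONE = re.compile("[\U0001F3FB-\U0001F3FF]")


def hashtag_key(tag: str) -> str:
    """Return a comparison key: skin-tone modifiers removed, then case-folded."""
    return SKIN_TONE.sub("", tag).casefold()
```

A system that compares the raw strings instead of a key like this would see three different hashtags, which is exactly the mismatch described above.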
This post has been live for a full half hour without a "...now you have two problems" response?
Fixing that.
We're long past a mere two problems IMHO. Now you have U+1F4A9 PILE OF POO problems...
In how many Unicode planes can you embed populations of competing Universal Turing Machines?
Still waiting for a system-wide solution that replaces all emoji with bb-code style descriptions :thinking-face:
I'd settle for an iOS-wide herp-derp toggle button.
So, take the Unicode character name for each code point, convert it to lower case, replace spaces with hyphens, and render that as the glyph?
That sounds like it actually ought to be doable, somewhere in the font rendering guts.
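Outside the font rendering guts, the name-mangling part is easy to sketch in Python with the standard `unicodedata` module. The assumption here is that "emoji" can be approximated by the general category `So` (Symbol, other), which is a rough stand-in: it also catches things like © and misses modifiers and joiners. `describe` is a hypothetical name.

```python
import unicodedata


def describe(text: str) -> str:
    """Replace each 'Symbol, other' character with a :lower-case-hyphenated-name:."""
    out = []
    for ch in text:
        name = unicodedata.name(ch, "")  # empty string if the character is unnamed
        if name and unicodedata.category(ch) == "So":
            out.append(":" + name.lower().replace(" ", "-") + ":")
        else:
            out.append(ch)
    return "".join(out)
```

For example, `describe("hmm \U0001F914")` gives `hmm :thinking-face:`. Multi-code-point sequences (skin tones, ZWJ families, flags) would need the sequence names from the Unicode emoji data files rather than per-character names.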
“For clarity...”
Hmmmph.
In any case I’ll cease bragging about a regexp that can “parse” verilog netlists.
This is reasonable.
(╯°□°)╯︵ ┻━┻)
Unmatched ) in regex; marked by <-- HERE in m/(╯°□°)╯︵ ┻━┻) <-- HERE / at -e line 1.
Second footnote in the link "The ZWJ stands for Zero-Width Joiner. "
ZWJ. JWZ.
Now I need someone to tell me "I honestly think you ought to sit down calmly, take a stress pill and think things over."
When I got to this bit of glyph translation...
...I began to suspect this might be a deliberate JWZ banishing ritual.