perl and unicode go together like apples and razor blades

That scrmable thing has really been making the rounds: I've seen the text translated into three or four other (human) languages now, not to mention all the people writing their own scripts in their marginalized geek-language du jour.

But my script was malfunctioning for a bunch of people, and I finally figured out why. Fucking Unicode again. If $LANG contains "utf8" (which is the default on recent Red Hat systems), then "[^\w]" doesn't work right when the text comes in from stdin, among other things. Check this out:

    setenv LANG en_US
    echo -n "" | \
    perl -e '$_ = <>; print join (" | ", split (/([^\w]+)/)) . "\n";'

          ===> "foo | . | bar" (right)

    setenv LANG en_US.utf8
    echo -n "" | \
    perl -e '$_ = <>; print join (" | ", split (/([^\w]+)/)) . "\n";'

          ===> "" (wrong!)

It works fine in both cases if you do $_ = "" instead of reading it from stdin.

perl-5.8.0-88, Red Hat 9. Hate.

