now-you-have-two-problems.el

"I sure am tired of typing extra backslashes in my isearches", I says to myself. "I wonder if anyone has done the unthinkable and implemented perlre in elisp?"

Yes:

(rxt-pcre-to-elisp "(abc|def)\\w+\\d+")
;; => "\\(\\(?:abc\\|def\\)\\)[_[:alnum:]]+[[:digit:]]+"

pcre2el.el:

PCRE has a complicated syntax and semantics, only some of which can be translated into Elisp. The following subset of PCRE should be correctly parsed and converted:

  • parenthesis grouping ( .. ), including shy matches (?: ... )
  • backreferences (various syntaxes), but only up to 9 per expression
  • alternation |
  • greedy and non-greedy quantifiers *, *?, +, +?, ? and ?? (all of which are the same in Elisp as in PCRE)
  • numerical quantifiers {M,N}
  • beginning/end of string \A, \Z
  • string quoting \Q .. \E
  • word boundaries \b, \B (these are the same in Elisp)
  • single character escapes \a, \c, \e, \f, \n, \r, \t, \x, and \octal digits (but see below about non-ASCII characters)
  • character classes [...] including Posix escapes
  • character classes \d, \D, \h, \H, \s, \S, \v, \V both within character class brackets and outside
  • word and non-word characters \w and \W (Emacs has the same syntax, but its meaning is different)
  • s (single line) and x (extended syntax) flags, in regexp literals, or set within the expression via (?xs-xs) or (?xs-xs: .... ) syntax
  • comments (?# ... )

Most of the more esoteric PCRE features can't really be supported by simple translation to Elisp regexps. These include the different lookaround assertions, conditionals, and the "backtracking control verbs" (* ...).
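
To make the supported subset a little more concrete, here is a minimal sketch, assuming pcre2el is installed and on your load-path. Only rxt-pcre-to-elisp comes from the example above; the helper my/pcre-search-forward is a hypothetical name invented for illustration and is not part of the package. No expected output is shown for the translations; evaluate each form to see what you get.

```elisp
(require 'pcre2el)

;; Hypothetical helper (not part of pcre2el): search forward in the
;; current buffer for a pattern written in PCRE syntax by translating
;; it to an Elisp regexp first.  Returns nil when nothing matches.
(defun my/pcre-search-forward (pcre)
  "Search forward for PCRE after translating it to an Elisp regexp."
  (re-search-forward (rxt-pcre-to-elisp pcre) nil t))

;; A few constructs from the supported list.
(rxt-pcre-to-elisp "(?:foo|bar){2,3}")  ; shy group, alternation, {M,N}
(rxt-pcre-to-elisp "\\Qa.b\\E\\s+")     ; \Q .. \E quoting, \s class
(rxt-pcre-to-elisp "\\bword\\b")        ; word boundaries
```

Note that the backslashes are doubled only because these are Lisp string literals; typed interactively, the PCRE would need no extra escaping.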

I am sad to report, however, that I can't get it to work in either xemacs 21.5.28 or emacs 22.1.1. (And MacPorts won't let you install emacs and xemacs simultaneously! How partisan!)

Amusing as it would have been had they tried to translate from one regexp syntax to another using regexps... they did not do that.

Also, my heart grew three sizes when I saw that the first line of the file contains -*- lexical-binding: t -*-

16 Responses:

  1. Philip Guenther says:

    So much for hoping it included a full perl->elisp translator to support perl5's (?{ code }) syntax.

    • jwz says:

      Well obviously the code part there should be elisp.

      • Philip Guenther says:

        Embedding elisp should obviously instead be (?( code ))

        ...which perl already gives a different meaning to, damn it.

        (Unrelated: the 'reply' button in the email from jwz blog robot auto-fills the email address, but not the Name field. Could it do the latter as well?)

  2. cms says:

    It's been a few years since I was a user, but I think MacPorts will let you have different emacses installed if they are defined as port variants, but only one active at a time. I recall it being not too much trouble to have several local variants defined.

  3. Eric Larson says:

    Whoa! That is an old version of Emacs. M-x version gives me GNU Emacs 27.0.50 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.22.25) of 2018-04-28.

    On my mac I used brew to keep up to date. Any reason to prefer ports?

    • Steve Allen says:

      I concur. I've moved from MacPorts to HomeBrew without regret.

    • margaret says:

      now-you-have-two-problems.el++

    • jwz says:

      Emacs (and XEmacs) are not exactly a moving target these days. I don't typically feel the need to ever upgrade them. In fact, upgrading them means something might change, and if anything changes then my muscle memory is off, so no thank you very much.

      MacPorts has more recent versions, but as I said, it got persnickety at me when I tried to install both.

      (I have never used HomeBrew, however I have not moved from MacPorts to HomeBrew without regret.)

      • Eric Larson says:

        Fair enough! I suppose you're likely long past the days of declaring emacs bankruptcy and wouldn't want to hop on the emacs packaging bandwagon, so that definitely makes sense.

      • J Greely says:

        Oh, yeah. Every time I forget to pin Emacs in Homebrew and it upgrades, I discover something new that I need to turn off in my .emacs file. The most recent surprise was isearch-lax-whitespace and search-whitespace-regexp, which "fix" each literal space character in your searches to instead match one or more Unicode whitespace characters (including CR, LF, etc). Awesome for hosing keyboard macros, especially when combined with line-move-visual!

        -j

      • Come to the dark side. We have lexical binding.

  4. Unter Larrson says:

    I do not understand this post, and I do not want to understand this post.

  5. Luís says:

    Lexical binding was introduced in Emacs 24, IIRC, so pcre2el.el requires Emacs >= 24. Another great package by Jon Oddie, if you're into Emacs Lisp (or Common Lisp) debugging, is macrostep.

  6. Actually, using real perl 5 expressions -- I doubt the imitators have got there, though I wouldn't know -- it actually is possible to parse html with regular expressions because now they can handle recursive matching.

    It would, however, not be the greatest of ideas, unless you want to inflict Lovecraftian dread upon a certain kind of purist...

    And perl 6 grammars look like they're capable of anything, and should probably be deployed against the Russians.

  7. Chris Hanson says:

    Why not just use CL-PPCRE via one of the Common Lisp compatibility packages?

    (I’ll see myself out…)
