[Bug 978233] New: perl-5.18: Regex \8 and \9 after literals no longer work

bugzilla at redhat.com bugzilla at redhat.com
Wed Jun 26 07:01:31 UTC 2013


https://bugzilla.redhat.com/show_bug.cgi?id=978233

            Bug ID: 978233
           Summary: perl-5.18: Regex \8 and \9 after literals no longer
                    work
           Product: Fedora
           Version: rawhide
         Component: perl
          Severity: unspecified
          Priority: unspecified
          Assignee: mmaslano at redhat.com
          Reporter: ppisar at redhat.com
        QA Contact: extras-qa at fedoraproject.org
                CC: cweyl at alumni.drew.edu, iarnell at gmail.com,
                    jplesnik at redhat.com, kasal at ucw.cz, lkundrak at v3.sk,
                    mmaslano at redhat.com,
                    perl-devel at lists.fedoraproject.org, ppisar at redhat.com,
                    psabata at redhat.com, rc040203 at freenet.de,
                    tcallawa at redhat.com

There is a regression: \8 and \9 back-references have not worked since
v5.17.0-543-g726ee55. This has been partially fixed by:

commit f1e1b256c5c1773d90e828cca6323c53fa23391b
Author: Yves Orton <demerphq at gmail.com>
Date:   Tue Jun 25 21:01:27 2013 +0200

    Fix rules for parsing numeric escapes in regexes

    Commit 726ee55d introduced better handling of things like \87 in a
    regex, but as an unfortunate side effect broke latex2html.

    The rules for handling backslashes in regexen are a bit arcane.

    Anything starting with \0 is octal.

    The sequences \1 through \9 are always backrefs.

    Any other sequence is interpreted as a decimal number, and if there
    are that many capture buffers defined in the pattern at that point,
    then the sequence is a backreference. If, however, it is larger
    than the number of buffers, the sequence is treated as an octal escape.

    A consequence of this is that \118 could be a backreference to
    the 118th capture buffer, or it could be the string "\11" . "8"
    (a tab followed by "8"). In other words, depending on the context,
    we might even use a different number of digits for the escape!

    This also left an awkward edge case: multi-digit sequences
    starting with 8 or 9, like m/\87/, would be parsed as though we
    had seen /87/ (i.e. with a null byte at the start), or worse, as
    /\x{00}87/, which is clearly wrong.

    This patch fixes the cases where the capture buffers are defined,
    and causes sequences like \87 or \97 to throw the same error that
    /\8/ would. One might argue we should complain about an illegal
    octal sequence instead, but this is more consistent with the error
    for /\9/ and, IMO, will be less surprising in an error message.

    This patch includes exhaustive tests of patterns of the form
    /(a)\1/, /((a))\2/, etc., so that we don't break this again if we
    change the logic further.
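The commit message describes the escape-parsing rules in prose. They may be easier to follow as code, so below is a small Python model of them. It is not Perl's actual parser, which lives in regcomp.c and handles many more cases. The function name `classify_escape`, its `(kind, value, chars_consumed)` result and the `patched` switch are hypothetical and exist only for illustration. `groups` is the number of capture buffers defined before the escape. For simplicity, the same count is also used to decide whether a backreference names an existing group.

```python
import re


def classify_escape(after_backslash: str, groups: int, patched: bool = True):
    """Hypothetical model of Perl's numeric-escape rules in a regex.

    after_backslash: text following the backslash; must start with a digit.
    groups: number of capture buffers defined before the escape.
    patched: True models the 8/9 rule described in commit f1e1b256,
             False models the earlier rules as listed in the message.
    Returns (kind, value, chars_consumed).
    """
    digits = re.match(r"[0-9]+", after_backslash).group(0)
    if digits[0] == "0":
        # Anything starting with \0 is octal: up to three octal digits.
        octal = re.match(r"[0-7]{1,3}", digits).group(0)
        return ("octal", int(octal, 8), len(octal))
    n = int(digits)
    # \1..\9 are always backrefs; longer numbers fall back to octal
    # when there are not that many buffers.
    octal_fallback = n > 9 and n > groups
    if patched and digits[0] in "89":
        # Patched model: sequences starting with 8 or 9 never fall back.
        octal_fallback = False
    if octal_fallback:
        # Unpatched, "87" yields zero octal digits here, i.e. a NUL byte
        # followed by the literal text "87": the edge case in the commit.
        octal = re.match(r"[0-7]{0,3}", digits).group(0)
        return ("octal", int(octal or "0", 8), len(octal))
    if n > groups:
        # Stands in for the error /\8/ raises when group 8 does not exist.
        raise ValueError("reference to nonexistent group")
    return ("backref", n, len(digits))


print(classify_escape("118", 118))              # ('backref', 118, 3)
print(classify_escape("118", 5))                # ('octal', 9, 2), i.e. "\11" . "8"
print(classify_escape("87", 2, patched=False))  # ('octal', 0, 0), i.e. NUL then "87"
print(classify_escape("87", 87))                # ('backref', 87, 2)
```

In this model, the single `digits[0] in "89"` check is the whole difference between the two modes. Whether Perl's patch has exactly that shape is an assumption of the sketch, not something the commit message states.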
