Finding matching braces with regular expressions

Cameron Simpson cs at zip.com.au
Mon Mar 19 06:04:55 UTC 2012


On 18Mar2012 20:19, Bruno Wolff III <bruno at wolff.to> wrote:
| On Sun, Mar 18, 2012 at 23:46:17 +0100,
|    suvayu ali <fatkasuvayu+linux at gmail.com> wrote:
| >
| >I'm trying to write a regular expression that matches function and class
| >definitions in C/C++ and defuns in lisp code. I intend to use it with
| >sed and `git blame'. My first attempt relies on indentation. That
| >obviously breaks rather often.
| 
| Mathematically, regular expressions can't match braces like this to an
| unbounded depth. You might be able to use extensions to common regular
| expression implementations that aren't strictly regular expressions to do this.

In particular, regular expressions are not recursive. Hence not capable
of arbitrarily matching nested constructs. But you _can_ construct one
to match a certain depth. For example, four or five deep probably covers
most things in reasonable code (lisp excluded; that is naturally very
bracket intensive).

If you're using sed you probably want the "extended regular expressions
mode", turned on by -E in GNU sed IIRC.

Personally, for easy of debugging, I would construct the regexp from
smaller pieces in shell. Untested example:

  no_br='[^()]*'                        # no brackets
  upto1="${no_br}|(\(${no_br}\))"       # no brackets or "no brackets" in brackets
  upto2="${no_br}|(\(${upto1}\))"
  upto3="${no_br}|(\(${upto2}\))"
  upto4="${no_br}|(\(${upto3}\))"
  sed -E -n "/^${upto4}\$/!p" <blame-data >blame-bad-bracketing

| It might be easier to write your own parser. There are tools, like flex
| and bison to help with this.

Indeed. Or you could write a hand rolled recursive descent parser in
what ever language you like provided it makes character by character
access fairly easy (C, python, etc; not awk or sed).

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

-- compunerd
GNU Emacs is a LISP operating system disguised as a word processor.
        - Doug Mohney, in comp.arch


More information about the users mailing list