> On Fri, Feb 02, 2018 at 11:04:01AM -0500, R. G. Newbury wrote:
> A bug in regex handling???
>
> I am cleaning up some html code
.....
> # grep -h '[0-9]*s[0-9]*">' temp
> Returns the example line with the 's[0-9]">' highlighted.
> Can anyone explain what is happening? This isn't politics, so the group
> [0-9] should not equal [0-9"#]. Or even [0-9\"\#].
Fri, 2 Feb 2018 10:14:37 -0600 From: Chris Adams <linux(a)cmadams.net>
A * in a regex is "0 or more of the previous", so basically you are just
matching 's[0-9]*">' (because there will always be at least 0 of the
[0-9] part at the start).
If you really mean "1 or more", you can use an extended regex (the -E
argument to grep/sed) and use + instead of *, so '[0-9]+s[0-9]*">'.
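To see what each pattern actually matches, grep's -o option prints only the matched text instead of the whole line. A minimal sketch, assuming GNU grep (-o is not POSIX) and two made-up input lines, since the original example line wasn't posted:

```shell
#!/bin/sh
# Hypothetical sample input: the original example line was not posted.
sample='<a name="12s4">
<p class="items">'

# BRE: [0-9]* can match zero digits, so the second line matches on 's">'.
bre=$(printf '%s\n' "$sample" | grep -o '[0-9]*s[0-9]*">')

# ERE (-E): [0-9]+ needs at least one digit, so only the first line matches.
ere=$(printf '%s\n' "$sample" | grep -oE '[0-9]+s[0-9]*">')

printf 'BRE matches:\n%s\n' "$bre"
printf 'ERE matches:\n%s\n' "$ere"
```

The BRE run reports both 12s4"> and s">, while the ERE run reports only 12s4">.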
Fri, 02 Feb 2018 16:15:37 +0000 From: Patrick O'Callaghan
In grep, * matches any number of instances, including 0. You want to
use + (with grep -E) rather than * to guarantee at least one digit.
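Note that + is only an operator in extended regexes. Without -E, grep treats a plain + as a literal character; GNU grep also accepts \+ in basic regexes as a non-POSIX extension. A small sketch, assuming GNU grep and a hypothetical input line:

```shell
#!/bin/sh
# Hypothetical input line for illustration.
line='<td class="x12s3">'

# BRE: '+' is a literal plus sign here, so nothing matches.
# '|| true' keeps grep's no-match exit status from aborting under set -e.
plain=$(printf '%s\n' "$line" | grep -o '[0-9]+s' || true)

# ERE: '+' means "one or more", so this captures 12s.
ext=$(printf '%s\n' "$line" | grep -oE '[0-9]+s')

# GNU-only BRE extension: '\+' also means "one or more".
gnu=$(printf '%s\n' "$line" | grep -o '[0-9]\+s')

printf 'plain: [%s]\nere: [%s]\ngnu: [%s]\n' "$plain" "$ext" "$gnu"
```

If portability to non-GNU greps matters, -E is the safer choice of the two working forms.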
Date: Fri, 2 Feb 2018 11:26:02 -0500 From: Jon LaBadie <jonfu(a)jgcomp.com>
You are misunderstanding the "*". It means any sequence of the
associated character, including a ZERO-length sequence.
So [0-9]*s matches "s (actually just the s) as it is a zero-length
sequence of digits followed by an s. When you grep for [0-9]s, there
must be at least one digit before the s (but any extra digits are not
part of the match). Sometimes the sequence [0-9][0-9]*s is useful to
say "one or more digits before the s".
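These points can be checked the same way. A sketch assuming GNU grep (for -o) and hypothetical input lines: a single [0-9] takes only the digit next to the s, [0-9][0-9]* is the portable basic-regex spelling of "one or more digits", and [0-9]* happily matches a bare s.

```shell
#!/bin/sh
# Hypothetical input lines for illustration.
digits='<a name="12s4">'
nodigits='<td id="s">'

# [0-9]s: exactly one digit before the s, so only "2s" is matched.
one=$(printf '%s\n' "$digits" | grep -o '[0-9]s')

# [0-9][0-9]*s: one digit, then zero or more, i.e. "one or more" in plain BRE.
many=$(printf '%s\n' "$digits" | grep -o '[0-9][0-9]*s')

# [0-9]*s: zero digits is allowed, so just the "s" is matched.
bare=$(printf '%s\n' "$nodigits" | grep -o '[0-9]*s')

printf '%s\n%s\n%s\n' "$one" "$many" "$bare"
```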
jl
Thanks to all for the quick responses. I *tried* to RTFM, but it was
not clear, even on a re-read. I took [0-9]* as multiple instances of
[0-9] but NOT zero instances.
Geoff