dist-git help wanted: write me a regex!

Bruno Wolff III bruno at wolff.to
Mon Dec 21 06:53:42 UTC 2009


On Mon, Dec 21, 2009 at 00:25:06 -0500,
  James Cassell <fedoraproject at cyberpear.com> wrote:
> 
> This should do it:
> /((Mon|Tues?|Wed|Thu(rs?)?|Fri|Sat|Sun)\s+(Jan|Feb|Mar|Apr|May|June?|July?|Aug|Sep|Oct|Nov|Dec)\s+[0-3]?[0-9]\s+(19|20)[0-9][0-9]\s+[A-Za-z0-9\s]+<[^\s@]+@[^\s@>]+>\s+2.[4-6].[0-9.-]+\s*)/

I don't think this will catch a period in the comment part of the email
address (people often put one after an initial). Hyphenated names won't get
picked up either, as far as I can tell. Since those entries are UTF-8, you
also need to worry about non-ASCII letters in the name. I am not sure how
those collate compared to ASCII letters, so it might be safer to use [^<]+
instead of [A-Za-z0-9\s]+. I think being more liberal in what gets matched
is less likely to match something you don't want than being picky is to miss
an entry that has a typo or unusual characters in it.
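
To make the difference concrete, here is a rough Python sketch that runs
James's pattern against a few made-up entries, once with the original
[A-Za-z0-9\s]+ name class and once with [^<]+. The sample lines, names and
addresses are invented for illustration and are not taken from any real
changelog. The outer capture group is dropped and the pattern is split into
pieces only so the name class can be swapped; the rest is unchanged.

```python
import re

# The parts of the quoted pattern before and after the name, kept verbatim.
PREFIX = (
    r"(Mon|Tues?|Wed|Thu(rs?)?|Fri|Sat|Sun)\s+"
    r"(Jan|Feb|Mar|Apr|May|June?|July?|Aug|Sep|Oct|Nov|Dec)\s+"
    r"[0-3]?[0-9]\s+(19|20)[0-9][0-9]\s+"
)
SUFFIX = r"<[^\s@]+@[^\s@>]+>\s+2.[4-6].[0-9.-]+\s*"

# Original name class versus the more liberal "anything up to the <".
STRICT = re.compile(PREFIX + r"[A-Za-z0-9\s]+" + SUFFIX)
LIBERAL = re.compile(PREFIX + r"[^<]+" + SUFFIX)

# Hypothetical entries: plain name, initial with a period,
# hyphenated surname, non-ASCII letters.
SAMPLES = [
    "Mon Dec 21 2009 John Smith <jsmith@example.com> 2.6.31-1",
    "Mon Dec 21 2009 John Q. Public <jqp@example.com> 2.6.31-1",
    "Tue Dec 22 2009 Mary Smith-Jones <msj@example.com> 2.6.31-2",
    "Wed Dec 23 2009 José García <jg@example.com> 2.6.31-3",
]

strict_results = [bool(STRICT.search(line)) for line in SAMPLES]
liberal_results = [bool(LIBERAL.search(line)) for line in SAMPLES]

# One row per sample: did the strict and the liberal pattern match?
for line, strict, liberal in zip(SAMPLES, strict_results, liberal_results):
    print(f"strict={strict!s:5} liberal={liberal!s:5} {line}")
```

With these samples the strict class should match only the first entry, while
[^<]+ should match all four. Note that Python's \s and character classes may
not behave exactly like the regex engine the real script uses, so treat this
as a sanity check rather than a guarantee.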



