fix common misspellings -- mostly mechanically

Jim Meyering jim at meyering.net
Mon Jun 4 12:06:27 UTC 2012


Even in mature projects full of nit-picky reviewers, it seems there
are always a few misspelled words in comments, documentation, etc.

Here's a recipe for correcting those.  To prevent recurrence, hook this
up as a rule that "make check" depends on.
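A minimal sketch of such a gate, in shell. It assumes the misspellings script prints one report line per suspected typo in the `FILE[LINE]: WORD -> "SUGGESTION"` form that the Perl filter below parses (the format is inferred from that filter's regular expression, not from misspell-check's documentation). The function name `spelling_gate` and the allowlist file are hypothetical, not part of misspell-check.

```shell
# Hypothetical helper (not part of misspell-check).
# Reads misspellings report lines on stdin, e.g.
#   ChangeLog.1[175]: Propogate -> "Propagate"
# drops any line containing a word from the optional allowlist file
# given as $1 (one word per line, e.g. contributor names), and
# returns 1 if anything is left, so a "make check" prerequisite fails.
spelling_gate() {
  local allow=$1 report
  # Keep only lines that look like report entries; "|| true" keeps
  # an empty result (grep exit status 1) from counting as an error.
  report=$(grep -E '^.+\[[0-9]+]: [[:alnum:]_]+ -> ' || true)
  if [ -n "$allow" ] && [ -n "$report" ]; then
    # Drop entries that mention an allowlisted word.
    report=$(printf '%s\n' "$report" | grep -v -F -w -f "$allow" || true)
  fi
  if [ -n "$report" ]; then
    printf '%s\n' "$report" >&2
    return 1
  fi
  return 0
}
```

A rule that "make check" depends on could then run something like `git ls-files | misspellings -f - | spelling_gate spelling-allow.txt` after sourcing the function; the allowlist file name is only an illustration. Because `grep -w` matches whole words anywhere on the line, an allowlisted word that also appears in a file name would suppress that entry too.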

First, get misspell-check from http://github.com/lyda/misspell-check
and install its misspellings script.

Then (presuming you use git for version control), run this to
see what you might want to change[*]:

  git ls-files | misspellings -f - | perl -nl \
    -e '/^(.*?)\[(\d+)\]: (\w+) -> "(.*?)"$/ or next;' \
    -e '($file,$n,$l,$r)=($1,$2,$3,$4); $q="'\''"; $r=~s/$q/$q\\$q$q/g;' \
    -e 'print "sed -i $q${n}s!$l!$r!$q $file"'

The pipeline massages the misspellings output into sed -i invocations.
Here are the ones it generates for autoconf/master:

    sed -i '104s!Stange!Strange!' AUTHORS
    sed -i '175s!Propogate!Propagate!' ChangeLog.1
    sed -i '673s!occurences!occurrences!' ChangeLog.2
    sed -i '6973s!Accomodate!Accommodate!' ChangeLog.3

If you look at the AUTHORS file, you'll see that the first command is a
false positive: "Stange" is a contributor's name, and we don't want to
change it.  However, the three remaining commands fix legitimate spelling
errors, albeit only in ChangeLog files.

Remember that this is a naive tool: it has no clue about the
boundaries between code and comments.  Be sure to review each
change carefully, and *in context*.  For example, make sure it
does not change grammar tokens or variable names like "THRU" or
"UPTO" to "THROUGH" or "UP TO".
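Reviewing in context is easier if you can see exactly what a command would do before it touches the file. A minimal sketch of one way to do that: run the same sed expression without -i and diff the result against the original. It assumes the expression and the file name are passed separately; the helper name `preview_fix` is hypothetical.

```shell
# Hypothetical helper: show, as a unified diff, what one generated
# command would change, without modifying the file.
#   $1: the sed expression, e.g. '175s!Propogate!Propagate!'
#   $2: the file it applies to
preview_fix() {
  # diff exits 1 when the inputs differ, which is the expected case;
  # only exit status 2 (trouble, e.g. a missing file) is a failure.
  sed -e "$1" "$2" | diff -u "$2" - || [ $? -eq 1 ]
}

# Usage, assuming ChangeLog.1 is in the current directory:
#   preview_fix '175s!Propogate!Propagate!' ChangeLog.1
```

A change that would turn a token such as THRU into THROUGH shows up in that diff before any file is modified.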

Jim

[*] The misspellings script appears to be intended to do some
of this itself, but so far, it hasn't worked for me.
Note that for a typo like "cant", you'll be presented with this
sed command:

  sed -i '16s!cant!cannot","can not","can'\''t!' m4/nullsort.m4

The perl filter does escape the single quote in the RHS of the
substitution, but it does not choose which of those three
alternatives to use.  To find the commands that offer more than one
alternative spelling, search the list of sed commands for the string
"," (double quote, comma, double quote).
For each of those, you must manually select the word that you prefer.
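Once the commands are saved in a file, a plain grep is enough to find the ones that still carry several alternatives. A sketch, assuming one sed command per line; the helper name `list_ambiguous` is hypothetical.

```shell
# Hypothetical helper: print, with line numbers, each generated sed
# command whose replacement still lists several alternatives
# (misspellings separates them with "," between double quotes).
#   $1: the file holding the sed commands
list_ambiguous() {
  grep -n '","' "$1"
}
```

For example, running it on a file that contains the "cant" command above prints that command prefixed by its line number, which tells you where to make your choice.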

Once you've done that, you can save the sed commands to a file, say
K, and apply their changes with "bash K".
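The round trip can be tried in a scratch directory first. The sketch below writes a one-command K by hand (standing in for the filter's hand-edited output), runs it with bash, and shows the result. Note that `sed -i` with no suffix argument is GNU sed syntax; BSD and macOS sed expect a suffix argument after -i, so the generated commands would need adjusting there. In a real git checkout, `git diff` afterwards shows every change in context for review.

```shell
# Demo in a throwaway directory; assumes GNU sed (for bare "sed -i").
demo=$(mktemp -d)
cd "$demo" || exit 1
printf '%s\n' 'Accomodate older compilers.' > ChangeLog.3

# K would normally hold the reviewed output of the Perl filter.
cat > K <<'EOF'
sed -i '1s!Accomodate!Accommodate!' ChangeLog.3
EOF

bash K
cat ChangeLog.3    # prints: Accommodate older compilers.
```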

