[Freeipa-devel] Re: [DESIGN] IPA healthcheck design

Thursday, 25 October 2018

Fraser Tweedale wrote:
...
 On Wed, Oct 24, 2018 at 04:49:21PM -0400, Rob Crittenden via
 FreeIPA-devel wrote:
> I started a design of an IPA healthcheck framework at
> https://www.freeipa.org/page/V4/Healthcheck
>
> Have at it.
>
> Note that this concentrates more on how it will work big picture and
> less on individual checks that may be performed. I'm happy to add any
> ideas you come up with for specific tests.
>
> rob
>
 Thanks Rob, feedback below.

 1. I think we should consider promoting the server hostname into the
 object, with attribute name 'ipaErrorHost' (or whatever).  This may
 make some kinds of searches easier, e.g. if you have
 ipa[123].bne.example.com and ipa[123].bos.example.com, and you are
 interested in errors from the bne site, you can search for
 '(ipaErrorHost=*.bne.example.com)'.  We can index the attribute. 
We already have a fqdn attribute in the IPA schema. I'd prefer to re-use
that. It has the eq, pres and sub indices.
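
As a rough illustration of the per-site search using fqdn, the filter could
be built like this. This is only a sketch: the helper names are made up for
the example, and the escaping follows the usual LDAP filter rules for
special characters.

```python
# Characters that must be escaped inside an LDAP filter assertion value.
_SPECIAL = {"\\", "*", "(", ")", "\x00"}


def escape_filter_value(value):
    """Escape special characters as a backslash plus two hex digits."""
    out = []
    for ch in value:
        if ch in _SPECIAL:
            out.append("\\%02x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


def site_filter(site_domain):
    """Build a substring filter matching every master in one site.

    The leading '*' is the substring wildcard, so it is not escaped;
    only the caller-supplied domain is.
    """
    return "(fqdn=*.%s)" % escape_filter_value(site_domain)
```

With this, site_filter("bne.example.com") gives '(fqdn=*.bne.example.com)',
which the existing sub index on fqdn should be able to serve.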

...
 It does make some sense to group in per-host subtrees but because
 there is no subtree delete operation a flat container might be worth
 it for the additional search flexibility. 
Yes, I suppose if we specify the master within the entry that is
sufficient. Let's agree on what to call the master and I'll make this
change.

...
 2. Schema and indices:

     - for ipaErrorDateReported and ipaErrorDateResolved, specify:

           EQUALITY generalizedTimeMatch
           ORDERING generalizedTimeOrderingMatch

     - for ipaSeverity specify:
           EQUALITY integerMatch
           ORDERING integerOrderingMatch

     - ipaIgnoreError specify: EQUALITY booleanMatch

     - ipaIgnoreError being MAY is a pitfall.  Assuming absence
       implies "not ignored", searching for:

         (ipaIgnoreError=FALSE)

       will _exclude_ entries without the ipaIgnoreError attribute.
       The correct filter is '(!(ipaIgnoreError=TRUE))'.  Better to
       make it a MUST attribute and exclude this pitfall.

     - We probably want presence index for ipaErrorDateResolved 
Done.
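
For anyone following along, the pitfall with an optional ipaIgnoreError can
be modelled in a few lines. This is a simplified model, not a real LDAP
client: it assumes an equality item does not match an entry that lacks the
attribute, and the entry names are invented for the example.

```python
def eq(attr, value):
    """Equality item (attr=value): true only if the entry holds that value."""
    return lambda entry: value in entry.get(attr, [])


def neg(item):
    """Negation (!(...)): inverts the wrapped item."""
    return lambda entry: not item(entry)


# Three hypothetical reports; the last one has no ipaIgnoreError at all.
entries = {
    "ignored": {"ipaIgnoreError": ["TRUE"]},
    "explicitly-not-ignored": {"ipaIgnoreError": ["FALSE"]},
    "attribute-absent": {},
}


def search(flt):
    """Return the sorted names of the entries matched by the filter."""
    return sorted(name for name, entry in entries.items() if flt(entry))


only_false = search(eq("ipaIgnoreError", "FALSE"))
not_true = search(neg(eq("ipaIgnoreError", "TRUE")))
not_false = search(neg(eq("ipaIgnoreError", "FALSE")))
```

Here only_false is ['explicitly-not-ignored'], not_true is
['attribute-absent', 'explicitly-not-ignored'] and not_false is
['attribute-absent', 'ignored']. Only the negated TRUE test returns every
report that is not ignored, which is the argument for making the attribute
a MUST.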

...

 3. Execution; we might want a watchdog to kill checks that take too
 long (for whatever reason).  There'll be some complexity so maybe
 just make a note not to code ourselves into a corner and we can
 defer it. 
Added. I also added a config file so it can be overridden. I think I
need to explore configuration a bit more. Ideally most of the config
would be stored in LDAP (e.g. if you want to disable a whole set of
tests from running).

A local config for timeout is preferred in case LDAP is inaccessible for
some reason.
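
A minimal sketch of the watchdog idea with a local timeout setting,
assuming each check can be run as a child process. The section and option
names ("default", "timeout") and the 10 second fallback are placeholders,
not anything from the design page.

```python
import configparser
import subprocess
import sys

DEFAULT_TIMEOUT = 10.0  # seconds; hypothetical fallback value


def load_timeout(path):
    """Read the timeout from a local file so it works without LDAP."""
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing or unreadable file is skipped silently
    return parser.getfloat("default", "timeout", fallback=DEFAULT_TIMEOUT)


def run_check(argv, timeout):
    """Run one check as a child process and kill it if it overruns."""
    try:
        proc = subprocess.run(argv, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the child at this point.
        return "timeout"
    return "ok" if proc.returncode == 0 else "failed"


if __name__ == "__main__":
    # Hypothetical config path, used only for this demonstration.
    limit = load_timeout("/etc/ipahealthcheck-example.conf")
    print(run_check([sys.executable, "-c", "pass"], limit))
```

Running checks in-process would need a different mechanism, since a Python
thread cannot be killed from outside.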

...
 4. (Comment) regarding the separate repo, I'm not against it but
 there's some interdependency, i.e. HC will depend on a lot of stuff
 from ipalib, but the IPA healthcheck plugin will also depend on
 stuff defined by HC.  What bits will live where is not fully clear.
 We might have to work it out as we go. 
I'm not dead set on this, but it might be nice, and it would act as a
check on the developer API changing. I added a bit more verbiage.

...
 5. CLI: the '--source' option has not been defined.  Does '--tool'
 mean the same thing?

 6. Terminology: not sure about "source"/"command" (especially
 "command", which could be confusing ("what command failed?") Some
 ideas: command -> item/check/fault.  I don't care about bikeshedding
 the strings, I just want to avoid overloaded/confusing terms. 
PLEASE, bikeshed away! As you can see I'm having a heck of a time coming
up with a good way to specify the group of tests versus an individual
test. This is key to understanding everything so good naming is
important. I'm very open to suggestions on this.

...
 7. CLI: there is some inconsistency with how other IPA commands work
 (not necessarily bad, but it should be justified).  If we follow the
 IPA pattern:

 - `ipa healthcheck-show UUID` would show a single report
 - `ipa healthcheck-find` would have a `--master=HOSTNAME` filter
   option.
 - `--all` would show all attributes, and there would be a separate
   option to show ignored reports (e.g. `--include-ignored`).

 So again, we don't have to do it that way, but the current design is
 a deviation from the norm so I think that should be discussed from a
 usability perspective. 
Yes, this is complicated. If we want to drop it, and I'm perfectly ok
with it, we'd have to have extremely atomic, uniquely named individual
tests within a plugin. For example, to check on file ownership one way
to do it would be with a table:

files = [
    ('/etc/httpd/alias/key3.db', 'root', 'apache', '0640'),
    ('/etc/httpd/alias/cert8.db', 'root', 'apache', '0640'),
    ...
]

for (file, owner, group, mode) in files:
    [ test ]

How would we name a particular failure? This is why I went with UUID.

Similar applies to the certmonger tracking. We have 8 or so tracking
requests by default, if one or more fail we'd report each one
individually but how to name them automatically? I punted.
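
To make the table-driven idea concrete, here is one way the loop body could
look, with a UUID assigned to each failure. It is a sketch only: the report
keys are invented, and emitting one report per mismatching property (rather
than one per file) is an assumption.

```python
import grp
import os
import pwd
import stat
import uuid


def actual_state(path):
    """Return (owner, group, mode) for path; mode is 4-digit octal text."""
    st = os.stat(path)
    owner = pwd.getpwuid(st.st_uid).pw_name
    group = grp.getgrgid(st.st_gid).gr_name
    return owner, group, format(stat.S_IMODE(st.st_mode), "04o")


def compare(path, actual, expected):
    """Return one report dict per property that differs from the table."""
    reports = []
    names = ("owner", "group", "mode")
    for name, got, want in zip(names, actual, expected):
        if got != want:
            reports.append({
                "id": str(uuid.uuid4()),  # no meaningful name, so use a UUID
                "path": path,
                "property": name,
                "expected": want,
                "actual": got,
            })
    return reports


def check_files(files):
    """Walk a table of (path, owner, group, mode) rows and collect reports."""
    reports = []
    for path, owner, group, mode in files:
        reports.extend(compare(path, actual_state(path), (owner, group, mode)))
    return reports
```

Nothing here needs a hand-written name per file, which is the appeal of the
UUID approach.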

Honestly I think the -show command will be used more within the UI than
the CLI. The -find command will show the same information.

...
 8. Can a single tool+command combo produce multiple reports for a
 single master, with different ipaErrorMessage key-value pairs?
 Example: file permissions.  Is every possible file to check a
 different tool+command, or is it one tool+command, with potentially
 multiple reports with different ipaErrorMessage parameters? 
Exactly. I imagined a separate report for every single failure.

...
 Consider this from a usability perspective: the resolution is likely
 to be very similar for all the possible instantiations.  Also
 consider how many tool+command combinations there would be if all
 the possible files to check had to have different names.  Lookup
 tables for error message generation and external resources get huge. 
The failure lookup is by the plugin and particular test. I kept them
separate so it is insanely easy to track which ones are resolved and
when (if 3 files have bad perms and are reported in a single LDAP entry
and one is fixed, what do we do)? I looked at it like a transaction file.
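
The transaction-file view can be sketched as follows, assuming each failure
has some stable key (a plain string here) and that a later run marks open
reports as resolved when the failure is no longer seen. The field names are
placeholders; the timestamps use the LDAP Generalized Time form.

```python
from datetime import datetime, timezone


def generalized_time(moment):
    """Format an aware datetime as LDAP Generalized Time in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M%SZ")


def reconcile(reports, failing_now, moment):
    """Update the report list in place after a run and return it.

    reports: list of dicts with 'key', 'reported' and 'resolved'
    failing_now: set of failure keys seen in the current run
    """
    stamp = generalized_time(moment)
    still_open = set()
    for report in reports:
        if report["resolved"] is None:
            if report["key"] in failing_now:
                still_open.add(report["key"])
            else:
                report["resolved"] = stamp  # fixed since the last run
    # Anything failing without an open report gets a new entry.
    for key in sorted(failing_now - still_open):
        reports.append({"key": key, "reported": stamp, "resolved": None})
    return reports
```

With three bad files reported separately, fixing one resolves exactly one
entry and leaves the other two open, so there is no partially resolved
entry to worry about.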

...
 OTOH if a single tool+command can produce multiple reports, it
 affects the API/CLI somewhat (e.g. `ipa healthcheck-ignore` must now
 be given the UUID or enough parameters to uniquely identify the
 report to ignore). 
Yes. Icky but necessary.

...
 9. Would be good to include links to external resources etc in
 healthcheck-show.  Also to indicate when 'ipa-healthcheck' may be
 able to repair the issue (may reduce support burden if we can subtly
 encourage the administrator to run the repair tool instead of
 contact support / mailing list). 
I've been a bit vague about working with the user on how to resolve a
particular problem. We have a few obvious options:

1) external documentation: wiki, downstream docs, both?
2) a separate LDAP lookup table

I've added a section on this and some additional schema as a starting
point for discussion.

...
 That's all for now :)  Overall the design is looking good.

Thanks for the feedback

rob
