[DESIGN] IPA healthcheck design


Rob Crittenden

Wednesday, 24 October 2018

3:49 p.m.


Fraser Tweedale

Thursday, 25 October

12:49 a.m.

On Wed, Oct 24, 2018 at 04:49:21PM -0400, Rob Crittenden via FreeIPA-devel wrote:


Thanks Rob, feedback below.

1. I think we should consider promoting the server hostname into the object, with attribute name 'ipaErrorHost' (or whatever). This may make some kinds of searches easier, e.g. if you have ipa[123].bne.example.com and ipa[123].bos.example.com, and you are interested in errors from the bne site, you can search for '(ipaErrorHost=*.bne.example.com)'. We can index the attribute. It does make some sense to group in per-host subtrees, but because there is no subtree delete operation, a flat container might be worth it for the additional search flexibility.

2. Schema and indices:
   - for ipaErrorDateReported and ipaErrorDateResolved, specify:
       EQUALITY generalizedTimeMatch
       ORDERING generalizedTimeOrderingMatch
   - for ipaSeverity, specify:
       EQUALITY integerMatch
       ORDERING integerOrderingMatch
   - for ipaIgnoreError, specify:
       EQUALITY booleanMatch
   - ipaIgnoreError being MAY is a pitfall. Assuming absence implies "not ignored", searching for '(ipaIgnoreError=FALSE)' will _exclude_ entries without the ipaIgnoreError attribute. The correct filter is '(!(ipaIgnoreError=TRUE))'. Better to make it a MUST attribute and exclude this pitfall.
   - We probably want a presence index for ipaErrorDateResolved.

3. Execution: we might want a watchdog to kill checks that take too long (for whatever reason). There'll be some complexity, so maybe just make a note not to code ourselves into a corner, and we can defer it.

4. (Comment) Regarding the separate repo, I'm not against it, but there's some interdependency, i.e. HC will depend on a lot of stuff from ipalib, but the IPA healthcheck plugin will also depend on stuff defined by HC. What bits will live where is not fully clear. We might have to work it out as we go.

5. CLI: the '--source' option has not been defined. Does '--tool' mean the same thing?

6. Terminology: not sure about "source"/"command" (especially "command", which could be confusing: "what command failed?"). Some ideas: command -> item/check/fault. I don't care about bikeshedding the strings; I just want to avoid overloaded/confusing terms.

7. CLI: there is some inconsistency with how other IPA commands work (not necessarily bad, but it should be justified). If we follow the IPA pattern:
   - `ipa healthcheck-show UUID` would show a single report
   - `ipa healthcheck-find` would have a `--master=HOSTNAME` filter option
   - `--all` would show all attributes, and there would be a separate option to show ignored reports (e.g. `include-ignored`)
   So again, we don't have to do it that way, but the current design is a deviation from the norm, so I think that should be discussed from a usability perspective.

8. Can a single tool+command combo produce multiple reports for a single master, with different ipaErrorMessage key-value pairs? Example: file permissions. Is every possible file to check a different tool+command, or is it one tool+command, with potentially multiple reports with different ipaErrorMessage parameters? Consider this from a usability perspective: the resolution is likely to be very similar for all the possible instantiations. Also consider how many tool+command combinations there would be if all the possible files to check had to have different names. Lookup tables for error message generation and external resources get huge. OTOH if a single tool+command can produce multiple reports, it affects the API/CLI somewhat (e.g. `ipa healthcheck-ignore` must now be given the UUID or enough parameters to uniquely identify the report to ignore).

9. It would be good to include links to external resources etc. in healthcheck-show. Also to indicate when 'ipa-healthcheck' may be able to repair the issue (may reduce support burden if we can subtly encourage the administrator to run the repair tool instead of contacting support / the mailing list).

That's all for now :) Overall the design is looking good.

Cheers,
Fraser

Rob Crittenden

10:29 a.m.

Fraser Tweedale wrote:


> On Wed, Oct 24, 2018 at 04:49:21PM -0400, Rob Crittenden via FreeIPA-devel wrote:
> > I started a design of an IPA healthcheck framework at
> > https://www.freeipa.org/page/V4/Healthcheck
> >
> > Have at it.
> >
> > Note that this concentrates more on how it will work big picture and
> > less on individual checks that may be performed. I'm happy to add any
> > ideas you come up with for specific tests.
> >
> > rob
>
> Thanks Rob, feedback below.
>
> 1. I think we should consider promoting the server hostname into the object, with attribute name 'ipaErrorHost' (or whatever). This may make some kinds of searches easier, e.g. if you have ipa[123].bne.example.com and ipa[123].bos.example.com, and you are interested in errors from the bne site, you can search for '(ipaErrorHost=*.bne.example.com)'. We can index the attribute.

We already have an fqdn attribute in the IPA schema. I'd prefer to re-use that. It has the eq, pres and sub indices.


> It does make some sense to group in per-host subtrees, but because there is no subtree delete operation, a flat container might be worth it for the additional search flexibility.

Yes, I suppose if we specify the master within the entry that is sufficient. Let's agree on what to call the master and I'll make this change.


> 2. Schema and indices:
>    - for ipaErrorDateReported and ipaErrorDateResolved, specify:
>        EQUALITY generalizedTimeMatch
>        ORDERING generalizedTimeOrderingMatch
>    - for ipaSeverity, specify:
>        EQUALITY integerMatch
>        ORDERING integerOrderingMatch
>    - for ipaIgnoreError, specify:
>        EQUALITY booleanMatch
>    - ipaIgnoreError being MAY is a pitfall. Assuming absence implies "not ignored", searching for '(ipaIgnoreError=FALSE)' will _exclude_ entries without the ipaIgnoreError attribute. The correct filter is '(!(ipaIgnoreError=TRUE))'. Better to make it a MUST attribute and exclude this pitfall.
>    - We probably want a presence index for ipaErrorDateResolved.

Done.
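The ipaIgnoreError pitfall from point 2 can be made concrete with a small Python model. This is not an LDAP client, only a sketch that assumes the usual directory-server behaviour: an equality assertion on an attribute the entry does not have evaluates to FALSE, so wrapping it in `!` makes it TRUE. The three entry names are made up; they stand for an ignored report, a report explicitly marked not ignored, and a report with no ipaIgnoreError attribute at all.

```python
from typing import Optional

# Hypothetical report entries: name -> value of ipaIgnoreError,
# or None when the attribute is absent (the MAY case).
ENTRIES = {
    "report-ignored": "TRUE",
    "report-not-ignored": "FALSE",
    "report-no-attr": None,  # absence is meant to imply "not ignored"
}


def eq(value: Optional[str], asserted: str) -> bool:
    """Model of (ipaIgnoreError=<asserted>) under booleanMatch.

    An equality assertion on an attribute the entry lacks is FALSE.
    """
    return value is not None and value.upper() == asserted.upper()


def search(predicate) -> list:
    """Return the sorted names of the entries the filter predicate matches."""
    return sorted(name for name, value in ENTRIES.items() if predicate(value))


# (ipaIgnoreError=FALSE): silently drops the entry without the attribute.
eq_false = search(lambda v: eq(v, "FALSE"))
# (!(ipaIgnoreError=FALSE)): picks up the absent entry, but also the ignored one.
not_false = search(lambda v: not eq(v, "FALSE"))
# (!(ipaIgnoreError=TRUE)): every entry that is not explicitly ignored.
not_true = search(lambda v: not eq(v, "TRUE"))

print(eq_false)   # ['report-not-ignored']
print(not_false)  # ['report-ignored', 'report-no-attr']
print(not_true)   # ['report-no-attr', 'report-not-ignored']
```

Once the attribute is MUST, the third case cannot occur, and '(ipaIgnoreError=FALSE)' on its own selects every report that is not ignored.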


> 3. Execution: we might want a watchdog to kill checks that take too long (for whatever reason). There'll be some complexity, so maybe just make a note not to code ourselves into a corner, and we can defer it.

Added. I also added a config file so it can be overridden. I think I need to explore configuration a bit more. Ideally most of the config would be stored in LDAP (e.g. if you want to disable a whole set of tests from running). A local config for timeout is preferred in case LDAP is inaccessible for some reason.
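One way to combine the watchdog with a local timeout override, sketched under two assumptions: each check can be run as a separate child process, and the timeout lives in a hypothetical INI file with a `[global]` section and a `timeout` key (the path, section, key and default are invented for illustration). `subprocess.run(..., timeout=...)` kills the child and raises `subprocess.TimeoutExpired` when the limit is hit, so one hung check cannot stall the whole run.

```python
import configparser
import subprocess
import sys

# Hypothetical default and config location; kept local (not in LDAP) so the
# timeout can still be read when LDAP is unreachable.
DEFAULT_TIMEOUT = 10.0  # seconds
CONFIG_FILE = "/etc/ipahealthcheck/ipahealthcheck.conf"


def load_timeout(path: str = CONFIG_FILE) -> float:
    """Return [global] timeout from a local INI file, or the default."""
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing or unreadable file is silently skipped
    return parser.getfloat("global", "timeout", fallback=DEFAULT_TIMEOUT)


def run_with_watchdog(argv: list, timeout: float) -> str:
    """Run one check as a child process and kill it if it overruns."""
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child here.
        return "TIMEOUT"
    return "OK" if proc.returncode == 0 else "FAILED"


if __name__ == "__main__":
    hung_check = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_watchdog(hung_check, timeout=min(load_timeout(), 1.0)))
```

Running checks out of process also keeps a check that crashes from taking the rest of the run with it; the cost is one process start-up per check.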


> 4. (Comment) Regarding the separate repo, I'm not against it, but there's some interdependency, i.e. HC will depend on a lot of stuff from ipalib, but the IPA healthcheck plugin will also depend on stuff defined by HC. What bits will live where is not fully clear. We might have to work it out as we go.

I'm not dead set on this, but it might be nice, and it would act as a check on the developer API changing. I added a bit more verbiage.


> 5. CLI: the '--source' option has not been defined. Does '--tool' mean the same thing?
>
> 6. Terminology: not sure about "source"/"command" (especially "command", which could be confusing: "what command failed?"). Some ideas: command -> item/check/fault. I don't care about bikeshedding the strings; I just want to avoid overloaded/confusing terms.

PLEASE, bikeshed away! As you can see I'm having a heck of a time coming up with a good way to specify the group of tests versus an individual test. This is key to understanding everything so good naming is important. I'm very open to suggestions on this.


> 7. CLI: there is some inconsistency with how other IPA commands work (not necessarily bad, but it should be justified). If we follow the IPA pattern:
>    - `ipa healthcheck-show UUID` would show a single report
>    - `ipa healthcheck-find` would have a `--master=HOSTNAME` filter option
>    - `--all` would show all attributes, and there would be a separate option to show ignored reports (e.g. `include-ignored`)
>    So again, we don't have to do it that way, but the current design is a deviation from the norm, so I think that should be discussed from a usability perspective.

Yes, this is complicated. If we want to drop it, and I'm perfectly ok with that, we'd have to have extremely atomic, uniquely named individual tests within a plugin. For example, to check on file ownership one way to do it would be with a table:

    files = [
        ('/etc/httpd/alias/key3.db', 'root', 'apache', '0640'),
        ('/etc/httpd/alias/cert8.db', 'root', 'apache', '0640'),
        # ...
    ]
    for (file, owner, group, mode) in files:
        ...  # [ test ]

How would we name a particular failure? This is why I went with UUID. Similar applies to the certmonger tracking. We have 8 or so tracking requests by default; if one or more fail we'd report each one individually, but how do we name them automatically? I punted.

Honestly I think the -show command will be used more within the UI than the CLI. The -find command will show the same information.
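A minimal sketch of the "separate report for every single failure" approach: a table-driven check yields one report per failing file, and each report gets its own random UUID so it can be shown or ignored on its own. It compares only the mode column of the table above (owner and group would follow the same pattern); the `Report` fields and the `filesystem`/`permissions` names are assumptions for illustration, not the final schema.

```python
import os
import stat
import tempfile
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class Report:
    """One failure; a hypothetical stand-in for one LDAP report entry."""
    source: str
    check: str
    kw: dict  # key/value parameters used to build the message
    uuid: str = field(default_factory=lambda: str(uuid4()))


def check_modes(files):
    """Yield one Report per file whose permission bits differ from expected.

    `files` holds (path, expected_mode) pairs, with the mode written as an
    octal string such as '0640', mirroring the table above.
    """
    for path, expected in files:
        actual = stat.S_IMODE(os.stat(path).st_mode)
        if actual != int(expected, 8):
            yield Report(
                source="filesystem",
                check="permissions",
                kw={"path": path, "expected": expected,
                    "got": format(actual, "04o")},
            )


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    os.chmod(path, 0o755)
    for report in check_modes([(path, "0640")]):
        print(report.uuid, report.source, report.check, report.kw)
    os.remove(path)
```

The same source/check pair is reused for every file, so the message and resolution lookup stays small, while the UUID and `kw` still identify the individual failure.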


> 8. Can a single tool+command combo produce multiple reports for a single master, with different ipaErrorMessage key-value pairs? Example: file permissions. Is every possible file to check a different tool+command, or is it one tool+command, with potentially multiple reports with different ipaErrorMessage parameters?

Exactly. I imagined a separate report for every single failure.


> Consider this from a usability perspective: the resolution is likely to be very similar for all the possible instantiations. Also consider how many tool+command combinations there would be if all the possible files to check had to have different names. Lookup tables for error message generation and external resources get huge.

The failure lookup is by the plugin and particular test. I kept them separate so it is insanely easy to track which ones are resolved and when (if 3 files have bad perms and are reported in a single LDAP entry, and one is fixed, what do we do?). I looked at it like a transaction file.
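Keeping one report per failure turns resolution tracking into a set comparison between runs. A sketch, assuming each report can be keyed by (fqdn, source, check, message parameters) and that a report is only marked resolved when a later run no longer finds the same failure; the dict field names, including `date_resolved`, are hypothetical.

```python
from datetime import datetime, timezone


def report_key(report: dict) -> tuple:
    """Identity of a failure: same host, same check, same parameters."""
    return (report["fqdn"], report["source"], report["check"],
            tuple(sorted(report["kw"].items())))


def reconcile(open_reports: list, current_failures: list, now=None) -> tuple:
    """Compare stored open reports with the failures found by this run.

    Returns (still_open, resolved, new): reports that failed again, reports
    that no longer fail (stamped with a resolution date), and failures that
    have not been reported before.
    """
    now = now or datetime.now(timezone.utc)
    current = {report_key(r): r for r in current_failures}
    still_open, resolved = [], []
    for report in open_reports:
        if report_key(report) in current:
            still_open.append(report)
        else:
            resolved.append({**report, "date_resolved": now})
    known = {report_key(r) for r in open_reports}
    new = [r for key, r in current.items() if key not in known]
    return still_open, resolved, new
```

With three bad files reported separately, fixing one of them moves exactly that one report to `resolved`, which is the case that is awkward when all three share a single entry.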


> OTOH if a single tool+command can produce multiple reports, it affects the API/CLI somewhat (e.g. `ipa healthcheck-ignore` must now be given the UUID or enough parameters to uniquely identify the report to ignore).

Yes. Icky but necessary.


> 9. It would be good to include links to external resources etc. in healthcheck-show. Also to indicate when 'ipa-healthcheck' may be able to repair the issue (may reduce support burden if we can subtly encourage the administrator to run the repair tool instead of contacting support / the mailing list).

I've been a bit vague about working with the user on how to resolve a particular problem. We have a few obvious options:

1) external documentation: wiki, downstream docs, both?
2) a separate LDAP lookup table

I've added a section on this and some additional schema as a starting point for discussion.
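A sketch of what the lookup-table option could feed into healthcheck-show: a table keyed by (source, check) renders the message and a resolution hint from a report's key/value parameters, and flags checks that ipa-healthcheck might be able to repair. The table is kept in memory here purely for illustration (the thread proposes LDAP); its contents and wording are invented, and `url` is left as `None` because no documentation location has been decided.

```python
# Hypothetical lookup table: (source, check) -> message template and advice.
RESOLUTIONS = {
    ("filesystem", "permissions"): {
        "message": "The file {path} has incorrect permissions. "
                   "Expected {expected}, got {got}",
        "solution": "Restore the expected mode, e.g. chmod {expected} {path}",
        "url": None,  # link to wiki/downstream docs, still to be decided
        "repairable": True,  # hint that ipa-healthcheck might repair it
    },
}


def describe(source: str, check: str, kw: dict) -> list:
    """Render the human-readable lines shown for a single report."""
    entry = RESOLUTIONS.get((source, check))
    if entry is None:
        return [f"{source}::{check} {kw}"]  # unknown check: show raw data
    lines = [entry["message"].format(**kw),
             "Solution: " + entry["solution"].format(**kw)]
    if entry["url"]:
        lines.append("See: " + entry["url"])
    if entry["repairable"]:
        lines.append("This may be repairable by ipa-healthcheck.")
    return lines
```

One table row serves every file the permissions check covers, which keeps the table size proportional to the number of checks rather than the number of possible failures.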


> That's all for now :) Overall the design is looking good.

Thanks for the feedback,

rob

Petr Vobornik

Wednesday, 21 November

5:16 a.m.

Hi,

could the design also contain the proposed set of commands (including installation of the feature) in a specific example, including the anticipated CLI output? I.e., to see the workflow which the user (administrator) of this enhancement would need to follow, and what the information he/she gets would look like. It will help us determine how usable it will be without actually implementing it, so that we can save some time on a possible redesign.

The example could be some check which will be implemented later, e.g. an expired RA certificate.

Thank you

On Wed, Oct 24, 2018 at 10:49 PM Rob Crittenden via FreeIPA-devel <freeipa-devel(a)lists.fedorahosted.org> wrote:


> I started a design of an IPA healthcheck framework at https://www.freeipa.org/page/V4/Healthcheck
>
> Have at it.
>
> Note that this concentrates more on how it will work big picture and less on individual checks that may be performed. I'm happy to add any ideas you come up with for specific tests.
>
> rob
> _______________________________________________
> FreeIPA-devel mailing list -- freeipa-devel(a)lists.fedorahosted.org
> To unsubscribe send an email to freeipa-devel-leave(a)lists.fedorahosted.org
> Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://lists.fedorahosted.org/archives/list/freeipa-devel@lists.fedoraho...

--
Petr Vobornik
Associate Manager, Engineering, Identity Management
Red Hat

Rob Crittenden

10:14 a.m.

Petr Vobornik via FreeIPA-devel wrote:


I added sample usage in the https://www.freeipa.org/page/V4/Healthcheck#CLI section and an installation section in https://www.freeipa.org/page/V4/Healthcheck#Installation


> The example could be some check which will be implemented later, e.g. an expired RA certificate.

I'm not sure I follow.

rob


Petr Vobornik

Thursday, 22 November

8:08 a.m.

On Wed, Nov 21, 2018 at 5:14 PM Rob Crittenden <rcritten(a)redhat.com> wrote:


> Petr Vobornik via FreeIPA-devel wrote:
> > Hi,
> >
> > could the design also contain the proposed set of commands (including installation of the feature) in a specific example, including the anticipated CLI output? I.e., to see the workflow which the user (administrator) of this enhancement would need to follow, and what the information he/she gets would look like. It will help us determine how usable it will be without actually implementing it, so that we can save some time on a possible redesign.
>
> I added sample usage in the https://www.freeipa.org/page/V4/Healthcheck#CLI section and an installation section in https://www.freeipa.org/page/V4/Healthcheck#Installation

Thanks, this helps. But I was not clear about the problem to solve. The design page is consumed by people in different roles, and a design page has several sections. Nobody except developers reads the "design" part, as you need knowledge of IPA internals to understand that section. Non-developers will read "Overview", "Use cases", "How to use", and in some cases also "Feature management". So people should be able to understand how to use the feature from these sections alone. Also, these sections should not mention implementation details.

The "how to use" section has instructions in the design template:

"""
This a starting point for design discussions. Easy to follow instructions on how to use the new feature according to the use cases described above. FreeIPA user needs to be able to follow the steps and demonstrate the new features. The chapter may be divided in sub-sections per Use Case.
"""

The Health Check design page doesn't contain this information in that section; there are only very generic sentences with a bit of implementation detail. The use cases section lists only one general use case plus a bunch of checks.

I tried to expand this use case into something a bit more specific, to give an idea of what is meant by "easy to follow instructions":

Use case: new server/replica installation, just checking status:
---------------------------------------------------------------

    # install server
    $ dnf install freeipa-server
    $ ipa-server-install

    # Check if there are issues after installation
    ipa healthcheck-find
    # I expect that it will return nothing.
    # Q: How will the admin know that the health check was run and which
    # checks it did? The output when no health check was done and when all
    # is all right is the same. This leads to the question:
    # Q: How will I know that the automatic health check is running and is
    # not broken?

Use case: some error happened, a couple of months/years after install, taking corrective actions:
-------------------------------------------------------------------------------------------------

    # ipa healthcheck-find
    UUID: 25003678-bae7-4d1a-a071-b6d42e3840c1
    Source: certcheck
    Check: bad_permissions
    Severity: Error
    Message: The file /etc/httpd/alias/key3.db has incorrect permissions. Expected 0640, got 0755
    Solution: See URL
    Reported: Wed Nov 14 18:35:11 2018 UTC
    Ignored: FALSE

    # taking corrective action, let's assume that `ipactl restart` was there as well
    ipa healthcheck-find
    # What will be the result? Will it be empty, or will there be a new line:
    # "Resolved: TRUE"? If it is present with a resolved line, how will the
    # records be sorted? Will "not resolved" be on top and "resolved" on the
    # bottom?

General questions:
------------------

Why is the command called "healthcheck-find" when the use case is "show me the errors", not "show me the available checks"? Should it be more like "ipa problem-find"?

How will people know what can be passed to the --source and --check options? Is it produced in help?

How will people know what the various checks check?

ipa-healthcheck command
-----------------------

"""
$ ipa-healthcheck
Check certificate renewal
Check file permissions
...
The ipa-healthcheck command failed.
"""

Does "The ipa-healthcheck command failed" mean that there were issues when executing the checks (I assume this one) or that an issue was found?

--source: how will I know what the available checks are?

"Check certificate renewal", ... does it mean that the tool just listed the checks, or that the tool ran them? If they were run, shouldn't it be more like: "Checking: certificate renewal: ... OK"?


> > The example could be some check which will be implemented later. E.g.
> > expired RA certificate.
>
> I'm not sure I follow.

Just an example check. You've picked expired certs and wrong permissions.


Rob Crittenden

Monday, 26 November

2:48 p.m.

I see what you're saying now. I'll update it once I hear back from Dmitri whether this feature is going to be upstream or not. I don't see the point of not doing it upstream, but he's the boss. I did a rather mixed job of flushing out my head on some of these details. I'll fix them when I can.

rob

Rob Crittenden

Tuesday, 27 November

1:09 p.m.

Petr Vobornik via FreeIPA-devel wrote:


> On Wed, Nov 21, 2018 at 5:14 PM Rob Crittenden <rcritten(a)redhat.com> wrote:
> > Petr Vobornik via FreeIPA-devel wrote:
> > I added sample usage in the https://www.freeipa.org/page/V4/Healthcheck#CLI section and an installation section in https://www.freeipa.org/page/V4/Healthcheck#Installation
>
> Thanks, this helps. But I was not clear about the problem to solve. The design page is consumed by people in different roles, and a design page has several sections. Nobody except developers reads the "design" part, as you need knowledge of IPA internals to understand that section. Non-developers will read "Overview", "Use cases", "How to use", and in some cases also "Feature management". So people should be able to understand how to use the feature from these sections alone. Also, these sections should not mention implementation details.
>
> The "how to use" section has instructions in the design template:
>
> """
> This a starting point for design discussions. Easy to follow instructions on how to use the new feature according to the use cases described above. FreeIPA user needs to be able to follow the steps and demonstrate the new features. The chapter may be divided in sub-sections per Use Case.
> """
>
> The Health Check design page doesn't contain this information in that section; there are only very generic sentences with a bit of implementation detail. The use cases section lists only one general use case plus a bunch of checks.
>
> I tried to expand this use case into something a bit more specific, to give an idea of what is meant by "easy to follow instructions":

Well, this is embedded later and I didn't want to keep repeating myself. I just added quite a lot more information more suitable for user-level documentation.


> Use case: new server/replica installation, just checking status:
> ---------------------------------------------------------------
>     # install server
>     $ dnf install freeipa-server
>     $ ipa-server-install
>
>     # Check if there are issues after installation
>     ipa healthcheck-find
>     # I expect that it will return nothing.
>     # Q: How will the admin know that the health check was run and which
>     # checks it did? The output when no health check was done and when all
>     # is all right is the same.

This is a good point; I'll need to consider it. I've gone back and forth a couple of times, in organizing the data, on whether there should be a location to store metadata on the server itself. It seems that is necessary. If the tool fails entirely then it will be noted in the journal.
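The per-server run metadata mentioned here could look something like the sketch below: each run records when it finished, which checks it executed and how many findings it produced, so a reader can tell "ran and found nothing" apart from "has not run recently". The record layout and the 24-hour staleness window are assumptions, not part of the design.

```python
from datetime import datetime, timedelta, timezone


def run_metadata(fqdn: str, checks_run: list, findings: int, now=None) -> dict:
    """Summary a server could store after each ipa-healthcheck run."""
    return {
        "fqdn": fqdn,
        "last_run": now or datetime.now(timezone.utc),
        "checks_run": sorted(checks_run),
        "findings": findings,
    }


def status(meta, max_age=timedelta(hours=24), now=None) -> str:
    """Distinguish 'never ran', 'stale', 'healthy' and 'problems found'."""
    now = now or datetime.now(timezone.utc)
    if meta is None:
        return "NEVER RUN"
    if now - meta["last_run"] > max_age:
        return "STALE"  # the scheduled run may be broken; check the journal
    return "HEALTHY" if meta["findings"] == 0 else "PROBLEMS FOUND"
```

An empty `ipa healthcheck-find` result would then only mean "healthy" when the metadata for that master is recent.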


> This leads to the question: Q: How will I know that the automatic health check is running and is not broken?

Check journalctl, and confirm that `ipa healthcheck-find --master <some master>` returns 0 entries.


> Use case: some error happened, a couple of months/years after install, taking corrective actions:
> -------------------------------------------------------------------------------------------------
>     # ipa healthcheck-find
>     UUID: 25003678-bae7-4d1a-a071-b6d42e3840c1
>     Source: certcheck
>     Check: bad_permissions
>     Severity: Error
>     Message: The file /etc/httpd/alias/key3.db has incorrect permissions. Expected 0640, got 0755
>     Solution: See URL
>     Reported: Wed Nov 14 18:35:11 2018 UTC
>     Ignored: FALSE
>
>     # taking corrective action, let's assume that `ipactl restart` was there as well
>     ipa healthcheck-find
>     # What will be the result? Will it be empty, or will there be a new line:
>     # "Resolved: TRUE"? If it is present with a resolved line, how will the
>     # records be sorted? Will "not resolved" be on top and "resolved" on the
>     # bottom?

Resolved issues are not displayed by default. As for sorting, I'm more focused on collection at this point. If we store the data correctly then the reporting should flow from that, assuming of course we are storing everything we need :-) I hadn't thought about it, though; I think on the CLI it will be unsorted, just the raw data from LDAP.

I didn't ignore it completely, though. I imagine one could automate something like:

    # ipa healthcheck-find --master ipa.example.com --since `date --date="yesterday"`

and confirm that the number of entries returned is 0. I added this use case to the section.

An issue is not marked as resolved until the next time ipa-healthcheck runs.
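The automation described here boils down to one filter: for a given master, count reports that are unresolved, not ignored, and reported since a cutoff. The sketch below applies that filter to report dicts in memory rather than calling the proposed `--since` option, which does not exist yet; the field names (`fqdn`, `date_reported`, `date_resolved`, `ignored`) are assumptions.

```python
from datetime import datetime, timedelta, timezone


def open_issues(reports, fqdn, since):
    """Reports for `fqdn` that are unresolved, not ignored, and recent enough.

    Mirrors the default view: resolved and ignored reports are hidden.
    """
    return [
        r for r in reports
        if r["fqdn"] == fqdn
        and r.get("date_resolved") is None
        and not r.get("ignored", False)
        and r["date_reported"] >= since
    ]


def healthy_since_yesterday(reports, fqdn, now=None) -> bool:
    """True when nothing new and still open was reported in the last 24 hours."""
    now = now or datetime.now(timezone.utc)
    return len(open_issues(reports, fqdn, now - timedelta(days=1))) == 0
```

Note that, like the `--since` idea, this only looks at recent reports; an older issue that is still open would need a separate query without the cutoff.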


> General questions:
> ------------------
> Why is the command called "healthcheck-find" when the use case is "show me the errors", not "show me the available checks"? Should it be more like "ipa problem-find"?

I think the typical admin will run the ipa healthcheck-* commands more often than the collection tool ipa-healthcheck. Not everything discovered is a "problem", I suppose; some may be marked as warnings.


> How will people know what can be passed to the --source and --check options? Is it produced in help?

TBD. Because of the separation between collection and reporting I'll need to work something out.


> How will people know what the various checks check?

Ideally the combination of source + check will be sufficient. We have a couple of ways to look at this depending on how we structure individual plugins; here are a few examples.

Take the filesystem permissions and owner/group tests, for example. In a pythonic way we could store the list as a big tuple of tuples:

    perms = (('/etc/pki/nssdb/cert9.db', 0o0644, 'root', 'root'),
             ('/etc/pki/nssdb/key4.db', 0o0600, 'root', 'root'),
             # ...
            )

Then have a for loop to go through it:

    import os
    import stat

    for file, expected, owner, group in perms:
        s = os.stat(file)
        # st_mode also carries the file-type bits; compare permission bits only
        if stat.S_IMODE(s.st_mode) != expected:
            report('bad_perms', file)  # hypothetical reporting hook

So the source would be filesystem and the check would be permissions. There would be another check for owner, perhaps including group, to reduce the number of stat() calls, for argument's sake.

On the one hand this is pretty efficient, small, easy-to-read code. On the reporting side, which I think is where you are going, this ain't so grand because the check is not at all specific to the actual problem. You could end up with a bunch of filesystem::permission errors. Is this ok? In this case, probably yes.

Another example: validating the certmonger tracking. We could have something similar where we validate the options passed to start_tracking for the certs we know about:

    requests = [
        {
            'cert-file': paths.RA_AGENT_PEM,
            'key-file': paths.RA_AGENT_KEY,
            'ca-name': 'dogtag-ipa-ca-renew-agent',
            'cert-presave-command': template % 'renew_ra_cert_pre',
            'cert-postsave-command': template % 'renew_ra_cert',
        },
        # ...
    ]

Then loop over that, verifying it. Should we have a separate check for each request so that only those that failed are there? In this case I might argue that yes, we should have a separate check for each request, or at least a dynamic check name so that only a single one could be reported (and tested). So we would have "The agent is not configured properly" vs "one of the tracked certs, agent, is not configured properly".
Unfortunately I don't think we can have any hard rule on how atomic is atomic without being too atomic (like check_etc_pki_nssdb_cert9.db). It will be a balance between meaningfully named checks and the number that have to be paged through to find one.
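The "dynamic check name" option for the certmonger tracking example could be sketched roughly as follows. This assumes the actual tracking options have already been collected (e.g. from certmonger) into plain dicts keyed by a short nickname; the nicknames, the `tracking_<nick>` naming scheme and the `Result` record are hypothetical. Each request that is missing or misconfigured is reported under its own check name, so only the failing ones show up.

```python
from collections import namedtuple

# Hypothetical result record, not an existing API.
Result = namedtuple("Result", ["source", "check", "message"])


def check_tracking(expected_requests, actual_requests):
    """Compare expected certmonger tracking options against what was found.

    expected_requests: {nickname: {option: value}}; nicknames are illustrative.
    actual_requests:   same shape, assumed to be gathered elsewhere.
    Yields one Result per missing or misconfigured request.
    """
    for nick, expected in sorted(expected_requests.items()):
        check = "tracking_%s" % nick  # dynamic, per-request check name
        actual = actual_requests.get(nick)
        if actual is None:
            yield Result("certmonger", check, "%s is not tracked" % nick)
            continue
        # Collect the option names whose values differ from what we expect.
        wrong = sorted(k for k, v in expected.items() if actual.get(k) != v)
        if wrong:
            yield Result("certmonger", check,
                         "%s is not configured properly: %s"
                         % (nick, ", ".join(wrong)))
```

With this shape, "agent is not configured properly: ca-name" is reported and can be tested on its own, without paging through one check per file.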

...

ipa-healthcheck command
-----------------------

"""
$ ipa-healthcheck
Check certificate renewal
Check file permissions
...
The ipa-healthcheck command failed.
"""

> Does "The ipa-healthcheck command failed" mean that there were issues when executing the checks (I assume this one) or that an issue was found?

This was due to my playing with my PoC in a way that made things blow up, and I didn't notice when I pasted it. The only way ipa-healthcheck will return a non-zero exit status is if the tool itself blows up in some unrecoverable or fatal way. Ideally that should never happen. This type of error can only be reported via the journal.
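The exit-status rule can be sketched like this, assuming (hypothetically) that each check is a callable returning `(name, status, message)` tuples and that a check which raises is turned into a reported result rather than crashing the run. Problems found by checks still give exit status 0; only a failure of the tool itself returns 1. The standard `logging` module stands in for the journal here.

```python
import logging
import sys

logger = logging.getLogger("ipa-healthcheck")


def run_checks(checks):
    """Run every check; a check that raises becomes a result, not a crash."""
    results = []
    for check in checks:
        try:
            results.extend(check())
        except Exception as e:  # one broken check must not stop the others
            results.append((check.__name__, "ERROR", "check raised: %s" % e))
    return results


def main(checks, out=sys.stdout):
    """Return 0 whenever the checks ran, however many problems they found."""
    try:
        for name, status, message in run_checks(checks):
            print("%s: %s %s" % (name, status, message), file=out)
    except Exception:
        # Only a failure of the tool itself ends up here; a real tool
        # would send this to the journal rather than a plain logger.
        logger.exception("ipa-healthcheck failed")
        return 1
    return 0
```

For example, `sys.exit(main([check_a, check_b]))` would exit 0 even if both checks report errors.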

...

--source: how will I know what the available checks are?

In this case we can provide a list.
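One way the list could be produced is from a registry that every check is filed into by source. The sketch below is hypothetical: the `register` decorator, the registry layout and the example check names are illustrations only, and no command-line option name is implied.

```python
from collections import defaultdict

# Hypothetical registry: source name -> {check name: check function}.
REGISTRY = defaultdict(dict)


def register(source):
    """Decorator that files a check under a source, keyed by function name."""
    def wrap(func):
        REGISTRY[source][func.__name__] = func
        return func
    return wrap


@register("filesystem")
def permissions():
    return []


@register("filesystem")
def owner():
    return []


@register("certmonger")
def tracking():
    return []


def list_checks(registry=REGISTRY):
    """Return sorted 'source: check' lines for a help or listing option."""
    return ["%s: %s" % (source, name)
            for source in sorted(registry)
            for name in sorted(registry[source])]


if __name__ == "__main__":
    print("\n".join(list_checks()))
```

Because the listing is generated from the same registry the runner uses, it cannot drift out of date as checks are added.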

...

"Check certificate renewal", ...: does it mean that the tool just listed the checks, or that it ran them? If they were run, shouldn't it be more like: "Checking: certificate renewal: ... OK"?

I suppose that is a matter of interpretation. It is to be determined how much we want to (or should) display in the command-line output. I was thinking that this text would be shown just to indicate that the tool is not hung, so that there is a semi-continuous stream of output. Displaying an OK or FAILED is certainly possible.

rob
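The "Checking: ... OK/FAILED" progress style discussed here could be as simple as the sketch below. It assumes, hypothetically, that checks are `(description, callable)` pairs and that each callable returns a list of problems (empty meaning pass). The line is flushed before the check runs so the user sees what is in progress even if a check is slow.

```python
import sys


def run_with_progress(checks, out=sys.stdout):
    """Print 'Checking: <description> ... OK/FAILED' per check; return the failure count."""
    failed = 0
    for description, func in checks:
        print("Checking: %s ..." % description, end=" ", file=out)
        out.flush()  # show the line before a possibly slow check runs
        problems = func()
        print("FAILED" if problems else "OK", file=out)
        failed += bool(problems)
    return failed
```

The details of each failure would still go to the separate reporting step; this output only shows progress.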

...

>> The example could be some check which will be implemented later. E.g.
>> expired RA certificate.
>
> I'm not sure I follow.

Just an example check. You've picked expired certs and wrong permissions.

>
> rob
>
>>
>> Thank you
>> On Wed, Oct 24, 2018 at 10:49 PM Rob Crittenden via FreeIPA-devel
>> <freeipa-devel(a)lists.fedorahosted.org> wrote:
>>>
>>> I started a design of an IPA healthcheck framework at
>>> https://www.freeipa.org/page/V4/Healthcheck
>>>
>>> Have at it.
>>>
>>> Note that this concentrates more on how it will work big picture and
>>> less on individual checks that may be performed. I'm happy to add any
>>> ideas you come up with for specific tests.
>>>
>>> rob
>>> _______________________________________________
>>> FreeIPA-devel mailing list -- freeipa-devel(a)lists.fedorahosted.org
>>> To unsubscribe send an email to freeipa-devel-leave(a)lists.fedorahosted.org
>>> Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
>>> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
>>> List Archives: https://lists.fedorahosted.org/archives/list/freeipa-devel@lists.fedoraho...

Florence Blanc-Renaud

2:39 a.m.

On 10/24/18 10:49 PM, Rob Crittenden via FreeIPA-devel wrote:

...

Hi Rob,

thanks for the design. One minor question: is there a rationale for defining an AUXILIARY class instead of STRUCTURAL for ipaHealthCheckObject and ipaHealthCheckSolutionObject?

flo

Rob Crittenden

8:03 a.m.

Florence Blanc-Renaud wrote:

...

Hi Rob, thanks for the design. One minor question: is there a rationale for defining an AUXILIARY class instead of STRUCTURAL for ipaHealthCheckObject and ipaHealthCheckSolutionObject?

Mostly because I don't expect anything to extend it. I have no objection to making it STRUCTURAL if you think that would be better.

rob

Florence Blanc-Renaud

Monday, 3 December Mon, 3 Dec

2:33 a.m.

On 11/27/18 3:03 PM, Rob Crittenden via FreeIPA-devel wrote:

...

Florence Blanc-Renaud wrote:
> Hi Rob,
>
> thanks for the design. One minor question: is there a rationale for
> defining an AUXILIARY class instead of STRUCTURAL for
> ipaHealthCheckObject and ipaHealthCheckSolutionObject?

Mostly because I don't expect anything to extend it. I have no objection to making it STRUCTURAL if you think that would be better.

I have a different reasoning for STRUCTURAL vs AUXILIARY: if the object is self-standing (i.e. it would contain only this objectclass and top) I use STRUCTURAL, but if it's used to extend another one I favor AUXILIARY. Not a big deal anyway...

flo

...



freeipa-devel@lists.fedorahosted.org


participants (4)

Florence Blanc-Renaud
Fraser Tweedale
Petr Vobornik
Rob Crittenden
