Petr Vobornik via FreeIPA-devel wrote:
On Wed, Nov 21, 2018 at 5:14 PM Rob Crittenden
> Petr Vobornik via FreeIPA-devel wrote:
> I added sample usage in the
section and an
> installation section in
Thanks, this helps. But I was not clear about the thing to solve. The
design page is consumed by people with different roles, and a design
page has several sections. Nobody except developers reads the "design"
part, as you need knowledge of IPA internals to understand that
section. Non-developers will read "Overview", "Use cases", "How to
use". In some cases also "Feature management". So people should be
able to understand how to use the feature only from these sections.
Also, the sections should not mention implementation details. The "how
to use" section has instructions in the design template:
This is a starting point for design discussions.
Easy to follow instructions on how to use the new feature according to
the use cases described above. FreeIPA user needs to be able to follow
the steps and demonstrate the new features.
The chapter may be divided in sub-sections per Use Case.
The Health Check design page doesn't contain this information in the
section; there are only very generic sentences with a bit of
The use cases section lists only one general use case plus a bunch of
checks. I tried to expand this use case into something a bit more
specific, to give an idea of what is meant by the "easy to follow
instructions":
Well, this is embedded later and I didn't want to keep repeating myself.
I just added quite a lot more information more suitable for user-level
Use case: new server/replica installation, just checking status:
# install server
$ dnf install freeipa-server
# Check if there are issues after installation
# I expect that it will return nothing.
# Q: How will the admin know that the health check was run and what checks
it did? The output is the same whether no health check was done or
everything is all right.
This is a good point I'll need to consider. I've gone back and forth a
couple of times in the organization of the data whether there should be
a location to store metadata on the server itself. It seems that is
If the tool fails entirely then it will be noted in the journal.
It leads to the question:
Q: How will I know that the automatic health check is running and is not broken?
journalctl and ipa healthcheck-find --master <some master> returns 0.
Use case: some error happened, couple months/years after install,
taking corrective actions:
# ipa healthcheck-find
Message: The file /etc/httpd/alias/key3.db has incorrect permissions.
Expected 0640, got 0755
Solution: See URL
Reported: Wed Nov 14 18:35:11 2018 UTC
# taking corrective action, let's assume that `ipactl restart` was there as well
# What will be the result? Will it be empty or will there be a new line:
If it is present with a resolved line, how will the records be
sorted? Will "not resolved" be on top and "resolved" on the bottom?
Resolved are not displayed by default. As for sorting, I'm more focused
on collection at this point. If we store the data correctly then the
reporting should flow from that, assuming of course we are storing
everything we need :-)
I hadn't thought much about it. I think on the CLI it will be unsorted,
just the raw data from LDAP. I didn't ignore it completely
though. I imagine one could do automation to do:
# ipa healthcheck-find --master ipa.example.com
And confirm that the number of entries returned is 0.
I added this use-case to the section. An issue is not marked as
resolved until the next time that ipa-healthcheck runs.
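
For the automation case above, a minimal sketch of the "confirm that the number of entries returned is 0" step might look like the following. The entry layout (a dict with 'master' and 'resolved' keys) and the function name open_issues are assumptions made for illustration only; the design does not fix how results are stored or returned.

```python
def open_issues(entries, master=None, include_resolved=False):
    """Return the entries that still need attention.

    entries: iterable of dicts. The 'master' and 'resolved' keys are
    hypothetical; the real storage format is still to be decided.
    """
    result = []
    for entry in entries:
        # Optionally restrict the report to a single master
        if master is not None and entry.get('master') != master:
            continue
        # Resolved issues are hidden by default
        if entry.get('resolved') and not include_resolved:
            continue
        result.append(entry)
    return result
```

An automation script could then treat an empty list for a given master as "healthy" and anything else as a reason to alert.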
Why is the command called "healthcheck-find" when the use case is
"show me the errors" not "show me the available checks"? Should it
be more like "ipa problem-find"?
I think the typical admin will run the ipa command healthcheck-* more
often than the collection tool ipa-healthcheck. Not everything
discovered is a "problem" I suppose, some may be marked as warnings.
How will people know what can be passed to the --source and --check
options? Is it listed in help?
TBD. Because of the separation between collection and reporting I'll
need to work something out.
How will people know what the various checks check?
Ideally the combination of source + check will be sufficient. We have a
couple of ways to look at this depending on how we structure individual
plugins, here are a few examples.
Take the filesystem permissions and owner/group tests, for example.
In a pythonic way we could store the list as a big tuple of tuples:
perms = (('/etc/pki/nssdb/cert9.db', 0o0644, 'root', 'root'),
         ('/etc/pki/nssdb/key4.db', 0o0600, 'root', 'root'),
Then have a for loop to go through it:
for file, expected, owner, group in perms:
    s = stat(file)
    if s.mode != expected: report bad_perms
So the source would be filesystem and the check would be permissions.
There would be another check for owner, perhaps including group in there
to reduce the number of stat() calls, for argument's sake.
On the one hand this is pretty efficient, small, easy to read code.
On the reporting side, which I think is where you are going, this ain't
so grand because the check is not at all specific to the actual problem.
You could end up with a bunch of filesystem::permission errors.
Is this ok? In this case, probably yes.
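
A runnable sketch of the filesystem permissions loop described above, under stated assumptions: the Result record and its field names are hypothetical, and the message wording simply mirrors the sample output earlier in this thread. Note that st_mode also carries the file type bits, so they are masked off with stat.S_IMODE before comparing.

```python
import os
import stat
from collections import namedtuple

# Hypothetical result record; the field names are assumptions.
Result = namedtuple('Result', ['source', 'check', 'message'])


def check_permissions(perms):
    """Yield one Result per file whose mode differs from the expected one.

    perms: iterable of (path, expected_mode, owner, group) tuples.
    Owner and group are ignored here; they would get their own check.
    """
    for path, expected, _owner, _group in perms:
        try:
            st = os.stat(path)
        except OSError as e:
            yield Result('filesystem', 'permissions',
                         'Cannot stat %s: %s' % (path, e.strerror))
            continue
        # Strip the file type bits, keep only the permission bits
        actual = stat.S_IMODE(st.st_mode)
        if actual != expected:
            yield Result('filesystem', 'permissions',
                         'The file %s has incorrect permissions. '
                         'Expected %04o, got %04o' % (path, expected, actual))
```

Every failure carries the same source and check, so the file name in the message is the only thing that tells the reports apart, which is the trade-off discussed above.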
Another example, validating the certmonger tracking.
We could have something similar where we validate the options passed to
start_tracking for the certs we know about:
requests = [
    'cert-presave-command': template % 'renew_ra_cert_pre',
    'cert-postsave-command': template % 'renew_ra_cert',
Then loop over that, verifying it.
Should we have a separate check for each request so that only those that
failed are there? In this case I might argue that yes, we should have a
separate check for each request, or at least a dynamic check name so
that only a single one could be reported (and tested). So we would have
"The agent is not configured properly" vs "one of the tracked certs,
agent, is not configured properly".
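
One way to picture the dynamic check name idea is the sketch below. It uses plain dicts rather than any real certmonger call; the TEMPLATE value, the 'agent' request name, the tracking_<name> naming scheme and the shape of the tracked data are all assumptions made for illustration.

```python
# Assumed helper-script template; the real value may differ.
TEMPLATE = '/usr/libexec/ipa/certmonger/%s'

# Hypothetical expected tracking options, keyed by a request name.
EXPECTED_REQUESTS = {
    'agent': {
        'cert-presave-command': TEMPLATE % 'renew_ra_cert_pre',
        'cert-postsave-command': TEMPLATE % 'renew_ra_cert',
    },
}


def check_tracking(expected_requests, tracked):
    """Yield (check_name, message) for each request that is wrong.

    tracked: dict of the same shape as expected_requests holding what
    is actually configured (how it is obtained is out of scope here).
    """
    for name, expected in expected_requests.items():
        # Dynamic check name, so only the failing request is reported
        check = 'tracking_%s' % name
        actual = tracked.get(name)
        if actual is None:
            yield check, 'The %s certificate is not tracked' % name
            continue
        wrong = sorted(k for k, v in expected.items() if actual.get(k) != v)
        if wrong:
            yield check, ('The %s request is not configured properly: %s'
                          % (name, ', '.join(wrong)))
```

With this shape a report names the specific request that failed instead of a generic "one of the tracked certs" message.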
Unfortunately I don't think we can have any hard rule on how atomic is
atomic without being too atomic (like check_etc_pki_nssdb_cert9.db).
It will be a balance of meaningfully named checks versus the number that
have to be paged through to find one.
Check certificate renewal
Check file permissions
The ipa-healthcheck command failed.
Does "The ipa-healthcheck command failed" mean that there were issues
when executing the checks (I assume this one) or that an issue was
This was due to my playing with my PoC in a way that made things blow up
and I didn't notice when I pasted it.
The only way a non-zero error will be returned by ipa-healthcheck is if
the tool itself blows up in some unrecoverable or fatal way. Ideally it
should never happen. This type of error can only be reported via the
--source, how will I know what are the available checks?
this case we can provide a list.
"Check certificate renewal", ... does it mean that the tool
listed the checks or the tool ran them? If it was run, shouldn't it be
more like a: "Checking: certificate renewal: ... OK"
I suppose that is a matter of interpretation. It is to be determined
how much we want/should display on the command-line output.
I was thinking that this text would be shown just to indicate that the
tool is not hung, so that there is a semi-continuous stream of output.
Display an OK or FAILED, sure, that is possible.
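
A minimal sketch of that kind of progress output, combined with the exit code behaviour described earlier (non-zero only when the tool itself blows up). The function names, the (name, callable) check format and the exact output layout are assumptions for illustration, not the actual ipa-healthcheck code.

```python
import sys


def run_checks(checks, out=sys.stdout):
    """Run each (name, callable) pair, printing progress as it goes.

    Each callable returns a truthy value when the check passes.
    Returns a list of (name, ok) tuples.
    """
    results = []
    for name, func in checks:
        # Print the name first so the admin can see the tool is not hung
        out.write('Checking: %s ... ' % name)
        out.flush()
        ok = bool(func())
        out.write('OK\n' if ok else 'FAILED\n')
        results.append((name, ok))
    return results


def main(checks, out=sys.stdout):
    """Return 0 even if checks fail; 1 only if the tool itself breaks."""
    try:
        run_checks(checks, out)
    except Exception as e:
        out.write('\nThe ipa-healthcheck command failed: %s\n' % e)
        return 1
    return 0
```

A failed check is a finding to be reported, not a tool error, so it does not change the exit status in this sketch.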
>> The example could be some check which will be implemented
>> expired RA certificate.
> I'm not sure I follow.
Just an example check. You've picked expired certs and wrong permissions.
>> Thank you
>> On Wed, Oct 24, 2018 at 10:49 PM Rob Crittenden via FreeIPA-devel
>> <freeipa-devel(a)lists.fedorahosted.org> wrote:
>>> I started a design of an IPA healthcheck framework at
>>> Have at it.
>>> Note that this concentrates more on how it will work big picture and
>>> less on individual checks that may be performed. I'm happy to add any
>>> ideas you come up with for specific tests.