In the first case we could easily mitigate the risk by testing and be
fairly confident, in the second case the tests are too complex to achieve the same
confidence and we should take this kind of risk only if there were a serious benefit to
balance it, but in this case, there are other solutions with less risks.
Actually, I think testing the lib389 tooling would be even harder. You would need to
recreate the logic of the mapping tree and sorting in python, which may have subtle
differences compared to the C version. So it would be harder to test and gain confidence
in. It also doesn't solve the issue that may come about from manual misconfiguration.
I can understand it could seem too conservervative and frustrating
but that is the price when working on mature projects. If you do not do that, the product
becomes unstable, and users quickly abandon it.
I have worked on this project for a number of years, so I'm well aware of the culture
in the team. We are a team who values the highest quality of code, with customers who
demand the very best. To satisfy this as engineers we need to be confident in what we do
and the work we create. But every day we make changes that are bigger than this, or have
"more unknowns" and more. It's out attitude as a team to quality, our
attention to testing, and designs, that make us excellent at effectively making changes
with confidence.
Because just as easily, when a product has subtle traps, unknown configuration bugs and
lets people mishandle it, then they also abandon us. Our user experience is paramount, and
part of that experience is not just stability, but reliability and correctness, that
changes performed by administrators will work and not "silently fail". This bug
is just as much a risk for people to abandon us because when the server allows
misconfiguration to exist that is hard to isolate and understand that too can cause a
negative user experience.
So here, I think we are going to have to "agree to disagree", but as Mark has
stated - the fix is created, the PR is open. If you have more configuration cases to
contribute to the test suite, that would benefit the project significantly to ensure the
quality of the change, and the quality of the mapping tree in general. Our job is to
qualify and create scenarios that were "unknown" and turn them to
"knowns" so we can control changes and have confidence in our work.
On 20 Oct 2020, at 06:10, Mark Reynolds <mreynolds(a)redhat.com>
wrote:
Hi,
So some of the arguments here is that we are introducing risk for something that is not
really a big problem. Or, simply not worth investing in. From a Red Hat perspective
"we" would never fix this, it's just not a problem that comes up enough to
justify the work and time. But... The initial work has been done by the upstream
community (William).
With a corporate interest too, we have a customer at SUSE who has hit this :).
So from a RH perspective we are getting this work for free.
Personally I don't see this code change as "very" risky, but this is a very
sensitive area of the code. That being said, I am not opposed to adding it, but... I
think we need much more testing around it to build confidence in the patch. I would want
tests that deal with suffixes of varying size, names, nested levels/complexity:
o=my_server.com
dc=example,dc=com
dc=abcdef,dc=abc (same length as suffix above - since the patch uses sizing as a way
of sorting)
dc=test,dc=this,dc=patch
Yep, these are some great test ideas. I can add these.
I want tests that are adding and removing subsuffixes, and sub-subsuffixes, and making
sure ldap ops work, and replication, etc. I want tests that use many different suffixes
at the same time and many subsuffixes - some customers have 50 subsuffixes. Our current
CI test suite does not have these kinds of tests, and we need them.
I have already checked with replication suite too, and of course, with ASAN. I think that
these also are good to have added in general, so I can expand the testing to include more
suffixes too.
Do you see 50 subsuffixes in a single level nesting or deeper? I can do some shallow
nesting and deep nesting hierarchies with that kind of number if you want. I think an
interesting test would also be to have
ou=x,ou=y,dc=example,dc=com
dc=example,dc=com
and then add ou=y,dc=example,dc=com in between. Today I think the pre-patched MT code
would actually not handle this either, but that's a pretty big edge case IMO. The real
guarantee is that we do assemble the tree correctly.
We thankfully gain confidence already because the CN is already relied on for routing and
query matching anyway, so we know these values *must* be correct, we just need to
guarantee the sorting order and tree assembly.
Thanks for the ideas Mark :)
As of today I'm not comfortable with the current CI tests to ack this patch, but if
we can ramp it up and cover more scenarios it would be a step in the right direction.
This is all just my humble opinion, we are all still just talking :-)
Mark
—
Sincerely,
William Brown
Senior Software Engineer, 389 Directory Server
SUSE Labs, Australia