https://github.com/mreynolds389/389wiki/pull/48
This is a draft design, and probably of most interest to Thierry, with whom I discussed it last night :)
Thanks!
— Sincerely,
William Brown
Senior Software Engineer, 389 Directory Server SUSE Labs, Australia
Hi,
You are right that it is possible to configure suffix hierarchies which are broken, but in my experience this wasn't an issue. People using sub suffixes did get it right.
So is there really a need to change something that has been working for a long time?
Regards,
Ludwig
_______________________________________________
389-devel mailing list -- 389-devel@lists.fedoraproject.org
To unsubscribe send an email to 389-devel-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject....
This has come up because there is a set of customer cases where they have configured it incorrectly, due to bugs in lib389. The issues in lib389 arise from a lack of validation/constraint in the checking of the nsslapd-parent-suffix value in the server, allowing the client to create invalid configurations.
So today, our own tools can easily and trivially cause this situation.
One thought is to either document this issue or to fix lib389, but neither action really fixes the underlying problem, which is that our server silently accepts an invalid configuration.
So the best thing for us to do is to make it impossible for the server to get it wrong, which means we fix lib389 *and* any other admin tooling/scripts in a single pass.
Which is what led to the interest in changing something that has been "working" for a long time :)
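To make the missing constraint concrete, here is a minimal sketch of the kind of check that is absent today: rejecting a declared parent suffix that is not actually an ancestor of the suffix. It is an illustration only, not server or lib389 code. The function names are hypothetical, and the DN handling is deliberately naive (it assumes no escaped commas and no multi-valued RDNs).

```python
def split_rdns(dn):
    """Naively split a DN into trimmed, lower-cased RDNs (assumes no escaped commas)."""
    return [part.strip().lower() for part in dn.split(",") if part.strip()]


def is_proper_ancestor(parent, child):
    """True if `parent` is a strict ancestor of `child` in DN terms."""
    p, c = split_rdns(parent), split_rdns(child)
    # The parent must be shorter than the child and match its trailing RDNs.
    return 0 < len(p) < len(c) and c[-len(p):] == p


def validate_parent_suffix(suffix, parent_suffix):
    """Hypothetical check: raise ValueError if the declared parent cannot be a parent."""
    if parent_suffix is None:
        return  # a top-level suffix declares no parent
    if not is_proper_ancestor(parent_suffix, suffix):
        raise ValueError(
            "parent suffix %r is not an ancestor of %r" % (parent_suffix, suffix)
        )
```

With a check of this shape, a typo such as `dc=exmaple,dc=com` as the parent of `ou=people,dc=example,dc=com` would be refused instead of silently stored.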
— Sincerely,
William Brown
Senior Software Engineer, 389 Directory Server SUSE Labs, Australia
Hi,
For my part, I have seen mapping tree misconfigurations pop up from time to time, but they are quite rare (maybe 7 or 8 times in 15 years). So although in purely architectural terms your proposal is IMHO the cleanest, in terms of risk it is not a good solution: there will likely be regressions (it always happens when a component gets redesigned), and that means the number of people annoyed by the change will be far greater than the few people helped by it.
So my feeling is that we should rather go for a trade-off. Since the goal is to prevent users from misconfiguring the mapping tree without any warning, we should do that without changing the way the mapping tree is handled internally, simply by adding a consistency check when starting the server or changing the mapping tree configuration.
If we want to limit the risk we could phase things: in a first phase we only log a warning if inconsistencies are found, and later on, when we are more confident about the code, we could reject the configuration change in case of inconsistency.
I know that my proposal is less appealing in terms of architecture, but such a solution is safer because it does not change the way mapping trees are handled, and that drastically limits the regression risks.
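A rough sketch of what such a phased check could look like, written in Python for brevity. Everything here is an assumption made for illustration: the configuration is modelled as a plain dict of suffix to declared parent, the function names are hypothetical, and the DN comparison is naive (no escaped commas). The `enforce` flag stands for the two phases: warn only, then reject.

```python
import logging

log = logging.getLogger("mapping_tree_check")


def split_rdns(dn):
    """Naively split a DN into trimmed, lower-cased RDNs (assumes no escaped commas)."""
    return [part.strip().lower() for part in dn.split(",") if part.strip()]


def is_proper_ancestor(parent, child):
    p, c = split_rdns(parent), split_rdns(child)
    return 0 < len(p) < len(c) and c[-len(p):] == p


def find_inconsistencies(config):
    """config maps each suffix DN to its declared parent suffix DN (or None)."""
    known = {tuple(split_rdns(suffix)) for suffix in config}
    problems = []
    for suffix, parent in config.items():
        if parent is None:
            continue
        if tuple(split_rdns(parent)) not in known:
            problems.append("%s: parent %s is not a configured suffix" % (suffix, parent))
        elif not is_proper_ancestor(parent, suffix):
            problems.append("%s: parent %s is not an ancestor" % (suffix, parent))
    return problems


def check_mapping_tree(config, enforce=False):
    """Phase 1 (enforce=False): log warnings. Phase 2 (enforce=True): also reject."""
    problems = find_inconsistencies(config)
    for problem in problems:
        log.warning("mapping tree inconsistency: %s", problem)
    if enforce and problems:
        raise ValueError("; ".join(problems))
    return problems
```

The same function could be called at startup and on each configuration change, with `enforce` switched on once the check has proven itself.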
Regards, Pierre
Hi Pierre,
You expressed my concerns much more clearly. I agree with you.
Thanks,
Ludwig
> Hi,
>
> For my part, I have seen mapping tree misconfigurations pop up from time to time, but they are quite rare (maybe 7 or 8 times in 15 years). So although in purely architectural terms your proposal is IMHO the cleanest, in terms of risk it is not a good solution: there will likely be regressions (it always happens when a component gets redesigned), and that means the number of people annoyed by the change will be far greater than the few people helped by it.
Today, we already rely upon the cn attribute to determine which suffix a mapping tree element provides. We must trust this value, and already do. This also implies that all current deployments trust this value to be correct.
It is also provable by tests that *invalid* nsslapd-parent-suffix configs cause backends to "not appear" in the mapping tree. So invalid parent suffixes already fail "noisily". This means, yes, that most deployments have both valid cn values for suffixes and valid nsslapd-parent-suffix values.
While it may be rare, we have had a few cases here at SUSE because the lib389 tools make a configuration mistake that can easily and trivially cause this failure.
So by changing this to trust the value of cn to provide ordering, we rely on a value we already depend on to be correct *and* we remove an obvious configuration consistency failure that has occurred. It is because of these factors that I am confident we can make this change with low risk.
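As a toy illustration of ordering derived from the suffix value alone, the sketch below picks the backend suffix for a target DN without consulting any parent attribute. It is not the code in the PR. The function names are hypothetical, ordering by RDN count is an assumption made for the example, and the DN splitting is naive (no escaped commas).

```python
def split_rdns(dn):
    """Naively split a DN into trimmed, lower-cased RDNs (assumes no escaped commas)."""
    return [part.strip().lower() for part in dn.split(",") if part.strip()]


def most_specific_suffix(target_dn, suffixes):
    """Return the deepest configured suffix that contains target_dn, or None."""
    target = split_rdns(target_dn)
    # Deepest suffixes first, so the first match is the most specific one.
    ordered = sorted(suffixes, key=lambda s: len(split_rdns(s)), reverse=True)
    for suffix in ordered:
        parts = split_rdns(suffix)
        if parts and target[-len(parts):] == parts:
            return suffix
    return None
```

For example, with `dc=example,dc=com` and `ou=people,dc=example,dc=com` configured, a target of `uid=a,ou=people,dc=example,dc=com` resolves to the sub suffix, while `uid=a,ou=groups,dc=example,dc=com` falls back to the top-level one. Nothing in the lookup can be made inconsistent by a second attribute.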
> So my feeling is that we should rather go for a trade-off. Since the goal is to prevent users from misconfiguring the mapping tree without any warning, we should do that without changing the way the mapping tree is handled internally, simply by adding a consistency check when starting the server or changing the mapping tree configuration.
>
> If we want to limit the risk we could phase things: in a first phase we only log a warning if inconsistencies are found, and later on, when we are more confident about the code, we could reject the configuration change in case of inconsistency.
>
> I know that my proposal is less appealing in terms of architecture, but such a solution is safer because it does not change the way mapping trees are handled, and that drastically limits the regression risks.
Generally it's my view that we should always prioritise constraint and "inability to make mistakes" over "warning about mistakes". There is certainly value in providing warnings about mistakes when they occur, but preventing the mistake from ever occurring is a far more reliable option. Our server should ensure that there is "no way to hold it incorrectly".
In the situation where we "warn", we would actually have to redesign some parts of lib389 to parse and generate a mapping tree itself, so that it can correctly determine the parent suffixes to emit into configs when it attaches a backend to the tree. This would itself be a significant chunk of work, and a risk of breaking our CLI tools too, while also "not preventing" the issue for any other administration methods that exist.
I think that your suggestion moves the burden of correctness from our work in the server as engineers onto administrators and our tooling to "understand and use it correctly", and that doesn't really sit right with me. So I'd rather continue with the suggestion I have made, as we eliminate an entire class of potential problems.
In order to really understand the perceived risks of this change, I'd like to see configurations that would cause this proposal to fail. Those can then become tests that help us understand and resolve the issues.
Thanks,
— Sincerely,
William Brown
Senior Software Engineer, 389 Directory Server SUSE Labs, Australia
Hi William, I agree with your architecture points, and that is why I said my proposal is a less appealing trade-off.
My real concern is your last point: we just do not know, and IMHO we are unable to predict, which configs (if any) will cause problems, and I am afraid we will only discover it when people start to complain. So I still think that the benefit/risk ratio is bad.
Regards Pierre
On 16 Oct 2020, at 17:48, Pierre Rogier progier@redhat.com wrote:
> Hi William, I agree with your architecture points, and that is why I said my proposal is a less appealing trade-off.
>
> My real concern is your last point: we just do not know, and IMHO we are unable to predict, which configs (if any) will cause problems, and I am afraid we will only discover it when people start to complain. So I still think that the benefit/risk ratio is bad.
I think this wasn't my point. The thing is, *any* change will have that "unknown" risk. Our job is to qualify and identify as many of those risks as we can, to remove them as unknowns. Think about the recent work to merge the changelog into the main db, the BDB to LMDB work, or even the change from perl to python for installation. These are all significantly larger changes, which would be "much riskier", but all of them have been managed effectively by the team communicating, coordinating, analysing, designing and testing changes.
So I really don't accept this "unknown" risk argument. I have laid out a design that explores the configuration, how it works today and how the values are currently trusted, and a process to manage and understand this change in a way that minimises the risk. There are associated tests, and the change passes them under address sanitiser, along with the other test cases for mapping trees, replication and others.
If we just say "unknown risk" at every change we make, we'd never progress. We may as well pack up and go home, the project is completed.
So I still stand by my design and the PR I have submitted in this case. If there are concerns about esoteric configurations, then we should identify and understand them too, beyond the testing I have already provided.
— Sincerely,
William Brown
Senior Software Engineer, 389 Directory Server SUSE Labs, Australia
On 19.10.20 01:26, William Brown wrote:
> If we just say "unknown risk" at every change we make, we'd never progress. We may as well pack up and go home, the project is completed.
If you put it that way, any change is justified because it is a change. Changes are necessary to achieve something, e.g. features or performance (and I would distinguish changes from fixes).
This started, as you said yourself, because:
> This has come up because there is a set of customer cases where they have configured it incorrectly, due to bugs in lib389. The issues in lib389 arise from a lack of validation/constraint in the checking of the nsslapd-parent-suffix value in the server, allowing the client to create invalid configurations.
>
> So today, our own tools can easily and trivially cause this situation.
So we have a situation where the design has flaws but in effect was "working", and then we messed up ourselves by providing tools which can easily break things. Here I would say it is justified to discuss the balance between fixing the tools and eventually adding some checks to the server, versus reimplementing it with the risk that the design, implementation and new tooling will also have challenges.
Ludwig
Hi William,
Things are not black and white: there is a huge difference between a fix with limited impact (like adding some check in the configuration tools or in the server) and redesigning something that is used in many different contexts for every request handled by the server.
In the first case we could easily mitigate the risk by testing and be fairly confident. In the second case the tests are too complex to achieve the same confidence, and we should take this kind of risk only if there were a serious benefit to balance it. But in this case, there are other solutions with less risk.
I can understand that this could seem too conservative and frustrating, but that is the price of working on mature projects. If you do not do that, the product becomes unstable, and users quickly abandon it.
Regards, Pierre
Hi,
So one of the arguments here is that we are introducing risk for something that is not really a big problem, or is simply not worth investing in. From a Red Hat perspective "we" would _never_ fix this; it's just not a problem that comes up enough to justify the work and time. But... the initial work has been done by the upstream community (William). So from an RH perspective we are getting this work for free.
Personally I don't see this code change as "very" risky, but this is a very sensitive area of the code. That being said, I am not opposed to adding it, but... I think we need much more testing around it to build confidence in the patch. I would want tests that deal with suffixes of varying size, names, and nested levels/complexity:
o=my_server.com
dc=example,dc=com
dc=abcdef,dc=abc (same length as suffix above - since the patch uses sizing as a way of sorting)
dc=test,dc=this,dc=patch
I want tests that are adding and removing subsuffixes, and sub-subsuffixes, and making sure ldap ops work, and replication, etc. I want tests that use many different suffixes at the same time and many subsuffixes - some customers have 50 subsuffixes. Our current CI test suite does not have these kinds of tests, and we need them.
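To give a feel for the shape of such a test matrix, here is a self-contained sketch. It does not touch a real server: `most_specific_suffix` is a hypothetical stand-in for suffix selection, the `ou=subN` and `ou=nested` names are invented for the example, and the DN splitting is naive. Real CI tests would create these backends on a live instance and run LDAP operations and replication against them.

```python
import random

TOP_SUFFIXES = [
    "o=my_server.com",
    "dc=example,dc=com",
    "dc=abcdef,dc=abc",
    "dc=test,dc=this,dc=patch",
]
SUBS_PER_TOP = 50  # some deployments have around 50 sub suffixes


def split_rdns(dn):
    """Naively split a DN into trimmed, lower-cased RDNs (assumes no escaped commas)."""
    return [part.strip().lower() for part in dn.split(",") if part.strip()]


def most_specific_suffix(target_dn, suffixes):
    """Hypothetical stand-in for suffix selection: deepest matching suffix wins."""
    target = split_rdns(target_dn)
    ordered = sorted(suffixes, key=lambda s: len(split_rdns(s)), reverse=True)
    for suffix in ordered:
        parts = split_rdns(suffix)
        if parts and target[-len(parts):] == parts:
            return suffix
    return None


def build_suffixes():
    """Top-level suffixes, 50 sub suffixes each, and one sub-sub suffix each."""
    suffixes = []
    for top in TOP_SUFFIXES:
        suffixes.append(top)
        suffixes.extend("ou=sub%d,%s" % (i, top) for i in range(SUBS_PER_TOP))
        suffixes.append("ou=nested,ou=sub0,%s" % top)
    # Shuffle with a fixed seed so the result cannot depend on insertion order.
    random.Random(0).shuffle(suffixes)
    return suffixes


def run_checks():
    """Assert that an entry under every suffix resolves to that suffix."""
    suffixes = build_suffixes()
    checked = 0
    for expected in suffixes:
        target = "uid=u,%s" % expected
        assert most_specific_suffix(target, suffixes) == expected, target
        checked += 1
    return checked
```

Adding and removing sub suffixes between runs, and repeating the checks after each change, would extend the same matrix to the add/remove scenarios.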
As of today I'm not comfortable with the current CI tests to ack this patch, but if we can ramp it up and cover more scenarios it would be a step in the right direction. This is all just my humble opinion, we are all still just talking :-)
Mark
On 10/19/20 6:47 AM, Pierre Rogier wrote:
Hi William,
Things are not black and white: there is a huge difference between a fix with limited impact (like adding some check in configuration tools or in the server) and redesigning something that is used in many different contexts for every request handled by the server ...
In the first case we could easily mitigate the risk by testing and be fairly confident, in the second case the tests are too complex to achieve the same confidence and we should take this kind of risk only if there were a serious benefit to balance it, but in this case, there are other solutions with less risks.
I can understand it could seem too conservervative and frustrating but that is the price when working on mature projects. If you do not do that, the product becomes unstable, and users quickly abandon it.
Regards, Pierre
On Mon, Oct 19, 2020 at 1:27 AM William Brown <wbrown@suse.de> wrote:
> On 16 Oct 2020, at 17:48, Pierre Rogier <progier@redhat.com> wrote:
> Hi William,
> I agree with your architecture points and that is why I said my proposal is a less appealing trade off.
> My real concern is your last point:
> we just do not know and IMHO we are unable to predict what (or if) config will cause problems, and I am afraid we will only discover it when people start to complain.
> So I still think that the benefit/risk ratio is bad)
I think this wasn't my point. The thing is *any* change will have that "unknown" risk. Our job is to qualify and identify as many of those risks as we can, to remove them as unknowns. Think about the work recently to merge the changelog to the main db, or BDB to LMDB work, even changing from perl to python for installation. These are all significantly larger changes, which would be "much riskier" but all of them have been managed effectively by the team communicating, coordinating, analysing, designing and testing changes.
So I really don't accept this "unknown" risk argument. I have laid out a design that explores the configuration, how it works today and how the values are currently trusted, and a process to manage and understand this change in a way to minimise the risk. There are associated tests, and it passes with address sanitiser, and other test cases for mapping trees, replication and others.
If we just say "unknown risk" at every change we make we'd never progress. We may as well pack up and go home, the project is completed.
So I still stand by my design and the PR I have submitted in this case, and if there are concerns about esoteric configurations, then we should identify and understand them too beyond the testing I have already provided.
— Sincerely,
William Brown
Senior Software Engineer, 389 Directory Server SUSE Labs, Australia
In the first case we could easily mitigate the risk by testing and be fairly confident. In the second case the tests are too complex to achieve the same confidence, and we should take this kind of risk only if there were a serious benefit to balance it; in this case, there are other solutions with fewer risks.
Actually, I think testing the lib389 tooling would be even harder. You would need to recreate the logic of the mapping tree and sorting in python, which may have subtle differences compared to the C version. So it would be harder to test and gain confidence in. It also doesn't solve the issue that may come about from manual misconfiguration.
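As a purely illustrative example of the kind of subtle difference a Python re-implementation could pick up, compare a naive string check with an RDN-by-RDN check. Both functions are hypothetical and are not taken from lib389 or the server; the DN parsing is deliberately simplistic (comma split, lower-casing, no escaped commas).

```python
def naive_is_under(child, parent):
    """String-suffix check: easy to write, but ignores RDN boundaries and spacing."""
    return child.lower().endswith(parent.lower())


def rdn_is_under(child, parent):
    """Compare whole RDN components after trimming whitespace and lower-casing."""
    c = [part.strip().lower() for part in child.split(",")]
    p = [part.strip().lower() for part in parent.split(",")]
    return len(c) > len(p) and c[-len(p):] == p


# "adc=com" merely ends with the text "dc=com"; it is not below dc=com.
false_positive = naive_is_under("ou=people,adc=com", "dc=com")

# A space after the comma defeats the string check even though the DNs match.
false_negative = not naive_is_under("ou=people,dc=example, dc=com", "dc=example,dc=com")
```

Each such discrepancy would have to be found and tested separately on the client side, whereas a check in the server applies the server's own DN handling.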
I can understand it could seem too conservative and frustrating, but that is the price when working on mature projects. If you do not do that, the product becomes unstable, and users quickly abandon it.
I have worked on this project for a number of years, so I'm well aware of the culture in the team. We are a team who values the highest quality of code, with customers who demand the very best. To satisfy this as engineers we need to be confident in what we do and the work we create. But every day we make changes that are bigger than this, or have "more unknowns" and more. It's our attitude as a team to quality, our attention to testing, and designs, that make us excellent at effectively making changes with confidence.
Because just as easily, when a product has subtle traps, unknown configuration bugs and lets people mishandle it, then they also abandon us. Our user experience is paramount, and part of that experience is not just stability, but reliability and correctness: that changes performed by administrators will work and not "silently fail". This bug is just as much a risk for people to abandon us, because when the server allows a misconfiguration to exist that is hard to isolate and understand, that too can cause a negative user experience.
So here, I think we are going to have to "agree to disagree", but as Mark has stated - the fix is created, the PR is open. If you have more configuration cases to contribute to the test suite, that would benefit the project significantly to ensure the quality of the change, and the quality of the mapping tree in general. Our job is to qualify and create scenarios that were "unknown" and turn them to "knowns" so we can control changes and have confidence in our work.
On 20 Oct 2020, at 06:10, Mark Reynolds mreynolds@redhat.com wrote:
Hi,
So some of the arguments here are that we are introducing risk for something that is not really a big problem. Or, simply not worth investing in. From a Red Hat perspective "we" would never fix this, it's just not a problem that comes up enough to justify the work and time. But... The initial work has been done by the upstream community (William).
With a corporate interest too, we have a customer at SUSE who has hit this :).
So from a RH perspective we are getting this work for free. Personally I don't see this code change as "very" risky, but this is a very sensitive area of the code. That being said, I am not opposed to adding it, but... I think we need much more testing around it to build confidence in the patch. I would want tests that deal with suffixes of varying size, names, nested levels/complexity:
o=my_server.com
dc=example,dc=com
dc=abcdef,dc=abc (same length as suffix above - since the patch uses sizing as a way of sorting)
dc=test,dc=this,dc=patch
Yep, these are some great test ideas. I can add these.
I want tests that are adding and removing subsuffixes, and sub-subsuffixes, and making sure ldap ops work, and replication, etc. I want tests that use many different suffixes at the same time and many subsuffixes - some customers have 50 subsuffixes. Our current CI test suite does not have these kinds of tests, and we need them.
I have already checked with the replication suite too, and of course with ASAN. I think these are also good to have added in general, so I can expand the testing to include more suffixes too.
Do you see 50 subsuffixes in a single level nesting or deeper? I can do some shallow nesting and deep nesting hierarchies with that kind of number if you want. I think an interesting test would also be to have
ou=x,ou=y,dc=example,dc=com
dc=example,dc=com
and then add ou=y,dc=example,dc=com in between. Today I think the pre-patched MT code would actually not handle this either, but that's a pretty big edge case IMO. The real guarantee is that we do assemble the tree correctly.
We thankfully gain confidence already because the CN is already relied on for routing and query matching anyway, so we know these values *must* be correct, we just need to guarantee the sorting order and tree assembly.
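To make the "insert ou=y in between" case concrete, here is a minimal sketch of assembling the hierarchy from the suffix DNs alone. It is not the C code from the PR: `rdns` and `assemble` are hypothetical names, DNs are split naively on commas (no escaped commas), and lower-casing stands in for real DN normalisation. Suffixes are ordered by number of RDNs, and each one is attached to its nearest already-placed ancestor.

```python
def rdns(dn):
    """Split a DN into normalised RDN strings (naive: no escaped commas)."""
    return tuple(part.strip().lower() for part in dn.split(","))


def assemble(suffixes):
    """Map each suffix to its nearest configured ancestor (None for a root)."""
    parsed = {s: rdns(s) for s in suffixes}
    # Fewest RDNs first, so ancestors are placed before their descendants.
    # The DN string breaks ties to keep the order deterministic.
    ordered = sorted(parsed, key=lambda s: (len(parsed[s]), s))
    parents = {}
    for suffix in ordered:
        mine = parsed[suffix]
        best = None
        for candidate in parents:  # only nodes that are already in the tree
            theirs = parsed[candidate]
            is_ancestor = len(theirs) < len(mine) and mine[-len(theirs):] == theirs
            if is_ancestor and (best is None or len(theirs) > len(parsed[best])):
                best = candidate
        parents[suffix] = best
    return parents


before = assemble(["ou=x,ou=y,dc=example,dc=com", "dc=example,dc=com"])
after = assemble([
    "ou=x,ou=y,dc=example,dc=com",
    "dc=example,dc=com",
    "ou=y,dc=example,dc=com",  # added "in between"
])
```

In `before`, `ou=x,ou=y,dc=example,dc=com` hangs directly off `dc=example,dc=com`; in `after`, it is re-parented under `ou=y,dc=example,dc=com`, which is the behaviour a test for this edge case would assert.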
Thanks for the ideas Mark :)
As of today I'm not comfortable with the current CI tests to ack this patch, but if we can ramp it up and cover more scenarios it would be a step in the right direction. This is all just my humble opinion, we are all still just talking :-)
Mark
— Sincerely,
William Brown
Senior Software Engineer, 389 Directory Server SUSE Labs, Australia
On 10/20/20 4:01 AM, William Brown wrote:
On 20 Oct 2020, at 06:10, Mark Reynolds mreynolds@redhat.com wrote:
Hi,
So some of the arguments here are that we are introducing risk for something that is not really a big problem. Or, simply not worth investing in. From a Red Hat perspective "we" would never fix this, it's just not a problem that comes up enough to justify the work and time. But... The initial work has been done by the upstream community (William).
With a corporate interest too, we have a customer at SUSE who has hit this :).
The fact that users/customers are starting to hit MT bugs justifies fixing it. Should it be fixed in MT or in lib389? I tend to agree with Pierre and Ludwig that the (buggy) MT has been working for decades now, and since it is a sensitive and difficult-to-test area I would prefer not to change it. That said, we now have a valid patch/design on the table, and I suspect/hope that if it introduces a regression it will be discovered rapidly. So I agree there is a disagreement, and that is the way open source works. IMHO the patch should be pushed as soon as it is reviewed.
Best regards,
thierry
Hi,
As we are speaking about the tests, we should not forget to test the different mapping tree usages: the common ones (like standard, chaining or referral-on-update backends) and the uncommon ones (like those involving the various distribution plugins). I am a bit uneasy about those, as I remember they sometimes involved weird configurations and custom plugins. That said, these are old stories and things have probably changed. Let's hope that we now have a nice list of supported distribution scenarios that could be tested ...
Regards, Pierre
On 21 Oct 2020, at 02:47, Pierre Rogier progier@redhat.com wrote:
Hi,
As we are speaking about the tests, we should not forget to test the different mapping tree usages: the common ones (like standard, chaining or referral-on-update backends) and the uncommon ones (like those involving the various distribution plugins). I am a bit uneasy about those, as I remember they sometimes involved weird configurations and custom plugins. That said, these are old stories and things have probably changed. Let's hope that we now have a nice list of supported distribution scenarios
These are all really good ideas for expansion of the tests, for certain. Since the scope of the change is just the tree hierarchy and not the backend type, though, there should be no impact whether a node in the tree is chaining, bdb or referral; all that matters is that the nodes are correctly arranged.
But more testing is always good, and I'm looking at the area now anyway, so I'll add some of these. Thanks!
— Sincerely,
William Brown
Senior Software Engineer, 389 Directory Server SUSE Labs, Australia