On Mon, 20 Nov 2017 16:30:22 +0100 Petr Splichal psplicha@redhat.com wrote:
Hi!
Current Fedora CI documentation suggests: "The tests to be executed are stored in the dist-git repositories. The tests are stored or wrapped along-side the spec files..." [0] While working on porting tests to Fedora CI, several times we've noted concerns from developers/maintainers about placing test code directly into dist-git repositories. A common question is how to efficiently maintain tests & minimize test code duplication. There are some nice real-life examples available:
Examples
There are several shells which implement the POSIX specification: bash, ksh, mksh, zsh, dash. All of them share a significant amount of test coverage and it does not make sense to commit & maintain identical tests in five different repositories (+ possible branches). See the pull request [1] for a bit of context.
Another example is Ruby: With about 80 packages related to Ruby on Rails it would be useful and efficient to have a single place for integration tests which verify that the framework is correctly working after updating any of these packages. Conversely, maintaining those tests in 80 repos would be a tedious task.
Proposal: Share!
So this is where the idea of a shared test repository comes from. In general, tests define how the software works and the basic functionality of many packages doesn’t change that often. We try hard to keep backward compatibility where possible. Thus it seems natural that, for such components, tests guarding the spec could change at a slower pace than the distribution branches.
Main goals
- Package source and tests are linked in a discoverable and unambiguous way (dist-git)
- Prevent test code duplication (minimize test maintenance)
- Catch incompatibilities early
How are git repos in different namespaces of a larger repo ecosystem easier to link together than just URLs that point to github or pagure.io?
There has been some talk about pulling in tests that live in upstream repos which makes me think github. If this is true and we're going to be pulling other things in from github/pagure/gitlab/wherever, why not just have these shared repos in pagure.io?
How could it work? A new namespace "tests" would be created in Fedora as suggested in [2]. Test code would reside there, and the tests in the rpms dist-git could directly reference a specific version of the tests to be run, e.g. with a pre_task:
Is the Fedora dist-git instance set up to enable ACLs made up of users who are not already packagers and who don't otherwise have write access to the corresponding package repo? If not, how much extra work are we talking about? Are there available cycles to get the work done in short order?
As I understand it, RHEL QE has a setup with a "tests" namespace which stores all the tests for a package whose name matches the name of that repo from the tests namespace. If we use "tests" as the namespace here, I fear that we will be adding to confusion from the people who are used to the RHEL version of a "tests" namespace. We would also not realistically be able to replicate a similar setup in Fedora. There seemed to be interest in something like that when the issue came up on devel@ a few years ago which also makes me think it'd be unwise to close this door unless there is no other reasonable choice. If we do go the dist-git namespace route, have there been any alternative names proposed?
    pre_tasks:
      - name: Fetch the tests
        git:
          repo: https://src.fedoraproject.org/tests/shell.git
          dest: tests
          version: master
Also, standard-test-roles [3] should be enhanced to make it really easy to fetch tests from a remote repo. The Ansible git module is quite simple, but with standard-test-roles it could be even shorter. The tests.yml file could look like this (we could support fetching from multiple repositories):
    - role: standard-test-beakerlib
      tags:
        - classic
        - container
        - atomic
      tests:
        - smoke
      repo:
        - https://src.fedoraproject.org/tests/shell.git
      version: master
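For comparison, fetching from several repositories can already be sketched with the plain Ansible git module and a loop, without any change to standard-test-roles. This is only an illustration: the second repository URL and both destination directories are hypothetical placeholders, and the tests/shell repository is the one proposed above rather than something known to exist.

```yaml
# Sketch only: the second repository and the dest paths are hypothetical.
- hosts: localhost
  pre_tasks:
    - name: Fetch shared tests from several repositories
      git:
        repo: "{{ item.repo }}"
        dest: "{{ item.dest }}"
        version: "{{ item.version }}"
      with_items:
        - repo: https://src.fedoraproject.org/tests/shell.git
          dest: tests/shell
          version: master
        - repo: https://example.org/another-shared-tests.git  # hypothetical
          dest: tests/another
          version: master
```

Each repository gets its own destination directory so the checkouts do not overwrite each other.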
A consistent location for shared tests is a good source of examples and helps drive automation in the future, e.g. trigger CI jobs on test changes.
For stable packages providing specification-like functionality, a single master branch could be used. For components which change more often, the dist-git tests.yml could pick a suitable branch or even a specific commit (if there are concerns about updated tests breaking the continuous integration).
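As a minimal sketch of the pinning idea, a dist-git tests.yml could keep the revision of the shared tests in one variable. The Ansible git module's version parameter accepts a branch name, a tag or a commit hash. The variable name shared_tests_version is a hypothetical choice for illustration, and the tests/shell repository is the proposed one, not an existing one.

```yaml
# Sketch only: shared_tests_version is a hypothetical variable name.
- hosts: localhost
  vars:
    # Branch, tag, or full commit hash of the shared tests to use.
    shared_tests_version: master
  pre_tasks:
    - name: Fetch the shared tests at the pinned revision
      git:
        repo: https://src.fedoraproject.org/tests/shell.git
        dest: tests
        version: "{{ shared_tests_version }}"
```

A stable package would leave the value at master, while a faster-moving one would set it to a branch name or a commit hash in its own dist-git branch.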
This seems to give nice flexibility while at the same time minimizing test code duplication. Referencing specific tests from rpms dist-git ties tests and code together very well. Storing the test code in the tests namespace is only one option; any git repo accessible from the CI pipeline would suffice.
I don't really have a problem with the idea of having a shared repository in the interest of decreasing duplication. That being said, I do have some questions about some more of the details behind how this would work.
While this isn't the most common case, it is possible to have a situation where a package changes and the tests need to change at the same time. It's not necessarily a fatal problem but it brings up some questions:
- For the PR testing case, how will the shared tests be included in that PR set for testing purposes?
- Will there be some delay in test/build kickoff on dist-git repo change to allow for tests to be pushed around the same time that there are code changes? Otherwise are we planning to expect some transitory failures if both the tests and packages need to change at the same time? Are there plans to verify that things are properly re-run if those transitory errors happen?
Just One Place?
We could possibly go one step further: Shall we aim at having just a single copy of the test code for testing both Fedora and Red Hat Enterprise Linux? At least for those cases where this is possible, since there might be some licensing or compatibility issues. For example, speaking about Beaker: the restraint harness can also be instructed to fetch tests directly from git, so the identical test code could perhaps be used for running tests in Beaker as well.
A recent experience of a libselinux pull request [4] shows how cumbersome it can be to keep both upstream and downstream version of the test code in sync. Wouldn't it be great to have everything at a single place and keep the test code up-to-date only there? Instead of doing several rounds of "pull request - give feedback - update internal tests - update the pull request - find another issue - give feedback..." with many unnecessary steps and reminders just directly collaborating on a single test repository?
When downstream testing is just a branch of upstream testing, we can use our standard open source workflows. Just as in upstream, there would likely be a downstream shared git repository that branches from the upstream one.
Bonus
As a bonus we would keep the dist-git for such repositories a bit more consistent as a place for metadata only: Build metadata (spec file = how to build the package) and test metadata (tests.yml = how to test the package).
Plus we would get a more streamlined test maintenance as QE and volunteers willing to help with improving the test coverage could be granted direct access to individual package tests repo. Not every QE is a Fedora packager so contributing to dist-git is not that straightforward.
The question of how to deal with non-packager write access for tests has been brought up a couple of times but I'm not sure it's been dealt with well yet.
I don't really like the idea of tests with arbitrary ACLs. As we work toward a vision of having CI gate all changes to packages before those changes are even accepted, changes to the tests associated with a package start approaching the importance of the non-test content for the same package.
If someone tosses an unproven test into a shared repo before going home for the weekend - there is a chance that could cause all sorts of problems from keeping a package's changes out of stable to breaking the build of a release-blocking image.
I realize that there are plans to handle situations like this with automatic reversion but to the best of my knowledge, that system doesn't exist yet (an admin could back stuff out before the automation exists, too). If we demand that packagers go through a vetting and mentoring process even if they're a co-maintainer, why shouldn't we have a similar requirement for the folks who can change the tests which are responsible for gating the bits we release?
Notes
There are cons to this approach as well. It isn’t possible to have new feature code and tests submitted in a single commit, or to trigger test execution for every change in tests (CI for tests). There is a need to handle the relevance of individual tests for different environments. But the maintenance simplification seems to be worth it, based on the experience of the BaseOS QE team, which has been maintaining thousands of tests for many years and using them for testing a large number of supported versions of Red Hat Enterprise Linux.
The overall benefits also depend on which test types we are speaking about. Tests should always be as far upstream as possible. Unit tests generally make more sense in upstream. For integration tests it might make more sense to live in the shared Fedora tests repository. Recommendations about which tests fit well into this "tests" namespace will evolve, but it will be up to each package to choose where to place its tests.
So?
What are your thoughts on this? Does it make sense to you?
It makes sense, just some concerns about the specifics of this proposal. Thanks for putting so much time into this.
Tim
psss...
[0] https://fedoraproject.org/wiki/CI#Tests_locations
[1] https://src.fedoraproject.org/rpms/bash/pull-request/1
[2] https://pagure.io/fedora-infrastructure/issue/6478
[3] https://pagure.io/standard-test-roles/
[4] https://src.fedoraproject.org/rpms/libselinux/pull-request/1