On Friday, 8 March 2019 at 10:28 -0500, Russ Cox wrote:
On Fri, Mar 8, 2019 at 8:09 AM 'Nicolas Mailhot' via golang-dev
<golang-dev@googlegroups.com> wrote:
> The notary part of Go modules, like the rest of the module
> implementation, suffers from a lack of understanding of integration
> and QA workflows, and a simplistic dev-centric worldview.
This is not an auspicious beginning.
This mail came across as trying more to be antagonistic than
constructive
I'm sorry, my level of English is not sufficient to convey information
you want to ignore, and dress it up so you feel good about it. I've
tried to stick to plain facts. If you object to plain facts I can't do
anything about it.
Even so, I really would like to understand your concerns
and either help address misunderstandings or adjust the
design appropriately.
> CONTINUOUS TRANSPARENT INTERNET LOOKUPS ARE NOT ACCEPTABLE POST-DEV
>
> [snip]
>
> But that QA reproducibility constraint was not understood by Go
> developers, and pretty much all the Go tools that were ported to
> module mode will attempt Internet accesses at the slightest
> opportunity, and change the project state based on those access
> results, with no ability to control or disable it, and nasty
> failure modes if the internet access fails.
This really couldn't be farther from the truth.
You can turn off module changes using -mod=readonly in CI/CD systems.
It would be nice if it were true. Unfortunately, even a simple command
like "tell me what you know about the local code tree"
go list -json -mod=readonly ./...
will abort if the CI/CD system cuts network access to make sure builds
do not depend on Internet state.
That's what the nasty failure modes are about. If the go command is not
sure about anything, it will “solve” things by trying to download new
bits from the Internet. If the Internet is not available, it will abort
violently, not degrade gracefully and work with what it has.
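For illustration, here is a minimal way to reproduce that abort
outside a build farm (a sketch: GOPROXY=off stands in for the cut
network, and the || branch only marks where the hard failure lands):

```shell
# Sketch: what a network-less CI step sees (illustrative only).
# GOPROXY=off forbids all downloads; if even one dependency is missing
# from the local module cache, go aborts instead of degrading gracefully.
export GOFLAGS=-mod=readonly
export GOPROXY=off          # simulate the build farm's cut network
go list -json ./... \
  || echo "hard abort: a module was absent from the local cache"
```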
The no-modification no-internet CI/CD constraint was not taken into
account in the original design. It was bolted on later, and the
bolting is imperfect. Just take a vacation in some paradisiacal place
with no internet access, and see how much Go coding you can do in
module mode. *That* will replicate our CI/CD constraints (paradisiacal
place as a bonus, we don't have those in our build farms – see, I'm
trying not to be antagonistic).
> QA-ED CODE IS NOT PRISTINE DEV CODE
>
I've read this a couple times and found it a little hard to follow,
but I think what you are saying is that the go.sum and notary checks
are going to cause serious problems because Fedora and other
distributions want to create modified copies of module versions and
use them for building the software they ship. I don't see why that
would be the case.
I would have expected that if Fedora modified a library, they
would give it a different version number, so that for example
modifying v1.2.2 would produce v1.2.3-fedora.1
And that's not the case for Fedora, nor for RHEL, nor Debian, nor
pretty much any other large-scale integrator, because when you
integrate masses of third-party code you will eventually hit bugs in
pretty much every component, so having to run patched code at every
layer is the norm, not the exception.
It would be terribly inconvenient to have to rename or renumber or
replace everything, and then have to convince all the other components
to use the renamed or renumbered versions of the components they depend
on. It would be even more inconvenient to do it just in time and
continuously flip flop between upstream and local names and numbers.
That would actually add friction to merging back and returning to
pristine upstream code, because merging back would now cost a
rename/renumber, instead of being a net win. Patching is forced on you
(you hit a bug in upstream code). Upstreaming is a deliberate virtuous
choice (you've already fixed your problem locally). If you make
upstreaming more expensive, people just stop doing it.
There are legal ways to force distributions to do the wasteful
renaming/renumbering dance. They will take it as a hostile imposition.
Where do you think Iceweasel came from? No love lost here.
Go.sum and notary checks would only trigger at all if Fedora
were to take v1.2.2, modify it, and then try to pass it off as the
original v1.2.2
That's the standard Linux distribution workflow.
but that's indistinguishable from a man-in-the-middle attack.
The whole purpose of a distribution is to be a giant middleman, freely
chosen by the end user, between the code released upstream and this
end user¹. Not all middlemen are hostile. You need to understand that
or your system will not work.
In the real world you have friendly middlemen. Sometimes layers of them
(it is quite common for local organisations to add another level of
changes over distro changes). Sometimes those middlemen make changes.
Sometimes they just check things for nastiness.
A correct trust test is not “it exists exactly this way on the
Internet”. What kind of assurance is that? If I dump malware on a public
URL, it’s trusted as long as no AV touches it mid-flight?
A correct trust test is “has the state been signed by an entity the user
trusts”.
So, digital signatures. A way to configure the public keys of trusted
third parties. And, if no signature by a trusted third party is found
locally, as a last resort, and only if the user asked for it, look on
the internet to see if a notary signature exists. Of course, that's a
piss-poor level of trust, but better that than nothing.
Furthermore, once you have created those versions and want to
build software using them, all it takes is to create a new module
with just a go.mod file (no source code) listing those versions as
requirements and then run builds referring to the unmodified original
top-level targets. Those targets' dependencies will be bumped forward
to the Fedora-patched versions listed in the go.mod file, without
any need to modify any of the client modules at all.
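(For concreteness, if I read the proposal right, that source-less
umbrella module would be something like the following; the module path
and the -fedora version suffixes are entirely hypothetical:)

```
module fedora.example/umbrella

require (
    github.com/some/dependency v1.2.3-fedora.1
    github.com/another/dependency/v4 v4.0.1-fedora.1
)
```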
# dnf repoquery --disablerepo='*' --enablerepo=rawhide \
--whatprovides 'golang(*)' | wc -l
766
(here Debian people are laughing at me because we are lagging behind
them on the Go integration front)
We have several hundreds more Go components in the integration queue
waiting for review.
Some of those will end up as several Go modules.
And you want us to renumber all this, and patch all the other module
files that use those with the new numbers, and redo it all every time
something changes, all year round, just because the go command can't
accept that the code state Fedora builds from may differ from the one
observed on the Internet?
Really?
REALLY?
Do you really REALLY think any sane integrator will play this dance
for long, instead of patching the notary checks out of its go command
and being done with it?
Is the utopia the fact there would be a consistent meaning for
mymodule@v1.2.2 across all possible references and that any
modified source code would have to present a modified version number?
The utopia is thinking that everything is released in a perfect state by
upstream, so changes downstream need not happen, and therefore any
downstream change is necessarily an attempt to inject malware by Mr
Nasty.
Is the inability to silently modify code without changing the version
number what's going to break the workflows of a large proportion of
the Go ecosystem? I'd like to understand that better if so.
Distributions distinguish between the upstream version number and their
own build id. So you have
mymodule@v1.2.2 release 1
mymodule@v1.2.2 release 2
and so on. Each release id is a separate build that can involve
different patches (some release ids are more complex than a single
number).
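In RPM terms, that pairing is the name-version-release scheme; a
hypothetical example (package name invented for illustration):

```
golang-github-my-thing-1.2.2-1.fc30   <- upstream v1.2.2, release 1
golang-github-my-thing-1.2.2-2.fc30   <- same v1.2.2, respun with a patch
```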
Only a single release can exist within the distribution at a given
time, so makefiles, go.mod files and so on need not differentiate
between release X and release Y; they will only see one of those at
any time, and they had better work with it because they won't be
given or allowed any other.
System artefacts (libraries, binaries, go modules) are built from
plain upstream source code, as downloaded by distribution processes
directly from upstream VCS or websites, after preparation (removal of
problem parts, patching, etc.) in pristine containers, isolated from
the Internet, and populated only with a minimal distribution
installation and the content of the distribution components necessary
for their build.
So to build github.com/my/thing version x.y.z, which declares in its
module file:

    module github.com/my/thing

    require (
        github.com/some/dependency v1.2.3
        github.com/another/dependency/v4 v4.0.0
    )
A. We will populate a clean container with
1. a minimal system
2. the go compiler
3. the most recent system component that provides the go module
github.com/some/dependency ≥ 1.2.3 and < 2, and all its dependencies
(go module as produced by our own build of this other component, not
as downloaded by go get from the internet)
4. the most recent system component that provides the go module
github.com/another/dependency/v4 ≥ 4.0.0 and < 5, and all its
dependencies (ditto)
5. the source code for github.com/my/thing x.y.z, as downloaded and
checked from the internet by a human (not by go get), and then sealed
B. We will cut internet access to this container
C. We will prepare the github.com/my/thing x.y.z source (remove
problem parts, patch bugs, remove vendored code, remove at least
local replaces in go.mod)
I'm fairly certain we will nuke go.sum, because we want to build from
our reference state against our reference versions, not reproduce
whatever upstream tested on ubuntu or windows.
D. We will point the go compiler to the local module files (via
GOPROXY)
E. We will ask a go command to transform our github.com/my/thing
x.y.z source state into the module files that can be used by other
Go code (zip, ziphash, info, mod files) once deployed in our GOPROXY
directory. It would be nice if go mod pack existed upstream;
otherwise we will write and use our own utility.
That will involve sanitizing the upstream mod file (removing indirect
requires and local replaces; as for non-local replaces, my gut
feeling is that we should remove them too and force the fixing of
imports in source preparation, but I'm not sure yet)
For various technical reasons it's not possible to deploy directly in
GOPROXY at this stage, so at this point you have the prepared
github.com/my/thing x.y.z source code in unpacked state in a
directory, the modules it needs in GOPROXY, and a packed version of
github.com/my/thing x.y.z in a staging directory.
F. We will ask the go compiler to build the various binaries that
need to be produced from github.com/my/thing version x.y.z
G. We will pack the result files (binaries and go modules) in system
components, so they can be deployed as needed. The component
containing github.com/my/thing x.y.z will record a need for
github.com/some/dependency ≥ 1.2.3 and < 2 and
github.com/another/dependency/v4 ≥ 4.0.0 and < 5
So the internet access won't exist when you think it exists, the
files arrive on disk without go get, and the state they are available
in is not the upstream state.
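To make step D concrete, here is roughly what the go command expects
to find behind a local GOPROXY (a sketch; the directory and module
names are illustrative, and the files would be produced by our own
packing step, never by go get):

```shell
# Sketch of a file-based GOPROXY tree, in the layout the proxy protocol
# expects: <module>/@v/<version>.info, .mod and .zip, all produced locally.
proxy=$(mktemp -d)   # stand-in for the real system-wide proxy directory
mkdir -p "$proxy/github.com/some/dependency/@v"
touch "$proxy/github.com/some/dependency/@v/v1.2.3.info" \
      "$proxy/github.com/some/dependency/@v/v1.2.3.mod" \
      "$proxy/github.com/some/dependency/@v/v1.2.3.zip"

# With this in place, the build needs no network at all:
export GOPROXY="file://$proxy"
export GOFLAGS=-mod=readonly
ls "$proxy/github.com/some/dependency/@v"
```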
Having worked in a proprietary integration shop before, and working
with proprietary integrators today, the workflows are not so
different, except proprietary shops tend to be a lot laxer, allowing
internet access when they should not, and either processing code as
downloaded from the internet, with no checks, or forking it to death,
without trying to re-attach to upstream.
Best regards,
¹ The function of this middleman is to beat upstream code into shape
so the users do not have to do it themselves. Beating into shape does
involve large-scale changing of upstream code.
Users choose the distribution system because their alternatives are:
1. to hope every single upstream they use never makes a mistake
requiring a last-mile fix (fat chance of that)
2. to waste their time doing the last-mile fixing themselves, instead
of delegating this function to distributors.
It's all free software. An unhappy distribution user can take the
source code they need and get it integrated somewhere else. No
strings attached.
--
Nicolas Mailhot