On Tue, Jun 23, 2020 at 08:31:06PM +0200, clime wrote:
> The unanswered question is what mechanism would be used to make sure that
> the rpms from the "module" are all installed. One option would be to
> somehow mangle rpm names, another option would be to add some kind of
> Provides/Requires, etc. But *some* mechanism is needed, because without
> that dnf would often pick other rpms.
>
> In Modularity the solution is that the rpms from the module shadow
> rpms with the same name from outside. That's probably the single
> feature of Modularity that causes the most problems.
Yeah, I have noticed that modularity causes quite a lot of trouble.
The question is whether those troubles can be somehow technically
justified, i.e. by the benefit modularity will eventually bring.
But I personally don't see it. Actually, (big revelation) I have never
understood modularity and I was even there in RH when it was just
starting to be born. Probably, one of the reasons why I didn't
understand it at that point was that there seemed to be no clear
specification floating around. We knew modularity should solve "too
fast, too slow" problem of distributions but it wasn't exactly clear
how. It was a cool buzz word but nobody seemed to know what it means
(at least that was my view of the situation).
I and perhaps others were thinking that it wants to provide parallel
availability+installability of the distribution software but after
some time, it was made clear that this isn't the case and that the goal
is just "parallel availability".
OK, but even this isn't clear to me.
I mean I understand the usefulness of modularity for build-time where
you can create a recipe (modulemd) that will build your packages with
interdependencies in a predictable and automatic manner. I think
that's a cool thing to have.
But what about run-time (or install-time in other words)? That's the
part I don't understand. And I have spent quite some time trying to
understand it but never managed. So it's possible that I am missing
something all these years...in that case, it would be great if
somebody could shed some light on it for me. Here, I would really like to
give modularity a chance.
From what I understand, the use of modularity in run-time is to
provide rpm namespaces. The natural way to do this would be to use
separate repositories ala COPR where rpms are namespaced by repo ID
but I know one of the requirements of modularity was to use a single
repo for those namespaces with an argument that dnf is slow when
working with a large number of repos...to me that reason always seemed
quite artificial...something is slow...ok, then it can be made faster.
I could understand if we were talking about let's say thousands of
modules - there I would believe that initiating a thousand (or
multiplied by a few) new downloads of repo files might already have its
price. But okay, if thousands of modules were the plan, then I could
understand this argument.
But now comes an even more curious part. So...run-time modularity
provides rpm namespacing if I understand it correctly. Basically
<module>/<stream>/<package_name>. The easy solution for this would be
to put the namespace implicitly into package name like python does it
when there should be multiple pythons available, e.g. currently in
CentOS7/EPEL7, there is python34-requests and python36-requests (I
understand there will be a dot between major and minor at some point
so e.g. python3.6-requests but that's another thing :)). So if we have
different rpm names (because the namespace is already included in the
rpm name itself), then there is no problem to provide multiple
variants of the "same package" (the same thing but intended e.g. for a
different python interpreter) in the same repo.
Yes. Putting the "stream identification" into the package name is the
most natural solution, and has been floated various times. It seems that
it would even be possible to make the implementation relatively painless
using some clever macros. And this alone would give 80% of what
modularity offers on the delivery side. I know various people have been
working on prototypes and I hope we have some serious proposal in the end.
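
To make the idea concrete, here is a minimal Python sketch of putting the
stream into the package name. The `mangle` helper and its naming scheme are
hypothetical, not an existing macro or guideline; they only mimic the
python34-requests / python36-requests pattern mentioned above, and the "repo"
is just a dictionary keyed by package name.

```python
def mangle(stack: str, stream: str, component: str) -> str:
    """Hypothetical scheme: <stack><stream without dots>-<component>."""
    return f"{stack}{stream.replace('.', '')}-{component}"


# Toy repo: one flat mapping from package name to a description.
repo = {}
for stream in ("3.4", "3.6"):
    name = mangle("python", stream, "requests")
    repo[name] = f"requests built for python {stream}"

# Both variants live in the same repo because their names differ.
print(sorted(repo))
```

Since the names differ, nothing has to shadow anything else: both variants
are ordinary, distinct packages in one repo.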
So I would be willing to accept that this is a hacky solution or just
a workaround (even though I am not sure it is). But even if I accept
that it is just a hack and we need a more proper solution, I still
have an issue in my mind. Let's say we have this two-level namespacing
(<module>/<stream>/) and it enables us to have a package of the same
name twice or more times in the same repo and it enables us to avoid
mangling the rpm names. Great, isn't it? Well...but what if those
different variants of the same package are actually
parallel-installable and a user would benefit from having them
parallel-installable (because it's a dev working with different
versions of the same language at the same time)? We can only install a
package of a certain name once into the system, so modularity lets us
use only a single stream out of all available streams of a module at a
time, i.e. you can only switch between the individual streams; having
multiple of them enabled at the same time is not possible.
Yep, my personal view is that Modularity makes easy things complex,
and advanced things are labelled "out of scope".
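
The single-stream limitation described above can be illustrated with a toy
Python model. This is only a sketch under my own assumptions, not dnf's real
data structures or resolver: the `mylang` module, its streams and the
`visible` helper are all made up for illustration.

```python
# Toy model only: packages keyed by (module, stream, package name).
modular_repo = {
    ("mylang", "1", "mylang"): "mylang-1.0",
    ("mylang", "2", "mylang"): "mylang-2.0",
}


def visible(repo, enabled):
    """Return {package name: build} for the one enabled stream per module."""
    return {
        name: build
        for (module, stream, name), build in repo.items()
        if enabled.get(module) == stream
    }


enabled = {"mylang": "1"}      # a dict holds exactly one stream per module
print(visible(modular_repo, enabled))

enabled["mylang"] = "2"        # switching streams replaces the previous one
print(visible(modular_repo, enabled))
```

Because both builds share the package name `mylang`, only one of them can be
visible at a time in this model, which is the parallel-availability without
parallel-installability trade-off being discussed.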
So basically, modularity gives parallel-availability but at the same
time, it disables the option of parallel-installability which could be
achieved through alternatives and some smart packaging for probably
all the language stacks if I understand correctly. I think that's
too much of a limitation. To avoid it, we would need to keep an rpm DB
per the namespace (<module>/<stream>/) and these various DBs would be
handled by dnf, which would basically mean that the rpm command itself
wouldn't know everything that's installed on the system - hard to imagine
that people would be alright with it. ...Or we can bring the notion of
the namespaces into rpm itself (that's where my suggestion of "Stream"
rpm attribute comes from but it could also be called just
"Namespace"). But then there is the argument: "Why not just put the
namespace into rpm name itself?" I mean...I wouldn't mind having it as
a separate attribute but the usefulness of it would need to be
discussed.
I think there's some fear that "name mangling" is not a general
solution, and we'd have cases where names conflict. I think the
concern is realistic, but not a big issue in practice. With some
careful naming guidelines we are able to resolve naming conflicts, and
I'm sure we could extend the guidelines to multiple versions of language
stacks or whatever. We'd probably have slightly longer names of packages,
but that's not the end of the world.
So I don't really get even after almost five years where modularity is
going or what it wants to achieve. I don't understand its use-case for
any of Fedora, RHEL, and CentOS because disabling
parallel-installability to allow parallel availability is imho not
really an option. But yeah...maybe I am missing some angle. In that
case, please, explain it to me because I would really like to
understand...
Zbyszek