[fedora-java] Installing effective Maven POM files

Wed Feb 20 14:24:42 UTC 2013

----- Original Message -----
> From: "Mikolaj Izdebski" <mizdebsk at redhat.com>
> To: "java-devel" <java-devel at lists.fedoraproject.org>
> Sent: Wednesday, February 20, 2013 2:41:31 PM
> Subject: [fedora-java] Installing effective Maven POM files
> 
> Yesterday at the Java SIG meeting we started discussion about future
> of
> packaging of Maven artifacts in Fedora (i.e. the new guidelines).  We
> didn't get very far with discussion about the guidelines and the main
> reason for that seems to be that the new way of packaging Maven
> artifacts described in the guidelines causes effective POM files to
> be
> installed in place of raw POM files.
> 
> Let me begin with my apologies for not communicating this matter
> earlier, but I didn't consider this change as controversial and I
> didn't think that anyone would have any problems with it. Because
> there were many doubts (and even accusations) I'll try to describe in
> detail why in my opinion this change is needed.

If there was an accusation from my side, it was entirely about the way the change was introduced: it was not discussed before being made, and it creates real-world problems with no benefit for some of us. The explanation follows inline further down in this mail.

> 
> 
> The current problem
> ===================
> 
> Maven has advanced and powerful dependency mechanism. It has such
> features as dependency scopes, exclusions, and optional dependencies
> (if you want you can get more detailed information about them in
> Maven
> documentation [5], describing them here wouldn't make much sense).
> 
> On the other hand RPM has much simpler dependency mechanism. You can
> either require some package along with all its requirements or not
> require it at all. There is no way to express different dependency
> scopes in RPM packages.
> 
> Let me give you a simple example. A and B are JAR artifacts, X is a
> POM artifact (aka parent POM), P is some Maven plugin.
> 
>   artifact B requires artifact A (scope: compile)
>   artifact A inherits from artifact X
>   artifact X requires plugin P
> 
> And some ASCII-art graph (feel free to skip it if you don't like it
> or if it's unreadable for you; all the information represented in
> the graph is present in the text too).
> 
>          ,---.    ,---.
>          | X |--->| P |
>          `---'    `---'
>            ^
>            |
> ,---.    ,---.
> | B |--->| A |
> `---'    `---'
> 
> In this case it is no wonder that package B should require package
> A because it's needed even at runtime. The question whether A should
> require X or if X should require P is more problematic. There are 4
> possibilities in this case which I'll try to describe in more detail.
> 
> Case 1. A requires X and X requires P. In this case if you want to
> install package B you'll have to install P as a transitive
> dependency. All dependencies of P will be installed too. For example
> Maven plugins require Maven, but possibly much more. This solution is
> not good because it causes many unneeded packages to be installed.
> 
> Case 2. A does not require X (and X does require P). Now you get
> correct runtime dependencies from perspective of package B. But to
> build package B you'll need to manually add BuildRequires: X because
> Maven would otherwise fail to resolve artifact X which is referenced
> from POM A. This solution is not good because (1) you'd need to
> manually specify BuildRequires: X in the spec file of package B and
> (2) plugin P is installed when building package B even though this
> plugin is not needed.
> 
> Case 3: X does not require P (but A does require X). Now when
> installing B you get only a single unneeded dependency (i.e. package
> X). We could live with that, especially because X is a small POM-only
> package. Building B is simple - you don't need to specify extra
> BuildRequires. However in this case to build package A you need to
> add
> BuildRequires: P in spec file of package A. If package X changes (for
> example a plugin is added or removed) then you need to change package
> A too (to add or remove respective BuildRequires).  This is error
> prone and tedious.
> 
> Case 4, in which A does not require X and X does not require P, is
> just a combination of cases 2 and 3. It adds no benefits and combines
> their disadvantages. For this reason this case is not acceptable.
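
The four cases can be checked mechanically. What follows is a minimal sketch only: it models RPM Requires as a plain graph, uses the package names A, B, X and P from the example, and the helper names closure() and install_set() are hypothetical. It shows what installing B drags in for each combination of "A requires X" and "X requires P".

```python
# Toy model of the A/B/X/P example. Package names come from the text;
# the helpers and the data layout are illustrative assumptions only.

def closure(start, requires):
    """Return every package reachable from `start` through Requires."""
    seen = set()
    stack = list(requires.get(start, ()))
    while stack:
        pkg = stack.pop()
        if pkg not in seen:
            seen.add(pkg)
            stack.extend(requires.get(pkg, ()))
    return seen

def install_set(a_requires_x, x_requires_p):
    """Packages pulled in by installing B for one of the four cases."""
    requires = {"B": ["A"], "A": [], "X": [], "P": []}
    if a_requires_x:
        requires["A"].append("X")
    if x_requires_p:
        requires["X"].append("P")
    return closure("B", requires)

cases = {
    1: install_set(True, True),    # A requires X, X requires P
    2: install_set(False, True),   # only X requires P
    3: install_set(True, False),   # only A requires X
    4: install_set(False, False),  # neither
}
for number, pkgs in cases.items():
    print(f"case {number}: installing B pulls in {sorted(pkgs)}")
```

Only case 1 pulls in P at install time and only cases 1 and 3 pull in X. The sketch deliberately ignores the build-time side, which is where cases 2, 3 and 4 need manually maintained BuildRequires.
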
> 
> To summarize: cases 1 and 4 are unacceptable. Cases 2 and 3 could
> work, but have major disadvantages. These disadvantages get much more
> problematic as the number of involved packages increases. Both cases
> require maintaining BuildRequires in places different from where they
> arise. When updating a single package one would need to investigate
> if any of the related packages need updating. If you don't update
> related packages then you get inaccurate requires (which itself leads
> to build failures, dependency bloat, or both at the same time).
> 
> I tried to improve the situation of Maven packaging in Fedora and I
> have thought of many different solutions. I can't really explain to you
> in one email (which gets too long anyway) all the possibilities I
> considered in the whole 6-month process of designing and implementing
> XMvn. Explaining at least some of them would require knowledge about
> Maven internals. Instead I'll propose a solution which in my
> judgement
> is the best and try to show that it's better than current situation.
> 
> [5]
> http://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html
> 
> 
> Proposed solution - effective POMs
> ==================================
> 
> First let me explain what I mean by "effective POM". Effective POM is
> basically a POM with included metadata from all ancestor POMs (parent
> POM, parent of parent and so on). Effective POMs don't need to
> explicitly declare parent POM because they have all settings copied
> from ancestors. Inheriting from parent would be a NOP.
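
A rough illustration of that definition, under loud assumptions: POMs are modelled as plain dictionaries rather than real Maven model objects, effective_pom() is a hypothetical helper, and the groupId "org.example" and the dependencies D and E are invented for the example. Real Maven inheritance has many more rules than this.

```python
# Simplified illustration: POMs as plain dicts, not real Maven model objects.
# "org.example", "D" and "E" are made-up values for demonstration.

def effective_pom(artifact, poms):
    """Flatten `artifact`'s POM with all of its ancestor POMs."""
    chain = []
    current = artifact
    while current is not None:
        pom = poms[current]
        chain.append(pom)
        current = pom.get("parent")

    merged = {"dependencies": []}
    # Apply the most distant ancestor first so nearer POMs override it.
    for pom in reversed(chain):
        for key, value in pom.items():
            if key == "parent":
                continue  # the effective POM no longer references a parent
            if key == "dependencies":
                merged[key] = merged[key] + [d for d in value if d not in merged[key]]
            else:
                merged[key] = value
    return merged

poms = {
    "X": {"groupId": "org.example", "dependencies": ["D"]},
    "A": {"parent": "X", "artifactId": "A", "dependencies": ["E"]},
}
eff = effective_pom("A", poms)
print(eff)
```

The flattened result carries the groupId and dependency inherited from X but has no "parent" entry, which is the property that lets package A stop requiring package X.
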
> 
> So what happens in the previous example if package A installed
> effective POM instead of raw (upstream) POM? The most important
> consequence is that we can simplify requires:
> 
>          ,---.    ,---.
>          | X |--->| P |
>          `---'    `---'
> 
> ,---.    ,---.
> | B |--->| A |
> `---'    `---'
> 
> All requires are accurate now (minimal and correct).
> 
> 1) Package A no longer needs to require X because POM A is effective
> and doesn't reference POM X.
> 
> 2) Installing binary packages (A or B) doesn't bring any unneeded
> dependencies on parent POM packages or Maven plugins.
> 
> 3) Building package B doesn't bring plugin P (which is not needed to
> build B).
> 
> 4) Building package A automatically brings plugin P (which is needed
> to build A). Plugin P is installed automatically because A
> BuildRequires X, which pulls in P.
> 
> 5) All packages declare Requires or BuildRequires only for things
> that are specified in their own POMs. With this solution you don't
> *ever* need to declare Requires or BuildRequires on dependencies
> added in POMs from other packages.
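
Points 1 to 4 can be reproduced with a small graph walk. This is a sketch under assumptions, not how RPM or XMvn resolve anything: the two tables below encode only the relations named in the list (B Requires A, X Requires P, A BuildRequires X, B BuildRequires A), further dependencies of P such as Maven itself are left out, and the helper names are hypothetical.

```python
# Toy model of the requires graph once A installs an effective POM.
# Only relations named in the text are encoded; helpers are illustrative.

def closure(roots, requires):
    """Return all packages reachable from `roots` through Requires."""
    seen = set()
    stack = list(roots)
    while stack:
        pkg = stack.pop()
        if pkg not in seen:
            seen.add(pkg)
            stack.extend(requires.get(pkg, ()))
    return seen

requires = {"B": ["A"], "A": [], "X": ["P"], "P": []}
build_requires = {"A": ["X"], "B": ["A"]}

def install_deps(pkg):
    """Packages installed alongside `pkg`."""
    return closure(requires[pkg], requires)

def build_deps(pkg):
    """Packages present in the build root when building `pkg`."""
    return closure(build_requires[pkg], requires)

print(sorted(install_deps("B")))  # ['A']
print(sorted(build_deps("B")))    # ['A']
print(sorted(build_deps("A")))    # ['P', 'X']
```

Neither installing nor building B touches X or P, while building A gets P through X without naming P in A's spec file.
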
> 
> Let me highlight two things:
> 
> 1) With this solution effective POMs need to be installed only in
> packages shipping binaries (like A). POM-only packages (like X) still
> install raw POMs.
> 
> 2) As I showed, the noticeable improvement in dependencies is a
> direct
> consequence of installing effective POM instead of raw POM in package
> A. Any proposal like "let's keep XMvn but revert installing effective
> POMs" would nullify the benefit gained - pretty much the whole reason
> for using XMvn and automated dependency generation in the first place.
> 
> Some time ago automated Maven and OSGi provides and requires
> generation was implemented. It was fully enabled for OSGi (mainly
> because OSGi dependencies are much simpler than Maven), but
> auto-requires for Maven artifacts were not enabled. They were
> disabled
> because we didn't install effective POMs and without that generated
> automatic requires wouldn't be sane (as I showed above).
> 
> 
> Effective POMs - evil or not
> ============================
> 
> There were several matters related to effective POMs touched at the
> meeting (and after). I'll try to comment on them.
> 
> 1. "Effective POMs are bundling other POMs and because of that they
> need to be forbidden in Fedora."
> 
> Explanation: The only thing that needs to be copied from parent POM
> is simple metadata. To be more specific - groupId, dependency
> artifact
> names and extension artifact names. No code is bundled or anything
> like that. Only simple, small metadata that would otherwise have to
> be
> included in form of package Requires or BuildRequires manually in
> order to get things working.
> 
> 2. "Effective POMs are unreadable."
> 
> Explanation: Effective POMs aren't meant to be read by people; they
> are meant for machine processing. You can install raw POMs next to
> effective POMs if you feel there is a need to have them for people to
> read (as a form of documentation). Objecting to unreadability is like
> saying that we should install C code and interpret it instead of
> installing machine code because the latter is unreadable.
> 
> 3. "Using effective POMs breaks compatibility of Fedora system
> artifact
> repository with upstream Maven."
> 
> Explanation: First of all, our repository has different structure
> from upstream Maven and there is no way to directly use it from
> upstream Maven. Secondly, effective POMs are valid POMs (not some
> custom format) and as such they can be parsed and used by unmodified
> upstream Maven. Installing effective POMs instead of raw POMs doesn't
> bring any change in terms of compatibility with upstream Maven.
> 
> 4. "If there is a bug in parent POM then all dependent packages have
> to be rebuilt to fix the bug."
> 
> Explanation: If the bug is not in the declared dependencies then
> dependent packages don't need to be rebuilt, because all metadata
> besides dependencies is meaningless when used by an effective POM in
> an installed binary package (in future all the meaningless data will
> be stripped off to reduce POM size and improve readability). But if
> the bug is in dependencies declared in the POM then dependent
> packages would have to be rebuilt anyway, no matter if raw or
> effective POMs are used. A rebuild is needed because if a dependency
> in the parent POM changed then this change needs to be reflected in
> updated Requires or BuildRequires of other packages. If you don't
> update dependent packages then you silently introduce packaging bugs.
> 
> 5. "If some packages install effective POMs and some raw POMs then
> dependencies become incorrect."
> 
> Explanation: That is simply not true. Mixing effective POMs and raw
> POMs in Fedora could expose *existing* packaging bugs in other
> packages. There are cases where packages don't declare all of their
> dependencies but people don't experience those bugs because other
> packages have excessive dependencies that cover the missing requires.
> Reducing dependencies to a minimum creates a possibility that fixing
> one bug (excessive dependencies) can expose another bug in a
> different package. Using effective POMs doesn't introduce new bugs by
> itself. Moreover, having both effective and raw POMs in the
> distribution is hopefully a transitional state and I hope that at
> some point in future all non-POM packages will install effective
> POMs.
> 
> 6. "Installing effective POMs instead of raw POMs is a deviation from
> upstream."
> 
> The reason why POM files are installed with Fedora packages is that
> they need to be processed automatically by Maven during the build of
> dependent packages. Internally Maven creates effective POM very soon
> in the build process and uses this effective POM during the
> build. Hence installing effective POMs has the same semantics as
> installing raw POMs. The difference is that our installed POMs will
> have a bit different structure, possibly won't include all the
> information (which would be meaningless in that context anyways) and
> won't be byte-identical to upstream POMs. But again, POMs should be
> treated as data for machine processing, so different structure is
> acceptable as long as semantics are the same. It's like in Fedora we
> install JAR files (or .so files) different (not bit-identical) from
> upstream, but with the same semantics (the same runtime behaviour;
> implementing exactly the same algorithm because they are compiled
> from
> the same source code).

Now to the problems with embedding POMs, aka effective POMs. There are parent POMs, such as http://git.eclipse.org/c/platform/eclipse.platform.releng.aggregator.git/tree/eclipse-parent/pom.xml , which contain far more information than one would think: supported architectures, the set of bundles in a bundle pool to be used when running some tasks, and so on. Now consider that this is the parent POM of a few Maven plugins such as maven-cbi-plugin, eclipse-jarsigner-plugin and others. The parent POM gets embedded in each plugin's POM, and I want to add one more arch (e.g. arm64). In this case I would have to rebuild all the plugins (even this is unacceptable) so that they get the arm64 arch embedded in them. While this might work if everything is noarch and I do it on the primary archs, it becomes impossible if there are arch-specific parts (and yes, we do have them: calling Eclipse and so on), because I would need to run the cbi to build the jarsigner, but the cbi has embedded configuration saying that arm64 is not supported, and hence the build fails. So in this case we end up with no way to do something as simple as adding one more arch, which previously was just a matter of patching the parent, after which everything worked.
Another problem is the benefit of effective POMs: it is questionable if you don't rely on Maven to handle your dependencies (as in the OSGi case), where it means a lot of additional work for no direct benefit.
P.S. The actual plugins go deep inside Eclipse and the cycles are a bit bigger, so this is a simplified description. But it's a real-world problem. We have spent a great deal of time making the bootstrapping and the cycles needed to add support for an additional arch easy, so such a change is a major drawback.
P.S. 2 The plugins in question are not yet in Fedora separately but will be soon, and they are still used as part of other builds.

Alexander Kurtakov
Red Hat Eclipse team
> 
> 
> I hope that this explains why installing effective POMs is needed and
> covers most of concerns about them. If you have any questions or
> comments, please feel free to comment. Contrary to what some people
> seem
> to believe, I do not refuse to listen to any concerns about stuff
> related to Maven in Fedora and I would appreciate any (constructive)
> criticism or comments.
> 
> --
> Mikolaj Izdebski