[fedora-java] Installing effective Maven POM files

Wed Feb 20 12:41:31 UTC 2013

Yesterday at the Java SIG meeting we started discussing the future of
packaging Maven artifacts in Fedora (i.e. the new guidelines).  We
didn't get very far with the guidelines, and the main reason seems to
be that the new way of packaging Maven artifacts they describe causes
effective POM files to be installed in place of raw POM files.

Let me begin by apologizing for not communicating this matter
earlier; I didn't consider this change controversial and I didn't
think that anyone would have any problems with it. Because there were
many doubts (and even accusations), I'll try to describe in detail why
in my opinion this change is needed.

The current problem
===================

Maven has an advanced and powerful dependency mechanism. It has such
features as dependency scopes, exclusions, and optional dependencies
(you can find more detailed information about them in the Maven
documentation [5]; describing them here wouldn't make much sense).

RPM, on the other hand, has a much simpler dependency mechanism. You
can either require a package along with all of its requirements or not
require it at all. There is no way to express different dependency
scopes in RPM packages.

Let me give you a simple example. A and B are JAR artifacts, X is a
POM artifact (aka parent POM), P is some Maven plugin.

  artifact B requires artifact A (scope: compile)
  artifact A inherits from artifact X
  artifact X requires plugin P

And an ASCII-art graph (feel free to skip it if you don't like it or
if it's unreadable for you; all the information represented in the
graph is present in the text too).

         ,---.    ,---.
         | X |--->| P |
         `---'    `---'
           ^
           |
,---.    ,---.
| B |--->| A |
`---'    `---'

In this case it is no wonder that package B should require package A,
because it's needed even at runtime. The question of whether A should
require X and whether X should require P is more problematic. There
are 4 possibilities here, which I'll describe in more detail.

Case 1. A requires X and X requires P. In this case if you want to
install package B you'll have to install P as a transitive
dependency. All dependencies of P will be installed too. For example
Maven plugins require Maven, but possibly much more. This solution is
not good because it causes many unneeded packages to be installed.

Case 2. A does not require X (and X does require P). Now you get
correct runtime dependencies from the perspective of package B. But to
build package B you'll need to manually add BuildRequires: X, because
Maven would otherwise fail to resolve artifact X, which is referenced
from POM A. This solution is not good because (1) you'd need to
manually specify BuildRequires: X in the spec file of package B and
(2) plugin P is installed when building package B even though this
plugin is not needed.

Case 3. X does not require P (but A does require X). Now when
installing B you get only a single unneeded dependency (i.e. package
X). We could live with that, especially because X is a small POM-only
package. Building B is simple - you don't need to specify extra
BuildRequires. However, in this case to build package A you need to
add BuildRequires: P in the spec file of package A. If package X
changes (for example a plugin is added or removed) then you need to
change package A too (to add or remove the respective BuildRequires).
This is error-prone and tedious.

Case 4. A does not require X and X does not require P. This is just a
combination of cases 2 and 3. It adds no benefits and combines their
disadvantages. For this reason this case is not acceptable.

To summarize: cases 1 and 4 are unacceptable. Cases 2 and 3 could
work, but have major disadvantages. These disadvantages get much more
problematic as the number of involved packages increases. Both cases
require maintaining BuildRequires in a different place from where they
arise. When updating a single package one would need to investigate
whether any related packages need updating. If you don't update the
related packages then you get inaccurate requires (which leads to
build failures, dependency bloat, or both at the same time).
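
To make the four cases easier to compare, here is a rough Python
sketch that computes what gets installed together with package B in
each of them. It is only a toy model: the dictionaries and the
install_closure() helper are hypothetical names made up for this
illustration, and real RPM dependency resolution is more involved.

```python
def install_closure(package, requires):
    """Return every package pulled in when `package` is installed."""
    seen = set()
    stack = [package]
    while stack:
        current = stack.pop()
        for dep in requires.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


# Hypothetical Requires tables for the four cases; B always requires A.
CASES = {
    1: {"B": ["A"], "A": ["X"], "X": ["P"]},  # A requires X, X requires P
    2: {"B": ["A"], "A": [], "X": ["P"]},     # A does not require X
    3: {"B": ["A"], "A": ["X"], "X": []},     # X does not require P
    4: {"B": ["A"], "A": [], "X": []},        # neither requirement exists
}

for number, requires in CASES.items():
    print(number, sorted(install_closure("B", requires)))
# 1 ['A', 'P', 'X']
# 2 ['A']
# 3 ['A', 'X']
# 4 ['A']
```

The runtime picture looks fine for cases 2 and 4, but the sketch does
not show the cost described above: the missing requirements have to be
added by hand as BuildRequires in other spec files.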

I tried to improve the situation of Maven packaging in Fedora and I
have thought of many different solutions. I can't really explain to
you in one email (which is getting too long anyway) all the
possibilities I considered in the whole 6-month process of designing
and implementing XMvn. Explaining at least some of them would require
knowledge of Maven internals. Instead I'll propose the solution which
in my judgement is the best and try to show that it's better than the
current situation.

[5] http://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html

Proposed solution - effective POMs
==================================

First let me explain what I mean by "effective POM". An effective POM
is basically a POM that includes the metadata from all ancestor POMs
(parent POM, parent of parent and so on). Effective POMs don't need to
explicitly declare a parent POM because they have all settings copied
from their ancestors. Inheriting from the parent would be a no-op.
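
As an illustration, the flattening can be sketched in a few lines of
Python. This is a heavily simplified model and not how Maven or XMvn
implement it: POMs are plain dictionaries, only dependencies and
plugins are merged, and effective_pom() and RAW_POMS are hypothetical
names invented for this example. (Upstream Maven can print the real
effective POM of a project with "mvn help:effective-pom".)

```python
def effective_pom(artifact_id, raw_poms):
    """Flatten one raw POM with all of its ancestors (simplified model)."""
    # Collect the inheritance chain: the POM itself, its parent, and so on.
    chain = []
    current = artifact_id
    while current is not None:
        pom = raw_poms[current]
        chain.append(pom)
        current = pom.get("parent")

    # The result has no "parent" key: everything inherited is copied in.
    merged = {"artifactId": artifact_id, "dependencies": [], "plugins": []}
    # Walk from the most distant ancestor down to the POM itself.
    for pom in reversed(chain):
        for key in ("dependencies", "plugins"):
            for item in pom.get(key, []):
                if item not in merged[key]:
                    merged[key].append(item)
    return merged


# Raw POMs from the example: A inherits from X, X declares plugin P.
RAW_POMS = {
    "X": {"parent": None, "plugins": ["P"]},
    "A": {"parent": "X", "dependencies": []},
    "B": {"parent": None, "dependencies": ["A"]},
}

print(effective_pom("A", RAW_POMS))
# {'artifactId': 'A', 'dependencies': [], 'plugins': ['P']}
```

The flattened POM of A no longer mentions X at all, which is the
property the rest of this proposal relies on.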

So what happens in the previous example if package A installs an
effective POM instead of the raw (upstream) POM? The most important
consequence is that we can simplify requires:

         ,---.    ,---.
         | X |--->| P |
         `---'    `---'

,---.    ,---.
| B |--->| A |
`---'    `---'

All requires are accurate now (minimal and correct).

1) Package A no longer needs to require X because POM A is effective
and doesn't reference POM X.

2) Installing binary packages (A or B) doesn't bring any unneeded
dependencies on parent POM packages or Maven plugins.

3) Building package B doesn't bring plugin P (which is not needed to
build B).

4) Building package A automatically brings plugin P (which is needed
to build A). Plugin P is installed automatically because A
BuildRequires X, which pulls in P.

5) Every package declares Requires or BuildRequires only for things
specified in its own POM. With this solution you don't *ever* need to
declare Requires or BuildRequires for dependencies added in POMs from
other packages.
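
The five points above can be checked against a small Python model of
the example. Please treat it as a sketch under assumptions rather than
a description of the actual XMvn generators: the POMS table, the
POM_ONLY set and all function names are hypothetical, and the
dependencies of plugin P itself (Maven and so on) are left out to keep
it short.

```python
# Hypothetical model of the example; P's own dependencies are omitted.
POMS = {
    "X": {"parent": None, "dependencies": [], "plugins": ["P"]},
    "A": {"parent": "X", "dependencies": [], "plugins": []},
    "B": {"parent": None, "dependencies": ["A"], "plugins": []},
    "P": {"parent": None, "dependencies": [], "plugins": []},
}
POM_ONLY = {"X"}  # packages that keep installing raw POMs


def requires(name):
    """Requires derived from the POM that the package installs."""
    pom = POMS[name]
    deps = list(pom["dependencies"])
    if name in POM_ONLY:
        # A raw parent POM still needs what it declares for its children.
        deps += pom["plugins"]
        if pom["parent"]:
            deps.append(pom["parent"])
    return deps


def build_requires(name):
    """BuildRequires derived from the package's own raw POM only."""
    pom = POMS[name]
    deps = list(pom["dependencies"]) + list(pom["plugins"])
    if pom["parent"]:
        deps.append(pom["parent"])
    return deps


def closure(roots):
    """Everything installed when the given packages are requested."""
    seen, stack = set(), list(roots)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(requires(current))
    return seen


def buildroot(name):
    """Packages present while building `name`."""
    return closure(build_requires(name))


print(sorted(closure(["B"])))  # ['A', 'B']  no X, no P at runtime
print(sorted(buildroot("B")))  # ['A']       P is not pulled in
print(sorted(buildroot("A")))  # ['P', 'X']  P arrives through X
```

Note that no entry in this model mentions a dependency that comes from
another package's POM; P reaches the build of A only through the
Requires of X.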

Let me highlight two things:

1) With this solution effective POMs need to be installed only in
packages shipping binaries (like A). POM-only packages (like X) still
install raw POMs.

2) As I showed, the noticeable improvement in dependencies is a direct
consequence of installing an effective POM instead of a raw POM in
package A. Any proposal like "let's keep XMvn but revert installing
effective POMs" would nullify the benefit gained - pretty much the
whole reason for using XMvn and automated dependency generation in the
first place.

Some time ago automated generation of Maven and OSGi provides and
requires was implemented. It was fully enabled for OSGi (mainly
because OSGi dependencies are much simpler than Maven's), but
auto-requires for Maven artifacts were not enabled. They were disabled
because we didn't install effective POMs, and without effective POMs
the automatically generated requires wouldn't be sane (as I showed
above).

Effective POMs - evil or not
============================

There were several matters related to effective POMs touched at the
meeting (and after). I'll try to comment on them.

1. "Effective POMs are bundling other POMs and because of that they
need to be forbidden in Fedora."

Explanation: The only thing that needs to be copied from the parent
POM is simple metadata. To be more specific - groupId, dependency
artifact names and extension artifact names. No code is bundled or
anything like that. Only simple, small metadata that would otherwise
have to be added manually as package Requires or BuildRequires in
order to get things working.

2. "Effective POMs are unreadable."

Explanation: Effective POMs aren't meant to be read by people; they
are meant for machine processing. You can install raw POMs next to
effective POMs if you feel there is a need to have them for people to
read (as a form of documentation). This objection is like saying that
we should install C code and interpret it instead of installing
machine code because the latter is unreadable.

3. "Using effective POMs breaks compatibility of Fedora system artifact
repository with upstream Maven."

Explanation: First of all, our repository has a different structure
from upstream Maven's and there is no way to use it directly from
upstream Maven. Secondly, effective POMs are valid POMs (not some
custom format) and as such they can be parsed and used by unmodified
upstream Maven. Installing effective POMs instead of raw POMs doesn't
bring any change in terms of compatibility with upstream Maven.

4. "If there is a bug in parent POM then all dependant packages have to
be rebuilt to fix the bug."

Explanation: If the bug is not in the declared dependencies then
dependant packages don't need to be rebuilt, because all metadata
besides dependencies is meaningless when used by the effective POM in
an installed binary package (in the future all the meaningless data
will be stripped to reduce POM size and improve readability). But if
the bug is in dependencies declared in the POM then dependant packages
would have to be rebuilt anyway, no matter whether raw or effective
POMs are used. A rebuild is needed because if a dependency in the
parent POM changed then this change needs to be reflected in updated
Requires or BuildRequires of other packages. If you don't update
dependant packages then you silently introduce packaging bugs.

5. "If some packages install effective POMs and some raw POMs then
dependencies become incorrect."

Explanation: That is simply not true. Mixing effective POMs and raw
POMs in Fedora could expose *existing* packaging bugs in other
packages. There are cases where packages don't declare all of their
dependencies, but people don't experience those bugs because other
packages have excessive dependencies that cover the missing requires.
Reducing dependencies to the minimum creates a possibility that fixing
one bug (excessive dependencies) exposes another bug in a different
package. Using effective POMs doesn't introduce new bugs by itself.
Moreover, having both effective and raw POMs in the distribution is
hopefully a transitional state and I hope that at some point in the
future all non-POM packages will install effective POMs.

6. "Installing effective POMs instead of raw POMs is a deviation from
upstream."

Explanation: The reason why POM files are installed with Fedora
packages is that Maven needs to process them automatically during the
build of dependant packages. Internally Maven creates the effective
POM very early in the build process and uses this effective POM during
the build. Hence installing effective POMs has the same semantics as
installing raw POMs. The difference is that our installed POMs will
have a slightly different structure, possibly won't include all the
information (which would be meaningless in that context anyway) and
won't be byte-identical to upstream POMs. But again, POMs should be
treated as data for machine processing, so a different structure is
acceptable as long as the semantics are the same. It's like the JAR
files (or .so files) we install in Fedora: they are different (not
bit-identical) from upstream, but have the same semantics (the same
runtime behaviour, implementing exactly the same algorithm because
they are compiled from the same source code).

I hope that this explains why installing effective POMs is needed and
covers most of the concerns about them. If you have any questions or
comments, please feel free to share them. Contrary to what some people
seem to believe, I do not refuse to listen to any concerns about stuff
related to Maven in Fedora and I would appreciate any (constructive)
criticism or comments.

--
Mikolaj Izdebski