Is it time to remove the existing Big Data packages from Fedora?

Christopher ctubbsii-fedora at apache.org
Mon Aug 3 16:38:26 UTC 2015


On Sat, Aug 1, 2015 at 3:32 AM, Javi Roman <jroman.espinar at gmail.com> wrote:
> On Sat, Aug 1, 2015 at 1:22 AM, Christopher <ctubbsii-fedora at apache.org> wrote:
>> On Fri, Jul 31, 2015 at 10:27 AM, Gerald Henriksen <ghenriks at gmail.com> wrote:
[snip]
>
> The problem here is that big data tools, for example Apache Flume or
> Apache Spark, are evolving quickly with important new features, but
> some of the libraries those tools use (for example, Java artifacts in
> Flume) are frozen at a version because it works for the developer. In
> Fedora, some of those libraries keep moving to newer versions (the
> latest from the upstream project), and the affected big data tool
> breaks at compile time.

That's true of many upstream projects. That problem is not unique to
Big Data, though I will concede it seems to be a larger problem with
Java projects, in general. This is where the downstream, like Fedora,
can provide a huge benefit to users: it can ship versions of projects
that have been updated with the latest known security and stability
fixes, and it can provide dependency convergence to ensure
compatibility between components, and API stability/reliability that
upstream doesn't necessarily care about.

Where the upstream project breaks when using up-to-date versions of
libraries, Fedora has an opportunity (and, perhaps, a responsibility)
to give back to the upstream community.

>
> So we have two options:
>
> 1. The package maintainer (of the tools) must patch the tool over
> and over to adapt the code to the new interfaces, which is sometimes
> a very complex task and makes it very difficult to follow the Fedora
> release cycle.
>
> 2. Try to convince the upstream developer to update the version of
> the library used by the tool. This is very unlikely to succeed, because
> the developer is focused on new features and is not thinking about
> changing the way the tool writes to the log (log4j), for example.
>
> I think we cannot easily solve this situation without multi-version
> packages. On the other hand we can help with things such as FHS,
> environment variables, weird shell scripts and so forth.

To some extent, Fedora has employed each of these strategies already:
repeated patching, convincing upstream, and supporting multiple
versions. If a particular package is problematic, or the maintainer
isn't up to the task of deciding which combination of strategies is
best... well, that's where co-maintainers can be a really big help.
I'd only support dropping packages if no maintainers can be found who
are willing to support them to the extent they need. Isn't that how
things typically work in Fedora?

