On Sat, 1 Aug 2015 11:48:59 -0400, you wrote:
> On Sat, Aug 01, 2015 at 09:32:22AM +0200, Javi Roman wrote:
>> The problem here is that the big data tools, for example Apache
>> Flume or Apache Spark, are evolving quickly with important new
>> features, while some of the libraries those tools depend on (for
>> example the Java artifacts in Flume) are pinned to frozen versions,
>> because that is what works for the developers. In Fedora some of
>> those libraries keep moving to newer versions (the most up-to-date
>> releases from the upstream projects), and the affected big data
>> tool then breaks at compile time.
> This isn't a problem just with big data tools; many developers want
> to pin library versions this way. The problem is that carrying all
> of those multiple versions becomes a maintenance nightmare. When
> there's a security problem, how do you identify which packages have
> a library with the problem? Which versions are affected? Does the
> same patch apply to them all? Who fixes it if it doesn't?
I agree it is a packaging nightmare, but the problem is that there is
no ideal solution.
The best solution is to continue what was started, which means
packaging Hadoop and friends with a lot of work (and patches) to make
things build against the newer libraries that are in Fedora.
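To give a concrete idea of what those patches fix: the classic
example, if I remember the versions right, is Guava. Hadoop 2.x pins
Guava 11 and uses APIs that later Guava releases removed, so code
like this compiles against the bundled Guava but not against the one
Fedora ships:

import com.google.common.base.Stopwatch;

public class ScanTimer {
    public static void main(String[] args) {
        // Fine against the Guava 11 that Hadoop 2.x pins:
        Stopwatch watch = new Stopwatch().start(); // ctor removed in Guava 17
        long ms = watch.elapsedMillis();           // removed in Guava 16
        System.out.println("elapsed: " + ms + " ms");
        // Against a current Guava this no longer compiles; the patch
        // has to switch to Stopwatch.createStarted() and
        // elapsed(TimeUnit) instead.
    }
}

Multiply that by every library Fedora has moved forward, for every
tool in the stack.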
The problem here is that even if we get everything packaged, we
apparently already have feedback from the Big Data community that
they don't trust, and hence won't use, the packages people have put
so much effort into making. While no reason was given, I am guessing
it is at least in part because nothing is tested against those newer
library versions, and the attitude on the Hadoop mailing lists will
be to shrug and say: download the "official" version of Hadoop, or of
whatever part of the ecosystem is being problematic.
Which is why the idea of simply creating a Docker container is so
attractive. You remove all the time spent either creating and
maintaining the patched versions above, or alternatively packaging
and maintaining the older libraries. We in essence throw the security
issue back to the upstream developers, and simply update the Docker
container with every Hadoop/Spark/etc release.
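Something like this sketch is all it would take (the Hadoop version
and URL are just for illustration, and a real image would obviously
need actual configuration):

FROM fedora:22
# Install a JRE plus fetch tools, then drop in the upstream binary
# release unmodified instead of rebuilding it against Fedora's
# newer libraries.
RUN dnf -y install java-1.8.0-openjdk-headless curl tar \
 && dnf clean all
RUN curl -fsSL https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz \
      | tar -xz -C /opt \
 && ln -s /opt/hadoop-2.7.1 /opt/hadoop
ENV HADOOP_HOME=/opt/hadoop PATH=$PATH:/opt/hadoop/bin
# A "security update" then just means rebuilding and republishing
# the image whenever upstream cuts a release.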
Is it a great solution? Not really, but unless upstream changes,
there are no great solutions.