----- Original Message -----
From: "Gerald Henriksen" <ghenriks(a)gmail.com>
To: "Fedora Big Data SIG" <bigdata(a)lists.fedoraproject.org>
Sent: Saturday, August 1, 2015 10:33:04 PM
Subject: Re: Is it time to remove the existing Big Data packages from Fedora?
On Sat, 1 Aug 2015 11:48:59 -0400, you wrote:
>On Sat, Aug 01, 2015 at 09:32:22AM +0200, Javi Roman wrote:
>> The problem here is that big data tools such as Apache Flume and
>> Apache Spark evolve quickly, with important new features, while some
>> of the libraries those tools depend on (for example, the Java
>> artifacts in Flume) are pinned to frozen versions, because those
>> versions work for the developers. In Fedora, some of those libraries
>> are updated to the latest upstream versions, and the affected big
>> data tool then breaks at compile time.
>
>This isn't a problem just with big data tools. Many developers want to
>do this. The problem is that having all of those multiple versions
>becomes a maintenance nightmare. When there's a security problem, how
>do you identify which packages have a library with the problem? Which
>versions are affected? Does the same patch apply to them all? Who fixes
>it if it doesn't?
I agree it is a packaging nightmare, but there is no ideal solution.
The best solution would be to continue what was started: packaging
Hadoop and friends, with a lot of work (and patches) to make
everything build against the newer libraries that are in Fedora.
The problem is that even if we get everything packaged, we apparently
already have feedback from the Big Data community that they don't
trust, and hence won't use, the packages people have put so much
effort into making. While no reason was given, I am guessing it is at
least in part because nothing is tested against those newer library
versions, and the response on the Hadoop mailing lists will be a shrug
of the shoulders and a suggestion to download the "official" version
of Hadoop, or of whatever part of the ecosystem is being problematic.
Which is why the idea of simply creating a Docker container is so
attractive. You remove all the time spent either creating and
maintaining the patched versions described above, or alternatively
packaging and maintaining older libraries. We in essence hand the
security issue back to the upstream developers, and simply update the
Docker container with every Hadoop/Spark/etc. release.
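A minimal sketch of that approach, assuming Fedora 22 as the base
image and the upstream Hadoop 2.7.1 release tarball (both are
illustrative choices, not a blessed Fedora deliverable):

```dockerfile
# Illustrative only: ship an unmodified upstream Hadoop release inside
# a container image, rather than patching it to build against Fedora's
# newer Java libraries. Version numbers and paths are assumptions.
FROM fedora:22

# Runtime dependencies for the upstream binary tarball.
RUN dnf -y install java-1.8.0-openjdk-headless curl tar gzip && dnf clean all

# Unpack the unmodified upstream release under /opt.
RUN curl -fsSL https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz \
    | tar -xz -C /opt && ln -s /opt/hadoop-2.7.1 /opt/hadoop

ENV HADOOP_HOME=/opt/hadoop \
    JAVA_HOME=/usr/lib/jvm/jre

CMD ["/opt/hadoop/bin/hadoop", "version"]
```

A security fix then means bumping the version in the `curl` line and
rebuilding the image, instead of rebasing a stack of Fedora patches.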
Is it a great solution? Not really, but unless upstream changes there
are no great solutions.
IMHO it is the only tenable solution.
_______________________________________________
bigdata mailing list
bigdata(a)lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/bigdata
--
Cheers,
Timothy St. Clair
Red Hat Inc.