Is it time to remove the existing Big Data packages from Fedora?

Mon Aug 3 13:54:12 UTC 2015

----- Original Message -----
> From: "Gerald Henriksen" <ghenriks at gmail.com>
> To: "Fedora Big Data SIG" <bigdata at lists.fedoraproject.org>
> Sent: Saturday, August 1, 2015 10:33:04 PM
> Subject: Re: Is it time to remove the existing Big Data packages from Fedora?
> 
> On Sat, 1 Aug 2015 11:48:59 -0400, you wrote:
> 
> >On Sat, Aug 01, 2015 at 09:32:22AM +0200, Javi Roman wrote:
> >> The problem here is that big data tools, for example Apache Flume or
> >> Apache Spark, are evolving quickly with important new features, while
> >> some of the libraries they use (for example the Java artifacts in
> >> Flume) are frozen at versions that work for the upstream developer.
> >> In Fedora some of those libraries are updated to the latest upstream
> >> versions, and the affected big data tool then breaks at compile
> >> time.
> >
> >This isn't a problem just with big data tools. Many developers want to
> >do this. The problem is that having all of those multiple versions
> >becomes a maintenance nightmare. When there's a security problem, how
> >do you identify which packages have a library with the problem? Which
> >versions are affected? Does the same patch apply to them all? Who fixes
> >it if it doesn't?
> 
> I agree it is a packaging nightmare, but the problem is there is no
> ideal solution.
> 
> The best solution is to try to continue what was started, which means
> packaging Hadoop and friends with a lot of work (and patches) to make
> things build with the newer libraries that are in Fedora.
> 
> The problem here is that even if we get everything packaged, we
> apparently already have feedback from the Big Data community that they
> don't trust, and hence won't use, the packages that people have put
> much effort into making.  While it wasn't mentioned why this is the
> case, I am guessing it is at least in part because nothing is tested
> with those newer library versions, and the attitude on the Hadoop
> mailing lists will be to shrug and say download the "official" version
> of Hadoop or whichever part of the ecosystem is being problematic.
> 
> Which is why the idea of simply creating a Docker container is so
> attractive.  You remove all the time spent either creating/maintaining
> the above patched versions, or alternatively packaging and maintaining
> older libraries.  We in essence throw the security issue to the
> upstream developers, and simply update the Docker container with every
> Hadoop/Spark/etc release.
> 
> Is it a great solution?  Not really, but unless upstream changes there
> are no great solutions.

IMHO it is the only tenable solution.  

-- 
Cheers,
Timothy St. Clair
Red Hat Inc.