----- Original Message -----
From: "Gerald Henriksen" <ghenriks(a)gmail.com>
To: "Fedora Big Data SIG" <bigdata(a)lists.fedoraproject.org>
Sent: Saturday, August 1, 2015 10:33:04 PM
Subject: Re: Is it time to remove the existing Big Data packages from Fedora?
On Sat, 1 Aug 2015 11:48:59 -0400, you wrote:
>On Sat, Aug 01, 2015 at 09:32:22AM +0200, Javi Roman wrote:
>> The problem here is that big data tools such as Apache Flume and
>> Apache Spark evolve quickly, with important new features, while some
>> of the libraries those tools depend on (for example, the Java
>> artifacts in Flume) are pinned to frozen versions, because those
>> versions work for the developers. In Fedora, some of those libraries
>> are updated to the latest upstream versions, and the affected big
>> data tool then breaks at compile time.
>
>This isn't a problem just with big data tools. Many developers want to
>do this. The problem is that having all of those multiple versions
>becomes a maintenance nightmare. When there's a security problem, how
>do you identify which packages have a library with the problem? Which
>versions are affected? Does the same patch apply to them all? Who fixes
>it if it doesn't?
I agree it is a packaging nightmare, but there is no ideal solution.
The best solution would be to continue what was started: packaging
Hadoop and friends, with a lot of work (and patches) to make
everything build against the newer libraries that are in Fedora.
The problem is that even if we get everything packaged, we apparently
already have feedback from the Big Data community that they don't
trust, and hence won't use, the packages people have put so much
effort into making. While no reason was given, I am guessing it is at
least in part because nothing is tested against those newer library
versions, and the response on the Hadoop mailing lists will be a shrug
of the shoulders and a suggestion to download the "official" version
of Hadoop, or of whatever part of the ecosystem is being problematic.
Which is why the idea of simply creating a Docker container is so
attractive. You remove all the time spent either creating and
maintaining the patched versions described above, or alternatively
packaging and maintaining older libraries. We in essence hand the
security issue back to the upstream developers, and simply update the
Docker container with every Hadoop/Spark/etc. release.
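A minimal sketch of that approach, assuming Fedora 22 as the base
image and the upstream Hadoop 2.7.1 release tarball (both are
illustrative choices, not a blessed Fedora deliverable):

```dockerfile
# Illustrative only: ship an unmodified upstream Hadoop release inside
# a container image, rather than patching it to build against Fedora's
# newer Java libraries. Version numbers and paths are assumptions.
FROM fedora:22

# Runtime dependencies for the upstream binary tarball.
RUN dnf -y install java-1.8.0-openjdk-headless curl tar gzip && dnf clean all

# Unpack the unmodified upstream release under /opt.
RUN curl -fsSL https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz \
    | tar -xz -C /opt && ln -s /opt/hadoop-2.7.1 /opt/hadoop

ENV HADOOP_HOME=/opt/hadoop \
    JAVA_HOME=/usr/lib/jvm/jre

CMD ["/opt/hadoop/bin/hadoop", "version"]
```

A security fix then means bumping the version in the `curl` line and
rebuilding the image, instead of rebasing a stack of Fedora patches.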
Is it a great solution? Not really, but unless upstream changes there
are no great solutions.
IMHO it is the only tenable solution.
_______________________________________________
bigdata mailing list
bigdata(a)lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/bigdata
--
Cheers,
Timothy St. Clair
Red Hat Inc.