apache spark, anyone?

Thu Jul 30 21:46:35 UTC 2015

On 07/30/2015 10:20 PM, Gerald Henriksen wrote:
> On Wed, 29 Jul 2015 20:54:32 -0400, you wrote:
> 
>> I fear that we are tilting at windmills here. Containers and projects 
>> like Atomic and CoreOS are redefining what application composition can 
>> look like. The base OS is no longer driving this since I can compose an 
>> application with containers that could be a fabulous mix of components 
>> running on Fedora, CentOS, Ubuntu, and so on.
>>
>> Language ecosystems have evolved for delivering applications. But they 
>> _are not reliant on nor interested in_ the whims of any particular distro 
>> packaging rules. I have an uber jar with my application and its bundled 
>> runtime dependencies. Do you have a current compatible JVM? Great! Let's 
>> get work done!
>>
>> But it's not clear that Fedora is evolving. The CVE paranoia only 
>> travels so far up the stack to a point _where the value of the 
>> application outweighs the security risks of having multiple log4j 
>> versions installed_. A "conform or be cast out" ethos is the road to 
>> irrelevance IMHO.
> 
> My feeling is that if Fedora (and I suppose by default the Big Data
> SIG) wants to have Hadoop and company running on Fedora then we need to
> stop trying to package it into RPMs.

I agree that a new, simplified, possibly non-RPM way of packaging that
stays closer to upstream would be an improvement.

> My personal opinion, having looked into the idea of working on Big
> Data packages, is that Java apps are unsuitable for packaging.

Agreed. While it is possible to package Java applications, doing so
requires persistence and a lot of work...
(saying that as maintainer of 400+ Java packages)

> I understand that people have spent a lot of time coming up with
> guidelines, and helper apps, for packaging Java stuff over the years.
> But it is still a mess and that is without considering what the
> developers of those JVM based apps are doing.

There is an obvious misalignment between the expectations of Java
upstreams and downstream distributions. These "guidelines" and "helper
apps" are an imperfect attempt to bring Fedora closer to Java upstream
reality.
They made it possible to achieve what we currently have in Fedora, but
are not enough to bring all the useful BigData goodies to Fedora users.

Fedora historically had (and still has) a strict set of rules for
packaged applications. We had to follow these rules to be able to have
basic Java developer tooling available in Fedora. With Fedora.next this
is starting to change and I think it's a great opportunity for BigData
packaging (see below).

> So forget the idea of trying to get exceptions to the rules, or
> creating alternate repos.
> 
> In my mind the best thing that can be done is:
> 
> 1) actively test the Hadoop and other component releases on Fedora
> and make them as reliable and bug free on Fedora as possible.  This
> means making sure they work on OpenJDK given most in the Hadoop
> community seem to use Oracle's JVM, and making sure that stuff works
> on the newer OpenJDK versions as Hadoop and company seem to be very
> reluctant to move on from versions of the JVM that are no longer
> supported.

This definitely has value for upstream projects, but it's not really
within the scope of Fedora (except for filing OpenJDK bugs, of course).

> 2) if you want to have Hadoop (or Spark, etc) as an easy to run thing
> then forget packaging and instead create Docker containers.

I've been watching the BigData SIG since its creation and I've been
trying to participate as time allows, mostly as Java build system
maintainer. Sadly, I have the impression that the initial enthusiasm has
faded over time, possibly largely due to packaging difficulties.

Fedora.next opens up the possibility of new ways of packaging
applications. As I understand it, Fedora as a project has agreed on
these new ideas, but they are not implemented yet, at least not fully.
Everyone is welcome to participate in defining and creating the new
Fedora, where non-RPM packaging formats (like Docker) are accepted.

Perhaps it's time to schedule a SIG meeting and talk about the future of
BigData in Fedora? If we agree on what we want BigData packaging to look
like then we can approach the appropriate parties (such as Env and
Stacks) and talk about implementing what we want to achieve.

In short, what I want to say is: don't give up on Fedora. Try to
participate in improving the way we package and ship software.

-- 
Mikolaj Izdebski
Software Engineer, Red Hat
IRC: mizdebsk