<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 07/23/2013 12:58 PM, Toshio Kuratomi

      wrote:<br>

    </div>

    <blockquote cite="mid:20130723165834.GC11402@unaka.lan" type="cite">

      <pre wrap="">On Mon, Jul 22, 2013 at 05:54:27PM -0400, Peter MacKinnon wrote:

</pre>

      <blockquote type="cite">

        <pre wrap="">

So far, so good...sort of. We can make the basic use case and tests work with

the modified dependencies but in doing so we risk giving up parity with the

Apache baseline (including the JRE) and potentially lose out to other so-called

"dirty RPMs". Ideally, we wouldn't be forced into some of these adaptations and

compromises if there were Fedora packaging alternatives that would give us (a

SIG ring?) more control over the bundles needed by Hadoop as opposed to the

ones mandated by the latest Fedora release. Make no mistake: patches are fed

from the SIG to the Hadoop community to try to bump the versions there. But the

upstream project can't and won't chase an ever-vanishing point in the distance.

They view their lower dependencies much like a stable OS such as RHEL and

change should be deliberated there.

</pre>

      </blockquote>

      <pre wrap="">So -- would you be willing to go on a tangent with me for a short bit?

Looking at your issue, I have a few comments, some of which might even be

helpful :-)  I'm bringing this up because the way you describe Hadoop makes

it sound like people are going to want to build on top of it.  Having it in

a Ring 2/3 non-rpm universe makes it more tempting for people to treat

Hadoop as an alternative choice which might mean that we get multiple copies

of hadoop there as people build on different hadoop foundations.  Figuring

out whether it's possible to build this in Fedora Commons might be better.

Observations-only:

* Bravo for your patching effort!  My experience in open source is that

  projects which relentlessly stick to old versions eventually fork, and the

  fork which supports the newer dep stack wins out.  Unfortunately, this switch

  of project direction is painful and can take a long while to come about.

  So even though your experience with patching up to more recent versions

  will be valuable when/if this actually happens, I understand that you will

  likely want to work on some other strategy first.

Ways to improve within the Fedora Core/Fedora Commons model:

* Although bundled libraries are not allowed in Fedora Core and Commons,

  both compat libraries and forks are allowed.  There are downsides to each:

  * Compat libraries may not have support from upstream.  Once upstream

    moves onto libfoo-2.0, they may no longer be interested in releasing

    bugfixes or even security fixes for libfoo-1.x.  This means that you, as

    the owner of the compat package would be responsible for those.  On the

    other hand, you'd be responsible for those even if they were bundled

    into the package you're really interested in.  It's just that you would

    mostly pretend that the Hadoop upstream was on top of those problems

    instead of having to deal with them yourself.  You could still rely

    heavily on the Hadoop upstream by syncing their fixes to the compat

    library in their bundled copy to your copy.

  * Forking increases the number of copies of virtually the same code, at

    least, until the forks diverge significantly.  Forking also means that

    someone needs to set up upstream infrastructure: an upstream SCM,

    a mailing list somewhere, an issue tracker (a lot of times, the ml and

    issue tracker start off as a piece of an existing project; only the SCM

    differs).  What does forking bring you, though?  The potential benefit

    is that you get a better upstream experience.  It's a place where people

    who need the same API version of libfoo can congregate and supply fixes

    that can be used in common with every other project that uses that code.

  * The upsides to both of these over bundled libraries are that you have

    better tracking of your dependency chain (say a security issue is found in

    libfoo that affects everything from 1.0.5 through 2.8.  When you have an

    external package, it's easier to see whether the code is affected and

    how many packages are affected than if it's bundled.  This can also be

    overcome using Virtual Provides to "tag" a package that's bundling

    a specific version.) and the possibility for more people to collaborate

    on this (We've often seen that multiple upstream packages are

    bundling upstream versions of a libfoo-1.x API.  When we pull these into

    a compat package, all the bugfix work can be pushed to the central

    package which cuts down on work for everyone.)

* Compat packages only help us if there's a way to parallel install library

  packages (and it sounds like for hadoop you might need parallel versions

  of the JRE as well).  Forking usually implies a new library name so that

  usually doesn't need parallel library support.  So the first

  implementation question might be:  Is parallel installation and usage

  supported in your language?  If it isn't out of the box, is there a way we

  could create it (for instance, via a wrapper script that sets CLASSPATH

  appropriately)?  For the JRE, is there a way to parallel install it?  Is

  there also a way that the application being run can choose which JRE it

  needs to be run under?
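
A rough sketch of the wrapper-script idea above, in Python.  This is only an
illustration under stated assumptions: the function names, the directory
layout (one directory of jars per compat version), and the example paths are
all hypothetical and are not taken from any existing Fedora package.

```python
import glob
import os


def build_classpath(jar_dir):
    """Join every .jar in jar_dir into one CLASSPATH string (sorted for a stable order)."""
    jars = sorted(glob.glob(os.path.join(jar_dir, "*.jar")))
    return os.pathsep.join(jars)


def build_command(java_home, jar_dir, main_class, args):
    """Return (argv, env) for launching main_class under the JRE at java_home.

    java_home and jar_dir are assumed to point at a parallel-installed JRE
    and a directory holding one compat version of the needed libraries.
    """
    env = dict(os.environ)
    env["JAVA_HOME"] = java_home  # for tools that consult JAVA_HOME
    env["CLASSPATH"] = build_classpath(jar_dir)
    argv = [os.path.join(java_home, "bin", "java"), main_class] + list(args)
    return argv, env
```

A launcher would then call something like
build_command("/usr/lib/jvm/jre-compat", "/usr/share/java/libfoo-1",
"org.example.Main", sys.argv[1:]) (made-up paths and class name) and hand
the result to os.execve(argv[0], argv, env), so each application picks its
own JRE and library versions without touching the system defaults.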

-Toshio

</pre>

      <br>


      <br>

    </blockquote>

    <br>

    Well, compat libraries are certainly an option but I view that as a

    tactical solution to an institutional, um, challenge. <br>

    And I believe that is what Matt is driving at: sustainable solutions

    that satisfy the user/admin need for stability and <br>

    "cleanliness" while also providing an OS that developers from all

    manner of technology and language profiles would <br>

    gravitate toward.<br>

    <br>

    Forking is fairly radical, I would say, in our specific case. There

    are many other stakeholders involved in the upstream project, <br>

    both individual and corporate. That is not to say a significant Hadoop

    fork could not appear at some point in <br>

    the future, with a mandate to track dependency updates more closely

    and perhaps technically innovate or refactor with those <br>

    deps. But the scale of that effort (technically and politically) is

    daunting and just plain unattractive at the moment for our SIG.<br>

    <br>

    BTW, the #fedora-java team has done an awesome job providing a tool

    set to bridge the gap in the specific case of Maven<br>

    and Fedora system repos.<br>

    <br>

    <pre class="moz-signature" cols="72">-- 

Peter MacKinnon

MRG Grid/Big Data

Red Hat Inc.

Raleigh, NC

</pre>

  </body>

</html>