Inter-WG coordination: Stable application runtimes

Alek Paunov alex at declera.com
Sun Jan 12 20:34:19 UTC 2014


On 10.01.2014 21:12, Matthew Miller wrote:
> On Thu, Jan 09, 2014 at 07:58:44PM -0800, Adam Williamson wrote:
>> So the question becomes, what is it appropriate for a distribution to do
>> in this situation? My personal opinion is that what's appropriate for a
>> distribution to do is also, happily, what's easiest for a distribution
>> to do: punt on it. Entirely punt on the whole thing.
>
>
> And, this ultimately makes a better experience for users, because if
> Fedora's included tools are aware of the native packaging system, we can do
> things like system auditing, security alerts (if not updates), maybe even
> integration with selinux policy. Basically, we don't hammer all of the
> possible universe into the distribution model, and we don't include all of
> the packages of everything in the base Fedora distribution as RPMs, but we
> do include those ecosystems in the Fedora _Project_, including the tools and
> documentation to make them feel natural.
>

Your expression "... tools aware of the native packaging system" and
Andrew's comment about the pip behaviors earlier in the thread
encouraged me to share my "big hammer of the DB plumber" :-) opinion
on the problem.

TL;DR: What follows is SCC: an idea for an optional DB and service which
caches bits from YumDB, the local state (RPMDB and /etc), plus the
"native" systems like NPM in a unified way, for the purposes of
planning, resolving and performing system state transitions.

Initially I meant to say just a few words to mark the possibility, but
obviously my English and my talent for well-focused, easy-to-read
messages are both far from good, so the whole thing became too long and
I am going to split some additional notes into self-replies - apologies!

RPMDB and YumDB are two rich datasets present on every Fedora instance,
representing respectively the local state and the distribution+ state of
the package universe ('+' because +Fusion*, +COPRs, +LocalOrg, etc.).
Unfortunately they are somewhat hidden in the dark because of the lack
of interfaces - we are even missing an SVG or other explorer for the
YumDB graph.
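
For illustration, the closest thing to an interface we have today is
walking the RPMDB through the rpm Python bindings - a minimal sketch
(stock bindings only, nothing SCC-specific yet):

    # Walk the local RPMDB with the stock rpm Python bindings. Anything
    # graph-oriented (e.g. a YumDB explorer) has to be built on top of
    # such low-level iteration by hand.
    import rpm

    ts = rpm.TransactionSet()
    for hdr in ts.dbMatch():  # iterate over all installed packages
        name = hdr[rpm.RPMTAG_NAME]
        version = hdr[rpm.RPMTAG_VERSION]
        requires = hdr[rpm.RPMTAG_REQUIRENAME]
        print("%s-%s: %d required capabilities"
              % (name, version, len(requires)))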

The (let's think of them as "virtual") Maven DB, PyPI DB, NPM DB,
LuaRocks DB, etc. are technically subsets of YumDB, in the sense of the
richness of the logical relations encoded between the DB nodes in their
schemas - e.g. the PyPI DB, before the pip egg is built, does not know
which file comes from which module, nor does it have requirement points
at any granularity finer than package NV (the Provides and Requires in
YumDB). Also, the depsolving of the "native" packaging systems is less
sophisticated than both the current yum and hawkey ones.
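
To make the "subset" claim concrete, here is a hypothetical unified
node type (all field names and values are illustrative, not from any
existing tool), reusing the hgsubversion -> subvertpy example from the
[**] footnote below. A YumDB entry can populate every field, while a
PyPI entry can only fill the coarse ones:

    # Hypothetical unified package node - field names are illustrative.
    # A YumDB entry populates all fields; a PyPI entry, lacking file
    # lists and fine-grained capabilities, leaves 'files' empty and
    # keeps provides/requires at plain name level.
    class PackageNode(object):
        def __init__(self, origin, name, version,
                     provides=(), requires=(), files=()):
            self.origin = origin            # 'rpm', 'pypi', 'npm', ...
            self.name = name
            self.version = version
            self.provides = list(provides)  # capability strings
            self.requires = list(requires)
            self.files = list(files)        # empty for "native" systems

    # Illustrative values only:
    pypi_node = PackageNode('pypi', 'hgsubversion', '1.6',
                            requires=['subvertpy'])  # name-level deps only

    rpm_node = PackageNode('rpm', 'subvertpy', '0.9.1',
                           provides=['subvertpy = 0.9.1'],
                           requires=['python(abi) = 2.7',
                                     'subversion-libs'],
                           files=['/usr/lib64/python2.7/site-packages/'
                                  'subvertpy/__init__.py'])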

These observations naturally lead to the idea of SCC: a System
Configuration Cache DB representing the merge of RPMDB, YumDB and e.g.
the local pip eggs and PyPI DB (if additional Python modules/versions
are needed for the given deployment), into which the depsolving could be
shifted - somewhat in the same fashion as data warehouse solutions are
used in enterprises for merging the significant datasets from various
ERP systems into a single DB for interactive reporting and decision
making.
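
A minimal sketch of what such a merge could look like, with sqlite
standing in for whatever engine is eventually chosen (the schema is
hypothetical):

    # Hypothetical SCC schema sketch; sqlite is only a placeholder for
    # the real engine choice discussed below. Every row records which
    # dataset it came from, so RPMDB, YumDB and PyPI facts can be
    # joined in a single depsolving query, warehouse-style.
    import sqlite3

    db = sqlite3.connect('/var/lib/scc/scc.db')
    db.executescript("""
    CREATE TABLE IF NOT EXISTS package (
        id      INTEGER PRIMARY KEY,
        source  TEXT NOT NULL,    -- 'rpmdb', 'yumdb', 'pypi', 'npm', ...
        name    TEXT NOT NULL,
        version TEXT NOT NULL
    );
    CREATE TABLE IF NOT EXISTS relation (
        pkg_id     INTEGER REFERENCES package(id),
        kind       TEXT NOT NULL,  -- 'provides'/'requires'/'conflicts'
        capability TEXT NOT NULL   -- e.g. 'python(abi) = 2.7'
    );
    """)

    # A cross-source query: everything that requires a python capability.
    for row in db.execute("""
        SELECT p.source, p.name, p.version
          FROM package p JOIN relation r ON r.pkg_id = p.id
         WHERE r.kind = 'requires' AND r.capability LIKE 'python%'
    """):
        print(row)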

I am using the term SCC (vs. e.g. UPMC: Unified Package Metadata Cache)
in an attempt to cover a (paraphrased) question from Mirek at the
beginning of the thread: "OK, we have Fedora and PyPI integrated, and at
one point in time, for a given instance, the Fedora-packaged module has
been chosen. What happens if we upgrade Fedora along with an
incompatible version of this Python module for a given installed
service?" Obviously we need to register in the SCC the dependencies of
the installed machine roles, with the same effect as the Requires
clauses in our packages, so that the SCC machinery can validate the yum
upgrade (with a negative result in the described case), or resolve the
upgrade to include installation of the newest available compatible
version from PyPI as an alternative provision during upgrade
preparation.
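
A sketch of how such role registration and validation could look (the
names scc_register_role and scc_validate are hypothetical, and real
version comparison would use rpm's EVR logic rather than plain strings):

    # Hypothetical sketch: register a machine role's dependencies with
    # the same effect as a package's Requires clause, then validate a
    # proposed upgrade transaction against them.
    ROLES = {}  # role name -> list of (capability, minver, maxver)

    def scc_register_role(role, requires):
        ROLES[role] = list(requires)

    def scc_validate(transaction):
        """transaction maps capability -> new version (None = removal).
        Returns violations; an empty list means the upgrade may proceed.
        NB: plain string comparison stands in for real EVR comparison."""
        violations = []
        for role, requires in ROLES.items():
            for name, minver, maxver in requires:
                if name in transaction:
                    new = transaction[name]
                    if new is None or not (minver <= new < maxver):
                        violations.append((role, name, new))
        return violations

    scc_register_role('billing-service',
                      [('python-sqlalchemy', '0.7', '0.8')])

    # The Fedora upgrade wants an incompatible module version:
    print(scc_validate({'python-sqlalchemy': '0.9.1'}))
    # -> [('billing-service', 'python-sqlalchemy', '0.9.1')]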

Further, I think that ideally SCC should parse/absorb as many system
object properties and relations as possible from /etc (plus the /lib and
/var configuration areas), to allow sysadmins and devops to inject rules
for effective use of these sets later in the depsolving (and other DB
functionality). That said, integration with OpenLMI, or even
implementing the whole thing under the OpenLMI umbrella, both seem
natural.
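
For the non-OpenLMI alternative (the Augeas import mentioned in the list
below), absorbing /etc could be as simple as walking the lens trees with
the python-augeas bindings - a sketch:

    # Sketch: turn lens-parsed /etc files into (path, value) facts that
    # can be stored next to the package graph. Uses the python-augeas
    # bindings; where the facts land would be the hypothetical SCC
    # schema sketched above.
    import augeas

    a = augeas.Augeas()
    for path in a.match("/files/etc/yum.conf//*"):
        value = a.get(path)
        if value is not None:
            print("%s = %s" % (path, value))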

So, finally on that road we have:

- choice of a good enough DB engine for SCC, query language, compiler
   [*], etc., plus design decisions like the sync protocol and pluggable
   data sources.

- Yum/RPM datasets: optional rpm, yum and hawkey hooks for syncing their
   DBs (see the plugin sketch after this list), alternatively just sync
   tools.

- optional pip, npm, luarocks, etc. hooks for the same, alternatively
   just sync tools.

- OpenLMI integration for absorbing the system configuration,
   alternatively just an Augeas import + transformation rules to sync
   the DB representation of the system objects.

- sccd, capable of:
   - depsolving on top of the cumulative YumDB + native managers' DBs,
     preferably providing interfaces so that the system libsolv can use
     the SCC DB in place of its own filestores.
   - invoking package operations with yum/dnf + the "native" managers.
   - providing sysadmins, developers and devops with the capability of
     defining system overlays - services/applications trees and their
     relations with the rest of the DB.
   - supporting live textual views (e.g. through FUSE) for cases when
     the locally preferred deployment style is based on ansible, salt,
     etc.

   As an alternative to (or complementing) the own-daemon approach, the
   above functionality might be implemented in the form of OpenLMI
   providers (or extensions to existing providers).

- sccd-web: a WebUI exposing the full functionality, alternatively a
   Cockpit (OpenLMI WebUI) extension.

- scc [**]: a command line shortcut for a few simple tasks.

- providing pip, luarocks, etc. patches for sccd integration where
   present, alternatively implementations of the "native" managers'
   command lines/APIs on top of sccd; providing yum and hawkey plugins
   for handling conflicts with "native" packages.

- NTH: a remote SCC DB for the instance.

- NTH: SCC local state for multiple instances (e.g. deployment nodes or
   local containers) kept in the same SCC DB

- NTH: SCC local state inheritance between instances
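
To make the "optional yum hook" item above concrete, here is a minimal
sketch of a yum plugin (the hook points are standard yum; the scc module
and its sync call are hypothetical):

    # /usr/lib/yum-plugins/scc.py - push every completed transaction
    # into the SCC DB so the cache never goes stale.
    from yum.plugins import TYPE_CORE

    requires_api_version = '2.3'
    plugin_type = (TYPE_CORE,)

    def posttrans_hook(conduit):
        """Runs after each yum transaction; re-sync the changed rows."""
        members = conduit.getTsInfo().getMembers()
        changed = [(m.name, m.version, m.release, m.ts_state)
                   for m in members]
        conduit.info(2, 'scc: syncing %d members' % len(changed))
        import scc                     # hypothetical SCC client library
        scc.sync_rpm_members(changed)  # hypothetical upsert call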

Kind Regards,
Alek

[*] A crucial aspect of any sophisticated data management system is the
     data query and manipulation language. Unfortunately the choices are
     rather limited - imperative approaches (recently resurrected by some
     NoSQL DBs) are weak and error prone; SQL and a few more "text prose"
     languages have proven their incompatibility with the vast majority
     of developers (those without years of specific experience with
     large-volume data processing). The predominant workaround seems to
     be ORMs, but ORMs and "sophisticated/fast" should not be mixed in
     the same project :-).

[**] scc resolve python module hgsubversion
     output: Found 2 resolutions, Transaction IDs: 1921, 1922
     output: Recommended: http://scc-gateway/machine-id/transaction/1921
     user: browsing, thinking ... (hgsubversion comes from PyPI), but
     happily spotting that the subvertpy dependency is satisfied by the
     Fedora-provided package - it is linked with the system subversion.

     scc transaction 1921 add resolve luajit 2.0 module zmq min-version
     0.3.3
     output: Found 17 resolutions, Transaction IDs: 1921, 1924, ...
     output: Recommended: http://scc-gateway/machine-id/transaction/1921
     user: browsing, ... lua zmq will use the system zmq library, which
     is in the desired version range - OK

     scc transaction 1921 apply
     etc ...


