Hi all,
This morning I wrote a small script that converts yum's metadata into a small JSON blob that can be used to check whether a package is present in RHEL before creating a branch for it in EPEL, or to restrict building the package in EPEL to certain arches.
What it does, basically:
- For a list of RHEL versions
- For a list of directories specific to that RHEL version
- Find all the primary.sqlite databases
- Decompress them if needed
- For all the packages listed in the database
  - get the base package (using the SRPM info)
  - get the epoch, version, release (does nothing with it atm)
  - get the arch - as more arches are found the list grows
- Store all the info in a JSON structure
- Dump the JSON into a text file
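The "find and decompress" steps could be sketched roughly like this (function and variable names are mine, not necessarily the script's; the directory layout is illustrative):

```python
import bz2
import gzip
import os

# Map compressed-file extensions to their stdlib openers.
DECOMPRESSORS = {'.bz2': bz2.open, '.gz': gzip.open}

def find_primary_sqlite(topdir):
    """Walk topdir and yield every primary.sqlite file, decompressing
    .bz2/.gz copies next to the original when needed."""
    for root, _dirs, files in os.walk(topdir):
        for name in files:
            if not name.startswith('primary.sqlite'):
                continue
            path = os.path.join(root, name)
            ext = os.path.splitext(name)[1]
            if ext in DECOMPRESSORS:
                # Strip the compression extension to get the target name.
                target = path[:-len(ext)]
                if not os.path.exists(target):
                    with DECOMPRESSORS[ext](path) as src, \
                            open(target, 'wb') as dst:
                        dst.write(src.read())
                yield target
            else:
                yield path
```

Each yielded path can then be opened with sqlite3 (or SQLAlchemy, as the diff later in the thread suggests) to iterate over the packages table.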
So using this, we can quickly check whether a package is in RHEL X, for example:

>>> import json
>>> with open('pkg_el7.json') as stream:
...     data = json.load(stream)
...
>>> 'python-zope-interface' in data
True
>>> data['python-zope-interface']
{u'release': u'4.el7', u'epoch': u'0', u'version': u'4.0.5', u'arch': [u'ppc64', u'x86_64']}
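A consumer of the blob could wrap that lookup in a small helper, something like this (the function name and the inline sample blob are my own, shaped like pkg_el7.json):

```python
import json

def rhel_has_package(data, name, arch=None):
    """Return True if `name` is in the RHEL package dump, optionally
    restricted to a given arch (e.g. to decide arch-limited EPEL builds)."""
    pkg = data.get(name)
    if pkg is None:
        return False
    return arch is None or arch in pkg['arch']

# Illustrative blob with the same shape as pkg_el7.json:
data = json.loads(
    '{"python-zope-interface": {"release": "4.el7", "epoch": "0",'
    ' "version": "4.0.5", "arch": ["ppc64", "x86_64"]}}')
print(rhel_has_package(data, 'python-zope-interface'))          # True
print(rhel_has_package(data, 'python-zope-interface', 'i686'))  # False
```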
The script is fairly quick to run:

$ time python rhel_to_json.py
...
2316 packages retrieved in el6
Output File: pkg_el6.json
...
2514 packages retrieved in el7
Output File: pkg_el7.json
...
1395 packages retrieved in el5
Output File: pkg_el5.json

real    0m47.411s
user    0m16.989s
sys     0m5.888s

And its output:

$ du -sh pkg_*
144K    pkg_el5.json
240K    pkg_el6.json
252K    pkg_el7.json
Hope this helps,
Pierre
On Fri, Nov 07, 2014 at 01:32:32PM +0100, Pierre-Yves Chibon wrote:
This morning I wrote a small script that converts yum's metadata into a small JSON blob that can be used to check whether a package is present in RHEL before creating a branch for it in EPEL, or to restrict building the package in EPEL to certain arches.
[...snip...]
There was a question about whether there was any specific Red Hat sensitivity. I haven't found any reason against using this kind of script. Go for it.
This morning I wrote a small script that converts yum's metadata into a small JSON blob that can be used to check whether a package is present in RHEL before creating a branch for it in EPEL, or to restrict building the package in EPEL to certain arches.
[...snip...]
There was a question about whether there was any specific Red Hat sensitivity. I haven't found any reason against using this kind of script. Go for it.
How do we deal with additions of new packages in later RHEL releases? E.g. there are a number of new packages in the 7.1 beta (and hence to be in 7.1 GA) that are now public.
Peter
On Thu, Dec 25, 2014 at 11:41:48AM +0000, Peter Robinson wrote:
This morning I wrote a small script that converts yum's metadata into a small JSON blob that can be used to check whether a package is present in RHEL before creating a branch for it in EPEL, or to restrict building the package in EPEL to certain arches.
[...snip...]
There was a question about whether there was any specific Red Hat sensitivity. I haven't found any reason against using this kind of script. Go for it.
How do we deal with additions of new packages in later RHEL releases?
IIRC we drop them from EPEL or give them a chance to be renamed to compat packages, but I'll let Kevin or Dennis confirm.
The idea of this script is that we now have a list of the packages in RHEL and therefore we can check our overlap with EPEL.
EG there's a number of new packages that are in 7.1 beta (and hence will be in 7.1 GA) that are now public.
Two things there. IIRC, for 7.0 some packages were dropped between the beta and the release (I remember packages that could be found in the beta repo but not in the released ones). Also, we only ensure that EPEL does not conflict with a certain set of channels, so depending on where these new packages land, we might (or might not) have to act in EPEL.
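The overlap check the dump enables could be sketched roughly like this (the function name is mine; file names follow the script's pkg_elX.json convention, and the EPEL list is a stand-in):

```python
import json

def rhel_epel_overlap(rhel_json_path, epel_packages):
    """Return EPEL package names that also appear in the RHEL dump,
    i.e. candidates for dropping or renaming to compat packages."""
    with open(rhel_json_path) as stream:
        rhel = json.load(stream)
    return sorted(set(epel_packages) & set(rhel))

# Illustrative use with a tiny hand-made dump:
with open('pkg_el7.json', 'w') as stream:
    json.dump({'python-zope-interface': {'arch': ['x86_64']}}, stream)
print(rhel_epel_overlap('pkg_el7.json',
                        ['python-zope-interface', 'fedmsg']))
# ['python-zope-interface']
```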
Pierre
On Fri, Nov 07, 2014 at 01:32:32PM +0100, Pierre-Yves Chibon wrote:
This morning I wrote a small script that converts yum's metadata into a small JSON blob that can be used to check whether a package is present in RHEL before creating a branch for it in EPEL, or to restrict building the package in EPEL to certain arches.
[...snip...]
I would like to hear whether people are fine with me breaking this already. It has been running for a week now and AFAIK we have nothing (in prod or testing) depending on it yet, so it seems like the best time to let the schema evolve.
Basically I would like to go from:

{
    "pkg1": {
        "version": 0.2,
        "arch": ["i686", "x86_64"],
        ...
    },
    "pkg2": {
        "version": ...
    },
    ...
}

To:

{
    "packages": {
        "pkg1": {
            "version": 0.2,
            "arch": ["i686", "x86_64"],
            ...
        },
        "pkg2": {
            "version": ...
        },
        ...
    },
    "arches": ["i686", "x86_64", "ppc64", ...],
    <eventually something else if we need to>
}
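For a consumer, the change would mostly mean indexing one level deeper; a minimal compatibility shim could look like this (key names from the proposal, everything else illustrative):

```python
# Tiny stand-ins for the old (flat) and proposed (nested) schemas.
OLD = {"pkg1": {"version": "0.2", "arch": ["i686", "x86_64"]}}
NEW = {"packages": OLD, "arches": ["i686", "x86_64", "ppc64"]}

def get_pkg(data, name):
    """Look up a package in either schema: the new one nests everything
    under 'packages', the old one is flat."""
    return data.get('packages', data).get(name)

print(get_pkg(OLD, 'pkg1') == get_pkg(NEW, 'pkg1'))  # True
```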
Thoughts? +1/-1?
Thanks, Pierre
Sure, seems like a better time to make changes than when we have consumers. +1
kevin
On Mon, Jan 19, 2015 at 05:41:41PM +0100, Pierre-Yves Chibon wrote:
I would like to hear if people are fine with me breaking this already. It has been running for a week now and afaik we have nothing (in prod or testing) depending on it yet so it seems like the best time to make the schema evolve.
[...snip...]
For the record, here is the change:
===
@@ -187,7 +187,7 @@ def main():

     for el in PATHS:
-        output = {}
+        output = {'packages': {}, 'arches': []}

         dbfiles = find_primary_sqlite(PATHS[el])

@@ -208,14 +208,18 @@ def main():
             cnt = 0
             new = 0
             for pkg in session.query(Package).all():
-                if pkg.basename in output:
-                    if pkg.arch not in output[pkg.basename]['arch']:
-                        output[pkg.basename]['arch'].append(pkg.arch)
+                if pkg.basename in output['packages']:
+                    if pkg.arch not in output['packages'][
+                            pkg.basename]['arch']:
+                        output['packages'][pkg.basename]['arch'].append(
+                            pkg.arch)
+                    if pkg.arch not in output['arches']:
+                        output['arches'].append(pkg.arch)
                     # TODO: checks if the evr is more recent or not
                     # (and update if it is)
                 else:
                     new += 1
-                    output[pkg.basename] = {
+                    output['packages'][pkg.basename] = {
                         'arch': [pkg.arch],
                         'epoch': pkg.epoch,
                         'version': pkg.version,
@@ -225,7 +229,8 @@ def main():
             print '%s packages in %s' % (cnt, cur_fold)
             print '%s packages were new packages' % (new)

-    print '\n%s packages retrieved in %s' % (len(output), el)
+    print '\n%s packages retrieved in %s' % (len(output['packages']), el)
+    print '%s arches found in %s' % (len(output['arches']), el)
     outputfile = 'pkg_%s.json' % el
     with open(outputfile, 'w') as stream:
         stream.write(json.dumps(output))
===

Pierre
rel-eng@lists.fedoraproject.org