Dear all,
Here is a rewrite of coredump2packages script that is used in retrace
server. It uses libsolv to scan repository metadata and to check which
packages can be installed together. Dependency on yum was completely
dropped, as yum dependency solver was not versatile enough to satisfy
the requirements.
This new coredump2packages should always return the best available
package list from the provided repositories, and the returned package
list is always installable. When the script cannot determine the package
containing the executable binary referenced from a coredump, it downloads
the debuginfo package to a temporary directory and checks the appropriate
build-id symlink inside it. A disadvantage is that it is slower than the
old script, because it does more work.
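
To illustrate the idea, here is a minimal sketch (the helper names are
mine, not part of the script) of the build-id layout the lookup relies
on: the debuginfo package ships /usr/lib/debug/.build-id/xx/rest.debug
with the symbols, plus a neighbouring symlink without the .debug suffix
that points back at the stripped binary:

import os

def build_id_paths(build_id):
    # Path of the separate debug file, searched in the repository filelists.
    debug_path = "/usr/lib/debug/.build-id/{0}/{1}.debug".format(
        build_id[:2], build_id[2:])
    # Symlink shipped next to it, pointing back at the stripped binary.
    link_path = "/usr/lib/debug/.build-id/{0}/{1}".format(
        build_id[:2], build_id[2:])
    return debug_path, link_path

def binary_path_from_link(link_path):
    # Inside an unpacked debuginfo package the symlink is relative, so the
    # binary path is recovered by resolving it against the link's directory.
    target = os.readlink(link_path)
    return os.path.normpath(os.path.join(os.path.dirname(link_path), target))
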
Dependency:
libsolv:
http://jnovy.fedorapeople.org/libsolv-0.0.0-1.src.rpm
install python-solv
Usage:
See the attached manpage. I am testing it with the following command:
./coredump2packages coredump_evince --fields packages components \
missing_build_ids missing_packages program_packages program_components \
installable_packages installable_components \
installable_program_package installable_program_component \
full_installable_packages --log=stdout
However, I haven't tested it much: just a single devel machine with
remote Fedora 14 repositories. Local repositories should be tested as
well. A good test script would discover the remaining bugs in the code.
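
A minimal smoke-test sketch (hypothetical; the coredump file name is just
an example) could run the script on a known coredump and check that one
output section is produced per requested field:

#!/usr/bin/python
# Hypothetical smoke test for coredump2packages.
import subprocess

fields = ["packages", "components", "missing_build_ids"]
proc = subprocess.Popen(["./coredump2packages", "coredump_evince",
                         "--fields"] + fields,
                        stdout=subprocess.PIPE)
out = proc.communicate()[0]
assert proc.returncode == 0, "coredump2packages failed"
# Fields are separated by an empty line; an empty field prints a single "-".
sections = out.rstrip("\n").split("\n\n")
assert len(sections) == len(fields), "expected one section per field"
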
Karel
#! /usr/bin/python
# -*- coding:utf-8;mode:python -*-
#
# Extracts package information from a coredump.
# Repository handling code was mostly copied from libsolv's pysolv
# example.
#
# Requires Python 2.6+, libsolv Python bindings, rpm2cpio, cpio,
# eu-unstrip.
import StringIO
import argparse
import glob
import iniparse
import os
import re
import shutil
import solv
import stat
import subprocess
import sys
import tempfile
import time
import fnmatch
import urllib2
import functools
import itertools
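# GenericRepo holds the configuration of a single repository and implements
# metadata downloading and on-disk caching (.solv/.solvx files); the caching
# and download logic follows libsolv's pysolv example.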
class GenericRepo(dict):
def __init__(self, name, type, md_cache_dir, attribs = {}):
for k in attribs:
self[k] = attribs[k]
self.name = name
self.type = type
self.md_cache_dir = md_cache_dir
def cache_path(self, ext = None):
path = re.sub(r'^\.', '_', self.name)
if ext:
path += "_" + ext + ".solvx"
else:
path += ".solv"
return os.path.join(self.md_cache_dir, re.sub(r'[/]', '_', path))
def load(self, pool):
self.handle = pool.add_repo(self.name)
self.handle.appdata = self
self.handle.priority = 99 - self['priority']
if self['autorefresh']:
dorefresh = True
if dorefresh:
try:
st = os.stat(self.cache_path())
if time.time() - st[stat.ST_MTIME] < self['metadata_expire']:
dorefresh = False
except OSError, e:
pass
self['cookie'] = ''
if not dorefresh and self.use_cached_repo(None):
log.write("Loading cached repository
'{0}'.\n".format(self.name))
return True
return self.load_if_changed()
def load_if_changed(self):
return False
    def load_ext(self, repodata):
        return False
def set_from_urls(self, urls):
if not urls:
return
url = urls[0]
log.write(" - using mirror
{0}\n".format(re.sub(r'^(.*?/...*?)/.*$', r'\1', url)))
self['baseurl'] = url
def set_from_metalink(self, metalink):
nf = self.download(metalink, False)
if not nf:
return None
f = os.fdopen(os.dup(solv.xfileno(nf)), 'r')
solv.xfclose(nf)
urls = []
chksums = []
for l in f.readlines():
l = l.strip()
m = re.match(r'^https?://.+/', l)
if m:
urls.append(m.group(0))
            m = re.match(r'^<hash type="sha256">([0-9a-fA-F]{64})</hash>', l)
if m:
chksums.append(solv.Chksum(solv.REPOKEY_TYPE_SHA256, m.group(1)))
            m = re.match(r'^<url.*>(https?://.+)repodata/repomd.xml</url>', l)
if m:
urls.append(m.group(1))
if len(urls) == 0:
chksums = [] # in case the metalink is about a different file
f.close()
self.set_from_urls(urls)
return chksums
def set_from_mirror_list(self, mirrorlist):
nf = self.download(mirrorlist, False)
if not nf:
return
f = os.fdopen(os.dup(solv.xfileno(nf)), 'r')
solv.xfclose(nf)
urls = []
        for l in f.readlines():
            l = l.strip()
            if l[0:7] == 'http://' or l[0:8] == 'https://':
                urls.append(l)
self.set_from_urls(urls)
f.close()
def download(self, file, uncompress, chksums=[], markincomplete=False):
url = None
if 'baseurl' not in self:
if 'metalink' in self:
if file != self['metalink']:
metalinkchksums = self.set_from_metalink(self['metalink'])
if file == 'repodata/repomd.xml' and len(chksums) == 0:
chksums = metalinkchksums
else:
url = file
elif 'mirrorlist' in self:
if file != self['mirrorlist']:
self.set_from_mirror_list(self['mirrorlist'])
else:
url = file
if not url:
if 'baseurl' not in self:
log.write("Error: {0}: no baseurl\n".format(self.name))
return None
            url = re.sub(r'/$', '', self['baseurl']) + '/' + file
log.write(" - downloading {0}\n".format(url))
f = tempfile.TemporaryFile()
try:
urlfile = urllib2.urlopen(url, timeout=30)
while True:
data = urlfile.read(8*32168)
if len(data) == 0:
break
f.write(data)
urlfile.close()
except urllib2.URLError as e:
log.write("Error: {0}: download error: {1}\n".format(url, e))
if markincomplete:
self['incomplete'] = True
return None
f.flush()
os.lseek(f.fileno(), 0, os.SEEK_SET)
verified = (len(chksums) == 0)
for chksum in chksums:
fchksum = solv.Chksum(chksum.type)
if fchksum is None:
if markincomplete:
self['incomplete'] = True
continue
fchksum.add_fd(f.fileno())
if fchksum.raw() != chksum.raw():
if markincomplete:
self['incomplete'] = True
continue
else:
verified = True
if not verified:
log.write("Error {0}: checksum mismatch or unknown checksum
type\n".format(file))
return None
if uncompress:
return solv.xfopen_fd(file, os.dup(f.fileno()))
return solv.xfopen_fd(None, os.dup(f.fileno()))
def use_cached_repo(self, ext, mark=False):
if not ext:
cookie = self['cookie']
else:
cookie = self['extcookie']
try:
repopath = self.cache_path(ext)
f = open(repopath, 'r')
f.seek(-32, os.SEEK_END)
fcookie = f.read(32)
if len(fcookie) != 32:
return False
if cookie and fcookie != cookie:
return False
if self.type != 'system' and not ext:
f.seek(-32 * 2, os.SEEK_END)
fextcookie = f.read(32)
if len(fextcookie) != 32:
return False
f.seek(0)
flags = 0
if ext:
flags = solv.Repo.REPO_USE_LOADING | solv.Repo.REPO_EXTEND_SOLVABLES
if ext != 'DL':
flags |= solv.Repo.REPO_LOCALPOOL
if not self.handle.add_solv(f, flags):
return False
if self.type != 'system' and not ext:
self['cookie'] = fcookie
self['extcookie'] = fextcookie
if mark:
# no futimes in python?
try:
os.utime(repopath, None)
except Exception, e:
pass
except IOError, e:
return False
return True
def get_ext_cookie(self, f):
chksum = solv.Chksum(solv.REPOKEY_TYPE_SHA256)
chksum.add(self['cookie'])
if f:
st = os.fstat(f.fileno())
chksum.add(str(st[stat.ST_DEV]))
chksum.add(str(st[stat.ST_INO]))
chksum.add(str(st[stat.ST_SIZE]))
chksum.add(str(st[stat.ST_MTIME]))
extcookie = chksum.raw()
# compatibility to c code
if ord(extcookie[0]) == 0:
extcookie = chr(1) + extcookie[1:]
self['extcookie'] = extcookie
def write_cached_repo(self, ext, info=None):
try:
if not os.path.isdir(self.md_cache_dir):
os.mkdir(self.md_cache_dir, 0755)
(fd, tmpname) = tempfile.mkstemp(prefix='.newsolv-',
dir=self.md_cache_dir)
os.fchmod(fd, 0444)
f = os.fdopen(fd, 'w+')
if not info:
self.handle.write(f)
elif ext:
info.write(f)
else: # rewrite_repos case
self.handle.write_first_repodata(f)
if self.type != 'system' and not ext:
if 'extcookie' not in self:
self.get_ext_cookie(f)
f.write(self['extcookie'])
if not ext:
f.write(self['cookie'])
else:
f.write(self['extcookie'])
f.close()
if self.handle.iscontiguous():
# Switch to saved repo to activate paging and save memory.
nf = solv.xfopen(tmpname)
if not ext:
# Main repository.
self.handle.empty()
if not self.handle.add_solv(nf, solv.Repo.SOLV_ADD_NO_STUBS):
sys.exit("Internal error, cannot reload solv file.")
else:
# Extension repodata.
# Need to extend to repo boundaries, as this is how
# info.write() has written the data.
info.extend_to_repo()
# LOCALPOOL does not help as pool already contains all ids
info.add_solv(nf, solv.Repo.REPO_EXTEND_SOLVABLES)
solv.xfclose(nf)
os.rename(tmpname, self.cache_path(ext))
except IOError, e:
if tmpname:
os.unlink(tmpname)
def update_added_provides(self, addedprovides):
if 'incomplete' in self:
return
if 'handle' not in self:
return
if self.handle.isempty():
return
# Make sure there's just one real repodata with extensions.
repodata = self.handle.first_repodata()
if not repodata:
return
oldaddedprovides = repodata.lookup_idarray(solv.SOLVID_META,
solv.REPOSITORY_ADDEDFILEPROVIDES)
if not set(addedprovides) <= set(oldaddedprovides):
for id in addedprovides:
repodata.add_idarray(solv.SOLVID_META,
solv.REPOSITORY_ADDEDFILEPROVIDES, id)
repodata.internalize()
self.write_cached_repo(None, repodata)
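# MetadataRepo handles rpm-md repositories: it downloads repodata/repomd.xml,
# loads the primary and updateinfo metadata, and registers filelists and
# deltainfo as extensions that are loaded lazily via load_stub().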
class MetadataRepo(GenericRepo):
def load_if_changed(self):
log.write("Checking rpmmd repo '{0}'.\n".format(self.name))
sys.stdout.flush()
f = self.download("repodata/repomd.xml", False)
if not f:
log.write(" - no repomd.xml file, skipping\n")
self.handle.free(True)
del self.handle
return False
# Calculate a cookie from repomd contents.
chksum = solv.Chksum(solv.REPOKEY_TYPE_SHA256)
chksum.add_fp(f)
self['cookie'] = chksum.raw()
if self.use_cached_repo(None, True):
log.write(" - using cached metadata\n")
solv.xfclose(f)
return True
os.lseek(solv.xfileno(f), 0, os.SEEK_SET)
self.handle.add_repomdxml(f, 0)
solv.xfclose(f)
log.write(" - fetching metadata\n")
(filename, filechksum) = self.find('primary')
if filename:
f = self.download(filename, True, [filechksum], True)
if f:
self.handle.add_rpmmd(f, None, 0)
solv.xfclose(f)
if 'incomplete' in self:
return False # Hopeless, need good primary.
(filename, filechksum) = self.find('updateinfo')
if filename:
f = self.download(filename, True, [filechksum], True)
if f:
self.handle.add_updateinfoxml(f, 0)
solv.xfclose(f)
self.add_exts()
if 'incomplete' not in self:
self.write_cached_repo(None)
# Must be called after writing the repo.
self.handle.create_stubs()
return True
def find(self, what):
di = self.handle.Dataiterator(solv.SOLVID_META,
solv.REPOSITORY_REPOMD_TYPE, what,
solv.Dataiterator.SEARCH_STRING)
di.prepend_keyname(solv.REPOSITORY_REPOMD)
for d in di:
d.setpos_parent()
filename = d.pool.lookup_str(solv.SOLVID_POS,
solv.REPOSITORY_REPOMD_LOCATION)
chksum = d.pool.lookup_checksum(solv.SOLVID_POS,
solv.REPOSITORY_REPOMD_CHECKSUM)
if filename and not chksum:
log.write("Error: no {0} file checksum!\n".format(filename))
filename = None
chksum = None
if filename:
return (filename, chksum)
return (None, None)
def add_ext(self, repodata, what, ext):
filename, chksum = self.find(what)
if not filename and what == 'deltainfo':
filename, chksum = self.find('prestodelta')
if not filename:
return
handle = repodata.new_handle()
repodata.set_poolstr(handle, solv.REPOSITORY_REPOMD_TYPE, what)
repodata.set_str(handle, solv.REPOSITORY_REPOMD_LOCATION, filename)
repodata.set_checksum(handle, solv.REPOSITORY_REPOMD_CHECKSUM, chksum)
if ext == 'DL':
repodata.add_idarray(handle, solv.REPOSITORY_KEYS,
solv.REPOSITORY_DELTAINFO)
repodata.add_idarray(handle, solv.REPOSITORY_KEYS,
solv.REPOKEY_TYPE_FLEXARRAY)
elif ext == 'FL':
repodata.add_idarray(handle, solv.REPOSITORY_KEYS,
solv.SOLVABLE_FILELIST)
repodata.add_idarray(handle, solv.REPOSITORY_KEYS,
solv.REPOKEY_TYPE_DIRSTRARRAY)
repodata.add_flexarray(solv.SOLVID_META, solv.REPOSITORY_EXTERNAL, handle)
def add_exts(self):
repodata = self.handle.add_repodata(0)
self.add_ext(repodata, 'deltainfo', 'DL')
self.add_ext(repodata, 'filelists', 'FL')
repodata.internalize()
def load_ext(self, repodata):
repomdtype = repodata.lookup_str(solv.SOLVID_META,
solv.REPOSITORY_REPOMD_TYPE)
if repomdtype == 'filelists':
ext = 'FL'
elif repomdtype == 'deltainfo':
ext = 'DL'
else:
return False
log.write("Loading extended metadata {1} for {0}.\n".format(
self.name, repomdtype))
if self.use_cached_repo(ext):
log.write(" - found recent copy in cache\n")
return True
log.write(" - fetching\n")
filename = repodata.lookup_str(solv.SOLVID_META,
solv.REPOSITORY_REPOMD_LOCATION)
filechksum = repodata.lookup_checksum(solv.SOLVID_META,
solv.REPOSITORY_REPOMD_CHECKSUM)
f = self.download(filename, True, [filechksum])
if not f:
return False
if ext == 'FL':
self.handle.add_rpmmd(f, 'FL', solv.Repo.REPO_USE_LOADING |
solv.Repo.REPO_EXTEND_SOLVABLES)
elif ext == 'DL':
self.handle.add_deltainfoxml(f, solv.Repo.REPO_USE_LOADING)
solv.xfclose(f)
self.write_cached_repo(ext, repodata)
return True
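# Pool load callback: called by libsolv when extension metadata (filelists,
# deltainfo) registered by add_exts() is needed.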
def load_stub(repodata):
repo = repodata.repo.appdata
if repo:
return repo.load_ext(repodata)
return False
# Process command line arguments
cmdline_parser = argparse.ArgumentParser(
    description='Get packages for coredump processing.')
cmdline_parser.add_argument('--repos', default=['*'], nargs='+',
metavar='WILDCARD',
help='Repositories (wildcard) to be enabled')
cmdline_parser.add_argument('coredump', help='Coredump')
cmdline_parser.add_argument('--log', metavar='FILENAME',
                            help='Store debug output to a file (it is possible '
                                 'to use "stdout" and "stderr")')
cmdline_parser.add_argument("--md-cache-dir", metavar="DIR",
default="/var/cache/solv",
help="Directory to store repository metadata
cache.")
FIELDS_CHOICES = ["packages", "components",
"program_packages", "program_components",
"installable_packages", "installable_components",
"installable_program_package",
"installable_program_component",
"full_installable_packages",
"missing_build_ids", "missing_packages"]
cmdline_parser.add_argument('--fields', nargs='+',
choices=FIELDS_CHOICES,
default=["installable_packages"],
help="Coredump information to be displayed")
cmdline_args = cmdline_parser.parse_args()
# Set up logging
if cmdline_args.log == "stdout":
log = sys.stdout
elif cmdline_args.log == "stderr":
log = sys.stderr
elif cmdline_args.log is not None:
log = open(cmdline_args.log, "w")
else:
log = open("/dev/null", "w")
# Create solver package pool.
pool = solv.Pool()
pool.setarch(os.uname()[4])
pool.set_loadcallback(load_stub)
# Read all repo configs
repos = []
for reponame in sorted(glob.glob("/etc/yum.repos.d/*.repo")):
text = open(reponame).read()
# TODO: properly handle releasever and basearch. To get
# releasever run `rpm -q --queryformat '%{VERSION}'
# fedora-release`. Basearch should come from `uname -i`.
text = text.replace("$releasever",
"14").replace("$basearch", "i386")
textfile = StringIO.StringIO(text)
cfg = iniparse.INIConfig(textfile)
for alias in cfg:
        repoattr = {'enabled': 0, 'priority': 99, 'autorefresh': 1,
                    'type': 'rpm-md', 'metadata_expire': 900}
for k in cfg[alias]:
repoattr[k] = cfg[alias][k]
if 'mirrorlist' in repoattr and 'metalink' not in repoattr:
            if repoattr['mirrorlist'].find('/metalink') >= 0:
repoattr['metalink'] = repoattr['mirrorlist']
del repoattr['mirrorlist']
if repoattr['type'] == 'rpm-md':
repo = MetadataRepo(alias, 'repomd', cmdline_args.md_cache_dir,
repoattr)
else:
sys.exit("Unknown repository type for {0}.".format(alias))
repos.append(repo)
# Enable repositories depending on a wildcard. Ignore enabled attribute.
for repo in repos:
for repo_wildcard in cmdline_args.repos:
if fnmatch.fnmatch(repo.name, repo_wildcard):
repo.load(pool)
# Update pool
addedprovides = pool.addfileprovides_ids()
if addedprovides:
for repo in repos:
repo.update_added_provides(addedprovides)
pool.createwhatprovides()
# Get eu-unstrip output, which contains build-ids and binary object
# paths
log.write("Running eu-unstrip...\n")
unstrip_args = ['eu-unstrip', '--core={0}'.format(cmdline_args.coredump),
'-n']
unstrip_proc = subprocess.Popen(unstrip_args, stdout=subprocess.PIPE)
unstrip = unstrip_proc.communicate()[0]
log.write("{0}\n".format(unstrip))
if not unstrip:
sys.exit("Missing output from eu-unstrip.")
class UnstripOutput:
"""
Represents a parsed output of `eu-unstrip -n --core=coredump`. It
contains a list of lines with build ids and paths, where the first
line corresponds to the executable program and the rest are
dynamic libraries.
"""
def __init__(self):
self.lines = []
self.program_entry_line = None
def missing_build_ids(self):
build_ids = []
for line in self.lines:
empty = True
for bin_packages in line.debuginfo_packages.values():
if len(bin_packages) > 0:
empty = False
break
if empty:
build_ids.append((line.build_id, line.binobj_path))
return build_ids
def missing_packages(self):
packages = []
for line in self.lines:
for debuginfo, bin_packages in line.debuginfo_packages.items():
if len(bin_packages) == 0:
packages.append((debuginfo, line.binobj_path))
return packages
def packages(self):
"""
Build a list of all packages, including debuginfo. There might
be several versions of single package.
"""
packages = set()
for line in self.lines:
packages |= set(line.debuginfo_packages.keys())
for binary_packages in line.debuginfo_packages.values():
packages |= set(binary_packages)
return packages
def program_packages(self):
packages = set()
for binary_packages in self.program_entry_line.debuginfo_packages.values():
packages |= set(binary_packages)
return packages
def build_install_variants(self):
for line in self.lines:
line.build_install_variants()
self.install_variants_list = [line.install_variants for line in self.lines
if len(line.install_variants) > 0]
def max_install_variant_len(self):
max_list = set()
for install_variants in self.install_variants_list:
max_list |= set(max(install_variants, key=len))
return len(max_list)
class UnstripLine:
"""
Represents a line from `eu-unstrip -n --core=coredump` output,
including debuginfo packages that match the build id and
corresponding packages containing the binary.
"""
def __init__(self, build_id, binobj_path):
self.build_id = build_id
self.binobj_path = binobj_path
self.debuginfo_packages = {}
def build_install_variants(self):
self.install_variants = []
for debuginfo, binary_packages in self.debuginfo_packages.items():
if len(binary_packages) == 0:
self.install_variants.append([debuginfo])
else:
for package in binary_packages:
self.install_variants.append([debuginfo, package])
@functools.total_ordering
class Package:
"""
Represents a single package from repository.
"""
def __init__(self, solvable):
self.solvable = solvable
def source_name(self):
source_name_id = self.solvable.lookup_id(solv.SOLVABLE_SOURCENAME)
        return (pool.id2str(source_name_id) if source_name_id > 0
                else self.solvable.name)
def nevra(self):
return "{0}-{1}.{2}".format(self.solvable.name, self.solvable.evr,
self.solvable.arch)
def download(self):
location = self.solvable.lookup_location()[0]
chksum = self.solvable.lookup_checksum(solv.SOLVABLE_CHECKSUM)
return self.solvable.repo.appdata.download(location, False, [chksum])
# Oneliners
name = lambda self: self.solvable.name
evr = lambda self: self.solvable.evr
arch = lambda self: self.solvable.arch
__str__ = lambda self: self.nevra()
__repr__ = lambda self: "<Package {0}>".format(self.nevra())
__eq__ = lambda self, other: self.nevra() == other.nevra()
__lt__ = lambda self, other: self.nevra() < other.nevra()
__hash__ = lambda self: hash(self.nevra())
# Parse the build-ids and paths to a structure. Store the build id of
# a program. Build_id is an unique hash of a binary. Single identical
# binary might be located in several packages, or in several versions
# of single package.
unstrip_output = UnstripOutput()
for line in unstrip.split('\n'):
parts = line.split()
if not parts or len(parts) < 3:
continue
build_id = parts[1].split('@')[0]
binobj_path = parts[2]
if binobj_path[0] != '/' and parts[4] != '[exe]':
continue
unstrip_output.lines.append(UnstripLine(build_id, binobj_path))
if unstrip_output.program_entry_line is None:
unstrip_output.program_entry_line = unstrip_output.lines[-1]
# Find debuginfo packages corresponding to every build id. Create a
# new structure for this. Debuginfo_package is a structure with epoch,
# ver, rel, and arch attributes.
for line in unstrip_output.lines:
# Ask for a known path from debuginfo package.
debuginfo_path = "/usr/lib/debug/.build-id/{0}/{1}.debug".format(
line.build_id[:2], line.build_id[2:])
log.write("Searching repositories for {0}.\n".format(debuginfo_path))
debuginfo_package_list = pool.Dataiterator(
0, solv.SOLVABLE_FILELIST, debuginfo_path,
solv.Dataiterator.SEARCH_STRING | solv.Dataiterator.SEARCH_FILES |
solv.Dataiterator.SEARCH_COMPLETE_FILELIST)
for package in debuginfo_package_list:
assert Package(package.solvable) not in line.debuginfo_packages
line.debuginfo_packages[Package(package.solvable)] = []
count = len(line.debuginfo_packages.keys())
formatted_packages = [str(package) for package in line.debuginfo_packages.keys()]
log.write(" - found {0} result{1}{2}\n".format(
count,
"s" if count != 1 else "",
": {0}".format(", ".join(formatted_packages)) if count
> 0 else ""))
# For every debuginfo package, find associated binary packages that
# might contain the binary (specified either by a path or build id if
# path is not available). Usually, the list for single debuginfo
# package contains only one item. Empty list means that no package was
# found, this might be caused by a bug in the package. List with
# multiple items indicate that there are multiple packages to select
# from, which indicates either a packaging bug or mutually exclusive
# packages within single component.
for line in unstrip_output.lines:
for debuginfo_package in line.debuginfo_packages:
# Binobj_paths is a list of path strings, that are used to detect
# which package(s) associated with the debuginfo_package contains
# the binary or binaries. It is a list because one build_id might
# correspond to several identical binaries installed to different
# paths when the path is not known from coredump, but retrived
# using build_id. Usually the list contain a single item.
binobj_paths = []
if line.binobj_path == "-": # [exe] without binary name
# Build_id is used to find the binary package when binobj_path
# is "-". Let's download and extract the debuginfo package. We
# need to know the value of the build_id symlink from the
# inside.
log.write("Downloading {0} for
examination.\n".format(debuginfo_package))
temp_dir = tempfile.mkdtemp(prefix="coredump2packages")
downloaded = debuginfo_package.download()
local = os.path.join(temp_dir, debuginfo_package.nevra())
cpio_file = open(local + ".cpio", "wb+")
rpm2cpio_proc = subprocess.Popen(["rpm2cpio", "-"],
stdout=cpio_file,
stdin=solv.xfileno(downloaded))
rpm2cpio_proc.wait()
solv.xfclose(downloaded)
if rpm2cpio_proc.returncode != 0:
sys.exit("Failed to convert RPM to cpio using rpm2cpio.")
cpio_file.seek(0)
unpack_dir = os.path.join(temp_dir, "unpacked")
os.makedirs(unpack_dir)
cpio_proc = subprocess.Popen(["cpio", "--extract",
"-d", "--quiet"],
stdin=cpio_file, cwd=unpack_dir)
cpio_proc.wait()
if cpio_proc.returncode != 0:
sys.exit("Failed to unpack RPM using cpio.")
cpio_file.close()
build_id_reldir = os.path.join("usr", "lib",
"debug", ".build-id",
line.build_id[:2])
build_id_file = os.path.join(unpack_dir, build_id_reldir,
line.build_id[2:])
if os.path.islink(build_id_file):
binary_relpath = os.readlink(build_id_file)
binobj_paths.append("/" + os.path.normpath(os.path.join(
build_id_reldir, binary_relpath)))
for i in range(1, 8):
build_id_file_i = "{0}.{1}".format(build_id_file, i)
if os.path.islink(build_id_file_i):
binary_relpath = os.readlink(build_id_file_i)
binobj_paths.append("/" + os.path.normpath(os.path.join(
build_id_reldir, binary_relpath)))
shutil.rmtree(temp_dir)
log.write("Found exe paths {0}\n".format(binobj_paths))
else:
binobj_paths = [line.binobj_path]
for binobj_path in binobj_paths:
log.write("Searching for {0}.\n".format(binobj_path))
binobj_package_list = pool.Dataiterator(
0, solv.SOLVABLE_FILELIST, binobj_path,
solv.Dataiterator.SEARCH_STRING | solv.Dataiterator.SEARCH_FILES |
solv.Dataiterator.SEARCH_COMPLETE_FILELIST)
for binobj_package_solvable in binobj_package_list:
binobj_package = Package(binobj_package_solvable.solvable)
log.write(" - {0}".format(binobj_package))
if debuginfo_package.source_name() != binobj_package.source_name() or \
debuginfo_package.evr() != binobj_package.evr():
log.write(": EVR or base package name doesn't match {0}
{1}\n".format(
debuginfo_package.source_name(), debuginfo_package.evr()))
continue
log.write(": EVR and base package name matches\n")
line.debuginfo_packages[debuginfo_package].append(binobj_package)
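# Restrict solver jobs to a concrete EVR or architecture so that exactly the
# package versions referenced from the coredump are requested for install.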
def limit_jobs(pool, jobs, flags, evrstr):
njobs = []
evr = pool.str2id(evrstr)
for j in jobs:
how = j.how
sel = how & solv.Job.SOLVER_SELECTMASK
what = pool.rel2id(j.what, evr, flags)
if flags == solv.REL_ARCH:
how |= solv.Job.SOLVER_SETARCH
elif flags == solv.REL_EQ and sel == solv.Job.SOLVER_SOLVABLE_NAME:
if evrstr.find('-') >= 0:
how |= solv.Job.SOLVER_SETEVR
else:
how |= solv.Job.SOLVER_SETEV
njobs.append(pool.Job(how, what))
return njobs
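# Translate a package name (or glob pattern) into solver jobs: exact name
# match first, then capability match, then name/dependency globs.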
def dep_glob(pool, name, globname, globdep):
id = pool.str2id(name, False)
if id:
match = False
for s in pool.whatprovides(id):
if globname and s.nameid == id:
return [pool.Job(solv.Job.SOLVER_SOLVABLE_NAME, id)]
match = True
if match:
if globname and globdep:
log.write("[using capability match for
'{0}']\n".format(name))
return [pool.Job(Job.SOLVER_SOLVABLE_PROVIDES, id)]
if not re.search(r'[[*?]', name):
return []
if globname:
# try name glob
idmatches = {}
for d in pool.Dataiterator(0, solv.SOLVABLE_NAME, name,
solv.Dataiterator.SEARCH_GLOB):
s = d.solvable
if s.installable():
idmatches[s.nameid] = True
if idmatches:
            return [pool.Job(solv.Job.SOLVER_SOLVABLE_NAME, id)
                    for id in sorted(idmatches.keys())]
if globdep:
# try dependency glob
idmatches = pool.matchprovidingids(name, solv.Dataiterator.SEARCH_GLOB)
if idmatches:
log.write("[using capability match for
'{0}']\n".format(name))
return [pool.Job(Job.SOLVER_SOLVABLE_PROVIDES, id)
for id in sorted(idmatches)]
return []
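# Search for the largest installable package set: every eu-unstrip line
# offers several install variants (debuginfo alone, or debuginfo plus one of
# the matching binary packages); iterate over the Cartesian product of these
# variants and keep the largest combination the solver can resolve.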
unstrip_output.build_install_variants()
max_len = unstrip_output.max_install_variant_len()
best_installable_variant = None
best_installable_variant_len = 0
for variant in itertools.product(*unstrip_output.install_variants_list):
log.write("Solving an installation variant.\n")
flat_variant = reduce(lambda a,p:a|set(p), variant, set())
jobs = []
for package in flat_variant:
mid_jobs = dep_glob(pool, package.name(), True, True)
if mid_jobs:
jobs += limit_jobs(pool, mid_jobs, solv.REL_EQ, package.evr())
jobs += limit_jobs(pool, mid_jobs, solv.REL_ARCH, package.arch())
continue
for job in jobs:
job.how |= solv.Job.SOLVER_INSTALL
solver = pool.Solver()
problems = solver.solve(jobs)
if problems:
log.write(" - this variant is unresolvable\n")
for problem in problems:
log.write(" -
{0}\n".format(problem.findproblemrule().info().problemstr()))
del solver
continue
trans = solver.transaction()
del solver
assert not trans.isempty()
assert len(trans.classify()) == 1
installables = trans.classify()[0]
assert installables.type == solv.Transaction.SOLVER_TRANSACTION_INSTALL
if len(flat_variant) > best_installable_variant_len:
log.write(" - replacing older best installable variant\n")
best_installable_variant = (flat_variant, installables)
best_installable_variant_len = len(flat_variant)
else:
log.write(" - keeping older best installable variant\n")
if best_installable_variant_len == max_len:
log.write(" - found best variant, terminating\n")
break
else:
log.write(" - current best variant has {0} packages, maximum is
{1}\n".format(
best_installable_variant_len, max_len))
def to_components(packages):
return set([package.source_name() for package in packages])
# Format and print output. Formatting depends on --fields command line
# argument.
output = []
for field in cmdline_args.fields:
array_formatted = lambda x:["{0}\n".format(item) for item in sorted(x)]
    single_formatted = lambda x: ("".join(array_formatted(x))
                                  if len(x) > 0 else "-\n")
if field == "packages":
output.append(single_formatted(unstrip_output.packages()))
elif field == "components":
output.append(single_formatted(to_components(unstrip_output.packages())))
elif field == "program_packages":
output.append(single_formatted(unstrip_output.program_packages()))
elif field == "program_components":
output.append(single_formatted(to_components(unstrip_output.program_packages())))
elif field == "installable_packages":
output.append(single_formatted(best_installable_variant[0]))
elif field == "installable_components":
output.append(single_formatted(to_components(best_installable_variant[0])))
elif field == "installable_program_component":
comp = to_components(best_installable_variant[0] &
unstrip_output.program_packages())
assert len(comp) <= 1
output.append(single_formatted(comp))
elif field == "installable_program_package":
package = best_installable_variant[0] & unstrip_output.program_packages()
assert len(package) <= 1
output.append(single_formatted(package))
elif field == "full_installable_packages":
output.append(single_formatted(best_installable_variant[1].solvables()))
elif field == "missing_build_ids":
formatted = "".join(["{0} {1}\n".format(build_id, path)
for build_id, path in unstrip_output.missing_build_ids()])
output.append(formatted if len(formatted) > 0 else "-\n")
elif field == "missing_packages":
formatted = "".join(["{0} {1}\n".format(package, path)
for package, path in unstrip_output.missing_packages()])
output.append(formatted if len(formatted) > 0 else "-\n")
else:
sys.exit("Unknown field '{0}'.".format(field))
sys.stdout.write("\n".join(output))
coredump2packages(1)
====================
NAME
----
coredump2packages - Extract package information from a coredump.
SYNOPSIS
--------
'coredump2packages' COREDUMP [--repos WILDCARD...] [--log FILENAME] [--fields
FIELD...]
DESCRIPTION
-----------
This tool matches a coredump to packages from the provided repositories.
OPTIONS
-------
--repos WILDCARD...::
Names of repositories that should be searched. Multiple wildcards or
repository names can be specified. Default value is '*' (search all
repositories, including disabled ones).
--log FILENAME::
Store debugging information to the specified file. When 'stdout' or
'stderr' is provided as FILENAME, the debugging output is redirected
to standard output or standard error output, respectively.
--md-cache-dir DIR::
Directory to store repository metadata cache, which speeds up
repository metadata initialization. If the provided directory does
not exist, it is created.
--fields FIELD...::
Specifies the output of the tool. The order of provided fields is
used for coredump2packages output. The fields are separated by an
empty line in the output. An empty field output is indicated by a
single line containing the '-' character. Default value is
'installable_packages'.
Fields
~~~~~~
'packages'::
Prints all packages (name-version-release) that are possibly
referenced from the coredump. The list includes multiple versions of
debuginfo packages if they match a build_id, and multiple versions of
binary packages that correspond to debuginfo packages.
'components'::
Prints all components that are possibly referenced from the
coredump.
'program_packages'::
Prints a list of packages that contain the binary referenced by the
coredump with the matching build_id.
'program_components'::
Prints a list of components whose packages contain the binary
referenced by the coredump via build_id.
'installable_packages'::
Prints a list of packages that are referenced from the coredump and
that are installable. This means there are no conflicts between them
or their dependencies.
'installable_components'::
Prints a list of components covering the installable packages.
'installable_program_package'::
Prints a package containing the binary referenced from the coredump.
'installable_program_component'::
Prints a component that contains the binary referenced from the
coredump.
'full_installable_packages'::
Prints a full installable set of packages that can be used to set up
an environment for coredump analysis with GDB in a chroot.
'missing_build_ids'::
Prints a list of build ids from the coredump for which no
corresponding debuginfo package was found in the provided
repositories. Every line contains a build id and a path to the binary.
'missing_packages'::
Prints a list of debuginfo packages that were referenced from the
coredump, but whose binary was not found in the corresponding binary
packages. In other words, the debugging symbols are available, but
the binary is not. Every line contains a debuginfo package
name-version-release and the path to the binary that was not found.
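For example, the following command prints the installable packages and
the missing build ids, in that order, separated by an empty line:

    coredump2packages COREDUMP --fields installable_packages missing_build_ids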
AUTHORS
-------
* Karel Klic