* Zbigniew Jędrzejewski-Szmek:
On Mon, Jun 03, 2019 at 02:59:13PM +0200, Florian Weimer wrote:
> * Zbigniew Jędrzejewski-Szmek:
>
> > On Mon, May 27, 2019 at 09:13:50PM +0200, Florian Weimer wrote:
> >> * Tomasz Kłoczko:
> >>
> >> > On Mon, 27 May 2019 at 10:41, Florian Weimer <fweimer(a)redhat.com> wrote:
> >> >> I'm investigating whether it makes sense to switch to a scheme
> >> >> where the glibc locale data is built from source, during package
> >> >> installation, based on the langpack configuration system. This is
> >> >> similar to what Debian does.
> >> >>
> >> >> The reason is that the compressed locale source code (without the
> >> >> charmaps, which are not strictly needed once we patch localedef)
>
> > Can you expand a bit on this part about patch?
>
> localedef currently reads character conversion tables from charmap files
> under /usr/share/i18n/charmaps. The same information is contained in
> the gconv modules unconditionally installed under /usr/lib*/gconv.
>
> > Do I understand correctly, that the saving essentially comes from the fact
> > that current glibc-langpack-en contains 14 localized variants (AU, BW, ZA,
> > US, ...), and only a subset of those could be generated in your proposal?
> > If so, would simply splitting glibc-langpack-en further into subpackages
> > be an alternative? E.g. glibc-langpack-en-US, glibc-langpack-en-AU,
> > ... ?
>
> In theory, yes, but that would result in a few dozen more langpack
> packages.
>
> The other variable is the supported charset (UTF-8, or a single-byte
> charset such as ISO-8859-1 or ISO-8859-15).
Hmm, so maybe that's the way to go: split each langpack into
glibc-langpack-XX and glibc-langpack-XX-legacy. Not installing -legacy
will halve the disk usage, no?

This will nearly double the number of langpack packages needed by glibc.
We also use hard links to share identical files across locales: compare
the output of “du -hcs /usr/lib/locale/en_*”, “du -hcsl
/usr/lib/locale/en_*”, “du -hcs /usr/lib/locale/en_US.utf8/” and finally
“du -hcs /usr/lib/locale/en_US{,.utf8}/”.

In short, there's 6.7 MiB today, 2.9 MiB for UTF-8 only, and 3.2 MiB for
UTF-8 and ISO-8859-1. (I don't think skipping en_US is realistic.)
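
For anyone who wants to see the hard-link effect without reading du
output, here is a minimal sketch in Python. The helper name tree_size is
made up for illustration, and it sums apparent file sizes (st_size)
rather than allocated blocks, so the totals will not match du exactly.
With count_links=False a file shared through hard links is counted once
(roughly plain du); with count_links=True every link is counted (roughly
du -l).

```python
import glob
import os


def tree_size(roots, count_links=False):
    """Sum apparent file sizes (in bytes) of all files under `roots`.

    Files are identified by (device, inode). Unless count_links is set,
    a file reachable through several hard links is counted only once.
    """
    seen = set()
    total = 0
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                # lstat: do not follow symlinks, just measure the entry.
                st = os.lstat(os.path.join(dirpath, name))
                key = (st.st_dev, st.st_ino)
                if not count_links and key in seen:
                    continue  # another hard link to a file already counted
                seen.add(key)
                total += st.st_size
    return total


if __name__ == "__main__":
    # Assumes the locale directories live under /usr/lib/locale,
    # as in the du commands above.
    roots = glob.glob("/usr/lib/locale/en_*")
    print("links counted once:", tree_size(roots))
    print("every link counted:", tree_size(roots, count_links=True))
```

The gap between the two totals is the amount that hard links save; a
per-country or per-charset split only saves the part that is not
already shared.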
Thanks,
Florian