No more deltarpms by default

Nico Kadel-Garcia nkadel at gmail.com
Sun Oct 19 04:37:04 UTC 2014


On Sat, Oct 18, 2014 at 8:28 PM, Reindl Harald <h.reindl at thelounge.net> wrote:
>
>
> On 19.10.2014 at 02:19, Solomon Peachy wrote:
>>
>> On Sat, Oct 18, 2014 at 07:00:19PM -0400, Nico Kadel-Garcia wrote:
>>>
>>>    3) People who have a lot of hosts and high bandwidth, high speed
>>> local deployment requirements can, and do, set up an internal Fedora
>>> mirror with much lower bandwidth costs. This reduces the tangible
>>> benefits of deltarpms significantly. This is combined with my direct
>>
>>
>> Folks that have that sort of environment also typically use kickstart to
>> set systems up, and can trivially disable deltarpms in the process.
>
>
> Those people don't fall into that category anyway.
>
> They just reuse the output of deltarpm in /var/cache/yum to build up their
> local repos, and even in that case they benefit from the one-time download
> savings. Keep in mind that what ends up in /var/cache/yum/, from which you
> can build up your local repos, is the full RPM.
>
> And that is why the current implementation of deltarpm is perfectly
> designed, and any improvement needs to happen at a different layer.
>
> The few lines below are enough to use createrepo and build up a local cache
> without mirroring the whole upstream; you just need one machine with every
> package you use installed on it. It has worked perfectly for over 6 years,
> including dist-upgrades, *and* benefits from deltarpm in the first step:
>
> #!/usr/bin/bash
> # Harvest RPMs left in the yum cache into a local repo, then rebuild it
> basearch=$(uname -i)
> releasever=$(rpm -q --qf "%{version}\n" fedora-release)
> dest="/repo/cache/fc${releasever}"
> for g in /var/cache/yum/*/
> do
>  if [ -d "${g}packages" ]
>  then
>   echo "${g}packages/ > ${dest}/"
>   sudo mv --verbose "${g}packages/"*.rpm "${dest}/" 2> /dev/null
>  fi
> done
> /buildserver/repo-create.sh

And that only works on one machine, and it is sensitive to exclusions
written into yum.conf or /etc/yum.repos.d/*, etc., etc., etc. You also
forgot to use 'gpgcheck' to verify the contents of the already
downloaded RPMs, since 'reposync' does not distinguish successful from
partial downloads without it.
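For RPMs that are already sitting on disk, a rough equivalent of that check can be sketched with `rpm -K` (`--checksig`), which verifies each package's digests and signatures; a truncated download fails the digest check. This is only a minimal sketch: the function name `find_bad_rpms` is hypothetical, and it assumes the repository's GPG key has already been imported with `rpm --import`, since otherwise correctly signed packages are likely to be flagged as well.

```shell
#!/usr/bin/bash
# Sketch: list RPMs in a directory that fail "rpm -K" (digest/signature
# check), e.g. truncated partial downloads. Hypothetical helper name;
# assumes the repo's GPG key was already imported with "rpm --import".
find_bad_rpms() {
    local dir="$1" f
    for f in "$dir"/*.rpm; do
        [ -e "$f" ] || continue          # glob matched nothing: skip
        if ! rpm -K "$f" > /dev/null 2>&1; then
            echo "$f"                    # failed verification: report it
        fi
    done
}
```

For example, `find_bad_rpms /var/cache/yum/updates/packages` (the repo directory name is only an example) prints every cached package that fails, so it can be deleted and fetched again before being copied into a local repo.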

My old script for doing reposync mirrors is at
https://github.com/nkadel/nkadel-rsync-scripts/blob/master/reposync-rhel.sh,
written for RHEL SRPM mirroring for CentOS work, and it's quite
suitable for synchronizing third-party repositories without 'rsync'
access. My old tools for rsyncing other repositories are in the same
GitHub repo; feel free to grab them if you need them.

Anyway. Now multiply your 'reposync'-based tool by 200 thin-provisioned
virtual machines, and you're chewing up quite a lot of disk space doing
this. And the "bandwidth saving" is still badly overshadowed by the
repodata bandwidth that almost every run of "reposync" needs in the model
you've just described: 50 megabytes, *every time* it runs, unless the
repo's cached metadata has not yet expired. Since the default expiry is
90 minutes, if you're running any of the "keep me updated or alerted all
the time" tools, that's (24 hours / 90 minutes) * 50 MB, or up to 800
MB/day. Even if it's only run nightly, it's 50 MB/day with *no* updates
applied. That overwhelms the benefits of deltarpms pretty quickly.
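The arithmetic above can be checked with a small sketch. It assumes, as the post does, that every metadata expiry triggers a full 50 MB repodata download at a fixed expiry interval; the function name `metadata_mb_per_day` is hypothetical, and it uses integer shell arithmetic only.

```shell
#!/usr/bin/bash
# Sketch: daily repodata traffic if every metadata expiry forces a full
# re-download. Hypothetical helper; integer arithmetic, result in MB.
metadata_mb_per_day() {
    local expire_minutes="$1" repodata_mb="$2"
    local refreshes=$(( 24 * 60 / expire_minutes ))   # refreshes per day
    echo $(( refreshes * repodata_mb ))
}

metadata_mb_per_day 90 50     # 16 refreshes * 50 MB, prints 800
metadata_mb_per_day 1440 50   # nightly run: 1 refresh * 50 MB, prints 50
```

Multiplying the result by the number of hosts that each fetch repodata independently gives the total daily metadata traffic for a fleet.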

Sorry to rain on the parade about the benefits of deltarpms, but it's
like wearing waterproof boots when you're standing in the shower. It's
just an extra burden when the repodata downloads are already eating
significantly more bandwidth than all but the largest RPM downloads.

