A more efficient up2date service using binary diffs

Jeff Johnson n3npq at nc.rr.com
Mon Mar 14 13:32:28 UTC 2005


Thomas Hille wrote:

>On Monday, 14.03.2005, at 08:18 +0100, Florian La Roche wrote:
>  
>
>>On Mon, Mar 14, 2005 at 01:59:07AM +0000, Joe Desbonnet wrote:
>>    
>>
>>>I have more results from my experiments in RPM delta compression. I've posted
>>>the results so far here: http://www.wombat.ie/software/rpmdc/index.shtml
>>>
>>>Conclusion so far: assuming someone has the distribution RPMs available,
>>>an entire update repository (about 1GB) can be generated from 200MB of
>>>files.
>>>
>>>I hope to post my code once I clean it up a bit (it's implemented in
>>>Java currently).
>>>
>>>Must check out rdiff also...
>>>      
>>>
>>Some time ago I ran tests that showed bandwidth needs for RHEL update
>>releases could be reduced by a factor of 4.8. The big drawback is that the
>>previous packages are needed, so this might again only be useful for a local
>>server that downloads updates, not for normal client machines.
>>Still, the savings look very nice, so I think we should continue looking at
>>this.
>>    
>>
>
>Just to give my 2 cents....
>
>The drawback you mention could be eliminated if you diff not the whole
>rpm but the individual files in it, since these are already present on the
>client machine. The only problem that then arises is corrupted files, so
>you would need to check each file's md5 before applying the diff.
>  
>
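The md5 check suggested above could look roughly like this. It is a minimal
sketch in Python, assuming the update metadata carries the expected MD5 of
each installed file; the function names and the expected_md5 parameter are
hypothetical and not part of rpm or up2date.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safe_to_patch(path, expected_md5):
    """Return True only if the installed file exists and matches the
    digest the per-file delta was built against (hypothetical input)."""
    try:
        return file_md5(path) == expected_md5.lower()
    except OSError:
        # Missing or unreadable file: a delta cannot be applied to it.
        return False
```

A client would call safe_to_patch() for every file a delta touches and fall
back to downloading the full package if any check fails.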

Yep. All this has been known since 1998, see the rpm-list at redhat.com archives.

Josh MacDonald, the xdelta guy, even had a proof-of-concept implementation
in (iirc) xdelta-0.18 for *.rpm packages.

Oops, Red Hat chose not to maintain the <rpm-list at redhat.com> archives.
Too bad.

>Maybe doing the diff on the individual files could also help compress the
>rpms that were not compressible when diffing the whole rpm (omni-foomatic
>etc.)
>
>Nevertheless, the recent OOo update makes me believe that we should
>really think about anything that reduces bandwidth (even in a time of
>common broadband access).
>  
>

I ask:

    If you have to rip apart a *.rpm package into its components in order
    to achieve the goal of minimal bandwidth used to transfer a package,
    then what exactly is the point of putting the contents in a *.rpm
    package in the first place?

73 de Jeff




