Duplicated files in the pristine FC4t2 installation

Mike A. Harris mharris at www.linux.org.uk
Tue May 3 00:35:34 UTC 2005


Peter Jones wrote:
> On Mon, 2005-05-02 at 12:35 -0700, Roland McGrath wrote:
> 
>>>Roland McGrath wrote:
>>>
>>>>I think what one clearly wants is for rpm to maintain an installed-file
>>>>index keyed by md5sum.  Then you can have a tool that just uses this
>>>>database to identify duplicates (and doesn't take forever), or have rpm do
>>>>so itself when installing new files.
>>>>
>>>
>>>Hmm, what about hash collisions, that would be really really BAD
>>
>>If you are concerned about them you can still compare contents before
>>declaring two files identical.  But using the hashes as the main detector
>>makes it fast, since you only examine the data of files that are 99.999%
>>likely to be identical.
> 
> 
> And in the vast majority of cases, there's a simpler heuristic you can
> use first: is the basename the same?
> 
> But really, this is 160MB of wasted space.  We don't support installing
> onto USB, so from glancing at pricewatch, the smallest disk they list
> that we support installing onto would appear to be an 18GB SCSI drive
> for $23.  There are larger, cheaper drives, too.
> 
> So we're talking about saving just under 1% of the least-desirable
> supported install target currently being sold.  Let's just stop?

Seconded.  The time spent on this thread would be better spent
looking at the top 10 real space wasters and finding solutions to
those.  Even then, personally I believe that if we could cut the
full OS install from 9GB down to 1GB, who cares?  Like you just
stated, disks are massive nowadays, and cheap to boot.
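
Finding that top 10 is cheap.  A minimal sketch, assuming an
rpm-based system with Python 3.7 or later; the function names are made
up for illustration, and only the rpm query tags SIZE and NAME are
relied on:

```python
import subprocess


def parse_sizes(text):
    """Parse lines of '<size-in-bytes> <package-name>' into (name, size) pairs."""
    pairs = []
    for line in text.splitlines():
        size, _, name = line.strip().partition(" ")
        if size.isdigit() and name:
            pairs.append((name, int(size)))
    return pairs


def top_packages(pairs, n=10):
    """Return the n largest packages, biggest first."""
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:n]


def installed_sizes():
    """Ask rpm for the installed size and name of every package."""
    out = subprocess.run(
        ["rpm", "-qa", "--queryformat", "%{SIZE} %{NAME}\n"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_sizes(out)
```

Calling top_packages(installed_sizes()) on an installed box gives the
ten largest packages by installed size, which is a reasonable first
list of candidates to look at.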

If there are people out there who can't or don't want to buy a new
disk, and the install size is stopping them from using Fedora, then
again, finding the _biggest_ space wasters and trying to resolve
them would save far more than spending time on the peanuts that
duplicated files amount to.
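
For anyone who wants to put their own number on the duplicated files
before comparing it with the big packages, the approach quoted above
(hash first, then compare contents only where hashes match) could be
sketched roughly as follows.  This is an illustration only: it walks a
directory tree rather than using any rpm database index, and
find_duplicates and wasted_bytes are hypothetical names.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk=1 << 20):
    """Return the md5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups (lists of paths) of byte-identical regular files."""
    by_hash = defaultdict(list)
    seen_inodes = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            st = os.stat(path)
            # Hard links to the same inode share storage; count them once.
            if (st.st_dev, st.st_ino) in seen_inodes:
                continue
            seen_inodes.add((st.st_dev, st.st_ino))
            by_hash[(st.st_size, file_md5(path))].append(path)

    groups = []
    for paths in by_hash.values():
        remaining = sorted(paths)
        while len(remaining) > 1:
            first, rest = remaining[0], remaining[1:]
            # Guard against hash collisions with a full content compare.
            same = [first] + [p for p in rest
                              if filecmp.cmp(first, p, shallow=False)]
            if len(same) > 1:
                groups.append(same)
            remaining = [p for p in rest if p not in same]
    return groups


def wasted_bytes(groups):
    """Bytes that would be saved by keeping one copy per group."""
    return sum(os.path.getsize(g[0]) * (len(g) - 1) for g in groups)
```

The content compare only runs on files whose size and md5 already
match, so it costs little and removes the collision worry.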

I can think of several things people could volunteer for to make
install footprint smaller:

- Join X.Org Development and help keithp and juliusz get bitmap
   fonts converted from .bdf/.pcf to .ttf bitmaps, and finish off
   the tools needed to make this happen.  In theory that shaves quite
   a number of megs off the CDROMs, although the same holds true as
   above as far as disk prices are concerned.

- Examine packages for overzealous rpm dependencies.  This includes
   finding things that link to libraries that aren't used, causing
   unnecessary deps.  Finding and fixing those will help lower install
   footprint, although it won't save CD space.

- Find things we can throw away completely, or move to Fedora
   Extras or some other repository: things that really do not need
   to be in "Core" for a general purpose OS.  Since this is a touchy
   subject for many who want their favourite apps to be included in
   "Core" by default for convenience, I'll suggest some that I use
   myself and would like to see remain in Core, but which probably
   don't belong there:  "mc", "iptraf", "pinfo".

- Examine large packages like openoffice, X, and others to see what
   consumes the most space, and determine whether there might be a
   better short or long term way to package the stuff, or to change
   the way the underlying technology works.

etc.



