Duplicated files in the pristine FC4t2 installation

Jindrich Novy jnovy at redhat.com
Mon May 2 12:38:06 UTC 2005


Hello all,

I found some duplicate files while browsing through the /usr
directory tree of a pristine, complete FC4t2 installation, which made
me curious how many duplicates there are in total. This is not critical,
so please take it as for-your-information: some files would be better
symlinked/hardlinked so as not to waste disc space needlessly. I know
there's sometimes no other way than to duplicate a file, but the
statistics I have are IMHO rather interesting:

206405 regular files found, 4149 MiB [5468 MiB]
15797 total dupes, 15705 non-zero sized.
96 MiB [161 MiB], 2.325% [2.951%] wasted by dupes, 13906 symlinks,
5042 hardlinks.

So 161 MiB is physically "wasted" in the /usr tree, which is about
3% of the total size of all files within the /usr hierarchy.

To make this information useful to package maintainers, here is a
link to the list of all duplicated files, including their sizes, md5
sums and the packages they belong to:

http://people.redhat.com/jnovy/files/FC4t2-usr-dupes.gz

These statistics were produced by the "slink" utility I wrote some time
ago. It can replace duplicates with symbolic links to save disc space
[EXPERIMENTAL, but seems to work] or just display statistics about
duplicates in a given directory. If you want to give it a try, get it
from:

http://people.redhat.com/jnovy/files/slink-0.0.1-pre1.tar.bz2
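
For readers who want to see the general idea without downloading the
tarball, here is a minimal Python sketch of duplicate detection. It is
not slink's code and makes no claim about how slink works internally:
the function names (file_md5, find_duplicates, wasted_bytes) are
hypothetical, and it assumes that files with the same size and the same
MD5 sum are duplicates, that symlinks are skipped, and that additional
hardlinks to an already seen file are not counted as duplicates.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return a list of (size, md5, paths) for files with identical content.

    Assumption: equal size plus equal MD5 is treated as "identical";
    a careful tool would also compare the bytes before linking anything.
    """
    by_size = defaultdict(list)
    seen_inodes = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and non-regular files
            info = os.stat(path)
            inode = (info.st_dev, info.st_ino)
            if inode in seen_inodes:
                continue  # another hardlink to a file already recorded
            seen_inodes.add(inode)
            by_size[info.st_size].append(path)

    groups = []
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a file with a unique size has no duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_md5(path)].append(path)
        for digest, same in by_hash.items():
            if len(same) > 1:
                groups.append((size, digest, sorted(same)))
    return groups


def wasted_bytes(groups):
    """Bytes that would be saved if each group kept only one copy."""
    return sum(size * (len(paths) - 1) for size, _digest, paths in groups)
```

Calling find_duplicates("/usr") and passing the result to wasted_bytes()
gives a rough apparent-size figure only. It does not account for
filesystem block allocation, so it should not be expected to reproduce
the bracketed numbers above.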

It's interesting how many GPL "COPYING" clones we have in /usr/share/doc,
for instance. Unfortunately, some of my packages are also affected ;)

Regards,
Jindrich

-- 
Jindrich Novy <jnovy at redhat.com>, http://people.redhat.com/jnovy/

The worst evil in the world is refusal to think.



