Finding Duplicate Files

Fri Mar 14 15:20:27 UTC 2008

Mikkel L. Ellertson wrote:
> You may want to look at the fdups and fslint packages.
That would be fdupes. Both are available in Fedora 8:
$ yum list fdupes fslint
Loading "downloadonly" plugin
Loading "skip-broken" plugin
Installed Packages
fdupes.i386         1.40-10.fc8            installed
fslint.noarch       2.24-1.fc8             installed

With fdupes: move the folder that you think contains duplicates to 
somewhere below the original, then run fdupes on the original folder:
$ fdupes --recurse --delete /home/myhome/mypath_to_dedupe/

fdupes compares file sizes first and, for files of matching size, 
compares their contents. For each set of dupes it lists the path to 
every copy (note that the file names do not need to match, only the 
content), each with an associated [number]. You then type one or more 
numbers to indicate which copies to keep.
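
To make the size-then-content idea concrete, here is a rough sketch using 
GNU find, md5sum, sort and uniq. It is not how fdupes is implemented, 
only an approximation: list_dupes is a made-up helper name, the final 
byte-by-byte check is skipped, and file names containing newlines are 
not handled.

```shell
#!/bin/sh
# Sketch only: approximate duplicate detection with GNU tools.
# list_dupes is a hypothetical helper, not part of fdupes.
list_dupes() {
    dir=$1
    # Step 1: collect the sizes (in bytes) shared by two or more files.
    find "$dir" -type f -printf '%s\n' | LC_ALL=C sort -n | uniq -d |
    while read -r size; do
        # Step 2: checksum only the files of that size, then keep the
        # lines whose first 32 characters (the MD5 hex digest) repeat.
        find "$dir" -type f -size "${size}c" -exec md5sum {} + |
            LC_ALL=C sort | uniq -w32 --all-repeated
    done
}
```

Calling list_dupes /home/myhome/mypath_to_dedupe/ prints one 
"checksum  path" line per duplicate file; lines with the same checksum 
belong to the same set. Nothing is deleted.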

If you don't care which copy is kept, you can use a trick like:
$ yes 1 | fdupes --recurse --delete /home/myhome/mypath_to_dedupe/
It runs as before, but whenever fdupes stops to wait for input, yes 
passes in a 1, so the first item in each set is kept. Make sure you have 
tried the command without the yes beforehand, so you have an idea of 
what will be deleted automatically.
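
One way to preview that is to filter the plain fdupes listing (run 
without --delete) so that only the second and later paths of each set 
are shown. This is a sketch under two assumptions: that fdupes prints 
sets separated by blank lines, and that the order matches the numbered 
prompt shown with --delete. would_delete is a made-up helper name.

```shell
#!/bin/sh
# would_delete: hypothetical helper, not part of fdupes.
# Reads a plain fdupes listing on stdin (sets of paths separated by
# blank lines) and prints every path except the first one in each set.
would_delete() {
    awk 'BEGIN { first = 1 }
         /^$/  { first = 1; next }   # blank line: a new set starts next
         first { first = 0; next }   # skip the first path (the one kept)
               { print }'
}

# Usage (assumes fdupes is installed):
#   fdupes --recurse /home/myhome/mypath_to_dedupe/ | would_delete
```

The paths it prints are the ones that answering 1 at every prompt would 
be expected to remove; check them against the interactive run before 
trusting the list.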

I have found fslint's GUI useful for finding and erasing empty folders 
across the disc. This saves the time of going into folders just to check 
whether they have any contents.
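
If you prefer the command line, something similar can be sketched with 
find. This is not what fslint does internally; it assumes a find that 
supports -empty (GNU find does), and list_empty_dirs is a made-up 
helper name.

```shell
#!/bin/sh
# list_empty_dirs: hypothetical helper that prints every empty
# directory below the given starting point. Nothing is removed.
list_empty_dirs() {
    # -depth visits a directory's contents before the directory itself.
    find "$1" -depth -type d -empty
}
```

Run list_empty_dirs /home/myhome/mypath_to_dedupe/ and review the list 
first. With GNU find, adding -delete to the same find command should 
remove the directories it matches, so only do that once the list looks 
right.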

With a recent version of rsync (one that supports 
--remove-source-files), you can also:

$ rsync --dry-run --remove-source-files -a \
    different_machine:/home/stuff_i_think_is_dupes/ /home/the_primary_copy/

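which is useful if the two folders are largely identical and you want to 
end up with a single merged copy. With --dry-run, rsync only reports what 
it would do; drop that option once the output looks right. You would 
probably then run fdupes afterwards to tidy up dupes that are in 
differently named folders.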

DaveT.