Finding Duplicate Files

Geoffrey Leach geoff at hughes.net
Fri Mar 14 01:21:19 UTC 2008


On 03/13/2008 02:25:26 PM, Jonathan Roberts wrote:
> Hey all,
> 
> I've got into a bit of a muddle with my backups...more than a little
> in fact!
> 
> I have several folders each approx 10-20 Gb in size. Each has some
> unique material and some duplicate material, and it's even possible
> there's duplicate material in sub-folders too. How can I consolidate
> all of this into a single folder so that I can easily move the backup
> onto different mediums, and get back some disk space!?

Here's a Perl script that compares all files of the same size and prints 
commands to eliminate all but one of those that compare equal. You'll 
need to modify or consolidate to deal with subdirectories; a rough 
sketch of one way to do that follows the script. File::Slurp is not 
part of the standard distribution; its rpm is perl-File-Slurp.noarch.

#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp;

my $dir = q{/path-to-your-dir};

# consider only regular files; subdirectories are skipped
my @files = grep { -f "$dir/$_" } read_dir($dir);

# pair each name with its size (stat field 7), then sort by size
@files = map  { [ (stat "$dir/$_")[7], $_ ] } @files;
@files = sort { $a->[0] <=> $b->[0] } @files;

while (@files) {
    my $f = shift @files;
    last unless @files;
    my @dups = ( $f->[1] );
    # walk the run of files that share $f's size
    while ( @files && $f->[0] == $files[0]->[0] ) {
        my $s = shift @files;
        if ( system(qq{cmp -s "$dir/$f->[1]" "$dir/$s->[1]"}) != 0 ) {
            # $s differs: report any duplicates collected so far,
            # then restart the group with $s as the reference file
            print "rm $dir/$_\n" for @dups[ 1 .. $#dups ];
            $f    = $s;
            @dups = ( $f->[1] );
        }
        else {
            push @dups, $s->[1];
        }
    }
    # report duplicates of the final reference file in the group;
    # the first entry (the copy to keep) is not printed
    print "rm $dir/$_\n" for @dups[ 1 .. $#dups ];
}
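If you want subdirectories handled in one pass, here is a rough sketch 
of the same idea done recursively with File::Find and Digest::MD5 (both 
ship with Perl). It checksums every regular file under a top-level 
directory and prints an rm for each file whose content has already been 
seen. The directory name /path-to-your-backups is just a placeholder, 
and unlike the script above it hashes every file rather than comparing 
only same-size pairs, so treat it as a starting point, not a drop-in 
replacement.

#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;

my $top = q{/path-to-your-backups};    # placeholder; point at the real tree

my %seen;    # content checksum => first path seen with that content

find(
    sub {
        return unless -f $_;
        open my $fh, '<', $_ or return;
        binmode $fh;
        my $sum = Digest::MD5->new->addfile($fh)->hexdigest;
        close $fh;
        if ( exists $seen{$sum} ) {
            # same content as a file we already kept
            print qq{rm "$File::Find::name"\n};
        }
        else {
            $seen{$sum} = $File::Find::name;
        }
    },
    $top
);

Either way, redirect the output to a file and look it over before 
feeding it to the shell.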







