Finding Duplicate Files
Geoffrey Leach
geoff at hughes.net
Fri Mar 14 01:21:19 UTC 2008
On 03/13/2008 02:25:26 PM, Jonathan Roberts wrote:
> Hey all,
>
> I've got into a bit of a muddle with my backups...more than a little
> in fact!
>
> I have several folders each approx 10-20 Gb in size. Each has some
> unique material and some duplicate material, and it's even possible
> there's duplicate material in sub-folders too. How can I consolidate
> all of this into a single folder so that I can easily move the backup
> onto different mediums, and get back some disk space!?
Here's a Perl script that compares all files of the same size and
prints commands to eliminate all but one of those that compare equal.
You'll need to modify it to deal with subdirectories (a sketch of one
way to do that follows the script). File::Slurp is not part of the
standard distribution; its rpm is perl-File-Slurp.noarch.
#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp;

my $dir = q{/path-to-your-dir};

# Pair each plain file with its size (stat field 7), then sort by
# size so that files of equal size end up adjacent in the list.
my @files = sort { $a->[0] <=> $b->[0] }
            map  { [ (stat "$dir/$_")[7], $_ ] }
            grep { -f "$dir/$_" } read_dir($dir);

while (@files) {
    my $f = shift @files;
    last unless @files;
    my @dups  = ( $f->[1] );    # names whose content matches $f
    my @defer;                  # same-size files that differ from $f
    while ( @files and $f->[0] == $files[0][0] ) {
        my $s = shift @files;
        if ( system( 'cmp', '-s', "$dir/$f->[1]", "$dir/$s->[1]" ) == 0 ) {
            push @dups, $s->[1];    # identical content
        }
        else {
            push @defer, $s;        # gets its own comparison pass later
        }
    }
    unshift @files, @defer;     # differing files become new candidates
    if ( @dups > 1 ) {
        shift @dups;                        # keep the first copy
        print "rm '$dir/$_'\n" for @dups;   # commands to remove the rest
    }
}
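The script only prints rm commands, so you can look them over before
piping the output through sh. For the subdirectory case, here's a
minimal sketch (untested against your tree) that swaps read_dir for
the core File::Find module; it collects full paths recursively, and
the same size-sort-and-cmp loop then applies to them:

#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

my $dir = q{/path-to-your-dir};

# Collect the full path of every plain file under $dir, recursively.
my @files;
find( sub { push @files, $File::Find::name if -f }, $dir );

# These are already complete paths, so the main loop would stat $_
# directly and drop the "$dir/" prefixes throughout.
print "$_\n" for @files;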