Recursive comparing of files

Marko Vojinovic vvmarko at gmail.com
Wed Feb 24 01:31:00 UTC 2010


Hi folks! :-)

I have the following task: there are two directories on the disk, say a/ and 
b/, with various subdirectories and files inside. I need to find and erase all 
*duplicate* files, and after that all empty directories. The files may reside in 
different directories and may have different names, but if two files have 
identical *contents*, the copy in the b/ branch should be deleted.
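
The second half of the task, removing directories left empty once the 
duplicates are gone, could be sketched roughly as follows. This is only an 
illustration in Python, not an existing tool: the function name 
remove_empty_dirs is hypothetical, and the sketch assumes the top-level 
directory itself should be kept.

```python
import os


def remove_empty_dirs(root):
    """Delete every empty directory below root, deepest first.

    Returns the list of removed paths. The root directory itself is kept
    (an assumption of this sketch).
    """
    removed = []
    # topdown=False visits subdirectories before their parents, so a parent
    # that becomes empty after its children are removed is caught as well.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath == root:
            continue
        # Re-read the directory: the lists from os.walk were collected
        # before any child directories were deleted.
        if not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(dirpath)
    return removed
```

Calling remove_empty_dirs("b") after the duplicates have been deleted would 
prune b/ while leaving directories that still hold files untouched.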

Now, the directories that I have are rather large and I don't want to hunt 
for duplicates manually. Is there some tool that can at least identify and 
list duplicate files in a directory structure?

I can think of an algorithm like this:

1) list all files in all subdirectories of a/ along with their file sizes
2) do the same thing for files in b/
3) sort and compare the lists, looking for pairs of files with identical size
4) test each pair to see if the file contents are the same, and if so, list 
the pair in the output
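
The four steps above could be sketched roughly like this. It is a minimal 
illustration in Python rather than a finished tool: the function names 
files_by_size and find_duplicates are hypothetical, symbolic links are 
skipped by assumption, and the sketch only lists duplicates without 
deleting anything.

```python
import filecmp
import os
from collections import defaultdict


def files_by_size(root):
    """Steps 1 and 2: map file size -> list of regular files under root."""
    sizes = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Assumption of this sketch: symbolic links are ignored.
            if os.path.isfile(path) and not os.path.islink(path):
                sizes[os.path.getsize(path)].append(path)
    return sizes


def find_duplicates(dir_a, dir_b):
    """Steps 3 and 4: return sorted (path_in_a, path_in_b) pairs whose
    contents are identical."""
    sizes_a = files_by_size(dir_a)
    sizes_b = files_by_size(dir_b)
    pairs = []
    for size, paths_b in sizes_b.items():
        # Only files of equal size can have equal contents.
        for path_b in paths_b:
            for path_a in sizes_a.get(size, []):
                # shallow=False forces a comparison of the file contents
                # instead of trusting size and modification time.
                if filecmp.cmp(path_a, path_b, shallow=False):
                    pairs.append((path_a, path_b))
                    break  # one match in a/ is enough to flag path_b
    return sorted(pairs)
```

Looping over find_duplicates("a", "b") and printing the second element of 
each pair gives the list of files in b/ that are candidates for deletion. 
Grouping by size first keeps the expensive content comparison limited to 
files that could actually match.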

I could probably write a bash script to do this, but I guess this problem is 
common and there are already tools available that would do it for me. Any 
suggestions?

Thanks, :-)
Marko


