Recursive comparing of files

Antonio Olivares olivares14031 at yahoo.com
Wed Feb 24 02:38:18 UTC 2010



--- On Tue, 2/23/10, Marko Vojinovic <vvmarko at gmail.com> wrote:

> From: Marko Vojinovic <vvmarko at gmail.com>
> Subject: Recursive comparing of files
> To: users at lists.fedoraproject.org
> Date: Tuesday, February 23, 2010, 5:31 PM
> 
> Hi folks! :-)
> 
> I have the following task: there are two directories on the
> disk, say a/ and 
> b/, with various subdirectories and files inside. I need to
> find and erase all 
> *duplicate* files, and after that all empty directories.
> The files may reside in 
> different directories, may have different names, but if
> they have identical 
> *contents*, the file from the b/ branch should be deleted.
> 
> Now, the directories that I have are rather large and I
> wouldn't want to go 
> hunt for duplicates manually. Is there some tool that can
> at least identify 
> and list duplicate files in some directory structure?
> 
> I could think of an algorithm like:
> 
> 1) list all files in all subdirectories of a/ along with
> their file size
> 2) do the same thing for files in b/
> 3) sort and compare lists, look for pairs of files with
> identical size
> 4) test each pair to see if the file content is the same,
> and if yes, list them 
> in the output
> 
> I could probably write a bash script which would
> do this, but I 
> guess this problem is common and there are already some
> available tools which 
> would do this for me. Any suggestions?
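For what it's worth, the four steps above can be sketched with GNU coreutils alone. This is only a rough sketch on a toy tree: an MD5 checksum stands in for the size comparison plus the byte-by-byte test, so verify candidates with cmp before actually deleting anything.

```shell
# Checksum every file under both trees, group identical checksums
# together, and print only the groups with more than one member.
# md5sum output is "<32-hex-digit hash>  <path>", so uniq -w32
# compares just the hash.
find a/ b/ -type f -exec md5sum {} + \
  | sort \
  | uniq -w32 --all-repeated=separate
# Each blank-line-separated group is one set of duplicates; since
# the groups are sorted by path, the copy under a/ is listed before
# its duplicates under b/.
```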
> 
> Thanks, :-)
> Marko
> 
> -- 


There is a tool called fdupes.  Read more about it here:

http://www.cyberciti.biz/faq/linux-unix-finds-duplicate-files-in-given-directories/

<quote>
You need to use a tool called fdupes. It will search the given path for duplicate files. Such files are found by comparing file sizes and MD5 signatures, followed by a byte-by-byte comparison. fdupes is a nice tool to get rid of duplicate files.
</quote>
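For your two-directory layout, something like the invocations below should work. The flags are from the fdupes man page, but double-check the behavior of your installed version before running the destructive step; in particular, I am assuming that files within each duplicate set are listed in the order the directories were named, so a/ goes first.

```shell
# List every set of duplicate files across both trees,
# recursing into subdirectories (-r).  Nothing is deleted here.
fdupes -r a/ b/

# Destructive step: -d deletes duplicates, -N skips the interactive
# prompt and preserves the first file of each set.  Since a/ was
# named first, the a/ copy should be the one kept -- but try it on
# a test tree before trusting it with real data.
fdupes -rdN a/ b/

# Afterwards, prune the directories left empty under b/.
find b/ -depth -type d -empty -delete
```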

Regards,

Antonio 


      

