Recursive comparing of files
Antonio Olivares
olivares14031 at yahoo.com
Wed Feb 24 02:38:18 UTC 2010
--- On Tue, 2/23/10, Marko Vojinovic <vvmarko at gmail.com> wrote:
> From: Marko Vojinovic <vvmarko at gmail.com>
> Subject: Recursive comparing of files
> To: users at lists.fedoraproject.org
> Date: Tuesday, February 23, 2010, 5:31 PM
>
> Hi folks! :-)
>
> I have the following task: there are two directories on the
> disk, say a/ and
> b/, with various subdirectories and files inside. I need to
> find and erase all
> *duplicate* files, and after that all empty directories.
> The files may reside in
> different directories and may have different names, but if
> they have identical
> *contents*, the file from the b/ branch should be deleted.
>
> Now, the directories that I have are rather large and I
> wouldn't want to go
> hunt for duplicates manually. Is there some tool that can
> at least identify
> and list duplicate files in some directory structure?
>
> I could think of an algorithm like:
>
> 1) list all files in all subdirectories of a/ along with
> their file size
> 2) do the same thing for files in b/
> 3) sort and compare lists, look for pairs of files with
> identical size
> 4) test each pair to see if the file content is the same,
> and if yes, list them
> in the output
>
> I could probably write a bash script which would
> do this, but I
> guess this problem is common and there are already some
> tools available which
> would do this for me. Any suggestions?
>
> Thanks, :-)
> Marko
>
> --
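Your four-step plan maps almost directly onto the standard tools. A rough
sketch (the a/ and b/ contents below are just a demo; it only lists
candidates and deletes nothing, and paths containing spaces are not handled):

```shell
#!/bin/sh
# Sketch of the 4-step plan: checksum everything in both trees, pair
# files whose md5 sums match, then confirm each pair byte-by-byte.
mkdir -p a b                       # demo trees (stand-ins for the real ones)
echo "same content" > a/one.txt
echo "same content" > b/two.txt    # same bytes, different name and directory

find a -type f -exec md5sum {} + | sort > a.md5
find b -type f -exec md5sum {} + | sort > b.md5

# join pairs lines from both lists that share the checksum (field 1);
# cmp -s then confirms the match byte-by-byte before trusting the hash
join a.md5 b.md5 | while read sum a_file b_file; do
    cmp -s "$a_file" "$b_file" && echo "$b_file"
done > duplicates.txt

cat duplicates.txt                 # each listed file is a duplicate under b/
rm -f a.md5 b.md5
```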
There is a tool called fdupes. Read more about it here:
http://www.cyberciti.biz/faq/linux-unix-finds-duplicate-files-in-given-directories/
<quote>
You need to use a tool called fdupes. It will search the given path for duplicate files. Such files are found by comparing file sizes and MD5 signatures, followed by a byte-by-byte comparison. fdupes is a nice tool to get rid of duplicate files.
</quote>
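In case it helps, a typical invocation looks like this (a minimal sketch;
a/ and b/ stand for the trees from your question, and fdupes must of
course be installed first):

```shell
#!/bin/sh
# Sketch only: list every set of duplicate files without deleting anything.
# -r makes fdupes recurse into subdirectories.
command -v fdupes >/dev/null 2>&1 || { echo "fdupes is not installed"; exit 0; }
mkdir -p a b                 # demo directories so the command has something to scan
fdupes -r a b || true        # exit status varies between fdupes versions
```

Adding -d (fdupes -rd a b) then prompts, set by set, for which copy to
keep. Empty directories left behind afterwards can be removed with
GNU find, e.g. find b -depth -type d -empty -delete.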
Regards,
Antonio