Finding Duplicate Files
Cameron Simpson
cs at zip.com.au
Fri Mar 14 23:01:13 UTC 2008
On 13Mar2008 21:25, Jonathan Roberts <jonathan.roberts.uk at googlemail.com> wrote:
| I've got into a bit of a muddle with my backups...more than a little in fact!
|
| I have several folders each approx 10-20 Gb in size. Each has some
| unique material and some duplicate material, and it's even possible
| there's duplicate material in sub-folders too. How can I consolidate
| all of this into a single folder so that I can easily move the backup
| onto different mediums, and get back some disk space!?
I use a few tools for this kind of thing, all available here:
http://www.cskk.ezoshosting.com/cs/css/bin/
http://www.cskk.ezoshosting.com/cs/css/
mklinks - http://www.cskk.ezoshosting.com/cs/css/bin/mklinks
Walks directory trees, hardlinking files with identical content.
It only compares files of the same size, remembers previously
seen files, etc.
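To illustrate the idea (this is not the mklinks script itself, just a minimal Python sketch of the same technique), the function below groups files by size, compares same-sized files byte for byte, and replaces each duplicate with a hardlink to the first copy seen. The name hardlink_duplicates and the ".mklinks-tmp" suffix are made up for this example, and it assumes all the directories live on one filesystem, since hardlinks cannot cross filesystems.

```python
import filecmp
import os
from collections import defaultdict


def hardlink_duplicates(*roots):
    """Replace byte-identical regular files under roots with hardlinks.

    Hypothetical sketch; returns the number of files replaced by a link.
    """
    # size -> paths already seen, one path per distinct content of that size
    by_size = defaultdict(list)
    linked = 0
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                # Only consider real regular files, not symlinks or devices.
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                size = os.path.getsize(path)
                for seen in by_size[size]:
                    if os.path.samefile(seen, path):
                        break  # already linked to a file we know about
                    if filecmp.cmp(seen, path, shallow=False):
                        # Link under a temporary name, then rename over the
                        # duplicate, so the path never goes missing.
                        tmp = path + ".mklinks-tmp"
                        os.link(seen, tmp)
                        os.replace(tmp, path)
                        linked += 1
                        break
                else:
                    # No match among same-sized files: remember this one.
                    by_size[size].append(path)
    return linked
```

Calling hardlink_duplicates("backup1", "backup2") would leave both trees looking the same as before, but identical files would share one copy on disk.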
mrg, overlay, ov
mostly the "ov" script:
ov one-dir another-dir
where you know the two directories have the same shape
and purpose. It moves everything from the first directory into the
corresponding place in the second, except where there's a file of the
same name already there. In that case, if the two files are identical the
copy in one-dir is removed; otherwise it is left alone.
That leaves you with one-dir containing only the conflicts,
which you may now examine.
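The merge rule described above can be sketched in Python roughly as follows. This is only an illustration of the behaviour, not the ov script; the function name overlay is made up, symlinks are simply skipped, and it assumes the second directory is not inside the first.

```python
import filecmp
import os
import shutil


def overlay(src, dst):
    """Move files from src into the matching place under dst.

    Hypothetical sketch. Files missing from dst are moved across, files
    identical to their dst counterpart are deleted from src, and anything
    else is left in src. Returns the paths left behind as conflicts.
    """
    if os.path.realpath(src) == os.path.realpath(dst):
        raise ValueError("src and dst are the same directory")
    conflicts = []
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        for name in sorted(filenames):
            s = os.path.join(dirpath, name)
            d = os.path.join(target_dir, name)
            if os.path.islink(s):
                continue  # leave symlinks alone in this sketch
            if not os.path.lexists(d):
                # Nothing of that name in dst yet: move the file across.
                os.makedirs(target_dir, exist_ok=True)
                shutil.move(s, d)
            elif (os.path.isfile(d) and not os.path.islink(d)
                  and filecmp.cmp(s, d, shallow=False)):
                # Same content already present in dst: drop the src copy.
                os.remove(s)
            else:
                # Same name, different content: keep it in src for review.
                conflicts.append(s)
    return conflicts
```

After a run, whatever files remain under the first directory are exactly the ones in the returned list, plus any skipped symlinks and now-empty directories.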
links
this is a wrapper for find that I usually run like this:
links some-dir -ls -rm
The idea is that you have already run mklinks over this directory and
probably others. It removes every file in some-dir that has more than one
link, so at the end some-dir holds only files that are not already
present elsewhere.
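As a rough Python equivalent of that step (a sketch only, not the links script, and the function names are invented here), the code below finds regular files whose link count is greater than one and removes them. The link count is checked immediately before each removal, so if two links to the same file both sit inside the directory, the last one survives.

```python
import os
import stat


def multiply_linked(root):
    """Yield regular files under root that have more than one hardlink."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                yield path


def remove_multiply_linked(root):
    """Remove files under root that also exist under another name.

    Hypothetical sketch; returns the list of removed paths.
    """
    removed = []
    # The generator is consumed lazily, so each file's link count is read
    # just before it is removed.
    for path in multiply_linked(root):
        os.remove(path)
        removed.append(path)
    return removed
```

It is worth printing the output of multiply_linked() and checking it before calling the removing variant, much as -ls does in the command above.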
pruneleafdirs
Walks the specified directories, removing empty subdirectories.
Good to run after a big ov/overlay/mrg run as above.
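The same cleanup can be sketched in a few lines of Python (an illustration, not the pruneleafdirs script; the function name is made up). Walking bottom-up means a directory that held only empty subdirectories becomes empty itself by the time it is visited.

```python
import os


def prune_leaf_dirs(root):
    """Remove empty directories below root, deepest first.

    Hypothetical sketch; root itself is kept. Returns the removed paths.
    """
    removed = []
    # topdown=False visits children before their parents.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath == root:
            continue
        try:
            os.rmdir(dirpath)  # only succeeds if the directory is empty
        except OSError:
            continue  # not empty (or not removable): leave it alone
        removed.append(dirpath)
    return removed
```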
Some of these scripts use others - the easy thing is to grab the
tarball.
Cheers,
--
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/
You are just paranoid, and all your friends think so too.
- James Joseph Dominguez <d9250788 at zac.riv.csu.edu.au>