* Stephen John Smoogen:
> No, because the way backups and rsync work is slow. We can do the backup of the look-aside cache with tarballs in a couple of hours, and we can also rsync it in about the same amount of time. Doing that with a couple of git trees, which are much smaller in size but contain far more files, takes that long or longer. Every file in a git tree is stat'd, and while there is some deduplication, there are a lot of files.
I think there's a logic bug somewhere. 8-)
The number of files in the lookaside cache is small only because we check patch files into dist-git. Upstream glibc.git isn't too bad, despite not having a particularly clean repository due to frequently rebased user branches (and tons of loose objects as a result):
$ find glibc.git/ | wc -l
1725
That's not too far off from the number of files we have in downstream dist-git at the tip of each release branch:
$ for x in {7..32} ; do git ls-tree origin/f$x: ; done | awk '{print $3}' | wc -l
1232
Admittedly, the deduplicated number is somewhat lower:
$ for x in {7..32} ; do git ls-tree origin/f$x: ; done | awk '{print $3}' | sort -u | wc -l
711
(I don't know how many files end up on the dist-git server for that.)
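The gap between the two counts comes from Git storing identical file contents only once: an unchanged file has the same blob hash on every branch, so `sort -u` collapses it. A minimal sketch of that effect, assuming a POSIX shell with git on the PATH; the throwaway repository, the file names and the branch name `second` are hypothetical and only serve the illustration:

```shell
#!/bin/sh
set -eu
# Hypothetical throwaway repository, removed again on exit.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name Demo
# a.txt is committed once and stays identical on both branches.
printf 'same content\n' > a.txt
git add a.txt
git commit -q -m one
git branch second            # hypothetical branch, still at the first commit
printf 'other content\n' > b.txt
git add b.txt
git commit -q -m two
main=$(git symbolic-ref --short HEAD)
# Same pipeline as above: blob hashes of the root tree of each branch.
hashes=$(for b in second "$main"; do git ls-tree "$b"; done | awk '{print $3}')
total=$(printf '%s\n' "$hashes" | wc -l)             # a.txt counted twice
unique=$(printf '%s\n' "$hashes" | sort -u | wc -l)  # a.txt counted once
echo "total=$total unique=$unique"
```

Here `second` lists one entry and the default branch lists two, so the script should report three tree entries but only two distinct blobs.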
There must be hundreds of glibc tarballs in the lookaside cache by now, too, but I don't have insight into that. (Clearly, we aren't model citizens.) The file count would likely be way lower if you had to back up only one or two Git repositories.
Thanks,
Florian