On Sun, Jul 28, 2019 at 01:21:16PM -0400, Stephen John Smoogen wrote:
On Sun, 28 Jul 2019 at 13:17, Anderson, Charles R <cra(a)wpi.edu>
wrote:
> On Sat, Jul 27, 2019 at 10:04:06AM -0400, Stephen John Smoogen wrote:
> > I did a bunch of copying of packages over to archives at the end of
> > May. I do all the copies using cp -l to preserve links. I then ran
> > update-archives for the full file list, but did not notice that it
> > had not worked.
> >
> > This week we got several requests from mirrors to get rid of f28 out
> > of normal space. I confirmed that f28 was copied over, and then I
> > synced over f29 and f30 so I would not be behind in a couple of
> > months. All of these were also done with cp -l, and I checked that
> > the hardlink numbers had increased on the files. I gave the info to
> > Adrian and he let me know that update-archives was not updated. I
> > then fixed that and again didn't send any email about this.
>
> I'm not sure whether q-f-m didn't work quite right, whether there was
> a server-side issue (update-archives?), or whether this was just all
> due to timing of cp -l vs. update-archives vs. my q-f-m runs, but
> somehow I received over 1 TB of downloads that were not hardlinked.
> It did not finish (I ran out of disk space) so I cleaned that up
> yesterday, moved the files from .~tmp/* to their final locations, ran
> a local hardlink.py, and re-ran q-f-m. The hardlink restored the 1+
> TB of free space. The files are all staying hardlinked, but due to
> q-f-m optimizing the rsync runs, that doesn't provide any assurance
> that the master mirror is as hardlinked as it could be.
>
> To focus on the good parts:
>
> My local hardlink.py run saved at least 600 GB additional space
> compared to before Friday. I think some of the additional savings
> comes from internal hardlinking between e.g. Workstation, Everything,
> Spins, noarch across different arch directories, etc. that one might
> expect to be already hardlinked, but weren't for some reason.
>
> I'm going to look into running quick-fedora-hardlink locally regularly.
>
So I am going to look at using the hardlink command with a 'do not do this,
just tell me what you would do' option first. I will report to the list
what I find. This would be with the hardlink we normally ship, but I was
wondering if anyone has any new versions they recommend.
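For context, the link-count check mentioned in the quoted mail can be done programmatically. This is a minimal sketch with hypothetical paths, using os.link, which issues the same link(2) call that cp -l makes per file (the util-linux hardlink also documents a -n/--dry-run flag for the "just tell me" mode):

```python
import os
import tempfile

# Scratch tree standing in for releases/ and archive/ (hypothetical layout)
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "releases-pkg.rpm")
dst = os.path.join(tmp, "archive-pkg.rpm")
with open(src, "wb") as f:
    f.write(b"payload")

# os.link makes the same link(2) call that `cp -l` performs per file
os.link(src, dst)

# Both names now share one inode, so the link count rises from 1 to 2
print(os.stat(src).st_nlink)                        # -> 2
print(os.stat(src).st_ino == os.stat(dst).st_ino)   # -> True
```

A quick `stat -c %h <file>` on the master mirror gives the same number from the shell, which is presumably how the "hardlink numbers had increased" check above was done.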
I know of at least 5 implementations. Despite hardlink.py's claim of being faster than the original C version, it still took over 12 hours when I ran it yesterday:
- hardlink (the C version we ship in Fedora).
- hardlink in util-linux (maybe the same as above).
- hardlink from Debian that may end up being adopted into util-linux (see https://github.com/karelzak/util-linux/issues/808).
- hardlink.py by John Villalovos that I used (https://code.google.com/archive/p/hardlinkpy/; there appear to be forks on GitHub).
# John Villalovos
# email: john(a)sodarock.com
# http://www.sodarock.com/
#
# Inspiration for this program came from the hardlink.c code. I liked what it
# did but did not like the code itself, to me it was very unmaintainable. So I
# rewrote in C++ and then I rewrote it in python. In reality this code is
# nothing like the original hardlink.c, since I do things quite differently.
# Even though this code is written in python the performance of the python
# version is much faster than the hardlink.c code, in my limited testing. This
# is mainly due to use of different algorithms.
#
# Original inspirational hardlink.c code was written by: Jakub Jelinek
# <jakub(a)redhat.com>
- quick-fedora-hardlink (https://docs.pagure.org/quick-fedora-mirror/quick-fedora-hardlink.rst) that can run much quicker using the filelists in the repos.
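The "different algorithms" note in the hardlink.py header mostly comes down to grouping files by size first, so only size-colliding files ever get hashed. A simplified dry-run sketch of that idea (an illustration only, not the code of any tool above):

```python
import hashlib
import os
from collections import defaultdict

def plan_hardlinks(root):
    """Dry run: return (keep, duplicate) path pairs a hardlinker would merge."""
    by_size = defaultdict(list)
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.lstat(path).st_size].append(path)

    plan = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate; skip hashing
        by_hash = defaultdict(list)
        for path in paths:
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            by_hash[h.hexdigest()].append(path)
        for dups in by_hash.values():
            keep, *rest = sorted(dups)
            for dup in rest:
                # files already hardlinked share an inode; nothing to do
                if os.lstat(dup).st_ino != os.lstat(keep).st_ino:
                    plan.append((keep, dup))
    return plan
```

quick-fedora-hardlink can presumably go further still because the repo filelists already carry checksums, so it can skip both the full tree walk and the hashing.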