On Wed, 11 May 2016 11:30:00 -0500
Dennis Gilmore <dennis(a)ausil.us> wrote:
I am trying to catch up with email from the last week or so; I am
still 13000 behind, so I did not catch this. This only works if you
do not care about hardlinking, which is going to mean that people are
using an extra 500G+ of disk on the mirrors, an issue some mirrors
have hit due to what I am assuming are bad mirroring practices. The
only way to fix it properly is going to mean re-evaluating how we
push content and how we message the pushing, and having tooling to
either do push mirroring or enable intelligent pull-based
mirroring, including information about what's hardlinked where and
what content we have pushed. This is like a bandaid when the sore
under it is still festering away.
Well, this change was simply to allow us to explore using more data for
syncing.
Hopefully we can come up with a way to express hardlinks with it.
If you have the fullfiletimelist file and there is a new one you can
diff them. Once you have that list of files that were deleted or
changed, you can sort them and possibly hard link the ones with the
same name/timestamp/size. All of our hardlinked files should have the
same name/timestamp/size, I think.
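
To make the diff idea a bit more concrete, here is a rough Python sketch.
It assumes a simplified, hypothetical list format of one
"mtime<TAB>size<TAB>path" entry per line, which may well not match the
real fullfiletimelist layout, and the function names are made up for
illustration. It diffs two lists and groups the changed files that share
name/timestamp/size as hardlink candidates.

```python
import os
from collections import defaultdict

def parse_list(text):
    """Parse 'mtime<TAB>size<TAB>path' lines (assumed format) into {path: (mtime, size)}."""
    entries = {}
    for line in text.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        mtime, size, path = parts
        entries[path] = (int(mtime), int(size))
    return entries

def diff_lists(old, new):
    """Return (deleted, changed) paths; 'changed' also includes newly added files."""
    deleted = sorted(p for p in old if p not in new)
    changed = sorted(p for p, meta in new.items() if old.get(p) != meta)
    return deleted, changed

def hardlink_candidates(new, changed):
    """Group changed paths that share basename, mtime and size."""
    groups = defaultdict(list)
    for path in changed:
        mtime, size = new[path]
        groups[(os.path.basename(path), mtime, size)].append(path)
    # Only groups with more than one path are worth linking together.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Matching on name/timestamp/size is only a heuristic, so a mirror would
probably want to compare contents before actually linking anything.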
But failing all that, we could easily have people rsync just the changed
files (saving us LOTS AND LOTS of iops) without getting hardlinks, and
then once every week or two do a full traditional sync that would delete
any removed files and hardlink everything. Doing this, they would not
have an extra 500GB; they would only carry duplicate copies of the
hardlinked files that changed in the last week, so it would be much
smaller I suspect. This would save us tons of iops, make their syncs
super fast and only have a slight bad effect on space.
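
As a sketch of what that two-tier schedule might look like on a mirror,
the Python below just builds the two rsync command lines. The function
names, the 7-day interval and the idea of a pre-generated changed-files
list are assumptions for illustration, not existing tooling; the rsync
options used (-a, -H, --delete, --files-from) are standard ones.

```python
def incremental_sync_cmd(src, dest, files_from):
    """Fast sync: fetch only the paths listed in files_from, no hardlink detection."""
    # With --files-from, -a does not imply -r, so only the listed paths transfer.
    return ["rsync", "-a", "--files-from=" + files_from, src, dest]

def full_sync_cmd(src, dest):
    """Traditional sync: walk everything, preserve hardlinks (-H), remove deleted files."""
    return ["rsync", "-a", "-H", "--delete", src, dest]

def choose_sync(days_since_full, src, dest, files_from, full_interval_days=7):
    """Pick the full sync once the interval has elapsed, else the incremental one."""
    if days_since_full >= full_interval_days:
        return full_sync_cmd(src, dest)
    return incremental_sync_cmd(src, dest, files_from)
```

The resulting list could be handed to subprocess.run() as-is; the
periodic full run is what cleans up removed files and restores the
hardlinks the fast runs skipped.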
If this all turns out to not work out, no harm done, but I think it
might well help us out a great deal.
kevin