#5846: move away from md5 for look-aside cache ------------------+------------------------ Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Keywords: | Blocked By: Blocking: | ------------------+------------------------ The lookaside cache uses md5, but something more secure like sha-256 or sha-512 should be used instead. Maybe it should even be made to allow easy changes in the future.
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: Blocked By: | Blocking: -------------------+-----------------------
Comment (by till):
This was previously discussed in [https://fedorahosted.org/fedora- infrastructure/ticket/3358|Intrastructure ticket 3358].
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: Blocked By: | Blocking: -------------------+----------------------- Changes (by till):
* cc: sgrubb@…, limb, ausil, jkeating, mitr (added)
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: Blocked By: | Blocking: -------------------+----------------------- Changes (by skottler):
* cc: skottler (added)
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: Blocked By: | Blocking: -------------------+-----------------------
Comment (by bochecha):
A few explanations on the patches I just attached.
The pyrpkg patch introduces a new option in the configuration file: lookasidefallbackhash.
This hash is to be used when the **verification** fails using the normal lookasidehash. However, files are always uploaded with the normal lookasidehash.
Then, the fedpkg changes just adapt to that new option, and set the lookasidehash sha256 instead of md5, while setting the new lookasidefallbackhash to md5.
With these 3 changes applied, fedpkg now uploads sources with the sha256 hash, but can still verify previously uploaded sources.
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: Blocked By: | Blocking: -------------------+-----------------------
Comment (by tomspur):
Wouldn't it be easier to change the syntax of the sources file, so it is clear, which hash was used?
If the old format of the sources file is used, it is md5, otherwise the sources file says which one to use.
What are the consumers of the sources file except rpkg?
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: Blocked By: | Blocking: -------------------+----------------------- Changes (by pbabinca):
* cc: pbabinca@… (added)
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: Blocked By: | Blocking: -------------------+-----------------------
Comment (by bochecha):
Wouldn't it be easier to change the syntax of the sources file, so it is
clear, which hash was used?
Other tools are based on the pyrpkg API, I know of at least: * nbpkg: my own fedpkg equivalent at $dayjob * goosepkg: GoOSe Linux's fedpkg equivalent * rhpkg: RHEL's fedpkg equivalent
There might be others, for example I wouldn't be surprised if Centos was using it too.
The fact that it was md5 previously is configured in the /etc/fedpkg.conf file, so you can't assume that all of these tools have been using md5 so far: they might very well have started from scratch with sha256 or example.
To me, this is a strong argument in favour of being able to specify the previously used hash, as a fallback.
It doesn't preclude the fact that we could indeed change the format of the sources file, of course.
You mentioned the advantage: it makes it explicit which hash was used.
The drawback is that the currently trivial way to fill it without fedpkg would not work any more:
{{{ $ md5sum foo-1.0.tar.xz > sources }}}
In any case, I'd be happy to provide additional patches to implement this if people feel like it would be interesting. I still think that introducing a « previous hash » configuration option is important.
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: Blocked By: | Blocking: -------------------+-----------------------
Comment (by tmz):
It seems to me that the hash used can be determined by the length of the hash string. Those are constant.
{{{ $ for hash in md5sum sha{1,256,512}sum; do len=$( $hash ~/.bashrc | awk '{printf "%s", $1}' | wc -c ); printf "%-10s: %3s\n" $hash $len; done md5sum : 32 sha1sum : 40 sha256sum : 64 sha512sum : 128 }}}
It would even be possible to determine individually for different items in a single source file and have the tools use the appropriate tool to verify each item (though mixing them seems like it would be rather pointless ;).
But teaching the tools that verify sources to use the non-changing format of the various hash algorithms would allow changes to be made to the hash used to generate sources without requiring that every packager updated fedpkg first, wouldn't it? Then, as soon as the builders were updated to recognize a new hash algorithm, it could be used in a sources file.
(Apologies for chiming in without any code to show, I feel guilty about that.)
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: Blocked By: | Blocking: -------------------+-----------------------
Comment (by bruno):
If we're going to change the hash, we might want to go right to SHA-3 (Keccak).
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: Blocked By: | Blocking: -------------------+-----------------------
Comment (by till):
It might be necessary that the server side needs to be adjusted as well. Maybe someone who knows can confirm or deny this.
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: Blocked By: | Blocking: -------------------+-----------------------
Comment (by kevin):
A copy of the upstream upload.cgi is:
https://git.fedorahosted.org/cgit/fedora- infrastructure.git/tree/scripts/upload.cgi
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: Blocked By: | Blocking: -------------------+-----------------------
Comment (by bochecha):
Indeed, I completely forgot about the server side. :-)
There's a lot of "md5" in the source, but it all seems to be just names of variables which were chosen assuming we were using md5 as the hashing. Those could be renamed to make things clearer, but that wouldn't change anything not to do it.
The only thing really related to md5 I could find with a quick look is this: https://git.fedorahosted.org/cgit/fedora- infrastructure.git/tree/scripts/upload.cgi/upload.cgi#n24
That obviously needs to be adapted.
I'll see if I can get some time today to work on that, but anyone should feel free to beat me to it. ;-)
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: Blocked By: | Blocking: -------------------+-----------------------
Comment (by bochecha):
Indeed, I didn't thinka bout that.
And as a bonus, it makes the change much simpler! :)
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: Blocked By: | Blocking: -------------------+-----------------------
Comment (by bochecha):
So, I've been attaching quite a few patches here, and I can't find a way to mark some as obsolete, so here are the details.
First, these patches can be totally ignored, as I found tmz's suggestion comment 8 would be better than what I had done:
* 0001-lookaside-Add-a-new-fallback-hash.patch * 0001-lookaside-Adapt-to-the-rpkg-change.patch * 0002-lookaside-Use-sha256-instead-of-md5-by-default.patch
Then, 0001-lookaside-Determine-which-hash-function-was-used.patch is a patch to apply to rpkg, which implements the following:
* when verifying a source file, determine the hash used from the length of the hashed string * nothing changes when uploading: the hash used is the one set in the configuration file
Finally, 0001-lookaside-Support-more-than-just-md5sum.patch is for the upload.cgi script on the lookaside server, so that when a file is uploaded, we don't assume the hash is md5, but determine it from the length of the hashed string.
One note though: the patch for the lookaside drops compatibility with older versions of Python which didn't have the hashlib module.
Those versions didn't have hash functions other than md5 and sha1, so it wouldn't have been possible to deal with sha256 or sha512.
I was assured that our lookaside runs on EL 6 though, which has a recent enough Python, and it's the system I tested these changes on.
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: Blocked By: | Blocking: -------------------+----------------------- Changes (by tmz):
* cc: tmz (added)
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: Blocked By: | Blocking: -------------------+-----------------------
Comment (by bochecha):
I just sent some patches to the appropriate lists, based on the discussion at a previous releng meeting:
* server-side changes: https://lists.fedoraproject.org/pipermail/infrastructure/2014-March/014187.h... * client-side changes: https://lists.fedoraproject.org/pipermail/rel- eng/2014-March/017457.html
As Dennis requested, this avoids breaking the pyrpkg API by limiting the client-side changes to fedpkg only.
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: meeting Blocked By: | Blocking: -------------------+----------------------- Changes (by bochecha):
* keywords: => meeting
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: meeting Blocked By: | Blocking: -------------------+-----------------------
Comment (by jkeating):
Has anybody looked at git-annex as a way to store references to source files, and fetch them down? https://git-annex.branchable.com/
It would be more of a change than what's been talked about here, but it would remove bits of homegrown software, and would more natively fit with the rest of the git structure. Just a thought, I'm not really going to be putting any work in here :)
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: Blocked By: | Blocking: -------------------+----------------------- Changes (by ausil):
* keywords: meeting =>
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: Blocked By: | Blocking: -------------------+----------------------- Changes (by msuchy):
* cc: msuchy (added)
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: Blocked By: | Blocking: -------------------+-----------------------
Comment (by bochecha):
An update on the current status.
This follows the migration plan I had outlined on the mailing-list:
https://lists.fedoraproject.org/pipermail/rel- eng/2014-April/017619.html
Staging has distgit in Ansible, so I've rebased the first part of my patches to it, and Pierre-Yves merged them. (thanks!)
The current status is that the **staging** lookaside can now accept uploads as SHA512, but still falls back on MD5. It can also store the uploaded sources in the new directory layout (containing the hash type), but doesn't do it for MD5.
So all in all, with all this merged, we are preparing for the migration, but absolutely nothing should change.
Again, this is **only in staging** for now.
#5846: move away from md5 for look-aside cache -------------------+----------------------- Reporter: till | Owner: rel-eng@… Type: task | Status: new Milestone: | Component: other Resolution: | Keywords: Blocked By: | Blocking: 6111 -------------------+----------------------- Changes (by till):
* blocking: => 6111
Comment:
This was deferred to after the Fedora 22 release: https://fedorahosted.org/rel-eng/ticket/6111
rel-eng@lists.fedoraproject.org