If any of you are testing Presto, please upgrade to 0.2.4. This version stores a log in /var/log/presto.log that contains two very important numbers for each yum session:

* Number of bytes that would have been downloaded for Presto-enabled repository updates
* Number of bytes that were downloaded for Presto-enabled repository updates
It also contains the percentage saved and total (cumulative) percentage saved.
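The percentage is simple arithmetic over those two byte counts; a minimal sketch of the calculation (the function name is mine, not Presto's):

```python
def savings_percent(would_have_downloaded, actually_downloaded):
    """Percentage of bytes saved by deltarpms in one yum session."""
    if would_have_downloaded == 0:
        return 0.0
    return 100.0 * (would_have_downloaded - actually_downloaded) / would_have_downloaded

# e.g. a session that would have fetched 50 MB but only fetched 12 MB:
print(round(savings_percent(50_000_000, 12_000_000), 1))  # → 76.0
```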
Every week or so, I'd love to get a copy of your logs so I can see what real savings we're getting from Presto.
Also, our test server is syncing updates and extras every six hours. Does anyone know if there's a way for the fedoraproject servers to push updates rather than our polling?
Thanks, Jonathan
I noticed something strange, after installing yum-presto.
I have a freshrpms repo enabled on my FC6 box. This repo is set up with a mirrorlist, which contains the following when downloaded:
http://ayo.ie.freshrpms.net/fedora/linux/6/$ARCH/freshrpms/
http://ayo.uk3.freshrpms.net/fedora/linux/6/$ARCH/freshrpms/
http://ayo.us5.freshrpms.net/fedora/linux/6/$ARCH/freshrpms/
http://ayo.pt.freshrpms.net/fedora/linux/6/$ARCH/freshrpms/
When presto is enabled, the freshrpms repo consistently fails with errors looking like this:
http://ayo.pt.freshrpms.net/fedora/linux/6/%24BASEARCH/freshrpms/repodata/re...: [Errno 14] HTTP Error 404: Date: Mon, 26 Mar 2007 11:51:45 GMT
Running yum with --disablepresto makes freshrpms work again.
So it seems like presto is screwing up some internal variables or conversions or something.
Hope this is useful, if not, please let me know what you need.
/Thomas
This is a duplicate post - My apologies.
If you want to discuss this issue, please do so under the
"Presto and mirrorlist url problems"
thread instead.
/Thomas
On Mon, 2007-03-26 at 08:52 -0400, Jeremy Katz wrote:
On Mon, 2007-03-26 at 11:26 +0300, Jonathan Dieter wrote:
Also, our test server is syncing updates and extras every six hours. Does anyone know if there's a way for the fedoraproject servers to push updates rather than our polling?
There's not
Jeremy
Bummer. I guess we get to work with what we've got then. Is six hours a good mirror timetable?
Jonathan
Jonathan Dieter wrote:
On Mon, 2007-03-26 at 08:52 -0400, Jeremy Katz wrote:
On Mon, 2007-03-26 at 11:26 +0300, Jonathan Dieter wrote:
Also, our test server is syncing updates and extras every six hours. Does anyone know if there's a way for the fedoraproject servers to push updates rather than our polling?
There's not
Jeremy
Bummer. I guess we get to work with what we've got then. Is six hours a good mirror timetable?
Jonathan
This brings up (again) the discussion of whether it would be worth adding sync-flag semantics to the way we mirror (like what Debian has been doing for years). That would allow downstream mirrors to monitor for a specific file every hour or so; when that file exists with an updated timestamp (or something), we can be sure that the upstream mirror has finished its own sync. The downstream mirror can then (more) reliably perform its own sync, sleep for some hours, and start all over again. So while we're still polling for updates, the risk of ending up with an inconsistent mirror is reduced.
If this were implemented on all mirrors, we could effectively and very simply decrease mirroring bandwidth, improve mirror reliability/quality, reduce the number of update problems due to incomplete mirroring, etc.
By now it should not be a big surprise, that I think this could be a good idea. I'll just keep pointing out situations where this could make a positive difference. ;-)
/Thomas
At 3:21 PM +0200 3/26/07, Thomas M Steenholdt wrote:
Jonathan Dieter wrote:
On Mon, 2007-03-26 at 08:52 -0400, Jeremy Katz wrote:
On Mon, 2007-03-26 at 11:26 +0300, Jonathan Dieter wrote:
Also, our test server is syncing updates and extras every six hours. Does anyone know if there's a way for the fedoraproject servers to push updates rather than our polling?
There's not
Jeremy
Bummer. I guess we get to work with what we've got then. Is six hours a good mirror timetable?
Jonathan
This brings up (again) the discussion of whether it would be worth adding sync-flag semantics to the way we mirror (like what Debian has been doing for years). That would allow downstream mirrors to monitor for a specific file every hour or so; when that file exists with an updated timestamp (or something), we can be sure that the upstream mirror has finished its own sync. The downstream mirror can then (more) reliably perform its own sync, sleep for some hours, and start all over again. So while we're still polling for updates, the risk of ending up with an inconsistent mirror is reduced.
If the repomd.xml file were guaranteed to be updated last, then it would make a good sentinel. (Actually, all the metadata files should be updated at the same time. As they have static names, there can't be two versions at a time.)
If this were implemented on all mirrors, we could effectively and very simply decrease mirroring bandwidth, improve mirror reliability/quality, reduce the number of update problems due to incomplete mirroring, etc.
By now it should not be a big surprise, that I think this could be a good idea. I'll just keep pointing out situations where this could make a positive difference. ;-)
At 11:26 AM +0300 3/26/07, Jonathan Dieter wrote: ...
Also, our test server is syncing updates and extras every six hours. Does anyone know if there's a way for the fedoraproject servers to push updates rather than our polling?
I don't think so, but you could poll just the repomd.xml metadata file pretty quickly. If it is updated, then the repo is probably worth syncing with. You might also check if the filelists.gz is updated as well.
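A minimal sketch of that polling idea, comparing a digest of the freshly fetched repomd.xml against one remembered from the previous poll (names are mine; actually fetching the bytes, e.g. with urllib, is left out):

```python
import hashlib

def digest(data: bytes) -> str:
    """Checksum of the fetched repomd.xml contents."""
    return hashlib.sha1(data).hexdigest()

def repo_changed(previous_digest, repomd_bytes):
    """Return (changed, digest_to_remember): has repomd.xml changed since
    the last poll?  On the first poll, previous_digest is None."""
    new = digest(repomd_bytes)
    return new != previous_digest, new
```

Since repomd.xml is tiny, this check is cheap enough to run far more often than a full six-hourly rsync.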
Tony Nelson wrote:
At 11:26 AM +0300 3/26/07, Jonathan Dieter wrote: ...
Also, our test server is syncing updates and extras every six hours. Does anyone know if there's a way for the fedoraproject servers to push updates rather than our polling?
I don't think so, but you could poll just the repomd.xml metadata file pretty quickly. If it is updated, then the repo is probably worth syncing with. You might also check if the filelists.gz is updated as well.
The problem with this is that, unless special care is taken to do it this way, we can't rely on the repository metadata to be updated last.
I still vote for adopting the debian way - add a
if test -f /path/to/mirror/trace/$(hostname); then
    rm -f /path/to/mirror/trace/$(hostname)
fi
to the start of the sync script, and a
date > /path/to/mirror/trace/$(hostname)
to the end...
(Obviously, the code would have to change to reflect the public mirror name, if different from $(hostname), and such; this is just an example of how easy something like this could be.)
When syncing against ftp.giantmirrorsite.com, we can check for the "ftp.giantmirrorsite.com" file in the trace directory. If it's there, we can relatively safely assume that the mirror is in sync. The contents of the file would indicate the time of last sync completion, and the other contents of the trace/ dir reveal the path the updates have traveled downstream. Very nice.
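The downstream side of that check could look something like this sketch, assuming the trace file is named after the upstream host and written when its sync completes (the names and the six-hour freshness threshold are illustrative):

```python
import os
import time

def upstream_is_synced(trace_path, max_age_seconds=6 * 3600):
    """Flag-file semantics: if the trace file is absent, a sync is in
    progress upstream; if present, its mtime tells us when the upstream
    mirror last finished syncing."""
    if not os.path.exists(trace_path):
        return False  # upstream sync in progress — don't pull yet
    age = time.time() - os.path.getmtime(trace_path)
    return age <= max_age_seconds
```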
/Thomas
At 9:11 PM +0200 3/26/07, Thomas M Steenholdt wrote:
Tony Nelson wrote:
At 11:26 AM +0300 3/26/07, Jonathan Dieter wrote: ...
Also, our test server is syncing updates and extras every six hours. Does anyone know if there's a way for the fedoraproject servers to push updates rather than our polling?
I don't think so, but you could poll just the repomd.xml metadata file pretty quickly. If it is updated, then the repo is probably worth syncing with. You might also check if the filelists.gz is updated as well.
The problem with this is that, unless special care is taken to do it this way, we can't rely on the repository metadata to be updated last.
...
True, but that's already broken for everybody, so this would be no different. Once the metadata updates, either the other files are there or just about everything doesn't work.
It would be a really good idea to fix this by splitting the rsync into two commands, so that the metadata is done last. Still, even without that, once the metadata changes, one can assume that a repo update is happening. repomd.xml is a nice small file that is always updated.
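One way to sketch that two-stage split, with the repodata/ directory synced only after everything else (the URL and paths are illustrative, and the runner is injectable only to make the sketch easy to dry-run):

```python
import subprocess

SRC = "rsync://mirror.example.org/fedora/updates/"   # illustrative upstream
DEST = "/srv/mirror/updates/"

def two_stage_sync(runner=subprocess.run):
    """Sync everything except repodata/ first, then repodata/ last, so the
    metadata never advertises packages that have not arrived yet."""
    runner(["rsync", "-a", "--exclude=repodata/", SRC, DEST], check=True)
    runner(["rsync", "-a", SRC + "repodata/", DEST + "repodata/"], check=True)
```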
Tony Nelson wrote:
At 9:11 PM +0200 3/26/07, Thomas M Steenholdt wrote:
Tony Nelson wrote:
At 11:26 AM +0300 3/26/07, Jonathan Dieter wrote: ...
Also, our test server is syncing updates and extras every six hours. Does anyone know if there's a way for the fedoraproject servers to push updates rather than our polling?
I don't think so, but you could poll just the repomd.xml metadata file pretty quickly. If it is updated, then the repo is probably worth syncing with. You might also check if the filelists.gz is updated as well.
The problem with this is that, unless special care is taken to do it this way, we can't rely on the repository metadata to be updated last.
...
True, but that's already broken for everybody, so this would be no different. Once the metadata updates, either the other files are there or just about everything doesn't work.
It would be a really good idea to fix this by splitting the rsync into two commands, so that the metadata is done last. Still, even without that, once the metadata changes, one can assume that a repo update is happening. repomd.xml is a nice small file that is always updated.
I still think implementing this in just another file would be a better solution. It's atomic (for lack of a better term), in that the flag can always be counted on as being valid: if the flag file is not there, a sync is in progress; if it's there, the mirror is in a synced state. The flag file is generated locally, so there's no potential loophole of even tenths of a second, like there would be the other way during the time when the repomd.xml is downloaded. It doesn't rely on data actually changing but reports whether a sync is in progress. It provides backtracking in case of staleness, errors, package corruption, etc. It lets you know the age of the mirror data, and it works without splitting the actual mirroring script. And all in a 30-byte file.
/Thomas
On Monday 26 March 2007 17:33:21 Thomas M Steenholdt wrote:
I still think implementing this in just another file would be a better solution. It's atomic (for lack of a better term), in that the flag can always be counted on as being valid: if the flag file is not there, a sync is in progress; if it's there, the mirror is in a synced state. The flag file is generated locally, so there's no potential loophole of even tenths of a second, like there would be the other way during the time when the repomd.xml is downloaded. It doesn't rely on data actually changing but reports whether a sync is in progress. It provides backtracking in case of staleness, errors, package corruption, etc. It lets you know the age of the mirror data, and it works without splitting the actual mirroring script. And all in a 30-byte file.
It's relying on mirrors using OUR script to mirror content, though, and we can't count on A) them using our script instead of whatever else they're using to sync other data, or B) them even using a host with bash/sh on it.
On 3/26/07, Jesse Keating jkeating@redhat.com wrote:
On Monday 26 March 2007 17:33:21 Thomas M Steenholdt wrote:
I still think implementing this in just another file would be a better solution. It's atomic (for lack of a better term), in that the flag can always be counted on as being valid: if the flag file is not there, a sync is in progress; if it's there, the mirror is in a synced state. The flag file is generated locally, so there's no potential loophole of even tenths of a second, like there would be the other way during the time when the repomd.xml is downloaded. It doesn't rely on data actually changing but reports whether a sync is in progress. It provides backtracking in case of staleness, errors, package corruption, etc. It lets you know the age of the mirror data, and it works without splitting the actual mirroring script. And all in a 30-byte file.
It's relying on mirrors using OUR script to mirror content, though, and we can't count on A) them using our script instead of whatever else they're using to sync other data, or B) them even using a host with bash/sh on it.
-- Jesse Keating Release Engineer: Fedora
I've been following this thread, and I'd just like to ask: what about combining both methods, and giving a warning if both indicators aren't in place?
Jesse Keating wrote:
It's relying on mirrors using OUR script to mirror content, though, and we can't count on A) them using our script instead of whatever else they're using to sync other data, or B) them even using a host with bash/sh on it.
That's what I find so good about this solution. True, you'd need to do something extra to comply with the "mirroring protocol", but there is NO need for special mirroring scripts. It's all about putting a timestamp in a plain text file, and it's easily doable on every single platform that I can think of. We should not provide any custom tools to perform this sync, IMO, but supply a Fedora mirroring howto outlining what should be done. Mirror admins are then free to choose whatever sync tool/protocol they'd like to perform the actual sync (rsync, ftp, cifs, nfs, <fav sync method here>), as long as they remember to put a simple date in a file when they're done.
/Thomas
On Tuesday 27 March 2007 00:59:57 Thomas M Steenholdt wrote:
as long as they remember to put a simple date in a file when they're done.
Which calls for special syncing. And there is no guarantee that they'll create the date file before or after the sync, so we'll still have to validate the age of the repomd files.
Jesse Keating wrote:
On Tuesday 27 March 2007 00:59:57 Thomas M Steenholdt wrote:
as long as they remember to put a simple date in a file when they're done.
Which calls for special syncing. And there is no guarantee that they'll create the date file before or after the sync, so we'll still have to validate the age of the repomd files.
That's the whole point. We need to specify/document/demand the order in which the sync phases are carried out (remove date file, sync, create date file). The sync part is entirely up to the mirror admin, but the other parts are not. For a mirror to be considered official, they'd have to obey these rules (I can't imagine creating a timestamped file would be a huge obstacle for most mirror sites). The date of the repomd.xml file is really another issue: it'll let you know the age of the repository contents, but it's not suitable for checking mirror sync status, since you'd need to remove it for the sync, thus disabling the use of your mirror for updating while syncing. Various approaches can be taken to make this better, but IMO none that I've seen or heard of are as simple as the date file method.
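Those three phases could wrap any sync tool the admin prefers; a hypothetical sketch (the trace-file path and the command are illustrative, not part of any actual mirroring howto):

```python
import os
import subprocess
from datetime import datetime, timezone

def mirror_sync(trace_file, sync_command):
    """Phase 1: remove the flag, signalling 'sync in progress'.
    Phase 2: run whatever sync tool the mirror admin chose.
    Phase 3: write a fresh timestamp, signalling 'synced'."""
    if os.path.exists(trace_file):
        os.remove(trace_file)
    subprocess.run(sync_command, check=True)  # rsync/ftp/whatever
    with open(trace_file, "w") as f:
        f.write(datetime.now(timezone.utc).isoformat() + "\n")
```

If phase 2 fails, the flag is never rewritten, so downstream mirrors keep treating the site as mid-sync rather than pulling a half-finished tree.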
/Thomas
Jonathan Dieter wrote:
Also, our test server is syncing updates and extras every six hours. Does anyone know if there's a way for the fedoraproject servers to push updates rather than our polling?
Is there a good site with info on setting up a drpm mirror and other useful info to get one started in the deltarpm era?
I'm not thinking about rsync'ing the drpm packages; rather, I'd like to automatically build the drpms on this system for use by my internal machines (i386, x86_64).
Any pointers?
Thanks
/Thomas
On Wed, 2007-03-28 at 07:15 +0200, Thomas M Steenholdt wrote:
Jonathan Dieter wrote:
Also, our test server is syncing updates and extras every six hours. Does anyone know if there's a way for the fedoraproject servers to push updates rather than our polling?
Is there a good site with info on setting up a drpm mirror and other useful info to get one started in the deltarpm era?
I'm not thinking about rsync'ing the drpm packages; rather, I'd like to automatically build the drpms on this system for use by my internal machines (i386, x86_64).
Any pointers?
Thanks
/Thomas
In the .tar.bz2 and the source rpm is a folder called makerepo that contains a very ugly createprestorepo.py script.
The syntax is "/path/to/createprestorepo.py <base directory> <relative directory to create drpms in>". For the test server it's something like "~/bin/createprestorepo.py ./ DRPMS/" while in ~/public_html/jdieter/updates/fc6/i386.
It will then go and find all rpms, make any applicable deltas (at the moment, it creates any delta it can with 50%+ savings), save them in the destination folder, and create prestomd, etc., in repodata.
If a drpm isn't worth saving, it deletes it and creates a ".dontdelta" file for the drpm, so it will be ignored on the next round.
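That keep-or-delete decision presumably boils down to a size comparison; a hypothetical sketch of such a check (names and the threshold handling are mine, not from createprestorepo.py):

```python
import os

def keep_delta(rpm_path, drpm_path, min_savings=0.5):
    """Keep a deltarpm only if downloading it instead of the full rpm
    would save at least min_savings of the bytes."""
    rpm_size = os.path.getsize(rpm_path)
    drpm_size = os.path.getsize(drpm_path)
    return drpm_size <= (1.0 - min_savings) * rpm_size
```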
You might want to wait until I push out 0.3.0 later today, as the current version of createprestorepo.py won't try to make deltarpms for packages over 70MB (due to memory constraints on my server).
You will need roughly 3x as much RAM as the largest package you want to delta.
Jonathan
Jonathan Dieter wrote:
In the .tar.bz2 and the source rpm is a folder called makerepo that contains a very ugly createprestorepo.py script.
The syntax is "/path/to/createprestorepo.py <base directory> <relative directory to create drpms in>". For the test server it's something like "~/bin/createprestorepo.py ./ DRPMS/" while in ~/public_html/jdieter/updates/fc6/i386.
It will then go and find all rpms, make any applicable deltas (at the moment, it creates any delta it can with 50%+ savings), save them in the destination folder, and create prestomd, etc., in repodata.
If a drpm isn't worth saving, it deletes it and creates a ".dontdelta" file for the drpm, so it will be ignored on the next round.
You might want to wait until I push out 0.3.0 later today, as the current version of createprestorepo.py won't try to make deltarpms for packages over 70MB (due to memory constraints on my server).
You will need roughly 3x as much RAM as the largest package you want to delta.
Sounds great, I'll give it a whirl once 0.3.0 hits the server :)
But I wonder: since we only specify the updatedir (cwd) and the target DRPMS dir, how does it know which packages to base the deltas on? Does it use whatever is installed, or what's in the core repo, or something?
And here is a thought-up scenario:
- base install includes a package called xxx-1.0.0-1.i386.rpm (100 MB)
- then an update is released called xxx-1.0.0-2.i386.rpm (100 MB)
- then an update is released called xxx-1.0.0-3.i386.rpm (100 MB)
- then an update is released called xxx-1.0.1-1.i386.rpm (101 MB)
The currently installed systems could be on any one of the three previous versions of xxx. So we'd need several drpm packages to be able to use drpms for all systems?
At least, these drpms would then be required, right?
1.0.0-1 -> 1.0.1-1
1.0.0-2 -> 1.0.1-1
1.0.0-3 -> 1.0.1-1
Or does it work differently? Perhaps I'm just missing an important piece of information here? ;o)
Thanks a lot!
/Thomas
On Wed, 2007-03-28 at 07:56 +0200, Thomas M Steenholdt wrote:
Sounds great, I'll give it a whirl once 0.3.0 hits the server :)
But I wonder: since we only specify the updatedir (cwd) and the target DRPMS dir, how does it know which packages to base the deltas on? Does it use whatever is installed, or what's in the core repo, or something?
And here is a thought-up scenario:
- base install includes a package called xxx-1.0.0-1.i386.rpm (100MB)
- then an update is released called xxx-1.0.0-2.i386.rpm (100 MB)
- then an update is released called xxx-1.0.0-3.i386.rpm (100 MB)
- then an update is released called xxx-1.0.1-1.i386.rpm (101 MB)
The currently installed systems could be on any one of the three previous versions of xxx. So we'd need several drpm packages to be able to use drpms for all systems?
At least, these drpms would then be required, right?
1.0.0-1 -> 1.0.1-1
1.0.0-2 -> 1.0.1-1
1.0.0-3 -> 1.0.1-1
Or does it work differently? Perhaps I'm just missing an important piece of information here? ;o)
Thanks a lot!
/Thomas
Okay, I'm going to answer what I *think* you're asking, but I'm not sure. In your scenario (which is what I'm using on the test server), createprestorepo.py would create all three drpms (though some thought needs to go into how we're going to trim them in the future).
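For the scenario above, the set of required deltas can be enumerated mechanically: one per older version, each targeting the newest update. A small sketch (the function name is mine):

```python
def needed_deltas(old_versions, newest):
    """One deltarpm per older version of a package, each upgrading
    straight to the newest available version."""
    return [(old, newest) for old in old_versions if old != newest]

print(needed_deltas(["1.0.0-1", "1.0.0-2", "1.0.0-3"], "1.0.1-1"))
# → [('1.0.0-1', '1.0.1-1'), ('1.0.0-2', '1.0.1-1'), ('1.0.0-3', '1.0.1-1')]
```

This also shows why trimming matters: the number of deltas grows with every release of a package until old ones are pruned.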
As for making the drpms for core => updates, I took the lazy way out. I just mirrored core and updates all into the same local directory on the test server. Then, createprestorepo was able to find all the update paths.
There are obviously better ways of doing this, but I've been focusing on the client rather than the server at the moment.
Jonathan
Jonathan Dieter wrote:
Okay, I'm going to answer what I *think* you're asking, but I'm not sure. In your scenario (which is what I'm using on the test server), createprestorepo.py would create all three drpms (though some thought needs to go into how we're going to trim them in the future).
As for making the drpms for core => updates, I took the lazy way out. I just mirrored core and updates all into the same local directory on the test server. Then, createprestorepo was able to find all the update paths.
This is exactly what I asked :o) Sure, we need to think about a good way to solve this, but for now I just needed to understand how it was supposed to work.
Thanks.
/Thomas