-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Ah, what a wonderful scholastic assignment. It's also great that Mike is going to have to open the letter once it actually makes it to him.
Dear Mr. McGrath,
As longtime Fedora developers, we both have watched and lived through the growing pains of Fedora releases. With the onset of the merging between core and extras, we have successfully created the needed infrastructure on which even basic Linux users can create their own Fedora-based derivative distribution. Along with this achievement comes the task of providing hosting infrastructure for an average user/developer to be able to share his/her creation with the world. An additional benefit of being directly involved in the sharing of the "Spins" is the ability to data mine what packages people are including in custom spins, so as to give our developer base more focus. To be able to fulfill this task, please consider requesting additional resources to archive signed updates so that they can be publicly accessed for the duration of a release life-cycle.
The long road of Fedora development has been bumpy and even not enjoyable at times. Throughout the duration of the Fedora 7 release, we have made some very amazing changes that helped to open Fedora up to the masses. One of the most community-based features is the ability to Re-Mix and Re-Spin the distribution as the end-user sees fit. Being the server team lead for Fedora Unity, I've had to come up with unique ways to share the Re-Spins we compose. After trying many approaches with different technologies, we have settled on using Jigdo, the jigsaw downloader. Debian has used Jigdo for some time now and after poking their system for a while I've seen they keep every update they push. I do understand that Debian is considered a stable distribution much in the way RHEL is considered stable. This makes the updates tree not as active and it is easier to find mirrors to carry the full data set for an extended period of time. We, however, need to find a way to offer this same extended life for signed packages that have hit the Fedora updates tree.
The storage and bandwidth that can be saved is illustrated by a simple comparison between chopping up a 3.4GB iso9660 file system arbitrarily (by a static chunk size) and chopping up the same file system based on its contents (file by file). With BitTorrent, Fedora's current choice for sharing Spins, the hosted data is only valid for a given chunk of a single ISO. This data's footprint (equal to the combined chunk sizes of the entire torrent) can be used for nothing but this Spin. To host 5 Spins composed from similar trees via BitTorrent, we now have a footprint of 17GB, not to mention that "seeders" have to run BitTorrent software to be able to contribute to the swarm. Alternatively, Jigdo can be used to reduce the footprint of these 5 Spins to about 4GB. The amount of additional data needing to be hosted for each Spin, in addition to what data is already pushed to the mirrors, is about 150MB per ISO with anaconda and about 200KB for ISOs without the installer bits. To help illustrate the efficiency of using Jigdo vs BitTorrent, the footprint for 250 Spins is 850GB for BitTorrent and about 40GB for Jigdo. Additionally, a reduction in overhead can be achieved by removing the need for the BitTorrent tracker and all related network traffic without requiring any additional work on the part of mirror administrators.
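The footprint figures in this paragraph follow from simple arithmetic. The sketch below is only an illustration of that arithmetic, not a measurement: it assumes every Spin is a 3.4GB installer ISO and that the Jigdo Spins share roughly one ISO's worth of packages already on the mirrors. The function names are made up for this example.

```python
ISO_GB = 3.4          # size of one installer ISO, as stated in the letter
TEMPLATE_GB = 0.150   # extra data per Spin with anaconda, as stated in the letter


def bittorrent_footprint_gb(spins):
    """Each torrent hosts a full copy of its ISO that no other Spin can reuse."""
    return spins * ISO_GB


def jigdo_footprint_gb(spins, shared_tree_gb=ISO_GB):
    """One shared package tree plus a small per-Spin addition.

    Assumption (not from the letter): the shared tree is about one ISO in size.
    """
    return shared_tree_gb + spins * TEMPLATE_GB


for n in (5, 250):
    print(f"{n} Spins: BitTorrent {bittorrent_footprint_gb(n):.1f} GB, "
          f"Jigdo {jigdo_footprint_gb(n):.1f} GB")
```

Under these assumptions, 5 and 250 Spins come to 17GB and 850GB for BitTorrent and to roughly 4GB and 41GB for Jigdo, which is in line with the numbers quoted above.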
The current updates system is getting better each release, but I think we should adjust our policies to also have an “updates-archive” repository. This repository will include all signed updates that had once lived in the updates repo, for the duration of the release's life-cycle. I don't expect all mirrors would want to carry this extra data, so making the new repo optional will be a must. With the new MirrorManager, we will be able to effectively point users at mirrors that have been willing to take on the extra footprint. These requests to MirrorManager could be used to compose reports on what packages the community is utilizing most, allowing us to better focus our efforts. By providing a unified point of entry for data, we will be able to also log requests for package data found on in-house and non-official spins. By utilizing the abilities of MirrorManager to return specific mirrors for pre-defined IP blocks, we will enable end-users and companies to download from on-site mirrors while maintaining complete transparency. I hope that with advancements in Jigdo client software we will be able to look at using Jigdo to host our official images. Please let me know your thoughts on this matter.
As always, thanks,
- -- Jonathan Steffan daMaestro GPG Fingerprint: 93A2 3E2F DC26 5570 3472 5B16 AD12 6CE7 0D86 AF59
Jonathan Steffan said the following on 12/06/2007 01:28 PM Pacific Time:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Ah, what a wonderful scholastic assignment. It's also great that Mike is going to have to open the letter once it actually makes it to him.
Dear Mr. McGrath,
As long time Fedora developers, we both have watched and lived through the growing pains of Fedora releases. With the onset of the merging between core and extras, we have successfully created the needed infrastructure on which even basic Linux users can create their own Fedora-based derivative distribution. Along with this achievement comes the task of providing hosting infrastructure for an average user/developer to be able to share his/her creation with the world. An additional benefit in being directly involved in the sharing of the "Spins" is the ability to data mine what packages people are including in custom spins as to give our developer base more focus. To be able to fulfill this task, please consider requesting additional resources to archive signed updates so that they can be publicly accessed for the duration of a release life-cycle.
The long road of Fedora development has been bumpy and even not enjoyable at times. Throughout the duration of the Fedora 7 release, we have made some very amazing changes that helped to open Fedora up to the masses. One of the most community-based features is the ability to Re-Mix and Re-Spin the distribution as the end-user sees fit. Being the server team lead for Fedora Unity, I've had to come up with unique ways to share the Re-Spins we compose. After trying many approaches with different technologies, we have settled on using Jigdo, the jigsaw downloader. Debian has used Jigdo for some time now and after poking their system for a while I've seen they keep every update they push. I do understand that Debian is considered a stable distribution much in the way RHEL is considered stable. This makes the updates tree not as active and it is easier to find mirrors to carry the full data set for an extended period of time. We, however, need to find a way to offer this same extended life for signed packages that have hit the Fedora updates tree.
The amount of storage and bandwidth able to be saved can be illustrated by a simple comparison between the efficiency of chopping up a 3.4GB iso9660 file system arbitrarily (by a static chunk size) and the same file system based on contents (file by file.) For a BitTorrent, Fedora's current choice for sharing Spins, the hosted data is only valid for a given chunk on a single ISO. This data's footprint (equal to the combined chunk sizes of the entire torrent) can be used for nothing but this Spin. To be able to host 5 Spins composed from similar trees via BitTorrent, we now have a footprint of 17GB, not to mention "seeders" have to run BitTorrent software to be able to contribute to the swarm. Alternatively, Jigdo can be used to reduce the footprint of these 5 Spins to about 4GB. The amount of additional data needing to be hosted for each Spin, in addition to what data is already pushed to the mirrors, is about 150MB per ISO with anaconda and about 200KB for ISOs without the installer bits. To help illustrate the efficiency of using Jigdo vs BitTorrent, the footprint for 250 Spins is 850GB for BitTorrent and about 40GB for Jigdo. Additionally, a reduction in overhead can be achieved by removing the need for the BitTorrent tracker and all related network traffic without requiring any additional work on the part of mirror administrators.
The current updates system is getting better each release, but I think we should adjust our policies to also have an “updates-archive” repository. This repository will include all signed updates that had once lived in the updates repo, for the duration of the releases life-cycle. I don't expect all mirrors would want to carry this extra data so making the new repo optional will be a must. With the new MirrorManager, we will be able to effectively point users at mirrors that have been willing to take on the extra footprint. These requests to MirrorManager could be used to compose reports on what packages the community is utilizing most, allowing us to better focus our efforts. By providing an unified point of entry for data, we will be able to also log requests for package data found on in-house and non-official spins. By utilizing the abilities of MirrorManager to return specific mirrors for pre-defined IP blocks, we will enable end-users and companies to download from on-site mirrors while maintaining complete transparency. I hope that with advancements in Jigdo client software we will be able to look at using Jigdo to host our official images. Please let me know your thoughts on this matter.
Hasn't the door already been opened for this? http://fedoraproject.org/wiki/ReleaseEngineering/Meetings/2007-nov-19 http://fedoraproject.org/wiki/ReleaseEngineering/Meetings/2007-nov-19#t13:44
Also, I see an incomplete feature page here: http://fedoraproject.org/wiki/Features/JigdoRelease
John
John Poelstra wrote:
Hasn't the door already been opened for this? http://fedoraproject.org/wiki/ReleaseEngineering/Meetings/2007-nov-19 http://fedoraproject.org/wiki/ReleaseEngineering/Meetings/2007-nov-19#t13:44
The Release Meeting addressed releasing spins via Jigdo, yes.
What had been proposed -and accepted- in the Release Meeting though was releasing /custom/ spins built of the release tree, via Jigdo. In particular the Everything Spin comes to mind, composed of the full tree in releases/$releasever/Everything/$basearch. However, Jonathan's mail addresses a request to keep signed copies of updates available in updates/$releasever/$basearch (or somewhere else) for the entire release cycle, so that:
- Re-Spins and Remixes can be composed at any time during the release cycle and hosted with the Fedora Project without too large a footprint, and
- Optionally older packages can be used by respinners and remixes, when needed, for example in case of a yum or rpm update that breaks its compatibilities with anaconda, or a kernel that just doesn't do the job.
Also, I see an incomplete feature page here: http://fedoraproject.org/wiki/Features/JigdoRelease
As the owner of the feature: Jigdo hasn't got the integration bits yet to make full use of Fedora Project infrastructure, as I'm working with upstream to get some changes done, and pyJigdo (a Python alternative "wrapper" around upstream Jigdo that does integrate it fully with the existing Fedora Project infrastructure) hasn't had a stable release yet. Then again, mind that it is an alternative. If people really start using Jigdo, it hits the mirrorlist for every single slice, possibly (or inevitably) overloading the mirrorlist. pyJigdo solves this problem and a couple of others (such as multi-image downloading, error recovery, and scanning more than one local directory for slices).
-Jeroen
On Thu, 6 Dec 2007 14:28:13 -0700 Jonathan Steffan jonathansteffan@gmail.com wrote:
The amount of storage and bandwidth able to be saved can be illustrated by a simple comparison between the efficiency of chopping up a 3.4GB iso9660 file system arbitrarily (by a static chunk size) and the same file system based on contents (file by file.) For a BitTorrent, Fedora's current choice for sharing Spins, the hosted data is only valid for a given chunk on a single ISO. This data's footprint (equal to the combined chunk sizes of the entire torrent) can be used for nothing but this Spin. To be able to host 5 Spins composed from similar trees via BitTorrent, we now have a footprint of 17GB, not to mention "seeders" have to run BitTorrent software to be able to contribute to the swarm. Alternatively, Jigdo can be used to reduce the footprint of these 5 Spins to about 4GB. The amount of additional data needing to be hosted for each Spin, in addition to what data is already pushed to the mirrors, is about 150MB per ISO with anaconda and about 200KB for ISOs without the installer bits. To help illustrate the efficiency of using Jigdo vs BitTorrent, the footprint for 250 Spins is 850GB for BitTorrent and about 40GB for Jigdo. Additionally, a reduction in overhead can be achieved by removing the need for the BitTorrent tracker and all related network traffic without requiring any additional work on the part of mirror administrators.
Question. Spins.fp.o is mostly about Live images, no? Live images are essentially a big squashfs image wrapped up in an iso with some bootable stuff involved? How would jigdo possibly help here?
On Thu, 6 Dec 2007 16:42:57 -0500 Jesse Keating jkeating@redhat.com wrote:
Question. Spins.fp.o is mostly about Live images, no? Live images are essentially a big squashfs image wrapped up in an iso with some bootable stuff involved? How would jigdo possibly help here?
This is correct. In its current state, jigdo is only of value for installer images. I have a goal of getting pyJigdo to be able to deal with a squashfs, pulling the bits from rpms and then sticking them into the squash. I have not worked out the technical details as of yet, but it is planned.
Jesse Keating wrote:
Question. Spins.fp.o is mostly about Live images, no? Live images are essentially a big squashfs image wrapped up in an iso with some bootable stuff involved? How would jigdo possibly help here?
Jigdo needs an expanded tree of the ISO image's contents somewhere online, so for Live Media, Jigdo isn't a very viable solution.
-Jeroen
On Fri, 07 Dec 2007 02:05:15 +0100 Jeroen van Meeuwen kanarip@kanarip.com wrote:
Jesse Keating wrote:
Question. Spins.fp.o is mostly about Live images, no? Live images are essentially a big squashfs image wrapped up in an iso with some bootable stuff involved? How would jigdo possibly help here?
Jigdo needs an expanded tree of the ISO image's contents somewhere online, so for Live Media, Jigdo isn't a very viable solution.
Well, as I've expressed in https://www.redhat.com/archives/fedora-infrastructure-list/2007-December/msg... we could make it so jigdo supports sticking data that has been published to a tree in some form. To generate the jigdo definition, the squash would need to be opened and the resulting bits would need to be hashed. Thus, putting the data back together would also involve creating a squashfs and sticking data into it. I've not looked into this outside of just conceptual design. If there are any squashfs gurus that would like to comment, please do.
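To make the "open the squash and hash the resulting bits" step concrete, here is a minimal sketch. It assumes the squashfs has already been unpacked into a directory (for example with unsquashfs), and it uses SHA-256 purely for illustration; this is not Jigdo's own checksum or file format, and `hash_tree` is a made-up helper name.

```python
import hashlib
import os
import tempfile


def hash_tree(root):
    """Map each file path (relative to root) to the hex digest of its contents."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(path, "rb") as fh:
                # Read in 1 MiB blocks so large files do not have to fit in memory.
                for block in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(block)
            manifest[os.path.relpath(path, root)] = digest.hexdigest()
    return manifest


# Tiny demonstration on a throwaway directory standing in for an unpacked image.
with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "example.bin"), "wb") as fh:
        fh.write(b"some file contents")
    for rel_path, hex_digest in hash_tree(demo_root).items():
        print(rel_path, hex_digest)
```

A definition built from such a manifest would let a client look for matching files in a published tree; reassembling them into a squashfs is the part that remains conceptual in this thread.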
On Do Dezember 6 2007, Jonathan Steffan wrote:
The current updates system is getting better each release, but I think we should adjust our policies to also have an “updates-archive” repository. This repository will include all signed updates that had once lived in the updates repo, for the duration of the releases life-cycle. I don't expect all mirrors would want to carry this extra
Once the repositories are presto enabled, an updates-archive with all the delta rpms would not take as much space as a full rpm repository but provide the same functionality.
Regards, Till
On Thu, 06 Dec 2007 22:53:22 +0100 Till Maas opensource@till.name wrote:
On Do Dezember 6 2007, Jonathan Steffan wrote:
The current updates system is getting better each release, but I think we should adjust our policies to also have an “updates-archive” repository. This repository will include all signed updates that had once lived in the updates repo, for the duration of the releases life-cycle. I don't expect all mirrors would want to carry this extra
Once the repositories are presto enabled, an updates-archive with all the delta rpms would not take as much space as a full rpm repository but provide the same functionality.
Yes, this is correct. I left it out of the letter due to the need for a specific client to deal with the presto data. PyJigdo could do such a task, if this is the route we want to go.
Till Maas wrote:
On Do Dezember 6 2007, Jonathan Steffan wrote:
The current updates system is getting better each release, but I think we should adjust our policies to also have an “updates-archive” repository. This repository will include all signed updates that had once lived in the updates repo, for the duration of the releases life-cycle. I don't expect all mirrors would want to carry this extra
Once the repositories are presto enabled, an updates-archive with all the delta rpms would not take as much space as a full rpm repository but provide the same functionality.
If in some way a mirror can regenerate a full, original RPM from all delta RPMs, so that Jigdo can use it as a slice, a combination of the two would be possible.
Without that ability, all Jigdo recognizes is full RPMs on the ISO image (slices), against no (zero) matches in the updates-archive/ folder (all partial slices).
-Jeroen
On Fri, 07 Dec 2007 02:10:26 +0100 Jeroen van Meeuwen kanarip@kanarip.com wrote:
Till Maas wrote:
On Do Dezember 6 2007, Jonathan Steffan wrote:
The current updates system is getting better each release, but I think we should adjust our policies to also have an “updates-archive” repository. This repository will include all signed updates that had once lived in the updates repo, for the duration of the releases life-cycle. I don't expect all mirrors would want to carry this extra
Once the repositories are presto enabled, an updates-archive with all the delta rpms would not take as much space as a full rpm repository but provide the same functionality.
If in some way a mirror can regenerate a full, original RPM from all delta RPMs, so that Jigdo can use it as a slice, a combination of the two would be possible.
It would be nice to be able to utilize the mirrors' CPU, but this is most likely not practical: we would not be able to accomplish it without mirror admins running special handlers, and at that point I would expect admins would rather spend the disk space than support a home-brew handler.
Without that ability, all Jigdo recognizes is full RPMs on the ISO image (slices), against no (zero) matches in the updates-archive/ folder (all partial slices).
Yes, this would require changes to the client software which would basically obsolete the current jigdo client. We would most likely need to utilize yum metadata to put all the pieces back together correctly (not a real problem, but it does restrict what platforms we can run pyJigdo on, not that it runs on anything but Fedora at the moment). Using presto is also not as bandwidth efficient, but is more disk efficient (with regard to mirroring an entire tree of archived updates). I specifically left presto out of the mix because there would be a lot of restrictions placed on where the client software could run if we went this route. I am, however, not opposed to utilizing existing systems we are developing. We would get the benefit of both having delta-enabled repos for use with yum and a full archive of all updates released for the duration of a release. Granted, multiple deltas would need to be downloaded to achieve a specific version of a package, but that also depends on how the deltas are generated.
If anyone knows what the current plan is for generation of deltas, please chime in. Are we doing deltas against the previous public release or are we doing deltas against the release package? In both cases, what is our expected retention policy?
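The two delta policies asked about here lead to different download counts. The sketch below only illustrates that difference; it assumes the client already holds the release package (numbered 0 here) and that updates are numbered 1, 2, 3 and so on. The function and strategy names are made up for this example and do not describe how presto actually generates deltas.

```python
def deltas_to_fetch(target, strategy):
    """Return the (from_version, to_version) deltas needed to rebuild `target`.

    Version 0 is the package shipped with the release; the client is assumed
    to have it already.
    """
    if target < 0:
        raise ValueError("target must be 0 or greater")
    if target == 0:
        return []  # nothing to fetch, the release package is already present
    if strategy == "chained":
        # Each delta is made against the previous public update.
        return [(v - 1, v) for v in range(1, target + 1)]
    if strategy == "against-release":
        # Each delta is made against the original release package.
        return [(0, target)]
    raise ValueError("unknown strategy: " + strategy)


for strategy in ("chained", "against-release"):
    print(strategy, deltas_to_fetch(3, strategy))
```

With chained deltas, reaching the third update takes three downloads and every intermediate delta must be retained; with deltas against the release package it takes one, at the cost of larger deltas as the package drifts from the release version.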
Jonathan Steffan wrote:
The amount of storage and bandwidth able to be saved can be illustrated by a simple comparison between the efficiency of chopping up a 3.4GB iso9660 file system arbitrarily (by a static chunk size) and the same file system based on contents (file by file.) For a BitTorrent, Fedora's current choice for sharing Spins, the hosted data is only valid for a given chunk on a single ISO. This data's footprint (equal to the combined chunk sizes of the entire torrent) can be used for nothing but this Spin. To be able to host 5 Spins composed from similar trees via BitTorrent, we now have a footprint of 17GB, not to mention "seeders" have to run BitTorrent software to be able to contribute to the swarm. Alternatively, Jigdo can be used to reduce the footprint of these 5 Spins to about 4GB. The amount of additional data needing to be hosted for each Spin, in addition to what data is already pushed to the mirrors, is about 150MB per ISO with anaconda and about 200KB for ISOs without the installer bits. To help illustrate the efficiency of using Jigdo vs BitTorrent, the footprint for 250 Spins is 850GB for BitTorrent and about 40GB for Jigdo. Additionally, a reduction in overhead can be achieved by removing the need for the BitTorrent tracker and all related network traffic without requiring any additional work on the part of mirror administrators.
My concern with jigdo is how many people use it. It seems silly to host both torrent and jigdo (much of this letter points out the benefits of switching to jigdo, but those benefits disappear if we simply add jigdo to the mix). Most people already have bittorrent. Let's say we were going to give Jigdo a trial run for Fedora 9 and we were going to judge jigdo a success if a certain % (compared to bittorrent) use jigdo. What % would that be?
While google trends is sort of crazy in that I don't know what it says, it does say something:
http://google.com/trends?q=bittorrent%2C+jigdo
-Mike
Mike McGrath wrote:
Jonathan Steffan wrote:
The amount of storage and bandwidth able to be saved can be illustrated by a simple comparison between the efficiency of chopping up a 3.4GB iso9660 file system arbitrarily (by a static chunk size) and the same file system based on contents (file by file.) For a BitTorrent, Fedora's current choice for sharing Spins, the hosted data is only valid for a given chunk on a single ISO. This data's footprint (equal to the combined chunk sizes of the entire torrent) can be used for nothing but this Spin. To be able to host 5 Spins composed from similar trees via BitTorrent, we now have a footprint of 17GB, not to mention "seeders" have to run BitTorrent software to be able to contribute to the swarm. Alternatively, Jigdo can be used to reduce the footprint of these 5 Spins to about 4GB. The amount of additional data needing to be hosted for each Spin, in addition to what data is already pushed to the mirrors, is about 150MB per ISO with anaconda and about 200KB for ISOs without the installer bits. To help illustrate the efficiency of using Jigdo vs BitTorrent, the footprint for 250 Spins is 850GB for BitTorrent and about 40GB for Jigdo. Additionally, a reduction in overhead can be achieved by removing the need for the BitTorrent tracker and all related network traffic without requiring any additional work on the part of mirror administrators.
My concern with jigdo is with how many people use it? It seems silly to host both torrent and jigdo (as much of this letter points out the benefits of switching to jigdo, those benefits disappear if we simply add jigdo to the mix. Most people already have bittorrent. Lets say we were going to give Jigdo a trial run for Fedora 9
FYI, we have done so, and we are doing so officially for Fedora 9.
and we were going to
judge jigdo a success if a certain % (compared to bittorrent) use jigdo. What % would that be?
Jigdo would in this case be particularly useful to those with a local mirror as they have 99% of the content already (90% if you have F9T3?). Because it is particularly useful to some, and completely weird and strange for others, the number of users that will use it if BitTorrent is an alternative wouldn't be a very good indicator to see if it is actually a viable distribution method for the whole of Fedora, neither is it the goal for these proposals.
However on the other hand we do have a couple of people with local mirrors, and last time I checked, test releases are downloaded a couple of times. We are hoping that these users in particular try out Jigdo and become happy bandwidth and time savers.
For the Fedora Project, the greatest benefit of doing a trial Jigdo release with Fedora 9 is to get to know the feeling, see the numbers, get some feedback, and host 450MB instead of 68GB of Everything spins on different media, with the same results. The same goes for any other additional installation media composed off the release tree, even rebranded downstream distributions, although there aren't many of those as long as updates keep expiring from the mirrors. The original proposal was in fact that Fedora 9 CDs would be released and hosted by Fedora Project, but it seemed to be a better path to do so via Release Engineering.
While google trends is sort of crazy in that I don't know what it says, it does say something:
Right, I do hope our gut feeling rather than the number of hits on Google comes up with a decision...
Another consideration in the entire footprint discussion may be to expire FC{1,2,3,4,5,6} from the master mirror.
Kind regards,
Jeroen van Meeuwen -kanarip
Jeroen van Meeuwen wrote:
Mike McGrath wrote:
Jonathan Steffan wrote:
The amount of storage and bandwidth able to be saved can be illustrated by a simple comparison between the efficiency of chopping up a 3.4GB iso9660 file system arbitrarily (by a static chunk size) and the same file system based on contents (file by file.) For a BitTorrent, Fedora's current choice for sharing Spins, the hosted data is only valid for a given chunk on a single ISO. This data's footprint (equal to the combined chunk sizes of the entire torrent) can be used for nothing but this Spin. To be able to host 5 Spins composed from similar trees via BitTorrent, we now have a footprint of 17GB, not to mention "seeders" have to run BitTorrent software to be able to contribute to the swarm. Alternatively, Jigdo can be used to reduce the footprint of these 5 Spins to about 4GB. The amount of additional data needing to be hosted for each Spin, in addition to what data is already pushed to the mirrors, is about 150MB per ISO with anaconda and about 200KB for ISOs without the installer bits. To help illustrate the efficiency of using Jigdo vs BitTorrent, the footprint for 250 Spins is 850GB for BitTorrent and about 40GB for Jigdo. Additionally, a reduction in overhead can be achieved by removing the need for the BitTorrent tracker and all related network traffic without requiring any additional work on the part of mirror administrators.
My concern with jigdo is with how many people use it? It seems silly to host both torrent and jigdo (as much of this letter points out the benefits of switching to jigdo, those benefits disappear if we simply add jigdo to the mix. Most people already have bittorrent. Lets say we were going to give Jigdo a trial run for Fedora 9
FYI, we have done so, and we are doing so officially for Fedora 9.
and we were going to
judge jigdo a success if a certain % (compared to bittorrent) use jigdo. What % would that be?
Jigdo would in this case be particularly useful to those with a local mirror as they have 99% of the content already (90% if you have F9T3?). Because it is particularly useful to some, and completely weird and strange for others, the number of users that will use it if BitTorrent is an alternative wouldn't be a very good indicator to see if it is actually a viable distribution method for the whole of Fedora, neither is it the goal for these proposals.
I'm talking specifically about people going to the get-fedora page and clicking on the torrent link vs the jigdo link. Out of every 100 people, how many people will click on the jigdo link?
-Mike
Mike McGrath wrote:
and we were going to
judge jigdo a success if a certain % (compared to bittorrent) use jigdo. What % would that be?
Jigdo would in this case be particularly useful to those with a local mirror as they have 99% of the content already (90% if you have F9T3?). Because it is particularly useful to some, and completely weird and strange for others, the number of users that will use it if BitTorrent is an alternative wouldn't be a very good indicator to see if it is actually a viable distribution method for the whole of Fedora, neither is it the goal for these proposals.
I'm talking specifically about people going to the get-fedora page and clicking on the torrent link vs the jigdo link. Out of every 100 people, how many people will click on the jigdo link?
Given the choice to download, say, the Fedora 9 i386 vanilla DVD, frankly, I expect only people that know Jigdo, or want to get to know Jigdo as it may have some benefits for them, and want to use it, are going to use it, so in all my optimism:
roughly 10 out of a 100.
For other spins without regular bittorrent seeds obviously the rate is 100%, and some of the people that get to know Jigdo that way will be using it again for our respins, and if possible, again for other spins (non-Everything?), and again, and again.
Kind regards,
Jeroen van Meeuwen -kanarip
Jeroen van Meeuwen wrote:
Mike McGrath wrote:
and we were going to
judge jigdo a success if a certain % (compared to bittorrent) use jigdo. What % would that be?
Jigdo would in this case be particularly useful to those with a local mirror as they have 99% of the content already (90% if you have F9T3?). Because it is particularly useful to some, and completely weird and strange for others, the number of users that will use it if BitTorrent is an alternative wouldn't be a very good indicator to see if it is actually a viable distribution method for the whole of Fedora, neither is it the goal for these proposals.
I'm talking specifically about people going to the get-fedora page and clicking on the torrent link vs the jigdo link. Out of every 100 people, how many people will click on the jigdo link?
Given the choice to download, say, the Fedora 9 i386 vanilla DVD, frankly, I expect only people that know Jigdo, or want to get to know Jigdo as it may have some benefits for them, and want to use it, are going to use it, so in all my optimism:
roughly 10 out of a 100.
I would venture less. As a former Debian user (and knowing other Debian users), our favorite install system was always the ~100 MB net install image, which we can do for Fedora already (though it's not 100 MB).
Spins are perhaps interesting for users that don't do minimal net-installs, and don't want to build their system with yum later, but jigdo is something that advanced users would use. They seem to be two different groups.
Jigdo did not seem to be very popular among anyone I talked to once they figured out the minimal install images were available.
Fedora's MMV, of course.
Michael DeHaan wrote:
I would venture less. As a former Debian user (and knowing other Debian users), our favorite install system was always the ~100 MB net install image, which we can do for Fedora already (though it's not 100 MB)
Assuming you do have a network connection, as in the example above: what would you want to do if there's no Jigdo, but you do have the Fedora 9 DVD and you want or need the CD version too? Download another ~4GB of ISO images?
How about off-line?
It's fairly simple to create the Jigdo files and host them. Let's try it and get the real numbers, then decide whether it's valuable enough for all of us to continue distributing installation media with it.
Spins are perhaps interesting for users that don't do minimal net-installs, and don't want to build their system with yum later, but jigdo is something that advanced users would use. They seem to be two different groups.
Jigdo did not seem to be very popular among anyone I talked to once they figured out the minimal install images were available.
We on the other hand have hundreds -if not thousands- of users download the CD version of Fedora 7 and Fedora 8 while supposedly they are in possession of the DVD images already.
Kind regards,
Jeroen van Meeuwen -kanarip
On Tue, 11 Dec 2007 00:41:24 +0100 Jeroen van Meeuwen kanarip@kanarip.com wrote:
We on the other hand have hundreds -if not thousands- of users download the CD version of Fedora 7 and Fedora 8 while supposedly they are in possession of the DVD images already.
What makes you say that? When I poked at fedoraunity.org the only download option I saw was jigdo, so how are you determining that these people are using jigdo by choice rather than by necessity?
Jesse Keating wrote:
What makes you say that? When I poked at fedoraunity.org the only download option I saw was jigdo, so how are you determining that these people are using jigdo by choice rather than by necessity?
I don't think I said anything about those users /choosing/ to use Jigdo, but if I was likely to be misunderstood in that aspect I apologize.
Talking about necessity though, users interested in the Fedora 9 Everything Spin would need to use Jigdo as it is the only way anyone offers it (officially).
Kind regards,
Jeroen van Meeuwen -kanarip
On Tue, 11 Dec 2007 14:02:48 +0100 Jeroen van Meeuwen kanarip@kanarip.com wrote:
I don't think I said anything about those users /choosing/ to use Jigdo, but if I was likely to be misunderstood in that aspect I apologize.
Ok, Michael was talking about choosing, which is what led me to assume that you were too.
On Mon, 10 Dec 2007 17:55:47 -0500 Michael DeHaan mdehaan@redhat.com wrote:
Jigdo did not seem to be very popular among anyone I talked to once they figured out the minimal install images were available.
I will admit, our poll on spins.fedoraunity.org showed that more users want to use BitTorrent. I still think these users just had no concept of what Jigdo is *good* at and were left to deal with our growing pains setting everything up. We do apologize for that. We do, however, 110% support Jigdo in many cases:
* Testing images can easily be "upgraded" to the next release, only downloading what has changed
* Re-Spins: The same goes for Re-Spins; end users (and the testers) can easily "update" their ISO, again only having to download what has changed
* Any Spin: Not all mirrors chose to carry the ISO images. A next-hop or local mirror might not be available with the ISO images for direct download. This very close mirror will be able to be used to "put together" the ISO image, and with our awesome new MirrorManager the acquisition of this data source will be automatic.
* Bandwidth Optimization/Utilization: We are able to utilize mirrors around the globe without requiring mirror admins to think twice about hosting Jigdo data: they are already hosting most of the data needed to put the image back together and have to install no additional software (as they would to run a torrent seed).
* Official Releases: This "letter" was directed at requesting all updates be archived in some form; this bullet is about official releases. These official releases will *always* be able to be hosted via Jigdo with *very little* additional storage requirements because the Fedora and Everything trees that were composed against are exploded and hosted indefinitely.
Please do understand that some of our ambitions are based around releasing an optimized, and if need be completely rewritten, client that will solve a lot of the issues people have had with Jigdo in the past. The only client I know of that works is jigdo-lite, and that is just a shell script. We are currently maintaining backwards compatibility with the existing Jigdo concept (pyjigdo being basically just a wrapper), but we might find we need to take it to the next level because of needs outside the scope of the original Jigdo. My plan has been to implement full compatibility in pure Python and then add new bells and whistles that can be switched on and off, depending on the source .jigdo definition.
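The "only downloading what has changed" point in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not pyjigdo or jigdo-lite code: the `parts_to_download` helper and the `expected` mapping are hypothetical stand-ins for a parsed .jigdo definition, and SHA-256 is used purely for illustration. The idea is that every file an image needs is identified by a checksum, so anything already present locally (from an older ISO or a local mirror) with a matching checksum does not need to be fetched again.

```python
import hashlib
from pathlib import Path


def checksum_of(path):
    """Return the hex SHA-256 digest of a file, read in 1 MB blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(block)
    return digest.hexdigest()


def parts_to_download(expected, local_dir):
    """Return the names in `expected` that are not available locally.

    `expected` maps a file name to the checksum the image definition
    asks for (hypothetical; a real .jigdo file is not parsed here).
    Local files are matched by checksum, not by name, so a package that
    sits in a different directory layout still counts as present.
    """
    have = {checksum_of(p) for p in Path(local_dir).rglob("*") if p.is_file()}
    return sorted(name for name, checksum in expected.items()
                  if checksum not in have)
```

With a local mirror or a previous test image unpacked under `local_dir`, the returned list would contain only the packages that changed between the two composes.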
On Dec 12, 2007 2:47 AM, Jonathan Steffan jonathansteffan@gmail.com wrote:
I will admit, our poll on spins.fedoraunity.org showed that more users want to use BitTorrent. I still think these users just had no concept of what Jigdo is *good* at and were left to deal with our growing pains setting everything up. We do apologize for that. We do, however, 110% support Jigdo in many cases:
- Testing images can easily be "upgraded" to the next release, only downloading what has changed
- Re-Spins: The same goes for Re-Spins; end users (and the testers) can easily "update" their ISO, again only having to download what has changed
- Any Spin: Not all mirrors chose to carry the ISO images. A next-hop or local mirror might not be available with the ISO images for direct download. This very close mirror will be able to be used to "put together" the ISO image, and with our awesome new MirrorManager the acquisition of this data source will be automatic.
- Bandwidth Optimization/Utilization: We are able to utilize mirrors around the globe without requiring mirror admins to think twice about hosting Jigdo data: they are already hosting most of the data needed to put the image back together and have to install no additional software (as they would to run a torrent seed).
- Official Releases: This "letter" was directed at requesting all updates be archived in some form; this bullet is about official releases. These official releases will *always* be able to be hosted via Jigdo with *very little* additional storage requirements because the Fedora and Everything trees that were composed against are exploded and hosted indefinitely.
Jigdo support will definitely be a fantastic addition. Bandwidth is very costly in my country, so using Jigdo makes a lot of sense to me. I have not tried it yet, only for lack of proper know-how.
Cheers, Imtiaz
Jonathan Steffan wrote:
- Any Spin: Not all mirrors chose to carry the ISO images. A next-hop or local mirror might not be available with the ISO images for direct download. This very close mirror will be able to be used to "put together" the ISO image and with our awesome new MirrorManager the acquisition of this data source will be automatic.
- Bandwidth Optimization/Utilization: We are able to utilize mirrors around the globe without requiring mirror admins to think twice about hosting Jigdo data, they already are hosting most of the data needed to put the image back together and have to install no additional software (as in the case of running a torrent seed.)
I will speak as a mirror admin (ftp.free.fr/ftp.proxad.net). As far as I am concerned, bandwidth is not a real issue but disk IOs are. Disk capacity is growing exponentially and sequential access bandwidth grows linearly, but disk seek times decrease at a very slow pace (SSDs are coming, but they do not yet match the needed capacity). Thus, improving server performance means either adding more disks or optimizing disk access (i.e. reading more data per disk seek).
The "biggest" mirrors I manage are stored on a two-disk RAID1 volume (to avoid striping, which induces disk seeks), and I/O is optimized by doing 1 MB chunk readahead (using posix_fadvise). If I need additional bandwidth, I may increase the readahead chunk size. But this is usable only if the file I read is big enough (and if fragmentation is kept low).
As long as jigdo use is uncommon, I just don't mind, but if it were to become commonly used, it would mean a very noticeable decrease in performance.
François
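The readahead technique François describes can be sketched as follows. This is only an illustration under stated assumptions, not the code running on ftp.free.fr: the function name `read_with_readahead` and the pure-Python read loop are hypothetical, while `os.posix_fadvise` is the standard wrapper around the POSIX call he mentions (available on Unix-like systems only, hence the `hasattr` guard). Each iteration asks the kernel to start loading the next 1 MB chunk while the current one is being served, so a large sequential file costs roughly one seek per chunk.

```python
import os

CHUNK = 1024 * 1024  # 1 MB readahead window, as in the message above


def read_with_readahead(path, chunk=CHUNK):
    """Yield a file chunk by chunk, hinting the kernel one chunk ahead."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        can_advise = hasattr(os, "posix_fadvise")
        if can_advise:
            # Declare the whole file (length 0 means "to the end") sequential.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        offset = 0
        while offset < size:
            next_offset = offset + chunk
            if can_advise and next_offset < size:
                # Ask for the following chunk before reading the current one.
                os.posix_fadvise(fd, next_offset, chunk,
                                 os.POSIX_FADV_WILLNEED)
            data = os.pread(fd, chunk, offset)
            if not data:
                break
            yield data
            offset += len(data)
    finally:
        os.close(fd)
```

The benefit depends on the file being much larger than the chunk size, which is the contrast being drawn between one multi-gigabyte ISO and thousands of individual packages.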
On Wed, 12 Dec 2007 12:18:31 +0100 Francois Petillon fantec@proxad.net wrote:
I will speak as a mirror admin (ftp.free.fr/ftp.proxad.net). As far as I am concerned, bandwidth is not a real issue but disk IOs are.
-SNIP-
As long as jigdo use is uncommon, I just don't mind, but if it were to become commonly used, it would mean a very noticeable decrease in performance.
So, you are against deltarpms also then?
The access time for the data being served as a Jigdo would be the same as anyone using yum against your mirror source. Also, in most cases download requests would be spread across multiple sources... though this can be changed by an end user or by the behavior of MirrorManager redirection results.
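The idea of spreading download requests across multiple sources can be illustrated with a simple round-robin assignment. This is a hypothetical sketch: the `assign_parts` helper and the example mirror URLs are made up for illustration, and it does not reflect how MirrorManager or any Jigdo client actually selects mirrors.

```python
from collections import Counter
from itertools import cycle


def assign_parts(parts, mirrors):
    """Pair each part with a download URL, rotating through the mirrors."""
    if not mirrors:
        raise ValueError("at least one mirror is required")
    rotation = cycle(mirrors)
    return [(part, f"{next(rotation).rstrip('/')}/{part}") for part in parts]


def load_per_mirror(assignments, mirrors):
    """Count how many parts each mirror ends up serving."""
    counts = Counter()
    for _part, url in assignments:
        for mirror in mirrors:
            if url.startswith(mirror.rstrip("/") + "/"):
                counts[mirror] += 1
                break
    return counts


# Hypothetical mirrors and package names, for illustration only.
mirrors = ["http://mirror-a.example.org/fedora",
           "http://mirror-b.example.org/fedora"]
parts = ["pkg-one.rpm", "pkg-two.rpm", "pkg-three.rpm"]
plan = assign_parts(parts, mirrors)
```

With two mirrors and three parts, the first mirror serves two requests and the second serves one; each mirror still sees its share of small, separate file requests.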
Jonathan Steffan wrote:
So, you are against deltarpms also then?
The problem is not being for or against anything. Deltarpms may be interesting for low-bandwidth clients that keep all downloaded rpms. But it would be a waste of I/O for servers if users have to download several files to rebuild the final ISO.
I was just trying to point out that the choices you make may change server efficiency (and as not all mirrors have the same issues, it would be difficult to please everyone).
The access time for the data being served as a Jigdo would be the same as anyone using yum against your mirror source.
Yes. But it would cost more I/O than downloading an ISO if you have to download most of the packages to build the ISO.
Also, in most cases download requests would be spread across multiple sources...
At the server level (and as long as you do not speak about parallelizing downloads), there won't be much difference (you will still have to manage n% of the downloads/connections).
François
On Fri, Dec 07, 2007 at 08:48:10AM -0600, Mike McGrath wrote:
My concern with jigdo is how many people use it. It seems silly to host both torrent and jigdo (as much as this letter points out the benefits of switching to jigdo, those benefits disappear if we simply add jigdo to the mix). Most people already have bittorrent. Let's say we were going to give Jigdo a trial run for Fedora 9 and we were going to judge jigdo a success if a certain % (compared to bittorrent) use jigdo. What % would that be?
Some people CAN'T use bittorrent because of firewalls. There should be no reason at all why anyone couldn't use Jigdo, because it uses standard FTP or HTTP to download the slices. There are clients available for all the important OSes.
Jigdo and Bittorrent are really two different beasts that do different things to benefit different use cases. Bittorrent is best for getting all the bits for an ISO set when you have nothing currently. Jigdo is good for getting bits that are packed differently but are otherwise identical to the bits you have already, plus it can also get all the bits via separate and possibly distributed downloads. So Jigdo in this sense has a superset of the functionality of Bittorrent.
I think we need to have some overlap by providing both services, certainly at least for a transition period, but perhaps even long term. The real benefit to Jigdo is that you can distribute one set of files that represent the Everything universe of content via HTTP/FTP (or ISOs via Bittorrent), and then a bunch of Jigdo templates for all the various spins.
Chuck Anderson wrote:
Some people CAN'T use bittorrent because of firewalls. There should be no reason at all why anyone couldn't use Jigdo, because it uses standard FTP or HTTP to download the slices. There are clients available for all the important OSes.
Those that can't use bittorrent can use the mirrors as well.
-Mike
Jonathan Steffan wrote:
The amount of storage and bandwidth able to be saved can be illustrated by a simple comparison between the efficiency of chopping up a 3.4GB iso9660 file system arbitrarily (by a static chunk size) and the same file system based on contents (file by file.) For a BitTorrent, Fedora's current choice for sharing Spins, the hosted data is only valid for a given chunk on a single ISO. This data's footprint (equal to the combined chunk sizes of the entire torrent) can be used for nothing but this Spin. To be able to host 5 Spins composed from similar trees via BitTorrent, we now have a footprint of 17GB, not to mention "seeders" have to run BitTorrent software to be able to contribute to the swarm. Alternatively, Jigdo can be used to reduce the footprint of these 5 Spins to about 4GB. The amount of additional data needing to be hosted for each Spin, in addition to what data is already pushed to the mirrors, is about 150MB per ISO with anaconda and about 200KB for ISOs without the installer bits. To help illustrate the efficiency of using Jigdo vs BitTorrent, the footprint for 250 Spins is 850GB for BitTorrent and about 40GB for Jigdo. Additionally, a reduction in overhead can be achieved by removing the need for the BitTorrent tracker and all related network traffic without requiring any additional work on the part of mirror administrators.
This paragraph shows the savings we would make on the jigdo server. How much would our storage needs increase by needing to keep all RPMs around on the mirrors?
-Mike
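The footprint figures in the quoted paragraph can be reproduced with a little arithmetic. The sketch below is an illustration, not the authors' own calculation: the 3.4GB ISO size and the roughly 150MB of extra data per ISO with anaconda come from the paragraph, while `SHARED_TREE_GB` is an assumption (one shared set of package data of about the size of a single ISO) chosen because it matches the quoted totals.

```python
ISO_SIZE_GB = 3.4          # one Spin ISO, from the quoted paragraph
PER_SPIN_JIGDO_GB = 0.150  # ~150MB extra per ISO with anaconda, from the paragraph
SHARED_TREE_GB = 3.4       # assumption: package data stored once and shared by all Spins


def bittorrent_footprint_gb(spins):
    """Every Spin is hosted as its own full set of chunks."""
    return round(spins * ISO_SIZE_GB, 2)


def jigdo_footprint_gb(spins):
    """Package data is stored once; each Spin adds only its template data."""
    return round(SHARED_TREE_GB + spins * PER_SPIN_JIGDO_GB, 2)


for spins in (5, 250):
    print(spins, bittorrent_footprint_gb(spins), jigdo_footprint_gb(spins))
```

Under these assumptions, 5 Spins come to 17GB for BitTorrent against a little over 4GB for Jigdo, and 250 Spins to 850GB against about 41GB, in line with the "about 4GB" and "about 40GB" quoted in the paragraph.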
Mike McGrath wrote:
This paragraph shows the savings we would make on the jigdo server. How much would our storage needs increase by needing to keep all RPMs around on the mirrors?
For Fedora 8, per arch, at this moment that would be:
$ du -sch fedora-unity/releases/8/Everything/x86_64/ \
    fedora-unity/updates/8/x86_64/
19G     fedora-unity/releases/8/Everything/x86_64/
5.0G    fedora-unity/updates/8/x86_64/
24G     total
minus the updates in the normal tree already:
$ du -sch fedora/releases/8/Everything/x86_64/ fedora/updates/8/x86_64/
19G     fedora/releases/8/Everything/x86_64/
3.9G    fedora/updates/8/x86_64/
23G     total
For Fedora 7, per arch, that'd be:
$ du -sch fedora-unity/releases/7/Everything/x86_64/ \
    fedora-unity/updates/7/x86_64/
16G     fedora-unity/releases/7/Everything/x86_64/
18G     fedora-unity/updates/7/x86_64/
34G     total
again minus the updates in the tree already:
$ du -sch fedora/releases/7/Everything/x86_64/ fedora/updates/7/x86_64/
16G     fedora/releases/7/Everything/x86_64/
13G     fedora/updates/7/x86_64/
28G     total
So, roughly, 12GB per release per arch, not taking into account the ever growing number of packages.
Kind regards,
Jeroen van Meeuwen -kanarip
-- P.S. For everyone wondering, yes this host archives every single bit that ever hits the mirrors except for development/ and releases/test/.
infrastructure@lists.fedoraproject.org