Would it be interesting to have something like this in Fedora?
It looks interesting from a QA point of view.
Regards
On Sun, 2 May 2010 17:25:10 +0200 yersinia yersinia.spiros@gmail.com wrote:
Would be interesting to have in Fedora something like this
?
Look interesting from a QA point of view.
It's been suggested many times before, but no one has really stepped forward to champion it. ;)
There is an rpm version being worked on by an OpenSUSE person:
http://gitorious.org/opensuse/popcorn
Something would need to be packaged, tested, etc.
Then the problem becomes what data to store, how to store it. It's going to be a vast amount of data, and we would need some server to store it, policies around when to drop entries, etc.
Not that I think it's a bad idea; it just needs a group of determined people to work on it and make it happen. ;)
kevin
I have looked at the source code (C on the server side, Python on the client side). It uses libtdb [1] as its storage back-end (a plain text format). I think sqlite is better, and you can port it to another DBMS such as Postgres or MySQL.
But how can it be integrated into Fedora? By writing a yum plug-in?
On Mon, 2010-05-03 at 19:20 +0100, Athmane Madjoudj wrote:
It's been suggested many times before, but no one has really stepped forward to champion it. ;)
There is an rpm version being worked on by an OpenSUSE person:
http://gitorious.org/opensuse/popcorn
Something would need to be packaged, tested, etc.
Then the problem becomes what data to store, how to store it. It's going to be a vast amount of data, and we would need some server to store it, policies around when to drop entries, etc.
Not that I think it's a bad idea, It just needs a group of determined people to work on and make it happen. ;)
i have looked at the source code (C server side / Python client side), it uses libtdb [1] as storage back-end (a plain text format)
TDB is not "plain text"; it's a key/value store, like BDB etc.
, i think that sqlite is better, and you can port it to other DBMS such as Postgres or MySQL
My guess is that sqlite would be nicer though, as I'd imagine you wouldn't want to store just key/values.
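To make the key/value versus relational point concrete, here is a minimal sketch using Python's standard sqlite3 module. The table and column names, the helper functions and the 90-day retention window are all assumptions made up for illustration; none of it is taken from the popcorn code.

```python
import sqlite3
import time

# Hypothetical schema: one row per (client, package) pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS submission (
    host_id TEXT NOT NULL,     -- assumed anonymous client identifier
    package TEXT NOT NULL,
    repo    TEXT NOT NULL,     -- repository the package came from
    seen_at INTEGER NOT NULL,  -- Unix timestamp of the report
    PRIMARY KEY (host_id, package)
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_submission(conn, host_id, packages, now=None):
    """Store one client's report; packages is an iterable of (name, repo)."""
    now = int(time.time()) if now is None else now
    conn.executemany(
        "INSERT OR REPLACE INTO submission VALUES (?, ?, ?, ?)",
        [(host_id, name, repo, now) for name, repo in packages],
    )
    conn.commit()


def drop_stale(conn, max_age_days=90, now=None):
    """Apply a retention policy: delete rows older than max_age_days."""
    now = int(time.time()) if now is None else now
    cutoff = now - max_age_days * 86400
    cur = conn.execute("DELETE FROM submission WHERE seen_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


def popularity(conn):
    """Return [(package, host_count)], most widely installed first."""
    return conn.execute(
        "SELECT package, COUNT(DISTINCT host_id) AS n FROM submission "
        "GROUP BY package ORDER BY n DESC, package"
    ).fetchall()
```

A query like the one in popularity() is the kind of thing that is awkward with a plain key/value store, which is the practical argument for a relational back-end here.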
But how it can be integrated in Fedora, by writing yum plug-in ?
I can't think why you'd want a plugin, but you'd probably need to use the yum API ... at least so you could get data out of yumdb. The "client" side should be truly trivial though. Dumping the installed packages, which repos. they came from and the reason for their install ... is probably like 10 lines of yum code. If someone is doing this work, they'd probably do a bit more to get more info. ... but again, it would be trivial in comparison to the work needed on the server side (and to get people to install it etc.)
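As a rough illustration of how small the client side could be, here is a sketch that lists installed packages. Note the swap: it shells out to the rpm command with a query format instead of using the yum API, so it does not capture the repository of origin or the install reason that yumdb would provide. The function names and the record layout are assumptions for illustration only.

```python
import subprocess

# rpm expands the \t and \n escapes itself, hence the raw string.
QUERY_FORMAT = r"%{NAME}\t%{VERSION}-%{RELEASE}\t%{ARCH}\n"


def parse_rpm_listing(text):
    """Turn tab-separated rpm output into a sorted list of dicts."""
    packages = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, version_release, arch = parts
        packages.append(
            {"name": name, "version": version_release, "arch": arch}
        )
    return sorted(packages, key=lambda p: (p["name"], p["arch"]))


def installed_packages():
    """Query the local rpm database; requires the rpm command."""
    out = subprocess.run(
        ["rpm", "-qa", "--queryformat", QUERY_FORMAT],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_rpm_listing(out)
```

Keeping the parsing separate from the rpm call means the parsing can be checked on a machine without rpm installed.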
I would find it interesting too. Not only about the usage of my packages, but it might also be interesting to know what packages are installed from other repositories (fusion, adobe, skype, remi, ...).
I think it would help to make decisions about what packages to keep supporting or include in the different Fedora spins.
More information about our users can't be a bad thing; at the moment there seems to be a lot of guesswork around.
It obviously has to be an opt-in feature, and the data has to be made anonymous. Maybe it could be included in smolt.
Cheers, Christof
On Tue, May 4, 2010 at 23:21, Kevin Kofler kevin.kofler@chello.at wrote:
James Antill wrote:
I can't think why you'd want a plugin
To automatically count the package as installed as soon as you "yum install" it?
Kevin Kofler
-- devel mailing list devel@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/devel
Am Montag, den 03.05.2010, 11:52 -0600 schrieb Kevin Fenzi:
On Sun, 2 May 2010 17:25:10 +0200 yersinia yersinia.spiros@gmail.com wrote:
Would be interesting to have in Fedora something like this
?
Look interesting from a QA point of view.
It's been suggested many times before, but no one has really stepped forward to champion it. ;)
There is an rpm version being worked on by an OpenSUSE person:
http://gitorious.org/opensuse/popcorn
Something would need to be packaged, tested, etc.
Then the problem becomes what data to store, how to store it. It's going to be a vast amount of data, and we would need some server to store it, policies around when to drop entries, etc.
Not that I think it's a bad idea, It just needs a group of determined people to work on and make it happen. ;)
Wouldn't it be easier to let MirrorManager do that? This way each mirror can keep a counter per package and publish the counters statically on the server side. To get the totals, the counter files from all servers need to be collected, and that's it.
Maybe a bit less data, than collecting anything like in smolt...
Of course, this does not say 'how many computers out there have package X'. Maybe it fails and one needs to download a package twice and so on. But I think this would be a first great approximation.
Comments?
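The per-mirror counter idea could look roughly like the sketch below. It assumes each mirror has an Apache-style access log and that package files follow the usual name-version-release.arch.rpm naming; the function names are hypothetical. As noted above, it counts downloads, not installations.

```python
import re
from collections import Counter

# Matches the request and status fields of an Apache-style log line.
REQUEST_RE = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) ')


def package_name(path):
    """Extract the name from a path ending in name-version-release.arch.rpm."""
    filename = path.rsplit("/", 1)[-1]
    if not filename.endswith(".rpm"):
        return None
    stem = filename[: -len(".rpm")]
    parts = stem.rsplit("-", 2)  # name may itself contain dashes
    if len(parts) != 3:
        return None
    return parts[0]


def count_downloads(log_lines):
    """Count successful (status 200) package downloads on one mirror."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match or match.group("status") != "200":
            continue
        name = package_name(match.group("path"))
        if name:
            counts[name] += 1
    return counts


def merge_counts(per_mirror_counts):
    """Sum the counters published by several mirrors."""
    total = Counter()
    for counts in per_mirror_counts:
        total.update(counts)
    return total
```

Each mirror would publish the result of count_downloads() as a static file, and a collector would only need merge_counts() over whatever files it can fetch.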
On 2010-05-03 09:25:06 PM, Thomas Spura wrote:
Wouldn't it be easier to let MirrorManager do that? This way each mirror can save a counter per package and publish them statically on the server side. To get a total amount of the data, all counter files from all servers needs to be collected and that's it.
Maybe a bit less data, than collecting anything like in smolt...
Of course, this does not say 'how many computers out there have package X'. Maybe it fails and one needs to download a package twice and so on. But I think this would be a first great approximation.
I don't think yum requests packages specifically from mirrormanager. It gets a mirror URL from mirrormanager, then downloads everything else directly from the mirror that it gets back.
Thanks, Ricky
Ricky Zhou (ricky@fedoraproject.org) said:
Of course, this does not say 'how many computers out there have package X'. Maybe it fails and one needs to download a package twice and so on. But I think this would be a first great approximation.
I don't think yum requests packages specifically from mirrormanager. It gets a mirror URL from mirrormanager, then downloads everything else directly from the mirror that it gets back.
Ricky's analysis is correct. Mirrormanager logs only tell us what version/arch someone is looking for, not what packages.
Bill
On Tue, May 4, 2010 at 6:30 PM, Bill Nottingham notting@redhat.com wrote:
Ricky Zhou (ricky@fedoraproject.org) said:
Of course, this does not say 'how many computers out there have package X'. Maybe it fails and one needs to download a package twice and so on. But I think this would be a first great approximation.
I don't think yum requests packages specifically from mirrormanager. It gets a mirror URL from mirrormanager, then downloads everything else directly from the mirror that it gets back.
Ricky's analysis is correct. Mirrormanager logs only tell us what version/arch someone is looking for, not what packages.
Ok, thank you. There goes the theory. I doubt we can get proper logs from any mirror. Maybe it's possible to collect that data on our own mirror(s) and have at least a little info. Or make it opt-in, have it phone home, and hope many users take part.
Maybe it's not worth all the effort, dunno.
On Sun, May 2, 2010 at 7:25 AM, yersinia yersinia.spiros@gmail.com wrote:
Look interesting from a QA point of view.
How exactly is this interesting from a QA pov in Fedora? Smolt profiles I can understand being useful for QA because it gives us some ability to look for commonalities when troubleshooting hardware problems. I'm really not sure what installed package information gives us in terms of helping any QA process. Care to explain your thoughts on this?
Debian uses popcon for a specific reason...to help in ordering the packages on their install media sets. I'm not sure we are interested in that sort of help...Debian releases are a vastly different timescale than ours. We aren't going to adapt the media contents based on popcon every 6 months.. I don't see us making a commitment to use the data in the same way Debian uses it..so I'm left scratching my head on how we will use it at all.
Before I would be personally willing to commit time on seeing this implemented I would need to know what the perceived value is. I love datamining...but I'm not a big fan of collecting data without first having a stated reason for the collection of that information. If we are going to collect it I expect it to be used and I expect the initial use to be stated before we start collecting it.
And more generally speaking, I'm not keen on collecting information unless there is a potential direct benefit for users who are providing the information. So the reason for collection needs to be sufficiently...user-focused...and not just because we want metrics. The collected information has to be used primarily to help us provide a better user experience or I'm going to get really pissy about it. Fair warning.
-jef
On Mon, May 3, 2010 at 9:27 PM, Jeff Spaleta jspaleta@gmail.com wrote:
On Sun, May 2, 2010 at 7:25 AM, yersinia yersinia.spiros@gmail.com wrote:
Look interesting from a QA point of view.
How exactly is this interesting from a QA pov in Fedora? Smolt profiles I can understand being useful for QA because it gives us some ability to look for commonalities when troubleshooting hardware problems. I'm really not sure what installed packaging information gives up in terms of helping any QA process. Care to explain your thoughts on this?
Debian uses popcon for a specific reason...to help in ordering the packages on their install media sets. I'm not sure we are interested in that sort of help...Debian releases are a vastly different timescale than ours. We aren't going to adapt the media contents based on popcon every 6 months.. I don't see us making a commitment to use the data in the same way Debian uses it..so I'm left scratching my head on how we will use it at all.
Before I would be personally willing to commit time on seeing this implemented I would need to know what the perceived value is. I love datamining...but I'm not a big fan of collecting data without first having a stated reason for the collection of that information. If we are going to collect it I expect it to be used and I expect the initial use to be stated before we start collecting it.
- Superb information for us packagers if and how much (of course not the correct value) users use the software i package
- Helps to decide if a package can be easily removed from Fedora (upstream dead, no users left, good bye is no problem)
At least two points, so rock on and implement it please :)
On Mon, May 3, 2010 at 12:03 PM, Thomas Janssen thomasj@fedoraproject.org wrote:
- Superb information for us packagers if and how much (of course not the correct value) users use the software i package
It may or may not be superb information...but you haven't told me how collecting this information is helpful to the users of my packages. Nor have you told me exactly how we as a project would use this information. So what if you find out that all your packages are in the 0.01% long tail of lowest popularity..how does that help you as a maintainer? How would knowing that they were more popular than you thought help you as a maintainer?
- Helps to decide if a package can be easily removed from Fedora (upstream dead, no users left, good bye is no problem)
1) Popcon says nothing about dead upstream
2) Having zero counts in popcon does not mean unused.
You can not make effective project choices with regard to expiring packages on popcon data that is opt-in. We have way too many niche packages which will have zero popcon counts but are still in use by someone. And if you are proposing that we go further than Debian and have this as on by default?
-jef
On Mon, May 3, 2010 at 10:21 PM, Jeff Spaleta jspaleta@gmail.com wrote:
On Mon, May 3, 2010 at 12:03 PM, Thomas Janssen thomasj@fedoraproject.org wrote:
- Superb information for us packagers if and how much (of course not
the correct value) users use the software i package
It may or may not be superb information...but you haven't told me how collecting this information is helpful to the users of my packages.
It's not intended to be helpful for the users, but for me as packager and upstream.
Nor have you told me exactly how we as a project would use this information.
We could use the information to find out, if an app that we think is the real deal, is really the real deal. I bet we would be surprised about the one or other application. Well maybe not :)
So what if you find out that all your packages are in 0.01% the long tail of lowest popularity..how does that help you as a maintainer? How would knowing it they were very popular than your thought help you as a maintainer?
That wouldn't affect my work as maintainer directly. Though one maintainer or another might be more careful with updates if he finds out that his software is used by a large portion of the community. Well, looks like I found at least one thing that is helpful for our users. Oh, another thing helpful for our users: people who search for some software, and are not sure which one might be the best, could start with the most popular. I might have more ideas, but it's almost midnight, so nothing more to expect from me for today, sorry :)
- Helps to decide if a package can be easily removed from Fedora
(upstream dead, no users left, good bye is no problem)
- Popcon says nothing about dead upstream
- Having zero counts in popcon does not mean unused.
1) Right, I expect maintainers to find out if upstream is dead or not. I wasn't expecting popcorn to do it for me.
2) Right as well. Though if it's possible to have it implemented via mirrormanager as suggested (sorry, I'm not part of infra so I can't tell if it's possible or not, or how to do it), it would at least indicate that there are not many users.
You can not make effective project choices with regard to expiring packages on popcon data that is opt-in. We have way too many niche packages which will have zero popcon counts but are still in use by someone. And if you are proposing that we go further than Debian and have this as on by default?
Well, of course it wouldn't be only because of some popcon data, though it would be better than nothing (our current situation). I mainly mean packages that are orphaned and that nobody wants to pick up because nobody uses them. How would we decide about such a package today? We can't, because we have no idea how many users "might" be there.
Good question about on or off by default. To make sense it should be on by default.
-jef
Thomas Janssen wrote:
Good question about on or off by default. To make sense it should be on by default.
NO! Popcon may have its uses, and I actually have it enabled on my Debian boxes, but it *must* be strictly opt-in. If it were on by default it would be spyware, and I do *not* want an operating system with spyware in it.
Björn Persson
Am Montag, den 03.05.2010, 23:30 +0200 schrieb Björn Persson:
Thomas Janssen wrote:
Good question about on or off by default. To make sense it should be on by default.
NO! Popcon may have its uses, and I actually have it enabled on my Debian boxes, but it *must* be strictly opt-in. If it were on by default it would be spyware, and I do *not* want an operating system with spyware in it.
Good point. +1
I don't think it's spyware, if it's enabled by default on the server side, do you? e.g. sourceforge does the same with their statistics counter (or any other web counter online).
'Phoning home' is indeed some kind of spyware...
T
Thomas Spura wrote:
I don't think it's spyware, if it's enabled by default on the server side, do you? e.g. sourceforge does the same with their statistics counter (or any other web counter online).
No, extracting download statistics from web server logs isn't spyware. Spyware is software that runs on a user's computer and, without the user's explicit permission, sends out data that otherwise wouldn't have left the user's computer.
It can be considered unethical to publish or sell detailed information about individual users' activities even if no spyware was involved in gathering the information, but I don't think anyone thinks a download count is a problem.
Björn Persson
2010/5/3 Björn Persson bjorn@xn--rombobjrn-67a.se:
Thomas Janssen wrote:
Good question about on or off by default. To make sense it should be on by default.
NO! Popcon may have its uses, and I actually have it enabled on my Debian boxes, but it *must* be strictly opt-in. If it were on by default it would be spyware, and I do *not* want an operating system with spyware in it.
Well, I wouldn't call software that counts server-side downloads of FOSS software, and uses those downloads/installations to give a popularity suggestion in packagekit, spyware. Nothing at all gets sent out of your box.
Mind you, I'm not speaking of popcon exactly. I spoke about something on the server, just counting the downloads/installations (not even unique installations via some hash or whatever), and a packagekit extension that shows the count or something like stars or whatever.
So, it has to be on by default.
Thomas Janssen wrote:
Well, i wouldn't call a software that counts serverside downloads of FOSS software and gives based on that downloads/installations, a popularity suggestion in packagekit, spyware. There's nothing at all that gets sent out of your box.
Remind, i'm not speaking of exactly popcon. I spoke about something in the server, just counting the download/installation (not even unique installations via some hash or whatever) and a packagekit extension that shows the count or something like stars or whatever.
So, it has to be on by default.
I'm sorry if I misunderstood you, but if you talked about download statistics then that was far from obvious. I got the impression that you talked about the same thing as "yersinia", and "yersinia" talked about Popcon.
So what would it mean for download statistics to be on or off by default? How would a user override the default? Would there be an option in Yum that would let the user choose whether their downloads should be counted? Would that choice be communicated in every HTTP or FTP request?
Björn Persson
2010/5/4 Björn Persson bjorn@xn--rombobjrn-67a.se:
Thomas Janssen wrote:
Well, i wouldn't call a software that counts serverside downloads of FOSS software and gives based on that downloads/installations, a popularity suggestion in packagekit, spyware. There's nothing at all that gets sent out of your box.
Remind, i'm not speaking of exactly popcon. I spoke about something in the server, just counting the download/installation (not even unique installations via some hash or whatever) and a packagekit extension that shows the count or something like stars or whatever.
So, it has to be on by default.
I'm sorry if I misunderstood you, but if you talked about download statistics then that was far from obvious. I got the impression that you talked about the same thing as "yersinia", and "yersinia" talked about Popcon.
Yeah, my bad. I wasn't very clear, sorry.
So what would it mean for download statistics to be on or off by default? How would a user override the default? Would there be an option in Yum that would let the user choose whether their downloads should be counted? Would that choice be communicated in every HTTP or FTP request?
Very good question. The thing that has to be on by default is the server-side counting. The other part that could be on or off by default would be the "plugin" for packagekit that shows these little stars or numbers (to make sense, on). And I think it wouldn't hurt to have it on by default (assuming there's a switch to turn it off). Of course we could tell people that there's nothing to worry about (no spyware). That could be done in the release notes. I'm sure some of the bigger portals will tell people anyway in their news when we release.
2010/5/4 Björn Persson bjorn@xn--rombobjrn-67a.se
Thomas Janssen wrote:
Well, i wouldn't call a software that counts serverside downloads of FOSS software and gives based on that downloads/installations, a popularity suggestion in packagekit, spyware. There's nothing at all that gets sent out of your box.
Remind, i'm not speaking of exactly popcon. I spoke about something in the server, just counting the download/installation (not even unique installations via some hash or whatever) and a packagekit extension that shows the count or something like stars or whatever.
So, it has to be on by default.
I'm sorry if I misunderstood you, but if you talked about download statistics then that was far from obvious. I got the impression that you talked about the same thing as "yersinia", and "yersinia" talked about Popcon.
No. I only posted the question of whether this feature, right or wrong as it may be, could be interesting for Fedora, as other distros have done elsewhere. Just to hold a discussion and see if there was interest in the functionality. I think it is better to open a debate about a feature before looking at what the best implementation for it is, whether one exists at all or whether it is better to develop it from scratch.
On Tue, May 4, 2010 at 9:25 PM, devzero2000 pinto.elia@gmail.com wrote:
2010/5/4 Björn Persson bjorn@xn--rombobjrn-67a.se
Thomas Janssen wrote:
Well, i wouldn't call a software that counts serverside downloads of FOSS software and gives based on that downloads/installations, a popularity suggestion in packagekit, spyware. There's nothing at all that gets sent out of your box.
Remind, i'm not speaking of exactly popcon. I spoke about something in the server, just counting the download/installation (not even unique installations via some hash or whatever) and a packagekit extension that shows the count or something like stars or whatever.
So, it has to be on by default.
I'm sorry if I misunderstood you, but if you talked about download statistics then that was far from obvious. I got the impression that you talked about the same thing as "yersinia", and "yersinia" talked about Popcon.
No. I have only post the question whether this feature, right or wrong as it can be, could be interesting in Fedora, as other distro have done elsewhere. Just to to hold a discussion, if there was interest in the functionality. I think it is better to open a debate about a feature before seeing what is the best implementation for it, if exists at all or it is better to develop it from scratch.
On Mon, May 3, 2010 at 1:06 PM, Thomas Janssen thomasj@fedoraproject.org wrote:
To make sense it should be on by default.
Good luck with that. I strongly suggest that any usage which only makes sense with "on by default" is not a usage you can rely on as a strawman.
The popularity application idea would be a compelling user benefit..but popcon as constructed really doesn't integrate well enough to give users useful "popular" application suggestions in a way that makes sense. We'd need something that integrates with PackageKit to relay suggestions. I suggest you read up on the now defunct mugshot idea that Red Hat/ Gnome developers implemented as a proof-of-concept of how to provide a popular suggestions idea. What mugshot did is directly comparable to what you want to do.
-jef
On Mon, 2010-05-03 at 13:32 -0800, Jeff Spaleta wrote:
The popularity application idea would be a compelling user benefit..but popcon as constructed really doesn't integrate well enough to give users useful "popular" application suggestions in a way that makes sense. We'd need something that integrates with PackageKit to relay suggestions. I suggest you read up on the now defunct mugshot idea that Red Hat/ Gnome developers implemented as a proof-of-concept of how to provide a popular suggestions idea. What mugshot did is directly comparable to what you want to do.
gnome-shell actually still collects app usage data in ~/.gnome2/shell (somewhat similar to what mugshot did), but it does nothing in particular with the data atm. And there is no popcon-esque framework to collect this data in a central place.
On Mon, May 03, 2010 at 10:03:34PM +0200, Thomas Janssen wrote:
- Helps to decide if a package can be easily removed from Fedora
(upstream dead, no users left, good bye is no problem)
Sounds like another tool to beat maintainers with.
Rich.
On Tue, May 4, 2010 at 11:43 AM, Richard W.M. Jones rjones@redhat.com wrote:
On Mon, May 03, 2010 at 10:03:34PM +0200, Thomas Janssen wrote:
- Helps to decide if a package can be easily removed from Fedora
(upstream dead, no users left, good bye is no problem)
Sounds like another tool to beat maintainers with.
To beat maintainers with is not a use case i had in mind. Could you explain how?
On Mon, May 3, 2010 at 9:27 PM, Jeff Spaleta jspaleta@gmail.com wrote:
On Sun, May 2, 2010 at 7:25 AM, yersinia yersinia.spiros@gmail.com wrote:
Look interesting from a QA point of view.
How exactly is this interesting from a QA pov in Fedora? Smolt profiles I can understand being useful for QA because it gives us some ability to look for commonalities when troubleshooting hardware problems. I'm really not sure what installed packaging information gives up in terms of helping any QA process. Care to explain your thoughts on this?
Sure, I can try. If a piece of software is used often by many users, directly or indirectly, and it does not have many problems (for example, open bugs in bugzilla), that could inform a judgment about the quality of the software and whether or not to remove it if the maintainer no longer wants to support it. Conversely, if software is not used, directly or indirectly, that could make the decision to remove it easier, should that possibility arise. In general it depends on what one considers the QA of a distribution to be: finding bugs and automating the process of finding bugs is one thing, but not the only one. But that is only a personal opinion.
Regards
On Mon, May 3, 2010 at 12:19 PM, yersinia yersinia.spiros@gmail.com wrote:
Sure, I can try. If one software is used many time from many user, directly or indirectly, and it have not such many problems (e.g bug open on bugzilla for example ), well this could guide to the decision of the goodness of the software and the need to delete it or not if the maintainer does not believe to support it yet.
Are you speaking as a member of the QA team? This sounds very hypothetical..and not very specific. I'm really not sure that popcon is going to tell us anything about fitness of a package in a way that bugzilla does not already....especially when popcon is not set up to provide enough details in its summary reports to distinguish individual versions of one against another...let alone whether they are testing versions or what not. I don't see how this data is useful at all compared to abrt and bugzilla in doing any sort of QA for anyone.
Here's a little exercise . Go into debian's bug tracking system and compare the rank order popularity of a package in popcon with the rank ordering of number of bugs filed against that package in debian's ticketing system for all time. There is a popular theory that holds that all software is roughly equally buggy and that roughly speaking the number of people reporting problems scales with its popularity...not with its general fitness.
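One way to run that exercise is to compute a rank correlation between the two orderings. The sketch below is a plain-Python Spearman correlation (ties get the average rank); the function names are illustrative, and no actual popcon or bug-tracker numbers are included, since none are given here.

```python
def ranks(values):
    """Return 1-based ranks, largest value first; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = ranks(xs), ranks(ys)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined when one side is constant")
    return cov / (var_x * var_y) ** 0.5
```

Given per-package install counts in xs and per-package bug counts in ys, a value near 1 would be consistent with the theory that bug reports track popularity rather than fitness.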
To be useful for package QA we'd really have to tie it into bodhi to give detailed information about how many people are using a given testing update compared to nominal usage to help maintainers feel confident that it's seen enough testing even when there's been no karma added or subtracted. But such an integrated system would not look anything like popcon, and there would need to be a sit-down discussion with bodhi developers on how to best integrate that sort of client-side data.
Conversely, if software is not used - directly or indirectly - this could facilitate the decision to remove it, in the event that this possibility emerge.
Popcon is designed as an opt-in system. You cannot rely on it giving accurate results on actual usage for the full breadth of our repository. Nor the Debian repository for that matter. We carry a significant number of niche packages...packages that will not be seen by popcon-like data collection unless you get a significant proportion of the userbase running it and catch all possible usage scenarios.
If you are proposing that the project employ a data-driven package expiration policy.. then let's figure out what that policy needs to be and build data collection to meet its requirements...not build the data collection then build the policy that fits after the fact.
I do not want to see data collected just because it's possible to collect. I want there to be a specific reason and a firm commitment to using the data that we can communicate to our users.
-jef