Anonymized access log from a Fedora mirror
by Lukas Zapletal
Hello,
I have two students interested in a diploma thesis called "Yum plugin for
suggesting packages based on usage":
http://bit.ly/18hrHbL
TL;DR - from anonymized access log, create a database of suggested
packages using data mining techniques and provide a Yum plugin that
would suggest "Users of vim also installed: ctags, git, ..."
I am going to create a Fedora Feature wiki page shortly, describing this in
more detail. Our goal is to offer this project for integration into
Fedora later on, or at least to provide Fedora packages for it.
To do that, we need a good source of data. It would be best to collect
access logs from one or two main Fedora mirrors. We would provide a short
Python script that parses the access logs, anonymizes the data (IP
addresses replaced by salted hashes), and keeps only the relevant entries
(RPM files from the latest Fedora release or updates repositories). That
would be phase one, which should give us sample data.
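
A minimal sketch of what such a phase-one filter could look like, not the
actual script. It assumes Apache-style access log lines, and the path
fragments in RELEVANT, the salt handling, and the function names are
hypothetical. The salt has to stay secret, because the IPv4 address space is
small enough to enumerate once the salt is known.

```python
import hashlib
import re

# Assumed Apache "common"/"combined" prefix: client IP, then the request line.
LOG_RE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "GET (?P<path>\S+) [^"]*"')

# Hypothetical path fragments for the release and updates repositories.
RELEVANT = ("/releases/19/", "/updates/19/")


def anonymize_ip(ip, salt):
    """Return a salted SHA-256 digest of the IP instead of the IP itself."""
    return hashlib.sha256((salt + ip).encode("utf-8")).hexdigest()


def filter_line(line, salt):
    """Return (hashed_ip, rpm_filename) for a relevant RPM download, else None."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    path = match.group("path")
    if not path.endswith(".rpm"):
        return None
    if not any(fragment in path for fragment in RELEVANT):
        return None
    return anonymize_ip(match.group("ip"), salt), path.rsplit("/", 1)[-1]
```

Each log line is handled independently, so the same function could be called
from a logrotate hook or over an archived log file.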
Phase two would be to integrate this script with logrotate so that, for one
Fedora release cycle (Fedora 19), the script collects the relevant
anonymized data into a file. The final suggested-package database would be
created from this file (or maybe several files, to allow us to move them on
the fly out of the stat directory).
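
As an illustration only, here is one very simple way such a database could be
derived: counting which packages are fetched by the same anonymized client.
The thesis may well use a different data mining technique. The sketch assumes
the collected data is available as (hashed client, package name) pairs with
versions already stripped, and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_suggestions(records, top_n=3):
    """Map each package to the packages most often fetched by the same clients."""
    # Group the distinct packages seen for every anonymized client.
    per_client = defaultdict(set)
    for client, package in records:
        per_client[client].add(package)

    # Count, for every package, how often each other package co-occurs with it.
    pair_counts = defaultdict(Counter)
    for packages in per_client.values():
        for a, b in combinations(sorted(packages), 2):
            pair_counts[a][b] += 1
            pair_counts[b][a] += 1

    return {
        pkg: [other for other, _ in counts.most_common(top_n)]
        for pkg, counts in pair_counts.items()
    }
```

A plugin could then look up the package being installed in the resulting
dictionary to print a "Users of vim also installed: ..." line.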
The big (legal) question is whether we are able to provide this anonymized
data to the public, or whether we want to sign an NDA with all people
involved. I am CCing Tom for this question.
I need your help with connecting to relevant people. Any comments are
appreciated.
Many thanks and I hope this effort will lead to improving user
experience with Fedora packaging.
--
Later,
Lukas "lzap" Zapletal
irc: lzap #theforeman
10 years, 5 months
Fedora 19 Final Freeze in effect.
by Kevin Fenzi
Greetings.
We are now in the infrastructure freeze leading up to the Fedora 19
final release.
You can see a list of hosts that do not freeze by checking out the
ansible repo and running the freezelist script:
git clone http://infrastructure.fedoraproject.org/infra/ansible.git
scripts/freezelist -i inventory/inventory
Anything listed as freezes is frozen until 2013-07-03 (or later if Beta
slips). Frozen hosts should have no changes made to them without a
signoff on the change from at least 2 sysadmin-main or rel-eng members.
Thanks,
kevin
10 years, 10 months
[Freeze Break] Remove app05-08 for pkgdb
by Toshio Kuratomi
We've been seeing intermittent timeouts with a particular long running pkgdb
url. In an attempt to narrow down where the problem is coming from we'd
like to remove sending pkgdb requests to app05-app08. Those are backup app
servers that don't live in phx so the latency for them to talk to the db
server is high. For this particular page, it takes around 35s to get the
page from app01-04. It takes 2.5-3.5 minutes (roughly) to get the page from
app05-08.
We somewhat expect that the problem will continue but doing this will allow
us to eliminate the non-phx app servers as the source of the timeout.
Could I get two +1's for the following change?
diff --git a/modules/haproxy/files/haproxy.cfg b/modules/haproxy/files/haproxy.cfg
index b796235..ed45179 100644
--- a/modules/haproxy/files/haproxy.cfg
+++ b/modules/haproxy/files/haproxy.cfg
@@ -65,9 +65,9 @@ listen pkgdb 0.0.0.0:10003
server app2 app2:80 check inter 10s rise 2 fall 3
server app3 app3:80 check inter 10s rise 2 fall 3
server app4 app4:80 check inter 10s rise 2 fall 3
- server app05 app05:80 backup check inter 15s rise 2 fall 3
- server app6 app6:80 backup check inter 15s rise 2 fall 3
- server app08 app08:80 backup check inter 15s rise 2 fall 3
+# server app05 app05:80 backup check inter 15s rise 2 fall 3
+# server app6 app6:80 backup check inter 15s rise 2 fall 3
+# server app08 app08:80 backup check inter 15s rise 2 fall 3
# server bapp1 bapp1:80 backup check inter 10s rise 2 fall 3
option httpchk GET /pkgdb/collections/
-Toshio
10 years, 10 months
retroactive freeze break: exclude old staged content on download-ib01
by Kevin Fenzi
I just applied the following:
diff --git a/modules/mirrormanager/manifests/init.pp b/modules/mirrormanager/manifests/init.pp
index ad24df5..db33566 100644
--- a/modules/mirrormanager/manifests/init.pp
+++ b/modules/mirrormanager/manifests/init.pp
@@ -338,7 +338,7 @@ class mirrormanager::sync {
}
cron { "releng-sync":
- command => "/usr/local/bin/lock-wrapper staging '/usr/bin/rsync -qaH --progress --numeric-ids --exclude deltaisos/archive --delete --delete-delay --delay-updates alt.fedoraproject.org::fedora-stage /srv/pub/alt/stage/'",
+ command => "/usr/local/bin/lock-wrapper staging '/usr/bin/rsync -qaH --progress --numeric-ids --exclude deltaisos/archive --exclude 19-Alpha --exclude 19-Beta --delete --delete-delay --delay-updates alt.fedoraproject.org::fedora-stage /srv/pub/alt/stage/'",
user => 'root',
minute => [ 15 ]
}
This is to allow download-ib01 to exclude the old Alpha/Beta content
so it has space to mirror the actual release content. Since we need
to get it updated asap as some people pull from it, I went ahead
and applied this.
Retroactive +1s? or better ways to do it?
:)
kevin
10 years, 10 months
mirrormanager support for the cloud image
by Matthew Miller
Um, sorry, this is kind of a last-minute afterthought, so I'm not expecting
it for F19 launch or anything crazy. But:
For F19, we're putting cloud images on the mirrors. In the staging tree, the
URL is like this:
https://dl.fedoraproject.org/pub/alt/stage/19-RC1/Images/x86_64/Fedora-x8...
and I presume that the final pattern will be
../19/Images/x86_64/Fedora-x86_64-19-20130624-sda.qcow2
(Or possibly s/Images/Cloud/ -- I had thought that's what we previously
agreed but I need to dig that up.)
Is there a way that we could make permanent URLs for the following:
- per-version sorted list of URLs to the image on mirrors
- per-version redirect to the closest (or at least randomly chosen) image suitable
for giving to OpenStack Glance directly
- an unversioned URL redirecting to the latest per-version list
- an unversioned URL redirecting to the latest-version redirect
I'm just guessing that mirrormanager is the best place for this. I'm happy
with any other solution....
--
Matthew Miller ☁☁☁ Fedora Cloud Architect ☁☁☁ <mattdm(a)fedoraproject.org>
10 years, 10 months
Plan for tomorrow's Fedora Infrastructure meeting (2013-06-27)
by Kevin Fenzi
The infrastructure team will be having its weekly meeting tomorrow,
2013-06-27 at 19:00 UTC in #fedora-meeting on the freenode network.
Suggested topics:
#topic New folks introductions and Apprentice tasks.
If any new folks want to give a quick one line bio or any apprentices
would like to ask general questions, they can do so in this part of the
meeting. Don't be shy!
#topic Applications status / discussion
Check in on status of our applications: pkgdb, fas, bodhi, koji,
community, voting, tagger, packager, dpsearch, etc.
If there's new releases, bugs we need to work around or things to note.
#topic Sysadmin status / discussion
Here we talk about sysadmin related happenings from the previous week,
or things that are upcoming.
#topic Fedora 19 release tasks
We need to make sure we are all ready for Fedora 19 release and there's
nothing that could cause problems for release.
#topic Private Cloud status update / discussion
#topic Upcoming Tasks/Items
https://apps.fedoraproject.org/calendar/list/infrastructure/
#topic Open Floor
Submit your agenda items as tickets in the trac instance and send a
note replying to this thread.
More info here:
https://fedoraproject.org/wiki/Infrastructure/Meetings#Meetings
Thanks
kevin
10 years, 11 months
apache and app logs retrieval
by Seth Vidal
Last week, when we were talking about spawning rdiff-backup to back up
our systems, we diverged into discussing app/apache logs and the
somewhat complicated system we currently have for grabbing those logs.
Right now we have a list of hosts on log02 that it should grab logs
from. Those hosts need to have rsyncd running on them to allow access
from log02 to fetch the /var/log/httpd/ path from them.
That requires 2 things to be coupled and it is a bit awkward if you set
up a host that is tricky to access from log02 or isn't on the vpn.
In general I also am not in love with having to have rsyncd listening
on systems - even if it is ip-restricted.
So the thought was we could do something like this on log02:
1. setup an ssh key on log02 that can run rsync to /var/log/httpd on
all hosts
2. make any host that needs to have its logs retrieved be marked in
the ansible inventory host/group vars
3. git clone public-ansible-repo onto log02
4. use group_by to construct a group of the hosts which can then be
retrieved using rsync.
The sole reason for using ansible here is so we can keep the log sync
info in our inventory and to parallelize the retrieval of logs.
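
To make the idea concrete, here is a rough sketch of the parallel retrieval
step in plain Python rather than ansible. It assumes the host list has
already been derived from the inventory and that the ssh key from step 1 is
in place. The destination directory and the function names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def rsync_command(host, dest_root="/var/log/hosts"):
    """Build an rsync-over-ssh command that pulls /var/log/httpd/ from one host."""
    return [
        "rsync", "-a", "-e", "ssh",
        f"{host}:/var/log/httpd/",
        f"{dest_root}/{host}/httpd/",  # hypothetical layout on log02
    ]


def fetch_all(hosts, workers=4, runner=subprocess.call):
    """Run one rsync per host in parallel and return {host: exit_status}."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = pool.map(lambda host: runner(rsync_command(host)), hosts)
        return dict(zip(hosts, statuses))
```

Nothing needs to listen on the remote hosts beyond sshd, which is the point
of moving away from rsyncd.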
This is more or less identical to what we talked about for backups
using rdiff-backup.
When we were discussing this, Luke mentioned then using
tbgrep (https://pypi.python.org/pypi/tbgrep) to search the resulting
files and compile a set of tracebacks our apps are dumping out.
If we have all the logs on log02, generating a report like this would be
pleasantly kept away from the rest of our hosts and could give us
reasonably useful reports of brokenness.
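
For a feel of what such a report involves, here is a small standalone sketch
that pulls Python tracebacks out of log lines and counts duplicates. It does
not use tbgrep's own API. It assumes tracebacks appear unprefixed in the log
(no per-line timestamps), and the function names are hypothetical.

```python
from collections import Counter

HEADER = "Traceback (most recent call last):"


def extract_tracebacks(lines):
    """Yield each Python traceback found in an iterable of log lines as one string."""
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if current is None:
            if line.strip() == HEADER:
                current = [HEADER]
        elif line.startswith((" ", "\t")):
            # Indented "File ..." and source lines belong to the traceback body.
            current.append(line)
        else:
            # The first unindented line is the "ExceptionType: message" line.
            current.append(line)
            yield "\n".join(current)
            current = None


def count_tracebacks(lines):
    """Return a Counter mapping each distinct traceback to its number of occurrences."""
    return Counter(extract_tracebacks(lines))
```

Sorting the counter with most_common() would give a "top offenders" list per
application.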
I'd love some feedback on whether this is all crazy or not :)
-sv
10 years, 11 months
[PATCH] hopefully add cloud.fp.o redirects to https://fedoraproject.org/en/get-fedora-options#clouds to the proxy config
by Seth Vidal
From: Seth Vidal <skvidal(a)fedoraproject.org>
This is for ticket https://fedorahosted.org/fedora-infrastructure/ticket/3857
mattdm wants cloud.fp.o to redirect to that page.
This _should_ implement that change. :)
two +1's needed
-sv
---
manifests/services/proxy.pp | 14 ++++++++++++++
1 files changed, 14 insertions(+), 0 deletions(-)
diff --git a/manifests/services/proxy.pp b/manifests/services/proxy.pp
index c90c171..429a59e 100644
--- a/manifests/services/proxy.pp
+++ b/manifests/services/proxy.pp
@@ -95,6 +95,14 @@ class proxy {
sSLCertificateChainFile => "wildcard.fedoraproject.org.intermediate.cert",
}
+ httpd::website { "cloud.fedoraproject.org":
+ ips => $wildcard_fpo_ips,
+ ssl => true,
+ sslonly => true,
+ cert_name => "wildcard.fedoraproject.org",
+ sSLCertificateChainFile => "wildcard.fedoraproject.org.intermediate.cert",
+ }
+
httpd::website { "mirrors.fedoraproject.org":
ips => $wildcard_fpo_ips,
server_aliases => [ "mirrors.stg.fedoraproject.org" ],
@@ -798,6 +806,12 @@ if $puppetEnvironment == 'staging'{
target => "http://docs.fedoraproject.org/",
}
+ httpd::redirect { "cloud-front-page":
+ website => "cloud.fedoraproject.org",
+ path => "/",
+ target => "http://fedoraproject.org/en/get-fedora-options#clouds",
+ }
+
httpd::redirect { "infofeed":
website => "fedoraproject.org",
path => "/infofeed",
--
1.7.2.1
10 years, 11 months
freeze break request: update permissions to .transifexrc
by Kévin Raymond
Hi there,
For several weeks (I don't really know when transifex-client was
updated on the builders), we have had issues pulling POs for the websites.
Two errors were introduced by the transifex-client update:
- it has to use https now (fixed with previous patch)
- .transifexrc has to be readable and writable now (only readable before).
Please see the first patch here.
I am also doing another commit to improve the pulling script, in order
to send us errors and to correct the hostname.
Pretty simple and used only for websites.
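
For illustration, the permission change amounts to something like the
following Python sketch. The real fix is in the patch mentioned above. This
assumes that owner read and write access is what the client needs, and the
function name is hypothetical.

```python
import os
import stat


def ensure_owner_rw(path):
    """Add owner read and write bits to path, leaving all other bits untouched."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    wanted = mode | stat.S_IRUSR | stat.S_IWUSR
    if wanted != mode:
        os.chmod(path, wanted)
    return wanted
```

Calling ensure_owner_rw() on the .transifexrc file of the user that runs the
pulling script would turn a read-only 0400 file into 0600.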
Thanks,
--
Kévin Raymond
(Shaiton)
10 years, 11 months