Meeting Agenda Item: Introduction Al Butler
by Aljerin Butler, Jr.
Hello,
IRC handle: albutler
Skills/experience:
long-time Linux/GNOME user (10+ years); experience with HTML and CSS; programming experience using Perl, including CGI and DBI along with SQL on MySQL; VirtualBox virtualization technology; website content management (Zikula); past life as a hardware/software QA engineer
What I would like to learn:
Python, web app development/associated technologies, Linux sys/network admin and script programming, LAMP stack
You can see my LinkedIn profile at http://www.linkedin.com/in/albutler33
What I would like to work on:
I would like to work on cleaning up web pages and web apps, for starters.
Attached are two samples of the Perl programming I used to put together the color tables on my website at:
http://customlinux.tripod.com/sitebuildercontent/link_files/software-web_...
Al Butler
12 years, 4 months
notifications of rel-eng file changes
by Kevin Fenzi
Greetings.
in this ticket:
https://fedorahosted.org/fedora-infrastructure/ticket/2953
we have a discussion of installing inotify or the like to allow folks to
see quickly when new files have been uploaded, etc. I'm not sure I want the
overhead of inotify on our machines (although I suppose it might not be
too bad).
I was thinking this might be a good use case for a simple message bus
type setup (which we never got around to deploying).
I.e., run a message bus, and have the rel-eng tools send a message like
"F17 beta rc1 uploading" or "F17 beta rc1 uploaded", etc.
Thoughts? Ideas?
kevin
12 years, 4 months
Plan for tomorrow's Fedora Infrastructure meeting (2011-12-15)
by Kevin Fenzi
The infrastructure team will be having its weekly meeting tomorrow
2011-12-15 at 1900 UTC in #fedora-meeting on the freenode network.
Suggested topics (suggested by whom):
* New folks introductions and Apprentice tasks.
* serverbeach / collab / hosted status
* ibiblio machines status
* Upcoming Tasks/Items
2011-12-22 - contact peer1 about retirement schedule for old machines.
2011-12-23 to 2012-01-02 - rh shutdown week.
2012-01-03 - gitweb-cache removal day.
2012-01-10 - drop inactive maintainers from pkgdb acls.
2012-01-13 to 2012-01-15 - FUDCon Blacksburg
2012-02-31 - fas 0.8.11 final release.
2012-02-14 to 2012-02-28 - F17 Alpha Freeze
2012-04-03 - gitweb-cache removal day.
* Meeting tagged tickets:
https://fedorahosted.org/fedora-infrastructure/report/10
* Open Floor
Submit your agenda items as tickets in the Trac instance and send a
note replying to this thread.
More info here:
https://fedoraproject.org/wiki/Infrastructure/Meetings#Meetings
Thanks
kevin
12 years, 4 months
[RFE] creating a meeting reminder cron job for all teams
by Kévin Raymond
As suggested on Trac[1], here is a thread so we can discuss this idea
further.
The facts:
* People send reminders manually, sometimes late, sometimes wrong.
* People sometimes forget to send the reminder.
* Many teams are doing this.
* It takes time.
* Senders need to subscribe to the mailing lists.
* If there is a new wiki page for each meeting (agenda), we can't find
the latest one easily!
Letting teams register events with a cron job would improve this.
We could use the meeting page[2] (which would need to be updated!) to
get the meeting info.
Parameters:
* Meeting time
* How far in advance the reminder should be sent (in days?)
* Meeting frequency, schedule
* To which list the reminder should be sent
* Static link to meeting details.
For example (as we already do in the French team):
===
This mail is a reminder for today's meeting for the French speaking community.
2011-11-28 / 18:30 UTC
IRC: freenode
#fedora-meeting
The agenda is located at:
http://fedoraproject.org/wiki/Réunions_hebdomadaires_de_la_French_team
===
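Tying the parameters above together, here is a minimal sketch of what a
cron-driven reminder script could look like. Everything in it (list
address, times, script name) is made up for illustration; it would be
run daily from cron, e.g. "0 8 * * * python reminder.py":

import datetime
import smtplib
from email.mime.text import MIMEText

# Hypothetical registration for one team; every value is an example.
MEETING = {
    "list": "team-list@example.org",
    "weekday": 0,                 # Monday
    "time_utc": "18:30",
    "lead_days": 0,               # 0 = send on the meeting day itself
    "details": "https://fedoraproject.org/wiki/Meetings",
}

def maybe_send(today=None):
    today = today or datetime.date.today()
    meeting_day = today + datetime.timedelta(days=MEETING["lead_days"])
    if meeting_day.weekday() != MEETING["weekday"]:
        return  # no meeting coming up, nothing to send
    body = ("This mail is a reminder for the meeting on %s at %s UTC.\n"
            "The agenda is located at: %s\n"
            % (meeting_day, MEETING["time_utc"], MEETING["details"]))
    msg = MIMEText(body)
    msg["Subject"] = "Meeting reminder"
    msg["From"] = "reminder@example.org"
    msg["To"] = MEETING["list"]
    server = smtplib.SMTP("localhost")
    server.sendmail(msg["From"], [msg["To"]], msg.as_string())
    server.quit()

if __name__ == "__main__":
    maybe_send()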
As kevin said, including the time could be wrong because of daylight
saving time.
Hence the static meeting page is really important.
Lots of meetings follow a generic agenda, so there is no need to create
a new page each time (take a look at [3]); the wiki history keeps the
old meeting agendas for us. Therefore the meeting agenda should also be
included in the static meeting page.
Of course, if this is implemented, all teams (ambassadors, infra, docs…)
should use it, otherwise it is not really useful.
What do you think?
[1] https://fedorahosted.org/fedora-infrastructure/ticket/3029
[2] https://fedoraproject.org/wiki/Meetings
[3] https://fedoraproject.org/w/index.php?title=Category:Ambassadors&from=...
--
Kévin Raymond
User:shaiton
GPG-Key: A5BCB3A2
12 years, 4 months
AUTO: OOO Alternate Work Schedule (returning 12/14/2011)
by Shannon Mcleod
I am out of the office until 12/14/2011.
I am out of the office today taking AWS for an important personal matter. I
will be back at 5pm for scheduled changes.
Thanks,
Shannon McLeod
Unix System Administrator
Dubuque GDF
Note: This is an automated response to your message "infrastructure
Digest, Vol 67, Issue 11" sent on 12/13/2011 5:00:03.
This is the only notification you will receive while this person is away.
12 years, 4 months
NFS outage retrospective
by Kevin Fenzi
I just posted a blog post about the NFS outage, but I thought I would
copy it here to get more feedback.
http://scrye.com/wordpress-mu/nirik/2011/12/10/fedora-nfs-server-outage-r...
As you may have seen if you are on the fedora announce list, we had an
outage the other day of our main build system NFS storage. This meant
that no builds could be made and also data could not be downloaded from
koji (rpms, build info, etc). I thought I would share here what
happened so we can learn from it and try to prevent or mitigate this
happening again.
First, a bit of background on the setup: We have a
storage device that exports raw storage as iSCSI. This is then
consumed/used by our main nfs server (nfs01). It’s using the device
with lvm2, and has an ext4 filesystem on it. It’s around 12TB in size.
This data is then exported to various other machines to use, including
builders, kojipkgs squid frontend for packages, koji hubs and release
engineering boxes that push updates. We also have a backup nfs server
(bnfs01) that has its own separate storage with a backup copy of the
primary data.
On the morning of December 8th, the connection between
the iSCSI backend and nfs01 had a hiccup. It retried the in-progress
writes, and then decided it could resume OK and kept going.
The filesystem had “Errors behavior: Continue” set so it kept going
(although no actual fs errors were logged, so that may not matter).
Shortly after this, NFS locks started failing and builds were getting
I/O errors. An LVM snapshot was made and an fsck run on that snapshot,
which completed after around 2 hours. An fsck was then run on the actual
volume itself, but that took around 8 hours and showed a great deal
more corruption than the snapshot had. In order to get things into a
good state, we then did an rsync of the snapshot off to our backup
storage (which took around 8 hours), and merged that snapshot back as
the master fs on the volume (which took around 30min to complete).
Then, after a reboot, we were back up OK, but a small number of builds
had been made after the issue started. We purged them from the koji
database and re-ran them on the current/valid/repaired filesystem.
After that, the builders were brought back on-line, the queued-up
builds were processed, and things were back to normal.
So, some lessons/ideas here, in no particular order:
* 12TB means most anything you decide to do will take a while to finish.
On the plus side, that gives you lots of time to think about the next
step.
* We should change the default on-error behavior to at least
'read-only' on that volume. Then errors would at least stop further
corruption, and at best prevent the need for a lengthy fsck. It's not
entirely clear if the iSCSI errors would have made the fs hit the error
condition or not, however.
* We could do better about more regular backups of this data. A daily
snapshot and an rsync of that snapshot off to backup storage could save
us time in the event of another backup sync being needed. We would also
then have the snapshot to go back to if needed (a rough sketch of this
follows after the list).
* Down the road, some of the cluster filesystems might be a good thing
to investigate and transition to. If we can spread the backend storage
around and have enough nodes, the failure of any one node might not
have as much impact.
* Perhaps we could add monitoring for iSCSI errors, and note and react
to them more quickly.
* lvm and its snapshots, and the ability to merge a snapshot back in as
primary, really helped us out here.
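For the daily snapshot/rsync idea above, this is roughly the shape I
have in mind. It is only a sketch: the volume group, snapshot size,
mount point and backup destination below are made-up examples, not our
actual layout.

#!/usr/bin/python
# Hypothetical daily job: snapshot the volume, mount the snapshot
# read-only, rsync it off to backup storage, then clean up.
import subprocess

VG, LV = "vg_nfs", "data"            # placeholder volume group / LV
SNAP = "data-backup-snap"            # placeholder snapshot name
MNT = "/mnt/backup-snap"
DEST = "bnfs01.example.org:/srv/backup/"

def run(cmd):
    print("running: " + " ".join(cmd))
    subprocess.check_call(cmd)

run(["lvcreate", "--snapshot", "--size", "50G",
     "--name", SNAP, "%s/%s" % (VG, LV)])
try:
    run(["mount", "-o", "ro", "/dev/%s/%s" % (VG, SNAP), MNT])
    try:
        run(["rsync", "-aH", "--delete", MNT + "/", DEST])
    finally:
        run(["umount", MNT])
finally:
    run(["lvremove", "-f", "%s/%s" % (VG, SNAP)])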
Feel free to chime in with other thoughts or ideas. Hopefully it will
be quite some time before we have another outage like this one.
12 years, 4 months
fedorahosted / collab plans and progress
by Kevin Fenzi
Greetings.
Our nice timetable of events for migrating collab01/02 and hosted01 ran
into a monkey wrench the other week when we had to replace our new
hardware again. We've made progress on the new new hardware and here's
where we stand:
* I've installed a hosted02 on serverbeach06. This is a rhel5 clone of
hosted01. This is mostly just to make sure if anything happens to
hosted01 we are still able to bring something up quickly.
* I've installed a collab03 (sb08) and collab04 (sb09). I've set them
up with a drbd device for /srv to keep them in sync. The idea is that
collab03 will be primary and, in case anything fails, collab04 will be
there to bring up quickly (a quick sanity check of the drbd sync state
is sketched after this list).
* I plan to install a hosted03 later today. (rhel6, hosted01
replacement).
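Regarding the drbd setup on collab03/04 above, here is the kind of
quick sanity check I mean before relying on a failover. It is only a
sketch, and the /proc/drbd field layout shown in the comment is from
memory:

# check_drbd.py -- rough sketch: exit non-zero unless the resource
# looks healthy.
import sys

def drbd_in_sync(path="/proc/drbd"):
    ok = False
    for line in open(path):
        # Resource status lines look roughly like:
        #  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate ...
        if "cs:" in line:
            ok = "cs:Connected" in line and "ds:UpToDate/UpToDate" in line
    return ok

if __name__ == "__main__":
    if drbd_in_sync():
        print("drbd looks in sync")
    else:
        print("drbd NOT in sync -- check /proc/drbd")
        sys.exit(1)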
I'd like to get things moved sooner rather than later. Also, we are
starting to push up against the holiday season. So, I am proposing:
2011-12-12 (1 week away): collab move. (collab01 -> collab03)
2011-12-14 (next wed): hosted move. (hosted01 -> hosted03)
Outstanding questions:
1. When do we want to move lists.fedorahosted.org? On the 12th? The
14th? Or some other time entirely? We are going to need to get collab03
able to handle the additional domain and data.
2. On hosted, do we still want to change all projects to use
project.fedorahosted.org? Or should we perhaps just wait on that
change? We could do that change on a per-project opt-in basis as we
move them to other instances, possibly with less disruption. On the
other hand, doing it now means we are ready to move things down the
road.
From our testing on hosted03 before we had to rebuild, the rhel5/rhel6
upgrade seemed to go pretty smoothly from what I could tell.
Any other gotchas, ideas, or things we could set up to test?
If no one screams about the timing, I will probably announce the collab
move later today and the hosted move on Wednesday (giving people 1
week's notice).
kevin
12 years, 4 months
hosted03 / fedorahosted.org rhel6-test back
by Kevin Fenzi
I've re-set up the hosted03 machine, copied all data from hosted01, and
run a convert script on it.
If you would like to test against it and see your project and data and
confirm all looks well for next week's migration, simply add:
66.135.62.191 fedorahosted.org git.fedorahosted.org hg.fedorahosted.org svn.fedorahosted.org
to your /etc/hosts and browse/use your scm as normal.
Please let me know if you see any issues or concerns.
note that from time to time I will re-sync the data from hosted01 and re-convert it.
This means that sometimes the test instance will show unconverted data/trac errors.
Also, note that any changes you make will be overwritten in the next sync.
Happy testing.
kevin.
12 years, 4 months
Plan for tomorrow's Fedora Infrastructure Meeting (2011-12-08)
by Kevin Fenzi
The infrastructure team will be having its weekly meeting tomorrow
2011-12-08 at 1900 UTC in #fedora-meeting on the freenode network.
Suggested topics (suggested by whom):
* New folks introductions and Apprentice tasks.
* serverbeach / collab / hosted status
* ibiblio machines status
* Upcoming Tasks/Items
2011-12-07 - announce hosted move and fedorahosted-announce list.
2011-12-08 - Fedora 14 end of life.
2011-12-12 - migrate to collab03/04 (evening) (tentative)
2011-12-14 - migrate hosted (tentative)
2011-12-23 to 2012-01-02 - rh shutdown week.
2012-01-10 - drop inactive maintainers from pkgdb acls.
2012-01-13 to 2012-01-15 - FUDCon Blacksburg
2012-02-14 to 2012-02-28 - F17 Alpha Freeze
2012-02-31 - fas 0.8.11 final release.
* Meeting tagged tickets:
https://fedorahosted.org/fedora-infrastructure/report/10
* Open Floor
Submit your agenda items as tickets in the Trac instance and send a
note replying to this thread.
More info here:
https://fedoraproject.org/wiki/Infrastructure/Meetings#Meetings
Thanks
kevin
12 years, 4 months
Sysadmin / sysadmin-qa FAS group membership
by Josef Skladanka
Hello,
I need to join the Sysadmin-qa FAS group in order to be able to manage our staging server. This is accessible via bastion.fedoraproject.org, and my understanding is that I need to be in the Sysadmin FAS group (which is a prerequisite for sysadmin-qa) to be able to access that machine.
Thank you
Josef
12 years, 4 months