Meeting Agenda Item: Introduction Al Butler
by Aljerin Butler, Jr.
Hello,
IRC handle: albutler
Skills/experience:
long-time Linux/GNOME user (10+ years); experience with HTML and CSS; programming experience using Perl, including CGI and DBI along with SQL on MySQL; VirtualBox virtualization technology; website content management (Zikula); past life as a hardware/software QA engineer
What I would like to learn:
Python, web app development/associated technologies, Linux sys/network admin and script programming, LAMP stack
You can see my LinkedIn profile at http://www.linkedin.com/in/albutler33
What I would like to work on:
I would like to work on cleaning up web pages and web apps, for starters.
Attached are two samples of the Perl programming I used to put together the color tables on my website at:
http://customlinux.tripod.com/sitebuildercontent/link_files/software-web_...
Al Butler
12 years, 4 months
notifications of rel-eng file changes
by Kevin Fenzi
Greetings.
in this ticket:
https://fedorahosted.org/fedora-infrastructure/ticket/2953
we have a discussion of installing inotify or the like to allow folks to
see quickly when new files have been uploaded, etc. I'm not sure I want the
overhead of inotify on our machines (although I suppose it might not be
too bad).
I was thinking this might be a good use case for a simple message bus
type setup (which we never got around to deploying).
I.e., run a message bus, and have the rel-eng tools send a message like
"F17 beta rc1 uploading" or "F17 beta rc1 uploaded", etc.
Thoughts? Ideas?
kevin
12 years, 4 months
Plan for tomorrow's Fedora Infrastructure meeting (2011-12-15)
by Kevin Fenzi
The infrastructure team will be having its weekly meeting tomorrow
2011-12-15 at 1900 UTC in #fedora-meeting on the freenode network.
Suggested topics (suggested by whom):
* New folks introductions and Apprentice tasks.
* serverbeach / collab / hosted status
* ibiblio machines status
* Upcoming Tasks/Items
2011-12-22 - contact peer1 about retirement schedule for old machines.
2011-12-23 to 2012-01-02 - rh shutdown week.
2012-01-03 - gitweb-cache removal day.
2012-01-10 - drop inactive maintainers from pkgdb acls.
2012-01-13 to 2012-01-15 - FUDCon Blacksburg
2012-02-31 - fas 0.8.11 final release.
2012-02-14 to 2012-02-28 - F17 Alpha Freeze
2012-04-03 - gitweb-cache removal day.
* Meeting tagged tickets:
https://fedorahosted.org/fedora-infrastructure/report/10
* Open Floor
Submit your agenda items as tickets in the Trac instance and send a
note replying to this thread.
More info here:
https://fedoraproject.org/wiki/Infrastructure/Meetings#Meetings
Thanks
kevin
12 years, 4 months
[RFE] creating a meeting reminder cron job for all teams
by Kévin Raymond
As suggested on Trac[1], here is a thread so we can discuss this idea
further.
The facts:
* People send reminders manually, sometimes late, sometimes wrong.
* People sometimes forget to send the reminder.
* Many teams are doing this.
* It takes time.
* Senders need to subscribe to the mailing lists.
* If there is a new wiki page for each meeting (agenda), we can't find
the latest one easily!
Letting teams register events with a cron job would improve this.
We could use the meeting page[2] (which would need to be updated!) to
get the meeting info.
Parameters:
* Meeting time
* How far in advance the reminder should be sent (in days?)
* Meeting frequency, schedule
* To which list the reminder should be sent
* Static link to meeting details.
For example (as we already do in the French team):
===
This mail is a reminder for today's meeting for the French speaking community.
2011-11-28 / 18:30 UTC
IRC: freenode
#fedora-meeting
The agenda is located at:
http://fedoraproject.org/wiki/Réunions_hebdomadaires_de_la_French_team
===
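Tying the parameters above together, here is a minimal sketch of what a
cron-driven reminder script could look like. Everything in it (list
address, times, script name) is made up for illustration; it would be
run daily from cron, e.g. "0 8 * * * python reminder.py":

import datetime
import smtplib
from email.mime.text import MIMEText

# Hypothetical registration for one team; every value is an example.
MEETING = {
    "list": "team-list@example.org",
    "weekday": 0,                 # Monday
    "time_utc": "18:30",
    "lead_days": 0,               # 0 = send on the meeting day itself
    "details": "https://fedoraproject.org/wiki/Meetings",
}

def maybe_send(today=None):
    today = today or datetime.date.today()
    meeting_day = today + datetime.timedelta(days=MEETING["lead_days"])
    if meeting_day.weekday() != MEETING["weekday"]:
        return  # no meeting coming up, nothing to send
    body = ("This mail is a reminder for the meeting on %s at %s UTC.\n"
            "The agenda is located at: %s\n"
            % (meeting_day, MEETING["time_utc"], MEETING["details"]))
    msg = MIMEText(body)
    msg["Subject"] = "Meeting reminder"
    msg["From"] = "reminder@example.org"
    msg["To"] = MEETING["list"]
    server = smtplib.SMTP("localhost")
    server.sendmail(msg["From"], [msg["To"]], msg.as_string())
    server.quit()

if __name__ == "__main__":
    maybe_send()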
As kevin said, including the time could be wrong because of daylight
saving time.
Hence the static meeting page is really important.
Lots of meetings follow a generic agenda, so there is no need to create
a new page each time (take a look at [3]); the wiki history keeps the
old meeting agendas for us. Therefore the meeting agenda should also be
included in the static meeting page.
Of course, if this is implemented, all teams (ambassadors, infra, docs…)
should use it, otherwise it is not really useful.
What do you think?
[1] https://fedorahosted.org/fedora-infrastructure/ticket/3029
[2] https://fedoraproject.org/wiki/Meetings
[3] https://fedoraproject.org/w/index.php?title=Category:Ambassadors&from=...
--
Kévin Raymond
User:shaiton
GPG-Key: A5BCB3A2
12 years, 4 months
AUTO: OOO Alternate Work Schedule (returning 12/14/2011)
by Shannon Mcleod
I am out of the office until 12/14/2011.
I am out of the office today taking AWS for an important personal matter. I
will be back at 5pm for scheduled changes.
Thanks,
Shannon McLeod
Unix System Administrator
Dubuque GDF
Note: This is an automated response to your message "infrastructure
Digest, Vol 67, Issue 11" sent on 12/13/2011 5:00:03.
This is the only notification you will receive while this person is away.
12 years, 4 months
NFS outage retrospective
by Kevin Fenzi
I just posted a blog post about the NFS outage, but I thought I would
copy it here to get more feedback.
http://scrye.com/wordpress-mu/nirik/2011/12/10/fedora-nfs-server-outage-r...
As you may have seen if you are on the fedora announce list, we had an
outage the other day of our main build system NFS storage. This meant
that no builds could be made and also data could not be downloaded from
koji (rpms, build info, etc). I thought I would share here what
happened so we can learn from it and try to prevent or mitigate this
happening again.
First, a bit of background on the setup: We have a
storage device that exports raw storage as iSCSI. This is then
consumed/used by our main nfs server (nfs01). It’s using the device
with lvm2, and has an ext4 filesystem on it. It’s around 12TB in size.
This data is then exported to various other machines to use, including
builders, kojipkgs squid frontend for packages, koji hubs and release
engineering boxes that push updates. We also have a backup nfs server
(bnfs01) that has its own separate storage with a backup copy of the
primary data.
On the morning of December 8th, the connection between
the iSCSI backend and nfs01 had a hiccup. It retried the in-progress
writes, and then decided it could resume OK and kept going.
The filesystem had “Errors behavior: Continue” set so it kept going
(although no actual fs errors were logged, so that may not matter).
Shortly after this, NFS locks started failing and builds were getting
I/O errors. An LVM snapshot was made and an fsck run on that snapshot,
which completed after around 2 hours. An fsck was then run on the actual
volume itself, but that took around 8 hours and showed a great deal
more corruption than the snapshot had. In order to get things into a
good state, we then did an rsync of the snapshot off to our backup
storage (which took around 8 hours), and merged that snapshot back as
the master fs on the volume (which took around 30min to complete).
Then, after a reboot, we were back up OK, but a small number of builds
had been made after the issue started. We purged them from the koji
database and re-ran them on the current/valid/repaired filesystem.
After that, the builders were brought back on-line, the queued-up
builds were processed, and things were back to normal.
So, some lessons/ideas here, in no particular order:
* 12TB means most anything you decide to do will take a while to finish.
On the plus side, that gives you lots of time to think about the next
step.
* We should change the default on-error behavior to at least
'read-only' on that volume. Then errors would at least stop further
corruption, and at best prevent the need for a lengthy fsck. It's not
entirely clear if the iSCSI errors would have made the fs hit the error
condition or not, however.
* We could do better about more regular backups of this data. A daily
snapshot and an rsync of that snapshot off to backup storage could save
us time in the event of another backup sync being needed. We would also
then have the snapshot to go back to if needed (a rough sketch of this
follows after the list).
* Down the road, some of the cluster filesystems might be a good thing
to investigate and transition to. If we can spread the backend storage
around and have enough nodes, the failure of any one node might not
have as much impact.
* Perhaps we could add monitoring for iSCSI errors, and note and react
to them more quickly.
* lvm and its snapshots, and the ability to merge a snapshot back in as
primary, really helped us out here.
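For the daily snapshot/rsync idea above, this is roughly the shape I
have in mind. It is only a sketch: the volume group, snapshot size,
mount point and backup destination below are made-up examples, not our
actual layout.

#!/usr/bin/python
# Hypothetical daily job: snapshot the volume, mount the snapshot
# read-only, rsync it off to backup storage, then clean up.
import subprocess

VG, LV = "vg_nfs", "data"            # placeholder volume group / LV
SNAP = "data-backup-snap"            # placeholder snapshot name
MNT = "/mnt/backup-snap"
DEST = "bnfs01.example.org:/srv/backup/"

def run(cmd):
    print("running: " + " ".join(cmd))
    subprocess.check_call(cmd)

run(["lvcreate", "--snapshot", "--size", "50G",
     "--name", SNAP, "%s/%s" % (VG, LV)])
try:
    run(["mount", "-o", "ro", "/dev/%s/%s" % (VG, SNAP), MNT])
    try:
        run(["rsync", "-aH", "--delete", MNT + "/", DEST])
    finally:
        run(["umount", MNT])
finally:
    run(["lvremove", "-f", "%s/%s" % (VG, SNAP)])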
Feel free to chime in with other thoughts or ideas. Hopefully it will
be quite some time before we have another outage like this one.
12 years, 4 months
fedorahosted / collab plans and progress
by Kevin Fenzi
Greetings.
Our nice timetable of events for migrating collab01/02 and hosted01 ran
into a monkey wrench the other week when we had to replace our new
hardware again. We've made progress on the new new hardware and here's
where we stand:
* I've installed a hosted02 on serverbeach06. This is a rhel5 clone of
hosted01. This is mostly just to make sure if anything happens to
hosted01 we are still able to bring something up quickly.
* I've installed a collab03 (sb08) and collab04 (sb09). I've set them
up with a drbd device for /srv to keep them in sync. The idea is that
collab03 will be primary and, in case anything fails, collab04 will be
there to bring up quickly (a quick sanity check of the drbd sync state
is sketched after this list).
* I plan to install a hosted03 later today. (rhel6, hosted01
replacement).
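Regarding the drbd setup on collab03/04 above, here is the kind of
quick sanity check I mean before relying on a failover. It is only a
sketch, and the /proc/drbd field layout shown in the comment is from
memory:

# check_drbd.py -- rough sketch: exit non-zero unless the resource
# looks healthy.
import sys

def drbd_in_sync(path="/proc/drbd"):
    ok = False
    for line in open(path):
        # Resource status lines look roughly like:
        #  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate ...
        if "cs:" in line:
            ok = "cs:Connected" in line and "ds:UpToDate/UpToDate" in line
    return ok

if __name__ == "__main__":
    if drbd_in_sync():
        print("drbd looks in sync")
    else:
        print("drbd NOT in sync -- check /proc/drbd")
        sys.exit(1)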
I'd like to get things moved sooner rather than later. Also, we are
starting to push up against the holiday season. So, I am proposing:
2011-12-12 (1 week away): collab move. (collab01 -> collab03)
2011-12-14 (next wed): hosted move. (hosted01 -> hosted03)
Outstanding questions:
1. When do we want to move lists.fedorahosted.org? On the 12th? The
14th? Or some other time entirely? We are going to need to get collab03
able to handle the additional domain and data.
2. On hosted, do we still want to change all projects to use
project.fedorahosted.org? Or should we perhaps just wait on that
change? We could do that change on a per-project opt-in basis as we
move them to other instances, possibly with less disruption. On the
other hand, doing it now means we are ready to move things down the
road.
From our testing on hosted03 before we had to rebuild, the rhel5/rhel6
upgrade seemed to go pretty smoothly from what I could tell.
Any other gotchas, ideas, or things we could set up to test?
If no one screams about the timing, I will probably announce the collab
move later today and the hosted move on Wednesday (giving people 1
week's notice).
kevin
12 years, 4 months
hosted03 / fedorahosted.org rhel6-test back
by Kevin Fenzi
I've re-set up the hosted03 machine, copied all data from hosted01, and
run a convert script on it.
If you would like to test against it and see your project and data and
confirm all looks well for next week's migration, simply add:
66.135.62.191 fedorahosted.org git.fedorahosted.org hg.fedorahosted.org svn.fedorahosted.org
to your /etc/hosts and browse/use your scm as normal.
Please let me know if you see any issues or concerns.
note that from time to time I will re-sync the data from hosted01 and re-convert it.
This means that sometimes the test instance will show unconverted data/trac errors.
Also, note that any changes you make will be overwritten in the next sync.
Happy testing.
kevin.
12 years, 4 months
Plan for tomorrow's Fedora Infrastructure Meeting (2011-12-08)
by Kevin Fenzi
The infrastructure team will be having its weekly meeting tomorrow
2011-12-08 at 1900 UTC in #fedora-meeting on the freenode network.
Suggested topics (suggested by whom):
* New folks introductions and Apprentice tasks.
* serverbeach / collab / hosted status
* ibiblio machines status
* Upcoming Tasks/Items
2011-12-07 - announce hosted move and fedorahosted-announce list.
2011-12-08 - Fedora 14 end of life.
2011-12-12 - migrate to collab03/04 (evening) (tentative)
2011-12-14 - migrate hosted (tentative)
2011-12-23 to 2012-01-02 - rh shutdown week.
2012-01-10 - drop inactive maintainers from pkgdb acls.
2012-01-13 to 2012-01-15 - FUDCon Blacksburg
2012-02-14 to 2012-02-28 - F17 Alpha Freeze
2012-02-31 - fas 0.8.11 final release.
* Meeting tagged tickets:
https://fedorahosted.org/fedora-infrastructure/report/10
* Open Floor
Submit your agenda items as tickets in the Trac instance and send a
note replying to this thread.
More info here:
https://fedoraproject.org/wiki/Infrastructure/Meetings#Meetings
Thanks
kevin
12 years, 4 months
Sysadmin / sysadmin-qa FAS group membership
by Josef Skladanka
Hello,
I need to join the Sysadmin-qa FAS group in order to be able to manage our staging server. This is accessible via bastion.fedoraproject.org, and my understanding is that I need to be in the Sysadmin FAS group (which is a prerequisite for sysadmin-qa) to be able to access that machine.
Thank you
Josef
12 years, 4 months