builders of the future!!!!!
by Seth Vidal
The discussion on devel list about ARM and my work last week on
reinstalling builders quickly and commonly has raised a number of
issues with how we manage our builders and how we should manage them in
the future.
It is apparent that if we add ARM builders they will be lots of
physical systems (probably in a very small space, but physical
nonetheless). So we need a sensible way to manage and reinstall these
hosts commonly and quickly.
Additionally, we need to consider what the introduction of a largish
number of ARM builders (and other ARM infrastructure) would do to our
existing puppet setup. Specifically, it would overload it pretty badly
and make it not very manageable.
I'm making certain assumptions here and I'd like to be clear about what
those are:
1. the builders need to be kept pristine
2. that currently our builders are not freshly installed frequently
enough.
3. that the builders are relatively static in their
configuration and most changes are done with pkg additions
4. that builder setups require at least two manual-ish steps by a koji
admin who can disable/enable/register the builder with the kojihub.
5. that the builders are fairly different, networking- and setup-wise,
from the rest of our systems.
So I am proposing that we consider the following as a general process
for maintaining our builders:
1. disable the builder in koji
2. make sure all jobs are finished
3. add installer entries into grub (or run the undefine, reinstall
process if the builder is virt-based)
4. reinstall the system
5. monitor for ssh to return
6. connect in and force our post-install configuration: identification,
network, mount-point setup, ssl certs/keys for koji, etc
7. reboot
8. re-enable host in koji
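A minimal sketch of how the eight steps above might be chained for one builder. The step actions here are placeholders that only record what they would do; the hostname, the function names and the logging are assumptions for illustration, not existing tooling.

```python
from typing import Callable, List, Tuple

# A step is a label plus a callable that takes the builder hostname.
Step = Tuple[str, Callable[[str], None]]

STEP_NAMES = [
    "disable the builder in koji",
    "make sure all jobs are finished",
    "add installer entries into grub (or undefine/reinstall a virt guest)",
    "reinstall the system",
    "monitor for ssh to return",
    "force post-install configuration",
    "reboot",
    "re-enable host in koji",
]


def refresh_builder(host: str, steps: List[Step]) -> List[str]:
    """Run the steps in order and return the names of those that completed.

    An exception from any step propagates immediately, so a failed run
    never reaches the later steps (including the final re-enable).
    """
    completed = []
    for name, action in steps:
        action(host)
        completed.append(name)
    return completed


def make_placeholder(name: str, log: List[str]) -> Step:
    """Build a stand-in step that only records what it would have done."""
    def action(host: str) -> None:
        log.append(f"{host}: {name}")
    return (name, action)


log: List[str] = []
steps = [make_placeholder(name, log) for name in STEP_NAMES]
done = refresh_builder("builder01.example.org", steps)  # hypothetical host
print("\n".join(log))
```

Because re-enabling is the last step, a builder whose reinstall fails partway stays disabled in koji until someone looks at it.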
We would do this with frequency and regularity, perhaps even having
some percentage of our builders doing this at all times. For example,
with 1/10th of the boxes reinstalling at any given moment, all of them
would be reinstalled within 10 times the time one batch takes.
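To make the rotation arithmetic concrete, here is a rough sketch that splits a pool of builders into batches of about one tenth each and estimates how long a full cycle takes. The host names, the pool size of 30 and the two hours per batch are made-up numbers for illustration.

```python
from typing import List


def reinstall_batches(builders: List[str], parts: int = 10) -> List[List[str]]:
    """Split builders into consecutive batches of about len(builders)/parts."""
    if not builders:
        return []
    # Ceiling division with integers, so a small pool still gets batches of 1.
    size = max(1, -(-len(builders) // parts))
    return [builders[i:i + size] for i in range(0, len(builders), size)]


def full_cycle_hours(builders: List[str], hours_per_batch: float,
                     parts: int = 10) -> float:
    """Time until every builder has been reinstalled once, one batch at a time."""
    return len(reinstall_batches(builders, parts)) * hours_per_batch


builders = [f"builder{n:02d}" for n in range(1, 31)]  # 30 hypothetical hosts
batches = reinstall_batches(builders)
print(len(batches), "batches of", len(batches[0]))
print("full cycle:", full_cycle_hours(builders, hours_per_batch=2), "hours")
```

With pool sizes that do not divide evenly the last batch is simply smaller, so the cycle never takes more than `parts` batches.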
Additionally, this would mean these systems would NOT have a puppet
management piece at all. Package updates would still be handled
by pushes as we do now, if things were security critical, but barring
the need for significant changes we could rely on the boxes simply being
refreshed frequently enough that updates wouldn't need to be pushed.
What do folks think about this idea? It would dramatically reduce the
node entries in our puppet config, it would drop the number of hosts
connecting to puppet, too. It will mean more systems being reinstalled
and more often. It will also require some work to automate the steps I
mention above. I think I can achieve that without too much
difficulty, actually. I think, in general, it will increase our ability
to scale up to more and more builders.
I'd like input, constructive, please.
Thanks,
-sv
11 years, 2 months
qa machine management
by Kevin Fenzi
Greetings.
Just had a talk with tflink on IRC about the management of the qa
network machines. Long ago when we set up those machines we were
thinking we could use them as a testbed for bcfg2 to see if we wanted
to start using it or if it worked ok, etc. I set up a bcfg2 server to
try this with, but sadly have never found the time to even start
configuring it.
Machines involved:
virthost-comm01.qa (real hardware)
autoqa01.qa (guest)
autoqa-stg01.qa (guest)
lockbox-comm01.qa (guest)
bastion-comm01.qa (guest)
(someday we may add a sign-bridge-comm01 and sign-vault-comm01 to allow
secondary archs like ppc and arm to sign packages).
Options:
- Try and push forward with a bcfg2 setup on lockbox-comm01.qa and
evaluate it. This would be nice, but I'm really not sure anyone has
the time to do it.
- Just add all the above machines to our puppet repo and configure them
there and call it done. This would mean they wouldn't be separate
from us and we just update and configure and monitor them like any
other machine.
- Try and work out some setup with ansible or the like to see if it
could manage them. Again, this would be a learning and tweaking
curve, so not sure we have the time.
- We could set up a new puppet for them on lockbox-comm01.qa and use
that to manage them. We could reuse a lot of our current puppet
setup, but it would still be a fair bit of work to get it all
configured.
Thoughts? Brilliant ideas?
kevin
11 years, 3 months
Fedora 17 Beta Freeze in effect.
by Kevin Fenzi
Greetings.
We are now in the infrastructure freeze leading up to the Fedora 17
Beta release. This is a pre-release freeze.
Please see:
https://fedorahosted.org/fedora-infrastructure/browser/architecture/Envir...
Anything in the Pre Release freeze box is frozen until 2012-04-03 (or
later if Beta slips). This means there should be NO puppet changes to
any hosts in there (including global ones) without signoff of the
change from at least 2 folks in sysadmin-main and/or
release-engineering.
Thanks,
kevin
11 years, 5 months
GSoC idea to setup Gitlab for Fedora Hosted, looking for mentors
by Stas Sușcov
Hi,
I'm not sure I'm allowed to write to this list for GSoC questions,
but it looks like most of the people interested in this application can be
found only here, so I took a chance.
I'm a student from Romania, looking to hack some Ruby code during this
summer, and I found Fedora as one of the not so many organizations offering
this opportunity.
I read the whole thread, and I find the suggestions and conditions you
expect from a student agreeable.
I know Ruby (among other programming languages) and I know sysadmining.
Previously I had 2 successful GSoC editions with WordPress Foundation,
and my first-year project's latest commit dates from March 14th of this year.
The only problem I can foresee is that I had and still maintain deep
involvement in Ubuntu community (I'm the infrastructure admin for
ubuntu-ro).
I don't know if this is an issue or not, but I really hope this won't
affect in any way our relationship (here in Romania, we do Barcamps every
year with fedora-ro, with beer, hiking and lots of pics, thx to @nicubunu
http://camp.softwareliber.ro/2011/poze ). :)
If all this stuff sounds interesting, I would be happy to present a draft
on how we could solve FedoraHosted transition to Gitlab, and ensure it's
well maintained from now on.
Lately I started to hang out on #fedora-summer-coding|#jbosstesting@freenode,
but I'm not sure whom I should query.
I can't see Dan online either.
I also have a Github account: https://github.com/stas
and a resume: http://stas.github.com/resume.html
Thanks in advance for reading this, and I'm looking forward to your reply.
P.S.: I will also be ok if you would like to find a fedora guy for this
project, if so, just let me know, I won't mind. Seriously!
11 years, 6 months
Re: infrastructure Digest, Vol 70, Issue 37
by Jeffrey S. Haemer
> > (1) Would it be okay if I downloaded a single module from the repo?
>
> I suppose it would be, but you don't need the entire git history right?
> Just the current files? I'm pretty sure our head revision is ok and
> doesn't contain much that is sensitive.
>
Yup. Don't even want the entire repo. Just enough for (3), below.
> > (2) Could someone suggest a particularly "typical" repo to play with?
> >
> > Even better would be three -- one very simple, one typical, and one
> > complex -- but I'd be perfectly content to start with one.
>
> Well, for complex our httpd module has often been messy seeming to me.
> The glusterfs one I added recently might be middling complex. It
> doesn't have much in it, but it uses templates and some other things.
> Something like askbot would be a simple one I think.
>
Thanks.
(3)
> I'm interested to hear what puppet lint says about our stuff. ;)
>
Eyup.
--
Jeffrey Haemer <jeffrey.haemer(a)gmail.com>
720-837-8908 [cell], http://seejeffrun.blogspot.com [blog],
http://www.youtube.com/user/goyishekop [vlog]
*פרייהייט? דאס איז יאַנג דינען וואָרט.*
11 years, 6 months
Request
by Jeffrey S. Haemer
I'm a Fedora apprentice, interested in Puppet.
I've been looking through the puppet repo to try to learn what's there,
reading the modules on lockbox01 itself.
I think it would be useful to try running puppet-lint on a module or two.
It seems like it would be easiest to do this on my own machine, since that
way I wouldn't have to persuade someone else to install puppet-lint and its
transitive dependencies on an official Fedora resource.
So, here are my questions:
(1) Would it be okay if I downloaded a single module from the repo?
(2) Could someone suggest a particularly "typical" repo to play with?
Even better would be three -- one very simple, one typical, and one complex
-- but I'd be perfectly content to start with one.
--
Jeffrey Haemer <jeffrey.haemer(a)gmail.com>
720-837-8908 [cell], http://seejeffrun.blogspot.com [blog],
http://www.youtube.com/user/goyishekop [vlog]
*פרייהייט? דאס איז יאַנג דינען וואָרט.*
11 years, 6 months
Plan for tomorrow's Fedora Infrastructure meeting (2012-03-29)
by Kevin Fenzi
The infrastructure team will be having its weekly meeting tomorrow
2012-03-29 at 20:00 UTC in #fedora-meeting on the freenode network.
Suggested topics:
#topic New folks introductions and Apprentice tasks.
If any new folks want to give a quick one line bio or any apprentices
would like to ask general questions, they can do so here.
#topic two factor auth status
#topic Staging re-work status
#topic Applications status / discussion
Check in on status of our applications: pkgdb, fas, bodhi, koji,
community, voting, tagger, packager, dpsearch, etc.
Mention any new releases, bugs we need to work around or things to note.
#topic Upcoming Tasks/Items
#info 2012-03-20 to 2012-04-03 - F17 Beta Freeze
#info 2012-03-29 - take internetx01 out of rotation and power off
#info 2012-03-30 - 1:30am - run diag on internetx01.
#info 2012-04-01 - nag fi-apprentices.
#info 2012-04-03 - F17Beta release day
#info 2012-04-03 - gitweb-cache removal day.
#info 2012-04-10 - drop inactive fi-apprentices
#info 2012-04-24 to 2012-05-08 - F17 Final Freeze.
#info 2012-05-01 - nag fi-apprentices.
#info 2012-05-08 - F17 release
#topic Tickets from Ages past
In this topic we will dredge up old tickets, discuss them and decide if
we should do them, retarget them, close them or break them into smaller
tickets.
#topic Meeting tagged tickets:
https://fedorahosted.org/fedora-infrastructure/report/10
#topic Open Floor
Submit your agenda items as tickets in the trac instance and send a
note replying to this thread.
More info here:
https://fedoraproject.org/wiki/Infrastructure/Meetings#Meetings
Thanks
kevin
11 years, 6 months
default user context on fedorapeople.org
by Seth Vidal
We are debating the default user context for fedorapeople.org:
Right now users are unconfined_t.
This would, ostensibly, let them do a lot. However, we have
fedorapeople set up to isolate user tempdirs and every place a user can
write to is mounted noexec,nosuid - so there is no place to run
anything that isn't already on the system.
We're wondering if we should move them to either:
user_t
or
guest_t
user_t sets:
X Windows Login and terminal login, nosetuid, noexec in homedir
As we have things currently configured this would not involve much in
the way of a change to how users can operate on fedorapeople.org
guest_t sets:
Terminal login, nosetuid, nonetwork, noxwindows, noexec in homedir
X is not really an issue, obviously. So the big difference here is that
outbound network connections would not be allowed with guest_t.
The debate is really over network access. We know that some folks
tunnel through fedorapeople.org for irc and they log in there to rsync
things to this space for personal hosting, etc.
So there are some legit reasons for outbound network connections.
However, it is not obvious that those reasons are within the scope
of what fedorapeople.org is supposed to be used for.
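The trade-off can be laid out as a small lookup, purely as a sketch of the comparison in this message. The capability flags restate what is said above (unconfined_t is assumed to have none of these limits), and the use-case labels are names invented here for the irc tunnel and rsync examples.

```python
from typing import Dict, List

# Capabilities per context, as summarised in this message.
CONTEXTS: Dict[str, Dict[str, bool]] = {
    "unconfined_t": {"terminal_login": True, "x_login": True, "network": True},
    "user_t": {"terminal_login": True, "x_login": True, "network": True},
    "guest_t": {"terminal_login": True, "x_login": False, "network": False},
}

# What each known use of fedorapeople.org needs (labels are illustrative).
USE_CASES: Dict[str, List[str]] = {
    "terminal login": ["terminal_login"],
    "irc tunnel": ["terminal_login", "network"],
    "rsync run from the host": ["terminal_login", "network"],
}


def blocked_use_cases(context: str) -> List[str]:
    """Return the use cases that need a capability the context lacks."""
    caps = CONTEXTS[context]
    return [name for name, needs in USE_CASES.items()
            if not all(caps[need] for need in needs)]


for context in CONTEXTS:
    print(context, "blocks:", blocked_use_cases(context))
```

Under these assumptions user_t changes nothing for current users, while guest_t breaks exactly the two network-dependent uses.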
And that is more or less it - does anyone have any
suggestions/thoughts?
-sv
11 years, 6 months
Re: Mirror Issue
by Robyn Bergeron
Hi John,
Thanks for the note, much appreciated. :)
I've copied the infrastructure team on this mail - hopefully they can
help out with this issue. I'm also curious about the issue with Gnome
Updater Tool - is infra aware of this as a known issue?
Cheers!
-Robyn
On 03/05/2012 05:30 AM, John Mellor wrote:
> Hi Robyn,
>
> I'm not sure who is supposed to be managing the mirror table or the
> state of the mirrors, as there don't appear to be any docs on this
> important part of the Fedora infrastructure online. Can you pass this
> on to whomever is in charge of deciding the mirror sites?
>
> We need to get someone to investigate why a particular pair of
> Fedora mirror sites have stopped mirroring. The following has been
> going on for weeks now, for all updates:
>
> http://fedora.mirror.nexicom.net/linux/updates/16/x86_64/squid-3.2.0.15-1...: [Errno 14] HTTP Error 404 - Not Found : http://fedora.mirror.nexicom.net/linux/updates/16/x86_64/squid-3.2.0.15-1...
> Trying other mirror.
> ftp://mirror.nexicom.net/pub/fedora/linux/updates/16/x86_64/squid-3.2.0.1...: [Errno 14] FTP Error 550 : ftp://mirror.nexicom.net/pub/fedora/linux/updates/16/x86_64/squid-3.2.0.1...
> Trying other mirror.
>
> It's not fatal for non-gui command-line users, as the next mirror in the
> list is behaving properly and yum does the right thing, but it's deadly
> for the graphical Gnome Updater Tool (which does not seem to use
> fallback sites [possible bug], and simply fails). The end result is
> that people who use the graphical update tools have been seeing no
> security and other updates for weeks now, which is pretty dangerous.
>
> If Nexicom has just broken their refresh mechanism temporarily, can we
> give them a heads-up about the problem? If they are now broken
> permanently, can we get them removed from the list?
>
>
> Thanks,
>
> John Mellor
>
11 years, 6 months