state of the infra ansible, cron job and roadmap
kevin at scrye.com
Wed Jan 8 20:10:40 UTC 2014
So, over the holiday break I did some massive cleanup on our ansible
repo. I took an initial patch from janeznemanic to fix old syntax and
went from there. I got all the deprecated syntax fixed (there might
still be a few stray instances). I also moved accelerate into
global.yml, so it should apply to all playbooks. The needed package
and firewall port should be set in the kickstarts now.
Next I took a simple script to run --check --diff on each host and
group playbook and got it up and running. It takes about an hour to run
against our host/group playbooks when they're run one at a time. We could
just fire them all off at once, but that might swamp lockbox01.
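
For illustration, a one-at-a-time check run could look roughly like the
sketch below. It is not the actual script: the hosts/ and groups/
subdirectory names, the .yml extension and the function names are
assumptions made for the example. Only the --check and --diff flags come
from the description above.

```python
import subprocess
from pathlib import Path


def find_playbooks(root, subdirs=("hosts", "groups")):
    """Return the .yml playbooks under root/<subdir>, in a stable order.

    The subdirectory names are an assumed layout, not taken from the repo.
    """
    found = []
    for sub in subdirs:
        found.extend(sorted(Path(root, sub).glob("*.yml")))
    return found


def check_command(playbook):
    """Build a dry-run command: --check applies nothing, --diff shows changes."""
    return ["ansible-playbook", "--check", "--diff", str(playbook)]


def run_all(root, runner=subprocess.run):
    """Run the playbooks sequentially so the control host is not swamped.

    Returns {playbook path: (exit code, captured stdout)}.
    """
    results = {}
    for playbook in find_playbooks(root):
        proc = runner(check_command(playbook), capture_output=True, text=True)
        results[str(playbook)] = (proc.returncode, proc.stdout)
    return results
```

Running sequentially trades wall-clock time for a predictable load on the
control host; a small bounded worker pool would be a middle ground if an
hour turns out to be too long.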
Ideally, what I would like to see from a run of this script is all
hosts/groups reachable and 0 items changed. This is the state we should
strive for. ;)
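
One way to test for that state is to read the per-host summary lines that
ansible-playbook prints at the end of a run. The sketch below assumes those
lines look like "host : ok=N changed=N unreachable=N failed=N"; the
function names are made up for the example, and the pattern may need
adjusting for other ansible versions.

```python
import re

# Assumed shape of a recap line: "host : ok=5 changed=0 unreachable=0 failed=0"
RECAP_LINE = re.compile(
    r"^(?P<host>\S+)\s*:\s*ok=(?P<ok>\d+)\s+changed=(?P<changed>\d+)"
    r"\s+unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)"
)


def summarize(output):
    """Return (hosts seen, unreachable hosts, hosts with changed > 0)."""
    seen, unreachable, changed = [], [], []
    for line in output.splitlines():
        match = RECAP_LINE.match(line.strip())
        if not match:
            continue  # not a recap line (task output, diff text, headers)
        host = match["host"]
        seen.append(host)
        if int(match["unreachable"]) > 0:
            unreachable.append(host)
        if int(match["changed"]) > 0:
            changed.append(host)
    return seen, unreachable, changed


def is_clean(output):
    """True only if at least one host reported and none were unreachable or changed."""
    seen, unreachable, changed = summarize(output)
    return bool(seen) and not unreachable and not changed
```

Treating output with no recap lines as not clean is a deliberate choice
here: a run that died before reporting should not pass silently.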
* The following hosts are unreachable:
126.96.36.199 (see jenkins note below)
188.8.131.52 (see jenkins note below)
arm03-packager01.arm.fedoraproject.org (will fix)
arm03-packager02.arm.fedoraproject.org (will fix)
arm03-qa01.arm.fedoraproject.org (will fix)
buildvm-27.phx2.fedoraproject.org (test buildvm, expected down)
jenkins-slaves (these look to need a bit of tweaking)
lists-dev.cloud.fedoraproject.org (is up, but / is 100% full)
releng01.phx2.fedoraproject.org (is down since we don't have a branched
* The following hosts have changed > 0:
I'll work with others to get those all fixed up in the coming weeks.
That said, how do we want to run our non-manual ansible jobs?
a) run a --check --diff once a day and yell about unreachable or
(I could commit this now)
b) just run them once a day and yell about anything that changes.
(I could commit this now)
c) Trigger them on git commits.
This would take work to figure out what was affected by the commit,
or just fire off a run of everything.
d) set up a file somewhere that members of the sysadmin group can create;
a cron job picks it up on its next pass and does the run. This would allow
someone to commit something, schedule a run, and leave a bit of time for
someone to notice a problem with it before it runs.
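
To make option d) a bit more concrete, here is a minimal sketch of the
pickup side that a cron job might call. Everything in it is hypothetical:
the trigger file format (one playbook name per line, # for comments), the
".running" suffix and the function name are assumptions. Who may create
the file would be controlled by the permissions on its directory, not by
this code.

```python
import os
from pathlib import Path


def claim_requests(trigger_path):
    """Return the playbook names requested in the trigger file, consuming it.

    Returns an empty list when nobody has scheduled a run.
    """
    trigger = Path(trigger_path)
    claimed = trigger.with_name(trigger.name + ".running")
    try:
        # Rename first, so a request written while we are busy lands in a
        # fresh trigger file and is picked up by the next cron pass.
        os.replace(trigger, claimed)
    except FileNotFoundError:
        return []
    lines = claimed.read_text().splitlines()
    claimed.unlink()
    names = []
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            names.append(entry)
    return names
```

The delay before a run is simply the cron interval, which is what gives
someone a chance to spot a bad commit and delete the trigger file first.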
As far as the roadmap for migration goes:
I'm going to try and work on splitting out everything that is still on
app* servers to their own ansible instances. Once the app servers are
fully migrated we can tackle proxy*, then virthosts, then various
singletons. Then we can see where we are, and work a final push to get
everything left moved over. ;)