state of the infra ansible, cron job and roadmap

Kevin Fenzi kevin at scrye.com
Wed Jan 8 20:10:40 UTC 2014


Greetings. 

So, over the holiday break I did some massive cleanup on our ansible
repo. I took an initial patch from janeznemanic to fix old syntax and
went from there. I got all the deprecated syntax fixed (there might
still be a few stray instances). I also moved accelerate into
global.yml, so it should apply to all playbooks. The needed package
and firewall port should be set in the kickstarts now. 

Next I took a simple script that runs --check --diff on each host and
group playbook and got it up and running. It takes about an hour to get
through our host/group playbooks when they are run one at a time. We
could just fire them all off in parallel, but that might swamp lockbox01.
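
For anyone who wants to picture the one-at-a-time run, here is a rough
sketch in Python. It is not the actual script; the hosts/ and groups/
directory layout and all the function names are assumptions made up for
illustration. Only the ansible-playbook --check --diff invocation comes
from the description above.

```python
import subprocess
from pathlib import Path


def build_command(playbook):
    """Return the dry-run command line for a single playbook."""
    return ["ansible-playbook", "--check", "--diff", str(playbook)]


def find_playbooks(root):
    """Collect host and group playbooks under root.

    The hosts/ and groups/ subdirectories are an assumed layout.
    """
    root = Path(root)
    found = []
    for sub in ("hosts", "groups"):
        found.extend(sorted((root / sub).glob("*.yml")))
    return found


def run_one_at_a_time(playbooks, runner=subprocess.run):
    """Run each playbook in turn so only one job loads the control host.

    runner is injectable so the loop can be exercised without ansible.
    """
    results = {}
    for playbook in playbooks:
        proc = runner(build_command(playbook), capture_output=True, text=True)
        results[str(playbook)] = (proc.returncode, proc.stdout)
    return results
```

Running sequentially trades wall-clock time for a predictable load on
the control host; a bounded worker pool would be the middle ground.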

Ideally, what I would like to see from a run of this script is all
hosts/groups reachable and 0 items changed. This is the state we should
strive for. ;) 
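
To make the "all reachable, 0 changed" goal checkable, something like
the following could pick the offenders out of the PLAY RECAP lines that
ansible-playbook prints at the end of a run. This is a sketch only: the
function name is hypothetical, and it assumes recap lines of the form
"host : ok=N changed=N unreachable=N failed=N".

```python
import re

# Matches one recap line, e.g. "host : ok=5 changed=1 unreachable=0 failed=0".
RECAP_LINE = re.compile(
    r"^(?P<host>\S+)\s*:\s*ok=(?P<ok>\d+)\s+changed=(?P<changed>\d+)"
    r"\s+unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)"
)


def summarize(output):
    """Return (unreachable_hosts, changed_hosts) found in playbook output."""
    unreachable, changed = [], []
    for line in output.splitlines():
        match = RECAP_LINE.match(line.strip())
        if match is None:
            continue  # not a recap line
        if int(match.group("unreachable")) > 0:
            unreachable.append(match.group("host"))
        elif int(match.group("changed")) > 0:
            changed.append(match.group("host"))
    return sorted(unreachable), sorted(changed)
```

A clean run is then simply one where both returned lists are empty.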

* The following hosts are unreachable: 

209.132.184.158 (see jenkins note below)
209.132.184.209 (see jenkins note below)
arm03-packager01.arm.fedoraproject.org (will fix)
arm03-packager02.arm.fedoraproject.org (will fix)
arm03-qa01.arm.fedoraproject.org (will fix)
buildvm-27.phx2.fedoraproject.org (test buildvm, expected down)
jenkins-cloud
jenkins-slaves (these look to need a bit of tweaking)
lists-dev.cloud.fedoraproject.org (is up, but / is 100% full)
mailman01.stg.phx2.fedoraproject.org
releng01.phx2.fedoraproject.org (is down since we don't have a branched
release right now)

* The following hosts have changed > 0:

209.132.184.144
209.132.184.153
209.132.184.157
arm03-qa00.arm.fedoraproject.org
arm03-qa02.arm.fedoraproject.org
arm03-qa03.arm.fedoraproject.org
arm03-releng00.arm.fedoraproject.org
arm03-releng01.arm.fedoraproject.org
arm03-releng02.arm.fedoraproject.org
arm03-releng03.arm.fedoraproject.org
backup03.phx2.fedoraproject.org
beaker01.qa.fedoraproject.org
bkernel01.phx2.fedoraproject.org
bkernel02.phx2.fedoraproject.org
buildvm-01.phx2.fedoraproject.org
buildvmhost-10.phx2.fedoraproject.org
buildvmhost-11.phx2.fedoraproject.org
buildvmhost-12.phx2.fedoraproject.org
bvirthost07.phx2.fedoraproject.org
copr-be-dev.cloud.fedoraproject.org
copr-fe-dev.cloud.fedoraproject.org
db02.stg.phx2.fedoraproject.org
docs-backend01.phx2.fedoraproject.org
fedocal01.phx2.fedoraproject.org
fedocal01.stg.phx2.fedoraproject.org
fedocal02.phx2.fedoraproject.org
gallery01.stg.phx2.fedoraproject.org
kernel01.qa.fedoraproject.org
kernel02.qa.fedoraproject.org
keys01.fedoraproject.org
mailman01.stg.phx2.fedoraproject.org
notifs-backend01.stg.phx2.fedoraproject.org
notifs-web01.stg.phx2.fedoraproject.org
notifs-web02.stg.phx2.fedoraproject.org
nuancier01.phx2.fedoraproject.org
nuancier01.stg.phx2.fedoraproject.org
nuancier02.phx2.fedoraproject.org
nuancier02.stg.phx2.fedoraproject.org
releng02.phx2.fedoraproject.org
taskotron-dev01.qa.fedoraproject.org
virthost15.phx2.fedoraproject.org

I'll work with others to get those all fixed up in the coming weeks. 

That said, how do we want to run our non-manual ansible jobs? 

a) run a --check --diff once a day and yell about unreachable or
changed>0
(I could commit this now)

b) just run them once a day and yell about anything that changes. 
(I could commit this now)

c) Trigger them on git commits. 
This would take work to figure out what was affected by the commit,
or just fire off a run of everything. 

d) Set up some file somewhere that can be created by the sysadmin group;
a cron job picks it up the next time it runs and kicks off a run. This
would allow someone to commit something, schedule a run, and leave a bit
of time for someone to notice a problem with it before it runs. 
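
As a rough illustration of option d), the cron side could look
something like this. It is a sketch under assumptions, not a proposal
for the real thing: the flag file path, the ".running" suffix and the
function names are all hypothetical. The rename is there so that two
overlapping cron runs cannot both pick up the same request.

```python
import os
from pathlib import Path


def claim_request(flag, claimed_suffix=".running"):
    """Atomically claim a pending run request, or return None if none exists.

    os.rename is atomic on POSIX when both names are on one filesystem.
    """
    flag = Path(flag)
    claimed = flag.with_name(flag.name + claimed_suffix)
    try:
        os.rename(flag, claimed)
    except FileNotFoundError:
        return None  # nobody scheduled a run
    return claimed


def cron_tick(flag, run_playbooks):
    """Called from cron: run the playbooks only if a request file is present."""
    claimed = claim_request(flag)
    if claimed is None:
        return False
    try:
        run_playbooks()
    finally:
        claimed.unlink()  # clear the request even if the run fails
    return True
```

The window for noticing a bad commit is then whatever the cron interval
is; deleting the flag file before the next tick cancels the run.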

Thoughts?

As far as roadmap for migration:

I'm going to try to work on splitting out everything that is still on
app* servers to their own ansible instances. Once the app servers are
fully migrated we can tackle proxy*, then virthosts, then various
singletons. Then we can see where we are, and make a final push to get
everything left moved over. ;) 

kevin