On 04/23/2018 05:00 PM, Randy Barlow wrote:
> I would like to deploy Bodhi 3.6.1[0,1] to production
Bodhi 3.6.1 was deployed to production last night and ran fine for a
while, but eventually caused an outage, so traffic was sent back to the
3.5.2 VMs.
This morning Patrick and I got some quality log tailing going and sent
traffic back to OpenShift on the same deployment that had the issue last
night and things seemed to handle the load just fine. In fact, the
constant HTTP 500s that the VMs were sending did not seem to happen in
OpenShift. We watched it for a while and it seemed stable.
Patrick noted that both times Bodhi failed in OpenShift were while it
was running composes. We asked Mohan to kick off some
composes, and he started one this morning (well, morning for me anyway).
I noticed that Bodhi's /composes/ endpoint went from loading in about
0.7 s to about 3-6 s from my house with one compose running. I believe
the time this endpoint takes to load might be approximately linear with
the number of composes it needs to serialize.
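To make that back-of-the-envelope estimate concrete, here is a minimal
sketch assuming the latency really is linear; the base and per-compose
constants are inferred from the 0.7 s and 6 s observations above, not
measured properly:

```python
# Hypothetical linear model of /composes/ load time. The constants are
# rough inferences from this thread: ~0.7 s with no composes running
# and up to ~6 s with one compose, so ~5.3 s per compose serialized.
BASE_SECONDS = 0.7          # observed with zero composes running
PER_COMPOSE_SECONDS = 5.3   # worst-case estimate from the 6 s observation

def estimated_load_time(n_composes: int) -> float:
    """Estimate /composes/ response time if latency is linear in composes."""
    return BASE_SECONDS + n_composes * PER_COMPOSE_SECONDS

# With 8 composes running, this model already exceeds a 30 second
# health check window: 0.7 + 8 * 5.3 = 43.1 seconds.
```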
If one compose can take up to 6 seconds to load, then by my estimation
8 composes could easily take more than 30 seconds, and OpenShift was
configured to use this endpoint as a health check. We originally had
the health check on Bodhi's home page, but in the past the home page
could sometimes take a long time to load due to the dogpile cache
locks. The home page was also the source of the constant HTTP 500s in
Bodhi 3.6.0.
3.6.1 refactored many things, including home page caching. Now that we
see zero 500s on the home page, and now that the home page cache is
warmed up during worker start, it consistently loads in about 0.7 s.
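The warm-up idea is simple to sketch in plain Python. This is only an
illustration (lru_cache stands in for Bodhi's dogpile cache region, and
the function names are made up), but it shows why warming at worker
start keeps the first real request fast:

```python
import time
from functools import lru_cache

# Stand-in for the expensive home page render; in Bodhi this is behind
# a dogpile cache region, but lru_cache sketches the same idea.
@lru_cache(maxsize=1)
def render_home_page() -> str:
    time.sleep(0.05)  # placeholder for the real rendering work
    return "<html>home</html>"

def on_worker_start() -> None:
    # Warm the cache before the worker serves traffic, so the first
    # real request hits a warm cache instead of paying the cold-render
    # cost (or blocking on a cache lock held by another request).
    render_home_page()

on_worker_start()
```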
Today we configured OpenShift to again monitor Bodhi's home page. Now
that the home page consistently loads quickly, it should pass the health
check consistently too. I will be monitoring Bodhi throughout the day.
Hopefully this was the last hurdle to having it work in OpenShift!
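For reference, an HTTP health check of this shape would look roughly
like the following in the deployment config. This is a sketch, not the
actual production config; the path is the home page as described above,
but the port and timing values are assumptions:

```yaml
# Hypothetical OpenShift/Kubernetes probe pointed at the home page
# instead of /composes/; values are illustrative only.
readinessProbe:
  httpGet:
    path: /
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 30
  failureThreshold: 3
```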