Finally, something I feel qualified to comment on (I own OpenShift at $DAYJOB these days, and I'm more than happy to help out!)

On Mon, Jun 15, 2020 at 6:54 PM Kevin Fenzi <kevin@scrye.com> wrote:
Greetings everyone.

* Before we had a staging openshift with staging applications in it.
This is sort of not how openshift is designed to work. In the ideal
openshift world you don't need staging, you just have enough tests and
CI and gradual rollout of new versions so everything just works.
Granted a staging openshift cluster is useful to ops folks to test
upgrades and try out things, and it's useful for developers in our case
to get all the parts set up right in ansible to deploy their application.
So, what do you think? Should we set up a staging cluster as before?
Or shall we try and just use the one production cluster for staging and
prod?

At $DAYJOB, we do the latter. All of our tenants (applications) only have access to our production clusters. We do have development and staging clusters, but they are only for us, the infrastructure team, to test upgrades and the like. All "real" workloads run on our production clusters.

One downside of this setup is that you sometimes don't see issues until they hit production, if there are weird interactions that you can't test for. This can be *somewhat* mitigated by having your "heavy hitter" applications (I've been out of touch with Fedora infra for so long that I'm not sure whether we have any in OpenShift) test the upgrade in conjunction with the platform team. This of course requires willing guinea pigs :).

* Another question is openshift 4. Openshift 3.11 is supported until
June of 2022, so we have some time, but do we want to or need to look at
moving to openshift 4 for our clusters? One thing I hate about this is
that you must have 3 master nodes, and the only machines we have are big
powerful virthost servers, so it's very wasteful of resources to
deploy an openshift 4 cluster (with the machines we have currently
anyhow).

Yes, you want OpenShift 4! (In fact, I thought that the move meant you folks were moving to 4; guess not!) A lot of what you stated is accurate, but it's no different than with 3.11: you needed 3 masters in 3.11 as well for a resilient setup. The nice thing with OpenShift 4 that you couldn't do with 3 is scheduling user workloads on the masters. The masters MUST run RHCOS, and I would HIGHLY recommend that the worker nodes run RHCOS too. There is work underway (it didn't make 4.3 as mentioned, and frankly I'm not sure of its status in 4.4) to allow 3-node clusters (https://github.com/openshift/enhancements/blob/master/enhancements/compact-clusters.md) - currently the minimum viable cluster is 5 nodes. That said, nothing says those 5 nodes have to be bare metal - at the extreme, I have a 5-node cluster running entirely on my desktop (a Xeon W-2155 w/192GB RAM, but I digress....). I'd run the masters on 3 different virthosts if possible. Depending on the workload, they don't actually have to be that big (I have mine at 4x16, but it's mainly a test cluster).

* In our old staging env we had a subset of things. Some of them we used
the staging instances all the time, others we almost never did. I'm not
sure we have the resources to deploy a 100% copy of our prod env, but
assuming we did, where should we shoot for on the line? i.e., between 100%
duplicate of prod or nothing?

I really think it's up to the folks who run each service to decide whether a staging environment is useful to them. I'd imagine for some things it would be extraordinarily useful, and for others a waste of resources. The individual service owners are in the best position to make that determination.