I am running a service in Fedora CommuniShift (I plan to move it to the Fedora production OpenShift instance, in case that is relevant).
Can anybody please help me understand how to configure some monitoring for it? Is it possible to configure nagios.fedoraproject.org for it? Or is there any other recommended approach?
Basically, I would like to periodically run some custom commands (checking whether auth tokens are up to date, parsing a log file for specific errors, etc.) in my deployed container, or spawn a separate container in my project to run them. If they find any problem, I'd like to be notified via email.
Thank you, Jakub
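As an illustration of the kind of custom check described above, here is a minimal Python sketch, not a recommended Fedora setup. It assumes the token's expiry time is already known to the script, that the log is plain text, and that an SMTP relay is reachable from the container. The error patterns, the seven-day threshold, the `localhost` SMTP host, and all function names are hypothetical.

```python
import re
import smtplib
from datetime import datetime, timedelta, timezone
from email.message import EmailMessage

# Hypothetical patterns; replace with the errors your service actually logs.
ERROR_PATTERNS = [re.compile(r"\bERROR\b"), re.compile(r"Traceback")]


def token_expires_soon(expires_at, now, min_remaining=timedelta(days=7)):
    """Return True if the token has less than min_remaining left."""
    return expires_at - now < min_remaining


def find_log_errors(lines, patterns=ERROR_PATTERNS):
    """Return the log lines that match any of the given patterns."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(p.search(line) for p in patterns)
    ]


def build_report(expires_at, now, log_lines):
    """Collect human-readable problems; an empty list means all is well."""
    problems = []
    if token_expires_soon(expires_at, now):
        problems.append(f"Auth token expires at {expires_at.isoformat()}")
    problems.extend(f"Log error: {line}" for line in find_log_errors(log_lines))
    return problems


def send_report(problems, sender, recipient, smtp_host="localhost"):
    """Mail the problems; the SMTP host is an assumption about your setup."""
    msg = EmailMessage()
    msg["Subject"] = f"Monitoring: {len(problems)} problem(s) found"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(problems))
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

A periodic job would call `build_report(...)` and only call `send_report(...)` when the returned list is non-empty, so that a healthy run sends nothing.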
On Sat, Jun 01, 2024 at 08:48:32PM GMT, Jakub Kadlcik wrote:
> I am running a service in Fedora CommuniShift (planning to move it to Fedora production OpenShift instance in case it is relevant).
> Can anybody please help me understand how to configure some monitoring for it? Is it possible to configure nagios.fedoraproject.org for it? Or is there any other recommended approach?
There's currently no monitoring set up for CommuniShift items. Once it's moved to staging / production, Nagios checks can be added.
Also, in our stg/prod clusters we have some simple monitoring, such as mailing you when a pod crashes or a build or cronjob fails.
> Basically, I would like to have some custom commands (checking if auth tokens are up-to-date, parsing a log file for specific errors, etc) and periodically run them on my deployed container. Or spawning a separate container in my project to run them. If they find any problem, I'd like to be notified via email.
Another option is to add some kind of health check and have OpenShift monitor it and alert or take the app down if something is unhealthy. I guess that might not be what you want for transient errors, or for errors where you don't want the app to stop working.
kevin
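One way the "mail when a cronjob fails" behaviour mentioned above could be used for custom checks is to run them as a cronjob whose process exits non-zero when any check fails. The following is only a sketch under that assumption; whether and how a failure turns into an email depends on how the stg/prod clusters are configured. The check names and the `run_checks` and `main` helpers are hypothetical.

```python
import sys


def run_checks(checks):
    """Run each (name, callable) check; return names of the ones that failed."""
    failed = []
    for name, check in checks:
        try:
            ok = check()
        except Exception as exc:
            # A check that crashes counts as a failure, not as a pass.
            print(f"{name}: raised {exc!r}", file=sys.stderr)
            ok = False
        if not ok:
            failed.append(name)
    return failed


def main(checks):
    """Return a process exit code: 0 if all checks pass, 1 otherwise."""
    failed = run_checks(checks)
    for name in failed:
        print(f"FAILED: {name}", file=sys.stderr)
    return 1 if failed else 0


# Usage (placeholder checks; replace the lambdas with real ones):
#   sys.exit(main([("token_fresh", lambda: True), ("log_clean", lambda: True)]))
```

Unlike a health check, a failing job of this kind does not restart or stop the running application; it only marks that one job run as failed.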
Thank you very much for the reply, Kevin,
> Another option there is to make some kind of health check, and have openshift monitor it and alert/take the app down if something was unhealthy.
I have limited experience with OpenShift, so I am not sure what is possible, but I've been reading about health checks, and the documentation always mentions restarting the "unhealthy" container rather than sending an email notification. That wouldn't be helpful for me.
> There's currently no monitoring setup for communishift items. Once it's moved to staging / production, nagios checks can be added. Also, in our stg/prod clusters we have some simple monitoring like mailing you when a pod crashes or a build or cronjob fails.
It seems the right course of action is to migrate to the production OpenShift instance. My questions about the migration process were answered in https://pagure.io/fedora-infrastructure/issue/11814, so I will just have to prioritize that. Then I will get back to you regarding the Nagios configuration.
Thank you again, Jakub
devel mailing list -- devel@lists.fedoraproject.org To unsubscribe send an email to devel-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
IIRC, the user workload monitoring stack is installed and available on the staging and production Fedora clusters but not on CommuniShift (it could probably be turned on in CommuniShift too; I will need to investigate). We just haven't started making use of it yet. See [1]. This will allow you to take ownership of the monitoring of your service.
I did a POC a few years back and demoed the user workload monitoring stack, but it didn't really get any interest at the time. It might be a little dated, so it is best to follow the instructions in [1]. I think we will soon also get the OpenShift monitoring stacks hooked into Zabbix with a Prometheus exporter.
- [1] https://docs.openshift.com/container-platform/4.15/observability/monitoring/...
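The user workload monitoring stack is Prometheus-based, so a service usually takes part by exposing a `/metrics` endpoint in the Prometheus text format, which the cluster then scrapes and alerts on according to the configuration described in [1]. Below is a minimal standard-library sketch of such an endpoint. The metric names, the constant values, and port 8000 are hypothetical; a real service would more likely use an existing Prometheus client library than hand-rolled rendering.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(metrics):
    """Render {name: (help_text, value)} as Prometheus text-format gauges."""
    lines = []
    for name, (help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect():
    """Hypothetical collector; replace the constants with real checks."""
    return {
        "myservice_token_seconds_remaining": (
            "Seconds until the auth token expires", 86400),
        "myservice_log_errors": (
            "Matching error lines found in the log", 0),
    }


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve metrics forever; the port is an arbitrary choice."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

With metrics like these, an alert rule (for example, on the token's remaining seconds dropping below a threshold) can notify you without restarting the container, which is the behaviour asked for earlier in the thread.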