Availability scan for Process Scan


Charles.LEOW/Systems Integration

3 Jul 2012

2:51 a.m.

Hello,

We've set up three nodes running RHEL 5.3 with identical hardware specifications: one for the RHQ server and the remaining two for RHQ agents that monitor MongoDB.

MongoDB is monitored using the Process resource type with the Pid File and PIQL query types, with each query type set up on a different node. However, we found that the availability scan on both agents does not respond in a reasonable time. In fact, it sometimes takes more than 30 minutes to reflect the actual availability of MongoDB. We're using the default agent configuration, and the value of rhq.agent.plugins.availability-scan.period-secs is also the default (30 seconds).

Has anyone encountered this problem before?

Thank you.

Regards,

Charles

RHQ Server

[root@mrbtds1 ~]# uname -a

Linux mrbtds1.waridtel.com 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

You have new mail in /var/spool/mail/root

[root@mrbtds1 ~]# cat /etc/redhat-release

Red Hat Enterprise Linux Server release 5.3 (Tikanga)

RHQ Agent 1

[root@swdrcm1 ~]# uname -a

Linux swdrcm1.waridtel.com 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

[root@swdrcm1 ~]# cat /etc/redhat-release

Red Hat Enterprise Linux Server release 5.3 (Tikanga)

[root@swdrcm1 ~]#

RHQ Agent 2

[root@swdrcm2 ~]# uname -a

Linux swdrcm2.waridtel.com 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

[root@swdrcm2 ~]# cat /etc/redhat-release

Red Hat Enterprise Linux Server release 5.3 (Tikanga)

[root@swdrcm2 ~]#


John Mazzitelli

3 Jul 2012

5:48 a.m.

What version of RHQ are you running? (RHQ 4.4 introduced a lot of changes to availability scanning.)

After you get the initial availability scan, does it update fairly quickly thereafter? If so, could it be a startup issue? Maybe it took a very long time for your agents to start, register, download plugins, start the plugin container, and begin avail scanning; if the boxes are heavily loaded, that could take a while. I realize 30m would be extremely long (and I can't say I've ever heard of the agent taking 30m to do all that), but the question remains: does this only happen on startup of the agent, or does it take 30m to report ANY availability change while the agent is running?

I haven't looked at the Process resource type and its resource component code in a while. Look in the agent logs and see if there are any log messages about errors in that plugin. (You should run the agent in debug mode to get more verbose output.)

What about everything else on the agent? Is it working OK? Do all other resources respond quickly, and are all avail statuses and metrics coming in OK?

Lukas Krejci

7:41 a.m.

I think the issue might be that the ProcessInfo class, which the ProcessComponent relies on when gathering info about the process, is not used properly.

Some time ago I dealt with a similar problem in the Apache plugin, which needs accurate process info at any given time; it, too, was getting inaccurate info when Apache was restarted. Unlike the ProcessComponent, though, it was getting the info from the ResourceContext.getNativeProcess() call.

I fixed that in the ResourceContext class by calling ProcessInfo.refresh() before we check ProcessInfo.isRunning(), because isRunning() might return stale data.

However, I only fixed that in the ResourceContext class, which the ProcessComponent (i.e. the component responsible for handling the Process resource type) doesn't use to determine the process state.

So I think we need to fix the ProcessComponent in the same way I fixed ResourceContext.getNativeProcess().
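To illustrate the stale-data effect described above, here is a minimal, self-contained Java sketch. It is not the RHQ code: FakeProcessTable and SnapshotProcessInfo are hypothetical stand-ins invented for this example, and only the method names refresh() and isRunning() are borrowed from the discussion. It assumes the process info object caches the state it saw at its last refresh, which is the behaviour suspected here.

```java
import java.util.HashSet;
import java.util.Set;

public class StaleProcessInfoDemo {

    /** Hypothetical stand-in for the operating system's process table. */
    static class FakeProcessTable {
        private final Set<Long> livePids = new HashSet<>();

        void start(long pid) { livePids.add(pid); }

        void kill(long pid) { livePids.remove(pid); }

        boolean exists(long pid) { return livePids.contains(pid); }
    }

    /**
     * Hypothetical snapshot object (NOT the real RHQ ProcessInfo).
     * It remembers the state seen at the last refresh() call.
     */
    static class SnapshotProcessInfo {
        private final FakeProcessTable table;
        private final long pid;
        private boolean running;

        SnapshotProcessInfo(FakeProcessTable table, long pid) {
            this.table = table;
            this.pid = pid;
            refresh(); // take the initial snapshot
        }

        /** Re-reads the current state from the process table. */
        void refresh() { running = table.exists(pid); }

        /** Returns the cached state; may be stale if refresh() was not called. */
        boolean isRunning() { return running; }
    }

    /** Availability check that trusts the cached snapshot. */
    static boolean checkWithoutRefresh(SnapshotProcessInfo info) {
        return info.isRunning();
    }

    /** Availability check that refreshes first, as in the fix described above. */
    static boolean checkWithRefresh(SnapshotProcessInfo info) {
        info.refresh();
        return info.isRunning();
    }

    public static void main(String[] args) {
        FakeProcessTable table = new FakeProcessTable();
        table.start(4242L);
        SnapshotProcessInfo info = new SnapshotProcessInfo(table, 4242L);

        table.kill(4242L); // the monitored process goes down

        System.out.println("without refresh: " + checkWithoutRefresh(info)); // true (stale)
        System.out.println("with refresh:    " + checkWithRefresh(info));    // false (current)
    }
}
```

In this sketch the check without a refresh keeps reporting the process as up after it has been killed, no matter how often the scan runs. That would be consistent with a 30-second scan period still producing availability changes that lag by many minutes.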

Charles, would you mind creating a bugzilla for this so that we can track it?

https://bugzilla.redhat.com/enter_bug.cgi?product=RHQ%20Project

Thanks,

Lukas

rhq-users mailing list rhq-users@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/rhq-users

Charles.LEOW/Systems Integration

4 Jul 2012

2:19 a.m.

Hi Lukas,

I'll log a case for this issue. May I ask when the fix will be ready? BTW, I've also tested monitoring MongoDB with the Script Server (plugin) resource type, and it doesn't work at all. Do you have any idea why? I'll start a separate thread to seek advice. Please post your feedback and comments as well.

Thank you.

Regards, Charles Leow

Charles.LEOW/Systems Integration

2:16 a.m.

Hello John,

I've tested a few times with the same result: monitoring MongoDB using the Process (plugin) resource type with PIQL and PID file is very slow. I suspect there is a bug in the Process plugin and will log a case for it, as Lukas Krejci suggested.

BTW, I'd like to share with everyone that I've also tested the Script Server (plugin) resource type to monitor MongoDB. It doesn't work at all. I'll start a separate thread to seek advice. Please post your comments as well.

> What version of RHQ?
rhq-server-4.4.0

> After you get the initial availability scan, does it update fairly quickly thereafter?
The initial availability scan is almost instantaneous, but availability updates thereafter are very slow.

> If so, could it be a startup issue?
No, because the availability updates thereafter are very slow.

> Maybe it took a very long time for your agents to start, register, download plugins, start the plugin container and begin avail scanning?
That still doesn't make sense: availability updates for Tomcat are fast, both initially and thereafter.

> If the boxes are heavily loaded, perhaps it takes a long time?
No. The two agent machines are new and the load is low.

> I realize 30m would be extremely long (and I can't say I've ever heard of the agent taking 30m to do all that) but the question remains - does this only happen on startup of the agent?
No. After agent startup, availability updates (Goes Up and Goes Down) for MongoDB are still very slow.

> What about everything else about the agent? All other resources respond quickly? All avail statuses and metrics coming in OK?
Availability updates and metrics for Tomcat are quick.

Thank you.

Regards, Charles

