Done. See BZ 834019.
Thanks,
Andreas
On Wed, Jun 20, 2012 at 3:50 PM, John Mazzitelli mazz@redhat.com wrote:
Andreas,
Can you write a BZ on this, explaining the problem (as you did here) and attach your patch to it? I'd like to track this. I might have time to look at this myself.
Thanks, John
----- Original Message -----
We have a somewhat unusual RHQ setup in which we monitor a large number of resources remotely from a single agent. Per agent, we have +/- 25000 scheduled measurements, with +/- 1500 measurements collected per minute. Since most of the metrics are collected with the same interval (10 minutes), this causes the following problem: when the agent is started (t=0), it schedules all these metrics in the same interval [0s,30s]. However, because of the large number of measurements, the agent is not able to collect all of them in that 30s interval and reschedules the remaining ones to the next interval in the original schedule, i.e. to [10m,10m+30s]. The same thing happens again in the interval [10m,10m+30s]: most of the measurements are rescheduled to the next interval [20m,20m+30s], and so forth. This means that some metrics are never collected (and are reported as "late" in the metrics of the RHQ agent).
Note that the issue only occurs after restarting the agent. When the resources are originally added to the inventory, the corresponding measurement schedules are spread more or less randomly and the agent is able to collect all of them.
To solve that issue with RHQ 3.0, I applied the following patch:
Index: src/main/java/org/rhq/core/pc/measurement/MeasurementManager.java
===================================================================
--- src/main/java/org/rhq/core/pc/measurement/MeasurementManager.java (revision 141630)
+++ src/main/java/org/rhq/core/pc/measurement/MeasurementManager.java (revision 141631)
@@ -484,6 +484,13 @@
             this.scheduledRequests.offer(scheduledMeasurement);
         }
     }
+    public synchronized void reschedule(Set<ScheduledMeasurementInfo> scheduledMeasurementInfos, long interval) {
+        for (ScheduledMeasurementInfo scheduledMeasurement : scheduledMeasurementInfos) {
+            scheduledMeasurement.setNextCollection(scheduledMeasurement.getNextCollection() + interval);
+            this.scheduledRequests.offer(scheduledMeasurement);
+        }
+    }
     /**
      * Sends the given measurement report to the server, if this plugin container has server services that it can
Index: src/main/java/org/rhq/core/pc/measurement/MeasurementCollectorRunner.java
===================================================================
--- src/main/java/org/rhq/core/pc/measurement/MeasurementCollectorRunner.java (revision 141630)
+++ src/main/java/org/rhq/core/pc/measurement/MeasurementCollectorRunner.java (revision 141631)
@@ -71,7 +71,7 @@
                 log.debug("Measurement collection is falling behind... Missed requested time by ["
                     + (System.currentTimeMillis() - requests.iterator().next().getNextCollection()) + "ms]");
-                this.measurementManager.reschedule(requests);
+                this.measurementManager.reschedule(requests, 30000L);
                 return report;
             }
The idea is that instead of rescheduling the measurement according to the original schedule (e.g. from [0s,30s] to [10m,10m+30s]), it should simply be rescheduled to the next interval (from [0s,30s] to [30s,60s]).
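To illustrate the difference between the two policies, here is a small standalone simulation. It is only a sketch and does not use any RHQ classes: the class RescheduleSketch, the method neverCollected, the fixed per-window capacity, and the numbers (100 measurements, 10 collected per 30s window, 10-minute interval) are all made up for illustration. It also assumes that a measurement collected on time is next due one collection interval later, and that the agent always serves due measurements in the same order.

```java
// Hypothetical simulation; none of these names exist in RHQ.
final class RescheduleSketch {

    /**
     * Simulates consecutive 30s collection windows and returns how many
     * measurements were never collected during the simulated period.
     *
     * lateDelayMs is how far a measurement that could not be collected is
     * pushed out: the full collection interval (original behaviour) or one
     * 30s window (patched behaviour).
     */
    static int neverCollected(int measurements, int capacityPerWindow,
                              long collectionIntervalMs, long lateDelayMs, int windows) {
        final long windowMs = 30_000L;
        long[] due = new long[measurements];            // everything is due at t=0 after a restart
        boolean[] collected = new boolean[measurements];

        for (int w = 0; w < windows; w++) {
            long now = w * windowMs;
            int used = 0;
            for (int i = 0; i < measurements; i++) {
                if (due[i] > now) {
                    continue;                           // not due yet
                }
                if (used < capacityPerWindow) {
                    collected[i] = true;                // collected in this window
                    used++;
                    due[i] += collectionIntervalMs;     // assumption: next due one interval later
                } else {
                    due[i] += lateDelayMs;              // late: pushed out by the chosen policy
                }
            }
        }

        int missed = 0;
        for (boolean c : collected) {
            if (!c) {
                missed++;
            }
        }
        return missed;
    }

    public static void main(String[] args) {
        // 40 windows of 30s = 20 minutes of simulated time.
        System.out.println("original policy: " + neverCollected(100, 10, 600_000L, 600_000L, 40));
        System.out.println("patched policy:  " + neverCollected(100, 10, 600_000L, 30_000L, 40));
    }
}
```

With these made-up numbers, the original policy leaves 90 of the 100 measurements uncollected after 20 minutes, because every late measurement lands back in the same crowded window. With the 30s delay the backlog drains over the first ten windows and nothing is missed. In the real agent the order in which due measurements are served may differ, so the exact set of starved metrics would not be this clear-cut.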
We are currently in the process of upgrading to RHQ 4.4. I didn't test the patch with that version yet, but after looking at the code I think it is still applicable. I would like to get some feedback about the approach: is it a valid way to solve the issue or are there better ways to do that?
Andreas
_______________________________________________
rhq-users mailing list
rhq-users@lists.fedorahosted.org
https://fedorahosted.org/mailman/listinfo/rhq-users