modules/core/plugin-container/src/main/java/org/rhq/core/pc/PluginContainer.java | 2
modules/core/plugin-container/src/main/java/org/rhq/core/pc/PluginContainerConfiguration.java | 27
modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/AvailabilityExecutor.java | 127 +--
modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/AvailabilityProxy.java | 331 ++++++++++
modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/ForceAvailabilityExecutor.java | 4
modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/ResourceContainer.java | 48 +
modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/RuntimeDiscoveryExecutor.java | 27
modules/core/plugin-container/src/test/java/org/rhq/core/pc/inventory/AvailabilityProxyConcurrencyTest.java | 114 +++
modules/core/plugin-container/src/test/java/org/rhq/core/pc/inventory/AvailabilityProxyTest.java | 157 ++++
modules/core/plugin-container/src/test/java/org/rhq/core/pc/inventory/ResourceContainerTest.java | 2
modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/AgentConfiguration.java | 6
modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/AgentConfigurationConstants.java | 19
modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/i18n/AgentSetupInstructions.java | 9
modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/promptcmd/SetupPromptCommand.java | 7
modules/enterprise/agent/src/main/resources/agent-configuration.xml | 11
modules/enterprise/agent/src/main/resources/log4j.xml | 5
modules/plugins/jmx/src/main/java/org/rhq/plugins/jmx/JMXServerComponent.java | 18
modules/plugins/rhq-agent/src/main/resources/META-INF/rhq-plugin.xml | 1
modules/plugins/tomcat/src/main/java/org/jboss/on/plugins/tomcat/TomcatServerComponent.java | 17
19 files changed, 800 insertions(+), 132 deletions(-)
New commits:
commit bbb18b759bfd3d7ecbb4f8f6ea6c0293211df815
Author: John Mazzitelli <mazz(a)redhat.com>
Date: Tue Dec 3 17:03:55 2013 -0500
BZ 971556 async avail checking for all resources
Squashed commit of the following:
commit 9b165c2dff93a62aac2b71ea427926fc457e1ef0
Merge: 8452615 c954eda
Author: John Mazzitelli <mazz(a)redhat.com>
Date: Tue Dec 3 16:58:48 2013 -0500
Merge remote-tracking branch 'origin/master' into jay-avail
commit 845261576ac892a029eac88b66ad7fd44d37d4f5
Author: Jay Shaughnessy <jshaughn(a)redhat.com>
Date: Tue Dec 3 11:36:48 2013 -0500
Changes to AvailabilityProxy
- support test code tweaking the various configurable values.
- save 3 bytes per proxy by making the sync timeout limit a byte type
- add some commented out dev logging, to be removed later as desired
Work on AvailabilityProxyTest
- add testing for the new async timeout stuff
- add testing for the sync disable/enable
commit d816b80f5b9352d2b006d8fa386d513b905a415d
Merge: 2935dec 7fe7f7e
Author: John Mazzitelli <mazz(a)redhat.com>
Date: Mon Dec 2 07:48:01 2013 -0500
Merge branch 'jay-avail' of ssh://git.fedorahosted.org/git/rhq/rhq into jay-avail
commit 2935dec7f2e19de84573342634c91865f49fe358
Author: John Mazzitelli <mazz(a)redhat.com>
Date: Wed Nov 27 12:44:18 2013 -0500
get avail proxy test to pass. we no longer assume that we need to abort if the
future is not done the first time we check it. instead, we check the time when
the future was submitted. if it's been under a certain time (1m by default),
then we just return the last avail known to have been returned. otherwise, we
time out.
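A minimal sketch of the decision described in this commit message, not the
actual AvailabilityProxy code. AsyncAvailSketch, its Avail enum,
AvailTimeoutException, and poll() are hypothetical names used only for
illustration, and the clock values are passed in as parameters so the logic
can be exercised without sleeping.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

final class AsyncAvailSketch {
    /** Hypothetical stand-in for RHQ's AvailabilityType. */
    public enum Avail { UP, DOWN, UNKNOWN }

    /** Hypothetical unchecked exception signalling a check that ran too long. */
    public static final class AvailTimeoutException extends RuntimeException {
        public AvailTimeoutException(String msg) { super(msg); }
    }

    /**
     * Decides what to report while an async availability check is pending.
     * Assumption: submittedAtMs and nowMs come from the same clock.
     */
    public static Avail poll(Future<Avail> pending, long submittedAtMs, long nowMs,
                             long asyncTimeoutMs, Avail lastAvail) {
        if (pending.isDone()) {
            try {
                return pending.get(); // finished: report the fresh value
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
        long elapsedMs = nowMs - submittedAtMs;
        if (elapsedMs > asyncTimeoutMs) {
            pending.cancel(true); // give up on the slow check
            throw new AvailTimeoutException("availability check ran too long [" + elapsedMs + "ms]");
        }
        return lastAvail; // not done, but still within the allowed time
    }
}
```

With a 60000 ms timeout, a check that is still pending after 1000 ms yields
the last known availability; the same check still pending after 60001 ms is
canceled and reported as a timeout.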
commit b3c22b2c2ff738e18b790e6bcec3cdf95bcee54d
Author: John Mazzitelli <mazz(a)redhat.com>
Date: Wed Nov 27 10:28:09 2013 -0500
fix up the test - this is still failing, but took out some unnecessary things
commit d8a53868d780dab5deb1a02d2e425948f2380b3f
Author: John Mazzitelli <mazz(a)redhat.com>
Date: Wed Nov 27 10:27:02 2013 -0500
To reduce the memory footprint of the proxy, make logger static.
We don't want one logger instance per proxy.
commit 329596096f3a947acb16e322aa456bc394bf971e
Author: John Mazzitelli <mazz(a)redhat.com>
Date: Tue Nov 26 18:05:38 2013 -0500
unit test that illustrates failure that is talked about in BZ 971556, comments #8 and 9.
this test is disabled for now, since it will fail.
commit ed30ced013679a8c5ba7c6840edae0245258fa66
Author: John Mazzitelli <mazz(a)redhat.com>
Date: Tue Nov 26 15:15:00 2013 -0500
fix logging so it's not too noisy
commit 60252bc3f5ccffcb0ab87ab026dd18e36ac985e0
Author: John Mazzitelli <mazz(a)redhat.com>
Date: Mon Nov 25 17:34:07 2013 -0500
part of BZ 971556 - this adds avail scan threadpool setting to agent prefs
commit b9a7d25a955f5f0929e92a353b9ef4c3270c8529
Author: Jay Shaughnessy <jshaughn(a)redhat.com>
Date: Mon Nov 25 15:20:07 2013 -0500
Fix jdoc for new avail thread pool
commit c30b41059d9dce8db855ada70947ea3b2a653eae
Author: Jay Shaughnessy <jshaughn(a)redhat.com>
Date: Mon Nov 25 15:10:58 2013 -0500
Quiet the Tomcat plugin by switching some INFO logging to DEBUG
commit 1d0c8f48d4537afc01a02e8103eb8b8d3d398785
Author: Jay Shaughnessy <jshaughn(a)redhat.com>
Date: Mon Nov 25 15:10:27 2013 -0500
Quiet the JMX plugin by:
- suffocating INFO messages generated by EMS's ConnectionFactory
- protecting against an NPE in JMXServerComponent.start when
connections couldn't be made
- dumping stack traces in JMXServerComponent.start in debug only
commit 751857e881856349e47601b22598c36a57f9afcf
Author: Jay Shaughnessy <jshaughn(a)redhat.com>
Date: Mon Nov 25 15:06:55 2013 -0500
calls to a component's getAvailability() impl should return UP or DOWN. In
the case of a timeout on the initial getAvail call to the proxy, it will
convert an UNKNOWN to DOWN.
commit 621f5d0506167997018a701b4b5f15a92159857f
Author: Jay Shaughnessy <jshaughn(a)redhat.com>
Date: Mon Nov 25 10:58:38 2013 -0500
Add logging for "enable on change to UP avail"
commit 76274cc629757ac8bfe6a913152570fba3a5f580
Author: Jay Shaughnessy <jshaughn(a)redhat.com>
Date: Mon Nov 25 10:51:37 2013 -0500
change the prop names such that we use a consistent prefix but so that
an existing value for rhq.agent.plugins.availability-scan.timeout is not
used, because the past semantics are different. We don't want an accidental
override in an upgrade scenario.
commit df3e5f9327157908a173e0b28e15e8768eaafa6a
Author: Jay Shaughnessy <jshaughn(a)redhat.com>
Date: Mon Nov 25 10:22:17 2013 -0500
Change the property names to re-purpose existing prop and have others conform
to the naming.
commit 2924ccaaac56e6994d075c049fea286f867b4a0c
Author: Jay Shaughnessy <jshaughn(a)redhat.com>
Date: Fri Nov 22 17:29:54 2013 -0500
[971556] Better design for availability checking
Applying Elias's patch and updating with some additional
logic:
- use a fixed thread pool to limit thread creation exposure
- make things configurable
- allow newly UP resources to use sync avail
- add some logging and comments
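A minimal sketch of the "fixed thread pool" point above, assuming nothing
about how the plugin container actually builds its pool. FixedPoolSketch and
peakThreads() are hypothetical names; the pool is constructed the same way
the JDK documents Executors.newFixedThreadPool(n), so no matter how many
checks are queued, at most poolSize threads are ever created.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class FixedPoolSketch {
    /**
     * Runs `tasks` trivial jobs on a pool capped at `poolSize` threads and
     * returns the largest number of threads the pool ever held.
     */
    public static int peakThreads(int poolSize, int tasks) {
        // core size == max size with an unbounded queue: extra work waits, no extra threads
        ThreadPoolExecutor pool = new ThreadPoolExecutor(poolSize, poolSize, 0L,
            TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(new Runnable() {
                    @Override
                    public void run() {
                        // placeholder for an availability check
                    }
                });
            }
        } finally {
            pool.shutdown(); // already queued jobs still run
        }
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pool.getLargestPoolSize();
    }
}
```

For example, peakThreads(2, 10) runs all ten jobs but never holds more than
two threads. The patch's own default, AVAILABILITY_SCAN_THREADPOOL_SIZE_DEFAULT,
is 100.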
commit 7fe7f7efb4d5754d8a0ebdccaa4604a450f3e0db
Author: John Mazzitelli <mazz(a)redhat.com>
Date: Wed Nov 27 12:44:18 2013 -0500
get avail proxy test to pass. we no longer assume that we need to abort if the
future is not done the first time we check it. instead, we check the time when
the future was submitted. if it's been under a certain time (1m by default),
then we just return the last avail known to have been returned. otherwise, we
time out.
commit ebba986c1b5beb93b16d6a96d0fa9048275d748c
Author: John Mazzitelli <mazz(a)redhat.com>
Date: Wed Nov 27 10:28:09 2013 -0500
fix up the test - this is still failing, but took out some unnecessary things
commit bd327772cdf65afeed256c154fe4b10c7f6e62cb
Author: John Mazzitelli <mazz(a)redhat.com>
Date: Wed Nov 27 10:27:02 2013 -0500
To reduce the memory footprint of the proxy, make logger static.
We don't want one logger instance per proxy.
commit b2539ee14e487ed58dbf1f041f13420db14965f7
Author: John Mazzitelli <mazz(a)redhat.com>
Date: Tue Nov 26 18:05:38 2013 -0500
unit test that illustrates failure that is talked about in BZ 971556, comments #8 and 9.
this test is disabled for now, since it will fail.
commit 61bf3197f0d32f2c53e855eb074da662b2e23d1c
Author: John Mazzitelli <mazz(a)redhat.com>
Date: Tue Nov 26 15:15:00 2013 -0500
fix logging so it's not too noisy
commit ac1cfd0bf919df79b64848d651f7df31cf214019
Author: John Mazzitelli <mazz(a)redhat.com>
Date: Mon Nov 25 17:34:07 2013 -0500
part of BZ 971556 - this adds avail scan threadpool setting to agent prefs
commit f387ddc361cd525680a0253ab84618dac2d34700
Author: Jay Shaughnessy <jshaughn(a)redhat.com>
Date: Mon Nov 25 15:20:07 2013 -0500
Fix jdoc for new avail thread pool
commit 8e7f97567327ba25c4a7023e99afd491ee4ec55b
Author: Jay Shaughnessy <jshaughn(a)redhat.com>
Date: Mon Nov 25 15:10:58 2013 -0500
Quiet the Tomcat plugin by switching some INFO logging to DEBUG
commit 5389dbbfe4b927a603fb2d2369ac87edcc734679
Author: Jay Shaughnessy <jshaughn(a)redhat.com>
Date: Mon Nov 25 15:10:27 2013 -0500
Quiet the JMX plugin by:
- suffocating INFO messages generated by EMS's ConnectionFactory
- protecting against an NPE in JMXServerComponent.start when
connections couldn't be made
- dumping stack traces in JMXServerComponent.start in debug only
commit 61ba50c44f9a5fad9d9d12db9931e6f7a0c2fcb9
Author: Jay Shaughnessy <jshaughn(a)redhat.com>
Date: Mon Nov 25 15:06:55 2013 -0500
calls to a component's getAvailability() impl should return UP or DOWN. In
the case of a timeout on the initial getAvail call to the proxy, it will
convert an UNKNOWN to DOWN.
commit 602e706c1149637600b221b9fba3f29d6598a72d
Author: Jay Shaughnessy <jshaughn(a)redhat.com>
Date: Mon Nov 25 10:58:38 2013 -0500
Add logging for "enable on change to UP avail"
commit 9ec2a048c48b7e156b6e4d067f7505f3ef750be3
Author: Jay Shaughnessy <jshaughn(a)redhat.com>
Date: Mon Nov 25 10:51:37 2013 -0500
change the prop names such that we use a consistent prefix but so that
an existing value for rhq.agent.plugins.availability-scan.timeout is not
used, because the past semantics are different. We don't want an accidental
override in an upgrade scenario.
commit ac7a01f0376fbe873ede5b2750caa56b89bae1f3
Author: Jay Shaughnessy <jshaughn(a)redhat.com>
Date: Mon Nov 25 10:22:17 2013 -0500
Change the property names to re-purpose existing prop and have others conform
to the naming.
commit 5f24fca42da6633287ad82c19f2dc4ea4296b8fc
Author: Jay Shaughnessy <jshaughn(a)redhat.com>
Date: Fri Nov 22 17:29:54 2013 -0500
[971556] Better design for availability checking
Applying Elias's patch and updating with some additional
logic:
- use a fixed thread pool to limit thread creation exposure
- make things configurable
- allow newly UP resources to use sync avail
- add some logging and comments
diff --git a/modules/core/plugin-container/src/main/java/org/rhq/core/pc/PluginContainer.java b/modules/core/plugin-container/src/main/java/org/rhq/core/pc/PluginContainer.java
index 17b1309..a6ff34a 100644
--- a/modules/core/plugin-container/src/main/java/org/rhq/core/pc/PluginContainer.java
+++ b/modules/core/plugin-container/src/main/java/org/rhq/core/pc/PluginContainer.java
@@ -307,7 +307,7 @@ public class PluginContainer {
mbean.register();
}
- ResourceContainer.initialize();
+ ResourceContainer.initialize(configuration);
pluginManager = new PluginManager();
pluginComponentFactory = new PluginComponentFactory();
diff --git a/modules/core/plugin-container/src/main/java/org/rhq/core/pc/PluginContainerConfiguration.java b/modules/core/plugin-container/src/main/java/org/rhq/core/pc/PluginContainerConfiguration.java
index 22f8150..36ba906 100644
--- a/modules/core/plugin-container/src/main/java/org/rhq/core/pc/PluginContainerConfiguration.java
+++ b/modules/core/plugin-container/src/main/java/org/rhq/core/pc/PluginContainerConfiguration.java
@@ -55,7 +55,7 @@ public class PluginContainerConfiguration {
public static final boolean WAIT_FOR_SHUTDOWN_SERVICE_TERMINATION_DEFAULT = true;
private static final String SHUTDOWN_SERVICE_TERMINATION_TIMEOUT = PROP_PREFIX +
"shutdown-service-termination-timeout";
public static final long SHUTDOWN_SERVICE_TERMINATION_TIMEOUT_DEFAULT = 5 * 60L; //
in seconds
-
+
// The following configuration settings have hardcoded default values. These defaults
are publicly
// accessible so the entity that embeds the plugin container can know what its
default values are.
@@ -85,6 +85,8 @@ public class PluginContainerConfiguration {
private static final String AVAILABILITY_SCAN_PERIOD_PROP = PROP_PREFIX +
"availability-scan-period";
// in seconds, should be the shortest avail collection interval allowed
public static final long AVAILABILITY_SCAN_PERIOD_DEFAULT = 30L;
+ public static final String AVAILABILITY_SCAN_THREADPOOL_SIZE_PROP =
"availability-scan-threadpool-size";
+ public static final int AVAILABILITY_SCAN_THREADPOOL_SIZE_DEFAULT = 100;
// Measurement ----------
@@ -347,10 +349,10 @@ public class PluginContainerConfiguration {
return ((Boolean)val).booleanValue();
}
}
-
+
/**
* Sets the flag to indicate whether to start the management bean of the plugin
container or not.
- *
+ *
* @see #isStartManagementBean()
* @param value
*/
@@ -440,6 +442,25 @@ public class PluginContainerConfiguration {
}
/**
+ * Returns the number of concurrent threads that can be scanning resource
availabilities.
+ *
+ * @return threadpool size used for thread pool that scans availabilities
+ */
+ public int getAvailabilityScanThreadPoolSize() {
+ Integer size = (Integer)
configuration.get(AVAILABILITY_SCAN_THREADPOOL_SIZE_PROP);
+ return (size == null) ? AVAILABILITY_SCAN_THREADPOOL_SIZE_DEFAULT :
size.intValue();
+ }
+
+ /**
+ * Sets the number of concurrent threads that can be scanning resource
availabilities.
+ *
+ * @param size threadpool size used for thread pool that scans availabilities
+ */
+ public void setAvailabilityScanThreadPoolSize(int size) {
+ configuration.put(AVAILABILITY_SCAN_THREADPOOL_SIZE_PROP,
Integer.valueOf(size));
+ }
+
+ /**
* Returns the length of time, in seconds, before measurements begin getting
collected.
*
* @return number of seconds
diff --git a/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/AvailabilityExecutor.java b/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/AvailabilityExecutor.java
index 03c35ba..e278062 100644
--- a/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/AvailabilityExecutor.java
+++ b/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/AvailabilityExecutor.java
@@ -37,7 +37,6 @@ import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.jetbrains.annotations.Nullable;
-import org.rhq.core.clientapi.agent.PluginContainerException;
import org.rhq.core.domain.discovery.AvailabilityReport;
import org.rhq.core.domain.measurement.Availability;
import org.rhq.core.domain.measurement.AvailabilityType;
@@ -47,7 +46,6 @@ import org.rhq.core.domain.resource.Resource;
import org.rhq.core.domain.resource.ResourceError;
import org.rhq.core.domain.resource.ResourceErrorType;
import org.rhq.core.pc.inventory.ResourceContainer.ResourceComponentState;
-import org.rhq.core.pc.util.FacetLockType;
import org.rhq.core.pluginapi.availability.AvailabilityFacet;
import org.rhq.core.util.exception.ThrowableUtil;
import org.rhq.core.util.stream.StreamUtil;
@@ -60,33 +58,13 @@ import org.rhq.core.util.stream.StreamUtil;
* @author Ian Springer
*/
public class AvailabilityExecutor implements Runnable, Callable<AvailabilityReport>
{
- /**
- * The get-availability-timeout will rarely, if ever, want to be overridden. It will
default to be 5 seconds
- * and that's what it probably should always be. However, there may be a rare
instance where someone wants
- * to give this availability executor a bit more time to wait for the resource's
availability response
- * and is willing to live with the possible consequences (that being, delayed avail
reports and possibly
- * false-down alerts getting triggered). Rather than changing this timeout, people
should be using
- * the asynchronous-availability-check capabilities that are exposed to the plugins.
Because we do not
- * want to encourage people from changing this, we do not expose this
"backdoor" system property as a
- * standard plugin configuration setting/agent preference - if someone wants to do
this, they must
- * explicitly pass in -D to the JVM running the plugin container.
- */
- public static final int GET_AVAILABILITY_TIMEOUT;
- private static final Random RANDOM = new Random();
- static {
- int timeout;
- try {
- timeout =
Integer.parseInt(System.getProperty("rhq.agent.plugins.availability-scan.timeout",
"5000"));
- } catch (Throwable t) {
- timeout = 5000;
- }
- GET_AVAILABILITY_TIMEOUT = timeout;
- }
private static final Log LOG = LogFactory.getLog(AvailabilityExecutor.class);
protected final InventoryManager inventoryManager;
- private AtomicBoolean sendChangesOnlyReport;
+
+ private final AtomicBoolean sendChangesOnlyReport;
+ private static final Random RANDOM = new Random();
// NOTE: this is probably useless. The concurrency of the availability checks is
mainly guarded by the size of the
// availabilityThreadPoolExecutor in InventoryManager. While this lock object would
prevent multiple avail checks
@@ -98,7 +76,7 @@ public class AvailabilityExecutor implements Runnable, Callable<AvailabilityRepo
private final Object lock = new Object();
private int scanHistorySize = 1;
- private LinkedList<Scan> scanHistory = new LinkedList<Scan>();
+ private final LinkedList<Scan> scanHistory = new LinkedList<Scan>();
public AvailabilityExecutor(InventoryManager inventoryManager) {
this.inventoryManager = inventoryManager;
@@ -185,23 +163,27 @@ public class AvailabilityExecutor implements Runnable, Callable<AvailabilityRepo
parent = parent.getParentResource();
}
- //we've gone up past the platform but didn't encounter a single down
resource, hence the parent avail type
- //is to be considered UP (because it either truly is UP or is UNKNOWN as of now)
+ // we've gone up past the platform but didn't encounter a single down
resource, hence the parent avail type
+ // is to be considered UP (because it either truly is UP or is UNKNOWN as of
now)
if (parentAvailabilityType == null) {
parentAvailabilityType = UP;
}
try {
checkInventory(scanRoot, availabilityReport, parentAvailabilityType, false,
scan);
+ } catch (InterruptedException e) {
+ LOG.debug("Availability check was interrupted", e);
+ return;
} catch (RuntimeException e) {
- if (Thread.interrupted()) {
- Thread.currentThread().interrupt();
- if (LOG.isDebugEnabled()) {
+ if (LOG.isDebugEnabled()) {
+ if (Thread.interrupted()) {
LOG.debug("Exception occurred during availability check, but
this thread has been interrupted, "
+ "so most likely the plugin container is shutting down:
" + e);
+ } else {
+ LOG.debug("Exception occurred during availability check: "
+ e);
}
- return;
}
+ return;
}
scan.setEndTime(System.currentTimeMillis());
@@ -229,8 +211,13 @@ public class AvailabilityExecutor implements Runnable, Callable<AvailabilityRepo
}
}
+ /**
+ * Checks the availability of one resource and then its children.
+ *
+ * @throws InterruptedException if this checking thread was interrupted
+ */
protected void checkInventory(Resource resource, AvailabilityReport
availabilityReport,
- AvailabilityType parentAvailType, boolean isForced, Scan scan) {
+ AvailabilityType parentAvailType, boolean isForced, Scan scan) throws
InterruptedException {
// Only report avail for committed Resources - that's all the Server cares
about.
if (resource.getId() == 0 || resource.getInventoryStatus() !=
InventoryStatus.COMMITTED) {
@@ -244,18 +231,8 @@ public class AvailabilityExecutor implements Runnable, Callable<AvailabilityRepo
return;
}
- AvailabilityFacet resourceComponent;
- try {
- resourceComponent =
resourceContainer.createResourceComponentProxy(AvailabilityFacet.class,
- FacetLockType.NONE, GET_AVAILABILITY_TIMEOUT, true, false, true);
-
- } catch (PluginContainerException e) {
- // TODO (ips): Why aren't we logging this as an error?
- if (LOG.isDebugEnabled()) {
- LOG.debug("Could not create resource component proxy for " +
resource + ".", e);
- }
- return;
- }
+ // The avail proxy guarantees fast response time for an avail check
+ AvailabilityFacet resourceAvailabilityProxy =
resourceContainer.getAvailabilityProxy();
++scan.numResources;
@@ -319,7 +296,8 @@ public class AvailabilityExecutor implements Runnable, Callable<AvailabilityRepo
// find out what the avail was the last time we checked it. this may be null
Availability previous = this.inventoryManager.getAvailabilityIfKnown(resource);
- AvailabilityType current = (null == previous) ? UNKNOWN :
previous.getAvailabilityType();
+ AvailabilityType previousType = (null == previous) ? UNKNOWN :
previous.getAvailabilityType();
+ AvailabilityType current = null;
// If the resource's parent is DOWN, the rules are that the resource and all
of the parent's other
// descendants, must also be DOWN. So, there's no need to even ask the
resource component
@@ -329,7 +307,7 @@ public class AvailabilityExecutor implements Runnable, Callable<AvailabilityRepo
current = parentAvailType;
++scan.numDeferToParent;
- // For the DOWN parent case it's unclear to me whether we should push out
the avil check time of
+ // For the DOWN parent case it's unclear to me whether we should push out
the avail check time of
// the child. For now, we'll leave it alone and let the next check
happen according to the
// schedule already established.
@@ -347,7 +325,7 @@ public class AvailabilityExecutor implements Runnable, Callable<AvailabilityRepo
if (LOG.isTraceEnabled()) {
LOG.trace("Now checking availability for " + resource);
}
- current = UNKNOWN;
+
try {
++scan.numGetAvailabilityCalls;
@@ -357,11 +335,15 @@ public class AvailabilityExecutor implements Runnable, Callable<AvailabilityRepo
// down (this is for the case when a plugin component can't start
for whatever reason
// or is just slow to start)
if (resourceContainer.getResourceComponentState() ==
ResourceComponentState.STARTED) {
- current = safeGetAvailability(resourceComponent);
+ current = translate(resourceAvailabilityProxy.getAvailability(),
previousType);
+
} else {
+ // try to start the component and then perform the avail check
this.inventoryManager.activateResource(resource,
resourceContainer, false);
if (resourceContainer.getResourceComponentState() ==
ResourceComponentState.STARTED) {
- current = safeGetAvailability(resourceComponent);
+ current =
translate(resourceAvailabilityProxy.getAvailability(), previousType);
+ } else {
+ current = DOWN;
}
}
if (LOG.isTraceEnabled()) {
@@ -371,31 +353,18 @@ public class AvailabilityExecutor implements Runnable, Callable<AvailabilityRepo
ResourceError resourceError = new ResourceError(resource,
ResourceErrorType.AVAILABILITY_CHECK,
t.getLocalizedMessage(), ThrowableUtil.getStackAsString(t),
System.currentTimeMillis());
this.inventoryManager.sendResourceErrorToServer(resourceError);
- if (t instanceof TimeoutException) {
- // no need to log the stack trace for timeouts...
- LOG.warn("Availability collection timed out on " +
resource
- + ", availability will be reported as " +
DOWN.name());
- current = DOWN;
- } else {
- LOG.warn("Availability collection failed with exception on
" + resource
- + ", availability will be reported as " +
DOWN.name(), t);
- current = DOWN;
- }
- }
- // Assume DOWN if for some reason the avail check failed
- if (UNKNOWN == current) {
+ LOG.warn("Availability collection failed with exception on
" + resource
+ + ", availability will be reported as " + DOWN.name(),
t);
current = DOWN;
- if (LOG.isTraceEnabled()) {
- LOG.trace("Assuming availability is DOWN for " +
resource);
- }
}
+ } else {
+ current = previousType;
}
}
// Add the availability to the report if it changed from its previous state or if
this is a full report.
// Update the resource container only if the avail has changed.
- boolean availChanged = (null != current && UNKNOWN != current &&
(null == previous || current != previous
- .getAvailabilityType()));
+ boolean availChanged = (UNKNOWN != current && current != previousType);
if (availChanged || scan.isFull) {
Availability availability;
@@ -430,23 +399,13 @@ public class AvailabilityExecutor implements Runnable, Callable<AvailabilityRepo
checkInventory(child, availabilityReport, current, isForced, scan);
}
- return;
}
- private AvailabilityType safeGetAvailability(AvailabilityFacet component) {
- AvailabilityType availType = component.getAvailability();
- switch (availType) {
- case UP:
- return UP;
- case DOWN:
- return DOWN;
- default:
- if (LOG.isDebugEnabled()) {
- LOG.debug("ResourceComponent " + component + "
getAvailability() returned " + availType
- + ". This is invalid and is being replaced with DOWN.");
- }
- return DOWN;
- }
+ /**
+ * Resources must report UP or DOWN, If current is UNKNOWN, return previously set
avail, otherwise current.
+ */
+ private AvailabilityType translate(AvailabilityType current, AvailabilityType
previousType) {
+ return current == UNKNOWN ? previousType : current;
}
/**
@@ -519,7 +478,7 @@ public class AvailabilityExecutor implements Runnable, Callable<AvailabilityRepo
}
public static class Scan {
- private long startTime;
+ private final long startTime;
private long endTime;
private long runtime;
diff --git a/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/AvailabilityProxy.java b/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/AvailabilityProxy.java
new file mode 100644
index 0000000..49a7452
--- /dev/null
+++ b/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/AvailabilityProxy.java
@@ -0,0 +1,331 @@
+/*
+ * RHQ Management Platform
+ * Copyright (C) 2005-2013 Red Hat, Inc.
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA
+ */
+package org.rhq.core.pc.inventory;
+
+import static org.rhq.core.domain.measurement.AvailabilityType.DOWN;
+import static org.rhq.core.domain.measurement.AvailabilityType.UNKNOWN;
+import static org.rhq.core.domain.measurement.AvailabilityType.UP;
+
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+
+import org.rhq.core.domain.measurement.AvailabilityType;
+import org.rhq.core.pluginapi.availability.AvailabilityFacet;
+
+/**
+ * Proxy class for executing availability checks. Checks are done using a
+ * supplied thread pool. If the resource availability does not return within one
+ * second, the next call to {@link #getAvailability()} will return the
+ * calculated availability, if available.
+ *
+ * With the potential of having thousands, and even tens of thousands, of instances
+ * of this proxy, we must ensure that we keep it as lean as possible to reduce
+ * memory footprint of the agent. For example, we do not create a logger object for
+ * every proxy. Instead, LOG is static. This should be OK for how this proxy is used.
+ *
+ * @author Elias Ross
+ */
+public class AvailabilityProxy implements AvailabilityFacet,
Callable<AvailabilityType> {
+
+ private static final Log LOG = LogFactory.getLog(AvailabilityProxy.class); //
purposefully static, don't create one per proxy
+
+ /**
+ * How long to wait for a resource to return their availability *immediately* (in
ms).
+ * If a resource takes longer than this, then the number of timeouts is incremented,
and then
+ * the container will just assume availability will be returned asynchronously for
this resource.
+ */
+ private static final int AVAIL_SYNC_TIMEOUT;
+
+ /**
+ * Number of consecutive avail sync timeouts before we assume the resource's
avail checking can not meet the async
+ * timeout. At that point stop slowing things down waiting for the timeout and
instead, for this resource,
+ * rely only on the async results. In other words, stop trying to report live avail
if live avail checking is
+ * consistently too slow. Max = 127. We use a byte here to save space.
+ */
+ private static final byte AVAIL_SYNC_TIMEOUT_LIMIT;
+
+ /**
+ * How long to wait for an *async* future to return a resource availability (in ms).
+ * If a resource takes longer than this during an async call (via a thread from the
executor thread pool)
+ * and another request comes in for the availability, then that async call will be
canceled and a new
+ * one will be resubmitted, restarting the clock. This just helps clean up any hung
threads waiting
+ * for an availability that is just taking too much time to complete.
+ */
+ private static final int AVAIL_ASYNC_TIMEOUT;
+
+ static {
+ int syncAvailTimeout;
+ try {
+ // unlikely to be changed but back-door configurable
+ syncAvailTimeout =
Integer.parseInt(System.getProperty("rhq.agent.plugins.availability-scan.sync-timeout",
+ "1000"));
+ } catch (Throwable t) {
+ syncAvailTimeout = 1000;
+ }
+ AVAIL_SYNC_TIMEOUT = syncAvailTimeout;
+
+ byte syncAvailTimeoutLimit;
+ try {
+ // unlikely to be changed but back-door configurable
+ syncAvailTimeoutLimit = Byte.parseByte(System.getProperty(
+ "rhq.agent.plugins.availability-scan.sync-timeout-limit",
"5"));
+ } catch (Throwable t) {
+ syncAvailTimeoutLimit = 5;
+ }
+ if (syncAvailTimeoutLimit > 127) {
+ syncAvailTimeoutLimit = 127;
+ }
+ AVAIL_SYNC_TIMEOUT_LIMIT = syncAvailTimeoutLimit;
+
+ int asyncAvailTimeout;
+ try {
+ // unlikely to be changed but back-door configurable
+ asyncAvailTimeout = Integer.parseInt(System.getProperty(
+ "rhq.agent.plugins.availability-scan.async-timeout",
"60000"));
+ } catch (Throwable t) {
+ asyncAvailTimeout = 60000;
+ }
+ AVAIL_ASYNC_TIMEOUT = asyncAvailTimeout;
+ }
+
+ private final AvailabilityFacet resourceComponent;
+
+ private final ExecutorService executor;
+
+ private Future<AvailabilityType> availabilityFuture = null;
+
+ private volatile Thread current;
+
+ private long lastSubmitTime = 0;
+
+ private AvailabilityType lastAvail = UNKNOWN;
+
+ /**
+ * Number of consecutive avail sync timeouts for the resource. This value is reset if
availability is
+ * returned synchronously (within the timeout period). There is currently no way to
'reset' this (short
+ * of agent restart) after it has triggered, meaning the resource will no longer try
to report live avail.
+ */
+ private byte availSyncConsecutiveTimeouts = 0;
+
+ private final ClassLoader classLoader;
+
+ /**
+ * Constructs a new proxy.
+ */
+ public AvailabilityProxy(AvailabilityFacet resourceComponent, ExecutorService
executor, ClassLoader classLoader) {
+ this.resourceComponent = resourceComponent;
+ this.executor = executor;
+ this.classLoader = classLoader;
+ }
+
+ @Override
+ public AvailabilityType call() throws Exception {
+ current = Thread.currentThread();
+ ClassLoader originalContextClassLoader = current.getContextClassLoader();
+ try {
+ Thread.currentThread().setContextClassLoader(classLoader);
+ return resourceComponent.getAvailability();
+ } finally {
+ current.setContextClassLoader(originalContextClassLoader);
+ }
+ }
+
+ /**
+ * Returns the current or most currently reported availability. If
+ * {@link AvailabilityType#UNKNOWN} is returned, then the availability is
+ * being computed.
+ *
+ * @throws TimeoutException
+ * if an async check exceeds AVAIL_ASYNC_TIMEOUT
+ */
+ @Override
+ public AvailabilityType getAvailability() {
+ // TODO take out DevDebug printlns when we're confident we don't need
them
+ AvailabilityType avail = UNKNOWN;
+
+ try {
+ // If the avail check timed out, or if we are not attempting synchronous checks (due to
+ // exceeding the consecutive timeout limit) then the future will exist.
+ if (availabilityFuture != null) {
+ if (availabilityFuture.isDone()) {
+ // hold onto and report the last known value if necessary
+ avail = availabilityFuture.get();
+ // System.out.println("DevDebug 1 [" +
System.currentTimeMillis() + "] future done avail [" + avail.name() +
"]");
+
+ } else {
+ // We are still waiting on the previously submitted async avail check - let's just return
+ // the last one we got. Note that if the future is not done after a large amount of time,
+ // then it means this thread could somehow be hung or otherwise stuck and not returning. Not good.
+ // In this case, throw a detailed exception to the avail checker.
+ long elapsedTime = System.currentTimeMillis() - lastSubmitTime;
+ if (elapsedTime > getAsyncTimeout()) {
+ // System.out.println("DevDebug 2 [" +
System.currentTimeMillis() + "] async timeout");
+
+ Throwable t = new Throwable();
+ if (current != null) {
+ t.setStackTrace(current.getStackTrace());
+ }
+ String msg = "Availability check ran too long [" +
elapsedTime + "ms], canceled for ["
+ + resourceComponent + "]; Stack trace includes the timed
out thread's stack trace.";
+ availabilityFuture.cancel(true);
+
+ // try again, maybe the situation will resolve in time for the
next check
+ availabilityFuture = executor.submit(this);
+ lastSubmitTime = System.currentTimeMillis();
+ // System.out.println("DevDebug 3 [" +
System.currentTimeMillis() + "] async timeout submit");
+
+ throw new TimeoutException(msg, t);
+ } else {
+ // System.out.println("DevDebug 4 [" +
System.currentTimeMillis() + "] no async timeout, return lastAvail [" +
lastAvail.name() + "]");
+ return lastAvail;
+ }
+ }
+ }
+
+ // request a thread to do an avail check
+ availabilityFuture = executor.submit(this);
+ lastSubmitTime = System.currentTimeMillis();
+ // System.out.println("DevDebug 5 [" + System.currentTimeMillis()
+ "] standard submit");
+
+ // if we have exceeded the timeout too many times in a row assume that this is a slow
+ // resource and stop performing synchronous checks, which would likely fail to return fast enough anyway.
+ if (availSyncConsecutiveTimeouts < getSyncTimeoutLimit()) {
+ // attempt to get availability synchronously
+ avail = availabilityFuture.get(getSyncTimeout(), TimeUnit.MILLISECONDS);
+ // System.out.println("DevDebug 6 [" +
System.currentTimeMillis() + "] sync avail [" + avail.name() + "]");
+
+ // success (failure will throw exception)
+ availSyncConsecutiveTimeouts = 0;
+ availabilityFuture = null;
+
+ } else if (availSyncConsecutiveTimeouts == getSyncTimeoutLimit()) {
+ // System.out.println("DevDebug 7 [" +
System.currentTimeMillis() + "] sync disabled");
+
+ // log one time that we are disabling synchronous checks for this
resource
+ ++availSyncConsecutiveTimeouts;
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("Disabling synchronous availability collection for
[" + resourceComponent + "]; ["
+ + getSyncTimeoutLimit() + "] consecutive timeouts exceeding
[" + getSyncTimeout() + "ms]");
+ }
+ }
+ } catch (InterruptedException e) {
+ // System.out.println("DevDebug 8 [" + System.currentTimeMillis()
+ "] Interrupted");
+
+ LOG.debug("InterruptedException; shut down is (likely) in
progress.");
+ availabilityFuture.cancel(true);
+ availabilityFuture = null;
+ Thread.currentThread().interrupt();
+ return UNKNOWN;
+
+ } catch (ExecutionException e) {
+ throw new RuntimeException("Availability check failed",
e.getCause());
+
+ } catch (java.util.concurrent.TimeoutException e) {
+ // System.out.println("DevDebug 9 [" + System.currentTimeMillis()
+ "] Sync Timeout");
+
+ // failed to get avail synchronously. next call to the future will return
availability (we hope)
+ ++availSyncConsecutiveTimeouts;
+ }
+
+ return processAvail(avail);
+ }
+
+ private AvailabilityType processAvail(AvailabilityType type) {
+ AvailabilityType result = type;
+ switch (type) {
+ case UP:
+ case DOWN:
+ break;
+ default:
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("ResourceComponent [" + resourceComponent + "]
getAvailability() returned " + type
+ + ". This is invalid and is being replaced with DOWN.");
+ }
+ result = DOWN;
+ }
+
+ // whenever changing to UP we reset the timeout counter. This is because DOWN resources often respond
+ // slowly to getAvailability() calls (for example, waiting for a connection attempt to time out). When a
+ // resource comes up we should give it a chance to respond quickly and provide live avail.
+ if (result != lastAvail) {
+ if (result == UP) {
+ if (availSyncConsecutiveTimeouts >= getSyncTimeoutLimit()) {
+ // System.out.println("DevDebug 10 [" +
System.currentTimeMillis() + "] Enabling Sync");
+
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("Enabling synchronous availability collection for
[" + resourceComponent
+ + "]; Availability has just changed from [" +
lastAvail + "] to UP.");
+ }
+ }
+ availSyncConsecutiveTimeouts = 0;
+
+ }
+ lastAvail = result;
+ }
+
+ // System.out.println("DevDebug 11 [" + System.currentTimeMillis() +
"] returning processAvail [" + result.getName()+ "]");
+
+ return result;
+ }
+
+ /**
+ * Override point. Typically for testing.
+ * @return the async timeout in milliseconds; a subclass may return something other than the system property setting.
+ */
+ protected long getAsyncTimeout() {
+ return AVAIL_ASYNC_TIMEOUT;
+ }
+
+ /**
+ * Override point. Typically for testing.
+ * @return something other than the env var setting.
+ */
+ protected long getSyncTimeout() {
+ return AVAIL_SYNC_TIMEOUT;
+ }
+
+ /**
+ * Override point. Typically for testing.
+ * @return something other than the env var setting.
+ */
+ protected byte getSyncTimeoutLimit() {
+ return AVAIL_SYNC_TIMEOUT_LIMIT;
+ }
+
+ protected boolean isSyncDisabled() {
+ return availSyncConsecutiveTimeouts >= getSyncTimeoutLimit();
+ }
+
+ /**
+ * Debug string.
+ */
+ @Override
+ public String toString() {
+ return "AvailabilityProxy [resourceComponent=" + resourceComponent +
", lastAvail=" + lastAvail
+ + ", lastSubmitTime=" + new java.util.Date(lastSubmitTime) +
", executor=" + executor
+ + ", availabilityFuture=" + availabilityFuture + ",
current=" + current + ", timeouts="
+ + availSyncConsecutiveTimeouts + "]";
+ }
+}
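
The pattern used by AvailabilityProxy.getAvailability() above (wait briefly for a result, otherwise keep the Future and report the last known value on later calls) can be sketched in isolation. This is a minimal sketch under stated assumptions, not the code from this commit: CachedCheckSketch, its String result type, and the demo() helper are hypothetical names, and the sketch leaves out the async timeout, the consecutive-timeout limit, and the replacement of invalid values with DOWN.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical, simplified illustration of a "try sync, fall back to async" check. */
class CachedCheckSketch {
    private final Callable<String> check;
    private final ExecutorService executor;
    private final long syncTimeoutMillis;
    private Future<String> pending;   // non-null while an earlier check is outstanding
    private String last = "UNKNOWN";  // last value obtained from a completed check

    CachedCheckSketch(Callable<String> check, ExecutorService executor, long syncTimeoutMillis) {
        this.check = check;
        this.executor = executor;
        this.syncTimeoutMillis = syncTimeoutMillis;
    }

    synchronized String get() throws InterruptedException, ExecutionException {
        if (pending != null) {
            if (!pending.isDone()) {
                return last;          // still running: report the cached value
            }
            last = pending.get();     // finished in the background: pick up its result
            pending = null;
            return last;
        }
        pending = executor.submit(check);
        try {
            // Attempt a synchronous answer within the short timeout.
            last = pending.get(syncTimeoutMillis, TimeUnit.MILLISECONDS);
            pending = null;
        } catch (TimeoutException e) {
            // Keep the future; a later call collects the result once it is done.
        }
        return last;
    }

    /** Runs a slow check that is released by a latch and records three successive answers. */
    static String[] demo() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        CachedCheckSketch sketch = new CachedCheckSketch(() -> {
            release.await();          // simulate a slow resource
            return "UP";
        }, executor, 50);
        try {
            String first = sketch.get();    // synchronous wait times out
            String second = sketch.get();   // check is still pending
            release.countDown();            // let the slow check finish
            String third = sketch.get();
            for (int i = 0; i < 500 && !"UP".equals(third); i++) {
                Thread.sleep(10);
                third = sketch.get();
            }
            return new String[] { first, second, third };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", demo()));
    }
}
```

Unlike the proxy in the diff, the sketch does not resubmit a new check right after collecting a finished one, which keeps the control flow short at the cost of one extra call before a fresh value is seen.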
diff --git
a/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/ForceAvailabilityExecutor.java
b/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/ForceAvailabilityExecutor.java
index 31ea976..53475fd 100644
---
a/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/ForceAvailabilityExecutor.java
+++
b/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/ForceAvailabilityExecutor.java
@@ -26,7 +26,7 @@ import org.rhq.core.domain.resource.Resource;
/**
* A thin subclass that ensures that the avail report generation forces avail checks for
all resources.
- *
+ *
* @author Jay Shaughnessy
*/
public class ForceAvailabilityExecutor extends AvailabilityExecutor {
@@ -37,7 +37,7 @@ public class ForceAvailabilityExecutor extends AvailabilityExecutor {
@Override
protected void checkInventory(Resource resource, AvailabilityReport
availabilityReport,
- AvailabilityType parentAvailType, boolean forceCheck, Scan scan) {
+ AvailabilityType parentAvailType, boolean forceCheck, Scan scan) throws
InterruptedException {
scan.setForced(true);
super.checkInventory(resource, availabilityReport, parentAvailType, true, scan);
diff --git
a/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/ResourceContainer.java
b/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/ResourceContainer.java
index 31bd96c..aadd71d 100644
---
a/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/ResourceContainer.java
+++
b/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/ResourceContainer.java
@@ -19,8 +19,6 @@
package org.rhq.core.pc.inventory;
-import static org.rhq.core.pc.component.ComponentInvocationContextImpl.LocalContext;
-
import java.io.Serializable;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
@@ -55,9 +53,12 @@ import org.rhq.core.domain.measurement.MeasurementDefinition;
import org.rhq.core.domain.measurement.MeasurementSchedule;
import org.rhq.core.domain.measurement.MeasurementScheduleRequest;
import org.rhq.core.domain.resource.Resource;
+import org.rhq.core.pc.PluginContainerConfiguration;
import org.rhq.core.pc.component.ComponentInvocationContextImpl;
+import org.rhq.core.pc.component.ComponentInvocationContextImpl.LocalContext;
import org.rhq.core.pc.util.FacetLockType;
import org.rhq.core.pc.util.LoggingThreadFactory;
+import org.rhq.core.pluginapi.availability.AvailabilityFacet;
import org.rhq.core.pluginapi.inventory.ResourceComponent;
import org.rhq.core.pluginapi.inventory.ResourceContext;
import org.rhq.core.util.exception.ThrowableUtil;
@@ -85,15 +86,26 @@ public class ResourceContainer implements Serializable {
// thread pools used to invoke methods on container's components
private static final String DAEMON_THREAD_POOL_NAME =
"ResourceContainer.invoker.daemon";
private static final String NON_DAEMON_THREAD_POOL_NAME =
"ResourceContainer.invoker.nonDaemon";
+ private static final String AVAIL_CHECK_THREAD_POOL_NAME =
"ResourceContainer.invoker.availCheck.daemon";
private static ExecutorService DAEMON_THREAD_POOL;
private static ExecutorService NON_DAEMON_THREAD_POOL;
+ /**
+ * This thread pool protects us from generating a potentially huge number of threads on slow running
+ * agents where avail checks are taking longer than 1s (given a default setting). Each avail check
+ * requests a thread on the assumption that most if not all checks will be sub-second. But if that
+ * is not the case we could, if using a CachedThreadPool, end up with N concurrent avail check threads,
+ * where N is the number of resources managed by the agent (because that type of pool can grow unbounded).
+ * Instead, limit the max # of threads and fall back to synchronous checking when overloaded.
+ */
+ private static ExecutorService AVAIL_CHECK_THREAD_POOL;
+
// non-transient fields
private final Resource resource;
private SynchronizationState synchronizationState = SynchronizationState.NEW;
private Set<MeasurementScheduleRequest> measurementSchedule = new
HashSet<MeasurementScheduleRequest>();
private Set<ResourcePackageDetails> installedPackages = new
HashSet<ResourcePackageDetails>();
- private Map<String, DriftDefinition> driftDefinitions = new HashMap<String,
DriftDefinition>();
+ private final Map<String, DriftDefinition> driftDefinitions = new
HashMap<String, DriftDefinition>();
private MeasurementScheduleRequest availabilitySchedule = null;
// transient fields
@@ -107,15 +119,21 @@ public class ResourceContainer implements Serializable {
private transient Availability availability;
// the time at which this resource is up for an avail check. null indicates
unscheduled.
private transient Long availabilityScheduleTime;
+ private transient AvailabilityProxy availabilityProxy;
/**
* Initialize the ResourceContainer's internals, such as its thread pools.
+ *
+ * @param pcConfig the plugin container's configuration
*/
- public static void initialize() {
+ public static void initialize(PluginContainerConfiguration pcConfig) {
LoggingThreadFactory daemonFactory = new
LoggingThreadFactory(DAEMON_THREAD_POOL_NAME, true);
LoggingThreadFactory nonDaemonFactory = new
LoggingThreadFactory(NON_DAEMON_THREAD_POOL_NAME, false);
+ LoggingThreadFactory availCheckFactory = new
LoggingThreadFactory(AVAIL_CHECK_THREAD_POOL_NAME, true);
DAEMON_THREAD_POOL = Executors.newCachedThreadPool(daemonFactory);
NON_DAEMON_THREAD_POOL = Executors.newCachedThreadPool(nonDaemonFactory);
+ AVAIL_CHECK_THREAD_POOL =
Executors.newFixedThreadPool(pcConfig.getAvailabilityScanThreadPoolSize(),
+ availCheckFactory);
}
/**
@@ -125,6 +143,7 @@ public class ResourceContainer implements Serializable {
// TODO (ips, 04/30/12): Should we funnel these through
PluginContainer.shutdownExecutorService()?
DAEMON_THREAD_POOL.shutdown();
NON_DAEMON_THREAD_POOL.shutdown();
+ AVAIL_CHECK_THREAD_POOL.shutdown();
}
public ResourceContainer(Resource resource, ClassLoader resourceClassLoader) {
@@ -198,6 +217,8 @@ public class ResourceContainer implements Serializable {
public void setResourceComponent(ResourceComponent resourceComponent) {
synchronized (this) {
this.resourceComponent = resourceComponent;
+ this.availabilityProxy = new AvailabilityProxy(resourceComponent,
AVAIL_CHECK_THREAD_POOL,
+ resourceClassLoader);
}
}
@@ -228,7 +249,10 @@ public class ResourceContainer implements Serializable {
for (MeasurementScheduleRequest sched : this.measurementSchedule) {
if (sched.getInterval() < MeasurementSchedule.MINIMUM_INTERVAL) {
String smallStack = ThrowableUtil.getFilteredStackAsString(new
Throwable());
- String msg = "Invalid collection interval [" + sched +
"] for Resource [" + resource
+ String msg = "Invalid collection interval ["
+ + sched
+ + "] for Resource ["
+ + resource
+ "]. Setting it to 20 minutes until the situation is
corrected. Please report to Development: "
+ smallStack;
LogFactory.getLog(ResourceContainer.class).error(msg);
@@ -390,8 +414,7 @@ public class ResourceContainer implements Serializable {
public String toString() {
AvailabilityType avail = (this.availability != null) ?
this.availability.getAvailabilityType() : null;
return this.getClass().getSimpleName() + "[resource=" + this.resource +
", syncState="
- + this.synchronizationState + ", componentState=" +
this.resourceComponentState + ", avail=" + avail
- + "]";
+ + this.synchronizationState + ", componentState=" +
this.resourceComponentState + ", avail=" + avail + "]";
}
/**
@@ -489,6 +512,15 @@ public class ResourceContainer implements Serializable {
}
/**
+ * Return a proxy for a call to check resource availability, using the dedicated avail check (daemon) thread pool.
+ *
+ * @see AvailabilityProxy for details
+ */
+ public AvailabilityFacet getAvailabilityProxy() {
+ return this.availabilityProxy;
+ }
+
+ /**
* This is a ResourceComponent proxy that invokes component methods in pooled
threads. Depending on the parameters
* passed to its constructor, it may also:
*
@@ -574,7 +606,7 @@ public class ResourceContainer implements Serializable {
throw e.getCause();
} catch (java.util.concurrent.TimeoutException e) {
String msg = invokedMethodString(method, args, "timed out after
" + timeout
- + " milliseconds - invocation thread will be
interrupted.");
+ + " milliseconds - invocation thread will be
interrupted.");
LOG.debug(msg);
Throwable cause = new Throwable();
cause.setStackTrace(componentInvocation.getStackTrace());
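
The reasoning in the AVAIL_CHECK_THREAD_POOL javadoc above (a cached pool can grow to one thread per slow check, a fixed pool cannot) can be illustrated with a small standalone sketch. BoundedPoolSketch and peakConcurrency are hypothetical names, the 20 ms sleep stands in for a slow avail check, and the pool size is passed directly rather than read from PluginContainerConfiguration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration: a fixed pool caps how many tasks run at once. */
class BoundedPoolSketch {

    /** Submits taskCount slow tasks and returns the highest number observed running together. */
    static int peakConcurrency(int poolSize, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            for (int i = 0; i < taskCount; i++) {
                pool.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(20);   // stand-in for a slow availability check
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    running.decrementAndGet();
                });
            }
            pool.shutdown();                // no new tasks; queued ones still run
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency: " + peakConcurrency(2, 8));
    }
}
```

With a fixed pool, extra tasks wait in the pool's queue instead of spawning threads, so callers that need a prompt answer still need their own timeout, which is the role the proxy's sync timeout appears to play in this commit.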
diff --git
a/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/RuntimeDiscoveryExecutor.java
b/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/RuntimeDiscoveryExecutor.java
index d6a9614..5265977 100644
---
a/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/RuntimeDiscoveryExecutor.java
+++
b/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/RuntimeDiscoveryExecutor.java
@@ -42,7 +42,6 @@ import org.rhq.core.domain.resource.ResourceType;
import org.rhq.core.pc.PluginContainer;
import org.rhq.core.pc.PluginContainerConfiguration;
import org.rhq.core.pc.plugin.PluginComponentFactory;
-import org.rhq.core.pc.util.FacetLockType;
import org.rhq.core.pluginapi.availability.AvailabilityFacet;
import org.rhq.core.pluginapi.inventory.ProcessScanResult;
import org.rhq.core.pluginapi.inventory.ResourceDiscoveryComponent;
@@ -54,15 +53,15 @@ import org.rhq.core.util.exception.Severity;
* discovering children of existing resources. It recursively walks the hierarchy
looking for new resources, which
* are typically services (but could be non-top-level servers). It is complemented by
{@link AutoDiscoveryExecutor}
* which looks for new top level servers.
- *
+ *
* @author Greg Hinkle
* @author Ian Springer
*/
public class RuntimeDiscoveryExecutor implements Runnable,
Callable<InventoryReport> {
private Log log = LogFactory.getLog(RuntimeDiscoveryExecutor.class);
- private InventoryManager inventoryManager;
- private PluginContainerConfiguration pluginContainerConfiguration;
+ private final InventoryManager inventoryManager;
+ private final PluginContainerConfiguration pluginContainerConfiguration;
/**
* Resource to scan. If null, the entire platform will be scanned.
@@ -210,8 +209,8 @@ public class RuntimeDiscoveryExecutor implements Runnable,
Callable<InventoryRep
// to still perform the check in two cases: if the current avail is not UP or if
the resource category is
// SERVER. This means we won't miss an opportunity to do discovery for stale
DOWN resource, and we won't
// waste time doing discovery on a stale UP SERVER, which can be time consuming.
Since most resources are
- // SERVICEs, and also are typically UP and stay UP, perfoming checks in these two
situations should
- // not add much overhead. Finally, make sure to use facet proxy to do the avail
check, this allows us to use
+ // SERVICEs, and also are typically UP and stay UP, performing checks in these
two situations should
+ // not add much overhead. Finally, make sure to use facet proxy to do the avail
check, this allows us to use
// a timeout, and therefore not hang discovery if the avail check is slow.
Availability currentAvailability = parentContainer.getAvailability();
AvailabilityType currentAvailabilityType = (null == currentAvailability) ?
AvailabilityType.DOWN
@@ -221,20 +220,10 @@ public class RuntimeDiscoveryExecutor implements Runnable,
Callable<InventoryRep
if (AvailabilityType.UP != currentAvailabilityType
|| ResourceCategory.SERVER ==
parentContainer.getResource().getResourceType().getCategory()) {
- AvailabilityFacet parentComponent = null;
- try {
- parentComponent =
parentContainer.createResourceComponentProxy(AvailabilityFacet.class,
- FacetLockType.NONE, AvailabilityExecutor.GET_AVAILABILITY_TIMEOUT,
true, false, true);
-
- } catch (PluginContainerException e) {
- if (log.isDebugEnabled()) {
- log.debug("Parent component for [" + parent + "] was
null; cannot perform service scan.");
- }
- return;
- }
+ AvailabilityFacet parentAvailabilityProxy =
parentContainer.getAvailabilityProxy();
try {
- currentAvailabilityType = parentComponent.getAvailability();
+ currentAvailabilityType = parentAvailabilityProxy.getAvailability();
} catch (Exception e) {
currentAvailabilityType = AvailabilityType.DOWN;
}
@@ -293,7 +282,7 @@ public class RuntimeDiscoveryExecutor implements Runnable,
Callable<InventoryRep
}
}
- // get rid of any child resources of this type that were not yet
committed and are now gone
+ // get rid of any child resources of this type that were not yet
committed and are now gone
removeStaleResources(parent, childResourceType, mergedResources);
}
diff --git
a/modules/core/plugin-container/src/test/java/org/rhq/core/pc/inventory/AvailabilityProxyConcurrencyTest.java
b/modules/core/plugin-container/src/test/java/org/rhq/core/pc/inventory/AvailabilityProxyConcurrencyTest.java
new file mode 100644
index 0000000..06753ca
--- /dev/null
+++
b/modules/core/plugin-container/src/test/java/org/rhq/core/pc/inventory/AvailabilityProxyConcurrencyTest.java
@@ -0,0 +1,114 @@
+/*
+ * RHQ Management Platform
+ * Copyright (C) 2005-2013 Red Hat, Inc.
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA
+ */
+package org.rhq.core.pc.inventory;
+
+import static org.rhq.core.domain.measurement.AvailabilityType.UP;
+
+import java.util.Date;
+import java.util.Hashtable;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import org.testng.annotations.Test;
+
+import org.rhq.core.domain.measurement.AvailabilityType;
+import org.rhq.core.pluginapi.availability.AvailabilityFacet;
+
+@Test
+public class AvailabilityProxyConcurrencyTest implements AvailabilityFacet {
+
+ private AtomicInteger numberOfFacetCalls = new AtomicInteger(-1);
+
+ public void testConcurrentAvailChecks() throws Exception {
+ Thread.interrupted(); // clear any hanging around interrupt status
+
+ ExecutorService executor = Executors.newCachedThreadPool();
+ try {
+ // our one proxy we want to call concurrently
+ final AvailabilityProxy ap = new AvailabilityProxy(this, executor,
getClass().getClassLoader());
+
+ // prime the pump by getting the first one without problems
+ AvailabilityType firstAvail = ap.getAvailability();
+ assert UP.equals(firstAvail) : "Can't even get our first avail
correctly: " + firstAvail;
+
+ // create several threads that will concurrently call getAvailability
+ final int numThreads = 10;
+ final Hashtable<String, AvailabilityType> availResults = new
Hashtable<String, AvailabilityType>(numThreads);
+ final Hashtable<String, Date> dateResults = new Hashtable<String,
Date>(numThreads);
+ final Hashtable<String, Throwable> throwableResults = new
Hashtable<String, Throwable>(numThreads);
+ final CountDownLatch startLatch = new CountDownLatch(1);
+ final CountDownLatch endLatch = new CountDownLatch(numThreads);
+ final Runnable runnable = new Runnable() {
+ public void run() {
+ try {
+ startLatch.await();
+ AvailabilityType availCheck = ap.getAvailability();
+ availResults.put(Thread.currentThread().getName(), availCheck);
+ } catch (Exception e) {
+ throwableResults.put(Thread.currentThread().getName(), e);
+ } finally {
+ dateResults.put(Thread.currentThread().getName(), new Date());
+ endLatch.countDown();
+ }
+ }
+ };
+ numberOfFacetCalls.set(0); // this will count how many times the proxy
actually calls the facet getAvail method
+ for (int i = 0; i < numThreads; i++) {
+ Thread t = new Thread(runnable, "t" + i);
+ t.start();
+ }
+
+ // release the hounds! then wait for them to all finish
+ System.out.println("~~~THREADS STARTED AT: " + new Date());
+ startLatch.countDown();
+ endLatch.await(10000, TimeUnit.SECONDS); // should never take this long
+ System.out.println("~~~THREADS FINISHED AT: " + new Date());
+ System.out.println("~~~THREAD FINISH TIMES: " + dateResults);
+ System.out.println("~~~THREADS WITH EXCEPTIONS: " +
throwableResults);
+
+ // now make sure all of them return UP
+ assert availResults.size() == numThreads : "Failed, bad threads:
availResults = " + availResults;
+ for (AvailabilityType availtype : availResults.values()) {
+ assert availtype.equals(UP) : "Failed, bad avail: availResults =
" + availResults;
+ }
+
+ // make sure we actually tested the code we need to test - we should not be making
+ // individual facet calls for each request because we shotgun the requests so fast,
+ // and the facet sleeps so long, that the proxy should return the last avail rather
+ // than requiring a new facet call.
+ assert (numberOfFacetCalls.get()) < numThreads : numberOfFacetCalls;
+ } finally {
+ executor.shutdownNow();
+ }
+ }
+
+ @Override
+ public synchronized AvailabilityType getAvailability() {
+ try {
+ System.out.println("~~~AVAILABILITY FACET CALL #" +
numberOfFacetCalls.incrementAndGet());
+ Thread.sleep(250); // just make it slow enough so a few proxy calls are done
concurrently while this method is running
+ } catch (Exception e) {
+ System.out.println("~~~AVAILABILITY SLEEP WAS ABORTED: " + e);
+ }
+ return UP;
+ }
+}
\ No newline at end of file
diff --git
a/modules/core/plugin-container/src/test/java/org/rhq/core/pc/inventory/AvailabilityProxyTest.java
b/modules/core/plugin-container/src/test/java/org/rhq/core/pc/inventory/AvailabilityProxyTest.java
new file mode 100644
index 0000000..e861845
--- /dev/null
+++
b/modules/core/plugin-container/src/test/java/org/rhq/core/pc/inventory/AvailabilityProxyTest.java
@@ -0,0 +1,157 @@
+/*
+ * RHQ Management Platform
+ * Copyright (C) 2005-2013 Red Hat, Inc.
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA
+ */
+package org.rhq.core.pc.inventory;
+
+import static org.rhq.core.domain.measurement.AvailabilityType.DOWN;
+import static org.rhq.core.domain.measurement.AvailabilityType.UP;
+import static org.testng.AssertJUnit.assertEquals;
+import static org.testng.AssertJUnit.fail;
+
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.testng.annotations.Test;
+
+import org.rhq.core.domain.measurement.AvailabilityType;
+import org.rhq.core.pluginapi.availability.AvailabilityFacet;
+
+/**
+ * @author Elias Ross
+ */
+@Test
+public class AvailabilityProxyTest implements AvailabilityFacet {
+
+ private final Log LOG = LogFactory.getLog(AvailabilityProxyTest.class);
+ private volatile int timeout = 1;
+ private AvailabilityType returnedAvail = UP;
+ private final ExecutorService executor = Executors.newCachedThreadPool();
+
+ /**
+ * Run a test. Note this may not be 100% reliable, as it depends on thread execution
to
+ * happen according to our sleep schedule...
+ */
+ public void test() throws InterruptedException {
+ TestAvailabilityProxy ap = new TestAvailabilityProxy(this, executor,
getClass().getClassLoader());
+ LOG.debug("proxy " + ap);
+
+ assertEquals("should be up", UP, ap.getAvailability()); // waits 1ms
and returns synchronously
+ timeout = 1200;
+ assertEquals("should be down", DOWN, ap.getAvailability()); // waits 1s
and times out
+ Thread.sleep(300); // now waited total of 1s + .3s = 1.3 sec > 1.2s
+ assertEquals("should be up now", UP, ap.getAvailability()); // waits 1s
and returns last reported value (UP)
+
+ ap.setAsyncTimeout(1020L);
+ Thread.sleep(50); // waited 1.050 seconds
+ try {
+ ap.getAvailability(); // this submits another which we need to let finish
+ fail("should timeout 1020, waited 1050");
+ } catch (TimeoutException e) {
+ }
+ // wait for the last submit to return
+ Thread.sleep(1210);
+
+ LOG.debug("proxy " + ap);
+
+ // try disabling sync checks
+ // - start returning DOWN avail in order to perform a sync disable and then
re-enable
+ // - go back to default async timeout, we don't want it to trigger anymore
+ // short timeout but longer than the sync timeout to force several sync timeouts
+ returnedAvail = DOWN;
+ ap.setAsyncTimeout(null);
+ timeout = 75;
+ ap.setSyncTimeout(50L);
+
+ while (!ap.isSyncDisabled()) {
+ ap.getAvailability();
+ Thread.sleep(50L);
+ }
+
+ // go back to returning UP so we can re-enable sync checking
+ // make the sync check a half second so we can prove that sync checking is not
happening
+ returnedAvail = UP;
+ timeout = 500;
+ ap.setSyncTimeout(500L);
+ long start = System.currentTimeMillis();
+ assertEquals("should be DOWN", DOWN, ap.getAvailability());
+ assert System.currentTimeMillis() - start < 100 : "Should have been fast,
returning old avail";
+ // wait for the last submit to return
+ Thread.sleep(510);
+
+ // check for re-enable sync checks
+ assertEquals("should be UP", UP, ap.getAvailability());
+ assertEquals("should be enabled", false, ap.isSyncDisabled());
+ // wait for the last submit to return
+ Thread.sleep(510);
+
+ // test interrupt handling
+ LOG.debug("interrupt this thread");
+ Thread.currentThread().interrupt();
+ assertEquals("cancellation", AvailabilityType.UNKNOWN,
ap.getAvailability());
+ assertEquals(true, Thread.currentThread().isInterrupted());
+ }
+
+ @Override
+ public synchronized AvailabilityType getAvailability() {
+ try {
+ LOG.debug("sleep " + timeout);
+ Thread.sleep(timeout);
+ } catch (InterruptedException e) {
+ Thread.currentThread().interrupt();
+ }
+ LOG.debug("return " + returnedAvail.getName());
+ return returnedAvail;
+ }
+
+ private class TestAvailabilityProxy extends AvailabilityProxy {
+
+ private Long asyncTimeout = null;
+ private Long syncTimeout = null;
+
+ public TestAvailabilityProxy(AvailabilityFacet resourceComponent, ExecutorService
executor,
+ ClassLoader classLoader) {
+ super(resourceComponent, executor, classLoader);
+ }
+
+ @Override
+ public AvailabilityType getAvailability() {
+ // System.out.println("DevDebug 0 [" + System.currentTimeMillis()
+ "] getAvail() timeout=[" + timeout + "], syncTimeout=[" +
syncTimeout + "], asyncTimeout=[" + asyncTimeout + "]");
+ return super.getAvailability();
+ }
+
+ public void setAsyncTimeout(Long asyncTimeout) {
+ this.asyncTimeout = asyncTimeout;
+ }
+
+ public void setSyncTimeout(Long syncTimeout) {
+ this.syncTimeout = syncTimeout;
+ }
+
+ @Override
+ protected long getSyncTimeout() {
+ return null == syncTimeout ? super.getSyncTimeout() : syncTimeout;
+ }
+
+ @Override
+ protected long getAsyncTimeout() {
+ return null == asyncTimeout ? super.getAsyncTimeout() : asyncTimeout;
+ }
+ }
+}
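
The TestAvailabilityProxy class above relies on the protected "override point" getters in AvailabilityProxy so that a test can shorten timeouts without touching system properties. A minimal sketch of that pattern follows; TimeoutSource, TestTimeoutSource, and the 1000 ms default are hypothetical and chosen only for illustration.

```java
/** Hypothetical production class exposing its timeout through an overridable getter. */
class TimeoutSource {
    static final long DEFAULT_TIMEOUT_MILLIS = 1000L; // assumed default, for illustration only

    protected long getTimeout() {
        return DEFAULT_TIMEOUT_MILLIS;
    }
}

/** Hypothetical test subclass: a null override means "use the production value". */
class TestTimeoutSource extends TimeoutSource {
    private Long override;

    void setTimeout(Long override) {
        this.override = override;
    }

    @Override
    protected long getTimeout() {
        return override == null ? super.getTimeout() : override;
    }
}
```

Using a nullable Long lets a test switch back to the production default mid-test by passing null, as the test above does with ap.setAsyncTimeout(null).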
diff --git
a/modules/core/plugin-container/src/test/java/org/rhq/core/pc/inventory/ResourceContainerTest.java
b/modules/core/plugin-container/src/test/java/org/rhq/core/pc/inventory/ResourceContainerTest.java
index 58446d0..c9cc490 100644
---
a/modules/core/plugin-container/src/test/java/org/rhq/core/pc/inventory/ResourceContainerTest.java
+++
b/modules/core/plugin-container/src/test/java/org/rhq/core/pc/inventory/ResourceContainerTest.java
@@ -64,12 +64,10 @@ public class ResourceContainerTest {
PluginContainer pc = PluginContainer.getInstance();
pc.setConfiguration(config);
pc.initialize();
- ResourceContainer.initialize();
}
@AfterClass
public void afterClass() {
- ResourceContainer.shutdown();
PluginContainer.getInstance().shutdown();
}
diff --git
a/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/AgentConfiguration.java
b/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/AgentConfiguration.java
index 939457a..34f669b 100644
---
a/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/AgentConfiguration.java
+++
b/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/AgentConfiguration.java
@@ -1143,6 +1143,11 @@ public class AgentConfiguration {
AgentConfigurationConstants.PLUGINS_AVAILABILITY_SCAN_INITIAL_DELAY,
AgentConfigurationConstants.DEFAULT_PLUGINS_AVAILABILITY_SCAN_INITIAL_DELAY);
+ // get the avail thread pool size
+ int avail_scan_threadpool_size = m_preferences.getInt(
+ AgentConfigurationConstants.PLUGINS_AVAILABILITY_SCAN_THREADPOOL_SIZE,
+
AgentConfigurationConstants.DEFAULT_PLUGINS_AVAILABILITY_SCAN_THREADPOOL_SIZE);
+
// get the initial delay before measurement collections begin
long meas_scan_initial_delay = m_preferences.getLong(
AgentConfigurationConstants.PLUGINS_MEASUREMENT_COLLECTION_INITIAL_DELAY,
@@ -1250,6 +1255,7 @@ public class AgentConfiguration {
config.setChildResourceDiscoveryDelay(childResourceDiscoveryDelay);
config.setAvailabilityScanInitialDelay(avail_scan_initial_delay);
config.setAvailabilityScanPeriod(avail_scan_period);
+ config.setAvailabilityScanThreadPoolSize(avail_scan_threadpool_size);
config.setMeasurementCollectionThreadPoolSize(meas_threadpool_size);
config.setMeasurementCollectionInitialDelay(meas_scan_initial_delay);
config.setDriftDetectionInitialDelay(drift_initial_delay);
diff --git a/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/AgentConfigurationConstants.java b/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/AgentConfigurationConstants.java
index 000fdc0..96a71a9 100644
--- a/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/AgentConfigurationConstants.java
+++ b/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/AgentConfigurationConstants.java
@@ -151,12 +151,12 @@ public interface AgentConfigurationConstants {
boolean DEFAULT_AGENT_UPDATE_ENABLED = true;
/**
- * If this preference is defined (its default is null), this will be the URL that contains the agent update version info.
+ * If this preference is defined (its default is null), this will be the URL that contains the agent update version info.
*/
String AGENT_UPDATE_VERSION_URL = PROPERTY_NAME_PREFIX + "agent-update.version-url";
/**
- * If this preference is defined (its default is null), this will be the URL the agent downloads the agent update from.
+ * If this preference is defined (its default is null), this will be the URL the agent downloads the agent update from.
*/
String AGENT_UPDATE_DOWNLOAD_URL = PROPERTY_NAME_PREFIX + "agent-update.download-url";
@@ -541,12 +541,12 @@ public interface AgentConfigurationConstants {
String DEFAULT_PLUGINS_DIRECTORY = "plugins";
/**
- * The regular expression to indicate what agent/plugin container classes the plugins cannot access.
+ * The regular expression to indicate what agent/plugin container classes the plugins cannot access.
*/
String PLUGINS_ROOT_PLUGIN_CLASSLOADER_REGEX = PROPERTY_NAME_PREFIX + "plugins.root-plugin-classloader-regex";
/**
- * The comma separated list of names of plugins that are to be disabled at startup
+ * The comma separated list of names of plugins that are to be disabled at startup
*/
String PLUGINS_DISABLED = PROPERTY_NAME_PREFIX + "plugins.disabled";
@@ -630,6 +630,17 @@ public interface AgentConfigurationConstants {
long DEFAULT_PLUGINS_AVAILABILITY_SCAN_PERIOD = PluginContainerConfiguration.AVAILABILITY_SCAN_PERIOD_DEFAULT;
/**
+ * Defines how many threads can be concurrently scanning for resource availabilities.
+ */
+ String PLUGINS_AVAILABILITY_SCAN_THREADPOOL_SIZE = PROPERTY_NAME_PREFIX
+ + "plugins.availability-scan.threadpool-size";
+
+ /**
+ * The default threadpool size for availability scanning.
+ */
+ int DEFAULT_PLUGINS_AVAILABILITY_SCAN_THREADPOOL_SIZE = PluginContainerConfiguration.AVAILABILITY_SCAN_THREADPOOL_SIZE_DEFAULT;
+
+ /**
* If defined, this is to be the size of the measurement collection thread pool. If not defined, the plugin
* container should default to something it considers appropriate.
*/
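
Reviewer note on the new preference: the hunks above add a preference key, a default, and a plain `getInt` lookup. The following is a minimal, standalone sketch of that lookup pattern and not the agent's actual code. It assumes a `java.util.Properties` stand-in for the agent preferences, an assumed default of 100 (the value shown in agent-configuration.xml and rhq-plugin.xml; the real constant lives in PluginContainerConfiguration and is not shown here), and a hypothetical fallback to the default for values below 1, loosely mirroring the `IntegerSetupValidityChecker(1, null)` used in the setup prompt.

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; class and method names are not part of RHQ.
class AvailScanPoolSizeSketch {
    // Preference key as it appears in agent-configuration.xml.
    static final String KEY = "rhq.agent.plugins.availability-scan.threadpool-size";
    // Assumed default, taken from the XML defaults in this commit.
    static final int ASSUMED_DEFAULT = 100;

    // Returns the configured pool size, or the default when the value is
    // missing, unparsable, or smaller than 1 (the fallback rule is an assumption).
    static int resolvePoolSize(Properties prefs) {
        String raw = prefs.getProperty(KEY);
        if (raw == null) {
            return ASSUMED_DEFAULT;
        }
        try {
            int size = Integer.parseInt(raw.trim());
            return size >= 1 ? size : ASSUMED_DEFAULT;
        } catch (NumberFormatException e) {
            return ASSUMED_DEFAULT;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Properties prefs = new Properties();
        prefs.setProperty(KEY, "8");

        int size = resolvePoolSize(prefs);
        // A fixed pool bounds how many availability checks can run at once.
        ExecutorService pool = Executors.newFixedThreadPool(size);
        for (int i = 0; i < 3; i++) {
            final int id = i;
            pool.submit(() -> System.out.println("avail check " + id));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("pool size = " + size);
    }
}
```

Since the property is declared with activationPolicy="restart" in rhq-plugin.xml, a changed value presumably takes effect only after the agent restarts.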
diff --git a/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/i18n/AgentSetupInstructions.java b/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/i18n/AgentSetupInstructions.java
index ece2c4f..9044c79 100644
--- a/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/i18n/AgentSetupInstructions.java
+++ b/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/i18n/AgentSetupInstructions.java
@@ -709,6 +709,15 @@ public interface AgentSetupInstructions {
@I18NMessages( { @I18NMessage("The time in seconds before the initial availability scan is performed.") })
String SETUP_INSTRUCTION_PLUGINSAVAILSCANINITIALDELAY_HELP = "PromptCommand.setup.instruction.plugins.avail-scan-initialdelay.help";
+ // PLUGINS AVAILABILITY SCAN THREAD POOL SIZE
+ String SETUP_INSTRUCTION_PLUGINSAVAILSCANTHREADPOOLSIZE_PREF = AgentConfigurationConstants.PLUGINS_AVAILABILITY_SCAN_THREADPOOL_SIZE;
+ String SETUP_INSTRUCTION_PLUGINSAVAILSCANTHREADPOOLSIZE_DEFAULT = Integer
+ .toString(AgentConfigurationConstants.DEFAULT_PLUGINS_AVAILABILITY_SCAN_THREADPOOL_SIZE);
+ @I18NMessages({ @I18NMessage("Availability Scan ThreadPool Size") })
+ String SETUP_INSTRUCTION_PLUGINSAVAILSCANTHREADPOOLSIZE_PROMPT = "PromptCommand.setup.instruction.plugins.avail-scan-threadpoolsize.prompt";
+ @I18NMessages({ @I18NMessage("The number of threads that can concurrently scan resource availabilities.") })
+ String SETUP_INSTRUCTION_PLUGINSAVAILSCANTHREADPOOLSIZE_HELP = "PromptCommand.setup.instruction.plugins.avail-scan-threadpoolsize.help";
+
// PLUGINS MEASUREMENT COLLECTION INITIAL DELAY
String SETUP_INSTRUCTION_PLUGINSMEASUREMENTCOLLINITIALDELAY_PREF = AgentConfigurationConstants.PLUGINS_MEASUREMENT_COLLECTION_INITIAL_DELAY;
String SETUP_INSTRUCTION_PLUGINSMEASUREMENTCOLLINITIALDELAY_DEFAULT = Long
diff --git a/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/promptcmd/SetupPromptCommand.java b/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/promptcmd/SetupPromptCommand.java
index d2de209..e97e0cc 100644
--- a/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/promptcmd/SetupPromptCommand.java
+++ b/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/promptcmd/SetupPromptCommand.java
@@ -577,6 +577,13 @@ public class SetupPromptCommand implements AgentPromptCommand {
.getMsg(AgentSetupInstructions.SETUP_INSTRUCTION_PLUGINSAVAILSCANINITIALDELAY_HELP)));
instr.add(new DefaultSetupInstruction(
+ AgentSetupInstructions.SETUP_INSTRUCTION_PLUGINSAVAILSCANTHREADPOOLSIZE_PREF,
+ AgentSetupInstructions.SETUP_INSTRUCTION_PLUGINSAVAILSCANTHREADPOOLSIZE_DEFAULT,
+ new IntegerSetupValidityChecker(1, null), SETUPMSG
+ .getMsg(AgentSetupInstructions.SETUP_INSTRUCTION_PLUGINSAVAILSCANTHREADPOOLSIZE_PROMPT), SETUPMSG
+ .getMsg(AgentSetupInstructions.SETUP_INSTRUCTION_PLUGINSAVAILSCANTHREADPOOLSIZE_HELP)));
+
+ instr.add(new DefaultSetupInstruction(
AgentSetupInstructions.SETUP_INSTRUCTION_PLUGINSMEASUREMENTCOLLINITIALDELAY_PREF,
AgentSetupInstructions.SETUP_INSTRUCTION_PLUGINSMEASUREMENTCOLLINITIALDELAY_DEFAULT,
new LongSetupValidityChecker(1L, null), SETUPMSG
diff --git a/modules/enterprise/agent/src/main/resources/agent-configuration.xml b/modules/enterprise/agent/src/main/resources/agent-configuration.xml
index 9f46f62..ce27dde 100644
--- a/modules/enterprise/agent/src/main/resources/agent-configuration.xml
+++ b/modules/enterprise/agent/src/main/resources/agent-configuration.xml
@@ -654,6 +654,17 @@ commands named "config", "setconfig" and "setup" and the command line options
<!--
_______________________________________________________________
+ rhq.agent.plugins.availability-scan.threadpool-size
+
+ The number of threads that can be concurrently scanning
+ resource availabilities.
+ -->
+ <!--
+ <entry key="rhq.agent.plugins.availability-scan.threadpool-size" value="100"/>
+ -->
+
+ <!--
+ _______________________________________________________________
rhq.agent.plugins.measurement-collection.threadpool-size
When measurement's are scheduled for collection, the collection
diff --git a/modules/enterprise/agent/src/main/resources/log4j.xml b/modules/enterprise/agent/src/main/resources/log4j.xml
index b135bb8..23a7064 100644
--- a/modules/enterprise/agent/src/main/resources/log4j.xml
+++ b/modules/enterprise/agent/src/main/resources/log4j.xml
@@ -91,6 +91,11 @@
</category>
-->
+ <!-- EMS connection factory can be noisy with its INFO messages - comment the below to see them. -->
+ <category name="org.mc4j.ems.connection.ConnectionFactory">
+ <priority value="WARN"/>
+ </category>
+
<!-- EMS can be noisy with its WARN messages - uncomment the below to suppress them. -->
<!--
<category name="org.mc4j.ems">
diff --git a/modules/plugins/jmx/src/main/java/org/rhq/plugins/jmx/JMXServerComponent.java b/modules/plugins/jmx/src/main/java/org/rhq/plugins/jmx/JMXServerComponent.java
index 14f43b9..f801cc5 100644
--- a/modules/plugins/jmx/src/main/java/org/rhq/plugins/jmx/JMXServerComponent.java
+++ b/modules/plugins/jmx/src/main/java/org/rhq/plugins/jmx/JMXServerComponent.java
@@ -71,9 +71,15 @@ public class JMXServerComponent<T extends ResourceComponent<?>> implements JMXCo
} catch (Exception e) {
if (e.getCause() instanceof SecurityException) {
throw new InvalidPluginConfigurationException("Failed to authenticate to managed JVM - "
- + "principal and/or credentials connection properties are not set correctly.");
+ + "principal and/or credentials connection properties are not set correctly.");
+ }
+ // don't litter agent log with a stack trace unless we're in debug
+ if (log.isDebugEnabled()) {
+ log.warn("Failed to connect to " + context.getResourceType() + "[" + context.getResourceKey() + "].", e);
+ } else {
+ log.warn("Failed to connect to " + context.getResourceType() + "[" + context.getResourceKey() + "]: "
+ + e.getMessage());
}
- log.warn("Failed to connect to " + context.getResourceType() + "[" + context.getResourceKey() + "].", e);
}
}
@@ -82,13 +88,19 @@ public class JMXServerComponent<T extends ResourceComponent<?>> implements JMXCo
String connectionTypeDescriptorClassName = pluginConfig.getSimple(JMXDiscoveryComponent.CONNECTION_TYPE)
.getStringValue();
if (JMXDiscoveryComponent.PARENT_TYPE.equals(connectionTypeDescriptorClassName)) {
- // Our parent is itself a JMX component, so just reuse its connection.
+ // Our parent is itself a JMX component, so just reuse its connection, if it has one.
this.connection = ((JMXComponent) context.getParentResourceComponent()).getEmsConnection();
+ if (null == this.connection) {
+ throw new IllegalStateException("Could not access parent connection, parent may be down");
+ }
this.connectionProvider = this.connection.getConnectionProvider();
} else {
this.connectionProvider = ConnectionProviderFactory.createConnectionProvider(pluginConfig,
this.context.getNativeProcess(), this.context.getTemporaryDirectory());
this.connection = this.connectionProvider.connect();
+ if (null == this.connection) {
+ throw new IllegalStateException("Failed to create connection, resource may be down");
+ }
this.connection.loadSynchronous(false);
}
}
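
Reviewer note on the logging change: the first hunk above keeps the failure at WARN level but attaches the stack trace only when debug logging is enabled. A minimal standalone sketch of that pattern follows. It uses `java.util.logging` so it runs without extra jars (the plugin itself uses a commons-logging style `log`), treats `Level.FINE` as the stand-in for "debug", and all class, method, and resource names are hypothetical.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration only; not the plugin's code.
class QuietFailureLoggingSketch {
    static final Logger LOG = Logger.getLogger(QuietFailureLoggingSketch.class.getName());

    // Logs a connection failure at WARNING level. The throwable (and so the
    // stack trace) is attached only when FINE ("debug") logging is enabled;
    // otherwise only the exception message is appended. Returns the message text.
    static String logConnectFailure(String resourceType, String resourceKey, Exception e) {
        String prefix = "Failed to connect to " + resourceType + "[" + resourceKey + "]";
        if (LOG.isLoggable(Level.FINE)) {
            String msg = prefix + ".";
            LOG.log(Level.WARNING, msg, e);
            return msg;
        }
        String msg = prefix + ": " + e.getMessage();
        LOG.warning(msg);
        return msg;
    }

    public static void main(String[] args) {
        Exception failure = new IOException("connection refused");

        LOG.setLevel(Level.INFO); // debug off: one-line warning, no stack trace
        logConnectFailure("JMX Server", "localhost:9999", failure);

        LOG.setLevel(Level.FINE); // debug on: same warning, stack trace attached
        logConnectFailure("JMX Server", "localhost:9999", failure);
    }
}
```

The trade-off is that a production log stays compact when a monitored JVM is simply down, while the full trace remains one log-level change away.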
diff --git a/modules/plugins/rhq-agent/src/main/resources/META-INF/rhq-plugin.xml b/modules/plugins/rhq-agent/src/main/resources/META-INF/rhq-plugin.xml
index a544775..641fbee 100644
--- a/modules/plugins/rhq-agent/src/main/resources/META-INF/rhq-plugin.xml
+++ b/modules/plugins/rhq-agent/src/main/resources/META-INF/rhq-plugin.xml
@@ -315,6 +315,7 @@
<c:simple-property name="rhq.agent.plugins.service-discovery.period-secs" type="integer" units="seconds" activationPolicy="restart" required="false" default="86400" displayName="Service Discovery Period" description="Time between service discoveries (in seconds)" />
<c:simple-property name="rhq.agent.plugins.availability-scan.initial-delay-secs" type="integer" units="seconds" activationPolicy="restart" required="false" default="30" displayName="Availability Scan Initial Delay" description="Startup delay before the first availability scan is run (in seconds)" />
<c:simple-property name="rhq.agent.plugins.availability-scan.period-secs" type="integer" units="seconds" activationPolicy="restart" required="false" default="300" displayName="Availability Scan Period" description="Time between availability scans (in seconds)" />
+ <c:simple-property name="rhq.agent.plugins.availability-scan.threadpool-size" type="integer" activationPolicy="restart" required="false" default="100" displayName="Availability Scan ThreadPool Size" description="Number of concurrent threads that scan for resource availabilities" />
<c:simple-property name="rhq.agent.plugins.measurement-collection.initial-delay-secs" type="integer" units="seconds" activationPolicy="restart" required="false" default="30" displayName="Measurement Collection Initial Delay" description="Startup delay before the first measurement collection is run (in seconds)" />
<c:simple-property name="rhq.agent.plugins.measurement-collection.threadpool-size" type="integer" activationPolicy="restart" required="false" default="5" displayName="Measurement Collection Threadpool Size" description="Number of concurrent measurement collections that can be run" />
<c:simple-property name="rhq.agent.plugins.drift-detection.initial-delay-secs" type="integer" units="seconds" activationPolicy="restart" required="false" default="30" displayName="Drift Detection Initial Delay" description="Startup delay before the first drift detection scan is run (in seconds)" />
diff --git a/modules/plugins/tomcat/src/main/java/org/jboss/on/plugins/tomcat/TomcatServerComponent.java b/modules/plugins/tomcat/src/main/java/org/jboss/on/plugins/tomcat/TomcatServerComponent.java
index 9838c14..64c531e 100644
--- a/modules/plugins/tomcat/src/main/java/org/jboss/on/plugins/tomcat/TomcatServerComponent.java
+++ b/modules/plugins/tomcat/src/main/java/org/jboss/on/plugins/tomcat/TomcatServerComponent.java
@@ -221,12 +221,16 @@ public class TomcatServerComponent<T extends ResourceComponent<?>> implements JM
connectionSettings.getControlProperties().setProperty(ConnectionFactory.JAR_TEMP_DIR,
tempDir.getAbsolutePath());
- log.info("Loading connection [" + connectionSettings.getServerUrl() + "] with install path ["
- + connectionSettings.getLibraryURI() + "] and temp directory [" + tempDir.getAbsolutePath()
- + "]");
+ if (log.isDebugEnabled()) {
+ log.debug("Loading connection [" + connectionSettings.getServerUrl() + "] with install path ["
+ + connectionSettings.getLibraryURI() + "] and temp directory [" + tempDir.getAbsolutePath()
+ + "]");
+ }
} else {
- log.info("Loading connection [" + connectionSettings.getServerUrl()
- + "] ignoring remote install path [" + catalinaHome + "]");
+ if (log.isDebugEnabled()) {
+ log.debug("Loading connection [" + connectionSettings.getServerUrl()
+ + "] ignoring remote install path [" + catalinaHome + "]");
+ }
}
ConnectionProvider connectionProvider = connectionFactory.getConnectionProvider(connectionSettings);
@@ -425,7 +429,8 @@ public class TomcatServerComponent<T extends ResourceComponent<?>> implements JM
return scriptFile;
}
- private File resolvePathRelativeToHomeDir(@NotNull String path) {
+ private File resolvePathRelativeToHomeDir(@NotNull
+ String path) {
return resolvePathRelativeToHomeDir(this.resourceContext.getPluginConfiguration(), path);
}