[rhq-project/rhq] f64d2a: Bug 1032039 - JON server cannot restart itself
by Libor Zoubek
Branch: refs/heads/release/jon3.3.x
Home: https://github.com/rhq-project/rhq
Commit: f64d2a135b9f4dd6584a97c7601543ea4affe42d
https://github.com/rhq-project/rhq/commit/f64d2a135b9f4dd6584a97c7601543e...
Author: Libor Zoubek <lzoubek(a)redhat.com>
Date: 2015-03-21 (Sat, 21 Mar 2015)
Changed paths:
M modules/plugins/rhq-server/src/main/java/org/rhq/plugins/server/DiscoveryCallbackImpl.java
M modules/plugins/rhq-server/src/main/java/org/rhq/plugins/server/ResourceUpgradeCallbackImpl.java
Log Message:
-----------
Bug 1032039 - JON server cannot restart itself
The fix implements DiscoveryCallback and ResourceUpgradeCallback in the
rhq-server plugin and sets proper start script parameters (uses rhqctl) for
RHQ Server resources.
When a non-HA server is restarted via the RHQ Server scheduled operation (that
is, the agent attempting the restart also reports to that same server), the
restart operation will not be reported as successful, even though the server
was in fact restarted, and it will time out. This happens because when the
agent tries to deliver the operation result, the server immediately goes down
(as a consequence of the restart operation), and once it comes back up the
agent is no longer able to resend the operation result data (this may be a
separate bug/edge case).
(cherry picked from commit a6beed4cb9b976c50ed944e9c590215e2dca1164)
Signed-off-by: Libor Zoubek <lzoubek(a)redhat.com>
[rhq-project/rhq] 883e17: [BZ 185375] speed up data migration step
by John Sanda
Branch: refs/heads/RHQ_4_12_0_JON331GA/bug/1198845
Home: https://github.com/rhq-project/rhq
Commit: 883e171de27a36e29aa271e071b244407163bdeb
https://github.com/rhq-project/rhq/commit/883e171de27a36e29aa271e071b2444...
Author: John Sanda <jsanda(a)redhat.com>
Date: 2015-03-18 (Wed, 18 Mar 2015)
Changed paths:
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/AbortedException.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/DateUtils.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/KeyScanner.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateAggregateMetrics.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/RateMonitor.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/Replace412Index.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/ReplaceIndex.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/ReplaceRHQ411Index.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/RetryWrite.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/SchemaUpdateThreadFactory.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/TaskTracker.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/exception/KeyScanException.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/migration/BatchInsertFuture.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/migration/BatchResult.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/migration/FailedBatch.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/migration/QueryExecutor.java
M modules/common/cassandra-schema/src/test/java/org/rhq/cassandra/schema/MigrateAggregateMetricsTest.java
M modules/common/cassandra-schema/src/test/java/org/rhq/cassandra/schema/ReplaceIndexTest.java
M modules/common/cassandra-schema/src/test/java/org/rhq/cassandra/schema/SchemaUpgradeTest.java
Log Message:
-----------
[BZ 185375] speed up data migration step
Writes are now done in batches. Reads are done in parallel and concurrently in
batches.
[BZ 185375] refactor date functions into util class
[BZ 185375] lots of clean up, refactoring, and perf improvements
The TaskTracker class has been copied into the source tree from the
service-metrics Maven module. The class was copied because the
changes for this BZ are going into a patch release, and I want to minimize the
number of modules that have to be touched for this.
We no longer query the rdbms to load schedule ids. Instead we use the legacy
thrift APIs via the hector library to scan for schedule ids. This way, we query
for only those schedule ids that have data.
Reads are now throttled using a RateLimiter. This is better than using a
semaphore for a couple of reasons. First, we do not have to worry about
releasing permits, which had to be done in a couple of different places in the
code. Second, it is more expressive about the throughput. With a semaphore, it
was hard to determine the read throughput, whereas with RateLimiter it is
directly specified as queries per second.
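A minimal sketch of the rate-limiting idea described above, for illustration only. The commit does not show which RateLimiter class it uses (Guava ships one with that name); to stay dependency-free, the code below uses a hypothetical stand-in called SimpleRateLimiter. The class name, its methods, and the even-spacing policy are all assumptions, not code from the commit.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a rate limiter; NOT the class used in the commit.
// Permits are spaced evenly, so throughput is stated directly as permits per second
// and there is nothing to release afterwards (unlike a semaphore).
final class SimpleRateLimiter {
    private final long intervalNanos; // spacing between two consecutive permits
    private long nextFreeNanos;       // earliest time the next permit may be granted
    private boolean started;

    SimpleRateLimiter(double permitsPerSecond) {
        if (permitsPerSecond <= 0) {
            throw new IllegalArgumentException("permitsPerSecond must be positive");
        }
        this.intervalNanos = (long) (TimeUnit.SECONDS.toNanos(1) / permitsPerSecond);
    }

    // Reserves the next permit and returns how long the caller should wait (nanoseconds).
    synchronized long reserve(long nowNanos) {
        // After an idle period, start counting from "now" instead of bursting.
        if (!started || nowNanos - nextFreeNanos > 0) {
            nextFreeNanos = nowNanos;
            started = true;
        }
        long wait = nextFreeNanos - nowNanos;
        nextFreeNanos += intervalNanos;
        return wait;
    }

    // Blocks until a permit is available.
    void acquire() throws InterruptedException {
        long wait = reserve(System.nanoTime());
        if (wait > 0) {
            TimeUnit.NANOSECONDS.sleep(wait);
        }
    }
}
```

A reader thread would call acquire() before issuing each query, for example with new SimpleRateLimiter(200.0) for roughly 200 reads per second. The figure 200 is only an example value.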
The migration code previously tried to abort processing if a write failed. This
made the code overly complicated. Since we are now using batches, we only do a
handful of writes per schedule id. We now just have the failure detection and
handling code in the migration finished callback. This simplifies things a lot.
[BZ 185375] add failure detection
There is now a failure threshold that, if exceeded, will cause the migration to
be aborted. The rationale is that the read/write rates will likely have to be
adjusted for each environment. If we are reading too fast, for example,
and start generating lots of failures, it makes sense to go ahead and
terminate the migration and restart with a lower read rate.
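The failure threshold could be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: FailureTracker and MigrationAbortedException are hypothetical names (the commit adds its own AbortedException, whose contents are not shown here), and "exceeded" is read as "strictly more failures than the threshold".

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical exception type used to stop the migration.
final class MigrationAbortedException extends RuntimeException {
    MigrationAbortedException(String message) {
        super(message);
    }
}

// Hypothetical helper: counts failures and aborts once the threshold is exceeded.
final class FailureTracker {
    private final int threshold;
    private final AtomicInteger failures = new AtomicInteger();

    FailureTracker(int threshold) {
        this.threshold = threshold;
    }

    // Called from read/write failure callbacks; safe to call from several threads.
    void recordFailure() {
        int count = failures.incrementAndGet();
        if (count > threshold) {
            throw new MigrationAbortedException(
                "Aborting migration: " + count + " failures exceed threshold of " + threshold);
        }
    }

    int failureCount() {
        return failures.get();
    }
}
```

The caller would catch the exception, shut the migration down, and let the operator restart it with a lower read rate.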
[BZ 185375] remove key scanning code with hector
I went back and did some testing with the original approach of querying against
all schedule ids that are returned from the RDBMS. I was able to achieve the
same performance (if not better) as I did with hector. I do not see any reason
therefore to pull in additional dependencies.
Conflicts:
modules/common/cassandra-schema/pom.xml
[BZ 185375] load schedule ids using astyanax
We do not have a good way in C* 1.2 to scan for schedule ids. Instead I am
using Netflix's Astyanax library, which uses the legacy Thrift API. It is
using the get_range_slices operation. This should hopefully be faster than
querying every single schedule id returned from the rdbms.
Conflicts:
modules/common/cassandra-schema/pom.xml
[BZ 185375] initial support for dynamic throttling
[BZ 185375] retry data migrations on failures
[BZ 185375] use async writes and dynamic throttling for index update
[BZ 185375] fix regression in MigrateData
Add back check to execute statements when there is not a full batch. Also
updating RateMonitor to include minimum rates.
[BZ 185375] adding jboss modules for astyanax
Conflicts:
modules/common/cassandra-schema/pom.xml
[BZ 185375] fix astyanax module dependencies
[BZ 185375] fix transitive dependency conflicts
[BZ 185375] fix transitive dependency conflicts, take 2
[BZ 185375] still trying to fix compile error on jenkins which I no longer see locally
[BZ 185375] put cassandra-schema in provided scope to get past build error
I think the build error might be more of a maven issue. It occurs with
maven 3.0.4, but it does not occur when I run maven 3.2.1. I did not check
other versions. Anyhow, this change seems to have resolved the error.
[BZ 185375] adding another astyanax dependency to make available for container build
[BZ 185375] simplify queries and TTL calculations
There is no need to include the schedule id, write time, and ttl in the query
results. We already know the schedule id, and we can compute the ttl.
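One way the TTL could be computed without reading it back from the query results, shown as a sketch only. The commit message does not say how the TTL is derived; the code below assumes it is the table's retention period minus the age of the data point. TtlCalculator, its method, and the parameter names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper; the derivation (retention minus age) is an assumption.
final class TtlCalculator {
    // Returns the remaining time to live in seconds, or 0 if the data has already expired.
    static int remainingTtlSeconds(Instant dataTimestamp, Duration retention, Instant now) {
        Duration age = Duration.between(dataTimestamp, now);
        long remaining = retention.minus(age).getSeconds();
        // Clamp to the non-negative int range before narrowing.
        return (int) Math.max(0L, Math.min(remaining, (long) Integer.MAX_VALUE));
    }
}
```

For example, a data point that is 10 days old in a table with a 14-day retention would be rewritten with a TTL of 4 days.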
[BZ 185375] scan for keys using cql instead of astyanax
I am abandoning the use of Astyanax since it requires Cassandra's rpc server to
be running. We turn the rpc server off by default. It can be turned on via jmx,
but we restrict jmx access to localhost. This means users would have to
manually enable the thrift server for each storage node for the upgrade. We
cannot impose that kind of burden.
[BZ 185375] remove dependency on Astyanax libraries
[BZ 185375] more clean up of astyanax dependency removal
[BZ 185375] make sure we shut down key scanner in event of failure
[BZ 185375] retry failed batches individually
Previously, when a request failed, whether on the data read or on one or
more of the batch inserts, we would resubmit all of the requests. We now retry
failed batches individually, which should be more efficient since we are not
resubmitting requests that have succeeded.
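The difference between resubmitting everything and retrying only the failed batches can be sketched as below. This is a simplified, synchronous illustration; the commit works with asynchronous writes and its own classes (for example FailedBatch), none of which are reproduced here. BatchRetrier, writeAll, and the boolean-returning writer are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper: only batches that failed are submitted again.
final class BatchRetrier {
    // writer returns true when a batch was written successfully.
    // Returns the batches that still failed after maxAttempts rounds.
    static <B> List<B> writeAll(List<B> batches, Predicate<B> writer, int maxAttempts) {
        List<B> pending = new ArrayList<>(batches);
        for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
            List<B> failed = new ArrayList<>();
            for (B batch : pending) {
                if (!writer.test(batch)) {
                    failed.add(batch); // keep only this batch for the next round
                }
            }
            pending = failed; // batches that succeeded are never resubmitted
        }
        return pending;
    }
}
```

With three batches where only the second fails once, the writer is invoked four times in total rather than six.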
Conflicts:
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateAggregateMetrics.java
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateData.java
[BZ 185375] adding changes from merge conflicts from previous commit
[BZ 185375] keep read rate at a fixed factor of the write rate
Conflicts:
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/RateMonitor.java
[BZ 185375] adding some debug code to figure out blocking issue
[BZ 185375] Try different approach for retrying writes to avoid indefinitely blocking
[BZ 185375] another attempt to retry failed writes
[BZ 185375] add logic to detect and handle indefinite blocking
[BZ 185375] Do not use FutureFallback and do not decrement counts when retrying writes
Decrementing remaining metric counts when retrying writes results in negative
counts because I was decrementing once for each retried batch, but it should
be once for all the batches belonging to a particular metric. I decided to
simply remove the decrement call since retrying the failed batches should
be fast.
[BZ 185375] allow for schema propagation of table creation before doing any work
[BZ 185375] don't write until we finish reading
Conflicts:
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateAggregateMetrics.java
[BZ 185375] store rows on failed writes since the result set will be exhausted
Conflicts:
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateAggregateMetrics.java
[BZ 185375] decrement count when metric has already been migrated
[BZ 185375] clean up and remove unused code
Conflicts:
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateAggregateMetrics.java
[BZ 185375] MigrateData is no longer used
Conflicts:
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateData.java
[BZ 185375] clean up merge conflicts
[BZ 185375] add null check on shutdown
This check was here but lost in some of the prior merge conflicts.
[rhq-project/rhq] b76f9f: [BZ 185375] speed up data migration step
by John Sanda
Branch: refs/heads/release/jon3.3.x
Home: https://github.com/rhq-project/rhq
Commit: b76f9f35a754b2c77fbb9af68af83a31b5838123
https://github.com/rhq-project/rhq/commit/b76f9f35a754b2c77fbb9af68af83a3...
Author: John Sanda <jsanda(a)redhat.com>
Date: 2015-03-16 (Mon, 16 Mar 2015)
Changed paths:
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/AbortedException.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/DateUtils.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/KeyScanner.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateAggregateMetrics.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/RateMonitor.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/Replace412Index.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/ReplaceIndex.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/ReplaceRHQ411Index.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/RetryWrite.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/SchemaUpdateThreadFactory.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/TaskTracker.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/exception/KeyScanException.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/migration/BatchInsertFuture.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/migration/BatchResult.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/migration/FailedBatch.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/migration/QueryExecutor.java
M modules/common/cassandra-schema/src/test/java/org/rhq/cassandra/schema/MigrateAggregateMetricsTest.java
M modules/common/cassandra-schema/src/test/java/org/rhq/cassandra/schema/ReplaceIndexTest.java
M modules/common/cassandra-schema/src/test/java/org/rhq/cassandra/schema/SchemaUpgradeTest.java
Log Message:
-----------
[BZ 185375] speed up data migration step
Writes are now done in batches. Reads are done in parallel and concurrently in
batches.
[BZ 185375] refactor date functions into util class
[BZ 185375] lots of clean up, refactoring, and perf improvements
The TaskTracker class has been copied into the source tree from the
service-metrics Maven module. The class was copied because the
changes for this BZ are going into a patch release, and I want to minimize the
number of modules that have to be touched for this.
We no longer query the rdbms to load schedule ids. Instead we use the legacy
thrift APIs via the hector library to scan for schedule ids. This way, we query
for only those schedule ids that have data.
Reads are now throttled using a RateLimiter. This is better than using a
semaphore for a couple of reasons. First, we do not have to worry about
releasing permits, which had to be done in a couple of different places in the
code. Second, it is more expressive about the throughput. With a semaphore, it
was hard to determine the read throughput, whereas with RateLimiter it is
directly specified as queries per second.
The migration code previously tried to abort processing if a write failed. This
made the code overly complicated. Since we are now using batches, we only do a
handful of writes per schedule id. We now just have the failure detection and
handling code in the migration finished callback. This simplifies things a lot.
[BZ 185375] add failure detection
There is now a failure threshold that, if exceeded, will cause the migration to
be aborted. The rationale is that the read/write rates will likely have to be
adjusted for each environment. If we are reading too fast, for example,
and start generating lots of failures, it makes sense to go ahead and
terminate the migration and restart with a lower read rate.
[BZ 185375] remove key scanning code with hector
I went back and did some testing with the original approach of querying against
all schedule ids that are returned from the RDBMS. I was able to achieve the
same performance (if not better) as I did with hector. I do not see any reason
therefore to pull in additional dependencies.
Conflicts:
modules/common/cassandra-schema/pom.xml
[BZ 185375] load schedule ids using astyanax
We do not have a good way in C* 1.2 to scan for schedule ids. Instead I am
using Netflix's Astyanax library, which uses the legacy Thrift API. It is
using the get_range_slices operation. This should hopefully be faster than
querying every single schedule id returned from the rdbms.
Conflicts:
modules/common/cassandra-schema/pom.xml
[BZ 185375] initial support for dynamic throttling
[BZ 185375] retry data migrations on failures
[BZ 185375] use async writes and dynamic throttling for index update
[BZ 185375] fix regression in MigrateData
Add back check to execute statements when there is not a full batch. Also
updating RateMonitor to include minimum rates.
[BZ 185375] adding jboss modules for astyanax
Conflicts:
modules/common/cassandra-schema/pom.xml
[BZ 185375] fix astyanax module dependencies
[BZ 185375] fix transitive dependency conflicts
[BZ 185375] fix transitive dependency conflicts, take 2
[BZ 185375] still trying to fix compile error on jenkins which I no longer see locally
[BZ 185375] put cassandra-schema in provided scope to get past build error
I think the build error might be more of a maven issue. It occurs with
maven 3.0.4, but it does not occur when I run maven 3.2.1. I did not check
other versions. Anyhow, this change seems to have resolved the error.
[BZ 185375] adding another astyanax dependency to make available for container build
[BZ 185375] simplify queries and TTL calculations
There is no need to include the schedule id, write time, and ttl in the query
results. We already know the schedule id, and we can compute the ttl.
[BZ 185375] scan for keys using cql instead of astyanax
I am abandoning the use of Astyanax since it requires Cassandra's rpc server to
be running. We turn the rpc server off by default. It can be turned on via jmx,
but we restrict jmx access to localhost. This means users would have to
manually enable the thrift server for each storage node for the upgrade. We
cannot impose that kind of burden.
[BZ 185375] remove dependency on Astyanax libraries
[BZ 185375] more clean up of astyanax dependency removal
[BZ 185375] make sure we shut down key scanner in event of failure
[BZ 185375] retry failed batches individually
Previously, when a request failed, whether on the data read or on one or
more of the batch inserts, we would resubmit all of the requests. We now retry
failed batches individually, which should be more efficient since we are not
resubmitting requests that have succeeded.
Conflicts:
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateAggregateMetrics.java
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateData.java
[BZ 185375] adding changes from merge conflicts from previous commit
[BZ 185375] keep read rate at a fixed factor of the write rate
Conflicts:
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/RateMonitor.java
[BZ 185375] adding some debug code to figure out blocking issue
[BZ 185375] Try different approach for retrying writes to avoid indefinitely blocking
[BZ 185375] another attempt to retry failed writes
[BZ 185375] add logic to detect and handle indefinite blocking
[BZ 185375] Do not use FutureFallback and do not decrement counts when retrying writes
Decrementing remaining metric counts when retrying writes results in negative
counts because I was decrementing once for each retried batch, but it should
be once for all the batches belonging to a particular metric. I decided to
simply remove the decrement call since retrying the failed batches should
be fast.
[BZ 185375] allow for schema propagation of table creation before doing any work
[BZ 185375] don't write until we finish reading
Conflicts:
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateAggregateMetrics.java
[BZ 185375] store rows on failed writes since the result set will be exhausted
Conflicts:
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateAggregateMetrics.java
[BZ 185375] decrement count when metric has already been migrated
[BZ 185375] clean up and remove unused code
Conflicts:
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateAggregateMetrics.java
[BZ 185375] MigrateData is no longer used
Conflicts:
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateData.java
[BZ 185375] clean up merge conflicts
[BZ 185375] add null check on shutdown
This check was here but lost in some of the prior merge conflicts.
[rhq-project/rhq] d12964: [BZ 185375] adding some debug code to figure out b...
by John Sanda
Branch: refs/heads/master
Home: https://github.com/rhq-project/rhq
Commit: d129647e3a6f74df17607747c133f4662d87a8f2
https://github.com/rhq-project/rhq/commit/d129647e3a6f74df17607747c133f46...
Author: John Sanda <jsanda(a)redhat.com>
Date: 2015-03-14 (Sat, 14 Mar 2015)
Changed paths:
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateAggregateMetrics.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/SchemaUpdateThreadFactory.java
Log Message:
-----------
[BZ 185375] adding some debug code to figure out blocking issue
Commit: acced41a909dcc97fb6f82a1ac00b9371e97850f
https://github.com/rhq-project/rhq/commit/acced41a909dcc97fb6f82a1ac00b93...
Author: John Sanda <jsanda(a)redhat.com>
Date: 2015-03-14 (Sat, 14 Mar 2015)
Changed paths:
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateAggregateMetrics.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateData.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/migration/BatchInsertFuture.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/migration/BatchResult.java
Log Message:
-----------
[BZ 185375] Try different approach for retrying writes to avoid indefinitely blocking
Commit: c3f5ad5c7c8d0238b120382b98806c0d2ed45403
https://github.com/rhq-project/rhq/commit/c3f5ad5c7c8d0238b120382b98806c0...
Author: John Sanda <jsanda(a)redhat.com>
Date: 2015-03-14 (Sat, 14 Mar 2015)
Changed paths:
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateAggregateMetrics.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateData.java
Log Message:
-----------
[BZ 185375] another attempt to retry failed writes
Commit: 66a6f753fbf874d2b8d9391783dd3d88bb0cb55f
https://github.com/rhq-project/rhq/commit/66a6f753fbf874d2b8d9391783dd3d8...
Author: John Sanda <jsanda(a)redhat.com>
Date: 2015-03-14 (Sat, 14 Mar 2015)
Changed paths:
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateAggregateMetrics.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/ReplaceRHQ411Index.java
Log Message:
-----------
[BZ 185375] add logic to detect and handle indefinite blocking
Commit: bca72eba024c97beb2fb9f78a243f44bf15b5e01
https://github.com/rhq-project/rhq/commit/bca72eba024c97beb2fb9f78a243f44...
Author: John Sanda <jsanda(a)redhat.com>
Date: 2015-03-14 (Sat, 14 Mar 2015)
Changed paths:
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateAggregateMetrics.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateData.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/migration/BatchInsertFuture.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/migration/FailedBatch.java
Log Message:
-----------
[BZ 185375] Do not use FutureFallback and do not decrement counts when retrying writes
Decrementing remaining metric counts when retrying writes results in negative
counts because I was decrementing once for each retried batch, but it should
be once for all the batches belonging to a particular metric. I decided to
simply remove the decrement call since retrying the failed batches should
be fast.
Commit: 3de1acc443acd7ed0cf0e858efa523d880f169fb
https://github.com/rhq-project/rhq/commit/3de1acc443acd7ed0cf0e858efa523d...
Author: John Sanda <jsanda(a)redhat.com>
Date: 2015-03-14 (Sat, 14 Mar 2015)
Changed paths:
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/ReplaceRHQ411Index.java
Log Message:
-----------
[BZ 185375] allow for schema propagation of table creation before doing any work
Compare: https://github.com/rhq-project/rhq/compare/4c30284446bd...3de1acc443ac