Branch: refs/heads/release/jon3.3.x
Home: https://github.com/rhq-project/rhq
Commit: b76f9f35a754b2c77fbb9af68af83a31b5838123
https://github.com/rhq-project/rhq/commit/b76f9f35a754b2c77fbb9af68af83a3...
Author: John Sanda <jsanda(a)redhat.com>
Date: 2015-03-16 (Mon, 16 Mar 2015)
Changed paths:
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/AbortedException.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/DateUtils.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/KeyScanner.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateAggregateMetrics.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/RateMonitor.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/Replace412Index.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/ReplaceIndex.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/ReplaceRHQ411Index.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/RetryWrite.java
M modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/SchemaUpdateThreadFactory.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/TaskTracker.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/exception/KeyScanException.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/migration/BatchInsertFuture.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/migration/BatchResult.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/migration/FailedBatch.java
A modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/migration/QueryExecutor.java
M modules/common/cassandra-schema/src/test/java/org/rhq/cassandra/schema/MigrateAggregateMetricsTest.java
M modules/common/cassandra-schema/src/test/java/org/rhq/cassandra/schema/ReplaceIndexTest.java
M modules/common/cassandra-schema/src/test/java/org/rhq/cassandra/schema/SchemaUpgradeTest.java
Log Message:
-----------
[BZ 185375] speed up data migration step
Writes are now done in batches. Reads are done concurrently, also in batches.
[BZ 185375] refactor date functions into util class
[BZ 185375] lots of clean up, refactoring, and perf improvements
The TaskTracker class has been copied into the source tree from the
service-metrics maven module. The class was copied because the changes for
this BZ are going into a patch release, and I want to minimize the number of
modules that have to be touched for this.
We no longer query the rdbms to load schedule ids. Instead we use the legacy
thrift APIs via the hector library to scan for schedule ids. This way, we query
for only those schedule ids that have data.
Reads are now throttled using a RateLimiter. This is better than using a
semaphore for a couple of reasons. First, we do not have to worry about
releasing permits, which had to be done in a couple of different places in the
code. Second, it is more expressive about the throughput. With a semaphore, it
was hard to determine the read throughput, whereas with RateLimiter it is
directly specified as queries per second.
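To illustrate the point about permits, here is a minimal JDK-only sketch of a
rate limiter. It is not Guava's RateLimiter and not the code in this commit.
The class name SimpleRateLimiter and the injectable clock are assumptions made
for illustration. The caller states a rate in permits per second and never
releases anything.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a rate limiter; illustration only. */
final class SimpleRateLimiter {
    private final long intervalNanos;  // minimum spacing between permits
    private final LongSupplier clock;  // nanosecond clock, injectable for tests
    private long nextFreeNanos;        // earliest time the next permit may start

    SimpleRateLimiter(double permitsPerSecond, LongSupplier clock) {
        if (permitsPerSecond <= 0) {
            throw new IllegalArgumentException("permitsPerSecond must be positive");
        }
        this.intervalNanos = (long) (1_000_000_000L / permitsPerSecond);
        this.clock = clock;
        this.nextFreeNanos = clock.getAsLong();
    }

    /** Reserves the next permit and returns how long the caller must wait, in nanoseconds. */
    synchronized long reserve() {
        long now = clock.getAsLong();
        long start = Math.max(now, nextFreeNanos);
        nextFreeNanos = start + intervalNanos;
        return start - now;
    }

    /** Blocks until a permit is available. There is no matching release call. */
    void acquire() throws InterruptedException {
        long waitNanos = reserve();
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
    }
}
```

A reader thread would call acquire() before each query, for example with
new SimpleRateLimiter(25.0, System::nanoTime) for 25 queries per second (the
value 25 is made up).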
The migration code previously tried to abort processing if a write failed. This
made the code overly complicated. Since we are now using batches, we only do a
handful of writes per schedule id. We now just have the failure detection and
handling code in the migration finished callback. This simplifies things a lot.
[BZ 185375] add failure detection
There is now a failure threshold that, if exceeded, will cause the migration to
be aborted. The rationale is that the read/write rates will likely have to be
adjusted for each environment. If we are reading too fast, for example, and
start generating lots of failures, it makes sense to go ahead and terminate the
migration and restart with a lower read rate.
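A failure threshold of this kind can be sketched with a shared counter. The
class name FailureThreshold and its methods are hypothetical and are not taken
from this commit. The idea is only that every failed request bumps a counter
and the caller aborts once the count passes a configured limit.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical failure counter; illustration only. */
final class FailureThreshold {
    private final int maxFailures;
    private final AtomicInteger failures = new AtomicInteger();

    FailureThreshold(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    /** Records one failure and returns true once the threshold has been exceeded. */
    boolean recordFailure() {
        return failures.incrementAndGet() > maxFailures;
    }

    /** Lets other threads check whether they should stop submitting work. */
    boolean isExceeded() {
        return failures.get() > maxFailures;
    }
}
```

A failure callback would call recordFailure() and, when it returns true, stop
the migration so it can be restarted with a lower read rate.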
[BZ 185375] remove key scanning code with hector
I went back and did some testing with the original approach of querying against
all schedule ids that are returned from the RDBMS. I was able to achieve the
same performance (if not better) as I did with hector. I do not see any reason
therefore to pull in additional dependencies.
Conflicts:
modules/common/cassandra-schema/pom.xml
[BZ 185375] load schedule ids using astyanax
We do not have a good way in C* 1.2 to scan for schedule ids. Instead I am
using netflix's astyanax library which uses the legacy thrift api. It is
using the get_range_slices operation. This should hopefully be faster than
querying every single schedule id returned from the rdbms.
Conflicts:
modules/common/cassandra-schema/pom.xml
[BZ 185375] initial support for dynamic throttling
[BZ 185375] retry data migrations on failures
[BZ 185375] use async writes and dynamic throttling for index update
[BZ 185375] fix regression in MigrateData
Add back check to execute statements when there is not a full batch. Also
updating RateMonitor to include minimum rates.
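The partial-batch check can be illustrated with a small partitioning helper.
This is a sketch under assumptions, not the code from MigrateData. The class
name Batcher and the method partition are hypothetical. The final isEmpty
check is what keeps the trailing rows from being dropped when the row count is
not a multiple of the batch size.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical batching helper; illustration only. */
final class Batcher {
    /** Splits items into batches of at most batchSize, including a trailing partial batch. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        List<T> current = new ArrayList<>(batchSize);
        for (T item : items) {
            current.add(item);
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>(batchSize);
            }
        }
        // Without this check the leftover rows would never be written.
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```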
[BZ 185375] adding jboss modules for astyanax
Conflicts:
modules/common/cassandra-schema/pom.xml
[BZ 185375] fix astyanax module dependencies
[BZ 185375] fix transitive dependency conflicts
[BZ 185375] fix transitive dependency conflicts, take 2
[BZ 185375] still trying to fix compile error on jenkins which I no longer see locally
[BZ 185375] put cassandra-schema in provided scope to get past build error
I think the build error might be more of a maven issue. It occurs with
maven 3.0.4, but it does not occur when I run maven 3.2.1. I did not check
other versions. Anyhow, this change seems to have resolved the error.
[BZ 185375] adding another astyanax dependency to make available for container build
[BZ 185375] simplify queries and TTL calculations
There is no need to include the schedule id, write time, and ttl in the query
results. We already know the schedule id, and we can compute the ttl.
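One way to compute the ttl without querying it is to subtract the age of the
data from the retention period. The sketch below assumes the age is derived
from the timestamp of the data point itself. The class name TtlCalculator and
the retention value used in the usage note are hypothetical.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical TTL helper; illustration only. */
final class TtlCalculator {
    /**
     * Returns the remaining TTL in seconds, or 0 if the data is already past
     * its retention period. Callers should skip rows that return 0 rather
     * than write them, since a TTL of 0 in CQL means the data never expires.
     */
    static int remainingTtlSeconds(long dataTimestampMillis, long nowMillis, long retentionSeconds) {
        long ageSeconds = TimeUnit.MILLISECONDS.toSeconds(nowMillis - dataTimestampMillis);
        long remaining = retentionSeconds - ageSeconds;
        return (int) Math.max(0L, Math.min(remaining, (long) Integer.MAX_VALUE));
    }
}
```

For a data point that is 3 days old with an assumed 14 day retention, the
result is 11 days expressed in seconds.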
[BZ 185375] scan for keys using cql instead of astyanax
I am abandoning the use of Astyanax since it requires Cassandra's rpc server to
be running. We turn the rpc server off by default. It can be turned on via jmx,
but we restrict jmx access to localhost. This means users would have to
manually enable the thrift server for each storage node for the upgrade. We
cannot impose that kind of burden.
[BZ 185375] remove dependency on Astyanax libraries
[BZ 185375] more clean up of astyanax dependency removal
[BZ 185375] make sure we shut down key scanner in event of failure
[BZ 185375] retry failed batches individually
Previously, when a request failed, whether on the data read or on one or more
of the batch inserts, we would resubmit all of the requests. We now retry
failed batches individually, which should be more efficient since we are not
resubmitting requests that have succeeded.
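The difference can be sketched with a synchronous loop that keeps only the
failed batches for the next round. This is a simplified illustration and not
the asynchronous code in this commit. The class name BatchRetry, the method
writeWithRetry, and the Predicate-based writer are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical retry helper; illustration only. */
final class BatchRetry {
    /**
     * Writes every batch once, then retries only the batches that failed, for
     * up to maxRetries additional rounds. The writer returns true on success.
     * Returns the batches that are still failing after the last round.
     */
    static <T> List<T> writeWithRetry(List<T> batches, Predicate<T> writer, int maxRetries) {
        List<T> pending = new ArrayList<>(batches);
        for (int attempt = 0; attempt <= maxRetries && !pending.isEmpty(); attempt++) {
            List<T> failed = new ArrayList<>();
            for (T batch : pending) {
                if (!writer.test(batch)) {
                    failed.add(batch);
                }
            }
            // Batches that succeeded are never resubmitted.
            pending = failed;
        }
        return pending;
    }
}
```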
Conflicts:
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateAggregateMetrics.java
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateData.java
[BZ 185375] adding changes from merge conflicts from previous commit
[BZ 185375] keep read rate at a fixed factor of the write rate
Conflicts:
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/RateMonitor.java
[BZ 185375] adding some debug code to figure out blocking issue
[BZ 185375] Try different approach for retrying writes to avoid indefinitely blocking
[BZ 185375] another attempt to retry failed writes
[BZ 185375] add logic to detect and handle indefinite blocking
[BZ 185375] Do not use FutureFallback and do not decrement counts when retrying writes
Decrementing remaining metric counts when retrying writes resulted in negative
counts because I was decrementing once for each retried batch, when it should
be once for all the batches belonging to a particular metric. I decided to
simply remove the decrement call since retrying the failed batches should
be fast.
[BZ 185375] allow for schema propagation of table creation before doing any work
[BZ 185375] don't write until we finish reading
Conflicts:
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateAggregateMetrics.java
[BZ 185375] store rows on failed writes since the result set will be exhausted
Conflicts:
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateAggregateMetrics.java
[BZ 185375] decrement count when metric has already been migrated
[BZ 185375] clean up and remove unused code
Conflicts:
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateAggregateMetrics.java
[BZ 185375] MigrateData is no longer used
Conflicts:
modules/common/cassandra-schema/src/main/java/org/rhq/cassandra/schema/MigrateData.java
[BZ 185375] clean up merge conflicts
[BZ 185375] add null check on shutdown
This check was here but lost in some of the prior merge conflicts.