Just the voice of QE on this... I am hearing lots of discussion about design decisions for this data migration tool, which is good.

One word of caution (and I hear this in all your emails, so I know you are thinking about it; this is just a reminder): avoid premature optimization.

The course of action that makes sense to me:

#1. Baseline the current implementation. Use small and large datasets typical of what a customer would use, and document the baseline. It will be incredibly useful moving forward for comparing alternative solutions; you need the baseline.
#2. Determine whether any design optimizations need to be made.
#3. Compare alternative solutions against the baseline.
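To make step #1 concrete, a minimal timing harness along these lines could establish the baseline numbers. This is only a sketch: `baseline`, `fake_migrate`, and the dataset labels are hypothetical names, and a real run would invoke the actual migration tool against customer-representative datasets rather than a stand-in function.

```python
import time

def baseline(migrate, datasets):
    """Time a migration callable over several datasets and report throughput.

    migrate  -- function taking a list of rows and performing the migration
    datasets -- mapping of label -> list of rows
    Returns a dict of label -> (elapsed_seconds, rows_per_second).
    """
    results = {}
    for label, rows in datasets.items():
        start = time.perf_counter()
        migrate(rows)
        elapsed = time.perf_counter() - start
        rate = len(rows) / elapsed if elapsed else float("inf")
        results[label] = (elapsed, rate)
    return results

if __name__ == "__main__":
    # Stand-in "migration": a real baseline would call the actual tool.
    def fake_migrate(rows):
        for _ in rows:
            pass

    datasets = {"small": list(range(10_000)), "large": list(range(1_000_000))}
    for label, (secs, rps) in baseline(fake_migrate, datasets).items():
        print(f"{label}: {secs:.3f}s, {rps:,.0f} rows/s")
```

Recording the per-dataset throughput this way gives a documented number that every alternative (Hibernate batching, bulk export, etc.) can be compared against.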


I guess what I am saying is: I think we need a baseline before this conversation can continue in a meaningful way.

How can QE help establish a meaningful baseline? What do we need? Resources in the bladecenter? An SLA or acceptance criteria for performance? Datasets?




From: "Thomas Segismont" <tsegismo@redhat.com>
To: rhq-devel@lists.fedorahosted.org, rhq-users@lists.fedorahosted.org
Sent: Thursday, January 10, 2013 12:02:07 PM
Subject: Re: Metrics Migration Tool - Cassandra

On 09/01/2013 20:32, John Sanda wrote:
> At this point, all we can do is speculate about how long the migration will actually take until we do some load testing. If we find that the migration is taking longer than we would like, another option could be to explore using the bulk import/export utilities provided by each of the databases.

I think working from bulk export files would be far more efficient, and it
shouldn't be too difficult given that the measurement tables have a very
simple schema (migrating to Cassandra may not be as simple as migrating
these tables' data, though).

So why not have two mechanisms:
1. batching with Hibernate, which would support the larger set of
deployments (Postgres, Oracle, SQL Server)
2. bulk export files for the supported databases
(Postgres, Oracle)
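The difference between the two mechanisms can be sketched as follows. This uses sqlite3 purely as a runnable stand-in for Postgres/Oracle, and the table name `rhq_measurement` and column names are assumptions; in a real deployment, mechanism #1 would be Hibernate batching over the JDBC driver and mechanism #2 would use a database-native bulk path such as Postgres COPY.

```python
import csv
import io
import sqlite3

# Stand-in database: sqlite3 plays the role of Postgres/Oracle here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rhq_measurement (schedule_id INTEGER, time_stamp INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO rhq_measurement VALUES (?, ?, ?)",
    [(i % 10, 1000 + i, float(i)) for i in range(1000)],
)

# Mechanism #1: batched reads through the generic driver, portable to
# any supported database (analogous to batching with Hibernate).
def batched_rows(conn, batch_size=100):
    cur = conn.execute("SELECT schedule_id, time_stamp, value FROM rhq_measurement")
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield from batch

# Mechanism #2: dump everything to a flat export file in one pass,
# mimicking a database-native bulk export (e.g. Postgres COPY ... TO).
def bulk_export(conn):
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerows(
        conn.execute("SELECT schedule_id, time_stamp, value FROM rhq_measurement")
    )
    return out.getvalue()

print(sum(1 for _ in batched_rows(conn)))   # row count via batched reads
print(len(bulk_export(conn).splitlines()))  # row count via bulk export file
```

The portable path goes row-by-row (in batches) through the driver, while the bulk path streams the whole table into a flat file the target loader can ingest, which is where the expected speedup for large datasets would come from.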

I know it's double the code, testing, and support, but I really doubt #1
can handle large amounts of data in less than a few hours.

And you're right, we cannot keep speculating on this; I don't believe we
could make a release without actually trying the tool on different
workloads.

Thomas
_______________________________________________
rhq-users mailing list
rhq-users@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/rhq-users