On Wed, Mar 22, 2017 at 7:55 PM, Adam Williamson <adamwill@fedoraproject.org> wrote:
Is asking for 'all results for item == X' gonna be too much work for
rdb to handle? Is that the problem?

Sure, it could be done that way; the only slight difference is throughput.

First up, be sure to have a look at the attached graph_data_count.png, which shows the number of results per item (Y-axis). tl;dr:
 - DEV contains data since 2015-03-18, and has 226428 distinct items
 - PROD contains data since 2017-01-10, and has 30913 distinct items
 - DEV
   - median = 16
   - max = 16817
   - outliers begin at about 140 results per item (ca. 7%)
 - PROD
   - median = 17
   - max = 2867
   - outliers begin at about 65 results per item (ca. 13%)

The bulk of the outliers consists of depcheck results; taking gnote-3.17.0-3.fc23 as an example:
  testcase_name   | count
------------------+-------
 dist.rpmlint     |     1
 dist.depcheck    | 16287
 dist.upgradepath |    21
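For reference, the breakdown above is just a per-testcase GROUP BY for one item. A minimal sketch of such a query (assuming the usual ResultsDB layout, where 'item' lives in the key/value result_data table - the table and column names here are from memory and may differ):

    # Sketch: per-testcase result counts for a single item.
    # Table/column names are assumptions about the ResultsDB schema.
    import psycopg2

    QUERY = """
    SELECT r.testcase_name, COUNT(*) AS count
      FROM result r
      JOIN result_data rd ON rd.result_id = r.id
     WHERE rd.key = 'item' AND rd.value = %s
     GROUP BY r.testcase_name;
    """

    with psycopg2.connect("dbname=resultsdb") as conn:  # connection string is illustrative
        with conn.cursor() as cur:
            cur.execute(QUERY, ("gnote-3.17.0-3.fc23",))
            for testcase_name, count in cur.fetchall():
                print(testcase_name, count)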

Let's get some stats going. I ran some tests on DEV, timing the various parts of the query process (from my machine inside Red Hat, outside Fedora Infra), and removed the outlier times to crop out the IO-bound problems with the DB (at the moment the DB server is a bit overloaded, and data are routinely swapped in and out of memory, creating order-of-magnitude delays for non-cached data). The roundtrip for a database-disconnected, almost empty response <http://taskotron-dev.fedoraproject.org/resultsdb_api/api/v2.0/> is about 500 ms; this will most probably be better for Bodhi, as it is in the same datacenter (I would think), so take the 'roundtrip' times with that in mind.

The times are split as follows:
 - database ~ the time to get the results from the database (see the notes section below)
 - serialization ~ the time it takes to produce the serialized data (the dict to be jsonified) from the database rows
 - jsonify ~ the time it takes to jsonify the prepared dict
 - server ~ the sum of all of the above
 - roundtrip ~ the time it took the client to receive the full response
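To make the split concrete, here is a rough sketch of how such instrumentation looks inside a Flask endpoint - the handler shape and helper names are illustrative placeholders, not the actual ResultsDB code:

    # Illustrative sketch: timing the three phases inside a Flask view.
    # query_results() and serialize() are placeholders, not ResultsDB internals.
    import time
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    @app.route('/api/v2.0/results')
    def get_results():
        t0 = time.time()
        results = query_results(request.args)   # 'database' - the initial query
        t1 = time.time()
        data = [serialize(r) for r in results]  # 'serialization' - building the dicts
        t2 = time.time()
        response = jsonify(data=data)           # 'jsonify' - dict to JSON response
        t3 = time.time()
        app.logger.debug("database=%.4f serialization=%.4f jsonify=%.4f server=%.4f",
                         t1 - t0, t2 - t1, t3 - t2, t3 - t0)
        return response

Note that with lazy loading, any related rows fetched while serializing get billed to 'serialization', not 'database' - which is exactly the caveat discussed in the notes below.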

All the test queries were run 25 times; the first 5 runs were discarded (to force the data in question into memory on the DB machine) and the remaining times were averaged.
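The measuring loop itself was nothing fancy; roughly this shape (a sketch using the `requests` library, not the exact script I used):

    # Sketch of the repeat-and-average discipline described above:
    # 25 runs, the first 5 discarded as warm-up, the rest averaged.
    import time
    import requests

    URL = ("http://taskotron-dev.fedoraproject.org/resultsdb_api/api/v2.0/results"
           "?item=hplip-3.16.7-3.fc25&limit=20")

    timings = []
    for _ in range(25):
        start = time.time()
        requests.get(URL).json()
        timings.append(time.time() - start)

    warm = timings[5:]  # drop the warm-up runs
    print("roundtrip avg: %.3f s" % (sum(warm) / len(warm)))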

= 20 results (median) =

`curl "http://taskotron-dev.fedoraproject.org/resultsdb_api/api/v2.0/results?item=hplip-3.16.7-3.fc25&limit=20" -s > /tmp/datafile`
    "database": 0.0072879791259765625,
    "jsonify": 0.0033910274505615234,
    "serialization": 0.19698095321655273,
    "server": 0.20765995979309082,
    "roundtrip": 0.935


= 55 results (3rd quartile) =

`curl "http://taskotron-dev.fedoraproject.org/resultsdb_api/api/v2.0/results?item=dolphin-15.12.3-1.fc24&limit=55" -s > /tmp/datafile`

    "database": 0.007297039031982422,
    "jsonify": 0.008601188659667969,
    "serialization": 0.4342458248138428,
    "server": 0.45014405250549316,
    "roundtrip": 1.547


= 140 results (data without outliers) =

`curl "http://taskotron-dev.fedoraproject.org/resultsdb_api/api/v2.0/results?item=moodle-3.0.4-1.fc24&limit=140&callback=timeit" -s > /tmp/datafile`

    "database": 0.015254974365234375,
    "jsonify": 0.022364139556884766,
    "serialization": 1.390178918838501,
    "server": 1.4277980327606201,
    "roundtrip": 3.020


= Notes =

== database vs. serialization times ==

The `database` time does not account for loading the extra data (testcase and group information); that is included in the `serialization` time. This is mostly because I was not able to force SQLAlchemy to load all the related data at once, which would have let me profile the actual serialization process separately. The queries performed there are all direct index lookups over a numerical id, but three of them are performed per result. If the webapp-to-database roundtrips take a noticeable amount of time (and my gut feeling is that they are not insignificant), finding a way to load all the data at once might help, at the cost of higher memory consumption. The time to grab the 'raw' results is still informative, though, as it shows the base time needed to even get the process started.
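For the record, the direction I plan to try is SQLAlchemy's eager loading, so the related rows come along with the initial query. A sketch, assuming Result has `data`, `testcase` and `groups` relationships (the relationship names are assumptions about the ResultsDB models):

    # Sketch: load the related rows up front instead of lazily per result.
    # Relationship names (data, testcase, groups) are assumptions.
    from sqlalchemy.orm import joinedload, subqueryload

    results = (db.session.query(Result)
               .filter(Result.id.in_(matching_ids))
               .options(subqueryload(Result.data),     # the key/value pairs
                        joinedload(Result.testcase),   # testcase info
                        subqueryload(Result.groups))   # group membership
               .all())
    # Serialization should then run without the three per-result queries,
    # at the cost of the higher memory consumption mentioned above.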


== DB IO ==

As I said earlier, these are the best-case scenarios - most of the time, you (Bodhi) won't be running the same query for the N-th time and getting the benefit of the data already being in memory; the data will need to be loaded from disk. This is something Tim and I want to get sorted out (by splitting the various databases off the one machine that currently handles them all). But since this will be the "common" expected experience, consider the "slow" times for those same queries:

20 results:
    "database": 0.19269204139709473,
    "jsonify": 0.003347158432006836,
    "serialization": 0.5247390270233154,
    "server": 0.720778226852417

55 results:
    "database": 0.2362060546875,
    "jsonify": 0.008295059204101562,
    "serialization": 0.8430919647216797,
    "server": 1.0875930786132812

140 results:
    "database": 0.013692140579223633,
    "jsonify": 0.02150702476501465,
    "serialization": 2.1643788814544678,
    "server": 2.199578046798706


= Conclusion =

== Regarding ResultsDB ==

I will profile the serialization code itself, and investigate a way to force fewer SQL queries, so we can compare the results. My gut feeling is that we could lower the "best" times by focusing on these aspects.
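For the profiling part, I'll probably just start with the stdlib profiler around the serialization call; something like this sketch, with `serialize_results` standing in for the real entry point:

    # Sketch: profile the serialization step with cProfile.
    # serialize_results() is a placeholder for the real function.
    import cProfile
    import pstats

    profiler = cProfile.Profile()
    profiler.enable()
    serialize_results(results)  # the code path under investigation
    profiler.disable()

    pstats.Stats(profiler).sort_stats('cumulative').print_stats(20)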

The actual problem that needs solving, though, is moving the resultsdb production DB to a separate machine to get rid of the IO-bound problems. These cause/will cause the actual slowdown that can be observed when querying resultsdb in the semi-random way Bodhi does/will be doing.

== Regarding the Bodhi endpoint/middleware optimization ==

I'm not sure it is worth doing _right now_, but given the nature of what it would do, and the way it would be implemented, there would be a measurable speed-up in the DB part, as the database would be optimized for this one specific use-case. It would also lower the load on Bodhi, since the de-duplication would be taken care of there, instead of in the consumer.
I still believe that having a designated DB server for resultsdb prod, and exploring the loading process, will have a bigger impact, but I cannot take care of the first part on my own.
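To illustrate the de-duplication I have in mind: presumably Bodhi mostly cares about the latest result per testcase, so the middleware could collapse the raw list to something like this (a sketch; the field names follow the v2.0 API output, if memory serves):

    # Sketch: keep only the newest result per testcase for a given item.
    # Field names ('testcase', 'name', 'submit_time') are assumptions
    # about the v2.0 API response format.
    def latest_per_testcase(results):
        latest = {}
        for result in results:
            name = result['testcase']['name']
            if name not in latest or result['submit_time'] > latest[name]['submit_time']:
                latest[name] = result
        return list(latest.values())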


Did this help to answer your questions?

Josef