Paolo Valente kindly wrote a response to my StackExchange question. StackExchange may remove it, since it was not strictly an answer to my question, so I am re-posting his response on this mailing list.
-------- Forwarded Message --------
From: Paolo Valente <paolo.valente@linaro.org>
Some information that might be useful for your choice
I'm one of the authors of BFQ, so I'm anything but a disinterested party :) But I'll report only numbers obtained with repeatable tests.
We have been testing BFQ on SD Cards, eMMC, HDDs, SATA SSDs, and NVMe SSDs. As for HDDs and SSDs, we have run tests with both single-disk and RAID configurations.
In terms of throughput, results can be summarized as follows. With SD Cards, eMMC and HDDs (single and RAID), there is no regression in terms of throughput. On the contrary, with HDDs there is a gain of around 20-30% with some workloads.
On SSDs, there is a loss of throughput only
- with random sync I/O: around 2-3% on average SSDs, up to 10-15% on very fast NVMe SSDs. With a workload meant to put BFQ in the most difficult condition, we reached a loss of 18% [1], but in any other third-party test the loss is around 10% in the worst case. This loss is mainly due to the fact that BFQ is not a minimal I/O scheduler. We are working on this. It is not easy; we will need time to fill this gap.
- with only-write I/O on very fast SSDs: around 5-10%. This is due to a problem with I/O-request tags. We have already found a solution. Since we do not consider this issue critical, we are giving more priority to other items in our TODO list. If you think otherwise, we are willing to change our priorities.
Because of the above overhead, BFQ cannot process more than 400-500 KIOPS on a commodity CPU.
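As a back-of-the-envelope reading of that ceiling: a rate of N KIOPS leaves, on average, 1/N milliseconds per request. The sketch below is only this arithmetic. The function names and the 800 KIOPS device rating are hypothetical, chosen for illustration, and the result is the total time available per request at that rate, not a measured BFQ cost.

```python
def per_request_budget_us(kiops: float) -> float:
    """Average time available per I/O request, in microseconds, at a given KIOPS rate."""
    return 1_000_000 / (kiops * 1_000)


def exceeds_ceiling(device_kiops: float, ceiling_kiops: float = 400.0) -> bool:
    """True if a device's rated KIOPS is above the given scheduler ceiling."""
    return device_kiops > ceiling_kiops


if __name__ == "__main__":
    for ceiling in (400, 500):
        print(f"{ceiling} KIOPS -> {per_request_budget_us(ceiling):.1f} us per request")
    # Hypothetical device rated at 800 KIOPS (assumed figure, for illustration only).
    print(exceeds_ceiling(800))
```

A device whose rated random-I/O rate sits well below the ceiling should not be limited by it; one rated above it may be.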
In terms of responsiveness and latency for time-sensitive applications (such as audio/video players), results are simply incomparable. For example, regardless of the I/O workload in the background, with BFQ applications start as quickly as if the drive were idle. With any of the other schedulers, applications may take ten times as long, or even not start at all (until the background workload is over) [1].
In addition, as for server-like workloads, BFQ enables, e.g., the desired fraction of the I/O bandwidth to be guaranteed to each client (or container, VM, or any other kind of entity sharing storage), while reaching a total throughput unmatched by any other solution for controlling I/O [2].
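BFQ is a proportional-share scheduler: each entity has a weight, and its guaranteed fraction of the bandwidth is its weight divided by the sum of the weights of the entities competing for the device. A minimal sketch of that arithmetic follows. The client names and weights are hypothetical, and the code does not configure BFQ; it only computes the fractions one would expect from a given set of weights.

```python
def guaranteed_fractions(weights: dict[str, int]) -> dict[str, float]:
    """Map each entity to weight / sum(weights), i.e. its proportional share."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical containers and weights, chosen only for illustration.
    shares = guaranteed_fractions({"web": 300, "db": 500, "backup": 200})
    for name, frac in shares.items():
        print(f"{name}: {frac:.0%}")
```

Note that the fractions depend on which entities are active at a given time: if one of them goes idle, the remaining weights are renormalized and the others get a larger share.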
Finally, if you are in doubt about some particular workload, we will be glad to test it.
[1] http://algo.ing.unimo.it/people/paolo/disk_sched/results.php