[Fedora-music-list] Low Latency vs. Real Time Kernel - actual latencies ?

Wed Dec 31 21:23:38 UTC 2014

On 12/31/2014 11:51 AM, Brian Monroe wrote:
> I agree that we need lag to be less than 5ms with no xruns for serious
> musicians. It makes a difference.
>
> I was chatting in #opensourcemusicians about the rt patch issue and
> someone threw this out there: http://www.funtoo.org/Kernel/configs/realtime

Sigh:

"This is exactly what the real-time patch is doing: it provides a 
mechanism for aggregating the audio tasks, and for attributing them a 
higher priority than the other tasks."

No, definitely not, this is NOT what the real-time patch does.

ANY KERNEL CAN DO THIS, you do not need to patch it at all.

As usual - I see this frequently - the writer confuses two different 
mechanisms (or layers?) that contribute to audio apps having good 
performance for low latency settings:

1) giving user tasks access to SCHED_FIFO and/or SCHED_RR scheduling. 
What does this mean? The audio threads in your audio applications will 
be able to run in this real-time scheduling class and will preempt any 
normal processes in the computer (that is, the audio threads have 
priority over everything that is not itself running real-time). This 
can be done with /etc/security/limits.d/* (the current solution) or 
with cgroups (only available in newer kernels). Both limits.conf and 
cgroups can do the same thing, but cgroups can also reserve some CPU 
for non-audio tasks (which could be a bad thing or a good thing, 
depending on your goals).

If this is not done you will not get good performance out of audio apps, 
period. And an RT patched kernel will not help at all.
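To make step 1 concrete, here is a minimal Python sketch (Linux only). It 
prints the kind of limits.d entry commonly used for an audio group (the 
group name "audio", priority 95 and the file name are assumptions, not 
Fedora defaults) and checks, via RLIMIT_RTPRIO, how high a real-time 
priority the current session is actually allowed to request.

```python
import os
import resource


def limits_d_entry(group: str = "audio", rtprio: int = 95) -> str:
    """Text for a hypothetical /etc/security/limits.d/95-audio.conf.

    rtprio lets members of the group request SCHED_FIFO/SCHED_RR up to
    that priority; memlock lets them lock audio buffers into RAM.
    """
    return (
        f"@{group}  -  rtprio   {rtprio}\n"
        f"@{group}  -  memlock  unlimited\n"
    )


def max_allowed_rt_priority() -> int:
    """Highest SCHED_FIFO priority an unprivileged process may request.

    Note: root (CAP_SYS_NICE) bypasses this limit entirely.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_RTPRIO)
    ceiling = os.sched_get_priority_max(os.SCHED_FIFO)  # 99 on Linux
    if soft == resource.RLIM_INFINITY:
        return ceiling
    return min(soft, ceiling)


if __name__ == "__main__":
    print(limits_d_entry(), end="")
    prio = max_allowed_rt_priority()
    if prio == 0:
        print("RLIMIT_RTPRIO is 0: audio threads would fall back to SCHED_OTHER")
    else:
        print(f"Audio threads may request SCHED_FIFO up to priority {prio}")
```

A limit of 0 (the usual default for ordinary users) is exactly the "not 
done" case above: no kernel, patched or not, will save you then.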

2) running a kernel that has good low latency performance. There is a 
whole range of options for this. The simplest is to enable full 
preemption in a vanilla kernel. What does this do? It tries to minimize 
the time the kernel spends in critical sections of code within which 
scheduling is forbidden. If you can't schedule an audio task for a 
"long" time you will get a click as the sound card is starved of 
samples. These options can have a small but probably measurable impact 
on overall performance (i.e., nothing is free).
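What counts as a "long" time depends on the buffer settings: the audio 
thread has to get the CPU, and finish its work, within one period. A 
small sketch of that arithmetic, assuming a 48 kHz sample rate (any rate 
works the same way):

```python
def period_ms(frames_per_period: int, sample_rate_hz: int) -> float:
    """Interval between sound-card interrupts. The audio thread has to be
    scheduled and finish its processing within roughly this window."""
    return 1000.0 * frames_per_period / sample_rate_hz


def buffer_latency_ms(frames_per_period: int, periods: int,
                      sample_rate_hz: int) -> float:
    """Total buffering latency for a JACK-style frames x periods setting."""
    return periods * period_ms(frames_per_period, sample_rate_hz)


for frames in (16, 64, 256):
    print(f"{frames:3d} frames @ 48 kHz: {period_ms(frames, 48000):.3f} ms per period, "
          f"{buffer_latency_ms(frames, 2, 48000):.3f} ms with 2 periods")
```

At 64 frames the deadline is about 1.33 ms; at 16 frames it is about 
0.33 ms, so a kernel critical section of even half a millisecond is 
already enough to cause a click.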

A step further is to boot the kernel with the threadirqs parameter _and_ 
properly optimize the priorities of the IRQ kernel threads (the rtirq 
package does that). What does this do? It makes sure that the interrupt 
request of the sound card is processed with higher priority than 
(almost) all the others. The processing of the IRQ will trigger the 
scheduling of the userland task that handles the audio samples, so it is 
important to do this as well (and the priorities of the IRQ handling 
routines and the userland audio threads - jack, for example - have to be 
properly ordered).
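The ordering requirement can be expressed as a simple check. This is a 
hypothetical Python sketch, not what rtirq does internally: it takes 
(thread name, real-time priority) pairs and assumes the rule "the sound 
card IRQ thread outranks the userland audio threads and every other IRQ 
thread, except exempted ones such as the timer or RTC". The thread names 
"snd" and "jackd" are assumptions about your particular system.

```python
from typing import List, Sequence, Tuple


def irq_ordering_ok(threads: List[Tuple[str, int]],
                    sound_tag: str = "snd",
                    audio_names: Sequence[str] = ("jackd",),
                    exempt_tags: Sequence[str] = ("timer", "rtc")) -> bool:
    """Check (thread name, RT priority) pairs for a sane priority order.

    With threadirqs, IRQ handlers run as kernel threads named
    "irq/<number>-<handler>".
    """
    sound = [p for name, p in threads
             if name.startswith("irq/") and sound_tag in name]
    if not sound:
        return False  # no threaded sound IRQ found: threadirqs not active?
    audio = [p for name, p in threads if name in audio_names]
    other_irqs = [p for name, p in threads
                  if name.startswith("irq/") and sound_tag not in name
                  and not any(tag in name for tag in exempt_tags)]
    lowest_sound = min(sound)
    # Sound IRQ must beat both the userland audio threads and other IRQs.
    return all(lowest_sound > p for p in audio + other_irqs)


if __name__ == "__main__":
    sample = [("irq/16-snd_hda_intel", 90), ("jackd", 70),
              ("irq/19-ehci_hcd", 50), ("irq/8-rtc0", 95)]
    print(irq_ordering_ok(sample))  # True
```

On a real system the pairs could be collected from `ps -eLo rtprio,comm` 
(non-real-time threads show "-" in the rtprio column); the sample names 
and priorities above are made up for illustration.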

Going further, you can patch the kernel itself so that more of it can 
be preempted (the type of kernel I maintain for Planet CCRMA). This is 
the RT patch, which is maintained separately from the vanilla kernel. 
It significantly lowers the time the kernel spends in 
critical sections of code that can stop scheduling of tasks. The smaller 
that time, the faster an audio task will be scheduled after the sound 
card signals the system it has (or needs) samples.
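Since the RT patch lives outside the vanilla tree, it is worth confirming 
which kind of kernel is actually running. A rough Python heuristic, 
assuming the two markers that RT-patched kernels typically expose (a 
/sys/kernel/realtime file containing 1, and "PREEMPT RT" or "PREEMPT_RT" 
in the `uname -v` string); verify both against your own kernel:

```python
import os
from typing import Optional


def looks_like_rt_kernel(uname_version: str,
                         sys_realtime: Optional[str]) -> bool:
    """Heuristic only: trust /sys/kernel/realtime if present, otherwise
    look for the PREEMPT RT marker in the kernel version string."""
    if sys_realtime is not None and sys_realtime.strip() == "1":
        return True
    return "PREEMPT RT" in uname_version or "PREEMPT_RT" in uname_version


def read_sys_realtime(path: str = "/sys/kernel/realtime") -> Optional[str]:
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None  # file absent: probably not an RT-patched kernel


if __name__ == "__main__":
    print(looks_like_rt_kernel(os.uname().version, read_sys_realtime()))
```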

As fewer users actively run the RT patch, it has more bugs. Also, the 
RT patch has in the past uncovered bugs in the mainline code that only 
showed up when the RT patch was applied.

Over the past years, code from the RT patch has slowly migrated into 
the vanilla kernel, so the maximum latency of a properly configured 
vanilla kernel has gone down significantly.
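The usual way to see that maximum latency in numbers is cyclictest from 
the rt-tests suite, run at SCHED_FIFO priority while the system is 
loaded. The underlying idea (wake up at fixed deadlines and record how 
late each wake-up is) can be sketched in Python; this is only an 
illustration, since interpreter overhead and running as SCHED_OTHER make 
the results worse than what the kernel itself achieves:

```python
import time
from typing import List


def wakeup_latencies_us(interval_s: float = 0.001, loops: int = 1000) -> List[float]:
    """Sleep until fixed absolute deadlines and record how late each
    wake-up is, in microseconds (same idea as cyclictest)."""
    latencies = []
    deadline = time.monotonic() + interval_s
    for _ in range(loops):
        remaining = deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
        latencies.append((time.monotonic() - deadline) * 1e6)
        deadline += interval_s
    return latencies


if __name__ == "__main__":
    lat = wakeup_latencies_us()
    print(f"max {max(lat):.0f} us, mean {sum(lat) / len(lat):.0f} us")
```

The maximum, not the mean, is what matters for audio: one late wake-up 
is one xrun.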

This is all further complicated by hyperthreading (fake CPU cores) and 
the new intel_pstate driver, which controls CPU core speed within a 
power budget. You need to optimize those things as well. For best 
performance (or even decent performance) you will have to run all CPU 
cores at full speed, and very likely disable hyperthreading as well. I 
used not to notice a difference with hyperthreading, but on my latest 
hardware I really need to disable it.
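Both can be checked from sysfs. A small Python sketch, assuming the 
standard cpufreq and topology files under /sys/devices/system/cpu/ (with 
intel_pstate the governor is either "performance" or "powersave"):

```python
import glob
from typing import Dict, List


def cpus_not_at_full_speed(governors: Dict[str, str]) -> List[str]:
    """CPUs whose frequency governor is anything other than 'performance'."""
    return sorted(cpu for cpu, gov in governors.items()
                  if gov.strip() != "performance")


def has_smt_siblings(thread_siblings_list: str) -> bool:
    """'0' means no sibling; '0,4' or '0-1' means a hyperthread sibling."""
    s = thread_siblings_list.strip()
    return "," in s or "-" in s


def read_governors() -> Dict[str, str]:
    out = {}
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"
    for path in glob.glob(pattern):
        cpu = path.split("/")[5]  # e.g. "cpu0"
        with open(path) as f:
            out[cpu] = f.read().strip()
    return out


if __name__ == "__main__":
    print("not at full speed:", cpus_not_at_full_speed(read_governors()))
    try:
        with open("/sys/devices/system/cpu/cpu0/topology/thread_siblings_list") as f:
            print("hyperthreading active:", has_smt_siblings(f.read()))
    except OSError:
        print("no topology information")
```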

And on some hardware (usually laptops) you are dead in the water: the 
BIOS can have badly designed SMI (System Management Interrupt) handlers 
that tie up the CPU for milliseconds and defeat everything Linux can try 
to do. Nothing can be done except complaining to the vendor and 
upgrading the BIOS.
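The tool for finding such machines is hwlatdetect from the rt-tests 
suite, which spins in the kernel with interrupts disabled and reports 
gaps that can only come from firmware. The same spin-and-look-for-gaps 
idea, sketched in Python (userspace cannot tell an SMI apart from 
ordinary preemption or interrupts, so a gap here is only a hint; the 100 
microsecond threshold is an arbitrary assumption):

```python
import time
from typing import List


def clock_gaps_us(duration_s: float = 1.0, threshold_us: float = 100.0) -> List[float]:
    """Spin reading the clock and return every gap (in microseconds)
    between consecutive reads that exceeds the threshold."""
    gaps = []
    end = time.perf_counter() + duration_s
    last = time.perf_counter()
    while last < end:
        now = time.perf_counter()
        gap_us = (now - last) * 1e6
        if gap_us > threshold_us:
            gaps.append(gap_us)  # the CPU "disappeared" for this long
        last = now
    return gaps


if __name__ == "__main__":
    gaps = clock_gaps_us()
    print(f"{len(gaps)} gaps, worst {max(gaps, default=0.0):.0f} us")
```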

Anyway, hopefully this was a clear explanation...

> I'm smart enough to get what they're doing, but not smart enough to know
> if this is what we're already doing in Fedora or how it'll affect other
> security concerns.

----
$ grep PREEMPT /boot/config-3.17.4-200.fc20.x86_64
# CONFIG_PREEMPT_RCU is not set
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
----

So we do not have CONFIG_PREEMPT set. The Fedora kernel is not optimized 
for low latency operation. See:

https://rt.wiki.kernel.org/index.php/Frequently_Asked_Questions

for some of the options available (PREEMPT_VOLUNTARY is the least 
aggressive of them).
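The grep check above can be scripted. A minimal Python sketch that 
reports the most aggressive preemption model enabled in a kernel config 
file; it assumes the config sits at /boot/config-<release>, as it does 
on Fedora, and that the RT option is named PREEMPT_RT_FULL (RT patch of 
this era) or PREEMPT_RT (newer kernels):

```python
import os
from typing import Optional

# Preemption models from least to most aggressive.
PREEMPT_MODELS = ("PREEMPT_NONE", "PREEMPT_VOLUNTARY", "PREEMPT",
                  "PREEMPT_RT_FULL", "PREEMPT_RT")


def preemption_model(config_text: str) -> Optional[str]:
    """Return the most aggressive preemption model set to =y, or None."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines are skipped by this test.
        if line.startswith("CONFIG_") and line.endswith("=y"):
            enabled.add(line[len("CONFIG_"):-len("=y")])
    for model in reversed(PREEMPT_MODELS):
        if model in enabled:
            return model
    return None


if __name__ == "__main__":
    path = f"/boot/config-{os.uname().release}"
    if os.path.exists(path):
        with open(path) as f:
            print(preemption_model(f.read()))
```

Fed the five lines from the grep output above, it returns 
PREEMPT_VOLUNTARY.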

> Also, the settings listed
> for /etc/security/limits.conf is setting you up for a bad time.
>
> They said they could get less than 1ms with no xruns (except at
> application startup) which sounds promising. Certainly if we're shooting
> for less than 5 ms instead of less than 1ms.

The statement on that page regarding performance is, well, meaningless. 
It does not state what hardware was used. It also says that it gets 
xruns "only at application startup" (which application? under which 
conditions?).

If you are running a properly tuned system and the audio applications 
are properly coded (a big if), then you should never[*] get xruns. If 
you get them sometimes, it means that, well, your system is not useful 
for low latency work. The question would be: do you get the same 
performance if you _load_ your system? Can you play at 16x2 (16 frames 
per period, 2 periods) without xruns while reading email, browsing the 
web and copying a file tree with rsync? Even when all CPU cores are 
running at 60-70% utilization? If the answer is yes, then you are in 
business...
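For the "all cores at 60-70%" part of that test, a synthetic load can 
complement (not replace) the real-world mix of email, browsing and 
rsync. A minimal Python sketch of a duty-cycle load for one core; the 
10 ms slice length is an arbitrary assumption:

```python
import time


def duty_cycle_load(duty: float, seconds: float, slice_s: float = 0.01) -> float:
    """Busy-spin for `duty` of every `slice_s` interval on one core, sleep
    for the rest, and return the fraction of wall time spent spinning."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be between 0 and 1")
    t0 = time.monotonic()
    busy = 0.0
    while time.monotonic() - t0 < seconds:
        start = time.monotonic()
        spin_until = start + duty * slice_s
        while time.monotonic() < spin_until:
            pass  # burn CPU
        busy += time.monotonic() - start
        time.sleep((1.0 - duty) * slice_s)
    return busy / (time.monotonic() - t0)
```

To load every core, one could run `duty_cycle_load(0.65, 60.0)` in each 
worker of a `multiprocessing.Pool(os.cpu_count())` while JACK runs at 
16x2 and you watch for xruns.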

-- Fernando

[*] never does not really really mean never, if the load of the computer 
is really really high then at some point you will run out of CPU and you 
will get an xrun. At that point you need to get a faster computer :-)