Draft of Tomorrow's 05/21 Print Interview w/Lennart Poettering

Wed May 20 14:12:39 UTC 2009

*The Sound of Fedora 11 - Audio Control with Lennart Poettering*

Where would we be without sound?  It's the most primitive of 
communication methods, and yet it has spawned so much technology around 
it.  Whether you're a musician, a DJ, riding a bus to work or even just 
stuck in a cubicle listening to the radio somewhere, sound has become an 
integral part of our daily experiences.  When Fedora 11 lands, it will 
bring more than a handful of enhancements to the sound subsystem, 
including unified volume control, per-stream and per-device monitoring, 
and proper Bluetooth audio support.  I recently caught up with Lennart 
Poettering, Red Hat Desktop Team Engineer and resident audio guru.  
Here's what he had to say about the upcoming improvements and what the 
future holds:

*1. Please introduce yourself and give us a brief intro to how you 
started working on the upcoming audio improvements in F11.*

I am Lennart Poettering and have been working for Red Hat in the Desktop 
Group for two years now this month. I live in Berlin, Germany.

PulseAudio (PA) has been part of Fedora since F8. Since then we have 
shipped two volume control applications: the GNOME volume control and a 
PA-specific tool (pavucontrol). The latter was mostly a showcase of what 
can be done with PA, and I wrote it mostly as a demo, not because I 
thought it was any good as a UI.

Of course having these two volume control UIs in Fedora was a situation 
that badly needed fixing, especially since both UIs exposed too many 
unnecessary options: the GNOME volume control exposed a lot of low-level 
hardware-specific features that only a tiny minority of people really 
understood, and the PA volume control exposed a lot of low-level 
software features that only a slightly larger minority of people really 
understood.

Now during the last year we reached a point where the feature set of PA 
for volume controls became very complete (with such things as arbitrary 
metadata on every stream/device, per-stream and per-device monitoring, 
hardware volume range extension, "flat" volumes and lots of other stuff) 
and Jon McCann, with help from Bastien Nocera, finally took up the work 
to fix the UI situation.

They basically designed the new UI from scratch with input from 
usability experts. It implements many of the features the old 
pavucontrol tool did, but in a much nicer, streamlined way. Also it 
integrates sound theme/event sound control with general audio 
configuration and volume control in a single UI tool.

*2. Can you give us some background on the upcoming changes to the audio 
subsystem in the Fedora 11 release?*

If you want to know more about the Volume Control, I'd just refer to the 
Feature page:

https://fedoraproject.org/wiki/Features/VolumeControl

We moved PA 0.9.15 into F11. You can find a nice overview of the new 
features here:

http://0pointer.de/blog/projects/oh-nine-fifteen.html

However that overview is a bit out-of-date. There are quite a few 
additional features that went into 0.9.15, most prominently full 
Bluetooth audio support: together with Bastien Nocera and the BlueZ guys 
I worked to make Bluetooth audio easily accessible -- the Bluetooth 
applet now exposes an easy dialog that allows you to pair and activate a 
Bluetooth headset. After that is done it will automatically appear in 
PulseAudio. If you need to reactivate it later, you can do that with a 
simple click in the applet menu. It works surprisingly well. It even 
works fine for lip-synced video, which is kind of magic given that 
Bluetooth audio doesn't actually offer any timing interfaces, so syncing 
up audio with video is not really possible in theory. I spent a lot of 
time making sure it works nonetheless, and it seems to work on the 
majority of headphones, although I cannot say for sure whether it does 
for all of them.

*3. Where did the ideas to change all this stuff come from? Didn't audio 
always work in Fedora?*

Depends what you mean by 'work'. Sure, basic audio output worked. But in 
many ways what we had on Linux was not comparable to what MacOS or 
Windows supported. And it still isn't in many ways. However in other 
ways we have now surpassed those competitors.

A lot of the changes we introduced with PA are not directly visible to 
the user. For example the so-called 'glitch-free' logic in PA is very 
important for a modern audio stack, however the normal user will never 
notice it -- except maybe because when we introduced it initially a lot 
of driver bugs got exposed that people were not aware of before because 
that driver functionality (usually timing related) was not really 
depended on by any application. In fact even now many of the older 
drivers expose broken timing that makes usage with PA not as much fun as 
it could be.

You can find a more detailed explanation of this 'glitch-free' logic here:

http://0pointer.de/blog/projects/pulse-glitch-free.html

Both Windows Vista and MacOS X have similar g-f logic in their audio 
stacks, however with PA we brought it to the next step. For example, we 
implemented this logic in a zero-copy fashion and with arbitrary sample 
types. This allows us to pass PCM data through our pipelines without 
ever having to copy/convert it unless we really have to.

So yes, as you might have noticed, I spend a lot of time getting the 
low-level internals right. And I like to speak about it, even though 
most people are not aware of all those technical details and how awesome 
this all is.  ;-) That said, this stuff isn't perfect yet and could use 
more improvements.

But it's not all just in the low-level details. Also on higher levels we 
got inspired by how our competitors do things. For example the new 
"flat" volume logic was pioneered in Vista, and we have now adopted a 
similar logic in PA. It's a great way to reduce the complexities of 
volume control by 'merging' a few of the sliders in the pipeline. It 
thus goes some way toward solving the "So which slider is now causing my 
volume to be too low?" problem. But here, too, there's more work to be 
done.
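
As a rough illustration of the slider 'merging' idea, here is a minimal 
sketch. It assumes the device volume simply follows the loudest stream 
and that quieter streams are attenuated in software relative to it. The 
function name, the stream names and the linear 0.0 to 1.0 scale are 
illustrative assumptions, not PA's actual API or code.

```python
def flat_volumes(stream_volumes):
    """Derive one device volume plus per-stream software gains.

    stream_volumes maps a (hypothetical) stream name to the volume the
    user asked for, on a linear scale from 0.0 to 1.0.
    """
    # Assumption: the device slider follows the loudest stream.
    device = max(stream_volumes.values(), default=0.0)
    if device == 0.0:
        # Everything is muted; avoid dividing by zero.
        return 0.0, {name: 0.0 for name in stream_volumes}
    # Each stream is scaled relative to the device volume, so the
    # user-visible result equals the volume that was requested.
    soft = {name: vol / device for name, vol in stream_volumes.items()}
    return device, soft


device, soft = flat_volumes({"music": 0.5, "voip": 0.25})
print(device, soft)
```

With only one visible volume per stream, there is no separate device 
slider left for the user to forget about.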

It's not all just getting inspired by our competitors. There are a lot 
of genuinely new features in PA that none of them have (at least to my 
knowledge). For example, in PA we have 'spatial' event sounds, i.e. if 
an event sound is triggered by a mouse click/dialog at the left side of 
the screen the sound is generated more from the left speakers, and 
similarly for the right side. This is of course mostly a toy. But I 
think a useful one  ;-) .
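
One way to picture 'spatial' event sounds is a simple stereo pan driven 
by the horizontal position of the click. The sketch below uses 
constant-power panning, which is a common audio technique chosen here 
for illustration; whether PA positions event sounds this way is not 
stated in the interview, and the function name and pixel-based 
parameters are assumptions.

```python
import math


def pan_gains(x, screen_width):
    """Return (left_gain, right_gain) for an event at horizontal pixel x."""
    # Normalise to 0.0 (left edge) .. 1.0 (right edge), clamping
    # positions that fall outside the screen.
    pos = min(max(x / screen_width, 0.0), 1.0)
    # Constant-power pan: left^2 + right^2 stays at 1, so the perceived
    # loudness does not dip in the middle of the screen.
    angle = pos * math.pi / 2
    return math.cos(angle), math.sin(angle)


for x in (0, 960, 1920):
    left, right = pan_gains(x, 1920)
    print(x, round(left, 3), round(right, 3))
```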

Listing all the fancy features PA has would certainly be a bit too much 
for your interview. So I'll leave it with this...  ;-)

Generally, we get inspiration from everywhere. And sure, as long as the 
most basic music playback was enough for you, audio did always work in 
Fedora. But OTOH, when we started integrating all of these new audio 
features into Fedora two years ago, the audio stack was still at the 
point of what was modern in the '90s. With the new features of the new 
volume control and PA we are working on bringing Linux audio to what is 
modern today.

*4. Can you also give us a comparison of our new audio framework in 
reference to other audio frameworks and audio subsystem models that are 
out there?*

There are many frameworks out there. On Free Software systems PA doesn't 
really have any competitor. Some people think that JACK is one, but it 
actually is not. JACK is clearly focussed on audio production and not 
very useful on the desktop otherwise. For example, it is strictly 
designed to provide very low latency at the price of power consumption. 
This is the right thing to do for audio production but not on the 
general desktop. Logic like 'glitch-free' (see above) makes a lot of 
sense for the usual desktop audio since it allows flexible adjusting of 
the latency to what is needed. If used properly it can be used to 
decrease the interrupt rate to 1/s, while still allowing instant 
reaction to user input. Since most PCs these days are laptops, these 
kinds of power-consumption-related features are very important.

One of the current weaker points of Audio on Linux is that we have this 
clear separation of JACK for audio production and PA for 
desktop/embedded. Other operating systems have managed to make this a 
bit smoother by having a single stack for both. This however actually 
has both advantages and disadvantages.

To improve the situation for now we focussed on making PA and JACK 
cooperate better. In F11 when JACK needs low-level access to an audio 
device it will tell PA so and PA will comply and release the device. 
This should make switching between the two sound systems easier though 
of course this is no perfect solution. Given the lack of manpower 
further integration is unlikely to happen anytime soon -- though both 
the JACK guys and I seem not generally opposed to something like that.

Now, if you compare our audio stacks with those of the other big 
operating systems (Windows and MacOS X), then besides the fact that they 
usually integrate desktop audio and audio production better than we do 
(as mentioned) there are many things we are better at and many they are 
better at. We certainly have more flexibility, i.e. depending on your 
application you can access audio on a lot of different levels: you can 
access ALSA directly if you need very low-level control, or go via PA 
for desktop-level control. You have APIs like GStreamer for media 
streaming and so on.

This flexibility however translates to more complexity in many ways, and 
a hodgepodge of API styles. (OTOH Apple's CoreAudio actually isn't as 
streamlined as many MacOS proponents would like us to think.) The 
documentation for our APIs is usually much worse than theirs. We really 
need some improvements in that area. Feature-wise, PA usually has better 
networking-related features than those counterparts. But there are a lot 
of features they have right now that we lack.

Other Unixes, such as FreeBSD and OpenSolaris are still stuck with OSS 
(Open Sound System) audio. In F11 we finally switched OSS off by default 
(though you can still reenable it via some minor hackery). OSS was the 
predecessor of ALSA. Thankfully it is now fully obsolete on Linux. OSS 
is mostly a design from the early nineties. It has received only minor 
updating since then. It is in no way comparable to what we now have on 
Linux or even what MacOS or Windows provide. (Although it has some very 
vocal fans who like to write me hate mail because I say things like this.)

*5. This work all started in earlier releases dating all the way to even 
Fedora 8, if I am correct. How has all this stuff progressed and evolved 
from then? What was done in previous releases that enabled building upon 
for this release?*

Fedora 8 was the first release where we integrated PA. In Fedora 9 we 
stabilized PA support. In F10 we integrated the 'glitch-free' logic 
which turned out to be quite a bumpy ride given that it exposed a lot of 
timing related driver bugs. In F11 g-f has now been made more robust and 
most of the more modern audio drivers should now be fixed. Also we have 
now started to push PA support more into the UI, like with this new 
volume control.

*6. What are the plans for the future, if any, in this particular space 
in the distro?*

I am working on multiple things for F12. Firstly there will be a couple 
more low-level changes to PA. The core will be made more threaded. 
Right now, we run most things in one 'main' thread and do low-level 
audio I/O in one thread for each audio card. My plan for F12 is to split 
that one 'main' thread up into as many threads as possible. This should 
make PA more robust for a couple of operations, and make latencies more 
reliable.

Then, I am working on considerably beefing up PA's usage of the 
low-level hardware volume controls. For example, many cards have 
separate low-level volume sliders for "Speaker", "Master", "PCM" (and 
more) that are in the line from the PCM data we stream to the speakers. 
PA currently exposes only one of those sliders (usually "Master"). My 
plan is to 'multiply' those sliders and create a single 'product' 
virtual slider from them that has a better granularity and a larger 
range. This rework will also introduce input/output switching and 
probably more.
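
To make the 'product' slider idea concrete, here is a small sketch under 
stated assumptions: each hardware slider is modelled as a gain in 
decibels, sliders in series multiply as linear factors (which is the 
same as adding their dB values), and the dB ranges below are made-up 
example numbers. The Slider class and helper names are hypothetical and 
do not come from PA or ALSA. The sketch only shows the larger range, not 
the finer granularity.

```python
from dataclasses import dataclass


@dataclass
class Slider:
    name: str
    min_db: float    # lowest setting the hardware offers
    max_db: float    # highest setting the hardware offers
    value_db: float  # current setting


def db_to_linear(db):
    """Convert an amplitude gain in dB to a linear factor."""
    return 10 ** (db / 20)


def combined_gain(sliders):
    """Linear gain of sliders in series: the product of their factors."""
    gain = 1.0
    for slider in sliders:
        gain *= db_to_linear(slider.value_db)
    return gain


def combined_range_db(sliders):
    """Range of the virtual slider: per-slider dB limits add up."""
    return (sum(s.min_db for s in sliders), sum(s.max_db for s in sliders))


# Hypothetical card with two sliders in the playback path.
path = [
    Slider("Master", min_db=-46.0, max_db=0.0, value_db=-6.0),
    Slider("PCM", min_db=-34.0, max_db=12.0, value_db=-14.0),
]
print(combined_range_db(path))         # (-80.0, 12.0)
print(round(combined_gain(path), 3))   # 0.1, i.e. -20 dB overall
```

In this example the virtual slider spans 92 dB, although neither 
individual slider covers more than 46 dB on its own.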

What has already landed in PA's git repository is support for UPnP A/V. 
When used in conjunction with Zeeshan Ali's Rygel UPnP MediaServer 
implementation this allows streaming any application's audio to any 
UPnP MediaRenderer (including PS3/Xboxes and all those 'Internet Radio' 
devices). This is actually pretty neat. Later on we hope to make PA a 
MediaRenderer as well as a MediaServer. This nicely complements our 
current Apple RAOP support.

And there's a lot of other things planned. We'll see how much of that 
will be ready for F12. I don't like to talk too much about upcoming 
features and planned code if I don't have anything to show yet, so I'll 
leave it at this.

And then there's always a little project of mine that is called 
'libsydney' that is intended to be a portable, modern and friendly PCM 
API. During the last months I focussed more on PA itself though.

*7. Do you feel that work like this helps enhance the desktop experience 
on Linux in general and strengthens the cause of the Linux Desktop, or 
is it more like all in a day's work?*

I think that PA is the way forward for audio on the Linux desktop. It 
may have its deficiencies -- but everything has. We still have some way 
to go, but I believe that a modern audio layer is really important for 
the Linux Desktop to succeed.

And no, it doesn't feel like all in a day's work at all. It is always a 
great feeling to see how PA got incorporated into so many distributions 
and how it is now used by so many people. I am pretty sure that you only 
get this in this way if you hack on Linux software.

*8. Speaking of all in a day's work, what things do you usually work 
on? What do you most enjoy doing outside of work?*

RH basically hired me to help improve audio on Linux. So that's what I 
am doing during work.

Outside of work I spend my time on photography. And I am trying my best 
to travel to interesting places as much as I can and my time off allows.