Cloud product kernel requirements

Josh Boyer jwboyer at fedoraproject.org
Thu Oct 31 15:26:25 UTC 2013


On Wed, Oct 30, 2013 at 11:04 AM, Matthew Miller
<mattdm at fedoraproject.org> wrote:
> On Wed, Oct 30, 2013 at 10:07:46AM -0400, Josh Boyer wrote:
>> The kernel team has heard in the past that the Cloud group would like
>> to see something of a more minimal kernel for usage in cloud images.
>> We'd like to hear the requirements for what this smaller image would
>> need to cover.
>
> I think the main four cases are:
>
>   1. Running inside a container
>   2. Running as a typical guest in a private or public cloud
>   3. Running as a special guest in the same
>   4. Running on bare metal (as compute node or possibly host node)
>
> Case 1 is easy -- no kernel, no problem.

Or, perhaps more accurately, that case is covered either by someone
else's kernel (which is Somebody Else's Problem) or by the standard
kernel found in Workstation and Server.

> Case 2 is everything needed to boot and get network, console output, and
> normal storage under KVM, Xen (especially as used in EC2), VirtualBox, and
> VMware. (With priority to the first two.) This *could* be split further,
> making a distinction between cloud providers, but there's diminishing
> returns for effort.

I'm going to be blunt.  VirtualBox and VMware aren't really focal
points for the kernel team, for exactly opposite reasons.  VirtualBox
has open-source userspace but no upstream kernel drivers, and it has
been known to be relatively broken.  We just don't support it.  VMware
has closed userspace, but actually took the time to get their kernel
drivers upstream.  We build those, and we'll care to the point of
forwarding bugs in them to the upstream maintainers.  From a testing
point of view, however, we aren't going to have time to make sure the
kernel works in those environments.  If those environments are
important to the Cloud product, please make sure you invest resources
in testing and bug resolution.
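
For reference, the KVM and Xen guest cases mostly come down to a
small set of guest-side options.  Roughly (this is a sketch, not a
reviewed or exhaustive list):

    # KVM/virtio
    CONFIG_VIRTIO_PCI
    CONFIG_VIRTIO_BLK
    CONFIG_VIRTIO_NET
    CONFIG_VIRTIO_CONSOLE
    # Xen PV frontends (the EC2 case)
    CONFIG_XEN_BLKDEV_FRONTEND
    CONFIG_XEN_NETDEV_FRONTEND
    CONFIG_HVC_XEN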

> Case 3 covers things like PCI passthrough or running a remote desktop where
> you want virtual sound card support. For this, I think it's perfectly fine
> to say "add the extra drivers pack".

By which you mean admins manually (or via some tool like
puppet/chef/ansible) install the subpackage, correct?  Not "we create
a special cloud image with the driver subpackage already included".
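
For concreteness, I'd expect something along these lines (treat
kernel-modules-extra as a placeholder name; the actual subpackage
split is still to be decided):

    # by hand
    yum -y install kernel-modules-extra

    # or the equivalent ansible task
    - name: install extra kernel drivers
      yum: name=kernel-modules-extra state=present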

> Case 4 could use a bit more discussion. *Mostly*, I think we can either say
> that this is the same as case 3 or that we will just use whatever Fedora
> Server does in this case (if different). However, I know oVirt Node (and
> probably also OpenStack node) is concerned with image size on bare metal.
> This would be a good time for anyone interested in that as a focus to chime
> in.

OK.  I literally have no idea how this is different from a minimal
server install, so understanding that would be good.

> More responses below....
>
>
> [...]
>> various things) is about 11MB.  Drivers can be trimmed to a degree,
>> but please keep in mind that the kernel is already relatively small
>> for the functionality it provides.  For example, it is not much bigger
>> than glibc-common (119MB).
>
> Most of what's in glibc-common is translations, so that's one of the other
> things we're working on tackling.

Great.  And that package is just one point of comparison.  As you
noted, there are others.

>> 1) We're mostly talking about packaging here, not building a separate
>> cloud kernel package or vmlinux.  The kernel team really wants to have
>> a single vmlinux across the 3 products if at all possible.  We can't
>> scale to much else.
>
> Yeah. That also has ripple-effect benefits beyond just the core kernel team
> (QA, documentation...).
>
>> 2) What use cases is the cloud image going to cover?  E.g. is it just
>> virtio stuff, or will it also cover PCI passthrough (which then requires
>> drivers for those PCI devices)?
>
> We'll need to develop this in more detail beyond the four general cases
> above. Possibly one of our first trac tickets. :)

Feel free to CC me.  I'm subscribed to the cloud list now, but I can't
say I'll have time to fully pay attention to it.  Please drag me (or
someone else on the kernel team) into specific things if you think you
need to.

>> 3) What are the common provisioning requirements that are driving the
>> size reduction?  (See the comment about glibc-common.)
>
> Main drivers are network traffic, provisioning speed, and density. With
> probably a smidgen of marketing thrown in.

So the thinking is smaller size means less to transfer, faster to
boot, cheaper to store?  I can see the first one.  The second one is
mostly either going to be in the noise range or just false.  The third
one I don't buy.
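
To put rough numbers on the transfer part: trimming 100MB from an
image saves on the order of 8 seconds on a 100Mbit link and under a
second on gigabit.  That's real, but it isn't going to dominate
provisioning time either.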

Now that's all basically image (as in file) size.  What about runtime
overhead of the kernel?  The Server group is likely going to want
things like NR_CPUS to be larger than it is today, which incurs some
runtime memory overhead.  It isn't huge, but it would be good to know
how much memory guests are commonly provisioned with in the cloud
environments you're targeting.
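
As a rough illustration of the NR_CPUS cost: a cpumask is NR_CPUS
bits, so every cpumask embedded in a kernel structure grows from 32
bytes at NR_CPUS=256 to 128 bytes at NR_CPUS=1024.  Each one is tiny,
but there are a lot of them, which is why the target guest memory
sizes matter.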

josh

