systemd and cgroups: heads up

Daniel P. Berrange berrange at redhat.com
Fri Aug 27 09:22:11 UTC 2010


On Thu, Aug 26, 2010 at 02:44:15PM -0400, Daniel J Walsh wrote:
> 
> On 08/26/2010 01:18 PM, Daniel P. Berrange wrote:
> > On Thu, Aug 26, 2010 at 01:04:33PM -0400, Daniel J Walsh wrote:
> >>
> >> I don't know.  My goal with sandbox was to allow users to start up
> >> sandboxes in such a way that they can still be killed.
> >>
> >> Is there a way in cgroups to say
> >>
> >> dwalsh gets 80% CPU
> >> Then allow dwalsh to specify that sandboxes can only use 80% of his
> >> CPU, so he can kill them.
> > 
> > You can't directly specify absolute CPU%. You can only set relative
> > prioritization between groups via the 'cpu_shares' tunable. A group
> > with double the 'cpu_shares' value will get twice as much running
> > time from the scheduler. If you know all groups at a particular
> > level of the hierarchy you can calculate the relative shares 
> > required to give the absolute 80% value, but it gets increasingly
> > "fun" to calculate as you add more groups/shares :-)
> > 
> > eg with 2 cgroups
> > 
> >  group1:  cpu_shares=1024  (20%)
> >  group2:  cpu_shares=4096  (80%)
> > 
> > With 3 groups
> > 
> >  group1:  cpu_shares=512  (10%)
> >  group2:  cpu_shares=512  (10%)
> >  group3:  cpu_shares=4096  (80%)
> > 
> > Or with 3 groups
> > 
> >  group1:  cpu_shares=342   (~6.7%)
> >  group2:  cpu_shares=682   (~13.3%)
> >  group3:  cpu_shares=4096  (80%)
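> > 
> > (More generally: once you know the sum of the sibling groups'
> > shares, the value needed for a target percentage falls out
> > directly. A minimal Python sketch of the arithmetic; the function
> > name is just for illustration:)
> > 
> >   def shares_for_target(target_pct, sibling_shares):
> >       # A group's CPU% is its shares over the total at that level:
> >       #   pct = s / (s + sum(siblings))
> >       # solving for s gives:
> >       #   s = sum(siblings) * pct / (100 - pct)
> >       other = sum(sibling_shares)
> >       return int(round(other * target_pct / (100.0 - target_pct)))
> > 
> >   # Two siblings at 512 shares each, target 80%:
> >   #   shares_for_target(80, [512, 512])  ->  4096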
> > 
> > 
> > 
> > Regards,
> > Daniel
> 
> Seems we have a new hammer and everyone is looking to use it.  So far
> systemd, sandbox, libvirt and chrome-sandbox are using it, which
> probably is not going to get the results we want.
> 
> systemd's goal might be to make sure no user uses more than X% of
> memory/CPU, or to set up CPU affinity, but sandbox and chrome-sandbox
> might allow you to use more.
> 
> Which is why I think the kernel needs to allow nesting of cgroups.

It already does allow nesting. The way libvirt works is that we don't
try to mount any cgroups ourselves. We expect the OS distro or sysadmin
to have mounted any controllers they want to be active. libvirt detects
where in the cgroups hierarchy libvirtd has been placed, and then
creates 3 levels below that: level 1 is just 'libvirt', level 2 is per
libvirt driver ('qemu', 'lxc'), and level 3 is per guest. In this way
libvirt aims to play nicely with whatever systemd/cgconfig do to place
libvirtd in an initial cgroup. All we ask is that other tools don't
mess around with the sub-cgroups we then create, as sketched below.
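
Roughly, the detection and creation steps look like this (a minimal
Python sketch, not the actual libvirt code; it assumes the 'cpu'
controller is mounted at /cgroup/cpu and reads placement from
/proc/self/cgroup):

  import os

  def current_cgroup(controller="cpu"):
      # Lines in /proc/self/cgroup look like "3:cpu,cpuacct:/daemons"
      with open("/proc/self/cgroup") as f:
          for line in f:
              _, controllers, path = line.strip().split(":", 2)
              if controller in controllers.split(","):
                  return path
      return "/"

  def guest_cgroup(driver, guest, mount="/cgroup/cpu"):
      # Three levels below wherever libvirtd was placed:
      #   <placement>/libvirt/<driver>/<guest>
      path = os.path.join(mount + current_cgroup(),
                          "libvirt", driver, guest)
      if not os.path.isdir(path):
          os.makedirs(path)
      return path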

This setup means systemd can set up top-level cgroups with relative
CPU priorities, e.g. to give libvirtd 80% of runtime. libvirtd can
then sub-divide that 80% among all the guests it runs, in the levels
it creates below.
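
Concretely: since cpu_shares are only compared between sibling groups,
giving every guest the same weight splits libvirtd's share evenly.
Something like this (again only a sketch; the paths and values are
illustrative, though cpu.shares is the real tunable file):

  import os

  def set_guest_shares(base, guests, shares=1024):
      # Equal weights: each guest gets an equal slice of whatever
      # CPU time the libvirt subtree as a whole receives.
      for guest in guests:
          with open(os.path.join(base, guest, "cpu.shares"), "w") as f:
              f.write(str(shares))

  # e.g. 4 guests under /cgroup/cpu/daemons/libvirt/qemu each end
  # up with a quarter of libvirtd's 80%, i.e. 20% of the machine.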

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

