If you are using the libcgroup package, and in particular the cgconfig service, be aware that this will break systemd. This package is pulled into Fedora by policycoreutils, so you likely have it on your system. However, cgconfig is not enabled by default.
https://bugzilla.redhat.com/show_bug.cgi?id=626794
On Wed, 25.08.10 17:04, Matthew Miller (mattdm@mattdm.org) wrote:
If you are using the libcgroup package, and in particular the cgconfig service, be aware that this will break systemd. This package is pulled into Fedora by policycoreutils, so you likely have it on your system. However, cgconfig is not enabled by default.
Hmm, why is libcgroup pulled in by policycoreutils? What's the rationale?
libcgroup should not interfere with /sys/fs/cgroup/systemd. That's systemd's turf, and to make that clear it is called... well... "systemd"...
https://bugzilla.redhat.com/show_bug.cgi?id=627378
Lennart
On 08/25/2010 05:46 PM, Lennart Poettering wrote:
Hmm, why is libcgroup pulled in by policycoreutils? What's the rationale?
It is used for confining sandboxes.
On Wed, Aug 25, 2010 at 10:13:05PM -0400, Daniel J Walsh wrote:
Hmm, why is libcgroup pulled in by policycoreutils? What's the rationale?
It is used for confining sandboxes.
Having now looked at both projects, it appears to me that they are in conflict. They could be made to work side by side, in the same way that systemd's cron replacement feature doesn't necessarily mean that you can't run traditional crond, but there is significant overlap in terms of categorization policy. That is, libcgroup uses cgclassify to put stuff into cgroups, whereas systemd uses pam_systemd for users and creates cgroups automatically for services.
This overlap doesn't seem good for the distribution.
Dan, *could* systemd as it stands provide what you need for sandboxes?
On Thu, Aug 26, 2010 at 09:59:59AM -0400, Matthew Miller wrote:
Dan, *could* systemd as it stands provide what you need for sandboxes?
Having looked a bit more at libcgroup, let me put this question in an entirely different way, because I understand better what's going on. So:
Dan, do you use the userspace tools bundled with libcgroup, or do you just use the library?
On 08/26/2010 12:18 PM, Matthew Miller wrote:
Having looked a bit more at libcgroup, let me put this question in an entirely different way, because I understand better what's going on. So:
Dan, do you use the userspace tools bundled with libcgroup, or do you just use the library?
We are using the library.
On 08/26/2010 09:59 AM, Matthew Miller wrote:
Dan, *could* systemd as it stands provide what you need for sandboxes?
I don't know. My goal with sandbox was to allow users to start up sandboxes in such a way that they could still be killed.
Is there a way in cgroups to say:
dwalsh gets 80% CPU. Then allow dwalsh to specify that sandboxes can only use 80% of his CPU, so he can kill them.
On Thu, Aug 26, 2010 at 01:04:33PM -0400, Daniel J Walsh wrote:
I don't know. My goal with sandbox was to allow users to start up sandboxes in such a way that they could still be killed.
Is there a way in cgroups to say:
dwalsh gets 80% CPU. Then allow dwalsh to specify that sandboxes can only use 80% of his CPU, so he can kill them.
You can't directly specify an absolute CPU%. You can only set relative prioritization between groups via the 'cpu_shares' tunable. A group with double the 'cpu_shares' value will get twice as much running time from the scheduler. If you know all groups at a particular level of the hierarchy you can calculate the relative shares required to give the absolute 80% value, but it gets increasingly "fun" to calculate as you add more groups/shares :-)

e.g. with 2 cgroups:

group1: cpu_shares=1024 (20%)
group2: cpu_shares=4096 (80%)

With 3 groups:

group1: cpu_shares=512 (10%)
group2: cpu_shares=512 (10%)
group3: cpu_shares=4096 (80%)

Or with 3 groups:

group1: cpu_shares=342 (6.7%)
group2: cpu_shares=682 (13.3%)
group3: cpu_shares=4096 (80%)
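The arithmetic above can be sketched in a few lines (an illustrative helper, not part of any of the tools discussed; the scale of 5120 is arbitrary, since only the ratios between siblings matter):

```python
def shares_for_percentages(percentages, scale=5120):
    """Turn absolute CPU percentages for sibling cgroups into relative
    cpu_shares values. Only valid if these groups are the only siblings
    at their level of the hierarchy, as Daniel notes."""
    if round(sum(percentages)) != 100:
        raise ValueError("sibling percentages must sum to 100")
    return [round(scale * p / 100) for p in percentages]

print(shares_for_percentages([20, 80]))      # [1024, 4096]
print(shares_for_percentages([10, 10, 80]))  # [512, 512, 4096]
```

Note that adding or removing a sibling group changes everyone's effective percentage, which is exactly the "fun" Daniel refers to.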
Regards, Daniel
On 08/26/2010 01:18 PM, Daniel P. Berrange wrote:
You can't directly specify an absolute CPU%. You can only set relative prioritization between groups via the 'cpu_shares' tunable. A group with double the 'cpu_shares' value will get twice as much running time from the scheduler. If you know all groups at a particular level of the hierarchy you can calculate the relative shares required to give the absolute 80% value, but it gets increasingly "fun" to calculate as you add more groups/shares :-)
Seems we have a new hammer and everyone is looking to use it. So far systemd, sandbox, libvirt and chrome-sandbox are using it, which probably is not going to get the results we want.
systemd's goal might be to make sure no user uses more than X% of memory/CPU, or to set up CPU affinity. But sandbox and chrome-sandbox might allow you to use more.
Which is why I think the kernel needs to allow nesting of cgroups.
On Thu, Aug 26, 2010 at 8:44 PM, Daniel J Walsh dwalsh@redhat.com wrote:
Seems we have a new hammer and everyone is looking to use it. So far systemd, sandbox, libvirt and chrome-sandbox are using it, which probably is not going to get the results we want.
systemd's goal might be to make sure no user uses more than X% of memory/CPU, or to set up CPU affinity. But sandbox and chrome-sandbox might allow you to use more.
As Daniel Berrange has stated, you cannot do X% of CPU yet for fair scheduling. I think the feature is under development, but it will be some time before it hits upstream.
Which is why I think the kernel needs to allow nesting of cgroups.
The kernel already allows nesting of cgroups.
Dhaval
On 08/26/2010 02:49 PM, Dhaval Giani wrote:
Which is why I think the kernel needs to allow nesting of cgroups.
The kernel already allows nesting of cgroups.
Dhaval
Ok, then I guess sandbox and chrome-sandbox should check their current cgroup and create a subgroup within them.
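A sandbox-side sketch of that idea (hypothetical code, not the actual sandbox implementation; it assumes the cgroup v1 layout of the time, with per-controller trees under /sys/fs/cgroup and a 'tasks' file):

```python
import os

def current_cgroup(proc_cgroup_text, controller):
    """Return this process's cgroup path for a controller, given the
    text of /proc/self/cgroup (lines look like '3:cpuacct,cpu:/user/dwalsh')."""
    for line in proc_cgroup_text.splitlines():
        _hid, controllers, path = line.split(":", 2)
        if controller in controllers.split(","):
            return path
    return None

def enter_subgroup(controller, name, mountpoint="/sys/fs/cgroup"):
    """Create a child cgroup beneath the one we are already in, and
    move ourselves into it -- any limits set on the parent still apply."""
    with open("/proc/self/cgroup") as f:
        parent = current_cgroup(f.read(), controller)
    sub = os.path.join(mountpoint, controller, parent.lstrip("/"), name)
    os.makedirs(sub, exist_ok=True)
    # In cgroup v1, writing a PID to 'tasks' moves that task into the group.
    with open(os.path.join(sub, "tasks"), "w") as f:
        f.write(str(os.getpid()))

print(current_cgroup("3:cpuacct,cpu:/user/dwalsh\n", "cpu"))  # /user/dwalsh
```

Because the child is created under the parent, the sandbox can never escape whatever share its parent group was given, which is the nesting property being discussed.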
On Thu, Aug 26, 2010 at 02:44:15PM -0400, Daniel J Walsh wrote:
Which is why I think the kernel needs to allow nesting of cgroups.
It already does allow nesting. The way libvirt works is that we don't try to mount any cgroups ourselves; we expect the OS distro or sysadmin to have mounted whatever controllers they want to be active. libvirt detects what location in the cgroups hierarchy libvirtd has been placed at, then creates 3 levels below this: level one is just 'libvirt', level two is per libvirt driver ('qemu', 'lxc'), level three is per guest. In this way libvirt aims to play nicely with whatever systemd/cgconfig do to place libvirtd in an initial cgroup. All we ask is that other tools don't mess around with the sub-cgroups we then create.
This setup means systemd can set up top-level cgroups with relative CPU priorities, to give libvirtd 80% of runtime. libvirtd can then sub-divide this 80% between all the guests it runs at the next levels it creates.
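The placement logic Daniel describes boils down to simple path construction (an illustrative sketch; the '/daemons/libvirtd' starting point and guest name are invented):

```python
import os

def guest_cgroup_path(libvirtd_cgroup, driver, guest):
    """Three levels below wherever libvirtd itself was placed:
    <libvirtd's group>/libvirt/<driver>/<guest>."""
    return os.path.join(libvirtd_cgroup, "libvirt", driver, guest)

print(guest_cgroup_path("/daemons/libvirtd", "qemu", "f14-guest"))
# /daemons/libvirtd/libvirt/qemu/f14-guest
```

The key design choice is that the root of the sub-tree is discovered, not hardcoded, so whoever placed libvirtd in its initial group keeps control of the overall budget.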
Regards, Daniel
On 08/27/2010 05:22 AM, Daniel P. Berrange wrote:
It already does allow nesting. The way libvirt works is that we don't try to mount any cgroups ourselves; we expect the OS distro or sysadmin to have mounted whatever controllers they want to be active. libvirt detects what location in the cgroups hierarchy libvirtd has been placed at, then creates 3 levels below this: level one is just 'libvirt', level two is per libvirt driver ('qemu', 'lxc'), level three is per guest. In this way libvirt aims to play nicely with whatever systemd/cgconfig do to place libvirtd in an initial cgroup. All we ask is that other tools don't mess around with the sub-cgroups we then create.
This setup means systemd can set up top-level cgroups with relative CPU priorities, to give libvirtd 80% of runtime. libvirtd can then sub-divide this 80% between all the guests it runs at the next levels it creates.
Regards, Daniel
That sounds perfect. I will look into how you did this and copy it for sandbox.
On Thu, 26.08.10 13:04, Daniel J Walsh (dwalsh@redhat.com) wrote:
Dan, *could* systemd as it stands provide what you need for sandboxes?
I don't know. My goal with sandbox was to allow users to start up sandboxes in such a way that they could still be killed.
Is there a way in cgroups to say:
dwalsh gets 80% CPU. Then allow dwalsh to specify that sandboxes can only use 80% of his CPU, so he can kill them.
systemd is not a tool to set up anything like a sandbox, or even something that resembles a container or VM. systemd uses cgroups only to the level that there is a 1:1 relationship between services and cgroups, and it does not go beyond that. I believe that makes it not particularly useful for Dan's use case.
Lennart
On Thu, 26.08.10 09:59, Matthew Miller (mattdm@mattdm.org) wrote:
Having now looked at both projects, it appears to me that they are in conflict. They could be made to work side by side, in the same way that systemd's cron replacement feature doesn't necessarily mean that you can't run traditional crond, but there is significant overlap in terms of categorization policy. That is, libcgroup uses cgclassify to put stuff into cgroups, whereas systemd uses pam_systemd for users and creates cgroups automatically for services.
This overlap doesn't seem good for the distribution.
While there indeed is some overlap, both projects make a lot of sense when used in conjunction with each other (and independently anyway).
What I mean to say is the following: systemd's focus when handling cgroups is clearly on services, i.e. there is an implicit 1:1 relationship between each service and its respective cgroup. On the other hand, libcgroup and its tools allow you to set up arbitrary group hierarchies in a much more flexible way. Example: with systemd you will get one group for apache, one for mysql and one for postfix. With libcgroup you can set up a group hierarchy that would, for example, separate the web server part from the mail server part, i.e. make mysql and apache go in one group and make postfix go in another. This would be a less detailed view on things. However, you can use libcgroup to make things more fine-grained too: while systemd would put all apache processes and its CGI scripts into one cgroup, with libcgroup you could split them up into multiple cgroups.
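For instance, the web-vs-mail split described above could be expressed in libcgroup's own configuration (a hypothetical /etc/cgconfig.conf fragment; the group names and share values are invented for illustration):

```
# Hypothetical /etc/cgconfig.conf fragment: one group for the web
# stack (apache + mysql), another for the mail server (postfix).
group webstack {
    cpu {
        cpu.shares = 2048;
    }
}
group mail {
    cpu {
        cpu.shares = 1024;
    }
}
```

Processes would then be routed into these groups by rules (cgrules.conf / cgrulesengd) or explicitly with cgclassify.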
So, even if both systemd and libcgroup create and maintain cgroups, their focus is certainly different.
Note that systemd even has explicit support for cgroups created by other software such as libcgroup: by using the ControlGroup= switch in service files you can move your services into arbitrary groups in arbitrary hierarchies, and then use for example the libcgroup tools to set limits or other properties of these groups. While systemd can work fine without libcgroup, we carefully made sure that if you want to use them together you can do so nicely, and systemd supports you in this.
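Such a service file stanza might look like this (a hypothetical fragment; the '/webstack' group path is invented, and ControlGroup= refers to the option as it existed in systemd of this era, not in current versions):

```ini
# Hypothetical fragment of a systemd service file.
[Service]
ExecStart=/usr/sbin/httpd
# Run this service in the externally managed 'cpu' hierarchy group
# /webstack, whose limits can then be tuned with the libcgroup tools:
ControlGroup=cpu:/webstack
```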
I plan to explain how this works in a later blog story in more detail.
Regarding the compatibility of libcgroup and systemd right now, I see three issues:
1) The mount point for the cgroup hierarchies has recently changed in the upstream kernel, from /cgroup/ to /sys/fs/cgroup/. systemd 8 now follows that scheme; libcgroup still needs some updating to use this mount point out of the box for its hierarchies. (bug filed)
2) systemd mounts all hierarchies exposed by the kernel by default. The scheme it follows matches the default configuration libcgroup installs (modulo the recent /sys/fs/cgroup root dir change). We mount all hierarchies because we can then make them available with the ControlGroup= switch, way before the libcgroup init script is even run. systemd only really insists on its own hierarchy being around, i.e. /sys/fs/cgroup/systemd/; the other hierarchies can actually be remounted differently later on, and moved to other places if the user really wants that. However, I personally see little reason to encourage this, for the same reasons we don't allow people to mount /sys to a different place even if the kernel would be fine with that. Jan raised the issue that mounting things like this by default would make it impossible to use hierarchies with more than one controller. However, I am not sure we want to support this: firstly, libcgroup makes it unnecessary to mount hierarchies like that, because it is able to synchronize hierarchies anyway; and secondly, this combined mounting only takes away features and doesn't add any. But again, the fact that systemd makes all hierarchies available to you out of the box doesn't mean you couldn't change them -- with the exception of the systemd hierarchy itself.
3) libcgroup currently tampers with the systemd tree in some cases where it shouldn't. Dhaval already agreed to change this, and to make sure libcgroup always leaves the systemd tree unmodified.
I hope this clears things up a little. The summary:
It's not systemd vs. libcgroup; it's more systemd + libcgroup = ♥
Lennart
On Thu, Aug 26, 2010 at 10:19:44PM +0200, Lennart Poettering wrote:
2) systemd mounts all hierarchies exposed by the kernel by default. The scheme it follows matches the default configuration libcgroup installs (modulo the recent /sys/fs/cgroup root dir change). [...]
Would it be possible to have systemd either use libcgroup to mount these directories, or to parse the libcgroup config file to determine where to put the mounts?
I hope this clears things up a little. The summary: it's not systemd vs. libcgroup; it's more systemd + libcgroup = ♥
:)
On Thu, 26.08.10 17:03, Matthew Miller (mattdm@mattdm.org) wrote:
Would it be possible to have systemd either use libcgroup to mount these directories, or to parse the libcgroup config file to determine where to put the mounts?
libcgroup doesn't really have an API for mounting things.
Note however, that when libcgroup initializes it figures out where the hierarchies are mounted based on the mount info exported by the kernel, so regardless of who mounted the hierarchies and where they are mounted, libcgroup actually does the right thing.
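That discovery step can be sketched like so (illustrative only; the real libcgroup does this in C and handles more cases):

```python
# Controllers a cgroup v1 mount of this era might expose as mount options.
KNOWN = {"cpu", "cpuset", "cpuacct", "memory", "devices",
         "freezer", "blkio", "net_cls", "ns"}

def cgroup_mounts(proc_mounts_text):
    """Map each cgroup controller to its mount point, from the text of
    /proc/mounts -- the same kernel-exported info libcgroup reads."""
    mounts = {}
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "cgroup":
            for opt in fields[3].split(","):
                if opt in KNOWN:
                    mounts[opt] = fields[1]
    return mounts

sample = ("cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpu 0 0\n"
          "cgroup /sys/fs/cgroup/memory cgroup rw,relatime,memory 0 0\n")
print(cgroup_mounts(sample))
# {'cpu': '/sys/fs/cgroup/cpu', 'memory': '/sys/fs/cgroup/memory'}
```

This is why it doesn't matter whether systemd, cgconfig, or the admin did the mounting: the kernel's view is the single source of truth.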
Lennart
On Thu, 26.08.10 23:30, Lennart Poettering (mzerqung@0pointer.de) wrote:
And one more thing: the kernel actually allows mounting of hierarchies to multiple places. That means even if systemd mounts a controller to /sys/fs/cgroup/foo the user may still choose to mount it to /cgroup/bar and the tree will be visible at both places.
Lennart
On Thu, Aug 26, 2010 at 11:30:59PM +0200, Lennart Poettering wrote:
Would it be possible to have systemd either use libcgroup to mount these directories, or to parse the libcgroup config file to determine where to put the mounts?
libcgroup doesn't really have an API for mounting things.
Okay, yeah, I was confused there between the library and the included tools.