From f3b6556eedcc1f8278e45ed809b06f243fe99ced Mon Sep 17 00:00:00 2001 Message-Id: From: "Daniel P. Berrange" Date: Mon, 2 Dec 2013 13:36:29 +0000 Subject: [PATCH] Improve cgroups docs to cover systemd integration For https://bugzilla.redhat.com/show_bug.cgi?id=1004340 As of libvirt 1.1.1 and systemd 205, the cgroups layout used by libvirt has some changes. Update the 'cgroups.html' file from the website to describe how it works in a systemd world. Signed-off-by: Daniel P. Berrange (cherry picked from commit 7f2b173febaefda73b486337b6c53f5c2127070f) Signed-off-by: Jiri Denemark --- docs/cgroups.html.in | 212 +++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 172 insertions(+), 40 deletions(-) diff --git a/docs/cgroups.html.in b/docs/cgroups.html.in index 77656b2..f7c2450 100644 --- a/docs/cgroups.html.in +++ b/docs/cgroups.html.in @@ -47,17 +47,121 @@

As of libvirt 1.0.5 or later, the cgroups layout created by libvirt has been simplified, in order to facilitate the setup of resource control policies by - administrators / management applications. The layout is based on the concepts of - "partitions" and "consumers". Each virtual machine or container is a consumer, - and has a corresponding cgroup named $VMNAME.libvirt-{qemu,lxc}. - Each consumer is associated with exactly one partition, which also have a - corresponding cgroup usually named $PARTNAME.partition. The - exceptions to this naming rule are the three top level default partitions, - named /system (for system services), /user (for - user login sessions) and /machine (for virtual machines and - containers). By default every consumer will of course be associated with - the /machine partition. This leads to a hierarchy that looks - like + administrators / management applications. The new layout is based on the concepts + of "partitions" and "consumers". A "consumer" is a cgroup which holds the + processes for a single virtual machine or container. A "partition" is a cgroup + which does not contain any processes, but can have resource controls applied. + A "partition" will have zero or more child directories which may be either + "consumer" or "partition". +

+ +

+ As of libvirt 1.1.1 or later, the cgroups layout will have some slight + differences when running on a host with systemd 205 or later. The overall + tree structure is the same, but there are some differences in the naming + conventions for the cgroup directories. Thus the following docs are split + in two, one part describing systemd hosts and the other non-systemd hosts. +

+ +

Systemd cgroups integration

+ +

+ On hosts which use systemd, each consumer maps to a systemd scope unit, + while partitions map to systemd slice units. +

+ +

Systemd scope naming

+ +

+ The systemd convention is for the scope name of virtual machines / containers + to be of the general format machine-$NAME.scope. Libvirt forms the + $NAME part of this by concatenating the driver type with the name + of the guest, and then escaping any systemd reserved characters. + So for a guest demo running under the lxc driver, + we get a $NAME of lxc-demo which, when escaped, is + lxc\x2ddemo. So the complete scope name is machine-lxc\x2ddemo.scope. + The scope names map directly to the cgroup directory names. +

+ +
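+ For illustration only, on hosts whose systemd is new enough to provide the systemd-escape helper, the escaped form of a name can be checked from a shell (libvirt performs the escaping itself and does not require this tool):
+# systemd-escape 'lxc-demo'
+lxc\x2ddemo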

Systemd slice naming

+ +

+ The systemd convention for slice naming is that a slice should include the + name of all of its parents prepended to its own name. So for a libvirt + partition /machine/engineering/testing, the slice name will + be machine-engineering-testing.slice. Again the slice names + map directly to the cgroup directory names. Systemd creates three top-level + slices by default: system.slice, user.slice and + machine.slice. All virtual machines or containers created + by libvirt will be associated with machine.slice by default. +

+ +
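+ Since each slice is a child of its parent slice, the /machine/engineering/testing partition from the example above shows up as a nested set of cgroup directories. Assuming the usual cgroup mount point of /sys/fs/cgroup, the systemd hierarchy would contain a path along these lines:
+# ls -d /sys/fs/cgroup/systemd/machine.slice/machine-engineering.slice/machine-engineering-testing.slice
+/sys/fs/cgroup/systemd/machine.slice/machine-engineering.slice/machine-engineering-testing.slice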

Systemd cgroup layout

+ +

+ Given this, a possible systemd cgroups layout involving 3 qemu guests, + 3 lxc containers and 3 custom child slices, would be: +

+ +
+$ROOT
+  |
+  +- system.slice
+  |   |
+  |   +- libvirtd.service
+  |
+  +- machine.slice
+      |
+      +- machine-qemu\x2dvm1.scope
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- machine-qemu\x2dvm2.scope
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- machine-qemu\x2dvm3.scope
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- machine-engineering.slice
+      |   |
+      |   +- machine-engineering-testing.slice
+      |   |   |
+      |   |   +- machine-lxc\x2dcontainer1.scope
+      |   |
+      |   +- machine-engineering-production.slice
+      |       |
+      |       +- machine-lxc\x2dcontainer2.scope
+      |
+      +- machine-marketing.slice
+          |
+          +- machine-lxc\x2dcontainer3.scope
+    
+ +
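+ A similar view of this hierarchy can be obtained on a running systemd host with the systemd-cgls command, which prints the slice / scope tree along with the processes in each cgroup:
+# systemd-cgls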

Non-systemd cgroups layout

+ +

+ On hosts which do not use systemd, each consumer has a corresponding cgroup + named $VMNAME.libvirt-{qemu,lxc}. Each consumer is associated + with exactly one partition, which also has a corresponding cgroup usually + named $PARTNAME.partition. The exceptions to this naming rule + are the three top-level default partitions, named /system (for + system services), /user (for user login sessions) and + /machine (for virtual machines and containers). By default + every consumer will of course be associated with the /machine + partition. +

+ +
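+ For example, a QEMU guest named vm1 kept in the default /machine partition would end up with a cgroup directory such as the following under each controller that libvirt uses (path shown for a typical /sys/fs/cgroup mount layout):
+# ls -d /sys/fs/cgroup/cpu,cpuacct/machine/vm1.libvirt-qemu
+/sys/fs/cgroup/cpu,cpuacct/machine/vm1.libvirt-qemu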

+ Given this, a possible non-systemd cgroups layout involving 3 qemu guests, + 3 lxc containers and 2 custom child partitions, would be:

@@ -87,23 +191,21 @@ $ROOT
       |   +- vcpu0
       |   +- vcpu1
       |
-      +- container1.libvirt-lxc
-      |
-      +- container2.libvirt-lxc
+      +- engineering.partition
+      |   |
+      |   +- testing.partition
+      |   |   |
+      |   |   +- container1.libvirt-lxc
+      |   |
+      |   +- production.partition
+      |       |
+      |       +- container2.libvirt-lxc
       |
-      +- container3.libvirt-lxc
+      +- marketing.partition
+          |
+          +- container3.libvirt-lxc
     
-

- The default cgroups layout ensures that, when there is contention for - CPU time, it is shared equally between system services, user sessions - and virtual machines / containers. This prevents virtual machines from - locking the administrator out of the host, or impacting execution of - system services. Conversely, when there is no contention from - system services / user sessions, it is possible for virtual machines - to fully utilize the host CPUs. -

-

Using custom partitions

@@ -127,12 +229,54 @@ $ROOT

+ Note that the partition names in the guest XML use a + generic naming format, not the low-level naming convention + required by the underlying host OS. That is, you should not include + any of the .partition or .slice + suffixes in the XML config. Given a partition name + /machine/production, libvirt will automatically + apply the platform-specific translation required to get + /machine/production.partition (non-systemd) + or /machine.slice/machine-production.slice + (systemd) as the underlying cgroup name. +

+ +
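+ For instance, a guest configured to run in the /machine/production partition would carry a resource element along these lines in its domain XML (demo is just a placeholder guest name):
+# virsh dumpxml demo | grep -A 2 '<resource>'
+  <resource>
+    <partition>/machine/production</partition>
+  </resource>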

Libvirt will not auto-create the cgroups directory to back this partition. In the future, libvirt / virsh will provide APIs / commands to create custom partitions, but currently - this is left as an exercise for the administrator. For - example, given the XML config above, the admin would need - to create a cgroup named '/machine/production.partition' + this is left as an exercise for the administrator. +

+ +

+ Note: the ability to place guests in custom + partitions is only available with libvirt >= 1.0.5, using + the new cgroup layout. The legacy cgroups layout described + later in this document did not support customization per guest. +

+ +

Creating custom partitions (systemd)

+ +

+ Given the XML config above, the admin on a systemd-based host would + need to create a unit file /etc/systemd/system/machine-production.slice: +

+ +
+# cat > /etc/systemd/system/machine-production.slice <<EOF
+[Unit]
+Description=VM production slice
+Before=slices.target
+Wants=machine.slice
+EOF
+# systemctl start machine-production.slice
+    
+ +
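+ Once started, the state of the slice can be verified with systemctl; guests placed in the /machine/production partition will then have their scope cgroups created beneath it:
+# systemctl status machine-production.slice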

Creating custom partitions (non-systemd)

+ +

+ Given the XML config above, the admin on a non-systemd-based host + would need to create a cgroup named '/machine/production.partition':

@@ -147,18 +291,6 @@ $ROOT
   done
 
-

- Note: the cgroups directory created as a ".partition" - suffix, but the XML config does not require this suffix. -

- -

- Note: the ability to place guests in custom - partitions is only available with libvirt >= 1.0.5, using - the new cgroup layout. The legacy cgroups layout described - later did not support customization per guest. -

-

Resource management APIs/commands

-- 1.8.4.5