From f3b6556eedcc1f8278e45ed809b06f243fe99ced Mon Sep 17 00:00:00 2001
Message-Id: <f3b6556eedcc1f8278e45ed809b06f243fe99ced.1386348946.git.jdenemar@redhat.com>
From: "Daniel P. Berrange" <berrange@redhat.com>
Date: Mon, 2 Dec 2013 13:36:29 +0000
Subject: [PATCH] Improve cgroups docs to cover systemd integration

For

  https://bugzilla.redhat.com/show_bug.cgi?id=1004340

As of libvirt 1.1.1 and systemd 205, the cgroups layout used by
libvirt has some changes. Update the 'cgroups.html' file from
the website to describe how it works in a systemd world.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit 7f2b173febaefda73b486337b6c53f5c2127070f)
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
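The name translations documented by this patch can be illustrated with a
small shell sketch. The helper names (scope_name, slice_cgroup_path,
partition_cgroup_path) are hypothetical and are not part of libvirt or
systemd. The sketch escapes only the '-' character, and it assumes that a
partition path starts at one of the default top level partitions such as
/machine.

```shell
#!/bin/sh
# Illustrative helpers only: hypothetical names, not libvirt or systemd APIs.

# scope_name DRIVER GUEST
# Builds machine-$NAME.scope where $NAME is "DRIVER-GUEST" with every "-"
# replaced by the systemd escape "\x2d". Other reserved characters are
# not handled in this sketch.
scope_name() {
    escaped=$(printf '%s\n' "$1-$2" | sed 's/-/\\x2d/g')
    printf 'machine-%s.scope\n' "$escaped"
}

# slice_cgroup_path PARTITION
# Systemd hosts: every slice name carries the names of all its parents,
# e.g. /machine/production -> /machine.slice/machine-production.slice
# The body runs in a subshell so the IFS change stays local.
slice_cgroup_path() (
    IFS=/
    set -f
    path=''
    prefix=''
    for part in ${1#/}; do
        prefix="${prefix:+$prefix-}$part"
        path="$path/$prefix.slice"
    done
    printf '%s\n' "$path"
)

# partition_cgroup_path PARTITION
# Non-systemd hosts: the top level default partition keeps its plain name,
# every nested partition gets a ".partition" suffix,
# e.g. /machine/production -> /machine/production.partition
partition_cgroup_path() (
    IFS=/
    set -f
    path=''
    for part in ${1#/}; do
        if [ -z "$path" ]; then
            path="/$part"
        else
            path="$path/$part.partition"
        fi
    done
    printf '%s\n' "$path"
)
```

For example, "scope_name lxc demo" yields the scope name used in the docs
for the lxc guest "demo", and "slice_cgroup_path /machine/production"
yields the systemd cgroup path quoted in the custom partitions section.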
 docs/cgroups.html.in | 212 +++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 172 insertions(+), 40 deletions(-)

diff --git a/docs/cgroups.html.in b/docs/cgroups.html.in
index 77656b2..f7c2450 100644
--- a/docs/cgroups.html.in
+++ b/docs/cgroups.html.in
@@ -47,17 +47,121 @@
     
       As of libvirt 1.0.5 or later, the cgroups layout created by libvirt has been
       simplified, in order to facilitate the setup of resource control policies by
-      administrators / management applications. The layout is based on the concepts of
-      "partitions" and "consumers". Each virtual machine or container is a consumer,
-      and has a corresponding cgroup named $VMNAME.libvirt-{qemu,lxc}.
-      Each consumer is associated with exactly one partition, which also have a
-      corresponding cgroup usually named $PARTNAME.partition. The
-      exceptions to this naming rule are the three top level default partitions,
-      named /system (for system services), /user (for
-      user login sessions) and /machine (for virtual machines and
-      containers). By default every consumer will of course be associated with
-      the /machine partition. This leads to a hierarchy that looks
-      like
+      administrators / management applications. The new layout is based on the concepts
+      of "partitions" and "consumers". A "consumer" is a cgroup which holds the
+      processes for a single virtual machine or container. A "partition" is a cgroup
+      which does not contain any processes, but can have resource controls applied.
+      A "partition" will have zero or more child directories which may be either
+      "consumer" or "partition".
+    
+
+    
+      As of libvirt 1.1.1 or later, the cgroups layout will have some slight
+      differences when running on a host with systemd 205 or later. The overall
+      tree structure is the same, but there are some differences in the naming
+      conventions for the cgroup directories. Thus the following docs are split
+      in two, one describing systemd hosts and the other non-systemd hosts.
+    
+
+    Systemd cgroups integration
+
+    
+      On hosts which use systemd, each consumer maps to a systemd scope unit,
+      while each partition maps to a systemd slice unit.
+    
+
+    Systemd scope naming
+
+    
+      The systemd convention is for the scope name of virtual machines / containers
+      to be of the general format machine-$NAME.scope. Libvirt forms the
+      $NAME part of this by concatenating the driver type with the name
+      of the guest, and then escaping any systemd reserved characters.
+      So for a guest demo running under the lxc driver,
+      we get a $NAME of lxc-demo which when escaped is
+      lxc\x2ddemo. So the complete scope name is machine-lxc\x2ddemo.scope.
+      The scope names map directly to the cgroup directory names.
+    
+
+    Systemd slice naming
+
+    
+      The systemd convention for slice naming is that a slice should include the
+      names of all of its parents prepended to its own name. So for a libvirt
+      partition /machine/engineering/testing, the slice name will
+      be machine-engineering-testing.slice. Again the slice names
+      map directly to the cgroup directory names. Systemd creates three top level
+      slices by default, system.slice, user.slice and
+      machine.slice. All virtual machines or containers created
+      by libvirt will be associated with machine.slice by default.
+    
+
+    Systemd cgroup layout
+
+    
+      Given this, a possible systemd cgroups layout involving 3 qemu guests,
+      3 lxc containers and 4 custom child slices, would be:
+    
+
+    
+$ROOT
+  |
+  +- system.slice
+  |   |
+  |   +- libvirtd.service
+  |
+  +- machine.slice
+      |
+      +- machine-qemu\x2dvm1.scope
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- machine-qemu\x2dvm2.scope
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- machine-qemu\x2dvm3.scope
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- machine-engineering.slice
+      |   |
+      |   +- machine-engineering-testing.slice
+      |   |   |
+      |   |   +- machine-lxc\x2dcontainer1.scope
+      |   |
+      |   +- machine-engineering-production.slice
+      |       |
+      |       +- machine-lxc\x2dcontainer2.scope
+      |
+      +- machine-marketing.slice
+          |
+          +- machine-lxc\x2dcontainer3.scope
+    
+
+    Non-systemd cgroups layout
+
+    
+      On hosts which do not use systemd, each consumer has a corresponding cgroup
+      named $VMNAME.libvirt-{qemu,lxc}. Each consumer is associated
+      with exactly one partition, which also has a corresponding cgroup usually
+      named $PARTNAME.partition. The exceptions to this naming rule
+      are the three top level default partitions, named /system (for
+      system services), /user (for user login sessions) and
+      /machine (for virtual machines and containers). By default
+      every consumer will of course be associated with the /machine
+      partition.
+    
+
+    
+      Given this, a possible non-systemd cgroups layout involving 3 qemu guests,
+      3 lxc containers and 4 custom child partitions, would be:
     
 
     
@@ -87,23 +191,21 @@ $ROOT
       |   +- vcpu0
       |   +- vcpu1
       |
-      +- container1.libvirt-lxc
-      |
-      +- container2.libvirt-lxc
+      +- engineering.partition
+      |   |
+      |   +- testing.partition
+      |   |   |
+      |   |   +- container1.libvirt-lxc
+      |   |
+      |   +- production.partition
+      |       |
+      |       +- container2.libvirt-lxc
       |
-      +- container3.libvirt-lxc
+      +- marketing.partition
+          |
+          +- container3.libvirt-lxc
     
 
-    
-      The default cgroups layout ensures that, when there is contention for
-      CPU time, it is shared equally between system services, user sessions
-      and virtual machines / containers. This prevents virtual machines from
-      locking the administrator out of the host, or impacting execution of
-      system services. Conversely, when there is no contention from
-      system services / user sessions, it is possible for virtual machines
-      to fully utilize the host CPUs.
-    
-
     Using custom partitions
 
     
@@ -127,12 +229,54 @@ $ROOT
     
 
     
+      Note that the partition names in the guest XML use a
+      generic naming format, not the low level naming convention
+      required by the underlying host OS. That is, you should not include
+      any of the .partition or .slice
+      suffixes in the XML config. Given a partition name
+      /machine/production, libvirt will automatically
+      apply the platform specific translation required to get
+      /machine/production.partition (non-systemd)
+      or /machine.slice/machine-production.slice
+      (systemd) as the underlying cgroup name.
+    
+
+    
       Libvirt will not auto-create the cgroups directory to back
       this partition. In the future, libvirt / virsh will provide
       APIs / commands to create custom partitions, but currently
-      this is left as an exercise for the administrator. For
-      example, given the XML config above, the admin would need
-      to create a cgroup named '/machine/production.partition'
+      this is left as an exercise for the administrator.
+    
+
+    
+      Note: the ability to place guests in custom
+      partitions is only available with libvirt >= 1.0.5, using
+      the new cgroup layout. The legacy cgroups layout described
+      later in this document did not support customization per guest.
+    
+
+    Creating custom partitions (systemd)
+
+    
+      Given the XML config above, the admin on a systemd based host would
+      need to create a unit file /etc/systemd/system/machine-production.slice
+    
+
+    
+# cat > /etc/systemd/system/machine-production.slice <<EOF
+[Unit]
+Description=VM production slice
+Before=slices.target
+Wants=machine.slice
+EOF
+# systemctl start machine-production.slice
+    
+
+    Creating custom partitions (non-systemd)
+
+    
+      Given the XML config above, the admin on a non-systemd based host
+      would need to create a cgroup named '/machine/production.partition'
     
 
     
@@ -147,18 +291,6 @@ $ROOT
   done
 
 
-    
-      Note: the cgroups directory created as a ".partition"
-      suffix, but the XML config does not require this suffix.
-    
-
-    
-      Note: the ability to place guests in custom
-      partitions is only available with libvirt >= 1.0.5, using
-      the new cgroup layout. The legacy cgroups layout described
-      later did not support customization per guest.
-    
-
     Resource management APIs/commands
 
     
-- 
1.8.4.5