Blame 0049-daxctl-Add-Soft-Reservation-theory-of-operation.patch

From 8f4e42c0c526e85b045fd0329df7cb904f511c98 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams@intel.com>
Date: Thu, 7 Oct 2021 14:59:53 -0700
Subject: [PATCH 049/217] daxctl: Add "Soft Reservation" theory of operation

As systems are starting to ship memory with the EFI "Special Purpose"
attribute that Linux optionally turns into "Soft Reserved" ranges one of
the immediate first questions is "where is my special memory, and how do
I access it". Add some documentation to explain the default behaviour of
"Soft Reserved".

Link: https://lore.kernel.org/r/163364399303.201290.6835215953983673447.stgit@dwillia2-desk3.amr.corp.intel.com
Reported-by: John Groves <john@jagalactic.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 .../daxctl/daxctl-reconfigure-device.txt      | 127 ++++++++++++------
 1 file changed, 88 insertions(+), 39 deletions(-)

diff --git a/Documentation/daxctl/daxctl-reconfigure-device.txt b/Documentation/daxctl/daxctl-reconfigure-device.txt
index f112b3c..132684c 100644
--- a/Documentation/daxctl/daxctl-reconfigure-device.txt
+++ b/Documentation/daxctl/daxctl-reconfigure-device.txt
@@ -12,6 +12,94 @@ SYNOPSIS
 [verse]
 'daxctl reconfigure-device' <dax0.0> [<dax1.0>...<daxY.Z>] [<options>]
 
+DESCRIPTION
+-----------
+
+Reconfigure the operational mode of a dax device. This can be used to convert
+a regular 'devdax' mode device to the 'system-ram' mode which arranges for the
+dax range to be hot-plugged into the system as regular memory.
+
+NOTE: This is a destructive operation. Any data on the dax device *will* be
+lost.
+
+NOTE: Device reconfiguration depends on the dax-bus device model. See
+linkdaxctl:daxctl-migrate-device-model[1] for more information. If dax-class is
+in use (via the dax_pmem_compat driver), the reconfiguration will fail with an
+error such as the following:
+----
+# daxctl reconfigure-device --mode=system-ram --region=0 all
+libdaxctl: daxctl_dev_disable: dax3.0: error: device model is dax-class
+dax3.0: disable failed: Operation not supported
+error reconfiguring devices: Operation not supported
+reconfigured 0 devices
+----
+
+'daxctl-reconfigure-device' nominally expects that it will online new memory
+blocks as 'movable', so that kernel data doesn't make it into this memory.
+However, there are other potential agents that may be configured to
+automatically online new hot-plugged memory as it appears. Most notably,
+these are the '/sys/devices/system/memory/auto_online_blocks' configuration,
+or system udev rules. If such an agent races to online memory sections, daxctl
+checks if the blocks were onlined as 'movable' memory. If this was not the
+case, and the memory blocks are found to be in a different zone, then a
+warning is displayed. If it is desired that a different agent control the
+onlining of memory blocks, and the associated memory zone, then it is
+recommended to use the --no-online option described below. This will abridge
+the device reconfiguration operation to just hotplugging the memory, and
+refrain from then onlining it.
+
+In case daxctl detects that there is a kernel policy to auto-online blocks
+(via /sys/devices/system/memory/auto_online_blocks), then reconfiguring to
+system-ram will result in a failure. This can be overridden with '--force'.
+
+
+THEORY OF OPERATION
+-------------------
+The kernel device-dax subsystem surfaces character devices
+that provide DAX-access (direct mappings sans page-cache buffering) to a
+given memory region. The devices are named /dev/daxX.Y where X is a
+region-id and Y is an instance-id within that region. There are 2
+mechanisms that trigger device-dax instances to appear:
+
+1. Persistent Memory (PMEM) namespace configured in "devdax" mode. See
+"ndctl create-namespace --help" and
+https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dax/Kconfig[CONFIG_DEV_DAX_PMEM].
+In this case the device-dax instance is statically sized to its host
+memory region which is bounded to the physical address range of the host
+namespace.
+
+2. Soft Reserved memory enumerated by platform firmware. On EFI systems
+this is communicated via the so-called EFI_MEMORY_SP "Special Purpose"
+attribute. See
+https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dax/Kconfig[CONFIG_DEV_DAX_HMEM].
+In this case the device-dax instance(s) associated with the given memory
+region can be resized and divided into multiple devices.
+
+In the Soft Reservation case the expectation for EFI + ACPI based
+platforms is that in addition to the EFI_MEMORY_SP attribute the
+firmware also creates distinct ACPI proximity domains for any address
+range that has different performance characteristics than default
+"System RAM". So, the SRAT will define the proximity domain, the SLIT
+communicates relative distance to other proximity domains, and the HMAT
+is populated with nominal read/write latency and read/write bandwidth
+data. That HMAT data is emitted to the kernel log on bootup, and also
+exported to sysfs. See
+https://www.kernel.org/doc/html/latest/admin-guide/mm/numaperf.html[NUMAPERF],
+for the runtime representation of CPU to Memory node performance
+details.
+
+Outside of the NUMA performance details linked above the other method to
+detect the presence of "Soft Reserved" memory is to dump /proc/iomem and
+look for "Soft Reserved" ranges. If the kernel was not built with
+CONFIG_EFI_SOFTRESERVE, predates the introduction of
+CONFIG_EFI_SOFTRESERVE (v5.5), or was booted with the efi=nosoftreserve
+command line then device-dax will not attach and the expectation is that
+the memory shows up as a memory-only NUMA node. Otherwise the memory
+shows up as a device-dax instance and DAXCTL(1) can be used to
+optionally partition it and assign the memory back to the kernel as
+"System RAM", or the device can be mapped directly as the back end of a
+userspace memory allocator like https://pmem.io/vmem/libvmem/[LIBVMEM].
+
 EXAMPLES
 --------
 
@@ -83,45 +171,6 @@ reconfigured 1 device
 reconfigured 1 device
 ----
 
-DESCRIPTION
------------
-
-Reconfigure the operational mode of a dax device. This can be used to convert
-a regular 'devdax' mode device to the 'system-ram' mode which arranges for the
-dax range to be hot-plugged into the system as regular memory.
-
-NOTE: This is a destructive operation. Any data on the dax device *will* be
-lost.
-
-NOTE: Device reconfiguration depends on the dax-bus device model. See
-linkdaxctl:daxctl-migrate-device-model[1] for more information. If dax-class is
-in use (via the dax_pmem_compat driver), the reconfiguration will fail with an
-error such as the following:
-----
-# daxctl reconfigure-device --mode=system-ram --region=0 all
-libdaxctl: daxctl_dev_disable: dax3.0: error: device model is dax-class
-dax3.0: disable failed: Operation not supported
-error reconfiguring devices: Operation not supported
-reconfigured 0 devices
-----
-
-'daxctl-reconfigure-device' nominally expects that it will online new memory
-blocks as 'movable', so that kernel data doesn't make it into this memory.
-However, there are other potential agents that may be configured to
-automatically online new hot-plugged memory as it appears. Most notably,
-these are the '/sys/devices/system/memory/auto_online_blocks' configuration,
-or system udev rules. If such an agent races to online memory sections, daxctl
-checks if the blocks were onlined as 'movable' memory. If this was not the
-case, and the memory blocks are found to be in a different zone, then a
-warning is displayed. If it is desired that a different agent control the
-onlining of memory blocks, and the associated memory zone, then it is
-recommended to use the --no-online option described below. This will abridge
-the device reconfiguration operation to just hotplugging the memory, and
-refrain from then onlining it.
-
-In case daxctl detects that there is a kernel policy to auto-online blocks
-(via /sys/devices/system/memory/auto_online_blocks), then reconfiguring to
-system-ram will result in a failure. This can be overridden with '--force'.
 
 OPTIONS
 -------
-- 
2.27.0