From bd86e4e5fd283179e97ef07354d822afbf21b7dd Mon Sep 17 00:00:00 2001
Message-Id: <bd86e4e5fd283179e97ef07354d822afbf21b7dd.1387382496.git.minovotn@redhat.com>
In-Reply-To: <c5386144fbf09f628148101bc674e2421cdd16e3.1387382496.git.minovotn@redhat.com>
References: <c5386144fbf09f628148101bc674e2421cdd16e3.1387382496.git.minovotn@redhat.com>
From: Nigel Croxon <ncroxon@redhat.com>
Date: Thu, 14 Nov 2013 22:52:48 +0100
Subject: [PATCH 12/46] rdma: update documentation to reflect new unpin
 support

RH-Author: Nigel Croxon <ncroxon@redhat.com>
Message-id: <1384469598-13137-13-git-send-email-ncroxon@redhat.com>
Patchwork-id: 55702
O-Subject: [RHEL7.0 PATCH 12/42] rdma: update documentation to reflect new unpin support
Bugzilla: 1011720
RH-Acked-by: Orit Wasserman <owasserm@redhat.com>
RH-Acked-by: Amit Shah <amit.shah@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Bugzilla: 1011720
https://bugzilla.redhat.com/show_bug.cgi?id=1011720

>From commit ID:
commit a5f56b906e0d7975b87dc3d3c5bfe5a75a4028d2
Author: Michael R. Hines <mrhines@us.ibm.com>
Date:   Mon Jul 22 10:01:51 2013 -0400

    rdma: update documentation to reflect new unpin support

    As requested, the protocol now includes memory unpinning support.
    This has been implemented in a non-optimized manner, in such a way
    that one could devise an LRU or other workload-specific information
    on top of the basic mechanism to influence the way unpinning happens
    during runtime.

    The feature is not yet user-facing, and thus can only be enabled
    at compile-time.

    Reviewed-by: Eric Blake <eblake@redhat.com>
    Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
    Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 docs/rdma.txt |   51 ++++++++++++++++++++++++++++++---------------------
 1 files changed, 30 insertions(+), 21 deletions(-)

Signed-off-by: Michal Novotny <minovotn@redhat.com>
---
 docs/rdma.txt | 51 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 30 insertions(+), 21 deletions(-)

diff --git a/docs/rdma.txt b/docs/rdma.txt
index 45a4b1d..45d1c8a 100644
--- a/docs/rdma.txt
+++ b/docs/rdma.txt
@@ -35,7 +35,7 @@ memory tracked during each live migration iteration round cannot keep pace
 with the rate of dirty memory produced by the workload.
 
 RDMA currently comes in two flavors: both Ethernet based (RoCE, or RDMA
-over Convered Ethernet) as well as Infiniband-based. This implementation of
+over Converged Ethernet) as well as Infiniband-based. This implementation of
 migration using RDMA is capable of using both technologies because of
 the use of the OpenFabrics OFED software stack that abstracts out the
 programming model irrespective of the underlying hardware.
@@ -188,9 +188,9 @@ header portion and a data portion (but together are transmitted
 as a single SEND message).
 
 Header:
-    * Length  (of the data portion, uint32, network byte order)
-    * Type    (what command to perform, uint32, network byte order)
-    * Repeat  (Number of commands in data portion, same type only)
+    * Length               (of the data portion, uint32, network byte order)
+    * Type                 (what command to perform, uint32, network byte order)
+    * Repeat               (Number of commands in data portion, same type only)
 
 The 'Repeat' field is here to support future multiple page registrations
 in a single message without any need to change the protocol itself
@@ -202,17 +202,19 @@ The maximum number of repeats is hard-coded to 4096. This is a conservative
 limit based on the maximum size of a SEND message along with emperical
 observations on the maximum future benefit of simultaneous page registrations.
 
-The 'type' field has 10 different command values:
-    1. Unused
-    2. Error              (sent to the source during bad things)
-    3. Ready              (control-channel is available)
-    4. QEMU File          (for sending non-live device state)
-    5. RAM Blocks request (used right after connection setup)
-    6. RAM Blocks result  (used right after connection setup)
-    7. Compress page      (zap zero page and skip registration)
-    8. Register request   (dynamic chunk registration)
-    9. Register result    ('rkey' to be used by sender)
-    10. Register finished  (registration for current iteration finished)
+The 'type' field has 12 different command values:
+     1. Unused
+     2. Error                      (sent to the source during bad things)
+     3. Ready                      (control-channel is available)
+     4. QEMU File                  (for sending non-live device state)
+     5. RAM Blocks request         (used right after connection setup)
+     6. RAM Blocks result          (used right after connection setup)
+     7. Compress page              (zap zero page and skip registration)
+     8. Register request           (dynamic chunk registration)
+     9. Register result            ('rkey' to be used by sender)
+    10. Register finished          (registration for current iteration finished)
+    11. Unregister request         (unpin previously registered memory)
+    12. Unregister finished        (confirmation that unpin completed)
 
 A single control message, as hinted above, can contain within the data
 portion an array of many commands of the same type. If there is more than
@@ -243,7 +245,7 @@ qemu_rdma_exchange_send(header, data, optional response header & data):
    from the receiver to tell us that the receiver
    is *ready* for us to transmit some new bytes.
 2. Optionally: if we are expecting a response from the command
-   (that we have no yet transmitted), let's post an RQ
+   (that we have not yet transmitted), let's post an RQ
    work request to receive that data a few moments later.
 3. When the READY arrives, librdmacm will
    unblock us and we immediately post a RQ work request
@@ -293,8 +295,10 @@ librdmacm provides the user with a 'private data' area to be exchanged
 at connection-setup time before any infiniband traffic is generated.
 
 Header:
-    * Version (protocol version validated before send/recv occurs), uint32, network byte order
-    * Flags   (bitwise OR of each capability), uint32, network byte order
+    * Version (protocol version validated before send/recv occurs),
+                                               uint32, network byte order
+    * Flags   (bitwise OR of each capability),
+                                               uint32, network byte order
 
 There is no data portion of this header right now, so there is
 no length field. The maximum size of the 'private data' section
@@ -313,7 +317,7 @@ If the version is invalid, we throw an error.
 If the version is new, we only negotiate the capabilities that the
 requested version is able to perform and ignore the rest.
 
-Currently there is only *one* capability in Version #1: dynamic page registration
+Currently there is only one capability in Version #1: dynamic page registration
 
 Finally: Negotiation happens with the Flags field: If the primary-VM
 sets a flag, but the destination does not support this capability, it
@@ -326,8 +330,8 @@ QEMUFileRDMA Interface:
 
 QEMUFileRDMA introduces a couple of new functions:
 
-1. qemu_rdma_get_buffer()  (QEMUFileOps rdma_read_ops)
-2. qemu_rdma_put_buffer()  (QEMUFileOps rdma_write_ops)
+1. qemu_rdma_get_buffer()               (QEMUFileOps rdma_read_ops)
+2. qemu_rdma_put_buffer()               (QEMUFileOps rdma_write_ops)
 
 These two functions are very short and simply use the protocol
 describe above to deliver bytes without changing the upper-level
@@ -413,3 +417,8 @@ TODO:
    the use of KSM and ballooning while using RDMA.
 4. Also, some form of balloon-device usage tracking would also
    help alleviate some issues.
+5. Move UNREGISTER requests to a separate thread.
+6. Use LRU to provide more fine-grained direction of UNREGISTER
+   requests for unpinning memory in an overcommitted environment.
+7. Expose UNREGISTER support to the user by way of workload-specific
+   hints about application behavior.
-- 
1.7.11.7