From bd86e4e5fd283179e97ef07354d822afbf21b7dd Mon Sep 17 00:00:00 2001
Message-Id: <bd86e4e5fd283179e97ef07354d822afbf21b7dd.1387382496.git.minovotn@redhat.com>
In-Reply-To: <c5386144fbf09f628148101bc674e2421cdd16e3.1387382496.git.minovotn@redhat.com>
References: <c5386144fbf09f628148101bc674e2421cdd16e3.1387382496.git.minovotn@redhat.com>
From: Nigel Croxon <ncroxon@redhat.com>
Date: Thu, 14 Nov 2013 22:52:48 +0100
Subject: [PATCH 12/46] rdma: update documentation to reflect new unpin
 support

RH-Author: Nigel Croxon <ncroxon@redhat.com>
Message-id: <1384469598-13137-13-git-send-email-ncroxon@redhat.com>
Patchwork-id: 55702
O-Subject: [RHEL7.0 PATCH 12/42] rdma: update documentation to reflect new unpin support
Bugzilla: 1011720
RH-Acked-by: Orit Wasserman <owasserm@redhat.com>
RH-Acked-by: Amit Shah <amit.shah@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Bugzilla: 1011720
https://bugzilla.redhat.com/show_bug.cgi?id=1011720

>From commit ID:
commit a5f56b906e0d7975b87dc3d3c5bfe5a75a4028d2
Author: Michael R. Hines <mrhines@us.ibm.com>
Date:   Mon Jul 22 10:01:51 2013 -0400

    rdma: update documentation to reflect new unpin support

    As requested, the protocol now includes memory unpinning support.
    This has been implemented in a non-optimized manner, in such a way
    that one could devise an LRU or other workload-specific information
    on top of the basic mechanism to influence the way unpinning happens
    during runtime.

    The feature is not yet user-facing, and thus can only be enabled
    at compile-time.

    Reviewed-by: Eric Blake <eblake@redhat.com>
    Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
    Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 docs/rdma.txt |   51 ++++++++++++++++++++++++++++++---------------------
 1 files changed, 30 insertions(+), 21 deletions(-)

Signed-off-by: Michal Novotny <minovotn@redhat.com>
---
 docs/rdma.txt | 51 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 30 insertions(+), 21 deletions(-)

diff --git a/docs/rdma.txt b/docs/rdma.txt
index 45a4b1d..45d1c8a 100644
--- a/docs/rdma.txt
+++ b/docs/rdma.txt
@@ -35,7 +35,7 @@ memory tracked during each live migration iteration round cannot keep pace
 with the rate of dirty memory produced by the workload.
 
 RDMA currently comes in two flavors: both Ethernet based (RoCE, or RDMA
-over Convered Ethernet) as well as Infiniband-based. This implementation of
+over Converged Ethernet) as well as Infiniband-based. This implementation of
 migration using RDMA is capable of using both technologies because of
 the use of the OpenFabrics OFED software stack that abstracts out the
 programming model irrespective of the underlying hardware.
@@ -188,9 +188,9 @@ header portion and a data portion (but together are transmitted
 as a single SEND message).
 
 Header:
-    * Length  (of the data portion, uint32, network byte order)
-    * Type    (what command to perform, uint32, network byte order)
-    * Repeat  (Number of commands in data portion, same type only)
+    * Length               (of the data portion, uint32, network byte order)
+    * Type                 (what command to perform, uint32, network byte order)
+    * Repeat               (Number of commands in data portion, same type only)
 
 The 'Repeat' field is here to support future multiple page registrations
 in a single message without any need to change the protocol itself
@@ -202,17 +202,19 @@ The maximum number of repeats is hard-coded to 4096. This is a conservative
 limit based on the maximum size of a SEND message along with emperical
 observations on the maximum future benefit of simultaneous page registrations.
 
-The 'type' field has 10 different command values:
-    1. Unused
-    2. Error              (sent to the source during bad things)
-    3. Ready              (control-channel is available)
-    4. QEMU File          (for sending non-live device state)
-    5. RAM Blocks request (used right after connection setup)
-    6. RAM Blocks result  (used right after connection setup)
-    7. Compress page      (zap zero page and skip registration)
-    8. Register request   (dynamic chunk registration)
-    9. Register result    ('rkey' to be used by sender)
-    10. Register finished  (registration for current iteration finished)
+The 'type' field has 12 different command values:
+     1. Unused
+     2. Error                      (sent to the source during bad things)
+     3. Ready                      (control-channel is available)
+     4. QEMU File                  (for sending non-live device state)
+     5. RAM Blocks request         (used right after connection setup)
+     6. RAM Blocks result          (used right after connection setup)
+     7. Compress page              (zap zero page and skip registration)
+     8. Register request           (dynamic chunk registration)
+     9. Register result            ('rkey' to be used by sender)
+    10. Register finished          (registration for current iteration finished)
+    11. Unregister request         (unpin previously registered memory)
+    12. Unregister finished        (confirmation that unpin completed)
 
 A single control message, as hinted above, can contain within the data
 portion an array of many commands of the same type. If there is more than
@@ -243,7 +245,7 @@ qemu_rdma_exchange_send(header, data, optional response header & data):
    from the receiver to tell us that the receiver
    is *ready* for us to transmit some new bytes.
 2. Optionally: if we are expecting a response from the command
-   (that we have no yet transmitted), let's post an RQ
+   (that we have not yet transmitted), let's post an RQ
    work request to receive that data a few moments later.
 3. When the READY arrives, librdmacm will
    unblock us and we immediately post a RQ work request
@@ -293,8 +295,10 @@ librdmacm provides the user with a 'private data' area to be exchanged
 at connection-setup time before any infiniband traffic is generated.
 
 Header:
-    * Version (protocol version validated before send/recv occurs), uint32, network byte order
-    * Flags   (bitwise OR of each capability), uint32, network byte order
+    * Version (protocol version validated before send/recv occurs),
+                                               uint32, network byte order
+    * Flags   (bitwise OR of each capability),
+                                               uint32, network byte order
 
 There is no data portion of this header right now, so there is
 no length field. The maximum size of the 'private data' section
@@ -313,7 +317,7 @@ If the version is invalid, we throw an error.
 If the version is new, we only negotiate the capabilities that the
 requested version is able to perform and ignore the rest.
 
-Currently there is only *one* capability in Version #1: dynamic page registration
+Currently there is only one capability in Version #1: dynamic page registration
 
 Finally: Negotiation happens with the Flags field: If the primary-VM
 sets a flag, but the destination does not support this capability, it
@@ -326,8 +330,8 @@ QEMUFileRDMA Interface:
 
 QEMUFileRDMA introduces a couple of new functions:
 
-1. qemu_rdma_get_buffer()  (QEMUFileOps rdma_read_ops)
-2. qemu_rdma_put_buffer()  (QEMUFileOps rdma_write_ops)
+1. qemu_rdma_get_buffer()               (QEMUFileOps rdma_read_ops)
+2. qemu_rdma_put_buffer()               (QEMUFileOps rdma_write_ops)
 
 These two functions are very short and simply use the protocol
 describe above to deliver bytes without changing the upper-level
@@ -413,3 +417,8 @@ TODO:
    the use of KSM and ballooning while using RDMA.
 4. Also, some form of balloon-device usage tracking would also
    help alleviate some issues.
+5. Move UNREGISTER requests to a separate thread.
+6. Use LRU to provide more fine-grained direction of UNREGISTER
+   requests for unpinning memory in an overcommitted environment.
+7. Expose UNREGISTER support to the user by way of workload-specific
+   hints about application behavior.
-- 
1.7.11.7