From afab54226927711c719e8012c43333273f581ac7 Mon Sep 17 00:00:00 2001
From: Numan Siddique <numans@ovn.org>
Date: Sat, 1 Feb 2020 16:23:43 +0530
Subject: [PATCH] ovn-northd: Address scale issues with DNAT flows.

When commit [1] added distributed NAT support to OVN, it didn't address
the requirement of making East/West NAT traffic distributed. The E/W NAT
traffic was still centralized. Later, a couple of patches [2] addressed this
requirement. But the approach taken in [2] adds flows for each pair of
dnat_and_snat entries, which results in a very large number of logical flows
as the number of dnat_and_snat entries increases, as reported in @Reported-at.

This patch
  - reverts the approach taken in [2].
  - removes the flows which do the NAT redirect (REGBIT_NAT_REDIRECT) to
    the gateway chassis.
  - and to solve the E/W centralized NAT it does the following:
     * Since for each NAT entry we know the MAC binding to be used for the
       external_ip (either the external_mac if set, or the MAC of the
       distributed gateway router port), this patch adds flows in the
       S_ROUTER_IN_ARP_RESOLVE stage to set eth.dst to that MAC if the
       IP destination is the external_ip.
     * The existing flows in the S_ROUTER_OUT_EGR_LOOP stage now carry an
       additional match, is_chassis_resident('P'), where 'P' is the
       logical_port of the NAT entry if set; otherwise it is the
       chassisredirect port of the distributed gateway router port. With
       this additional match, the packet is looped back to apply the
       unSNAT/DNAT rules on the relevant chassis.

Suppose a logical port 'P' with IP 'A' has a dnat_and_snat entry with
external_mac/logical_port set. If a packet's IP destination is one of the
DNAT IPs, the packet is sent out from the local chassis, since eth.dst is
resolved in the S_ROUTER_IN_ARP_RESOLVE stage. If external_mac/logical_port
is not set in the NAT entry, the packet is redirected to the gateway chassis.
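As a rough illustration of the two bullets above, the sketch below builds
the match and action strings for one NAT entry. It is not the ovn-northd
code: struct nat_sketch and the three helper functions are hypothetical
names, the port keys and MAC addresses are passed in by the caller, and
treating an entry as "distributed" only when it is a dnat_and_snat entry
with both external_mac and logical_port set is an assumption made for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified view of a NAT row (not the real nbrec_nat). */
struct nat_sketch {
    const char *type;          /* "snat", "dnat" or "dnat_and_snat" */
    const char *external_ip;
    const char *external_mac;  /* NULL when not set */
    const char *logical_port;  /* NULL when not set */
    bool is_v6;
};

/* Assumption for this sketch: an entry is handled on the local chassis
 * only when it is dnat_and_snat and has both optional columns set. */
static bool
nat_is_distributed(const struct nat_sketch *nat)
{
    return !strcmp(nat->type, "dnat_and_snat")
           && nat->external_mac && nat->logical_port;
}

/* ARP-resolve match: the next hop held in reg0 (xxreg0 for IPv6)
 * equals the NAT external_ip on the distributed gateway port. */
static int
arp_resolve_match(char *buf, size_t len, const char *gw_port_key,
                  const struct nat_sketch *nat)
{
    assert(buf && len > 0 && gw_port_key && nat);
    return snprintf(buf, len, "outport == %s && %s == %s", gw_port_key,
                    nat->is_v6 ? "xxreg0" : "reg0", nat->external_ip);
}

/* ARP-resolve action: external_mac for a distributed entry, otherwise
 * the MAC of the distributed gateway router port. */
static int
arp_resolve_action(char *buf, size_t len, const char *gw_port_mac,
                   const struct nat_sketch *nat)
{
    assert(buf && len > 0 && gw_port_mac && nat);
    return snprintf(buf, len, "eth.dst = %s; next;",
                    nat_is_distributed(nat) ? nat->external_mac
                                            : gw_port_mac);
}

/* Port 'P' used in is_chassis_resident('P') for the egress loopback
 * flow: the NAT logical_port, else the chassisredirect port. */
static const char *
loopback_resident_port(const struct nat_sketch *nat, const char *cr_port)
{
    return nat_is_distributed(nat) ? nat->logical_port : cr_port;
}
```

Each NAT entry yields a fixed, small number of strings here, independent of
how many other NAT entries exist, which is the property the patch relies on
to keep the flow count down.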

With this patch, for the logical resources reported in @Reported-at, the
number of logical flows comes down to around 45k from 650k.

[1] - ceacd9d49316("ovn: distributed NAT flows")

[2] - 551e3d989557("OVN: fix DVR Floating IP support")
      8244c6b6bd88("OVN: do not distribute traffic for local FIP")

Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-January/049714.html
Reported-by: Daniel Alvarez Sanchez <dalvarez@redhat.com>
Signed-off-by: Numan Siddique <numans@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Tested-By: Daniel Alvarez Sanchez <dalvarez@redhat.com>
Acked-By: Daniel Alvarez Sanchez <dalvarez@redhat.com>

(cherry-picked from upstream commit 2dc7869436de32205f60128172196b3a207ab265)
Conflicts:
	ovn/northd/ovn-northd.c

Change-Id: I7684c7f5114ba7c800293e843d5d4b856dedbb96
---
 ovn/northd/ovn-northd.8.xml | 191 ++++++++------------------
 ovn/northd/ovn-northd.c     | 263 +++++-------------------------------
 tests/ovn-northd.at         |   8 +-
 3 files changed, 98 insertions(+), 364 deletions(-)

diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index d94d9aef9..a42a67c19 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -1487,6 +1487,24 @@ next;
     
 
     
+      
+        
+          For each NAT entry of a distributed logical router  (with
+          distributed gateway router port) of type snat,
+          a priorirty-120 flow with the match inport == P
+          && ip4.src == A advances the packet to
+          the next pipeline, where P is the distributed logical
+          router port and A is the external_ip set
+          in the NAT entry. If A is an IPv6 address, then
+          ip6.src is used for the match.
+        
+
+        
+          The above flow is required to handle the routing of the East/west NAT
+          traffic.
+        
+      
+
       
         
           L3 admission control: A priority-100 flow drops packets that match
@@ -1977,21 +1995,6 @@ icmp6 {
           redirect-chassis.
         
 
-        
-          For each configuration in the OVN Northbound database, that asks
-          to change the source IP address of a packet from A to
-          B, a priority-50 flow matches
-          ip && ip4.dst == B or
-          ip && ip6.dst == B
-          with an action
-          REGBIT_NAT_REDIRECT = 1; next;.  This flow is for
-          east/west traffic to a NAT destination IPv4/IPv6 address.  By
-          setting the REGBIT_NAT_REDIRECT flag, in the
-          ingress table Gateway Redirect this will trigger a
-          redirect to the instance of the gateway port on the
-          redirect-chassis.
-        
-
         
           A priority-0 logical flow with match 1 has actions
           next;.
@@ -2147,20 +2150,6 @@ icmp6 {
           redirect-chassis.
         
 
-        
-          For each configuration in the OVN Northbound database, that asks
-          to change the destination IP address of a packet from A to
-          B, a priority-50 flow matches ip &&
-          ip4.dst == B or ip &&
-          ip6.dst == B with an action
-          REGBIT_NAT_REDIRECT = 1; next;.  This flow is for
-          east/west traffic to a NAT destination IPv4/IPv6 address.  By
-          setting the REGBIT_NAT_REDIRECT flag, in the
-          ingress table Gateway Redirect this will trigger a
-          redirect to the instance of the gateway port on the
-          redirect-chassis.
-        
-
         
           A priority-0 logical flow with match 1 has actions
           next;.
@@ -2285,54 +2274,6 @@ output;
         
       
 
-      
-        
-          For distributed logical routers where one of the logical router
-          ports specifies a redirect-chassis, a priority-400
-          logical flow for each ip source/destination couple that matches the
-          dnat_and_snat NAT rules configured. These flows will
-          allow to properly forward traffic to the external connections if
-          available and avoid sending it through the tunnel.
-          Assuming the two following NAT rules have been configured:
-        
-
-        
-external_ip{0,1} = EIP{0,1};
-external_mac{0,1} = MAC{0,1};
-logical_ip{0,1} = LIP{0,1};
-        
-
-        
-            the following action will be applied:
-        
-
-        
-eth.dst = MAC0;
-eth.src = MAC1;
-reg0 = ip4.dst; /* xxreg0 = ip6.dst; in the IPv6 case */
-reg1 = EIP1; /* xxreg1 in the IPv6 case */
-outport = redirect-chassis-port;
-REGBIT_DISTRIBUTED_NAT = 1; next;.
-        
-
-        
-            Morover a priority-400 logical flow is configured for each
-            dnat_and_snat NAT rule configured in order to
-            not send traffic for local FIP through the overlay tunnels
-            but manage it in the local hypervisor
-        
-      
-
-      
-        
-          For distributed logical routers where one of the logical router
-          ports specifies a redirect-chassis, a priority-300
-          logical flow with match REGBIT_NAT_REDIRECT == 1 has
-          actions ip.ttl--; next;.  The outport
-          will be set later in the Gateway Redirect table.
-        
-      
-
       
         
           IPv4 routing table.  For each route to IPv4 network N with
@@ -2427,23 +2368,6 @@ next;
         
       
 
-      
-        
-          For distributed logical routers where one of the logical router
-          ports specifies a redirect-chassis, a priority-400
-          logical flow with match REGBIT_DISTRIBUTED_NAT == 1
-          has action next;
-        
-        
-          For distributed logical routers where one of the logical router
-          ports specifies a redirect-chassis, a priority-200
-          logical flow with match REGBIT_NAT_REDIRECT == 1 has
-          actions eth.dst = E; next;, where
-          E is the ethernet address of the router's distributed
-          gateway port.
-        
-      
-
       
         
           Static MAC bindings.  MAC bindings can be known statically based on
@@ -2518,6 +2442,35 @@ next;
         
       
 
+      
+        
+          Static MAC bindings from NAT entries.  MAC bindings can also be known
+          for the entries in the NAT table. Below flows are
+          programmed for distributed logical routers i.e with a distributed
+          router port.
+        
+
+        
+          For each row in the NAT table with IPv4 address
+          A in the 
+          table="NAT" db="OVN_Northbound"/> column of
+          <ref table="NAT" db="OVN_Northbound"/> table, a priority-100
+          flow with the match outport === P &&
+          reg0 == A has actions eth.dst = E;
+          next;, where P is the distributed logical router
+          port, E is the Ethernet address if set in the
+          <ref column="external_mac" table="NAT" db="OVN_Northbound"/> column
+          of <ref table="NAT" db="OVN_Northbound"/> table for of type
+          dnat_and_snat, otherwise the Ethernet address of the
+          distributed logical router port.
+        
+
+        
+          For IPv6 NAT entries, same flows are added, but using the register
+          xxreg0 for the match.
+        
+      
+
       
         
           Dynamic MAC bindings.  These flows resolve MAC-to-IP bindings
@@ -2640,20 +2593,6 @@ icmp4 {
     
 
     
-      
-        A priority-300 logical flow with match
-        REGBIT_DISTRIBUTED_NAT == 1 has action
-        next;
-      
-      
-        A priority-200 logical flow with match
-        REGBIT_NAT_REDIRECT == 1 has actions
-        outport = CR; next;, where CR
-        is the chassisredirect port representing the instance
-        of the logical router distributed gateway port on the
-        redirect-chassis.
-      
-
       
         A priority-150 logical flow with match
         outport == GW &&
@@ -2945,19 +2884,6 @@ nd_ns {
       ports specifies a redirect-chassis.
     
 
-    
-      Earlier in the ingress pipeline, some east-west traffic was
-      redirected to the chassisredirect port, based on
-      flows in the UNSNAT and DNAT ingress
-      tables setting the REGBIT_NAT_REDIRECT flag, which
-      then triggered a match to a flow in the
-      Gateway Redirect ingress table.  The intention was
-      not to actually send traffic out the distributed gateway port
-      instance on the redirect-chassis.  This traffic was
-      sent to the distributed gateway port instance in order for DNAT
-      and/or SNAT processing to be applied.
-    
-
     
       While UNDNAT and SNAT processing have already occurred by this
       point, this traffic needs to be forced through egress loopback on
@@ -2973,23 +2899,20 @@ nd_ns {
 
     
       
-        
-          For each dnat_and_snat NAT rule couple in the
-          OVN Northbound database on a distributed router,
-          a priority-200 logical with match
-          ip4.dst == external_ip0 &&
-          ip4.src == external_ip1, has action
-          next;
-        
-
         
           For each NAT rule in the OVN Northbound database on a
           distributed router, a priority-100 logical flow with match
           ip4.dst == E &&
-          outport == GW, where E is the
-          external IP address specified in the NAT rule, and GW
-          is the logical router distributed gateway port, with the
-          following actions:
+          outport == GW &&
+          is_chassis_resident(P), where E is the
+          external IP address specified in the NAT rule, GW
+          is the logical router distributed gateway port. For dnat_and_snat
+          NAT rule, P is the logical port specified in the NAT rule.
+          If 
+          table="NAT" db="OVN_Northbound"/> column of
+          <ref table="NAT" db="OVN_Northbound"/> table is NOT set, then
+          P is the chassisredirect port of
+          GW with the following actions:
         
 
         
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 58213742d..e2df94f4b 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -205,17 +205,16 @@ enum ovn_stage {
 #define REGBIT_ND_RA_OPTS_RESULT "reg0[5]"
 
 /* Register definitions for switches and routers. */
-#define REGBIT_NAT_REDIRECT     "reg9[0]"
+
 /* Indicate that this packet has been recirculated using egress
  * loopback.  This allows certain checks to be bypassed, such as a
  * logical router dropping packets with source IP address equals
  * one of the logical router's own IP addresses. */
-#define REGBIT_EGRESS_LOOPBACK  "reg9[1]"
-#define REGBIT_DISTRIBUTED_NAT  "reg9[2]"
+#define REGBIT_EGRESS_LOOPBACK  "reg9[0]"
 /* Register to store the result of check_pkt_larger action. */
-#define REGBIT_PKT_LARGER        "reg9[3]"
-#define REGBIT_LOOKUP_NEIGHBOR_RESULT "reg9[4]"
-#define REGBIT_SKIP_LOOKUP_NEIGHBOR "reg9[5]"
+#define REGBIT_PKT_LARGER        "reg9[1]"
+#define REGBIT_LOOKUP_NEIGHBOR_RESULT "reg9[2]"
+#define REGBIT_SKIP_LOOKUP_NEIGHBOR "reg9[3]"
 
 #define FLAGBIT_NOT_VXLAN "flags[1] == 0"
 
@@ -6599,128 +6598,6 @@ build_routing_policy_flow(struct hmap *lflows, struct ovn_datapath *od,
     ds_destroy(&actions);
 }
 
-static void
-add_distributed_nat_routes(struct hmap *lflows, const struct ovn_port *op)
-{
-    struct ds actions = DS_EMPTY_INITIALIZER;
-    struct ds match = DS_EMPTY_INITIALIZER;
-
-    if (!op->od->l3dgw_port) {
-        return;
-    }
-
-    if (!op->peer || !op->peer->od->nbs) {
-        return;
-    }
-
-    for (size_t i = 0; i < op->od->nbr->n_nat; i++) {
-        const struct nbrec_nat *nat = op->od->nbr->nat[i];
-        bool found = false;
-        struct eth_addr mac;
-
-        if (strcmp(nat->type, "dnat_and_snat") ||
-                !nat->external_mac ||
-                !eth_addr_from_string(nat->external_mac, &mac) ||
-                !nat->external_ip || !nat->logical_port) {
-            continue;
-        }
-
-        const struct ovn_datapath *peer_dp = op->peer->od;
-        for (size_t j = 0; j < peer_dp->nbs->n_ports; j++) {
-            if (!strcmp(peer_dp->nbs->ports[j]->name, nat->logical_port)) {
-                found = true;
-                break;
-            }
-        }
-        if (!found) {
-            continue;
-        }
-
-        /* Determine if we need to create IPv4 or IPv6 flows */
-        ovs_be32 ip;
-        struct in6_addr ipv6;
-        int family = AF_INET;
-        if (!ip_parse(nat->external_ip, &ip) || !ip) {
-            family = AF_INET6;
-            if (!ipv6_parse(nat->external_ip, &ipv6)) {
-                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
-                VLOG_WARN_RL(&rl, "bad ip address %s in nat configuration "
-                             "for router %s", nat->external_ip, op->key);
-                /* We'll create IPv6 flows anyway, but the address
-                 * is probably bogus ... */
-            }
-        }
-
-        ds_put_format(&match, "inport == %s && "
-                      "ip%s.src == %s && ip%s.dst == %s",
-                       op->json_key,
-                       family == AF_INET ? "4" : "6",
-                       nat->logical_ip,
-                       family == AF_INET ? "4" : "6",
-                       nat->external_ip);
-        ds_put_format(&actions, "outport = %s; eth.dst = %s; "
-                      REGBIT_DISTRIBUTED_NAT" = 1; "
-                      REGBIT_NAT_REDIRECT" = 0; next;",
-                      op->od->l3dgw_port->json_key,
-                      nat->external_mac);
-        ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING, 400,
-                      ds_cstr(&match), ds_cstr(&actions));
-        ds_clear(&match);
-        ds_clear(&actions);
-
-        for (size_t j = 0; j < op->od->nbr->n_nat; j++) {
-            const struct nbrec_nat *nat2 = op->od->nbr->nat[j];
-            struct eth_addr mac2;
-
-            if (nat == nat2 || strcmp(nat2->type, "dnat_and_snat") ||
-                    !nat2->external_mac ||
-                    !eth_addr_from_string(nat2->external_mac, &mac2) ||
-                    !nat2->external_ip) {
-                continue;
-            }
-
-            family = AF_INET;
-            if (!ip_parse(nat2->external_ip, &ip) || !ip) {
-                family = AF_INET6;
-                if (!ipv6_parse(nat2->external_ip, &ipv6)) {
-                    static struct vlog_rate_limit rl =
-                        VLOG_RATE_LIMIT_INIT(5, 1);
-                    VLOG_WARN_RL(&rl, "bad ip address %s in nat configuration "
-                                 "for router %s", nat2->external_ip, op->key);
-                    /* We'll create IPv6 flows anyway, but the address
-                     * is probably bogus ... */
-                }
-            }
-
-            ds_put_format(&match, "inport == %s && "
-                          "ip%s.src == %s && ip%s.dst == %s",
-                          op->json_key,
-                          family == AF_INET ? "4" : "6",
-                          nat->logical_ip,
-                          family == AF_INET ? "4" : "6",
-                          nat2->external_ip);
-            ds_put_format(&actions, "outport = %s; "
-                          "eth.src = %s; eth.dst = %s; "
-                          "%sreg0 = ip%s.dst; %sreg1 = %s; "
-                          REGBIT_DISTRIBUTED_NAT" = 1; "
-                          REGBIT_NAT_REDIRECT" = 0; next;",
-                          op->od->l3dgw_port->json_key,
-                          op->od->l3dgw_port->lrp_networks.ea_s,
-                          nat2->external_mac,
-                          family == AF_INET ? "" : "xx",
-                          family == AF_INET ? "4" : "6",
-                          family == AF_INET ? "" : "xx",
-                          nat->external_ip);
-            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING, 400,
-                          ds_cstr(&match), ds_cstr(&actions));
-            ds_clear(&match);
-            ds_clear(&actions);
-        }
-    }
-    ds_destroy(&match);
-    ds_destroy(&actions);
-}
-
 static void
 add_route(struct hmap *lflows, const struct ovn_port *op,
           const char *lrp_addr_s, const char *network_s, int plen,
@@ -8144,17 +8021,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
 
                     ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 100,
                                   ds_cstr(&match), ds_cstr(&actions));
-
-                    /* Traffic received on other router ports must be
-                     * redirected to the central instance of the l3dgw_port
-                     * for NAT processing. */
-                    ds_clear(&match);
-                    ds_put_format(&match, "ip && ip%s.dst == %s",
-                                  is_v6 ? "6" : "4",
-                                  nat->external_ip);
-                    ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 50,
-                                  ds_cstr(&match),
-                                  REGBIT_NAT_REDIRECT" = 1; next;");
                 }
             }
 
@@ -8220,18 +8086,33 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
 
                     ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 100,
                                   ds_cstr(&match), ds_cstr(&actions));
+                }
+            }
 
-                    /* Traffic received on other router ports must be
-                     * redirected to the central instance of the l3dgw_port
-                     * for NAT processing. */
+            /* ARP resolve for NAT IPs. */
+            if (od->l3dgw_port) {
+                if (!strcmp(nat->type, "snat")) {
                     ds_clear(&match);
-                    ds_put_format(&match, "ip && ip%s.dst == %s",
-                                  is_v6 ? "6" : "4",
-                                  nat->external_ip);
-                    ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
-                                  ds_cstr(&match),
-                                  REGBIT_NAT_REDIRECT" = 1; next;");
+                    ds_put_format(
+                        &match, "inport == %s && %s == %s",
+                        od->l3dgw_port->json_key,
+                        is_v6 ? "ip6.src" : "ip4.src", nat->external_ip);
+                    ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_INPUT, 120,
+                                  ds_cstr(&match), "next;");
                 }
+
+                ds_clear(&match);
+                ds_put_format(
+                    &match, "outport == %s && %s == %s",
+                    od->l3dgw_port->json_key,
+                    is_v6 ? "xxreg0" : "reg0", nat->external_ip);
+                ds_clear(&actions);
+                ds_put_format(
+                    &actions, "eth.dst = %s; next;",
+                    distributed ? nat->external_mac :
+                    od->l3dgw_port->lrp_networks.ea_s);
+                ovn_lflow_add(lflows, od, S_ROUTER_IN_ARP_RESOLVE, 100,
+                                ds_cstr(&match), ds_cstr(&actions));
             }
 
             /* Egress UNDNAT table: It is for already established connections'
        bbaaef
        @@ -8378,49 +8259,19 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
        bbaaef
                      * ingress pipeline with inport = outport. */
        bbaaef
                     if (od->l3dgw_port) {
        bbaaef
                         /* Distributed router. */
        bbaaef
        -                if (!strcmp(nat->type, "dnat_and_snat") &&
        bbaaef
        -                        nat->external_mac && nat->external_ip &&
        bbaaef
        -                        eth_addr_from_string(nat->external_mac, &mac)) {
        bbaaef
        -                    for (int j = 0; j < od->nbr->n_nat; j++) {
        bbaaef
        -                        const struct nbrec_nat *nat2 = od->nbr->nat[j];
        bbaaef
        -
        bbaaef
        -                        if (nat2 == nat ||
        bbaaef
        -                            strcmp(nat2->type, "dnat_and_snat") ||
        bbaaef
        -                            !nat2->external_mac || !nat2->external_ip) {
        bbaaef
        -                            continue;
        bbaaef
        -                        }
        bbaaef
        -
        -                        ds_clear(&match);
        -                        ds_put_format(&match, "is_chassis_resident(\"%s\") && "
        -                                      "ip%s.src == %s && ip%s.dst == %s",
        -                                      nat->logical_port,
        -                                      is_v6 ? "6" : "4", nat2->external_ip,
        -                                      is_v6 ? "6" : "4", nat->external_ip);
        -                        ds_clear(&actions);
        -                        ds_put_format(&actions,
        -                                      "inport = outport; outport = \"\"; "
        -                                      "flags = 0; flags.loopback = 1; "
        -                                      REGBIT_EGRESS_LOOPBACK" = 1; "
        -                                      "next(pipeline=ingress, table=0); ");
        -                        ovn_lflow_add(lflows, od, S_ROUTER_OUT_EGR_LOOP, 300,
        -                                      ds_cstr(&match),  ds_cstr(&actions));
        -
        -                        ds_clear(&match);
        -                        ds_put_format(&match,
        -                                      "ip%s.src == %s && ip%s.dst == %s",
        -                                      is_v6 ? "6" : "4", nat2->external_ip,
        -                                      is_v6 ? "6" : "4", nat->external_ip);
        -                        ovn_lflow_add(lflows, od, S_ROUTER_OUT_EGR_LOOP, 200,
        -                                      ds_cstr(&match), "next;");
        -                        ds_clear(&match);
        -                    }
        -                }
        -
                         ds_clear(&match);
                         ds_put_format(&match, "ip%s.dst == %s && outport == %s",
                                       is_v6 ? "6" : "4",
                                       nat->external_ip,
                                       od->l3dgw_port->json_key);
        +                if (!distributed) {
        +                    ds_put_format(&match, " && is_chassis_resident(%s)",
        +                                  od->l3redirect_port->json_key);
        +                } else {
        +                    ds_put_format(&match, " && is_chassis_resident(\"%s\")",
        +                                  nat->logical_port);
        +                }
        +
                         ds_clear(&actions);
                         ds_put_format(&actions,
                                       "clone { ct_clear; "
        @@ -8491,40 +8342,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                     * we can do it here, saving a future re-circulation. */
                     ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
                                   "ip", "flags.loopback = 1; ct_dnat;");
        -        } else {
        -            ovn_lflow_add(lflows, od, S_ROUTER_IN_ARP_RESOLVE, 400,
        -                          REGBIT_DISTRIBUTED_NAT" == 1", "next;");
        -
        -            /* For NAT on a distributed router, add flows to Ingress
        -             * IP Routing table, Ingress ARP Resolution table, and
        -             * Ingress Gateway Redirect Table that are not specific to a
        -             * NAT rule. */
        -
        -            /* The highest priority IN_IP_ROUTING rule matches packets
        -             * with REGBIT_NAT_REDIRECT (set in DNAT or UNSNAT stages),
        -             * with action "ip.ttl--; next;".  The IN_GW_REDIRECT table
        -             * will take care of setting the outport. */
        -            ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_ROUTING, 300,
        -                          REGBIT_NAT_REDIRECT" == 1", "ip.ttl--; next;");
        -
        -            /* The highest priority IN_ARP_RESOLVE rule matches packets
        -             * with REGBIT_NAT_REDIRECT (set in DNAT or UNSNAT stages),
        -             * then sets eth.dst to the distributed gateway port's
        -             * ethernet address. */
        -            ds_clear(&actions);
        -            ds_put_format(&actions, "eth.dst = %s; next;",
        -                          od->l3dgw_port->lrp_networks.ea_s);
        -            ovn_lflow_add(lflows, od, S_ROUTER_IN_ARP_RESOLVE, 200,
        -                          REGBIT_NAT_REDIRECT" == 1", ds_cstr(&actions));
        -
        -            /* The highest priority IN_GW_REDIRECT rule redirects packets
        -             * with REGBIT_NAT_REDIRECT (set in DNAT or UNSNAT stages) to
        -             * the central instance of the l3dgw_port for NAT processing. */
        -            ds_clear(&actions);
        -            ds_put_format(&actions, "outport = %s; next;",
        -                          od->l3redirect_port->json_key);
        -            ovn_lflow_add(lflows, od, S_ROUTER_IN_GW_REDIRECT, 200,
        -                          REGBIT_NAT_REDIRECT" == 1", ds_cstr(&actions));
                 }
         
                 /* Load balancing and packet defrag are only valid on
        @@ -8720,9 +8537,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                     continue;
                 }
         
        -        /* create logical flows for DVR floating IPs */
        -        add_distributed_nat_routes(lflows, op);
        -
                 for (int i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) {
                     add_route(lflows, op, op->lrp_networks.ipv4_addrs[i].addr_s,
                               op->lrp_networks.ipv4_addrs[i].network_s,
        @@ -9252,9 +9066,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                     continue;
                 }
                 if (od->l3dgw_port && od->l3redirect_port) {
        -            ovn_lflow_add(lflows, od, S_ROUTER_IN_GW_REDIRECT, 300,
        -                          REGBIT_DISTRIBUTED_NAT" == 1", "next;");
        -
                     /* For traffic with outport == l3dgw_port, if the
                      * packet did not match any higher priority redirect
                      * rule, then the traffic is redirected to the central
        diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
        index da566f900..3e4120ac5 100644
        --- a/tests/ovn-northd.at
        +++ b/tests/ovn-northd.at
        @@ -990,7 +990,7 @@ echo "CR-LRP UUID is: " $uuid
         # IPV4
         ovn-nbctl lr-nat-add R1 dnat_and_snat  172.16.1.1 50.0.0.11
         
        -OVS_WAIT_UNTIL([test 3 = `ovn-sbctl dump-flows R1 | grep lr_in_unsnat | \
        +OVS_WAIT_UNTIL([test 2 = `ovn-sbctl dump-flows R1 | grep lr_in_unsnat | \
         wc -l`])
         
         AT_CHECK([ovn-sbctl dump-flows R1 | grep ct_snat | wc -l], [0], [2
        @@ -1008,7 +1008,7 @@ AT_CHECK([ovn-sbctl dump-flows R1 | grep ip4.src=| wc -l], [0], [0
         ovn-nbctl lr-nat-del R1 dnat_and_snat  172.16.1.1
         
         ovn-nbctl --stateless lr-nat-add R1 dnat_and_snat  172.16.1.1 50.0.0.11
        -OVS_WAIT_UNTIL([test 3 = `ovn-sbctl dump-flows R1 | grep lr_in_unsnat | \
        +OVS_WAIT_UNTIL([test 2 = `ovn-sbctl dump-flows R1 | grep lr_in_unsnat | \
         wc -l`])
         
         AT_CHECK([ovn-sbctl dump-flows R1 | grep ct_snat | wc -l], [0], [0
        @@ -1027,7 +1027,7 @@ ovn-nbctl lr-nat-del R1 dnat_and_snat  172.16.1.1
         # IPV6
         ovn-nbctl lr-nat-add R1 dnat_and_snat fd01::1 fd11::2
         
        -OVS_WAIT_UNTIL([test 3 = `ovn-sbctl dump-flows R1 | grep lr_in_unsnat | \
        +OVS_WAIT_UNTIL([test 2 = `ovn-sbctl dump-flows R1 | grep lr_in_unsnat | \
         wc -l`])
         
         AT_CHECK([ovn-sbctl dump-flows R1 | grep ct_snat | wc -l], [0], [2
        @@ -1045,7 +1045,7 @@ AT_CHECK([ovn-sbctl dump-flows R1 | grep ip6.src=| wc -l], [0], [0
         ovn-nbctl lr-nat-del R1 dnat_and_snat  fd01::1
         ovn-nbctl --stateless lr-nat-add R1 dnat_and_snat fd01::1 fd11::2
         
        -OVS_WAIT_UNTIL([test 3 = `ovn-sbctl dump-flows R1 | grep lr_in_unsnat | \
        +OVS_WAIT_UNTIL([test 2 = `ovn-sbctl dump-flows R1 | grep lr_in_unsnat | \
         wc -l`])
         
         AT_CHECK([ovn-sbctl dump-flows R1 | grep ct_snat | wc -l], [0], [0
        -- 
        2.24.1