From 63b84acdf5bd7971a7da3137a6aa609c71205625 Mon Sep 17 00:00:00 2001
From: Hans Dedecker <dedeckeh@gmail.com>
Date: Tue, 27 Jun 2017 22:08:47 +0100
Subject: [PATCH] Try other servers if first returns REFUSED when
 --strict-order active.

If a DNS server replies REFUSED to a query in strict order mode, no
failover to the next DNS server is triggered, because the failover logic
only covers non-strict mode.
As a result, the client receives the REFUSED reply without dnsmasq first
falling back to the secondary DNS server(s).

Make failover work in strict order mode as well when REFUSED is returned:
delete the strict order check and rely only on forwardall being equal to 0,
which is the case in non-strict mode when a single server has been
contacted, or whenever strict order mode has been configured.
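
The change to the REFUSED check can be sketched as a pair of predicates.
This is a simplified model, not the dnsmasq code: struct reply_state and
the function names are hypothetical, and only the four conditions visible
in the first hunk of the diff are modelled.

```c
#include <stdbool.h>

/* RCODE value for REFUSED as defined in RFC 1035. */
enum { RCODE_REFUSED = 5 };

/* Hypothetical, simplified view of the state consulted by the check. */
struct reply_state {
    int rcode;          /* response code of the upstream reply */
    bool strict_order;  /* --strict-order is active */
    int forwardall;     /* 0 when the query was not sent to all servers */
    bool has_extradata; /* extra EDNS0 data was sent with the query */
};

/* Check before the patch: strict order mode never fails over. */
static bool retry_before(const struct reply_state *s)
{
    return s->rcode == RCODE_REFUSED && !s->strict_order &&
           s->forwardall == 0 && !s->has_extradata;
}

/* Check after the patch: the strict order test is gone. */
static bool retry_after(const struct reply_state *s)
{
    return s->rcode == RCODE_REFUSED &&
           s->forwardall == 0 && !s->has_extradata;
}
```

With strict order active, a REFUSED reply and forwardall equal to 0, only
the second predicate asks for another server to be tried.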

(cherry picked from commit 9396752c115b3ab733fa476b30da73237e12e7ba)

Stop treating SERVFAIL as a successful response from upstream servers.

This effectively reverts most of 51967f9807 ("SERVFAIL is an expected
error return, don't try all servers.") and 4ace25c5d6 ("Treat REFUSED (not
SERVFAIL) as an unsuccessful upstream response").

With the current behaviour, as soon as dnsmasq receives a SERVFAIL from an
upstream server, it stops trying to resolve the query and simply returns
SERVFAIL to the client.  With this commit, dnsmasq will instead try to
query other upstream servers upon receiving a SERVFAIL response.

According to RFC 1034 and 1035, SERVFAIL denotes a temporary error
condition.  Recursive resolvers are expected to encounter network or
resource issues from time to time, and will respond with SERVFAIL in this
case.  Similarly, if a validating DNSSEC resolver [RFC 4033] encounters
issues when checking signatures (unknown signing algorithm, missing
signatures, expired signatures because of a wrong system clock, etc.), it
will respond with SERVFAIL.

Note that all those behaviours are entirely different from a negative
response, which would provide a definite indication that the requested
name does not exist.  In our case, if an upstream server responds with
SERVFAIL, another upstream server may well provide a positive answer for
the same query.

Thus, this commit will increase robustness whenever some upstream servers
encounter temporary issues or are misconfigured.
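
As a rough illustration, the patched condition from the second hunk of the
diff can be modelled as a small function.  The name accept_reply is
hypothetical, and the reading of forwardall as a counter of outstanding
replies is an assumption based on the comment in the diff ("Kill it when
we've had replies from all").

```c
#include <stdbool.h>

/* RCODE values as defined in RFC 1035. */
enum {
    RCODE_NOERROR = 0,
    RCODE_SERVFAIL = 2,
    RCODE_NXDOMAIN = 3,
    RCODE_REFUSED = 5
};

/*
 * Simplified model of the patched test: decide whether a reply is
 * processed and returned to the client.  The counter is decremented in
 * place, exactly as in the original short-circuit expression.
 */
static bool accept_reply(int *forwardall, int rcode)
{
    if (*forwardall == 0)
        return true;   /* query was not sent to all servers */
    if (--*forwardall == 1)
        return true;   /* last reply: use it whatever it says */
    /* Otherwise wait for a better reply if this one is an error. */
    return rcode != RCODE_REFUSED && rcode != RCODE_SERVFAIL;
}
```

In this model a SERVFAIL that arrives while other replies are still
pending is held back, whereas a NOERROR or NXDOMAIN reply is accepted
immediately.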

Quoting RFC 1034, Section 4.3.1. "Queries and responses":

    If recursive service is requested and available, the recursive response
    to a query will be one of the following:

       - The answer to the query, possibly preface by one or more CNAME
         RRs that specify aliases encountered on the way to an answer.

       - A name error indicating that the name does not exist.  This
         may include CNAME RRs that indicate that the original query
         name was an alias for a name which does not exist.

       - A temporary error indication.

Here is Section 5.2.3. of RFC 1034, "Temporary failures":

    In a less than perfect world, all resolvers will occasionally be unable
    to resolve a particular request.  This condition can be caused by a
    resolver which becomes separated from the rest of the network due to a
    link failure or gateway problem, or less often by coincident failure or
    unavailability of all servers for a particular domain.

And finally, RFC 1035 specifies RCODE 2 for this usage, which is now more
widely known as SERVFAIL (RFC 1035, Section 4.1.1. "Header section format"):

    RCODE           Response code - this 4 bit field is set as part of
                    responses.  The values have the following
                    interpretation:
                    (...)

                    2               Server failure - The name server was
                                    unable to process this query due to a
                                    problem with the name server.
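
For reference, a minimal sketch of reading that 4 bit field from a raw DNS
message.  The helper name dns_rcode is hypothetical; dnsmasq itself uses
its own RCODE() macro, as seen in the diff below.

```c
#include <stddef.h>
#include <stdint.h>

enum { DNS_HEADER_LEN = 12, RCODE_SERVFAIL = 2 };

/* Return the RCODE of a raw DNS message, or -1 if it is too short. */
static int dns_rcode(const uint8_t *msg, size_t len)
{
    if (msg == NULL || len < DNS_HEADER_LEN)
        return -1;
    /* Byte 3 of the header holds RA, Z and RCODE; RCODE is the low 4 bits. */
    return msg[3] & 0x0F;
}
```

A reply whose fourth header byte is 0x82 (RA set, RCODE 2) is therefore
reported as SERVFAIL.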

For the DNSSEC-related usage of SERVFAIL, here is RFC 4033
Section 5. "Scope of the DNSSEC Document Set and Last Hop Issues":

    A validating resolver can determine the following 4 states:
    (...)

    Insecure: The validating resolver has a trust anchor, a chain of
       trust, and, at some delegation point, signed proof of the
       non-existence of a DS record.  This indicates that subsequent
       branches in the tree are provably insecure.  A validating resolver
       may have a local policy to mark parts of the domain space as
       insecure.

    Bogus: The validating resolver has a trust anchor and a secure
       delegation indicating that subsidiary data is signed, but the
       response fails to validate for some reason: missing signatures,
       expired signatures, signatures with unsupported algorithms, data
       missing that the relevant NSEC RR says should be present, and so
       forth.
    (...)

    This specification only defines how security-aware name servers can
    signal non-validating stub resolvers that data was found to be bogus
    (using RCODE=2, "Server Failure"; see [RFC4035]).

Notice the difference between a definite negative answer ("Insecure"
state), and an indefinite error condition ("Bogus" state).  The second
type of error may be specific to a recursive resolver, for instance
because its system clock has been incorrectly set, or because it does not
implement newer cryptographic primitives.  Another recursive resolver may
succeed for the same query.

Other specifications prescribe behaviour similar to the one implemented
by this commit.

For instance, RFC 2136 specifies the behaviour of a "requestor" that wants
to update a zone using the DNS UPDATE mechanism.  The requestor tries to
contact all authoritative name servers for the zone, with the following
behaviour specified in RFC 2136, Section 4:

    4.6. If a response is received whose RCODE is SERVFAIL or NOTIMP, or
    if no response is received within an implementation dependent timeout
    period, or if an ICMP error is received indicating that the server's
    port is unreachable, then the requestor will delete the unusable
    server from its internal name server list and try the next one,
    repeating until the name server list is empty.  If the requestor runs
    out of servers to try, an appropriate error will be returned to the
    requestor's caller.

(cherry picked from commit 68f6312d4bae30b78daafcd6f51dc441b8685b1e)
---
 src/forward.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/forward.c b/src/forward.c
index 245c448..1bbb264 100644
--- a/src/forward.c
+++ b/src/forward.c
@@ -794,7 +794,6 @@ void reply_query(int fd, int family, time_t now)
   /* Note: if we send extra options in the EDNS0 header, we can't recreate
      the query from the reply. */
   if (RCODE(header) == REFUSED &&
-      !option_bool(OPT_ORDER) &&
       forward->forwardall == 0 &&
       !(forward->flags & FREC_HAS_EXTRADATA))
     /* for broken servers, attempt to send to another one. */
@@ -859,7 +858,8 @@ void reply_query(int fd, int family, time_t now)
      we get a good reply from another server. Kill it when we've
      had replies from all to avoid filling the forwarding table when
      everything is broken */
-  if (forward->forwardall == 0 || --forward->forwardall == 1 || RCODE(header) != REFUSED)
+  if (forward->forwardall == 0 || --forward->forwardall == 1 ||
+      (RCODE(header) != REFUSED && RCODE(header) != SERVFAIL))
     {
       int check_rebind = 0, no_cache_dnssec = 0, cache_secure = 0, bogusanswer = 0;
       
-- 
2.21.1