From 3dbe5903235efabcaf438f5e5f526946dfbdf661 Mon Sep 17 00:00:00 2001
From: Christopher Faulet
Date: Wed, 2 May 2018 12:12:45 +0200
Subject: [PATCH] BUG/MINOR: checks: Fix check->health computation for flapping
 servers

This patch fixes an old bug introduced in the commit 7b1d47ce ("MAJOR: checks:
move health checks changes to set_server_check_status()"). When a DOWN server
is flapping, every time a check succeeds, check->health is incremented. But
when a check fails, it is decremented only when it is greater than or equal to
the rise value. So if only one check succeeds for a DOWN server, check->health
will remain set to 1 for all subsequent failing checks.

At first glance, this does not seem that terrible because the server remains
DOWN. But it is reported in the transitional state "DOWN server, going up",
and it will remain in this state until it is UP again. There is also an
insidious side effect. If a DOWN server flaps from time to time, it will end
up being considered UP after a single successful check, regardless of the rise
threshold, because check->health is increased slowly and never decreased.

To fix the bug, we just need to reset check->health to 0 when a check fails
for a DOWN server. To do so, we just need to relax the condition that handles
a failure in the function set_server_check_status().

This patch must be backported to haproxy 1.5 and newer.

(cherry picked from commit b119a79fc336f2b6074de1c3113b1682c717985c)
Signed-off-by: Willy Tarreau
(cherry picked from commit edb5a1efd22eb9918574d962640cd2ae3bb45ad3)
Signed-off-by: William Lallemand
(cherry picked from commit 6d2f7fb1531a446dcf609e1340a1c1e40e907a39)
Signed-off-by: Willy Tarreau
---
 src/checks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/checks.c b/src/checks.c
index 27a23b21..fcd85aba 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -247,7 +247,7 @@ static void set_server_check_status(struct check *check, short status, const cha
 		 */
 		if ((!(check->state & CHK_ST_AGENT) ||
 		    (check->status >= HCHK_STATUS_L57DATA)) &&
-		    (check->health >= check->rise)) {
+		    (check->health > 0)) {
 			s->counters.failed_checks++;
 			report = 1;
 			check->health--;
-- 
2.29.2
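
Note: the flapping scenario described in the commit message can be reproduced
with a small standalone model. The sketch below is not HAProxy code. The
function first_up() and its parameters are hypothetical names made up for
illustration, and it assumes (as the commit message implies, but the hunk does
not show) that the failure branch resets health to 0 once it drops below rise.
It only models a server that starts DOWN with health 0.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of check->health for a server that starts DOWN.
 * results[i] is true when check number i succeeds. Returns the index of
 * the check that brings the server UP (health reaches rise), or -1 if
 * the server is still DOWN after n checks.
 */
int first_up(const bool *results, int n, int rise, bool patched)
{
	int health = 0;

	for (int i = 0; i < n; i++) {
		if (results[i]) {
			/* A success always raises health. */
			health++;
			if (health >= rise)
				return i;
		} else if (patched ? (health > 0) : (health >= rise)) {
			/* Old condition: never true while DOWN, so health sticks.
			 * New condition: true as soon as health is non-zero. */
			health--;
			if (health < rise)
				health = 0; /* assumed reset, see note above */
		}
		/* While DOWN, health stays within [0, rise). */
		assert(health >= 0 && health < rise);
	}
	return -1;
}
```

With rise set to 3 and the sequence success, fail, fail repeated three times,
the unpatched condition lets the model go UP on the third isolated success,
while the patched condition keeps it DOWN. Three consecutive successes bring
it UP in both cases.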