Tree - rpms/resource-agents - CentOS Git server

rpms / resource-agents

Blame SOURCES/bz1718219-podman-4-use-exec-to-avoid-performance-issues.patch

		b4b3ce	`From 6016283dfdcb45bf750f96715fc653a4c0904bca Mon Sep 17 00:00:00 2001`
		b4b3ce	`From: Damien Ciabrini <dciabrin@redhat.com>`
		b4b3ce	`Date: Fri, 28 Jun 2019 13:34:40 +0200`
		b4b3ce	`Subject: [PATCH] podman: only use exec to manage container's lifecycle`
		b4b3ce
		b4b3ce	`Under heavy IO load, podman may be impacted and take a long time`
		b4b3ce	`to execute some actions. If that takes more than the default`
		b4b3ce	`20s container monitoring timeout, containers will restart unexpectedly.`
		b4b3ce
		b4b3ce	`Replace all IO-sensitive podman calls (inspect, exists...) with`
		b4b3ce	`equivalent "podman exec" calls, because the latter command seems`
		b4b3ce	`less prone to performance degradation under IO load.`
		b4b3ce
		b4b3ce	`With this commit, the resource agent now requires podman 1.0.2+,`
		b4b3ce	`because it relies on two different patches [1,2] that improve`
		b4b3ce	`IO performance and make it possible to distinguish "container stopped"`
		b4b3ce	`from "container doesn't exist" error codes.`
		b4b3ce
		b4b3ce	`Tested on an OpenStack environment with podman 1.0.2, with the`
		b4b3ce	`following scenario:`
		b4b3ce	`. regular start/stop/monitor operations`
		b4b3ce	`. probe operations (pcs resource cleanup/refresh)`
		b4b3ce	`. unmanage/manage operations`
		b4b3ce	`. reboot`
		b4b3ce
		b4b3ce	`[1] https://github.com/containers/libpod/commit/90b835db69d589de559462d988cb3fae5cf1ef49`
		b4b3ce	`[2] https://github.com/containers/libpod/commit/a19975f96d2ee7efe186d9aa0be42285cfafa3f4`
		b4b3ce	`---`
		b4b3ce	`heartbeat/podman | 75 ++++++++++++++++++++++++------------------------`
		b4b3ce	`1 file changed, 37 insertions(+), 38 deletions(-)`
		b4b3ce
		b4b3ce	`diff --git a/heartbeat/podman b/heartbeat/podman`
		b4b3ce	`index 51f6ba883..8fc2c4695 100755`
		b4b3ce	`--- a/heartbeat/podman`
		b4b3ce	`+++ b/heartbeat/podman`
		b4b3ce	`@@ -129,9 +129,6 @@ the health of the container. This command must return 0 to indicate that`
		b4b3ce	`the container is healthy. A non-zero return code will indicate that the`
		b4b3ce	`container has failed and should be recovered.`
		b4b3ce
		b4b3ce	`-If 'podman exec' is supported, it is used to execute the command. If not,`
		b4b3ce	`-nsenter is used.`
		b4b3ce	`-`
		b4b3ce	`Note: Using this method for monitoring processes inside a container`
		b4b3ce	`is not recommended, as containerd tries to track processes running`
		b4b3ce	`inside the container and does not deal well with many short-lived`
		b4b3ce	`@@ -192,17 +189,13 @@ monitor_cmd_exec()`
		b4b3ce	`local rc=$OCF_SUCCESS`
		b4b3ce	`local out`
		b4b3ce
		b4b3ce	`- if [ -z "$OCF_RESKEY_monitor_cmd" ]; then`
		b4b3ce	`- return $rc`
		b4b3ce	`- fi`
		b4b3ce	`-`
		b4b3ce	`out=$(podman exec ${CONTAINER} $OCF_RESKEY_monitor_cmd 2>&1)`
		b4b3ce	`rc=$?`
		b4b3ce	`- if [ $rc -eq 127 ]; then`
		b4b3ce	`- ocf_log err "monitor cmd failed (rc=$rc), output: $out"`
		b4b3ce	`- ocf_exit_reason "monitor_cmd, ${OCF_RESKEY_monitor_cmd} , not found within container."`
		b4b3ce	`- # there is no recovering from this, exit immediately`
		b4b3ce	`- exit $OCF_ERR_ARGS`
		b4b3ce	`+ # 125: no container with name or ID ${CONTAINER} found`
		b4b3ce	`+ # 126: container state improper (not running)`
		b4b3ce	`+ # 127: any other error`
		b4b3ce	`+ if [ $rc -eq 125 ] || [ $rc -eq 126 ]; then`
		b4b3ce	`+ rc=$OCF_NOT_RUNNING`
		b4b3ce	`elif [ $rc -ne 0 ]; then`
		b4b3ce	`ocf_exit_reason "monitor cmd failed (rc=$rc), output: $out"`
		b4b3ce	`rc=$OCF_ERR_GENERIC`
		b4b3ce	`@@ -215,7 +208,16 @@ monitor_cmd_exec()`
		b4b3ce
		b4b3ce	`container_exists()`
		b4b3ce	`{`
		b4b3ce	`- podman inspect --format {{.State.Running}} $CONTAINER | egrep '(true|false)' >/dev/null 2>&1`
		b4b3ce	`+ local rc`
		b4b3ce	`+ local out`
		b4b3ce	`+`
		b4b3ce	`+ out=$(podman exec ${CONTAINER} $OCF_RESKEY_monitor_cmd 2>&1)`
		b4b3ce	`+ rc=$?`
		b4b3ce	`+ # 125: no container with name or ID ${CONTAINER} found`
		b4b3ce	`+ if [ $rc -ne 125 ]; then`
		b4b3ce	`+ return 0`
		b4b3ce	`+ fi`
		b4b3ce	`+ return 1`
		b4b3ce	`}`
		b4b3ce
		b4b3ce	`remove_container()`
		b4b3ce	`@@ -236,30 +238,30 @@ remove_container()`
		b4b3ce
		b4b3ce	`podman_simple_status()`
		b4b3ce	`{`
		b4b3ce	`- local val`
		b4b3ce	`-`
		b4b3ce	`- # retrieve the 'Running' attribute for the container`
		b4b3ce	`- val=$(podman inspect --format {{.State.Running}} $CONTAINER 2>/dev/null)`
		b4b3ce	`- if [ $? -ne 0 ]; then`
		b4b3ce	`- #not running as a result of container not being found`
		b4b3ce	`- return $OCF_NOT_RUNNING`
		b4b3ce	`- fi`
		b4b3ce	`+ local rc`
		b4b3ce
		b4b3ce	`- if ocf_is_true "$val"; then`
		b4b3ce	`- # container exists and is running`
		b4b3ce	`- return $OCF_SUCCESS`
		b4b3ce	`+ # simple status is implemented via podman exec`
		b4b3ce	`+ # everything besides success is considered "not running"`
		b4b3ce	`+ monitor_cmd_exec`
		b4b3ce	`+ rc=$?`
		b4b3ce	`+ if [ $rc -ne $OCF_SUCCESS ]; then`
		b4b3ce	`+ rc=$OCF_NOT_RUNNING;`
		b4b3ce	`fi`
		b4b3ce	`-`
		b4b3ce	`- return $OCF_NOT_RUNNING`
		b4b3ce	`+ return $rc`
		b4b3ce	`}`
		b4b3ce
		b4b3ce	`podman_monitor()`
		b4b3ce	`{`
		b4b3ce	`- if [ -z "$OCF_RESKEY_monitor_cmd" ]; then`
		b4b3ce	`- podman_simple_status`
		b4b3ce	`- return $?`
		b4b3ce	`- fi`
		b4b3ce	`+ # We rely on running podman exec to monitor the container`
		b4b3ce	`+ # state because that command seems to be less prone to`
		b4b3ce	`+ # performance issue under IO load.`
		b4b3ce	`+ #`
		b4b3ce	`+ # For probes to work, we expect cmd_exec to be able to report`
		b4b3ce	`+ # when a container is not running. Here, we're not interested`
		b4b3ce	`+ # in distinguishing whether it's stopped or non existing`
		b4b3ce	`+ # (there's function container_exists for that)`
		b4b3ce	`monitor_cmd_exec`
		b4b3ce	`+ return $?`
		b4b3ce	`}`
		b4b3ce
		b4b3ce	`podman_create_mounts() {`
		b4b3ce	`@@ -416,14 +418,6 @@ podman_validate()`
		b4b3ce	`exit $OCF_ERR_CONFIGURED`
		b4b3ce	`fi`
		b4b3ce
		b4b3ce	`- if [ -n "$OCF_RESKEY_monitor_cmd" ]; then`
		b4b3ce	`- podman exec --help >/dev/null 2>&1`
		b4b3ce	`- if [ ! $? ]; then`
		b4b3ce	`- ocf_log info "checking for nsenter, which is required when 'monitor_cmd' is specified"`
		b4b3ce	`- check_binary nsenter`
		b4b3ce	`- fi`
		b4b3ce	`- fi`
		b4b3ce	`-`
		b4b3ce	`image_exists`
		b4b3ce	`if [ $? -ne 0 ]; then`
		b4b3ce	`ocf_exit_reason "base image, ${OCF_RESKEY_image}, could not be found."`
		b4b3ce	`@@ -457,6 +451,11 @@ fi`
		b4b3ce
		b4b3ce	`CONTAINER=$OCF_RESKEY_name`
		b4b3ce
		b4b3ce	`+# Note: we currently monitor podman containers with the "podman exec"`
		b4b3ce	`+# command, so make sure that invocation is always valid by enforcing the`
		b4b3ce	`+# exec command to be non-empty`
		b4b3ce	`+: ${OCF_RESKEY_monitor_cmd:=/bin/true}`
		b4b3ce	`+`
		b4b3ce	`case $__OCF_ACTION in`
		b4b3ce	`meta-data) meta_data`
		b4b3ce	`exit $OCF_SUCCESS;;`

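The exit-code handling that the patch adds to monitor_cmd_exec(), together with the default it assigns to OCF_RESKEY_monitor_cmd, can be sketched in isolation as follows. This is only an illustrative sketch, not code from the resource agent: the helper name `map_exec_rc` is hypothetical, and the OCF_* values are hard-coded here on the assumption that they match the standard OCF return codes (the real agent gets them from the OCF shell function library).

```shell
#!/bin/sh
# Assumed standard OCF return codes; the real agent sources these
# from the OCF shell function library instead of defining them.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical helper (not part of the patch): translate the exit
# status of "podman exec" into an OCF return code, following the
# mapping described in the patch comments:
#   125: no container with that name or ID was found
#   126: container state improper (not running)
#   anything else non-zero: generic failure
map_exec_rc() {
    case "$1" in
        0)       echo "$OCF_SUCCESS" ;;
        125|126) echo "$OCF_NOT_RUNNING" ;;
        *)       echo "$OCF_ERR_GENERIC" ;;
    esac
}

# Same idiom as the patch: if no monitor command is configured,
# fall back to /bin/true so the "podman exec" invocation is never empty.
unset OCF_RESKEY_monitor_cmd
: "${OCF_RESKEY_monitor_cmd:=/bin/true}"
```

In the patched agent this mapping is done inline right after `podman exec` runs. podman_simple_status() then goes one step further and reports every non-success result as OCF_NOT_RUNNING.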