From 204c560922ebdd84cad601d13f0474dce23ae72d Mon Sep 17 00:00:00 2001
From: Fabian Arrotin
Date: May 13 2022 07:14:58 +0000
Subject: Adding pointers and notes to add new etcd nodes to ocp cluster

Signed-off-by: Fabian Arrotin

---

diff --git a/docs/operations/ci/adding_nodes.md b/docs/operations/ci/adding_nodes.md
index e84e688..9e2a511 100644
--- a/docs/operations/ci/adding_nodes.md
+++ b/docs/operations/ci/adding_nodes.md
@@ -1,9 +1,8 @@
-# Adding Compute/Worker nodes 
+# Adding Compute/Worker nodes
 
 This SOP should be used in the following scenario:
 
 - Red Hat OpenShift Container Platform 4.x cluster has been installed some time ago (1+ days ago) and additional worker nodes are required to increase the capacity for the cluster.
 
-
 ## Steps
 
 1. Add the new nodes being added to the cluster to the appropriate inventory file in the appropriate group.
@@ -35,7 +34,6 @@
 newnode9.example.centos.org
 newnode10.example.centos.org
 ```
 
-
 2. Examine the `inventory` file for `ocp` or `ocp.stg` and determine which management node corresponds with the group `ocp-ci-management`.
 
 eg:
@@ -116,6 +114,42 @@
 ansible-playbook playbooks/role-haproxy.yml --tags="config"
 ```
 
 To see more information about adding new worker/compute nodes to a user provisioned infrastructure based OCP4 cluster see the detailed steps at [1],[2].
 
+# Adding/Replacing etcd/control plane nodes
+
+Depending on the scenario (just adding more control plane nodes, or installing a new one because one failed), you'll need to take some actions first (or not).
+
+## Deleting a dead node (hardware issue) from the cluster (only if needed)
+
+If you have an unrecoverable node and you don't even want to reinstall on the same node (same hostname/ip address/etc), you can start by following the [official doc](https://docs.openshift.com/container-platform/4.9/backup_and_restore/control_plane_backup_and_restore/replacing-unhealthy-etcd-member.html#restore-identify-unhealthy-etcd-member_replacing-unhealthy-etcd-member)
+
+So basically:
+
+ * review which node to remove from the etcd cluster (`oc get pods -n openshift-etcd | grep -v etcd-quorum-guard | grep etcd`)
+ * take a remote shell on one of the remaining etcd pods (`oc rsh -n openshift-etcd <etcd-pod>`)
+ * delete the dead member from the etcd cluster (`etcdctl member remove <member_id>`)
+ * remove the secrets for *that* node from openshift (`oc get secrets -n openshift-etcd | grep <node> | awk '{print $1}' | while read secret ; do oc delete secret -n openshift-etcd ${secret} ; done`)
+ * delete the node from openshift (`oc delete node <node>`), i.e. the one appearing as `NotReady` `master` through `oc get nodes`
+ * once all signed csr are processed, you should see activity through `oc get pods -n openshift-etcd` and some containers being created and finally appearing as `Ready`
+
 ### Resources
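
A minimal shell sketch of the member-removal flow described in the bullet list added by this patch. The pod name `etcd-master01.example.centos.org` and the member ID are illustrative placeholders, not values from the commit:

```
# Find the etcd pods: the pod of the dead node will not be Running/Ready
oc get pods -n openshift-etcd | grep -v etcd-quorum-guard | grep etcd

# Open a remote shell in one of the *healthy* etcd pods
oc rsh -n openshift-etcd etcd-master01.example.centos.org

# Inside that shell: list members to find the dead member's ID, then remove it
etcdctl member list -w table
etcdctl member remove 6fc1e7c9db35841d   # example ID, taken from the list output
```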
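
The secret-cleanup one-liner and the node deletion from the same list, expanded into a more readable form; `dead-node.example.centos.org` is a hypothetical node name:

```
NODE=dead-node.example.centos.org   # hypothetical dead node name

# Delete the etcd secrets that were issued for that node
oc get secrets -n openshift-etcd | grep "${NODE}" | awk '{print $1}' | \
  while read secret ; do oc delete secret -n openshift-etcd "${secret}" ; done

# The dead control plane node should report NotReady before you delete it
oc get nodes
oc delete node "${NODE}"
```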
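
For the final bullet, the pending CSRs from the (re)provisioned control plane node still need to be signed before the etcd containers appear; a sketch of that approval step using standard `oc` commands (not part of this commit):

```
# List certificate signing requests still waiting for approval
oc get csr | grep Pending

# Approve them all, once verified; approval usually happens in two rounds
# (client certificates first, then serving certificates)
oc get csr -o name | xargs oc adm certificate approve

# Watch the etcd pods on the new member being created and becoming Ready
oc get pods -n openshift-etcd -w
```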