#1 add OCP4 docs
Merged 3 years ago by arrfab. Opened 3 years ago by siddharthvipul1.
centos/ siddharthvipul1/centos-infra-docs master  into  master

@@ -0,0 +1,17 @@ 

+ # Add user to jumphost

+ 

+ Jumphost users live in inventory file `inventory/ci-ssh-jumphosts`

+ 

+ * Add an entry for the user to the inventory in the following format:

+ ```

+ login_name: billgates

+ full_name: "Bill Gates | Microsoft loves linux"

+ ssh_pub_keys:

+   - "- - - - "

+ ```

+ * Ensure all the latest commits in the playbooks directory and inventory are pulled locally.

+ `git pull inventory/ && git pull playbooks`

+ * Use the playbook baseline-playbook to invoke the role role-baseline. We will limit

+   the run to just the ci-ssh-jumphosts group.

+ `ansible-playbook playbooks/role-baseline.yml --limit ci-ssh-jumphosts`

+ * Update the remote with the latest changes (see the sketch below).
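+ 

+ A minimal sketch of that last step, assuming the inventory and playbooks checkouts referenced above:

+ 

+ ```

+ git add inventory/ci-ssh-jumphosts

+ git commit -m "Add <login_name> to ci-ssh-jumphosts"

+ git push

+ ```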

@@ -0,0 +1,49 @@ 

+ # Adding users to the cluster admin group

+ To add cluster admin privileges to a particular user do the following.

+ 

+ When a user authenticates to the Openshift cluster via ACO, a User object is automatically created within Openshift. eg:

+ 

+ ```

+ kind: User

+ apiVersion: user.openshift.io/v1

+ metadata:

+   name: email@address.com

+ ...

+ ```

+ 

+ Created a Group ocp-ci-admins, and added the following users. Each entry in the `users` list corresponds to the `metadata.name` of the corresponding User object.

+ 

+ ```

+ kind: Group

+ apiVersion: user.openshift.io/v1

+ metadata:

+   name: ocp-ci-admins

+   selfLink: /apis/user.openshift.io/v1/groups/ocp-ci-admins

+   uid: 24a5ad4d-7ee0-4e30-8f92-4b398ba5d389

+   resourceVersion: '6800501'

+   creationTimestamp: '2020-05-27T16:03:26Z'

+ users:

+   - email@address.com

+ ```

+ 

+ Added a ClusterRoleBinding to bind our Group ocp-ci-admins to the ClusterRole cluster-admin:

+ 

+ ```

+ kind: ClusterRoleBinding

+ apiVersion: rbac.authorization.k8s.io/v1

+ metadata:

+   name: ocp-ci-cluster-admins

+   selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/ocp-ci-cluster-admins

+   uid: 7979a53b-6597-4ec7-9d6c-53b5ab8004c7

+   resourceVersion: '6799178'

+   creationTimestamp: '2020-05-27T16:03:58Z'

+ subjects:

+   - kind: Group

+     apiGroup: rbac.authorization.k8s.io

+     name: ocp-ci-admins

+ roleRef:

+   apiGroup: rbac.authorization.k8s.io

+   kind: ClusterRole

+   name: cluster-admin

+ ```

+ 
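+ The same result can also be achieved from the CLI; a minimal sketch, assuming the group and example user above:

+ 

+ ```

+ # Create the group with an initial member (add further members with "oc adm groups add-users")

+ oc adm groups new ocp-ci-admins email@address.com

+ 

+ # Bind the cluster-admin ClusterRole to the group

+ oc adm policy add-cluster-role-to-group cluster-admin ocp-ci-admins

+ ```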

@@ -0,0 +1,106 @@ 

+ # SOP to Create duffy API/SSH keys

+ This SOP covers the process of creating an API key for duffy, and adding it to the duffy database table

+ 

+ 

+ ## Requirements

+ 

+ - project name

+ 

+ ## Duffy Database Schemas

+ 

+ ```

+ MariaDB [duffy]> show tables;

+ +-----------------+

+ | Tables_in_duffy |

+ +-----------------+

+ | alembic_version |

+ | session_archive |

+ | session_hosts   |

+ | sessions        |

+ | stock           |

+ | userkeys        |

+ | users           |

+ +-----------------+

+ 7 rows in set (0.00 sec)

+ 

+ MariaDB [duffy]> describe stock;

+ +--------------+--------------+------+-----+---------+-------+

+ | Field        | Type         | Null | Key | Default | Extra |

+ +--------------+--------------+------+-----+---------+-------+

+ | id           | int(11)      | NO   | PRI | NULL    |       |

+ | hostname     | varchar(20)  | YES  |     | NULL    |       |

+ | ip           | varchar(15)  | YES  |     | NULL    |       |

+ | chassis      | varchar(20)  | YES  |     | NULL    |       |

+ | used_count   | int(11)      | YES  |     | NULL    |       |

+ | state        | varchar(20)  | YES  |     | NULL    |       |

+ | comment      | varchar(255) | YES  |     | NULL    |       |

+ | distro       | varchar(20)  | YES  |     | NULL    |       |

+ | rel          | varchar(10)  | YES  |     | NULL    |       |

+ | ver          | varchar(10)  | YES  |     | NULL    |       |

+ | arch         | varchar(10)  | YES  |     | NULL    |       |

+ | pool         | int(11)      | YES  |     | NULL    |       |

+ | console_port | int(11)      | YES  |     | NULL    |       |

+ | flavor       | varchar(20)  | YES  |     | NULL    |       |

+ | session_id   | varchar(37)  | YES  | MUL | NULL    |       |

+ | next_state   | varchar(20)  | YES  |     | NULL    |       |

+ +--------------+--------------+------+-----+---------+-------+

+ 16 rows in set (0.01 sec)

+ 

+ MariaDB [duffy]> describe users;

+ +-------------+-------------+------+-----+---------+-------+

+ | Field       | Type        | Null | Key | Default | Extra |

+ +-------------+-------------+------+-----+---------+-------+

+ | apikey      | varchar(37) | NO   | PRI |         |       |

+ | projectname | varchar(50) | YES  |     | NULL    |       |

+ | jobname     | varchar(50) | YES  |     | NULL    |       |

+ | createdat   | date        | YES  |     | NULL    |       |

+ | limitnodes  | int(11)     | YES  |     | NULL    |       |

+ +-------------+-------------+------+-----+---------+-------+

+ 5 rows in set (0.00 sec)

+ 

+ MariaDB [duffy]> describe userkeys;

+ +------------+---------------+------+-----+---------+----------------+

+ | Field      | Type          | Null | Key | Default | Extra          |

+ +------------+---------------+------+-----+---------+----------------+

+ | id         | int(11)       | NO   | PRI | NULL    | auto_increment |

+ | project_id | varchar(37)   | YES  | MUL | NULL    |                |

+ | key        | varchar(8192) | YES  |     | NULL    |                |

+ +------------+---------------+------+-----+---------+----------------+

+ 3 rows in set (0.00 sec)

+ 

+ MariaDB [duffy]> 

+ 

+ ```

+ 

+ 

+ ```

+ +-----------+----------------------+----------------------+------------+-------------+

+ | apikey    | projectname          | jobname              | createdat  | limitnodes |

+ +-----------+----------------------+----------------------+------------+-------------+

+ | xxxx-yyyy | nfs-ganesha          | nfs-ganesha          | 2016-02-24 |         10 |

+ | zzzz-aaaa | CentOS               | centos_arrfab        | 2015-04-17 |         10 |

+ +-----------+----------------------+----------------------+------------+-------------+

+ ```

+ 

+ ## Steps to create a new duffy SSH key

+ 1. In the home directory of the duffy user on the admin.ci.centos.org instance, we have a folder where we store the created ssh keys for duffy tenants.

+ 2. `mkdir -p keys/project-name/` then `ssh-keygen -f ~duffy/keys/project-name/id_rsa -C project-name@CI`

+ 3. Copy the public key

+ 

+ ## Steps to create a new duffy API key

+ 

+ 1. Connect to the instance hosting the Duffy database

+ 

+ The Duffy database runs on the admin.ci node: `ssh admin.ci.centos.org`. 

+ 

+ 2. A script exists which automates this work; its usage is not documented here, so follow the manual steps below.

+ 

+ 3. Create the user in the users table

+ `insert into users values(UUID(), 'projectname', 'projectname', NOW(), 5);`

+ 

+ 4. Retrieve the api key from the users table

+ ` select * from users where projectname="projectname";`

+ 

+ 5. Using that API key/UUID as `project_id`, insert the SSH key of a user from the project so that they can ssh into the machines. This process must be repeated for every user we wish to grant SSH access to.

+ ``insert into userkeys (`project_id`,`key`) values('<project-UUID>', '<ssh-key>');``

+ This SSH key is pushed to the Duffy nodes' authorized_keys when a tenant requests a node through the API key.

@@ -0,0 +1,123 @@ 

+ # Adding Compute/Worker nodes

+ This SOP should be used in the following scenario:

+ 

+ - Red Hat OpenShift Container Platform 4.x cluster has been installed some time ago (1+ days ago) and additional worker nodes are required to increase the capacity for the cluster.

+ 

+ 

+ ## Steps

+ 

+ 1. Add the new nodes to the appropriate inventory file in the appropriate group.

+ 

+ eg:

+ 

+ ```

+ # ocp, compute/worker:

+ [ocp-ci-compute]

+ newnode1.example.centos.org

+ newnode2.example.centos.org

+ newnode3.example.centos.org

+ newnode4.example.centos.org

+ newnode5.example.centos.org

+ ```

+ 

+ eg:

+ 

+ ```

+ # ocp.stg, compute/worker:

+ [ocp-stg-ci-compute]

+ newnode6.example.centos.org

+ newnode7.example.centos.org

+ newnode8.example.centos.org

+ newnode9.example.centos.org

+ 

+ # ocp.stg, master/control plane

+ [ocp-stg-ci-master]

+ newnode10.example.centos.org

+ ```

+ 

+ 

+ 2. Examine the `inventory` file for `ocp` or `ocp.stg` and determine which management node corresponds with the group `ocp-ci-management`.

+ 

+ eg:

+ 

+ ```

+ [ocp-ci-management]

+ some-managementnode.example.centos.org

+ ```

+ 

+ 3. Find the OCP admin user which is contained in the hostvars for this management node at the key `ocp_service_account`.

+ 

+ eg:

+ 

+ ```

+ host_vars/some-managementnode.example.centos.org:ocp_service_account: adminuser

+ ```

+ 

+ 4. SSH to the node identified in step `2`, and become the user identified in step `3`.

+ 

+ eg:

+ 

+ ```

+ ssh some-managementnode.example.centos.org

+ 

+ sudo su - adminuser

+ ```

+   

+ 5. Verify that you are authenticated correctly to the Openshift cluster as the `system:admin`.

+ 

+ ```

+ oc whoami

+ system:admin

+ ```

+ 

+ 6. Retrieve the certificate from the internal API and convert the contents to a base64 string like so.

+ 

+ eg:

+ 

+ ```

+ echo "q" | openssl s_client -connect api-int.ocp.ci.centos.org:22623  -showcerts | awk '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/' | base64 --wrap=0

+ DONE

+ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXCERTSTOREDASABASE64ENCODEDSTRING=

+ ```

+ 

+ 7. Replace the cert in the compute/worker ignition file at the `XXXXXXXXREPLACEMEXXXXXXXX=` placeholder (see the `jq` sketch below), then be sure to commit this change to SCM and push.

+ 

+ ```

+ cat filestore/rhcos/compute.ign

+ {"ignition":{"config":{"append":[{"source":"https://api-int.ocp.ci.centos.org:22623/config/worker","verification":{}}]},"security":{"tls":{"certificateAuthorities":[{"source":"data:text/plain;charset=utf-8;base64,XXXXXXXXREPLACEMEXXXXXXXX=","verification":{}}]}},"timeouts":{},"version":"2.2.0"},"networkd":{},"passwd":{},"storage":{"disks":[{"device":"/dev/sdb","wipeTable":true}]},"systemd":{}}

+ ```

+ 
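+ A sketch of splicing the new base64 string into the ignition file with `jq`, assuming the certificate is captured in a `CERT` shell variable as in step 6:

+ 

+ ```

+ # Capture the base64 encoded CA cert (same command as step 6)

+ CERT=$(echo "q" | openssl s_client -connect api-int.ocp.ci.centos.org:22623 -showcerts | awk '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/' | base64 --wrap=0)

+ 

+ # Rewrite the certificateAuthorities source field and replace the original file

+ jq --arg cert "$CERT" '.ignition.security.tls.certificateAuthorities[0].source = "data:text/plain;charset=utf-8;base64," + $cert' filestore/rhcos/compute.ign > compute.ign.new && mv compute.ign.new filestore/rhcos/compute.ign

+ ```

+ 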

+ 8. Once the ignition file has been updated, run the `adhoc-provision-ocp4-node` playbook to copy the updated ignition files up to the http server, and install the new node(s). When prompted, specify the hostname of the new node. It is best to do one node at a time; it takes a minute or two per new node at this step.

+ 

+ eg:

+ 

+ ```

+ ansible-playbook playbooks/adhoc-provision-ocp4-node.yml

+ [WARNING] Nodes to be fully wiped/reinstalled with OCP => : newnode6.example.centos.org

+ ```

+ 

+ 9. As the new nodes are provisioned, they will attempt to join the cluster. Their CSRs must first be approved.

+ 

+ ```

+ # List the certs. If you see status pending, this is the worker/compute nodes attempting to join the cluster. It must be approved.

+ oc get csr

+ 

+ # Accept all node CSRs one liner

+ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve

+ ```

+ 

+ 

+ 10. Finally, run the playbook to update the haproxy config to monitor the new nodes.

+ 

+ ```

+ ansible-playbook playbooks/role-haproxy.yml --tags="config"

+ ```

+ 

+ 

+ For more information about adding new worker/compute nodes to a user-provisioned infrastructure (UPI) based OCP4 cluster, see the detailed steps at [1], [2].

+ 

+ 

+ ### Resources

+ 

+ - [1] [How to add Openshift 4 RHCOS worker nodes in UPI <24 hours](https://access.redhat.com/solutions/4246261)

+ - [2] [How to add Openshift 4 RHCOS worker nodes to UPI >24 hours](https://access.redhat.com/solutions/4799921)

@@ -0,0 +1,63 @@ 

+ # Adding OIDC Authentication

+ In CentOS, we have an instance of Ipsilon[1] which we currently use to authenticate many of our services. 

+ 

+ 

+ ### Steps

+ This SOP covers configuring ocp.ci/ocp.stg.ci with an OpenID identity provider which is used to communicate with our ACO Ipsilon instance and provide authentication to the cluster.

+ 

+ - Authenticate with the ocp.ci/ocp.stg.ci cluster via the cli

+ - Create an Openshift Secret containing the ACO/Ipsilon clientSecret

+ - Create an Openshift Oauth object with the identityProvider configuration

+ 

+ 

+ See below for sample template which achieves this.

+ 

+ 

+ ```

+ apiVersion: template.openshift.io/v1

+ kind: Template

+ metadata:

+   name: openshift-oidc-config

+ objects:

+ - kind: Secret

+   apiVersion: v1

+   metadata:

+     name: openid-client-secret-ocp-ci

+     namespace: openshift-config

+   data:

+     clientSecret: <base64 encoded OIDC client secret>

+   type: Opaque

+ - apiVersion: config.openshift.io/v1

+   kind: OAuth

+   metadata:

+     name: cluster

+   spec:

+     identityProviders:

+       - mappingMethod: claim

+         name: accounts-centos-org

+         openID:

+           claims:

+             email:

+               - email

+               - custom_email_claim

+             name:

+               - name

+               - nickname

+               - given_name

+             preferredUsername:

+               - email

+           clientID: ocp.ci.centos

+           clientSecret:

+             name: openid-client-secret-ocp-ci

+           extraScopes:

+             - email

+             - profile

+           issuer: 'https://id.centos.org/idp/openidc'

+         type: OpenID

+ ```

+ 
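+ To apply the template, something like the following should work (a sketch, assuming the template above is saved as `openshift-oidc-config.yaml`):

+ 

+ ```

+ # Render the template and create/update the Secret and OAuth objects

+ oc process -f openshift-oidc-config.yaml | oc apply -f -

+ ```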

+ 

+ 

+ ### Resources:

+ - [1] [Ipsilon](https://ipsilon-project.org/)

+  

@@ -0,0 +1,19 @@ 

+ # Adding Privileged SCC to Service Accounts

+ This SOP should be used in the following scenario:

+ 

+ - A tenant has been approved to run `privileged container` workloads.

+ 

+ 

+ ## Steps

+ 

+ 1. Add the `privileged` security context constraint to the service account in the tenant's namespace like so:

+ 

+ ```

+ oc adm policy add-scc-to-user privileged -n namespace -z myserviceaccount

+ ```

+ 
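+ Note that granting the SCC alone does not make workloads privileged; each pod must still request it in its spec. A minimal sketch, assuming the `namespace` and `myserviceaccount` placeholders from the command above (the image is just a placeholder):

+ 

+ ```

+ apiVersion: v1

+ kind: Pod

+ metadata:

+   name: privileged-example

+   namespace: namespace

+ spec:

+   serviceAccountName: myserviceaccount

+   containers:

+     - name: example

+       image: example-image:latest

+       securityContext:

+         privileged: true

+ ```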

+ 

+ 

+ ### Resources

+ 

+ - [1] [How to add the privileged SCC to a service account](https://docs.openshift.com/container-platform/4.5/cli_reference/openshift_cli/administrator-cli-commands.html#policy)

@@ -0,0 +1,24 @@ 

+ # Adding Taints to nodes

+ A taint allows a Node to control which pods should or should not be scheduled on them. A toleration is something which can be applied to a pod, to indicate that it can tolerate a taint, and may mark it as being schedulable on a node with the matching taint.

+ 

+ To view the official docs for Openshift/Kubernetes see [1]. This also provides information on some of the default taints which have special meaning in a Kubernetes environment.

+ 

+ ## Example taint

+ The following `node.kubernetes.io/unschedulable` taint is an example of a special taint which can be applied to a Node configuration. Internal Openshift/Kubernetes systems have tolerations in place for it by default. With this knowledge, we can use it to prevent user workloads from being scheduled, while leaving internal system workloads in place. The effect `PreferNoSchedule` applies the following logic:

+ 

+ - New pods which don't have a matching toleration will preferably not be scheduled on a node with this taint

+ - Existing pods will be allowed to run

+ 

+ For the full list of effects see the official documentation at [1].

+ 

+ ```

+ spec:

+   taints:

+     - key: node.kubernetes.io/unschedulable

+       effect: PreferNoSchedule

+ ```

+ 
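+ A sketch of applying and removing this taint from the CLI, assuming `<node>` is the target node name:

+ 

+ ```

+ # Apply the taint (empty value, effect PreferNoSchedule)

+ oc adm taint nodes <node> node.kubernetes.io/unschedulable=:PreferNoSchedule

+ 

+ # Remove the taint again (the trailing "-" removes it)

+ oc adm taint nodes <node> node.kubernetes.io/unschedulable=:PreferNoSchedule-

+ ```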

+ 

+ ### Resources

+ 

+ - [1] [Controlling Pod Placement using Node Taints](https://docs.openshift.com/container-platform/4.5/nodes/scheduling/nodes-scheduler-taints-tolerations.html)

@@ -0,0 +1,15 @@ 

+ ## Authenticating via CLI

+ Members of the CentOS CI Infrastructure team have admin access for ocp.ci and ocp.stg.ci Openshift clusters for their ACO accounts.

+ 

+ To login via the CLI using your main account authenticated via ACO simply:

+ 

+ - Authenticate via accounts-centos-org option in ocp.ci/ocp.stg.ci

+ - In the top right of the Openshift console, click the drop down menu for your user and click `Copy Login Command`

+ - Copy the `Log in with this token: oc login --token=xxxx --server=https://uri` and paste into your terminal

+ 

+ To login via the CLI using the `system:admin` user simply:

+ 

+ - ssh to the admin node which corresponds with `ocp-ci-management` or `ocp-ci-management-stg` inventory group

+ - Change user to the ocp admin user on the admin node, choosing the appropriate version: `sudo su - ocpadm` or for staging `sudo su - ocpadmstg`

+ - `export KUBECONFIG=/home/<ocpadmuser>/.kube/config`

+ - You should now have `system:admin` access to the cluster.

@@ -0,0 +1,35 @@ 

+ # Cleaning jenkins storage

+ When receiving Zabbix alerts for low storage on the legacy Jenkins, we can prune old builds from some of the largest storage users on the cluster using this SOP.

+ 

+ 

+ ## Step 1: Creating a token

+ * First, generate a Jenkins token: go to `https://ci.centos.org/user/username/configure`

+ * Create a token from the API token section

+ * Set the username and token variables below

+ 

+ ```

+ JENKINSUSER=username

+ JENKINSAPITOKEN=token

+ ```

+ 

+ 

+ ## Step 2: Getting list of jobs

+ * ssh into `jenkins.ci.centos.org`

+ * Find the list of projects which are consuming the most space with `du -csh /var/lib/jenkins/* | grep 'G' | sort -r`

+ 

+ 

+ ## Getting a crumb token

+ * Use curl to generate a Crumb token 

+ 

+ ```

+ CRUMB=$(curl "https://$JENKINSUSER:$JENKINSAPITOKEN@ci.centos.org/crumbIssuer/api/xml?xpath=concat(//crumbRequestField,\":\",//crumb)")

+ ```

+ 

+ 

+ ## Deleting builds from job

+ * Now with the crumb token set, we can delete the jobs using the API.

+ * In the following example, update the `jobname` and the `start`/`end` range values, which correspond to build numbers within that job:

+ 

+ ```

+ curl -H "$CRUMB" -X POST "https://$JENKINSUSER:$JENKINSAPITOKEN@ci.centos.org/job/<jobname>/[<start>-<end>]/doDelete"

+ ```

@@ -0,0 +1,66 @@ 

+ ## Configure default permission for ACO users

+ By default, all users which are authenticated with Openshift (system:authenticated) are covered by the `self-provisioners` ClusterRoleBinding. This provides the basic access to create projects etc., within which the user then has admin access.

+ 

+ To prevent this, we must first delete this `self-provisioners` ClusterRoleBinding. Should we ever wish to restore it for whatever reason, the following is the original content of the object:

+ 

+ ```

+ kind: ClusterRoleBinding

+ apiVersion: rbac.authorization.k8s.io/v1

+ metadata:

+   name: self-provisioners

+   annotations:

+     rbac.authorization.kubernetes.io/autoupdate: 'true'

+ subjects:

+   - kind: Group

+     apiGroup: rbac.authorization.k8s.io

+     name: 'system:authenticated:oauth'

+ roleRef:

+   apiGroup: rbac.authorization.k8s.io

+   kind: ClusterRole

+   name: self-provisioner

+ ```

+ 

+ Once removed, a new user which authenticates via ACO no longer has permission to do much of anything, beyond what the `basic-user` role provides.

+ 

+ To find this role originally, see resources [1][2]. To list the cluster roles and their bindings, run `oc describe clusterrole.rbac` and `oc describe clusterrolebinding.rbac`. Searching for `system:authenticated` pointed toward which role was being automatically applied to the users which were authenticated with the cluster.

+ 

+ ### Adding permissions to an authenticated user

+ We first create a group which will contain all the users for a particular project. eg:

+ 

+ ```

+ kind: Group

+ apiVersion: user.openshift.io/v1

+ metadata:

+   name: project-group-admins

+ users:

+   - user2

+   - user1

+ ```

+ 

+ Then create a project/namespace for the project. eg: `oc create namespace "project"`

+ 

+ Next create a rolebinding for the group to a role. We want to give members of this group, admin access within the namespace. eg:

+ 

+ ```

+ kind: RoleBinding

+ apiVersion: rbac.authorization.k8s.io/v1

+ metadata:

+   name: project-admins

+   namespace: project

+ subjects:

+   - kind: Group

+     apiGroup: rbac.authorization.k8s.io

+     name: project-group-admins

+ roleRef:

+   apiGroup: rbac.authorization.k8s.io

+   kind: ClusterRole

+   name: admin

+ ```

+ 

+ Users listed in the group will now have admin access to the project/namespace and nothing else within the cluster, which is what we want.

+ 
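+ The same setup can also be done from the CLI; a sketch, assuming the group, user and project names above:

+ 

+ ```

+ # Create the group with its members

+ oc adm groups new project-group-admins user1 user2

+ 

+ # Create the project/namespace

+ oc create namespace project

+ 

+ # Bind the admin ClusterRole to the group within that namespace

+ oc adm policy add-role-to-group admin project-group-admins -n project

+ ```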

+ 

+ ### Resources

+ - [1] Using RBAC to define and apply permissions https://docs.openshift.com/container-platform/4.4/authentication/using-rbac.html#default-roles_using-rbac

+ - [2] Using OIDC to authenticate https://docs.openshift.com/container-platform/4.4/authentication/identity_providers/configuring-oidc-identity-provider.html#configuring-oidc-identity-provider

+ 

@@ -0,0 +1,55 @@ 

+ ## Image Registry

+ 

+ ### Resources

+ - [1] https://docs.openshift.com/container-platform/4.4/registry/configuring_registry_storage/configuring-registry-storage-baremetal.html

+ 

+ 

+ ### Prerequisites

+ 

+ - Cluster administrator permissions.

+ - A cluster on bare metal.

+ - Provision persistent storage for your cluster, such as Red Hat OpenShift Container Storage. To deploy a private image registry, your storage must provide ReadWriteMany access mode.

+ - Must have "100Gi" capacity.

+ 

+ 

+ 

+ To start the image registry, you must change the Image Registry Operator configuration's `managementState` from `Removed` to `Managed`.

+ Leave the claim field blank to allow the automatic creation of an image-registry-storage PVC.

+ 

+ 

+ ```

+ $ oc edit configs.imageregistry/cluster

+ apiVersion: imageregistry.operator.openshift.io/v1

+ kind: Config

+ metadata:

+ ...

+ spec:

+ ...

+   managementState: Managed

+   storage:

+     pvc:

+       claim:

+ ...

+ ```

+ 
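+ The same change can be applied non-interactively instead of via `oc edit`; a sketch, assuming the cluster config shown above:

+ 

+ ```

+ # Set the Image Registry Operator to Managed

+ oc patch configs.imageregistry.operator.openshift.io cluster --type merge -p '{"spec":{"managementState":"Managed"}}'

+ 

+ # Request PVC-backed storage (an empty claim lets the operator create the image-registry-storage PVC)

+ oc patch configs.imageregistry.operator.openshift.io cluster --type merge -p '{"spec":{"storage":{"pvc":{"claim":""}}}}'

+ ```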

+ 

+ We want to enable the image pruner, to occasionally prune images in the registry.

+ 

+ ```

+ $ oc edit imagepruner.imageregistry/cluster

+ apiVersion: imageregistry.operator.openshift.io/v1

+ kind: ImagePruner

+ metadata:

+   name: cluster

+ spec:

+   suspend: false

+ ...

+ ```

+ 

+ 

+ Check the status of the deployment:

+ 

+ ```

+ oc get clusteroperator image-registry

+ ```

+ 

@@ -0,0 +1,54 @@ 

+ # Cordoning Nodes and Draining Pods

+ This SOP should be followed in the following scenarios:

+ 

+ - If maintenance is scheduled to be carried out on an Openshift node.

+ 

+ 

+ ## Steps

+ 

+ 1. Mark the nodes as unschedulable:

+ 

+ ```

+ nodes=$(oc get nodes -o name  | sed -E "s/node\///")

+ echo $nodes

+ 

+ for node in ${nodes[@]}; do oc adm cordon $node; done

+ node/<node> cordoned

+ ```

+ 

+ 2. Check that the node status is `NotReady,SchedulingDisabled`

+ 

+ ```

+ oc get node <node1>

+ NAME        STATUS                        ROLES     AGE       VERSION

+ <node1>     NotReady,SchedulingDisabled   worker    1d        v1.18.3

+ ```

+ 

+ Note: It might not switch to `NotReady` immediately; there may be many pods still running.

+ 

+ 

+ 3. Evacuate the Pods from **worker nodes** using the following method.

+ This will drain node `<node1>`, delete any local data, ignore daemonsets, and give a grace period of 60 seconds for pods to drain gracefully.

+ 

+ ```

+ oc adm drain <node1> --delete-local-data=true --ignore-daemonsets=true --grace-period=60

+ ```

+ 

+ 4. Perform the scheduled maintenance on the node

+ Do whatever is required in the scheduled maintenance window

+ 

+ 

+ 5. Once the node is ready to be added back into the cluster

+ We must uncordon the node. This allows it to be marked schedulable once more.

+ 

+ ```

+ nodes=$(oc get nodes -o name  | sed -E "s/node\///")

+ echo $nodes

+ 

+ for node in ${nodes[@]}; do oc adm uncordon $node; done

+ ```

+ 

+ 

+ ### Resources

+ 

+ - [1] [Nodes - working with nodes](https://docs.openshift.com/container-platform/4.5/nodes/nodes/nodes-nodes-working.html)

@@ -0,0 +1,49 @@ 

+ # Create etcd backup

+ This SOP should be followed in the following scenarios:

+ 

+ - When the need exists to create an etcd backup. 

+ - When shutting a cluster down gracefully.

+ 

+ ## Steps

+ 

+ 1. Connect to a master node

+ 

+ ```

+ oc debug node/<node_name>

+ ```

+ 

+ 2. Chroot to the /host directory on the container's filesystem

+ 

+ ```

+ sh-4.2# chroot /host

+ ```

+ 

+ 3. Run the cluster-backup.sh script and pass in the location to save the backup to

+ 

+ ```

+ sh-4.4# /usr/local/bin/cluster-backup.sh /home/core/assets/backup

+ ```

+ 

+ 4. Chown the backup files to be owned by user `core` and group `core`

+ 

+ ```

+ chown -R core:core /home/core/assets/backup

+ ```

+ 

+ 5. From the admin machine (see the inventory group `ocp-ci-management`), become the Openshift service account: check the inventory hostvars for that admin host and note the `ocp_service_account` variable.

+ 

+ ```

+ ssh <host>

+ sudo su - <ocp_service_account>

+ ```

+ 

+ 6. Copy the files down to the admin machine.

+ 

+ ```

+ scp -i <ssh_key> core@<node_name>:/home/core/assets/backup/* ocp_backups/

+ ```

+ 

+ 

+ ### Resources

+ 

+ - [1] [Creating an etcd backup](https://docs.openshift.com/container-platform/4.5/backup_and_restore/backing-up-etcd.html#backing-up-etcd-data_backup-etcd)

@@ -0,0 +1,69 @@ 

+ # Disabling self-provisioners role

+ By default, when a user authenticates with Openshift via Oauth, they are part of the `self-provisioners` group. This group provides the ability to create new projects. On CentOS CI we do not want users to be able to create their own projects, as we have a system in place where we create a project and control the administrators of that project.

+ 

+ To disable the self-provisioner role do the following as outlined in the documentation[1].

+ 

+ ```

+ oc describe clusterrolebinding.rbac self-provisioners

+ 

+ Name:		self-provisioners

+ Labels:		<none>

+ Annotations:	rbac.authorization.kubernetes.io/autoupdate=true

+ Role:

+   Kind:	ClusterRole

+   Name:	self-provisioner

+ Subjects:

+   Kind	Name				Namespace

+   ----	----				---------

+   Group	system:authenticated:oauth

+ ```

+ 

+ Remove the subjects that the self-provisioners role applies to.

+ 

+ ```

+ oc patch clusterrolebinding.rbac self-provisioners -p '{"subjects": null}'

+ ```

+ 

+ Verify the change occurred successfully

+ 

+ ```

+ oc describe clusterrolebinding.rbac self-provisioners

+ Name:         self-provisioners

+ Labels:       <none>

+ Annotations:  rbac.authorization.kubernetes.io/autoupdate: true

+ Role:

+   Kind:  ClusterRole

+   Name:  self-provisioner

+ Subjects:

+   Kind  Name  Namespace

+   ----  ----  ---------

+ ```

+ 

+ When the cluster is updated to a new version, unless we mark the role appropriately, the permissions will be restored after the update is complete.

+ 

+ Verify that the value is currently set to be restored after an update:

+ 

+ ```

+ oc get clusterrolebinding.rbac self-provisioners -o yaml

+ ```

+ 

+ ```

+ apiVersion: authorization.openshift.io/v1

+ kind: ClusterRoleBinding

+ metadata:

+   annotations:

+     rbac.authorization.kubernetes.io/autoupdate: "true"

+   ...

+ ```

+ 

+ We wish to set this `rbac.authorization.kubernetes.io/autoupdate` to `false`. To patch this do the following.

+ 

+ ```

+ oc patch clusterrolebinding.rbac self-provisioners -p '{ "metadata": { "annotations": { "rbac.authorization.kubernetes.io/autoupdate": "false" } } }'

+ ```

+ 

+ 

+ 

+ ### Resources

+ 

+ - [1] https://docs.openshift.com/container-platform/4.4/applications/projects/configuring-project-creation.html#disabling-project-self-provisioning_configuring-project-creation

@@ -0,0 +1,29 @@ 

+ # Graceful Shutdown of an Openshift 4 Cluster

+ This SOP should be followed in the following scenarios:

+ 

+ - Shutting down an Openshift 4 cluster. 

+ 

+ 

+ ## Steps

+ 

+ Prerequisite steps:

+ - Follow the SOP for cordoning and draining the nodes.

+ - Follow the SOP for creating an `etcd` backup.

+ 

+ 

+ 1. Get the nodes

+ 

+ ```

+ nodes=$(oc get nodes -o name  | sed -E "s/node\///")

+ ```

+ 

+ 2. Shut down the nodes from the administration box associated with the cluster, eg prod/staging.

+ 

+ ```

+ for node in ${nodes[@]}; do ssh -i <ssh_key> core@$node sudo shutdown -h now; done

+ ```

+ 

+ 

+ ### Resources

+ 

+ - [1] [Graceful Cluster Shutdown](https://docs.openshift.com/container-platform/4.5/backup_and_restore/graceful-cluster-shutdown.html)

@@ -0,0 +1,71 @@ 

+ # Graceful Startup of an Openshift 4 Cluster

+ This SOP should be followed in the following scenarios:

+ 

+ - Starting up an Openshift 4 cluster. 

+ 

+ 

+ ## Steps

+ 

+ Prerequisite steps:

+ 

+ 

+ 1. Start the physical nodes

+ 

+ - Production uses the `adhoc-openshift-nfs-stats.yaml` playbook to stop/start/restart nodes

+ - Staging uses the seamicro, accessible from the admin machine; the user manual is contained in centosci/ocp4-docs/sops/seamicro

+ 

+ 2. Once the nodes have been started they must be uncordoned if appropriate

+ 

+ ```

+ oc get nodes

+ NAME                       STATUS                     ROLES    AGE    VERSION

+ dumpty-n1.ci.centos.org    Ready,SchedulingDisabled   worker   77d    v1.18.3+6c42de8

+ dumpty-n2.ci.centos.org    Ready,SchedulingDisabled   worker   77d    v1.18.3+6c42de8

+ dumpty-n3.ci.centos.org    Ready,SchedulingDisabled   worker   77d    v1.18.3+6c42de8

+ dumpty-n4.ci.centos.org    Ready,SchedulingDisabled   worker   77d    v1.18.3+6c42de8

+ dumpty-n5.ci.centos.org    Ready,SchedulingDisabled   worker   77d    v1.18.3+6c42de8

+ kempty-n10.ci.centos.org   Ready,SchedulingDisabled   worker   106d   v1.18.3+6c42de8

+ kempty-n11.ci.centos.org   Ready,SchedulingDisabled   worker   106d   v1.18.3+6c42de8

+ kempty-n12.ci.centos.org   Ready,SchedulingDisabled   worker   106d   v1.18.3+6c42de8

+ kempty-n6.ci.centos.org    Ready,SchedulingDisabled   master   106d   v1.18.3+6c42de8

+ kempty-n7.ci.centos.org    Ready,SchedulingDisabled   master   106d   v1.18.3+6c42de8

+ kempty-n8.ci.centos.org    Ready,SchedulingDisabled   master   106d   v1.18.3+6c42de8

+ kempty-n9.ci.centos.org    Ready,SchedulingDisabled   worker   106d   v1.18.3+6c42de8

+ 

+ nodes=$(oc get nodes -o name  | sed -E "s/node\///")

+ 

+ for node in ${nodes[@]}; do oc adm uncordon $node; done

+ node/dumpty-n1.ci.centos.org uncordoned

+ node/dumpty-n2.ci.centos.org uncordoned

+ node/dumpty-n3.ci.centos.org uncordoned

+ node/dumpty-n4.ci.centos.org uncordoned

+ node/dumpty-n5.ci.centos.org uncordoned

+ node/kempty-n10.ci.centos.org uncordoned

+ node/kempty-n11.ci.centos.org uncordoned

+ node/kempty-n12.ci.centos.org uncordoned

+ node/kempty-n6.ci.centos.org uncordoned

+ node/kempty-n7.ci.centos.org uncordoned

+ node/kempty-n8.ci.centos.org uncordoned

+ node/kempty-n9.ci.centos.org uncordoned

+ 

+ oc get nodes

+ NAME                       STATUS   ROLES    AGE    VERSION

+ dumpty-n1.ci.centos.org    Ready    worker   77d    v1.18.3+6c42de8

+ dumpty-n2.ci.centos.org    Ready    worker   77d    v1.18.3+6c42de8

+ dumpty-n3.ci.centos.org    Ready    worker   77d    v1.18.3+6c42de8

+ dumpty-n4.ci.centos.org    Ready    worker   77d    v1.18.3+6c42de8

+ dumpty-n5.ci.centos.org    Ready    worker   77d    v1.18.3+6c42de8

+ kempty-n10.ci.centos.org   Ready    worker   106d   v1.18.3+6c42de8

+ kempty-n11.ci.centos.org   Ready    worker   106d   v1.18.3+6c42de8

+ kempty-n12.ci.centos.org   Ready    worker   106d   v1.18.3+6c42de8

+ kempty-n6.ci.centos.org    Ready    master   106d   v1.18.3+6c42de8

+ kempty-n7.ci.centos.org    Ready    master   106d   v1.18.3+6c42de8

+ kempty-n8.ci.centos.org    Ready    master   106d   v1.18.3+6c42de8

+ kempty-n9.ci.centos.org    Ready    worker   106d   v1.18.3+6c42de8

+ ```

+ 

+ 

+ ### Resources

+ 

+ - [1] [Graceful Cluster Startup](https://docs.openshift.com/container-platform/4.5/backup_and_restore/graceful-cluster-restart.html)

+ - [2] [Cluster disaster recovery](https://docs.openshift.com/container-platform/4.5/backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html#dr-restoring-cluster-state)

@@ -0,0 +1,60 @@ 

+ # Spike: Investigate adding routes from apps.ci.centos.org

+ The Ingress Operator[1] manages `IngressController` resources which will allow us to achieve[3] this on Openshift 4.

+ 

+ ### Resources

+ - [1] https://docs.openshift.com/container-platform/4.4/networking/ingress-operator.html

+ - [2] https://rcarrata.com/openshift/ocp4_route_sharding/

+ - [3] https://projects.engineering.redhat.com/browse/CPE-764

+ 

+ 

+ ### POC

+ Performed the following steps to achieve the goal, using the following files:

+ 

+ ```

+ -rw-rw-r--. 1 dkirwan dkirwan 1060 Jul  6 18:12 deployment.yaml

+ -rw-rw-r--. 1 dkirwan dkirwan  286 Jul  6 17:13 ingresscontroller.yaml

+ -rw-rw-r--. 1 dkirwan dkirwan  336 Jul  6 17:53 route.yaml

+ -rw-rw-r--. 1 dkirwan dkirwan  273 Jul  6 17:58 service.yaml

+ ```

+ - Created an `IngressController` which creates 2 router replicas and is configured to manage Routes which point at `*.apps.ci.centos.org`. It also has a `routeSelector` to match labels `type: sharded`

+ ```

+   routeSelector:

+     matchLabels:

+       type: sharded

+ ``` 

+ - Created a Deployment with a simple app.

+ - Created Service and Route to expose the app externally at `ingress-controller-test.apps.ci.centos.org`.

+ - Route has been given a label: `type: sharded`

+ - Used `dig` to retrieve the public IP address of the cluster

+ ```

+ dig console-openshift-console.apps.ocp.stg.ci.centos.org

+ 

+ ; <<>> DiG 9.11.18-RedHat-9.11.18-1.fc32 <<>> console-openshift-console.apps.ocp.stg.ci.centos.org

+ ;; global options: +cmd

+ ;; Got answer:

+ ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21722

+ ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

+ 

+ ;; OPT PSEUDOSECTION:

+ ; EDNS: version: 0, flags:; udp: 4096

+ ;; QUESTION SECTION:

+ ;console-openshift-console.apps.ocp.stg.ci.centos.org. IN A

+ 

+ ;; ANSWER SECTION:

+ console-openshift-console.apps.ocp.stg.ci.centos.org. 600 IN A 8.43.84.237

+ 

+ ;; Query time: 77 msec

+ ;; SERVER: 10.38.5.26#53(10.38.5.26)

+ ;; WHEN: Mon Jul 06 18:43:35 IST 2020

+ ;; MSG SIZE  rcvd: 97

+ ```

+ - Configured my `/etc/hosts` file accordingly:

+ ```

+ cat /etc/hosts

+ 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

+ ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

+ 8.43.84.237 ingress-controller-test.apps.ci.centos.org

+ ```

+ - Visited `http://ingress-controller-test.apps.ci.centos.org` and was greeted with the expected content in the deployed app.

+ - We should be able to achieve this spike CPE-764 using the Ingress Controller Operator.

+ 

@@ -0,0 +1,41 @@ 

+ apiVersion: apps/v1

+ kind: Deployment

+ metadata:

+   name: ingress-controller-deployment

+   namespace: ingress-controller-test

+ spec:

+   progressDeadlineSeconds: 600

+   replicas: 1

+   revisionHistoryLimit: 10

+   selector:

+     matchLabels:

+       app: ingress-controller-test

+   strategy:

+     rollingUpdate:

+       maxSurge: 25%

+       maxUnavailable: 25%

+     type: RollingUpdate

+   template:

+     metadata:

+       annotations:

+         openshift.io/generated-by: OpenShiftWebConsole

+       creationTimestamp: null

+       labels:

+         app: ingress-controller-test

+     spec:

+       containers:

+       - image: quay.io/dkirwan_redhat/crypto_monitoring:v0.0.1

+         imagePullPolicy: IfNotPresent

+         name: ingress-controller-test

+         ports:

+         - containerPort: 8080

+           protocol: TCP

+         resources: {}

+         terminationMessagePath: /dev/termination-log

+         terminationMessagePolicy: File

+       dnsPolicy: ClusterFirst

+       restartPolicy: Always

+       schedulerName: default-scheduler

+       securityContext: {}

+       terminationGracePeriodSeconds: 30

+ 

@@ -0,0 +1,14 @@ 

+ apiVersion: operator.openshift.io/v1

+ kind: IngressController

+ metadata:

+   name: cpe-764-spike

+   namespace: openshift-ingress-operator

+ spec:

+   domain: apps.ci.centos.org

+   endpointPublishingStrategy:

+     type: HostNetwork

+   routeSelector:

+     matchLabels:

+       type: sharded

+ status: {}

+ 

@@ -0,0 +1,17 @@ 

+ apiVersion: route.openshift.io/v1

+ kind: Route

+ metadata:

+   creationTimestamp: null

+   labels:

+     type: sharded

+   name: test

+   namespace: ingress-controller-test

+ spec:

+   host: ingress-controller-test.apps.ci.centos.org

+   port:

+     targetPort: 8080-tcp

+   to:

+     kind: Service

+     name: test-service

+     weight: 100

+   wildcardPolicy: None

@@ -0,0 +1,16 @@ 

+ apiVersion: v1

+ kind: Service

+ metadata:

+   name: test-service

+   namespace: ingress-controller-test

+ spec:

+   ports:

+   - name: 8080-tcp

+     port: 8080

+     protocol: TCP

+     targetPort: 8080

+   selector:

+     app: ingress-controller-test

+   sessionAffinity: None

+   type: ClusterIP

+ 

@@ -0,0 +1,208 @@ 

+ # Steps for installing OCP 4.3 on bare metal:

+ 

+ Documentation: [docs](https://access.redhat.com/documentation/en-us/openshift_container_platform/4.3/html/installing_on_bare_metal/installing-on-bare-metal)

+ 

+ ## Install: 

+ * mkdir ocp-ci-centos-org

+ * cd ocp-ci-centos-org

+ * For installations of OpenShift Container Platform that use user-provisioned infrastructure, you must manually generate your installation configuration file.

+ * 1.1.7.1. for sample config see: [here](https://projects.engineering.redhat.com/secure/attachment/104626/install-config.yaml.bak)

+ 

+ ```

+ apiVersion: v1

+ baseDomain: centos.org

+ compute:                                                                                                                                                                                                                                      

+ - hyperthreading: Enabled

+   name: worker

+   replicas: 0

+ controlPlane:

+   hyperthreading: Enabled

+   name: master

+   replicas: 3

+ metadata:

+   name: ocp.ci

+ networking:

+   clusterNetwork:

+   - cidr: 10.128.0.0/14

+     hostPrefix: 23

+   networkType: OpenShiftSDN

+   serviceNetwork:

+   - 172.30.0.0/16

+ platform:

+   none: {}

+ fips: false

+ pullSecret: '<installation pull secret from cloud.redhat.com>'

+ sshKey: '<ssh key for the RHCOS nodes>'

+ ```

+ 

+ 

+ *   get the **pullsecret** from [https://cloud.redhat.com/openshift/install/metal/user-provisioned](https://cloud.redhat.com/openshift/install/metal/user-provisioned) requires your access.redhat.com login.

+ *   “You must set the value of the replicas parameter to 0. This parameter controls the number of workers that the cluster creates and manages for you, which are functions that the cluster does not perform when you use user-provisioned infrastructure. You must manually deploy worker machines for the cluster to use before you finish installing OpenShift Container Platform.”

+ *   **1.1.8**. Once the **install-config.yaml** configuration has been added correctly, take a backup of this file for future installs or reference as the next step will consume it. Then run the following:

+ *   `openshift-install create manifests --dir=/home/dkirwan/ocp-ci-centos-org`

+ 

+     INFO Consuming Install Config from target directory  

+     WARNING Certificate 35183CE837878BAC77A802A8A00B6434857 from additionalTrustBundle is x509 v3 but not a certificate authority  

+     WARNING Making control-plane schedulable by setting MastersSchedulable to true for  Scheduler cluster settings.

+ *   Running this command converts the **install-config.yaml** to a number of files eg:

+ ```

+     ~/ocp-ci-centos-org $ tree .

+     .

+     ├── manifests

+     │   ├── 04-openshift-machine-config-operator.yaml

+     │   ├── cluster-config.yaml

+     │   ├── cluster-dns-02-config.yml

+     │   ├── cluster-infrastructure-02-config.yml

+     │   ├── cluster-ingress-02-config.yml

+     │   ├── cluster-network-01-crd.yml

+     │   ├── cluster-network-02-config.yml

+     │   ├── cluster-proxy-01-config.yaml

+     │   ├── cluster-scheduler-02-config.yml

+     │   ├── cvo-overrides.yaml

+     │   ├── etcd-ca-bundle-configmap.yaml

+     │   ├── etcd-client-secret.yaml

+     │   ├── etcd-host-service-endpoints.yaml

+     │   ├── etcd-host-service.yaml

+     │   ├── etcd-metric-client-secret.yaml

+     │   ├── etcd-metric-serving-ca-configmap.yaml

+     │   ├── etcd-metric-signer-secret.yaml

+     │   ├── etcd-namespace.yaml

+     │   ├── etcd-service.yaml

+     │   ├── etcd-serving-ca-configmap.yaml

+     │   ├── etcd-signer-secret.yaml

+     │   ├── kube-cloud-config.yaml

+     │   ├── kube-system-configmap-root-ca.yaml

+     │   ├── machine-config-server-tls-secret.yaml

+     │   ├── openshift-config-secret-pull-secret.yaml

+     │   └── user-ca-bundle-config.yaml

+     └── openshift

+         ├── 99_kubeadmin-password-secret.yaml

+         ├── 99_openshift-cluster-api_master-user-data-secret.yaml

+         ├── 99_openshift-cluster-api_worker-user-data-secret.yaml

+         ├── 99_openshift-machineconfig_99-master-ssh.yaml

+         ├── 99_openshift-machineconfig_99-worker-ssh.yaml

+         └── openshift-install-manifests.yaml

+     2 directories, 32 files

+ ```

+ 

+ *    Edit **manifests/cluster-scheduler-02-config.yml** and set **mastersSchedulable** to false. This will prevent Pods from being scheduled on the master instances.

+ *   `sed -i 's/mastersSchedulable: true/mastersSchedulable: false/g' manifests/cluster-scheduler-02-config.yml`

+ *   Create the machineconfigs to disable dhcp on the master/worker nodes: 

+ 

+ ```

+ for variant in master worker; do 

+ cat << EOF > ./99_openshift-machineconfig_99-${variant}-nm-nodhcp.yaml

+ apiVersion: machineconfiguration.openshift.io/v1

+ kind: MachineConfig

+ metadata:

+   labels:

+     machineconfiguration.openshift.io/role: ${variant}

+   name: nm-${variant}-nodhcp

+ spec:

+   config:

+     ignition:

+       config: {}

+       security:

+         tls: {}

+       timeouts: {}

+       version: 2.2.0

+     networkd: {}

+     passwd: {}

+     storage:

+       files:

+       - contents:

+           source: data:text/plain;charset=utf-8;base64,W21haW5dCm5vLWF1dG8tZGVmYXVsdD0qCg==

+           verification: {}

+         filesystem: root

+         mode: 0644

+         path: /etc/NetworkManager/conf.d/disabledhcp.conf

+   osImageURL: ""

+ EOF

+ done

+ ```

+ 

+ *   *NOTE* There is a gotcha here, fs mode is **octal** and should start with 0 eg 0644 (-rwxr--r--), however it will be **decimal** value 420 when queried later via kubernetes api.

+ *   Create the ignition configurations:

+ *   Rename `worker.ign` to `compute.ign`, as later steps in the process are configured to point at compute.ign.

+ 

+ ```

+ openshift-install create ignition-configs --dir=/home/dkirwan/ocp-ci-centos-org

+ INFO Consuming OpenShift Install (Manifests) from target directory  

+ INFO Consuming Common Manifests from target directory  

+ INFO Consuming Master Machines from target directory  

+ INFO Consuming Worker Machines from target directory  

+ INFO Consuming Openshift Manifests from target directory

+ 

+ # Should have the following layout

+ .

+ ├── auth

+ │   ├── kubeadmin-password

+ │   └── kubeconfig

+ ├── bootstrap.ign

+ ├── master.ign

+ ├── metadata.json

+ └── compute.ign

+ ```

+ 

+ 

+ *   *NOTE* for production ie `ocp.ci` we must perform an extra step at this point, as the machines have 2 hard disks attached. We want to ensure that `/dev/sdb` gets its partition table wiped at bootstrapping time, so at a later time we can configure the Local Storage Operator to manage this disk drive.

+ *   Modify the `master.ign` and `compute.ign` ignition files with the following:

+ 

+ ```

+ +   "storage":{"disks":[{"device":"/dev/sdb","wipeTable":true}]},

+ -   "storage":{},

+ ```

+ 

+ 

+ *   **1.1.9. Creating Red Hat Enterprise Linux CoreOS (RHCOS) machines**

+ *   Prerequisites: 

+ *   Obtain the Ignition config files for your cluster. 

+ *   Configure suitable PXE or iPXE infrastructure. 

+ *   Have access to an HTTP server that you can access from your computer.

+ *   Have a load balancer eg Haproxy available

+ *   You must download the kernel, initramfs, ISO file and the RAW disk files eg:

+ *   [https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.3/latest/](https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.3/latest/)

+     *    [rhcos-4.3.8-x86_64-installer-kernel-x86_64](https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.3/latest/rhcos-4.3.8-x86_64-installer-kernel-x86_64)

+     * [rhcos-4.3.8-x86_64-installer-initramfs.x86_64.img](https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.3/latest/rhcos-4.3.8-x86_64-installer-initramfs.x86_64.img)

+     * [rhcos-4.3.8-x86_64-installer.x86_64.iso](https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.3/latest/rhcos-4.3.8-x86_64-installer.x86_64.iso)

+     * [rhcos-4.3.8-x86_64-metal.x86_64.raw.gz](https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.3/latest/rhcos-4.3.8-x86_64-metal.x86_64.raw.gz)

+ *   These files should be copied over to a webserver which is accessible from the bootstrap/master/compute instances.

+ *   **1.1.9.2.** “Configure the network boot infrastructure so that the machines boot from their local disks after RHCOS is installed on them.”

+ *   Existing CentOS PXE boot configuration Ansible [example](https://github.com/CentOS/ansible-infra-playbooks/blob/master/templates/pxeboot.j2)

+ *   Example RHCOS PXE boot configuration [here](https://projects.engineering.redhat.com/secure/attachment/104734/centos-ci-pxe_sampleconfig.txt)

+ *   **1.1.10.** Once the systems are booting and installing, you can monitor the installation with: `./openshift-install --dir=/home/dkirwan/ocp-ci-centos-org wait-for bootstrap-complete --log-level=info`

+ *   Once the master nodes come up successfully, this command will exit. We can now remove the bootstrap instance, and repurpose it as a worker/compute node.

+ *   Run the haproxy role, once the bootstrap node has been removed from the `ocp-ci-master-and-bootstrap-stg` ansible inventory group.

+ *   Begin installing the compute/worker nodes.

+ *   Once the workers are up accept them into the cluster by accepting their `csr` certs:

+ ```

+ # List the certs. If you see status pending, this is the worker/compute nodes attempting to join the cluster. It must be approved.

+ oc get csr

+ 

+ # Accept all node CSRs one liner

+ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve

+ ```

+ *   1.1.11. Logging in to the cluster. At this point the cluster is up, and we’re in configuration territory.

+ 

+ 

+ ## Manually test the bootstrap process RHCOS

+ 

+ Resources:

+ 

+ *   [1] JIRA corresponding with this section: [CPE-661](https://projects.engineering.redhat.com/browse/CPE-661)

+ *   [2] [https://github.com/CentOS/ansible-infra-playbooks/pull/4](https://github.com/CentOS/ansible-infra-playbooks/pull/4)

+ *   [3] [https://scm.infra.centos.org/CentOS/ansible-inventory-ci/pulls/1](https://scm.infra.centos.org/CentOS/ansible-inventory-ci/pulls/1)

+ *   [4] [https://scm.infra.centos.org/CentOS/ansible-pkistore-ci/pulls/1](https://scm.infra.centos.org/CentOS/ansible-pkistore-ci/pulls/1)

+ *   [5] [CentOS/ansible-infra-playbooks/staging/templates/ocp_pxeboot.j2](https://raw.githubusercontent.com/CentOS/ansible-infra-playbooks/staging/templates/ocp_pxeboot.j2)

+ *   [https://www.openshift.com/blog/openshift-4-bare-metal-install-quickstart](https://www.openshift.com/blog/openshift-4-bare-metal-install-quickstart)

+ *   [6] [Create a raid enabled data volume via ignition file](https://coreos.com/ignition/docs/latest/examples.html#create-a-raid-enabled-data-volume)

+ *   [7] HAProxy config for OCP4 [https://github.com/openshift-tigerteam/guides/blob/master/ocp4/ocp4-haproxy.cfg](https://github.com/openshift-tigerteam/guides/blob/master/ocp4/ocp4-haproxy.cfg)

+ 

+ 

+ Steps:

+ 

+ *   Created an ssh key pair using `ssh-keygen` and uploaded it to the ansible-pkistore-ci repository at [4]

+ *   Through trial and error, we’ve produced a PXE boot configuration for one of the machines and managed to get it to boot and begin the bootstrap process via an ignition file see [5].

+ *   Next steps are to make a decision on networking configuration, then configure DNS and create 2 haproxy proxies before creating the bootstrap and master OCP nodes. Jiras created: [CPE-678](https://projects.engineering.redhat.com/browse/CPE-678), [CPE-677](https://projects.engineering.redhat.com/browse/CPE-677) and [CPE-676](https://projects.engineering.redhat.com/browse/CPE-676)

+ *   PR configuration for the HAProxy loadbalancers: [here](https://github.com/CentOS/ansible-role-haproxy/pull/2)

+ *   Configuration for DNS/bind (encrypted): [here](https://scm.infra.centos.org/CentOS/ansible-filestore-ci/src/branch/master/bind/ci.centos.org)

@@ -0,0 +1,14 @@ 

+ # Persistent storage via NFS

+ Once the NFS storage is configured and available for use within the cluster, we can create PVs with the following adhoc playbook: [ansible-infra-playbooks/adhoc-openshift-pv.yml](https://github.com/CentOS/ansible-infra-playbooks/blob/master/adhoc-openshift-pv.yml)

+ 

+ Sample usage:

+ 

+ ```

+ ansible-playbook playbooks/adhoc-openshift-pv.yml -e "host=<admin host where the NFS storage is mounted>" -e "pv_size=10Gi" -e "cico_project_name=project-pv-name" 

+ ```

+ 

+ 

+ 

+ Resources:

+ *   [1] Jira [https://projects.engineering.redhat.com/browse/CPE-701](https://projects.engineering.redhat.com/browse/CPE-701)

+ *   [2] Configuring NFS [https://docs.openshift.com/container-platform/4.4/storage/persistent_storage/persistent-storage-nfs.html](https://docs.openshift.com/container-platform/4.4/storage/persistent_storage/persistent-storage-nfs.html)

@@ -0,0 +1,84 @@ 

+ ## Prerequisites

+ The following are the prerequisites required to install OCP4 on bare metal

+ 

+ ### Access confirmation

+ 

+ *   Access to [https://access.redhat.com/](https://access.redhat.com/), if not, follow the steps

+     *   [https://mojo.redhat.com/docs/DOC-99172](https://mojo.redhat.com/docs/DOC-99172)

+     *   [https://docs.google.com/document/d/15DmYrfspKVwf4z8DzPK7sU-zmRERVWHBN3tghRkrkGU/edit](https://docs.google.com/document/d/15DmYrfspKVwf4z8DzPK7sU-zmRERVWHBN3tghRkrkGU/edit)

+ *   Git repo for the installer [https://github.com/openshift/installer](https://github.com/openshift/installer)

+ *   OpenShift playground: [https://try.openshift.com](https://try.openshift.com/)

+ *   Access.redhat.com account, to download packages/pull secrets [https://cloud.redhat.com/openshift/install](https://cloud.redhat.com/openshift/install)

+     *   openshift-install client: [https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest/openshift-install-linux.tar.gz](https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest/openshift-install-linux.tar.gz)

+     *   RHCOS download to create machines for your cluster to use during the installation [https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/latest/latest/](https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/latest/latest/)

+     *   Openshift Command Line tools: [https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest/openshift-client-linux.tar.gz](https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest/openshift-client-linux.tar.gz)

+ *   Official documentation for installation: 

+     *   [https://docs.openshift.com/container-platform/4.3/installing/installing_bare_metal/installing-bare-metal.html](https://docs.openshift.com/container-platform/4.3/installing/installing_bare_metal/installing-bare-metal.html)

+     *   [https://docs.openshift.com/container-platform/4.3/architecture/architecture-installation.html#architecture-installation](https://docs.openshift.com/container-platform/4.3/architecture/architecture-installation.html#architecture-installation)

+ *   Access RH employee subscription benefits:

+     *   [https://mojo.redhat.com/docs/DOC-99172](https://mojo.redhat.com/docs/DOC-99172)

+     *   [https://docs.google.com/document/d/15DmYrfspKVwf4z8DzPK7sU-zmRERVWHBN3tghRkrkGU/edit](https://docs.google.com/document/d/15DmYrfspKVwf4z8DzPK7sU-zmRERVWHBN3tghRkrkGU/edit)

+ 

+ 

+ ### Bootstrap node Identification

+ As per [1], the minimum number of nodes needed for an Openshift 4 cluster is 6

+ 

+ *   1 bootstrap node

+ *   3 master nodes

+ *   2 worker nodes.

+ 

+ As per [2] the minimum requirements for the bootstrap machine are:

+ 

+ | Machine   | Operating System | vCPU | RAM   | Storage |

+ |-----------|------------------|------|-------|---------|

+ | Bootstrap | RHCOS            | 4    | 16 GB | 120 GB  |

+ 

+ 

+ *   [1] Minimum number of nodes [https://docs.openshift.com/container-platform/4.3/installing/installing_bare_metal/installing-bare-metal.html#machine-requirements_installing-bare-metal](https://docs.openshift.com/container-platform/4.3/installing/installing_bare_metal/installing-bare-metal.html#machine-requirements_installing-bare-metal)

+ *   [2] Minimum bootstrap/master/worker node requirements [https://docs.openshift.com/container-platform/4.3/installing/installing_bare_metal/installing-bare-metal.html#minimum-resource-requirements_installing-bare-metal](https://docs.openshift.com/container-platform/4.3/installing/installing_bare_metal/installing-bare-metal.html#minimum-resource-requirements_installing-bare-metal)

+ *   [3] [https://ark.intel.com/content/www/us/en/ark/products/64591/intel-xeon-processor-e5-2640-15m-cache-2-50-ghz-7-20-gt-s-intel-qpi.html](https://ark.intel.com/content/www/us/en/ark/products/64591/intel-xeon-processor-e5-2640-15m-cache-2-50-ghz-7-20-gt-s-intel-qpi.html)

+ 

+ 

+ ### Miscellaneous Prerequisites

+ 

+ *   Need internet access from the bootstrap/master/compute nodes so as to:

+ *   Access the Red Hat OpenShift Cluster Manager page to download the installation program and perform subscription management and entitlement. If the cluster has internet access and you do not disable Telemetry, that service automatically entitles your cluster. If the Telemetry service cannot entitle your cluster, you must manually entitle it on the Cluster registration page

+ *   Access quay.io to obtain the container images that are required to install your cluster.

+ *   Obtain the packages that are required to perform cluster updates.

+ *   **1.1.3.1**. Before you install OpenShift Container Platform, you must provision two layer-4 load balancers.

+ *   Minimum of 6 nodes, 1 bootstrap node, 3 master, 2 compute.

+ *   **1.1.3**. See this section to see the **network ports** which are required to be open and accessible from each machine

+ *   Configure DHCP or set static IP addresses on each node. Be sure to configure it so the nodes always get the same IP address if configured via DHCP.

@@ -0,0 +1,7 @@ 

+ # Adding some workloads for testing

+ Openshift 4 ships with a number of operators already configured and available via OperatorHub. We have tested with the Jenkinsci operator 0.4.0 [1].

+ 

+ Resources:

+ *   [1] jenkinsci/kubernetes/operator: [github](https://github.com/jenkinsci/kubernetes-operator)

+ *   [2] Deploy the jenkinsci/kubernetes-operator on Kubernetes: [deploy yaml](https://raw.githubusercontent.com/jenkinsci/kubernetes-operator/master/deploy/all-in-one-v1alpha2.yaml)

+ *   [3] Changes required to make this work correctly on Openshift: [gist](https://gist.github.com/davidkirwan/d3301c550c94dd1a95965dd8d7a91594)

@@ -0,0 +1,40 @@ 

+ # kubevirt Instruction

+ 

+ `Note: This doc contains snippets of the official documentation in order to keep it to the point. It is not to be considered a complete guide; please refer to the official documentation. This is merely a note for CentOS CI admins based on our workflow.`

+ 

+ ## How to install Kubevirt in cluster

+ 

+ * Open a browser window and log in to the OpenShift Container Platform web console.

+ * Navigate to the Operators → OperatorHub page.

+ * Search for Container-native virtualization and then select it.

+ * Read the information about the Operator and click Install.

+ * On the Create Operator Subscription page:

+     * For Installed Namespace, ensure that the Operator recommended namespace option is selected. This installs the Operator in the mandatory openshift-cnv namespace, which is automatically created if it does not exist.

+ 

+     * Select 2.3 from the list of available Update Channel options.

+ 

+     * Click Subscribe to make the Operator available to the openshift-cnv namespace.

+ 

+ On the Installed Operators screen, the Status displays Succeeded when container-native virtualization finishes installation.

+ 

+ ## Deploying container-native virtualization

+ 

+ After subscribing to the Container-native virtualization catalog, create the CNV Operator Deployment custom resource to deploy container-native virtualization.

+ 

+ * Navigate to the Operators → Installed Operators page.

+ * Click Container-native virtualization.

+ * Click the CNV Operator Deployment tab and click Create HyperConverged Cluster.

+ * Click Create to launch container-native virtualization.

+ * Navigate to the Workloads → Pods page and monitor the container-native virtualization Pods until they are all Running. After all the Pods display the Running state, you can access container-native virtualization.

+ 

+ 

+ ## Creating a VM

+ 

+ * Create a VM template (or, to test whether kubevirt works in your cluster, you can also use a test template from kubevirt: `https://raw.githubusercontent.com/kubevirt/demo/master/manifests/vm.yaml`)

+ * Once you have your template ready, run `oc create -f <template.yaml>`, or for test purposes `oc create -f https://raw.githubusercontent.com/kubevirt/demo/master/manifests/vm.yaml`

+ * Once it returns success, check that the VM has been created with `oc get vm`

+ * Go to the web UI to start the VM; from there you can view and manage the VM.

+ 

+ VMs are created in the Off state by default. To control them from the CLI, you need to install kubevirt-virtctl (see the sketch below). Find [instructions here](https://docs.openshift.com/container-platform/4.4/cnv/cnv_install/cnv-installing-virtctl.html#cnv-enabling-cnv-repos_cnv-installing-virtctl)

+ 
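+ A sketch of basic `virtctl` usage once it is installed, assuming a VM named `testvm` (the name used by the kubevirt demo template above):

+ 

+ ```

+ # Start and stop the VM

+ virtctl start testvm

+ virtctl stop testvm

+ 

+ # Attach to the VM serial console

+ virtctl console testvm

+ ```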

+ 

@@ -0,0 +1,28 @@ 

+ # Binding a PVC to a local storage PV

+ 

+ In order to bind a PVC to a local storage PV you can do so using the following:

+ 

+ Steps:

+ 

+ *   Create a PersistentVolumeClaim object like the following. Simply update `NAMESPACE` to match the namespace this PVC will be created in, `NAME` to the name of the PVC, `SIZE` to a size which matches the local storage PV, and finally `LOCAL_STORAGE_PV_NAME` to match the name of the local storage PV you wish to bind it to.

+ *   Important: don't choose a local storage PV which exists on a master node, as master nodes are marked as unschedulable for user workloads.

+ 

+ 

+ ```

+ kind: PersistentVolumeClaim

+ apiVersion: v1

+ metadata:

+   name: NAME

+   namespace: NAMESPACE

+   finalizers:

+     - kubernetes.io/pvc-protection

+ spec:

+   accessModes:

+     - ReadWriteMany

+   resources:

+     requests:

+       storage: SIZE

+   volumeName: LOCAL_STORAGE_PV_NAME

+   storageClassName: local-sc

+   volumeMode: Filesystem

+ ```

@@ -0,0 +1,63 @@ 

+ # Adding Local Storage

+ Planning to make use of the Local Storage Operator to format the /dev/sdb disks on each node. Following the instructions at [4].

+ 

+ Resources:

+ 

+ *   [1] 1.3.12.1. [https://access.redhat.com/documentation/en-us/openshift_container_platform/4.3/html/installing_on_bare_metal/installing-on-bare-metal](https://access.redhat.com/documentation/en-us/openshift_container_platform/4.3/html/installing_on_bare_metal/installing-on-bare-metal)

+ *   [2] Parameters to configure the image registry operator [https://access.redhat.com/documentation/en-us/openshift_container_platform/4.3/html-single/registry/index#registry-operator-configuration-resource-overview_configuring-registry-operator](https://access.redhat.com/documentation/en-us/openshift_container_platform/4.3/html-single/registry/index#registry-operator-configuration-resource-overview_configuring-registry-operator)

+ *   [3] [https://docs.openshift.com/container-platform/4.4/storage/understanding-persistent-storage.html](https://docs.openshift.com/container-platform/4.4/storage/understanding-persistent-storage.html)

+ *   [4] Configuring local storage [https://docs.openshift.com/container-platform/4.4/storage/persistent_storage/persistent-storage-local.html](https://docs.openshift.com/container-platform/4.4/storage/persistent_storage/persistent-storage-local.html)

+ *   [5] Configuring nfs storage [https://docs.openshift.com/container-platform/4.4/storage/persistent_storage/persistent-storage-nfs.html](https://docs.openshift.com/container-platform/4.4/storage/persistent_storage/persistent-storage-nfs.html)

+ *   [6] Persistent storage accessModes [https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes)

+ 

+ Steps:

+ 

+ *   Installed the Local Storage Operator via instructions at [4]

+ *   Created a LocalVolume object via instructions at [4], see contents: [link](https://gist.github.com/davidkirwan/4cfbee653ecbab70484c9ce878e5eb90)

+ *   The documentation at [4] suggests that you can simply patch the daemonset config to add configuration to run on master nodes also. This is not true. The Local Storage Operator will revert any changes to the objects which it is managing. This change instead must be made to the LocalVolume object created at step 2.

+ *   Daemonset pod runs on each node that matches the selector in the LocalVolume object:

+ 

+ 

+ ```

+ oc get ds

+ NAME                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE

+ local-disks-local-diskmaker     7         7         7       7            7           <none>          58m

+ local-disks-local-provisioner   7         7         7       7            7           <none>          58m

+ ```

+