# Steps for installing OCP 4.3 on bare metal:

Documentation: [docs](https://access.redhat.com/documentation/en-us/openshift_container_platform/4.3/html/installing_on_bare_metal/installing-on-bare-metal)

## Install:
* `mkdir ocp-ci-centos-org`
* `cd ocp-ci-centos-org`
* For installations of OpenShift Container Platform that use user-provisioned infrastructure, you must manually generate your installation configuration file.
* **1.1.7.1.** For a sample config, see [here](https://projects.engineering.redhat.com/secure/attachment/104626/install-config.yaml.bak):

```
apiVersion: v1
baseDomain: centos.org
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: ocp.ci
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: '<installation pull secret from cloud.redhat.com>'
sshKey: '<ssh key for the RHCOS nodes>'
```
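The sample above sets the worker `replicas` to 0, which is what a user-provisioned install expects. A quick sanity check on that value could look like the sketch below. The function name `compute_replicas` is hypothetical, and the parsing assumes the flat layout of the sample (it is not a general YAML parser).

```shell
# Print the replicas value found under the top-level "compute:" key.
# Assumption: the file follows the flat layout of the sample install-config.yaml.
compute_replicas() {
    awk '
        /^compute:/ { in_compute = 1; next }   # entering the compute section
        /^[^ -]/    { in_compute = 0 }         # any other top-level key ends it
        in_compute && $1 == "replicas:" { print $2 }
    ' "$1"
}

# Hypothetical usage, run from the install directory:
# [ "$(compute_replicas install-config.yaml)" = "0" ] || echo "set compute replicas to 0" >&2
```

The check only reads the file, so it is safe to run before any `openshift-install` step.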

*   Get the **pull secret** from [https://cloud.redhat.com/openshift/install/metal/user-provisioned](https://cloud.redhat.com/openshift/install/metal/user-provisioned). This requires your access.redhat.com login.
*   “You must set the value of the replicas parameter to 0. This parameter controls the number of workers that the cluster creates and manages for you, which are functions that the cluster does not perform when you use user-provisioned infrastructure. You must manually deploy worker machines for the cluster to use before you finish installing OpenShift Container Platform.”
*   **1.1.8.** Once the **install-config.yaml** configuration is correct, take a backup of this file for future installs or reference, because the next step consumes it. Then run the following:
*   `openshift-install create manifests --dir=/home/dkirwan/ocp-ci-centos-org`

    INFO Consuming Install Config from target directory  
    WARNING Certificate 35183CE837878BAC77A802A8A00B6434857 from additionalTrustBundle is x509 v3 but not a certificate authority  
    WARNING Making control-plane schedulable by setting MastersSchedulable to true for  Scheduler cluster settings.
*   Running this command converts **install-config.yaml** into a number of files, e.g.:
```
    ~/ocp-ci-centos-org $ tree .
    .
    ├── manifests
    │   ├── 04-openshift-machine-config-operator.yaml
    │   ├── cluster-config.yaml
    │   ├── cluster-dns-02-config.yml
    │   ├── cluster-infrastructure-02-config.yml
    │   ├── cluster-ingress-02-config.yml
    │   ├── cluster-network-01-crd.yml
    │   ├── cluster-network-02-config.yml
    │   ├── cluster-proxy-01-config.yaml
    │   ├── cluster-scheduler-02-config.yml
    │   ├── cvo-overrides.yaml
    │   ├── etcd-ca-bundle-configmap.yaml
    │   ├── etcd-client-secret.yaml
    │   ├── etcd-host-service-endpoints.yaml
    │   ├── etcd-host-service.yaml
    │   ├── etcd-metric-client-secret.yaml
    │   ├── etcd-metric-serving-ca-configmap.yaml
    │   ├── etcd-metric-signer-secret.yaml
    │   ├── etcd-namespace.yaml
    │   ├── etcd-service.yaml
    │   ├── etcd-serving-ca-configmap.yaml
    │   ├── etcd-signer-secret.yaml
    │   ├── kube-cloud-config.yaml
    │   ├── kube-system-configmap-root-ca.yaml
    │   ├── machine-config-server-tls-secret.yaml
    │   ├── openshift-config-secret-pull-secret.yaml
    │   └── user-ca-bundle-config.yaml
    └── openshift
        ├── 99_kubeadmin-password-secret.yaml
        ├── 99_openshift-cluster-api_master-user-data-secret.yaml
        ├── 99_openshift-cluster-api_worker-user-data-secret.yaml
        ├── 99_openshift-machineconfig_99-master-ssh.yaml
        ├── 99_openshift-machineconfig_99-worker-ssh.yaml
        └── openshift-install-manifests.yaml
    2 directories, 32 files
```
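Note that `install-config.yaml` no longer appears in the tree above: `create manifests` consumes it, so the backup mentioned in step 1.1.8 has to be taken before that command runs. A minimal sketch of such a backup step follows. The function name `backup_install_config` and the choice of a separate backup directory are assumptions for illustration, not part of the original procedure.

```shell
# Copy install-config.yaml out of the install directory before it is consumed.
# Usage: backup_install_config <install-dir> <backup-dir>
backup_install_config() {
    install_dir=$1
    backup_dir=$2
    src="$install_dir/install-config.yaml"
    # Refuse to continue if there is nothing to back up.
    [ -f "$src" ] || { echo "no install-config.yaml in $install_dir" >&2; return 1; }
    mkdir -p "$backup_dir" || return 1
    # The .bak suffix mirrors the sample file name linked earlier; any name works.
    cp -p "$src" "$backup_dir/install-config.yaml.bak"
}

# Hypothetical usage, before running `openshift-install create manifests`:
# backup_install_config /home/dkirwan/ocp-ci-centos-org /home/dkirwan/ocp-backups
```

Keeping the copy outside the install directory avoids any chance of the installer treating it as an input.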

*   Edit **manifests/cluster-scheduler-02-config.yml** and set **mastersSchedulable** to false. This prevents Pods from being scheduled on the master instances.
*   `sed -i 's/mastersSchedulable: true/mastersSchedulable: false/g' manifests/cluster-scheduler-02-config.yml`
*   Create the machineconfigs to disable DHCP on the master/worker nodes:

```
for variant in master worker; do 
cat << EOF > ./99_openshift-machineconfig_99-${variant}-nm-nodhcp.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: ${variant}
  name: nm-${variant}-nodhcp
spec:
  config:
    ignition:
      config: {}
      security:
        tls: {}
      timeouts: {}
      version: 2.2.0
    networkd: {}
    passwd: {}
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,W21haW5dCm5vLWF1dG8tZGVmYXVsdD0qCg==
          verification: {}
        filesystem: root
        mode: 0644
        path: /etc/NetworkManager/conf.d/disabledhcp.conf
  osImageURL: ""
EOF
done
```
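Two values in the manifest above are easy to misread: the base64 payload in `source`, and the octal file `mode`. The sketch below decodes the payload to show the NetworkManager configuration that gets written, and converts the mode to the decimal form the Kubernetes API reports. The function names are hypothetical, and `base64 --decode` is assumed to be available (it is in GNU coreutils).

```shell
# The base64 part of the data URL from the MachineConfig above.
payload='W21haW5dCm5vLWF1dG8tZGVmYXVsdD0qCg=='

# Print the decoded file contents.
decode_payload() {
    printf '%s' "$payload" | base64 --decode
}

# Convert an octal mode such as 0644 to decimal.
# printf treats a numeric argument with a leading 0 as octal.
octal_to_decimal() {
    printf '%d\n' "$1"
}

# decode_payload prints:
#   [main]
#   no-auto-default=*
# octal_to_decimal 0644 prints 420
```

The decoded file sets `no-auto-default=*` in the `[main]` section, which is the NetworkManager setting that stops it from creating automatic default connections on any device.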

*   *NOTE* There is a gotcha here: the file mode is **octal** and should start with 0, e.g. 0644 (-rw-r--r--); however, it shows up as the **decimal** value 420 when queried later via the Kubernetes API.
*   Create the ignition configurations:
*   Once they are generated, rename `worker.ign` to `compute.ign`, because later steps in the process are configured to point at `compute.ign`.

```
openshift-install create ignition-configs --dir=/home/dkirwan/ocp-ci-centos-org
INFO Consuming OpenShift Install (Manifests) from target directory
INFO Consuming Common Manifests from target directory
INFO Consuming Master Machines from target directory
INFO Consuming Worker Machines from target directory
INFO Consuming Openshift Manifests from target directory

# After renaming worker.ign to compute.ign, the directory should have the following layout
.
├── auth
│   ├── kubeadmin-password
│   └── kubeconfig
├── bootstrap.ign
├── master.ign
├── metadata.json
└── compute.ign
```
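The rename from `worker.ign` to `compute.ign` is a plain `mv`, but a small guard makes it safe to re-run. This is a sketch only, and the function name `rename_worker_ign` is hypothetical.

```shell
# Rename worker.ign to compute.ign inside the install directory.
# Usage: rename_worker_ign <install-dir>
rename_worker_ign() {
    dir=$1
    if [ -f "$dir/worker.ign" ]; then
        mv "$dir/worker.ign" "$dir/compute.ign"
    elif [ -f "$dir/compute.ign" ]; then
        # Already renamed on a previous run.
        echo "compute.ign already present, nothing to do" >&2
    else
        echo "neither worker.ign nor compute.ign found in $dir" >&2
        return 1
    fi
}

# Hypothetical usage:
# rename_worker_ign /home/dkirwan/ocp-ci-centos-org
```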

*   *NOTE* For production, i.e. `ocp.ci`, we must perform an extra step at this point, because the machines have 2 hard disks attached. We want to ensure that `/dev/sdb` gets its partition table wiped at bootstrapping time, so that we can later configure the Local Storage Operator to manage this disk drive.
*   Modify the `master.ign` and `compute.ign` ignition files with the following:

```
-   "storage":{},
+   "storage":{"disks":[{"device":"/dev/sdb","wipeTable":true}]},
```
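The change in the diff above can be applied by hand or scripted. The sketch below assumes the generated ignition file contains the literal empty stanza `"storage":{}` exactly as shown in the diff. If it does not, the function fails without touching the file. The function name `wipe_sdb_table` is hypothetical.

```shell
# Replace the empty storage stanza with one that wipes the /dev/sdb partition table.
# Usage: wipe_sdb_table <ignition-file>
wipe_sdb_table() {
    file=$1
    tmp="$file.tmp"
    # Fail early if the empty storage stanza shown in the diff is not present.
    grep -q '"storage":{}' "$file" || { echo "no empty storage stanza in $file" >&2; return 1; }
    # Write to a temporary file first, then move it into place.
    sed 's|"storage":{}|"storage":{"disks":[{"device":"/dev/sdb","wipeTable":true}]}|' "$file" > "$tmp" &&
        mv "$tmp" "$file"
}

# Hypothetical usage, run from the install directory:
# for f in master.ign compute.ign; do wipe_sdb_table "$f"; done
```

Because the function refuses to run when the empty stanza is missing, calling it twice on the same file does not stack changes.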

*   **1.1.9. Creating Red Hat Enterprise Linux CoreOS (RHCOS) machines**
*   Prerequisites:
    * Obtain the Ignition config files for your cluster.
    * Configure suitable PXE or iPXE infrastructure.
    * Have access to an HTTP server that you can access from your computer.
    * Have a load balancer, e.g. HAProxy, available.
*   You must download the kernel, initramfs, ISO file and the RAW disk files, e.g.:
*   [https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.3/latest/](https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.3/latest/)
    * [rhcos-4.3.8-x86_64-installer-kernel-x86_64](https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.3/latest/rhcos-4.3.8-x86_64-installer-kernel-x86_64)
    * [rhcos-4.3.8-x86_64-installer-initramfs.x86_64.img](https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.3/latest/rhcos-4.3.8-x86_64-installer-initramfs.x86_64.img)
    * [rhcos-4.3.8-x86_64-installer.x86_64.iso](https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.3/latest/rhcos-4.3.8-x86_64-installer.x86_64.iso)
    * [rhcos-4.3.8-x86_64-metal.x86_64.raw.gz](https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.3/latest/rhcos-4.3.8-x86_64-metal.x86_64.raw.gz)
*   These files should be copied over to a web server that is accessible from the bootstrap/master/compute instances.
*   **1.1.9.2.** “Configure the network boot infrastructure so that the machines boot from their local disks after RHCOS is installed on them.”
*   Existing CentOS PXE boot configuration Ansible [example](https://github.com/CentOS/ansible-infra-playbooks/blob/master/templates/pxeboot.j2)
*   Example RHCOS PXE boot configuration [here](https://projects.engineering.redhat.com/secure/attachment/104734/centos-ci-pxe_sampleconfig.txt)
*   **1.1.10.** Once the systems are booting and installing, you can monitor the installation with: `./openshift-install --dir=/home/dkirwan/ocp-ci-centos-org wait-for bootstrap-complete --log-level=info`
*   Once the master nodes come up successfully, this command will exit. We can now remove the bootstrap instance and repurpose it as a worker/compute node.
*   Once the bootstrap node has been removed from the `ocp-ci-master-and-bootstrap-stg` Ansible inventory group, run the haproxy role.
*   Begin installing the compute/worker nodes.
*   Once the workers are up, accept them into the cluster by approving their `csr` certs:
```
# List the CSRs. A Pending status means a worker/compute node is attempting to join the cluster and must be approved.
oc get csr

# Approve all pending node CSRs in one line
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
```
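Before approving everything in one go, it can be useful to preview which requests are actually pending. The sketch below filters the plain table output of `oc get csr` instead of using a go-template. It assumes that the last column of that table is the condition and that pending requests show exactly `Pending` there. The function name `pending_csr_names` is hypothetical.

```shell
# Read `oc get csr` table output on stdin and print the names of rows
# whose last column is exactly "Pending". The header row is skipped.
pending_csr_names() {
    awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Hypothetical usage against a live cluster (requires oc and a valid kubeconfig):
# oc get csr | pending_csr_names
# oc get csr | pending_csr_names | xargs oc adm certificate approve
```

Nodes usually submit a second CSR after the first one is approved, so this preview and the approval step may need to be repeated until no pending requests remain.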
*   **1.1.11.** Logging in to the cluster. At this point the cluster is up, and we’re in configuration territory.

## Manually test the RHCOS bootstrap process

Resources:

*   [1] JIRA corresponding to this section: [CPE-661](https://projects.engineering.redhat.com/browse/CPE-661)
*   [2] [https://github.com/CentOS/ansible-infra-playbooks/pull/4](https://github.com/CentOS/ansible-infra-playbooks/pull/4)
*   [3] [https://scm.infra.centos.org/CentOS/ansible-inventory-ci/pulls/1](https://scm.infra.centos.org/CentOS/ansible-inventory-ci/pulls/1)
*   [4] [https://scm.infra.centos.org/CentOS/ansible-pkistore-ci/pulls/1](https://scm.infra.centos.org/CentOS/ansible-pkistore-ci/pulls/1)
*   [5] [CentOS/ansible-infra-playbooks/staging/templates/ocp_pxeboot.j2](https://raw.githubusercontent.com/CentOS/ansible-infra-playbooks/staging/templates/ocp_pxeboot.j2)
*   [https://www.openshift.com/blog/openshift-4-bare-metal-install-quickstart](https://www.openshift.com/blog/openshift-4-bare-metal-install-quickstart)
*   [6] [Create a raid enabled data volume via ignition file](https://coreos.com/ignition/docs/latest/examples.html#create-a-raid-enabled-data-volume)
*   [7] HAProxy config for OCP4 [https://github.com/openshift-tigerteam/guides/blob/master/ocp4/ocp4-haproxy.cfg](https://github.com/openshift-tigerteam/guides/blob/master/ocp4/ocp4-haproxy.cfg)

Steps:

*   Created an SSH key pair using `ssh-keygen` and uploaded it to the ansible-pkistore-ci repository, see [4].
*   Through trial and error, we’ve produced a PXE boot configuration for one of the machines and managed to get it to boot and begin the bootstrap process via an ignition file, see [5].
*   The next steps are to decide on the networking configuration, then configure DNS and create 2 HAProxy proxies before creating the bootstrap and master OCP nodes. JIRAs created: [CPE-678](https://projects.engineering.redhat.com/browse/CPE-678), [CPE-677](https://projects.engineering.redhat.com/browse/CPE-677) and [CPE-676](https://projects.engineering.redhat.com/browse/CPE-676)
*   PR with the configuration for the HAProxy load balancers: [here](https://github.com/CentOS/ansible-role-haproxy/pull/2)
*   Configuration for DNS/bind (encrypted): [here](https://scm.infra.centos.org/CentOS/ansible-filestore-ci/src/branch/master/bind/ci.centos.org)