CentOS CI Infra Outage Preparation
During a scheduled outage in which we are likely to lose network access to an entire rack, or between racks, it is advisable to shut down the following services:
- Duffy
- CentOS CI OpenShift prod/stg
- Legacy CI Jenkins
- Legacy OKD
- keepalived on gateway02.ci.centos.org
Legacy OKD
- bstinson is the only person on the team with access to the legacy OKD cluster and must handle all tasks related to it.
OCP
https://github.com/centosci/ocp4-docs/blob/master/sops/create_etcd_backup.md
https://github.com/centosci/ocp4-docs/blob/master/sops/cordoning_nodes_and_draining_pods.md
https://github.com/centosci/ocp4-docs/blob/master/sops/graceful_shutdown_ocp_cluster.md
Admin nodes
Prod: ocp-admin.ci.centos.org
Stg: n4-136.cloud.ci.centos.org
- Take an etcd backup to the admin node associated with prod/stg
- Cordon and drain all nodes
- Gracefully shut down the cluster
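The cordon, drain and shutdown steps above could be scripted roughly as follows. This is a sketch, not a replacement for the linked SOPs, which remain authoritative. The helper names (run, cordon_and_drain, shutdown_nodes) and the DRY_RUN switch are made up for illustration, and the drain flags and the one-minute shutdown delay are assumptions that may need adjusting for the cluster's oc version.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and DRY_RUN are illustrative, not part of the SOPs.
# Assumes `oc` is already logged in to the cluster as a cluster admin.

run() {
    # Dry run is the default: print the command instead of executing it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

cordon_and_drain() {
    # Arguments: one or more node names.
    local node
    for node in "$@"; do
        run oc adm cordon "$node"
        # Flags are an assumption; older clients use --delete-local-data instead.
        run oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data --force
    done
}

shutdown_nodes() {
    # Schedule a shutdown one minute out on each node via a debug pod.
    local node
    for node in "$@"; do
        run oc debug "node/${node}" -- chroot /host shutdown -h 1
    done
}
```

With DRY_RUN unset or set to 1 the commands are only printed, which is a cheap way to review them before the outage. For a real run, set DRY_RUN=0 and pass the node names, for example the output of `oc get nodes -o jsonpath='{.items[*].metadata.name}'`, and only after the etcd backup has been taken.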
Duffy
- Switch off the Duffy workers, i.e. the processes started with:
- source duffy2-venv/bin/activate; FLASK_APP=duffy DUFFY_SETTINGS=/etc/duffy.conf python scripts/worker.py
- Switch off the Duffy server, i.e. the process started with:
- FLASK_APP=duffy DUFFY_SETTINGS=/etc/duffy.conf flask run -h 0.0.0.0 -p 8080
Legacy CI Jenkins
- ci.centos.org legacy Jenkins: go to Manage Jenkins and select Prepare for Shutdown
- ssh jenkins, then run: systemctl restart jenkins
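The "prepare for shutdown" step above can also be triggered without the web UI, since Jenkins exposes a quietDown endpoint that accepts a POST request. The sketch below assumes API-token authentication. The function names, the DRY_RUN switch, and the JENKINS_USER and JENKINS_TOKEN variables are placeholders and not values from this SOP.

```shell
#!/usr/bin/env bash
# Sketch only: put Jenkins into quiet-down mode over HTTP.
# JENKINS_USER and JENKINS_TOKEN are placeholders supplied by the operator.

quiet_down_url() {
    # Build the quietDown endpoint from a base URL, tolerating a trailing slash.
    local base="${1%/}"
    echo "${base}/quietDown"
}

jenkins_quiet_down() {
    # Argument: the Jenkins base URL.
    local url
    url="$(quiet_down_url "$1")"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (default): show the request without sending it or the token.
        echo "+ curl -fsS -X POST -u USER:TOKEN ${url}"
    else
        curl -fsS -X POST -u "${JENKINS_USER}:${JENKINS_TOKEN}" "$url"
    fi
}
```

Calling jenkins_quiet_down with the Jenkins base URL prints the request by default; with DRY_RUN=0 and the credentials exported it sends it. Running builds are allowed to finish while new ones are held back, the same as the UI action.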
keepalived on Gateway nodes
- Shut down keepalived on gateway02.ci.centos.org
* sudo systemctl stop keepalived
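Stopping keepalived and confirming that it is really down could be wrapped as below. This is a minimal sketch meant to be run on gateway02.ci.centos.org itself. The function names and the DRY_RUN switch are illustrative assumptions, not existing tooling.

```shell
#!/usr/bin/env bash
# Sketch only: stop keepalived and verify that the unit is no longer active.

run() {
    # Dry run is the default: print the command instead of executing it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

stop_keepalived() {
    run sudo systemctl stop keepalived || return 1
    # `systemctl is-active` exits 0 only while the unit is still running.
    if run systemctl is-active --quiet keepalived && [ "${DRY_RUN:-1}" != "1" ]; then
        echo "keepalived is still active" >&2
        return 1
    fi
}
```

In dry-run mode both commands are printed and nothing is changed. With DRY_RUN=0 the function returns non-zero if the stop fails or the unit is still active afterwards.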