# CentOS CI Infra Outage Preparation
During a scheduled outage in which we are likely to lose network access to an entire rack, or between racks, it is advisable to shut down the following services:

- Duffy
- CentOS CI OpenShift prod/stg
- Legacy CI Jenkins
- Legacy OKD
- keepalived on gateway02.ci.centos.org

## Legacy OKD
1. bstinson, as the only person on the team who has access to the legacy OKD cluster, must handle tasks related to this cluster.

## OCP
Reference SOPs:

- https://github.com/centosci/ocp4-docs/blob/master/sops/create_etcd_backup.md
- https://github.com/centosci/ocp4-docs/blob/master/sops/cordoning_nodes_and_draining_pods.md
- https://github.com/centosci/ocp4-docs/blob/master/sops/graceful_shutdown_ocp_cluster.md

Admin nodes:

- Prod: ocp-admin.ci.centos.org
- Stg: n4-136.cloud.ci.centos.org

2. Take an etcd backup to the admin node associated with prod/stg
3. Cordon and drain all nodes
4. Gracefully shut down the cluster

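Steps 3 and 4 can be sketched roughly as follows. This is only an illustration and not a substitute for the SOPs linked above. It assumes `oc` is already logged in with cluster-admin rights on the relevant admin node; the drain flags and the one-minute shutdown delay are assumptions (older `oc` releases use `--delete-local-data` instead of `--delete-emptydir-data`). The `run` helper, the function names, and the `DRY_RUN` variable are hypothetical names introduced here so that the script only prints commands by default.

```shell
#!/usr/bin/env bash
# Sketch only: prints each command unless DRY_RUN=0 is set.
# run, cordon_and_drain, shutdown_nodes and DRY_RUN are hypothetical names.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 3: mark each node unschedulable, then evict its pods.
cordon_and_drain() {
    local node
    for node in "$@"; do
        run oc adm cordon "$node"
        run oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data --force
    done
}

# Step 4: schedule a shutdown on each node through a debug pod.
shutdown_nodes() {
    local node
    for node in "$@"; do
        run oc debug "node/$node" -- chroot /host shutdown -h 1
    done
}
```

The node names could be passed in from `oc get nodes -o jsonpath='{.items[*].metadata.name}'`. Review the printed commands first, then rerun with `DRY_RUN=0` once they look right.
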
## Duffy
5. Switch off the Duffy workers, which are started with:
    * `source duffy2-venv/bin/activate; FLASK_APP=duffy DUFFY_SETTINGS=/etc/duffy.conf python scripts/worker.py`
6. Switch off the Duffy server, which is started with:
    * `FLASK_APP=duffy DUFFY_SETTINGS=/etc/duffy.conf flask run -h 0.0.0.0 -p 8080`

## Legacy CI Jenkins

7. On the ci.centos.org legacy Jenkins, go to Manage Jenkins and select Prepare for Shutdown
    * `ssh jenkins`, then `systemctl restart jenkins`

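The "Prepare for Shutdown" action in step 7 puts Jenkins into quiet-down mode, which can also be requested over HTTP with a POST to the `/quietDown` endpoint. The sketch below is one possible way to script that, not the team's documented procedure. It assumes credentials for the Jenkins host are stored in `~/.netrc`; the `run` helper, `jenkins_quiet_down`, and `DRY_RUN` are hypothetical names, and the URL in the usage note is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch only: prints the command unless DRY_RUN=0 is set.
# run, jenkins_quiet_down and DRY_RUN are hypothetical names.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Ask Jenkins to stop starting new builds (quiet-down mode).
# $1: base URL of the Jenkins instance; a trailing slash is tolerated.
jenkins_quiet_down() {
    local base_url="${1%/}"
    # --netrc makes curl read the login from ~/.netrc (assumed to exist).
    run curl --fail --silent --show-error --netrc -X POST "${base_url}/quietDown"
}
```

For example, `jenkins_quiet_down https://jenkins.example.org` prints the `curl` call in dry-run mode; running builds are left to finish while no new ones are started.
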
## keepalived on Gateway nodes
8. Shut down keepalived on gateway02.ci.centos.org
    * `sudo systemctl stop keepalived`

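Step 8 can be wrapped in a small check that the service really stopped. This is a minimal sketch meant to be run on gateway02.ci.centos.org; the `run` helper, `stop_keepalived`, and `DRY_RUN` are hypothetical names, and the script only prints the command unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch only: prints the command unless DRY_RUN=0 is set.
# run, stop_keepalived and DRY_RUN are hypothetical names.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

stop_keepalived() {
    run sudo systemctl stop keepalived
    # Outside dry-run mode, confirm the unit is no longer active.
    # `systemctl is-active --quiet` exits 0 only while the unit is active.
    if [ "${DRY_RUN:-1}" != "1" ] && systemctl is-active --quiet keepalived; then
        echo "keepalived is still active" >&2
        return 1
    fi
}
```

Calling `stop_keepalived` with `DRY_RUN=0` returns a non-zero status if the unit is still reported as active after the stop request.
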