CentOS CI Infra Outage Preparation

During a scheduled outage in which we are likely to lose network access to an entire rack, or between racks, it is advisable to shut down the following services:

Duffy
CentOS CI OpenShift prod/stg
Legacy CI Jenkins
Legacy OKD
keepalived on gateway02.ci.centos.org

Legacy OKD

bstinson, as the only person on the team who has access to the legacy OKD cluster, must handle all tasks related to this cluster.

OCP

https://github.com/centosci/ocp4-docs/blob/master/sops/create_etcd_backup.md
https://github.com/centosci/ocp4-docs/blob/master/sops/cordoning_nodes_and_draining_pods.md
https://github.com/centosci/ocp4-docs/blob/master/sops/graceful_shutdown_ocp_cluster.md

Admin nodes
Prod: ocp-admin.ci.centos.org
Stg: n4-136.cloud.ci.centos.org

Take an etcd backup to the admin node associated with prod/stg
Cordon and drain all nodes
Gracefully shut down the cluster
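
The three steps above can be sketched as a small shell helper. This is a minimal sketch, not the procedure from the linked SOPs: the `oc` invocations follow the upstream OpenShift 4 documentation, the function names and the node-name arguments are hypothetical, and copying the backup to the admin node is left to the SOP. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
# Sketch only: commands are printed unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Take an etcd backup on one control plane node ($1).
# Assumption: script and target path as in the upstream OpenShift 4 docs.
backup_etcd() {
  run oc debug "node/$1" -- chroot /host /usr/local/bin/cluster-backup.sh /home/core/assets/backup
}

# Mark one node ($1) unschedulable and evict its pods.
# Older oc clients use --delete-local-data instead of --delete-emptydir-data.
cordon_and_drain() {
  run oc adm cordon "$1"
  run oc adm drain "$1" --ignore-daemonsets --delete-emptydir-data
}

# Schedule a shutdown in one minute on one node ($1) through a debug pod.
shutdown_node() {
  run oc debug "node/$1" -- chroot /host shutdown -h 1
}

# $1: control plane node used for the backup; remaining args: all node names.
prepare_ocp_shutdown() {
  backup_node="$1"
  shift
  backup_etcd "$backup_node"
  for node in "$@"; do cordon_and_drain "$node"; done
  for node in "$@"; do shutdown_node "$node"; done
}
```

A dry run such as `prepare_ocp_shutdown cp0 cp0 worker0` (hypothetical node names) prints the commands in the order of the steps above, which makes it easy to review them before running against prod or stg.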

Duffy

Switch off the Duffy workers. These are the processes started with:
- source duffy2-venv/bin/activate; FLASK_APP=duffy DUFFY_SETTINGS=/etc/duffy.conf python scripts/worker.py
Switch off the Duffy server. This is the process started with:
- FLASK_APP=duffy DUFFY_SETTINGS=/etc/duffy.conf flask run -h 0.0.0.0 -p 8080
Legacy Jenkins on ci.centos.org: go to Manage Jenkins and select Prepare for Shutdown
- ssh jenkins
- systemctl restart jenkins
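
One way to script the Duffy and Jenkins steps above is sketched below. It rests on assumptions that should be checked against the hosts first: that the Duffy workers and server run as plain processes (not under a service manager), so that sending SIGTERM to the matching command lines stops them, and that Jenkins is reachable over HTTP with an API token. `JENKINS_URL`, `JENKINS_USER` and `JENKINS_TOKEN` are hypothetical variables. `POST /quietDown` is the Jenkins endpoint behind the "Prepare for Shutdown" button. By default the commands are only printed.

```shell
# Sketch only: commands are printed unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Assumption: workers and server are plain processes, so SIGTERM to the
# matching command lines stops them. Workers go first, then the server.
stop_duffy() {
  run pkill -TERM -f 'scripts/worker.py'
  run pkill -TERM -f 'flask run'
}

# Put Jenkins into quiet mode so that no new builds are started.
# JENKINS_URL, JENKINS_USER and JENKINS_TOKEN are hypothetical variables.
jenkins_prepare_for_shutdown() {
  run curl -fsS -X POST -u "${JENKINS_USER}:${JENKINS_TOKEN}" "${JENKINS_URL}/quietDown"
}
```

`pkill -f` matches against the full command line, so the patterns should be kept specific enough that nothing else on the host is hit; `pgrep -af 'flask run'` shows what would be matched.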

keepalived on Gateway nodes

Shut down keepalived on gateway02.ci.centos.org:
- sudo systemctl stop keepalived
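
As a small sketch, the stop can be followed by a check that the unit is really down. The function name is hypothetical and the block only prints the command unless `DRY_RUN=0` is set; `systemctl is-active --quiet` exits with status 0 only while the unit is active.

```shell
# Sketch only: commands are printed unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Stop keepalived, then fail loudly if the unit is still reported active.
stop_keepalived() {
  run sudo systemctl stop keepalived
  if [ "$DRY_RUN" != "1" ] && systemctl is-active --quiet keepalived; then
    echo "keepalived is still active" >&2
    return 1
  fi
}
```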
