From ee6248db113abb9dc4a0b189dae35c9379a2b01e Mon Sep 17 00:00:00 2001
From: Fabian Arrotin
Date: Jun 30 2021 14:49:58 +0000
Subject: Basic explanations about bare-metal/VM deploy in infra

Signed-off-by: Fabian Arrotin

---

diff --git a/docs/index.md b/docs/index.md
index 0731b2c..42ddf65 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -34,3 +34,14 @@ Worth also knowing that this site is automatically rendered from a [git reposito
 !!! tip
     You can use the `search` feature/box on top level to search for some specific topics or keywords
+
+## Available Environments
+
+While the same automation code should be used to configure all infra/services nodes within CentOS Infra, it's worth knowing that we still "divide" it into sub-sections, and so have different environments.
+Here is a quick look at the existing environments, *each* using its own dedicated [Ansible](/ansible) inventory, and so its own settings and/or permitted access :
+
+ * `CentOS main` : the default; if not otherwise defined, all nodes are considered "production" nodes and managed as such
+ * `CentOS staging` (STG) : pre-production environment with a limited number of nodes, mostly used to test changes/deployments before they are rolled out to `CentOS main`
+ * `CentOS dev` (DEV) : ephemeral setup pointing to very low-spec machines (usually VMs), used to test new stacks/applications and to write automation before it is deployed in `CentOS staging`
+ * `CentOS CI` : everything that configures/deploys the infra behind the `ci.centos.org` domain (public or internal)
+ * `CentOS Stream MVBE` : dedicated/isolated environment for the CentOS Stream 9 build system, with its own inventory/rollout strategy
diff --git a/docs/operations/deploy/bare-metal.md b/docs/operations/deploy/bare-metal.md
index e69de29..b2177a6 100644
--- a/docs/operations/deploy/bare-metal.md
+++ b/docs/operations/deploy/bare-metal.md
@@ -0,0 +1,60 @@
+# Bare-metal host deploy operation
+
+This process can be used to add a new bare-metal node to the CentOS Infra/inventory.
+The node can be hosted within the `Community Cage` (Red Hat) DC, or be a dedicated/hosted server provided by a CentOS sponsor.
+
+## DataCenter we control (Red Hat DC)
+
+Through an internal ticket with PNT/DevOps we ensure that the machine/chassis is racked and documented.
+We also add it to the [Internal Inventory](https://docs.google.com/spreadsheets/d/1K-aewLJ17z3pRC6K5qyBRJYtNXy1WcxRSVwPkGf4NXQ), and start "reserving" the IP addresses needed for the IPMI/iDrac/mgmt vlan interface and for the Operating System.
+
+We probably also have to create another ticket on the [internal](https://help.redhat.com) portal to ensure that the ToR switches (which we don't control) have their ports configured correctly (enabled, set to the correct VLAN PVID, etc).
+
+### Hardware initialization
+
+There is a *very* small ip range in the mgmt vlan available for newly connected nodes. So on the internal dhcpd node (see in the inventory which server currently holds the `boot-server` ansible role), you can always verify whether the new machine was leased an ip from the oob/management vlan.
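+
+As a quick illustration (a minimal sketch, assuming the `boot-server` node runs the standard ISC `dhcpd` shipped with CentOS and logs to the journal - adapt the unit name and lease file path to the real setup), you could check for a new lease like this :
+
+```
+# on the node currently holding the `boot-server` ansible role (see inventory)
+# recent DHCPACKs handed out, hopefully including the new machine's mgmt interface
+journalctl -u dhcpd --since "-1h" | grep DHCPACK
+# or inspect the lease file directly (default ISC dhcpd location on CentOS)
+grep -A 6 '^lease ' /var/lib/dhcpd/dhcpd.leases | tail -n 20
+```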
+
+Once we have a `dial tone` on the hardware side (oob/mgmt vlan), we need to ensure that we :
+
+ * change the default credentials to randomly generated ones
+ * configure alerting for hardware issues
+ * correctly set up the raid array if we have a hardware raid controller
+
+### Preparing PXE/UEFI boot env
+
+If we want ansible to automatically deploy the node, we just have to add it to the inventory and ensure that its /host_vars/ file has at least :
+
+ * the following variables set : `ipmi_ip`, `ipmi_user`, `ipmi_pass` (used to remotely pxe boot the node)
+ * `ip`, `gateway`, `netmask` and `dns` (apart from `ip`, which is unique, the rest usually comes through inheritance)
+ * based on group inheritance, the variables documented in [adhoc-provision-node.yml](https://github.com/CentOS/ansible-infra-playbooks/blob/master/adhoc-provision-node.yml) also defined
+
+### Deploying the machine
+
+If the previous steps are done and the network switch port[s] are working, we can now proceed with ansible :
+
+```
+ansible-playbook-prod playbooks/adhoc-provision-node.yml
+[WARNING] Nodes to be fully wiped/reinstalled with CentOS => :
+```
+
+In summary, that playbook will (through `delegate_to` ansible tasks) :
+
+ * prepare the kickstart needed for the host to be deployed (jinja2 template)
+ * prepare the pxe/tftp/grub settings to boot from network (on the tftpd node)
+ * use ipmi to reset the hardware node and force booting over pxe
+ * wait for sshd to be available on the freshly deployed node
+
+!!! warning
+    Attention : this will *wipe* the existing operating system, which is why the playbook uses ansible `vars_prompt` and waits for input that *you* need to verify. You can also specify a group of machines to deploy at once, so a wrong input would destroy/reinstall existing nodes.
+
+## Sponsored machine
+
+When we receive a new dedicated server, hosted in another DC that we don't control (no pxe/dhcp), the process usually goes like this :
+
+ * through emails exchanged with the sponsor, we agree on a minimal setup
+ * we receive initial credentials
+ * we collect the needed information (like ipv4/ipv6 address[es], dns resolvers, etc)
+ * we remotely reinstall the machine on itself (without remote console access), following our standards - faster than auditing the state in which we received it
+ * we add the node to dns/ansible (see the [Common section](/operations/deploy/common) )
+
diff --git a/docs/operations/deploy/virtual-machine.md b/docs/operations/deploy/virtual-machine.md
index e69de29..9ee1cbc 100644
--- a/docs/operations/deploy/virtual-machine.md
+++ b/docs/operations/deploy/virtual-machine.md
@@ -0,0 +1,30 @@
+# Virtual Machine deploy operation
+
+## CentOS infra KVM hosts
+
+If the KVM virtual machine we want to deploy is hosted on a CentOS KVM host within the `Community Cage` (Red Hat) DC, the same applies as for a bare-metal host : add it to the [Internal Inventory](https://docs.google.com/spreadsheets/d/1K-aewLJ17z3pRC6K5qyBRJYtNXy1WcxRSVwPkGf4NXQ), and start "reserving" the IP address[es] for the Operating System.
+
+Then add the new node to /host_vars/ and also to the /kvm file/group (for inheritance).
+
+Ensure that *all* the variables explained in the [adhoc-deploy-kvm-guest](https://github.com/CentOS/ansible-infra-playbooks/blob/master/adhoc-deploy-kvm-guest.yml) playbook are set correctly.
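+
+One way to double-check that (a sketch only - the hostname `newguest.dev.centos.org` is a made-up example and the inventory path depends on your local checkout) is to dump the merged variables ansible computes for the new guest :
+
+```
+# show all resolved host/group variables for the (example) new kvm guest
+ansible-inventory -i inventory --host newguest.dev.centos.org
+```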
+
+Then you can call it like this :
+
+```
+ansible-playbook playbooks/adhoc-deploy-kvm-guest.yml
+[WARNING] KVM guests to be deployed with CentOS => :
+
+```
+
+The process will go like this :
+
+ * ansible will generate a kickstart locally on the kvm host
+ * it will deploy a wrapper template/script (calling virt-install) with the correct settings
+ * it will call that wrapper to kick the `virt-install` run, which injects the kickstart into the initrd on demand
+ * it will wait for sshd to be available on the node
+ * it will clean up the `virt-install` wrapper script for the kvm guest
+
+
+## Cloud providers
+
+### AWS/EC2
diff --git a/mkdocs.yml b/mkdocs.yml
index 3015b81..dfc0373 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -9,11 +9,9 @@ nav:
   - Operations:
     - operations/intro.md
     - Common infra:
-      - operations/deploy/common.md
       - operations/deploy/bare-metal.md
       - operations/deploy/virtual-machine.md
-      - operations/sponsored/ec2.md
-      - operations/sponsored/bare-metal.md
+      - operations/deploy/common.md
     - operations/decommission.md
   - CI Infra:
     - SOP: