Bare-metal host deploy operation

This process can be used to add a new bare-metal node to the CentOS Infra inventory. The node can be hosted either within the Community Cage (Red Hat) DC or on a dedicated server hosted by a CentOS sponsor.

DataCenter we control (Red Hat DC)

Through an internal ticket with PNT/DevOps, we ensure that the machine/chassis is racked and documented. We also add it to the internal inventory and start "reserving" the IP addresses needed for the IPMI/iDRAC (mgmt VLAN) interface and for the operating system.

We will probably also have to create another ticket on the internal portal to ensure that the ToR switches (which we don't control) have their ports configured correctly (enabled, set to the correct VLAN PVID, etc.).

Hardware initialization

A very small IP range in the mgmt VLAN is available for newly connected nodes. On the internal dhcpd node (check the inventory to see which server currently holds the boot-server Ansible role), you can verify whether the new machine has been leased an IP address from the OOB/management VLAN.
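
A quick way to spot the new BMC is to look for active leases in that small range. The sketch below assumes the ISC dhcpd leases file format (on EL systems typically /var/lib/dhcpd/dhcpd.leases) and uses a hypothetical 192.0.2.200-192.0.2.220 range as the new-node pool; adjust both to what the boot-server node actually uses.

```python
import ipaddress
import re
from typing import Dict

# Hypothetical pool reserved for newly connected nodes in the mgmt VLAN.
POOL_START = ipaddress.ip_address("192.0.2.200")
POOL_END = ipaddress.ip_address("192.0.2.220")

# One "lease <ip> { ... }" block per lease event (dhcpd.leases format).
LEASE_RE = re.compile(r"^lease\s+(\S+)\s*\{(.*?)\}", re.MULTILINE | re.DOTALL)
# Anchored at line start so "next binding state" is not picked up.
STATE_RE = re.compile(r"^\s*binding state\s+(\w+);", re.MULTILINE)
MAC_RE = re.compile(r"hardware ethernet\s+([0-9a-fA-F:]+);")


def active_pool_leases(leases_text: str) -> Dict[str, str]:
    """Return {ip: mac} for active leases inside the new-node pool.

    dhcpd appends a new block whenever a lease changes, so a later block
    for the same IP overrides an earlier one.
    """
    found: Dict[str, str] = {}
    for ip_str, body in LEASE_RE.findall(leases_text):
        ip = ipaddress.ip_address(ip_str)
        if not POOL_START <= ip <= POOL_END:
            continue  # outside the new-node pool
        state = STATE_RE.search(body)
        mac = MAC_RE.search(body)
        if state and state.group(1) == "active" and mac:
            found[ip_str] = mac.group(1).lower()
        else:
            found.pop(ip_str, None)  # lease freed/expired since an earlier block
    return found
```

For example, reading the leases file with `with open("/var/lib/dhcpd/dhcpd.leases") as f:` and passing `f.read()` to active_pool_leases returns the IP/MAC pairs currently handed out in the pool.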

Once we have dial tone on the hardware side (OOB/mgmt VLAN), we need to ensure that we:

change the default credentials to randomly generated ones
configure alerting for hardware issues
set up the RAID array correctly if the node has a hardware RAID controller
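
For the first item, a minimal sketch in Python: generate a random password with the standard secrets module and build (without running) the ipmitool command that sets it. The lanplus interface, the user ID (2 is the default admin slot on many BMCs, but check with ipmitool user list) and the letters-and-digits alphabet are assumptions, not necessarily what our tooling does. Alerting and RAID setup are vendor-specific and not covered here.

```python
import secrets
import string
from typing import List

# Letters and digits only, to avoid quoting issues in shells and BMC web UIs
# (an assumption, not a requirement of IPMI itself).
ALPHABET = string.ascii_letters + string.digits


def random_bmc_password(length: int = 16) -> str:
    """Return a random password; IPMI 2.0 passwords are at most 20 bytes."""
    if not 8 <= length <= 20:
        raise ValueError("length must be between 8 and 20")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def set_password_cmd(ipmi_ip: str, ipmi_user: str, current_pass: str,
                     user_id: int, new_pass: str) -> List[str]:
    """Build (but do not run) an ipmitool command changing a BMC user's password."""
    return [
        "ipmitool", "-I", "lanplus", "-H", ipmi_ip,
        "-U", ipmi_user, "-P", current_pass,
        "user", "set", "password", str(user_id), new_pass,
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True). Note that -P exposes the current password in the process list; ipmitool -E reads it from the IPMI_PASSWORD environment variable instead.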

Preparing PXE/UEFI boot env

If we want Ansible to deploy the node automatically, we just have to add it to the inventory and ensure that <inventory>/host_vars/<node> has at least the following variables set:

- ipmi_ip, ipmi_user, ipmi_pass: used to remotely PXE-boot the node
- ip, gateway, netmask and dns (apart from ip, which is unique per node, these usually come through group inheritance)

Through group inheritance, also ensure that the variables documented in adhoc-provision-node.yml are defined.
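
Ansible merges group_vars and host_vars before the playbook runs, so a variable can come from either place. The sketch below approximates that with plain dictionaries (lowest precedence first, whole-key override as with Ansible's default hash_behaviour) and reports which of the variables listed above are still missing. The layer contents in the usage are made-up examples.

```python
from typing import Dict, Iterable, List

# Variables the provisioning run needs, as listed above.
REQUIRED = ("ipmi_ip", "ipmi_user", "ipmi_pass", "ip", "gateway", "netmask", "dns")


def effective_vars(layers: Iterable[Dict[str, object]]) -> Dict[str, object]:
    """Merge variable layers from lowest to highest precedence.

    Approximates Ansible's default 'replace' behaviour: a later layer
    overrides a whole key, nested dicts are not deep-merged.
    """
    merged: Dict[str, object] = {}
    for layer in layers:
        merged.update(layer)
    return merged


def missing_vars(merged: Dict[str, object],
                 required: Iterable[str] = REQUIRED) -> List[str]:
    """List required variables that are absent or empty after merging."""
    return [name for name in required if merged.get(name) in (None, "")]


# Example: group_vars/all, then a site group, then host_vars/<node>.
layers = [
    {"dns": "192.0.2.53"},
    {"gateway": "192.0.2.1", "netmask": "255.255.255.0"},
    {"ip": "192.0.2.42", "ipmi_ip": "192.0.2.205", "ipmi_user": "root"},
]
print(missing_vars(effective_vars(layers)))
```

On the real inventory, ansible-inventory --host <node> prints the merged variables Ansible actually sees, which remains the authoritative check.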

Note

We can deploy both CentOS and RHEL: if you define rhel_version, the playbook deploys RHEL; otherwise it defaults to CentOS, using centos_version (normally 8-stream for now).
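
The selection rule fits in a few lines. This is an illustrative sketch of the logic described in the note, not the playbook's actual code; the "8-stream" fallback is an assumption standing in for whatever centos_version resolves to through inheritance.

```python
from typing import Dict, Tuple


def target_os(host_vars: Dict[str, str]) -> Tuple[str, str]:
    """Pick the distro to deploy: RHEL if rhel_version is defined, else CentOS."""
    if "rhel_version" in host_vars:
        return "rhel", str(host_vars["rhel_version"])
    # Assumed fallback; in the inventory centos_version normally resolves to 8-stream.
    return "centos", str(host_vars.get("centos_version", "8-stream"))
```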

Deploying the machine

Once the previous steps are done and the network switch port(s) are working, we can proceed with Ansible:

ansible-playbook-prod playbooks/adhoc-provision-node.yml 
[WARNING] Nodes to be fully wiped/reinstalled with CentOS => : <my_new_node[s]>

In summary, that playbook will (through Ansible tasks using delegate_to):

prepare the kickstart needed for the host to be deployed (Jinja2 template)
prepare the PXE/TFTP/GRUB settings to boot from the network (on the tftpd node)
use IPMI to reset the hardware node and force it to boot over PXE
wait for sshd to be available on the freshly deployed node
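
The last two steps can be approximated outside Ansible, for example when debugging a node that never comes back. The sketch below builds the ipmitool commands that set the next boot device to PXE and hard-reset the node, then polls TCP port 22 until an SSH banner appears. The options=efiboot flag (for UEFI nodes, not honoured by every BMC), the 30-minute default timeout and the polling interval are assumptions; the playbook itself does this through delegate_to tasks.

```python
import socket
import time
from typing import List


def ipmi_base(ipmi_ip: str, ipmi_user: str, ipmi_pass: str) -> List[str]:
    """Common ipmitool prefix for a remote BMC over the lanplus interface."""
    return ["ipmitool", "-I", "lanplus", "-H", ipmi_ip, "-U", ipmi_user, "-P", ipmi_pass]


def pxe_reset_cmds(ipmi_ip: str, ipmi_user: str, ipmi_pass: str,
                   uefi: bool = True) -> List[List[str]]:
    """ipmitool commands: force the next boot from PXE, then hard-reset the node."""
    bootdev = ["chassis", "bootdev", "pxe"] + (["options=efiboot"] if uefi else [])
    base = ipmi_base(ipmi_ip, ipmi_user, ipmi_pass)
    return [base + bootdev, base + ["chassis", "power", "reset"]]


def wait_for_sshd(host: str, port: int = 22, timeout: float = 1800.0,
                  interval: float = 10.0) -> bool:
    """Poll until the host answers with an SSH banner or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=5) as sock:
                sock.settimeout(5)
                if sock.recv(64).startswith(b"SSH-"):
                    return True
        except OSError:
            pass  # not up yet: refused, unreachable or timed out
        time.sleep(interval)
    return False
```

Each command list can be run with subprocess.run(cmd, check=True) before calling wait_for_sshd(ip) with the node's OS address.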

Warning

Attention: this will wipe the existing operating system, which is why the playbook uses Ansible vars_prompt to wait for input that you need to verify. You can also specify a group of machines to deploy, but a wrong input would wipe and reinstall existing nodes.
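
The idea behind the prompt is that the operator must see, and explicitly accept, the exact list of nodes before anything is wiped. The sketch below illustrates that guard in plain Python: it expands a comma-separated list of host and group names against a hypothetical inventory and only proceeds on an explicit "yes". It is not how adhoc-provision-node.yml is written (that uses vars_prompt); the inventory structure and the confirmation wording are made up for the example.

```python
from typing import Callable, Dict, List


def resolve_targets(pattern: str, groups: Dict[str, List[str]],
                    hosts: List[str]) -> List[str]:
    """Expand comma-separated host/group names into a de-duplicated host list."""
    targets: List[str] = []
    for name in (part.strip() for part in pattern.split(",")):
        if not name:
            continue
        if name in groups:
            members = groups[name]
        elif name in hosts:
            members = [name]
        else:
            # Refuse to guess: a typo must never silently match other nodes.
            raise ValueError(f"unknown host or group: {name}")
        for member in members:
            if member not in targets:
                targets.append(member)
    return targets


def confirm_wipe(targets: List[str], ask: Callable[[str], str] = input) -> bool:
    """Show every node that would be wiped and require an explicit 'yes'."""
    if not targets:
        return False
    listing = "\n".join(f"  - {t}" for t in targets)
    answer = ask(f"{len(targets)} node(s) will be WIPED and reinstalled:\n"
                 f"{listing}\nType 'yes' to continue: ")
    return answer.strip() == "yes"
```

Expanding groups before asking is the point of the design: typing a group name by mistake shows every member that would be reinstalled, rather than just the name that was typed.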
