
Bare-metal host deploy operation

This process is used to add a new bare-metal node to the CentOS Infra inventory. The node can be hosted either within the Community Cage (Red Hat) DC, or as a dedicated server hosted by a CentOS sponsor.

DataCenter we control (Red Hat DC)

Through an internal ticket with PNT/DevOps, we ensure that the machine/chassis is racked and documented. We also add it to the Internal Inventory and "reserve" the IP addresses needed for the IPMI/iDRAC/mgmt vlan interface and for the operating system.

We will probably also have to create another ticket on the internal portal to ensure that the ToR switches (which we don't control) have their ports configured correctly (enabled, set to the correct VLAN PVID, etc).

Hardware initialization

There is a very small IP range available in the mgmt vlan for newly connected nodes. On the internal dhcpd node (check the inventory to see which server currently has the boot-server ansible role), you can verify whether the new machine has been leased an IP from the oob/management vlan.

Once we have dial tone on the hardware side (oob/mgmt vlan), we need to ensure that we :

  • replace the default credentials with randomly generated ones
  • configure alerting for hardware issues
  • set up the RAID array correctly if we have a hardware RAID controller
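
As an illustration of the first item, here is a minimal sketch for generating a random replacement password. The function name `generate_password`, the length of 24 and the letters-plus-digits alphabet are assumptions made for this example, not a documented CentOS Infra policy; some BMC firmware limits password length or allowed characters, so adjust accordingly.

```python
import secrets
import string

# Assumption: letters and digits only, to stay compatible with BMC
# firmware that rejects some punctuation characters.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length=24):
    """Return a random password drawn from ALPHABET using a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(generate_password())
```

The `secrets` module is used rather than `random` because it draws from the operating system's cryptographically secure source.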

Preparing PXE/UEFI boot env

If we want ansible to deploy the node automatically, we just have to add it to the inventory and ensure that <inventory>/host_vars/<node> has at least :

  • following variables set :
    • ipmi_ip, ipmi_user, ipmi_pass : used to remotely pxe boot the node
    • ip, gateway, netmask and dns (usually, apart from ip, which is unique, the rest comes through inheritance)
  • based on group inheritance, ensure that variables documented in adhoc-provision-node.yml are also defined
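
Before running the playbook, it can help to sanity-check the merged variables for a node. The sketch below is only an illustration: it assumes the variables have already been loaded into a plain dict, the helper names `missing_host_vars` and `gateway_in_subnet` are hypothetical, and the addresses come from documentation ranges. The key list covers only the variables named above; adhoc-provision-node.yml may require more.

```python
import ipaddress

# Keys named in the list above; the playbook may require additional ones
# (see the variables documented in adhoc-provision-node.yml).
REQUIRED_KEYS = ("ipmi_ip", "ipmi_user", "ipmi_pass",
                 "ip", "gateway", "netmask", "dns")


def missing_host_vars(host_vars):
    """Return the required keys that are absent or empty, in REQUIRED_KEYS order."""
    return [key for key in REQUIRED_KEYS
            if host_vars.get(key) in (None, "", [])]


def gateway_in_subnet(host_vars):
    """Check that the gateway lies in the subnet implied by ip and netmask."""
    iface = ipaddress.ip_interface(f"{host_vars['ip']}/{host_vars['netmask']}")
    return ipaddress.ip_address(host_vars["gateway"]) in iface.network


# Hypothetical values for illustration only (documentation address ranges).
example = {
    "ipmi_ip": "192.0.2.10",
    "ipmi_user": "admin",
    "ipmi_pass": "stored-in-vault",
    "ip": "198.51.100.20",
    "gateway": "198.51.100.1",
    "netmask": "255.255.255.0",
    "dns": ["198.51.100.53"],
}

if __name__ == "__main__":
    print("missing:", missing_host_vars(example))
    print("gateway in subnet:", gateway_in_subnet(example))
```

Catching a missing or inconsistent value here is cheaper than finding out after the node has been reset and fails to come back on the network.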

Note

We can deploy both CentOS and RHEL. If you define rhel_version, the node will be deployed with RHEL; otherwise it defaults to CentOS and centos_version, which is normally 8-stream for now.

Deploying the machine

If the previous steps are done and the network switch port[s] are working, we can now proceed with ansible :

ansible-playbook-prod playbooks/adhoc-provision-node.yml
[WARNING] Nodes to be fully wiped/reinstalled with CentOS => : <my_new_node[s]>

In summary, that playbook will (through delegate_to ansible tasks) :

  • prepare the kickstart needed for the host to be deployed (jinja2 template)
  • prepare the pxe/tftp/grub settings to boot from network (on the tftpd node)
  • use ipmi to reset the hardware node and force booting over pxe
  • wait for sshd to be available on the freshly deployed node
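
The last step amounts to polling the ssh port until it answers. The sketch below shows that idea in standalone Python; it is not the playbook's own implementation, and the function name `wait_for_port` and the default timeout and interval values are assumptions for illustration.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=1800.0, interval=5.0):
    """Poll host:port until a TCP connection succeeds or timeout expires.

    Returns True once the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful TCP connect means something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Note that an open port only shows that something is listening; it does not prove that the installer has finished or that the new system is the one answering.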

Warning

Attention : this will wipe the existing operating system, which is why the playbook uses ansible vars_prompt to wait for input that you need to verify. You can also specify a group of machines to deploy, so a wrong input would destroy/reinstall existing nodes.

Sponsored machine

When we receive a new dedicated server, hosted in another DC that we don't control (no pxe/dhcp), the process usually goes like this :

  • through emails exchanged with the sponsor, we agree on a minimal setup
  • we receive initial credentials
  • we collect the needed information (like ipv4/ipv6 address[es], dns resolvers, etc)
  • we remotely reinstall the machine on itself (without remote console access), following our standards; this is faster than auditing the state in which we received the machine
  • we add the node to dns/ansible (see the Common section)