# Bare-metal host deploy operation

This process can be used to add a new bare-metal node to the CentOS Infra inventory.
The node can be hosted either in the `Community Cage` (Red Hat) DC or on a dedicated server hosted by a CentOS sponsor.

## DataCenter we control (Red Hat DC)

Through an internal ticket with PNT/DevOps we ensure that the machine/chassis is racked and documented.
We also add it to the [Internal Inventory](https://docs.google.com/spreadsheets/d/1K-aewLJ17z3pRC6K5qyBRJYtNXy1WcxRSVwPkGf4NXQ) and start "reserving" the IP addresses needed for the IPMI/iDrac/mgmt vlan interface and for the operating system.

We will probably also have to create another ticket on the [internal](https://help.redhat.com) portal to ensure that the ToR switches (which we don't control) have their ports configured correctly (enabled, set to the correct VLAN PVID, etc.).

### Hardware initialization

There is a *very* small IP range available in the mgmt vlan for newly connected nodes. On the internal dhcpd node (see in the inventory which server currently has the `boot-server` ansible role), you can always verify whether the new machine has been leased an IP from the oob/management vlan.

Once we have `dial tone` on the hardware side (oob/mgmt vlan), we need to ensure that we:

 * change the default credentials to randomly generated ones
 * configure alerting for hardware issues
 * set up the raid array correctly if we have a hardware raid controller

### Preparing PXE/UEFI boot env

If we want ansible to deploy it automatically, we just have to add the node to the inventory and ensure that `<inventory>/host_vars/<node>` has at least:

  * the following variables set:
    * `ipmi_ip`, `ipmi_user`, `ipmi_pass`: used to remotely pxe boot the node
    * `ip`, `gateway`, `netmask` and `dns` (apart from `ip`, which is unique, the rest usually comes through inheritance)
  * all variables documented in [adhoc-provision-node.yml](https://github.com/CentOS/ansible-infra-playbooks/blob/master/adhoc-provision-node.yml) defined as well (normally through group inheritance)
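
As an illustration, a minimal `host_vars` file for a new node could look like the sketch below. The variable names are the ones listed above. Every value is a placeholder (documentation-range addresses, a hypothetical vaulted variable named `vault_ipmi_pass`), and the list form used for `dns` is an assumption that should be checked against the playbook.

```yaml
# <inventory>/host_vars/<node> : illustrative values only

# Out-of-band (IPMI/iDrac) access, used to remotely pxe boot the node
ipmi_ip: 192.0.2.10                  # placeholder address in the mgmt vlan
ipmi_user: admin                     # placeholder
ipmi_pass: "{{ vault_ipmi_pass }}"   # hypothetical vaulted variable, avoid clear-text secrets

# Operating system network settings
ip: 198.51.100.20                    # unique per node

# Usually inherited from group variables, shown here only for completeness
gateway: 198.51.100.1
netmask: 255.255.255.0
dns:                                 # assumed format, verify against adhoc-provision-node.yml
  - 198.51.100.53
```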

### Deploying the machine

If the previous steps are done and the network switch port[s] are working, we can now proceed with ansible:

```
ansible-playbook-prod playbooks/adhoc-provision-node.yml
[WARNING] Nodes to be fully wiped/reinstalled with CentOS => : <my_new_node[s]>
```

In summary, that playbook will (through `delegate_to` ansible tasks):

  * prepare the kickstart needed for the host to be deployed (jinja2 template)
  * prepare the pxe/tftp/grub settings to boot from network (on the tftpd node)
  * use ipmi to reset the hardware node and force booting over pxe
  * wait for sshd to be available on the freshly deployed node
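
The last two steps can be sketched as ansible tasks. This is not the content of `adhoc-provision-node.yml`: calling `ipmitool` through the `command` module, the `lanplus` interface, delegating to `localhost`, and the delay/timeout values are all assumptions chosen for illustration. Only the `ipmi_*` and `ip` variable names come from this document.

```yaml
# Illustrative tasks only, assuming ipmitool is installed on the delegated host
- name: Force the next boot over pxe
  ansible.builtin.command:
    argv:
      - ipmitool
      - "-I"
      - lanplus
      - "-H"
      - "{{ ipmi_ip }}"
      - "-U"
      - "{{ ipmi_user }}"
      - "-P"
      - "{{ ipmi_pass }}"
      - chassis
      - bootdev
      - pxe
  delegate_to: localhost
  no_log: true          # keep the ipmi password out of the logs

- name: Reset the hardware node so that it boots from the network
  ansible.builtin.command:
    argv:
      - ipmitool
      - "-I"
      - lanplus
      - "-H"
      - "{{ ipmi_ip }}"
      - "-U"
      - "{{ ipmi_user }}"
      - "-P"
      - "{{ ipmi_pass }}"
      - chassis
      - power
      - reset
  delegate_to: localhost
  no_log: true

- name: Wait for sshd on the freshly deployed node
  ansible.builtin.wait_for:
    host: "{{ ip }}"
    port: 22
    delay: 120          # assumed: give the installer time to start
    timeout: 3600       # assumed upper bound for a full install
  delegate_to: localhost
```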

!!! warning
    Attention: this will *wipe* the existing operating system, which is why the playbook uses ansible `vars_prompt`: it waits for input that *you* need to verify. You can also specify a group of machines to deploy, so a wrong input would destroy/reinstall existing nodes.
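
To show how such a prompt gates a play, here is a minimal sketch of a play header using `vars_prompt`. The variable name `target_nodes` and the single `debug` task are hypothetical. The prompt text is copied from the output shown earlier, but the real playbook may be structured differently.

```yaml
# Illustrative play header; `target_nodes` is a hypothetical variable name
- hosts: "{{ target_nodes }}"
  gather_facts: false       # the node is not reachable before it is deployed
  vars_prompt:
    - name: target_nodes
      prompt: "[WARNING] Nodes to be fully wiped/reinstalled with CentOS => "
      private: false        # echo the input so that it can be checked
  tasks:
    - name: Show which node is about to be reinstalled
      ansible.builtin.debug:
        msg: "Would reinstall {{ inventory_hostname }}"
```

Typing a single host name limits the run to that node, while a group name matches every host in that group, which is why the input has to be double-checked.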

## Sponsored machine

When we receive a new dedicated server hosted in another DC that we don't control (no pxe/dhcp), the process usually goes like this:

  * through emails exchanged with the sponsor, we agree on a minimal setup
  * we receive the initial credentials
  * we collect the needed information (like ipv4/ipv6 address[es], dns resolvers, etc.)
  * we remotely reinstall the machine from itself (without remote console access) so that it follows our standards, which is faster than auditing the state in which we received it
  * we add the node to dns/ansible (see [Common section](/operations/deploy/common))