centos / centos-infra-docs

# Bare-metal host deploy operation

This process can be used to add a new bare-metal node to the CentOS Infra inventory. The node can be hosted either in the `Community Cage` (Red Hat) DC or as a dedicated server hosted by a CentOS sponsor.

## Datacenter we control (Red Hat DC)

Through an internal ticket with PNT/DevOps, we ensure that the machine/chassis is racked and documented.
We also add it to the [Internal Inventory](https://docs.google.com/spreadsheets/d/1K-aewLJ17z3pRC6K5qyBRJYtNXy1WcxRSVwPkGf4NXQ) and start "reserving" the IP addresses needed for the IPMI/iDRAC (mgmt VLAN) interface and for the operating system.

We probably also have to open another ticket on the [internal](https://help.redhat.com) portal to ensure that the ToR switches (which we don't control) have their ports configured correctly (enabled, set to the correct VLAN PVID, etc.).

### Hardware initialization

Only a very small IP range in the mgmt VLAN is available for newly connected nodes. On the internal dhcpd node (check the inventory to see which server currently holds the `boot-server` ansible role), you can verify whether the new machine has been leased an IP from the OOB/management VLAN.
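
A minimal sketch of that lease check, assuming the dhcpd node runs ISC dhcpd with its leases in the default CentOS location `/var/lib/dhcpd/dhcpd.leases`. The `list_leases` helper and its optional address-prefix filter are hypothetical, not part of the `boot-server` role. Because dhcpd appends updated leases to the file, the same IP can appear several times, so only the last entry per IP is kept.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "IP MAC STATE" for each lease in an ISC dhcpd
# leases file, keeping only the most recent entry per IP (dhcpd appends
# updated leases at the end of the file).
# Usage: list_leases [LEASES_FILE] [IP_PREFIX]
#   e.g. list_leases /var/lib/dhcpd/dhcpd.leases 10.0.0.
list_leases() {
  local leases_file="${1:-/var/lib/dhcpd/dhcpd.leases}"
  local prefix="${2:-}"
  awk -v prefix="$prefix" '
    # Start of a lease block, e.g. "lease 10.0.0.50 {"
    $1 == "lease" { ip = $2; mac = ""; state = "" }
    # "hardware ethernet aa:bb:cc:dd:ee:ff;"
    $1 == "hardware" && $2 == "ethernet" { mac = $3; sub(/;$/, "", mac) }
    # "binding state active;" (the "next binding state" line starts with another word)
    $1 == "binding" && $2 == "state" { state = $3; sub(/;$/, "", state) }
    # End of block: record it if it matches the optional prefix
    $1 == "}" {
      if (ip != "" && (prefix == "" || index(ip, prefix) == 1)) {
        if (!(ip in seen)) order[++n] = ip
        seen[ip] = mac " " state
      }
      ip = ""
    }
    # Print in first-seen order, with the latest MAC/state for each IP
    END { for (i = 1; i <= n; i++) print order[i], seen[order[i]] }
  ' "$leases_file"
}
```

Passing the mgmt VLAN prefix (for example `10.0.0.` with the trailing dot) narrows the output to the OOB range, where a new MAC address in `active` state is the newly connected BMC.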

Once we have a `dial tone` on the hardware side (OOB/mgmt VLAN), we need to:

* change the default credentials to randomly generated ones
* configure alerting for hardware issues
* correctly set up the RAID array if the node has a hardware RAID controller
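
For the first item, a minimal sketch using `ipmitool` over the `lanplus` interface. Assumptions not taken from this document: the BMC speaks IPMI 2.0, the account to rotate has user ID `2` (often the case on Dell iDRAC and Supermicro boards, but check with `ipmitool user list 1`), and a 16-character alphanumeric password is acceptable to the BMC. The current password is read from the `IPMI_PASSWORD` environment variable (`ipmitool -E`) so it does not appear in the process list; the new one is still passed as an argument. `gen_bmc_password`, `rotate_bmc_password` and the `IPMITOOL` override are hypothetical names. Alerting and RAID setup are vendor-specific and are not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers to replace a BMC's default password with a random one.

# Print a random 16-character alphanumeric password
# (16 characters stays within the IPMI 1.5 password length limit).
gen_bmc_password() {
  local pool
  # Read random bytes and keep only [A-Za-z0-9]; 512 bytes yields far more
  # than the 16 characters we need.
  pool=$(head -c 512 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9')
  printf '%s\n' "${pool:0:16}"
}

# Usage: IPMI_PASSWORD=<current password> rotate_bmc_password BMC_IP BMC_USER [USER_ID]
# Prints the new password on success so it can be recorded.
rotate_bmc_password() {
  local bmc_ip=$1 bmc_user=$2 user_id=${3:-2}   # user ID 2 is an assumption
  local new_pass
  new_pass=$(gen_bmc_password) || return 1
  # -E: take the *current* password from $IPMI_PASSWORD, not the command line
  "${IPMITOOL:-ipmitool}" -I lanplus -H "$bmc_ip" -U "$bmc_user" -E \
    user set password "$user_id" "$new_pass" || return 1
  printf '%s\n' "$new_pass"
}
```

The printed password is what later ends up as `ipmi_pass` for the node, since the provisioning playbook needs it to PXE boot the machine.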

### Preparing the PXE/UEFI boot environment

If we want ansible to deploy it automatically, we just have to add the node to the inventory and ensure that `<inventory>/host_vars/<node>` has at least:

* the following variables set:
    * `ipmi_ip`, `ipmi_user`, `ipmi_pass`: used to remotely PXE boot the node
    * `ip`, `gateway`, `netmask` and `dns` (apart from `ip`, which is unique, these usually come through group inheritance)
* the variables documented in [adhoc-provision-node.yml](https://github.com/CentOS/ansible-infra-playbooks/blob/master/adhoc-provision-node.yml) defined as well (usually through group inheritance)
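
A quick way to catch a missing per-host variable before running the playbook is to check the host file itself. This sketch assumes a flat YAML `host_vars/<node>` file with one top-level `key: value` per line; the node name and addresses in the comment are hypothetical (documentation ranges). Since `gateway`, `netmask` and `dns` usually come from group inheritance, only the per-host keys are checked, and the variables documented in `adhoc-provision-node.yml` are not.

```shell
#!/usr/bin/env bash
# Example <inventory>/host_vars/node01.example.org (hypothetical values):
#
#   ipmi_ip: 192.0.2.10
#   ipmi_user: admin
#   ipmi_pass: changeme        # placeholder value
#   ip: 198.51.100.20
#   # gateway, netmask and dns usually come from group_vars
#
# Hypothetical check: report which per-host keys are missing from FILE.
# Returns 0 if all are present, 1 otherwise.
check_host_vars() {
  local file=$1 var missing=0
  for var in ipmi_ip ipmi_user ipmi_pass ip; do
    # Match "key:" at the start of a line (top-level YAML key only)
    if ! grep -Eq "^${var}:" "$file"; then
      echo "missing in ${file}: ${var}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_host_vars inventory/host_vars/node01.example.org` prints nothing and succeeds when the four keys are present.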

### Deploying the machine

If the previous steps are done and the network switch port(s) are working, we can now proceed with ansible:

```
ansible-playbook-prod playbooks/adhoc-provision-node.yml
[WARNING] Nodes to be fully wiped/reinstalled with CentOS => : <my_new_node[s]>
```

In summary, that playbook will (through `delegate_to` ansible tasks):

* prepare the kickstart needed for the host to be deployed (jinja2 template)
* prepare the PXE/TFTP/GRUB settings to boot from the network (on the tftpd node)
* use IPMI to reset the hardware node and force it to boot over PXE
* wait for sshd to be available on the freshly deployed node
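
The last two steps can also be reproduced by hand, for instance when debugging a node that does not come back. This is a sketch of rough manual equivalents, not the playbook's actual tasks: it assumes `ipmitool` with the `lanplus` interface, the BMC password in `IPMI_PASSWORD`, and a node that is currently powered on (`chassis power reset` expects that; otherwise use `chassis power on`). `pxe_boot_once`, `wait_for_ssh` and the `IPMITOOL` override are hypothetical. Some UEFI BMCs may also need `options=efiboot` on the `bootdev` call.

```shell
#!/usr/bin/env bash
# Hypothetical manual equivalent of "force PXE boot, then reset" via IPMI.
# Usage: IPMI_PASSWORD=<bmc password> pxe_boot_once BMC_IP BMC_USER
pxe_boot_once() {
  local bmc_ip=$1 bmc_user=$2
  local ipmi=("${IPMITOOL:-ipmitool}" -I lanplus -H "$bmc_ip" -U "$bmc_user" -E)
  # Next boot only: boot from the network
  "${ipmi[@]}" chassis bootdev pxe || return 1
  # Hard reset; assumes the node is currently powered on
  "${ipmi[@]}" chassis power reset
}

# Poll HOST:PORT until a TCP connection succeeds or MAX_WAIT seconds pass.
# Uses bash's /dev/tcp pseudo-device plus coreutils "timeout".
# Usage: wait_for_ssh HOST [PORT] [MAX_WAIT] [INTERVAL]
wait_for_ssh() {
  local host=$1 port=${2:-22} max_wait=${3:-1800} interval=${4:-10}
  local deadline=$(( SECONDS + max_wait ))
  while (( SECONDS < deadline )); do
    if timeout 5 bash -c ": < /dev/tcp/${host}/${port}" 2>/dev/null; then
      return 0
    fi
    sleep "$interval"
  done
  return 1
}
```

A TCP check only shows that something listens on port 22; it does not prove the kickstart finished, which is why the default wait is generous.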

!!! warning
    Attention: this will wipe the existing operating system, which is why the playbook uses ansible `vars_prompt` to wait for input that you need to verify. You can also specify a group of machines to be deployed, so a wrong input could destroy/reinstall existing nodes.

## Sponsored machine

When we receive a new dedicated server hosted in another DC that we don't control (no PXE/DHCP), the process usually goes like this:

* through email exchanges with the sponsor, we agree on a minimal setup
* we receive the initial credentials
* we collect the needed information (like IPv4/IPv6 address[es], DNS resolvers, etc.)
* without remote console access, we remotely reinstall the machine on itself following our standards (faster than auditing the state in which we received it)
* we add the node to DNS/ansible (see [Common section](/operations/deploy/common))
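
For the information-gathering step, a sketch of what can be collected from the running system before it is reinstalled, assuming the sponsor's initial install is a Linux system with iproute2 and a conventional `/etc/resolv.conf`. `collect_net_info` and `list_resolvers` are hypothetical helpers; the values should still be confirmed with the sponsor.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the nameserver entries of a resolv.conf-style file.
# Usage: list_resolvers [FILE]   (defaults to /etc/resolv.conf)
list_resolvers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Hypothetical helper: dump the network facts needed for the reinstall
# (addresses, default routes, resolvers) in one go.
collect_net_info() {
  echo "== addresses"
  ip -brief address show
  echo "== IPv4 default route"
  ip -4 route show default
  echo "== IPv6 default route"
  ip -6 route show default
  echo "== resolvers"
  list_resolvers
}
```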