Warning
This section needs attention and is not meant to be a simple copy/paste operation: the "Some Thinking Required [TM]" mantra applies here.
Assume that you need to remotely reinstall a physical server (for example a sponsored node in a remote DC) where you don't control DHCP (so no PXE install) and don't have remote console access either (no IPMI, no KVM (keyboard/video/mouse) feature). You can still get there by combining kexec, the Anaconda installer and VNC.
Before reinstalling a node from one major version to a newer one, you first need to check the HCL (Hardware Compatibility List) and verify that the network card and the HBA are still supported by a kernel module. Between major CentOS releases, some kernel modules get removed from the RHEL kernel, and you could end up with neither a working network nor working disks.
So, from the machine that you need to reinstall (you probably have ssh/root access to it somehow), check which kernel module is used by the network card. Let's assume the interface is enp2s0f0:
ethtool -i enp2s0f0 | grep -E 'driver|^version'
driver: mlx5_core
version: 4.18.0-338.el8.x86_64
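On minimal installs ethtool is not always present. The driver bound to an interface can also be read from sysfs, where `/sys/class/net/<iface>/device/driver` is a symlink to the driver directory. Here is a minimal sketch of that lookup. The function name `nic_driver` and the `SYSFS_ROOT` override (defaulting to `/sys`, added only so the function can be tested) are hypothetical. Also note that the driver name usually, but not always, matches the module name.

```bash
#!/usr/bin/env bash
# Print the kernel driver bound to a network interface by reading sysfs.
# SYSFS_ROOT is a hypothetical override (default /sys), used only for testing.
nic_driver() {
    local root="${SYSFS_ROOT:-/sys}"
    local link="${root}/class/net/$1/device/driver"
    # Virtual interfaces (lo, bridges, bonds) have no backing device, hence no driver link.
    if [ ! -e "$link" ]; then
        echo "no driver found for $1 (virtual interface?)" >&2
        return 1
    fi
    # The 'driver' symlink points to .../drivers/<driver_name>.
    basename "$(readlink -f "$link")"
}
```

Usage on the node: `nic_driver enp2s0f0` should print the same driver name as ethtool.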
So our kernel module is mlx5_core. Let's now check the hard disk, assuming that it's /dev/sda.
We can use udevadm to walk up the device tree and show the driver bound to each parent device:
udevadm info -a -n /dev/sda | grep -E 'looking|DRIVER'
  looking at device '/devices/pci0000:00/0000:00:03.0/0000:03:00.0/host0/target0:2:0/0:2:0:0/block/sda':
    DRIVER==""
  looking at parent device '/devices/pci0000:00/0000:00:03.0/0000:03:00.0/host0/target0:2:0/0:2:0:0':
    DRIVERS=="sd"
  looking at parent device '/devices/pci0000:00/0000:00:03.0/0000:03:00.0/host0/target0:2:0':
    DRIVERS==""
  looking at parent device '/devices/pci0000:00/0000:00:03.0/0000:03:00.0/host0':
    DRIVERS==""
  looking at parent device '/devices/pci0000:00/0000:00:03.0/0000:03:00.0':
    DRIVERS=="megaraid_sas"
  looking at parent device '/devices/pci0000:00/0000:00:03.0':
    DRIVERS=="pcieport"
  looking at parent device '/devices/pci0000:00':
    DRIVERS==""
So /dev/sda sits behind SCSI host0, whose controller (PCI device 0000:03:00.0) uses the megaraid_sas kernel module.
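The udevadm output above is a walk from the block device up through its parent devices in sysfs. If you only want the driver names, for example to check every disk of a multi-disk server, the same walk can be sketched directly over sysfs. The function name `block_drivers` and the `SYSFS_ROOT` override (default `/sys`, there only for testing) are hypothetical. The sketch assumes the usual layout where `/sys/block/<name>` is a symlink into `/sys/devices`.

```bash
#!/usr/bin/env bash
# List the drivers bound to a block device and to each of its parent devices,
# roughly what `udevadm info -a -n /dev/sdX | grep DRIVERS` shows.
# SYSFS_ROOT is a hypothetical override (default /sys), used only for testing.
block_drivers() {
    local root dev
    root=$(readlink -f "${SYSFS_ROOT:-/sys}")
    if [ ! -e "${root}/block/$1" ]; then
        echo "no such block device: $1" >&2
        return 1
    fi
    # /sys/block/<name> is a symlink into the /sys/devices hierarchy.
    dev=$(readlink -f "${root}/block/$1")
    # Walk up the device path until we leave the sysfs root.
    while [ "$dev" != "$root" ] && [ "$dev" != "/" ]; do
        if [ -e "${dev}/driver" ]; then
            # Print "<device path relative to sysfs> <driver name>".
            printf '%s %s\n' "${dev#"${root}/"}" "$(basename "$(readlink -f "${dev}/driver")")"
        fi
        dev=$(dirname "$dev")
    done
}
```

For all SCSI disks at once: `for d in /sys/block/sd*; do block_drivers "${d##*/}"; done`.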
Now that we know we need the mlx5_core and megaraid_sas kernel modules, we have to verify, on a system already running the target major version, that they exist in the new kernel:
for i in mlx5_core megaraid_sas ; do modinfo $i | grep -E 'name'; done
filename: /lib/modules/4.18.0-338.el8.x86_64/kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko.xz
name: mlx5_core
filename: /lib/modules/4.18.0-338.el8.x86_64/kernel/drivers/scsi/megaraid/megaraid_sas.ko.xz
name: megaraid_sas
Fine, they exist, so our hardware should be compatible with a (re)install of the new major version.
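The loop above relies on you spotting a missing `filename:` line by eye. Here is a minimal sketch that reports each module as OK or MISSING and returns a non-zero status if any is absent. It uses `modinfo -k` to query a kernel version other than the running one, as long as that kernel is installed under `/lib/modules`. The function name `check_modules` is hypothetical.

```bash
#!/usr/bin/env bash
# Check that the given kernel modules exist for a given kernel version.
# Prints one line per module; returns the number of missing modules (0 = all found).
check_modules() {
    local kver=$1
    shift
    local missing=0 m path
    for m in "$@"; do
        # -k selects the kernel version, -F filename prints only that field.
        if path=$(modinfo -k "$kver" -F filename "$m" 2>/dev/null) && [ -n "$path" ]; then
            echo "OK      $m -> $path"
        else
            echo "MISSING $m"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}
```

For example, on the target system: `check_modules 4.18.0-338.el8.x86_64 mlx5_core megaraid_sas`.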
We can now define the closest mirror to reinstall from, a unique and temporary VNC password (only used during the install and never again, generated with pwgen -s 8 1), and a hostname for the install, then proceed:
mirror_url="http://mirror.centos.org/centos/8-stream/"
arch=$(uname -m)
hostname="reinstall1.dev.centos.org"
vnc_pass="Xs9x0mx9"
yum install -y curl kexec-tools
cd /boot
curl --location --fail ${mirror_url}/BaseOS/${arch}/os/images/pxeboot/vmlinuz > vmlinuz.install
curl --location --fail ${mirror_url}/BaseOS/${arch}/os/images/pxeboot/initrd.img > initrd.img.install
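If pwgen isn't installed on the node, a password like vnc_pass above can be generated with coreutils only. This is a minimal sketch. The function name `gen_vnc_pass` is hypothetical, and the 8-character default follows the pwgen command above. Classic VNC authentication only uses the first 8 characters, so a longer password would not add anything.

```bash
#!/usr/bin/env bash
# Generate a random alphanumeric password (default 8 characters) from /dev/urandom,
# as a fallback when pwgen is not available.
gen_vnc_pass() {
    local len=${1:-8}
    # LC_ALL=C makes tr work on raw bytes; -dc keeps only [A-Za-z0-9].
    LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c "$len"
    echo
}
```

Usage: `vnc_pass=$(gen_vnc_pass)`.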
Info
Worth knowing: if you need to access a user/pass protected mirror, you can use http://user:pass@mirror.fqdn, or even better https://user:pass@mirror.fqdn to ensure the credentials aren't sent in clear text.
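Before kicking off anything irreversible, it can be worth checking two things: that the mirror URL (credentials included, if any) works from the node, and that the downloaded images are intact. Here is a sketch that fetches the tree's `.treeinfo` and compares a local file against the sha256 listed there. It assumes the `.treeinfo` has a `[checksums]` section with lines like `images/pxeboot/vmlinuz = sha256:<hex>`, which is the layout expected for CentOS 8/Stream trees. Check your mirror's file before relying on it. The function name `verify_from_treeinfo` is hypothetical.

```bash
#!/usr/bin/env bash
# Verify a downloaded installer image against the sha256 listed in the
# [checksums] section of the tree's .treeinfo file (assumed format:
# "<relative/path> = sha256:<hex>"). Returns 0 on match, 1 on mismatch,
# 2 if no checksum is listed for that path.
verify_from_treeinfo() {
    local treeinfo=$1 rel_path=$2 local_file=$3 expected actual
    expected=$(awk -v p="$rel_path" '
        /^\[/ { in_sums = ($0 == "[checksums]"); next }
        in_sums && $1 == p && $2 == "=" { sub(/^sha256:/, "", $3); print $3; exit }
    ' "$treeinfo")
    if [ -z "$expected" ]; then
        echo "no checksum for $rel_path in $treeinfo" >&2
        return 2
    fi
    actual=$(sha256sum "$local_file" | awk '{print $1}')
    [ "$expected" = "$actual" ]
}

# Example, from /boot, reusing mirror_url/arch from above:
#   curl --location --fail ${mirror_url}/BaseOS/${arch}/os/.treeinfo > treeinfo.install
#   verify_from_treeinfo treeinfo.install images/pxeboot/vmlinuz vmlinuz.install && echo "vmlinuz OK"
#   verify_from_treeinfo treeinfo.install images/pxeboot/initrd.img initrd.img.install && echo "initrd OK"
```

A failing curl here already tells you that the mirror or its credentials won't work for the installer either.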
Let's gather some network information:
dns=$(grep '^nameserver' /etc/resolv.conf | head -n 1 | awk '{print $2}')
gateway=$(ip route | grep default | head -n 1 | awk '{print $3}')
eth_dev=$(ip route | grep default | head -n 1 | awk '{print $5}')
ip_addr=$(ip addr show dev $eth_dev | grep inet | grep $eth_dev | head -n 1 | awk '{print $2}' | cut -f 1 -d '/')
netmask=$(ipcalc --netmask $(ip addr show dev $eth_dev | grep inet | grep $eth_dev | head -n 1 | awk '{print $2}') | cut -f 2 -d '=')
ip6_addr=$(ip -6 addr show dev $eth_dev | grep global | awk '{print $2}')
ip6_gw=$(ip -6 route | grep default | awk '{print $3}')

echo "list of devices : "
echo "==================="
ip addr | grep qdisc | awk '{print $2}' | tr -d ':'

if [[ $eth_dev = *bond* ]] ; then
  echo "Bonding interface found ! "
  # use the first slave of the bond found above (not a hardcoded bond0)
  eth_dev=$(grep 'Slave Interface' /proc/net/bonding/${eth_dev} | head -n 1 | awk '{print $3}')
  echo "Real device is $eth_dev"
  eth_opts=""
elif [[ $eth_dev = *eth* ]] ; then
  echo "Device is still named eth[*] so using net.ifnames=0"
  eth_opts="net.ifnames=0"
  echo "Eth device = $eth_dev"
else
  echo "Eth device = $eth_dev"
  eth_opts=""
fi

echo ip=$ip_addr netmask=$netmask gateway=$gateway dns=$dns
echo IPv6 : $ip6_addr / gw : $ip6_gw
echo "nmcli con mod $eth_dev ipv6.method manual ipv6.address $ip6_addr ipv6.gateway $ip6_gw ; nmcli con up $eth_dev"
echo "eth device= $eth_dev"
echo "eth options = $eth_opts"
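The netmask line above depends on the RHEL-style `ipcalc --netmask` output (`NETMASK=...`). If ipcalc is missing, or is a different implementation with another output format (as on some non-RHEL systems), the netmask can be computed from the prefix length in pure bash. Here is a minimal sketch. The function name `prefix_to_netmask` is hypothetical.

```bash
#!/usr/bin/env bash
# Convert a CIDR prefix length (0-32) to a dotted-quad netmask.
prefix_to_netmask() {
    local prefix=$1 mask i
    local octets=()
    if ! [[ $prefix =~ ^[0-9]+$ ]] || (( 10#$prefix > 32 )); then
        echo "invalid prefix: $prefix" >&2
        return 1
    fi
    prefix=$(( 10#$prefix ))
    # Build a 32-bit mask with the top $prefix bits set.
    mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    # Split it into four octets, most significant first.
    for i in 24 16 8 0; do
        octets+=( $(( (mask >> i) & 0xFF )) )
    done
    local IFS=.
    echo "${octets[*]}"
}
```

For example: `netmask=$(prefix_to_netmask "$(ip -4 -o addr show dev $eth_dev | awk '{split($4, a, "/"); print a[2]; exit}')")`.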
Danger
Now closely verify this information and, if it looks correct (remember: thinking required), pick one of the following ways to kick off the reinstall. In case of issues, you can always ask the remote DC to simply reset the node: as long as the installer hasn't started writing to the disk, it should come back on the OS installed on disk.
# Normal
kexec -l vmlinuz.install --append="$eth_opts biosdevname=0 rd.neednet=1 ksdevice=$eth_dev inst.repo=${mirror_url}/BaseOS/${arch}/os/ inst.lang=en_GB inst.keymap=be-latin1 inst.vnc inst.vncpassword=$vnc_pass ip=$ip_addr::$gateway:$netmask:$hostname:$eth_dev:none nameserver=$dns" --initrd=initrd.img.install && kexec -e

# For Dell and biosdevname-style names like eno1 etc
kexec -l vmlinuz.install --append="$eth_opts rd.neednet=1 ksdevice=$eth_dev inst.repo=${mirror_url}/BaseOS/${arch}/os/ inst.lang=en_GB inst.keymap=be-latin1 inst.vnc inst.vncpassword=$vnc_pass ip=$ip_addr::$gateway:$netmask:$hostname:$eth_dev:none nameserver=$dns" --initrd=initrd.img.install && kexec -e

# With console on ttyS0 (serial redirection, normally not needed)
kexec -l vmlinuz.install --append="$eth_opts biosdevname=0 rd.neednet=1 ksdevice=$eth_dev inst.repo=${mirror_url}/BaseOS/${arch}/os/ inst.lang=en_GB inst.keymap=be-latin1 inst.vnc inst.vncpassword=$vnc_pass ip=$ip_addr::$gateway:$netmask:$hostname:$eth_dev:none nameserver=$dns console=ttyS0,115200n8" --initrd=initrd.img.install && kexec -e
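The three variants differ only by two options, so one way to keep "thinking required" manageable is to build the append string once, print it, and only then run kexec. This is a sketch under stated assumptions. The function `build_append` and its yes/no arguments are hypothetical. It reads the variables gathered above (eth_opts, eth_dev, mirror_url, arch, vnc_pass, ip_addr, gateway, netmask, hostname, dns), and with no arguments it reproduces the "Normal" line.

```bash
#!/usr/bin/env bash
# Build the installer kernel command line from the variables gathered earlier.
#   $1 = yes|no : add biosdevname=0 (use "no" for the Dell/biosdevname variant), default yes
#   $2 = yes|no : add a serial console on ttyS0, default no
build_append() {
    local with_biosdevname=${1:-yes} with_serial=${2:-no}
    local args=()
    if [ -n "${eth_opts:-}" ]; then args+=("$eth_opts"); fi
    if [ "$with_biosdevname" = yes ]; then args+=("biosdevname=0"); fi
    # Static network config, dracut syntax: ip=<client-IP>::<gateway>:<netmask>:<hostname>:<iface>:none
    args+=(
        "rd.neednet=1"
        "ksdevice=${eth_dev}"
        "inst.repo=${mirror_url}/BaseOS/${arch}/os/"
        "inst.lang=en_GB"
        "inst.keymap=be-latin1"
        "inst.vnc"
        "inst.vncpassword=${vnc_pass}"
        "ip=${ip_addr}::${gateway}:${netmask}:${hostname}:${eth_dev}:none"
        "nameserver=${dns}"
    )
    if [ "$with_serial" = yes ]; then args+=("console=ttyS0,115200n8"); fi
    echo "${args[*]}"
}

# Example:
#   append=$(build_append yes no)
#   echo "$append"    # review it before going further
#   kexec -l vmlinuz.install --append="$append" --initrd=initrd.img.install && kexec -e
```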
As you launched this over ssh, you'll lose your connection, since the installer kernel replaces the running one.
From your workstation (or elsewhere), you can ping the machine and wait for the server to fetch the stage2 image from the mirror and launch Anaconda with VNC. Once the machine responds to ping, you can simply wait for VNC with a snippet like:
host="ip.address.of.reinstalled.node"
while true ; do
  # try to open a TCP connection to the Anaconda VNC port (display :1 = port 5901)
  if (exec 3<>/dev/tcp/${host}/5901) 2>/dev/null ; then
    notify-send "${host} VNC is ready to be connected"
    echo "${host} ready for vnc connection" | festival --tts
    break
  fi
  # wait between attempts instead of hammering the port
  sleep 2
done
# launching vnc
echo "launching vnc on ${host}"
vncviewer ${host}:1 &
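The loop above waits forever. If the installer never comes up, it helps to know when it is time to ask the remote DC for a reset. Here is a sketch of the same check with an overall deadline, using bash's `/dev/tcp` pseudo-device with coreutils `timeout` bounding each attempt. The function name `wait_for_port` and the 30-minute default are hypothetical choices. The same helper can also wait for sshd (port 22) once the freshly installed system reboots.

```bash
#!/usr/bin/env bash
# Wait until a TCP port accepts connections on a host, giving up after a deadline.
# Returns 0 when the port is open, 1 when the deadline (in seconds) is reached.
wait_for_port() {
    local host=$1 port=$2 max_wait=${3:-1800}
    local deadline=$(( SECONDS + max_wait ))
    while (( SECONDS < deadline )); do
        # Each attempt runs in a child bash, bounded to 5s, so the socket is closed right away.
        if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
            return 0
        fi
        sleep 2
    done
    return 1
}

# Example:
#   if wait_for_port "$host" 5901 1800; then vncviewer "${host}:1" &
#   else echo "no VNC after 30 minutes, consider asking the DC for a reset"; fi
```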
Some settings that we use by default: