Oracle Linux 7 kickstart and bonding

IPMP (IP MultiPathing) is a very important technique. It increases the maximum bandwidth available for sent and received network traffic. It also provides network-layer redundancy if one of the physical paths breaks, which makes it a requirement when building HA systems, and it lets the host stay connected to the network during network maintenance operations.

Using aggregated network interfaces during OS installation mainly reduces installation time. When software is installed on provisioned hosts, the main bottleneck is usually network bandwidth (local storage usually provides higher data transfer bandwidth than a single network connection).

I’ve been using Linux bonding for many years. While getting a correct bonding setup on a running system is not a problem, it seems everyone struggles with setting up bonding at the install stage using kickstart.

All publicly available kickstart examples that use bonding seem to set it up in %post by changing the /etc/sysconfig/network-scripts/ifcfg-* files, all because the anaconda installer has some issues with setting up or using bonding.

The other issue is that when the network configuration used at the boot stage differs from the one in the kickstart “network” line, the whole install sometimes takes several times longer than it should. The “network” line describes a bonded setup, while the early stages of the install process use only a single interface, so anaconda tries to download the mini rootfs over an incorrect network setup.

When downloading files from the install server, the installer first tries the ksdevice network interface, then tries each physical interface in turn, and only at the end uses the interface set up by the kickstart “network” line. A provisioned system may have quite a large number of physical interfaces (up to 10 in my case), so failing over to the next interface each time can take quite a long time.

It seems the only solution to the above issue is to pass, in the kernel boot parameters set in the PXE configuration, a network setup that matches the one used during the whole install.

Another issue is with the anaconda used in OL7.2. The installation finishes successfully, but after the final reboot there is no network communication because anaconda creates incomplete slave interface files.

So what might a working example of a bonded network interface setup look like, with a minimal set of changes to fix the existing issues?

PXE menu file:

DEFAULT menu.c32
PROMPT   0
TIMEOUT   100
#ONTIMEOUT localdisk
ONTIMEOUT install_OL7

MENU TITLE PXE Network Boot - <my.hostname>

LABEL localdisk
 MENU LABEL ^Local Hard Drive
 MENU  DEFAULT
 KERNEL chain.c32
 APPEND hd0

LABEL rescue_ol7
 MENU LABEL ^Rescue mode OL7
 KERNEL OL7/vmlinuz
 APPEND initrd=OL7/initrd.img rescue ks=http://<install.server>//OL7/ks/rescue.ks

LABEL install_OL7
 MENU LABEL Install OL7-64-bit
 KERNEL OL7/vmlinuz
 APPEND initrd=OL7/initrd.img bond=bond0:enp3s0,enp7s0:mode=6,miimon=100 ks=http://<install.server>//OL7/ks/<my.hostname>.ks

LABEL reboot
 MENU LABEL ^Reboot
 KERNEL reboot.c32

As I wrote, the initial anaconda network configuration passed on the kernel command line must match the “network” line in the kickstart profile, which must look like:

network  --device=bond0 --bondopts mode=6,miimon=100 --bondslaves enp3s0,enp7s0 --bootproto=static --noipv6 --hostname=<my.hostname> --ip=<IP> --gateway=<def.GW> --netmask=<netmask> --nameserver=<DNS.IP1>,<DNS.IP2>
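Since the kernel command line and the kickstart line must carry the same bond name, slaves and options, one way to keep them from drifting apart is to generate both from a single set of values. This is only a sketch: the helper names `bond_kernel_arg` and `bond_ks_opts` are my own invention, and the values are the ones used in this example.

```shell
# Single source of truth for the bond definition (values from this post).
BOND=bond0
SLAVES=enp3s0,enp7s0
OPTS=mode=6,miimon=100

# Hypothetical helper: the bond= argument for the PXE APPEND line.
bond_kernel_arg() {
    printf 'bond=%s:%s:%s\n' "$BOND" "$SLAVES" "$OPTS"
}

# Hypothetical helper: the bonding part of the kickstart "network" line.
bond_ks_opts() {
    printf '%s\n' "--device=$BOND --bondopts $OPTS --bondslaves $SLAVES"
}

bond_kernel_arg
bond_ks_opts
```

The first line of output goes into the APPEND line of the PXE menu entry, the second into the kickstart “network” line, in front of the IP settings.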

So the above setup should theoretically work, but after the installation finishes and the system reboots, it boots correctly yet communication over the network does not work.

Why? The answer is very simple: a bug in anaconda leaves the /etc/sysconfig/network-scripts/ifcfg-bond0_slave_{1,2} files without the two lines that mark the physical interfaces as bond slaves and name the master bond interface each slave belongs to. Effectively, anaconda does not add these two lines to each ifcfg-*slave* file:

SLAVE=yes
MASTER=<master_bond_interface>
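To see whether an installed system is affected, the slave files can be checked for those two keys. Below is a minimal sketch, assuming the slave files follow the `ifcfg-<bond>_slave_N` naming. `check_bond_slaves` is a hypothetical helper name, and the optional second argument exists only so the function can be pointed at a different directory.

```shell
# Hypothetical helper: report slave files lacking a SLAVE= or MASTER= line.
# Returns non-zero if at least one incomplete file is found.
check_bond_slaves() {
    bond="$1"
    dir="${2:-/etc/sysconfig/network-scripts}"
    rc=0
    for f in "$dir"/ifcfg-"$bond"_slave_*; do
        [ -f "$f" ] || continue        # the glob matched nothing
        if ! grep -q '^SLAVE=' "$f" || ! grep -q '^MASTER=' "$f"; then
            echo "incomplete: $f"
            rc=1
        fi
    done
    return "$rc"
}
```

Running `check_bond_slaves bond0` on the installed host lists every slave file that still needs the two lines.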

I found that this issue is quite easy to fix in the kickstart %post script. For example, for two slaves of the bond0 interface, the corrections look like:

# ------ Setup: correcting bond0 slave interfaces setup
(echo SLAVE=yes; echo MASTER=bond0) >> /etc/sysconfig/network-scripts/ifcfg-bond0_slave_1
(echo SLAVE=yes; echo MASTER=bond0) >> /etc/sysconfig/network-scripts/ifcfg-bond0_slave_2
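With more than two slaves, or if a later anaconda release starts writing these lines itself, a loop that appends only what is missing may be safer than fixed `echo` lines. This is a sketch rather than something taken from a production profile. `fix_bond_slaves` is a hypothetical helper name, and it assumes the same `ifcfg-<bond>_slave_N` file naming as above.

```shell
# Hypothetical helper: add SLAVE/MASTER to every slave file of a bond,
# skipping any key that is already present, so it is safe to run twice.
fix_bond_slaves() {
    bond="$1"
    dir="${2:-/etc/sysconfig/network-scripts}"
    for f in "$dir"/ifcfg-"$bond"_slave_*; do
        [ -f "$f" ] || continue        # the glob matched nothing
        grep -q '^SLAVE=' "$f"  || echo "SLAVE=yes" >> "$f"
        grep -q '^MASTER=' "$f" || echo "MASTER=$bond" >> "$f"
    done
}
```

In %post the function would be defined once and then called as `fix_bond_slaves bond0`.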

Nevertheless, the time spent diagnosing this issue was worth it, because most of my systems are connected over T1000 interfaces and aggregating two network interfaces lets me reduce install time by almost 50%. When running multiple experiments with improved install profiles, this will save me a lot of time 🙂

#bonding, #install, #kickstart, #linux, #pxe