# Copyright (C) 2009, 2010, 2013, 2014 Nicira Networks, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright
# notice and this notice are preserved. This file is offered as-is,
# without warranty of any kind.
#
# If tests have to be skipped while building, specify the '--without check'
# option. For example:
# rpmbuild -bb --without check rhel/openvswitch-fedora.spec

# This defines the base package name's version.

%define pkgname openvswitch2.16

%if 0%{?commit:1}
%global shortcommit %(c=%{commit}; echo ${c:0:7})
%endif

# Enable PIE, bz#955181
%global _hardened_build 1

# RHEL-7 doesn't define _rundir macro yet
# Fedora 15 onwards uses /run as _rundir
%if 0%{!?_rundir:1}
%define _rundir /run
%endif

# FIXME Test "STP - flush the fdb and mdb when topology changed" fails on s390x
# FIXME 2 tests fail on ppc64le. They will hopefully be fixed before official 2.11
%ifarch %{ix86} x86_64 aarch64
%bcond_without check
%else
%bcond_with check
%endif
# option to run kernel datapath tests, requires building as root!
%bcond_with check_datapath_kernel
# option to build with libcap-ng, needed for running OVS as regular user
%bcond_without libcapng
# option to build with ipsec support
%bcond_without ipsec

# Build python2 (that provides python) and python3 subpackages on Fedora
# Build only python3 (that provides python) subpackage on RHEL8
# Build only python subpackage on RHEL7
%if 0%{?rhel} > 7 || 0%{?fedora}
# On RHEL8 Sphinx is included in buildroot
%global external_sphinx 1
%else
# Don't use external sphinx (RHV doesn't have optional repositories enabled)
%global external_sphinx 0
%endif

Name: %{pkgname}
Summary: Open vSwitch
Group: System Environment/Daemons daemon/database/utilities
URL: http://www.openvswitch.org/
Version: 2.16.0
Release: 48%{?dist}

# Nearly all of openvswitch is ASL 2.0. The bugtool is LGPLv2+, and the
# lib/sflow*.[ch] files are SISSL
# datapath/ is GPLv2 (although not built into any of the binary packages)
License: ASL 2.0 and LGPLv2+ and SISSL

%define dpdkver 20.11.1
%define dpdkdir dpdk
%define dpdksver %(echo %{dpdkver} | cut -d. -f-2)
# NOTE: DPDK does not currently build for s390x
# DPDK on aarch64 is not stable enough to be enabled in FDP
%if 0%{?rhel} > 7 || 0%{?fedora}
%define dpdkarches x86_64 ppc64le
%else
%define dpdkarches
%endif

%if 0%{?commit:1}
Source: https://github.com/openvswitch/ovs/archive/%{commit}.tar.gz#/openvswitch-%{commit}.tar.gz
%else
Source: https://github.com/openvswitch/ovs/archive/v%{version}.tar.gz#/openvswitch-%{version}.tar.gz
%endif
Source10: https://fast.dpdk.org/rel/dpdk-%{dpdkver}.tar.xz

%define docutilsver 0.12
%define pygmentsver 1.4
%define sphinxver 1.2.3
Source100: https://pypi.io/packages/source/d/docutils/docutils-%{docutilsver}.tar.gz
Source101: https://pypi.io/packages/source/P/Pygments/Pygments-%{pygmentsver}.tar.gz
Source102: https://pypi.io/packages/source/S/Sphinx/Sphinx-%{sphinxver}.tar.gz

Patch: openvswitch-%{version}.patch

# The DPDK is designed to optimize throughput of network traffic using, among
# other techniques, carefully crafted assembly instructions. As such it
# needs extensive work to port it to other architectures.
ExclusiveArch: x86_64 aarch64 ppc64le s390x

# Do not enable this otherwise YUM will break on any upgrade.
# Provides: openvswitch
Conflicts: openvswitch < 2.16
Conflicts: openvswitch-dpdk < 2.16
Conflicts: openvswitch2.10
Conflicts: openvswitch2.11
Conflicts: openvswitch2.12
Conflicts: openvswitch2.13
Conflicts: openvswitch2.14
Conflicts: openvswitch2.15

# FIXME Sphinx is used to generate some manpages, unfortunately, on RHEL, it's
# in the -optional repository and so we can't require it directly since RHV
# doesn't have the -optional repository enabled and so TPS fails
%if %{external_sphinx}
BuildRequires: python3-sphinx
%else
# Sphinx dependencies
BuildRequires: python-devel
BuildRequires: python-setuptools
#BuildRequires: python2-docutils
BuildRequires: python-jinja2
BuildRequires: python-nose
#BuildRequires: python2-pygments
# docutils dependencies
BuildRequires: python-imaging
# pygments dependencies
BuildRequires: python-nose
%endif

BuildRequires: gcc gcc-c++ make
BuildRequires: autoconf automake libtool
BuildRequires: systemd-units openssl openssl-devel
BuildRequires: python3-devel python3-setuptools
BuildRequires: desktop-file-utils
BuildRequires: groff-base graphviz
BuildRequires: unbound-devel
# make check dependencies
BuildRequires: procps-ng
%if 0%{?rhel} > 7 || 0%{?fedora}
BuildRequires: python3-pyOpenSSL
%endif
%if %{with check_datapath_kernel}
BuildRequires: nmap-ncat
# would be useful but not available in RHEL or EPEL
#BuildRequires: pyftpdlib
%endif

%if %{with libcapng}
BuildRequires: libcap-ng libcap-ng-devel
%endif

%ifarch %{dpdkarches}
BuildRequires: meson
# DPDK driver dependencies
BuildRequires: zlib-devel numactl-devel
%ifarch x86_64
BuildRequires: rdma-core-devel >= 15 libmnl-devel
%endif

# Required by packaging policy for the bundled DPDK
Provides: bundled(dpdk) = %{dpdkver}
%endif

Requires: openssl iproute module-init-tools
#Upstream kernel commit 4f647e0a3c37b8d5086214128614a136064110c3
#Requires: kernel >= 3.15.0-0
Requires: openvswitch-selinux-extra-policy

Requires(pre): shadow-utils
Requires(post): /bin/sed
Requires(post): /usr/sbin/usermod
Requires(post): /usr/sbin/groupadd
Requires(post): systemd-units
Requires(preun): systemd-units
Requires(postun): systemd-units
Obsoletes: openvswitch-controller <= 0:2.1.0-1

%description
Open vSwitch provides standard network bridging functions and
support for the OpenFlow protocol for remote per-flow control of
traffic.

%package -n python3-%{pkgname}
Summary: Open vSwitch python3 bindings
License: ASL 2.0
Requires: %{pkgname} = %{?epoch:%{epoch}:}%{version}-%{release}
Provides: python-%{pkgname} = %{?epoch:%{epoch}:}%{version}-%{release}

%description -n python3-%{pkgname}
Python bindings for the Open vSwitch database

%package test
Summary: Open vSwitch testing utilities
License: ASL 2.0
BuildArch: noarch
Requires: python3-%{pkgname} = %{?epoch:%{epoch}:}%{version}-%{release}
Requires: tcpdump

%description test
Utilities that are useful to diagnose performance and connectivity
issues in Open vSwitch setup.

%package devel
Summary: Open vSwitch OpenFlow development package (library, headers)
License: ASL 2.0
Requires: %{pkgname} = %{?epoch:%{epoch}:}%{version}-%{release}

%description devel
This provides the shared library, libopenvswitch.so, and the openvswitch
header files needed to build an external application.

%if 0%{?rhel} > 7 || 0%{?fedora} > 28
%package -n network-scripts-%{name}
Summary: Open vSwitch legacy network service support
License: ASL 2.0
Requires: network-scripts
Supplements: (%{name} and network-scripts)

%description -n network-scripts-%{name}
This provides the ifup and ifdown scripts for use with the legacy network
service.
%endif

%if %{with ipsec}
%package ipsec
Summary: Open vSwitch IPsec tunneling support
License: ASL 2.0
Requires: python3-%{pkgname} = %{?epoch:%{epoch}:}%{version}-%{release}
Requires: libreswan

%description ipsec
This package provides IPsec tunneling support for OVS tunnels.
%endif

%prep
%if 0%{?commit:1}
%setup -q -n ovs-%{commit} -a 10
%else
%setup -q -n ovs-%{version} -a 10
%endif
%if ! %{external_sphinx}
%if 0%{?commit:1}
%setup -n ovs-%{commit} -q -D -T -a 100 -a 101 -a 102
%else
%setup -n ovs-%{version} -q -D -T -a 100 -a 101 -a 102
%endif
%endif

mv dpdk-*/ %{dpdkdir}/

# FIXME should we propose a way to do that upstream?
sed -ri "/^subdir\('(usertools|app)'\)/d" %{dpdkdir}/meson.build

%patch -p1

%build
# Build Sphinx on RHEL
%if ! %{external_sphinx}
export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}%{_builddir}/pytmp/lib/python"
for x in docutils-%{docutilsver} Pygments-%{pygmentsver} Sphinx-%{sphinxver}; do
    pushd "$x"
    python2 setup.py install --home %{_builddir}/pytmp
    popd
done

export PATH="$PATH:%{_builddir}/pytmp/bin"
%endif

./boot.sh

%ifarch %{dpdkarches} # build dpdk
# Let's build DPDK first
cd %{dpdkdir}

ENABLED_DRIVERS=(
    bus/pci
    bus/vdev
    mempool/ring
    net/failsafe
    net/i40e
    net/ring
    net/vhost
    net/virtio
    net/tap
)

%ifarch x86_64
ENABLED_DRIVERS+=(
    bus/vmbus
    common/iavf
    common/mlx5
    net/bnxt
    net/enic
    net/iavf
    net/ice
    net/mlx5
    net/netvsc
    net/nfp
    net/qede
    net/vdev_netvsc
)
%endif

%ifarch aarch64 x86_64
ENABLED_DRIVERS+=(
    net/e1000
    net/ixgbe
)
%endif

# Since upstream doesn't have a way to enable only the drivers we want,
# build the list of drivers to disable from everything not in ENABLED_DRIVERS
for driver in drivers/*/*/; do
    driver=${driver#drivers/}
    driver=${driver%/}
    [[ " ${ENABLED_DRIVERS[@]} " == *" $driver "* ]] || \
        disable_drivers="${disable_drivers:+$disable_drivers,}"$driver
done

#CFLAGS="$(echo %{optflags} | sed -e 's:-Wall::g' -e 's:-march=[[:alnum:]]* ::g') -Wformat -fPIC %{_hardening_ldflags}" \
%set_build_flags
%__meson --prefix=%{_builddir}/dpdk-build \
         --buildtype=plain \
         -Ddisable_drivers="$disable_drivers" \
         -Dmachine=default \
         -Dmax_ethports=128 \
         -Dmax_numa_nodes=8 \
         -Dtests=false \
         %{_vpath_builddir}
%meson_build
%__meson install -C %{_vpath_builddir} --no-rebuild

# FIXME currently with LTO enabled OVS tries to link with both static and shared libraries
rm -v %{_builddir}/dpdk-build/%{_lib}/*.so*

# Generate a list of supported drivers, it's hard to tell otherwise.
cat << EOF > README.DPDK-PMDS
DPDK drivers included in this package:
EOF

for f in %{_builddir}/dpdk-build/%{_lib}/librte_net_*.a; do
    basename ${f} | cut -c12- | cut -d. -f1 | tr '[:lower:]' '[:upper:]'
done >> README.DPDK-PMDS

cat << EOF >> README.DPDK-PMDS
For further information about the drivers, see
http://dpdk.org/doc/guides-%{dpdksver}/nics/index.html
EOF

cd -
%endif # build dpdk

# And now for OVS...
mkdir build-shared build-static
pushd build-shared
ln -s ../configure
%configure \
%if %{with libcapng}
        --enable-libcapng \
%else
        --disable-libcapng \
%endif
        --disable-static \
        --enable-shared \
        --enable-ssl \
        --with-pkidir=%{_sharedstatedir}/openvswitch/pki
make %{?_smp_mflags}
popd
pushd build-static
ln -s ../configure
%ifarch %{dpdkarches}
PKG_CONFIG_PATH=%{_builddir}/dpdk-build/%{_lib}/pkgconfig \
%endif
%configure \
%if %{with libcapng}
        --enable-libcapng \
%else
        --disable-libcapng \
%endif
        --enable-ssl \
%ifarch %{dpdkarches}
        --with-dpdk=static \
%endif
        --with-pkidir=%{_sharedstatedir}/openvswitch/pki
make %{?_smp_mflags}
popd

/usr/bin/python3 build-aux/dpdkstrip.py \
        --dpdk \
        < rhel/usr_lib_systemd_system_ovs-vswitchd.service.in \
        > rhel/usr_lib_systemd_system_ovs-vswitchd.service

%install
rm -rf $RPM_BUILD_ROOT
make -C build-shared install-libLTLIBRARIES DESTDIR=$RPM_BUILD_ROOT
make -C build-static install DESTDIR=$RPM_BUILD_ROOT

install -d -m 0755 $RPM_BUILD_ROOT%{_rundir}/openvswitch
install -d -m 0750 $RPM_BUILD_ROOT%{_localstatedir}/log/openvswitch
install -d -m 0755 $RPM_BUILD_ROOT%{_sysconfdir}/openvswitch

install -p -D -m 0644 rhel/usr_lib_udev_rules.d_91-vfio.rules \
        $RPM_BUILD_ROOT%{_udevrulesdir}/91-vfio.rules

install -p -D -m 0644 \
        rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template \
        $RPM_BUILD_ROOT/%{_sysconfdir}/sysconfig/openvswitch

for service in openvswitch ovsdb-server ovs-vswitchd \
               ovs-delete-transient-ports; do
    install -p -D -m 0644 \
            rhel/usr_lib_systemd_system_${service}.service \
            $RPM_BUILD_ROOT%{_unitdir}/${service}.service
done

%if %{with ipsec}
install -p -D -m 0644 rhel/usr_lib_systemd_system_openvswitch-ipsec.service \
        $RPM_BUILD_ROOT%{_unitdir}/openvswitch-ipsec.service
%endif

install -m 0755 rhel/etc_init.d_openvswitch \
        $RPM_BUILD_ROOT%{_datadir}/openvswitch/scripts/openvswitch.init

install -p -D -m 0644 rhel/etc_openvswitch_default.conf \
        $RPM_BUILD_ROOT/%{_sysconfdir}/openvswitch/default.conf

install -p -D -m 0644 rhel/etc_logrotate.d_openvswitch \
        $RPM_BUILD_ROOT/%{_sysconfdir}/logrotate.d/openvswitch

install -m 0644 vswitchd/vswitch.ovsschema \
        $RPM_BUILD_ROOT/%{_datadir}/openvswitch/vswitch.ovsschema

install -d -m 0755 $RPM_BUILD_ROOT/%{_sysconfdir}/sysconfig/network-scripts/
install -p -m 0755 rhel/etc_sysconfig_network-scripts_ifdown-ovs \
        $RPM_BUILD_ROOT/%{_sysconfdir}/sysconfig/network-scripts/ifdown-ovs
install -p -m 0755 rhel/etc_sysconfig_network-scripts_ifup-ovs \
        $RPM_BUILD_ROOT/%{_sysconfdir}/sysconfig/network-scripts/ifup-ovs

install -d -m 0755 $RPM_BUILD_ROOT%{python3_sitelib}
cp -a $RPM_BUILD_ROOT/%{_datadir}/openvswitch/python/ovstest \
        $RPM_BUILD_ROOT%{python3_sitelib}

# Build the JSON C extension for the Python lib (#1417738)
pushd python
(
export CPPFLAGS="-I ../include -I ../build-shared/include"
export LDFLAGS="%{__global_ldflags} -L $RPM_BUILD_ROOT%{_libdir}"
%py3_build
%py3_install
[ -f "$RPM_BUILD_ROOT/%{python3_sitearch}/ovs/_json$(python3-config --extension-suffix)" ]
)
popd

rm -rf $RPM_BUILD_ROOT/%{_datadir}/openvswitch/python/

install -d -m 0755 $RPM_BUILD_ROOT/%{_sharedstatedir}/openvswitch

install -d -m 0755 $RPM_BUILD_ROOT%{_prefix}/lib/firewalld/services/

install -p -D -m 0755 \
        rhel/usr_share_openvswitch_scripts_ovs-systemd-reload \
        $RPM_BUILD_ROOT%{_datadir}/openvswitch/scripts/ovs-systemd-reload

touch $RPM_BUILD_ROOT%{_sysconfdir}/openvswitch/conf.db
# The db needs special permission as IPsec Pre-shared keys are stored in it.
chmod 0640 $RPM_BUILD_ROOT%{_sysconfdir}/openvswitch/conf.db

touch $RPM_BUILD_ROOT%{_sysconfdir}/openvswitch/system-id.conf

# remove unpackaged files
rm -f $RPM_BUILD_ROOT/%{_bindir}/ovs-benchmark \
        $RPM_BUILD_ROOT/%{_bindir}/ovs-docker \
        $RPM_BUILD_ROOT/%{_bindir}/ovs-parse-backtrace \
        $RPM_BUILD_ROOT/%{_bindir}/ovs-testcontroller \
        $RPM_BUILD_ROOT/%{_sbindir}/ovs-vlan-bug-workaround \
        $RPM_BUILD_ROOT/%{_mandir}/man1/ovs-benchmark.1* \
        $RPM_BUILD_ROOT/%{_mandir}/man8/ovs-testcontroller.* \
        $RPM_BUILD_ROOT/%{_mandir}/man8/ovs-vlan-bug-workaround.8*

%if ! %{with ipsec}
rm -f $RPM_BUILD_ROOT/%{_datadir}/openvswitch/scripts/ovs-monitor-ipsec
%endif

# remove unpackaged ovn files
rm -f $RPM_BUILD_ROOT%{_bindir}/ovn*
rm -f $RPM_BUILD_ROOT%{_mandir}/man1/ovn*
rm -f $RPM_BUILD_ROOT%{_mandir}/man5/ovn*
rm -f $RPM_BUILD_ROOT%{_mandir}/man7/ovn*
rm -f $RPM_BUILD_ROOT%{_mandir}/man8/ovn*
rm -f $RPM_BUILD_ROOT%{_datadir}/openvswitch/ovn*
rm -f $RPM_BUILD_ROOT%{_datadir}/openvswitch/scripts/ovn*
rm -f $RPM_BUILD_ROOT%{_includedir}/ovn/*

%check
%if %{with check}
    pushd build-static
    touch resolv.conf
    export OVS_RESOLV_CONF=$(pwd)/resolv.conf
    if make check TESTSUITEFLAGS='%{_smp_mflags}' ||
       make check TESTSUITEFLAGS='--recheck'; then :;
    else
        cat tests/testsuite.log
        exit 1
    fi
    popd
%endif
%if %{with check_datapath_kernel}
    pushd build-static
    if make check-kernel RECHECK=yes; then :;
    else
        cat tests/system-kmod-testsuite.log
        exit 1
    fi
    popd
%endif

%clean
rm -rf $RPM_BUILD_ROOT

%preun
%if 0%{?systemd_preun:1}
    %systemd_preun openvswitch.service
%else
    if [ $1 -eq 0 ] ; then
        # Package removal, not upgrade
        /bin/systemctl --no-reload disable openvswitch.service >/dev/null 2>&1 || :
        /bin/systemctl stop openvswitch.service >/dev/null 2>&1 || :
    fi
%endif

%pre
getent group openvswitch >/dev/null || groupadd -r openvswitch
getent passwd openvswitch >/dev/null || \
    useradd -r -g openvswitch -d / -s /sbin/nologin \
    -c "Open vSwitch Daemons" openvswitch

%ifarch %{dpdkarches}
    getent group hugetlbfs >/dev/null || groupadd hugetlbfs
    usermod -a -G hugetlbfs openvswitch
%endif
exit 0

%post
if [ $1 -eq 1 ]; then
    sed -i 's:^#OVS_USER_ID=:OVS_USER_ID=:' /etc/sysconfig/openvswitch

%ifarch %{dpdkarches}
    sed -i \
        's@OVS_USER_ID="openvswitch:openvswitch"@OVS_USER_ID="openvswitch:hugetlbfs"@' \
        /etc/sysconfig/openvswitch
%endif
fi
chown -R openvswitch:openvswitch /etc/openvswitch

%if 0%{?systemd_post:1}
    %systemd_post openvswitch.service
%else
    # Package install, not upgrade
    if [ $1 -eq 1 ]; then
        /bin/systemctl daemon-reload >/dev/null 2>&1 || :
    fi
%endif

%postun
%if 0%{?systemd_postun:1}
    %systemd_postun openvswitch.service
%else
    /bin/systemctl daemon-reload >/dev/null 2>&1 || :
%endif

%triggerun -- openvswitch < 2.5.0-22.git20160727%{?dist}
# old rpm versions restart the service in postun, but
# due to systemd some preparation is needed.
if systemctl is-active openvswitch >/dev/null 2>&1 ; then
    /usr/share/openvswitch/scripts/ovs-ctl stop >/dev/null 2>&1 || :
    systemctl daemon-reload >/dev/null 2>&1 || :
    systemctl stop openvswitch ovsdb-server ovs-vswitchd >/dev/null 2>&1 || :
    systemctl start openvswitch >/dev/null 2>&1 || :
fi
exit 0

%files -n python3-%{pkgname}
%{python3_sitearch}/ovs
%{python3_sitearch}/ovs-*.egg-info
%doc LICENSE

%files test
%{_bindir}/ovs-pcap
%{_bindir}/ovs-tcpdump
%{_bindir}/ovs-tcpundump
%{_mandir}/man1/ovs-pcap.1*
%{_mandir}/man8/ovs-tcpdump.8*
%{_mandir}/man1/ovs-tcpundump.1*
%{_bindir}/ovs-test
%{_bindir}/ovs-vlan-test
%{_bindir}/ovs-l3ping
%{_mandir}/man8/ovs-test.8*
%{_mandir}/man8/ovs-vlan-test.8*
%{_mandir}/man8/ovs-l3ping.8*
%{python3_sitelib}/ovstest

%files devel
%{_libdir}/*.so
%{_libdir}/pkgconfig/*.pc
%{_includedir}/openvswitch/*
%{_includedir}/openflow/*
%exclude %{_libdir}/*.a
%exclude %{_libdir}/*.la

%if 0%{?rhel} > 7 || 0%{?fedora} > 28
%files -n network-scripts-%{name}
%{_sysconfdir}/sysconfig/network-scripts/ifup-ovs
%{_sysconfdir}/sysconfig/network-scripts/ifdown-ovs
%endif

%files
%defattr(-,openvswitch,openvswitch)
%dir %{_sysconfdir}/openvswitch
%{_sysconfdir}/openvswitch/default.conf
%config %ghost %verify(not owner group md5 size mtime) %{_sysconfdir}/openvswitch/conf.db
%ghost %attr(0600,-,-) %verify(not owner group md5 size mtime) %{_sysconfdir}/openvswitch/.conf.db.~lock~
%config %ghost %{_sysconfdir}/openvswitch/system-id.conf
%defattr(-,root,root)
%config(noreplace) %verify(not md5 size mtime) %{_sysconfdir}/sysconfig/openvswitch
%{_sysconfdir}/bash_completion.d/ovs-appctl-bashcomp.bash
%{_sysconfdir}/bash_completion.d/ovs-vsctl-bashcomp.bash
%config(noreplace) %{_sysconfdir}/logrotate.d/openvswitch
%{_unitdir}/openvswitch.service
%{_unitdir}/ovsdb-server.service
%{_unitdir}/ovs-vswitchd.service
%{_unitdir}/ovs-delete-transient-ports.service
%{_datadir}/openvswitch/scripts/openvswitch.init
%{_datadir}/openvswitch/scripts/ovs-check-dead-ifs
%{_datadir}/openvswitch/scripts/ovs-lib
%{_datadir}/openvswitch/scripts/ovs-save
%{_datadir}/openvswitch/scripts/ovs-vtep
%{_datadir}/openvswitch/scripts/ovs-ctl
%{_datadir}/openvswitch/scripts/ovs-kmod-ctl
%{_datadir}/openvswitch/scripts/ovs-systemd-reload
%config %{_datadir}/openvswitch/vswitch.ovsschema
%config %{_datadir}/openvswitch/vtep.ovsschema
%{_bindir}/ovs-appctl
%{_bindir}/ovs-dpctl
%{_bindir}/ovs-ofctl
%{_bindir}/ovs-vsctl
%{_bindir}/ovsdb-client
%{_bindir}/ovsdb-tool
%{_bindir}/ovs-pki
%{_bindir}/vtep-ctl
%{_libdir}/*.so.*
%{_sbindir}/ovs-vswitchd
%{_sbindir}/ovsdb-server
%{_mandir}/man1/ovsdb-client.1*
%{_mandir}/man1/ovsdb-server.1*
%{_mandir}/man1/ovsdb-tool.1*
%{_mandir}/man5/ovsdb.5*
%{_mandir}/man5/ovsdb-server.5.*
%{_mandir}/man5/ovs-vswitchd.conf.db.5*
%{_mandir}/man5/vtep.5*
%{_mandir}/man7/ovsdb-server.7*
%{_mandir}/man7/ovsdb.7*
%{_mandir}/man7/ovs-actions.7*
%{_mandir}/man7/ovs-fields.7*
%{_mandir}/man8/vtep-ctl.8*
%{_mandir}/man8/ovs-appctl.8*
%{_mandir}/man8/ovs-ctl.8*
%{_mandir}/man8/ovs-dpctl.8*
%{_mandir}/man8/ovs-kmod-ctl.8.*
%{_mandir}/man8/ovs-ofctl.8*
%{_mandir}/man8/ovs-pki.8*
%{_mandir}/man8/ovs-vsctl.8*
%{_mandir}/man8/ovs-vswitchd.8*
%{_mandir}/man8/ovs-parse-backtrace.8*
%{_udevrulesdir}/91-vfio.rules
%doc LICENSE NOTICE README.rst NEWS rhel/README.RHEL.rst
%ifarch %{dpdkarches}
%doc %{dpdkdir}/README.DPDK-PMDS
%attr(750,openvswitch,hugetlbfs) %verify(not owner group) /var/log/openvswitch
%else
%attr(750,openvswitch,openvswitch) %verify(not owner group) /var/log/openvswitch
%endif
/var/lib/openvswitch
%ghost %attr(755,root,root) %verify(not owner group) %{_rundir}/openvswitch
%{_datadir}/openvswitch/bugtool-plugins/
%{_datadir}/openvswitch/scripts/ovs-bugtool-*
%{_bindir}/ovs-dpctl-top
%{_sbindir}/ovs-bugtool
%{_mandir}/man8/ovs-dpctl-top.8*
%{_mandir}/man8/ovs-bugtool.8*

%if (0%{?rhel} && 0%{?rhel} <= 7) || (0%{?fedora} && 0%{?fedora} < 29)
%{_sysconfdir}/sysconfig/network-scripts/ifup-ovs
%{_sysconfdir}/sysconfig/network-scripts/ifdown-ovs
%endif

%if %{with ipsec}
%files ipsec
%{_datadir}/openvswitch/scripts/ovs-monitor-ipsec
%{_unitdir}/openvswitch-ipsec.service
%endif

%changelog
* Mon Feb 07 2022 Michael Santana - 2.16.0-48
- Merging upstream branch-2.16 [RH git: 9d51785142]
    Commit list:
    1ec567a752 ci: Install wheel before installing any other python packages.
    031a99cef0 odp-util: Fix tunnel key attr for GTP-U.
    558699c73c ovsdb-idl: Only process successful txn in ovsdb_idl_loop_run.

* Wed Feb 02 2022 Open vSwitch CI - 2.16.0-47
- Merging upstream branch-2.16 [RH git: 6e6f66ffd0]
    Commit list:
    0276bdb30a ofproto-dpif-upcall: Fix n_revalidators on upcall show.

* Wed Feb 02 2022 Open vSwitch CI - 2.16.0-46
- Merging upstream branch-2.16 [RH git: 513117cbb0]
    Commit list:
    16575362dc acinclude: Detect avx512 vpopcntdq compiler support.

* Tue Feb 01 2022 Ilya Maximets - 2.16.0-45
- ovsdb: transaction: Keep one entry in the transaction history.
[RH git: 7665f42d12] (#2044621) commit 6e13565dd32fb2cf5517f51ca06956e2052c4bba Author: Ilya Maximets Date: Sun Dec 19 15:09:38 2021 +0100 ovsdb: transaction: Keep one entry in the transaction history. If a single transaction exceeds the size of the whole database (e.g., a lot of rows got removed and new ones added), transaction history will be drained. This leads to sending UUID_ZERO to the clients as the last transaction id in the next monitor update, because monitor doesn't know what was the actual last transaction id. In case of a re-connect that will cause re-downloading of the whole database, since the client's last_id will be out of sync. One solution would be to store the last transaction ID separately from the actual transactions, but that will require a careful management in cases where database gets reset and the history needs to be cleared. Keeping the one last transaction instead to avoid the problem. That should not be a big concern in terms of memory consumption, because this last transaction will be removed from the history once the next transaction appeared. This is also not a concern for a fast re-sync, because this last transaction will not be used for the monitor reply; it's either client already has it, so no need to send, or it's a history miss. The test updated to not check the number of atoms if there is only one transaction in the history. Fixes: 317b1bfd7dd3 ("ovsdb: Don't let transaction history grow larger than the database.") Acked-by: Mike Pattrick Acked-by: Han Zhou Signed-off-by: Ilya Maximets Reported-at: https://bugzilla.redhat.com/2044621 Signed-off-by: Ilya Maximets * Mon Jan 31 2022 Open vSwitch CI - 2.16.0-44 - Merging upstream branch-2.16 [RH git: d202cd6da1] Commit list: 34c830c540 ovsdb-idl: ovsdb_idl_loop_destroy must also destroy the committing txn. 13009736b2 ovsdb-cs: Clear last_id on reconnect if condition changes in-flight. 017e2ae50e ofp-flow: Skip flow reply if it exceeds the maximum message size. 
e0c6f92a95 ovsdb-cs: Fix ignoring of the last id from the initial monitor reply. (#2044624) * Fri Jan 28 2022 Ilya Maximets - 2.16.0-43 - ovsdb: storage: Randomize should_snapshot checks when the minimum time passed. [RH git: abe61535ca] (#2044614) commit 339f97044e3c2312fbb65b932fa14a181acf40d5 Author: Ilya Maximets Date: Mon Dec 13 16:43:33 2021 +0100 ovsdb: storage: Randomize should_snapshot checks when the minimum time passed. Snapshots are scheduled for every 10-20 minutes. It's a random value in this interval for each server. Once the time is up, but the maximum time (24 hours) not reached yet, ovsdb will start checking if the log grew a lot on every iteration. Once the growth is detected, compaction is triggered. OTOH, it's very common for an OVSDB cluster to not have the log growing very fast. If the log didn't grow 2x in 20 minutes, the randomness of the initial scheduled time is gone and all the servers are checking if they need to create snapshot on every iteration. And since all of them are part of the same cluster, their logs are growing with the same speed. Once the critical mass is reached, all the servers will start creating snapshots at the same time. If the database is big enough, that might leave the cluster unresponsive for an extended period of time (e.g. 10-15 seconds for OVN_Southbound database in a larger scale OVN deployment) until the compaction completed. Fix that by re-scheduling a quick retry if the minimal time already passed. Effectively, this will work as a randomized 1-2 min delay between checks, so the servers will not synchronize. Scheduling function updated to not change the upper limit on quick reschedules to avoid delaying the snapshot creation indefinitely. Currently quick re-schedules are only used for the error cases, and there is always a 'slow' re-schedule after the successful compaction. So, the change of a scheduling function doesn't change the current behavior much. 
Signed-off-by: Ilya Maximets Acked-by: Han Zhou Acked-by: Dumitru Ceara Reported-at: https://bugzilla.redhat.com/2044614 Signed-off-by: Ilya Maximets * Fri Jan 28 2022 Ilya Maximets - 2.16.0-42 - raft: Only allow followers to snapshot. [RH git: 915efc8c00] (#2044614) commit bf07cc9cdb2f37fede8c0363937f1eb9f4cfd730 Author: Dumitru Ceara Date: Mon Dec 13 20:46:03 2021 +0100 raft: Only allow followers to snapshot. Commit 3c2d6274bcee ("raft: Transfer leadership before creating snapshots.") made it such that raft leaders transfer leadership before snapshotting. However, there's still the case when the next leader to be is in the process of snapshotting. To avoid delays in that case too, we now explicitly allow snapshots only on followers. Cluster members will have to wait until the current election is settled before snapshotting. Given the following logs taken from an OVN_Southbound 3-server cluster during a scale test: S1 (old leader): 19:07:51.226Z|raft|INFO|Transferring leadership to write a snapshot. 19:08:03.830Z|ovsdb|INFO|OVN_Southbound: Database compaction took 12601ms 19:08:03.940Z|raft|INFO|server 8b8d is leader for term 43 S2 (follower): 19:08:00.870Z|raft|INFO|server 8b8d is leader for term 43 S3 (new leader): 19:07:51.242Z|raft|INFO|received leadership transfer from f5c9 in term 42 19:07:51.244Z|raft|INFO|term 43: starting election 19:08:00.805Z|ovsdb|INFO|OVN_Southbound: Database compaction took 9559ms 19:08:00.869Z|raft|INFO|term 43: elected leader by 2+ of 3 servers We see that the leader to be (S3) receives the leadership transfer, initiates the election and immediately after starts a snapshot that takes ~9.5 seconds. During this time, S2 votes for S3 electing it as cluster leader but S3 doesn't effectively become leader until it finishes snapshotting, essentially keeping the cluster without a leader for up to ~9.5 seconds. With the current change, S3 will delay compaction and snapshotting until the election is finished. 
The only exception is the case of single-node clusters for which we allow the node to snapshot regardless of role. Acked-by: Han Zhou Signed-off-by: Dumitru Ceara Signed-off-by: Ilya Maximets Reported-at: https://bugzilla.redhat.com/2044614 Signed-off-by: Ilya Maximets * Wed Jan 26 2022 Open vSwitch CI - 2.16.0-41 - Merging upstream branch-2.16 [RH git: f1ca7b8ac3] Commit list: 2571b1a464 ofproto-dpif: Fix issue with non-reversible actions on a patch ports. * Fri Jan 21 2022 Open vSwitch CI - 2.16.0-40 - Merging upstream branch-2.16 [RH git: 60b19f443c] Commit list: 07a115f7d9 ovs-monitor-ipsec: Fix generated strongSwan ipsec.conf for IPv6. * Thu Jan 20 2022 Open vSwitch CI - 2.16.0-39 - Merging upstream branch-2.16 [RH git: 349d687673] Commit list: f2ee013f73 datapath-windows: Pickup Ct tuple as CT lookup key in function OvsCtSetupLookupCtx * Tue Jan 18 2022 Open vSwitch CI - 2.16.0-38 - Merging upstream branch-2.16 [RH git: e370e283cf] Commit list: bd8ebcd10c Documentation: Fix Rx/Tx queue configuration section. * Mon Jan 17 2022 Open vSwitch CI - 2.16.0-37 - Merging upstream branch-2.16 [RH git: c9297f5ef7] Commit list: 29936a853f ofproto-dpif: Fix memory leak in dpif/show-dp-features appctl. * Thu Jan 13 2022 Open vSwitch CI - 2.16.0-36 - Merging upstream branch-2.16 [RH git: edae801e00] Commit list: ba7fffb832 dpif-netdev: Improve loading of packet data for undersized packets. * Sat Dec 18 2021 Open vSwitch CI - 2.16.0-35 - Merging upstream branch-2.16 [RH git: 6ad0375ff5] Commit list: 2595b7b3d1 Prepare for 2.16.3. 6caaae525c Set release date for 2.16.2. 443e3657d7 ofproto-dpif-xlate: Snoop ingress packets and update neigh cache if needed. 75d2ef9a60 tnl-neigh-cache: Do not refresh the entry while revalidating. 5d88836566 tnl-neigh-cache: Read/write expires atomically. fb42c99c15 dpif-netdev: Improve handling of IP/TCP in avx512 mfex. 
* Thu Dec 09 2021 Open vSwitch CI - 2.16.0-34 - Merging upstream branch-2.16 [RH git: 07b9bf085a] Commit list: f42c484445 compat: handle NF_REPEAT error on nf_conntrack_in. * Mon Dec 06 2021 Open vSwitch CI - 2.16.0-33 - Merging upstream branch-2.16 [RH git: 8708b55152] Commit list: 3e527f21cf flow: Consider dataofs when parsing TCP packets. b537e049ad tests/flowgen: Fix packet data endianness. 35244b4980 ofproto: Fix resource usage explosion due to removal of large number of flows. a201297639 ofproto: Fix resource usage explosion while processing bundled FLOW_MOD. cd0133402c tests/flowgen: Fix length field of 802.2 data link header. 2d65b8ffd2 ovs-lib: Backup and remove existing DB when joining cluster. ab01177637 docs/dpdk: Fix install doc. 38a2129524 ovs-save: Save igmp flows in ofp_parse syntax. dc77857ce2 faq: Update OVS/DPDK version table for OVS 2.13/2.14. * Thu Nov 18 2021 Open vSwitch CI - 2.16.0-32 - Merging upstream branch-2.16 [RH git: e90e06a818] Commit list: 1d8e0f861f ofproto-dpif-xlate: Fix check_pkt_larger incomplete translation. * Mon Nov 15 2021 Open vSwitch CI - 2.16.0-31 - Merging upstream branch-2.16 [RH git: 77a249d38b] Commit list: f8f2f7c9cb datapath-windows: Reset flow key after Ipv4 fragments are reassembled * Wed Nov 10 2021 Timothy Redaelli - 2.16.0-30 - python: Replace pyOpenSSL with ssl. [RH git: 0cd5867531] (#1988429) Currently, pyOpenSSL is half-deprecated upstream and so it's removed on some distributions (for example on CentOS Stream 9, https://issues.redhat.com/browse/CS-336), but since OVS only supports Python 3 it's possible to replace pyOpenSSL with "import ssl" included in base Python 3. Stream recv and send had to be splitted as _recv and _send, since SSLError is a subclass of socket.error and so it was not possible to except for SSLWantReadError and SSLWantWriteError in recv and send of SSLStream. 
TCPstream._open cannot be used in SSLStream, since Python ssl module requires the SSL socket to be created before connecting it, so SSLStream._open needs to create the socket, create SSL socket and then connect the SSL socket. Reported-by: Timothy Redaelli Reported-at: https://bugzilla.redhat.com/1988429 Signed-off-by: Timothy Redaelli Acked-by: Terry Wilson Tested-by: Terry Wilson Signed-off-by: Ilya Maximets Signed-off-by: Timothy Redaelli * Wed Nov 10 2021 Timothy Redaelli - 2.16.0-29 - python: socket-util: Split inet_open_active function and use connect_ex. [RH git: 2e704b371c] In an upcoming patch, PyOpenSSL will be replaced with Python ssl module, but in order to do an async connection with Python ssl module the ssl socket must be created when the socket is created, but before the socket is connected. So, inet_open_active function is splitted in 3 parts: - inet_create_socket_active: creates the socket and returns the family and the socket, or (error, None) if some error needs to be returned. - inet_connect_active: connect the socket and returns the errno (it returns 0 if errno is EINPROGRESS or EWOULDBLOCK). connect is replaced by connect_ex, since Python suggest to use it for asynchronous connects and it's also cleaner since inet_connect_active returns errno that connect_ex already returns, moreover due to a Python limitation connect cannot not be used with ssl module. inet_open_active function is changed in order to use the new functions inet_create_socket_active and inet_connect_active. Signed-off-by: Timothy Redaelli Acked-by: Terry Wilson Tested-by: Terry Wilson Signed-off-by: Ilya Maximets Signed-off-by: Timothy Redaelli * Wed Nov 10 2021 Timothy Redaelli - 2.16.0-28 - redhat: remove mlx4 support [RH git: 4c846afd24] (#1998122) Resolves: #1998122 * Tue Nov 09 2021 Ilya Maximets - 2.16.0-27 - ovsdb: Don't let transaction history grow larger than the database. 
  [RH git: 93d1fa0bdf] (#2012949)
    commit 317b1bfd7dd315e241c158e6d4095002ff391ee3
    Author: Ilya Maximets
    Date:   Tue Sep 28 13:17:21 2021 +0200

        ovsdb: Don't let transaction history grow larger than the database.

        If user frequently changes a lot of rows in a database, transaction
        history could grow way larger than the database itself.  This wastes
        a lot of memory and also makes monitor_cond_since slower than usual
        monitor_cond if the transaction id is old enough, because
        re-construction of the changes from a history is slower than just
        creation of initial database snapshot.  This is also the case if
        user deleted a lot of data, so transaction history still holds all
        of it while the database itself doesn't.

        In case of current lb-per-service model in ovn-kubernetes, each
        load-balancer is added to every logical switch/router.  Such a
        transaction touches more than a half of an OVN_Northbound database.
        And each of these transactions is added to the transaction history.
        Since transaction history depth is 100, in worst case scenario, it
        will hold 100 copies of a database increasing memory consumption
        dramatically.  In tests with 3000 LBs and 120 LSs, memory goes up
        to 3 GB, while holding at 30 MB if transaction history disabled in
        the code.

        Fixing that by keeping count of the number of ovsdb_atom's in the
        database and not allowing the total number of atoms in transaction
        history to grow larger than this value.  Counting atoms is fairly
        cheap because we don't need to iterate over them, so it doesn't
        have significant performance impact.  It would be ideal to measure
        the size of individual atoms, but that will hit the performance.
        Counting cells instead of atoms is not sufficient, because OVN
        users are adding hundreds or thousands of atoms to a single cell,
        so they are largely different in size.
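        The capping rule described above can be sketched roughly as follows.
        This is a simplified in-memory model, not the ovsdb-server
        implementation: `TxnHistory`, `add` and the per-transaction atom
        counts are hypothetical names chosen for illustration. Old entries
        are dropped while the history exceeds either its depth limit or the
        number of atoms currently in the database:

```python
from collections import deque


class TxnHistory:
    """Toy model of a transaction history bounded by depth and atom count."""

    def __init__(self, max_depth=100):
        self.max_depth = max_depth
        self.entries = deque()  # (txn_id, n_atoms) pairs, oldest first
        self.n_atoms = 0        # total atoms held by all history entries

    def add(self, txn_id, txn_atoms, db_atoms):
        """Record a transaction, then trim the oldest entries as needed.

        db_atoms is the number of atoms currently stored in the database;
        it is assumed to be maintained incrementally by the caller.
        """
        self.entries.append((txn_id, txn_atoms))
        self.n_atoms += txn_atoms
        while self.entries and (len(self.entries) > self.max_depth
                                or self.n_atoms > db_atoms):
            _, dropped = self.entries.popleft()
            self.n_atoms -= dropped


if __name__ == "__main__":
    history = TxnHistory()
    # Each transaction touches more than half of a 100-atom database.
    for txn_id in range(10):
        history.add(txn_id, txn_atoms=60, db_atoms=100)
    print(len(history.entries), history.n_atoms)
```

        With transactions that each touch more than half of the database,
        the model keeps a single history entry instead of up to 100 copies.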
        Signed-off-by: Ilya Maximets
        Acked-by: Han Zhou
        Acked-by: Dumitru Ceara

    Reported-at: https://bugzilla.redhat.com/2012949
    Signed-off-by: Ilya Maximets

* Tue Nov 09 2021 Ilya Maximets - 2.16.0-26
- ovsdb: transaction: Incremental reassessment of weak refs. [RH git: e8a363db49] (#2005958)
    commit 4dbff9f0a68579241ac1a040726be3906afb8fe9
    Author: Ilya Maximets
    Date:   Sat Oct 16 03:20:23 2021 +0200

        ovsdb: transaction: Incremental reassessment of weak refs.

        The main idea is to not store list of weak references in the source
        row, so they all don't need to be re-checked/updated on every
        modification of that source row.  The point is that source row
        already knows UUIDs of all destination rows stored in the data, so
        there is not much profit in storing this information somewhere
        else.  If needed, destination row can be looked up and reference
        can be looked up in the destination row.  For the fast lookup,
        destination row now stores references in a hash map.

        Weak reference structure now contains the table and uuid of a
        source row instead of a direct pointer.  This allows to
        replace/update the source row without breaking any weak references
        stored in destination rows.

        Structure also now contains the key-value pair of atoms that
        triggered creation of this reference.  These atoms can be used to
        quickly subtract removed references from a source row.  During
        reassessment, ovsdb now only needs to care about new added or
        removed atoms, and atoms that got removed due to removal of the
        destination rows, but these are marked for reassessment by the
        destination row.

        ovsdb_datum_subtract() is used to remove atoms that point to
        removed or incorrect rows, so there is no need to re-sort datum in
        the end.
        Results of an OVN load-balancer benchmark that adds 3K
        load-balancers to each of 120 logical switches and 120 logical
        routers in the OVN sandbox with clustered Northbound database and
        then removes them:

        Before:

          %CPU  CPU Time  CMD
          86.8  00:16:05  ovsdb-server nb1.db
          44.1  00:08:11  ovsdb-server nb2.db
          43.2  00:08:00  ovsdb-server nb3.db

        After:

          %CPU  CPU Time  CMD
          54.9  00:02:58  ovsdb-server nb1.db
          33.3  00:01:48  ovsdb-server nb2.db
          32.2  00:01:44  ovsdb-server nb3.db

        So, on a cluster leader the processing time dropped by 5.4x, on
        followers - by 4.5x.  More load-balancers - larger the performance
        difference.  There is a slight increase of memory usage, because
        new reference structure is larger, but the difference is not
        significant.

        Signed-off-by: Ilya Maximets
        Acked-by: Dumitru Ceara

    Reported-at: https://bugzilla.redhat.com/2005958
    Signed-off-by: Ilya Maximets

* Thu Oct 28 2021 Open vSwitch CI - 2.16.0-25
- Merging upstream branch-2.16 [RH git: f5366890c5]
    Commit list:
    c221c8e613 datapath-windows:Reset PseudoChecksum value only for TX direction offload case

* Wed Oct 27 2021 Open vSwitch CI - 2.16.0-24
- Merging upstream branch-2.16 [RH git: 4682b76694]
    Commit list:
    b79f0369f2 ci: Make linux-prepare trust system installs.

* Mon Oct 25 2021 Open vSwitch CI - 2.16.0-23
- Merging upstream branch-2.16 [RH git: cce913794e]
    Commit list:
    2a4c87f300 Prepare for 2.16.2.
    aaa1439b8e Set release date for 2.16.1.

* Thu Oct 21 2021 Open vSwitch CI - 2.16.0-22
- Merging upstream branch-2.16 [RH git: 29f01c4fdb]
    Commit list:
    108176ab5a github: Stick to python 3.9.

* Tue Oct 19 2021 Open vSwitch CI - 2.16.0-21
- Merging upstream branch-2.16 [RH git: 2546fa9646]
    Commit list:
    5c5e34603b datapath-windows: add layers when adding the deferred actions

* Thu Oct 14 2021 Open vSwitch CI - 2.16.0-20
- Merging upstream branch-2.16 [RH git: d572c95f69]
    Commit list:
    458a4f75f3 ofproto-dpif-xlate: Fix zone set from non-frozen-metadata fields.

* Wed Oct 13 2021 Open vSwitch CI - 2.16.0-19
- Merging upstream branch-2.16 [RH git: 557ca689f7]
    Commit list:
    6d8190584a dpif-netdev: Fix use-after-free on PACKET_OUT of IP fragments.
    44a66cc1d0 tunnel-push-pop.at: Mask source port in tunnel header.

* Tue Oct 12 2021 Open vSwitch CI - 2.16.0-18
- Merging upstream branch-2.16 [RH git: a6c4770398]
    Commit list:
    27a5848a33 ovs-ctl: Add missing description for --ovs-vswitchd-options and --ovsdb-server-options to usage().
    0300d0c0c2 dpdk-stub: Change the ERR log to DBG.
    cdd6dd821d dpif-netlink: Fix feature negotiation for older kernels.
    c2682c42cb dpif-netdev: Fix pmd thread comments to include SMC.
    9377f4a465 python: idl: Avoid sending transactions when the DB is not synced up.

* Tue Oct 12 2021 Open vSwitch CI - 2.16.0-17
- Merging upstream branch-2.16 [RH git: c1145b5236]
    Commit list:
    0fd17fbb09 ipf: release unhandled packets from the batch

* Thu Sep 30 2021 Open vSwitch CI - 2.16.0-16
- Merging upstream branch-2.16 [RH git: 5c05133179]
    Commit list:
    3f692fba98 datapath-windows:adjust Offset when processing packet in POP_VLAN action

* Wed Sep 29 2021 Dumitru Ceara - 2.16.0-15
- ovsdb-data: Deduplicate string atoms. [RH git: 24e7d1140e] (#2006839)
    commit 429b114c5aadee24ccfb16ad7d824f45cdcea75a
    Author: Ilya Maximets
    Date:   Wed Sep 22 09:28:50 2021 +0200

        ovsdb-server spends a lot of time cloning atoms for various
        reasons, e.g. to create a diff of two rows or to clone a row to the
        transaction.  All atoms, except for strings, contain a simple value
        that could be copied in efficient way, but duplicating strings
        every time has a significant performance impact.

        Introducing a new reference-counted structure 'ovsdb_atom_string'
        that allows to not copy strings every time, but just increase a
        reference counter.

        This change allows to increase transaction throughput in benchmarks
        up to 2x for standalone databases and 3x for clustered databases,
        i.e. number of transactions that ovsdb-server can handle per second.
        It also noticeably reduces memory consumption of ovsdb-server.

        Next step will be to consolidate this structure with json strings,
        so we will not need to duplicate strings while converting database
        objects to json and back.

        Signed-off-by: Ilya Maximets
        Acked-by: Dumitru Ceara
        Acked-by: Mark D. Gray

    Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2006839
    Signed-off-by: Dumitru Ceara

* Wed Sep 29 2021 Dumitru Ceara - 2.16.0-14
- ovsdb-data: Add function to apply diff in-place. [RH git: df0e4bda98] (#2006851)
    commit 32b51326ef9c307b4acd0bacafb0218dd1372f3d
    Author: Ilya Maximets
    Date:   Thu Sep 23 01:47:24 2021 +0200

        ovsdb_datum_apply_diff() is heavily used in ovsdb transactions, but
        it's linear in terms of number of comparisons.  And it also clones
        all the atoms along the way.  In most cases size of a diff is much
        smaller than the size of the original datum, this allows to perform
        the same operation in-place with only O(diff->n * log2(old->n))
        comparisons and O(old->n + diff->n) memory copies with memcpy.

        Using this function while applying diffs read from the storage
        gives a significant performance boost and allows to execute much
        more transactions per second.

        Signed-off-by: Ilya Maximets
        Acked-by: Mark D. Gray

    Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2006851
    Signed-off-by: Dumitru Ceara

* Wed Sep 29 2021 Dumitru Ceara - 2.16.0-13
- ovsdb-data: Optimize subtraction of sets. [RH git: 5bace82405] (#2005483)
    commit bb12b63176389e516ddfefce20dfa165f24430fb
    Author: Ilya Maximets
    Date:   Thu Sep 23 01:47:23 2021 +0200

        Current algorithm for ovsdb_datum_subtract looks like this:

          for-each atom in a:
              if atom in b:
                  swap(atom, )
                  destroy(atom)
          quicksort(a)

        Complexity:

          Na * log2(Nb)  +  (Na - Nb) * log2(Na - Nb)
             Search          Comparisons for quicksort

        It's not optimal, especially because Nb << Na in a vast majority of
        cases.

        Reversing the search phase to look up atoms from 'b' in 'a', and
        closing gaps from deleted elements in 'a' by plain memory copy to
        avoid quicksort.
        Resulted complexity:

          Nb * log2(Na)  +  (Na - Nb)
             Search        Memory copies

        Subtraction is heavily used while executing database transactions.
        For example, to remove one port from a logical switch in OVN.
        Complexity of such operation if original logical switch had 100
        ports goes down from

          100 * log2(1)  = 100 comparisons for search and
          99 * log2(99)  = 656 comparisons for quicksort
                           ------------------------------
                           756 comparisons in total
        to only

          1 * log2(100)  = 7 comparisons for search + memory copy of
          99 * sizeof (union ovsdb_atom) bytes.

        We could use memmove to close the gaps after removing atoms, but it
        will lead to 2 memory copies inside the call, while we can perform
        only one to the temporary 'result' and swap pointers.

        Performance in cases, where sizes of 'a' and 'b' are comparable,
        should not change.  Cases with Nb >> Na should not happen in
        practice.

        All in all, this change allows ovsdb-server to perform several
        times more transactions, that removes elements from sets, per
        second.

        Signed-off-by: Ilya Maximets
        Acked-by: Han Zhou
        Acked-by: Mark D. Gray

    Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2005483
    Signed-off-by: Dumitru Ceara

* Wed Sep 29 2021 Dumitru Ceara - 2.16.0-12
- ovsdb-data: Optimize union of sets. [RH git: e2a4c7d794] (#2005483)
    commit 51946d22274cd591dc061358fb507056fbd91420
    Author: Ilya Maximets
    Date:   Thu Sep 23 01:47:22 2021 +0200

        Current algorithm of ovsdb_datum_union looks like this:

          for-each atom in b:
              if not bin_search(a, atom):
                  push(a, clone(atom))
          quicksort(a)

        So, the complexity looks like this:

           Nb * log2(Na)   +    Nb     +   (Na + Nb) * log2(Na + Nb)
           Comparisons        clones       Comparisons for quicksort
           for search

        ovsdb_datum_union() is heavily used in database transactions while
        new element is added to a set.  For example, if new logical switch
        port is added to a logical switch in OVN.  This is a very common
        use case where CMS adds one new port to an existing switch that
        already has, let's say, 100 ports.
        For this case ovsdb-server will have to perform:

           1 * log2(100)  + 1 clone + 101 * log2(101)
           Comparisons               Comparisons for
           for search                   quicksort.
               ~7            1            ~707

        Roughly 714 comparisons of atoms and 1 clone.

        Since binary search can give us position, where new atom should go
        (it's the 'low' index after the search completion) for free, the
        logic can be re-worked like this:

           copied = 0
           for-each atom in b:
               desired_position = bin_search(a, atom)
               push(result, a[ copied : desired_position - 1 ])
               copied = desired_position
               push(result, clone(atom))
           push(result, a[ copied : Na ])
           swap(a, result)

        Complexity of this schema:

           Nb * log2(Na)   +    Nb     +         Na
           Comparisons        clones       memory copy on push
           for search

        'swap' is just a swap of a few pointers.  'push' is not a 'clone',
        but a simple memory copy of 'union ovsdb_atom'.

        In general, this schema substitutes complexity of a quicksort with
        complexity of a memory copy of Na atom structures, where we're not
        even copying strings that these atoms are pointing to.

        Complexity in the example above goes down from 714 comparisons to
        7 comparisons and memcpy of 100 * sizeof (union ovsdb_atom) bytes.

        General complexity of a memory copy should always be lower than
        complexity of a quicksort, especially because these copies usually
        performed in bulk, so this new schema should work faster for any
        input.

        All in all, this change allows to execute several times more
        transactions per second for transactions that adds new entries to
        sets.

        Alternatively, union can be implemented as a linear merge of two
        sorted arrays, but this will result in O(Na) comparisons, which is
        more than Nb * log2(Na) in common case, since Na is usually far
        bigger than Nb.  Linear merge will also mean per-atom memory copies
        instead of copying in bulk.

        'replace' functionality of ovsdb_datum_union() had no users, so it
        was just removed.  But it can easily be added back if needed in the
        future.

        Signed-off-by: Ilya Maximets
        Acked-by: Han Zhou
        Acked-by: Mark D. Gray

    Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2005483
    Signed-off-by: Dumitru Ceara

* Wed Sep 29 2021 Dumitru Ceara - 2.16.0-11
- ovsdb: transaction: Use diffs for strong reference counting. [RH git: 85da133eaa] (#2003203)
    commit b2712d026eae2d9a5150c2805310eaf506e1f162
    Author: Ilya Maximets
    Date:   Tue Sep 14 00:19:57 2021 +0200

        Currently, even if one reference added to the set of strong
        references or removed from it, ovsdb-server will walk through the
        whole set and re-count references to other rows.  These referenced
        rows will also be added to the transaction in order to re-count
        their references.

        For example, every time Logical Switch Port added to a Logical
        Switch, OVN Northbound database server will walk through all ports
        of this Logical Switch, clone their rows, and re-count references.
        This is not very efficient.  Instead, it can only increase
        reference counters for added references and reduce for removed
        ones.  In many cases this will be only one row affected in the
        Logical_Switch_Port table.

        Introducing new function that generates a diff of two datum
        objects, but stores added and removed atoms separately, so they can
        be used to increase or decrease row reference counters accordingly.

        This change allows to perform several times more transactions that
        adds or removes strong references to/from sets per second, because
        ovsdb-server no longer clones and re-counts rows that are
        irrelevant to current transaction.

        Acked-by: Dumitru Ceara
        Signed-off-by: Ilya Maximets

    Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2003203
    Signed-off-by: Dumitru Ceara

* Mon Sep 27 2021 Open vSwitch CI - 2.16.0-10
- Merging upstream branch-2.16 [RH git: 2114714012]
    Commit list:
    547371ecdb cirrus: Reduce memory requirements for FreeBSD VMs.
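
    The set union and subtraction schemes described in the 2.16.0-12 and
    2.16.0-13 entries above can be sketched as follows. This is a Python
    illustration over sorted lists of unique values, not the C
    implementation in ovsdb-data; `datum_union` and `datum_subtract` are
    hypothetical names. Both functions use one binary search per atom of
    'b' plus bulk slice copies of 'a', so no final sort is needed:

```python
from bisect import bisect_left


def datum_union(a, b):
    """Return the sorted union of sorted, duplicate-free lists a and b."""
    result = []
    copied = 0  # number of leading elements of 'a' already in 'result'
    for atom in b:
        # 'b' is sorted, so the search can start where the last one ended.
        pos = bisect_left(a, atom, copied)
        if pos < len(a) and a[pos] == atom:
            continue  # already present in 'a', nothing to add
        result.extend(a[copied:pos])  # bulk copy of the untouched run
        copied = pos
        result.append(atom)
    result.extend(a[copied:])
    return result


def datum_subtract(a, b):
    """Return sorted list 'a' without the elements that appear in 'b'."""
    result = []
    copied = 0
    for atom in b:
        pos = bisect_left(a, atom, copied)
        if pos < len(a) and a[pos] == atom:
            result.extend(a[copied:pos])  # keep everything before the hit
            copied = pos + 1              # and skip the removed element
    result.extend(a[copied:])
    return result


if __name__ == "__main__":
    ports = list(range(0, 200, 2))        # a switch with 100 ports
    print(len(datum_union(ports, [101])))
    print(len(datum_subtract(ports, [50])))
```

    Note that Python slices exclude the end index, whereas the pseudocode
    in the 2.16.0-12 entry writes the copied range with an inclusive end.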

* Thu Sep 23 2021 Timothy Redaelli - 2.16.0-9
- redhat: use hugetlbfs group for /var/log/openvswitch when dpdk is enabled [RH git: 4e5928b671] (#2004543)
    Resolves: #2004543

* Thu Sep 16 2021 Open vSwitch CI - 2.16.0-8
- Merging upstream branch-2.16 [RH git: 7332b410fc]
    Commit list:
    facaf5bc71 netdev-linux: Fix a null pointer dereference in netdev_linux_notify_sock().
    6e203d4873 pcap-file: Fix memory leak in ovs_pcap_open().
    f50da0b267 odp-util: Fix a null pointer dereference in odp_flow_format().
    7da752e43f odp-util: Fix a null pointer dereference in odp_nsh_key_from_attr__().
    bc22b01459 netdev-dpdk: Fix RSS configuration for virtio.
    81706c5d43 ipf: Fix only nat the first fragment in the reass process.

* Wed Sep 08 2021 Open vSwitch CI - 2.16.0-7
- Merging upstream branch-2.16 [RH git: e71f31dfd6]
    Commit list:
    242c280f0e dpif-netdev: Fix crash when PACKET_OUT is metered.

* Tue Aug 31 2021 Ilya Maximets - 2.16.0-6
- ovsdb: monitor: Store serialized json in a json cache. [RH git: bc20330c85] (#1996152)
    commit 43e66fc27659af2a5c976bdd27fe747b442b5554
    Author: Ilya Maximets
    Date:   Tue Aug 24 21:00:39 2021 +0200

        Same json from a json cache is typically sent to all the clients,
        e.g., in case of OVN deployment with ovn-monitor-all=true.  There
        could be hundreds or thousands connected clients and ovsdb will
        serialize the same json object for each of them before sending.

        Serializing it once before storing into json cache to speed up
        processing.

        This change allows to save a lot of CPU cycles and a bit of memory
        since we need to store in memory only a string and not the full
        json object.

        Testing with ovn-heater on 120 nodes using density-heavy scenario
        shows reduction of the total CPU time used by Southbound DB
        processes from 256 minutes to 147.
        Duration of unreasonably long poll intervals also reduced
        dramatically from 7 to 2 seconds:

                   Count   Min    Max   Median    Mean   95 percentile
         -------------------------------------------------------------
          Before   1934   1012   7480   4302.5   4875.3     7034.3
          After    1909   1004   2730   1453.0   1532.5     2053.6

        Acked-by: Dumitru Ceara
        Acked-by: Han Zhou
        Signed-off-by: Ilya Maximets

    Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1996152
    Signed-off-by: Ilya Maximets

* Tue Aug 31 2021 Ilya Maximets - 2.16.0-5
- raft: Don't keep full json objects in memory if no longer needed. [RH git: 4606423e8b] (#1990058)
    commit 0de882954032aa37dc943bafd72c33324aa0c95a
    Author: Ilya Maximets
    Date:   Tue Aug 24 21:00:38 2021 +0200

        raft: Don't keep full json objects in memory if no longer needed.

        Raft log entries (and raft database snapshot) contains json objects
        of the data.  Follower receives append requests with data that gets
        parsed and added to the raft log.  Leader receives execution
        requests, parses data out of them and adds to the log.  In both
        cases, later ovsdb-server reads the log with ovsdb_storage_read(),
        constructs transaction and updates the database.  On followers
        these json objects in common case are never used again.  Leader may
        use them to send append requests or snapshot installation requests
        to followers.  However, all these operations (except for
        ovsdb_storage_read()) are just serializing the json in order to
        send it over the network.

        Json objects are significantly larger than their serialized string
        representation.  For example, the snapshot of the database from one
        of the ovn-heater scale tests takes 270 MB as a string, but 1.6 GB
        as a json object from the total 3.8 GB consumed by ovsdb-server
        process.

        ovsdb_storage_read() for a given raft entry happens only once in a
        lifetime, so after this call, we can serialize the json object,
        store the string representation and free the actual json object
        that ovsdb will never need again.
        This can save a lot of memory and can also save serialization time,
        because each raft entry for append requests and snapshot
        installation requests serialized only once instead of doing that
        every time such request needs to be sent.

        JSON_SERIALIZED_OBJECT can be used in order to seamlessly integrate
        pre-serialized data into raft_header and similar json objects.

        One major special case is creation of a database snapshot.
        Snapshot installation request received over the network will be
        parsed and read by ovsdb-server just like any other raft log entry.
        However, snapshots created locally with raft_store_snapshot() will
        never be read back, because they reflect the current state of the
        database, hence already applied.  For this case we can free the
        json object right after writing snapshot on disk.

        Tests performed with ovn-heater on 60 node density-light scenario,
        where on-disk database goes up to 97 MB, shows average memory
        consumption of ovsdb-server Southbound DB processes decreased by
        58% (from 602 MB to 256 MB per process) and peak memory consumption
        decreased by 40% (from 1288 MB to 771 MB).

        Test with 120 nodes on density-heavy scenario with 270 MB on-disk
        database shows 1.5 GB memory consumption decrease as expected.
        Also, total CPU time consumed by the Southbound DB process reduced
        from 296 to 256 minutes.  Number of unreasonably long poll
        intervals reduced from 2896 down to 1934.

        Deserialization is also implemented just in case.  I didn't see
        this function being invoked in practice.

        Acked-by: Dumitru Ceara
        Acked-by: Han Zhou
        Signed-off-by: Ilya Maximets

    Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1990058
    Signed-off-by: Ilya Maximets

* Tue Aug 31 2021 Ilya Maximets - 2.16.0-4
- json: Add support for partially serialized json objects. [RH git: 885e5ce1b5] (#1990058)
    commit b0bca6f27aae845c3ca8b48d66a7dbd3d978162a
    Author: Ilya Maximets
    Date:   Tue Aug 24 21:00:37 2021 +0200

        json: Add support for partially serialized json objects.

        Introducing a new json type JSON_SERIALIZED_OBJECT.  It's not an
        actual type that can be seen in a json message on a wire, but
        internal type that is intended to hold a serialized version of some
        other json object.  For this reason it's defined after the
        JSON_N_TYPES to not confuse parsers and other parts of the code
        that relies on compliance with RFC 4627.

        With this JSON type internal users may construct large JSON
        objects, parts of which are already serialized.  This way, while
        serializing the larger object, data from JSON_SERIALIZED_OBJECT can
        be added directly to the result, without additional processing.

        This will be used by next commits to add pre-serialized JSON data
        to the raft_header structure, that can be converted to a JSON
        before writing the file transaction on disk or sending to other
        servers.  Same technique can also be used to pre-serialize
        json_cache for ovsdb monitors, this should allow to not perform
        serialization for every client and will save some more memory.

        Since serialized JSON is just a string, reusing the 'json->string'
        pointer for it.

        Acked-by: Dumitru Ceara
        Acked-by: Han Zhou
        Signed-off-by: Ilya Maximets

    Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1990058
    Signed-off-by: Ilya Maximets

* Tue Aug 31 2021 Ilya Maximets - 2.16.0-3
- json: Optimize string serialization. [RH git: bb1654da63] (#1990069)
    commit 748010ff304b7cd2c43f4eb98a554433f0df07f9
    Author: Ilya Maximets
    Date:   Tue Aug 24 23:07:22 2021 +0200

        json: Optimize string serialization.

        Current string serialization code puts all characters one by one.
        This is slow because dynamic string needs to perform length checks
        on every ds_put_char() and it also doesn't allow the compiler to
        use better memory copy operations, i.e. doesn't allow copying few
        bytes at once.

        Special symbols are rare in a typical database.  Quotes are
        frequent, but not too frequent.  In databases created by
        ovn-kubernetes, for example, usually there are at least 10 to 50
        chars between quotes.
        So, it's better to count characters that don't require escaping
        and use fast data copy for the whole sequential block.

        Testing with a synthetic benchmark (included) on my laptop shows
        following performance improvement:

             Size      Q  S         Before        After      Diff
          -----------------------------------------------------
             100000    0  0 :     0.227 ms     0.142 ms   -37.4 %
             100000    2  1 :     0.277 ms     0.186 ms   -32.8 %
             100000   10  1 :     0.361 ms     0.309 ms   -14.4 %
           10000000    0  0 :    22.720 ms    12.160 ms   -46.4 %
           10000000    2  1 :    27.470 ms    19.300 ms   -29.7 %
           10000000   10  1 :    37.950 ms    31.250 ms   -17.6 %
          100000000    0  0 :   239.600 ms   126.700 ms   -47.1 %
          100000000    2  1 :   292.400 ms   188.600 ms   -35.4 %
          100000000   10  1 :   387.700 ms   321.200 ms   -17.1 %

        Here Q - probability (%) for a character to be a '\"' and
        S - probability (%) to be a special character ( < 32).

        Testing with a closer to real world scenario shows overall decrease
        of the time needed for database compaction by ~5-10 %.  And this
        change also decreases CPU consumption in general, because string
        serialization is used in many different places including ovsdb
        monitors and raft.

        Signed-off-by: Ilya Maximets
        Acked-by: Numan Siddique
        Acked-by: Dumitru Ceara

    Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1990069
    Signed-off-by: Ilya Maximets

* Fri Aug 20 2021 Open vSwitch CI - 2.16.0-2
- Merging upstream branch-2.16 [RH git: 7d7567e339]
    Commit list:
    0991ea8d19 Prepare for 2.16.1.

* Wed Aug 18 2021 Flavio Leitner - 2.16.0-1
- redhat: First 2.16.0 release. [RH git: 0a1c4276cc]
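
    The block-copy idea from the 2.16.0-3 entry above (json: Optimize
    string serialization) can be sketched as follows. This is a Python
    illustration of the technique only, not the C code in lib/json.c;
    `escape_char` and `serialize_string` are hypothetical names. Runs of
    characters that need no escaping are appended as one slice instead of
    one character at a time:

```python
import json

# Short escapes defined by JSON; other control characters use \u00XX.
_SHORT_ESCAPES = {
    '"': '\\"', '\\': '\\\\', '\b': '\\b', '\f': '\\f',
    '\n': '\\n', '\r': '\\r', '\t': '\\t',
}


def escape_char(ch):
    """Return the JSON escape for ch, or None if ch can be copied as-is."""
    if ch in _SHORT_ESCAPES:
        return _SHORT_ESCAPES[ch]
    if ch < ' ':
        return '\\u%04x' % ord(ch)
    return None


def serialize_string(s):
    """Serialize s as a JSON string, copying unescaped runs in bulk."""
    out = ['"']
    start = 0  # index of the first character of the current plain run
    for i, ch in enumerate(s):
        esc = escape_char(ch)
        if esc is None:
            continue              # keep extending the plain run
        out.append(s[start:i])    # flush the whole run with one copy
        out.append(esc)
        start = i + 1
    out.append(s[start:])
    out.append('"')
    return ''.join(out)


if __name__ == "__main__":
    text = 'name="lb1"\tvip\n'
    # Round-trip through the standard parser as a sanity check.
    print(json.loads(serialize_string(text)) == text)
```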