Open Virtual Machine Firmware (OVMF) Status Report
July 2014 (with updates in August 2014 - January 2015)

Author: Laszlo Ersek <lersek@redhat.com>
Copyright (C) 2014-2015, Red Hat, Inc.
CC BY-SA 4.0 <http://creativecommons.org/licenses/by-sa/4.0/>

Abstract
--------

The Unified Extensible Firmware Interface (UEFI) is a specification that
defines a software interface between an operating system and platform firmware.
UEFI is designed to replace the Basic Input/Output System (BIOS) firmware
interface.

Hardware platform vendors have been increasingly adopting the UEFI
Specification to govern their boot firmware developments. OVMF (Open Virtual
Machine Firmware), a sub-project of Intel's EFI Development Kit II (edk2),
enables UEFI support for Ia32 and X64 Virtual Machines.

This paper reports on the status of the OVMF project, treats features and
limitations, gives end-user hints, and examines some areas in-depth.

Keywords: ACPI, boot options, CSM, edk2, firmware, flash, fw_cfg, KVM, memory
map, non-volatile variables, OVMF, PCD, QEMU, reset vector, S3, Secure Boot,
Smbios, SMM, TianoCore, UEFI, VBE shim, Virtio

Table of Contents
-----------------

- Motivation
- Scope
- Example qemu invocation
- Installation of OVMF guests with virt-manager and virt-install
- Supported guest operating systems
- Compatibility Support Module (CSM)
- Phases of the boot process
- Project structure
- Platform Configuration Database (PCD)
- Firmware image structure
- S3 (suspend to RAM and resume)
- A comprehensive memory map of OVMF
- Known Secure Boot limitations
- Variable store and LockBox in SMRAM
- Select features
  - X64-specific reset vector for OVMF
  - Client library for QEMU's firmware configuration interface
  - Guest ACPI tables
  - Guest SMBIOS tables
  - Platform-specific boot policy
  - Virtio drivers
  - Platform Driver
  - Video driver
- Afterword

Motivation
----------

OVMF extends the usual benefits of virtualization to UEFI. Reasons to use OVMF
include:

- Legacy-free guests. A UEFI-based environment eliminates dependencies on
  legacy address spaces and devices. This is especially beneficial when used
  with physically assigned devices where the legacy operating mode is
  troublesome to support, e.g. assigned graphics cards operating in
  legacy-free, non-VGA mode in the guest.

- Future-proof guests. The x86 market is steadily moving towards a legacy-free
  platform and guest operating systems may eventually require a UEFI
  environment. OVMF provides that next-generation firmware support for such
  applications.

- GUID partition tables (GPTs). MBR partition tables represent partition
  offsets and sizes with 32-bit integers, in units of 512-byte sectors. This
  limits the addressable portion of the disk to 2 TB. GPT represents logical
  block addresses with 64 bits.

- Liberating boot loader binaries from residing in contested and poorly defined
  space between the partition table and the partitions.

- Support for booting off disks (e.g. pass-through physical SCSI devices) with
  a 4 KB physical and logical sector size, i.e. disks that lack 512-byte block
  emulation.

- Development and testing of Secure Boot-related features in guest operating
  systems. Although OVMF's Secure Boot implementation is currently not secure
  against malicious UEFI drivers, UEFI applications, and guest kernels,
  trusted guest code that only uses standard UEFI interfaces will find a valid
  Secure Boot environment under OVMF, with working key enrollment and signature
  validation. This enables development and testing of portable, Secure
  Boot-related guest code.

- Presence of non-volatile UEFI variables. This furthers development and
  testing of OS installers, UEFI boot loaders, and unique, dependent guest OS
  features. For example, an efivars-backed pstore (persistent storage)
  file system works under Linux.

- Altogether, a near production-level UEFI environment for virtual machines
  when Secure Boot is not required.
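
The 2 TB limit quoted for MBR partition tables in the list above follows from
simple arithmetic, which the following shell sketch reproduces. The function
name is ours and purely illustrative; the sketch assumes a shell with 64-bit
integer arithmetic, and "2 TB" is understood in binary units (2^41 bytes).

```shell
#!/bin/sh
# Bytes addressable with sector numbers of the given width (in bits) and
# sectors of the given size (in bytes).
max_addressable_bytes() {
    echo $(( (1 << $1) * $2 ))
}

# MBR: 32-bit sector numbers, 512-byte sectors.
mbr_limit=$(max_addressable_bytes 32 512)
echo "MBR limit: $mbr_limit bytes = $(( mbr_limit >> 40 )) TiB"
```

With 64-bit logical block addresses, GPT lifts this limit far beyond the size
of any disk a guest is likely to see.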

Scope
-----

UEFI and especially Secure Boot have been topics fraught with controversy and
political activism. This paper sidesteps these aspects and strives to focus on
use cases, hands-on information for end users, and technical details.

Unless stated otherwise, the expression "X supports Y" means "X is technically
compatible with interfaces provided or required by Y". It does not imply
support as an activity performed by natural persons or companies.

We discuss the status of OVMF at a state no earlier than edk2 SVN revision
16158. The paper concentrates on upstream projects and communities, but
occasionally it digresses to OVMF as it is planned to be shipped (as
Technical Preview) in Red Hat Enterprise Linux 7.1. Such digressions are marked
with the [RHEL] margin notation.

Although other VMMs and accelerators are known to support (or plan to support)
OVMF to various degrees (for example, VirtualBox, Xen, BHyVe), we'll
emphasize OVMF on qemu/KVM, because QEMU and KVM have always been Red Hat's
focus with respect to OVMF.

The recommended upstream QEMU version is 2.1+. The recommended host Linux
kernel (KVM) version is 3.10+. The recommended QEMU machine type is
"qemu-system-x86_64 -M pc-i440fx-2.1" or later.

The term "TianoCore" is used interchangeably with "edk2" in this paper.

Example qemu invocation
-----------------------

The following commands give a quick foretaste of installing a UEFI operating
system on OVMF, relying only on upstream edk2 and qemu.

- Clone and build OVMF:

  git clone https://github.com/tianocore/edk2.git
  cd edk2
  nice OvmfPkg/build.sh -a X64 -n $(getconf _NPROCESSORS_ONLN)

  (Note that this ad-hoc build will not include the Secure Boot feature.)

- The build output file, "OVMF.fd", includes not only the executable firmware
  code, but the non-volatile variable store as well. For this reason, make a
  VM-specific copy of the build output (the variable store should be private to
  the virtual machine):

  cp Build/OvmfX64/DEBUG_GCC4?/FV/OVMF.fd fedora.flash

  (The variable store and the firmware executable are also available in the
  build output as separate files: "OVMF_VARS.fd" and "OVMF_CODE.fd". This
  enables central management and updates of the firmware executable, while each
  virtual machine can retain its own variable store.)

- Download a Fedora LiveCD:

  wget https://dl.fedoraproject.org/pub/fedora/linux/releases/20/Live/x86_64/Fedora-Live-Xfce-x86_64-20-1.iso

- Create a virtual disk (qcow2 format, 20 GB in size):

  qemu-img create -f qcow2 fedora.img 20G

- Create the following qemu wrapper script under the name "fedora.sh":

  # Basic virtual machine properties: a recent i440fx machine type, KVM
  # acceleration, 2048 MB RAM, two VCPUs.
  OPTS="-M pc-i440fx-2.1 -enable-kvm -m 2048 -smp 2"

  # The OVMF binary, including the non-volatile variable store, appears as a
  # "normal" qemu drive on the host side, and it is exposed to the guest as a
  # persistent flash device.
  OPTS="$OPTS -drive if=pflash,format=raw,file=fedora.flash"

  # The hard disk is exposed to the guest as a virtio-block device. OVMF has a
  # driver stack that supports such a disk. We specify this disk as first boot
  # option. OVMF recognizes the boot order specification.
  OPTS="$OPTS -drive id=disk0,if=none,format=qcow2,file=fedora.img"
  OPTS="$OPTS -device virtio-blk-pci,drive=disk0,bootindex=0"

  # The Fedora installer disk appears as an IDE CD-ROM in the guest. This is
  # the 2nd boot option.
  OPTS="$OPTS -drive id=cd0,if=none,format=raw,readonly"
  OPTS="$OPTS,file=Fedora-Live-Xfce-x86_64-20-1.iso"
  OPTS="$OPTS -device ide-cd,bus=ide.1,drive=cd0,bootindex=1"

  # The following setting enables S3 (suspend to RAM). OVMF supports S3
  # suspend/resume.
  OPTS="$OPTS -global PIIX4_PM.disable_s3=0"

  # OVMF emits a number of info / debug messages to the QEMU debug console, at
  # ioport 0x402. We configure qemu so that the debug console is indeed
  # available at that ioport. We redirect the host side of the debug console to
  # a file.
  OPTS="$OPTS -global isa-debugcon.iobase=0x402 -debugcon file:fedora.ovmf.log"

  # QEMU accepts various commands and queries from the user on the monitor
  # interface. Connect the monitor with the qemu process's standard input and
  # output.
  OPTS="$OPTS -monitor stdio"

  # A USB tablet device in the guest allows for accurate pointer tracking
  # between the host and the guest.
  OPTS="$OPTS -device piix3-usb-uhci -device usb-tablet"

  # Provide the guest with a virtual network card (virtio-net).
  #
  # Normally, qemu provides the guest with a UEFI-conformant network driver
  # from the iPXE project, in the form of a PCI expansion ROM. For this test,
  # we disable the expansion ROM and allow OVMF's built-in virtio-net driver to
  # take effect.
  #
  # On the host side, we use the SLIRP ("user") network backend, which has
  # relatively low performance, but it doesn't require extra privileges from
  # the user executing qemu.
  OPTS="$OPTS -netdev id=net0,type=user"
  OPTS="$OPTS -device virtio-net-pci,netdev=net0,romfile="

  # A Spice QXL GPU is recommended as the primary VGA-compatible display
  # device. It is a full-featured virtual video card, with great operating
  # system driver support. OVMF supports it too.
  OPTS="$OPTS -device qxl-vga"

  qemu-system-x86_64 $OPTS

- Start the Fedora guest:

  sh fedora.sh

- The above command can be used for both installation and later boots of the
  Fedora guest.

- In order to verify basic OVMF network connectivity:

  - Assuming that the non-privileged user running qemu belongs to group G
    (where G is a numeric identifier), ensure as root on the host that the
    group range in file "/proc/sys/net/ipv4/ping_group_range" includes G.

  - As the non-privileged user, boot the guest as usual.

  - On the TianoCore splash screen, press ESC.

  - Navigate to Boot Manager | EFI Internal Shell

  - In the UEFI Shell, issue the following commands:

    ifconfig -s eth0 dhcp
    ping A.B.C.D

    where A.B.C.D is a public IPv4 address in dotted decimal notation that your
    host can reach.

  - Type "quit" at the (qemu) monitor prompt.
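
As noted in the steps above, the build also produces "OVMF_CODE.fd" and
"OVMF_VARS.fd" as separate files. The following sketch shows one way the
wrapper script might attach them as two flash units instead of the unified
"fedora.flash" image: the firmware executable read-only, and a VM-specific
copy of the variable store writable. The helper function and the file name
"fedora.vars.fd" are our assumptions for illustration. The exact spelling of
the read-only flag ("readonly" versus "readonly=on") depends on the QEMU
version, so please check it against your QEMU's documentation.

```shell
#!/bin/sh
# Print the two -drive options for a split firmware image:
# unit 0 = shared, read-only firmware code; unit 1 = private variable store.
pflash_opts() {
    printf '%s %s\n' \
        "-drive if=pflash,format=raw,unit=0,readonly=on,file=$1" \
        "-drive if=pflash,format=raw,unit=1,file=$2"
}

# Hypothetical use in fedora.sh, after making a private copy of the
# variable store template:
#   cp Build/OvmfX64/DEBUG_GCC4?/FV/OVMF_VARS.fd fedora.vars.fd
#   OPTS="$OPTS $(pflash_opts OVMF_CODE.fd fedora.vars.fd)"
pflash_opts OVMF_CODE.fd fedora.vars.fd
```

In this arrangement only the small variable store needs to be copied per
virtual machine, and the firmware executable can be updated in one place.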

Installation of OVMF guests with virt-manager and virt-install
--------------------------------------------------------------

(1) Assuming OVMF has been installed on the host with the following files:
    - /usr/share/OVMF/OVMF_CODE.fd
    - /usr/share/OVMF/OVMF_VARS.fd

    locate the "nvram" stanza in "/etc/libvirt/qemu.conf", and edit it as
    follows:

    nvram = [ "/usr/share/OVMF/OVMF_CODE.fd:/usr/share/OVMF/OVMF_VARS.fd" ]

(2) Restart libvirtd with your Linux distribution's service management tool;
    for example,

    systemctl restart libvirtd

(3) In virt-manager, proceed with the guest installation as usual:
    - select File | New Virtual Machine,
    - advance to Step 5 of 5,
    - in Step 5, check "Customize configuration before install",
    - click Finish;
    - in the customization dialog, select Overview | Firmware, and choose UEFI,
    - click Apply and Begin Installation.

(4) With virt-install:

    LDR="loader=/usr/share/OVMF/OVMF_CODE.fd,loader_ro=yes,loader_type=pflash"
    virt-install \
      --name fedora20 \
      --memory 2048 \
      --vcpus 2 \
      --os-variant fedora20 \
      --boot hd,cdrom,$LDR \
      --disk size=20 \
      --disk path=Fedora-Live-Xfce-x86_64-20-1.iso,device=cdrom,bus=scsi

(5) A popular, distribution-independent, bleeding-edge OVMF package is
    available under <https://www.kraxel.org/repos/>, courtesy of Gerd Hoffmann.

    The "edk2.git-ovmf-x64" package provides the following files, among others:
    - /usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd
    - /usr/share/edk2.git/ovmf-x64/OVMF_VARS-pure-efi.fd

    When using this package, adapt steps (1) and (4) accordingly.

(6) Additionally, the "edk2.git-ovmf-x64" package seeks to simplify the
    enablement of Secure Boot in a virtual machine (strictly for development
    and testing purposes).

    - Boot the virtual machine off the CD-ROM image called
      "/usr/share/edk2.git/ovmf-x64/UefiShell.iso", either before or after
      installing the main guest operating system.

    - When the UEFI shell appears, issue the following commands:

      EnrollDefaultKeys.efi
      reset -s

    - The EnrollDefaultKeys.efi utility enrolls the following keys:

      - A static example X.509 certificate (CN=TestCommonName) as Platform Key
        and first Key Exchange Key.

        The private key matching this certificate has been destroyed (but you
        shouldn't trust this statement).

      - "Microsoft Corporation KEK CA 2011" as second Key Exchange Key
        (SHA1: 31:59:0b:fd:89:c9:d7:4e:d0:87:df:ac:66:33:4b:39:31:25:4b:30).

      - "Microsoft Windows Production PCA 2011" as first DB entry
        (SHA1: 58:0a:6f:4c:c4:e4:b6:69:b9:eb:dc:1b:2b:3e:08:7b:80:d0:67:8d).

      - "Microsoft Corporation UEFI CA 2011" as second DB entry
        (SHA1: 46:de:f6:3b:5c:e6:1c:f8:ba:0d:e2:e6:63:9c:10:19:d0:ed:14:f3).

      These keys suffice to boot released versions of popular Linux
      distributions (through the shim.efi utility), and Windows 8 and Windows
      Server 2012 R2, in Secure Boot mode.
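
Readers who obtain one of the certificates listed in step (6) from another
source may wish to compare its SHA1 fingerprint with the value printed above.
The sketch below normalizes a fingerprint line of the kind printed by
"openssl x509 -fingerprint -sha1" to the lowercase, colon-separated form used
in this paper. The function names and the file name "kek.der" are
hypothetical; the sample line simply restates the KEK fingerprint from step
(6) in uppercase.

```shell
#!/bin/sh
# Reduce "SHA1 Fingerprint=AA:BB:..." to "aa:bb:...".
normalize_fingerprint() {
    printf '%s\n' "$1" | sed 's/^.*=//' | tr 'A-F' 'a-f'
}

# Succeed if the fingerprint line ($1) matches the expected value ($2).
fingerprint_matches() {
    [ "$(normalize_fingerprint "$1")" = "$2" ]
}

# Hypothetical source of the line, assuming kek.der is a DER-encoded copy of
# the certificate:
#   line=$(openssl x509 -inform DER -in kek.der -noout -fingerprint -sha1)
line='SHA1 Fingerprint=31:59:0B:FD:89:C9:D7:4E:D0:87:DF:AC:66:33:4B:39:31:25:4B:30'
expected='31:59:0b:fd:89:c9:d7:4e:d0:87:df:ac:66:33:4b:39:31:25:4b:30'

if fingerprint_matches "$line" "$expected"; then
    echo "fingerprint matches"
else
    echo "fingerprint MISMATCH"
fi
```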

Supported guest operating systems
---------------------------------

Upstream OVMF does not favor some guest operating systems over others for
political or ideological reasons. However, some operating systems are harder to
obtain and/or technically more difficult to support. The general expectation is
that recent UEFI OSes should just work. Please consult the "OvmfPkg/README"
file.

The following guest OSes were tested with OVMF:
- Red Hat Enterprise Linux 6
- Red Hat Enterprise Linux 7
- Fedora 18
- Fedora 19
- Fedora 20
- Windows Server 2008 R2 SP1
- Windows Server 2012
- Windows 8

Notes about Windows Server 2008 R2 (paraphrasing the "OvmfPkg/README" file):

- QEMU should be started with one of the "-device qxl-vga" and "-device VGA"
  options.

- Only one video mode, 1024x768x32, is supported at OS runtime.

  Please refer to the section about QemuVideoDxe (OVMF's built-in video driver)
  for more details on this limitation.

- The qxl-vga video card is recommended ("-device qxl-vga"). After booting the
  installed guest OS, select the video card in Device Manager, and upgrade the
  video driver to the QXL XDDM one.

  The QXL XDDM driver can be downloaded from
  <http://www.spice-space.org/download.html>, under Guest | Windows binaries.

  This driver enables additional graphics resolutions at OS runtime, and
  provides S3 (suspend/resume) capability.

Notes about Windows Server 2012 and Windows 8:

- QEMU should be started with the "-device qxl-vga,revision=4" option (or a
  later revision, if available).

- The guest OS's built-in video driver inherits the video mode / frame buffer
  from OVMF. There's no way to change the resolution at OS runtime.

  For this reason, a platform driver has been developed for OVMF, which allows
  users to change the preferred video mode in the firmware. Please refer to the
  section about PlatformDxe for details.

- It is recommended to upgrade the guest OS's video driver to the QXL WDDM one,
  via Device Manager.

  Binaries for the QXL WDDM driver can be found at
  <http://people.redhat.com/~vrozenfe/qxlwddm> (pick a version greater than or
  equal to 0.6), while the source code resides at
  <https://github.com/vrozenfe/qxl-dod>.

  This driver enables additional graphics resolutions at OS runtime, and
  provides S3 (suspend/resume) capability.

Compatibility Support Module (CSM)
----------------------------------

Collaboration between SeaBIOS and OVMF developers has enabled SeaBIOS to be
built as a Compatibility Support Module, and OVMF to embed and use it.

Benefits of a SeaBIOS CSM include:

- The ability to boot legacy (non-UEFI) operating systems, such as legacy Linux
  systems, Windows 7, OpenBSD 5.2, FreeBSD 8/9, NetBSD, DragonflyBSD, Solaris
  10/11.

- Legacy (non-UEFI-compliant) PCI expansion ROMs, such as a VGA BIOS, mapped by
  QEMU in emulated devices' ROM BARs, are loaded and executed by OVMF.

  For example, this grants the Windows Server 2008 R2 SP1 guest's native,
  legacy video driver access to all modes of all QEMU video cards.

Building the CSM target of the SeaBIOS source tree is out of scope for this
report. Additionally, upstream OVMF does not enable the CSM by default.

Interested users and developers should look for OVMF's "-D CSM_ENABLE"
build-time option, and check out the <https://www.kraxel.org/repos/> continuous
integration repository, which provides CSM-enabled OVMF builds.

[RHEL] The "OVMF_CODE.fd" firmware image made available on the Red Hat
       Enterprise Linux 7.1 host does not include a Compatibility Support
       Module, for the following reasons:

       - Virtual machines running officially supported, legacy guest operating
         systems should just use the standalone SeaBIOS firmware. Firmware
         selection is flexible in virtualization, see e.g. "Installation of
         OVMF guests with virt-manager and virt-install" above.

       - The 16-bit thunking interface between OVMF and SeaBIOS is very complex
         and presents a large debugging and support burden, based on past
         experience.

       - Secure Boot is incompatible with CSM.

       - Inter-project dependencies should be minimized whenever possible.

       - Using the default QXL video card, the Windows Server 2008 R2 SP1 guest
         can be installed with its built-in, legacy video driver. Said driver
         will select the only available video mode, 1024x768x32. After
         installation, the video driver can be upgraded to the full-featured
         QXL XDDM driver.

Phases of the boot process
--------------------------

The PI and UEFI specifications, and Intel's UEFI and EDK II Learning and
Development materials provide ample information on PI and UEFI concepts. The
following is an absolutely minimal, rough glossary that is included only to
help readers new to PI and UEFI understand references in later, OVMF-specific
sections. We defer heavily to the official specifications and the training
materials, and frequently quote them below.

A central concept to mention early is the GUID -- globally unique identifier. A
GUID is a 128-bit number, written as XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX,
where each X stands for a hexadecimal nibble. GUIDs are used to name everything
in PI and in UEFI. Programmers introduce new GUIDs with the "uuidgen" utility,
and standards bodies standardize well-known services by positing their GUIDs.
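
As a small illustration of the textual GUID format described above, the
following sketch checks whether a string consists of 8-4-4-4-12 hexadecimal
nibbles (32 nibbles, that is, 128 bits). The function name is ours; the sample
value is the GUID that this paper quotes for EFI_BLOCK_IO_PROTOCOL.

```shell
#!/bin/sh
# Succeed if $1 has the textual GUID form: groups of 8, 4, 4, 4 and 12
# hexadecimal digits, separated by hyphens.
is_guid() {
    printf '%s\n' "$1" | grep -Eq \
        '^[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}$'
}

for candidate in 964E5B21-6459-11D2-8E39-00A0C969723B not-a-guid; do
    if is_guid "$candidate"; then
        echo "$candidate: well-formed"
    else
        echo "$candidate: malformed"
    fi
done
```

A freshly generated value from "uuidgen" should pass the same check.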

The boot process is roughly divided into the following phases:

- Reset vector code.

- SEC: Security phase. This phase is the root of firmware integrity.

- PEI: Pre-EFI Initialization. This phase performs "minimal processor, chipset
  and platform configuration for the purpose of discovering memory". Modules in
  PEI collectively save their findings about the platform in a list of HOBs
  (hand-off blocks).

  When developing PEI code, the Platform Initialization (PI) specification
  should be consulted.

- DXE: Driver eXecution Environment, pronounced as "Dixie". This "is the phase
  where the bulk of the booting occurs: devices are enumerated and initialized,
  UEFI services are supported, and protocols and drivers are implemented. Also,
  the tables that create the UEFI interface are produced".

  On the PEI/DXE boundary, the HOBs produced by PEI are consumed. For example,
  this is how the memory space map is configured initially.

- BDS: Boot Device Selection. It is "responsible for determining how and where
  you want to boot the operating system".

  When developing DXE and BDS code, it is mainly the UEFI specification that
  should be consulted. When speaking about DXE, BDS is frequently considered to
  be a part of it.

The following concepts are tied to specific boot process phases:

- PEIM: a PEI Module (pronounced "PIM"). A binary module running in the PEI
  phase, consuming some PPIs and producing other PPIs, and producing HOBs.

- PPI: PEIM-to-PEIM interface. A structure of function pointers and related
  data members that establishes a PEI service, or an instance of a PEI service.
  PPIs are identified by GUID.

  An example is EFI_PEI_S3_RESUME2_PPI (6D582DBC-DB85-4514-8FCC-5ADF6227B147).

- DXE driver: a binary module running in the DXE and BDS phases, consuming some
  protocols and producing other protocols.

- Protocol: A structure of function pointers and related data members that
  establishes a DXE service, or an instance of a DXE service. Protocols are
  identified by GUID.

  An example is EFI_BLOCK_IO_PROTOCOL (964E5B21-6459-11D2-8E39-00A0C969723B).

- Architectural protocols: a set of standard protocols that are foundational to
  the working of a UEFI system. Each architectural protocol has at most one
  instance. Architectural protocols are implemented by a subset of DXE drivers.
  DXE drivers explicitly list the set of protocols (including architectural
  protocols) that they need to work. UEFI drivers can only be loaded once all
  architectural protocols have become available during the DXE phase.

  An example is EFI_VARIABLE_WRITE_ARCH_PROTOCOL
  (6441F818-6362-4E44-B570-7DBA31DD2453).

Project structure
-----------------

The term "OVMF" usually denotes the project (community and development effort)
that provides and maintains the subject matter UEFI firmware for virtual
machines. However, the term is also frequently applied to the firmware binary
proper that a virtual machine executes.

OVMF emerges as a compilation of several modules from the edk2 source
repository. "edk2" stands for EFI Development Kit II; it is a "modern,
feature-rich, cross-platform firmware development environment for the UEFI and
PI specifications".

The composition of OVMF is dictated by the following build control files:

  OvmfPkg/OvmfPkgIa32.dsc
  OvmfPkg/OvmfPkgIa32.fdf

  OvmfPkg/OvmfPkgIa32X64.dsc
  OvmfPkg/OvmfPkgIa32X64.fdf

  OvmfPkg/OvmfPkgX64.dsc
  OvmfPkg/OvmfPkgX64.fdf

The format of these files is described in the edk2 DSC and FDF specifications.
Roughly, the DSC file determines:
- library instance resolutions for library class requirements presented by the
  modules to be compiled,
- the set of modules to compile.

The FDF file roughly determines:
- what binary modules (compilation output files, precompiled binaries, graphics
  image files, verbatim binary sections) to include in the firmware image,
- how to lay out the firmware image.

The Ia32 flavor of these files builds a firmware where both PEI and DXE phases
are 32-bit. The Ia32X64 flavor builds a firmware where the PEI phase consists
of 32-bit modules, and the DXE phase is 64-bit. The X64 flavor builds a purely
64-bit firmware.

The word size of the DXE phase must match the word size of the runtime OS -- a
32-bit DXE can't cooperate with a 64-bit OS, and a 64-bit DXE can't work with
a 32-bit OS.
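
The relationship between the three build flavors and the guest OS word size
can be summarized in a few lines of shell. This is only a restatement of the
two paragraphs above; the function names are ours and do not correspond to
anything in the edk2 tree.

```shell
#!/bin/sh
# Word size (in bits) of the DXE phase for each OVMF build flavor.
dxe_bits() {
    case $1 in
        Ia32)          echo 32 ;;
        Ia32X64 | X64) echo 64 ;;
        *)             echo "unknown flavor: $1" >&2; return 1 ;;
    esac
}

# Succeed if a guest OS of the given word size ($2) can run on flavor $1.
flavor_supports_os() {
    bits=$(dxe_bits "$1") || return 1
    [ "$bits" = "$2" ]
}

for flavor in Ia32 Ia32X64 X64; do
    echo "OvmfPkg/OvmfPkg$flavor.dsc: $(dxe_bits "$flavor")-bit DXE"
done
```

Note that Ia32X64 and X64 are equivalent from the guest OS's point of view;
they differ only in the word size of the PEI phase.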

OVMF pulls together modules from across the edk2 tree. For example:

- common drivers and libraries that are platform independent are usually
  located under MdeModulePkg and MdePkg,

- common but hardware-specific drivers and libraries that match QEMU's
  pc-i440fx-* machine type are pulled in from IntelFrameworkModulePkg,
  PcAtChipsetPkg and UefiCpuPkg,

- the platform independent UEFI Shell is built from ShellPkg,

- OvmfPkg includes drivers and libraries that are useful for virtual machines
  and may or may not be specific to QEMU's pc-i440fx-* machine type.

Platform Configuration Database (PCD)
-------------------------------------

Like the "Phases of the boot process" section, this one introduces a concept in
very raw form. We defer to the PCD-related edk2 specifications, and we won't
discuss implementation details here. Our purpose is only to offer the reader a
usable (albeit possibly inaccurate) definition, so that we can refer to PCDs
later on.

Colloquially, when we say "PCD", we actually mean "PCD entry"; that is, an
entry stored in the Platform Configuration Database.

The Platform Configuration Database is
- a firmware-wide
- name-value store
- of scalars and buffers
- where each entry may be
  - build-time constant, or
  - run-time dynamic, or
  - theoretically, a middle option: patchable in the firmware file itself,
    using a dedicated tool. (OVMF does not utilize externally patchable
    entries.)

A PCD entry is declared in the DEC file of the edk2 top-level Package directory
whose modules (drivers and libraries) are the primary consumers of the PCD
entry. (See for example OvmfPkg/OvmfPkg.dec). Basically, a PCD in a DEC file
exposes a simple customization point.

Interest in a PCD entry is communicated to the build system by naming the PCD
entry in the INF file of the interested module (application, driver or
library). The module may read and -- dependent on the PCD entry's category --
write the PCD entry.

Let's investigate the characteristics of the Database and the PCD entries.

- Firmware-wide: technically, all modules may access all entries they are
  interested in, assuming they advertise their interest in their INF files.
  With careful design, PCDs enable inter-driver propagation of (simple) system
  configuration. PCDs are available in both PEI and DXE.

  (UEFI drivers meant to be portable (i.e. from third-party vendors) are not
  supposed to use PCDs, since PCDs qualify as internal to the specific edk2
  firmware in question.)

- Name-value store of scalars and buffers: each PCD has a symbolic name, and a
  fixed scalar type (UINT16, UINT32 etc), or VOID* for buffers. Each PCD entry
  belongs to a namespace, where a namespace is (obviously) a GUID, defined in
  the DEC file.

- A DEC file can permit several categories for a PCD:
  - build-time constant ("FixedAtBuild"),
  - patchable in the firmware image ("PatchableInModule", unused in OVMF),
  - runtime modifiable ("Dynamic").

The platform description file (DSC) of a top-level Package directory may choose
the exact category for a given PCD entry that its modules wish to use, and
assign a default (or constant) initial value to it.

In addition, the edk2 build system too can initialize PCD entries to values
that it calculates while laying out the flash device image. Such PCD
assignments are described in the FDF control file.

Firmware image structure
------------------------

(We assume the common X64 choice for both PEI and DXE, and the default DEBUG
build target.)

The OvmfPkg/OvmfPkgX64.fdf file defines the following layout for the flash
device image "OVMF.fd":

  Description                     Compression type        Size
  ------------------------------  ----------------------  -------
  Non-volatile data storage       open-coded binary data   128 KB
    Variable store                                          56 KB
    Event log                                                4 KB
    Working block                                            4 KB
    Spare area                                              64 KB

  FVMAIN_COMPACT                  uncompressed            1712 KB
    FV Firmware File System file  LZMA compressed
      PEIFV                       uncompressed             896 KB
        individual PEI modules    uncompressed
      DXEFV                       uncompressed            8192 KB
        individual DXE modules    uncompressed

  SECFV                           uncompressed             208 KB
    SEC driver
    reset vector code

The top-level image consists of three regions (three firmware volumes):
- non-volatile data store (128 KB),
- main firmware volume (FVMAIN_COMPACT, 1712 KB),
- firmware volume containing the reset vector code and the SEC phase code (208
  KB).

In total, the OVMF.fd file has size 128 KB + 1712 KB + 208 KB == 2 MB.

(1) The firmware volume with non-volatile data store (128 KB) has the following
    internal structure, in blocks of 4 KB:

            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  L: event log
       LIVE | varstore                  |L|W|  W: working block
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      SPARE |                               |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    The first half of this firmware volume is "live", while the second half is
    "spare". The spare half is important when the variable driver reclaims
    unused storage and reorganizes the variable store.

    The live half dedicates 14 blocks (56 KB) to the variable store itself. On
    top of those, one block is set aside for an event log, and one block is
    used as the working block of the fault tolerant write protocol. Fault
    tolerant writes are used to recover from an occasional (virtual) power loss
    during variable updates.

    The blocks in this firmware volume are accessed, in stacking order from
    least abstract to most abstract, by:

    - EFI_FIRMWARE_VOLUME_BLOCK_PROTOCOL (provided by
      OvmfPkg/QemuFlashFvbServicesRuntimeDxe),

    - EFI_FAULT_TOLERANT_WRITE_PROTOCOL (provided by
      MdeModulePkg/Universal/FaultTolerantWriteDxe),

    - architectural protocols instrumental to the runtime UEFI variable
      services:
      - EFI_VARIABLE_ARCH_PROTOCOL,
      - EFI_VARIABLE_WRITE_ARCH_PROTOCOL.

      In a non-secure boot build, the DXE driver providing these architectural
      protocols is MdeModulePkg/Universal/Variable/RuntimeDxe. In a secure boot
      build, where authenticated variables are available, the DXE driver
      offering these protocols is SecurityPkg/VariableAuthenticated/RuntimeDxe.

(2) The main firmware volume (FVMAIN_COMPACT, 1712 KB) embeds further firmware
    volumes. The outermost layer is a Firmware File System (FFS), carrying a
    single file. This file holds an LZMA-compressed section, which embeds two
    firmware volumes: PEIFV (896 KB) with PEIMs, and DXEFV (8192 KB) with DXE
    and UEFI drivers.

    This scheme enables us to build 896 KB worth of PEI drivers and 8192 KB
    worth of DXE and UEFI drivers, compress them all with LZMA in one go, and
    store the compressed result in 1712 KB, saving room in the flash device.

(3) The SECFV firmware volume (208 KB) is not compressed. It carries the
    "volume top file" with the reset vector code, to end at 4 GB in
    guest-physical address space, and the SEC phase driver (OvmfPkg/Sec).

    The last 16 bytes of the volume top file (mapped directly under 4 GB)
    contain a NOP slide and a jump instruction. This is where QEMU starts
    executing the firmware, at address 0xFFFF_FFF0. The reset vector and the
    SEC driver run from flash directly.

    The SEC driver locates FVMAIN_COMPACT in the flash, and decompresses the
    main firmware image to RAM. The rest of OVMF (PEI, DXE, BDS phases) runs
    from RAM.
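The sizes quoted in (1) through (3) can be cross-checked with a few lines of
arithmetic. The following Python sketch is an illustration only; the variable
names are made up for this example and do not appear in edk2. The ratio it
prints merely compares the declared volume sizes, it is not a measurement of
how well the actual modules compress.

```python
KB = 1024
BLOCK = 4 * KB

# (1) Non-volatile data store: live half plus spare half, in 4 KB blocks.
varstore_blocks, event_log_blocks, working_blocks = 14, 1, 1
live_blocks = varstore_blocks + event_log_blocks + working_blocks
spare_blocks = live_blocks  # the spare half has the same size as the live half
nv_store_size = (live_blocks + spare_blocks) * BLOCK

# (2) FVMAIN_COMPACT: flash region vs. the volumes decompressed from it.
fvmain_compact_size = 1712 * KB
decompressed_size = (896 + 8192) * KB  # PEIFV + DXEFV

# (3) SECFV ends exactly at 4 GB; execution starts 16 bytes below that.
secfv_size = 208 * KB
four_gb = 1 << 32
reset_vector = four_gb - 16

image_size = nv_store_size + fvmain_compact_size + secfv_size

print(nv_store_size // KB, "KB non-volatile data store")
print(image_size // KB, "KB total image size")
print(f"PEIFV + DXEFV: {decompressed_size // KB} KB, "
      f"{decompressed_size / fvmain_compact_size:.1f}x the flash region")
print(f"reset vector at {reset_vector:#x}")
```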

As already mentioned, the OVMF.fd file is mapped by qemu's
"hw/block/pflash_cfi01.c" device just under 4 GB in guest-physical address
space, according to the command line option

  -drive if=pflash,format=raw,file=fedora.flash

(refer to the Example qemu invocation). This is a "ROMD device", which can
switch out of "ROMD mode" and back into it.

Namely, in the default ROMD mode, the guest-physical address range backed by
the flash device reads and executes as ROM (it does not trap from KVM to QEMU).
The first write access in this mode traps to QEMU, and flips the device out of
ROMD mode.

In non-ROMD mode, the flash chip is programmed by storing CFI (Common Flash
Interface) command values at the flash-covered addresses; both reads and writes
trap to QEMU, and the flash contents are modified and synchronized to the
host-side file. A special CFI command flips the flash device back to ROMD mode.

Qemu implements the above based on the KVM_CAP_READONLY_MEM / KVM_MEM_READONLY
KVM features, and OVMF puts it to use in its EFI_FIRMWARE_VOLUME_BLOCK_PROTOCOL
implementation, under "OvmfPkg/QemuFlashFvbServicesRuntimeDxe".
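The mode switching described above can be pictured with a toy model. The
Python sketch below is not QEMU's implementation; the class, its fields, the
status value returned in non-ROMD mode, and the two command values (0x10 for
"program one byte", 0xFF for "return to reading the array") are assumptions
chosen for this illustration.

```python
class ToyFlash:
    """Toy model of a ROMD flash device; not QEMU's actual implementation."""

    # Command values are assumptions made for this sketch.
    WRITE_BYTE_CMD = 0x10
    READ_ARRAY_CMD = 0xFF

    def __init__(self, size):
        self.data = bytearray(b"\xff" * size)
        self.romd = True            # default mode: reads behave like ROM
        self.pending_write = False  # a "program byte" command awaits its data
        self.traps = 0              # number of accesses that were emulated

    def read(self, offset):
        if self.romd:
            return self.data[offset]  # direct access, no trap
        self.traps += 1
        return 0x80                   # toy status value, not real contents

    def write(self, offset, value):
        self.traps += 1               # every write is emulated
        self.romd = False             # a write leaves ROMD mode
        if self.pending_write:
            self.data[offset] = value  # data cycle: program one byte
            self.pending_write = False
        elif value == self.WRITE_BYTE_CMD:
            self.pending_write = True
        elif value == self.READ_ARRAY_CMD:
            self.romd = True          # back to ROM-like behaviour


flash = ToyFlash(16)
flash.read(0)                              # ROMD mode: does not trap
flash.write(3, ToyFlash.WRITE_BYTE_CMD)    # leaves ROMD mode
flash.write(3, 0x42)                       # programs the byte
flash.write(0, ToyFlash.READ_ARRAY_CMD)    # returns to ROMD mode
print(hex(flash.read(3)), flash.romd, flash.traps)
```

In this model only the three writes are counted as traps; both reads happen in
ROMD mode and go straight to the backing array.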

IMPORTANT: Never pass OVMF.fd to qemu with the -bios option. That option maps
the firmware image as ROM into the guest's address space, and forces OVMF to
emulate non-volatile variables with a fallback driver that is bound to have
insufficient and confusing semantics.

The 128 KB firmware volume with the variable store, discussed under (1), is
also built as a separate host-side file, named "OVMF_VARS.fd". The "rest" is
built into a third file, "OVMF_CODE.fd", which is only 1920 KB in size. The
variable store is mapped into its usual location, at 4 GB - 2 MB = 0xFFE0_0000,
through the following qemu options:

  -drive if=pflash,format=raw,readonly,file=OVMF_CODE.fd   \
  -drive if=pflash,format=raw,file=fedora.varstore.fd

This way qemu configures two flash chips consecutively, with start addresses
growing downwards, which is transparent to OVMF.
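As a sanity check of the addresses involved, the following Python sketch
stacks the two flash chips downwards from 4 GB, in the order of the -drive
options. The function name is hypothetical, and the placement rule (first
chip ends at 4 GB, each further chip sits directly below the previous one) is
our reading of the paragraph above rather than code taken from qemu.

```python
KB = 1024
FOUR_GB = 1 << 32


def map_flash_chips(sizes):
    """Return (base, end) pairs; the first chip ends at 4 GB, later ones sit below."""
    ranges = []
    top = FOUR_GB
    for size in sizes:
        base = top - size
        ranges.append((base, top))
        top = base
    return ranges


# Sizes of OVMF_CODE.fd and of the variable store file, in -drive order.
code_range, vars_range = map_flash_chips([1920 * KB, 128 * KB])
print(f"OVMF_CODE.fd:   {code_range[0]:#010x} .. {code_range[1]:#x}")
print(f"variable store: {vars_range[0]:#010x} .. {vars_range[1]:#x}")
```

The variable store ends where OVMF_CODE.fd begins, so the guest sees the same
contiguous 2 MB range as with the unified OVMF.fd file.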

[RHEL] Red Hat Enterprise Linux 7.1 ships a Secure Boot-enabled, X64, DEBUG
       firmware only. Furthermore, only the split files ("OVMF_VARS.fd" and
       "OVMF_CODE.fd") are available.

S3 (suspend to RAM and resume)
------------------------------

As noted in Example qemu invocation, the

  -global PIIX4_PM.disable_s3=0

command line option tells qemu and OVMF whether the user would like to enable
S3 support. (This corresponds to the /domain/pm/suspend-to-mem/@enabled libvirt
domain XML attribute.)

Implementing / orchestrating S3 was a considerable community effort in OVMF. A
detailed description exceeds the scope of this report; we only make a few
statements.

(1) S3-related PPIs and protocols are well documented in the PI specification.

(2) Edk2 contains most modules that are needed to implement S3 on a given
    platform. One abstraction that is central to the porting / extending of the
    S3-related modules to a new platform is the LockBox library interface,
    which a specific platform can fill in by implementing its own LockBox
    library instance.

    The LockBox library provides a privileged name-value store (to be addressed
    by GUIDs). The privilege separation stretches between the firmware and the
    operating system. That is, the S3-related machinery of the firmware saves
    some items in the LockBox securely, under well-known GUIDs, before booting
    the operating system. During resume (which is a form of warm reset), the
    firmware is activated again, and retrieves items from the LockBox. Before
    jumping to the OS's resume vector, the LockBox is secured again.

    We'll return to this later when we separately discuss SMRAM and SMM.

(3) During resume, the DXE and later phases are never reached; only the reset
    vector, and the SEC and PEI phases of the firmware run. The platform is
    supposed to detect a resume in progress during PEI, and to store that fact
    in the BootMode field of the Phase Handoff Information Table (PHIT) HOB.
    OVMF keys this off the CMOS, see OvmfPkg/PlatformPei.

    At the end of PEI, the DXE IPL PEIM (Initial Program Load PEI Module, see
    MdeModulePkg/Core/DxeIplPeim) examines the Boot Mode, and if it says "S3
    resume in progress", then the IPL branches to the PEIM that exports
    EFI_PEI_S3_RESUME2_PPI (provided by UefiCpuPkg/Universal/Acpi/S3Resume2Pei)
    rather than loading the DXE core.

    S3Resume2Pei executes the technical steps of the resumption, relying on the
    contents of the LockBox.

(4) During first boot (or after a normal platform reset), when DXE does run,
    hardware drivers in the DXE phase are encouraged to "stash" their hardware
    configuration steps (eg. accesses to PCI config space, I/O ports, memory
    mapped addresses, and so on) in a centrally maintained, so called "S3 boot
    script". Hardware accesses are represented with opcodes of a special binary
    script language.

    This boot script is to be replayed during resume, by S3Resume2Pei. The
    general goal is to bring back hardware devices -- which have been powered
    off during suspend -- to their original after-first-boot state, and in
    particular, to do so quickly.

    At the moment, OVMF saves only one opcode in the S3 resume boot script: an
    INFORMATION opcode, with contents 0xDEADBEEF (in network byte order). The
    consensus among Linux developers seems to be that boot firmware is only
    responsible for restoring basic chipset state, which OVMF does during PEI
    anyway, independently of S3 vs. normal reset. (One example is the power
    management registers of the i440fx chipset.) Device and peripheral state is
    the responsibility of the runtime operating system.

    Although an experimental OVMF S3 boot script was at one point captured for
    the virtual Cirrus VGA card, such a boot script cannot follow eg. video
    mode changes effected by the OS. Hence the operating system can never avoid
    restoring device state, and most Linux display drivers (eg. stdvga, QXL)
    already cover S3 resume fully.

    The XDDM and WDDM driver models used under Windows OSes seem to recognize
    this notion of runtime OS responsibility as well. (See the list of OSes
    supported by OVMF in a separate section.)

(5) The S3 suspend/resume data flow in OVMF is included here tersely, for
    interested developers.

    (a) BdsLibBootViaBootOption()
          EFI_ACPI_S3_SAVE_PROTOCOL [AcpiS3SaveDxe]
          - saves ACPI S3 Context to LockBox  ---------------------+
            (including FACS address -- FACS ACPI table             |
            contains OS waking vector)                             |
                                                                   |
          - prepares boot script:                                  |
            EFI_S3_SAVE_STATE_PROTOCOL.Write() [S3SaveStateDxe]    |
              S3BootScriptLib [PiDxeS3BootScriptLib]               |
              - opcodes & arguments are saved in NVS.  --+         |
                                                         |         |
          - issues a notification by installing          |         |
            EFI_DXE_SMM_READY_TO_LOCK_PROTOCOL           |         |
                                                         |         |
    (b) EFI_S3_SAVE_STATE_PROTOCOL [S3SaveStateDxe]      |         |
          S3BootScriptLib [PiDxeS3BootScriptLib]         |         |
          - closes script with special opcode  <---------+         |
          - script is available in non-volatile memory             |
            via PcdS3BootScriptTablePrivateDataPtr  --+            |
                                                      |            |
        BootScriptExecutorDxe                         |            |
          S3BootScriptLib [PiDxeS3BootScriptLib]      |            |
          - Knows about boot script location by  <----+            |
            synchronizing with the other library                   |
            instance via                                           |
            PcdS3BootScriptTablePrivateDataPtr.                    |
          - Copies relocated image of itself to                    |
            reserved memory. --------------------------------+     |
          - Saved image contains pointer to boot script.  ---|--+  |
                                                             |  |  |
    Runtime:                                                 |  |  |
                                                             |  |  |
    (c) OS is booted, writes OS waking vector to FACS,       |  |  |
        suspends machine                                     |  |  |
                                                             |  |  |
    S3 Resume (PEI):                                         |  |  |
                                                             |  |  |
    (d) PlatformPei sets S3 Boot Mode based on CMOS          |  |  |
                                                             |  |  |
    (e) DXE core is skipped and EFI_PEI_S3_RESUME2 is        |  |  |
        called as last step of PEI                           |  |  |
                                                             |  |  |
    (f) S3Resume2Pei retrieves from LockBox:                 |  |  |
        - ACPI S3 Context (path to FACS)  <------------------|--|--+
                                          |                  |  |
                                          +------------------|--|--+
        - Boot Script Executor Image  <----------------------+  |  |
                                                                |  |
    (g) BootScriptExecutorDxe                                   |  |
          S3BootScriptLib [PiDxeS3BootScriptLib]                |  |
          - executes boot script  <-----------------------------+  |
                                                                   |
    (h) OS waking vector available from ACPI S3 Context / FACS  <--+
        is called
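The data flow in (5) can be condensed into a toy model: items are deposited
during normal boot, and consumed during resume without DXE ever running. The
Python sketch below is a deliberately simplified illustration. The class, the
method names, and the string keys standing in for GUIDs are all hypothetical;
none of them correspond to edk2 interfaces. The only detail taken from the
text is the single INFORMATION opcode carrying 0xDEADBEEF in network byte
order.

```python
import struct


class ToyPlatform:
    """Toy model of the S3 data flow; all names here are illustrative."""

    def __init__(self):
        self.lockbox = {}              # stand-in for the GUID-keyed LockBox
        self.boot_script = []          # recorded (opcode, payload) pairs
        self.facs_waking_vector = None
        self.log = []

    def normal_boot(self):
        # Steps (a) and (b): DXE runs, state is saved for a later resume.
        self.lockbox["acpi-s3-context"] = {"facs": "facs-table"}
        payload = struct.pack(">I", 0xDEADBEEF)  # big-endian == network order
        self.boot_script.append(("INFORMATION", payload))
        self.lockbox["boot-script-executor"] = self.execute_boot_script

    def os_suspend(self, waking_vector):
        # Step (c): the OS registers its waking vector, then suspends.
        self.facs_waking_vector = waking_vector

    def s3_resume(self):
        # Steps (d) through (h): only SEC and PEI run; DXE is skipped.
        context = self.lockbox["acpi-s3-context"]
        executor = self.lockbox["boot-script-executor"]
        executor()
        self.log.append(f"found FACS via {context['facs']}")
        return self.facs_waking_vector()

    def execute_boot_script(self):
        for opcode, payload in self.boot_script:
            self.log.append(f"{opcode} {payload.hex()}")


platform = ToyPlatform()
platform.normal_boot()
platform.os_suspend(lambda: "OS resumed")
result = platform.s3_resume()
print(result)
print(platform.log)
```

Note that nothing in this model stops the "OS" from rewriting the lockbox
dictionary between os_suspend() and s3_resume(); the model makes no attempt to
represent privilege separation.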

A comprehensive memory map of OVMF
----------------------------------

The following section gives a detailed analysis of memory ranges below 4 GB
that OVMF statically uses.

The rightmost column names the PCD entry by which the source code refers to the
address or size in question.

The flash-covered range has been discussed previously in "Firmware image
structure", therefore we include it only for completeness. Due to the fact that
this range is always backed by a memory mapped device (and never RAM), it is
unaffected by S3 (suspend to RAM and resume).

+--------------------------+ 4194304 KB
|                          |
|          SECFV           | size: 208 KB
|                          |
+--------------------------+ 4194096 KB
|                          |
|      FVMAIN_COMPACT      | size: 1712 KB
|                          |
+--------------------------+ 4192384 KB
|                          |
|      variable store      | size: 64 KB   PcdFlashNvStorageFtwSpareSize
|        spare area        |
|                          |
+--------------------------+ 4192320 KB    PcdOvmfFlashNvStorageFtwSpareBase
|                          |
|    FTW working block     | size: 4 KB    PcdFlashNvStorageFtwWorkingSize
|                          |
+--------------------------+ 4192316 KB    PcdOvmfFlashNvStorageFtwWorkingBase
|                          |
|       Event log of       | size: 4 KB    PcdOvmfFlashNvStorageEventLogSize
|   non-volatile storage   |
|                          |
+--------------------------+ 4192312 KB    PcdOvmfFlashNvStorageEventLogBase
|                          |
|      variable store      | size: 56 KB   PcdFlashNvStorageVariableSize
|                          |
+--------------------------+ 4192256 KB    PcdOvmfFlashNvStorageVariableBase

The flash-mapped image of OVMF.fd covers the entire structure above (2048 KB).

When using the split files, the address 4192384 KB
(PcdOvmfFlashNvStorageFtwSpareBase + PcdFlashNvStorageFtwSpareSize) is the
boundary between the mapped images of OVMF_VARS.fd (56 KB + 4 KB + 4 KB + 64 KB
= 128 KB) and OVMF_CODE.fd (1712 KB + 208 KB = 1920 KB).

With regard to RAM that is statically used by OVMF, S3 (suspend to RAM and
resume) complicates matters. Many ranges have been introduced only to support
S3, hence for all ranges below, the following questions will be audited:

(a) when and how a given range is initialized after first boot of the VM,
(b) how it is protected from memory allocations during DXE,
(c) how it is protected from the OS,
(d) how it is accessed on the S3 resume path,
(e) how it is accessed on the warm reset path.

Importantly, the term "protected" is meant as protection against inadvertent
reallocations and overwrites by co-operating DXE and OS modules. It does not
imply security against malicious code.

+--------------------------+ 17408 KB
|                          |
|DXEFV from FVMAIN_COMPACT | size: 8192 KB PcdOvmfDxeMemFvSize
|  decompressed firmware   |
| volume with DXE modules  |
|                          |
+--------------------------+ 9216 KB       PcdOvmfDxeMemFvBase
|                          |
|PEIFV from FVMAIN_COMPACT | size: 896 KB  PcdOvmfPeiMemFvSize
|  decompressed firmware   |
| volume with PEI modules  |
|                          |
+--------------------------+ 8320 KB       PcdOvmfPeiMemFvBase
|                          |
| permanent PEI memory for | size: 32 KB   PcdS3AcpiReservedMemorySize
|   the S3 resume path     |
|                          |
+--------------------------+ 8288 KB       PcdS3AcpiReservedMemoryBase
|                          |
|  temporary SEC/PEI heap  | size: 32 KB   PcdOvmfSecPeiTempRamSize
|         and stack        |
|                          |
+--------------------------+ 8256 KB       PcdOvmfSecPeiTempRamBase
|                          |
|          unused          | size: 32 KB
|                          |
+--------------------------+ 8224 KB
|                          |
|      SEC's table of      | size: 4 KB    PcdGuidedExtractHandlerTableSize
| GUIDed section handlers  |
|                          |
+--------------------------+ 8220 KB       PcdGuidedExtractHandlerTableAddress
|                          |
|     LockBox storage      | size: 4 KB    PcdOvmfLockBoxStorageSize
|                          |
+--------------------------+ 8216 KB       PcdOvmfLockBoxStorageBase
|                          |
| early page tables on X64 | size: 24 KB   PcdOvmfSecPageTablesSize
|                          |
+--------------------------+ 8192 KB       PcdOvmfSecPageTablesBase
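The boundaries in the diagram follow from the base address of the lowest range
and the individual sizes. The Python sketch below recomputes them; the list
and the helper function are ad hoc constructs for this illustration, with the
sizes copied from the diagram.

```python
# (name, size in KB), from the lowest to the highest address
RAM_RANGES = [
    ("early page tables on X64", 24),
    ("LockBox storage", 4),
    ("SEC's table of GUIDed section handlers", 4),
    ("unused", 32),
    ("temporary SEC/PEI heap and stack", 32),
    ("permanent PEI memory for the S3 resume path", 32),
    ("PEIFV", 896),
    ("DXEFV", 8192),
]


def stack_ranges(base_kb, ranges):
    """Place ranges back to back, returning {name: (start_kb, end_kb)}."""
    layout = {}
    cursor = base_kb
    for name, size_kb in ranges:
        layout[name] = (cursor, cursor + size_kb)
        cursor += size_kb
    return layout


layout = stack_ranges(8192, RAM_RANGES)

# Print from the highest address down, like the diagram.
for name, (start, end) in reversed(list(layout.items())):
    print(f"{start:6d} KB .. {end:6d} KB  {name}")
```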

(1) Early page tables on X64:

  (a) when and how it is initialized after first boot of the VM

    The range is filled in during the SEC phase
    [OvmfPkg/ResetVector/Ia32/PageTables64.asm]. The CR3 register is verified
    against the base address in SecCoreStartupWithStack()
    [OvmfPkg/Sec/SecMain.c].

  (b) how it is protected from memory allocations during DXE

    If S3 was enabled on the QEMU command line (see "-global
    PIIX4_PM.disable_s3=0" earlier), then InitializeRamRegions()
    [OvmfPkg/PlatformPei/MemDetect.c] protects the range with an AcpiNVS memory
    allocation HOB, in PEI.

    If S3 was disabled, then this range is not protected. DXE's own page tables
    are first built while still in PEI (see HandOffToDxeCore()
    [MdeModulePkg/Core/DxeIplPeim/X64/DxeLoadFunc.c]). Those tables are located
    in permanent PEI memory. After CR3 is switched over to them (which occurs
    before jumping to the DXE core entry point), we don't have to preserve the
    initial tables.

  (c) how it is protected from the OS

    If S3 is enabled, then (1b) reserves it from the OS too.

    If S3 is disabled, then the range needs no protection.

  (d) how it is accessed on the S3 resume path

    It is rewritten the same way as in (1a), which is fine because (1c)
    reserved it.

  (e) how it is accessed on the warm reset path

    It is rewritten the same way as in (1a).

(2) LockBox storage:

  (a) when and how it is initialized after first boot of the VM

    InitializeRamRegions() [OvmfPkg/PlatformPei/MemDetect.c] zeroes out the
    area during PEI. This is correct but not strictly necessary, since on first
    boot the area is zero-filled anyway.

    The LockBox signature of the area is filled in by the PEI module or DXE
    driver that has been linked against OVMF's LockBoxLib and is run first. The
    signature is written in LockBoxLibInitialize()
    [OvmfPkg/Library/LockBoxLib/LockBoxLib.c].

    Any module calling SaveLockBox() [OvmfPkg/Library/LockBoxLib/LockBoxLib.c]
    will co-populate this area.

  (b) how it is protected from memory allocations during DXE

    If S3 is enabled, then InitializeRamRegions()
    [OvmfPkg/PlatformPei/MemDetect.c] protects the range as AcpiNVS.

    Otherwise, the range is covered with a BootServicesData memory allocation
    HOB.

  (c) how it is protected from the OS

    If S3 is enabled, then (2b) protects it sufficiently.

    Otherwise the range requires no runtime protection, and the
    BootServicesData allocation type from (2b) ensures that the range will be
    released to the OS.

  (d) how it is accessed on the S3 resume path

    The S3 Resume PEIM restores data from the LockBox, which has been correctly
    protected in (2c).

  (e) how it is accessed on the warm reset path

    InitializeRamRegions() [OvmfPkg/PlatformPei/MemDetect.c] zeroes out the
    range during PEI, effectively emptying the LockBox. Modules will
    re-populate the LockBox as described in (2a).

(3) SEC's table of GUIDed section handlers

  (a) when and how it is initialized after first boot of the VM

    The following two library instances are linked into SecMain:
    - IntelFrameworkModulePkg/Library/LzmaCustomDecompressLib,
    - MdePkg/Library/BaseExtractGuidedSectionLib.

    The first library registers its LZMA decompressor plugin (which is called
    a "section handler") by calling the second library:

    LzmaDecompressLibConstructor() [GuidedSectionExtraction.c]
      ExtractGuidedSectionRegisterHandlers() [BaseExtractGuidedSectionLib.c]

    The second library maintains its table of registered "section handlers", to
    be indexed by GUID, in this fixed memory area, independently of S3
    enablement.

    (The decompression of FVMAIN_COMPACT's FFS file section that contains the
    PEIFV and DXEFV firmware volumes occurs with the LZMA decompressor
    registered above. See (6) and (7) below.)

  (b) how it is protected from memory allocations during DXE

    There is no need to protect this area from DXE: because nothing else in
    OVMF links against BaseExtractGuidedSectionLib, the area loses its
    significance as soon as OVMF progresses from SEC to PEI, therefore DXE is
    allowed to overwrite the region.

  (c) how it is protected from the OS

    When S3 is enabled, we cover the range with an AcpiNVS memory allocation
    HOB in InitializeRamRegions().

    When S3 is disabled, the range is not protected.

  (d) how it is accessed on the S3 resume path

    The table of registered section handlers is again managed by
    BaseExtractGuidedSectionLib linked into SecMain exclusively. Section
    handler registrations update the table in-place (based on GUID matches).

  (e) how it is accessed on the warm reset path

    If S3 is enabled, then the OS won't damage the table (due to (3c)), thus
    see (3d).

    If S3 is disabled, then the OS has most probably overwritten the range with
    its own data, hence (3a) -- complete reinitialization -- will come into
    effect, based on the table signature check in BaseExtractGuidedSectionLib.

(4) temporary SEC/PEI heap and stack

  (a) when and how it is initialized after first boot of the VM

    The range is configured in [OvmfPkg/Sec/X64/SecEntry.S] and
    SecCoreStartupWithStack() [OvmfPkg/Sec/SecMain.c]. The stack half is read &
    written by the CPU transparently. The heap half is used for memory
    allocations during PEI.

    Data is migrated out (to permanent PEI stack & memory) in (or soon after)
    PublishPeiMemory() [OvmfPkg/PlatformPei/MemDetect.c].

  (b) how it is protected from memory allocations during DXE

    It is not necessary to protect this range during DXE because its use ends
    while still in PEI.

  (c) how it is protected from the OS

    If S3 is enabled, then InitializeRamRegions()
    [OvmfPkg/PlatformPei/MemDetect.c] reserves it as AcpiNVS.

    If S3 is disabled, then the range doesn't require protection.

  (d) how it is accessed on the S3 resume path

    Same as in (4a), except the target area of the migration triggered by
    PublishPeiMemory() [OvmfPkg/PlatformPei/MemDetect.c] is different -- see
    (5).

  (e) how it is accessed on the warm reset path

    Same as in (4a). The stack and heap halves both may contain garbage, but it
    doesn't matter.

(5) permanent PEI memory for the S3 resume path

  (a) when and how it is initialized after first boot of the VM

    No particular initialization or use.

  (b) how it is protected from memory allocations during DXE

    We don't need to protect this area during DXE.

  (c) how it is protected from the OS

    When S3 is enabled, InitializeRamRegions()
    [OvmfPkg/PlatformPei/MemDetect.c] makes sure the OS stays away by covering
    the range with an AcpiNVS memory allocation HOB.

    When S3 is disabled, the range needs no protection.

  (d) how it is accessed on the S3 resume path

    PublishPeiMemory() installs the range as permanent RAM for PEI. The range
    will serve as stack and will satisfy allocation requests during the rest of
    PEI. OS data won't overlap due to (5c).

  (e) how it is accessed on the warm reset path

    Same as (5a).

(6) PEIFV -- decompressed firmware volume with PEI modules

  (a) when and how it is initialized after first boot of the VM

    DecompressMemFvs() [OvmfPkg/Sec/SecMain.c] populates the area, by
    decompressing the flash-mapped FVMAIN_COMPACT volume's contents. (Refer to
    "Firmware image structure".)

  (b) how it is protected from memory allocations during DXE

    When S3 is disabled, PeiFvInitialization() [OvmfPkg/PlatformPei/Fv.c]
    covers the range with a BootServicesData memory allocation HOB.

    When S3 is enabled, the same coverage is ensured, just with the stronger
    AcpiNVS memory allocation type.

  (c) how it is protected from the OS

    When S3 is disabled, it is not necessary to keep the range from the OS.

    Otherwise the AcpiNVS type allocation from (6b) provides coverage.

  (d) how it is accessed on the S3 resume path

    Rather than decompressing it again from FVMAIN_COMPACT, GetS3ResumePeiFv()
    [OvmfPkg/Sec/SecMain.c] reuses the protected area for parsing / execution
    from (6c).

  (e) how it is accessed on the warm reset path

    Same as (6a).

(7) DXEFV -- decompressed firmware volume with DXE modules

  (a) when and how it is initialized after first boot of the VM

    Same as (6a).

  (b) how it is protected from memory allocations during DXE

    PeiFvInitialization() [OvmfPkg/PlatformPei/Fv.c] covers the range with a
    BootServicesData memory allocation HOB.

  (c) how it is protected from the OS

    The OS is allowed to release and reuse this range.

  (d) how it is accessed on the S3 resume path

    It's not; DXE never runs during S3 resume.

  (e) how it is accessed on the warm reset path

    Same as in (7a).
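The answers given under (b) and (c) for the seven ranges can be collected into
one lookup table. The Python sketch below is a transcription aid only: the
table layout, the shortened range names, and the helper function are ours, and
None stands for "no memory allocation HOB covers the range". The helper
encodes the observation that, of the two memory types mentioned, only AcpiNVS
keeps the OS away at runtime.

```python
# Memory type of the allocation HOB covering each range, or None if the
# range is left unprotected. Transcribed from items (1) through (7).
PROTECTION = {
    # range name:                   (S3 enabled,        S3 disabled)
    "early page tables":            ("AcpiNVS",          None),
    "LockBox storage":              ("AcpiNVS",          "BootServicesData"),
    "GUIDed section handler table": ("AcpiNVS",          None),
    "temporary SEC/PEI RAM":        ("AcpiNVS",          None),
    "S3 permanent PEI memory":      ("AcpiNVS",          None),
    "PEIFV":                        ("AcpiNVS",          "BootServicesData"),
    "DXEFV":                        ("BootServicesData", "BootServicesData"),
}


def survives_os_runtime(name, s3_enabled):
    """True if the OS is expected to leave the range alone at runtime."""
    memory_type = PROTECTION[name][0 if s3_enabled else 1]
    return memory_type == "AcpiNVS"


preserved_with_s3 = [n for n in PROTECTION if survives_os_runtime(n, True)]
preserved_without_s3 = [n for n in PROTECTION if survives_os_runtime(n, False)]
print("kept from the OS with S3 enabled: ", preserved_with_s3)
print("kept from the OS with S3 disabled:", preserved_without_s3)
```

With S3 enabled, every range except DXEFV is kept from the OS; with S3
disabled, none is, which matches the fact that nothing needs to survive until
a resume in that case.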
6009e6
6009e6
Known Secure Boot limitations
-----------------------------

Under "Motivation" we've mentioned that OVMF's Secure Boot implementation is
not suitable for production use yet -- it's only good for development and
testing of standards-conformant, non-malicious guest code (UEFI and operating
system alike).

Now that we've examined the persistent flash device, the workings of S3, and
the memory map, we can discuss two currently known shortcomings of OVMF's
Secure Boot that in fact make it insecure. (Clearly problems other than these
two might exist; the set of issues considered here is not meant to be
exhaustive.)

One trait of Secure Boot is tamper-evidence. Secure Boot may not prevent
malicious modification of software components (for example, operating system
drivers), but by being the root of integrity on a platform, it can catch (or
indirectly contribute to catching) unauthorized changes, by way of signature
and certificate checks at the earliest phases of boot.

If an attacker can tamper with key material stored in authenticated and/or
boot-time only persistent variables (for example, PK, KEK, db, dbt, dbx), then
the intended security of this scheme is compromised. The UEFI 2.4A
specification says

- in section 28.3.4:

  Platform Keys:

    The public key must be stored in non-volatile storage which is tamper and
    delete resistant.

  Key Exchange Keys:

    The public key must be stored in non-volatile storage which is tamper
    resistant.

- in section 28.6.1:

  The signature database variables db, dbt, and dbx must be stored in
  tamper-resistant non-volatile storage.
6009e6
6009e6
(1) The combination of QEMU, KVM, and OVMF does not provide this kind of
    resistance. The variable store in the emulated flash chip is directly
    accessible to, and reprogrammable by, UEFI drivers, applications, and
    operating systems.

(2) Under "S3 (suspend to RAM and resume)" we pointed out that the LockBox
    storage must be similarly secure and tamper-resistant.

    On the S3 resume path, the PEIM providing EFI_PEI_S3_RESUME2_PPI
    (UefiCpuPkg/Universal/Acpi/S3Resume2Pei) restores and interprets data from
    the LockBox that has been saved there during boot. This PEIM, being part of
    the firmware, has full access to the platform. If an operating system can
    tamper with the contents of the LockBox, then at the next resume the
    platform's integrity might be subverted.

    OVMF stores the LockBox in normal guest RAM (refer to the memory map
    section above). Operating systems and third party UEFI drivers and UEFI
    applications that respect the UEFI memory map will not inadvertently
    overwrite the LockBox storage, but there's nothing to prevent e.g. a
    malicious kernel from modifying the LockBox.

One means to address these issues is SMM and SMRAM (System Management Mode and
System Management RAM).

During boot and resume, the firmware can enter and leave SMM and access SMRAM.
Before the DXE phase is left, and control is transferred to the BDS phase (when
third party UEFI drivers and applications can be loaded, and an operating
system can be loaded), SMRAM is locked in hardware, and subsequent modules
cannot access it directly. (See EFI_DXE_SMM_READY_TO_LOCK_PROTOCOL.)

Once SMRAM has been locked, UEFI drivers and the operating system can enter SMM
by raising a System Management Interrupt (SMI), at which point trusted code
(part of the platform firmware) takes control. SMRAM is also unlocked by
platform reset, at which point the boot firmware takes control again.
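
The locking rules described above can be modeled with a small toy. This is only
a sketch of the access policy, not of any real chipset or edk2 interface; the
Smram class and all of its method names are hypothetical.

```python
class Smram:
    """Toy model: once locked, SMRAM is reachable only from inside an SMI."""

    def __init__(self, size):
        self._bytes = bytearray(size)
        self.locked = False
        self._in_smm = False

    def lock(self):
        # Models the hardware lock applied before third party code runs.
        self.locked = True

    def reset(self):
        # Platform reset unlocks SMRAM; the boot firmware is in control again.
        self.locked = False
        self._in_smm = False

    def _check(self):
        if self.locked and not self._in_smm:
            raise PermissionError("SMRAM is locked")

    def write(self, offset, data):
        self._check()
        self._bytes[offset:offset + len(data)] = data

    def read(self, offset, length):
        self._check()
        return bytes(self._bytes[offset:offset + length])

    def smi(self, handler, *args):
        # Raising an SMI transfers control to trusted firmware code.
        self._in_smm = True
        try:
            return handler(self, *args)
        finally:
            self._in_smm = False


smram = Smram(16)
smram.write(0, b"key")          # firmware, before the lock
smram.lock()
try:
    smram.write(0, b"bad")      # untrusted code, after the lock
    blocked = False
except PermissionError:
    blocked = True
via_smi = smram.smi(lambda mem: mem.read(0, 3))
```

Direct access fails after the lock, while the same data remains reachable
through a handler that runs in response to the SMI.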
6009e6
6009e6
Variable store and LockBox in SMRAM
-----------------------------------

Edk2 provides almost all components to implement the variable store and the
LockBox in SMRAM. In this section we summarize ideas for utilizing those
facilities.

The SMRAM and SMM infrastructure in edk2 is built up as follows:

(1) The platform hardware provides SMM / SMI / SMRAM.

    QEMU/KVM doesn't support these features currently and should implement them
    in the longer term.

(2) The platform vendor (in this case, OVMF developers) implements device
    drivers for the platform's System Management Mode:

    - EFI_SMM_CONTROL2_PROTOCOL: for raising synchronous and/or periodic SMIs;
      that is, for entering SMM.

    - EFI_SMM_ACCESS2_PROTOCOL: for describing and accessing SMRAM.

    These protocols are documented in the PI Specification, Volume 4.

(3) The platform DSC file is to include the following platform-independent
    modules:

    - MdeModulePkg/Core/PiSmmCore/PiSmmIpl.inf: SMM Initial Program Load
    - MdeModulePkg/Core/PiSmmCore/PiSmmCore.inf: SMM Core

(4) At this point, modules of type DXE_SMM_DRIVER can be loaded.

    Such drivers are privileged. They run in SMM, have access to SMRAM, and are
    separated and switched from other drivers through SMIs. Secure
    communication between unprivileged (non-SMM) and privileged (SMM) drivers
    happens through EFI_SMM_COMMUNICATION_PROTOCOL (implemented by the SMM
    Core, see (3)).

    DXE_SMM_DRIVER modules must sanitize their input (coming from unprivileged
    drivers) carefully.

(5) The authenticated runtime variable services driver (for Secure Boot builds)
    is located under "SecurityPkg/VariableAuthenticated/RuntimeDxe". OVMF
    currently builds the driver (a DXE_RUNTIME_DRIVER module) with the
    "VariableRuntimeDxe.inf" control file (refer to "OvmfPkg/OvmfPkgX64.dsc"),
    which does not use SMM.

    The directory includes two more INF files:

    - VariableSmm.inf -- module type: DXE_SMM_DRIVER. A privileged driver that
      runs in SMM and has access to SMRAM.

    - VariableSmmRuntimeDxe.inf -- module type: DXE_RUNTIME_DRIVER. A
      non-privileged driver that implements the variable runtime services
      (replacing the current "VariableRuntimeDxe.inf" file) by communicating
      with the above privileged SMM half via EFI_SMM_COMMUNICATION_PROTOCOL.
6009e6
6009e6
(6) An SMRAM-based LockBox implementation needs to be discussed in two parts,
    because the LockBox is accessed in both PEI and DXE.

    (a) During DXE, drivers save data in the LockBox. A save operation is
        layered as follows:

        - The unprivileged driver wishing to store data in the LockBox links
          against the "MdeModulePkg/Library/SmmLockBoxLib/SmmLockBoxDxeLib.inf"
          library instance.

          The library allows the unprivileged driver to format requests for the
          privileged SMM LockBox driver (see below), and to parse responses.

        - The privileged SMM LockBox driver is built from
          "MdeModulePkg/Universal/LockBox/SmmLockBox/SmmLockBox.inf". This
          driver has module type DXE_SMM_DRIVER and can access SMRAM.

          The driver delegates command parsing and response formatting to
          "MdeModulePkg/Library/SmmLockBoxLib/SmmLockBoxSmmLib.inf".

        - The above two halves (unprivileged and privileged) mirror what we've
          seen in case of the variable service drivers, under (5).

    (b) In PEI, the S3 Resume PEIM (UefiCpuPkg/Universal/Acpi/S3Resume2Pei)
        retrieves data from the LockBox.

        Presumably, S3Resume2Pei should be considered an "unprivileged PEIM",
        and the SMRAM access should be layered as seen in DXE. Unfortunately,
        edk2 does not implement all of the layers in PEI -- the code either
        doesn't exist, or it is not open source:

  role         | DXE: protocol/module           | PEI: PPI/module
  -------------+--------------------------------+------------------------------
  unprivileged | any                            | S3Resume2Pei.inf
  driver       |                                |
  -------------+--------------------------------+------------------------------
  command      | LIBRARY_CLASS = LockBoxLib     | LIBRARY_CLASS = LockBoxLib
  formatting   |                                |
  and response | SmmLockBoxDxeLib.inf           | SmmLockBoxPeiLib.inf
  parsing      |                                |
  -------------+--------------------------------+------------------------------
  privilege    | EFI_SMM_COMMUNICATION_PROTOCOL | EFI_PEI_SMM_COMMUNICATION_PPI
  separation   |                                |
               | PiSmmCore.inf                  | missing!
  -------------+--------------------------------+------------------------------
  platform SMM | EFI_SMM_CONTROL2_PROTOCOL      | PEI_SMM_CONTROL_PPI
  and SMRAM    | EFI_SMM_ACCESS2_PROTOCOL       | PEI_SMM_ACCESS_PPI
  access       |                                |
               | to be done in OVMF             | to be done in OVMF
  -------------+--------------------------------+------------------------------
  command      | LIBRARY_CLASS = LockBoxLib     | LIBRARY_CLASS = LockBoxLib
  parsing and  |                                |
  response     | SmmLockBoxSmmLib.inf           | missing!
  formatting   |                                |
  -------------+--------------------------------+------------------------------
  privileged   | SmmLockBox.inf                 | missing!
  LockBox      |                                |
  driver       |                                |

        Alternatively, in the future OVMF might be able to provide a LockBoxLib
        instance (an SmmLockBoxPeiLib substitute) for S3Resume2Pei that
        accesses SMRAM directly, eliminating the need for deeper layers in the
        stack (that is, EFI_PEI_SMM_COMMUNICATION_PPI and deeper).

        In fact, a "thin" EFI_PEI_SMM_COMMUNICATION_PPI implementation whose
        sole Communicate() member invariably returns EFI_NOT_STARTED would
        cause the current SmmLockBoxPeiLib library instance to directly perform
        full-depth SMRAM access and LockBox search, obviating the "missing"
        cells. (With reference to A Tour Beyond BIOS: Implementing S3 Resume
        with EDK2, by Jiewen Yao and Vincent Zimmer, October 2014.)
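
The fallback behavior that the "thin" PPI relies on can be illustrated with a
toy model. This is a sketch of the control flow only, not of the real
SmmLockBoxPeiLib code; the function and class names, the request format, and
the GUID string are all hypothetical, and status codes are plain strings.

```python
EFI_SUCCESS = "EFI_SUCCESS"
EFI_NOT_FOUND = "EFI_NOT_FOUND"
EFI_NOT_STARTED = "EFI_NOT_STARTED"


class ThinSmmCommunicationPpi:
    """Hypothetical 'thin' PPI: communicate() always says EFI_NOT_STARTED."""

    def communicate(self, request):
        return EFI_NOT_STARTED, None


def restore_lockbox(guid, communication_ppi, smram_lockboxes):
    """Ask the privileged side first; search SMRAM directly on NOT_STARTED."""
    status, data = communication_ppi.communicate(("restore", guid))
    if status != EFI_NOT_STARTED:
        return status, data
    # No privileged LockBox driver answered: walk the LockBox storage
    # ourselves (modeled here as a dictionary keyed by GUID).
    if guid in smram_lockboxes:
        return EFI_SUCCESS, smram_lockboxes[guid]
    return EFI_NOT_FOUND, None


smram_lockboxes = {"hypothetical-boot-script-guid": b"\x01\x02\x03"}
result = restore_lockbox("hypothetical-boot-script-guid",
                         ThinSmmCommunicationPpi(), smram_lockboxes)
```

In this model the PEI-side caller never needs the three "missing" layers,
because every request takes the direct-access branch.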
6009e6
6009e6
Select features
---------------

In this section we'll browse the top-level "OvmfPkg" package directory, and
discuss the more interesting drivers and libraries that have not been mentioned
thus far.

X64-specific reset vector for OVMF
..................................

The "OvmfPkg/ResetVector" directory customizes the reset vector (found in
"UefiCpuPkg/ResetVector/Vtf0") for "OvmfPkgX64.fdf", that is, when the SEC/PEI
phases run in 64-bit (i.e. long) mode.

The reset vector's control flow looks roughly like:

  resetVector                               [Ia16/ResetVectorVtf0.asm]
  EarlyBspInitReal16                        [Ia16/Init16.asm]
  Main16                                    [Main.asm]
    EarlyInit16                             [Ia16/Init16.asm]

    ; Transition the processor from
    ; 16-bit real mode to 32-bit flat mode
    TransitionFromReal16To32BitFlat         [Ia16/Real16ToFlat32.asm]

    ; Search for the
    ; Boot Firmware Volume (BFV)
    Flat32SearchForBfvBase                  [Ia32/SearchForBfvBase.asm]

    ; Search for the SEC entry point
    Flat32SearchForSecEntryPoint            [Ia32/SearchForSecEntry.asm]

    %ifdef ARCH_IA32
      ; Jump to the 32-bit SEC entry point
    %else
      ; Transition the processor
      ; from 32-bit flat mode
      ; to 64-bit flat mode
      Transition32FlatTo64Flat              [Ia32/Flat32ToFlat64.asm]

        SetCr3ForPageTables64               [Ia32/PageTables64.asm]
          ; set CR3 to page tables
          ; built into the ROM image

        ; enable PAE
        ; set LME
        ; enable paging

      ; Jump to the 64-bit SEC entry point
    %endif
6009e6
6009e6
On physical platforms, the initial page tables referenced by
SetCr3ForPageTables64 are built statically into the flash device image, and are
present in ROM at runtime. This is fine on physical platforms because the
pre-built page table entries have the Accessed and Dirty bits set from the
start.

Accordingly, for OVMF running in long mode on qemu/KVM, the initial page tables
were mapped as a KVM_MEM_READONLY slot, as part of QEMU's pflash device (refer
to "Firmware image structure" above).

In spite of the Accessed and Dirty bits being pre-set in the read-only,
in-flash PTEs, in a virtual machine attempts are made to update said PTE bits,
unlike on physical hardware. The component attempting to update the read-only
PTEs can be one of the following:

- The processor itself, if it supports nested paging, and the user enables that
  processor feature,

- KVM code implementing shadow paging, otherwise.

The first case presents no user-visible symptoms, but the second case (KVM,
shadow paging) used to cause a triple fault, prior to Linux commit ba6a354
("KVM: mmu: allow page tables to be in read-only slots").
6009e6
6009e6
For compatibility with earlier KVM versions, the OvmfPkg/ResetVector directory
adapts the generic reset vector code as follows:

      Transition32FlatTo64Flat         [UefiCpuPkg/.../Ia32/Flat32ToFlat64.asm]

        SetCr3ForPageTables64       [OvmfPkg/ResetVector/Ia32/PageTables64.asm]

          ; dynamically build the initial page tables in RAM, at address
          ; PcdOvmfSecPageTablesBase (refer to the memory map above),
          ; identity-mapping the first 4 GB of address space

          ; set CR3 to PcdOvmfSecPageTablesBase

        ; enable PAE
        ; set LME
        ; enable paging

This way the PTEs that earlier KVM versions try to update (during shadow
paging) are located in a read-write memory slot, and the write attempts
succeed.
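
To make "identity-mapping the first 4 GB" concrete, the following sketch
builds such a table set in a byte array and walks it the way the MMU would.
It is not a transcription of PageTables64.asm. The use of 2 MB pages, the
six-page layout (one PML4, one PDPT, four page directories), and the base
address in the usage lines are assumptions made for illustration. The bit
positions are the architectural x86-64 ones.

```python
import struct

PAGE_SIZE = 0x1000
# x86-64 page table entry attribute bits.
PRESENT, WRITABLE, ACCESSED, DIRTY, LARGE = 1 << 0, 1 << 1, 1 << 5, 1 << 6, 1 << 7
ADDR_MASK = 0x000FFFFFFFFFF000


def build_identity_tables(base):
    """Return 6 pages of tables identity-mapping 0..4 GB with 2 MB pages.

    Assumed layout: page 0 = PML4, page 1 = PDPT, pages 2..5 = four page
    directories with 512 entries each.
    """
    mem = bytearray(6 * PAGE_SIZE)

    def put(page, index, value):
        struct.pack_into("<Q", mem, page * PAGE_SIZE + index * 8, value)

    table_attrs = PRESENT | WRITABLE | ACCESSED
    put(0, 0, (base + PAGE_SIZE) | table_attrs)               # PML4[0] -> PDPT
    for i in range(4):                                        # PDPT[i] -> PD i
        put(1, i, (base + (2 + i) * PAGE_SIZE) | table_attrs)
    for n in range(4 * 512):                                  # 2048 large pages
        put(2 + n // 512, n % 512, (n << 21) | table_attrs | DIRTY | LARGE)
    return mem


def translate(mem, base, vaddr):
    """Walk the tables and return the physical address for vaddr (< 4 GB)."""
    def entry(table_phys, index):
        return struct.unpack_from("<Q", mem, table_phys - base + index * 8)[0]

    pml4e = entry(base, (vaddr >> 39) & 0x1FF)
    pdpte = entry(pml4e & ADDR_MASK, (vaddr >> 30) & 0x1FF)
    pde = entry(pdpte & ADDR_MASK, (vaddr >> 21) & 0x1FF)
    return (pde & 0x000FFFFFFFE00000) | (vaddr & 0x1FFFFF)


TABLES_BASE = 0x800000          # illustrative stand-in for the PCD value
tables = build_identity_tables(TABLES_BASE)
```

Because the tables live in an ordinary writable buffer, a component that wants
to set Accessed or Dirty bits can simply do so, which is the point of moving
them out of the read-only flash mapping.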
6009e6
6009e6
Client library for QEMU's firmware configuration interface
..........................................................

QEMU provides a write-only, 16-bit wide control port, and a read-write, 8-bit
wide data port for exchanging configuration elements with the firmware.

The firmware writes a selector (a key) to the control port (0x510), and then
reads the corresponding configuration data (produced by QEMU) from the data
port (0x511).

If the selected entry is writable, the firmware may overwrite it. If QEMU has
associated a callback with the entry, then when the entry is completely
rewritten, QEMU runs the callback. (OVMF does not rewrite any entries at the
moment.)

A number of selector values (keys) are predefined. In particular, key 0x19
selects (returns) a directory of { name, selector, size } triplets, roughly
speaking.

The firmware can request configuration elements by well-known name as well, by
looking up the selector value first in the directory, by name, and then writing
the selector to the control port. The number of bytes to read subsequently from
the data port is known from the directory entry's "size" field.

By convention, directory entries (well-known symbolic names of configuration
elements) are formatted as POSIX pathnames. For example, the array selected by
the "etc/system-states" name indicates (among other things) whether the user
enabled S3 support in QEMU.

The above interface is called "fw_cfg".

The binary data associated with a symbolic name is called an "fw_cfg file".
6009e6
OVMF's fw_cfg client library is found in "OvmfPkg/Library/QemuFwCfgLib". OVMF
discovers many aspects of the virtual system with it; we refer to a few
examples below.

Guest ACPI tables
.................

An operating system discovers a good amount of its hardware by parsing ACPI
tables, and by interpreting ACPI objects and methods. On physical hardware, the
platform vendor's firmware installs ACPI tables in memory that match both the
hardware present in the system and the user's firmware configuration ("BIOS
setup").

Under qemu/KVM, the owner of the (virtual) hardware configuration is QEMU.
Hardware can easily be reconfigured on the command line. Furthermore, features
like CPU hotplug, PCI hotplug, memory hotplug are continuously developed for
QEMU, and operating systems need direct ACPI support to exploit these features.

For this reason, QEMU builds its own ACPI tables dynamically, in a
self-descriptive manner, and exports them to the firmware through a complex,
multi-file fw_cfg interface. It is rooted in the "etc/table-loader" fw_cfg
file. (Further details of this interface are out of scope for this report.)

OVMF's AcpiPlatformDxe driver fetches the ACPI tables, and installs them for
the guest OS with the EFI_ACPI_TABLE_PROTOCOL (which is in turn provided by the
generic "MdeModulePkg/Universal/Acpi/AcpiTableDxe" driver).

For earlier QEMU versions and machine types (which we generally don't recommend
for OVMF; see "Scope"), the "OvmfPkg/AcpiTables" directory contains a few
static ACPI table templates. When the "etc/table-loader" fw_cfg file is
unavailable, AcpiPlatformDxe installs these default tables (with a little bit
of dynamic patching).
6009e6
6009e6
When OVMF runs in a Xen domU, AcpiPlatformDxe also installs ACPI tables that
originate from the hypervisor's environment.
6009e6
6009e6
Guest SMBIOS tables
...................

Quoting the SMBIOS Reference Specification,

  [...] the System Management BIOS Reference Specification addresses how
  motherboard and system vendors present management information about their
  products in a standard format [...]

In practice SMBIOS tables are just another set of tables that the platform
vendor's firmware installs in RAM for the operating system, and, importantly,
for management applications running on the OS. Without rehashing the "Guest
ACPI tables" section in full, let's map the OVMF roles seen there from ACPI to
SMBIOS:

  role                     | ACPI                    | SMBIOS
  -------------------------+-------------------------+-------------------------
  fw_cfg file              | etc/table-loader        | etc/smbios/smbios-tables
  -------------------------+-------------------------+-------------------------
  OVMF driver              | AcpiPlatformDxe         | SmbiosPlatformDxe
  under "OvmfPkg"          |                         |
  -------------------------+-------------------------+-------------------------
  Underlying protocol,     | EFI_ACPI_TABLE_PROTOCOL | EFI_SMBIOS_PROTOCOL
  implemented by generic   |                         |
  driver under             | Acpi/AcpiTableDxe       | SmbiosDxe
  "MdeModulePkg/Universal" |                         |
  -------------------------+-------------------------+-------------------------
  default tables available | yes                     | [RHEL] yes, Type0 and
  for earlier QEMU machine |                         |        Type1 tables
  types, with hot-patching |                         |
  -------------------------+-------------------------+-------------------------
  tables fetched in Xen    | yes                     | yes
  domUs                    |                         |
6009e6
6009e6
Platform-specific boot policy
.............................

OVMF's BDS (Boot Device Selection) phase is implemented by
IntelFrameworkModulePkg/Universal/BdsDxe. Roughly speaking, this large driver:

- provides the EFI BDS architectural protocol (which DXE transfers control to
  after dispatching all DXE drivers),

- connects drivers to devices,

- enumerates boot devices,

- auto-generates boot options,

- provides "BIOS setup" screens, such as:

  - Boot Manager, for booting an option,

  - Boot Maintenance Manager, for adding, deleting, and reordering boot
    options, changing console properties etc.,

  - Device Manager, where devices can register configuration forms, including

    - Secure Boot configuration forms,

    - OVMF's Platform Driver form (see under PlatformDxe).

Firmware that includes the "IntelFrameworkModulePkg/Universal/BdsDxe" driver
can customize its behavior by providing an instance of the PlatformBdsLib
library class. The driver links against this platform library, and the
platform library can call Intel's BDS utility functions from
"IntelFrameworkModulePkg/Library/GenericBdsLib".

OVMF's PlatformBdsLib instance can be found in
"OvmfPkg/Library/PlatformBdsLib". The main function where the BdsDxe driver
enters the library is PlatformBdsPolicyBehavior(). We mention two OVMF
particulars here.

(1) OVMF is capable of loading kernel images directly from fw_cfg, matching
    QEMU's -kernel, -initrd, and -append command line options. This feature is
    useful for rapid, repeated Linux kernel testing, and is implemented in the
    following call tree:

    PlatformBdsPolicyBehavior() [OvmfPkg/Library/PlatformBdsLib/BdsPlatform.c]
      TryRunningQemuKernel() [OvmfPkg/Library/PlatformBdsLib/QemuKernel.c]
        LoadLinux*() [OvmfPkg/Library/LoadLinuxLib/Linux.c]

    OvmfPkg/Library/LoadLinuxLib ports the efilinux bootloader project into
    OvmfPkg.

(2) OVMF seeks to comply with the boot order specification passed down by QEMU
    over fw_cfg.
6009e6
6009e6
    (a) About Boot Modes

      During the PEI phase, OVMF determines and stores the Boot Mode in the
      PHIT HOB (already mentioned in "S3 (suspend to RAM and resume)"). The
      boot mode is supposed to influence the rest of the system; for example,
      it distinguishes S3 resume (BOOT_ON_S3_RESUME) from a "normal" boot.

      In general, "normal" boots can be further differentiated from each other,
      for example for speed reasons. When the firmware can tell during PEI that
      the chassis has not been opened since last power-up, then it might want
      to save time by not connecting all devices and not enumerating all boot
      options from scratch; it could just rely on the stored results of the
      last enumeration. The matching BootMode value, to be set during PEI,
      would be BOOT_ASSUMING_NO_CONFIGURATION_CHANGES.

      OVMF only sets one of the following two boot modes, based on CMOS
      contents:
      - BOOT_ON_S3_RESUME,
      - BOOT_WITH_FULL_CONFIGURATION.

      For BOOT_ON_S3_RESUME, please refer to "S3 (suspend to RAM and resume)".
      The other boot mode supported by OVMF, BOOT_WITH_FULL_CONFIGURATION, is
      an appropriate "catch-all" for a virtual machine, where hardware can
      easily change from boot to boot.
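
The two-way decision can be sketched in a few lines. The boot mode constants
carry the values defined by the PI specification. The CMOS index (0x0F) and
the marker value (0xFE) reflect our understanding of what PlatformPei checks
and should be read as assumptions; cmos_read8 is a hypothetical callback that
stands in for the real CMOS access.

```python
BOOT_WITH_FULL_CONFIGURATION = 0x00
BOOT_ON_S3_RESUME = 0x11

# Assumed for this sketch: which CMOS byte is consulted, and which value
# marks an S3 resume.
CMOS_INDEX = 0x0F
CMOS_S3_RESUME_MARKER = 0xFE


def detect_boot_mode(cmos_read8):
    """Pick one of the two boot modes from the CMOS contents."""
    if cmos_read8(CMOS_INDEX) == CMOS_S3_RESUME_MARKER:
        return BOOT_ON_S3_RESUME
    return BOOT_WITH_FULL_CONFIGURATION


resume_cmos = {CMOS_INDEX: CMOS_S3_RESUME_MARKER}
normal_cmos = {CMOS_INDEX: 0x00}
resume_mode = detect_boot_mode(lambda index: resume_cmos.get(index, 0))
normal_mode = detect_boot_mode(lambda index: normal_cmos.get(index, 0))
```

Every CMOS value other than the marker lands in the catch-all mode, matching
the "full configuration on every normal boot" policy described above.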
6009e6
6009e6
    (b) Auto-generation of boot options

      Accordingly, when not resuming from S3 sleep (*), OVMF always connects
      all devices, and enumerates all bootable devices as new boot options
      (non-volatile variables called Boot####).

      (*) During S3 resume, DXE is not reached, hence BDS isn't either.

      The auto-enumerated boot options are stored in the BootOrder non-volatile
      variable after any preexistent options. (Boot options may exist before
      auto-enumeration e.g. because the user added them manually with the Boot
      Maintenance Manager or the efibootmgr utility. They could also originate
      from an earlier auto-enumeration.)

      PlatformBdsPolicyBehavior()                   [OvmfPkg/.../BdsPlatform.c]
        TryRunningQemuKernel()                       [OvmfPkg/.../QemuKernel.c]
        BdsLibConnectAll()           [IntelFrameworkModulePkg/.../BdsConnect.c]
        BdsLibEnumerateAllBootOption()  [IntelFrameworkModulePkg/.../BdsBoot.c]
          BdsLibBuildOptionFromHandle() [IntelFrameworkModulePkg/.../BdsBoot.c]
            BdsLibRegisterNewOption()   [IntelFrameworkModulePkg/.../BdsMisc.c]
              //
              // Append the new option number to the original option order
              //
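
The "append to the original option order" step can be modeled on a dictionary
that plays the role of the variable store. BootOrder is an array of 16-bit
option numbers, and each number names a Boot#### variable with four hexadecimal
digits. How the real BdsLibRegisterNewOption() picks a free number is not
covered in this report; taking the lowest unused one is an assumption of the
sketch, as are the function names and the opaque payloads.

```python
import struct


def boot_option_name(number):
    # Boot#### uses four uppercase hexadecimal digits.
    return "Boot%04X" % number


def decode_boot_order(blob):
    return list(struct.unpack("<%dH" % (len(blob) // 2), blob))


def encode_boot_order(numbers):
    return struct.pack("<%dH" % len(numbers), *numbers)


def register_new_option(variables, load_option):
    """Store a new Boot#### variable and append its number to BootOrder."""
    order = decode_boot_order(variables.get("BootOrder", b""))
    # Assumption: take the lowest option number that is not in use yet.
    number = next(n for n in range(0x10000)
                  if boot_option_name(n) not in variables)
    variables[boot_option_name(number)] = load_option
    variables["BootOrder"] = encode_boot_order(order + [number])
    return number


variables = {
    "Boot0000": b"user-added option",       # preexistent, opaque payload
    "BootOrder": encode_boot_order([0]),
}
first = register_new_option(variables, b"auto-enumerated disk")
second = register_new_option(variables, b"auto-enumerated NIC")
```

After the two calls the preexistent option is still first in BootOrder, and the
auto-generated ones follow it in the order of their registration.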
6009e6
6009e6
    (c) Relative UEFI device paths in boot options

      The handling of relative ("short-form") UEFI device paths is best
      demonstrated through an example, and by quoting the UEFI 2.4A
      specification.

      A short-form hard drive UEFI device path could be (displaying each device
      path node on a separate line for readability):

        HD(1,GPT,14DD1CC5-D576-4BBF-8858-BAF877C8DF61,0x800,0x64000)/
        \EFI\fedora\shim.efi

      This device path lacks prefix nodes (e.g. hardware or messaging type
      nodes) that would lead to the hard drive. During load option processing,
      the above short-form or relative device path could be matched against the
      following absolute device path:

        PciRoot(0x0)/
        Pci(0x4,0x0)/
        HD(1,GPT,14DD1CC5-D576-4BBF-8858-BAF877C8DF61,0x800,0x64000)/
        \EFI\fedora\shim.efi

      The motivation for this type of device path matching / completion is to
      allow the user to move around the hard drive (for example, to plug a
      controller in a different PCI slot, or to expose the block device on a
      different iSCSI path) and still enable the firmware to find the hard
      drive.

      The UEFI specification says,

        9.3.6 Media Device Path
        9.3.6.1 Hard Drive

          [...] Section 3.1.2 defines special rules for processing the Hard
          Drive Media Device Path. These special rules enable a disk's location
          to change and still have the system boot from the disk. [...]

        3.1.2 Load Option Processing

          [...] The boot manager must [...] support booting from a short-form
          device path that starts with the first element being a hard drive
          media device path [...]. The boot manager must use the GUID or
          signature and partition number in the hard drive device path to match
          it to a device in the system. If the drive supports the GPT
          partitioning scheme the GUID in the hard drive media device path is
          compared with the UniquePartitionGuid field of the GUID Partition
          Entry [...]. If the drive supports the PC-AT MBR scheme the signature
          in the hard drive media device path is compared with the
          UniqueMBRSignature in the Legacy Master Boot Record [...]. If a
          signature match is made, then the partition number must also be
          matched. The hard drive device path can be appended to the matching
          hardware device path and normal boot behavior can then be used. If
          more than one device matches the hard drive device path, the boot
          manager will pick one arbitrarily. Thus the operating system must
          ensure the uniqueness of the signatures on hard drives to guarantee
          deterministic boot behavior.

      Edk2 implements and exposes the device path completion logic in the
      already referenced "IntelFrameworkModulePkg/Library/GenericBdsLib"
      library, in the BdsExpandPartitionPartialDevicePathToFull() function.
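
The matching rule quoted above can be sketched on textual device paths. This is
not the edk2 function; it works on lists of node strings rather than binary
device path nodes, and the helper names as well as the network device in the
usage lines are made up. It compares the partition number, the signature type
and the signature, and takes the first device that matches.

```python
import re

# Leading fields of a textual HD() node: partition number, signature type,
# signature (a GUID for GPT, a 32-bit value for MBR).
_HD_NODE = re.compile(r"HD\((\d+),(GPT|MBR),([^,]+),")


def hd_identity(node):
    """Return (partition, type, signature) for an HD() node, else None."""
    m = _HD_NODE.match(node)
    if m is None:
        return None
    return int(m.group(1)), m.group(2), m.group(3).upper()


def expand_short_form(short_path, device_paths):
    """Complete a short-form path (a list of nodes starting with HD())."""
    wanted = hd_identity(short_path[0])
    if wanted is None:
        return None
    for device_path in device_paths:
        for index, node in enumerate(device_path):
            if hd_identity(node) == wanted:
                # Prepend the hardware nodes that lead to the partition.
                return device_path[:index] + short_path
    return None


HD = "HD(1,GPT,14DD1CC5-D576-4BBF-8858-BAF877C8DF61,0x800,0x64000)"
short_form = [HD, r"\EFI\fedora\shim.efi"]
present_devices = [
    ["PciRoot(0x0)", "Pci(0x3,0x0)", "MAC(525400123456,0x1)"],
    ["PciRoot(0x0)", "Pci(0x4,0x0)", HD],
]
full_path = expand_short_form(short_form, present_devices)
```

Moving the disk behind a different controller changes only the entries of
present_devices; the stored short-form option keeps matching.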
6009e6
6009e6
    (d) Filtering and reordering the boot options based on fw_cfg

      Once we have an "all-inclusive", partly preexistent, partly freshly
      auto-generated boot option list from bullet (b), OVMF loads QEMU's
      requested boot order from fw_cfg, and filters and reorders the list from
      (b) with it:

      PlatformBdsPolicyBehavior()                   [OvmfPkg/.../BdsPlatform.c]
        TryRunningQemuKernel()                       [OvmfPkg/.../QemuKernel.c]
        BdsLibConnectAll()           [IntelFrameworkModulePkg/.../BdsConnect.c]
        BdsLibEnumerateAllBootOption()  [IntelFrameworkModulePkg/.../BdsBoot.c]
        SetBootOrderFromQemu()                    [OvmfPkg/.../QemuBootOrder.c]

      According to the (preferred) "-device ...,bootindex=N" and the (legacy)
      '-boot order=drives' command line options, QEMU requests a boot order
      from the firmware through the "bootorder" fw_cfg file. (For a bootindex
      example, refer to the "Example qemu invocation" section.)

      This fw_cfg file consists of OpenFirmware (OFW) device paths -- note: not
      UEFI device paths! --, one per line. An example list is:

        /pci@i0cf8/scsi@4/disk@0,0
        /pci@i0cf8/ide@1,1/drive@1/disk@0
        /pci@i0cf8/ethernet@3/ethernet-phy@0

      OVMF filters and reorders the boot option list from bullet (b) with the
      following nested loops algorithm:

        new_uefi_order := <empty>
        for each qemu_ofw_path in QEMU's OpenFirmware device path list:
          qemu_uefi_path_prefix := translate(qemu_ofw_path)

          for each boot_option in current_uefi_order:
            full_boot_option := complete(boot_option)

            if match(qemu_uefi_path_prefix, full_boot_option):
              append(new_uefi_order, boot_option)
              break

        for each unmatched boot_option in current_uefi_order:
          if survives(boot_option):
            append(new_uefi_order, boot_option)

        current_uefi_order := new_uefi_order
6009e6
6009e6
      OVMF iterates over QEMU's OFW device paths in order, translates each to a
6009e6
      UEFI device path prefix, tries to match the translated prefix against the
6009e6
      UEFI boot options (which are completed from relative form to absolute
6009e6
      form for the purpose of prefix matching), and if there's a match, the
6009e6
      matching boot option is appended to the new boot order (which starts out
6009e6
      empty).
6009e6
6009e6
      (We elaborate on the translate() function under bullet (e). The
6009e6
      complete() function has been explained in bullet (c).)
6009e6
6009e6
      In addition, UEFI boot options that remain unmatched after filtering and
6009e6
      reordering are post-processed, and some of them "survive". Due to the
6009e6
      fact that OpenFirmware device paths have less expressive power than their
6009e6
      UEFI counterparts, some UEFI boot options are simply inexpressible (hence
6009e6
      unmatchable) by the nested loops algorithm.
6009e6
6009e6
      An important example is the memory-mapped UEFI shell, whose UEFI device
6009e6
      path is inexpressible by QEMU's OFW device paths:
6009e6
6009e6
        MemoryMapped(0xB,0x900000,0x10FFFFF)/
6009e6
        FvFile(7C04A583-9E3E-4F1C-AD65-E05268D0B4D1)
6009e6
6009e6
      (Side remark: notice that the address range visible in the MemoryMapped()
6009e6
      node corresponds to DXEFV under "comprehensive memory map of OVMF"! In
6009e6
      addition, the FvFile() node's GUID originates from the FILE_GUID entry of
6009e6
      "ShellPkg/Application/Shell/Shell.inf".)
6009e6
6009e6
      The UEFI shell can be booted by pressing ESC in OVMF on the TianoCore
6009e6
      splash screen, and navigating to Boot Manager | EFI Internal Shell. If
6009e6
      the "survival policy" were not implemented, the UEFI shell's boot option
6009e6
      would always be filtered out.
6009e6
6009e6
      The current "survival policy" preserves all boot options that start with
6009e6
      neither PciRoot() nor HD().
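
      The filtering, reordering and survival steps described above can be
      sketched in Python as follows. This is a minimal model under stated
      assumptions, not OVMF's C implementation: translate() is replaced by a
      toy lookup table, complete() only knows one hypothetical disk, and the
      boot option strings are abbreviated ("<guid>" stands for a partition
      signature that we don't spell out).

```python
def reorder_boot_options(qemu_ofw_paths, current_uefi_order,
                         translate, complete, survives):
    """Return the new boot order, following the nested loops in the text."""
    new_uefi_order = []
    matched = set()  # positions of options picked by the nested loops

    for qemu_ofw_path in qemu_ofw_paths:
        prefix = translate(qemu_ofw_path)
        for position, boot_option in enumerate(current_uefi_order):
            if complete(boot_option).startswith(prefix):
                new_uefi_order.append(boot_option)
                matched.add(position)
                break  # first match wins; continue with the next OFW path

    # Post-processing: unmatched options may still "survive".
    for position, boot_option in enumerate(current_uefi_order):
        if position not in matched and survives(boot_option):
            new_uefi_order.append(boot_option)
    return new_uefi_order


def survives(boot_option):
    """Survival policy: keep options starting with neither PciRoot() nor HD()."""
    return not boot_option.startswith(("PciRoot(", "HD("))


# Toy stand-ins for translate() and complete(); all paths are illustrative.
OFW_TO_UEFI = {
    "/pci@i0cf8/ethernet@3/ethernet-phy@0": "PciRoot(0x0)/Pci(0x3,0x0)",
    "/pci@i0cf8/scsi@4/disk@0,0": "PciRoot(0x0)/Pci(0x4,0x0)/HD(",
}


def complete(boot_option):
    # Assumption: every relative HD() option lives on the disk in PCI slot 4.
    if boot_option.startswith("HD("):
        return "PciRoot(0x0)/Pci(0x4,0x0)/" + boot_option
    return boot_option


SHELL = ("MemoryMapped(0xB,0x900000,0x10FFFFF)/"
         "FvFile(7C04A583-9E3E-4F1C-AD65-E05268D0B4D1)")
CDROM = "PciRoot(0x0)/Pci(0x1,0x1)/Ata(Secondary,Master,0x0)"
FEDORA = "HD(1,GPT,<guid>)/\\EFI\\fedora\\shim.efi"
NIC = "PciRoot(0x0)/Pci(0x3,0x0)/MAC(525400123456,0x1)"

new_order = reorder_boot_options(
    ["/pci@i0cf8/ethernet@3/ethernet-phy@0", "/pci@i0cf8/scsi@4/disk@0,0"],
    [SHELL, CDROM, FEDORA, NIC],
    OFW_TO_UEFI.__getitem__, complete, survives)
print(new_order == [NIC, FEDORA, SHELL])  # True
```

      In this example the network card and the disk are moved to the front in
      QEMU's order, the CD-ROM option is filtered out because QEMU did not
      list it, and the UEFI shell survives at the end of the list.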
6009e6
6009e6
    (e) Translating QEMU's OpenFirmware device paths to UEFI device path
6009e6
        prefixes
6009e6
6009e6
      In this section we list the (strictly heuristic) mappings currently
6009e6
      performed by OVMF.
6009e6
6009e6
      The "prefix only" nature of the translation output stems, at the very
      least, from the fact that QEMU's OpenFirmware device paths cannot carry
      pathnames within filesystems. There's no way to specify e.g.
6009e6
6009e6
        \EFI\fedora\shim.efi
6009e6
6009e6
      in an OFW device path, therefore a UEFI device path translated from an
6009e6
      OFW device path can at best be a prefix (not a full match) of a UEFI
6009e6
      device path that ends with "\EFI\fedora\shim.efi".
6009e6
6009e6
      - IDE disk, IDE CD-ROM:
6009e6
6009e6
        OpenFirmware device path:
6009e6
6009e6
          /pci@i0cf8/ide@1,1/drive@0/disk@0
6009e6
               ^         ^ ^       ^      ^
6009e6
               |         | |       |      master or slave
6009e6
               |         | |       primary or secondary
6009e6
               |         PCI slot & function holding IDE controller
6009e6
               PCI root at system bus port, PIO
6009e6
6009e6
        UEFI device path prefix:
6009e6
6009e6
          PciRoot(0x0)/Pci(0x1,0x1)/Ata(Primary,Master,0x0)
6009e6
                                                       ^
6009e6
                                                       fixed LUN
6009e6
6009e6
      - Floppy disk:
6009e6
6009e6
        OpenFirmware device path:
6009e6
6009e6
          /pci@i0cf8/isa@1/fdc@03f0/floppy@0
6009e6
               ^         ^     ^           ^
6009e6
               |         |     |           A: or B:
6009e6
               |         |     ISA controller io-port (hex)
6009e6
               |         PCI slot holding ISA controller
6009e6
               PCI root at system bus port, PIO
6009e6
6009e6
        UEFI device path prefix:
6009e6
6009e6
          PciRoot(0x0)/Pci(0x1,0x0)/Floppy(0x0)
6009e6
                                           ^
6009e6
                                           ACPI UID (A: or B:)
6009e6
6009e6
      - Virtio-block disk:
6009e6
6009e6
        OpenFirmware device path:
6009e6
6009e6
          /pci@i0cf8/scsi@6[,3]/disk@0,0
6009e6
               ^          ^  ^       ^ ^
6009e6
               |          |  |       fixed
6009e6
               |          |  PCI function corresponding to disk (optional)
6009e6
               |          PCI slot holding disk
6009e6
               PCI root at system bus port, PIO
6009e6
6009e6
        UEFI device path prefixes (dependent on the presence of a nonzero PCI
6009e6
        function in the OFW device path):
6009e6
6009e6
          PciRoot(0x0)/Pci(0x6,0x0)/HD(
6009e6
          PciRoot(0x0)/Pci(0x6,0x3)/HD(
6009e6
6009e6
      - Virtio-scsi disk and virtio-scsi passthrough:
6009e6
6009e6
        OpenFirmware device path:
6009e6
6009e6
          /pci@i0cf8/scsi@7[,3]/channel@0/disk@2,3
6009e6
               ^          ^             ^      ^ ^
6009e6
               |          |             |      | LUN
6009e6
               |          |             |      target
6009e6
               |          |             channel (unused, fixed 0)
6009e6
               |          PCI slot[, function] holding SCSI controller
6009e6
               PCI root at system bus port, PIO
6009e6
6009e6
        UEFI device path prefixes (dependent on the presence of a nonzero PCI
6009e6
        function in the OFW device path):
6009e6
6009e6
          PciRoot(0x0)/Pci(0x7,0x0)/Scsi(0x2,0x3)
6009e6
          PciRoot(0x0)/Pci(0x7,0x3)/Scsi(0x2,0x3)
6009e6
6009e6
      - Emulated and passed-through (physical) network cards:
6009e6
6009e6
        OpenFirmware device path:
6009e6
6009e6
          /pci@i0cf8/ethernet@3[,2]
6009e6
               ^              ^
6009e6
               |              PCI slot[, function] holding Ethernet card
6009e6
               PCI root at system bus port, PIO
6009e6
6009e6
        UEFI device path prefixes (dependent on the presence of a nonzero PCI
6009e6
        function in the OFW device path):
6009e6
6009e6
          PciRoot(0x0)/Pci(0x3,0x0)
6009e6
          PciRoot(0x0)/Pci(0x3,0x2)
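
      The mappings listed above can be condensed into a short Python sketch.
      It is only a model of the translation rules as this section describes
      them; the function names are ours, error handling is minimal, and the
      real translation code in OVMF is written in C and is more defensive.
      Unit addresses are parsed as hexadecimal, as is customary for
      OpenFirmware device paths.

```python
def _pci_node(unit_address):
    """Map an OFW unit address ('6' or '6,3', hexadecimal) to a Pci() node."""
    slot, _, function = unit_address.partition(",")
    return "Pci(0x%X,0x%X)" % (int(slot, 16), int(function or "0", 16))


def translate(ofw_path):
    """Translate one of the OFW path shapes listed above to a UEFI prefix."""
    # Split "/name@unit/name@unit/..." into (name, unit) pairs.
    nodes = [tuple(node.partition("@")[::2])
             for node in ofw_path.strip("/").split("/")]
    if len(nodes) < 2 or nodes[0] != ("pci", "i0cf8"):
        raise ValueError("unsupported OpenFirmware path: " + ofw_path)

    (name, unit), rest = nodes[1], nodes[2:]
    prefix = "PciRoot(0x0)/" + _pci_node(unit)
    rest_names = [node_name for node_name, _ in rest]

    if name == "ide" and rest_names == ["drive", "disk"]:
        channel = ("Primary", "Secondary")[int(rest[0][1], 16)]
        device = ("Master", "Slave")[int(rest[1][1], 16)]
        return "%s/Ata(%s,%s,0x0)" % (prefix, channel, device)  # fixed LUN
    if name == "isa" and rest_names == ["fdc", "floppy"]:
        return "%s/Floppy(0x%X)" % (prefix, int(rest[1][1], 16))
    if name == "scsi" and rest_names == ["disk"]:              # virtio-block
        return prefix + "/HD("
    if name == "scsi" and rest_names == ["channel", "disk"]:   # virtio-scsi
        target, _, lun = rest[1][1].partition(",")
        return "%s/Scsi(0x%X,0x%X)" % (prefix, int(target, 16), int(lun, 16))
    if name == "ethernet":                                     # network card
        return prefix
    raise ValueError("unsupported OpenFirmware path: " + ofw_path)


print(translate("/pci@i0cf8/ide@1,1/drive@0/disk@0"))
print(translate("/pci@i0cf8/scsi@7,3/channel@0/disk@2,3"))
```

      Applied to the examples of this section, the sketch reproduces the UEFI
      device path prefixes shown for each device type.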
6009e6
6009e6
Virtio drivers
6009e6
..............
6009e6
6009e6
UEFI abstracts various types of hardware resources into protocols, and allows
6009e6
firmware developers to implement those protocols in device drivers. The Virtio
6009e6
Specification defines various types of virtual hardware for virtual machines.
6009e6
Connecting the two specifications, OVMF provides UEFI drivers for QEMU's
6009e6
virtio-block, virtio-scsi, and virtio-net devices.
6009e6
6009e6
The following diagram presents the protocol and driver stack related to Virtio
6009e6
devices in edk2 and OVMF. Each node in the graph identifies a protocol and/or
6009e6
the edk2 driver that produces it. Nodes on the top are more abstract.
6009e6
6009e6
  EFI_BLOCK_IO_PROTOCOL                             EFI_SIMPLE_NETWORK_PROTOCOL
6009e6
  [OvmfPkg/VirtioBlkDxe]                              [OvmfPkg/VirtioNetDxe]
6009e6
             |                                                   |
6009e6
             |         EFI_EXT_SCSI_PASS_THRU_PROTOCOL           |
6009e6
             |             [OvmfPkg/VirtioScsiDxe]               |
6009e6
             |                        |                          |
6009e6
             +------------------------+--------------------------+
6009e6
                                      |
6009e6
                           VIRTIO_DEVICE_PROTOCOL
6009e6
                                      |
6009e6
                +---------------------+---------------------+
6009e6
                |                                           |
6009e6
  [OvmfPkg/VirtioPciDeviceDxe]                  [custom platform drivers]
6009e6
                |                                           |
6009e6
                |                                           |
6009e6
       EFI_PCI_IO_PROTOCOL                [OvmfPkg/Library/VirtioMmioDeviceLib]
6009e6
 [MdeModulePkg/Bus/Pci/PciBusDxe]              direct MMIO register access
6009e6
6009e6
The top three drivers produce standard UEFI abstractions: the Block IO
6009e6
Protocol, the Extended SCSI Pass Thru Protocol, and the Simple Network
6009e6
Protocol, for virtio-block, virtio-scsi, and virtio-net devices, respectively.
6009e6
6009e6
Comparing these device-specific virtio drivers to each other, we can observe:
6009e6
6009e6
- They all conform to the UEFI Driver Model. This means that their entry point
6009e6
  functions don't immediately start to search for devices and to drive them;
6009e6
  they only register instances of the EFI_DRIVER_BINDING_PROTOCOL. The UEFI
6009e6
  Driver Model then enumerates devices and chains matching drivers
6009e6
  automatically.
6009e6
6009e6
- They are as minimal as possible, while remaining correct (refer to source
6009e6
  code comments for details). For example, VirtioBlkDxe and VirtioScsiDxe both
6009e6
  support only one request in flight.
6009e6
6009e6
  In theory, VirtioBlkDxe could implement EFI_BLOCK_IO2_PROTOCOL, which allows
  queueing, but it currently does not. Similarly, VirtioScsiDxe does not
  support the non-blocking mode of EFI_EXT_SCSI_PASS_THRU_PROTOCOL.PassThru()
  (an omission that the UEFI specification permits). Both VirtioBlkDxe and
  VirtioScsiDxe delegate synchronous request handling to
  "OvmfPkg/Library/VirtioLib". This limitation helps keep the implementation
  simple, and testing thus far suggests satisfactory performance for a virtual
  boot firmware.
6009e6
6009e6
  VirtioNetDxe cannot avoid queueing, because EFI_SIMPLE_NETWORK_PROTOCOL
6009e6
  requires it on the interface level. Consequently, VirtioNetDxe is
6009e6
  significantly more complex than VirtioBlkDxe and VirtioScsiDxe. Technical
6009e6
  notes are provided in "OvmfPkg/VirtioNetDxe/TechNotes.txt".
6009e6
6009e6
- None of these drivers access hardware directly. Instead, the Virtio Device
6009e6
  Protocol (OvmfPkg/Include/Protocol/VirtioDevice.h) collects / extracts virtio
6009e6
  operations defined in the Virtio Specification, and these backend-independent
6009e6
  virtio device drivers go through the abstract VIRTIO_DEVICE_PROTOCOL.
6009e6
6009e6
  IMPORTANT: the VIRTIO_DEVICE_PROTOCOL is not a standard UEFI protocol. It is
6009e6
  internal to edk2 and not described in the UEFI specification. It should only
6009e6
  be used by drivers and applications that live inside the edk2 source tree.
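
The division of labor described in the list above (entry points that only
register a driver binding, and device-specific drivers that probe through the
abstract Virtio Device Protocol) can be illustrated with a toy Python model.
Everything in it is a simplification of ours: handles are plain dictionaries,
and the class and function names are hypothetical. Only the subsystem device
IDs (1 for network, 2 for block, 8 for SCSI) are taken from the Virtio
Specification.

```python
# Virtio subsystem device IDs as assigned by the Virtio Specification.
VIRTIO_SUBSYSTEM_NET, VIRTIO_SUBSYSTEM_BLOCK, VIRTIO_SUBSYSTEM_SCSI = 1, 2, 8


class DriverBinding:
    """Toy EFI_DRIVER_BINDING_PROTOCOL instance for one virtio device type."""

    def __init__(self, driver_name, subsystem_id, produced_protocol):
        self.driver_name = driver_name
        self.subsystem_id = subsystem_id
        self.produced_protocol = produced_protocol

    def supported(self, protocols):
        # Probe through the abstract virtio interface only, never the hardware.
        virtio = protocols.get("VIRTIO_DEVICE_PROTOCOL")
        if virtio is None:
            return False
        return virtio["subsystem_id"] == self.subsystem_id

    def start(self, protocols):
        # Produce the standard UEFI abstraction on top of the virtio device.
        protocols[self.produced_protocol] = self.driver_name


registered_bindings = []


def driver_entry_point(binding):
    """The entry point touches no device; it only registers the binding."""
    registered_bindings.append(binding)


def connect_controller(protocols):
    """Toy version of the chaining performed by the UEFI Driver Model."""
    for binding in registered_bindings:
        if binding.supported(protocols):
            binding.start(protocols)


driver_entry_point(DriverBinding("VirtioBlkDxe", VIRTIO_SUBSYSTEM_BLOCK,
                                 "EFI_BLOCK_IO_PROTOCOL"))
driver_entry_point(DriverBinding("VirtioScsiDxe", VIRTIO_SUBSYSTEM_SCSI,
                                 "EFI_EXT_SCSI_PASS_THRU_PROTOCOL"))
driver_entry_point(DriverBinding("VirtioNetDxe", VIRTIO_SUBSYSTEM_NET,
                                 "EFI_SIMPLE_NETWORK_PROTOCOL"))

# A handle on which some backend already produced the Virtio Device Protocol.
disk_handle = {"VIRTIO_DEVICE_PROTOCOL": {"subsystem_id": VIRTIO_SUBSYSTEM_BLOCK}}
connect_controller(disk_handle)
print(sorted(disk_handle))
```

After connect_controller() returns, the handle of the virtio-block device
carries the Block IO Protocol in addition to the Virtio Device Protocol, while
the SCSI and network bindings have declined it.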
6009e6
6009e6
Currently two providers exist for VIRTIO_DEVICE_PROTOCOL:
6009e6
6009e6
- The first one is the "more traditional" virtio-pci backend, implemented by
6009e6
  OvmfPkg/VirtioPciDeviceDxe. This driver also complies with the UEFI Driver
6009e6
  Model. It consumes an instance of the EFI_PCI_IO_PROTOCOL, and, if the PCI
6009e6
  device/function under probing appears to be a virtio device, it produces a
6009e6
  Virtio Device Protocol instance for it. The driver translates abstract virtio
6009e6
  operations to PCI accesses.
6009e6
6009e6
- The second provider, the virtio-mmio backend, is a library, not a driver,
6009e6
  living in OvmfPkg/Library/VirtioMmioDeviceLib. This library translates
6009e6
  abstract virtio operations to MMIO accesses.
6009e6
6009e6
  The virtio-mmio backend is only a library -- rather than a standalone, UEFI
6009e6
  Driver Model-compliant driver -- because the type of resource it consumes, an
6009e6
  MMIO register block base address, is not enumerable.
6009e6
6009e6
  In other words, while the PCI root bridge driver and the PCI bus driver
6009e6
  produce instances of EFI_PCI_IO_PROTOCOL automatically, thereby enabling the
6009e6
  UEFI Driver Model to probe devices and stack up drivers automatically, no
6009e6
  such enumeration exists for MMIO register blocks.
6009e6
6009e6
  For this reason, VirtioMmioDeviceLib needs to be linked into thin, custom
6009e6
  platform drivers that possess this kind of information. As soon as a
6009e6
  driver knows about the MMIO register block base addresses, it can pass each
6009e6
  to the library, and then the VIRTIO_DEVICE_PROTOCOL will be instantiated
6009e6
  (assuming a valid virtio-mmio register block of course). From that point on
6009e6
  the UEFI Driver Model again takes care of the chaining.
6009e6
6009e6
  Typically, such a custom driver does not conform to the UEFI Driver Model
6009e6
  (because that would presuppose auto-enumeration for MMIO register blocks).
6009e6
  Hence it has the following responsibilities:
6009e6
6009e6
  - it shall behave as a "wrapper" UEFI driver around the library,
6009e6
6009e6
  - it shall know virtio-mmio base addresses,
6009e6
6009e6
  - in its entry point function, it shall create a new UEFI handle with an
6009e6
    instance of the EFI_DEVICE_PATH_PROTOCOL for each virtio-mmio device it
6009e6
    knows the base address for,
6009e6
6009e6
  - it shall call VirtioMmioInstallDevice() on those handles, with the
6009e6
    corresponding base addresses.
6009e6
6009e6
  OVMF itself does not employ VirtioMmioDeviceLib. However, the library is used
6009e6
  (or has been tested as a proof of concept) in the following 64-bit and 32-bit
6009e6
  ARM emulator setups:
6009e6
6009e6
  - in "RTSM_VE_FOUNDATIONV8_EFI.fd" and "FVP_AARCH64_EFI.fd", on ARM Holdings'
6009e6
    ARM(R) v8-A Foundation Model and ARM(R) AEMv8-A Base Platform FVP
6009e6
    emulators, respectively:
6009e6
6009e6
                           EFI_BLOCK_IO_PROTOCOL
6009e6
                           [OvmfPkg/VirtioBlkDxe]
6009e6
                                      |
6009e6
                           VIRTIO_DEVICE_PROTOCOL
6009e6
        [ArmPlatformPkg/ArmVExpressPkg/ArmVExpressDxe/ArmFvpDxe.inf]
6009e6
                                      |
6009e6
                    [OvmfPkg/Library/VirtioMmioDeviceLib]
6009e6
                         direct MMIO register access
6009e6
6009e6
  - in "RTSM_VE_CORTEX-A15_EFI.fd" and "RTSM_VE_CORTEX-A15_MPCORE_EFI.fd", on
6009e6
    "qemu-system-arm -M vexpress-a15":
6009e6
6009e6
        EFI_BLOCK_IO_PROTOCOL            EFI_SIMPLE_NETWORK_PROTOCOL
6009e6
        [OvmfPkg/VirtioBlkDxe]             [OvmfPkg/VirtioNetDxe]
6009e6
                   |                                  |
6009e6
                   +------------------+---------------+
6009e6
                                      |
6009e6
                           VIRTIO_DEVICE_PROTOCOL
6009e6
        [ArmPlatformPkg/ArmVExpressPkg/ArmVExpressDxe/ArmFvpDxe.inf]
6009e6
                                      |
6009e6
                    [OvmfPkg/Library/VirtioMmioDeviceLib]
6009e6
                         direct MMIO register access
6009e6
6009e6
  In the above ARM / VirtioMmioDeviceLib configurations, VirtioBlkDxe was
6009e6
  tested with booting Linux distributions, while VirtioNetDxe was tested with
6009e6
  pinging public IPv4 addresses from the UEFI shell.
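
The responsibilities of a custom wrapper driver around VirtioMmioDeviceLib,
as listed above, can be modeled in a few lines of Python. This is a sketch
only: the handle database is a dictionary, virtio_mmio_install_device() is a
toy counterpart of the library's VirtioMmioInstallDevice() function rather
than its actual C interface, and both the base addresses and the device path
strings are placeholders that we made up for illustration.

```python
class HandleDatabase:
    """Toy handle database: handle number -> {protocol name: interface}."""

    def __init__(self):
        self.handles = {}

    def create_handle(self, protocols):
        handle = len(self.handles) + 1
        self.handles[handle] = dict(protocols)
        return handle


def virtio_mmio_install_device(database, base_address, handle):
    """Toy counterpart of VirtioMmioInstallDevice(): bind a register block."""
    database.handles[handle]["VIRTIO_DEVICE_PROTOCOL"] = {
        "backend": "virtio-mmio",
        "base_address": base_address,
    }


def platform_driver_entry_point(database, known_base_addresses):
    """What a thin wrapper driver does, once, in its entry point."""
    for base_address in known_base_addresses:
        # One new handle per device, carrying a (placeholder) device path.
        handle = database.create_handle(
            {"EFI_DEVICE_PATH_PROTOCOL": "device-path-for-0x%X" % base_address})
        virtio_mmio_install_device(database, base_address, handle)


def handles_supporting(database, protocol):
    """What the Driver Model later relies on: handles exposing a protocol."""
    return [handle for handle, protocols in database.handles.items()
            if protocol in protocols]


database = HandleDatabase()
platform_driver_entry_point(database, [0x1C130000, 0x1C140000])  # made-up bases
virtio_handles = handles_supporting(database, "VIRTIO_DEVICE_PROTOCOL")
print(virtio_handles)
```

From this point on, the device-specific virtio drivers can bind the new
handles exactly as they would bind handles produced by the virtio-pci backend.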
6009e6
6009e6
Platform Driver
6009e6
...............
6009e6
6009e6
Sometimes, elements of persistent firmware configuration are best exposed to
6009e6
the user in a friendly way. OVMF's platform driver (OvmfPkg/PlatformDxe)
6009e6
presents such settings on the "OVMF Platform Configuration" dialog:
6009e6
6009e6
- Press ESC on the TianoCore splash screen,
6009e6
- Navigate to Device Manager | OVMF Platform Configuration.
6009e6
6009e6
At the moment, OVMF's platform driver handles only one setting: the preferred
6009e6
graphics resolution. This is useful for two purposes:
6009e6
6009e6
- Some UEFI shell commands, like DRIVERS and DEVICES, benefit from a wide
6009e6
  display. Using the MODE shell command, the user can switch to a larger text
6009e6
  resolution (limited by the graphics resolution), and see the command output
6009e6
  in a more easily consumable way.
6009e6
6009e6
  [RHEL] The list of text modes available to the MODE command is also limited
6009e6
         by ConSplitterDxe (found under MdeModulePkg/Universal/Console).
6009e6
         ConSplitterDxe builds an intersection of text modes that are
6009e6
         simultaneously supported by all consoles that ConSplitterDxe
6009e6
         multiplexes console output to.
6009e6
6009e6
         In practice, the strongest text mode restriction comes from
6009e6
         TerminalDxe, which provides console I/O on serial ports. TerminalDxe
6009e6
         has a very limited built-in list of text modes, heavily pruning the
6009e6
         intersection built by ConSplitterDxe, and made available to the MODE
6009e6
         command.
6009e6
6009e6
         On the Red Hat Enterprise Linux 7.1 host, TerminalDxe's list of modes
6009e6
         has been extended with text resolutions that match the Spice QXL GPU's
6009e6
         common graphics resolutions. This way a "full screen" text mode should
6009e6
         always be available in the MODE command.
6009e6
6009e6
- The other advantage of controlling the graphics resolution lies with UEFI
6009e6
  operating systems that don't (yet) have a native driver for QEMU's virtual
6009e6
  video cards -- e.g. the Spice QXL GPU. Such OSes may choose to inherit the
6009e6
  properties of OVMF's EFI_GRAPHICS_OUTPUT_PROTOCOL (provided by
6009e6
  OvmfPkg/QemuVideoDxe, see later).
6009e6
6009e6
  Although the display can be used at runtime in such cases, by direct
6009e6
  framebuffer access, its properties (for example, the resolution) cannot be
6009e6
  modified. The platform driver allows the user to select the preferred GOP
6009e6
  resolution, reboot, and let the guest OS inherit that preferred resolution.
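
The text mode restriction mentioned for the first purpose boils down to a set
intersection. The following Python sketch illustrates the idea; the mode lists
are invented for the example and are not the actual tables of
GraphicsConsoleDxe or TerminalDxe.

```python
def common_text_modes(console_modes):
    """Return the (columns, rows) modes supported by every console."""
    mode_sets = [set(modes) for modes in console_modes.values()]
    if not mode_sets:
        return []
    return sorted(set.intersection(*mode_sets))


# Illustrative mode lists only.
console_modes = {
    "graphics console": [(80, 25), (80, 50), (100, 31), (128, 40)],
    "serial terminal": [(80, 25), (80, 50), (100, 31)],
}
available = common_text_modes(console_modes)

# Extending the terminal's list makes the wider mode available as well.
console_modes["serial terminal"].append((128, 40))
available_extended = common_text_modes(console_modes)
print(available, available_extended)
```

The console with the shortest mode list determines what the MODE command can
offer, which is why extending the list of the serial terminal driver is
sufficient to unlock a wider text mode.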
6009e6
6009e6
The platform driver has three access points: the "normal" driver entry point, a
6009e6
set of HII callbacks, and a GOP installation callback.
6009e6
6009e6
(1) Driver entry point: the PlatformInit() function.
6009e6
6009e6
    (a) First, this function loads any available settings, and makes them take
6009e6
        effect. For the preferred graphics resolution in particular, this means
6009e6
        setting the following PCDs:
6009e6
6009e6
          gEfiMdeModulePkgTokenSpaceGuid.PcdVideoHorizontalResolution
6009e6
          gEfiMdeModulePkgTokenSpaceGuid.PcdVideoVerticalResolution
6009e6
6009e6
        These PCDs influence the GraphicsConsoleDxe driver (located under
6009e6
        MdeModulePkg/Universal/Console), which switches to the preferred
6009e6
        graphics mode, and produces EFI_SIMPLE_TEXT_OUTPUT_PROTOCOLs on GOPs:
6009e6
6009e6
                    EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL
6009e6
          [MdeModulePkg/Universal/Console/GraphicsConsoleDxe]
6009e6
                                   |
6009e6
                      EFI_GRAPHICS_OUTPUT_PROTOCOL
6009e6
                         [OvmfPkg/QemuVideoDxe]
6009e6
                                   |
6009e6
                          EFI_PCI_IO_PROTOCOL
6009e6
                   [MdeModulePkg/Bus/Pci/PciBusDxe]
6009e6
6009e6
    (b) Second, the driver entry point registers the user interface, including
        HII callbacks.

    (c) Third, the driver entry point registers a GOP installation callback.
6009e6
6009e6
(2) HII callbacks and the user interface.
6009e6
6009e6
    The Human Interface Infrastructure (HII) "is a set of protocols that allow
6009e6
    a UEFI driver to provide the ability to register user interface and
6009e6
    configuration content with the platform firmware".
6009e6
6009e6
    OVMF's platform driver:
6009e6
6009e6
    - provides a static, basic, visual form (PlatformForms.vfr), written in the
6009e6
      Visual Forms Representation language,
6009e6
6009e6
    - includes a UCS-2 encoded message catalog (Platform.uni),
6009e6
6009e6
    - includes source code that dynamically populates parts of the form, with
6009e6
      the help of MdeModulePkg/Library/UefiHiiLib -- this library simplifies
6009e6
      the handling of IFR (Internal Forms Representation) opcodes,
6009e6
6009e6
    - processes form actions that the user takes (Callback() function),
6009e6
6009e6
    - loads and saves platform configuration in a private, non-volatile
6009e6
      variable (ExtractConfig() and RouteConfig() functions).
6009e6
6009e6
    The ExtractConfig() HII callback implements the following stack of
6009e6
    conversions, for loading configuration and presenting it to the user:
6009e6
6009e6
          MultiConfigAltResp       -- form engine / HII communication
6009e6
                  ^
6009e6
                  |
6009e6
           [BlockToConfig]
6009e6
                  |
6009e6
           MAIN_FORM_STATE         -- binary representation of form/widget
6009e6
                  ^                   state
6009e6
                  |
6009e6
      [PlatformConfigToFormState]
6009e6
                  |
6009e6
           PLATFORM_CONFIG         -- accessible to DXE and UEFI drivers
6009e6
                  ^
6009e6
                  |
6009e6
         [PlatformConfigLoad]
6009e6
                  |
6009e6
        UEFI non-volatile variable -- accessible to external utilities
6009e6
6009e6
    The layers are very similar for the reverse direction, i.e. when taking
6009e6
    input from the user, and saving the configuration (RouteConfig() HII
6009e6
    callback):
6009e6
6009e6
             ConfigResp            -- form engine / HII communication
6009e6
                  |
6009e6
           [ConfigToBlock]
6009e6
                  |
6009e6
                  v
6009e6
           MAIN_FORM_STATE         -- binary representation of form/widget
6009e6
                  |                   state
6009e6
      [FormStateToPlatformConfig]
6009e6
                  |
6009e6
                  v
6009e6
           PLATFORM_CONFIG         -- accessible to DXE and UEFI drivers
6009e6
                  |
6009e6
         [PlatformConfigSave]
6009e6
                  |
6009e6
                  v
6009e6
        UEFI non-volatile variable -- accessible to external utilities
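
    The two conversion stacks can be mimicked with a small Python model. It
    is a sketch under loud assumptions: the variable is taken to hold two
    little-endian 32-bit integers (horizontal and vertical resolution), the
    form state is reduced to an index into a hypothetical mode list, and the
    "NEXT_RESOLUTION=<index>" string is a stand-in of ours for the real HII
    configuration string format, which is considerably richer. Only the
    layering follows the diagrams above.

```python
import struct
from collections import namedtuple

PlatformConfig = namedtuple("PlatformConfig", "horizontal vertical")
MODES = [(640, 480), (800, 600), (1024, 768)]  # hypothetical drop-down entries


def platform_config_load(variable):
    """Variable contents -> PLATFORM_CONFIG (assumed layout: two UINT32s)."""
    return PlatformConfig(*struct.unpack("<II", variable))


def platform_config_save(config):
    """PLATFORM_CONFIG -> variable contents."""
    return struct.pack("<II", config.horizontal, config.vertical)


def platform_config_to_form_state(config):
    """PLATFORM_CONFIG -> form state (index of the selected resolution)."""
    return {"next_resolution": MODES.index((config.horizontal, config.vertical))}


def form_state_to_platform_config(state):
    """Form state -> PLATFORM_CONFIG."""
    return PlatformConfig(*MODES[state["next_resolution"]])


def block_to_config(state):
    """Form state -> simplified configuration string."""
    return "NEXT_RESOLUTION=%d" % state["next_resolution"]


def config_to_block(config_string):
    """Simplified configuration string -> form state."""
    key, _, value = config_string.partition("=")
    if key != "NEXT_RESOLUTION":
        raise ValueError("unexpected configuration string: " + config_string)
    return {"next_resolution": int(value)}


def extract_config(variable):
    """Load direction: variable -> config -> form state -> string."""
    return block_to_config(
        platform_config_to_form_state(platform_config_load(variable)))


def route_config(config_string):
    """Save direction: string -> form state -> config -> variable."""
    return platform_config_save(
        form_state_to_platform_config(config_to_block(config_string)))


stored = struct.pack("<II", 1024, 768)
shown = extract_config(stored)             # what the form engine receives
saved = route_config("NEXT_RESOLUTION=1")  # the user picked the second entry
print(shown, struct.unpack("<II", saved))
```

    Keeping each conversion in its own function mirrors the layering of the
    diagrams: every layer can be exercised separately, and the two directions
    are inverses of each other.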
6009e6
6009e6
(3) When the platform driver starts, a GOP may not be available yet. Thus the
6009e6
    driver entry point registers a callback (the GopInstalled() function) for
6009e6
    GOP installations.
6009e6
6009e6
    When the first GOP is produced (usually by QemuVideoDxe, or potentially by
6009e6
    a third party video driver), PlatformDxe retrieves the list of graphics
6009e6
    modes the GOP supports, and dynamically populates the drop-down list of
6009e6
    available resolutions on the form. The GOP installation callback is then
6009e6
    removed.
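
    The one-shot nature of the GOP installation callback can be illustrated
    with the following Python sketch. The notifier class is a toy replacement
    of ours for the firmware's protocol installation notifications, and the
    class and method names are hypothetical; only the behavior (populate the
    list on the first GOP, then remove the callback) follows the description
    above.

```python
class ProtocolNotifier:
    """Toy stand-in for protocol installation notifications."""

    def __init__(self):
        self.callbacks = []

    def register(self, callback):
        self.callbacks.append(callback)

    def unregister(self, callback):
        self.callbacks.remove(callback)

    def install_gop(self, modes):
        # Iterate over a copy, because callbacks may unregister themselves.
        for callback in list(self.callbacks):
            callback(modes)


class PlatformForm:
    """Holds the drop-down list of resolutions shown on the form."""

    def __init__(self, notifier):
        self.notifier = notifier
        self.resolutions = []  # empty until the first GOP appears
        notifier.register(self.gop_installed)

    def gop_installed(self, modes):
        self.resolutions = ["%dx%d" % mode for mode in modes]
        self.notifier.unregister(self.gop_installed)  # one-shot callback


notifier = ProtocolNotifier()
form = PlatformForm(notifier)
notifier.install_gop([(640, 480), (800, 600), (1024, 768)])  # first GOP
notifier.install_gop([(1280, 1024)])                         # later GOP
print(form.resolutions)
```

    The second installation leaves the list untouched, because the callback
    has already removed itself after serving the first GOP.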
6009e6
6009e6
Video driver
6009e6
............
6009e6
6009e6
OvmfPkg/QemuVideoDxe is OVMF's built-in video driver. We can divide its
6009e6
services into two parts: graphics output protocol (primary), and Int10h (VBE)
6009e6
shim (secondary).
6009e6
6009e6
(1) QemuVideoDxe conforms to the UEFI Driver Model; it produces an instance of
6009e6
    the EFI_GRAPHICS_OUTPUT_PROTOCOL (GOP) on each PCI display that it supports
6009e6
    and is connected to:
6009e6
6009e6
                      EFI_GRAPHICS_OUTPUT_PROTOCOL
6009e6
                         [OvmfPkg/QemuVideoDxe]
6009e6
                                   |
6009e6
                          EFI_PCI_IO_PROTOCOL
6009e6
                   [MdeModulePkg/Bus/Pci/PciBusDxe]
6009e6
6009e6
    It supports the following QEMU video cards:
6009e6
6009e6
    - Cirrus 5430 ("-device cirrus-vga"),
6009e6
    - Standard VGA ("-device VGA"),
6009e6
    - QXL VGA ("-device qxl-vga", "-device qxl").
6009e6
6009e6
    For Cirrus the following resolutions and color depths are available:
6009e6
    640x480x32, 800x600x32, 1024x768x24. On stdvga and QXL a long list of
6009e6
    resolutions is available. The list is filtered against the frame buffer
6009e6
    size during initialization.
6009e6
6009e6
    The size of the QXL VGA compatibility framebuffer can be changed with the
6009e6
6009e6
      -device qxl-vga,vgamem_mb=$NUM_MB
6009e6
6009e6
    QEMU option. If $NUM_MB exceeds 32, then the following is necessary
6009e6
    instead:
6009e6
6009e6
      -device qxl-vga,vgamem_mb=$NUM_MB,ram_size_mb=$((NUM_MB*2))
6009e6
6009e6
    because the compatibility framebuffer can't cover more than half of PCI BAR
6009e6
    #0. The latter defaults to 64MB in size, and is controlled by the
6009e6
    "ram_size_mb" property.
6009e6
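
    The two rules of thumb above (modes are filtered against the frame buffer
    size, and the QXL compatibility framebuffer may cover at most half of PCI
    BAR #0) can be written down as a short Python sketch. The helper names
    are ours, and the filtering criterion (width times height times bytes per
    pixel must fit into the frame buffer) is our simplifying assumption about
    what the driver checks.

```python
def modes_that_fit(modes, framebuffer_bytes):
    """Keep the (width, height, bits per pixel) modes that fit the buffer."""
    return [(width, height, bpp) for width, height, bpp in modes
            if width * height * (bpp // 8) <= framebuffer_bytes]


def qxl_device_option(vgamem_mb):
    """Build the '-device' argument for a given framebuffer size in MB."""
    if vgamem_mb <= 32:
        # Fits into half of the default 64MB BAR #0.
        return "qxl-vga,vgamem_mb=%d" % vgamem_mb
    # Otherwise grow BAR #0 so that the framebuffer covers exactly half of it.
    return "qxl-vga,vgamem_mb=%d,ram_size_mb=%d" % (vgamem_mb, vgamem_mb * 2)


candidate_modes = [(1024, 768, 32), (1920, 1080, 32), (2560, 1600, 32)]
fit_8mb = modes_that_fit(candidate_modes, 8 * 1024 * 1024)
fit_16mb = modes_that_fit(candidate_modes, 16 * 1024 * 1024)
print(fit_8mb, fit_16mb)
print("-device", qxl_device_option(64))
```

    With an 8MB frame buffer the 2560x1600x32 mode is dropped, since it needs
    16,384,000 bytes; doubling the frame buffer makes it available.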
6009e6
(2) When QemuVideoDxe binds the first Standard VGA or QXL VGA device, and there
6009e6
    is no real VGA BIOS present in the C to F segments (which could originate
6009e6
    from a legacy PCI option ROM -- refer to "Compatibility Support Module
6009e6
    (CSM)"), then QemuVideoDxe installs a minimal, "fake" VGA BIOS -- an Int10h
6009e6
    (VBE) "shim".
6009e6
6009e6
    The shim is implemented in 16-bit assembly in
6009e6
    "OvmfPkg/QemuVideoDxe/VbeShim.asm". The "VbeShim.sh" shell script assembles
6009e6
    it and formats it as a C array ("VbeShim.h") with the help of the "nasm"
6009e6
    utility. The driver's InstallVbeShim() function copies the shim in place
6009e6
    (the C segment), and fills in the VBE Info and VBE Mode Info structures.
6009e6
    The real-mode 10h interrupt vector is pointed to the shim's handler.
6009e6
6009e6
    The shim is (correctly) irrelevant and invisible for all UEFI operating
6009e6
    systems we know about -- except Windows Server 2008 R2 and other Windows
6009e6
    operating systems in that family.
6009e6
6009e6
    Namely, the Windows 2008 R2 SP1 (and Windows 7) UEFI guest's default video
6009e6
    driver dereferences the real mode Int10h vector, loads the pointed-to
6009e6
    handler code, and executes what it thinks to be VGA BIOS services in an
6009e6
    internal real-mode emulator. Consequently, video mode switching did not
6009e6
    work in Windows 2008 R2 SP1 when it ran on the "pure UEFI" build of OVMF,
6009e6
    making the guest uninstallable. Hence the (otherwise optional, non-default)
6009e6
    Compatibility Support Module (CSM) ended up a requirement for running such
6009e6
    guests.
6009e6
6009e6
    The hard dependency on the sophisticated SeaBIOS CSM and the complex
6009e6
    supporting edk2 infrastructure, for enabling this family of guests, was
6009e6
    considered suboptimal by some members of the upstream community,
6009e6
6009e6
    [RHEL] and was certainly considered a serious maintenance disadvantage for
6009e6
           Red Hat Enterprise Linux 7.1 hosts.
6009e6
6009e6
    Thus, the shim has been collaboratively developed for the Windows 7 /
6009e6
    Windows Server 2008 R2 family. The shim provides a real stdvga / QXL
6009e6
    implementation for the few services that are in fact necessary for the
6009e6
    Windows 2008 R2 SP1 (and Windows 7) UEFI guest, plus some "fakes" that the
6009e6
    guest invokes but whose effect is not important. The only supported mode is
6009e6
    1024x768x32, which is enough to install the guest and then upgrade its
6009e6
    video driver to the full-featured QXL XDDM one.
6009e6
6009e6
    The C segment is not present in the UEFI memory map prepared by OVMF.
6009e6
    Memory space that would cover it is never added (either in PEI, in the form
6009e6
    of memory resource descriptor HOBs, or in DXE, via gDS->AddMemorySpace()).
6009e6
    This way the handler body is invisible to all other UEFI guests, and the
6009e6
    rest of edk2.
6009e6
6009e6
    The Int10h real-mode IVT entry is covered with a Boot Services Code page,
6009e6
    making that too inaccessible to the rest of edk2. Due to the allocation
6009e6
    type, UEFI guest OSes other than the Windows Server 2008 family can
6009e6
    reclaim the page at zero. (The Windows 2008 family accesses that page
6009e6
    regardless of the allocation type.)
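
    For readers less familiar with real mode, the following Python sketch
    shows where the Int10h vector lives and how it is encoded. It relies on
    general x86 real-mode conventions: entry N of the interrupt vector table
    occupies four bytes at physical address N*4, stored as a 16-bit offset
    followed by a 16-bit segment, and a segment:offset pair designates the
    linear address segment*16 + offset. The handler offset used below is a
    made-up value, not the shim's actual entry point.

```python
import struct

PAGE_SIZE = 4096
INT10H_VECTOR_ADDRESS = 0x10 * 4  # IVT entries are four bytes each


def ivt_entry(segment, offset):
    """Encode a real-mode far pointer as stored in the IVT (offset first)."""
    return struct.pack("<HH", offset, segment)


def linear_address(segment, offset):
    """Resolve a real-mode segment:offset pair to a linear address."""
    return (segment << 4) + offset


HANDLER_SEGMENT = 0xC000  # the C segment, where the shim is copied
HANDLER_OFFSET = 0x0123   # hypothetical offset of the handler in the shim

entry = ivt_entry(HANDLER_SEGMENT, HANDLER_OFFSET)
handler = linear_address(HANDLER_SEGMENT, HANDLER_OFFSET)
print(hex(INT10H_VECTOR_ADDRESS), entry.hex(), hex(handler))
```

    The vector address (0x40) falls into the very first page of memory, which
    is why the page at zero is the one that needs to be covered by the Boot
    Services Code allocation.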
6009e6
6009e6
Afterword
6009e6
---------
6009e6
6009e6
Since the bulk of this document was written in July 2014, OVMF development has
not stopped. To name two significant code contributions from the community: as
of January 2015, OVMF runs on the "q35" machine type of QEMU, and it features a
driver for Xen paravirtual block devices (and another for the underlying Xen
bus).
6009e6
6009e6
Furthermore, a dedicated virtualization platform has been contributed to
6009e6
ArmPlatformPkg that plays a role parallel to OvmfPkg's. It targets the "virt"
6009e6
machine type of qemu-system-arm and qemu-system-aarch64. Parts of OvmfPkg are
6009e6
being refactored and modularized so they can be reused in
6009e6
"ArmPlatformPkg/ArmVirtualizationPkg/ArmVirtualizationQemu.dsc".