diff options
Diffstat (limited to 'Documentation/x86')
-rw-r--r-- | Documentation/x86/boot.rst | 174 | ||||
-rw-r--r-- | Documentation/x86/index.rst | 1 | ||||
-rw-r--r-- | Documentation/x86/tsx_async_abort.rst | 117 |
3 files changed, 292 insertions, 0 deletions
diff --git a/Documentation/x86/boot.rst b/Documentation/x86/boot.rst index 08a2f100c0e6..90bb8f5ab384 100644 --- a/Documentation/x86/boot.rst +++ b/Documentation/x86/boot.rst @@ -68,8 +68,25 @@ Protocol 2.12 (Kernel 3.8) Added the xloadflags field and extension fields Protocol 2.13 (Kernel 3.14) Support 32- and 64-bit flags being set in xloadflags to support booting a 64-bit kernel from 32-bit EFI + +Protocol 2.14: BURNT BY INCORRECT COMMIT ae7e1238e68f2a472a125673ab506d49158c1889 + (x86/boot: Add ACPI RSDP address to setup_header) + DO NOT USE!!! ASSUME SAME AS 2.13. + +Protocol 2.15: (Kernel 5.5) Added the kernel_info and kernel_info.setup_type_max. ============= ============================================================ +.. note:: + The protocol version number should be changed only if the setup header + is changed. There is no need to update the version number if boot_params + or kernel_info are changed. Additionally, it is recommended to use + xloadflags (in this case the protocol version number should not be + updated either) or kernel_info to communicate supported Linux kernel + features to the boot loader. Due to very limited space available in + the original setup header every update to it should be considered + with great care. Starting from the protocol 2.15 the primary way to + communicate things to the boot loader is the kernel_info. + Memory Layout ============= @@ -207,6 +224,7 @@ Offset/Size Proto Name Meaning 0258/8 2.10+ pref_address Preferred loading address 0260/4 2.10+ init_size Linear memory required during initialization 0264/4 2.11+ handover_offset Offset of handover entry point +0268/4 2.15+ kernel_info_offset Offset of the kernel_info =========== ======== ===================== ============================================ .. note:: @@ -809,6 +827,47 @@ Protocol: 2.09+ sure to consider the case where the linked list already contains entries. + The setup_data is a bit awkward to use for extremely large data objects, + both because the setup_data header has to be adjacent to the data object + and because it has a 32-bit length field. However, it is important that + intermediate stages of the boot process have a way to identify which + chunks of memory are occupied by kernel data. + + Thus setup_indirect struct and SETUP_INDIRECT type were introduced in + protocol 2.15. + + struct setup_indirect { + __u32 type; + __u32 reserved; /* Reserved, must be set to zero. */ + __u64 len; + __u64 addr; + }; + + The type member is a SETUP_INDIRECT | SETUP_* type. However, it cannot be + SETUP_INDIRECT itself since making the setup_indirect a tree structure + could require a lot of stack space in something that needs to parse it + and stack space can be limited in boot contexts. + + Let's give an example how to point to SETUP_E820_EXT data using setup_indirect. + In this case setup_data and setup_indirect will look like this: + + struct setup_data { + __u64 next = 0 or <addr_of_next_setup_data_struct>; + __u32 type = SETUP_INDIRECT; + __u32 len = sizeof(setup_data); + __u8 data[sizeof(setup_indirect)] = struct setup_indirect { + __u32 type = SETUP_INDIRECT | SETUP_E820_EXT; + __u32 reserved = 0; + __u64 len = <len_of_SETUP_E820_EXT_data>; + __u64 addr = <addr_of_SETUP_E820_EXT_data>; + } + } + +.. note:: + SETUP_INDIRECT | SETUP_NONE objects cannot be properly distinguished + from SETUP_INDIRECT itself. So, this kind of objects cannot be provided + by the bootloaders. + ============ ============ Field name: pref_address Type: read (reloc) @@ -855,6 +914,121 @@ Offset/size: 0x264/4 See EFI HANDOVER PROTOCOL below for more details. +============ ================== +Field name: kernel_info_offset +Type: read +Offset/size: 0x268/4 +Protocol: 2.15+ +============ ================== + + This field is the offset from the beginning of the kernel image to the + kernel_info. The kernel_info structure is embedded in the Linux image + in the uncompressed protected mode region. + + +The kernel_info +=============== + +The relationships between the headers are analogous to the various data +sections: + + setup_header = .data + boot_params/setup_data = .bss + +What is missing from the above list? That's right: + + kernel_info = .rodata + +We have been (ab)using .data for things that could go into .rodata or .bss for +a long time, for lack of alternatives and -- especially early on -- inertia. +Also, the BIOS stub is responsible for creating boot_params, so it isn't +available to a BIOS-based loader (setup_data is, though). + +setup_header is permanently limited to 144 bytes due to the reach of the +2-byte jump field, which doubles as a length field for the structure, combined +with the size of the "hole" in struct boot_params that a protected-mode loader +or the BIOS stub has to copy it into. It is currently 119 bytes long, which +leaves us with 25 very precious bytes. This isn't something that can be fixed +without revising the boot protocol entirely, breaking backwards compatibility. + +boot_params proper is limited to 4096 bytes, but can be arbitrarily extended +by adding setup_data entries. It cannot be used to communicate properties of +the kernel image, because it is .bss and has no image-provided content. + +kernel_info solves this by providing an extensible place for information about +the kernel image. It is readonly, because the kernel cannot rely on a +bootloader copying its contents anywhere, but that is OK; if it becomes +necessary it can still contain data items that an enabled bootloader would be +expected to copy into a setup_data chunk. + +All kernel_info data should be part of this structure. Fixed size data have to +be put before kernel_info_var_len_data label. Variable size data have to be put +after kernel_info_var_len_data label. Each chunk of variable size data has to +be prefixed with header/magic and its size, e.g.: + + kernel_info: + .ascii "LToP" /* Header, Linux top (structure). */ + .long kernel_info_var_len_data - kernel_info + .long kernel_info_end - kernel_info + .long 0x01234567 /* Some fixed size data for the bootloaders. */ + kernel_info_var_len_data: + example_struct: /* Some variable size data for the bootloaders. */ + .ascii "0123" /* Header/Magic. */ + .long example_struct_end - example_struct + .ascii "Struct" + .long 0x89012345 + example_struct_end: + example_strings: /* Some variable size data for the bootloaders. */ + .ascii "ABCD" /* Header/Magic. */ + .long example_strings_end - example_strings + .asciz "String_0" + .asciz "String_1" + example_strings_end: + kernel_info_end: + +This way the kernel_info is self-contained blob. + +.. note:: + Each variable size data header/magic can be any 4-character string, + without \0 at the end of the string, which does not collide with + existing variable length data headers/magics. + + +Details of the kernel_info Fields +================================= + +============ ======== +Field name: header +Offset/size: 0x0000/4 +============ ======== + + Contains the magic number "LToP" (0x506f544c). + +============ ======== +Field name: size +Offset/size: 0x0004/4 +============ ======== + + This field contains the size of the kernel_info including kernel_info.header. + It does not count kernel_info.kernel_info_var_len_data size. This field should be + used by the bootloaders to detect supported fixed size fields in the kernel_info + and beginning of kernel_info.kernel_info_var_len_data. + +============ ======== +Field name: size_total +Offset/size: 0x0008/4 +============ ======== + + This field contains the size of the kernel_info including kernel_info.header + and kernel_info.kernel_info_var_len_data. + +============ ============== +Field name: setup_type_max +Offset/size: 0x000c/4 +============ ============== + + This field contains maximal allowed type for setup_data and setup_indirect structs. + The Image Checksum ================== diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst index af64c4bb4447..a8de2fbc1caa 100644 --- a/Documentation/x86/index.rst +++ b/Documentation/x86/index.rst @@ -27,6 +27,7 @@ x86-specific Documentation mds microcode resctrl_ui + tsx_async_abort usb-legacy-support i386/index x86_64/index diff --git a/Documentation/x86/tsx_async_abort.rst b/Documentation/x86/tsx_async_abort.rst new file mode 100644 index 000000000000..583ddc185ba2 --- /dev/null +++ b/Documentation/x86/tsx_async_abort.rst @@ -0,0 +1,117 @@ +.. SPDX-License-Identifier: GPL-2.0 + +TSX Async Abort (TAA) mitigation +================================ + +.. _tsx_async_abort: + +Overview +-------- + +TSX Async Abort (TAA) is a side channel attack on internal buffers in some +Intel processors similar to Microachitectural Data Sampling (MDS). In this +case certain loads may speculatively pass invalid data to dependent operations +when an asynchronous abort condition is pending in a Transactional +Synchronization Extensions (TSX) transaction. This includes loads with no +fault or assist condition. Such loads may speculatively expose stale data from +the same uarch data structures as in MDS, with same scope of exposure i.e. +same-thread and cross-thread. This issue affects all current processors that +support TSX. + +Mitigation strategy +------------------- + +a) TSX disable - one of the mitigations is to disable TSX. A new MSR +IA32_TSX_CTRL will be available in future and current processors after +microcode update which can be used to disable TSX. In addition, it +controls the enumeration of the TSX feature bits (RTM and HLE) in CPUID. + +b) Clear CPU buffers - similar to MDS, clearing the CPU buffers mitigates this +vulnerability. More details on this approach can be found in +:ref:`Documentation/admin-guide/hw-vuln/mds.rst <mds>`. + +Kernel internal mitigation modes +-------------------------------- + + ============= ============================================================ + off Mitigation is disabled. Either the CPU is not affected or + tsx_async_abort=off is supplied on the kernel command line. + + tsx disabled Mitigation is enabled. TSX feature is disabled by default at + bootup on processors that support TSX control. + + verw Mitigation is enabled. CPU is affected and MD_CLEAR is + advertised in CPUID. + + ucode needed Mitigation is enabled. CPU is affected and MD_CLEAR is not + advertised in CPUID. That is mainly for virtualization + scenarios where the host has the updated microcode but the + hypervisor does not expose MD_CLEAR in CPUID. It's a best + effort approach without guarantee. + ============= ============================================================ + +If the CPU is affected and the "tsx_async_abort" kernel command line parameter is +not provided then the kernel selects an appropriate mitigation depending on the +status of RTM and MD_CLEAR CPUID bits. + +Below tables indicate the impact of tsx=on|off|auto cmdline options on state of +TAA mitigation, VERW behavior and TSX feature for various combinations of +MSR_IA32_ARCH_CAPABILITIES bits. + +1. "tsx=off" + +========= ========= ============ ============ ============== =================== ====================== +MSR_IA32_ARCH_CAPABILITIES bits Result with cmdline tsx=off +---------------------------------- ------------------------------------------------------------------------- +TAA_NO MDS_NO TSX_CTRL_MSR TSX state VERW can clear TAA mitigation TAA mitigation + after bootup CPU buffers tsx_async_abort=off tsx_async_abort=full +========= ========= ============ ============ ============== =================== ====================== + 0 0 0 HW default Yes Same as MDS Same as MDS + 0 0 1 Invalid case Invalid case Invalid case Invalid case + 0 1 0 HW default No Need ucode update Need ucode update + 0 1 1 Disabled Yes TSX disabled TSX disabled + 1 X 1 Disabled X None needed None needed +========= ========= ============ ============ ============== =================== ====================== + +2. "tsx=on" + +========= ========= ============ ============ ============== =================== ====================== +MSR_IA32_ARCH_CAPABILITIES bits Result with cmdline tsx=on +---------------------------------- ------------------------------------------------------------------------- +TAA_NO MDS_NO TSX_CTRL_MSR TSX state VERW can clear TAA mitigation TAA mitigation + after bootup CPU buffers tsx_async_abort=off tsx_async_abort=full +========= ========= ============ ============ ============== =================== ====================== + 0 0 0 HW default Yes Same as MDS Same as MDS + 0 0 1 Invalid case Invalid case Invalid case Invalid case + 0 1 0 HW default No Need ucode update Need ucode update + 0 1 1 Enabled Yes None Same as MDS + 1 X 1 Enabled X None needed None needed +========= ========= ============ ============ ============== =================== ====================== + +3. "tsx=auto" + +========= ========= ============ ============ ============== =================== ====================== +MSR_IA32_ARCH_CAPABILITIES bits Result with cmdline tsx=auto +---------------------------------- ------------------------------------------------------------------------- +TAA_NO MDS_NO TSX_CTRL_MSR TSX state VERW can clear TAA mitigation TAA mitigation + after bootup CPU buffers tsx_async_abort=off tsx_async_abort=full +========= ========= ============ ============ ============== =================== ====================== + 0 0 0 HW default Yes Same as MDS Same as MDS + 0 0 1 Invalid case Invalid case Invalid case Invalid case + 0 1 0 HW default No Need ucode update Need ucode update + 0 1 1 Disabled Yes TSX disabled TSX disabled + 1 X 1 Enabled X None needed None needed +========= ========= ============ ============ ============== =================== ====================== + +In the tables, TSX_CTRL_MSR is a new bit in MSR_IA32_ARCH_CAPABILITIES that +indicates whether MSR_IA32_TSX_CTRL is supported. + +There are two control bits in IA32_TSX_CTRL MSR: + + Bit 0: When set it disables the Restricted Transactional Memory (RTM) + sub-feature of TSX (will force all transactions to abort on the + XBEGIN instruction). + + Bit 1: When set it disables the enumeration of the RTM and HLE feature + (i.e. it will make CPUID(EAX=7).EBX{bit4} and + CPUID(EAX=7).EBX{bit11} read as 0). |