diff options
Diffstat (limited to 'Documentation/admin-guide/hw-vuln')
-rw-r--r-- | Documentation/admin-guide/hw-vuln/index.rst | 14 | ||||
-rw-r--r-- | Documentation/admin-guide/hw-vuln/l1tf.rst | 615 | ||||
-rw-r--r-- | Documentation/admin-guide/hw-vuln/mds.rst | 308 | ||||
-rw-r--r-- | Documentation/admin-guide/hw-vuln/spectre.rst | 697 |
4 files changed, 1634 insertions, 0 deletions
diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst new file mode 100644 index 000000000000..49311f3da6f2 --- /dev/null +++ b/Documentation/admin-guide/hw-vuln/index.rst @@ -0,0 +1,14 @@ +======================== +Hardware vulnerabilities +======================== + +This section describes CPU vulnerabilities and provides an overview of the +possible mitigations along with guidance for selecting mitigations if they +are configurable at compile, boot or run time. + +.. toctree:: + :maxdepth: 1 + + spectre + l1tf + mds diff --git a/Documentation/admin-guide/hw-vuln/l1tf.rst b/Documentation/admin-guide/hw-vuln/l1tf.rst new file mode 100644 index 000000000000..656aee262e23 --- /dev/null +++ b/Documentation/admin-guide/hw-vuln/l1tf.rst @@ -0,0 +1,615 @@ +L1TF - L1 Terminal Fault +======================== + +L1 Terminal Fault is a hardware vulnerability which allows unprivileged +speculative access to data which is available in the Level 1 Data Cache +when the page table entry controlling the virtual address, which is used +for the access, has the Present bit cleared or other reserved bits set. + +Affected processors +------------------- + +This vulnerability affects a wide range of Intel processors. The +vulnerability is not present on: + + - Processors from AMD, Centaur and other non Intel vendors + + - Older processor models, where the CPU family is < 6 + + - A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft, + Penwell, Pineview, Silvermont, Airmont, Merrifield) + + - The Intel XEON PHI family + + - Intel processors which have the ARCH_CAP_RDCL_NO bit set in the + IA32_ARCH_CAPABILITIES MSR. If the bit is set the CPU is not affected + by the Meltdown vulnerability either. These CPUs should become + available by end of 2018. + +Whether a processor is affected or not can be read out from the L1TF +vulnerability file in sysfs. See :ref:`l1tf_sys_info`. + +Related CVEs +------------ + +The following CVE entries are related to the L1TF vulnerability: + + ============= ================= ============================== + CVE-2018-3615 L1 Terminal Fault SGX related aspects + CVE-2018-3620 L1 Terminal Fault OS, SMM related aspects + CVE-2018-3646 L1 Terminal Fault Virtualization related aspects + ============= ================= ============================== + +Problem +------- + +If an instruction accesses a virtual address for which the relevant page +table entry (PTE) has the Present bit cleared or other reserved bits set, +then speculative execution ignores the invalid PTE and loads the referenced +data if it is present in the Level 1 Data Cache, as if the page referenced +by the address bits in the PTE was still present and accessible. + +While this is a purely speculative mechanism and the instruction will raise +a page fault when it is retired eventually, the pure act of loading the +data and making it available to other speculative instructions opens up the +opportunity for side channel attacks to unprivileged malicious code, +similar to the Meltdown attack. + +While Meltdown breaks the user space to kernel space protection, L1TF +allows to attack any physical memory address in the system and the attack +works across all protection domains. It allows an attack of SGX and also +works from inside virtual machines because the speculation bypasses the +extended page table (EPT) protection mechanism. + + +Attack scenarios +---------------- + +1. Malicious user space +^^^^^^^^^^^^^^^^^^^^^^^ + + Operating Systems store arbitrary information in the address bits of a + PTE which is marked non present. This allows a malicious user space + application to attack the physical memory to which these PTEs resolve. + In some cases user-space can maliciously influence the information + encoded in the address bits of the PTE, thus making attacks more + deterministic and more practical. + + The Linux kernel contains a mitigation for this attack vector, PTE + inversion, which is permanently enabled and has no performance + impact. The kernel ensures that the address bits of PTEs, which are not + marked present, never point to cacheable physical memory space. + + A system with an up to date kernel is protected against attacks from + malicious user space applications. + +2. Malicious guest in a virtual machine +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + The fact that L1TF breaks all domain protections allows malicious guest + OSes, which can control the PTEs directly, and malicious guest user + space applications, which run on an unprotected guest kernel lacking the + PTE inversion mitigation for L1TF, to attack physical host memory. + + A special aspect of L1TF in the context of virtualization is symmetric + multi threading (SMT). The Intel implementation of SMT is called + HyperThreading. The fact that Hyperthreads on the affected processors + share the L1 Data Cache (L1D) is important for this. As the flaw allows + only to attack data which is present in L1D, a malicious guest running + on one Hyperthread can attack the data which is brought into the L1D by + the context which runs on the sibling Hyperthread of the same physical + core. This context can be host OS, host user space or a different guest. + + If the processor does not support Extended Page Tables, the attack is + only possible, when the hypervisor does not sanitize the content of the + effective (shadow) page tables. + + While solutions exist to mitigate these attack vectors fully, these + mitigations are not enabled by default in the Linux kernel because they + can affect performance significantly. The kernel provides several + mechanisms which can be utilized to address the problem depending on the + deployment scenario. The mitigations, their protection scope and impact + are described in the next sections. + + The default mitigations and the rationale for choosing them are explained + at the end of this document. See :ref:`default_mitigations`. + +.. _l1tf_sys_info: + +L1TF system information +----------------------- + +The Linux kernel provides a sysfs interface to enumerate the current L1TF +status of the system: whether the system is vulnerable, and which +mitigations are active. The relevant sysfs file is: + +/sys/devices/system/cpu/vulnerabilities/l1tf + +The possible values in this file are: + + =========================== =============================== + 'Not affected' The processor is not vulnerable + 'Mitigation: PTE Inversion' The host protection is active + =========================== =============================== + +If KVM/VMX is enabled and the processor is vulnerable then the following +information is appended to the 'Mitigation: PTE Inversion' part: + + - SMT status: + + ===================== ================ + 'VMX: SMT vulnerable' SMT is enabled + 'VMX: SMT disabled' SMT is disabled + ===================== ================ + + - L1D Flush mode: + + ================================ ==================================== + 'L1D vulnerable' L1D flushing is disabled + + 'L1D conditional cache flushes' L1D flush is conditionally enabled + + 'L1D cache flushes' L1D flush is unconditionally enabled + ================================ ==================================== + +The resulting grade of protection is discussed in the following sections. + + +Host mitigation mechanism +------------------------- + +The kernel is unconditionally protected against L1TF attacks from malicious +user space running on the host. + + +Guest mitigation mechanisms +--------------------------- + +.. _l1d_flush: + +1. L1D flush on VMENTER +^^^^^^^^^^^^^^^^^^^^^^^ + + To make sure that a guest cannot attack data which is present in the L1D + the hypervisor flushes the L1D before entering the guest. + + Flushing the L1D evicts not only the data which should not be accessed + by a potentially malicious guest, it also flushes the guest + data. Flushing the L1D has a performance impact as the processor has to + bring the flushed guest data back into the L1D. Depending on the + frequency of VMEXIT/VMENTER and the type of computations in the guest + performance degradation in the range of 1% to 50% has been observed. For + scenarios where guest VMEXIT/VMENTER are rare the performance impact is + minimal. Virtio and mechanisms like posted interrupts are designed to + confine the VMEXITs to a bare minimum, but specific configurations and + application scenarios might still suffer from a high VMEXIT rate. + + The kernel provides two L1D flush modes: + - conditional ('cond') + - unconditional ('always') + + The conditional mode avoids L1D flushing after VMEXITs which execute + only audited code paths before the corresponding VMENTER. These code + paths have been verified that they cannot expose secrets or other + interesting data to an attacker, but they can leak information about the + address space layout of the hypervisor. + + Unconditional mode flushes L1D on all VMENTER invocations and provides + maximum protection. It has a higher overhead than the conditional + mode. The overhead cannot be quantified correctly as it depends on the + workload scenario and the resulting number of VMEXITs. + + The general recommendation is to enable L1D flush on VMENTER. The kernel + defaults to conditional mode on affected processors. + + **Note**, that L1D flush does not prevent the SMT problem because the + sibling thread will also bring back its data into the L1D which makes it + attackable again. + + L1D flush can be controlled by the administrator via the kernel command + line and sysfs control files. See :ref:`mitigation_control_command_line` + and :ref:`mitigation_control_kvm`. + +.. _guest_confinement: + +2. Guest VCPU confinement to dedicated physical cores +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + To address the SMT problem, it is possible to make a guest or a group of + guests affine to one or more physical cores. The proper mechanism for + that is to utilize exclusive cpusets to ensure that no other guest or + host tasks can run on these cores. + + If only a single guest or related guests run on sibling SMT threads on + the same physical core then they can only attack their own memory and + restricted parts of the host memory. + + Host memory is attackable, when one of the sibling SMT threads runs in + host OS (hypervisor) context and the other in guest context. The amount + of valuable information from the host OS context depends on the context + which the host OS executes, i.e. interrupts, soft interrupts and kernel + threads. The amount of valuable data from these contexts cannot be + declared as non-interesting for an attacker without deep inspection of + the code. + + **Note**, that assigning guests to a fixed set of physical cores affects + the ability of the scheduler to do load balancing and might have + negative effects on CPU utilization depending on the hosting + scenario. Disabling SMT might be a viable alternative for particular + scenarios. + + For further information about confining guests to a single or to a group + of cores consult the cpusets documentation: + + https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.rst + +.. _interrupt_isolation: + +3. Interrupt affinity +^^^^^^^^^^^^^^^^^^^^^ + + Interrupts can be made affine to logical CPUs. This is not universally + true because there are types of interrupts which are truly per CPU + interrupts, e.g. the local timer interrupt. Aside of that multi queue + devices affine their interrupts to single CPUs or groups of CPUs per + queue without allowing the administrator to control the affinities. + + Moving the interrupts, which can be affinity controlled, away from CPUs + which run untrusted guests, reduces the attack vector space. + + Whether the interrupts with are affine to CPUs, which run untrusted + guests, provide interesting data for an attacker depends on the system + configuration and the scenarios which run on the system. While for some + of the interrupts it can be assumed that they won't expose interesting + information beyond exposing hints about the host OS memory layout, there + is no way to make general assumptions. + + Interrupt affinity can be controlled by the administrator via the + /proc/irq/$NR/smp_affinity[_list] files. Limited documentation is + available at: + + https://www.kernel.org/doc/Documentation/IRQ-affinity.txt + +.. _smt_control: + +4. SMT control +^^^^^^^^^^^^^^ + + To prevent the SMT issues of L1TF it might be necessary to disable SMT + completely. Disabling SMT can have a significant performance impact, but + the impact depends on the hosting scenario and the type of workloads. + The impact of disabling SMT needs also to be weighted against the impact + of other mitigation solutions like confining guests to dedicated cores. + + The kernel provides a sysfs interface to retrieve the status of SMT and + to control it. It also provides a kernel command line interface to + control SMT. + + The kernel command line interface consists of the following options: + + =========== ========================================================== + nosmt Affects the bring up of the secondary CPUs during boot. The + kernel tries to bring all present CPUs online during the + boot process. "nosmt" makes sure that from each physical + core only one - the so called primary (hyper) thread is + activated. Due to a design flaw of Intel processors related + to Machine Check Exceptions the non primary siblings have + to be brought up at least partially and are then shut down + again. "nosmt" can be undone via the sysfs interface. + + nosmt=force Has the same effect as "nosmt" but it does not allow to + undo the SMT disable via the sysfs interface. + =========== ========================================================== + + The sysfs interface provides two files: + + - /sys/devices/system/cpu/smt/control + - /sys/devices/system/cpu/smt/active + + /sys/devices/system/cpu/smt/control: + + This file allows to read out the SMT control state and provides the + ability to disable or (re)enable SMT. The possible states are: + + ============== =================================================== + on SMT is supported by the CPU and enabled. All + logical CPUs can be onlined and offlined without + restrictions. + + off SMT is supported by the CPU and disabled. Only + the so called primary SMT threads can be onlined + and offlined without restrictions. An attempt to + online a non-primary sibling is rejected + + forceoff Same as 'off' but the state cannot be controlled. + Attempts to write to the control file are rejected. + + notsupported The processor does not support SMT. It's therefore + not affected by the SMT implications of L1TF. + Attempts to write to the control file are rejected. + ============== =================================================== + + The possible states which can be written into this file to control SMT + state are: + + - on + - off + - forceoff + + /sys/devices/system/cpu/smt/active: + + This file reports whether SMT is enabled and active, i.e. if on any + physical core two or more sibling threads are online. + + SMT control is also possible at boot time via the l1tf kernel command + line parameter in combination with L1D flush control. See + :ref:`mitigation_control_command_line`. + +5. Disabling EPT +^^^^^^^^^^^^^^^^ + + Disabling EPT for virtual machines provides full mitigation for L1TF even + with SMT enabled, because the effective page tables for guests are + managed and sanitized by the hypervisor. Though disabling EPT has a + significant performance impact especially when the Meltdown mitigation + KPTI is enabled. + + EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter. + +There is ongoing research and development for new mitigation mechanisms to +address the performance impact of disabling SMT or EPT. + +.. _mitigation_control_command_line: + +Mitigation control on the kernel command line +--------------------------------------------- + +The kernel command line allows to control the L1TF mitigations at boot +time with the option "l1tf=". The valid arguments for this option are: + + ============ ============================================================= + full Provides all available mitigations for the L1TF + vulnerability. Disables SMT and enables all mitigations in + the hypervisors, i.e. unconditional L1D flushing + + SMT control and L1D flush control via the sysfs interface + is still possible after boot. Hypervisors will issue a + warning when the first VM is started in a potentially + insecure configuration, i.e. SMT enabled or L1D flush + disabled. + + full,force Same as 'full', but disables SMT and L1D flush runtime + control. Implies the 'nosmt=force' command line option. + (i.e. sysfs control of SMT is disabled.) + + flush Leaves SMT enabled and enables the default hypervisor + mitigation, i.e. conditional L1D flushing + + SMT control and L1D flush control via the sysfs interface + is still possible after boot. Hypervisors will issue a + warning when the first VM is started in a potentially + insecure configuration, i.e. SMT enabled or L1D flush + disabled. + + flush,nosmt Disables SMT and enables the default hypervisor mitigation, + i.e. conditional L1D flushing. + + SMT control and L1D flush control via the sysfs interface + is still possible after boot. Hypervisors will issue a + warning when the first VM is started in a potentially + insecure configuration, i.e. SMT enabled or L1D flush + disabled. + + flush,nowarn Same as 'flush', but hypervisors will not warn when a VM is + started in a potentially insecure configuration. + + off Disables hypervisor mitigations and doesn't emit any + warnings. + It also drops the swap size and available RAM limit restrictions + on both hypervisor and bare metal. + + ============ ============================================================= + +The default is 'flush'. For details about L1D flushing see :ref:`l1d_flush`. + + +.. _mitigation_control_kvm: + +Mitigation control for KVM - module parameter +------------------------------------------------------------- + +The KVM hypervisor mitigation mechanism, flushing the L1D cache when +entering a guest, can be controlled with a module parameter. + +The option/parameter is "kvm-intel.vmentry_l1d_flush=". It takes the +following arguments: + + ============ ============================================================== + always L1D cache flush on every VMENTER. + + cond Flush L1D on VMENTER only when the code between VMEXIT and + VMENTER can leak host memory which is considered + interesting for an attacker. This still can leak host memory + which allows e.g. to determine the hosts address space layout. + + never Disables the mitigation + ============ ============================================================== + +The parameter can be provided on the kernel command line, as a module +parameter when loading the modules and at runtime modified via the sysfs +file: + +/sys/module/kvm_intel/parameters/vmentry_l1d_flush + +The default is 'cond'. If 'l1tf=full,force' is given on the kernel command +line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush +module parameter is ignored and writes to the sysfs file are rejected. + +.. _mitigation_selection: + +Mitigation selection guide +-------------------------- + +1. No virtualization in use +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + The system is protected by the kernel unconditionally and no further + action is required. + +2. Virtualization with trusted guests +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + If the guest comes from a trusted source and the guest OS kernel is + guaranteed to have the L1TF mitigations in place the system is fully + protected against L1TF and no further action is required. + + To avoid the overhead of the default L1D flushing on VMENTER the + administrator can disable the flushing via the kernel command line and + sysfs control files. See :ref:`mitigation_control_command_line` and + :ref:`mitigation_control_kvm`. + + +3. Virtualization with untrusted guests +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +3.1. SMT not supported or disabled +"""""""""""""""""""""""""""""""""" + + If SMT is not supported by the processor or disabled in the BIOS or by + the kernel, it's only required to enforce L1D flushing on VMENTER. + + Conditional L1D flushing is the default behaviour and can be tuned. See + :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`. + +3.2. EPT not supported or disabled +"""""""""""""""""""""""""""""""""" + + If EPT is not supported by the processor or disabled in the hypervisor, + the system is fully protected. SMT can stay enabled and L1D flushing on + VMENTER is not required. + + EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter. + +3.3. SMT and EPT supported and active +""""""""""""""""""""""""""""""""""""" + + If SMT and EPT are supported and active then various degrees of + mitigations can be employed: + + - L1D flushing on VMENTER: + + L1D flushing on VMENTER is the minimal protection requirement, but it + is only potent in combination with other mitigation methods. + + Conditional L1D flushing is the default behaviour and can be tuned. See + :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`. + + - Guest confinement: + + Confinement of guests to a single or a group of physical cores which + are not running any other processes, can reduce the attack surface + significantly, but interrupts, soft interrupts and kernel threads can + still expose valuable data to a potential attacker. See + :ref:`guest_confinement`. + + - Interrupt isolation: + + Isolating the guest CPUs from interrupts can reduce the attack surface + further, but still allows a malicious guest to explore a limited amount + of host physical memory. This can at least be used to gain knowledge + about the host address space layout. The interrupts which have a fixed + affinity to the CPUs which run the untrusted guests can depending on + the scenario still trigger soft interrupts and schedule kernel threads + which might expose valuable information. See + :ref:`interrupt_isolation`. + +The above three mitigation methods combined can provide protection to a +certain degree, but the risk of the remaining attack surface has to be +carefully analyzed. For full protection the following methods are +available: + + - Disabling SMT: + + Disabling SMT and enforcing the L1D flushing provides the maximum + amount of protection. This mitigation is not depending on any of the + above mitigation methods. + + SMT control and L1D flushing can be tuned by the command line + parameters 'nosmt', 'l1tf', 'kvm-intel.vmentry_l1d_flush' and at run + time with the matching sysfs control files. See :ref:`smt_control`, + :ref:`mitigation_control_command_line` and + :ref:`mitigation_control_kvm`. + + - Disabling EPT: + + Disabling EPT provides the maximum amount of protection as well. It is + not depending on any of the above mitigation methods. SMT can stay + enabled and L1D flushing is not required, but the performance impact is + significant. + + EPT can be disabled in the hypervisor via the 'kvm-intel.ept' + parameter. + +3.4. Nested virtual machines +"""""""""""""""""""""""""""" + +When nested virtualization is in use, three operating systems are involved: +the bare metal hypervisor, the nested hypervisor and the nested virtual +machine. VMENTER operations from the nested hypervisor into the nested +guest will always be processed by the bare metal hypervisor. If KVM is the +bare metal hypervisor it will: + + - Flush the L1D cache on every switch from the nested hypervisor to the + nested virtual machine, so that the nested hypervisor's secrets are not + exposed to the nested virtual machine; + + - Flush the L1D cache on every switch from the nested virtual machine to + the nested hypervisor; this is a complex operation, and flushing the L1D + cache avoids that the bare metal hypervisor's secrets are exposed to the + nested virtual machine; + + - Instruct the nested hypervisor to not perform any L1D cache flush. This + is an optimization to avoid double L1D flushing. + + +.. _default_mitigations: + +Default mitigations +------------------- + + The kernel default mitigations for vulnerable processors are: + + - PTE inversion to protect against malicious user space. This is done + unconditionally and cannot be controlled. The swap storage is limited + to ~16TB. + + - L1D conditional flushing on VMENTER when EPT is enabled for + a guest. + + The kernel does not by default enforce the disabling of SMT, which leaves + SMT systems vulnerable when running untrusted guests with EPT enabled. + + The rationale for this choice is: + + - Force disabling SMT can break existing setups, especially with + unattended updates. + + - If regular users run untrusted guests on their machine, then L1TF is + just an add on to other malware which might be embedded in an untrusted + guest, e.g. spam-bots or attacks on the local network. + + There is no technical way to prevent a user from running untrusted code + on their machines blindly. + + - It's technically extremely unlikely and from today's knowledge even + impossible that L1TF can be exploited via the most popular attack + mechanisms like JavaScript because these mechanisms have no way to + control PTEs. If this would be possible and not other mitigation would + be possible, then the default might be different. + + - The administrators of cloud and hosting setups have to carefully + analyze the risk for their scenarios and make the appropriate + mitigation choices, which might even vary across their deployed + machines and also result in other changes of their overall setup. + There is no way for the kernel to provide a sensible default for this + kind of scenarios. diff --git a/Documentation/admin-guide/hw-vuln/mds.rst b/Documentation/admin-guide/hw-vuln/mds.rst new file mode 100644 index 000000000000..e3a796c0d3a2 --- /dev/null +++ b/Documentation/admin-guide/hw-vuln/mds.rst @@ -0,0 +1,308 @@ +MDS - Microarchitectural Data Sampling +====================================== + +Microarchitectural Data Sampling is a hardware vulnerability which allows +unprivileged speculative access to data which is available in various CPU +internal buffers. + +Affected processors +------------------- + +This vulnerability affects a wide range of Intel processors. The +vulnerability is not present on: + + - Processors from AMD, Centaur and other non Intel vendors + + - Older processor models, where the CPU family is < 6 + + - Some Atoms (Bonnell, Saltwell, Goldmont, GoldmontPlus) + + - Intel processors which have the ARCH_CAP_MDS_NO bit set in the + IA32_ARCH_CAPABILITIES MSR. + +Whether a processor is affected or not can be read out from the MDS +vulnerability file in sysfs. See :ref:`mds_sys_info`. + +Not all processors are affected by all variants of MDS, but the mitigation +is identical for all of them so the kernel treats them as a single +vulnerability. + +Related CVEs +------------ + +The following CVE entries are related to the MDS vulnerability: + + ============== ===== =================================================== + CVE-2018-12126 MSBDS Microarchitectural Store Buffer Data Sampling + CVE-2018-12130 MFBDS Microarchitectural Fill Buffer Data Sampling + CVE-2018-12127 MLPDS Microarchitectural Load Port Data Sampling + CVE-2019-11091 MDSUM Microarchitectural Data Sampling Uncacheable Memory + ============== ===== =================================================== + +Problem +------- + +When performing store, load, L1 refill operations, processors write data +into temporary microarchitectural structures (buffers). The data in the +buffer can be forwarded to load operations as an optimization. + +Under certain conditions, usually a fault/assist caused by a load +operation, data unrelated to the load memory address can be speculatively +forwarded from the buffers. Because the load operation causes a fault or +assist and its result will be discarded, the forwarded data will not cause +incorrect program execution or state changes. But a malicious operation +may be able to forward this speculative data to a disclosure gadget which +allows in turn to infer the value via a cache side channel attack. + +Because the buffers are potentially shared between Hyper-Threads cross +Hyper-Thread attacks are possible. + +Deeper technical information is available in the MDS specific x86 +architecture section: :ref:`Documentation/x86/mds.rst <mds>`. + + +Attack scenarios +---------------- + +Attacks against the MDS vulnerabilities can be mounted from malicious non +priviledged user space applications running on hosts or guest. Malicious +guest OSes can obviously mount attacks as well. + +Contrary to other speculation based vulnerabilities the MDS vulnerability +does not allow the attacker to control the memory target address. As a +consequence the attacks are purely sampling based, but as demonstrated with +the TLBleed attack samples can be postprocessed successfully. + +Web-Browsers +^^^^^^^^^^^^ + + It's unclear whether attacks through Web-Browsers are possible at + all. The exploitation through Java-Script is considered very unlikely, + but other widely used web technologies like Webassembly could possibly be + abused. + + +.. _mds_sys_info: + +MDS system information +----------------------- + +The Linux kernel provides a sysfs interface to enumerate the current MDS +status of the system: whether the system is vulnerable, and which +mitigations are active. The relevant sysfs file is: + +/sys/devices/system/cpu/vulnerabilities/mds + +The possible values in this file are: + + .. list-table:: + + * - 'Not affected' + - The processor is not vulnerable + * - 'Vulnerable' + - The processor is vulnerable, but no mitigation enabled + * - 'Vulnerable: Clear CPU buffers attempted, no microcode' + - The processor is vulnerable but microcode is not updated. + + The mitigation is enabled on a best effort basis. See :ref:`vmwerv` + * - 'Mitigation: Clear CPU buffers' + - The processor is vulnerable and the CPU buffer clearing mitigation is + enabled. + +If the processor is vulnerable then the following information is appended +to the above information: + + ======================== ============================================ + 'SMT vulnerable' SMT is enabled + 'SMT mitigated' SMT is enabled and mitigated + 'SMT disabled' SMT is disabled + 'SMT Host state unknown' Kernel runs in a VM, Host SMT state unknown + ======================== ============================================ + +.. _vmwerv: + +Best effort mitigation mode +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + If the processor is vulnerable, but the availability of the microcode based + mitigation mechanism is not advertised via CPUID the kernel selects a best + effort mitigation mode. This mode invokes the mitigation instructions + without a guarantee that they clear the CPU buffers. + + This is done to address virtualization scenarios where the host has the + microcode update applied, but the hypervisor is not yet updated to expose + the CPUID to the guest. If the host has updated microcode the protection + takes effect otherwise a few cpu cycles are wasted pointlessly. + + The state in the mds sysfs file reflects this situation accordingly. + + +Mitigation mechanism +------------------------- + +The kernel detects the affected CPUs and the presence of the microcode +which is required. + +If a CPU is affected and the microcode is available, then the kernel +enables the mitigation by default. The mitigation can be controlled at boot +time via a kernel command line option. See +:ref:`mds_mitigation_control_command_line`. + +.. _cpu_buffer_clear: + +CPU buffer clearing +^^^^^^^^^^^^^^^^^^^ + + The mitigation for MDS clears the affected CPU buffers on return to user + space and when entering a guest. + + If SMT is enabled it also clears the buffers on idle entry when the CPU + is only affected by MSBDS and not any other MDS variant, because the + other variants cannot be protected against cross Hyper-Thread attacks. + + For CPUs which are only affected by MSBDS the user space, guest and idle + transition mitigations are sufficient and SMT is not affected. + +.. _virt_mechanism: + +Virtualization mitigation +^^^^^^^^^^^^^^^^^^^^^^^^^ + + The protection for host to guest transition depends on the L1TF + vulnerability of the CPU: + + - CPU is affected by L1TF: + + If the L1D flush mitigation is enabled and up to date microcode is + available, the L1D flush mitigation is automatically protecting the + guest transition. + + If the L1D flush mitigation is disabled then the MDS mitigation is + invoked explicit when the host MDS mitigation is enabled. + + For details on L1TF and virtualization see: + :ref:`Documentation/admin-guide/hw-vuln//l1tf.rst <mitigation_control_kvm>`. + + - CPU is not affected by L1TF: + + CPU buffers are flushed before entering the guest when the host MDS + mitigation is enabled. + + The resulting MDS protection matrix for the host to guest transition: + + ============ ===== ============= ============ ================= + L1TF MDS VMX-L1FLUSH Host MDS MDS-State + + Don't care No Don't care N/A Not affected + + Yes Yes Disabled Off Vulnerable + + Yes Yes Disabled Full Mitigated + + Yes Yes Enabled Don't care Mitigated + + No Yes N/A Off Vulnerable + + No Yes N/A Full Mitigated + ============ ===== ============= ============ ================= + + This only covers the host to guest transition, i.e. prevents leakage from + host to guest, but does not protect the guest internally. Guests need to + have their own protections. + +.. _xeon_phi: + +XEON PHI specific considerations +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + The XEON PHI processor family is affected by MSBDS which can be exploited + cross Hyper-Threads when entering idle states. Some XEON PHI variants allow + to use MWAIT in user space (Ring 3) which opens an potential attack vector + for malicious user space. The exposure can be disabled on the kernel + command line with the 'ring3mwait=disable' command line option. + + XEON PHI is not affected by the other MDS variants and MSBDS is mitigated + before the CPU enters a idle state. As XEON PHI is not affected by L1TF + either disabling SMT is not required for full protection. + +.. _mds_smt_control: + +SMT control +^^^^^^^^^^^ + + All MDS variants except MSBDS can be attacked cross Hyper-Threads. That + means on CPUs which are affected by MFBDS or MLPDS it is necessary to + disable SMT for full protection. These are most of the affected CPUs; the + exception is XEON PHI, see :ref:`xeon_phi`. + + Disabling SMT can have a significant performance impact, but the impact + depends on the type of workloads. + + See the relevant chapter in the L1TF mitigation documentation for details: + :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <smt_control>`. + + +.. _mds_mitigation_control_command_line: + +Mitigation control on the kernel command line +--------------------------------------------- + +The kernel command line allows to control the MDS mitigations at boot +time with the option "mds=". The valid arguments for this option are: + + ============ ============================================================= + full If the CPU is vulnerable, enable all available mitigations + for the MDS vulnerability, CPU buffer clearing on exit to + userspace and when entering a VM. Idle transitions are + protected as well if SMT is enabled. + + It does not automatically disable SMT. + + full,nosmt The same as mds=full, with SMT disabled on vulnerable + CPUs. This is the complete mitigation. + + off Disables MDS mitigations completely. + + ============ ============================================================= + +Not specifying this option is equivalent to "mds=full". + + +Mitigation selection guide +-------------------------- + +1. Trusted userspace +^^^^^^^^^^^^^^^^^^^^ + + If all userspace applications are from a trusted source and do not + execute untrusted code which is supplied externally, then the mitigation + can be disabled. + + +2. Virtualization with trusted guests +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + The same considerations as above versus trusted user space apply. + +3. Virtualization with untrusted guests +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + The protection depends on the state of the L1TF mitigations. + See :ref:`virt_mechanism`. + + If the MDS mitigation is enabled and SMT is disabled, guest to host and + guest to guest attacks are prevented. + +.. _mds_default_mitigations: + +Default mitigations +------------------- + + The kernel default mitigations for vulnerable processors are: + + - Enable CPU buffer clearing + + The kernel does not by default enforce the disabling of SMT, which leaves + SMT systems vulnerable when running untrusted code. The same rationale as + for L1TF applies. + See :ref:`Documentation/admin-guide/hw-vuln//l1tf.rst <default_mitigations>`. diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst new file mode 100644 index 000000000000..25f3b2532198 --- /dev/null +++ b/Documentation/admin-guide/hw-vuln/spectre.rst @@ -0,0 +1,697 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Spectre Side Channels +===================== + +Spectre is a class of side channel attacks that exploit branch prediction +and speculative execution on modern CPUs to read memory, possibly +bypassing access controls. Speculative execution side channel exploits +do not modify memory but attempt to infer privileged data in the memory. + +This document covers Spectre variant 1 and Spectre variant 2. + +Affected processors +------------------- + +Speculative execution side channel methods affect a wide range of modern +high performance processors, since most modern high speed processors +use branch prediction and speculative execution. + +The following CPUs are vulnerable: + + - Intel Core, Atom, Pentium, and Xeon processors + + - AMD Phenom, EPYC, and Zen processors + + - IBM POWER and zSeries processors + + - Higher end ARM processors + + - Apple CPUs + + - Higher end MIPS CPUs + + - Likely most other high performance CPUs. Contact your CPU vendor for details. + +Whether a processor is affected or not can be read out from the Spectre +vulnerability files in sysfs. See :ref:`spectre_sys_info`. + +Related CVEs +------------ + +The following CVE entries describe Spectre variants: + + ============= ======================= ================= + CVE-2017-5753 Bounds check bypass Spectre variant 1 + CVE-2017-5715 Branch target injection Spectre variant 2 + ============= ======================= ================= + +Problem +------- + +CPUs use speculative operations to improve performance. That may leave +traces of memory accesses or computations in the processor's caches, +buffers, and branch predictors. Malicious software may be able to +influence the speculative execution paths, and then use the side effects +of the speculative execution in the CPUs' caches and buffers to infer +privileged data touched during the speculative execution. + +Spectre variant 1 attacks take advantage of speculative execution of +conditional branches, while Spectre variant 2 attacks use speculative +execution of indirect branches to leak privileged memory. +See :ref:`[1] <spec_ref1>` :ref:`[5] <spec_ref5>` :ref:`[7] <spec_ref7>` +:ref:`[10] <spec_ref10>` :ref:`[11] <spec_ref11>`. + +Spectre variant 1 (Bounds Check Bypass) +--------------------------------------- + +The bounds check bypass attack :ref:`[2] <spec_ref2>` takes advantage +of speculative execution that bypasses conditional branch instructions +used for memory access bounds check (e.g. checking if the index of an +array results in memory access within a valid range). This results in +memory accesses to invalid memory (with out-of-bound index) that are +done speculatively before validation checks resolve. Such speculative +memory accesses can leave side effects, creating side channels which +leak information to the attacker. + +There are some extensions of Spectre variant 1 attacks for reading data +over the network, see :ref:`[12] <spec_ref12>`. However such attacks +are difficult, low bandwidth, fragile, and are considered low risk. + +Spectre variant 2 (Branch Target Injection) +------------------------------------------- + +The branch target injection attack takes advantage of speculative +execution of indirect branches :ref:`[3] <spec_ref3>`. The indirect +branch predictors inside the processor used to guess the target of +indirect branches can be influenced by an attacker, causing gadget code +to be speculatively executed, thus exposing sensitive data touched by +the victim. The side effects left in the CPU's caches during speculative +execution can be measured to infer data values. + +.. _poison_btb: + +In Spectre variant 2 attacks, the attacker can steer speculative indirect +branches in the victim to gadget code by poisoning the branch target +buffer of a CPU used for predicting indirect branch addresses. Such +poisoning could be done by indirect branching into existing code, +with the address offset of the indirect branch under the attacker's +control. Since the branch prediction on impacted hardware does not +fully disambiguate branch address and uses the offset for prediction, +this could cause privileged code's indirect branch to jump to a gadget +code with the same offset. + +The most useful gadgets take an attacker-controlled input parameter (such +as a register value) so that the memory read can be controlled. Gadgets +without input parameters might be possible, but the attacker would have +very little control over what memory can be read, reducing the risk of +the attack revealing useful data. + +One other variant 2 attack vector is for the attacker to poison the +return stack buffer (RSB) :ref:`[13] <spec_ref13>` to cause speculative +subroutine return instruction execution to go to a gadget. An attacker's +imbalanced subroutine call instructions might "poison" entries in the +return stack buffer which are later consumed by a victim's subroutine +return instructions. This attack can be mitigated by flushing the return +stack buffer on context switch, or virtual machine (VM) exit. + +On systems with simultaneous multi-threading (SMT), attacks are possible +from the sibling thread, as level 1 cache and branch target buffer +(BTB) may be shared between hardware threads in a CPU core. A malicious +program running on the sibling thread may influence its peer's BTB to +steer its indirect branch speculations to gadget code, and measure the +speculative execution's side effects left in level 1 cache to infer the +victim's data. + +Attack scenarios +---------------- + +The following list of attack scenarios have been anticipated, but may +not cover all possible attack vectors. + +1. A user process attacking the kernel +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + The attacker passes a parameter to the kernel via a register or + via a known address in memory during a syscall. Such parameter may + be used later by the kernel as an index to an array or to derive + a pointer for a Spectre variant 1 attack. The index or pointer + is invalid, but bound checks are bypassed in the code branch taken + for speculative execution. This could cause privileged memory to be + accessed and leaked. + + For kernel code that has been identified where data pointers could + potentially be influenced for Spectre attacks, new "nospec" accessor + macros are used to prevent speculative loading of data. + + Spectre variant 2 attacker can :ref:`poison <poison_btb>` the branch + target buffer (BTB) before issuing syscall to launch an attack. + After entering the kernel, the kernel could use the poisoned branch + target buffer on indirect jump and jump to gadget code in speculative + execution. + + If an attacker tries to control the memory addresses leaked during + speculative execution, he would also need to pass a parameter to the + gadget, either through a register or a known address in memory. After + the gadget has executed, he can measure the side effect. + + The kernel can protect itself against consuming poisoned branch + target buffer entries by using return trampolines (also known as + "retpoline") :ref:`[3] <spec_ref3>` :ref:`[9] <spec_ref9>` for all + indirect branches. Return trampolines trap speculative execution paths + to prevent jumping to gadget code during speculative execution. + x86 CPUs with Enhanced Indirect Branch Restricted Speculation + (Enhanced IBRS) available in hardware should use the feature to + mitigate Spectre variant 2 instead of retpoline. Enhanced IBRS is + more efficient than retpoline. + + There may be gadget code in firmware which could be exploited with + Spectre variant 2 attack by a rogue user process. To mitigate such + attacks on x86, Indirect Branch Restricted Speculation (IBRS) feature + is turned on before the kernel invokes any firmware code. + +2. A user process attacking another user process +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + A malicious user process can try to attack another user process, + either via a context switch on the same hardware thread, or from the + sibling hyperthread sharing a physical processor core on simultaneous + multi-threading (SMT) system. + + Spectre variant 1 attacks generally require passing parameters + between the processes, which needs a data passing relationship, such + as remote procedure calls (RPC). Those parameters are used in gadget + code to derive invalid data pointers accessing privileged memory in + the attacked process. + + Spectre variant 2 attacks can be launched from a rogue process by + :ref:`poisoning <poison_btb>` the branch target buffer. This can + influence the indirect branch targets for a victim process that either + runs later on the same hardware thread, or running concurrently on + a sibling hardware thread sharing the same physical core. + + A user process can protect itself against Spectre variant 2 attacks + by using the prctl() syscall to disable indirect branch speculation + for itself. An administrator can also cordon off an unsafe process + from polluting the branch target buffer by disabling the process's + indirect branch speculation. This comes with a performance cost + from not using indirect branch speculation and clearing the branch + target buffer. When SMT is enabled on x86, for a process that has + indirect branch speculation disabled, Single Threaded Indirect Branch + Predictors (STIBP) :ref:`[4] <spec_ref4>` are turned on to prevent the + sibling thread from controlling branch target buffer. In addition, + the Indirect Branch Prediction Barrier (IBPB) is issued to clear the + branch target buffer when context switching to and from such process. + + On x86, the return stack buffer is stuffed on context switch. + This prevents the branch target buffer from being used for branch + prediction when the return stack buffer underflows while switching to + a deeper call stack. Any poisoned entries in the return stack buffer + left by the previous process will also be cleared. + + User programs should use address space randomization to make attacks + more difficult (Set /proc/sys/kernel/randomize_va_space = 1 or 2). + +3. A virtualized guest attacking the host +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + The attack mechanism is similar to how user processes attack the + kernel. The kernel is entered via hyper-calls or other virtualization + exit paths. + + For Spectre variant 1 attacks, rogue guests can pass parameters + (e.g. in registers) via hyper-calls to derive invalid pointers to + speculate into privileged memory after entering the kernel. For places + where such kernel code has been identified, nospec accessor macros + are used to stop speculative memory access. + + For Spectre variant 2 attacks, rogue guests can :ref:`poison + <poison_btb>` the branch target buffer or return stack buffer, causing + the kernel to jump to gadget code in the speculative execution paths. + + To mitigate variant 2, the host kernel can use return trampolines + for indirect branches to bypass the poisoned branch target buffer, + and flushing the return stack buffer on VM exit. This prevents rogue + guests from affecting indirect branching in the host kernel. + + To protect host processes from rogue guests, host processes can have + indirect branch speculation disabled via prctl(). The branch target + buffer is cleared before context switching to such processes. + +4. A virtualized guest attacking other guest +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + A rogue guest may attack another guest to get data accessible by the + other guest. + + Spectre variant 1 attacks are possible if parameters can be passed + between guests. This may be done via mechanisms such as shared memory + or message passing. Such parameters could be used to derive data + pointers to privileged data in guest. The privileged data could be + accessed by gadget code in the victim's speculation paths. + + Spectre variant 2 attacks can be launched from a rogue guest by + :ref:`poisoning <poison_btb>` the branch target buffer or the return + stack buffer. Such poisoned entries could be used to influence + speculation execution paths in the victim guest. + + Linux kernel mitigates attacks to other guests running in the same + CPU hardware thread by flushing the return stack buffer on VM exit, + and clearing the branch target buffer before switching to a new guest. + + If SMT is used, Spectre variant 2 attacks from an untrusted guest + in the sibling hyperthread can be mitigated by the administrator, + by turning off the unsafe guest's indirect branch speculation via + prctl(). A guest can also protect itself by turning on microcode + based mitigations (such as IBPB or STIBP on x86) within the guest. + +.. _spectre_sys_info: + +Spectre system information +-------------------------- + +The Linux kernel provides a sysfs interface to enumerate the current +mitigation status of the system for Spectre: whether the system is +vulnerable, and which mitigations are active. + +The sysfs file showing Spectre variant 1 mitigation status is: + + /sys/devices/system/cpu/vulnerabilities/spectre_v1 + +The possible values in this file are: + + ======================================= ================================= + 'Mitigation: __user pointer sanitation' Protection in kernel on a case by + case base with explicit pointer + sanitation. + ======================================= ================================= + +However, the protections are put in place on a case by case basis, +and there is no guarantee that all possible attack vectors for Spectre +variant 1 are covered. + +The spectre_v2 kernel file reports if the kernel has been compiled with +retpoline mitigation or if the CPU has hardware mitigation, and if the +CPU has support for additional process-specific mitigation. + +This file also reports CPU features enabled by microcode to mitigate +attack between user processes: + +1. Indirect Branch Prediction Barrier (IBPB) to add additional + isolation between processes of different users. +2. Single Thread Indirect Branch Predictors (STIBP) to add additional + isolation between CPU threads running on the same core. + +These CPU features may impact performance when used and can be enabled +per process on a case-by-case base. + +The sysfs file showing Spectre variant 2 mitigation status is: + + /sys/devices/system/cpu/vulnerabilities/spectre_v2 + +The possible values in this file are: + + - Kernel status: + + ==================================== ================================= + 'Not affected' The processor is not vulnerable + 'Vulnerable' Vulnerable, no mitigation + 'Mitigation: Full generic retpoline' Software-focused mitigation + 'Mitigation: Full AMD retpoline' AMD-specific software mitigation + 'Mitigation: Enhanced IBRS' Hardware-focused mitigation + ==================================== ================================= + + - Firmware status: Show if Indirect Branch Restricted Speculation (IBRS) is + used to protect against Spectre variant 2 attacks when calling firmware (x86 only). + + ========== ============================================================= + 'IBRS_FW' Protection against user program attacks when calling firmware + ========== ============================================================= + + - Indirect branch prediction barrier (IBPB) status for protection between + processes of different users. This feature can be controlled through + prctl() per process, or through kernel command line options. This is + an x86 only feature. For more details see below. + + =================== ======================================================== + 'IBPB: disabled' IBPB unused + 'IBPB: always-on' Use IBPB on all tasks + 'IBPB: conditional' Use IBPB on SECCOMP or indirect branch restricted tasks + =================== ======================================================== + + - Single threaded indirect branch prediction (STIBP) status for protection + between different hyper threads. This feature can be controlled through + prctl per process, or through kernel command line options. This is x86 + only feature. For more details see below. + + ==================== ======================================================== + 'STIBP: disabled' STIBP unused + 'STIBP: forced' Use STIBP on all tasks + 'STIBP: conditional' Use STIBP on SECCOMP or indirect branch restricted tasks + ==================== ======================================================== + + - Return stack buffer (RSB) protection status: + + ============= =========================================== + 'RSB filling' Protection of RSB on context switch enabled + ============= =========================================== + +Full mitigation might require a microcode update from the CPU +vendor. When the necessary microcode is not available, the kernel will +report vulnerability. + +Turning on mitigation for Spectre variant 1 and Spectre variant 2 +----------------------------------------------------------------- + +1. Kernel mitigation +^^^^^^^^^^^^^^^^^^^^ + + For the Spectre variant 1, vulnerable kernel code (as determined + by code audit or scanning tools) is annotated on a case by case + basis to use nospec accessor macros for bounds clipping :ref:`[2] + <spec_ref2>` to avoid any usable disclosure gadgets. However, it may + not cover all attack vectors for Spectre variant 1. + + For Spectre variant 2 mitigation, the compiler turns indirect calls or + jumps in the kernel into equivalent return trampolines (retpolines) + :ref:`[3] <spec_ref3>` :ref:`[9] <spec_ref9>` to go to the target + addresses. Speculative execution paths under retpolines are trapped + in an infinite loop to prevent any speculative execution jumping to + a gadget. + + To turn on retpoline mitigation on a vulnerable CPU, the kernel + needs to be compiled with a gcc compiler that supports the + -mindirect-branch=thunk-extern -mindirect-branch-register options. + If the kernel is compiled with a Clang compiler, the compiler needs + to support -mretpoline-external-thunk option. The kernel config + CONFIG_RETPOLINE needs to be turned on, and the CPU needs to run with + the latest updated microcode. + + On Intel Skylake-era systems the mitigation covers most, but not all, + cases. See :ref:`[3] <spec_ref3>` for more details. + + On CPUs with hardware mitigation for Spectre variant 2 (e.g. Enhanced + IBRS on x86), retpoline is automatically disabled at run time. + + The retpoline mitigation is turned on by default on vulnerable + CPUs. It can be forced on or off by the administrator + via the kernel command line and sysfs control files. See + :ref:`spectre_mitigation_control_command_line`. + + On x86, indirect branch restricted speculation is turned on by default + before invoking any firmware code to prevent Spectre variant 2 exploits + using the firmware. + + Using kernel address space randomization (CONFIG_RANDOMIZE_SLAB=y + and CONFIG_SLAB_FREELIST_RANDOM=y in the kernel configuration) makes + attacks on the kernel generally more difficult. + +2. User program mitigation +^^^^^^^^^^^^^^^^^^^^^^^^^^ + + User programs can mitigate Spectre variant 1 using LFENCE or "bounds + clipping". For more details see :ref:`[2] <spec_ref2>`. + + For Spectre variant 2 mitigation, individual user programs + can be compiled with return trampolines for indirect branches. + This protects them from consuming poisoned entries in the branch + target buffer left by malicious software. Alternatively, the + programs can disable their indirect branch speculation via prctl() + (See :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`). + On x86, this will turn on STIBP to guard against attacks from the + sibling thread when the user program is running, and use IBPB to + flush the branch target buffer when switching to/from the program. + + Restricting indirect branch speculation on a user program will + also prevent the program from launching a variant 2 attack + on x86. All sand-boxed SECCOMP programs have indirect branch + speculation restricted by default. Administrators can change + that behavior via the kernel command line and sysfs control files. + See :ref:`spectre_mitigation_control_command_line`. + + Programs that disable their indirect branch speculation will have + more overhead and run slower. + + User programs should use address space randomization + (/proc/sys/kernel/randomize_va_space = 1 or 2) to make attacks more + difficult. + +3. VM mitigation +^^^^^^^^^^^^^^^^ + + Within the kernel, Spectre variant 1 attacks from rogue guests are + mitigated on a case by case basis in VM exit paths. Vulnerable code + uses nospec accessor macros for "bounds clipping", to avoid any + usable disclosure gadgets. However, this may not cover all variant + 1 attack vectors. + + For Spectre variant 2 attacks from rogue guests to the kernel, the + Linux kernel uses retpoline or Enhanced IBRS to prevent consumption of + poisoned entries in branch target buffer left by rogue guests. It also + flushes the return stack buffer on every VM exit to prevent a return + stack buffer underflow so poisoned branch target buffer could be used, + or attacker guests leaving poisoned entries in the return stack buffer. + + To mitigate guest-to-guest attacks in the same CPU hardware thread, + the branch target buffer is sanitized by flushing before switching + to a new guest on a CPU. + + The above mitigations are turned on by default on vulnerable CPUs. + + To mitigate guest-to-guest attacks from sibling thread when SMT is + in use, an untrusted guest running in the sibling thread can have + its indirect branch speculation disabled by administrator via prctl(). + + The kernel also allows guests to use any microcode based mitigation + they choose to use (such as IBPB or STIBP on x86) to protect themselves. + +.. _spectre_mitigation_control_command_line: + +Mitigation control on the kernel command line +--------------------------------------------- + +Spectre variant 2 mitigation can be disabled or force enabled at the +kernel command line. + + nospectre_v2 + + [X86] Disable all mitigations for the Spectre variant 2 + (indirect branch prediction) vulnerability. System may + allow data leaks with this option, which is equivalent + to spectre_v2=off. + + + spectre_v2= + + [X86] Control mitigation of Spectre variant 2 + (indirect branch speculation) vulnerability. + The default operation protects the kernel from + user space attacks. + + on + unconditionally enable, implies + spectre_v2_user=on + off + unconditionally disable, implies + spectre_v2_user=off + auto + kernel detects whether your CPU model is + vulnerable + + Selecting 'on' will, and 'auto' may, choose a + mitigation method at run time according to the + CPU, the available microcode, the setting of the + CONFIG_RETPOLINE configuration option, and the + compiler with which the kernel was built. + + Selecting 'on' will also enable the mitigation + against user space to user space task attacks. + + Selecting 'off' will disable both the kernel and + the user space protections. + + Specific mitigations can also be selected manually: + + retpoline + replace indirect branches + retpoline,generic + google's original retpoline + retpoline,amd + AMD-specific minimal thunk + + Not specifying this option is equivalent to + spectre_v2=auto. + +For user space mitigation: + + spectre_v2_user= + + [X86] Control mitigation of Spectre variant 2 + (indirect branch speculation) vulnerability between + user space tasks + + on + Unconditionally enable mitigations. Is + enforced by spectre_v2=on + + off + Unconditionally disable mitigations. Is + enforced by spectre_v2=off + + prctl + Indirect branch speculation is enabled, + but mitigation can be enabled via prctl + per thread. The mitigation control state + is inherited on fork. + + prctl,ibpb + Like "prctl" above, but only STIBP is + controlled per thread. IBPB is issued + always when switching between different user + space processes. + + seccomp + Same as "prctl" above, but all seccomp + threads will enable the mitigation unless + they explicitly opt out. + + seccomp,ibpb + Like "seccomp" above, but only STIBP is + controlled per thread. IBPB is issued + always when switching between different + user space processes. + + auto + Kernel selects the mitigation depending on + the available CPU features and vulnerability. + + Default mitigation: + If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl" + + Not specifying this option is equivalent to + spectre_v2_user=auto. + + In general the kernel by default selects + reasonable mitigations for the current CPU. To + disable Spectre variant 2 mitigations, boot with + spectre_v2=off. Spectre variant 1 mitigations + cannot be disabled. + +Mitigation selection guide +-------------------------- + +1. Trusted userspace +^^^^^^^^^^^^^^^^^^^^ + + If all userspace applications are from trusted sources and do not + execute externally supplied untrusted code, then the mitigations can + be disabled. + +2. Protect sensitive programs +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + For security-sensitive programs that have secrets (e.g. crypto + keys), protection against Spectre variant 2 can be put in place by + disabling indirect branch speculation when the program is running + (See :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`). + +3. Sandbox untrusted programs +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + Untrusted programs that could be a source of attacks can be cordoned + off by disabling their indirect branch speculation when they are run + (See :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`). + This prevents untrusted programs from polluting the branch target + buffer. All programs running in SECCOMP sandboxes have indirect + branch speculation restricted by default. This behavior can be + changed via the kernel command line and sysfs control files. See + :ref:`spectre_mitigation_control_command_line`. + +3. High security mode +^^^^^^^^^^^^^^^^^^^^^ + + All Spectre variant 2 mitigations can be forced on + at boot time for all programs (See the "on" option in + :ref:`spectre_mitigation_control_command_line`). This will add + overhead as indirect branch speculations for all programs will be + restricted. + + On x86, branch target buffer will be flushed with IBPB when switching + to a new program. STIBP is left on all the time to protect programs + against variant 2 attacks originating from programs running on + sibling threads. + + Alternatively, STIBP can be used only when running programs + whose indirect branch speculation is explicitly disabled, + while IBPB is still used all the time when switching to a new + program to clear the branch target buffer (See "ibpb" option in + :ref:`spectre_mitigation_control_command_line`). This "ibpb" option + has less performance cost than the "on" option, which leaves STIBP + on all the time. + +References on Spectre +--------------------- + +Intel white papers: + +.. _spec_ref1: + +[1] `Intel analysis of speculative execution side channels <https://newsroom.intel.com/wp-content/uploads/sites/11/2018/01/Intel-Analysis-of-Speculative-Execution-Side-Channels.pdf>`_. + +.. _spec_ref2: + +[2] `Bounds check bypass <https://software.intel.com/security-software-guidance/software-guidance/bounds-check-bypass>`_. + +.. _spec_ref3: + +[3] `Deep dive: Retpoline: A branch target injection mitigation <https://software.intel.com/security-software-guidance/insights/deep-dive-retpoline-branch-target-injection-mitigation>`_. + +.. _spec_ref4: + +[4] `Deep Dive: Single Thread Indirect Branch Predictors <https://software.intel.com/security-software-guidance/insights/deep-dive-single-thread-indirect-branch-predictors>`_. + +AMD white papers: + +.. _spec_ref5: + +[5] `AMD64 technology indirect branch control extension <https://developer.amd.com/wp-content/resources/Architecture_Guidelines_Update_Indirect_Branch_Control.pdf>`_. + +.. _spec_ref6: + +[6] `Software techniques for managing speculation on AMD processors <https://developer.amd.com/wp-content/resources/90343-B_SoftwareTechniquesforManagingSpeculation_WP_7-18Update_FNL.pdf>`_. + +ARM white papers: + +.. _spec_ref7: + +[7] `Cache speculation side-channels <https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/download-the-whitepaper>`_. + +.. _spec_ref8: + +[8] `Cache speculation issues update <https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/latest-updates/cache-speculation-issues-update>`_. + +Google white paper: + +.. _spec_ref9: + +[9] `Retpoline: a software construct for preventing branch-target-injection <https://support.google.com/faqs/answer/7625886>`_. + +MIPS white paper: + +.. _spec_ref10: + +[10] `MIPS: response on speculative execution and side channel vulnerabilities <https://www.mips.com/blog/mips-response-on-speculative-execution-and-side-channel-vulnerabilities/>`_. + +Academic papers: + +.. _spec_ref11: + +[11] `Spectre Attacks: Exploiting Speculative Execution <https://spectreattack.com/spectre.pdf>`_. + +.. _spec_ref12: + +[12] `NetSpectre: Read Arbitrary Memory over Network <https://arxiv.org/abs/1807.10535>`_. + +.. _spec_ref13: + +[13] `Spectre Returns! Speculation Attacks using the Return Stack Buffer <https://www.usenix.org/system/files/conference/woot18/woot18-paper-koruyeh.pdf>`_. |