diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2024-05-14 06:10:19 +0300 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2024-05-14 06:10:19 +0300 |
commit | 796aec4a5b5850967af0c42d4e84df2d748d570b (patch) | |
tree | 1a2240a29c2e62eb9eb7ee6ac558eebd5431cc0f /block/blk.h | |
parent | a5131c3fdf2608f1c15f3809e201cf540eb28489 (diff) | |
parent | 6827738dc684a87ad54ebba3ae7f3d7c977698eb (diff) | |
download | linux-796aec4a5b5850967af0c42d4e84df2d748d570b.tar.xz |
Merge tag 'idxd-for-linus-may2024' of git bundle from Arjan
Pull DSA and IAA accelerator mis-alignment fix from Arjan van de Ven:
"The DSA (memory copy/zero/etc) and IAA (compression) accelerators in
the Sapphire Rapids and Emerald Rapids SOCs turn out to have a bug
that has security implications.
Both of these accelerators work by the application submitting a 64
byte command to the device; this command contains an opcode as well as
the virtual address of the return value that the device will update on
completion... and a set of opcode specific values.
In a typical scenario a ring 3 application mmaps the device file and
uses the ENQCMD or MOVDIR64 instructions (which are variations of a 64
byte atomic write) on this mmap'd memory region to directly submit
commands to a device hardware.
The return value as specified in the command, is supposed to be 32 (or
64) bytes aligned in memory, and generally the hardware checks and
enforces this alignment.
However in testing it has been found that there are conditions
(controlled by the submitter) where this enforcement does not
happen... which makes it possible for the return value to span a page
boundary. And this is where it goes wrong - the accelerators will
perform the virtual to physical address lookup on the first of the two
pages, but end up continue writing to the next consecutive physical
(host) page rather than the consecutive virtual page. In addition, the
device will end up in a hung state on such unaligned write of the
return value.
This patch series has the proposed software side solution consisting
of three parts:
- Don't allow these two PCI devices to be assigned to VM guests (we
cannot trust a VM guest to behave correctly and not cause this
condition)
- Don't allow ring 3 applications to set up the mmap unless they have
CAP_SYS_RAWIO permissions. This makes it no longer possible for
non-root applications to directly submit commands to the
accelerator
- Add a write() method to the device so that an application can
submit its commands to the kernel driver, which performs the needed
sanity checks before submitting it to the hardware.
This switch from mmap to write is an incompatible interface change to
non-root userspace, but we have not found a way to avoid this. All
software we know of uses a small set of accessor libraries for these
accelerators, for which libqpl and libdml (on github) are the most
common. As part of the security release, updated versions of these
libraries will be released that transparently fall back to write().
Intel has assigned CVE-2024-21823 to this hardware issue"
* tag 'idxd-for-linus-may2024' of git bundle from Arjan:
dmaengine: idxd: add a write() method for applications to submit work
dmaengine: idxd: add a new security check to deal with a hardware erratum
VFIO: Add the SPR_DSA and SPR_IAX devices to the denylist
Diffstat (limited to 'block/blk.h')
0 files changed, 0 insertions, 0 deletions