Age | Commit message (Collapse) | Author | Files | Lines |
|
i.MX8DXL use difference irq number for PCIe EP DMA.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
Add unified pcie<n> and pcie<n>_ep label for existed chipes to prepare
applied one overay file to enable EP function.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
The needed drivers to support PCIe and SATA for i.MX 8QM have been
added.
Configure them for the Apalis iMX8 SoM.
The pciea and pcieb blocks each get a single PCIe lane, pciea is
available on the carrier boards while pcieb is connected to the
on module Wi-Fi/BT module.
The SATA lane is available on the carrier boards.
Signed-off-by: Max Krummenacher <max.krummenacher@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
After some recent changes to the RISC-V crypto code that turned some
indirect function calls into direct ones, builds with CONFIG_CFI_CLANG
fail with:
ld.lld: error: undefined symbol: __kcfi_typeid_sm3_transform_zvksh_zvkb
>>> referenced by arch/riscv/crypto/sm3-riscv64-zvksh-zvkb.o:(.text+0x2) in archive vmlinux.a
ld.lld: error: undefined symbol: __kcfi_typeid_sha512_transform_zvknhb_zvkb
>>> referenced by arch/riscv/crypto/sha512-riscv64-zvknhb-zvkb.o:(.text+0x2) in archive vmlinux.a
ld.lld: error: undefined symbol: __kcfi_typeid_sha256_transform_zvknha_or_zvknhb_zvkb
>>> referenced by arch/riscv/crypto/sha256-riscv64-zvknha_or_zvknhb-zvkb.o:(.text+0x2) in archive vmlinux.a
As these functions are no longer indirectly called (i.e., have their
address taken), the compiler will not emit __kcfi_typeid symbols for
them but SYM_TYPED_FUNC_START expects these to exist at link time.
Switch the definitions of these functions to use SYM_FUNC_START, as they
no longer need kCFI type information since they are only called
directly.
Fixes: 1523eaed0ac5 ("crypto: riscv/sm3 - Use API partial block handling")
Fixes: 561aab1104d8 ("crypto: riscv/sha512 - Use API partial block handling")
Fixes: e6c5597badf2 ("crypto: riscv/sha256 - Use API partial block handling")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Using the MDIO pins with Open Drain causes spec violations of the
signals. Revert the changes.
This reverts commit 315d7f301e234b99c1b9619f0b14cf288dc7c33f.
Signed-off-by: Markus Niebel <Markus.Niebel@ew.tq-group.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
Using the MDIO pins with Open Drain causes spec violations of the
signals. Revert the changes.
This reverts commit 9015397c2f2d9d327c0cf88d74e39c4858cb4912.
Signed-off-by: Markus Niebel <Markus.Niebel@ew.tq-group.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
Enable the interrupts and wakeup-source to allow the external RTC to be
used as an alarm.
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
Enable the interrupts and wakeup-source to allow the external RTC to be
used as an alarm.
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
Enable the interrupts and wakeup-source to allow the external RTC to be
used as an alarm.
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
The Ethernet PHY setup currently assumes that the bootloader will take the
PHY out of reset, but this behavior is not guaranteed across all
bootloaders. Add the reset GPIO to ensure the kernel can properly control
the PHY reset line.
Also configure the PHY IRQ GPIO to enable interrupt-driven link status
reporting, instead of relying on polling.
This ensures more reliable Ethernet initialization and improves PHY event
handling.
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
The Ethernet PHY setup currently assumes that the bootloader will take the
PHY out of reset, but this behavior is not guaranteed across all
bootloaders. Add the reset GPIO to ensure the kernel can properly control
the PHY reset line.
Also configure the PHY IRQ GPIO to enable interrupt-driven link status
reporting, instead of relying on polling.
This ensures more reliable Ethernet initialization and improves PHY event
handling.
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
The HDMI bridge chip fails to generate an audio source due to the SAI5
master clock (MCLK) direction not being set to output. This prevents proper
clocking of the HDMI audio interface.
Add the `fsl,sai-mclk-direction-output` property to the SAI5 node to ensure
the MCLK is driven by the SoC, resolving the HDMI sound issue.
Fixes: 1d6880ceef43 ("arm64: dts: imx8mn-beacon: Add HDMI video with sound")
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
The HDMI bridge chip fails to generate an audio source due to the SAI5
master clock (MCLK) direction not being set to output. This prevents proper
clocking of the HDMI audio interface.
Add the `fsl,sai-mclk-direction-output` property to the SAI5 node to ensure
the MCLK is driven by the SoC, resolving the HDMI sound issue.
Fixes: 8ad7d14d99f3 ("arm64: dts: imx8mm-beacon: Add HDMI video with sound")
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
Although not noticeable when used every day, the RTC appears to drift when
left to sit over time. This is due to the capacitive load not being
properly set. Fix RTC drift by correcting the capacitive load setting
from 7000 to 12500, which matches the actual hardware configuration.
Fixes: 25a5ccdce767 ("arm64: dts: freescale: Introduce imx8mp-beacon-kit")
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
Although not noticeable when used every day, the RTC appears to drift when
left to sit over time. This is due to the capacitive load not being
properly set. Fix RTC drift by correcting the capacitive load setting
from 7000 to 12500, which matches the actual hardware configuration.
Fixes: 36ca3c8ccb53 ("arm64: dts: imx: Add Beacon i.MX8M Nano development kit")
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
Although not noticeable when used every day, the RTC appears to drift when
left to sit over time. This is due to the capacitive load not being
properly set. Fix RTC drift by correcting the capacitive load setting
from 7000 to 12500, which matches the actual hardware configuration.
Fixes: 593816fa2f35 ("arm64: dts: imx: Add Beacon i.MX8m-Mini development kit")
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
This adds support for TQMa93xx module attached to MBa91xxCA board.
TQMa93xx is a SOM using i.MX93 SOC. The SOM features PMIC, RAM, e-MMC and
some optional peripherals like SPI-NOR, RTC, EEPROM, gyroscope and
secure element.
TQMa93xxCA can be attached directly while TQMa93xxLA needs an adapter.
Signed-off-by: Markus Niebel <Markus.Niebel@ew.tq-group.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
Add DT support for Toradex SMARC iMX8MP SoM and Toradex SMARC Development
carrier board.
Link: https://www.toradex.com/computer-on-modules/smarc-arm-family/nxp-imx-8m-plus
Link: https://www.toradex.com/products/carrier-board/smarc-development-board-kit
Co-developed-by: Hiago De Franco <hiago.franco@toradex.com>
Signed-off-by: Hiago De Franco <hiago.franco@toradex.com>
Signed-off-by: Vitor Soares <vitor.soares@toradex.com>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
Add support for the PCA85073A RTC module connected via I2C0 on
S32G274A-RDB2 and S32G399A-RDB3 boards.
Note that the PCA85073A RTC module is not battery backed.
Signed-off-by: Ciprian Marian Costea <ciprianmarian.costea@oss.nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
On this board, USB2.0 is a host-only port, add vbus regulator node
and enable USB2.0 node.
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
On this board, USB2.0 is a host-only port, add vbus regulator node
and enable USB2.0 node.
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
Add USB2.0 controller and phy nodes.
Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com> # TQMa95xxSA
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
The compatible "plx,pex8605" does not exist, there is no DT binding for
it and there was never a driver matching this compatible, remove it.
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
Add support for Boundary Devices/Ezurio Nitrogen8M Plus ENC Carrier
Board and it's SOM. Supported interfaces:
- Serial Console
- EQoS Ethernet
- USB
- eMMC
- HDMI
Signed-off-by: Martyn Welch <martyn.welch@collabora.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
strcpy() is deprecated; use strscpy() instead.
Since the destination buffer has a fixed length, strscpy() automatically
determines its size using sizeof() when the size argument is omitted.
This makes the explicit size argument unnecessary - remove it.
Now, combine both if-else branches using strscpy() and the same buffer
into a single statement to simplify the code.
No functional changes intended.
Link: https://github.com/KSPP/linux/issues/88
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
The NIOS2 hardware does not support conflicting mappings for the same
virtual address (see Nios II Processor Reference Guide from 2023.08.28):
"The operating system software is responsible for guaranteeing that
multiple TLB entries do not map the same virtual address. The hardware
behavior is undefined when multiple entries map the same virtual
address."
When flushing tlb-entries, the kernel may violate this invariant for
virtual addresses related to PID 0 as flushing is currently implemented
by assigning physical address, pid and flags to 0.
A small example:
Before flushing TLB mappings for pid:0x42:
dump tlb-entries for line=0xd (addr 000d0000):
-- way:09 vpn:0x0006d000 phys:0x01145000 pid:0x00 flags:rw--c
...
-- way:0d vpn:0x0006d000 phys:0x02020000 pid:0x42 flags:rw--c
After flushing TLB mappings for pid:0x42:
dump tlb-entries for line=0xd (addr 000d0000):
-- way:09 vpn:0x0006d000 phys:0x01145000 pid:0x00 flags:rw--c
...
-- way:0d vpn:0x0006d000 phys:0x00000000 pid:0x00 flags:-----
As functions such as replace_tlb_one_pid operate on the assumption of
unique mappings, this can cause repeated pagefaults for a single
address that are not covered by the existing spurious pagefault handling.
This commit fixes this issue by keeping the pid field of the entries
when flushing. That way, no conflicting mappings are introduced as the
pair of <address,pid> is now kept unique:
Fixed example after flushing TLB mappings for pid:0x42:
dump tlb-entries for line=0xd (addr 000d0000):
-- way:09 vpn:0x0006d000 phys:0x01145000 pid:0x00 flags:rw--c
...
-- way:0d vpn:0x0006d000 phys:0x00000000 pid:0x42 flags:-----
When flushing the complete tlb/initialising all entries, the way is
used as a substitute mmu pid value for the "invalid" entries.
Signed-off-by: Simon Schuster <schuster.simon@siemens-energy.com>
Signed-off-by: Andreas Oetken <andreas.oetken@siemens-energy.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
NIOS2 uses a software-managed TLB for virtual address translation. To
flush a cache line, the original mapping is replaced by one to physical
address 0x0 with no permissions (rwx mapped to 0) set. This can lead to
TLB-permission--related traps when such a nominally flushed entry is
encountered as a mapping for an otherwise valid virtual address within a
process (e.g. due to an MMU-PID-namespace rollover that previously
flushed the complete TLB including entries of existing, running
processes).
The default ptep_set_access_flags implementation from mm/pgtable-generic.c
only forces a TLB-update when the page-table entry has changed within the
page table:
/*
* [...] We return whether the PTE actually changed, which in turn
* instructs the caller to do things like update__mmu_cache. [...]
*/
int ptep_set_access_flags(struct vm_area_struct *vma,
unsigned long address, pte_t *ptep,
pte_t entry, int dirty)
{
int changed = !pte_same(*ptep, entry);
if (changed) {
set_pte_at(vma->vm_mm, address, ptep, entry);
flush_tlb_fix_spurious_fault(vma, address);
}
return changed;
}
However, no cross-referencing with the TLB-state occurs, so the
flushing-induced pseudo entries that are responsible for the pagefault
in the first place are never pre-empted from TLB on this code path.
This commit fixes this behaviour by always requesting a TLB-update in
this part of the pagefault handling, fixing spurious page-faults on the
way. The handling is a straightforward port of the logic from the MIPS
architecture via an arch-specific ptep_set_access_flags function ported
from arch/mips/include/asm/pgtable.h.
Signed-off-by: Simon Schuster <schuster.simon@siemens-energy.com>
Signed-off-by: Andreas Oetken <andreas.oetken@siemens-energy.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
The XOL (execute out-of-line) buffer is used to single-step the
replaced instruction(s) for uprobes. The RISC-V port was missing a
proper fence.i (i$ flushing) after constructing the XOL buffer, which
can result in incorrect execution of stale/broken instructions.
This was found running the BPF selftests "test_progs:
uprobe_autoattach, attach_probe" on the Spacemit K1/X60, where the
uprobes tests randomly blew up.
Reviewed-by: Guo Ren <guoren@kernel.org>
Fixes: 74784081aac8 ("riscv: Add uprobes supported")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250419111402.1660267-2-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The flush_icache_range() function is implemented as a "function-like
macro with unused parameters", which can result in "unused variables"
warnings.
Replace the macro with a static inline function, as advised by
Documentation/process/coding-style.rst.
Fixes: 08f051eda33b ("RISC-V: Flush I$ when making a dirty page executable")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250419111402.1660267-1-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Micro-optimize vmx_do_interrupt_irqoff() by substituting
MOV %RBP,%RSP; POP %RBP instruction sequence with equivalent
LEAVE instruction. GCC compiler does this by default for
a generic tuning and for all modern processors:
DEF_TUNE (X86_TUNE_USE_LEAVE, "use_leave",
m_386 | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_ZHAOXIN
| m_TREMONT | m_CORE_HYBRID | m_CORE_ATOM | m_GENERIC)
The new code also saves a couple of bytes, from:
27: 48 89 ec mov %rbp,%rsp
2a: 5d pop %rbp
to:
27: c9 leave
No functional change intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20250414081131.97374-2-ubizjak@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Explicitly verify the MSR load/store list counts are below the advertised
limit as part of the initial consistency checks on the lists, so that code
that consumes the count doesn't need to worry about extreme edge cases.
Enforcing the limit during the initial checks fixes a flaw on 32-bit KVM
where a sufficiently high @count could lead to overflow:
arch/x86/kvm/vmx/nested.c:834 nested_vmx_check_msr_switch()
warn: potential user controlled sizeof overflow 'addr + count * 16' '0-u64max + 16-68719476720'
arch/x86/kvm/vmx/nested.c
827 static int nested_vmx_check_msr_switch(struct kvm_vcpu *vcpu,
828 u32 count, u64 addr)
829 {
830 if (count == 0)
831 return 0;
832
833 if (!kvm_vcpu_is_legal_aligned_gpa(vcpu, addr, 16) ||
--> 834 !kvm_vcpu_is_legal_gpa(vcpu, (addr + count * sizeof(struct vmx_msr_entry) - 1)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
While the SDM doesn't explicitly state an illegal count results in VM-Fail,
the SDM states that exceeding the limit may result in undefined behavior.
I.e. the SDM gives hardware, and thus KVM, carte blanche to do literally
anything in response to a count that exceeds the "recommended" limit.
If the limit is exceeded, undefined processor behavior may result
(including a machine check during the VMX transition).
KVM already enforces the limit when processing the MSRs, i.e. already
signals a late VM-Exit Consistency Check for VM-Enter, and generates a
VMX Abort for VM-Exit. I.e. explicitly checking the limits simply means
KVM will signal VM-Fail instead of VM-Exit or VMX Abort.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/44961459-2759-4164-b604-f6bd43da8ce9@stanley.mountain
Link: https://lore.kernel.org/r/20250315024402.2363098-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Cross-merge networking fixes after downstream PR (net-6.15-rc4).
This pull includes wireless and a fix to vxlan which isn't
in Linus's tree just yet. The latter creates with a silent conflict
/ build breakage, so merging it now to avoid causing problems.
drivers/net/vxlan/vxlan_vnifilter.c
094adad91310 ("vxlan: Use a single lock to protect the FDB table")
087a9eb9e597 ("vxlan: vnifilter: Fix unlocked deletion of default FDB entry")
https://lore.kernel.org/20250423145131.513029-1-idosch@nvidia.com
No "normal" conflicts, or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
An AP destroy request for a target vCPU is typically followed by an
RMPADJUST to remove the VMSA attribute from the page currently being
used as the VMSA for the target vCPU. This can result in a vCPU that
is about to VMRUN to exit with #VMEXIT_INVALID.
This usually does not happen as APs are typically sitting in HLT when
being destroyed and therefore the vCPU thread is not running at the time.
However, if HLT is allowed inside the VM, then the vCPU could be about to
VMRUN when the VMSA attribute is removed from the VMSA page, resulting in
a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the
guest to crash. An RMPADJUST against an in-use (already running) VMSA
results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA
attribute cannot be changed until the VMRUN for target vCPU exits. The
Qemu command line option '-overcommit cpu-pm=on' is an example of allowing
HLT inside the guest.
Update the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event to include the
KVM_REQUEST_WAIT flag. The kvm_vcpu_kick() function will not wait for
requests to be honored, so create kvm_make_request_and_kick() that will
add a new event request and honor the KVM_REQUEST_WAIT flag. This will
ensure that the target vCPU sees the AP destroy request before returning
to the initiating vCPU should the target vCPU be in guest mode.
Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/fe2c885bf35643dd224e91294edb6777d5df23a4.1743097196.git.thomas.lendacky@amd.com
[sean: add a comment explaining the use of smp_send_reschedule()]
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Now that posted MSI and KVM harvesting of PIR is identical, extract the
code (and posted MSI's wonderful comment) to a common helper.
No functional change intended.
Link: https://lore.kernel.org/r/20250401163447.846608-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Use arch_xchg() when moving IRQs from the PIR to the vIRR, purely to avoid
instrumentation so that KVM is compatible with the needs of posted MSI.
This will allow extracting the core PIR logic to common code and sharing
it between KVM and posted MSI handling.
Link: https://lore.kernel.org/r/20250401163447.846608-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Rework KVM's processing of the PIR to use the same algorithm as posted
MSIs, i.e. to do READ(x4) => XCHG(x4) instead of (READ+XCHG)(x4). Given
KVM's long-standing, sub-optimal use of 32-bit accesses to the PIR, it's
safe to say far more thought and investigation was put into handling the
PIR for posted MSIs, i.e. there's no reason to assume KVM's existing
logic is meaningful, let alone superior.
Matching the processing done by posted MSIs will also allow deduplicating
the code between KVM and posted MSIs.
See the comment for handle_pending_pir() added by commit 1b03d82ba15e
("x86/irq: Install posted MSI notification handler") for details on
why isolating loads from XCHG is desirable.
Suggested-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250401163447.846608-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Process the PIR at the natural kernel width, i.e. in 64-bit chunks on
64-bit kernels, so that the worst case of having a posted IRQ in each
chunk of the vIRR only requires 4 loads and xchgs from/to the PIR, not 8.
Deliberately use a "continue" to skip empty entries so that the code is a
carbon copy of handle_pending_pir(), in anticipation of deduplicating KVM
and posted MSI logic.
Suggested-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250401163447.846608-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Track the PIR bitmap in posted interrupt descriptor structures as an array
of unsigned longs instead of using unionized arrays for KVM (u32s) versus
IRQ management (u64s). In practice, because the non-KVM usage is (sanely)
restricted to 64-bit kernels, all existing usage of the u64 variant is
already working with unsigned longs.
Using "unsigned long" for the array will allow reworking KVM's processing
of the bitmap to read/write in 64-bit chunks on 64-bit kernels, i.e. will
allow optimizing KVM by reducing the number of atomic accesses to PIR.
Opportunstically replace the open coded literals in the posted MSIs code
with the appropriate macro. Deliberately don't use ARRAY_SIZE() in the
for-loops, even though it would be cleaner from a certain perspective, in
anticipation of decoupling the processing from the array declaration.
No functional change intended.
Link: https://lore.kernel.org/r/20250401163447.846608-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Read each vIRR exactly once when shuffling IRQs from the PIR to the vAPIC
to ensure getting the highest priority IRQ from the chunk doesn't reload
from the vIRR. In practice, a reload is functionally benign as vcpu->mutex
is held and so IRQs can be consumed, i.e. new IRQs can appear, but existing
IRQs can't disappear.
Link: https://lore.kernel.org/r/20250401163447.846608-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Track whether or not at least one IRQ was found in PIR during the initial
loop to load PIR chunks from memory. Doing so generates slightly better
code (arguably) for processing the for-loop of XCHGs, especially for the
case where there are no pending IRQs.
Note, while PIR can be modified between the initial load and the XCHG, it
can only _gain_ new IRQs, i.e. there is no danger of a false positive due
to the final version of pir_copy[] being empty.
Opportunistically convert the boolean to an "unsigned long" and compute
the effective boolean result via bitwise-OR. Some compilers, e.g.
clang-14, need the extra "hint" to elide conditional branches.
Opportunistically rename the variable in anticipation of moving the PIR
accesses to a common helper that can be shared by posted MSIs and KVM.
Old:
<+74>: test %rdx,%rdx
<+77>: je 0xffffffff812bbeb0 <handle_pending_pir+144>
<pir[0]>
<+88>: mov $0x1,%dl>
<+90>: test %rsi,%rsi
<+93>: je 0xffffffff812bbe8c <handle_pending_pir+108>
<pir[1]>
<+106>: mov $0x1,%dl
<+108>: test %rcx,%rcx
<+111>: je 0xffffffff812bbe9e <handle_pending_pir+126>
<pir[2]>
<+124>: mov $0x1,%dl
<+126>: test %rax,%rax
<+129>: je 0xffffffff812bbeb9 <handle_pending_pir+153>
<pir[3]>
<+142>: jmp 0xffffffff812bbec1 <handle_pending_pir+161>
<+144>: xor %edx,%edx
<+146>: test %rsi,%rsi
<+149>: jne 0xffffffff812bbe7f <handle_pending_pir+95>
<+151>: jmp 0xffffffff812bbe8c <handle_pending_pir+108>
<+153>: test %dl,%dl
<+155>: je 0xffffffff812bbf8e <handle_pending_pir+366>
New:
<+74>: mov %rax,%r8
<+77>: or %rcx,%r8
<+80>: or %rdx,%r8
<+83>: or %rsi,%r8
<+86>: setne %bl
<+89>: je 0xffffffff812bbf88 <handle_pending_pir+360>
<+95>: test %rsi,%rsi
<+98>: je 0xffffffff812bbe8d <handle_pending_pir+109>
<pir[0]>
<+109>: test %rdx,%rdx
<+112>: je 0xffffffff812bbe9d <handle_pending_pir+125>
<pir[1]>
<+125>: test %rcx,%rcx
<+128>: je 0xffffffff812bbead <handle_pending_pir+141>
<pir[2]>
<+141>: test %rax,%rax
<+144>: je 0xffffffff812bbebd <handle_pending_pir+157>
<pir[3]>
Link: https://lore.kernel.org/r/20250401163447.846608-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Ensure the PIR is read exactly once at the start of handle_pending_pir(),
to guarantee that checking for an outstanding posted interrupt in a given
chuck doesn't reload the chunk from the "real" PIR. Functionally, a reload
is benign, but it would defeat the purpose of pre-loading into a copy.
Fixes: 1b03d82ba15e ("x86/irq: Install posted MSI notification handler")
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20250401163447.846608-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
insn_decoder_test found a problem with decoding APX CTEST instructions:
Found an x86 instruction decoder bug, please report this.
ffffffff810021df 62 54 94 05 85 ff ctestneq
objdump says 6 bytes, but insn_get_length() says 5
It happens because x86-opcode-map.txt doesn't specify arguments for the
instruction and the decoder doesn't expect to see ModRM byte.
Fixes: 690ca3a3067f ("x86/insn: Add support for APX EVEX instructions to the opcode map")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org # v6.10+
Link: https://lore.kernel.org/r/20250423065815.2003231-1-kirill.shutemov@linux.intel.com
|
|
Add a module param to each KVM vendor module to allow disabling device
posted interrupts without having to sacrifice all of APICv/AVIC, and to
also effectively enumerate to userspace whether or not KVM may be
utilizing device posted IRQs. Disabling device posted interrupts is
very desirable for testing, and can even be desirable for production
environments, e.g. if the host kernel wants to interpose on device
interrupts.
Put the module param in kvm-{amd,intel}.ko instead of kvm.ko to match
the overall APICv/AVIC controls, and to avoid complications with said
controls. E.g. if the param is in kvm.ko, KVM needs to be snapshot the
original user-defined value to play nice with a vendor module being
reloaded with different enable_apicv settings.
Link: https://lore.kernel.org/r/20250401161804.842968-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When starting device assignment, i.e. potential IRQ bypass, don't blast
KVM_REQ_UNBLOCK if APICv is disabled/unsupported. There is no need to
wake vCPUs if they can never use VT-d posted IRQs (sending UNBLOCK guards
against races being vCPUs blocking and devices starting IRQ bypass).
Opportunistically use kvm_arch_has_irq_bypass() for all relevant checks in
the VMX Posted Interrupt code so that all checks in KVM x86 incorporate
the same information (once AMD/AVIC is given similar treatment).
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://lore.kernel.org/r/20250401161804.842968-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Rescan I/O APIC routes for a vCPU after handling an intercepted I/O APIC
EOI for an IRQ that is not targeting said vCPU, i.e. after handling what's
effectively a stale EOI VM-Exit. If a level-triggered IRQ is in-flight
when IRQ routing changes, e.g. because the guest changes routing from its
IRQ handler, then KVM intercepts EOIs on both the new and old target vCPUs,
so that the in-flight IRQ can be de-asserted when it's EOI'd.
However, only the EOI for the in-flight IRQ needs to be intercepted, as
IRQs on the same vector with the new routing are coincidental, i.e. occur
only if the guest is reusing the vector for multiple interrupt sources.
If the I/O APIC routes aren't rescanned, KVM will unnecessarily intercept
EOIs for the vector and negative impact the vCPU's interrupt performance.
Note, both commit db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig
race") and commit 0fc5a36dd6b3 ("KVM: x86: ioapic: Fix level-triggered EOI
and IOAPIC reconfigure race") mentioned this issue, but it was considered
a "rare" occurrence thus was not addressed. However in real environments,
this issue can happen even in a well-behaved guest.
Cc: Kai Huang <kai.huang@intel.com>
Co-developed-by: xuyun <xuyun_xy.xy@linux.alibaba.com>
Signed-off-by: xuyun <xuyun_xy.xy@linux.alibaba.com>
Signed-off-by: weizijie <zijie.wei@linux.alibaba.com>
[sean: massage changelog and comments, use int/-1, reset at scan]
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20250304013335.4155703-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Extract the vCPU specific EOI interception logic for I/O APIC emulation
into a common helper for userspace and in-kernel emulation in anticipation
of optimizing the "pending EOI" case.
No functional change intended.
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20250304013335.4155703-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Extract and isolate the trigger mode check in kvm_scan_ioapic_routes() in
anticipation of moving destination matching logic to a common helper (for
userspace vs. in-kernel I/O APIC emulation).
No functional change intended.
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20250304013335.4155703-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The latest AMD platform has introduced a new instruction called PREFETCHI.
This instruction loads a cache line from a specified memory address into
the indicated data or instruction cache level, based on locality reference
hints.
Feature bit definition:
CPUID_Fn80000021_EAX [bit 20] - Indicates support for IC prefetch.
This feature is analogous to Intel's PREFETCHITI (CPUID.(EAX=7,ECX=1):EDX),
though the CPUID bit definitions differ between AMD and Intel.
Advertise support to userspace, as no additional enabling is necessary
(PREFETCHI can't be intercepted as there's no instruction specific behavior
that needs to be virtualize).
The feature is documented in Processor Programming Reference (PPR)
for AMD Family 1Ah Model 02h, Revision C1 (Link below).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/ee1c08fc400bb574a2b8f2c6a0bd9def10a29d35.1744130533.git.babu.moger@amd.com
[sean: rewrite shortlog to highlight the KVM functionality]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
WRMSR_XX_BASE_NS is bit 1 so put it there, add some new bits as
comments only.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250324160617.15379-1-bp@kernel.org
[sean: skip the FSRS/FSRC placeholders to avoid confusion]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Returning a literal X86EMUL_CONTINUE is slightly clearer than returning
rc.
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/7604cbbf-15e6-45a8-afec-cf5be46c2924@stanley.mountain
Signed-off-by: Sean Christopherson <seanjc@google.com>
|