| Age | Commit message (Collapse) | Author | Files | Lines |
|
Add the maintenance interrupt to force an exit when the guest
enables/disables individual groups, so that we can resort the
ap_list accordingly.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-27-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Interrupts in the ap_list that cannot be acted upon because they
are not enabled, or that their group is not enabled, shouldn't
make it into the LRs if we are space-constrained.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-26-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Having established that pending interrupts should have priority
to be moved into the LRs over the active interrupts, implement this
in the ap_list sorting.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-25-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Make the internal crystal ball global, so that implementation-specific
code can use it.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-24-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Now that we always reconfigure the vgic HCR register on entry,
the "enable" part of kvm_vgic_vcpu_enable() is pretty useless.
Removing the enable bits from these functions makes it plain that
they are just about computing the reset state. Just rename the
functions accordingly.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-23-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
We currently don't use the maintenance interrupt very much, apart
from EOI on level interrupts, and for LR underflow in limited cases.
However, as we are moving toward a setup where active interrupts
can live outside of the LRs, we need to use the MIs in a more
diverse set of cases.
Add a new helper that produces a digest of the ap_list, and use
that summary to set the various control bits as required.
This slightly changes the way v2 SGIs are handled, as they used to
count for more than one interrupt, but not anymore.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-22-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
We currently save/restore the VMCR register in a pretty lazy way
(on load/put, consistently with what we do with the APRs).
However, we are going to need the group-enable bits that are backed
by VMCR on each entry (so that we can avoid injecting interrupts for
disabled groups).
Move the synchronisation from put to sync, which results in some minor
churn in the nVHE hypercalls to simplify things.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-21-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
As we are going to rely on the [G]ICH_HCR{,_EL2} register to be
programmed with MI information at all times, slightly de-optimise
the flush/sync code to always be called. This is rather lightweight
when no interrupts are in flight.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-20-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Split vgic_v2_populate_lr() into two helpers, so that we have another
primitive that computes the LR from a vgic_irq, but doesn't update
anything in the shadow structure.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-19-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
As we are going to need to handle deactivation for interrupts that
are not in the LRs, split vgic_v2_fold_lr_state() into a helper
that deals with a single interrupt, and the function that loops
over the used LRs.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-18-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Not programming GICH_HCR while no LRs are populated is a bit
of an issue, as we otherwise don't see any maintenance interrupt
when the guest interacts with the LRs.
Decouple the two and always program the control register, even when
we don't have to touch the LRs.
This is very similar to what we are already doing for GICv3.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-17-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
EOIcount is how the virtual CPU interface signals that the guest
is deactivating interrupts outside of the LRs when EOImode==0.
We therefore need to preserve that information so that we can find
out what actually needs deactivating, just like we already do on
GICv3.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-16-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Split vgic_v3_populate_lr() into two, so that we have another
primitive that computes the LR from a vgic_irq, but doesn't
update anything in the shadow structure.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-15-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
As we are going to need to handle deactivation for interrupts that
are not in the LRs, split vgic_v3_fold_lr_state() into a helper
that deals with a single interrupt, and the function that loops
over the used LRs.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-14-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Not programming ICH_HCR_EL2 while no LRs are populated is a bit
of an issue, as we otherwise don't see any maintenance interrupt
when the guest interacts with the LRs.
Decouple the two and always program the control register, even when
we don't have to touch the LRs.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-13-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
EOIcount is how the virtual CPU interface signals that the guest
is deactivating interrupts outside of the LRs when EOImode==0.
We therefore need to preserve that information so that we can find
out what actually needs deactivating.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-12-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Despite LPIs not having an active state, *virtual* LPIs do have
one, which gets cleared on EOI. So far, so good.
However, this leads to a small problem: when an active LPI is not
in the LRs, that EOImode==0 and that the guest EOIs it, EOIcount
doesn't get bumped up. Which means that in these condition, the
LPI would stay active forever.
Clearly, we can't have that. So if we spot an active LPI, we drop
that state. It's pretty pointless anyway, and only serves as a way
to trip SW over.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-11-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Add a bit of documentation describing how we are dealing with LR
overflow. This is mostly a braindump of how things are expected
to work. For now anyway.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-10-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
We currently cannot identify whether an interrupt is queued into
a LR. It wasn't needed until now, but that's about to change.
Add yet another flag to track that state.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-9-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
struct vgic_irq has grown over the years, in a rather bad way.
Repack it using bitfields so that the individual flags, and move
things around a bit so that it a bit smaller.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-8-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
A long time ago, an unsuspecting architect forgot to add a trap
bit for ICV_DIR_EL1 in ICH_HCR_EL2. Which was unfortunate, but
what's a bit of spec between friends? Thankfully, this was fixed
in a later revision, and ARM "deprecates" the lack of trapping
ability.
Unfortuantely, a few (billion) CPUs went out with that defect,
anything ARMv8.0 from ARM, give or take. And on these CPUs,
you can't trap DIR on its own, full stop.
As the next best thing, we can trap everything in the common group,
which is a tad expensive, but hey ho, that's what you get. You can
otherwise recycle the HW in the neaby bin.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-7-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
As we are about to start trapping a bunch of extra things, augment
the pKVM trap description with all the registers trapped by ICH_HCR_EL2.TC,
making them legal instead of resulting in a UNDEF injection in the guest.
While we're at it, ensure that pKVM captures the vgic model so that it
can be checked by the emulation code.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-6-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
The trap bits are currently only set to manage CPU errata. However,
we are about to make use of them for purposes beyond beating broken
CPUs into submission.
For this purpose, turn these errata-driven bits into a patched-in
constant that is merged with the KVM-driven value at the point of
programming the ICH_HCR_EL2 register, rather than being directly
stored with with the shadow value..
This allows the KVM code to distinguish between a trap being handled
for the purpose of an erratum workaround, or for KVM's own need.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-5-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Future changes will require KVM to be able to perform deactivations
by writing to the physical CPU interface. Add the corresponding
VA to the kvm_info structure, and let KVM stash it.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-3-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Add support for FEAT_XNX to shadow stage-2 MMUs, being careful to only
evaluate XN[0] when the feature is actually exposed to the VM.
Restructure the layering of permissions in the fault handler to assume
pX and uX then restricting based on the guest's stage-2 afterwards.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-4-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
FEAT_XNX adds support for encoding separate execute permissions for EL0
and EL1 at stage-2. Add support for this to the page table library,
hiding the unintuitive encoding scheme behind generic pX and uX
permission flags.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-3-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Detect the feature in anticipation of using it in KVM.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-2-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Update header inclusions to follow IWYU (Include What You Use)
principle.
Note that kernel.h is discouraged to be included as it's written
at the top of that file.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
|
|
(Ab)use the static_call infrastructure to convert all:
call __WARN_trap
instances into the desired:
ud1 (%edx), %rdi
eliminating the CALL/RET, but more importantly, fixing the
fact that all WARNs will have:
RIP: 0010:__WARN_trap+0
Basically, by making it a static_call trampoline call, objtool will
collect the callsites, and then the inline rewrite will hit the
special case and replace the code with the magic instruction.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251110115758.456717741@infradead.org
|
|
Implement WARN_ONCE like WARN using BUGFLAG_ONCE.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251110115758.339309119@infradead.org
|
|
The basic idea is to have __WARN_printf() be a vararg function such
that the compiler can do the optimal calling convention for us. This
function body will be a #UD and then set up a va_list in the exception
from pt_regs.
But because the trap will be in a called function, the bug_entry must
be passed in. Have that be the first argument, with the format tucked
away inside the bug_entry.
The comments should clarify the real fun details.
The big downside is that all WARNs will now show:
RIP: 0010:__WARN_trap:+0
One possible solution is to simply discard the top frame when
unwinding. A follow up patch takes care of this slightly differently
by abusing the x86 static_call implementation.
This changes (with the next patches):
WARN_ONCE(preempt_count() != 2*PREEMPT_DISABLE_OFFSET,
"corrupted preempt_count: %s/%d/0x%x\n",
from:
cmpl $2, %ecx #, _7
jne .L1472
...
.L1472:
cmpb $0, __already_done.11(%rip)
je .L1513
...
.L1513
movb $1, __already_done.11(%rip)
movl 1424(%r14), %edx # _15->pid, _15->pid
leaq 1912(%r14), %rsi #, _17
movq $.LC43, %rdi #,
call __warn_printk #
ud2
.pushsection __bug_table,"aw"
2:
.long 1b - . # bug_entry::bug_addr
.long .LC1 - . # bug_entry::file
.word 5093 # bug_entry::line
.word 2313 # bug_entry::flags
.org 2b + 12
.popsection
.pushsection .discard.annotate_insn,"M", @progbits, 8
.long 1b - .
.long 8 # ANNOTYPE_REACHABLE
.popsection
into:
cmpl $2, %ecx #, _7
jne .L1442 #,
...
.L1442:
lea (2f)(%rip), %rdi
1:
.pushsection __bug_table,"aw"
2:
.long 1b - . # bug_entry::bug_addr
.long .LC43 - . # bug_entry::format
.long .LC1 - . # bug_entry::file
.word 5093 # bug_entry::line
.word 2323 # bug_entry::flags
.org 2b + 16
.popsection
movl 1424(%r14), %edx # _19->pid, _19->pid
leaq 1912(%r14), %rsi #, _13
ud1 (%edx), %rdi
Notably, by pushing everything into the exception handler it can take
care of the ONCE thing.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251110115758.213813530@infradead.org
|
|
Since we have an explicit format string, use it for the condition string
instead of frobbing it in the file string.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251110115758.097401406@infradead.org
|
|
Opt-in to BUG_FORMAT for x86_64, adjust the BUGTABLE helper and for
now, just store NULL pointers.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251110115757.980264454@infradead.org
|
|
In the origin logic, the bpf_arch_text_poke() assume that the old and new
instructions have the same opcode. However, they can have different opcode
if we want to replace a "call" insn with a "jmp" insn.
Therefore, add the new function parameter "old_t" along with the "new_t",
which are used to indicate the old and new poke type. Meanwhile, adjust
the implement of bpf_arch_text_poke() for all the archs.
"BPF_MOD_NOP" is added to make the code more readable. In
bpf_arch_text_poke(), we still check if the new and old address is NULL to
determine if nop insn should be used, which I think is more safe.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20251118123639.688444-6-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In the origin call case, if BPF_TRAMP_F_SKIP_FRAME is not set, it means
that the trampoline is not called, but "jmp".
Introduce the function bpf_trampoline_use_jmp() to check if the trampoline
is in "jmp" mode.
Do some adjustment on the "jmp" mode for the x86_64. The main adjustment
that we make is for the stack parameter passing case, as the stack
alignment logic changes in the "jmp" mode without the "rip". What's more,
the location of the parameters on the stack also changes.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20251118123639.688444-5-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Some places calculate the origin_call by checking if
BPF_TRAMP_F_SKIP_FRAME is set. However, it should use
BPF_TRAMP_F_ORIG_STACK for this propose. Just fix them.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20251118123639.688444-4-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Implement the DYNAMIC_FTRACE_WITH_JMP for x86_64. In ftrace_call_replace,
we will use JMP32_INSN_OPCODE instead of CALL_INSN_OPCODE if the address
should use "jmp".
Meanwhile, adjust the direct call in the ftrace_regs_caller. The RSB is
balanced in the "jmp" mode. Take the function "foo" for example:
original_caller:
call foo -> foo:
call fentry -> fentry:
[do ftrace callbacks ]
move tramp_addr to stack
RET -> tramp_addr
tramp_addr:
[..]
call foo_body -> foo_body:
[..]
RET -> back to tramp_addr
[..]
RET -> back to original_caller
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20251118123639.688444-3-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/djakov/icc into char-misc-next
Georgi writes:
interconnect changes for 6.19
This pull request contains the interconnect changes for the 6.19-rc1
merge window. The core and driver changes are listed below.
Core changes:
- kbps_to_icc() macro optimization
Driver changes:
- Switch all Qualcomm RPMh interconnect drivers to use the dynamic
node IDs and drop support for non-dynamic ID allocation
- Add new driver and BWMON support for the Kaanapali SoC
- Add QoS support for the SM6350 SoC
- Add QoS support for the SA8775p SoC
- Fix missing link from SNOC_PNOC to the USB 2 on MSM8996 SoC that
includes also a dts change that has been acked by the maintainer
- Drop the QPIC interconnect and BCM nodes for the SDX75 SoC, as these
should be handled by the rpmh-clk driver
- Other misc fixes
Signed-off-by: Georgi Djakov <djakov@kernel.org>
* tag 'icc-6.19-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/djakov/icc: (40 commits)
interconnect: qcom: sm6350: enable QoS configuration
interconnect: qcom: sm6350: Remove empty BCM arrays
interconnect: qcom: icc-rpmh: Get parent's regmap for nested NoCs
dt-bindings: interconnect: qcom,sm6350-rpmh: Add clocks for QoS
dt-bindings: interconnect: qcom-bwmon: Document Kaanapali BWMONs
interconnect: qcom: icc-rpmh: drop support for non-dynamic IDS
interconnect: qcom: sm8750: convert to dynamic IDs
interconnect: qcom: sm8650: convert to dynamic IDs
interconnect: qcom: sm8550: convert to dynamic IDs
interconnect: qcom: sm8450: convert to dynamic IDs
interconnect: qcom: sm8350: convert to dynamic IDs
interconnect: qcom: sm8150: convert to dynamic IDs
interconnect: qcom: sm7150: convert to dynamic IDs
interconnect: qcom: sm6350: convert to dynamic IDs
interconnect: qcom: sdx75: convert to dynamic IDs
interconnect: qcom: sdx65: convert to dynamic IDs
interconnect: qcom: sdx55: convert to dynamic IDs
interconnect: qcom: sdm670: convert to dynamic IDs
interconnect: qcom: sc7180: convert to dynamic IDs
interconnect: qcom: sar2130p: convert to dynamic IDs
...
|
|
The "drop print" commit removed the whole branch and not just the print.
For some ARM64 cpus, this leads to hard lockup when
CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY is not enabled.
Fixes: 62e72463ca71 ("arm64: proton-pack: Drop print when !CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This reverts commit d02c2e45b1e7767b177f6854026e4ad0d70b4a4d.
Mauro reports that this breaks APEI notifications on his QEMU setup
because the "reserved for firmware" region still needs to be writable
by Linux in order to signal _back_ to the firmware after processing
the reported error:
| {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
| ...
| [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error
| Unable to handle kernel write to read-only memory at virtual address ffff800080035018
| Mem abort info:
| ESR = 0x000000009600004f
| EC = 0x25: DABT (current EL), IL = 32 bits
| SET = 0, FnV = 0
| EA = 0, S1PTW = 0
| FSC = 0x0f: level 3 permission fault
| Data abort info:
| ISV = 0, ISS = 0x0000004f, ISS2 = 0x00000000
| CM = 0, WnR = 1, TnD = 0, TagAccess = 0
| GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
| swapper pgtable: 4k pages, 52-bit VAs, pgdp=00000000505d7000
| pgd=10000000510bc003, p4d=1000000100229403, pud=100000010022a403, pmd=100000010022b403, pte=0060000139b90483
| Internal error: Oops: 000000009600004f [#1] SMP
For now, revert the offending commit. We can presumably switch back to
PAGE_KERNEL when bringing this back in the future.
Link: https://lore.kernel.org/r/20251121224611.07efa95a@foz.lan
Reported-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Remove hard-coded strings by using the string helper functions
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/87h5uywtwp.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Stackprotector support was previously unavailable on s390 because by
default compilers generate code which is not suitable for the kernel:
the canary value is accessed via thread local storage, where the address
of thread local storage is within access registers 0 and 1.
Using those registers also for the kernel would come with a significant
performance impact and more complicated kernel entry/exit code, since
access registers contents would have to be exchanged on every kernel entry
and exit.
With the upcoming gcc 16 release new compiler options will become available
which allow to generate code suitable for the kernel. [1]
Compiler option -mstack-protector-guard=global instructs gcc to generate
stackprotector code that refers to a global stackprotector canary value via
symbol __stack_chk_guard. Access to this value is guaranteed to occur via
larl and lgrl instructions.
Furthermore, compiler option -mstack-protector-guard-record generates a
section containing all code addresses that reference the canary value.
To allow for per task canary values the instructions which load the address
of __stack_chk_guard are patched so they access a lowcore field instead: a
per task canary value is available within the task_struct of each task, and
is written to the per-cpu lowcore location on each context switch.
Also add sanity checks and debugging option to be consistent with other
kernel code patching mechanisms.
Full debugging output can be enabled with the following kernel command line
options:
debug_stackprotector
bootdebug
ignore_loglevel
earlyprintk
dyndbg="file stackprotector.c +p"
Example debug output:
stackprot: 0000021e402d4eda: c010005a9ae3 -> c01f00070240
where "<insn address>: <old insn> -> <new insn>".
[1] gcc commit 0cd1f03939d5 ("s390: Support global stack protector")
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Preinitialize the return value, and break out the for loop in
module_finalize() in case of an error to get rid of an ifdef.
This makes it easier to add additional code, which may also depend
on config options.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The KMSG_COMPONENT macro is a leftover of the s390 specific "kernel
message catalog" which never made it upstream.
Remove the macro in order to get rid of a pointless indirection. Replace
all users with the string it defines. In almost all cases this leads to a
simple replacement like this:
- #define KMSG_COMPONENT "appldata"
- #define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
+ #define pr_fmt(fmt) "appldata: " fmt
Except for some special cases this is just mechanical/scripted work.
Acked-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Since the rework of the kernel virtual address space [1] the module area
and the kernel image are within the same 4GB area. Therefore there is no
need for the weak per cpu workaround for modules anymore. Remove it.
[1] commit c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces")
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
- Drop CONFIG_SCTP_COOKIE_HMAC_SHA1=y (removed in commit
2f3dd6ec901f29ae ("sctp: Convert cookie authentication to use
HMAC-SHA256")),
- Drop CONFIG_BATMAN_ADV_NC=y (removed in commit 87b95082db32ae1c
("batman-adv: remove network coding support")),
- Enable modular build of the SHA-1 secure hash algorithm (no longer
auto-enabled since commit 2f3dd6ec901f29ae ("sctp: Convert cookie
authentication to use HMAC-SHA256")).
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://patch.msgid.link/65e00bcb7b2980278bb087986ee405627aa32d8b.1760360254.git.geert@linux-m68k.org
|
|
min_t(unsigned int, a, b) casts an 'unsigned long' to 'unsigned int'.
Use min(a, b) instead as it promotes any 'unsigned int' to 'unsigned long'
and so cannot discard significant bits.
In this case the 'unsigned long' value is small enough that the result
is ok.
Detected by an extra check added to min_t().
Signed-off-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Most implementations cache the combined result of two-stage translation,
but some, like Andes cores, use split TLBs that store VS-stage and
G-stage entries separately.
On such systems, when a VCPU migrates to another CPU, an additional
HFENCE.VVMA is required to avoid using stale VS-stage entries, which
could otherwise cause guest faults.
Introduce a static key to identify CPUs with split two-stage TLBs.
When enabled, KVM issues an extra HFENCE.VVMA on VCPU migration to
prevent stale VS-stage mappings.
Signed-off-by: Hui Min Mina Chou <minachou@andestech.com>
Signed-off-by: Ben Zong-You Xie <ben717@andestech.com>
Reviewed-by: Radim Krčmář <rkrcmar@ventanamicro.com>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Link: https://lore.kernel.org/r/20251117084555.157642-1-minachou@andestech.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
When executing HLV* instructions at the HS mode, a guest page fault
may occur when a g-stage page table migration between triggering the
virtual instruction exception and executing the HLV* instruction.
This may be a corner case, and one simpler way to handle this is to
re-execute the instruction where the virtual instruction exception
occurred, and the guest page fault will be automatically handled.
Fixes: b91f0e4cb8a3 ("RISC-V: KVM: Factor-out instruction emulation into separate sources")
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251121133543.46822-1-fangyu.yu@linux.alibaba.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
There is already support of enabling dirty log gradually in small chunks
for x86 in commit 3c9bd4006bfc ("KVM: x86: enable dirty log gradually in
small chunks") and c862626 ("KVM: arm64: Support enabling dirty log
gradually in small chunks"). This adds support for riscv.
x86 and arm64 writes protect both huge pages and normal pages now, so
riscv protect also protects both huge pages and normal pages.
On a nested virtualization setup (RISC-V KVM running inside a QEMU VM
on an [Intel® Core™ i5-12500H] host), I did some tests with a 2G Linux
VM using different backing page sizes. The time taken for
memory_global_dirty_log_start in the L2 QEMU is listed below:
Page Size Before After Optimization
4K 4490.23ms 31.94ms
2M 48.97ms 45.46ms
1G 28.40ms 30.93ms
Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Signed-off-by: Dong Yang <dayss1224@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251103062825.9084-1-dayss1224@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|