| author | Mark Rutland <mark.rutland@arm.com> | 2026-04-07 16:16:45 +0300 |
|---|---|---|
| committer | Thomas Gleixner <tglx@kernel.org> | 2026-04-08 12:43:32 +0300 |
| commit | 041aa7a85390c99b1de86dc28eddcff0890d8186 (patch) | |
| tree | 4b95a1289b2b16cc6e7267954fe3c3cb370c9a5b /include | |
| parent | c5538d0141b383808f440186fcd0bc2799af2853 (diff) | |
| download | linux-041aa7a85390c99b1de86dc28eddcff0890d8186.tar.xz | |
entry: Split preemption from irqentry_exit_to_kernel_mode()
Some architecture-specific work needs to be performed between the state
management for exception entry/exit and the "real" work to handle the
exception. For example, arm64 needs to manipulate a number of exception
masking bits, with different exceptions requiring different masking.
Generally this can all be hidden in the architecture code, but for arm64
the current structure of irqentry_exit_to_kernel_mode() makes this
particularly difficult to handle in a way that is correct, maintainable,
and efficient.
The gory details are described in the thread surrounding:
https://lore.kernel.org/lkml/acPAzdtjK5w-rNqC@J2N7QTR9R3/
The summary is:
* Currently, irqentry_exit_to_kernel_mode() handles both involuntary
preemption AND state management necessary for exception return.
* When scheduling (including involuntary preemption), arm64 needs to
have all arm64-specific exceptions unmasked, though regular interrupts
must be masked.
* Prior to the state management for exception return, arm64 needs to
mask a number of arm64-specific exceptions, and perform some work with
these exceptions masked (with RCU watching, etc).
While in theory it is possible to handle this with a new arch_*() hook
called somewhere under irqentry_exit_to_kernel_mode(), this is fragile
and complicated, and doesn't match the flow used for exception return to
user mode, which has a separate 'prepare' step (where preemption can
occur) prior to the state management.
To solve this, refactor irqentry_exit_to_kernel_mode() to match the
style of {irqentry,syscall}_exit_to_user_mode(), moving preemption logic
into a new irqentry_exit_to_kernel_mode_preempt() function, and moving
state management into a new irqentry_exit_to_kernel_mode_after_preempt()
function. The existing irqentry_exit_to_kernel_mode() is left as a
caller of both of these, avoiding the need to modify existing callers.
There should be no functional change as a result of this change.
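As an illustration only, the intended call ordering for an architecture that uses the split helpers can be sketched as a userspace model. Everything here is a simplified stand-in: the struct pt_regs and irqentry_state_t definitions, the bodies of the two helpers, and the arch_mask_exceptions() and arch_exit_to_kernel_mode() names are hypothetical and are not taken from the arm64 code or from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
struct pt_regs { bool irqs_disabled; };
typedef struct { bool exit_rcu; } irqentry_state_t;

/* Records the order in which the steps run. */
static char trace[128];

static void record(const char *step)
{
	assert(strlen(trace) + strlen(step) + 2 <= sizeof(trace));
	strcat(trace, step);
	strcat(trace, ";");
}

/* Model of the preemption step: skipped if the interrupted context had
 * interrupts disabled or RCU was not watching on entry. */
static void irqentry_exit_to_kernel_mode_preempt(struct pt_regs *regs,
						 irqentry_state_t state)
{
	if (regs->irqs_disabled || state.exit_rcu)
		return;
	record("preempt");
}

/* Model of the state management step (tracing, lockdep, RCU). */
static void irqentry_exit_to_kernel_mode_after_preempt(struct pt_regs *regs,
							irqentry_state_t state)
{
	(void)regs;
	(void)state;
	record("state");
}

/* Hypothetical arch step: mask arch-specific exceptions before the
 * state management runs. */
static void arch_mask_exceptions(void)
{
	record("mask");
}

/* Hypothetical arch exit path built from the two split helpers. */
static void arch_exit_to_kernel_mode(struct pt_regs *regs,
				     irqentry_state_t state)
{
	irqentry_exit_to_kernel_mode_preempt(regs, state);
	arch_mask_exceptions();
	irqentry_exit_to_kernel_mode_after_preempt(regs, state);
}
```

In this model the arch-specific masking sits between the preemption step and the state management step, which is the window the split is meant to open up. Callers that need no such window can keep calling irqentry_exit_to_kernel_mode() unchanged.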
[ tglx: Updated kernel doc ]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260407131650.3813777-6-mark.rutland@arm.com
Diffstat (limited to 'include')
| -rw-r--r-- | include/linux/irq-entry-common.h | 73 |
1 files changed, 59 insertions, 14 deletions
diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
index 66bc168bc6b5..384520217bfe 100644
--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -438,24 +438,46 @@ static __always_inline irqentry_state_t irqentry_enter_from_kernel_mode(struct p
 }
 
 /**
- * irqentry_exit_to_kernel_mode - Run preempt checks and establish state after
- *				  invoking the interrupt handler
+ * irqentry_exit_to_kernel_mode_preempt - Run preempt checks on return to kernel mode
  * @regs:	Pointer to current's pt_regs
  * @state:	Return value from matching call to irqentry_enter_from_kernel_mode()
  *
- * This is the counterpart of irqentry_enter_from_kernel_mode() and runs the
- * necessary preemption check if possible and required. It returns to the caller
- * with interrupts disabled and the correct state vs. tracing, lockdep and RCU
- * required to return to the interrupted context.
+ * This is to be invoked before irqentry_exit_to_kernel_mode_after_preempt() to
+ * allow kernel preemption on return from interrupt.
+ *
+ * Must be invoked with interrupts disabled and CPU state which allows kernel
+ * preemption.
  *
- * It is the last action before returning to the low level ASM code which just
- * needs to return.
+ * After returning from this function, the caller can modify CPU state before
+ * invoking irqentry_exit_to_kernel_mode_after_preempt(), which is required to
+ * re-establish the tracing, lockdep and RCU state for returning to the
+ * interrupted context.
  */
-static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs,
-							 irqentry_state_t state)
+static inline void irqentry_exit_to_kernel_mode_preempt(struct pt_regs *regs,
+							irqentry_state_t state)
 {
-	lockdep_assert_irqs_disabled();
+	if (regs_irqs_disabled(regs) || state.exit_rcu)
+		return;
+
+	if (IS_ENABLED(CONFIG_PREEMPTION))
+		irqentry_exit_cond_resched();
+}
 
+/**
+ * irqentry_exit_to_kernel_mode_after_preempt - Establish trace, lockdep and RCU state
+ * @regs:	Pointer to current's pt_regs
+ * @state:	Return value from matching call to irqentry_enter_from_kernel_mode()
+ *
+ * This is to be invoked after irqentry_exit_to_kernel_mode_preempt() and before
+ * actually returning to the interrupted context.
+ *
+ * There are no requirements for the CPU state other than being able to complete
+ * the tracing, lockdep and RCU state transitions. After this function returns
+ * the caller must return directly to the interrupted context.
+ */
+static __always_inline void
+irqentry_exit_to_kernel_mode_after_preempt(struct pt_regs *regs, irqentry_state_t state)
+{
 	if (!regs_irqs_disabled(regs)) {
 		/*
 		 * If RCU was not watching on entry this needs to be done
@@ -474,9 +496,6 @@ static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs,
 		}
 
 		instrumentation_begin();
-		if (IS_ENABLED(CONFIG_PREEMPTION))
-			irqentry_exit_cond_resched();
-
 		/* Covers both tracing and lockdep */
 		trace_hardirqs_on();
 		instrumentation_end();
@@ -491,6 +510,32 @@ static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs,
 }
 
 /**
+ * irqentry_exit_to_kernel_mode - Run preempt checks and establish state after
+ *				  invoking the interrupt handler
+ * @regs:	Pointer to current's pt_regs
+ * @state:	Return value from matching call to irqentry_enter_from_kernel_mode()
+ *
+ * This is the counterpart of irqentry_enter_from_kernel_mode() and combines
+ * the calls to irqentry_exit_to_kernel_mode_preempt() and
+ * irqentry_exit_to_kernel_mode_after_preempt().
+ *
+ * The requirement for the CPU state is that it can schedule. After the function
+ * returns the tracing, lockdep and RCU state transitions are completed and the
+ * caller must return directly to the interrupted context.
+ */
+static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs,
+							 irqentry_state_t state)
+{
+	lockdep_assert_irqs_disabled();
+
+	instrumentation_begin();
+	irqentry_exit_to_kernel_mode_preempt(regs, state);
+	instrumentation_end();
+
+	irqentry_exit_to_kernel_mode_after_preempt(regs, state);
+}
+
+/**
  * irqentry_enter - Handle state tracking on ordinary interrupt entries
  * @regs:	Pointer to pt_regs of interrupted context
  *
