summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-01-13cpuidle, tdx: Make TDX code noinstr cleanPeter Zijlstra3-2/+6
objtool found a few cases where this code called out into instrumented code: vmlinux.o: warning: objtool: __halt+0x2c: call to hcall_func.constprop.0() leaves .noinstr.text section vmlinux.o: warning: objtool: __halt+0x3f: call to __tdx_hypercall() leaves .noinstr.text section vmlinux.o: warning: objtool: __tdx_hypercall+0x66: call to __tdx_hypercall_failed() leaves .noinstr.text section Fix it by: - moving TDX tdcall assembly methods into .noinstr.text (they are already noistr-clean) - marking __tdx_hypercall_failed() as 'noinstr' - annotating hcall_func() as __always_inline Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195541.111485720@infradead.org
2023-01-13cpuidle, mwait: Make the mwait code noinstr cleanPeter Zijlstra2-7/+7
objtool found a few cases where this code called out into instrumented code: vmlinux.o: warning: objtool: intel_idle_s2idle+0x6e: call to __monitor.constprop.0() leaves .noinstr.text section vmlinux.o: warning: objtool: intel_idle_irq+0x8c: call to __monitor.constprop.0() leaves .noinstr.text section vmlinux.o: warning: objtool: intel_idle+0x73: call to __monitor.constprop.0() leaves .noinstr.text section vmlinux.o: warning: objtool: mwait_idle+0x88: call to clflush() leaves .noinstr.text section Fix it by marking the affected methods as __always_inline. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195541.050542952@infradead.org
2023-01-13cpuidle, sched: Remove instrumentation from TIF_{POLLING_NRFLAG,NEED_RESCHED}Peter Zijlstra2-11/+47
objtool pointed out that various idle-TIF management methods have instrumentation: vmlinux.o: warning: objtool: mwait_idle+0x5: call to current_set_polling_and_test() leaves .noinstr.text section vmlinux.o: warning: objtool: acpi_processor_ffh_cstate_enter+0xc5: call to current_set_polling_and_test() leaves .noinstr.text section vmlinux.o: warning: objtool: cpu_idle_poll.isra.0+0x73: call to test_ti_thread_flag() leaves .noinstr.text section vmlinux.o: warning: objtool: intel_idle+0xbc: call to current_set_polling_and_test() leaves .noinstr.text section vmlinux.o: warning: objtool: intel_idle_irq+0xea: call to current_set_polling_and_test() leaves .noinstr.text section vmlinux.o: warning: objtool: intel_idle_s2idle+0xb4: call to current_set_polling_and_test() leaves .noinstr.text section vmlinux.o: warning: objtool: intel_idle+0xa6: call to current_clr_polling() leaves .noinstr.text section vmlinux.o: warning: objtool: intel_idle_irq+0xbf: call to current_clr_polling() leaves .noinstr.text section vmlinux.o: warning: objtool: intel_idle_s2idle+0xa1: call to current_clr_polling() leaves .noinstr.text section vmlinux.o: warning: objtool: mwait_idle+0xe: call to __current_set_polling() leaves .noinstr.text section vmlinux.o: warning: objtool: acpi_processor_ffh_cstate_enter+0xc5: call to __current_set_polling() leaves .noinstr.text section vmlinux.o: warning: objtool: cpu_idle_poll.isra.0+0x73: call to test_ti_thread_flag() leaves .noinstr.text section vmlinux.o: warning: objtool: intel_idle+0xbc: call to __current_set_polling() leaves .noinstr.text section vmlinux.o: warning: objtool: intel_idle_irq+0xea: call to __current_set_polling() leaves .noinstr.text section vmlinux.o: warning: objtool: intel_idle_s2idle+0xb4: call to __current_set_polling() leaves .noinstr.text section vmlinux.o: warning: objtool: cpu_idle_poll.isra.0+0x73: call to test_ti_thread_flag() leaves .noinstr.text section vmlinux.o: warning: objtool: intel_idle_s2idle+0x73: call to test_ti_thread_flag.constprop.0() leaves .noinstr.text section vmlinux.o: warning: objtool: intel_idle_irq+0x91: call to test_ti_thread_flag.constprop.0() leaves .noinstr.text section vmlinux.o: warning: objtool: intel_idle+0x78: call to test_ti_thread_flag.constprop.0() leaves .noinstr.text section vmlinux.o: warning: objtool: acpi_safe_halt+0xf: call to test_ti_thread_flag.constprop.0() leaves .noinstr.text section Remove the instrumentation, because these methods are used in low-level cpuidle code moving between states, that should not be instrumented. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195540.988741683@infradead.org
2023-01-13time/tick-broadcast: Remove RCU_NONIDLE() usagePeter Zijlstra1-17/+12
No callers left that have already disabled RCU. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195540.927904612@infradead.org
2023-01-13printk: Remove trace_.*_rcuidle() usagePeter Zijlstra1-1/+1
The problem, per commit fc98c3c8c9dc ("printk: use rcuidle console tracepoint"), was printk usage from the cpuidle path where RCU was already disabled. Per the patches earlier in this series, this is no longer the case. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Petr Mladek <pmladek@suse.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195540.865735001@infradead.org
2023-01-13arm64, smp: Remove trace_.*_rcuidle() usagePeter Zijlstra1-2/+2
Ever since commit d3afc7f12987 ("arm64: Allow IPIs to be handled as normal interrupts") this function is called in regular IRQ context. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195540.804410487@infradead.org
2023-01-13arm, smp: Remove trace_.*_rcuidle() usagePeter Zijlstra1-3/+3
None of these functions should ever be ran with RCU disabled anymore. Specifically, do_handle_IPI() is only called from handle_IPI() which explicitly does irq_enter()/irq_exit() which ensures RCU is watching. The problem with smp_cross_call() was, per commit description: 7c64cc0531fa ("arm: Use _rcuidle for smp_cross_call() tracepoints") ... that cpuidle_enter_state_coupled() already had RCU disabled, but that's long been fixed by commit: 1098582a0f6c ("sched,idle,rcu: Push rcu_idle deeper into the idle path") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195540.743432118@infradead.org
2023-01-13x86/tdx: Remove TDX_HCALL_ISSUE_STIPeter Zijlstra3-33/+4
Now that arch_cpu_idle() is expected to return with IRQs disabled, avoid the useless STI/CLI dance. Per the specs this is supposed to work, but nobody has yet relied up this behaviour so broken implementations are possible. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195540.682137572@infradead.org
2023-01-13arch/idle: Change arch_cpu_idle() behavior: always exit with IRQs disabledPeter Zijlstra27-37/+29
Current arch_cpu_idle() is called with IRQs disabled, but will return with IRQs enabled. However, the very first thing the generic code does after calling arch_cpu_idle() is raw_local_irq_disable(). This means that architectures that can idle with IRQs disabled end up doing a pointless 'enable-disable' dance. Therefore, push this IRQ disabling into the idle function, meaning that those architectures can avoid the pointless IRQ state flipping. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Acked-by: Mark Rutland <mark.rutland@arm.com> [arm64] Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195540.618076436@infradead.org
2023-01-13cpuidle, intel_idle: Fix CPUIDLE_FLAG_IBRSPeter Zijlstra2-3/+3
objtool to the rescue: vmlinux.o: warning: objtool: intel_idle_ibrs+0x17: call to spec_ctrl_current() leaves .noinstr.text section vmlinux.o: warning: objtool: intel_idle_ibrs+0x27: call to wrmsrl.constprop.0() leaves .noinstr.text section Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195540.556912863@infradead.org
2023-01-13cpuidle, intel_idle: Fix CPUIDLE_FLAG_INIT_XSTATEPeter Zijlstra3-5/+5
Fix instrumentation bugs objtool found: vmlinux.o: warning: objtool: intel_idle_s2idle+0xd5: call to fpu_idle_fpregs() leaves .noinstr.text section vmlinux.o: warning: objtool: intel_idle_xstate+0x11: call to fpu_idle_fpregs() leaves .noinstr.text section vmlinux.o: warning: objtool: fpu_idle_fpregs+0x9: call to xfeatures_in_use() leaves .noinstr.text section Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195540.494977795@infradead.org
2023-01-13cpuidle, intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE *again*Peter Zijlstra1-7/+1
So objtool found this bug: vmlinux.o: warning: objtool: intel_idle_irq+0x10c: call to trace_hardirqs_off() leaves .noinstr.text section As per commit 32d4fd5751ea ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE"): "must not have tracing in idle functions" Clearly people can't read and tinker along until splat dissapears. This straight up reverts commit d295ad34f236 ("intel_idle: Fix false positive RCU splats due to incorrect hardirqs state"). It doesn't re-introduce the problem because preceding patches fixed it properly. Fixes: d295ad34f236 ("intel_idle: Fix false positive RCU splats due to incorrect hardirqs state") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195540.434302128@infradead.org
2023-01-13objtool/idle: Validate __cpuidle code as noinstrPeter Zijlstra32-45/+27
Idle code is very like entry code in that RCU isn't available. As such, add a little validation. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195540.373461409@infradead.org
2023-01-13cpuidle: Annotate poll_idle()Peter Zijlstra1-1/+5
The __cpuidle functions will become a noinstr class, as such they need explicit annotations. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195540.312601331@infradead.org
2023-01-13acpi_idle: Remove tracingPeter Zijlstra1-8/+8
All the idle routines are called with RCU disabled, as such there must not be any tracing inside. While there; clean-up the io-port idle thing. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195540.251666856@infradead.org
2023-01-13cpuidle, cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}()Peter Zijlstra1-9/+0
All callers should still have RCU enabled. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195540.190860672@infradead.org
2023-01-13cpuidle: Fix ct_idle_*() usagePeter Zijlstra15-66/+86
The whole disable-RCU, enable-IRQS dance is very intricate since changing IRQ state is traced, which depends on RCU. Add two helpers for the cpuidle case that mirror the entry code: ct_cpuidle_enter() ct_cpuidle_exit() And fix all the cases where the enter/exit dance was buggy. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195540.130014793@infradead.org
2023-01-13cpuidle, dt: Push RCU-idle into driverPeter Zijlstra4-3/+11
Doing RCU-idle outside the driver, only to then temporarily enable it again before going idle is suboptimal. Notably: this converts all dt_init_idle_driver() and __CPU_PM_CPU_IDLE_ENTER() users for they are inextrably intertwined. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195540.068981667@infradead.org
2023-01-13cpuidle, OMAP4: Push RCU-idle into driverPeter Zijlstra1-11/+18
Doing RCU-idle outside the driver, only to then temporarily enable it again, some *four* times, before going idle is suboptimal. Notably three times explicitly using RCU_NONIDLE() and once implicitly through cpu_pm_*(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Tony Lindgren <tony@atomide.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20230112195540.007918454@infradead.org
2023-01-13cpuidle, armada: Push RCU-idle into driverPeter Zijlstra1-0/+7
Doing RCU-idle outside the driver, only to then temporarily enable it again before going idle is suboptimal. Notably the cpu_pm_*() calls implicitly re-enable RCU for a bit. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20230112195539.946630819@infradead.org
2023-01-13cpuidle, OMAP3: Push RCU-idle into driverPeter Zijlstra1-0/+16
Doing RCU-idle outside the driver, only to then teporarily enable it again before going idle is suboptimal. Notably the cpu_pm_*() calls implicitly re-enable RCU for a bit. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Tony Lindgren <tony@atomide.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20230112195539.883561913@infradead.org
2023-01-13cpuidle, ARM/imx6: Push RCU-idle into driverPeter Zijlstra1-1/+4
Doing RCU-idle outside the driver, only to then temporarily enable it again, at least twice, before going idle is suboptimal. Notably both cpu_pm_enter() and cpu_cluster_pm_enter() implicity re-enable RCU. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20230112195539.821714572@infradead.org
2023-01-13cpuidle, psci: Push RCU-idle into driverPeter Zijlstra1-4/+5
Doing RCU-idle outside the driver, only to then temporarily enable it again, at least twice, before going idle is suboptimal. Notably once implicitly through the cpu_pm_*() calls and once explicitly doing ct_irq_*_irqon(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Guo Ren <guoren@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20230112195539.760296658@infradead.org
2023-01-13cpuidle, tegra: Push RCU-idle into driverPeter Zijlstra1-5/+16
Doing RCU-idle outside the driver, only to then temporarily enable it again, at least twice, before going idle is suboptimal. Notably once implicitly through the cpu_pm_*() calls and once explicitly doing RCU_NONIDLE(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20230112195539.699546331@infradead.org
2023-01-13cpuidle, riscv: Push RCU-idle into driverPeter Zijlstra1-4/+5
Doing RCU-idle outside the driver, only to then temporarily enable it again, at least twice, before going idle is suboptimal. That is, once implicitly through the cpu_pm_*() calls and once explicitly doing ct_irq_*_irqon(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20230112195539.637185846@infradead.org
2023-01-13cpuidle: Move IRQ state validationPeter Zijlstra1-5/+5
Make cpuidle_enter_state() consistent with the s2idle variant and verify ->enter() always returns with interrupts disabled. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195539.576412812@infradead.org
2023-01-13cpuidle/poll: Ensure IRQs stay disabled after cpuidle_state::enter() callsPeter Zijlstra1-1/+3
Make cpuidle_state::enter() methods IRQ state invariant on exit. Additionally make sure to use raw_local_irq_*() methods since this cpuidle callback will be called with RCU already disabled. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195539.515253662@infradead.org
2023-01-13x86/idle: Replace 'x86_idle' function pointer with a static_callPeter Zijlstra1-22/+28
Typical boot time setup; no need to suffer an indirect call for that. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20230112195539.453613251@infradead.org
2023-01-13x86/perf/amd: Remove tracing from perf_lopwr_cb()Peter Zijlstra2-9/+6
The perf_lopwr_cb() function is called from the idle routines; there is no RCU there, we must not enter tracing. Use __always_inline, noidle annotations and existing no-trace methods. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195539.392862891@infradead.org
2023-01-12rseq: Increase AT_VECTOR_SIZE_BASE to match rseq auxvec entriesMathieu Desnoyers1-1/+1
Two new auxiliary vector entries are introduced for rseq without matching increment of the AT_VECTOR_SIZE_BASE, which causes failures with CONFIG_HARDENED_USERCOPY=y. Fixes: 317c8194e6ae ("rseq: Introduce feature size and alignment ELF auxiliary vector entries") Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20230104192054.34046-1-mathieu.desnoyers@efficios.com
2023-01-12selftests/rseq: Revert "selftests/rseq: Add mm_numa_cid to test script"Mathieu Desnoyers1-5/+0
The mm_numa_cid related rseq patches from the series were not picked up into the tip tree, so enabling the mm_numa_cid test needs to be reverted. This reverts commit b344b8f2d88dbf095caf97ac57fd3645843fa70f. Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/oe-lkp/202301040903.2dd1e25b-oliver.sang@intel.com
2023-01-11sched/cputime: Fix IA64 build error of missing arch_vtime_task_switch() ↵Ingo Molnar1-0/+1
prototype The following commit: c89970202a11 ("cputime: remove cputime_to_nsecs fallback") Removed an <asm/cputime.h> inclusion from <linux/sched/cputime.h>, but this broke the IA64 build: arch/ia64/kernel/time.c:110:6: warning: no previous prototype for 'arch_vtime_task_switch' [-Wmissing-prototypes] Add in the missing <asm/cputime.h> header to fix it. Fixes: c89970202a11 ("cputime: remove cputime_to_nsecs fallback") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-01-07selftests/membarrier: Test MEMBARRIER_CMD_GET_REGISTRATIONSMichal Clapinski3-2/+39
Keep track of previously issued registrations and compare the result with MEMBARRIER_CMD_GET_REGISTRATIONS return value. Signed-off-by: Michal Clapinski <mclapinski@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20221207164338.1535591-3-mclapinski@google.com
2023-01-07sched/membarrier: Introduce MEMBARRIER_CMD_GET_REGISTRATIONSMichal Clapinski2-1/+42
Provide a method to query previously issued registrations. Signed-off-by: Michal Clapinski <mclapinski@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20221207164338.1535591-2-mclapinski@google.com
2023-01-07cpufreq, sched/util: Optimize operations with single CPU capacity lookupLukasz Luba1-20/+23
The max CPU capacity is the same for all CPUs sharing frequency domain. There is a way to avoid heavy operations in a loop for each CPU by leveraging this knowledge. Thus, simplify the looping code in the sugov_next_freq_shared() and drop heavy multiplications. Instead, use simple max() to get the highest utilization from these CPUs. This is useful for platforms with many (4 or 6) little CPUs. We avoid heavy 2*PD_CPU_NUM multiplications in that loop, which is called billions of times, since it's not limited by the schedutil time delta filter in sugov_should_update_freq(). When there was no need to change frequency the code bailed out, not updating the sg_policy::last_freq_update_time. Then every visit after delta_ns time longer than the sg_policy::freq_update_delay_ns goes through and triggers the next frequency calculation code. Although, if the next frequency, as outcome of that, would be the same as current frequency, we won't update the sg_policy::last_freq_update_time and the story will be repeated (in a very short period, sometimes a few microseconds). The max CPU capacity must be fetched every time we are called, due to difficulties during the policy setup, where we are not able to get the normalized CPU capacity at the right time. The fetched CPU capacity value is than used in sugov_iowait_apply() to calculate the right boost. This required a few changes in the local functions and arguments. The capacity value should hopefully be fetched once when needed and then passed over CPU registers to those functions. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20221208160256.859-2-lukasz.luba@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Viresh Kumar <viresh.kumar@linaro.org>
2023-01-07sched/core: Reorganize ttwu_do_wakeup() and ttwu_do_activate()Chengming Zhou1-33/+31
ttwu_do_activate() is used for a complete wakeup, in which we will activate_task() and use ttwu_do_wakeup() to mark the task runnable and perform wakeup-preemption, also call class->task_woken() callback and update the rq->idle_stamp. Since ttwu_runnable() is not a complete wakeup, don't need all those done in ttwu_do_wakeup(), so we can move those to ttwu_do_activate() to simplify ttwu_do_wakeup(), making it only mark the task runnable to be reused in ttwu_runnable() and try_to_wake_up(). This patch should not have any functional changes. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20221223103257.4962-2-zhouchengming@bytedance.com
2023-01-07sched/core: Micro-optimize ttwu_runnable()Chengming Zhou1-3/+10
ttwu_runnable() is used as a fast wakeup path when the wakee task is running on CPU or runnable on RQ, in both cases we can just set its state to TASK_RUNNING to prevent a sleep. If the wakee task is on_cpu running, we don't need to update_rq_clock() or check_preempt_curr(). But if the wakee task is on_rq && !on_cpu (e.g. an IRQ hit before the task got to schedule() and the task been preempted), we should check_preempt_curr() to see if it can preempt the current running. This also removes the class->task_woken() callback from ttwu_runnable(), which wasn't required per the RT/DL implementations: any required push operation would have been queued during class->set_next_task() when p got preempted. ttwu_runnable() also loses the update to rq->idle_stamp, as by definition the rq cannot be idle in this scenario. Suggested-by: Valentin Schneider <vschneid@redhat.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20221223103257.4962-1-zhouchengming@bytedance.com
2023-01-05sched/documentation: Document the util clamp featureQais Yousef3-0/+745
Add a document explaining the util clamp feature: what it is and how to use it. The new document hopefully covers everything one needs to know about uclamp. Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://lore.kernel.org/r/20221216235716.201923-1-qyousef@layalina.io Cc: Jonathan Corbet <corbet@lwn.net>
2023-01-05sched/topology: Add __init for sched_init_domains()Bing Huang1-1/+1
sched_init_domains() is only used in initialization Signed-off-by: Bing Huang <huangbing@kylinos.cn> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230105014943.9857-1-huangbing775@126.com
2023-01-02sched/rseq: Fix concurrency ID handling of usermodehelper kthreadsMathieu Desnoyers1-3/+3
sched_mm_cid_after_execve() does not expect NULL t->mm, but it may happen if a usermodehelper kthread fails when attempting to execute a binary. sched_mm_cid_fork() can be issued from a usermodehelper kthread, which has t->flags PF_KTHREAD set. Fixes: af7f588d8f73 ("sched: Introduce per-memory-map concurrency ID") Reported-by: kernel test robot <yujie.liu@intel.com> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/oe-lkp/202212301353.5c959d72-yujie.liu@intel.com
2022-12-27cputime: remove cputime_to_nsecs fallbackNicholas Piggin4-11/+6
The archs that use cputime_to_nsecs() internally provide their own definition and don't need the fallback. cputime_to_usecs() unused except in this fallback, and is not defined anywhere. This removes the final remnant of the cputime_t code from the kernel. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/r/20221220070705.2958959-1-npiggin@gmail.com
2022-12-27sched/core: Adjusting the order of scanning CPUHao Jia2-3/+3
When select_idle_capacity() starts scanning for an idle CPU, it starts with target CPU that has already been checked in select_idle_sibling(). So we start checking from the next CPU and try the target CPU at the end. Similarly for task_numa_assign(), we have just checked numa_migrate_on of dst_cpu, so start from the next CPU. This also works for steal_cookie_task(), the first scan must fail and start directly from the next one. Signed-off-by: Hao Jia <jiahao.os@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Acked-by: Mel Gorman <mgorman@techsingularity.net> Link: https://lore.kernel.org/r/20221216062406.7812-3-jiahao.os@bytedance.com
2022-12-27sched/numa: Stop an exhastive search if an idle core is foundHao Jia1-1/+1
In update_numa_stats() we try to find an idle cpu on the NUMA node, preferably an idle core. we can stop looking for the next idle core or idle cpu after finding an idle core. But we can't stop the whole loop of scanning the CPU, because we need to calculate approximate NUMA stats at a point in time. For example, the src and dst nr_running is needed by task_numa_find_cpu(). Signed-off-by: Hao Jia <jiahao.os@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mel Gorman <mgorman@techsingularity.net> Link: https://lore.kernel.org/r/20221216062406.7812-2-jiahao.os@bytedance.com
2022-12-27sched: Make const-safeMatthew Wilcox (Oracle)3-22/+24
With a modified container_of() that preserves constness, the compiler finds some pointers which should have been marked as const. task_of() also needs to become const-preserving for the !FAIR_GROUP_SCHED case so that cfs_rq_of() can take a const argument. No change to generated code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221212144946.2657785-1-willy@infradead.org
2022-12-27selftests/rseq: Add mm_numa_cid to test scriptMathieu Desnoyers1-0/+5
Add mm_numa_cid tests to the run_param_test.sh test script. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221216145332.205095-1-mathieu.desnoyers@efficios.com
2022-12-27tracing/rseq: Add mm_cid field to rseq_updateMathieu Desnoyers1-1/+4
Add the mm_cid field to the rseq_update event, allowing tracers to follow which mm_cid is observed by user-space, and whether negative mm_cid values are visible in case of internal scheduler implementation issues. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221122203932.231377-22-mathieu.desnoyers@efficios.com
2022-12-27selftests/rseq: parametrized test: Report/abort on negative concurrency IDMathieu Desnoyers1-0/+5
Report and abort when a negative concurrency ID value is observed by the spinlock test. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221122203932.231377-21-mathieu.desnoyers@efficios.com
2022-12-27selftests/rseq: Implement parametrized mm_cid testMathieu Desnoyers4-49/+122
Adapt to the rseq.h API changes introduced by commits "selftests/rseq: <arch>: Template memory ordering and percpu access mode". Build a new param_test_mm_cid, param_test_mm_cid_benchmark, and param_test_mm_cid_compare_twice executables to test the new "mm_cid" rseq field. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221122203932.231377-20-mathieu.desnoyers@efficios.com
2022-12-27selftests/rseq: Implement basic percpu ops mm_cid testMathieu Desnoyers3-8/+44
Adapt to the rseq.h API changes introduced by commits "selftests/rseq: <arch>: Template memory ordering and percpu access mode". Build a new basic_percpu_ops_mm_cid_test to test the new "mm_cid" rseq field. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221122203932.231377-19-mathieu.desnoyers@efficios.com
2022-12-27selftests/rseq: riscv: Template memory ordering and percpu access modeMathieu Desnoyers2-500/+437
Introduce a rseq-riscv-bits.h template header which is internally included to generate the static inline functions covering: - relaxed and release memory ordering, - per-cpu-id and per-mm-cid per-cpu data access. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221122203932.231377-18-mathieu.desnoyers@efficios.com