kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
11 days	cpuidle: governors: teo: Drop misguided target residency check	Rafael J. Wysocki	1	-5/+2
	commit a03b2011808ab02ccb7ab6b573b013b77fbb5921 upstream. When the target residency of the current candidate idle state is greater than the expected time till the closest timer (the sleep length), it does not matter whether or not the tick has already been stopped or if it is going to be stopped. The closest timer will trigger anyway at its due time, so if an idle state with target residency above the sleep length is selected, energy will be wasted and there may be excess latency. Of course, if the closest timer were canceled before it could trigger, a deeper idle state would be more suitable, but this is not expected to happen (generally speaking, hrtimers are not expected to be canceled as a rule). Accordingly, the teo_state_ok() check done in that case causes energy to be wasted more often than it allows any energy to be saved (if it allows any energy to be saved at all), so drop it and let the governor use the teo_find_shallower_state() return value as the new candidate idle state index. Fixes: 21d28cd2fa5f ("cpuidle: teo: Do not call tick_nohz_get_sleep_length() upfront") Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/5955081.DvuYhMxLoT@rafael.j.wysocki Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 days	cpuidle: menu: Use residency threshold in polling state override decisions	Aboorva Devarajan	1	-4/+5
	[ Upstream commit 07d815701274d156ad8c7c088a52e01642156fb8 ] On virtualized PowerPC (pseries) systems, where only one polling state (Snooze) and one deep state (CEDE) are available, selecting CEDE when the predicted idle duration is less than the target residency of CEDE state can hurt performance. In such cases, the entry/exit overhead of CEDE outweighs the power savings, leading to unnecessary state transitions and higher latency. Menu governor currently contains a special-case rule that prioritizes the first non-polling state over polling, even when its target residency is much longer than the predicted idle duration. On PowerPC/pseries, where the gap between the polling state (Snooze) and the first non-polling state (CEDE) is large, this behavior causes performance regressions. Refine that special case by adding an extra requirement: the first non-polling state can only be chosen if its target residency is below the defined RESIDENCY_THRESHOLD_NS. If this condition is not satisfied, polling is allowed instead, avoiding suboptimal non-polling state entries. This change is limited to the single special-case rule for the first non-polling state. The general non-polling state selection logic in the menu governor remains unchanged. Performance improvement observed with pgbench on PowerPC (pseries) system: +---------------------------+------------+------------+------------+ \| Metric \| Baseline \| Patched \| Change (%) \| +---------------------------+------------+------------+------------+ \| Transactions/sec (TPS) \| 495,210 \| 536,982 \| +8.45% \| \| Avg latency (ms) \| 0.163 \| 0.150 \| -7.98% \| +---------------------------+------------+------------+------------+ CPUIdle state usage: +--------------+--------------+-------------+ \| Metric \| Baseline \| Patched \| +--------------+--------------+-------------+ \| Total usage \| 12,735,820 \| 13,918,442 \| \| Above usage \| 11,401,520 \| 1,598,210 \| \| Below usage \| 20,145 \| 702,395 \| +--------------+--------------+-------------+ Above/Total and Below/Total usage percentages: +------------------------+-----------+---------+ \| Metric \| Baseline \| Patched \| +------------------------+-----------+---------+ \| Above % (Above/Total) \| 89.56% \| 11.49% \| \| Below % (Below/Total) \| 0.16% \| 5.05% \| \| Total cpuidle miss (%) \| 89.72% \| 16.54% \| +------------------------+-----------+---------+ The results indicate that restricting CEDE selection to cases where its residency matches the predicted idle time reduces mispredictions, lowers unnecessary state transitions, and improves overall throughput. Reviewed-by: Christian Loehle <christian.loehle@arm.com> Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com> [ rjw: Changelog edits, rebase ] Link: https://patch.msgid.link/20251006013954.17972-1-aboorvad@linux.ibm.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-11-24	cpuidle: Fail cpuidle device registration if there is one already	Rafael J. Wysocki	1	-1/+7
	[ Upstream commit 7b1b7961170e4fcad488755e5ffaaaf9bd527e8f ] Refuse to register a cpuidle device if the given CPU has a cpuidle device already and print a message regarding it. Without this, an attempt to register a new cpuidle device without unregistering the existing one leads to the removal of the existing cpuidle device without removing its sysfs interface. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-11-24	cpuidle: governors: menu: Select polling state in some more cases	Rafael J. Wysocki	1	-2/+5
	[ Upstream commit db86f55bf81a3a297be05ee8775ae9a8c6e3a599 ] A throughput regression of 11% introduced by commit 779b1a1cb13a ("cpuidle: governors: menu: Avoid selecting states with too much latency") has been reported and it is related to the case when the menu governor checks if selecting a proper idle state instead of a polling one makes sense. In particular, it is questionable to do so if the exit latency of the idle state in question exceeds the predicted idle duration, so add a check for that, which is sufficient to make the reported regression go away, and update the related code comment accordingly. Fixes: 779b1a1cb13a ("cpuidle: governors: menu: Avoid selecting states with too much latency") Closes: https://lore.kernel.org/linux-pm/004501dc43c9$ec8aa930$c59ffb90$@telus.net/ Reported-by: Doug Smythies <dsmythies@telus.net> Tested-by: Doug Smythies <dsmythies@telus.net> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/12786727.O9o76ZdvQC@rafael.j.wysocki Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-11-24	cpuidle: governors: menu: Rearrange main loop in menu_select()	Rafael J. Wysocki	1	-34/+36
	[ Upstream commit 17224c1d2574d29668c4879e1fbf36d6f68cd22b ] Reduce the indentation level in the main loop of menu_select() by rearranging some checks and assignments in it. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/2389215.ElGaqSPkdT@rafael.j.wysocki Stable-dep-of: db86f55bf81a ("cpuidle: governors: menu: Select polling state in some more cases") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-29	Revert "cpuidle: menu: Avoid discarding useful information"	Rafael J. Wysocki	1	-12/+9
	commit 10fad4012234a7dea621ae17c0c9486824f645a0 upstream. It is reported that commit 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information") led to a performance regression on Intel Jasper Lake systems because it reduced the time spent by CPUs in idle state C7 which is correlated to the maximum frequency the CPUs can get to because of an average running power limit [1]. Before that commit, get_typical_interval() would have returned UINT_MAX whenever it had been unable to make a high-confidence prediction which had led to selecting the deepest available idle state too often and both power and performance had been inadequate as a result of that on some systems. However, this had not been a problem on systems with relatively aggressive average running power limits, like the Jasper Lake systems in question, because on those systems it was compensated by the ability to run CPUs faster. It was addressed by causing get_typical_interval() to return a number based on the recent idle duration information available to it even if it could not make a high-confidence prediction, but that clearly did not take the possible correlation between idle power and available CPU capacity into account. For this reason, revert most of the changes made by commit 85975daeaa4d, except for one cosmetic cleanup, and add a comment explaining the rationale for returning UINT_MAX from get_typical_interval() when it is unable to make a high-confidence prediction. Fixes: 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information") Closes: https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/ [1] Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/3663603.iIbC2pHGDl@rafael.j.wysocki Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-15	cpuidle: qcom-spm: fix device and OF node leaks at probe	Johan Hovold	1	-2/+5
	[ Upstream commit cdc06f912670c8c199d5fa9e78b64b7ed8e871d0 ] Make sure to drop the reference to the saw device taken by of_find_device_by_node() after retrieving its driver data during probe(). Also drop the reference to the CPU node sooner to avoid leaking it in case there is no saw node or device. Fixes: 60f3692b5f0b ("cpuidle: qcom_spm: Detach state machine from main SPM handling") Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28	cpuidle: governors: menu: Avoid selecting states with too much latency	Rafael J. Wysocki	1	-17/+12
	[ Upstream commit 779b1a1cb13ae17028aeddb2fbbdba97357a1e15 ] Occasionally, the exit latency of the idle state selected by the menu governor may exceed the PM QoS CPU wakeup latency limit. Namely, if the scheduler tick has been stopped already and predicted_ns is greater than the tick period length, the governor may return an idle state whose exit latency exceeds latency_req because that decision is made before checking the current idle state's exit latency. For instance, say that there are 3 idle states, 0, 1, and 2. For idle states 0 and 1, the exit latency is equal to the target residency and the values are 0 and 5 us, respectively. State 2 is deeper and has the exit latency and target residency of 200 us and 2 ms (which is greater than the tick period length), respectively. Say that predicted_ns is equal to TICK_NSEC and the PM QoS latency limit is 20 us. After the first two iterations of the main loop in menu_select(), idx becomes 1 and in the third iteration of it the target residency of the current state (state 2) is greater than predicted_ns. State 2 is not a polling one and predicted_ns is not less than TICK_NSEC, so the check on whether or not the tick has been stopped is done. Say that the tick has been stopped already and there are no imminent timers (that is, delta_tick is greater than the target residency of state 2). In that case, idx becomes 2 and it is returned immediately, but the exit latency of state 2 exceeds the latency limit. Address this issue by modifying the code to compare the exit latency of the current idle state (idle state i) with the latency limit before comparing its target residency with predicted_ns, which allows one more exit_latency_ns check that becomes redundant to be dropped. However, after the above change, latency_req cannot take the predicted_ns value any more, which takes place after commit 38f83090f515 ("cpuidle: menu: Remove iowait influence"), because it may cause a polling state to be returned prematurely. In the context of the previous example say that predicted_ns is 3000 and the PM QoS latency limit is still 20 us. Additionally, say that idle state 0 is a polling one. Moving the exit_latency_ns check before the target_residency_ns one causes the loop to terminate in the second iteration, before the target_residency_ns check, so idle state 0 will be returned even though previously state 1 would be returned if there were no imminent timers. For this reason, remove the assignment of the predicted_ns value to latency_req from the code. Fixes: 5ef499cd571c ("cpuidle: menu: Handle stopped tick more aggressively") Cc: 4.17+ <stable@vger.kernel.org> # 4.17+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/5043159.31r3eYUQgx@rafael.j.wysocki Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28	cpuidle: menu: Remove iowait influence	Christian Loehle	1	-43/+9
	[ Upstream commit 38f83090f515b4b5d59382dfada1e7457f19aa47 ] Remove CPU iowaiters influence on idle state selection. Remove the menu notion of performance multiplier which increased with the number of tasks that went to iowait sleep on this CPU and haven't woken up yet. Relying on iowait for cpuidle is problematic for a few reasons: 1. There is no guarantee that an iowaiting task will wake up on the same CPU. 2. The task being in iowait says nothing about the idle duration, we could be selecting shallower states for a long time. 3. The task being in iowait doesn't always imply a performance hit with increased latency. 4. If there is such a performance hit, the number of iowaiting tasks doesn't directly correlate. 5. The definition of iowait altogether is vague at best, it is sprinkled across kernel code. Signed-off-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/20240905092645.2885200-2-christian.loehle@arm.com [ rjw: Minor edits in the changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Stable-dep-of: 779b1a1cb13a ("cpuidle: governors: menu: Avoid selecting states with too much latency") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28	cpuidle: governors: menu: Avoid using invalid recent intervals data	Rafael J. Wysocki	1	-4/+17
	[ Upstream commit fa3fa55de0d6177fdcaf6fc254f13cc8f33c3eed ] Marc has reported that commit 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information") caused the number of wakeup interrupts to increase on an idle system [1], which was not expected to happen after merely allowing shallower idle states to be selected by the governor in some cases. However, on the system in question, all of the idle states deeper than WFI are rejected by the driver due to a firmware issue [2]. This causes the governor to only consider the recent interval duriation data corresponding to attempts to enter WFI that are successful and the recent invervals table is filled with values lower than the scheduler tick period. Consequently, the governor predicts an idle duration below the scheduler tick period length and avoids stopping the tick more often which leads to the observed symptom. Address it by modifying the governor to update the recent intervals table also when entering the previously selected idle state fails, so it knows that the short idle intervals might have been the minority had the selected idle states been actually entered every time. Fixes: 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information") Link: https://lore.kernel.org/linux-pm/86o6sv6n94.wl-maz@kernel.org/ [1] Link: https://lore.kernel.org/linux-pm/7ffcb716-9a1b-48c2-aaa4-469d0df7c792@arm.com/ [2] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/2793874.mvXUDI8C0e@rafael.j.wysocki Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04	cpuidle: menu: Avoid discarding useful information	Rafael J. Wysocki	1	-1/+12
	[ Upstream commit 85975daeaa4d6ec560bfcd354fc9c08ad7f38888 ] When giving up on making a high-confidence prediction, get_typical_interval() always returns UINT_MAX which means that the next idle interval prediction will be based entirely on the time till the next timer. However, the information represented by the most recent intervals may not be completely useless in those cases. Namely, the largest recent idle interval is an upper bound on the recently observed idle duration, so it is reasonable to assume that the next idle duration is unlikely to exceed it. Moreover, this is still true after eliminating the suspected outliers if the sample set still under consideration is at least as large as 50% of the maximum sample set size. Accordingly, make get_typical_interval() return the current maximum recent interval value in that case instead of UINT_MAX. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Link: https://patch.msgid.link/7770672.EvYhyI6sBW@rjwysocki.net Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-17	cpuidle: riscv-sbi: fix device node release in early exit of ↵	Javier Carrasco	1	-2/+2
	for_each_possible_cpu [ Upstream commit 7e25044b804581b9c029d5a28d8800aebde18043 ] The 'np' device_node is initialized via of_cpu_device_node_get(), which requires explicit calls to of_node_put() when it is no longer required to avoid leaking the resource. Instead of adding the missing calls to of_node_put() in all execution paths, use the cleanup attribute for 'np' by means of the __free() macro, which automatically calls of_node_put() when the variable goes out of scope. Given that 'np' is only used within the for_each_possible_cpu(), reduce its scope to release the nood after every iteration of the loop. Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver") Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Link: https://lore.kernel.org/r/20241116-cpuidle-riscv-sbi-cleanup-v3-1-a3a46372ce08@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04	cpuidle: riscv-sbi: Use scoped device node handling to fix missing of_node_put	Krzysztof Kozlowski	1	-14/+7
	commit a309320ddbac6b1583224fcb6bacd424bcf8637f upstream. Two return statements in sbi_cpuidle_dt_init_states() did not drop the OF node reference count. Solve the issue and simplify entire error handling with scoped/cleanup.h. Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver") Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://patch.msgid.link/20240820094023.61155-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-13	cpuidle: Avoid potential overflow in integer multiplication	C Cheng	1	-1/+2
	[ Upstream commit 88390dd788db485912ee7f9a8d3d56fc5265d52f ] In detail: In C language, when you perform a multiplication operation, if both operands are of int type, the multiplication operation is performed on the int type, and then the result is converted to the target type. This means that if the product of int type multiplication exceeds the range that int type can represent, an overflow will occur even if you store the result in a variable of int64_t type. For a multiplication of two int values, it is better to use mul_u32_u32() rather than s->exit_latency_ns = s->exit_latency * NSEC_PER_USEC to avoid potential overflow happenning. Signed-off-by: C Cheng <C.Cheng@mediatek.com> Signed-off-by: Bo Ye <bo.ye@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> [ rjw: New subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-26	cpuidle: haltpoll: Do not enable interrupts when entering idle	Borislav Petkov (AMD)	1	-5/+4
	[ Upstream commit c8f5caec3df84a02b937d6d9cda1f7ffa8dc443f ] The cpuidle drivers' ->enter() methods are supposed to be IRQ invariant: 5e26aa933911 ("cpuidle/poll: Ensure IRQs stay disabled after cpuidle_state::enter() calls") bb7b11258561 ("cpuidle: Move IRQ state validation") Do that in the haltpoll driver too. Fixes: 5e26aa933911 ("cpuidle/poll: Ensure IRQs stay disabled after cpuidle_state::enter() calls") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218245 Reported-by: <forza@tnonline.net> Tested-by: <forza@tnonline.net> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-31	Merge tag 'powerpc-6.6-1' of ↵	Linus Torvalds	1	-7/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add HOTPLUG_SMT support (/sys/devices/system/cpu/smt) and honour the configured SMT state when hotplugging CPUs into the system - Combine final TLB flush and lazy TLB mm shootdown IPIs when using the Radix MMU to avoid a broadcast TLBIE flush on exit - Drop the exclusion between ptrace/perf watchpoints, and drop the now unused associated arch hooks - Add support for the "nohlt" command line option to disable CPU idle - Add support for -fpatchable-function-entry for ftrace, with GCC >= 13.1 - Rework memory block size determination, and support 256MB size on systems with GPUs that have hotpluggable memory - Various other small features and fixes Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Benjamin Gray, Christophe Leroy, Frederic Barrat, Gautam Menghani, Geoff Levand, Hari Bathini, Immad Mir, Jialin Zhang, Joel Stanley, Jordan Niethe, Justin Stitt, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent Dufour, Liang He, Linus Walleij, Mahesh Salgaonkar, Masahiro Yamada, Michal Suchanek, Nageswara R Sastry, Nathan Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nick Desaulniers, Omar Sandoval, Randy Dunlap, Reza Arbab, Rob Herring, Russell Currey, Sourabh Jain, Thomas Gleixner, Trevor Woerner, Uwe Kleine-König, Vaibhav Jain, Xiongfeng Wang, Yuan Tan, Zhang Rui, and Zheng Zengkai. * tag 'powerpc-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (135 commits) macintosh/ams: linux/platform_device.h is needed powerpc/xmon: Reapply "Relax frame size for clang" powerpc/mm/book3s64: Use 256M as the upper limit with coherent device memory attached powerpc/mm/book3s64: Fix build error with SPARSEMEM disabled powerpc/iommu: Fix notifiers being shared by PCI and VIO buses powerpc/mpc5xxx: Add missing fwnode_handle_put() powerpc/config: Disable SLAB_DEBUG_ON in skiroot powerpc/pseries: Remove unused hcall tracing instruction powerpc/pseries: Fix hcall tracepoints with JUMP_LABEL=n powerpc: dts: add missing space before { powerpc/eeh: Use pci_dev_id() to simplify the code powerpc/64s: Move CPU -mtune options into Kconfig powerpc/powermac: Fix unused function warning powerpc/pseries: Rework lppaca_shared_proc() to avoid DEBUG_PREEMPT powerpc: Don't include lppaca.h in paca.h powerpc/pseries: Move hcall_vphn() prototype into vphn.h powerpc/pseries: Move VPHN constants into vphn.h cxl: Drop unused detach_spa() powerpc: Drop zalloc_maybe_bootmem() powerpc/powernv: Use struct opal_prd_msg in more places ...
2023-08-25	Merge branches 'pm-cpuidle' and 'pm-cpufreq'	Rafael J. Wysocki	3	-111/+203
	Merge CPU power management updates for 6.6-rc1: - Rework the menu and teo cpuidle governors to avoid calling tick_nohz_get_sleep_length(), which is likely to become quite expensive going forward, too often and improve making decisions regarding whether or not to stop the scheduler tick in the teo governor (Rafael Wysocki). - Improve the performance of cpufreq_stats_create_table() in some cases (Liao Chang). - Fix two issues in the amd-pstate-ut cpufreq driver (Swapnil Sapkal). - Use clamp() helper macro to improve the code readability in cpufreq_verify_within_limits() (Liao Chang). - Set stale CPU frequency to minimum in intel_pstate (Doug Smythies). * pm-cpuidle: cpuidle: teo: Avoid unnecessary variable assignments cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases cpuidle: teo: Gather statistics regarding whether or not to stop the tick cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases cpuidle: teo: Do not call tick_nohz_get_sleep_length() upfront cpuidle: teo: Drop utilized from struct teo_cpu cpuidle: teo: Avoid stopping the tick unnecessarily when bailing out cpuidle: teo: Update idle duration estimate when choosing shallower state * pm-cpufreq: cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver cpufreq: amd-pstate-ut: Remove module parameter access cpufreq: Use clamp() helper macro to improve the code readability cpufreq: intel_pstate: set stale CPU frequency to minimum cpufreq: stats: Improve the performance of cpufreq_stats_create_table()
2023-08-24	powerpc/pseries: Rework lppaca_shared_proc() to avoid DEBUG_PREEMPT	Russell Currey	1	-7/+1
	lppaca_shared_proc() takes a pointer to the lppaca which is typically accessed through get_lppaca(). With DEBUG_PREEMPT enabled, this leads to checking if preemption is enabled, for example: BUG: using smp_processor_id() in preemptible [00000000] code: grep/10693 caller is lparcfg_data+0x408/0x19a0 CPU: 4 PID: 10693 Comm: grep Not tainted 6.5.0-rc3 #2 Call Trace: dump_stack_lvl+0x154/0x200 (unreliable) check_preemption_disabled+0x214/0x220 lparcfg_data+0x408/0x19a0 ... This isn't actually a problem however, as it does not matter which lppaca is accessed, the shared proc state will be the same. vcpudispatch_stats_procfs_init() already works around this by disabling preemption, but the lparcfg code does not, erroring any time /proc/powerpc/lparcfg is accessed with DEBUG_PREEMPT enabled. Instead of disabling preemption on the caller side, rework lppaca_shared_proc() to not take a pointer and instead directly access the lppaca, bypassing any potential preemption checks. Fixes: f13c13a00512 ("powerpc: Stop using non-architected shared_proc field in lppaca") Signed-off-by: Russell Currey <ruscur@russell.cc> [mpe: Rework to avoid needing a definition in paca.h and lppaca.h] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230823055317.751786-4-mpe@ellerman.id.au
2023-08-23	cpuidle: teo: Avoid unnecessary variable assignments	Rafael J. Wysocki	1	-3/+2
	Notice that it is not necessary to assign tick_intercept_sum in every iteration of the first loop over idle states in teo_select(), because the intercept_sum value does not change after the assignment in a given iteration of the loop, so its value after the last iteration of the loop can be used for computing the tick_intercept_sum value directly. Modify the code accordingly. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-17	cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases	Rafael J. Wysocki	3	-34/+54
	Because the cost of calling tick_nohz_get_sleep_length() may increase in the future, reorder the code in menu_select() so it first uses the statistics to determine the expected idle duration. If that value is higher than RESIDENCY_THRESHOLD_NS, tick_nohz_get_sleep_length() will be called to obtain the time till the closest timer and refine the idle duration prediction if necessary. This causes the governor to always take the full overhead of get_typical_interval() with the assumption that the cost will be amortized by skipping the tick_nohz_get_sleep_length() call in the cases when the predicted idle duration is relatively very small. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Doug Smythies <dsmythies@telus.net>
2023-08-09	cpuidle: teo: Gather statistics regarding whether or not to stop the tick	Rafael J. Wysocki	1	-1/+40
	Currently, if the target residency of the deepest idle state is less than the tick period length, which is quite likely for HZ=100, and the deepest idle state is about to be selected by the TEO idle governor, the decision on whether or not to stop the scheduler tick is based entirely on the time till the closest timer. This is often insufficient, because timers may not be in heavy use and there may be a plenty of other CPU wakeup events between the deepest idle state's target residency and the closest tick. Allow the governor to count those events by making the deepest idle state's bin effectively end at TICK_NSEC and introducing an additional "bin" for collecting "hit" events (ie. the ones in which the measured idle duration falls into the same bin as the time till the closest timer) with idle duration values past TICK_NSEC. This way the "intercepts" metric for the deepest idle state's bin becomes nonzero in general, and so it can influence the decision on whether or not to stop the tick possibly increasing the governor's accuracy in that respect. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com> Tested-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
2023-08-09	cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases	Rafael J. Wysocki	1	-0/+22
	Make teo_select() avoid calling tick_nohz_get_sleep_length() if the candidate idle state to return is state 0 or if state 0 is a polling one and the target residency of the current candidate one is below a certain threshold, in which cases it may be assumed that the CPU will be woken up immediately by a non-timer wakeup source and the timers are not likely to matter. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com> Tested-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
2023-08-09	cpuidle: teo: Do not call tick_nohz_get_sleep_length() upfront	Rafael J. Wysocki	1	-61/+44
	Because the cost of calling tick_nohz_get_sleep_length() may increase in the future, reorder the code in teo_select() so it first uses the statistics to pick up a candidate idle state and applies the utilization heuristic to it and only then calls tick_nohz_get_sleep_length() to obtain the sleep length value and refine the selection if necessary. This change by itself does not cause tick_nohz_get_sleep_length() to be called less often, but it prepares the code for subsequent changes that will do so. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com> Tested-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
2023-08-08	cpuidle: psci: Move enabling OSI mode after power domains creation	Maulik Shah	1	-26/+13
	A switch from OSI to PC mode is only possible if all CPUs other than the calling one are OFF, either through a call to CPU_OFF or not yet booted. Currently OSI mode is enabled before power domains are created. In cases where CPUidle states are not using hierarchical CPU topology the bail out path tries to switch back to PC mode which gets denied by firmware since other CPUs are online at this point and creates inconsistent state as firmware is in OSI mode and Linux in PC mode. This change moves enabling OSI mode after power domains are created, this would makes sure that hierarchical CPU topology is used before switching firmware to OSI mode. Cc: stable@vger.kernel.org Fixes: 70c179b49870 ("cpuidle: psci: Allow PM domain to be initialized even if no OSI mode") Signed-off-by: Maulik Shah <quic_mkshah@quicinc.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2023-08-08	cpuidle: dt_idle_genpd: Add helper function to remove genpd topology	Maulik Shah	2	-0/+31
	Genpd parent and child domain topology created using dt_idle_pd_init_topology() needs to be removed during error cases. Add new helper function dt_idle_pd_remove_topology() for same. Cc: stable@vger.kernel.org Reviewed-by: Ulf Hanssson <ulf.hansson@linaro.org> Signed-off-by: Maulik Shah <quic_mkshah@quicinc.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2023-08-03	cpuidle: teo: Drop utilized from struct teo_cpu	Rafael J. Wysocki	1	-5/+4
	Because the utilized field in struct teo_cpu is only used locally in teo_select(), replace it with a local variable in that function. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-and-tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
2023-08-03	cpuidle: teo: Avoid stopping the tick unnecessarily when bailing out	Rafael J. Wysocki	1	-23/+33
	When teo_select() is going to return early in some special cases, make it avoid stopping the tick if the idle state to be returned is shallow. In particular, never stop the tick if state 0 is to be returned. Link: https://lore.kernel.org/linux-pm/CAJZ5v0jJxHj65r2HXBTd3wfbZtsg=_StzwO1kA5STDnaPe_dWA@mail.gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-and-tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
2023-08-03	cpuidle: teo: Update idle duration estimate when choosing shallower state	Rafael J. Wysocki	1	-10/+30
	The TEO governor takes CPU utilization into account by refining idle state selection when the utilization is above a certain threshold. This is done by choosing an idle state shallower than the previously selected one. However, when doing this, the idle duration estimate needs to be adjusted so as to prevent the scheduler tick from being stopped when the candidate idle state is shallow, which may lead to excessive energy usage if the CPU is not woken up quickly enough going forward. Moreover, if the scheduler tick has been stopped already and the new idle duration estimate is too small, the replacement candidate state cannot be used. Modify the relevant code to take the above observations into account. Fixes: 9ce0f7c4bc64 ("cpuidle: teo: Introduce util-awareness") Link: https://lore.kernel.org/linux-pm/CAJZ5v0jJxHj65r2HXBTd3wfbZtsg=_StzwO1kA5STDnaPe_dWA@mail.gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-and-tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
2023-06-05	cpuidle: Use local_clock_noinstr()	Peter Zijlstra	2	-6/+6
	With the introduction of local_clock_noinstr(), local_clock() itself is no longer marked noinstr, use the correct function. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Tested-by: Michael Kelley <mikelley@microsoft.com> # Hyper-V Link: https://lore.kernel.org/r/20230519102716.045980863@infradead.org
2023-04-29	RISC-V: Align SBI probe implementation with spec	Andrew Jones	1	-1/+1
	sbi_probe_extension() is specified with "Returns 0 if the given SBI extension ID (EID) is not available, or 1 if it is available unless defined as any other non-zero value by the implementation." Additionally, sbiret.value is a long. Fix the implementation to ensure any nonzero long value is considered a success, rather than only positive int values. Fixes: b9dcd9e41587 ("RISC-V: Add basic support for SBI v0.2") Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230427163626.101042-1-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-29	Merge tag 'powerpc-6.4-1' of ↵	Linus Torvalds	1	-14/+14
	git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add support for building the kernel using PC-relative addressing on Power10. - Allow HV KVM guests on Power10 to use prefixed instructions. - Unify support for the P2020 CPU (85xx) into a single machine description. - Always build the 64-bit kernel with 128-bit long double. - Drop support for several obsolete 2000's era development boards as identified by Paul Gortmaker. - A series fixing VFIO on Power since some generic changes. - Various other small features and fixes. Thanks to Alexey Kardashevskiy, Andrew Donnellan, Benjamin Gray, Bo Liu, Christophe Leroy, Dan Carpenter, David Binderman, Ira Weiny, Joel Stanley, Kajol Jain, Kautuk Consul, Liang He, Luis Chamberlain, Masahiro Yamada, Michael Neuling, Nathan Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Nick Desaulniers, Nysal Jan K.A, Pali Rohár, Paul Gortmaker, Paul Mackerras, Petr Vaněk, Randy Dunlap, Rob Herring, Sachin Sant, Sean Christopherson, Segher Boessenkool, and Timothy Pearson. * tag 'powerpc-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (156 commits) powerpc/64s: Disable pcrel code model on Clang powerpc: Fix merge conflict between pcrel and copy_thread changes powerpc/configs/powernv: Add IGB=y powerpc/configs/64s: Drop JFS Filesystem powerpc/configs/64s: Use EXT4 to mount EXT2 filesystems powerpc/configs: Make pseries_defconfig an alias for ppc64le_guest powerpc/configs: Make pseries_le an alias for ppc64le_guest powerpc/configs: Incorporate generic kvm_guest.config into guest configs powerpc/configs: Add IBMVETH=y and IBMVNIC=y to guest configs powerpc/configs/64s: Enable Device Mapper options powerpc/configs/64s: Enable PSTORE powerpc/configs/64s: Enable VLAN support powerpc/configs/64s: Enable BLK_DEV_NVME powerpc/configs/64s: Drop REISERFS powerpc/configs/64s: Use SHA512 for module signatures powerpc/configs/64s: Enable IO_STRICT_DEVMEM powerpc/configs/64s: Enable SCHEDSTATS powerpc/configs/64s: Enable DEBUG_VM & other options powerpc/configs/64s: Enable EMULATED_STATS powerpc/configs/64s: Enable KUNIT and most tests ...
2023-04-27	Merge tag 'driver-core-6.4-rc1' of ↵	Linus Torvalds	3	-5/+12
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.4-rc1. Once again, a busy development cycle, with lots of changes happening in the driver core in the quest to be able to move "struct bus" and "struct class" into read-only memory, a task now complete with these changes. This will make the future rust interactions with the driver core more "provably correct" as well as providing more obvious lifetime rules for all busses and classes in the kernel. The changes required for this did touch many individual classes and busses as many callbacks were changed to take const * parameters instead. All of these changes have been submitted to the various subsystem maintainers, giving them plenty of time to review, and most of them actually did so. Other than those changes, included in here are a small set of other things: - kobject logging improvements - cacheinfo improvements and updates - obligatory fw_devlink updates and fixes - documentation updates - device property cleanups and const * changes - firwmare loader dependency fixes. All of these have been in linux-next for a while with no reported problems" * tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits) device property: make device_property functions take const device * driver core: update comments in device_rename() driver core: Don't require dynamic_debug for initcall_debug probe timing firmware_loader: rework crypto dependencies firmware_loader: Strip off \n from customized path zram: fix up permission for the hot_add sysfs file cacheinfo: Add use_arch[\|_cache]_info field/function arch_topology: Remove early cacheinfo error message if -ENOENT cacheinfo: Check cache properties are present in DT cacheinfo: Check sib_leaf in cache_leaves_are_shared() cacheinfo: Allow early level detection when DT/ACPI info is missing/broken cacheinfo: Add arm64 early level initializer implementation cacheinfo: Add arch specific early level initializer tty: make tty_class a static const structure driver core: class: remove struct class_interface * from callbacks driver core: class: mark the struct class in struct class_interface constant driver core: class: make class_register() take a const * driver core: class: mark class_release() as taking a const * driver core: remove incorrect comment for device_create* MIPS: vpe-cmp: remove module owner pointer from struct class usage. ...
2023-04-27	Merge tag 'devicetree-for-6.4-2' of ↵	Linus Torvalds	4	-5/+2
	git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull more devicetree updates from Rob Herring: - First part of DT header detangling dropping cpu.h from of_device.h and replacing some includes with forward declarations. A handful of drivers needed some adjustment to their includes as a result. - Refactor of_device.h to be used by bus drivers rather than various device drivers. This moves non-bus related functions out of of_device.h. The end goal is for of_platform.h and of_device.h to stop including each other. - Refactor open coded parsing of "ranges" in some bus drivers to use DT address parsing functions - Add some new address parsing functions of_property_read_reg(), of_range_count(), and of_range_to_resource() in preparation to convert more open coded parsing of DT addresses to use them. - Treewide clean-ups to use of_property_read_bool() and of_property_present() as appropriate. The ones here are the ones that didn't get picked up elsewhere. * tag 'devicetree-for-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (34 commits) bus: tegra-gmi: Replace of_platform.h with explicit includes hte: Use of_property_present() for testing DT property presence w1: w1-gpio: Use of_property_read_bool() for boolean properties virt: fsl: Use of_property_present() for testing DT property presence soc: fsl: Use of_property_present() for testing DT property presence sbus: display7seg: Use of_property_read_bool() for boolean properties sparc: Use of_property_read_bool() for boolean properties sparc: Use of_property_present() for testing DT property presence bus: mvebu-mbus: Remove open coded "ranges" parsing of/address: Add of_property_read_reg() helper of/address: Add of_range_count() helper of/address: Add support for 3 address cell bus of/address: Add of_range_to_resource() helper of: unittest: Add bus address range parsing tests of: Drop cpu.h include from of_device.h OPP: Adjust includes to remove of_device.h irqchip: loongson-eiointc: Add explicit include for cpuhotplug.h cpuidle: Adjust includes to remove of_device.h cpufreq: sun50i: Add explicit include for cpu.h cpufreq: Adjust includes to remove of_device.h ...
2023-04-20	cpuidle: pseries: Mark ->enter() functions as __cpuidle	Michael Ellerman	1	-14/+14
	Code in the idle path is not allowed to be instrumented because RCU is disabled, see commit 0e985e9d2286 ("cpuidle: Add comments about noinstr/__cpuidle usage"). Mark the cpuidle ->enter() callbacks as __cpuidle and use the raw_local_irq_*() routines to ensure that is the case. Reported-by: Sachin Sant <sachinp@linux.ibm.com> Link: https://lore.kernel.org/all/4C073F6A-C812-4C4A-BB7A-ECD10B75FB88@linux.ibm.com/ Suggested-by: Peter Zijlstra <peterz@infradead.org> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230406144535.3786008-3-mpe@ellerman.id.au
2023-04-14	cpuidle: Adjust includes to remove of_device.h	Rob Herring	4	-5/+2
	Now that of_cpu_device_node_get() is defined in of.h, of_device.h is just implicitly including other includes, and is no longer needed. Adjust the include files with what was implicitly included by of_device.h (cpu.h, cpuhotplug.h, of.h, and of_platform.h) and drop including of_device.h. Acked-by: Anup Patel <anup@brainfault.org> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20230329-dt-cpu-header-cleanups-v1-16-581e2605fe47@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-04-03	Merge 6.3-rc5 into driver-core-next	Greg Kroah-Hartman	1	-1/+2
	We need the fixes in here for testing, as well as the driver core changes for documentation updates to build on. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-27	cpuidle: Use of_property_present() for testing DT property presence	Rob Herring	2	-4/+4
	It is preferred to use typed property access functions (i.e. of_property_read_<type> functions) rather than low-level of_get_property/of_find_property functions for reading properties. As part of this, convert of_get_property/of_find_property calls to the recently added of_property_present() helper when we just want to test for presence of a property and nothing more. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Anup Patel <anup@brainfault.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-22	cpuidle: move to use bus_get_dev_root()	Greg Kroah-Hartman	3	-5/+12
	Direct access to the struct bus_type dev_root pointer is going away soon so replace that with a call to bus_get_dev_root() instead, which is what it is there for. This allows us to clean up the cpuidle_add_interface() call a bit as it was only called in one place, with the same argument so just put that into the function itself. Note that cpuidle_remove_interface() should also probably be removed in the future as there are no callers of it for some reason. Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: linux-pm@vger.kernel.org Acked-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20230322090557.2943479-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-07	cpuidle: psci: Iterate backwards over list in psci_pd_remove()	Shawn Guo	1	-1/+2
	In case that psci_pd_init_topology() fails for some reason, psci_pd_remove() will be responsible for deleting provider and removing genpd from psci_pd_providers list. There will be a failure when removing the cluster PD, because the cpu (child) PDs haven't been removed. [ 0.050232] CPUidle PSCI: init PM domain cpu0 [ 0.050278] CPUidle PSCI: init PM domain cpu1 [ 0.050329] CPUidle PSCI: init PM domain cpu2 [ 0.050370] CPUidle PSCI: init PM domain cpu3 [ 0.050422] CPUidle PSCI: init PM domain cpu-cluster0 [ 0.050475] PM: genpd_remove: unable to remove cpu-cluster0 [ 0.051412] PM: genpd_remove: removed cpu3 [ 0.051449] PM: genpd_remove: removed cpu2 [ 0.051499] PM: genpd_remove: removed cpu1 [ 0.051546] PM: genpd_remove: removed cpu0 Fix the problem by iterating the provider list reversely, so that parent PD gets removed after child's PDs like below. [ 0.029052] CPUidle PSCI: init PM domain cpu0 [ 0.029076] CPUidle PSCI: init PM domain cpu1 [ 0.029103] CPUidle PSCI: init PM domain cpu2 [ 0.029124] CPUidle PSCI: init PM domain cpu3 [ 0.029151] CPUidle PSCI: init PM domain cpu-cluster0 [ 0.029647] PM: genpd_remove: removed cpu0 [ 0.029666] PM: genpd_remove: removed cpu1 [ 0.029690] PM: genpd_remove: removed cpu2 [ 0.029714] PM: genpd_remove: removed cpu3 [ 0.029738] PM: genpd_remove: removed cpu-cluster0 Fixes: a65a397f2451 ("cpuidle: psci: Add support for PM domains by using genpd") Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Cc: 5.10+ <stable@vger.kernel.org> # 5.10+ Signed-off-by: Rafael J. Wysocki <rjw@rjwysocki.net>
2023-02-27	Merge tag 'soc-drivers-6.3' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC driver updates from Arnd Bergmann: "As usual, there are lots of minor driver changes across SoC platforms from NXP, Amlogic, AMD Zynq, Mediatek, Qualcomm, Apple and Samsung. These usually add support for additional chip variations in existing drivers, but also add features or bugfixes. The SCMI firmware subsystem gains a unified raw userspace interface through debugfs, which can be used for validation purposes. Newly added drivers include: - New power management drivers for StarFive JH7110, Allwinner D1 and Renesas RZ/V2M - A driver for Qualcomm battery and power supply status - A SoC device driver for identifying Nuvoton WPCM450 chips - A regulator coupler driver for Mediatek MT81xxv" * tag 'soc-drivers-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (165 commits) power: supply: Introduce Qualcomm PMIC GLINK power supply soc: apple: rtkit: Do not copy the reg state structure to the stack soc: sunxi: SUN20I_PPU should depend on PM memory: renesas-rpc-if: Remove redundant division of dummy soc: qcom: socinfo: Add IDs for IPQ5332 and its variant dt-bindings: arm: qcom,ids: Add IDs for IPQ5332 and its variant dt-bindings: power: qcom,rpmpd: add RPMH_REGULATOR_LEVEL_LOW_SVS_L1 firmware: qcom_scm: Move qcom_scm.h to include/linux/firmware/qcom/ MAINTAINERS: Update qcom CPR maintainer entry dt-bindings: firmware: document Qualcomm SM8550 SCM dt-bindings: firmware: qcom,scm: add qcom,scm-sa8775p compatible soc: qcom: socinfo: Add Soc IDs for IPQ8064 and variants dt-bindings: arm: qcom,ids: Add Soc IDs for IPQ8064 and variants soc: qcom: socinfo: Add support for new field in revision 17 soc: qcom: smd-rpm: Add IPQ9574 compatible soc: qcom: pmic_glink: remove redundant calculation of svid soc: qcom: stats: Populate all subsystem debugfs files dt-bindings: soc: qcom,rpmh-rsc: Update to allow for generic nodes soc: qcom: pmic_glink: add CONFIG_NET/CONFIG_OF dependencies soc: qcom: pmic_glink: Introduce altmode support ...
2023-02-21	Merge tag 'pm-6.3-rc1' of ↵	Linus Torvalds	8	-10/+125
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add EPP support to the AMD P-state cpufreq driver, add support for new platforms to the Intel RAPL power capping driver, intel_idle and the Qualcomm cpufreq driver, enable thermal cooling for Tegra194, drop the custom cpufreq driver for loongson1 that is not necessary any more (and the corresponding cpufreq platform device), fix assorted issues and clean up code. Specifics: - Add EPP support to the AMD P-state cpufreq driver (Perry Yuan, Wyes Karny, Arnd Bergmann, Bagas Sanjaya) - Drop the custom cpufreq driver for loongson1 that is not necessary any more and the corresponding cpufreq platform device (Keguang Zhang) - Remove "select SRCU" from system sleep, cpufreq and OPP Kconfig entries (Paul E. McKenney) - Enable thermal cooling for Tegra194 (Yi-Wei Wang) - Register module device table and add missing compatibles for cpufreq-qcom-hw (Nícolas F. R. A. Prado, Abel Vesa and Luca Weiss) - Various dt binding updates for qcom-cpufreq-nvmem and opp-v2-kryo-cpu (Christian Marangi) - Make kobj_type structure in the cpufreq core constant (Thomas Weißschuh) - Make cpufreq_unregister_driver() return void (Uwe Kleine-König) - Make the TEO cpuidle governor check CPU utilization in order to refine idle state selection (Kajetan Puchalski) - Make Kconfig select the haltpoll cpuidle governor when the haltpoll cpuidle driver is selected and replace a default_idle() call in that driver with arch_cpu_idle() to allow MWAIT to be used (Li RongQing) - Add Emerald Rapids Xeon support to the intel_idle driver (Artem Bityutskiy) - Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to avoid randconfig build failures (Arnd Bergmann) - Make kobj_type structures used in the cpuidle sysfs interface constant (Thomas Weißschuh) - Make the cpuidle driver registration code update microsecond values of idle state parameters in accordance with their nanosecond values if they are provided (Rafael Wysocki) - Make the PSCI cpuidle driver prevent topology CPUs from being suspended on PREEMPT_RT (Krzysztof Kozlowski) - Document that pm_runtime_force_suspend() cannot be used with DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald) - Add EXPORT macros for exporting PM functions from drivers (Richard Fitzgerald) - Remove /** from non-kernel-doc comments in hibernation code (Randy Dunlap) - Fix possible name leak in powercap_register_zone() (Yang Yingliang) - Add Meteor Lake and Emerald Rapids support to the intel_rapl power capping driver (Zhang Rui) - Modify the idle_inject power capping facility to support 100% idle injection (Srinivas Pandruvada) - Fix large time windows handling in the intel_rapl power capping driver (Zhang Rui) - Fix memory leaks with using debugfs_lookup() in the generic PM domains and Energy Model code (Greg Kroah-Hartman) - Add missing 'cache-unified' property in the example for kryo OPP bindings (Rob Herring) - Fix error checking in opp_migrate_dentry() (Qi Zheng) - Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad Dybcio) - Modify some power management utilities to use the canonical ftrace path (Ross Zwisler) - Correct spelling problems for Documentation/power/ as reported by codespell (Randy Dunlap)" * tag 'pm-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (53 commits) Documentation: amd-pstate: disambiguate user space sections cpufreq: amd-pstate: Fix invalid write to MSR_AMD_CPPC_REQ dt-bindings: opp: opp-v2-kryo-cpu: enlarge opp-supported-hw maximum dt-bindings: cpufreq: qcom-cpufreq-nvmem: make cpr bindings optional dt-bindings: cpufreq: qcom-cpufreq-nvmem: specify supported opp tables PM: Add EXPORT macros for exporting PM functions cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT MIPS: loongson32: Drop obsolete cpufreq platform device powercap: intel_rapl: Fix handling for large time window cpuidle: driver: Update microsecond values of state parameters as needed cpuidle: sysfs: make kobj_type structures constant cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies PM: EM: fix memory leak with using debugfs_lookup() PM: domains: fix memory leak with using debugfs_lookup() cpufreq: Make kobj_type structure constant cpufreq: davinci: Fix clk use after free cpufreq: amd-pstate: avoid uninitialized variable use cpufreq: Make cpufreq_unregister_driver() return void OPP: fix error checking in opp_migrate_dentry() dt-bindings: cpufreq: cpufreq-qcom-hw: Add SM8550 compatible ...
2023-02-13	cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT	Krzysztof Kozlowski	3	-2/+16
	The runtime Power Management of CPU topology is not compatible with PREEMPT_RT: 1. Core cpuidle path disables IRQs. 2. Core cpuidle calls cpuidle-psci. 3. cpuidle-psci in __psci_enter_domain_idle_state() calls pm_runtime_put_sync_suspend() and pm_runtime_get_sync() which use spinlocks (which are sleeping on PREEMPT_RT). Deep sleep modes are not a priority of Realtime kernels because the latencies might become unpredictable. On the other hand the PSCI CPU idle power domain is a parent of other devices and power domain controllers, thus it cannot be simply skipped (e.g. on Qualcomm SM8250). Disable the idle callbacks in cpuidle-psci and mark the domain as always on. This is a trade-off between making PREEMPT_RT working and still having a proper power domain hierarchy in the system. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Adrien Thierry <athierry@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-13	cpuidle: driver: Update microsecond values of state parameters as needed	Rafael J. Wysocki	1	-0/+4
	If the cpuidle driver provides the target residency and exit latency in nanoseconds, the corresponding values in microseconds need to be set to reflect the provided numbers in order for the sysfs interface to show them correctly, so make __cpuidle_driver_init() do that. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2023-02-09	cpuidle: sysfs: make kobj_type structures constant	Thomas Weißschuh	1	-3/+3
	Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definitions to prevent modification at runtime. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-09	cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies	Arnd Bergmann	1	-0/+2
	Some ARMv4 processors don't support suspend, which leads to a build failure with the tegra and qualcomm cpuidle driver: WARNING: unmet direct dependencies detected for ARM_CPU_SUSPEND Depends on [n]: ARCH_SUSPEND_POSSIBLE [=n] Selected by [y]: - ARM_TEGRA_CPUIDLE [=y] && CPU_IDLE [=y] && (ARM [=y] \|\| ARM64) && (ARCH_TEGRA [=n] \|\| COMPILE_TEST [=y]) && !ARM64 && MMU [=y] arch/arm/kernel/sleep.o: in function `__cpu_suspend': (.text+0x68): undefined reference to `cpu_sa110_suspend_size' (.text+0x68): undefined reference to `cpu_fa526_suspend_size' Add an explicit dependency to make randconfig builds avoid this combination. Fixes: faae6c9f2e68 ("cpuidle: tegra: Enable compile testing") Fixes: a871be6b8eee ("cpuidle: Convert Qualcomm SPM driver to a generic CPUidle driver") Link: https://lore.kernel.org/all/20211013160125.772873-1-arnd@kernel.org/ Cc: All applicable <stable@vger.kernel.org> Reviewed-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-09	firmware: qcom_scm: Move qcom_scm.h to include/linux/firmware/qcom/	Elliot Berman	1	-1/+1
	Move include/linux/qcom_scm.h to include/linux/firmware/qcom/qcom_scm.h. This removes 1 of a few remaining Qualcomm-specific headers into a more approciate subdirectory under include/. Suggested-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> Reviewed-by: Guru Das Srinagesh <quic_gurus@quicinc.com> Acked-by: Mukesh Ojha <quic_mojha@quicinc.com> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230203210956.3580811-1-quic_eberman@quicinc.com
2023-01-31	cpuidle: Fix poll_idle() noinstr annotation	Peter Zijlstra	2	-3/+1
	The instrumentation_begin()/end() annotations in poll_idle() were complete nonsense. Specifically they caused tracing to happen in the middle of noinstr code, resulting in RCU splats. Now that local_clock() is noinstr, mark up the rest and let it rip. Fixes: 00717eb8c955 ("cpuidle: Annotate poll_idle()") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/oe-lkp/202301192148.58ece903-oliver.sang@intel.com Link: https://lore.kernel.org/r/20230126151323.819534689@infradead.org
2023-01-20	cpuidle-haltpoll: Replace default_idle() with arch_cpu_idle()	Li RongQing	1	-1/+1
	When a KVM guest has MWAIT, mwait_idle() is used as the default idle function. However, the cpuidle-haltpoll driver calls default_idle() from default_enter_idle() directly and that one uses HLT instead of MWAIT, which may affect performance adversely, because MWAIT is preferred to HLT as explained by the changelog of commit aebef63cf7ff ("x86: Remove vendor checks from prefer_mwait_c1_over_halt"). Make default_enter_idle() call arch_cpu_idle(), which can use MWAIT, instead of default_idle() to address this issue. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> [ rjw: Changelog rewrite ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-18	cpuidle, arm64: Fix the ARM64 cpuidle logic	Peter Zijlstra	1	-5/+1
	The recent cpuidle changes started triggering RCU splats on Juno development boards: \| ============================= \| WARNING: suspicious RCU usage \| ----------------------------- \| include/trace/events/ipi.h:19 suspicious rcu_dereference_check() usage! Fix cpuidle on ARM64: - ... by introducing a new 'is_rcu' flag to the cpuidle helpers & make ARM64 use it, as ARM64 wants to keep RCU active longer and wants to do the ct_cpuidle_enter()/exit() dance itself. - Also update the PSCI driver accordingly. - This also removes the last known RCU_NONIDLE() user as a bonus. Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/Y8Z31UbzG3LJgAXE@hirez.programming.kicks-ass.net --
2023-01-18	cpuidle: mvebu: Fix duplicate flags assignment	Arnd Bergmann	1	-6/+4
	The added '.flags' value is sometimes ignored here because it gets overwritten by another initialization: drivers/cpuidle/cpuidle-mvebu-v7.c:24:33: error: initialized field overwritten [-Werror=override-init] 24 \| #define MVEBU_V7_FLAG_DEEP_IDLE 0x10000 \| ^~~~~~~ drivers/cpuidle/cpuidle-mvebu-v7.c:69:43: note: in expansion of macro 'MVEBU_V7_FLAG_DEEP_IDLE' ... Merge the two fields into one. Fixes: 4ce40e9dbe83 ("cpuidle, armada: Push RCU-idle into driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230117164642.1672784-1-arnd@kernel.org