summaryrefslogtreecommitdiff
path: root/Documentation/admin-guide/kernel-parameters.txt
AgeCommit message (Collapse)AuthorFilesLines
2025-10-19fs: Add 'initramfs_options' to set initramfs mount optionsLichen Liu1-0/+3
[ Upstream commit 278033a225e13ec21900f0a92b8351658f5377f2 ] When CONFIG_TMPFS is enabled, the initial root filesystem is a tmpfs. By default, a tmpfs mount is limited to using 50% of the available RAM for its content. This can be problematic in memory-constrained environments, particularly during a kdump capture. In a kdump scenario, the capture kernel boots with a limited amount of memory specified by the 'crashkernel' parameter. If the initramfs is large, it may fail to unpack into the tmpfs rootfs due to insufficient space. This is because to get X MB of usable space in tmpfs, 2*X MB of memory must be available for the mount. This leads to an OOM failure during the early boot process, preventing a successful crash dump. This patch introduces a new kernel command-line parameter, initramfs_options, which allows passing specific mount options directly to the rootfs when it is first mounted. This gives users control over the rootfs behavior. For example, a user can now specify initramfs_options=size=75% to allow the tmpfs to use up to 75% of the available memory. This can significantly reduce the memory pressure for kdump. Consider a practical example: To unpack a 48MB initramfs, the tmpfs needs 48MB of usable space. With the default 50% limit, this requires a memory pool of 96MB to be available for the tmpfs mount. The total memory requirement is therefore approximately: 16MB (vmlinuz) + 48MB (loaded initramfs) + 48MB (unpacked kernel) + 96MB (for tmpfs) + 12MB (runtime overhead) ≈ 220MB. By using initramfs_options=size=75%, the memory pool required for the 48MB tmpfs is reduced to 48MB / 0.75 = 64MB. This reduces the total memory requirement by 32MB (96MB - 64MB), allowing the kdump to succeed with a smaller crashkernel size, such as 192MB. An alternative approach of reusing the existing rootflags parameter was considered. However, a new, dedicated initramfs_options parameter was chosen to avoid altering the current behavior of rootflags (which applies to the final root filesystem) and to prevent any potential regressions. Also add documentation for the new kernel parameter "initramfs_options" This approach is inspired by prior discussions and patches on the topic. Ref: https://www.lightofdawn.org/blog/?viewDetailed=00128 Ref: https://landley.net/notes-2015.html#01-01-2015 Ref: https://lkml.org/lkml/2021/6/29/783 Ref: https://www.kernel.org/doc/html/latest/filesystems/ramfs-rootfs-initramfs.html#what-is-rootfs Signed-off-by: Lichen Liu <lichliu@redhat.com> Link: https://lore.kernel.org/20250815121459.3391223-1-lichliu@redhat.com Tested-by: Rob Landley <rob@landley.net> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-11x86/vmscape: Enable the mitigationPawan Gupta1-0/+11
Commit 556c1ad666ad90c50ec8fccb930dd5046cfbecfb upstream. Enable the previously added mitigation for VMscape. Add the cmdline vmscape={off|ibpb|force} and sysfs reporting. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-10x86/bugs: Add a Transient Scheduler Attacks mitigationBorislav Petkov (AMD)1-0/+13
Commit d8010d4ba43e9f790925375a7de100604a5e2dba upstream. Add the required features detection glue to bugs.c et all in order to support the TSA mitigation. Co-developed-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27Revert "x86/bugs: Make spectre user default depend on MITIGATION_SPECTRE_V2" ↵Breno Leitao1-2/+0
on v6.6 and older This reverts commit 7adb96687ce8819de5c7bb172c4eeb6e45736e06 which is commit 98fdaeb296f51ef08e727a7cc72e5b5c864c4f4d upstream. commit 7adb96687ce8 ("x86/bugs: Make spectre user default depend on MITIGATION_SPECTRE_V2") depends on commit 72c70f480a70 ("x86/bugs: Add a separate config for Spectre V2"), which introduced MITIGATION_SPECTRE_V2. commit 72c70f480a70 ("x86/bugs: Add a separate config for Spectre V2") never landed in stable tree, thus, stable tree doesn't have MITIGATION_SPECTRE_V2, that said, commit 7adb96687ce8 ("x86/bugs: Make spectre user default depend on MITIGATION_SPECTRE_V2") has no value if the dependecy was not applied. Revert commit 7adb96687ce8 ("x86/bugs: Make spectre user default depend on MITIGATION_SPECTRE_V2") in stable kernel which landed in in 5.4.294, 5.10.238, 5.15.185, 6.1.141 and 6.6.93 stable versions. Cc: David.Kaplan@amd.com Cc: peterz@infradead.org Cc: pawan.kumar.gupta@linux.intel.com Cc: mingo@kernel.org Cc: brad.spengler@opensrcsec.com Cc: stable@vger.kernel.org # 6.6 6.1 5.15 5.10 5.4 Reported-by: Brad Spengler <brad.spengler@opensrcsec.com> Reported-by: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-04x86/bugs: Make spectre user default depend on MITIGATION_SPECTRE_V2Breno Leitao1-0/+2
[ Upstream commit 98fdaeb296f51ef08e727a7cc72e5b5c864c4f4d ] Change the default value of spectre v2 in user mode to respect the CONFIG_MITIGATION_SPECTRE_V2 config option. Currently, user mode spectre v2 is set to auto (SPECTRE_V2_USER_CMD_AUTO) by default, even if CONFIG_MITIGATION_SPECTRE_V2 is disabled. Set the spectre_v2 value to auto (SPECTRE_V2_USER_CMD_AUTO) if the Spectre v2 config (CONFIG_MITIGATION_SPECTRE_V2) is enabled, otherwise set the value to none (SPECTRE_V2_USER_CMD_NONE). Important to say the command line argument "spectre_v2_user" overwrites the default value in both cases. When CONFIG_MITIGATION_SPECTRE_V2 is not set, users have the flexibility to opt-in for specific mitigations independently. In this scenario, setting spectre_v2= will not enable spectre_v2_user=, and command line options spectre_v2_user and spectre_v2 are independent when CONFIG_MITIGATION_SPECTRE_V2=n. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Kaplan <David.Kaplan@amd.com> Link: https://lore.kernel.org/r/20241031-x86_bugs_last_v2-v2-2-b7ff1dab840e@debian.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-18x86/its: Add support for RSB stuffing mitigationPawan Gupta1-0/+3
commit facd226f7e0c8ca936ac114aba43cb3e8b94e41e upstream. When retpoline mitigation is enabled for spectre-v2, enabling call-depth-tracking and RSB stuffing also mitigates ITS. Add cmdline option indirect_target_selection=stuff to allow enabling RSB stuffing mitigation. When retpoline mitigation is not enabled, =stuff option is ignored, and default mitigation for ITS is deployed. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-18x86/its: Add "vmexit" option to skip mitigation on some CPUsPawan Gupta1-0/+2
commit 2665281a07e19550944e8354a2024635a7b2714a upstream. Ice Lake generation CPUs are not affected by guest/host isolation part of ITS. If a user is only concerned about KVM guests, they can now choose a new cmdline option "vmexit" that will not deploy the ITS mitigation when CPU is not affected by guest/host isolation. This saves the performance overhead of ITS mitigation on Ice Lake gen CPUs. When "vmexit" option selected, if the CPU is affected by ITS guest/host isolation, the default ITS mitigation is deployed. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-18x86/its: Enable Indirect Target Selection mitigationPawan Gupta1-0/+13
commit f4818881c47fd91fcb6d62373c57c7844e3de1c0 upstream. Indirect Target Selection (ITS) is a bug in some pre-ADL Intel CPUs with eIBRS. It affects prediction of indirect branch and RETs in the lower half of cacheline. Due to ITS such branches may get wrongly predicted to a target of (direct or indirect) branch that is located in the upper half of the cacheline. Scope of impact =============== Guest/host isolation -------------------- When eIBRS is used for guest/host isolation, the indirect branches in the VMM may still be predicted with targets corresponding to branches in the guest. Intra-mode ---------- cBPF or other native gadgets can be used for intra-mode training and disclosure using ITS. User/kernel isolation --------------------- When eIBRS is enabled user/kernel isolation is not impacted. Indirect Branch Prediction Barrier (IBPB) ----------------------------------------- After an IBPB, indirect branches may be predicted with targets corresponding to direct branches which were executed prior to IBPB. This is mitigated by a microcode update. Add cmdline parameter indirect_target_selection=off|on|force to control the mitigation to relocate the affected branches to an ITS-safe thunk i.e. located in the upper half of cacheline. Also add the sysfs reporting. When retpoline mitigation is deployed, ITS safe-thunks are not needed, because retpoline sequence is already ITS-safe. Similarly, when call depth tracking (CDT) mitigation is deployed (retbleed=stuff), ITS safe return thunk is not used, as CDT prevents RSB-underflow. To not overcomplicate things, ITS mitigation is not supported with spectre-v2 lfence;jmp mitigation. Moreover, it is less practical to deploy lfence;jmp mitigation on ITS affected parts anyways. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07x86/microcode: Prepare for minimal revision checkThomas Gleixner1-0/+5
commit 9407bda845dd19756e276d4f3abc15a20777ba45 upstream Applying microcode late can be fatal for the running kernel when the update changes functionality which is in use already in a non-compatible way, e.g. by removing a CPUID bit. There is no way for admins which do not have access to the vendors deep technical support to decide whether late loading of such a microcode is safe or not. Intel has added a new field to the microcode header which tells the minimal microcode revision which is required to be active in the CPU in order to be safe. Provide infrastructure for handling this in the core code and a command line switch which allows to enforce it. If the update is considered safe the kernel is not tainted and the annoying warning message not emitted. If it's enforced and the currently loaded microcode revision is not safe for late loading then the load is aborted. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211724.079611170@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10proc: add config & param to block forcing mem writesAdrian Ratiu1-0/+10
[ Upstream commit 41e8149c8892ed1962bd15350b3c3e6e90cba7f4 ] This adds a Kconfig option and boot param to allow removing the FOLL_FORCE flag from /proc/pid/mem write calls because it can be abused. The traditional forcing behavior is kept as default because it can break GDB and some other use cases. Previously we tried a more sophisticated approach allowing distributions to fine-tune /proc/pid/mem behavior, however that got NAK-ed by Linus [1], who prefers this simpler approach with semantics also easier to understand for users. Link: https://lore.kernel.org/lkml/CAHk-=wiGWLChxYmUA5HrT5aopZrB7_2VTa0NLZcxORgkUe5tEQ@mail.gmail.com/ [1] Cc: Doug Anderson <dianders@chromium.org> Cc: Jeff Xu <jeffxu@google.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Link: https://lore.kernel.org/r/20240802080225.89408-1-adrian.ratiu@collabora.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14clocksource: Scale the watchdog read retries automaticallyFeng Tang1-6/+0
[ Upstream commit 2ed08e4bc53298db3f87b528cd804cb0cce066a9 ] On a 8-socket server the TSC is wrongly marked as 'unstable' and disabled during boot time on about one out of 120 boot attempts: clocksource: timekeeping watchdog on CPU227: wd-tsc-wd excessive read-back delay of 153560ns vs. limit of 125000ns, wd-wd read-back delay only 11440ns, attempt 3, marking tsc unstable tsc: Marking TSC unstable due to clocksource watchdog TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'. sched_clock: Marking unstable (119294969739, 159204297)<-(125446229205, -5992055152) clocksource: Checking clocksource tsc synchronization from CPU 319 to CPUs 0,99,136,180,210,542,601,896. clocksource: Switched to clocksource hpet The reason is that for platform with a large number of CPUs, there are sporadic big or huge read latencies while reading the watchog/clocksource during boot or when system is under stress work load, and the frequency and maximum value of the latency goes up with the number of online CPUs. The cCurrent code already has logic to detect and filter such high latency case by reading the watchdog twice and checking the two deltas. Due to the randomness of the latency, there is a low probabilty that the first delta (latency) is big, but the second delta is small and looks valid. The watchdog code retries the readouts by default twice, which is not necessarily sufficient for systems with a large number of CPUs. There is a command line parameter 'max_cswd_read_retries' which allows to increase the number of retries, but that's not user friendly as it needs to be tweaked per system. As the number of required retries is proportional to the number of online CPUs, this parameter can be calculated at runtime. Scale and enlarge the number of retries according to the number of online CPUs and remove the command line parameter completely. [ tglx: Massaged change log and comments ] Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jin Wang <jin1.wang@intel.com> Tested-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20240221060859.1027450-1-feng.tang@intel.com Stable-dep-of: f2655ac2c06a ("clocksource: Fix brown-bag boolean thinko in cs_watchdog_read()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14profiling: remove profile=sleep supportTetsuo Handa1-3/+1
commit b88f55389ad27f05ed84af9e1026aa64dbfabc9a upstream. The kernel sleep profile is no longer working due to a recursive locking bug introduced by commit 42a20f86dc19 ("sched: Add wrapper for get_wchan() to keep task blocked") Booting with the 'profile=sleep' kernel command line option added or executing # echo -n sleep > /sys/kernel/profiling after boot causes the system to lock up. Lockdep reports kthreadd/3 is trying to acquire lock: ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: get_wchan+0x32/0x70 but task is already holding lock: ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: try_to_wake_up+0x53/0x370 with the call trace being lock_acquire+0xc8/0x2f0 get_wchan+0x32/0x70 __update_stats_enqueue_sleeper+0x151/0x430 enqueue_entity+0x4b0/0x520 enqueue_task_fair+0x92/0x6b0 ttwu_do_activate+0x73/0x140 try_to_wake_up+0x213/0x370 swake_up_locked+0x20/0x50 complete+0x2f/0x40 kthread+0xfb/0x180 However, since nobody noticed this regression for more than two years, let's remove 'profile=sleep' support based on the assumption that nobody needs this functionality. Fixes: 42a20f86dc19 ("sched: Add wrapper for get_wchan() to keep task blocked") Cc: stable@vger.kernel.org # v5.16+ Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-12cpu: Ignore "mitigations" kernel parameter if CPU_MITIGATIONS=nSean Christopherson1-0/+3
[ Upstream commit ce0abef6a1d540acef85068e0e82bdf1fbeeb0e9 ] Explicitly disallow enabling mitigations at runtime for kernels that were built with CONFIG_CPU_MITIGATIONS=n, as some architectures may omit code entirely if mitigations are disabled at compile time. E.g. on x86, a large pile of Kconfigs are buried behind CPU_MITIGATIONS, and trying to provide sane behavior for retroactively enabling mitigations is extremely difficult, bordering on impossible. E.g. page table isolation and call depth tracking require build-time support, BHI mitigations will still be off without additional kernel parameters, etc. [ bp: Touchups. ] Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240420000556.2645001-3-seanjc@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-27usb: new quirk to reduce the SET_ADDRESS request timeoutHardik Gajjar1-0/+3
[ Upstream commit 5a1ccf0c72cf917ff3ccc131d1bb8d19338ffe52 ] This patch introduces a new USB quirk, USB_QUIRK_SHORT_SET_ADDRESS_REQ_TIMEOUT, which modifies the timeout value for the SET_ADDRESS request. The standard timeout for USB request/command is 5000 ms, as recommended in the USB 3.2 specification (section 9.2.6.1). However, certain scenarios, such as connecting devices through an APTIV hub, can lead to timeout errors when the device enumerates as full speed initially and later switches to high speed during chirp negotiation. In such cases, USB analyzer logs reveal that the bus suspends for 5 seconds due to incorrect chirp parsing and resumes only after two consecutive timeout errors trigger a hub driver reset. Packet(54) Dir(?) Full Speed J(997.100 us) Idle( 2.850 us) _______| Time Stamp(28 . 105 910 682) _______|_____________________________________________________________Ch0 Packet(55) Dir(?) Full Speed J(997.118 us) Idle( 2.850 us) _______| Time Stamp(28 . 106 910 632) _______|_____________________________________________________________Ch0 Packet(56) Dir(?) Full Speed J(399.650 us) Idle(222.582 us) _______| Time Stamp(28 . 107 910 600) _______|_____________________________________________________________Ch0 Packet(57) Dir Chirp J( 23.955 ms) Idle(115.169 ms) _______| Time Stamp(28 . 108 532 832) _______|_____________________________________________________________Ch0 Packet(58) Dir(?) Full Speed J (Suspend)( 5.347 sec) Idle( 5.366 us) _______| Time Stamp(28 . 247 657 600) _______|_____________________________________________________________Ch0 This 5-second delay in device enumeration is undesirable, particularly in automotive applications where quick enumeration is crucial (ideally within 3 seconds). The newly introduced quirks provide the flexibility to align with a 3-second time limit, as required in specific contexts like automotive applications. By reducing the SET_ADDRESS request timeout to 500 ms, the system can respond more swiftly to errors, initiate rapid recovery, and ensure efficient device enumeration. This change is vital for scenarios where rapid smartphone enumeration and screen projection are essential. To use the quirk, please write "vendor_id:product_id:p" to /sys/bus/usb/drivers/hub/module/parameter/quirks For example, echo "0x2c48:0x0132:p" > /sys/bus/usb/drivers/hub/module/parameters/quirks" Signed-off-by: Hardik Gajjar <hgajjar@de.adit-jv.com> Reviewed-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/20231027152029.104363-2-hgajjar@de.adit-jv.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-17x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=autoJosh Poimboeuf1-3/+0
commit 36d4fe147c870f6d3f6602befd7ef44393a1c87a upstream. Unlike most other mitigations' "auto" options, spectre_bhi=auto only mitigates newer systems, which is confusing and not particularly useful. Remove it. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/412e9dc87971b622bbbaf64740ebc1f140bff343.1712813475.git.jpoimboe@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-17x86/bugs: Clarify that syscall hardening isn't a BHI mitigationJosh Poimboeuf1-2/+1
commit 5f882f3b0a8bf0788d5a0ee44b1191de5319bb8a upstream. While syscall hardening helps prevent some BHI attacks, there's still other low-hanging fruit remaining. Don't classify it as a mitigation and make it clear that the system may still be vulnerable if it doesn't have a HW or SW mitigation enabled. Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/b5951dae3fdee7f1520d5136a27be3bdfe95f88b.1712813475.git.jpoimboe@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-17x86/bugs: Fix BHI documentationJosh Poimboeuf1-5/+7
commit dfe648903f42296866d79f10d03f8c85c9dfba30 upstream. Fix up some inaccuracies in the BHI documentation. Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/8c84f7451bfe0dd08543c6082a383f390d4aa7e2.1712813475.git.jpoimboe@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-10x86/bhi: Mitigate KVM by defaultPawan Gupta1-2/+3
commit 95a6ccbdc7199a14b71ad8901cb788ba7fb5167b upstream. BHI mitigation mode spectre_bhi=auto does not deploy the software mitigation by default. In a cloud environment, it is a likely scenario where userspace is trusted but the guests are not trusted. Deploying system wide mitigation in such cases is not desirable. Update the auto mode to unconditionally mitigate against malicious guests. Deploy the software sequence at VMexit in auto mode also, when hardware mitigation is not available. Unlike the force =on mode, software sequence is not deployed at syscalls in auto mode. Suggested-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-10x86/bhi: Add BHI mitigation knobPawan Gupta1-0/+11
commit ec9404e40e8f36421a2b66ecb76dc2209fe7f3ef upstream. Branch history clearing software sequences and hardware control BHI_DIS_S were defined to mitigate Branch History Injection (BHI). Add cmdline spectre_bhi={on|off|auto} to control BHI mitigation: auto - Deploy the hardware mitigation BHI_DIS_S, if available. on - Deploy the hardware mitigation BHI_DIS_S, if available, otherwise deploy the software sequence at syscall entry and VMexit. off - Turn off BHI mitigation. The default is auto mode which does not deploy the software sequence mitigation. This is because of the hardening done in the syscall dispatch path, which is the likely target of BHI. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-03x86/Kconfig: Remove CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULTBorislav Petkov (AMD)1-3/+1
commit 29956748339aa8757a7e2f927a8679dd08f24bb6 upstream. It was meant well at the time but nothing's using it so get rid of it. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240202163510.GDZb0Zvj8qOndvFOiZ@fat_crate.local Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-15x86/rfds: Mitigate Register File Data Sampling (RFDS)Pawan Gupta1-0/+21
commit 8076fcde016c9c0e0660543e67bff86cb48a7c9c upstream. RFDS is a CPU vulnerability that may allow userspace to infer kernel stale data previously used in floating point registers, vector registers and integer registers. RFDS only affects certain Intel Atom processors. Intel released a microcode update that uses VERW instruction to clear the affected CPU buffers. Unlike MDS, none of the affected cores support SMT. Add RFDS bug infrastructure and enable the VERW based mitigation by default, that clears the affected buffers just before exiting to userspace. Also add sysfs reporting and cmdline parameter "reg_file_data_sampling" to control the mitigation. For details see: Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28smp,csd: Throw an error if a CSD lock is stuck for too longRik van Riel1-0/+7
[ Upstream commit 94b3f0b5af2c7af69e3d6e0cdd9b0ea535f22186 ] The CSD lock seems to get stuck in 2 "modes". When it gets stuck temporarily, it usually gets released in a few seconds, and sometimes up to one or two minutes. If the CSD lock stays stuck for more than several minutes, it never seems to get unstuck, and gradually more and more things in the system end up also getting stuck. In the latter case, we should just give up, so the system can dump out a little more information about what went wrong, and, with panic_on_oops and a kdump kernel loaded, dump a whole bunch more information about what might have gone wrong. In addition, there is an smp.panic_on_ipistall kernel boot parameter that by default retains the old behavior, but when set enables the panic after the CSD lock has been stuck for more than the specified number of milliseconds, as in 300,000 for five minutes. [ paulmck: Apply Imran Khan feedback. ] [ paulmck: Apply Leonardo Bras feedback. ] Link: https://lore.kernel.org/lkml/bc7cc8b0-f587-4451-8bcd-0daae627bcc7@paulmck-laptop/ Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Imran Khan <imran.f.khan@oracle.com> Reviewed-by: Leonardo Bras <leobras@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-02Merge tag 'wq-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-9/+19
Pull workqueue updates from Tejun Heo: - Unbound workqueues now support more flexible affinity scopes. The default behavior is to soft-affine according to last level cache boundaries. A work item queued from a given LLC is executed by a worker running on the same LLC but the worker may be moved across cache boundaries as the scheduler sees fit. On machines which multiple L3 caches, which are becoming more popular along with chiplet designs, this improves cache locality while not harming work conservation too much. Unbound workqueues are now also a lot more flexible in terms of execution affinity. Differeing levels of affinity scopes are supported and both the default and per-workqueue affinity settings can be modified dynamically. This should help working around amny of sub-optimal behaviors observed recently with asymmetric ARM CPUs. This involved signficant restructuring of workqueue code. Nothing was reported yet but there's some risk of subtle regressions. Should keep an eye out. - Rescuer workers now has more identifiable comms. - workqueue.unbound_cpus added so that CPUs which can be used by workqueue can be constrained early during boot. - Now that all the in-tree users have been flushed out, trigger warning if system-wide workqueues are flushed. * tag 'wq-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (31 commits) workqueue: fix data race with the pwq->stats[] increment workqueue: Rename rescuer kworker workqueue: Make default affinity_scope dynamically updatable workqueue: Add "Affinity Scopes and Performance" section to documentation workqueue: Implement non-strict affinity scope for unbound workqueues workqueue: Add workqueue_attrs->__pod_cpumask workqueue: Factor out need_more_worker() check and worker wake-up workqueue: Factor out work to worker assignment and collision handling workqueue: Add multiple affinity scopes and interface to select them workqueue: Modularize wq_pod_type initialization workqueue: Add tools/workqueue/wq_dump.py which prints out workqueue configuration workqueue: Generalize unbound CPU pods workqueue: Factor out clearing of workqueue-only attrs fields workqueue: Factor out actual cpumask calculation to reduce subtlety in wq_update_pod() workqueue: Initialize unbound CPU pods later in the boot workqueue: Move wq_pod_init() below workqueue_init() workqueue: Rename NUMA related names to use pod instead workqueue: Rename workqueue_attrs->no_numa to ->ordered workqueue: Make unbound workqueues to use per-cpu pool_workqueues workqueue: Call wq_update_unbound_numa() on all CPUs in NUMA node on CPU hotplug ...
2023-09-01Merge tag 'usb-6.6-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB / Thunderbolt / PHY driver updates from Greg KH: "Here is the big set of USB, Thunderbolt, and PHY driver updates for 6.6-rc1. Included in here are: - PHY driver additions and cleanups - Thunderbolt minor additions and fixes - USB MIDI 2 gadget support added - dwc3 driver updates and additions - Removal of some old USB wireless code that was missed when that codebase was originally removed a few years ago, cleaning up some core USB code paths - USB core potential use-after-free fixes that syzbot from different people/groups keeps tripping over - typec updates and additions - gadget fixes and cleanups - loads of smaller USB core and driver cleanups all over the place Full details are in the shortlog. All of these have been in linux-next for a while with no reported problems" * tag 'usb-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (154 commits) platform/chrome: cros_ec_typec: Configure Retimer cable type tcpm: Avoid soft reset when partner does not support get_status usb: typec: tcpm: reset counter when enter into unattached state after try role usb: typec: tcpm: set initial svdm version based on pd revision USB: serial: option: add FOXCONN T99W368/T99W373 product USB: serial: option: add Quectel EM05G variant (0x030e) usb: dwc2: add pci_device_id driver_data parse support usb: gadget: remove max support speed info in bind operation usb: gadget: composite: cleanup function config_ep_by_speed_and_alt() usb: gadget: config: remove max speed check in usb_assign_descriptors() usb: gadget: unconditionally allocate hs/ss descriptor in bind operation usb: gadget: f_uvc: change endpoint allocation in uvc_function_bind() usb: gadget: add a inline function gether_bitrate() usb: gadget: use working speed to calcaulate network bitrate and qlen dt-bindings: usb: samsung,exynos-dwc3: Add Exynos850 support usb: dwc3: exynos: Add support for Exynos850 variant usb: gadget: udc-xilinx: fix incorrect type in assignment warning usb: gadget: udc-xilinx: fix cast from restricted __le16 warning usb: gadget: udc-xilinx: fix restricted __le16 degrades to integer warning USB: dwc2: hande irq on dead controller correctly ...
2023-09-01Merge tag 'riscv-for-linus-6.6-mw1' of ↵Linus Torvalds1-7/+15
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for the new "riscv,isa-extensions" and "riscv,isa-base" device tree interfaces for probing extensions - Support for userspace access to the performance counters - Support for more instructions in kprobes - Crash kernels can be allocated above 4GiB - Support for KCFI - Support for ELFs in !MMU configurations - ARCH_KMALLOC_MINALIGN has been reduced to 8 - mmap() defaults to sv48-sized addresses, with longer addresses hidden behind a hint (similar to Arm and Intel) - Also various fixes and cleanups * tag 'riscv-for-linus-6.6-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits) lib/Kconfig.debug: Restrict DEBUG_INFO_SPLIT for RISC-V riscv: support PREEMPT_DYNAMIC with static keys riscv: Move create_tmp_mapping() to init sections riscv: Mark KASAN tmp* page tables variables as static riscv: mm: use bitmap_zero() API riscv: enable DEBUG_FORCE_FUNCTION_ALIGN_64B riscv: remove redundant mv instructions RISC-V: mm: Document mmap changes RISC-V: mm: Update pgtable comment documentation RISC-V: mm: Add tests for RISC-V mm RISC-V: mm: Restrict address space for sv39,sv48,sv57 riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC for !dma_coherent riscv: allow kmalloc() caches aligned to the smallest value riscv: support the elf-fdpic binfmt loader binfmt_elf_fdpic: support 64-bit systems riscv: Allow CONFIG_CFI_CLANG to be selected riscv/purgatory: Disable CFI riscv: Add CFI error handling riscv: Add ftrace_stub_graph riscv: Add types to indirectly called assembly functions ...
2023-08-31Merge tag 'powerpc-6.6-1' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add HOTPLUG_SMT support (/sys/devices/system/cpu/smt) and honour the configured SMT state when hotplugging CPUs into the system - Combine final TLB flush and lazy TLB mm shootdown IPIs when using the Radix MMU to avoid a broadcast TLBIE flush on exit - Drop the exclusion between ptrace/perf watchpoints, and drop the now unused associated arch hooks - Add support for the "nohlt" command line option to disable CPU idle - Add support for -fpatchable-function-entry for ftrace, with GCC >= 13.1 - Rework memory block size determination, and support 256MB size on systems with GPUs that have hotpluggable memory - Various other small features and fixes Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Benjamin Gray, Christophe Leroy, Frederic Barrat, Gautam Menghani, Geoff Levand, Hari Bathini, Immad Mir, Jialin Zhang, Joel Stanley, Jordan Niethe, Justin Stitt, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent Dufour, Liang He, Linus Walleij, Mahesh Salgaonkar, Masahiro Yamada, Michal Suchanek, Nageswara R Sastry, Nathan Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nick Desaulniers, Omar Sandoval, Randy Dunlap, Reza Arbab, Rob Herring, Russell Currey, Sourabh Jain, Thomas Gleixner, Trevor Woerner, Uwe Kleine-König, Vaibhav Jain, Xiongfeng Wang, Yuan Tan, Zhang Rui, and Zheng Zengkai. * tag 'powerpc-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (135 commits) macintosh/ams: linux/platform_device.h is needed powerpc/xmon: Reapply "Relax frame size for clang" powerpc/mm/book3s64: Use 256M as the upper limit with coherent device memory attached powerpc/mm/book3s64: Fix build error with SPARSEMEM disabled powerpc/iommu: Fix notifiers being shared by PCI and VIO buses powerpc/mpc5xxx: Add missing fwnode_handle_put() powerpc/config: Disable SLAB_DEBUG_ON in skiroot powerpc/pseries: Remove unused hcall tracing instruction powerpc/pseries: Fix hcall tracepoints with JUMP_LABEL=n powerpc: dts: add missing space before { powerpc/eeh: Use pci_dev_id() to simplify the code powerpc/64s: Move CPU -mtune options into Kconfig powerpc/powermac: Fix unused function warning powerpc/pseries: Rework lppaca_shared_proc() to avoid DEBUG_PREEMPT powerpc: Don't include lppaca.h in paca.h powerpc/pseries: Move hcall_vphn() prototype into vphn.h powerpc/pseries: Move VPHN constants into vphn.h cxl: Drop unused detach_spa() powerpc: Drop zalloc_maybe_bootmem() powerpc/powernv: Use struct opal_prd_msg in more places ...
2023-08-31Merge patch series "support allocating crashkernel above 4G explicitly on riscv"Palmer Dabbelt1-7/+8
Chen Jiahao <chenjiahao16@huawei.com> says: On riscv, the current crash kernel allocation logic is trying to allocate within 32bit addressible memory region by default, if failed, try to allocate without 4G restriction. In need of saving DMA zone memory while allocating a relatively large crash kernel region, allocating the reserved memory top down in high memory, without overlapping the DMA zone, is a mature solution. Hence this patchset introduces the parameter option crashkernel=X,[high,low]. One can reserve the crash kernel from high memory above DMA zone range by explicitly passing "crashkernel=X,high"; or reserve a memory range below 4G with "crashkernel=X,low". Besides, there are few rules need to take notice: 1. "crashkernel=X,[high,low]" will be ignored if "crashkernel=size" is specified. 2. "crashkernel=X,low" is valid only when "crashkernel=X,high" is passed and there is enough memory to be allocated under 4G. 3. When allocating crashkernel above 4G and no "crashkernel=X,low" is specified, a 128M low memory will be allocated automatically for swiotlb bounce buffer. See Documentation/admin-guide/kernel-parameters.txt for more information. To verify loading the crashkernel, adapted kexec-tools is attached below: https://github.com/chenjh005/kexec-tools/tree/build-test-riscv-v2 Following test cases have been performed as expected: 1) crashkernel=256M //low=256M 2) crashkernel=1G //low=1G 3) crashkernel=4G //high=4G, low=128M(default) 4) crashkernel=4G crashkernel=256M,high //high=4G, low=128M(default), high is ignored 5) crashkernel=4G crashkernel=256M,low //high=4G, low=128M(default), low is ignored 6) crashkernel=4G,high //high=4G, low=128M(default) 7) crashkernel=256M,low //low=0M, invalid 8) crashkernel=4G,high crashkernel=256M,low //high=4G, low=256M 9) crashkernel=4G,high crashkernel=4G,low //high=0M, low=0M, invalid 10) crashkernel=512M@0xd0000000 //low=512M 11) crashkernel=1G,high crashkernel=0M,low //high=1G, low=0M * b4-shazam-merge: docs: kdump: Update the crashkernel description for riscv riscv: kdump: Implement crashkernel=X,[high,low] Link: https://lore.kernel.org/r/20230726175000.2536220-1-chenjiahao16@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31Merge tag 'docs-6.6' of git://git.lwn.net/linuxLinus Torvalds1-23/+23
Pull documentation updates from Jonathan Corbet: "Documentation work keeps chugging along; this includes: - Work from Carlos Bilbao to integrate rustdoc output into the generated HTML documentation. This took some work to figure out how to do it without slowing the docs build and without creating people who don't have Rust installed, but Carlos got there - Move the loongarch and mips architecture documentation under Documentation/arch/ - Some more maintainer documentation from Jakub ... plus the usual assortment of updates, translations, and fixes" * tag 'docs-6.6' of git://git.lwn.net/linux: (56 commits) Docu: genericirq.rst: fix irq-example input: docs: pxrc: remove reference to phoenix-sim Documentation: serial-console: Fix literal block marker docs/mm: remove references to hmm_mirror ops and clean typos docs/zh_CN: correct regi_chg(),regi_add() to region_chg(),region_add() Documentation: Fix typos Documentation/ABI: Fix typos scripts: kernel-doc: fix macro handling in enums scripts: kernel-doc: parse DEFINE_DMA_UNMAP_[ADDR|LEN] Documentation: riscv: Update boot image header since EFI stub is supported Documentation: riscv: Add early boot document Documentation: arm: Add bootargs to the table of added DT parameters docs: kernel-parameters: Refer to the correct bitmap function doc: update params of memhp_default_state= docs: Add book to process/kernel-docs.rst docs: sparse: fix invalid link addresses docs: vfs: clean up after the iterate() removal docs: Add a section on surveys to the researcher guidelines docs: move mips under arch docs: move loongarch under arch ...
2023-08-30Merge tag 'dma-mapping-6.6-2023-08-29' of ↵Linus Torvalds1-1/+12
git://git.infradead.org/users/hch/dma-mapping Pull dma-maping updates from Christoph Hellwig: - allow dynamic sizing of the swiotlb buffer, to cater for secure virtualization workloads that require all I/O to be bounce buffered (Petr Tesarik) - move a declaration to a header (Arnd Bergmann) - check for memory region overlap in dma-contiguous (Binglei Wang) - remove the somewhat dangerous runtime swiotlb-xen enablement and unexport is_swiotlb_active (Christoph Hellwig, Juergen Gross) - per-node CMA improvements (Yajun Deng) * tag 'dma-mapping-6.6-2023-08-29' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: optimize get_max_slots() swiotlb: move slot allocation explanation comment where it belongs swiotlb: search the software IO TLB only if the device makes use of it swiotlb: allocate a new memory pool when existing pools are full swiotlb: determine potential physical address limit swiotlb: if swiotlb is full, fall back to a transient memory pool swiotlb: add a flag whether SWIOTLB is allowed to grow swiotlb: separate memory pool data from other allocator data swiotlb: add documentation and rename swiotlb_do_find_slots() swiotlb: make io_tlb_default_mem local to swiotlb.c swiotlb: bail out of swiotlb_init_late() if swiotlb is already allocated dma-contiguous: check for memory region overlap dma-contiguous: support numa CMA for specified node dma-contiguous: support per-numa CMA for all architectures dma-mapping: move arch_dma_set_mask() declaration to header swiotlb: unexport is_swiotlb_active x86: always initialize xen-swiotlb when xen-pcifront is enabling xen/pci: add flag for PCI passthrough being possible
2023-08-29Merge tag 'tpmdd-v6.6' of ↵Linus Torvalds1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm updates from Jarkko Sakkinen: - Restrict linking of keys to .ima and .evm keyrings based on digitalSignature attribute in the certificate - PowerVM: load machine owner keys into the .machine [1] keyring - PowerVM: load module signing keys into the secondary trusted keyring (keys blessed by the vendor) - tpm_tis_spi: half-duplex transfer mode - tpm_tis: retry corrupted transfers - Apply revocation list (.mokx) to an all system keyrings (e.g. .machine keyring) Link: https://blogs.oracle.com/linux/post/the-machine-keyring [1] * tag 'tpmdd-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: certs: Reference revocation list for all keyrings tpm/tpm_tis_synquacer: Use module_platform_driver macro to simplify the code tpm: remove redundant variable len tpm_tis: Resend command to recover from data transfer errors tpm_tis: Use responseRetry to recover from data transfer errors tpm_tis: Move CRC check to generic send routine tpm_tis_spi: Add hardware wait polling KEYS: Replace all non-returning strlcpy with strscpy integrity: PowerVM support for loading third party code signing keys integrity: PowerVM machine keyring enablement integrity: check whether imputed trust is enabled integrity: remove global variable from machine_keyring.c integrity: ignore keys failing CA restrictions on non-UEFI platform integrity: PowerVM support for loading CA keys on machine keyring integrity: Enforce digitalSignature usage in the ima and evm keyrings KEYS: DigitalSignature link restriction tpm_tis: Revert "tpm_tis: Disable interrupts on ThinkPad T490s"
2023-08-29Merge tag 'acpi-6.6-rc1' of ↵Linus Torvalds1-4/+0
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These include new ACPICA material, a rework of the ACPI thermal driver, a switch-over of the ACPI processor driver to using _OSC instead of (long deprecated) _PDC for CPU initialization, a rework of firmware notifications handling in several drivers, fixes and cleanups for suspend-to-idle handling on AMD systems, ACPI backlight driver updates and more. Specifics: - Update the ACPICA code in the kernel to upstream revision 20230628 including the following changes: - Suppress a GCC 12 dangling-pointer warning (Philip Prindeville) - Reformat the ACPI_STATE_COMMON macro and its users (George Guo) - Replace the ternary operator with ACPI_MIN() (Jiangshan Yi) - Add support for _DSC as per ACPI 6.5 (Saket Dumbre) - Remove a duplicate macro from zephyr header (Najumon B.A) - Add data structures for GED and _EVT tracking (Jose Marinho) - Fix misspelled CDAT DSMAS define (Dave Jiang) - Simplify an error message in acpi_ds_result_push() (Christophe Jaillet) - Add a struct size macro related to SRAT (Dave Jiang) - Add AML_NO_OPERAND_RESOLVE flag to Timer (Abhishek Mainkar) - Add support for RISC-V external interrupt controllers in MADT (Sunil V L) - Add RHCT flags, CMO and MMU nodes (Sunil V L) - Change ACPICA version to 20230628 (Bob Moore) - Introduce new wrappers for ACPICA notify handler install/remove and convert multiple drivers to using their own Notify() handlers instead of the ACPI bus type .notify() slated for removal (Michal Wilczynski) - Add backlight=native DMI quirk for Apple iMac12,1 and iMac12,2 (Hans de Goede) - Put ACPI video and its child devices explicitly into D0 on boot to avoid platform firmware confusion (Kai-Heng Feng) - Add backlight=native DMI quirk for Lenovo Ideapad Z470 (Jiri Slaby) - Support obtaining physical CPU ID from MADT on LoongArch (Bibo Mao) - Convert ACPI CPU initialization to using _OSC instead of _PDC that has been depreceted since 2018 and dropped from the specification in ACPI 6.5 (Michal Wilczynski, Rafael Wysocki) - Drop non-functional nocrt parameter from ACPI thermal (Mario Limonciello) - Clean up the ACPI thermal driver, rework the handling of firmware notifications in it and make it provide a table of generic trip point structures to the core during initialization (Rafael Wysocki) - Defer enumeration of devices with _DEP pointing to IVSC (Wentong Wu) - Install SystemCMOS address space handler for ACPI000E (TAD) to meet platform firmware expectations on some platforms (Zhang Rui) - Fix finding the generic error data in the ACPi extlog driver for compatibility with old and new firmware interface versions (Xiaochun Lee) - Remove assorted unused declarations of functions (Yue Haibing) - Move AMBA bus scan handling into arm64 specific directory (Sudeep Holla) - Fix and clean up suspend-to-idle interface for AMD systems (Mario Limonciello, Andy Shevchenko) - Fix string truncation warning in pnpacpi_add_device() (Sunil V L)" * tag 'acpi-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (66 commits) ACPI: x86: s2idle: Add a function to get LPS0 constraint for a device ACPI: x86: s2idle: Add for_each_lpi_constraint() helper ACPI: x86: s2idle: Add more debugging for AMD constraints parsing ACPI: x86: s2idle: Fix a logic error parsing AMD constraints table ACPI: x86: s2idle: Catch multiple ACPI_TYPE_PACKAGE objects ACPI: x86: s2idle: Post-increment variables when getting constraints ACPI: Adjust #ifdef for *_lps0_dev use ACPI: TAD: Install SystemCMOS address space handler for ACPI000E ACPI: Remove assorted unused declarations of functions ACPI: extlog: Fix finding the generic error data for v3 structure PNP: ACPI: Fix string truncation warning ACPI: Remove unused extern declaration acpi_paddr_to_node() ACPI: video: Add backlight=native DMI quirk for Apple iMac12,1 and iMac12,2 ACPI: video: Put ACPI video and its child devices into D0 on boot ACPI: processor: LoongArch: Get physical ID from MADT ACPI: scan: Defer enumeration of devices with a _DEP pointing to IVSC device ACPI: thermal: Eliminate code duplication from acpi_thermal_notify() ACPI: thermal: Drop unnecessary thermal zone callbacks ACPI: thermal: Rework thermal_get_trend() ACPI: thermal: Use trip point table to register thermal zones ...
2023-08-29Merge tag 's390-6.6-1' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: - Add vfio-ap support to pass-through crypto devices to secure execution guests - Add API ordinal 6 support to zcrypt_ep11misc device drive, which is required to handle key generate and key derive (e.g. secure key to protected key) correctly - Add missing secure/has_secure sysfs files for the case where it is not possible to figure where a system has been booted from. Existing user space relies on that these files are always present - Fix DCSS block device driver list corruption, caused by incorrect error handling - Convert virt_to_pfn() and pfn_to_virt() from defines to static inline functions to enforce type checking - Cleanups, improvements, and minor fixes to the kernel mapping setup - Fix various virtual vs physical address confusions - Move pfault code to separate file, since it has nothing to do with regular fault handling - Move s390 documentation to Documentation/arch/ like it has been done for other architectures already - Add HAVE_FUNCTION_GRAPH_RETVAL support - Factor out the s390_hypfs filesystem and add a new config option for it. The filesystem is deprecated and as soon as all users are gone it can be removed some time in the not so near future - Remove support for old CEX2 and CEX3 crypto cards from zcrypt device driver - Add support for user-defined certificates: receive user-defined certificates with a diagnose call and provide them via 'cert_store' keyring to user space - Couple of other small fixes and improvements all over the place * tag 's390-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (66 commits) s390/pci: use builtin_misc_device macro to simplify the code s390/vfio-ap: make sure nib is shared KVM: s390: export kvm_s390_pv*_is_protected functions s390/uv: export uv_pin_shared for direct usage s390/vfio-ap: check for TAPQ response codes 0x35 and 0x36 s390/vfio-ap: handle queue state change in progress on reset s390/vfio-ap: use work struct to verify queue reset s390/vfio-ap: store entire AP queue status word with the queue object s390/vfio-ap: remove upper limit on wait for queue reset to complete s390/vfio-ap: allow deconfigured queue to be passed through to a guest s390/vfio-ap: wait for response code 05 to clear on queue reset s390/vfio-ap: clean up irq resources if possible s390/vfio-ap: no need to check the 'E' and 'I' bits in APQSW after TAPQ s390/ipl: refactor deprecated strncpy s390/ipl: fix virtual vs physical address confusion s390/zcrypt_ep11misc: support API ordinal 6 with empty pin-blob s390/paes: fix PKEY_TYPE_EP11_AES handling for secure keyblobs s390/pkey: fix PKEY_TYPE_EP11_AES handling for sysfs attributes s390/pkey: fix PKEY_TYPE_EP11_AES handling in PKEY_VERIFYKEY2 IOCTL s390/pkey: fix PKEY_TYPE_EP11_AES handling in PKEY_KBLOB2PROTK[23] ...
2023-08-28Merge tag 'rcu.2023.08.21a' of ↵Linus Torvalds1-1/+55
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: - Documentation updates - Miscellaneous fixes, perhaps most notably simplifying SRCU_NOTIFIER_INIT() as suggested - RCU Tasks updates, most notably treating Tasks RCU callbacks as lazy while still treating synchronous grace periods as urgent. Also fixes one bug that restores the ability to apply debug-objects to RCU Tasks and another that fixes a race condition that could result in false-positive failures of the boot-time self-test code - RCU-scalability performance-test updates, most notably adding the ability to measure the RCU-Tasks's grace-period kthread's CPU consumption. This proved quite useful for the RCU Tasks work - Reference-acquisition/release performance-test updates, including a fix for an uninitialized wait_queue_head_t - Miscellaneous torture-test updates - Torture-test scripting updates, including removal of the non-longer-functional formal-verification scripts, test builds of individual RCU Tasks flavors, better diagnostics for loss of connectivity for distributed rcutorture tests, disabling of reboot loops in qemu/KVM-based rcutorture testing, and passing of init parameters to rcutorture's init program * tag 'rcu.2023.08.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (64 commits) rcu: Use WRITE_ONCE() for assignments to ->next for rculist_nulls rcu: Make the rcu_nocb_poll boot parameter usable via boot config rcu: Mark __rcu_irq_enter_check_tick() ->rcu_urgent_qs load srcu,notifier: Remove #ifdefs in favor of SRCU Tiny srcu_usage rcutorture: Stop right-shifting torture_random() return values torture: Stop right-shifting torture_random() return values torture: Move stutter_wait() timeouts to hrtimers torture: Move torture_shuffle() timeouts to hrtimers torture: Move torture_onoff() timeouts to hrtimers torture: Make torture_hrtimeout_*() use TASK_IDLE torture: Add lock_torture writer_fifo module parameter torture: Add a kthread-creation callback to _torture_create_kthread() rcu-tasks: Fix boot-time RCU tasks debug-only deadlock rcu-tasks: Permit use of debug-objects with RCU Tasks flavors checkpatch: Complain about unexpected uses of RCU Tasks Trace torture: Cause mkinitrd.sh to indicate failure on compile errors torture: Make init program dump command-line arguments torture: Switch qemu from -nographic to -display none torture: Add init-program support for loongarch torture: Avoid torture-test reboot loops ...
2023-08-28Merge tag 'v6.6-vfs.misc' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual filesystems. Features: - Block mode changes on symlinks and rectify our broken semantics - Report file modifications via fsnotify() for splice - Allow specifying an explicit timeout for the "rootwait" kernel command line option. This allows to timeout and reboot instead of always waiting indefinitely for the root device to show up - Use synchronous fput for the close system call Cleanups: - Get rid of open-coded lockdep workarounds for async io submitters and replace it all with a single consolidated helper - Simplify epoll allocation helper - Convert simple_write_begin and simple_write_end to use a folio - Convert page_cache_pipe_buf_confirm() to use a folio - Simplify __range_close to avoid pointless locking - Disable per-cpu buffer head cache for isolated cpus - Port ecryptfs to kmap_local_page() api - Remove redundant initialization of pointer buf in pipe code - Unexport the d_genocide() function which is only used within core vfs - Replace printk(KERN_ERR) and WARN_ON() with WARN() Fixes: - Fix various kernel-doc issues - Fix refcount underflow for eventfds when used as EFD_SEMAPHORE - Fix a mainly theoretical issue in devpts - Check the return value of __getblk() in reiserfs - Fix a racy assert in i_readcount_dec - Fix integer conversion issues in various functions - Fix LSM security context handling during automounts that prevented NFS superblock sharing" * tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits) cachefiles: use kiocb_{start,end}_write() helpers ovl: use kiocb_{start,end}_write() helpers aio: use kiocb_{start,end}_write() helpers io_uring: use kiocb_{start,end}_write() helpers fs: create kiocb_{start,end}_write() helpers fs: add kerneldoc to file_{start,end}_write() helpers io_uring: rename kiocb_end_write() local helper splice: Convert page_cache_pipe_buf_confirm() to use a folio libfs: Convert simple_write_begin and simple_write_end to use a folio fs/dcache: Replace printk and WARN_ON by WARN fs/pipe: remove redundant initialization of pointer buf fs: Fix kernel-doc warnings devpts: Fix kernel-doc warnings doc: idmappings: fix an error and rephrase a paragraph init: Add support for rootwait timeout parameter vfs: fix up the assert in i_readcount_dec fs: Fix one kernel-doc comment docs: filesystems: idmappings: clarify from where idmappings are taken fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing ...
2023-08-25Merge branch 'acpi-thermal'Rafael J. Wysocki1-4/+0
Merge ACPI thermal driver changes for 6.6-rc1: - Drop non-functional nocrt parameter from ACPI thermal (Mario Limonciello). - Clean up the ACPI thermal driver, rework the handling of firmware notifications in it and make it provide a table of generic trip point structures to the core during initialization (Rafael Wysocki). * acpi-thermal: ACPI: thermal: Eliminate code duplication from acpi_thermal_notify() ACPI: thermal: Drop unnecessary thermal zone callbacks ACPI: thermal: Rework thermal_get_trend() ACPI: thermal: Use trip point table to register thermal zones thermal: core: Rework and rename __for_each_thermal_trip() ACPI: thermal: Introduce struct acpi_thermal_trip ACPI: thermal: Carry out trip point updates under zone lock ACPI: thermal: Clean up acpi_thermal_register_thermal_zone() thermal: core: Add priv pointer to struct thermal_trip thermal: core: Introduce thermal_zone_device_exec() thermal: core: Do not handle trip points with invalid temperature ACPI: thermal: Drop redundant local variable from acpi_thermal_resume() ACPI: thermal: Do not attach private data to ACPI handles ACPI: thermal: Drop enabled flag from struct acpi_thermal_active ACPI: thermal: Drop nocrt parameter
2023-08-18Documentation: Fix typosBjorn Helgaas1-1/+1
Fix typos in Documentation. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20230814212822.193684-4-helgaas@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-08-18doc: update params of memhp_default_state=Ma Wupeng1-1/+1
Commit 5f47adf762b7 ("mm/memory_hotplug: allow to specify a default online_type") allows to specify a default online_type which make online memory to kernel or movable zone possible but fail to update to doc. Update doc to fit this change. Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230802074312.2111074-1-mawupeng1@huawei.com
2023-08-18powerpc/idle: Add support for nohltVaibhav Jain1-1/+1
This patch enables config option GENERIC_IDLE_POLL_SETUP for arch powerpc. This adds support for kernel param 'nohlt'. Powerpc kernel also supports another kernel boot-time param called 'powersave' which can also be used to disable all cpu idle-states and forces CPU to an idle-loop similar to what cpu_idle_poll() does. This patch however makes powerpc kernel-parameters better aligned to the generic boot-time parameters. Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230818050739.827851-1-vaibhav@linux.ibm.com
2023-08-17tpm_tis: Revert "tpm_tis: Disable interrupts on ThinkPad T490s"Jarkko Sakkinen1-0/+7
Since for MMIO driver using FIFO registers, also known as tpm_tis, the default (and tbh recommended) behaviour is now the polling mode, the "tristate" workaround is no longer for benefit. If someone wants to explicitly enable IRQs for a TPM chip that should be without question allowed. It could very well be a piece hardware in the existing deny list because of e.g. firmware update or something similar. While at it, document the module parameter, as this was not done in 2006 when it first appeared in the mainline. Link: https://lore.kernel.org/linux-integrity/20201015214430.17937-1-jsnitsel@redhat.com/ Link: https://lore.kernel.org/all/1145393776.4829.19.camel@localhost.localdomain/ Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-08-17Merge branches 'doc.2023.07.14b', 'fixes.2023.08.16a', ↵Paul E. McKenney1-1/+55
'rcu-tasks.2023.07.24a', 'rcuscale.2023.07.14b', 'refscale.2023.07.14b', 'torture.2023.08.14a' and 'torturescripts.2023.07.20a' into HEAD doc.2023.07.14b: Documentation updates. fixes.2023.08.16a: Miscellaneous fixes. rcu-tasks.2023.07.24a: RCU Tasks updates. rcuscale.2023.07.14b: RCU (updater) scalability test updates. refscale.2023.07.14b: Reference (reader) scalability test updates. torture.2023.08.14a: Other torture-test updates. torturescripts.2023.07.20a: Other torture-test scripting updates.
2023-08-16docs: kdump: Update the crashkernel description for riscvChen Jiahao1-7/+8
Now "crashkernel=" parameter on riscv has been updated to support crashkernel=X,[high,low]. Through which we can reserve memory region above/within 32bit addressible DMA zone. Here update the parameter description accordingly. Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Baoquan He <bhe@redhat.com> Link: https://lore.kernel.org/r/20230726175000.2536220-3-chenjiahao16@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-15init: Add support for rootwait timeout parameterLoic Poulain1-0/+4
Add an optional timeout arg to 'rootwait' as the maximum time in seconds to wait for the root device to show up before attempting forced mount of the root filesystem. Use case: In case of device mapper usage for the rootfs (e.g. root=/dev/dm-0), if the mapper is not able to create the virtual block for any reason (wrong arguments, bad dm-verity signature, etc), the `rootwait` param causes the kernel to wait forever. It may however be desirable to only wait for a given time and then panic (force mount) to cause device reset. This gives the bootloader a chance to detect the problem and to take some measures, such as marking the booted partition as bad (for A/B case) or entering a recovery mode. In success case, mounting happens as soon as the root device is ready, unlike the existing 'rootdelay' parameter which performs an unconditional pause. Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Message-Id: <20230813082349.513386-1-loic.poulain@linaro.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-15torture: Add lock_torture writer_fifo module parameterDietmar Eggemann1-0/+4
This commit adds a module parameter that causes the locktorture writer to run at real-time priority. To use it: insmod /lib/modules/torture.ko random_shuffle=1 insmod /lib/modules/locktorture.ko torture_type=mutex_lock rt_boost=1 rt_boost_factor=50 nested_locks=3 writer_fifo=1 ^^^^^^^^^^^^^ A predecessor to this patch has been helpful to uncover issues with the proxy-execution series. [ paulmck: Remove locktorture-specific code from kernel/torture.c. ] Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: kernel-team@android.com Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> [jstultz: Include header change to build, reword commit message] Signed-off-by: John Stultz <jstultz@google.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-08-14Merge 6.5-rc6 into usb-nextGreg Kroah-Hartman1-13/+45
We need the USB and Thunderbolt fixes in here to build on. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-09USB: Remove Wireless USB and UWB documentationAlan Stern1-1/+1
Support for Wireless USB and Ultra WideBand was removed in 2020 by commit caa6772db4c1 ("Staging: remove wusbcore and UWB from the kernel tree."). But the documentation files were left behind. Let's get rid of that out-of-date documentation. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/015d4310-bcd3-4ba4-9a0e-3664f281a9be@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-08workqueue: Make default affinity_scope dynamically updatableTejun Heo1-4/+4
While workqueue.default_affinity_scope is writable, it only affects workqueues which are created afterwards and isn't very useful. Instead, let's introduce explicit "default" scope and update the effective scope dynamically when workqueue.default_affinity_scope is changed. Signed-off-by: Tejun Heo <tj@kernel.org>
2023-08-08workqueue: Add multiple affinity scopes and interface to select themTejun Heo1-0/+12
Add three more affinity scopes - WQ_AFFN_CPU, SMT and CACHE - and make CACHE the default. The code changes to actually add the additional scopes are trivial. Also add module parameter "workqueue.default_affinity_scope" to override the default scope and "affinity_scope" sysfs file to configure it per workqueue. wq_dump.py and documentations are updated accordingly. This enables significant flexibility in configuring how unbound workqueues behave. If affinity scope is set to "cpu", it'll behave close to a per-cpu workqueue. On the other hand, "system" removes all locality boundaries. Many modern machines have multiple L3 caches often while being mostly uniform in terms of memory access. Thus, workqueue's previous behavior of spreading work items in each NUMA node had negative performance implications from unncessarily crossing L3 boundaries between issue and execution. However, picking a finer grained affinity scope also has a downside in that an issuer in one group can't utilize CPUs in other groups. While dependent on the specifics of workload, there's usually a noticeable penalty in crossing L3 boundaries, so let's default to CACHE. This issue will be further addressed and documented with examples in future patches. Signed-off-by: Tejun Heo <tj@kernel.org>
2023-08-08workqueue: Remove module param disable_numa and sysfs knobs pool_ids and numaTejun Heo1-9/+0
Unbound workqueue CPU affinity is going to receive an overhaul and the NUMA specific knobs won't make sense anymore. Remove them. Also, the pool_ids knob was used for debugging and not really meaningful given that there is no visibility into the pools associated with those IDs. Remove it too. A future patch will improve overall visibility. Signed-off-by: Tejun Heo <tj@kernel.org>
2023-08-08Merge tag 'gds-for-linus-2023-08-01' of ↵Linus Torvalds1-13/+34
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/gds fixes from Dave Hansen: "Mitigate Gather Data Sampling issue: - Add Base GDS mitigation - Support GDS_NO under KVM - Fix a documentation typo" * tag 'gds-for-linus-2023-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/x86: Fix backwards on/off logic about YMM support KVM: Add GDS_NO support to KVM x86/speculation: Add Kconfig option for GDS x86/speculation: Add force option to GDS mitigation x86/speculation: Add Gather Data Sampling mitigation
2023-08-02powerpc: Add HOTPLUG_SMT supportMichael Ellerman1-2/+2
Add support for HOTPLUG_SMT, which enables the generic sysfs SMT support files in /sys/devices/system/cpu/smt, as well as the "nosmt" boot parameter. Implement the recently added hooks to allow partial SMT states, allow any number of threads per core. Tie the config symbol to HOTPLUG_CPU, which enables it on the major platforms that support SMT. If there are other platforms that want the SMT support that can be tweaked in future. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> [ldufour: remove topology_smt_supported] [ldufour: remove topology_smt_threads_supported] [ldufour: select CONFIG_SMT_NUM_THREADS_DYNAMIC] [ldufour: update kernel-parameters.txt] Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Link: https://msgid.link/20230705145143.40545-10-ldufour@linux.ibm.com