summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)AuthorFilesLines
2023-11-29x86/CPU/AMD: Move the Zen3 BTC_NO detection to the Zen3 init functionBorislav Petkov (AMD)1-8/+9
No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: http://lore.kernel.org/r/20231120104152.13740-4-bp@alien8.de
2023-11-29x86/CPU/AMD: Carve out the erratum 1386 fixBorislav Petkov (AMD)1-9/+15
Call it on the affected CPU generations. No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: http://lore.kernel.org/r/20231120104152.13740-3-bp@alien8.de
2023-11-29x86/CPU/AMD: Add ZenX generations flagsBorislav Petkov (AMD)1-2/+68
Add X86_FEATURE flags for each Zen generation. They should be used from now on instead of checking f/m/s. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/r/20231120104152.13740-2-bp@alien8.de
2023-11-28x86/MCE/AMD: Add new MA_LLC, USR_DP, and USR_CP bank typesMuralidhara M K1-0/+6
Add HWID and McaType values for new SMCA bank types. Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231102114225.2006878-3-muralimk@amd.com
2023-11-27x86/mce/amd, EDAC/mce_amd: Move long names to decoder moduleYazen Ghannam1-44/+30
The long names of the SMCA banks are only used by the MCE decoder module. Move them out of the arch code and into the decoder module. [ bp: Name the long names array "smca_long_names", drop local ptr in decode_smca_error(), constify arrays. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231118193248.1296798-5-yazen.ghannam@amd.com
2023-11-26Merge tag 'x86-urgent-2023-11-26' of ↵Linus Torvalds4-48/+37
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode fixes from Ingo Molnar: "Fix/enhance x86 microcode version reporting: fix the bootup log spam, and remove the driver version announcement to avoid version confusion when distros backport fixes" * tag 'x86-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Rework early revisions reporting x86/microcode: Remove the driver announcement and version
2023-11-24x86/cpu/intel_epb: Don't rely on link orderJames Morse1-1/+1
intel_epb_init() is called as a subsys_initcall() to register cpuhp callbacks. The callbacks make use of get_cpu_device() which will return NULL unless register_cpu() has been called. register_cpu() is called from topology_init(), which is also a subsys_initcall(). This is fragile. Moving the register_cpu() to a different subsys_initcall() leads to a NULL dereference during boot. Make intel_epb_init() a late_initcall(), user-space can't provide a policy before this point anyway. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-11-23arch: include linux/cpu.h for trap_init() prototypeArnd Bergmann1-0/+1
some architectures run into a -Wmissing-prototypes warning for trap_init() arch/microblaze/kernel/traps.c:21:6: warning: no previous prototype for 'trap_init' [-Wmissing-prototypes] Include the right header to avoid this consistently, removing the extra declarations on m68k and x86 that were added as local workarounds already. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-11-23x86/ioapic: Remove unfinished sentence from commentAdrian Huang1-1/+1
[ mingo: Refine changelog. ] Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org
2023-11-22x86/mce/inject: Clear test status valueYazen Ghannam1-0/+1
AMD systems generally allow MCA "simulation" where MCA registers can be written with valid data and the full MCA handling flow can be tested by software. However, the platform on Scalable MCA systems, can prevent software from writing data to the MCA registers. There is no architectural way to determine this configuration. Therefore, the MCE injection module will check for this behavior by writing and reading back a test status value. This is done during module init, and the check can run on any CPU with any valid MCA bank. If MCA_STATUS writes are ignored by the platform, then there are no side effects on the hardware state. If the writes are not ignored, then the test status value will remain in the hardware MCA_STATUS register. It is likely that the value will not be overwritten by hardware or software, since the tested CPU and bank are arbitrary. Therefore, the user may see a spurious, synthetic MCA error reported whenever MCA is polled for this CPU. Clear the test value immediately after writing it. It is very unlikely that a valid MCA error is logged by hardware during the test. Errors that cause an #MC won't be affected. Fixes: 891e465a1bd8 ("x86/mce: Check whether writes to MCA_STATUS are getting ignored") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231118193248.1296798-2-yazen.ghannam@amd.com
2023-11-22Merge tag 'hyperv-fixes-signed-20231121' of ↵Linus Torvalds1-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - One fix for the KVP daemon (Ani Sinha) - Fix for the detection of E820_TYPE_PRAM in a Gen2 VM (Saurabh Sengar) - Micro-optimization for hv_nmi_unknown() (Uros Bizjak) * tag 'hyperv-fixes-signed-20231121' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Use atomic_try_cmpxchg() to micro-optimize hv_nmi_unknown() x86/hyperv: Fix the detection of E820_TYPE_PRAM in a Gen2 VM hv/hv_kvp_daemon: Some small fixes for handling NM keyfiles
2023-11-22x86/hyperv: Use atomic_try_cmpxchg() to micro-optimize hv_nmi_unknown()Uros Bizjak1-1/+4
Use atomic_try_cmpxchg() instead of atomic_cmpxchg(*ptr, old, new) == old in hv_nmi_unknown(). On x86 the CMPXCHG instruction returns success in the ZF flag, so this change saves a compare after CMPXCHG. The generated asm code improves from: 3e: 65 8b 15 00 00 00 00 mov %gs:0x0(%rip),%edx 45: b8 ff ff ff ff mov $0xffffffff,%eax 4a: f0 0f b1 15 00 00 00 lock cmpxchg %edx,0x0(%rip) 51: 00 52: 83 f8 ff cmp $0xffffffff,%eax 55: 0f 95 c0 setne %al to: 3e: 65 8b 15 00 00 00 00 mov %gs:0x0(%rip),%edx 45: b8 ff ff ff ff mov $0xffffffff,%eax 4a: f0 0f b1 15 00 00 00 lock cmpxchg %edx,0x0(%rip) 51: 00 52: 0f 95 c0 setne %al No functional change intended. Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Dexuan Cui <decui@microsoft.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20231114170038.381634-1-ubizjak@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20231114170038.381634-1-ubizjak@gmail.com>
2023-11-21x86/apic: Drop apic::delivery_modeAndrew Cooper8-10/+0
This field is set to APIC_DELIVERY_MODE_FIXED in all cases, and is read exactly once. Fold the constant in uv_program_mmr() and drop the field. Searching for the origin of the stale HyperV comment reveals commit a31e58e129f7 ("x86/apic: Switch all APICs to Fixed delivery mode") which notes: As a consequence of this change, the apic::irq_delivery_mode field is now pointless, but this needs to be cleaned up in a separate patch. 6 years is long enough for this technical debt to have survived. [ bp: Fold in https://lore.kernel.org/r/20231121123034.1442059-1-andrew.cooper3@citrix.com ] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lore.kernel.org/r/20231102-x86-apic-v1-1-bf049a2a0ed6@citrix.com
2023-11-21x86/microcode: Rework early revisions reportingBorislav Petkov (AMD)4-44/+37
The AMD side of the loader issues the microcode revision for each logical thread on the system, which can become really noisy on huge machines. And doing that doesn't make a whole lot of sense - the microcode revision is already in /proc/cpuinfo. So in case one is interested in the theoretical support of mixed silicon steppings on AMD, one can check there. What is also missing on the AMD side - something which people have requested before - is showing the microcode revision the CPU had *before* the early update. So abstract that up in the main code and have the BSP on each vendor provide those revision numbers. Then, dump them only once on driver init. On Intel, do not dump the patch date - it is not needed. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/CAHk-=wg=%2B8rceshMkB4VnKxmRccVLtBLPBawnewZuuqyx5U=3A@mail.gmail.com
2023-11-21x86/microcode: Remove the driver announcement and versionBorislav Petkov (AMD)1-4/+0
First of all, the print is useless. The driver will either load and say which microcode revision the machine has or issue an error. Then, the version number is meaningless and actively confusing, as Yazen mentioned recently: when a subset of patches are backported to a distro kernel, one can't assume the driver version is the same as the upstream one. And besides, the version number of the loader hasn't been used and incremented for a long time. So drop it. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20231115210212.9981-2-bp@alien8.de
2023-11-20x86/mtrr: Document missing function parameters in kernel-docBorislav Petkov (AMD)1-4/+10
Add text explaining what they do. No functional changes. Closes: https://lore.kernel.org/oe-kbuild-all/202311130104.9xKAKzke-lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/202311130104.9xKAKzke-lkp@intel.com
2023-11-20Merge tag 'x86_urgent_for_v6.7_rc2' of ↵Linus Torvalds2-22/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Ignore invalid x2APIC entries in order to not waste per-CPU data - Fix a back-to-back signals handling scenario when shadow stack is in use - A documentation fix - Add Kirill as TDX maintainer * tag 'x86_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/acpi: Ignore invalid x2APIC entries x86/shstk: Delay signal entry SSP write until after user accesses x86/Documentation: Indent 'note::' directive for protocol version number note MAINTAINERS: Add Intel TDX entry
2023-11-17x86/smp: Export symbol cpu_clustergroup_mask()Kan Liang1-0/+1
Intel cstate PMU driver will invoke the topology_cluster_cpumask() to retrieve the CPU mask of a cluster. A modpost error is triggered since the symbol cpu_clustergroup_mask is not exported. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20231116142245.1233485-2-kan.liang@linux.intel.com
2023-11-15x86/mce: Remove redundant check from mce_device_create()Nikolay Borisov1-3/+0
mce_device_create() is called only from mce_cpu_online() which in turn will be called iff MCA support is available. That is, at the time of mce_device_create() call it's guaranteed that MCA support is available. No need to duplicate this check so remove it. [ bp: Massage commit message. ] Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231107165529.407349-1-nik.borisov@suse.com
2023-11-13x86/paravirt: Use relative reference for the original instruction offsetHou Wenlong2-4/+6
Similar to the alternative patching, use a relative reference for original instruction offset rather than absolute one, which saves 8 bytes for one PARA_SITE entry on x86_64. As a result, a R_X86_64_PC32 relocation is generated instead of an R_X86_64_64 one, which also reduces relocation metadata on relocatable builds. Hardcode the alignment to 4 now. [ bp: Massage commit message. ] Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/9e6053107fbaabc0d33e5d2865c5af2c67ec9925.1686301237.git.houwenlong.hwl@antgroup.com
2023-11-13x86/barrier: Do not serialize MSR accesses on AMDBorislav Petkov (AMD)3-0/+13
AMD does not have the requirement for a synchronization barrier when acccessing a certain group of MSRs. Do not incur that unnecessary penalty there. There will be a CPUID bit which explicitly states that a MFENCE is not needed. Once that bit is added to the APM, this will be extended with it. While at it, move to processor.h to avoid include hell. Untangling that file properly is a matter for another day. Some notes on the performance aspect of why this is relevant, courtesy of Kishon VijayAbraham <Kishon.VijayAbraham@amd.com>: On a AMD Zen4 system with 96 cores, a modified ipi-bench[1] on a VM shows x2AVIC IPI rate is 3% to 4% lower than AVIC IPI rate. The ipi-bench is modified so that the IPIs are sent between two vCPUs in the same CCX. This also requires to pin the vCPU to a physical core to prevent any latencies. This simulates the use case of pinning vCPUs to the thread of a single CCX to avoid interrupt IPI latency. In order to avoid run-to-run variance (for both x2AVIC and AVIC), the below configurations are done: 1) Disable Power States in BIOS (to prevent the system from going to lower power state) 2) Run the system at fixed frequency 2500MHz (to prevent the system from increasing the frequency when the load is more) With the above configuration: *) Performance measured using ipi-bench for AVIC: Average Latency: 1124.98ns [Time to send IPI from one vCPU to another vCPU] Cumulative throughput: 42.6759M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously] *) Performance measured using ipi-bench for x2AVIC: Average Latency: 1172.42ns [Time to send IPI from one vCPU to another vCPU] Cumulative throughput: 40.9432M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously] From above, x2AVIC latency is ~4% more than AVIC. However, the expectation is x2AVIC performance to be better or equivalent to AVIC. Upon analyzing the perf captures, it is observed significant time is spent in weak_wrmsr_fence() invoked by x2apic_send_IPI(). With the fix to skip weak_wrmsr_fence() *) Performance measured using ipi-bench for x2AVIC: Average Latency: 1117.44ns [Time to send IPI from one vCPU to another vCPU] Cumulative throughput: 42.9608M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously] Comparing the performance of x2AVIC with and without the fix, it can be seen the performance improves by ~4%. Performance captured using an unmodified ipi-bench using the 'mesh-ipi' option with and without weak_wrmsr_fence() on a Zen4 system also showed significant performance improvement without weak_wrmsr_fence(). The 'mesh-ipi' option ignores CCX or CCD and just picks random vCPU. Average throughput (10 iterations) with weak_wrmsr_fence(), Cumulative throughput: 4933374 IPI/s Average throughput (10 iterations) without weak_wrmsr_fence(), Cumulative throughput: 6355156 IPI/s [1] https://github.com/bytedance/kvm-utils/tree/master/microbenchmark/ipi-bench Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230622095212.20940-1-bp@alien8.de
2023-11-13x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernelZhiquan Li1-0/+16
Memory errors don't happen very often, especially fatal ones. However, in large-scale scenarios such as data centers, that probability increases with the amount of machines present. When a fatal machine check happens, mce_panic() is called based on the severity grading of that error. The page containing the error is not marked as poison. However, when kexec is enabled, tools like makedumpfile understand when pages are marked as poison and do not touch them so as not to cause a fatal machine check exception again while dumping the previous kernel's memory. Therefore, mark the page containing the error as poisoned so that the kexec'ed kernel can avoid accessing the page. [ bp: Rewrite commit message and comment. ] Co-developed-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Link: https://lore.kernel.org/r/20231014051754.3759099-1-zhiquan1.li@intel.com
2023-11-13x86/setup: Make relocated_ramdisk a local variable of relocate_initrd()Yuntao Wang1-3/+1
After 0b62f6cb0773 ("x86/microcode/32: Move early loading after paging enable"), the global variable relocated_ramdisk is no longer used anywhere except for the relocate_initrd() function. Make it a local variable of that function. Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Baoquan He <bhe@redhat.com> Link: https://lore.kernel.org/r/20231113034026.130679-1-ytcoode@gmail.com
2023-11-09x86/acpi: Ignore invalid x2APIC entriesZhang Rui1-19/+15
Currently, the kernel enumerates the possible CPUs by parsing both ACPI MADT Local APIC entries and x2APIC entries. So CPUs with "valid" APIC IDs, even if they have duplicated APIC IDs in Local APIC and x2APIC, are always enumerated. Below is what ACPI MADT Local APIC and x2APIC describes on an Ivebridge-EP system, [02Ch 0044 1] Subtable Type : 00 [Processor Local APIC] [02Fh 0047 1] Local Apic ID : 00 ... [164h 0356 1] Subtable Type : 00 [Processor Local APIC] [167h 0359 1] Local Apic ID : 39 [16Ch 0364 1] Subtable Type : 00 [Processor Local APIC] [16Fh 0367 1] Local Apic ID : FF ... [3ECh 1004 1] Subtable Type : 09 [Processor Local x2APIC] [3F0h 1008 4] Processor x2Apic ID : 00000000 ... [B5Ch 2908 1] Subtable Type : 09 [Processor Local x2APIC] [B60h 2912 4] Processor x2Apic ID : 00000077 As a result, kernel shows "smpboot: Allowing 168 CPUs, 120 hotplug CPUs". And this wastes significant amount of memory for the per-cpu data. Plus this also breaks https://lore.kernel.org/all/87edm36qqb.ffs@tglx/, because __max_logical_packages is over-estimated by the APIC IDs in the x2APIC entries. According to https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#processor-local-x2apic-structure: "[Compatibility note] On some legacy OSes, Logical processors with APIC ID values less than 255 (whether in XAPIC or X2APIC mode) must use the Processor Local APIC structure to convey their APIC information to OSPM, and those processors must be declared in the DSDT using the Processor() keyword. Logical processors with APIC ID values 255 and greater must use the Processor Local x2APIC structure and be declared using the Device() keyword." Therefore prevent the registration of x2APIC entries with an APIC ID less than 255 if the local APIC table enumerates valid APIC IDs. [ tglx: Simplify the logic ] Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230702162802.344176-1-rui.zhang@intel.com
2023-11-08x86/shstk: Delay signal entry SSP write until after user accessesRick Edgecombe1-3/+3
When a signal is being delivered, the kernel needs to make accesses to userspace. These accesses could encounter an access error, in which case the signal delivery itself will trigger a segfault. Usually this would result in the kernel killing the process. But in the case of a SEGV signal handler being configured, the failure of the first signal delivery will result in *another* signal getting delivered. The second signal may succeed if another thread has resolved the issue that triggered the segfault (i.e. a well timed mprotect()/mmap()), or the second signal is being delivered to another stack (i.e. an alt stack). On x86, in the non-shadow stack case, all the accesses to userspace are done before changes to the registers (in pt_regs). The operation is aborted when an access error occurs, so although there may be writes done for the first signal, control flow changes for the signal (regs->ip, regs->sp, etc) are not committed until all the accesses have already completed successfully. This means that the second signal will be delivered as if it happened at the time of the first signal. It will effectively replace the first aborted signal, overwriting the half-written frame of the aborted signal. So on sigreturn from the second signal, control flow will resume happily from the point of control flow where the original signal was delivered. The problem is, when shadow stack is active, the shadow stack SSP register/MSR is updated *before* some of the userspace accesses. This means if the earlier accesses succeed and the later ones fail, the second signal will not be delivered at the same spot on the shadow stack as the first one. So on sigreturn from the second signal, the SSP will be pointing to the wrong location on the shadow stack (off by a frame). Pengfei privately reported that while using a shadow stack enabled glibc, the “signal06” test in the LTP test-suite hung. It turns out it is testing the above described double signal scenario. When this test was compiled with shadow stack, the first signal pushed a shadow stack sigframe, then the second pushed another. When the second signal was handled, the SSP was at the first shadow stack signal frame instead of the original location. The test then got stuck as the #CP from the twice incremented SSP was incorrect and generated segfaults in a loop. Fix this by adjusting the SSP register only after any userspace accesses, such that there can be no failures after the SSP is adjusted. Do this by moving the shadow stack sigframe push logic to happen after all other userspace accesses. Note, sigreturn (as opposed to the signal delivery dealt with in this patch) has ordering behavior that could lead to similar failures. The ordering issues there extend beyond shadow stack to include the alt stack restoration. Fixing that would require cross-arch changes, and the ordering today does not cause any known test or apps breakages. So leave it as is, for now. [ dhansen: minor changelog/subject tweak ] Fixes: 05e36022c054 ("x86/shstk: Handle signals for shadow stack") Reported-by: Pengfei Xu <pengfei.xu@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Pengfei Xu <pengfei.xu@intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20231107182251.91276-1-rick.p.edgecombe%40intel.com Link: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/signal/signal06.c
2023-11-04Merge tag 'x86_microcode_for_v6.7_rc1' of ↵Linus Torvalds14-880/+846
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loading updates from Borislac Petkov: "Major microcode loader restructuring, cleanup and improvements by Thomas Gleixner: - Restructure the code needed for it and add a temporary initrd mapping on 32-bit so that the loader can access the microcode blobs. This in itself is a preparation for the next major improvement: - Do not load microcode on 32-bit before paging has been enabled. Handling this has caused an endless stream of headaches, issues, ugly code and unnecessary hacks in the past. And there really wasn't any sensible reason to do that in the first place. So switch the 32-bit loading to happen after paging has been enabled and turn the loader code "real purrty" again - Drop mixed microcode steppings loading on Intel - there, a single patch loaded on the whole system is sufficient - Rework late loading to track which CPUs have updated microcode successfully and which haven't, act accordingly - Move late microcode loading on Intel in NMI context in order to guarantee concurrent loading on all threads - Make the late loading CPU-hotplug-safe and have the offlined threads be woken up for the purpose of the update - Add support for a minimum revision which determines whether late microcode loading is safe on a machine and the microcode does not change software visible features which the machine cannot use anyway since feature detection has happened already. Roughly, the minimum revision is the smallest revision number which must be loaded currently on the system so that late updates can be allowed - Other nice leanups, fixess, etc all over the place" * tag 'x86_microcode_for_v6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) x86/microcode/intel: Add a minimum required revision for late loading x86/microcode: Prepare for minimal revision check x86/microcode: Handle "offline" CPUs correctly x86/apic: Provide apic_force_nmi_on_cpu() x86/microcode: Protect against instrumentation x86/microcode: Rendezvous and load in NMI x86/microcode: Replace the all-in-one rendevous handler x86/microcode: Provide new control functions x86/microcode: Add per CPU control field x86/microcode: Add per CPU result state x86/microcode: Sanitize __wait_for_cpus() x86/microcode: Clarify the late load logic x86/microcode: Handle "nosmt" correctly x86/microcode: Clean up mc_cpu_down_prep() x86/microcode: Get rid of the schedule work indirection x86/microcode: Mop up early loading leftovers x86/microcode/amd: Use cached microcode for AP load x86/microcode/amd: Cache builtin/initrd microcode early x86/microcode/amd: Cache builtin microcode too x86/microcode/amd: Use correct per CPU ucode_cpu_info ...
2023-11-04Merge tag 'tty-6.7-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty and serial updates from Greg KH: "Here is the big set of tty/serial driver changes for 6.7-rc1. Included in here are: - console/vgacon cleanups and removals from Arnd - tty core and n_tty cleanups from Jiri - lots of 8250 driver updates and cleanups - sc16is7xx serial driver updates - dt binding updates - first set of port lock wrapers from Thomas for the printk fixes coming in future releases - other small serial and tty core cleanups and updates All of these have been in linux-next for a while with no reported issues" * tag 'tty-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (193 commits) serdev: Replace custom code with device_match_acpi_handle() serdev: Simplify devm_serdev_device_open() function serdev: Make use of device_set_node() tty: n_gsm: add copyright Siemens Mobility GmbH tty: n_gsm: fix race condition in status line change on dead connections serial: core: Fix runtime PM handling for pending tx vgacon: fix mips/sibyte build regression dt-bindings: serial: drop unsupported samsung bindings tty: serial: samsung: drop earlycon support for unsupported platforms tty: 8250: Add note for PX-835 tty: 8250: Fix IS-200 PCI ID comment tty: 8250: Add Brainboxes Oxford Semiconductor-based quirks tty: 8250: Add support for Intashield IX cards tty: 8250: Add support for additional Brainboxes PX cards tty: 8250: Fix up PX-803/PX-857 tty: 8250: Fix port count of PX-257 tty: 8250: Add support for Intashield IS-100 tty: 8250: Add support for Brainboxes UP cards tty: 8250: Add support for additional Brainboxes UC cards tty: 8250: Remove UC-257 and UC-431 ...
2023-11-03Merge tag 'mm-nonmm-stable-2023-11-02-14-08' of ↵Linus Torvalds1-136/+11
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "As usual, lots of singleton and doubleton patches all over the tree and there's little I can say which isn't in the individual changelogs. The lengthier patch series are - 'kdump: use generic functions to simplify crashkernel reservation in arch', from Baoquan He. This is mainly cleanups and consolidation of the 'crashkernel=' kernel parameter handling - After much discussion, David Laight's 'minmax: Relax type checks in min() and max()' is here. Hopefully reduces some typecasting and the use of min_t() and max_t() - A group of patches from Oleg Nesterov which clean up and slightly fix our handling of reads from /proc/PID/task/... and which remove task_struct.thread_group" * tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (64 commits) scripts/gdb/vmalloc: disable on no-MMU scripts/gdb: fix usage of MOD_TEXT not defined when CONFIG_MODULES=n .mailmap: add address mapping for Tomeu Vizoso mailmap: update email address for Claudiu Beznea tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions .mailmap: map Benjamin Poirier's address scripts/gdb: add lx_current support for riscv ocfs2: fix a spelling typo in comment proc: test ProtectionKey in proc-empty-vm test proc: fix proc-empty-vm test with vsyscall fs/proc/base.c: remove unneeded semicolon do_io_accounting: use sig->stats_lock do_io_accounting: use __for_each_thread() ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error() ocfs2: fix a typo in a comment scripts/show_delta: add __main__ judgement before main code treewide: mark stuff as __ro_after_init fs: ocfs2: check status values proc: test /proc/${pid}/statm compiler.h: move __is_constexpr() to compiler.h ...
2023-11-02Merge tag 'sysctl-6.7-rc1' of ↵Linus Torvalds2-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "To help make the move of sysctls out of kernel/sysctl.c not incur a size penalty sysctl has been changed to allow us to not require the sentinel, the final empty element on the sysctl array. Joel Granados has been doing all this work. On the v6.6 kernel we got the major infrastructure changes required to support this. For v6.7-rc1 we have all arch/ and drivers/ modified to remove the sentinel. Both arch and driver changes have been on linux-next for a bit less than a month. It is worth re-iterating the value: - this helps reduce the overall build time size of the kernel and run time memory consumed by the kernel by about ~64 bytes per array - the extra 64-byte penalty is no longer inncurred now when we move sysctls out from kernel/sysctl.c to their own files For v6.8-rc1 expect removal of all the sentinels and also then the unneeded check for procname == NULL. The last two patches are fixes recently merged by Krister Johansen which allow us again to use softlockup_panic early on boot. This used to work but the alias work broke it. This is useful for folks who want to detect softlockups super early rather than wait and spend money on cloud solutions with nothing but an eventual hung kernel. Although this hadn't gone through linux-next it's also a stable fix, so we might as well roll through the fixes now" * tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (23 commits) watchdog: move softlockup_panic back to early_param proc: sysctl: prevent aliased sysctls from getting passed to init intel drm: Remove now superfluous sentinel element from ctl_table array Drivers: hv: Remove now superfluous sentinel element from ctl_table array raid: Remove now superfluous sentinel element from ctl_table array fw loader: Remove the now superfluous sentinel element from ctl_table array sgi-xp: Remove the now superfluous sentinel element from ctl_table array vrf: Remove the now superfluous sentinel element from ctl_table array char-misc: Remove the now superfluous sentinel element from ctl_table array infiniband: Remove the now superfluous sentinel element from ctl_table array macintosh: Remove the now superfluous sentinel element from ctl_table array parport: Remove the now superfluous sentinel element from ctl_table array scsi: Remove now superfluous sentinel element from ctl_table array tty: Remove now superfluous sentinel element from ctl_table array xen: Remove now superfluous sentinel element from ctl_table array hpet: Remove now superfluous sentinel element from ctl_table array c-sky: Remove now superfluous sentinel element from ctl_talbe array powerpc: Remove now superfluous sentinel element from ctl_table arrays riscv: Remove now superfluous sentinel element from ctl_table array x86/vdso: Remove now superfluous sentinel element from ctl_table array ...
2023-11-01Merge tag 'x86_tdx_for_6.7' of ↵Linus Torvalds1-20/+13
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 TDX updates from Dave Hansen: "The majority of this is a rework of the assembly and C wrappers that are used to talk to the TDX module and VMM. This is a nice cleanup in general but is also clearing the way for using this code when Linux is the TDX VMM. There are also some tidbits to make TDX guests play nicer with Hyper-V and to take advantage the hardware TSC. Summary: - Refactor and clean up TDX hypercall/module call infrastructure - Handle retrying/resuming page conversion hypercalls - Make sure to use the (shockingly) reliable TSC in TDX guests" [ TLA reminder: TDX is "Trust Domain Extensions", Intel's guest VM confidentiality technology ] * tag 'x86_tdx_for_6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tdx: Mark TSC reliable x86/tdx: Fix __noreturn build warning around __tdx_hypercall_failed() x86/virt/tdx: Make TDX_MODULE_CALL handle SEAMCALL #UD and #GP x86/virt/tdx: Wire up basic SEAMCALL functions x86/tdx: Remove 'struct tdx_hypercall_args' x86/tdx: Reimplement __tdx_hypercall() using TDX_MODULE_CALL asm x86/tdx: Make TDX_HYPERCALL asm similar to TDX_MODULE_CALL x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs x86/tdx: Pass TDCALL/SEAMCALL input/output registers via a structure x86/tdx: Rename __tdx_module_call() to __tdcall() x86/tdx: Make macros of TDCALLs consistent with the spec x86/tdx: Skip saving output regs when SEAMCALL fails with VMFailInvalid x86/tdx: Zero out the missing RSI in TDX_HYPERCALL macro x86/tdx: Retry partially-completed page conversion hypercalls
2023-10-31Merge tag 'rcu-next-v6.7' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks Pull RCU updates from Frederic Weisbecker: - RCU torture, locktorture and generic torture infrastructure updates that include various fixes, cleanups and consolidations. Among the user visible things, ftrace dumps can now be found into their own file, and module parameters get better documented and reported on dumps. - Generic and misc fixes all over the place. Some highlights: * Hotplug handling has seen some light cleanups and comments * An RCU barrier can now be triggered through sysfs to serialize memory stress testing and avoid OOM * Object information is now dumped in case of invalid callback invocation * Also various SRCU issues, too hard to trigger to deserve urgent pull requests, have been fixed - RCU documentation updates - RCU reference scalability test minor fixes and doc improvements. - RCU tasks minor fixes - Stall detection updates. Introduce RCU CPU Stall notifiers that allows a subsystem to provide informations to help debugging. Also cure some false positive stalls. * tag 'rcu-next-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks: (56 commits) srcu: Only accelerate on enqueue time locktorture: Check the correct variable for allocation failure srcu: Fix callbacks acceleration mishandling rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP rcu: Standardize explicit CPU-hotplug calls rcu: Conditionally build CPU-hotplug teardown callbacks rcu: Remove references to rcu_migrate_callbacks() from diagrams rcu: Assume rcu_report_dead() is always called locally rcu: Assume IRQS disabled from rcu_report_dead() rcu: Use rcu_segcblist_segempty() instead of open coding it rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects srcu: Fix srcu_struct node grpmask overflow on 64-bit systems torture: Convert parse-console.sh to mktemp rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle() rcutorture: Replace schedule_timeout*() 1-jiffy waits with HZ/20 torture: Add kvm.sh --debug-info argument locktorture: Rename readers_bind/writers_bind to bind_readers/bind_writers doc: Catch-up update for locktorture module parameters locktorture: Add call_rcu_chains module parameter locktorture: Add new module parameters to lock_torture_print_module_parms() ...
2023-10-31Merge tag 'x86-core-2023-10-29-v2' of ↵Linus Torvalds30-225/+272
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Thomas Gleixner: - Limit the hardcoded topology quirk for Hygon CPUs to those which have a model ID less than 4. The newer models have the topology CPUID leaf 0xB correctly implemented and are not affected. - Make SMT control more robust against enumeration failures SMT control was added to allow controlling SMT at boottime or runtime. The primary purpose was to provide a simple mechanism to disable SMT in the light of speculation attack vectors. It turned out that the code is sensible to enumeration failures and worked only by chance for XEN/PV. XEN/PV has no real APIC enumeration which means the primary thread mask is not set up correctly. By chance a XEN/PV boot ends up with smp_num_siblings == 2, which makes the hotplug control stay at its default value "enabled". So the mask is never evaluated. The ongoing rework of the topology evaluation caused XEN/PV to end up with smp_num_siblings == 1, which sets the SMT control to "not supported" and the empty primary thread mask causes the hotplug core to deny the bringup of the APS. Make the decision logic more robust and take 'not supported' and 'not implemented' into account for the decision whether a CPU should be booted or not. - Fake primary thread mask for XEN/PV Pretend that all XEN/PV vCPUs are primary threads, which makes the usage of the primary thread mask valid on XEN/PV. That is consistent with because all of the topology information on XEN/PV is fake or even non-existent. - Encapsulate topology information in cpuinfo_x86 Move the randomly scattered topology data into a separate data structure for readability and as a preparatory step for the topology evaluation overhaul. - Consolidate APIC ID data type to u32 It's fixed width hardware data and not randomly u16, int, unsigned long or whatever developers decided to use. - Cure the abuse of cpuinfo for persisting logical IDs. Per CPU cpuinfo is used to persist the logical package and die IDs. That's really not the right place simply because cpuinfo is subject to be reinitialized when a CPU goes through an offline/online cycle. Use separate per CPU data for the persisting to enable the further topology management rework. It will be removed once the new topology management is in place. - Provide a debug interface for inspecting topology information Useful in general and extremly helpful for validating the topology management rework in terms of correctness or "bug" compatibility. * tag 'x86-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/apic, x86/hyperv: Use u32 in hv_snp_boot_ap() too x86/cpu: Provide debug interface x86/cpu/topology: Cure the abuse of cpuinfo for persisting logical ids x86/apic: Use u32 for wakeup_secondary_cpu[_64]() x86/apic: Use u32 for [gs]et_apic_id() x86/apic: Use u32 for phys_pkg_id() x86/apic: Use u32 for cpu_present_to_apicid() x86/apic: Use u32 for check_apicid_used() x86/apic: Use u32 for APIC IDs in global data x86/apic: Use BAD_APICID consistently x86/cpu: Move cpu_l[l2]c_id into topology info x86/cpu: Move logical package and die IDs into topology info x86/cpu: Remove pointless evaluation of x86_coreid_bits x86/cpu: Move cu_id into topology info x86/cpu: Move cpu_core_id into topology info hwmon: (fam15h_power) Use topology_core_id() scsi: lpfc: Use topology_core_id() x86/cpu: Move cpu_die_id into topology info x86/cpu: Move phys_proc_id into topology info x86/cpu: Encapsulate topology information in cpuinfo_x86 ...
2023-10-31Merge tag 'x86-apic-2023-10-29-v2' of ↵Linus Torvalds3-8/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 APIC updates from Thomas Gleixner: - Make the quirk for non-maskable MSI interrupts in the affinity setter functional again. It was broken by a MSI core code update, which restructured the code in a way that the quirk flag was not longer set correctly. Trying to restore the core logic caused a deeper inspection and it turned out that the extra quirk flag is not required at all because it's the inverse of the reservation mode bit, which only can be set when the MSI interrupt is maskable. So the trivial fix is to use the reservation mode check in the affinity setter function and remove almost 40 lines of code related to the no-mask quirk flag. - Cure a Kconfig dependency issue which causes compile failures by correcting the conditionals in the affected header files. - Clean up coding style in the UV APIC driver. * tag 'x86-apic-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic/msi: Fix misconfigured non-maskable MSI quirk x86/msi: Fix compile error caused by CONFIG_GENERIC_MSI_IRQ=y && !CONFIG_X86_LOCAL_APIC x86/platform/uv/apic: Clean up inconsistent indenting
2023-10-31Merge tag 'x86-mm-2023-10-28' of ↵Linus Torvalds3-20/+26
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm handling updates from Ingo Molnar: - Add new NX-stack self-test - Improve NUMA partial-CFMWS handling - Fix #VC handler bugs resulting in SEV-SNP boot failures - Drop the 4MB memory size restriction on minimal NUMA nodes - Reorganize headers a bit, in preparation to header dependency reduction efforts - Misc cleanups & fixes * tag 'x86-mm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Drop the 4 MB restriction on minimal NUMA node memory size selftests/x86/lam: Zero out buffer for readlink() x86/sev: Drop unneeded #include x86/sev: Move sev_setup_arch() to mem_encrypt.c x86/tdx: Replace deprecated strncpy() with strtomem_pad() selftests/x86/mm: Add new test that userspace stack is in fact NX x86/sev: Make boot_ghcb_page[] static x86/boot: Move x86_cache_alignment initialization to correct spot x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach x86/sev-es: Allow copy_from_kernel_nofault() in earlier boot x86_64: Show CR4.PSE on auxiliaries like on BSP x86/iommu/docs: Update AMD IOMMU specification document URL x86/sev/docs: Update document URL in amd-memory-encryption.rst x86/mm: Move arch_memory_failure() and arch_is_platform_page() definitions from <asm/processor.h> to <asm/pgtable.h> ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window x86/numa: Introduce numa_fill_memblks()
2023-10-31Merge tag 'x86-irq-2023-10-28' of ↵Linus Torvalds1-6/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 irq fix from Ingo Molnar: "Fix out-of-order NMI nesting checks resulting in false positive warnings" * tag 'x86-irq-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/nmi: Fix out-of-order NMI nesting checks & false positive warning
2023-10-31Merge tag 'x86-entry-2023-10-28' of ↵Linus Torvalds2-18/+26
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 entry updates from Ingo Molnar: - Make IA32_EMULATION boot time configurable with the new ia32_emulation=<bool> boot option - Clean up fast syscall return validation code: convert it to C and refactor the code - As part of this, optimize the canonical RIP test code * tag 'x86-entry-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry/32: Clean up syscall fast exit tests x86/entry/64: Use TASK_SIZE_MAX for canonical RIP test x86/entry/64: Convert SYSRET validation tests to C x86/entry/32: Remove SEP test for SYSEXIT x86/entry/32: Convert do_fast_syscall_32() to bool return type x86/entry/compat: Combine return value test from syscall handler x86/entry/64: Remove obsolete comment on tracing vs. SYSRET x86: Make IA32_EMULATION boot time configurable x86/entry: Make IA32 syscalls' availability depend on ia32_enabled() x86/elf: Make loading of 32bit processes depend on ia32_enabled() x86/entry: Compile entry_SYSCALL32_ignore() unconditionally x86/entry: Rename ignore_sysret() x86: Introduce ia32_enabled()
2023-10-31Merge tag 'x86-boot-2023-10-28' of ↵Linus Torvalds1-9/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: - Rework PE header generation, primarily to generate a modern, 4k aligned kernel image view with narrower W^X permissions. - Further refine init-lifetime annotations - Misc cleanups & fixes * tag 'x86-boot-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/boot: efistub: Assign global boot_params variable x86/boot: Rename conflicting 'boot_params' pointer to 'boot_params_ptr' x86/head/64: Move the __head definition to <asm/init.h> x86/head/64: Add missing __head annotation to startup_64_load_idt() x86/head/64: Mark 'startup_gdt[]' and 'startup_gdt_descr' as __initdata x86/boot: Harmonize the style of array-type parameter for fixup_pointer() calls x86/boot: Fix incorrect startup_gdt_descr.size x86/boot: Compile boot code with -std=gnu11 too x86/boot: Increase section and file alignment to 4k/512 x86/boot: Split off PE/COFF .data section x86/boot: Drop PE/COFF .reloc section x86/boot: Construct PE/COFF .text section from assembler x86/boot: Derive file size from _edata symbol x86/boot: Define setup size in linker script x86/boot: Set EFI handover offset directly in header asm x86/boot: Grab kernel_info offset from zoffset header directly x86/boot: Drop references to startup_64 x86/boot: Drop redundant code setting the root device x86/boot: Omit compression buffer from PE/COFF image memory footprint x86/boot: Remove the 'bugger off' message ...
2023-10-31Merge tag 'x86-headers-2023-10-28' of ↵Linus Torvalds5-6/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 header file cleanup from Ingo Molnar: "Replace <asm/export.h> uses with <linux/export.h> and then remove <asm/export.h>" * tag 'x86-headers-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/headers: Remove <asm/export.h> x86/headers: Replace #include <asm/export.h> with #include <linux/export.h> x86/headers: Remove unnecessary #include <asm/export.h>
2023-10-31Merge tag 'objtool-core-2023-10-28' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: "Misc fixes and cleanups: - Fix potential MAX_NAME_LEN limit related build failures - Fix scripts/faddr2line symbol filtering bug - Fix scripts/faddr2line on LLVM=1 - Fix scripts/faddr2line to accept readelf output with mapping symbols - Minor cleanups" * tag 'objtool-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: scripts/faddr2line: Skip over mapping symbols in output from readelf scripts/faddr2line: Use LLVM addr2line and readelf if LLVM=1 scripts/faddr2line: Don't filter out non-function symbols from readelf objtool: Remove max symbol name length limitation objtool: Propagate early errors objtool: Use 'the fallthrough' pseudo-keyword x86/speculation, objtool: Use absolute relocations for annotations x86/unwind/orc: Remove redundant initialization of 'mid' pointer in __orc_find()
2023-10-31Merge tag 'sched-core-2023-10-28' of ↵Linus Torvalds1-2/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Fair scheduler (SCHED_OTHER) improvements: - Remove the old and now unused SIS_PROP code & option - Scan cluster before LLC in the wake-up path - Use candidate prev/recent_used CPU if scanning failed for cluster wakeup NUMA scheduling improvements: - Improve the VMA access-PID code to better skip/scan VMAs - Extend tracing to cover VMA-skipping decisions - Improve/fix the recently introduced sched_numa_find_nth_cpu() code - Generalize numa_map_to_online_node() Energy scheduling improvements: - Remove the EM_MAX_COMPLEXITY limit - Add tracepoints to track energy computation - Make the behavior of the 'sched_energy_aware' sysctl more consistent - Consolidate and clean up access to a CPU's max compute capacity - Fix uclamp code corner cases RT scheduling improvements: - Drive dl_rq->overloaded with dl_rq->pushable_dl_tasks updates - Drive the ->rto_mask with rt_rq->pushable_tasks updates Scheduler scalability improvements: - Rate-limit updates to tg->load_avg - On x86 disable IBRS when CPU is offline to improve single-threaded performance - Micro-optimize in_task() and in_interrupt() - Micro-optimize the PSI code - Avoid updating PSI triggers and ->rtpoll_total when there are no state changes Core scheduler infrastructure improvements: - Use saved_state to reduce some spurious freezer wakeups - Bring in a handful of fast-headers improvements to scheduler headers - Make the scheduler UAPI headers more widely usable by user-space - Simplify the control flow of scheduler syscalls by using lock guards - Fix sched_setaffinity() vs. CPU hotplug race Scheduler debuggability improvements: - Disallow writing invalid values to sched_rt_period_us - Fix a race in the rq-clock debugging code triggering warnings - Fix a warning in the bandwidth distribution code - Micro-optimize in_atomic_preempt_off() checks - Enforce that the tasklist_lock is held in for_each_thread() - Print the TGID in sched_show_task() - Remove the /proc/sys/kernel/sched_child_runs_first sysctl ... and misc cleanups & fixes" * tag 'sched-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits) sched/fair: Remove SIS_PROP sched/fair: Use candidate prev/recent_used CPU if scanning failed for cluster wakeup sched/fair: Scan cluster before scanning LLC in wake-up path sched: Add cpus_share_resources API sched/core: Fix RQCF_ACT_SKIP leak sched/fair: Remove unused 'curr' argument from pick_next_entity() sched/nohz: Update comments about NEWILB_KICK sched/fair: Remove duplicate #include sched/psi: Update poll => rtpoll in relevant comments sched: Make PELT acronym definition searchable sched: Fix stop_one_cpu_nowait() vs hotplug sched/psi: Bail out early from irq time accounting sched/topology: Rename 'DIE' domain to 'PKG' sched/psi: Delete the 'update_total' function parameter from update_triggers() sched/psi: Avoid updating PSI triggers and ->rtpoll_total when there are no state changes sched/headers: Remove comment referring to rq::cpu_load, since this has been removed sched/numa: Complete scanning of inactive VMAs when there is no alternative sched/numa: Complete scanning of partial VMAs regardless of PID activity sched/numa: Move up the access pid reset logic sched/numa: Trace decisions related to skipping VMAs ...
2023-10-31Merge tag 'x86_fpu_for_6.7_rc1' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu fixlet from Borislav Petkov: - kernel-doc fix * tag 'x86_fpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu/xstate: Address kernel-doc warning
2023-10-31Merge tag 'x86_platform_for_6.7_rc1' of ↵Linus Torvalds3-5/+11
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform updates from Borislav Petkov: - Make sure PCI function 4 IDs of AMD family 0x19, models 0x60-0x7f are actually used in the amd_nb.c enumeration - Add support for extracting NUMA information from devicetree for Hyper-V usages - Add PCI device IDs for the new AMD MI300 AI accelerators - Annotate an array in struct uv_rtc_timer_head with the new __counted_by attribute - Rework UV's NMI action parameter handling * tag 'x86_platform_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/amd_nb: Use Family 19h Models 60h-7Fh Function 4 IDs x86/numa: Add Devicetree support x86/of: Move the x86_flattree_get_config() call out of x86_dtb_init() x86/amd_nb: Add AMD Family MI300 PCI IDs x86/platform/uv: Annotate struct uv_rtc_timer_head with __counted_by x86/platform/uv: Rework NMI "action" modparam handling
2023-10-31Merge tag 'x86_cpu_for_6.7_rc1' of ↵Linus Torvalds2-1/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpuid updates from Borislav Petkov: - Make sure the "svm" feature flag is cleared from /proc/cpuinfo when virtualization support is disabled in the BIOS on AMD and Hygon platforms - A minor cleanup * tag 'x86_cpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/amd: Remove redundant 'break' statement x86/cpu: Clear SVM feature if disabled by BIOS
2023-10-31Merge tag 'x86_cache_for_6.7_rc1' of ↵Linus Torvalds4-95/+242
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: - Add support for non-contiguous capacity bitmasks being added to Intel's CAT implementation - Other improvements to resctrl code: better configuration, simplifications, debugging support, fixes * tag 'x86_cache_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Display RMID of resource group x86/resctrl: Add support for the files of MON groups only x86/resctrl: Display CLOSID for resource group x86/resctrl: Introduce "-o debug" mount option x86/resctrl: Move default group file creation to mount x86/resctrl: Unwind properly from rdt_enable_ctx() x86/resctrl: Rename rftype flags for consistency x86/resctrl: Simplify rftype flag definitions x86/resctrl: Add multiple tasks to the resctrl group at once Documentation/x86: Document resctrl's new sparse_masks x86/resctrl: Add sparse_masks file in info x86/resctrl: Enable non-contiguous CBMs in Intel CAT x86/resctrl: Rename arch_has_sparse_bitmaps x86/resctrl: Fix remaining kernel-doc warnings
2023-10-31Merge tag 'x86_bugs_for_6.7_rc1' of ↵Linus Torvalds3-55/+52
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 hw mitigation updates from Borislav Petkov: - A bunch of improvements, cleanups and fixlets to the SRSO mitigation machinery and other, general cleanups to the hw mitigations code, by Josh Poimboeuf - Improve the return thunk detection by objtool as it is absolutely important that the default return thunk is not used after returns have been patched. Future work to detect and report this better is pending - Other misc cleanups and fixes * tag 'x86_bugs_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86/retpoline: Document some thunk handling aspects x86/retpoline: Make sure there are no unconverted return thunks due to KCSAN x86/callthunks: Delete unused "struct thunk_desc" x86/vdso: Run objtool on vdso32-setup.o objtool: Fix return thunk patching in retpolines x86/srso: Remove unnecessary semicolon x86/pti: Fix kernel warnings for pti= and nopti cmdline options x86/calldepth: Rename __x86_return_skl() to call_depth_return_thunk() x86/nospec: Refactor UNTRAIN_RET[_*] x86/rethunk: Use SYM_CODE_START[_LOCAL]_NOALIGN macros x86/srso: Disentangle rethunk-dependent options x86/srso: Move retbleed IBPB check into existing 'has_microcode' code block x86/bugs: Remove default case for fully switched enums x86/srso: Remove 'pred_cmd' label x86/srso: Unexport untraining functions x86/srso: Improve i-cache locality for alias mitigation x86/srso: Fix unret validation dependencies x86/srso: Fix vulnerability reporting for missing microcode x86/srso: Print mitigation for retbleed IBPB case x86/srso: Print actual mitigation if requested mitigation isn't possible ...
2023-10-31Merge tag 'ras_core_for_6.7_rc1' of ↵Linus Torvalds4-26/+98
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS updates from Borislav Petkov: - Specify what error addresses reported on AMD are actually usable memory error addresses for further decoding * tag 'ras_core_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Cleanup mce_usable_address() x86/mce: Define amd_mce_usable_address() x86/MCE/AMD: Split amd_mce_is_memory_error()
2023-10-28Merge tag 'x86-urgent-2023-10-28' of ↵Linus Torvalds3-9/+42
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix a possible CPU hotplug deadlock bug caused by the new TSC synchronization code - Fix a legacy PIC discovery bug that results in device troubles on affected systems, such as non-working keybards, etc - Add a new Intel CPU model number to <asm/intel-family.h> * tag 'x86-urgent-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tsc: Defer marking TSC unstable to a worker x86/i8259: Skip probing when ACPI/MADT advertises PCAT compatibility x86/cpu: Add model number for Intel Arrow Lake mobile processor
2023-10-27x86/tsc: Defer marking TSC unstable to a workerThomas Gleixner1-1/+9
Tetsuo reported the following lockdep splat when the TSC synchronization fails during CPU hotplug: tsc: Marking TSC unstable due to check_tsc_sync_source failed WARNING: inconsistent lock state inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. ffffffff8cfa1c78 (watchdog_lock){?.-.}-{2:2}, at: clocksource_watchdog+0x23/0x5a0 {IN-HARDIRQ-W} state was registered at: _raw_spin_lock_irqsave+0x3f/0x60 clocksource_mark_unstable+0x1b/0x90 mark_tsc_unstable+0x41/0x50 check_tsc_sync_source+0x14f/0x180 sysvec_call_function_single+0x69/0x90 Possible unsafe locking scenario: lock(watchdog_lock); <Interrupt> lock(watchdog_lock); stack backtrace: _raw_spin_lock+0x30/0x40 clocksource_watchdog+0x23/0x5a0 run_timer_softirq+0x2a/0x50 sysvec_apic_timer_interrupt+0x6e/0x90 The reason is the recent conversion of the TSC synchronization function during CPU hotplug on the control CPU to a SMP function call. In case that the synchronization with the upcoming CPU fails, the TSC has to be marked unstable via clocksource_mark_unstable(). clocksource_mark_unstable() acquires 'watchdog_lock', but that lock is taken with interrupts enabled in the watchdog timer callback to minimize interrupt disabled time. That's obviously a possible deadlock scenario, Before that change the synchronization function was invoked in thread context so this could not happen. As it is not crucical whether the unstable marking happens slightly delayed, defer the call to a worker thread which avoids the lock context problem. Fixes: 9d349d47f0e3 ("x86/smpboot: Make TSC synchronization function call based") Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87zg064ceg.ffs@tglx
2023-10-27x86/i8259: Skip probing when ACPI/MADT advertises PCAT compatibilityThomas Gleixner2-8/+33
David and a few others reported that on certain newer systems some legacy interrupts fail to work correctly. Debugging revealed that the BIOS of these systems leaves the legacy PIC in uninitialized state which makes the PIC detection fail and the kernel switches to a dummy implementation. Unfortunately this fallback causes quite some code to fail as it depends on checks for the number of legacy PIC interrupts or the availability of the real PIC. In theory there is no reason to use the PIC on any modern system when IO/APIC is available, but the dependencies on the related checks cannot be resolved trivially and on short notice. This needs lots of analysis and rework. The PIC detection has been added to avoid quirky checks and force selection of the dummy implementation all over the place, especially in VM guest scenarios. So it's not an option to revert the relevant commit as that would break a lot of other scenarios. One solution would be to try to initialize the PIC on detection fail and retry the detection, but that puts the burden on everything which does not have a PIC. Fortunately the ACPI/MADT table header has a flag field, which advertises in bit 0 that the system is PCAT compatible, which means it has a legacy 8259 PIC. Evaluate that bit and if set avoid the detection routine and keep the real PIC installed, which then gets initialized (for nothing) and makes the rest of the code with all the dependencies work again. Fixes: e179f6914152 ("x86, irq, pic: Probe for legacy PIC and set legacy_pic appropriately") Reported-by: David Lazar <dlazar@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: David Lazar <dlazar@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Cc: stable@vger.kernel.org Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218003 Link: https://lore.kernel.org/r/875y2u5s8g.ffs@tglx
2023-10-26x86/apic/msi: Fix misconfigured non-maskable MSI quirkKoichiro Den1-5/+3
commit ef8dd01538ea ("genirq/msi: Make interrupt allocation less convoluted"), reworked the code so that the x86 specific quirk for affinity setting of non-maskable PCI/MSI interrupts is not longer activated if necessary. This could be solved by restoring the original logic in the core MSI code, but after a deeper analysis it turned out that the quirk flag is not required at all. The quirk is only required when the PCI/MSI device cannot mask the MSI interrupts, which in turn also prevents reservation mode from being enabled for the affected interrupt. This allows ot remove the NOMASK quirk bit completely as msi_set_affinity() can instead check whether reservation mode is enabled for the interrupt, which gives exactly the same answer. Even in the momentary non-existing case that the reservation mode would be not set for a maskable MSI interrupt this would not cause any harm as it just would cause msi_set_affinity() to go needlessly through the functionaly equivalent slow path, which works perfectly fine with maskable interrupts as well. Rework msi_set_affinity() to query the reservation mode and remove all NOMASK quirk logic from the core code. [ tglx: Massaged changelog ] Fixes: ef8dd01538ea ("genirq/msi: Make interrupt allocation less convoluted") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231026032036.2462428-1-den@valinux.co.jp