summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2025-05-25Merge branch 'locking/futex' into locking/core, to pick up pending futex changesIngo Molnar35-564/+2479
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-21selftests/futex: Fix spelling mistake "unitiliazed" -> "uninitialized"Colin Ian King1-1/+1
There is a spelling mistake in a fail error message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20250520080657.30726-1-colin.i.king@gmail.com
2025-05-21futex: Correct the kernedoc return value for futex_wait_setup().Sebastian Andrzej Siewior1-1/+2
The kerneldoc for futex_wait_setup() states it can return "0" or "<1". This isn't true because the error case is "<0" not less than 1. Document that <0 is returned on error. Drop the possible return values and state possible reasons. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20250517151455.1065363-6-bigeasy@linutronix.de
2025-05-21tools headers: Synchronize prctl.h ABI headerSebastian Andrzej Siewior3-11/+15
The prctl.h ABI header was slightly updated during the development of the interface. In particular the "immutable" parameter became a bit in the option argument. Synchronize prctl.h ABI header again and make use of the definition in the testsuite and "perf bench futex". Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20250517151455.1065363-5-bigeasy@linutronix.de
2025-05-21futex: Use RCU_INIT_POINTER() in futex_mm_init().Sebastian Andrzej Siewior1-8/+1
There is no need for an explicit NULL pointer initialisation plus a comment why it is okay. RCU_INIT_POINTER() can be used for NULL initialisations and it is documented. This has been build tested with gcc version 9.3.0 (Debian 9.3.0-22) on a x86-64 defconfig. Fixes: 094ac8cff7858 ("futex: Relax the rcu_assign_pointer() assignment of mm->futex_phash in futex_mm_init()") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250517151455.1065363-4-bigeasy@linutronix.de
2025-05-21selftests/futex: Use TAP output in futex_numa_mpolSebastian Andrzej Siewior1-33/+32
Use TAP output for easier automated testing. Suggested-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20250517151455.1065363-3-bigeasy@linutronix.de
2025-05-21selftests/futex: Use TAP output in futex_priv_hashSebastian Andrzej Siewior1-86/+62
Use TAP output for easier automated testing. Suggested-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20250517151455.1065363-2-bigeasy@linutronix.de
2025-05-16futex: Fix kernel-doc commentsBorislav Petkov (AMD)2-3/+3
Fix those: ./kernel/futex/futex.h:208: warning: Function parameter or struct member 'drop_hb_ref' not described in 'futex_q' ./kernel/futex/waitwake.c:343: warning: expecting prototype for futex_wait_queue(). Prototype was for futex_do_wait() instead ./kernel/futex/waitwake.c:594: warning: Function parameter or struct member 'task' not described in 'futex_wait_setup' Fixes: 93f1b6d79a73 ("futex: Move futex_queue() into futex_wait_setup()") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250512185641.0450a99b@canb.auug.org.au # report Link: https://lore.kernel.org/r/20250515171641.24073-1-bp@kernel.org # submission
2025-05-11futex: Relax the rcu_assign_pointer() assignment of mm->futex_phash in ↵Ingo Molnar1-1/+8
futex_mm_init() The following commit added an rcu_assign_pointer() assignment to futex_mm_init() in <linux/futex.h>: bd54df5ea7ca ("futex: Allow to resize the private local hash") Which breaks the build on older compilers (gcc-9, x86-64 defconfig): CC io_uring/futex.o In file included from ./arch/x86/include/generated/asm/rwonce.h:1, from ./include/linux/compiler.h:390, from ./include/linux/array_size.h:5, from ./include/linux/kernel.h:16, from io_uring/futex.c:2: ./include/linux/futex.h: In function 'futex_mm_init': ./include/linux/rcupdate.h:555:36: error: dereferencing pointer to incomplete type 'struct futex_private_hash' The problem is that this variant of rcu_assign_pointer() wants to know the full type of 'struct futex_private_hash', which type is local to futex.c: kernel/futex/core.c:struct futex_private_hash { There are a couple of mechanical solutions for this bug: - we can uninline futex_mm_init() and move it into futex/core.c - or we can share the structure definition with kernel/fork.c. But both of these solutions have disadvantages: the first one adds runtime overhead, while the second one dis-encapsulates private futex types. A third solution, implemented by this patch, is to just initialize mm->futex_phash with NULL like the patch below, it's not like this new MM's ->futex_phash can be observed externally until the task is inserted into the task list, which guarantees full store ordering. The relaxation of this initialization might also give a tiny speedup on certain platforms. Fixes: bd54df5ea7ca ("futex: Allow to resize the private local hash") Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: André Almeida <andrealmeid@igalia.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/aB8SI00EHBri23lB@gmail.com
2025-05-08futex: Fix outdated comment in struct restart_blockNam Cao1-1/+1
Since commit 2070887fdeac ("futex: fix restart in wait_requeue_pi"), futex_wait_requeue_pi() no longer uses restart_block. Update the comment accordingly. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250428193445.4571-1-namcao@linutronix.de
2025-05-06locking/lockdep: Add number of dynamic keys to /proc/lockdep_statsWaiman Long3-0/+6
There have been recent reports about running out of lockdep keys: MAX_LOCKDEP_KEYS too low! One possible reason is that too many dynamic keys have been registered. A possible culprit is the lockdep_register_key() call in qdisc_alloc() of net/sched/sch_generic.c. Currently, there is no way to find out how many dynamic keys have been registered. Add such a stat to the /proc/lockdep_stats to get better clarity. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Bill Wendling <morbo@google.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20250506042049.50060-4-boqun.feng@gmail.com
2025-05-06locking/lockdep: Prevent abuse of lockdep subclassWaiman Long1-0/+3
To catch the code trying to use a subclass value >= MAX_LOCKDEP_SUBCLASSES (8), add a DEBUG_LOCKS_WARN_ON() statement to notify the users that such a large value is not allowed. [ boqun: Reword the commit log with a more objective tone ] Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Bill Wendling <morbo@google.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20250506042049.50060-3-boqun.feng@gmail.com
2025-05-06locking/lockdep: Move hlock_equal() to the respective #ifdefferyAndy Shevchenko1-35/+35
When hlock_equal() is unused, it prevents kernel builds with clang, `make W=1` and CONFIG_WERROR=y, CONFIG_LOCKDEP=y and CONFIG_LOCKDEP_SMALL=n: lockdep.c:2005:20: error: unused function 'hlock_equal' [-Werror,-Wunused-function] Fix this by moving the function to the respective existing ifdeffery for its the only user. See also: 6863f5643dd7 ("kbuild: allow Clang to find unused static inline functions for W=1 build") Fixes: 68e305678583 ("lockdep: Adjust check_redundant() for recursive read change") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Bill Wendling <morbo@google.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20250506042049.50060-2-boqun.feng@gmail.com
2025-05-04Linux 6.15-rc5v6.15-rc5Linus Torvalds1-1/+1
2025-05-04Merge tag 'perf-tools-fixes-for-v6.15-2025-05-04' of ↵Linus Torvalds2-2/+13
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Namhyung Kim: "Just a couple of build fixes on arm64" * tag 'perf-tools-fixes-for-v6.15-2025-05-04' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf tools: Fix in-source libperf build perf tools: Fix arm64 build by generating unistd_64.h
2025-05-04Merge tag 'trace-v6.15-rc4' of ↵Linus Torvalds3-6/+9
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix read out of bounds bug in tracing_splice_read_pipe() The size of the sub page being read can now be greater than a page. But the buffer used in tracing_splice_read_pipe() only allocates a page size. The data copied to the buffer is the amount in sub buffer which can overflow the buffer. Use min((size_t)trace_seq_used(&iter->seq), PAGE_SIZE) to limit the amount copied to the buffer to a max of PAGE_SIZE. - Fix the test for NULL from "!filter_hash" to "!*filter_hash" The add_next_hash() function checked for NULL at the wrong pointer level. - Do not use the array in trace_adjust_address() if there are no elements The trace_adjust_address() finds the offset of a module that was stored in the persistent buffer when reading the previous boot buffer to see if the address belongs to a module that was loaded in the previous boot. An array is created that matches currently loaded modules with previously loaded modules. The trace_adjust_address() uses that array to find the new offset of the address that's in the previous buffer. But if no module was loaded, it ends up reading the last element in an array that was never allocated. Check if nr_entries is zero and exit out early if it is. - Remove nested lock of trace_event_sem in print_event_fields() The print_event_fields() function iterates over the ftrace_events list and requires the trace_event_sem semaphore held for read. But this function is always called with that semaphore held for read. Remove the taking of the semaphore and replace it with lockdep_assert_held_read(&trace_event_sem) * tag 'trace-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Do not take trace_event_sem in print_event_fields() tracing: Fix trace_adjust_address() when there is no modules in scratch area ftrace: Fix NULL memory allocation check tracing: Fix oob write in trace_seq_to_buffer()
2025-05-04Merge tag 'parisc-for-6.15-rc5' of ↵Linus Torvalds1-3/+13
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fix from Helge Deller: "Fix a double SIGFPE crash" * tag 'parisc-for-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Fix double SIGFPE crash
2025-05-04parisc: Fix double SIGFPE crashHelge Deller1-3/+13
Camm noticed that on parisc a SIGFPE exception will crash an application with a second SIGFPE in the signal handler. Dave analyzed it, and it happens because glibc uses a double-word floating-point store to atomically update function descriptors. As a result of lazy binding, we hit a floating-point store in fpe_func almost immediately. When the T bit is set, an assist exception trap occurs when when the co-processor encounters *any* floating-point instruction except for a double store of register %fr0. The latter cancels all pending traps. Let's fix this by clearing the Trap (T) bit in the FP status register before returning to the signal handler in userspace. The issue can be reproduced with this test program: root@parisc:~# cat fpe.c static void fpe_func(int sig, siginfo_t *i, void *v) { sigset_t set; sigemptyset(&set); sigaddset(&set, SIGFPE); sigprocmask(SIG_UNBLOCK, &set, NULL); printf("GOT signal %d with si_code %ld\n", sig, i->si_code); } int main() { struct sigaction action = { .sa_sigaction = fpe_func, .sa_flags = SA_RESTART|SA_SIGINFO }; sigaction(SIGFPE, &action, 0); feenableexcept(FE_OVERFLOW); return printf("%lf\n",1.7976931348623158E308*1.7976931348623158E308); } root@parisc:~# gcc fpe.c -lm root@parisc:~# ./a.out Floating point exception root@parisc:~# strace -f ./a.out execve("./a.out", ["./a.out"], 0xf9ac7034 /* 20 vars */) = 0 getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0 ... rt_sigaction(SIGFPE, {sa_handler=0x1110a, sa_mask=[], sa_flags=SA_RESTART|SA_SIGINFO}, NULL, 8) = 0 --- SIGFPE {si_signo=SIGFPE, si_code=FPE_FLTOVF, si_addr=0x1078f} --- --- SIGFPE {si_signo=SIGFPE, si_code=FPE_FLTOVF, si_addr=0xf8f21237} --- +++ killed by SIGFPE +++ Floating point exception Signed-off-by: Helge Deller <deller@gmx.de> Suggested-by: John David Anglin <dave.anglin@bell.net> Reported-by: Camm Maguire <camm@maguirefamily.org> Cc: stable@vger.kernel.org
2025-05-04Merge tag 'edac_urgent_for_v6.15_rc5' of ↵Linus Torvalds2-4/+7
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fixes from Borislav Petkov: - Test the correct structure member when handling correctable errors and avoid spurious interrupts, in altera_edac * tag 'edac_urgent_for_v6.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/altera: Set DDR and SDMMC interrupt mask before registration EDAC/altera: Test the correct error reg offset
2025-05-04Merge tag 'x86-urgent-2025-05-04' of ↵Linus Torvalds3-4/+43
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: "Fix SEV-SNP memory acceptance from the EFI stub for guests running at VMPL >0" * tag 'x86-urgent-2025-05-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/sev: Support memory acceptance in the EFI stub under SVSM
2025-05-04Merge tag 'perf-urgent-2025-05-04' of ↵Linus Torvalds4-6/+30
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc perf fixes from Ingo Molnar: - Require group events for branch counter groups and PEBS counter snapshotting groups to be x86 events. - Fix the handling of counter-snapshotting of non-precise events, where counter values may move backwards a bit, temporarily, confusing the code. - Restrict perf/KVM PEBS to guest-owned events. * tag 'perf-urgent-2025-05-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value. perf/x86/intel/ds: Fix counter backwards of non-precise events counters-snapshotting perf/x86/intel: Check the X86 leader for pebs_counter_event_group perf/x86/intel: Only check the group flag for X86 leader
2025-05-04Merge tag 'irq-urgent-2025-05-04' of ↵Linus Torvalds2-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Ingo Molnar: - Prevent NULL pointer dereference in msi_domain_debug_show() - Fix crash in the qcom-mpm irqchip driver when configuring interrupts for non-wake GPIOs * tag 'irq-urgent-2025-05-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/qcom-mpm: Prevent crash when trying to handle non-wake GPIOs genirq/msi: Prevent NULL pointer dereference in msi_domain_debug_show()
2025-05-04x86/boot/sev: Support memory acceptance in the EFI stub under SVSMArd Biesheuvel3-4/+43
Commit: d54d610243a4 ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance") provided a fix for SEV-SNP memory acceptance from the EFI stub when running at VMPL #0. However, that fix was insufficient for SVSM SEV-SNP guests running at VMPL >0, as those rely on a SVSM calling area, which is a shared buffer whose address is programmed into a SEV-SNP MSR, and the SEV init code that sets up this calling area executes much later during the boot. Given that booting via the EFI stub at VMPL >0 implies that the firmware has configured this calling area already, reuse it for performing memory acceptance in the EFI stub. Fixes: fcd042e86422 ("x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0") Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250428174322.2780170-2-ardb+git@google.com
2025-05-04Merge tag 'arm64-fixes' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "Add missing sentinels to the arm64 Spectre-BHB MIDR arrays, otherwise is_midr_in_range_list() reads beyond the end of these arrays" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: errata: Add missing sentinels to Spectre-BHB MIDR arrays
2025-05-04Merge tag 'i2c-for-6.15-rc5' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: - imx-lpi2c: fix clock error handling sequence in probe * tag 'i2c-for-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: imx-lpi2c: Fix clock count when probe defers
2025-05-03Merge tag 'sound-6.15-rc5' of ↵Linus Torvalds30-43/+206
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A bunch of small fixes. Mostly driver specific. - An OOB access fix in core UMP rawmidi conversion code - Fix for ASoC DAPM hw_params widget sequence - Make retry of usb_set_interface() errors for flaky devices - Fix redundant USB MIDI name strings - Quirks for various HP and ASUS models with HD-audio, and Jabra Evolve 65 USB-audio - Cirrus Kunit test fixes - Various fixes for ASoC Intel, stm32, renesas, imx-card, and simple-card" * tag 'sound-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (30 commits) ASoC: amd: ps: fix for irq handler return status ASoC: simple-card-utils: Fix pointer check in graph_util_parse_link_direction ASoC: intel/sdw_utils: Add volume limit to cs35l56 speakers ASoC: intel/sdw_utils: Add volume limit to cs42l43 speakers ASoC: stm32: sai: add a check on minimal kernel frequency ASoC: stm32: sai: skip useless iterations on kernel rate loop ALSA: hda/realtek - Add more HP laptops which need mute led fixup ALSA: hda/realtek: Fix built-mic regression on other ASUS models ASoC: Intel: catpt: avoid type mismatch in dev_dbg() format ALSA: usb-audio: Fix duplicated name in MIDI substream names ALSA: ump: Fix buffer overflow at UMP SysEx message conversion ALSA: usb-audio: Add second USB ID for Jabra Evolve 65 headset ALSA: hda/realtek: Add quirk for HP Spectre x360 15-df1xxx ALSA: hda: Apply volume control on speaker+lineout for HP EliteStudio AIO ASoC: Intel: bytcr_rt5640: Add DMI quirk for Acer Aspire SW3-013 ASoC: amd: acp: Fix devm_snd_soc_register_card(acp-pdm-mach) failure ASoC: amd: acp: Fix NULL pointer deref in acp_i2s_set_tdm_slot ASoC: amd: acp: Fix NULL pointer deref on acp resume path ASoC: renesas: rz-ssi: Use NOIRQ_SYSTEM_SLEEP_PM_OPS() ASoC: soc-acpi-intel-ptl-match: add empty item to ptl_cs42l43_l3[] ...
2025-05-03futex,selftests: Add another FUTEX2_NUMA selftestPeter Zijlstra2-1/+264
Implement a simple NUMA aware spinlock for testing and howto purposes. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2025-05-03selftests/futex: Add futex_numa_mpolSebastian Andrzej Siewior5-1/+290
Test the basic functionality for the NUMA and MPOL flags: - FUTEX2_NUMA should take the NUMA node which is after the uaddr and use it. - Only update the node if FUTEX_NO_NODE was set by the user - FUTEX2_MPOL should use the memory based on the policy. I attempted to set the node with mbind() and then use this with MPOL but this fails and futex falls back to the default node for the current CPU. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-22-bigeasy@linutronix.de
2025-05-03selftests/futex: Add futex_priv_hashSebastian Andrzej Siewior4-2/+323
Test the basic functionality of the private hash: - Upon start, with no threads there is no private hash. - The first thread initializes the private hash. - More than four threads will increase the size of the private hash if the system has more than 16 CPUs online. - Once the user sets the size of private hash, auto scaling is disabled. - The user is only allowed to use numbers to the power of two. - The user may request the global or make the hash immutable. - Once the global hash has been set or the hash has been made immutable, further changes are not allowed. - Futex operations should work the whole time. It must be possible to hold a lock, such a PI initialised mutex, during the resize operation. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-21-bigeasy@linutronix.de
2025-05-03selftests/futex: Build without headers nonsensePeter Zijlstra1-0/+18
Make it build without relying on recent headers. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2025-05-03tools/perf: Allow to select the number of hash bucketsSebastian Andrzej Siewior8-1/+101
Add the -b/ --buckets argument to specify the number of hash buckets for the private futex hash. This is directly passed to prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_SET_SLOTS, buckets, immutable) and must return without an error if specified. The `immutable' is 0 by default and can be set to 1 via the -I/ --immutable argument. The size of the private hash is verified with PR_FUTEX_HASH_GET_SLOTS. If PR_FUTEX_HASH_GET_SLOTS failed then it is assumed that an older kernel was used without the support and that the global hash is used. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-20-bigeasy@linutronix.de
2025-05-03tools headers: Synchronize prctl.h ABI headerSebastian Andrzej Siewior1-1/+43
Synchronize prctl.h with current uapi version after adding PR_FUTEX_HASH. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-19-bigeasy@linutronix.de
2025-05-03futex: Implement FUTEX2_MPOLPeter Zijlstra5-18/+115
Extend the futex2 interface to be aware of mempolicy. When FUTEX2_MPOL is specified and there is a MPOL_PREFERRED or home_node specified covering the futex address, use that hash-map. Notably, in this case the futex will go to the global node hashtable, even if it is a PRIVATE futex. When FUTEX2_NUMA|FUTEX2_MPOL is specified and the user specified node value is FUTEX_NO_NODE, the MPOL lookup (as described above) will be tried first before reverting to setting node to the local node. [bigeasy: add CONFIG_FUTEX_MPOL, add MPOL to FUTEX2_VALID_MASK, write the node only to user if FUTEX_NO_NODE was supplied] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-18-bigeasy@linutronix.de
2025-05-03futex: Implement FUTEX2_NUMAPeter Zijlstra4-20/+123
Extend the futex2 interface to be numa aware. When FUTEX2_NUMA is specified for a futex, the user value is extended to two words (of the same size). The first is the user value we all know, the second one will be the node to place this futex on. struct futex_numa_32 { u32 val; u32 node; }; When node is set to ~0, WAIT will set it to the current node_id such that WAKE knows where to find it. If userspace corrupts the node value between WAIT and WAKE, the futex will not be found and no wakeup will happen. When FUTEX2_NUMA is not set, the node is simply an extension of the hash, such that traditional futexes are still interleaved over the nodes. This is done to avoid having to have a separate !numa hash-table. [bigeasy: ensure to have at least hashsize of 4 in futex_init(), add pr_info() for size and allocation information. Cast the naddr math to void*] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-17-bigeasy@linutronix.de
2025-05-03futex: Allow to make the private hash immutableSebastian Andrzej Siewior2-6/+45
My initial testing showed that: perf bench futex hash reported less operations/sec with private hash. After using the same amount of buckets in the private hash as used by the global hash then the operations/sec were about the same. This changed once the private hash became resizable. This feature added an RCU section and reference counting via atomic inc+dec operation into the hot path. The reference counting can be avoided if the private hash is made immutable. Extend PR_FUTEX_HASH_SET_SLOTS by a fourth argument which denotes if the private should be made immutable. Once set (to true) the a further resize is not allowed (same if set to global hash). Add PR_FUTEX_HASH_GET_IMMUTABLE which returns true if the hash can not be changed. Update "perf bench" suite. For comparison, results of "perf bench futex hash -s": - Xeon CPU E5-2650, 2 NUMA nodes, total 32 CPUs: - Before the introducing task local hash shared Averaged 1.487.148 operations/sec (+- 0,53%), total secs = 10 private Averaged 2.192.405 operations/sec (+- 0,07%), total secs = 10 - With the series shared Averaged 1.326.342 operations/sec (+- 0,41%), total secs = 10 -b128 Averaged 141.394 operations/sec (+- 1,15%), total secs = 10 -Ib128 Averaged 851.490 operations/sec (+- 0,67%), total secs = 10 -b8192 Averaged 131.321 operations/sec (+- 2,13%), total secs = 10 -Ib8192 Averaged 1.923.077 operations/sec (+- 0,61%), total secs = 10 128 is the default allocation of hash buckets. 8192 was the previous amount of allocated hash buckets. - Xeon(R) CPU E7-8890 v3, 4 NUMA nodes, total 144 CPUs: - Before the introducing task local hash shared Averaged 1.810.936 operations/sec (+- 0,26%), total secs = 20 private Averaged 2.505.801 operations/sec (+- 0,05%), total secs = 20 - With the series shared Averaged 1.589.002 operations/sec (+- 0,25%), total secs = 20 -b1024 Averaged 42.410 operations/sec (+- 0,20%), total secs = 20 -Ib1024 Averaged 740.638 operations/sec (+- 1,51%), total secs = 20 -b65536 Averaged 48.811 operations/sec (+- 1,35%), total secs = 20 -Ib65536 Averaged 1.963.165 operations/sec (+- 0,18%), total secs = 20 1024 is the default allocation of hash buckets. 65536 was the previous amount of allocated hash buckets. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://lore.kernel.org/r/20250416162921.513656-16-bigeasy@linutronix.de
2025-05-03futex: Allow to resize the private local hashSebastian Andrzej Siewior4-21/+281
The mm_struct::futex_hash_lock guards the futex_hash_bucket assignment/ replacement. The futex_hash_allocate()/ PR_FUTEX_HASH_SET_SLOTS operation can now be invoked at runtime and resize an already existing internal private futex_hash_bucket to another size. The reallocation is based on an idea by Thomas Gleixner: The initial allocation of struct futex_private_hash sets the reference count to one. Every user acquires a reference on the local hash before using it and drops it after it enqueued itself on the hash bucket. There is no reference held while the task is scheduled out while waiting for the wake up. The resize process allocates a new struct futex_private_hash and drops the initial reference. Synchronized with mm_struct::futex_hash_lock it is checked if the reference counter for the currently used mm_struct::futex_phash is marked as DEAD. If so, then all users enqueued on the current private hash are requeued on the new private hash and the new private hash is set to mm_struct::futex_phash. Otherwise the newly allocated private hash is saved as mm_struct::futex_phash_new and the rehashing and reassigning is delayed to the futex_hash() caller once the reference counter is marked DEAD. The replacement is not performed at rcuref_put() time because certain callers, such as futex_wait_queue(), drop their reference after changing the task state. This change will be destroyed once the futex_hash_lock is acquired. The user can change the number slots with PR_FUTEX_HASH_SET_SLOTS multiple times. An increase and decrease is allowed and request blocks until the assignment is done. The private hash allocated at thread creation is changed from 16 to 16 <= 4 * number_of_threads <= global_hash_size where number_of_threads can not exceed the number of online CPUs. Should the user PR_FUTEX_HASH_SET_SLOTS then the auto scaling is disabled. [peterz: reorganize the code to avoid state tracking and simplify new object handling, block the user until changes are in effect, allow increase and decrease of the hash]. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-15-bigeasy@linutronix.de
2025-05-03futex: Allow automatic allocation of process wide futex hashSebastian Andrzej Siewior3-0/+39
Allocate a private futex hash with 16 slots if a task forks its first thread. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-14-bigeasy@linutronix.de
2025-05-03futex: Add basic infrastructure for local task local hashSebastian Andrzej Siewior8-21/+244
The futex hash is system wide and shared by all tasks. Each slot is hashed based on futex address and the VMA of the thread. Due to randomized VMAs (and memory allocations) the same logical lock (pointer) can end up in a different hash bucket on each invocation of the application. This in turn means that different applications may share a hash bucket on the first invocation but not on the second and it is not always clear which applications will be involved. This can result in high latency's to acquire the futex_hash_bucket::lock especially if the lock owner is limited to a CPU and can not be effectively PI boosted. Introduce basic infrastructure for process local hash which is shared by all threads of process. This hash will only be used for a PROCESS_PRIVATE FUTEX operation. The hashmap can be allocated via: prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_SET_SLOTS, num); A `num' of 0 means that the global hash is used instead of a private hash. Other values for `num' specify the number of slots for the hash and the number must be power of two, starting with two. The prctl() returns zero on success. This function can only be used before a thread is created. The current status for the private hash can be queried via: num = prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_GET_SLOTS); which return the current number of slots. The value 0 means that the global hash is used. Values greater than 0 indicate the number of slots that are used. A negative number indicates an error. For optimisation, for the private hash jhash2() uses only two arguments the address and the offset. This omits the VMA which is always the same. [peterz: Use 0 for global hash. A bit shuffling and renaming. ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-13-bigeasy@linutronix.de
2025-05-03futex: Create helper function to initialize a hash slotSebastian Andrzej Siewior1-5/+9
Factor out the futex_hash_bucket initialisation into a helpr function. The helper function will be used in a follow up patch implementing process private hash buckets. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-12-bigeasy@linutronix.de
2025-05-03futex: Introduce futex_q_lockptr_lock()Sebastian Andrzej Siewior4-6/+53
futex_lock_pi() and __fixup_pi_state_owner() acquire the futex_q::lock_ptr without holding a reference assuming the previously obtained hash bucket and the assigned lock_ptr are still valid. This isn't the case once the private hash can be resized and becomes invalid after the reference drop. Introduce futex_q_lockptr_lock() to lock the hash bucket recorded in futex_q::lock_ptr. The lock pointer is read in a RCU section to ensure that it does not go away if the hash bucket has been replaced and the old pointer has been observed. After locking the pointer needs to be compared to check if it changed. If so then the hash bucket has been replaced and the user has been moved to the new one and lock_ptr has been updated. The lock operation needs to be redone in this case. The locked hash bucket is not returned. A special case is an early return in futex_lock_pi() (due to signal or timeout) and a successful futex_wait_requeue_pi(). In both cases a valid futex_q::lock_ptr is expected (and its matching hash bucket) but since the waiter has been removed from the hash this can no longer be guaranteed. Therefore before the waiter is removed and a reference is acquired which is later dropped by the waiter to avoid a resize. Add futex_q_lockptr_lock() and use it. Acquire an additional reference in requeue_pi_wake_futex() and futex_unlock_pi() while the futex_q is removed, denote this extra reference in futex_q::drop_hb_ref and let the waiter drop the reference in this case. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-11-bigeasy@linutronix.de
2025-05-03futex: Decrease the waiter count before the unlock operationSebastian Andrzej Siewior2-5/+5
To support runtime resizing of the process private hash, it's required to not use the obtained hash bucket once the reference count has been dropped. The reference will be dropped after the unlock of the hash bucket. The amount of waiters is decremented after the unlock operation. There is no requirement that this needs to happen after the unlock. The increment happens before acquiring the lock to signal early that there will be a waiter. The waiter can avoid blocking on the lock if it is known that there will be no waiter. There is no difference in terms of ordering if the decrement happens before or after the unlock. Decrease the waiter count before the unlock operation. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-10-bigeasy@linutronix.de
2025-05-03futex: Acquire a hash reference in futex_wait_multiple_setup()Sebastian Andrzej Siewior1-0/+6
futex_wait_multiple_setup() changes task_struct::__state to !TASK_RUNNING and then enqueues on multiple futexes. Every futex_q_lock() acquires a reference on the global hash which is dropped later. If a rehash is in progress then the loop will block on mm_struct::futex_hash_bucket for the rehash to complete and this will lose the previously set task_struct::__state. Acquire a reference on the local hash to avoiding blocking on mm_struct::futex_hash_bucket. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-9-bigeasy@linutronix.de
2025-05-03futex: Create private_hash() get/put classPeter Zijlstra2-0/+20
This gets us: fph = futex_private_hash(key) /* gets fph and inc users */ futex_private_hash_get(fph) /* inc users */ futex_private_hash_put(fph) /* dec users */ Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-8-bigeasy@linutronix.de
2025-05-03futex: Create futex_hash() get/put classPeter Zijlstra5-24/+30
This gets us: hb = futex_hash(key) /* gets hb and inc users */ futex_hash_get(hb) /* inc users */ futex_hash_put(hb) /* dec users */ Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-7-bigeasy@linutronix.de
2025-05-03futex: Create hb scopesPeter Zijlstra4-474/+493
Create explicit scopes for hb variables; almost pure re-indent. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-6-bigeasy@linutronix.de
2025-05-03futex: Pull futex_hash() out of futex_q_lock()Peter Zijlstra4-10/+8
Getting the hash bucket and queuing it are two distinct actions. In light of wanting to add a put hash bucket function later, untangle them. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-5-bigeasy@linutronix.de
2025-05-03futex: Move futex_queue() into futex_wait_setup()Peter Zijlstra4-43/+42
futex_wait_setup() has a weird calling convention in order to return hb to use as an argument to futex_queue(). Mostly such that requeue can have an extra test in between. Reorder code a little to get rid of this and keep the hb usage inside futex_wait_setup(). [bigeasy: fixes] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-4-bigeasy@linutronix.de
2025-05-03mm: Add vmalloc_huge_node()Peter Zijlstra3-8/+30
To enable node specific hash-tables using huge pages if possible. [bigeasy: use __vmalloc_node_range_noprof(), add nommu bits, inline vmalloc_huge] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250416162921.513656-3-bigeasy@linutronix.de
2025-05-03rcuref: Provide rcuref_is_dead()Sebastian Andrzej Siewior1-1/+21
rcuref_read() returns the number of references that are currently held. If 0 is returned then it is not safe to assume that the object ca be scheduled for deconstruction because it is marked DEAD. This happens if the return value of rcuref_put() is ignored and assumptions are made. If 0 is returned then the counter transitioned from 0 to RCUREF_NOREF. If rcuref_put() did not return to the caller then the counter did not yet transition from RCUREF_NOREF to RCUREF_DEAD. This means that there is still a chance that the counter will transition from RCUREF_NOREF to 0 meaning it is still valid and must not be deconstructed. In this brief window rcuref_read() will return 0. Provide rcuref_is_dead() to determine if the counter is marked as RCUREF_DEAD. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-2-bigeasy@linutronix.de
2025-05-03Merge tag 'spi-fix-v6.15-rc4' of ↵Linus Torvalds5-19/+21
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A fairly small pile of fixes, plus one new compatible string addition to the Synopsis driver for a new platform. The most notable thing is the fix for divide by zeros in spi-mem if an operation has no dummy bytes" * tag 'spi-fix-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: tegra114: Don't fail set_cs_timing when delays are zero spi: spi-qpic-snand: fix NAND_READ_LOCATION_2 register handling spi: spi-mem: Add fix to avoid divide error spi: dt-bindings: snps,dw-apb-ssi: Add compatible for SOPHGO SG2042 SoC spi: dt-bindings: snps,dw-apb-ssi: Merge duplicate compatible entry spi: spi-qpic-snand: propagate errors from qcom_spi_block_erase() spi: stm32-ospi: Fix an error handling path in stm32_ospi_probe()