summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2025-05-05drm: add drm_file_err function to add process infoSunil Khatri1-0/+3
Add a drm helper function which appends the process information for the drm_file over drm_err formatted output. v5: change to macro from function (Christian Koenig) add helper functions for lock/unlock (Christian Koenig) v6: remove __maybe_unused and make function inline (Jani Nikula) remove drm_print.h v7: Use va_format and %pV to concatenate fmt and vargs (Jani Nikula) v8: Code formatting and typos (Ursulin tvrtko) Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-05RDMA/mlx5: Fix error flow upon firmware failure for RQ destructionPatrisious Haddad1-0/+1
Upon RQ destruction if the firmware command fails which is the last resource to be destroyed some SW resources were already cleaned regardless of the failure. Now properly rollback the object to its original state upon such failure. In order to avoid a use-after free in case someone tries to destroy the object again, which results in the following kernel trace: refcount_t: underflow; use-after-free. WARNING: CPU: 0 PID: 37589 at lib/refcount.c:28 refcount_warn_saturate+0xf4/0x148 Modules linked in: rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) rfkill mlx5_core(OE) mlxdevm(OE) ib_uverbs(OE) ib_core(OE) psample mlxfw(OE) mlx_compat(OE) macsec tls pci_hyperv_intf sunrpc vfat fat virtio_net net_failover failover fuse loop nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock xfs crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_console virtio_gpu virtio_blk virtio_dma_buf virtio_mmio dm_mirror dm_region_hash dm_log dm_mod xpmem(OE) CPU: 0 UID: 0 PID: 37589 Comm: python3 Kdump: loaded Tainted: G OE ------- --- 6.12.0-54.el10.aarch64 #1 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : refcount_warn_saturate+0xf4/0x148 lr : refcount_warn_saturate+0xf4/0x148 sp : ffff80008b81b7e0 x29: ffff80008b81b7e0 x28: ffff000133d51600 x27: 0000000000000001 x26: 0000000000000000 x25: 00000000ffffffea x24: ffff00010ae80f00 x23: ffff00010ae80f80 x22: ffff0000c66e5d08 x21: 0000000000000000 x20: ffff0000c66e0000 x19: ffff00010ae80340 x18: 0000000000000006 x17: 0000000000000000 x16: 0000000000000020 x15: ffff80008b81b37f x14: 0000000000000000 x13: 2e656572662d7265 x12: ffff80008283ef78 x11: ffff80008257efd0 x10: ffff80008283efd0 x9 : ffff80008021ed90 x8 : 0000000000000001 x7 : 00000000000bffe8 x6 : c0000000ffff7fff x5 : ffff0001fb8e3408 x4 : 0000000000000000 x3 : ffff800179993000 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000133d51600 Call trace: refcount_warn_saturate+0xf4/0x148 mlx5_core_put_rsc+0x88/0xa0 [mlx5_ib] mlx5_core_destroy_rq_tracked+0x64/0x98 [mlx5_ib] mlx5_ib_destroy_wq+0x34/0x80 [mlx5_ib] ib_destroy_wq_user+0x30/0xc0 [ib_core] uverbs_free_wq+0x28/0x58 [ib_uverbs] destroy_hw_idr_uobject+0x34/0x78 [ib_uverbs] uverbs_destroy_uobject+0x48/0x240 [ib_uverbs] __uverbs_cleanup_ufile+0xd4/0x1a8 [ib_uverbs] uverbs_destroy_ufile_hw+0x48/0x120 [ib_uverbs] ib_uverbs_close+0x2c/0x100 [ib_uverbs] __fput+0xd8/0x2f0 __fput_sync+0x50/0x70 __arm64_sys_close+0x40/0x90 invoke_syscall.constprop.0+0x74/0xd0 do_el0_svc+0x48/0xe8 el0_svc+0x44/0x1d0 el0t_64_sync_handler+0x120/0x130 el0t_64_sync+0x1a4/0x1a8 Fixes: e2013b212f9f ("net/mlx5_core: Add RQ and SQ event handling") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://patch.msgid.link/3181433ccdd695c63560eeeb3f0c990961732101.1745839855.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-05-05reset: Add devm_reset_control_array_get_exclusive_released()Patrice Chotard1-0/+6
Add the released variant of devm_reset_control_array_get_exclusive(). Needed by spi-smt32-ospi driver as same reset line is ulso used by stm32-omm driver. Signed-off-by: Patrice Chotard <patrice.chotard@foss.st.com> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Link: https://lore.kernel.org/r/20250411-b4-upstream_ospi_reset_update-v2-1-4de7f5dd2a91@foss.st.com Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-05-05ALSA: core: Remove unused snd_jack_set_parentDr. David Alan Gilbert1-6/+0
snd_jack_set_parent() was added as part of 2008's commit e76d8ceaaff9 ("ALSA: Add jack reporting API") but hasn't been used. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20250502235219.1000429-6-linux@treblig.org
2025-05-05ALSA: core: Remove unused snd_device_get_stateDr. David Alan Gilbert1-1/+0
snd_device_get_state() last use was removed in 2022 by commit 7e1afce5866e ("ALSA: usb-audio: Inform the delayed registration more properly") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20250502235219.1000429-5-linux@treblig.org
2025-05-05ALSA: pcm: Remove unused snd_dmaengine_pcm_open_request_chanDr. David Alan Gilbert1-2/+0
snd_dmaengine_pcm_open_request_chan() last use was removed in 2022's commit b401d1fd8053 ("ASoC: pxa: remove unused board support") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20250502235219.1000429-3-linux@treblig.org
2025-05-05ALSA: pcm: Remove unused snd_pcm_rate_range_to_bitsDr. David Alan Gilbert1-2/+0
The last use of snd_pcm_rate_range_to_bits() was removed in 2016 by commit b6b6e4d670c9 ("ASoC: topology: Fix setting of stream rates, rate_min and rate_max") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20250502235219.1000429-2-linux@treblig.org
2025-05-05crypto: ahash - Add HASH_REQUEST_ZEROHerbert Xu1-0/+4
Add a helper to zero hash stack requests that were never cloned off the stack. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: ahash - Add core export and importHerbert Xu1-0/+24
Add crypto_ahash_export_core and crypto_ahash_import_core. For now they only differ from the normal export/import functions when going through shash. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: shash - Cap state size to HASH_MAX_STATESIZEHerbert Xu1-1/+4
Now that all shash algorithms have converted over to the generic export format, limit the shash state size to HASH_MAX_STATESIZE. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: zynqmp-sha - Fix partial block implementationHerbert Xu1-0/+4
The zynqmp-sha partial block was based on an old design of the partial block API where the leftover calculation was done in the Crypto API. As the leftover calculation is now done by the algorithm, fix this by passing the partial blocks to the fallback. Also zero the stack descriptors. Fixes: 201e9ec3b621 ("crypto: zynqmp-sha - Use API partial block handling") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: lib/sha256 - Use generic block helperHerbert Xu1-7/+0
Use the BLOCK_HASH_UPDATE_BLOCKS helper instead of duplicating partial block handling. Also remove the unused lib/sha256 force-generic interface. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: sha256 - Use the partial block API for genericHerbert Xu1-2/+12
The shash interface already handles partial blocks, use it for sha224-generic and sha256-generic instead of going through the lib/sha256 interface. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: lib/sha256 - Add helpers for block-based shashHerbert Xu1-0/+45
Add an internal sha256_finup helper and move the finalisation code from __sha256_final into it. Also add sha256_choose_blocks and CRYPTO_ARCH_HAVE_LIB_SHA256_SIMD so that the Crypto API can use the SIMD block function unconditionally. The Crypto API must not be used in hard IRQs and there is no reason to have a fallback path for hardirqs. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: api - Rename CRYPTO_ALG_REQ_CHAIN to CRYPTO_ALG_REQ_VIRTHerbert Xu4-8/+8
As chaining has been removed, all that remains of REQ_CHAIN is just virtual address support. Rename it before the reintroduction of batching creates confusion. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: acomp - Clone folios properlyHerbert Xu1-6/+2
The folios contain references to the request itself so they must be setup again in the cloned request. Fixes: 5f3437e9c89e ("crypto: acomp - Simplify folio handling") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: rng - fix documentation for crypto_rng_alg()Ovidiu Panait1-5/+3
Current documentation states that crypto_rng_alg() returns the cra_name of the rng algorithm, but it actually returns a 'struct rng_alg' pointer from a RNG handle. Update documentation to reflect this. Signed-off-by: Ovidiu Panait <ovidiu.panait.oss@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: streebog - Use API partial block handlingHerbert Xu1-5/+0
Use the Crypto API partial block handling. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: lib/sha256 - improve function prototypesEric Biggers1-4/+4
Follow best practices by changing the length parameters to size_t and explicitly specifying the length of the output digest arrays. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: sha256 - remove sha256_base.hEric Biggers1-151/+0
sha256_base.h is no longer used, so remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: sha256 - support arch-optimized lib and expose through shashEric Biggers3-15/+37
As has been done for various other algorithms, rework the design of the SHA-256 library to support arch-optimized implementations, and make crypto/sha256.c expose both generic and arch-optimized shash algorithms that wrap the library functions. This allows users of the SHA-256 library functions to take advantage of the arch-optimized code, and this makes it much simpler to integrate SHA-256 for each architecture. Note that sha256_base.h is not used in the new design. It will be removed once all the architecture-specific code has been updated. Move the generic block function into its own module to avoid a circular dependency from libsha256.ko => sha256-$ARCH.ko => libsha256.ko. Signed-off-by: Eric Biggers <ebiggers@google.com> Add export and import functions to maintain existing export format. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: lib/poly1305 - Use block-only interfaceHerbert Xu1-47/+6
Now that every architecture provides a block function, use that to implement the lib/poly1305 and remove the old per-arch code. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: lib/poly1305 - Add block-only interfaceHerbert Xu2-8/+45
Add a block-only interface for poly1305. Implement the generic code first. Also use the generic partial block helper. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: lib/sha256 - Move partial block handling outHerbert Xu3-37/+62
Extract the common partial block handling into a helper macro that can be reused by other library code. Also delete the unused sha256_base_do_finalize function. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux v6.15-rc5Herbert Xu64-463/+557
Merge mainline to pick up bcachefs poly1305 patch 4bf4b5046de0 ("bcachefs: use library APIs for ChaCha20 and Poly1305"). This is a prerequisite for removing the poly1305 shash algorithm.
2025-05-04Merge back cpufreq material for 6.16Rafael J. Wysocki1-4/+7
2025-05-04Merge branch 'x86/urgent' into x86/boot, to pick up fixesIngo Molnar22-115/+199
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-04dm mpath: Interface for explicit probing of active pathsKevin Wolf1-2/+7
Multipath cannot directly provide failover for ioctls in the kernel because it doesn't know what each ioctl means and which result could indicate a path error. Userspace generally knows what the ioctl it issued means and if it might be a path error, but neither does it know which path the ioctl took nor does it necessarily have the privileges to fail a path using the control device. In order to allow userspace to address this situation, implement a DM_MPATH_PROBE_PATHS ioctl that prompts the dm-mpath driver to probe all active paths in the current path group to see whether they still work, and fail them if not. If this returns success, userspace can retry the ioctl and expect that the previously hit bad path is now failed (or working again). The immediate motivation for this is the use of SG_IO in QEMU for SCSI passthrough. Following a failed SG_IO ioctl, QEMU will trigger probing to ensure that all active paths are actually alive, so that retrying SG_IO at least has a lower chance of failing due to a path error. However, the problem is broader than just SG_IO (it affects any ioctl), and if applications need failover support for other ioctls, the same probing can be used. This is not implemented on the DM control device, but on the DM mpath block devices, to allow all users who have access to such a block device to make use of this interface, specifically to implement failover for ioctls. For the same reason, it is also unprivileged. Its implementation is effectively just a bunch of reads, which could already be issued by userspace, just without any guarantee that all the rights paths are selected. The probing implemented here is done fully synchronously path by path; probing all paths concurrently is left as an improvement for the future. Co-developed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-05-04dm: Allow .prepare_ioctl to handle ioctls directlyKevin Wolf1-1/+8
This adds a 'bool *forward' parameter to .prepare_ioctl, which allows device mapper targets to accept ioctls to themselves instead of the underlying device. If the target already fully handled the ioctl, it sets *forward to false and device mapper won't forward it to the underlying device any more. In order for targets to actually know what the ioctl is about and how to handle it, pass also cmd and arg. As long as targets restrict themselves to interpreting ioctls of type DM_IOCTL, this is a backwards compatible change because previously, any such ioctl would have been passed down through all device mapper layers until it reached a device that can't understand the ioctl and would return an error. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-05-04Merge tag 'v6.15-rc4' into x86/fpu, to pick up fixesIngo Molnar36-296/+366
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-03Merge tag 'sound-6.15-rc5' of ↵Linus Torvalds2-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A bunch of small fixes. Mostly driver specific. - An OOB access fix in core UMP rawmidi conversion code - Fix for ASoC DAPM hw_params widget sequence - Make retry of usb_set_interface() errors for flaky devices - Fix redundant USB MIDI name strings - Quirks for various HP and ASUS models with HD-audio, and Jabra Evolve 65 USB-audio - Cirrus Kunit test fixes - Various fixes for ASoC Intel, stm32, renesas, imx-card, and simple-card" * tag 'sound-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (30 commits) ASoC: amd: ps: fix for irq handler return status ASoC: simple-card-utils: Fix pointer check in graph_util_parse_link_direction ASoC: intel/sdw_utils: Add volume limit to cs35l56 speakers ASoC: intel/sdw_utils: Add volume limit to cs42l43 speakers ASoC: stm32: sai: add a check on minimal kernel frequency ASoC: stm32: sai: skip useless iterations on kernel rate loop ALSA: hda/realtek - Add more HP laptops which need mute led fixup ALSA: hda/realtek: Fix built-mic regression on other ASUS models ASoC: Intel: catpt: avoid type mismatch in dev_dbg() format ALSA: usb-audio: Fix duplicated name in MIDI substream names ALSA: ump: Fix buffer overflow at UMP SysEx message conversion ALSA: usb-audio: Add second USB ID for Jabra Evolve 65 headset ALSA: hda/realtek: Add quirk for HP Spectre x360 15-df1xxx ALSA: hda: Apply volume control on speaker+lineout for HP EliteStudio AIO ASoC: Intel: bytcr_rt5640: Add DMI quirk for Acer Aspire SW3-013 ASoC: amd: acp: Fix devm_snd_soc_register_card(acp-pdm-mach) failure ASoC: amd: acp: Fix NULL pointer deref in acp_i2s_set_tdm_slot ASoC: amd: acp: Fix NULL pointer deref on acp resume path ASoC: renesas: rz-ssi: Use NOIRQ_SYSTEM_SLEEP_PM_OPS() ASoC: soc-acpi-intel-ptl-match: add empty item to ptl_cs42l43_l3[] ...
2025-05-03futex: Implement FUTEX2_MPOLPeter Zijlstra2-1/+5
Extend the futex2 interface to be aware of mempolicy. When FUTEX2_MPOL is specified and there is a MPOL_PREFERRED or home_node specified covering the futex address, use that hash-map. Notably, in this case the futex will go to the global node hashtable, even if it is a PRIVATE futex. When FUTEX2_NUMA|FUTEX2_MPOL is specified and the user specified node value is FUTEX_NO_NODE, the MPOL lookup (as described above) will be tried first before reverting to setting node to the local node. [bigeasy: add CONFIG_FUTEX_MPOL, add MPOL to FUTEX2_VALID_MASK, write the node only to user if FUTEX_NO_NODE was supplied] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-18-bigeasy@linutronix.de
2025-05-03futex: Implement FUTEX2_NUMAPeter Zijlstra2-0/+10
Extend the futex2 interface to be numa aware. When FUTEX2_NUMA is specified for a futex, the user value is extended to two words (of the same size). The first is the user value we all know, the second one will be the node to place this futex on. struct futex_numa_32 { u32 val; u32 node; }; When node is set to ~0, WAIT will set it to the current node_id such that WAKE knows where to find it. If userspace corrupts the node value between WAIT and WAKE, the futex will not be found and no wakeup will happen. When FUTEX2_NUMA is not set, the node is simply an extension of the hash, such that traditional futexes are still interleaved over the nodes. This is done to avoid having to have a separate !numa hash-table. [bigeasy: ensure to have at least hashsize of 4 in futex_init(), add pr_info() for size and allocation information. Cast the naddr math to void*] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-17-bigeasy@linutronix.de
2025-05-03futex: Allow to make the private hash immutableSebastian Andrzej Siewior1-0/+2
My initial testing showed that: perf bench futex hash reported less operations/sec with private hash. After using the same amount of buckets in the private hash as used by the global hash then the operations/sec were about the same. This changed once the private hash became resizable. This feature added an RCU section and reference counting via atomic inc+dec operation into the hot path. The reference counting can be avoided if the private hash is made immutable. Extend PR_FUTEX_HASH_SET_SLOTS by a fourth argument which denotes if the private should be made immutable. Once set (to true) the a further resize is not allowed (same if set to global hash). Add PR_FUTEX_HASH_GET_IMMUTABLE which returns true if the hash can not be changed. Update "perf bench" suite. For comparison, results of "perf bench futex hash -s": - Xeon CPU E5-2650, 2 NUMA nodes, total 32 CPUs: - Before the introducing task local hash shared Averaged 1.487.148 operations/sec (+- 0,53%), total secs = 10 private Averaged 2.192.405 operations/sec (+- 0,07%), total secs = 10 - With the series shared Averaged 1.326.342 operations/sec (+- 0,41%), total secs = 10 -b128 Averaged 141.394 operations/sec (+- 1,15%), total secs = 10 -Ib128 Averaged 851.490 operations/sec (+- 0,67%), total secs = 10 -b8192 Averaged 131.321 operations/sec (+- 2,13%), total secs = 10 -Ib8192 Averaged 1.923.077 operations/sec (+- 0,61%), total secs = 10 128 is the default allocation of hash buckets. 8192 was the previous amount of allocated hash buckets. - Xeon(R) CPU E7-8890 v3, 4 NUMA nodes, total 144 CPUs: - Before the introducing task local hash shared Averaged 1.810.936 operations/sec (+- 0,26%), total secs = 20 private Averaged 2.505.801 operations/sec (+- 0,05%), total secs = 20 - With the series shared Averaged 1.589.002 operations/sec (+- 0,25%), total secs = 20 -b1024 Averaged 42.410 operations/sec (+- 0,20%), total secs = 20 -Ib1024 Averaged 740.638 operations/sec (+- 1,51%), total secs = 20 -b65536 Averaged 48.811 operations/sec (+- 1,35%), total secs = 20 -Ib65536 Averaged 1.963.165 operations/sec (+- 0,18%), total secs = 20 1024 is the default allocation of hash buckets. 65536 was the previous amount of allocated hash buckets. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://lore.kernel.org/r/20250416162921.513656-16-bigeasy@linutronix.de
2025-05-03futex: Allow to resize the private local hashSebastian Andrzej Siewior2-2/+5
The mm_struct::futex_hash_lock guards the futex_hash_bucket assignment/ replacement. The futex_hash_allocate()/ PR_FUTEX_HASH_SET_SLOTS operation can now be invoked at runtime and resize an already existing internal private futex_hash_bucket to another size. The reallocation is based on an idea by Thomas Gleixner: The initial allocation of struct futex_private_hash sets the reference count to one. Every user acquires a reference on the local hash before using it and drops it after it enqueued itself on the hash bucket. There is no reference held while the task is scheduled out while waiting for the wake up. The resize process allocates a new struct futex_private_hash and drops the initial reference. Synchronized with mm_struct::futex_hash_lock it is checked if the reference counter for the currently used mm_struct::futex_phash is marked as DEAD. If so, then all users enqueued on the current private hash are requeued on the new private hash and the new private hash is set to mm_struct::futex_phash. Otherwise the newly allocated private hash is saved as mm_struct::futex_phash_new and the rehashing and reassigning is delayed to the futex_hash() caller once the reference counter is marked DEAD. The replacement is not performed at rcuref_put() time because certain callers, such as futex_wait_queue(), drop their reference after changing the task state. This change will be destroyed once the futex_hash_lock is acquired. The user can change the number slots with PR_FUTEX_HASH_SET_SLOTS multiple times. An increase and decrease is allowed and request blocks until the assignment is done. The private hash allocated at thread creation is changed from 16 to 16 <= 4 * number_of_threads <= global_hash_size where number_of_threads can not exceed the number of online CPUs. Should the user PR_FUTEX_HASH_SET_SLOTS then the auto scaling is disabled. [peterz: reorganize the code to avoid state tracking and simplify new object handling, block the user until changes are in effect, allow increase and decrease of the hash]. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-15-bigeasy@linutronix.de
2025-05-03futex: Allow automatic allocation of process wide futex hashSebastian Andrzej Siewior1-0/+6
Allocate a private futex hash with 16 slots if a task forks its first thread. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-14-bigeasy@linutronix.de
2025-05-03futex: Add basic infrastructure for local task local hashSebastian Andrzej Siewior3-3/+33
The futex hash is system wide and shared by all tasks. Each slot is hashed based on futex address and the VMA of the thread. Due to randomized VMAs (and memory allocations) the same logical lock (pointer) can end up in a different hash bucket on each invocation of the application. This in turn means that different applications may share a hash bucket on the first invocation but not on the second and it is not always clear which applications will be involved. This can result in high latency's to acquire the futex_hash_bucket::lock especially if the lock owner is limited to a CPU and can not be effectively PI boosted. Introduce basic infrastructure for process local hash which is shared by all threads of process. This hash will only be used for a PROCESS_PRIVATE FUTEX operation. The hashmap can be allocated via: prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_SET_SLOTS, num); A `num' of 0 means that the global hash is used instead of a private hash. Other values for `num' specify the number of slots for the hash and the number must be power of two, starting with two. The prctl() returns zero on success. This function can only be used before a thread is created. The current status for the private hash can be queried via: num = prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_GET_SLOTS); which return the current number of slots. The value 0 means that the global hash is used. Values greater than 0 indicate the number of slots that are used. A negative number indicates an error. For optimisation, for the private hash jhash2() uses only two arguments the address and the offset. This omits the VMA which is always the same. [peterz: Use 0 for global hash. A bit shuffling and renaming. ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-13-bigeasy@linutronix.de
2025-05-03mm: Add vmalloc_huge_node()Peter Zijlstra1-2/+7
To enable node specific hash-tables using huge pages if possible. [bigeasy: use __vmalloc_node_range_noprof(), add nommu bits, inline vmalloc_huge] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250416162921.513656-3-bigeasy@linutronix.de
2025-05-03rcuref: Provide rcuref_is_dead()Sebastian Andrzej Siewior1-1/+21
rcuref_read() returns the number of references that are currently held. If 0 is returned then it is not safe to assume that the object ca be scheduled for deconstruction because it is marked DEAD. This happens if the return value of rcuref_put() is ignored and assumptions are made. If 0 is returned then the counter transitioned from 0 to RCUREF_NOREF. If rcuref_put() did not return to the caller then the counter did not yet transition from RCUREF_NOREF to RCUREF_DEAD. This means that there is still a chance that the counter will transition from RCUREF_NOREF to 0 meaning it is still valid and must not be deconstructed. In this brief window rcuref_read() will return 0. Provide rcuref_is_dead() to determine if the counter is marked as RCUREF_DEAD. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-2-bigeasy@linutronix.de
2025-05-03net: stmmac: remove speed_mode_2500() methodRussell King (Oracle)1-1/+0
Remove the speed_mode_2500() platform method which is no longer used or necessary, being superseded by the more flexible get_interfaces() method. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1uASM3-0021R3-2B@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-03net: stmmac: add get_interfaces() platform methodRussell King (Oracle)1-0/+2
Add a get_interfaces() platform method to allow platforms to indicate to phylink which interface modes they support - which then allows phylink to validate on initialisation that the configured PHY interface mode is actually supported. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1uASLn-0021Qd-Mi@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-03ASoC: soc-utils: add snd_soc_dlc_is_dummy()Mark Brown1-1/+1
Merge series from Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>: We are using dummy component/dlc, but didn't have its check funciton. This patch adds it.
2025-05-03Merge tag 'pm-6.15-rc5' of ↵Linus Torvalds1-29/+54
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix three recent regressions, two in cpufreq and one in the Intel Soundwire driver, and an unchecked MSR access in the intel_pstate driver: - Fix a recent regression causing systems where frequency tables are used by cpufreq to have issues with setting frequency limits (Rafael Wysocki) - Fix a recent regressions causing frequency boost settings to become out-of-sync if platform firmware updates the registers associated with frequency boost during system resume (Viresh Kumar) - Fix a recent regression causing resume failures to occur in the Intel Soundwire driver if the device handled by it is in runtime suspend before a system-wide suspend (Rafael Wysocki) - Fix an unchecked MSR aceess in the intel_pstate driver occurring when CPUID indicates no turbo, but the driver attempts to enable turbo frequencies due to a misleading value read from an MSR (Srinivas Pandruvada)" * tag 'pm-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: intel_pstate: Unchecked MSR aceess in legacy mode soundwire: intel_auxdevice: Fix system suspend/resume handling cpufreq: Fix setting policy limits when frequency tables are used cpufreq: ACPI: Re-sync CPU boost state on system resume
2025-05-02Merge branch 'pm-cpufreq'Rafael J. Wysocki1-29/+54
Merge cpufreq fixes for 6.15-rc5: - Fix a recent regression causing systems where frequency tables are used by cpufreq to have issues with setting frequency limits (Rafael Wysocki). - Fix a recent regressions causing frequency boost settings to become out-of-sync if platform firmware updates the registers associated with them during system resume (Viresh Kumar). - Fix an unchecked MSR aceess in the intel_pstate driver occurring when CPUID indicates no turbo, but the driver attempts to enable turbo frequencies due to a misleading value read from an MSR (Srinivas Pandruvada). * pm-cpufreq: cpufreq: intel_pstate: Unchecked MSR aceess in legacy mode cpufreq: Fix setting policy limits when frequency tables are used cpufreq: ACPI: Re-sync CPU boost state on system resume
2025-05-02configfs-tsm: Namespace TSM report symbolsDan Williams1-11/+11
In preparation for new + common TSM (TEE Security Manager) infrastructure, namespace the TSM report symbols in tsm.h with an _REPORT suffix to differentiate them from other incoming tsm work. Cc: Yilun Xu <yilun.xu@intel.com> Cc: Samuel Ortiz <sameo@rivosinc.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Sami Mujawar <sami.mujawar@arm.com> Cc: Steven Price <steven.price@arm.com> Reviewed-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Steven Price <steven.price@arm.com> Link: https://patch.msgid.link/174107246021.1288555.7203769833791489618.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-05-02bpf: udp: Make sure iter->batch always contains a full bucket snapshotJordan Rife1-0/+3
Require that iter->batch always contains a full bucket snapshot. This invariant is important to avoid skipping or repeating sockets during iteration when combined with the next few patches. Before, there were two cases where a call to bpf_iter_udp_batch may only capture part of a bucket: 1. When bpf_iter_udp_realloc_batch() returns -ENOMEM [1]. 2. When more sockets are added to the bucket while calling bpf_iter_udp_realloc_batch(), making the updated batch size insufficient [2]. In cases where the batch size only covers part of a bucket, it is possible to forget which sockets were already visited, especially if we have to process a bucket in more than two batches. This forces us to choose between repeating or skipping sockets, so don't allow this: 1. Stop iteration and propagate -ENOMEM up to userspace if reallocation fails instead of continuing with a partial batch. 2. Try bpf_iter_udp_realloc_batch() with GFP_USER just as before, but if we still aren't able to capture the full bucket, call bpf_iter_udp_realloc_batch() again while holding the bucket lock to guarantee the bucket does not change. On the second attempt use GFP_NOWAIT since we hold onto the spin lock. Introduce the udp_portaddr_for_each_entry_from macro and use it instead of udp_portaddr_for_each_entry to make it possible to continue iteration from an arbitrary socket. This is required for this patch in the GFP_NOWAIT case to allow us to fill the rest of a batch starting from the middle of a bucket and the later patch which skips sockets that were already seen. Testing all scenarios directly is a bit difficult, but I did some manual testing to exercise the code paths where GFP_NOWAIT is used and where ERR_PTR(err) is returned. I used the realloc test case included later in this series to trigger a scenario where a realloc happens inside bpf_iter_udp_batch and made a small code tweak to force the first realloc attempt to allocate a too-small batch, thus requiring another attempt with GFP_NOWAIT. Some printks showed both reallocs with the tests passing: Apr 25 23:16:24 crow kernel: go again GFP_USER Apr 25 23:16:24 crow kernel: go again GFP_NOWAIT With this setup, I also forced each of the bpf_iter_udp_realloc_batch calls to return -ENOMEM to ensure that iteration ends and that the read() in userspace fails. [1]: https://lore.kernel.org/bpf/CABi4-ogUtMrH8-NVB6W8Xg_F_KDLq=yy-yu-tKr2udXE2Mu1Lg@mail.gmail.com/ [2]: https://lore.kernel.org/bpf/7ed28273-a716-4638-912d-f86f965e54bb@linux.dev/ Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-05-02Merge tag 'iommu-fixes-v6.15-rc4' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fixes from Joerg Roedel: "ARM-SMMU fixes: - Fix broken detection of the S2FWB feature - Ensure page-size bitmap is initialised for SVA domains - Fix handling of SMMU client devices with duplicate Stream IDs - Don't fail SMMU probe if Stream IDs are aliased across clients Intel VT-d fixes: - Add quirk for IGFX device - Revert an ATS change to fix a boot failure AMD IOMMU: - Fix potential buffer overflow Core: - Fix for iommu_copy_struct_from_user()" * tag 'iommu-fixes-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu/vt-d: Apply quirk_iommu_igfx for 8086:0044 (QM57/QS57) iommu/vt-d: Revert ATS timing change to fix boot failure iommu: Fix two issues in iommu_copy_struct_from_user() iommu/amd: Fix potential buffer overflow in parse_ivrs_acpihid iommu/arm-smmu-v3: Fail aliasing StreamIDs more gracefully iommu/arm-smmu-v3: Fix iommu_device_probe bug due to duplicated stream ids iommu/arm-smmu-v3: Fix pgsize_bit for sva domains iommu/arm-smmu-v3: Add missing S2FWB feature detection
2025-05-02coredump: hand a pidfd to the usermode coredump helperChristian Brauner1-0/+1
Give userspace a way to instruct the kernel to install a pidfd into the usermode helper process. This makes coredump handling a lot more reliable for userspace. In parallel with this commit we already have systemd adding support for this in [1]. We create a pidfs file for the coredumping process when we process the corename pattern. When the usermode helper process is forked we then install the pidfs file as file descriptor three into the usermode helpers file descriptor table so it's available to the exec'd program. Since usermode helpers are either children of the system_unbound_wq workqueue or kthreadd we know that the file descriptor table is empty and can thus always use three as the file descriptor number. Note, that we'll install a pidfd for the thread-group leader even if a subthread is calling do_coredump(). We know that task linkage hasn't been removed due to delay_group_leader() and even if this @current isn't the actual thread-group leader we know that the thread-group leader cannot be reaped until @current has exited. Link: https://github.com/systemd/systemd/pull/37125 [1] Link: https://lore.kernel.org/20250414-work-coredump-v2-3-685bf231f828@kernel.org Tested-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-02uio_hv_generic: Fix sysfs creation path for ring bufferNaman Jain1-0/+6
On regular bootup, devices get registered to VMBus first, so when uio_hv_generic driver for a particular device type is probed, the device is already initialized and added, so sysfs creation in hv_uio_probe() works fine. However, when the device is removed and brought back, the channel gets rescinded and the device again gets registered to VMBus. However this time, the uio_hv_generic driver is already registered to probe for that device and in this case sysfs creation is tried before the device's kobject gets initialized completely. Fix this by moving the core logic of sysfs creation of ring buffer, from uio_hv_generic to HyperV's VMBus driver, where the rest of the sysfs attributes for the channels are defined. While doing that, make use of attribute groups and macros, instead of creating sysfs directly, to ensure better error handling and code flow. Problematic path: vmbus_process_offer (A new offer comes for the VMBus device) vmbus_add_channel_work vmbus_device_register |-> device_register | |... | |-> hv_uio_probe | |... | |-> sysfs_create_bin_file (leads to a warning as | the primary channel's kobject, which is used to | create the sysfs file, is not yet initialized) |-> kset_create_and_add |-> vmbus_add_channel_kobj (initialization of the primary channel's kobject happens later) Above code flow is sequential and the warning is always reproducible in this path. Fixes: 9ab877a6ccf8 ("uio_hv_generic: make ring buffer attribute for primary channel") Cc: stable@kernel.org Suggested-by: Saurabh Sengar <ssengar@linux.microsoft.com> Suggested-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Link: https://lore.kernel.org/r/20250502074811.2022-2-namjain@linux.microsoft.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-02btrfs: correct the order of prelim_ref arguments in btrfs__prelim_refGoldwyn Rodrigues1-1/+1
btrfs_prelim_ref() calls the old and new reference variables in the incorrect order. This causes a NULL pointer dereference because oldref is passed as NULL to trace_btrfs_prelim_ref_insert(). Note, trace_btrfs_prelim_ref_insert() is being called with newref as oldref (and oldref as NULL) on purpose in order to print out the values of newref. To reproduce: echo 1 > /sys/kernel/debug/tracing/events/btrfs/btrfs_prelim_ref_insert/enable Perform some writeback operations. Backtrace: BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 115949067 P4D 115949067 PUD 11594a067 PMD 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 1 UID: 0 PID: 1188 Comm: fsstress Not tainted 6.15.0-rc2-tester+ #47 PREEMPT(voluntary) 7ca2cef72d5e9c600f0c7718adb6462de8149622 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014 RIP: 0010:trace_event_raw_event_btrfs__prelim_ref+0x72/0x130 Code: e8 43 81 9f ff 48 85 c0 74 78 4d 85 e4 0f 84 8f 00 00 00 49 8b 94 24 c0 06 00 00 48 8b 0a 48 89 48 08 48 8b 52 08 48 89 50 10 <49> 8b 55 18 48 89 50 18 49 8b 55 20 48 89 50 20 41 0f b6 55 28 88 RSP: 0018:ffffce44820077a0 EFLAGS: 00010286 RAX: ffff8c6b403f9014 RBX: ffff8c6b55825730 RCX: 304994edf9cf506b RDX: d8b11eb7f0fdb699 RSI: ffff8c6b403f9010 RDI: ffff8c6b403f9010 RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000010 R10: 00000000ffffffff R11: 0000000000000000 R12: ffff8c6b4e8fb000 R13: 0000000000000000 R14: ffffce44820077a8 R15: ffff8c6b4abd1540 FS: 00007f4dc6813740(0000) GS:ffff8c6c1d378000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 000000010eb42000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> prelim_ref_insert+0x1c1/0x270 find_parent_nodes+0x12a6/0x1ee0 ? __entry_text_end+0x101f06/0x101f09 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 btrfs_is_data_extent_shared+0x167/0x640 ? fiemap_process_hole+0xd0/0x2c0 extent_fiemap+0xa5c/0xbc0 ? __entry_text_end+0x101f05/0x101f09 btrfs_fiemap+0x7e/0xd0 do_vfs_ioctl+0x425/0x9d0 __x64_sys_ioctl+0x75/0xc0 Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>