summaryrefslogtreecommitdiff
path: root/include/trace
AgeCommit message (Collapse)AuthorFilesLines
3 daysMerge tag 'x86-urgent-2025-12-21' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Fix FPU core dumps on certain CPU models - Fix htmldocs build warning - Export TLB tracing event name via header - Remove unused constant from <linux/mm_types.h> - Fix comments - Fix whitespace noise in documentation - Fix variadic structure's definition to un-confuse UBSAN - Fix posted MSI interrupts irq_retrigger() bug - Fix asm build failure with older GCC builds * tag 'x86-urgent-2025-12-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bug: Fix old GCC compile fails x86/msi: Make irq_retrigger() functional for posted MSI x86/platform/uv: Fix UBSAN array-index-out-of-bounds mm: Remove tlb_flush_reason::NR_TLB_FLUSH_REASONS from <linux/mm_types.h> x86/mm/tlb/trace: Export the TLB_REMOTE_WRONG_CPU enum in <trace/events/tlb.h> x86/sgx: Remove unmatched quote in __sgx_encl_extend function comment x86/boot/Documentation: Fix whitespace noise in boot.rst x86/fpu: Fix FPU state core dump truncation on CPUs with no extended xfeatures x86/boot/Documentation: Fix htmldocs build warning due to malformed table in boot.rst
11 daysMerge tag 'ceph-for-6.19-rc1' of https://github.com/ceph/ceph-clientLinus Torvalds1-0/+234
Pull ceph updates from Ilya Dryomov: "We have a patch that adds an initial set of tracepoints to the MDS client from Max, a fix that hardens osdmap parsing code from myself (marked for stable) and a few assorted fixups" * tag 'ceph-for-6.19-rc1' of https://github.com/ceph/ceph-client: rbd: stop selecting CRC32, CRYPTO, and CRYPTO_AES ceph: stop selecting CRC32, CRYPTO, and CRYPTO_AES libceph: make decode_pool() more resilient against corrupted osdmaps libceph: Amend checking to fix `make W=1` build breakage ceph: Amend checking to fix `make W=1` build breakage ceph: add trace points to the MDS client libceph: fix log output race condition in OSD client
12 daysx86/mm/tlb/trace: Export the TLB_REMOTE_WRONG_CPU enum in <trace/events/tlb.h>Tal Zussman1-2/+3
When the TLB_REMOTE_WRONG_CPU enum was introduced for the tlb_flush tracepoint, the enum was not exported to user-space. Add it to the appropriate macro definition to enable parsing by userspace tools, as per: Link: https://lore.kernel.org/all/20150403013802.220157513@goodmis.org [ mingo: Capitalize IPI, etc. ] Fixes: 2815a56e4b72 ("x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU") Signed-off-by: Tal Zussman <tz2294@columbia.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Rik van Riel <riel@surriel.com> Link: https://patch.msgid.link/20251212-tlb-trace-fix-v2-1-d322e0ad9b69@columbia.edu
2025-12-10ceph: add trace points to the MDS clientMax Kellermann1-0/+234
This patch adds trace points to the Ceph filesystem MDS client: - request submission (CEPH_MSG_CLIENT_REQUEST) and completion (CEPH_MSG_CLIENT_REPLY) - capabilities (CEPH_MSG_CLIENT_CAPS) These are the central pieces that are useful for analyzing MDS latency/performance problems from the client's perspective. In the long run, all doutc() calls should be replaced with tracepoints. This way, the Ceph filesystem can be traced at any time (without spamming the kernel log). Additionally, trace points can be used in BPF programs (which can even deference the pointer parameters and extract more values). Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2025-12-09Merge tag 'f2fs-for-6.19-rc1' of ↵Linus Torvalds1-9/+50
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This series focuses on minor clean-ups and performance optimizations across sysfs, documentation, debugfs, tracepoints, slab allocation, and GC. Furthermore, it resolves several corner-case bugs caught by xfstests, as well as issues related to 16KB page support and f2fs_enable_checkpoint. Enhancement: - wrap ASCII tables in literal blocks to fix LaTeX build - optimize trace_f2fs_write_checkpoint with enums - support to show curseg.next_blkoff in debugfs - add a sysfs entry to show max open zones - add fadvise tracepoint - use global inline_xattr_slab instead of per-sb slab cache - set default valid_thresh_ratio to 80 for zoned devices - maintain one time GC mode is enabled during whole zoned GC cycle Bug fix: - ensure node page reads complete before f2fs_put_super() finishes - do not account invalid blocks in get_left_section_blocks() - revert summary entry count from 2048 to 512 in 16kb block support - detect recoverable inode during dryrun of find_fsync_dnodes() - fix age extent cache insertion skip on counter overflow - add sanity checks before unlinking and loading inodes - ensure minimum trim granularity accounts for all devices - block cache/dio write during f2fs_enable_checkpoint() - propagate error from f2fs_enable_checkpoint() - invalidate dentry cache on failed whiteout creation - avoid updating compression context during writeback - avoid updating zero-sized extent in extent cache - avoid potential deadlock" * tag 'f2fs-for-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (39 commits) f2fs: ignore discard return value f2fs: optimize trace_f2fs_write_checkpoint with enums f2fs: fix to not account invalid blocks in get_left_section_blocks() f2fs: support to show curseg.next_blkoff in debugfs docs: f2fs: wrap ASCII tables in literal blocks to fix LaTeX build f2fs: expand scalability of f2fs mount option f2fs: change default schedule timeout value f2fs: introduce f2fs_schedule_timeout() f2fs: use memalloc_retry_wait() as much as possible f2fs: add a sysfs entry to show max open zones f2fs: wrap all unusable_blocks_per_sec code in CONFIG_BLK_DEV_ZONED f2fs: simplify list initialization in f2fs_recover_fsync_data() f2fs: revert summary entry count from 2048 to 512 in 16kb block support f2fs: fix to detect recoverable inode during dryrun of find_fsync_dnodes() f2fs: fix return value of f2fs_recover_fsync_data() f2fs: add fadvise tracepoint f2fs: fix age extent cache insertion skip on counter overflow f2fs: Add sanity checks before unlinking and loading inodes f2fs: Rename f2fs_unlink exit label f2fs: ensure minimum trim granularity accounts for all devices ...
2025-12-09Merge tag 'io_uring-6.19-20251208' of ↵Linus Torvalds1-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring updates from Jens Axboe: "Followup set of fixes for io_uring for this merge window. These are either later fixes, or cleanups that don't make sense to defer. This pull request contains: - Fix for a recent regression in io-wq worker creation - Tracing cleanup - Use READ_ONCE/WRITE_ONCE consistently for ring mapped kbufs. Mostly for documentation purposes, indicating that they are shared with userspace - Fix for POLL_ADD losing a completion, if the request is updated and now is triggerable - eg, if POLLIN is set with the updated, and the polled file is readable - In conjunction with the above fix, also unify how poll wait queue entries are deleted with the head update. We had 3 different spots doing both the list deletion and head write, with one of them nicely documented. Abstract that into a helper and use it consistently - Small series from Joanne fixing an issue with buffer cloning, and cleaning up the arg validation" * tag 'io_uring-6.19-20251208' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring/poll: unify poll waitqueue entry and list removal io_uring/kbuf: use WRITE_ONCE() for userspace-shared buffer ring fields io_uring/kbuf: use READ_ONCE() for userspace-mapped memory io_uring/rsrc: fix lost entries after cloned range io_uring/rsrc: rename misleading src_node variable in io_clone_buffers() io_uring/rsrc: clean up buffer cloning arg validation io_uring/trace: rename io_uring_queue_async_work event "rw" field io_uring/io-wq: always retry worker create on ERESTART* io_uring/poll: correctly handle io_poll_add() return value on update
2025-12-06Merge tag 'mm-stable-2025-12-03-21-26' of ↵Linus Torvalds3-2/+100
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki) Rework the vmalloc() code to support non-blocking allocations (GFP_ATOIC, GFP_NOWAIT) "ksm: fix exec/fork inheritance" (xu xin) Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not inherited across fork/exec "mm/zswap: misc cleanup of code and documentations" (SeongJae Park) Some light maintenance work on the zswap code "mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira) Enhance the /sys/kernel/debug/page_owner debug feature by adding unique identifiers to differentiate the various stack traces so that userspace monitoring tools can better match stack traces over time "mm/page_alloc: pcp->batch cleanups" (Joshua Hahn) Minor alterations to the page allocator's per-cpu-pages feature "Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra) Address a scalability issue in userfaultfd's UFFDIO_MOVE operation "kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov) "drivers/base/node: fold node register and unregister functions" (Donet Tom) Clean up the NUMA node handling code a little "mm: some optimizations for prot numa" (Kefeng Wang) Cleanups and small optimizations to the NUMA allocation hinting code "mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn) Address long lock hold times at boot on large machines. These were causing (harmless) softlockup warnings "optimize the logic for handling dirty file folios during reclaim" (Baolin Wang) Remove some now-unnecessary work from page reclaim "mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park) Enhance the DAMOS auto-tuning feature "mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan) Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace configuration "expand mmap_prepare functionality, port more users" (Lorenzo Stoakes) Enhance the new(ish) file_operations.mmap_prepare() method and port additional callsites from the old ->mmap() over to ->mmap_prepare() "Fix stale IOTLB entries for kernel address space" (Lu Baolu) Fix a bug (and possible security issue on non-x86) in the IOMMU code. In some situations the IOMMU could be left hanging onto a stale kernel pagetable entry "mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang) Clean up and optimize the folio splitting code "mm, swap: misc cleanup and bugfix" (Kairui Song) Some cleanups and a minor fix in the swap discard code "mm/damon: misc documentation fixups" (SeongJae Park) "mm/damon: support pin-point targets removal" (SeongJae Park) Permit userspace to remove a specific monitoring target in the middle of the current targets list "mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo) A couple of cleanups related to mm header file inclusion "mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He) improve the selection of swap devices for NUMA machines "mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista) Change the memory block labels from macros to enums so they will appear in kernel debug info "ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes) Address an inefficiency when KSM unmerges an address range "mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park) Fix leaks and unhandled malloc() failures in DAMON userspace unit tests "some cleanups for pageout()" (Baolin Wang) Clean up a couple of minor things in the page scanner's writeback-for-eviction code "mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu) Move hugetlb's sysfs/sysctl handling code into a new file "introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes) Make the VMA guard regions available in /proc/pid/smaps and improves the mergeability of guarded VMAs "mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes) Reduce mmap lock contention for callers performing VMA guard region operations "vma_start_write_killable" (Matthew Wilcox) Start work on permitting applications to be killed when they are waiting on a read_lock on the VMA lock "mm/damon/tests: add more tests for online parameters commit" (SeongJae Park) Add additional userspace testing of DAMON's "commit" feature "mm/damon: misc cleanups" (SeongJae Park) "make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes) Address the possible loss of a VMA's VM_SOFTDIRTY flag when that VMA is merged with another "mm: support device-private THP" (Balbir Singh) Introduce support for Transparent Huge Page (THP) migration in zone device-private memory "Optimize folio split in memory failure" (Zi Yan) "mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang) Some more cleanups in the folio splitting code "mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes) Clean up our handling of pagetable leaf entries by introducing the concept of 'software leaf entries', of type softleaf_t "reparent the THP split queue" (Muchun Song) Reparent the THP split queue to its parent memcg. This is in preparation for addressing the long-standing "dying memcg" problem, wherein dead memcg's linger for too long, consuming memory resources "unify PMD scan results and remove redundant cleanup" (Wei Yang) A little cleanup in the hugepage collapse code "zram: introduce writeback bio batching" (Sergey Senozhatsky) Improve zram writeback efficiency by introducing batched bio writeback support "memcg: cleanup the memcg stats interfaces" (Shakeel Butt) Clean up our handling of the interrupt safety of some memcg stats "make vmalloc gfp flags usage more apparent" (Vishal Moola) Clean up vmalloc's handling of incoming GFP flags "mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang) Teach soft dirty and userfaultfd write protect tracking to use RISC-V's Svrsw60t59b extension "mm: swap: small fixes and comment cleanups" (Youngjun Park) Fix a small bug and clean up some of the swap code "initial work on making VMA flags a bitmap" (Lorenzo Stoakes) Start work on converting the vma struct's flags to a bitmap, so we stop running out of them, especially on 32-bit "mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park) Address a possible bug in the swap discard code and clean things up a little [ This merge also reverts commit ebb9aeb980e5 ("vfio/nvgrace-gpu: register device memory for poison handling") because it looks broken to me, I've asked for clarification - Linus ] * tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm: fix vma_start_write_killable() signal handling mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate mm/swapfile: fix list iteration when next node is removed during discard fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling mm/kfence: add reboot notifier to disable KFENCE on shutdown memcg: remove inc/dec_lruvec_kmem_state helpers selftests/mm/uffd: initialize char variable to Null mm: fix DEBUG_RODATA_TEST indentation in Kconfig mm: introduce VMA flags bitmap type tools/testing/vma: eliminate dependency on vma->__vm_flags mm: simplify and rename mm flags function for clarity mm: declare VMA flags by bit zram: fix a spelling mistake mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity mm/vmscan: skip increasing kswapd_failures when reclaim was boosted pagemap: update BUDDY flag documentation mm: swap: remove scan_swap_map_slots() references from comments mm: swap: change swap_alloc_slow() to void mm, swap: remove redundant comment for read_swap_cache_async mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational ...
2025-12-05Merge tag 'trace-v6.19' of ↵Linus Torvalds1-1/+7
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Extend tracing option mask to 64 bits The trace options were defined by a 32 bit variable. This limits the tracing instances to have a total of 32 different options. As that limit has been hit, and more options are being added, increase the option mask to a 64 bit number, doubling the number of options available. As this is required for the kprobe topic branches as well as the tracing topic branch, a separate branch was created and merged into both. - Make trace_user_fault_read() available for the rest of tracing The function trace_user_fault_read() is used by trace_marker file read to allow reading user space to be done fast and without locking or allocations. Make this available so that the system call trace events can use it too. - Have system call trace events read user space values Now that the system call trace events callbacks are called in a faultable context, take advantage of this and read the user space buffers for various system calls. For example, show the path name of the openat system call instead of just showing the pointer to that path name in user space. Also show the contents of the buffer of the write system call. Several system call trace events are updated to make tracing into a light weight strace tool for all applications in the system. - Update perf system call tracing to do the same - And a config and syscall_user_buf_size file to control the size of the buffer Limit the amount of data that can be read from user space. The default size is 63 bytes but that can be expanded to 165 bytes. - Allow the persistent ring buffer to print system calls normally The persistent ring buffer prints trace events by their type and ignores the print_fmt. This is because the print_fmt may change from kernel to kernel. As the system call output is fixed by the system call ABI itself, there's no reason to limit that. This makes reading the system call events in the persistent ring buffer much nicer and easier to understand. - Add options to show text offset to function profiler The function profiler that counts the number of times a function is hit currently lists all functions by its name and offset. But this becomes ambiguous when there are several functions with the same name. Add a tracing option that changes the output to be that of '_text+offset' instead. Now a user space tool can use this information to map the '_text+offset' to the unique function it is counting. - Report bad dynamic event command If a bad command is passed to the dynamic_events file, report it properly in the error log. - Clean up tracer options Clean up the tracer option code a bit, by removing some useless code and also using switch statements instead of a series of if statements. - Have tracing options be instance specific Tracers can have their own options (function tracer, irqsoff tracer, function graph tracer, etc). But now that the same tracer can be enabled in multiple trace instances, their options are still global. The API is per instance, thus changing one affects other instances. This isn't even consistent, as the option take affect differently depending on when an tracer started in an instance. Make the options for instances only affect the instance it is changed under. - Optimize pid_list lock contention Whenever the pid_list is read, it uses a spin lock. This happens at every sched switch. Taking the lock at sched switch can be removed by instead using a seqlock counter. - Clean up the trace trigger structures The trigger code uses two different structures to implement a single tigger. This was due to trying to reuse code for the two different types of triggers (always on trigger, and count limited trigger). But by adding a single field to one structure, the other structure could be absorbed into the first structure making he code easier to understand. - Create a bulk garbage collector for trace triggers If user space has triggers for several hundreds of events and then removes them, it can take several seconds to complete. This is because each removal calls tracepoint_synchronize_unregister() that can take hundreds of milliseconds to complete. Instead, create a helper thread that will do the clean up. When a trigger is removed, it will create the kthread if it isn't already created, and then add the trigger to a llist. The kthread will take the items off the llist, call tracepoint_synchronize_unregister(), and then remove the items it took off. It will then check if there's more items to free before sleeping. This makes user space removing all these triggers to finish in less than a second. - Allow function tracing of some of the tracing infrastructure code Because the tracing code can cause recursion issues if it is traced by the function tracer the entire tracing directory disables function tracing. But not all of tracing causes issues if it is traced. Namely, the event tracing code. Add a config that enables some of the tracing code to be traced to help in debugging it. Note, when this is enabled, it does add noise to general function tracing, especially if events are enabled as well (which is a common case). - Add boot-time backup instance for persistent buffer The persistent ring buffer is used mostly for kernel crash analysis in the field. One issue is that if there's a crash, the data in the persistent ring buffer must be read before tracing can begin using it. This slows down the boot process. Once tracing starts in the persistent ring buffer, the old data must be freed and the addresses no longer match and old events can't be in the buffer with new events. Create a way to create a backup buffer that copies the persistent ring buffer at boot up. Then after a crash, the always on tracer can begin immediately as well as the normal boot process while the crash analysis tooling uses the backup buffer. After the backup buffer is finished being read, it can be removed. - Enable function graph args and return address options at the same time Currently the when reading of arguments in the function graph tracer is enabled, the option to record the parent function in the entry event can not be enabled. Update the code so that it can. - Add new struct_offset() helper macro Add a new macro that takes a pointer to a structure and a name of one of its members and it will return the offset of that member. This allows the ring buffer code to simplify the following: From: size = struct_size(entry, buf, cnt - sizeof(entry->id)); To: size = struct_offset(entry, id) + cnt; There should be other simplifications that this macro can help out with as well * tag 'trace-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (42 commits) overflow: Introduce struct_offset() to get offset of member function_graph: Enable funcgraph-args and funcgraph-retaddr to work simultaneously tracing: Add boot-time backup of persistent ring buffer ftrace: Allow tracing of some of the tracing code tracing: Use strim() in trigger_process_regex() instead of skip_spaces() tracing: Add bulk garbage collection of freeing event_trigger_data tracing: Remove unneeded event_mutex lock in event_trigger_regex_release() tracing: Merge struct event_trigger_ops into struct event_command tracing: Remove get_trigger_ops() and add count_func() from trigger ops tracing: Show the tracer options in boot-time created instance ftrace: Avoid redundant initialization in register_ftrace_direct tracing: Remove unused variable in tracing_trace_options_show() fgraph: Make fgraph_no_sleep_time signed tracing: Convert function graph set_flags() to use a switch() statement tracing: Have function graph tracer option sleep-time be per instance tracing: Move graph-time out of function graph options tracing: Have function graph tracer option funcgraph-irqs be per instance trace/pid_list: optimize pid_list->lock contention tracing: Have function graph tracer define options per instance tracing: Have function tracer define options per instance ...
2025-12-04Merge tag 'spi-v6.19' of ↵Linus Torvalds1-0/+106
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi updates from Mark Brown: "This release is almost entirely new drivers, with a couple of small changes in generic code. The biggest individual update is a rename of the existing Microchip driver and the addition of a new driver for the silicon SPI controller in their PolarFire SoCs. The overlap between the soft IP supported by the current driver and this new one is regrettably all in the IP and not in the register interface offered to software. - Add a time offset parameter for offloads, allowing them to be defined in relation to each other. This is useful for IIO type applcations where you trigger an operation then read the result after a delay. - Add a tracepoint for flash exec_ops, bringing the flash support more in line with the debuggability of vanilla SPI. - Support for Airoha EN7523, Arduino MCUs, Aspeed AST2700, Microchip PolarFire SPI controllers, NXP i.MX51 ECSPI target mode, Qualcomm IPQ5414 and IPQ5332, Renesas RZ/T2H, RZ/V2N and RZ/2NH and SpacemiT K1 QuadSPI. There's also a small set of ASoC cleanups that I mistakenly applied to the SPI tree and then put more stuff on top of before it was brought to my attention, sorry about that" * tag 'spi-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (80 commits) spi: microchip-core: Refactor FIFO read and write handlers spi: ch341: fix out-of-bounds memory access in ch341_transfer_one spi: microchip-core: Remove unneeded PM related macro spi: microchip-core: Use SPI_MODE_X_MASK spi: microchip-core: Utilise temporary variable for struct device spi: microchip-core: Replace dead code (-ENOMEM error message) spi: microchip-core: use min() instead of min_t() spi: dt-bindings: airoha: add compatible for EN7523 spi: airoha-snfi: en7523: workaround flash damaging if UART_TXD was short to GND spi: dt-bindings: renesas,rzv2h-rspi: Document RZ/V2N SoC support spi: dt-bindings: renesas,rzv2h-rspi: Document RZ/V2N SoC support spi: microchip: Enable compile-testing for FPGA SPI controllers spi: Fix potential uninitialized variable in probe() spi: rzv2h-rspi: add support for RZ/T2H and RZ/N2H spi: dt-bindings: renesas,rzv2h-rspi: document RZ/T2H and RZ/N2H spi: rzv2h-rspi: add support for loopback mode spi: rzv2h-rspi: add support for variable transfer clock spi: rzv2h-rspi: add support for using PCLK for transfer clock spi: rzv2h-rspi: make transfer clock rate finding chip-specific spi: rzv2h-rspi: avoid recomputing transfer frequency ...
2025-12-04Merge tag 'sound-6.19-rc1' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "The majority of changes at this time were about ASoC with a lot of code refactoring works. From the functionality POV, there isn't much to see, but we have a wide range of device-specific fixes and updates. Here are some highlights: - Continued ASoC API cleanup work, spanned over many files - Added a SoundWire SCDA generic class driver with regmap support - Enhancements and fixes for Cirrus, Intel, Maxim and Qualcomm. - Support for ASoC Allwinner A523, Mediatek MT8189, Qualcomm QCM2290, QRB2210 and SM6115, SpacemiT K1, and TI TAS2568, TAS5802, TAS5806, TAS5815, TAS5828 and TAS5830 - Usual HD-audio and USB-audio quirks and fixups - Support for Onkyo SE-300PCIE, TASCAM IF-FW/DM MkII Some gpiolib changes for shared GPIOs are included along with this PR for covering ASoC drivers changes" * tag 'sound-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (739 commits) ALSA: hda/realtek: Add PCI SSIDs to HP ProBook quirks ALSA: usb-audio: Simplify with usb_endpoint_max_periodic_payload() ALSA: hda/realtek: fix mute/micmute LEDs don't work for more HP laptops ALSA: rawmidi: Fix inconsistent indenting warning reported by smatch ALSA: dice: fix buffer overflow in detect_stream_formats() ASoC: codecs: Modify awinic amplifier dsp read and write functions ASoC: SDCA: Fixup some more Kconfig issues ASoC: cs35l56: Log a message if firmware is missing ASoC: nau8325: Delete a stray tab firmware: cs_dsp: Add test cases for client_ops == NULL firmware: cs_dsp: Don't require client to provide a struct cs_dsp_client_ops ASoC: fsl_micfil: Set channel range control ASoC: fsl_micfil: Add default quality for different platforms ASoC: intel: sof_sdw: Add codec_info for cs42l45 ASoC: sdw_utils: Add cs42l45 support functions ASoC: intel: sof_sdw: Add ability to have auxiliary devices ASoC: sdw_utils: Move codec_name to dai info ASoC: sdw_utils: Add codec_conf for every DAI ASoC: SDCA: Add terminal type into input/output widget name ASoC: SDCA: Align mute controls to ALSA expectations ...
2025-12-04io_uring/trace: rename io_uring_queue_async_work event "rw" fieldCaleb Sander Mateos1-6/+6
The io_uring_queue_async_work tracepoint event stores an int rw field that represents whether the work item is hashed. Rename it to "hashed" and change its type to bool to more accurately reflect its value. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-04Merge tag 'ext4_for_linus-6.19-rc1' of ↵Linus Torvalds1-9/+90
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "New features and improvements for the ext4 file system: - Optimize online defragmentation by using folios instead of individual buffer heads - Improve error codes stored in the superblock when the journal aborts - Minor cleanups and clarifications in ext4_map_blocks() - Add documentation of the casefold and encrypt flags - Add support for file systems with a blocksize greater than the pagesize - Improve performance by enabling the caching the fact that an inode does not have a Posix ACL Various Bug Fixes: - Fix false positive complaints from smatch - Fix error code which is returned by ext4fs_dirhash() when Siphash is used without the encryption key - Fix races when writing to inline data files which could trigger a BUG - Fix potential NULL dereference when there is an corrupt file system with an extended attribute value stored in a inode - Fix false positive lockdep report when syzbot uses ext4 and ocfs2 together - Fix false positive reported by DEPT by adjusting lock annotation - Avoid a potential BUG_ON in jbd2 when a file system is massively corrupted - Fix a WARN_ON when superblock is corrupted with a non-NULL terminated mount options field - Add check if the userspace passes in a non-NULL terminated mount options field to EXT4_IOC_SET_TUNE_SB_PARAM - Fix a potential journal checksum failure whena file system is copied while it is mounted read-only - Fix a potential potential orphan file tracking error which only showed on 32-bit systems - Fix assertion checks in mballoc (which have to be explicitly enbled by manually enabling AGGRESSIVE_CHECKS and recompiling) - Avoid complaining about overly large orphan files created by mke2fs with with file systems with a 64k block size" * tag 'ext4_for_linus-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits) ext4: mark inodes without acls in __ext4_iget() ext4: enable block size larger than page size ext4: add checks for large folio incompatibilities when BS > PS ext4: support verifying data from large folios with fs-verity ext4: make data=journal support large block size ext4: support large block size in __ext4_block_zero_page_range() ext4: support large block size in mpage_prepare_extent_to_map() ext4: support large block size in mpage_map_and_submit_buffers() ext4: support large block size in ext4_block_write_begin() ext4: support large block size in ext4_mpage_readpages() ext4: rename 'page' references to 'folio' in multi-block allocator ext4: prepare buddy cache inode for BS > PS with large folios ext4: support large block size in ext4_mb_init_cache() ext4: support large block size in ext4_mb_get_buddy_page_lock() ext4: support large block size in ext4_mb_load_buddy_gfp() ext4: add EXT4_LBLK_TO_PG and EXT4_PG_TO_LBLK for block/page conversion ext4: add EXT4_LBLK_TO_B macro for logical block to bytes conversion ext4: support large block size in ext4_readdir() ext4: support large block size in ext4_calculate_overhead() ext4: introduce s_min_folio_order for future BS > PS support ...
2025-12-04f2fs: optimize trace_f2fs_write_checkpoint with enumsYH Lin1-5/+14
This patch optimizes the tracepoint by replacing these hardcoded strings with a new enumeration f2fs_cp_phase. 1.Defines enum f2fs_cp_phase with values for each checkpoint phase. 2.Updates trace_f2fs_write_checkpoint to accept a u16 phase argument instead of a string pointer. 3.Uses __print_symbolic in TP_printk to convert the enum values back to their corresponding strings for human-readable trace output. This change reduces the storage overhead for each trace event by replacing a variable-length string with a 2-byte integer, while maintaining the same readable output in ftrace. Signed-off-by: YH Lin <yhli@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: add fadvise tracepointJaegeuk Kim1-0/+32
This adds a tracepoint in the fadvise call path. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: fix to access i_size w/ i_size_read()Chao Yu1-4/+4
It recommends to use i_size_{read,write}() to access and update i_size, otherwise, we may get wrong tearing value due to high 32-bits value and low 32-bits value of i_size field are not updated atomically in 32-bits archicture machine. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04Merge tag 'net-next-6.19' of ↵Linus Torvalds1-10/+27
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Replace busylock at the Tx queuing layer with a lockless list. Resulting in a 300% (4x) improvement on heavy TX workloads, sending twice the number of packets per second, for half the cpu cycles. - Allow constantly busy flows to migrate to a more suitable CPU/NIC queue. Normally we perform queue re-selection when flow comes out of idle, but under extreme circumstances the flows may be constantly busy. Add sysctl to allow periodic rehashing even if it'd risk packet reordering. - Optimize the NAPI skb cache, make it larger, use it in more paths. - Attempt returning Tx skbs to the originating CPU (like we already did for Rx skbs). - Various data structure layout and prefetch optimizations from Eric. - Remove ktime_get() from the recvmsg() fast path, ktime_get() is sadly quite expensive on recent AMD machines. - Extend threaded NAPI polling to allow the kthread busy poll for packets. - Make MPTCP use Rx backlog processing. This lowers the lock pressure, improving the Rx performance. - Support memcg accounting of MPTCP socket memory. - Allow admin to opt sockets out of global protocol memory accounting (using a sysctl or BPF-based policy). The global limits are a poor fit for modern container workloads, where limits are imposed using cgroups. - Improve heuristics for when to kick off AF_UNIX garbage collection. - Allow users to control TCP SACK compression, and default to 33% of RTT. - Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid unnecessarily aggressive rcvbuf growth and overshot when the connection RTT is low. - Preserve skb metadata space across skb_push / skb_pull operations. - Support for IPIP encapsulation in the nftables flowtable offload. - Support appending IP interface information to ICMP messages (RFC 5837). - Support setting max record size in TLS (RFC 8449). - Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL. - Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock. - Let users configure the number of write buffers in SMC. - Add new struct sockaddr_unsized for sockaddr of unknown length, from Kees. - Some conversions away from the crypto_ahash API, from Eric Biggers. - Some preparations for slimming down struct page. - YAML Netlink protocol spec for WireGuard. - Add a tool on top of YAML Netlink specs/lib for reporting commonly computed derived statistics and summarized system state. Driver API: - Add CAN XL support to the CAN Netlink interface. - Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics, as defined by the OPEN Alliance's "Advanced diagnostic features for 100BASE-T1 automotive Ethernet PHYs" specification. - Add DPLL phase-adjust-gran pin attribute (and implement it in zl3073x). - Refactor xfrm_input lock to reduce contention when NIC offloads IPsec and performs RSS. - Add info to devlink params whether the current setting is the default or a user override. Allow resetting back to default. - Add standard device stats for PSP crypto offload. - Leverage DSA frame broadcast to implement simple HSR frame duplication for a lot of switches without dedicated HSR offload. - Add uAPI defines for 1.6Tbps link modes. Device drivers: - Add Motorcomm YT921x gigabit Ethernet switch support. - Add MUCSE driver for N500/N210 1GbE NIC series. - Convert drivers to support dedicated ops for timestamping control, and away from the direct IOCTL handling. While at it support GET operations for PHY timestamping. - Add (and convert most drivers to) a dedicated ethtool callback for reading the Rx ring count. - Significant refactoring efforts in the STMMAC driver, which supports Synopsys turn-key MAC IP integrated into a ton of SoCs. - Ethernet high-speed NICs: - Broadcom (bnxt): - support PPS in/out on all pins - Intel (100G, ice, idpf): - ice: implement standard ethtool and timestamping stats - i40e: support setting the max number of MAC addresses per VF - iavf: support RSS of GTP tunnels for 5G and LTE deployments - nVidia/Mellanox (mlx5): - reduce downtime on interface reconfiguration - disable being an XDP redirect target by default (same as other drivers) to avoid wasting resources if feature is unused - Meta (fbnic): - add support for Linux-managed PCS on 25G, 50G, and 100G links - Wangxun: - support Rx descriptor merge, and Tx head writeback - support Rx coalescing offload - support 25G SPF and 40G QSFP modules - Ethernet virtual: - Google (gve): - allow ethtool to configure rx_buf_len - implement XDP HW RX Timestamping support for DQ descriptor format - Microsoft vNIC (mana): - support HW link state events - handle hardware recovery events when probing the device - Ethernet NICs consumer, and embedded: - usbnet: add support for Byte Queue Limits (BQL) - AMD (amd-xgbe): - add device selftests - NXP (enetc): - add i.MX94 support - Broadcom integrated MACs (bcmgenet, bcmasp): - bcmasp: add support for PHY-based Wake-on-LAN - Broadcom switches (b53): - support port isolation - support BCM5389/97/98 and BCM63XX ARL formats - Lantiq/MaxLinear switches: - support bridge FDB entries on the CPU port - use regmap for register access - allow user to enable/disable learning - support Energy Efficient Ethernet - support configuring RMII clock delays - add tagging driver for MaxLinear GSW1xx switches - Synopsys (stmmac): - support using the HW clock in free running mode - add Eswin EIC7700 support - add Rockchip RK3506 support - add Altera Agilex5 support - Cadence (macb): - cleanup and consolidate descriptor and DMA address handling - add EyeQ5 support - TI: - icssg-prueth: support AF_XDP - Airoha access points: - add missing Ethernet stats and link state callback - add AN7583 support - support out-of-order Tx completion processing - Power over Ethernet: - pd692x0: preserve PSE configuration across reboots - add support for TPS23881B devices - Ethernet PHYs: - Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support - Support 50G SerDes and 100G interfaces in Linux-managed PHYs - micrel: - support for non PTP SKUs of lan8814 - enable in-band auto-negotiation on lan8814 - realtek: - cable testing support on RTL8224 - interrupt support on RTL8221B - motorcomm: support for PHY LEDs on YT853 - microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag - mscc: support for PHY LED control - CAN drivers: - m_can: add support for optional reset and system wake up - remove can_change_mtu() obsoleted by core handling - mcp251xfd: support GPIO controller functionality - Bluetooth: - add initial support for PASTa - WiFi: - split ieee80211.h file, it's way too big - improvements in VHT radiotap reporting, S1G, Channel Switch Announcement handling, rate tracking in mesh networks - improve multi-radio monitor mode support, and add a cfg80211 debugfs interface for it - HT action frame handling on 6 GHz - initial chanctx work towards NAN - MU-MIMO sniffer improvements - WiFi drivers: - RealTek (rtw89): - support USB devices RTL8852AU and RTL8852CU - initial work for RTL8922DE - improved injection support - Intel: - iwlwifi: new sniffer API support - MediaTek (mt76): - WED support for >32-bit DMA - airoha NPU support - regdomain improvements - continued WiFi7/MLO work - Qualcomm/Atheros: - ath10k: factory test support - ath11k: TX power insertion support - ath12k: BSS color change support - ath12k: statistics improvements - brcmfmac: Acer A1 840 tablet quirk - rtl8xxxu: 40 MHz connection fixes/support" * tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1381 commits) net: page_pool: sanitise allocation order net: page pool: xa init with destroy on pp init net/mlx5e: Support XDP target xmit with dummy program net/mlx5e: Update XDP features in switch channels selftests/tc-testing: Test CAKE scheduler when enqueue drops packets net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop wireguard: netlink: generate netlink code wireguard: uapi: generate header with ynl-gen wireguard: uapi: move flag enums wireguard: uapi: move enum wg_cmd wireguard: netlink: add YNL specification selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py selftests: drv-net: Fix and clarify TC bandwidth split in devlink_rate_tc_bw.py selftests: drv-net: Set shell=True for sysfs writes in devlink_rate_tc_bw.py selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py selftests: drv-net: introduce Iperf3Runner for measurement use cases selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS net: ps3_gelic_net: Use napi_alloc_skb() and napi_gro_receive() Documentation: net: dsa: mention simple HSR offload helpers Documentation: net: dsa: mention availability of RedBox ...
2025-12-04Merge tag 'sched_ext-for-6.19' of ↵Linus Torvalds1-0/+39
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext updates from Tejun Heo: - Improve recovery from misbehaving BPF schedulers. When a scheduler puts many tasks with varying affinity restrictions on a shared DSQ, CPUs scanning through tasks they cannot run can overwhelm the system, causing lockups. Bypass mode now uses per-CPU DSQs with a load balancer to avoid this, and hooks into the hardlockup detector to attempt recovery. Add scx_cpu0 example scheduler to demonstrate this scenario. - Add lockless peek operation for DSQs to reduce lock contention for schedulers that need to query queue state during load balancing. - Allow scx_bpf_reenqueue_local() to be called from anywhere in preparation for deprecating cpu_acquire/release() callbacks in favor of generic BPF hooks. - Prepare for hierarchical scheduler support: add scx_bpf_task_set_slice() and scx_bpf_task_set_dsq_vtime() kfuncs, make scx_bpf_dsq_insert*() return bool, and wrap kfunc args in structs for future aux__prog parameter. - Implement cgroup_set_idle() callback to notify BPF schedulers when a cgroup's idle state changes. - Fix migration tasks being incorrectly downgraded from stop_sched_class to rt_sched_class across sched_ext enable/disable. Applied late as the fix is low risk and the bug subtle but needs stable backporting. - Various fixes and cleanups including cgroup exit ordering, SCX_KICK_WAIT reliability, and backward compatibility improvements. * tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (44 commits) sched_ext: Fix incorrect sched_class settings for per-cpu migration tasks sched_ext: tools: Removing duplicate targets during non-cross compilation sched_ext: Use kvfree_rcu() to release per-cpu ksyncs object sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs sched_ext: Update comments replacing breather with aborting mechanism sched_ext: Implement load balancer for bypass mode sched_ext: Factor out abbreviated dispatch dequeue into dispatch_dequeue_locked() sched_ext: Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR sched_ext: Add scx_cpu0 example scheduler sched_ext: Hook up hardlockup detector sched_ext: Make handle_lockup() propagate scx_verror() result sched_ext: Refactor lockup handlers into handle_lockup() sched_ext: Make scx_exit() and scx_vexit() return bool sched_ext: Exit dispatch and move operations immediately when aborting sched_ext: Simplify breather mechanism with scx_aborting flag sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode sched_ext: Refactor do_enqueue_task() local and global DSQ paths sched_ext: Use shorter slice in bypass mode sched_ext: Mark racy bitfields to prevent adding fields that can't tolerate races sched_ext: Minor cleanups to scx_task_iter ...
2025-12-03Merge tag 'pm-6.19-rc1' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "There are quite a few interesting things here, including new hardware support, new features, some bug fixes and documentation updates. In addition, there are a usual bunch of minor fixes and cleanups all over. In the new hardware support category, there are intel_pstate and intel_rapl driver updates to support new processors, Panther Lake, Wildcat Lake, Noval Lake, and Diamond Rapids in the OOB mode, OPP and bandwidth allocation support in the tegra186 cpufreq driver, and JH7110S SOC support in dt-platdev cpufreq. The new features are the PM QoS CPU latency limit for suspend-to-idle, the netlink support for the energy model management, support for terminating system suspend via a wakeup event during the sync of file systems, configurable number of hibernation compression threads, the runtime PM auto-cleanup macros, and the "poweroff" PM event that is expected to be used during system shutdown. Bugs are mostly fixed in cpuidle governors, but there are also fixes elsewhere, like in the amd-pstate cpufreq driver. Documentation updates include, but are not limited to, a new doc on debugging shutdown hangs, cross-referencing fixes and cleanups in the intel_pstate documentation, and updates of comments in the core hibernation code. Specifics: - Introduce and document a QoS limit on CPU exit latency during wakeup from suspend-to-idle (Ulf Hansson) - Add support for building libcpupower statically (Zuo An) - Add support for sending netlink notifications to user space on energy model updates (Changwoo Mini, Peng Fan) - Minor improvements to the Rust OPP interface (Tamir Duberstein) - Fixes to scope-based pointers in the OPP library (Viresh Kumar) - Use residency threshold in polling state override decisions in the menu cpuidle governor (Aboorva Devarajan) - Add sanity check for exit latency and target residency in the cpufreq core (Rafael Wysocki) - Use this_cpu_ptr() where possible in the teo governor (Christian Loehle) - Rework the handling of tick wakeups in the teo cpuidle governor to increase the likelihood of stopping the scheduler tick in the cases when tick wakeups can be counted as non-timer ones (Rafael Wysocki) - Fix a reverse condition in the teo cpuidle governor and drop a misguided target residency check from it (Rafael Wysocki) - Clean up multiple minor defects in the teo cpuidle governor (Rafael Wysocki) - Update header inclusion to make it follow the Include What You Use principle (Andy Shevchenko) - Enable MSR-based RAPL PMU support in the intel_rapl power capping driver and arrange for using it on the Panther Lake and Wildcat Lake processors (Kuppuswamy Sathyanarayanan) - Add support for Nova Lake and Wildcat Lake processors to the intel_rapl power capping driver (Kaushlendra Kumar, Srinivas Pandruvada) - Add OPP and bandwidth support for Tegra186 (Aaron Kling) - Optimizations for parameter array handling in the amd-pstate cpufreq driver (Mario Limonciello) - Fix for mode changes with offline CPUs in the amd-pstate cpufreq driver (Gautham Shenoy) - Preserve freq_table_sorted across suspend/hibernate in the cpufreq core (Zihuan Zhang) - Adjust energy model rules for Intel hybrid platforms in the intel_pstate cpufreq driver and improve printing of debug messages in it (Rafael Wysocki) - Replace deprecated strcpy() in cpufreq_unregister_governor() (Thorsten Blum) - Fix duplicate hyperlink target errors in the intel_pstate cpufreq driver documentation and use :ref: directive for internal linking in it (Swaraj Gaikwad, Bagas Sanjaya) - Add Diamond Rapids OOB mode support to the intel_pstate cpufreq driver (Kuppuswamy Sathyanarayanan) - Use mutex guard for driver locking in the intel_pstate driver and eliminate some code duplication from it (Rafael Wysocki) - Replace udelay() with usleep_range() in ACPI cpufreq (Kaushlendra Kumar) - Minor improvements to various cpufreq drivers (Christian Marangi, Hal Feng, Jie Zhan, Marco Crivellari, Miaoqian Lin, and Shuhao Fu) - Replace snprintf() with scnprintf() in show_trace_dev_match() (Kaushlendra Kumar) - Fix memory allocation error handling in pm_vt_switch_required() (Malaya Kumar Rout) - Introduce CALL_PM_OP() macro and use it to simplify code in generic PM operations (Kaushlendra Kumar) - Add module param to backtrace all CPUs in the device power management watchdog (Sergey Senozhatsky) - Rework message printing in swsusp_save() (Rafael Wysocki) - Make it possible to change the number of hibernation compression threads (Xueqin Luo) - Clarify that only cgroup1 freezer uses PM freezer (Tejun Heo) - Add document on debugging shutdown hangs to PM documentation and correct a mistaken configuration option in it (Mario Limonciello) - Shut down wakeup source timer before removing the wakeup source from the list (Kaushlendra Kumar, Rafael Wysocki) - Introduce new PMSG_POWEROFF event for system shutdown handling with the help of PM device callbacks (Mario Limonciello) - Make pm_test delay interruptible by wakeup events (Riwen Lu) - Clean up kernel-doc comment style usage in the core hibernation code and remove unuseful comments from it (Sunday Adelodun, Rafael Wysocki) - Add support for handling wakeup events and aborting the suspend process while it is syncing file systems (Samuel Wu, Rafael Wysocki) - Add WQ_UNBOUND to pm_wq workqueue (Marco Crivellari) - Add runtime PM wrapper macros for ACQUIRE()/ACQUIRE_ERR() and use them in the PCI core and the ACPI TAD driver (Rafael Wysocki) - Improve runtime PM in the ACPI TAD driver (Rafael Wysocki) - Update pm_runtime_allow/forbid() documentation (Rafael Wysocki) - Fix typos in runtime.c comments (Malaya Kumar Rout) - Move governor.h from devfreq under include/linux/ and rename to devfreq-governor.h to allow devfreq governor definitions in out of drivers/devfreq/ (Dmitry Baryshkov) - Use min() to improve readability in tegra30-devfreq.c (Thorsten Blum) - Fix potential use-after-free issue of OPP handling in hisi_uncore_freq.c (Pengjie Zhang) - Fix typo in DFSO_DOWNDIFFERENTIAL macro name in governor_simpleondemand.c in devfreq (Riwen Lu)" * tag 'pm-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (96 commits) PM / devfreq: Fix typo in DFSO_DOWNDIFFERENTIAL macro name cpuidle: Warn instead of bailing out if target residency check fails cpuidle: Update header inclusion Documentation: power/cpuidle: Document the CPU system wakeup latency QoS cpuidle: Respect the CPU system wakeup QoS limit for cpuidle sched: idle: Respect the CPU system wakeup QoS limit for s2idle pmdomain: Respect the CPU system wakeup QoS limit for cpuidle pmdomain: Respect the CPU system wakeup QoS limit for s2idle PM: QoS: Introduce a CPU system wakeup QoS limit cpuidle: governors: teo: Add missing space to the description PM: hibernate: Extra cleanup of comments in swap handling code PM / devfreq: tegra30: use min to simplify actmon_cpu_to_emc_rate PM / devfreq: hisi: Fix potential UAF in OPP handling PM / devfreq: Move governor.h to a public header location powercap: intel_rapl: Enable MSR-based RAPL PMU support powercap: intel_rapl: Prepare read_raw() interface for atomic-context callers cpufreq: qcom-nvmem: fix compilation warning for qcom_cpufreq_ipq806x_match_list PM: sleep: Call pm_sleep_fs_sync() instead of ksys_sync_helper() PM: sleep: Add support for wakeup during filesystem sync cpufreq: ACPI: Replace udelay() with usleep_range() ...
2025-12-02Merge tag 'timers-core-2025-11-30' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core updates from Thomas Gleixner: - Prevent a thundering herd problem when the timekeeper CPU is delayed and a large number of CPUs compete to acquire jiffies_lock to do the update. Limit it to one CPU with a separate "uncontended" atomic variable. - A set of improvements for the timer migration mechanism: - Support imbalanced NUMA trees correctly - Support dynamic exclusion of CPUs from the migrator duty to allow the cpuset/isolation mechanism to exclude them from handling timers of remote idle CPUs - The usual small updates, cleanups and enhancements * tag 'timers-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers/migration: Exclude isolated cpus from hierarchy cpumask: Add initialiser to use cleanup helpers sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks() timers/migration: Use scoped_guard on available flag set/clear timers/migration: Add mask for CPUs available in the hierarchy timers/migration: Rename 'online' bit to 'available' selftests/timers/nanosleep: Add tests for return of remaining time selftests/timers: Clean up kernel version check in posix_timers time: Fix a few typos in time[r] related code comments time: tick-oneshot: Add missing Return and parameter descriptions to kernel-doc hrtimer: Store time as ktime_t in restart block timers/migration: Remove dead code handling idle CPU checking for remote timers timers/migration: Remove unused "cpu" parameter from tmigr_get_group() timers/migration: Assert that hotplug preparing CPU is part of stable active hierarchy timers/migration: Fix imbalanced NUMA trees timers/migration: Remove locking on group connection timers/migration: Convert "while" loops to use "for" tick/sched: Limit non-timekeeper CPUs calling jiffies update
2025-12-02Merge tag 'core-rseq-2025-11-30' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull rseq updates from Thomas Gleixner: "A large overhaul of the restartable sequences and CID management: The recent enablement of RSEQ in glibc resulted in regressions which are caused by the related overhead. It turned out that the decision to invoke the exit to user work was not really a decision. More or less each context switch caused that. There is a long list of small issues which sums up nicely and results in a 3-4% regression in I/O benchmarks. The other detail which caused issues due to extra work in context switch and task migration is the CID (memory context ID) management. It also requires to use a task work to consolidate the CID space, which is executed in the context of an arbitrary task and results in sporadic uncontrolled exit latencies. The rewrite addresses this by: - Removing deprecated and long unsupported functionality - Moving the related data into dedicated data structures which are optimized for fast path processing. - Caching values so actual decisions can be made - Replacing the current implementation with a optimized inlined variant. - Separating fast and slow path for architectures which use the generic entry code, so that only fault and error handling goes into the TIF_NOTIFY_RESUME handler. - Rewriting the CID management so that it becomes mostly invisible in the context switch path. That moves the work of switching modes into the fork/exit path, which is a reasonable tradeoff. That work is only required when a process creates more threads than the cpuset it is allowed to run on or when enough threads exit after that. An artificial thread pool benchmarks which triggers this did not degrade, it actually improved significantly. The main effect in migration heavy scenarios is that runqueue lock held time and therefore contention goes down significantly" * tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) sched/mmcid: Switch over to the new mechanism sched/mmcid: Implement deferred mode change irqwork: Move data struct to a types header sched/mmcid: Provide CID ownership mode fixup functions sched/mmcid: Provide new scheduler CID mechanism sched/mmcid: Introduce per task/CPU ownership infrastructure sched/mmcid: Serialize sched_mm_cid_fork()/exit() with a mutex sched/mmcid: Provide precomputed maximal value sched/mmcid: Move initialization out of line signal: Move MMCID exit out of sighand lock sched/mmcid: Convert mm CID mask to a bitmap cpumask: Cache num_possible_cpus() sched/mmcid: Use cpumask_weighted_or() cpumask: Introduce cpumask_weighted_or() sched/mmcid: Prevent pointless work in mm_update_cpus_allowed() sched/mmcid: Move scheduler code out of global header sched: Fixup whitespace damage sched/mmcid: Cacheline align MM CID storage sched/mmcid: Use proper data structures sched/mmcid: Revert the complex CID management ...
2025-12-01Merge tag 'vfs-6.19-rc1.inode' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode updates from Christian Brauner: "Features: - Hide inode->i_state behind accessors. Open-coded accesses prevent asserting they are done correctly. One obvious aspect is locking, but significantly more can be checked. For example it can be detected when the code is clearing flags which are already missing, or is setting flags when it is illegal (e.g., I_FREEING when ->i_count > 0) - Provide accessors for ->i_state, converts all filesystems using coccinelle and manual conversions (btrfs, ceph, smb, f2fs, gfs2, overlayfs, nilfs2, xfs), and makes plain ->i_state access fail to compile - Rework I_NEW handling to operate without fences, simplifying the code after the accessor infrastructure is in place Cleanups: - Move wait_on_inode() from writeback.h to fs.h - Spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb for clarity - Cosmetic fixes to LRU handling - Push list presence check into inode_io_list_del() - Touch up predicts in __d_lookup_rcu() - ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage - Assert on ->i_count in iput_final() - Assert ->i_lock held in __iget() Fixes: - Add missing fences to I_NEW handling" * tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits) dcache: touch up predicts in __d_lookup_rcu() fs: push list presence check into inode_io_list_del() fs: cosmetic fixes to lru handling fs: rework I_NEW handling to operate without fences fs: make plain ->i_state access fail to compile xfs: use the new ->i_state accessors nilfs2: use the new ->i_state accessors overlayfs: use the new ->i_state accessors gfs2: use the new ->i_state accessors f2fs: use the new ->i_state accessors smb: use the new ->i_state accessors ceph: use the new ->i_state accessors btrfs: use the new ->i_state accessors Manual conversion to use ->i_state accessors of all places not covered by coccinelle Coccinelle-based conversion to use ->i_state accessors fs: provide accessors for ->i_state fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb fs: move wait_on_inode() from writeback.h to fs.h fs: add missing fences to I_NEW handling ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage ...
2025-11-27ext4: rename EXT4_GET_BLOCKS_PRE_IOYang Erkun1-1/+1
This flag has been generalized to split an unwritten extent when we do dio or dioread_nolock writeback, or to avoid merge new extents which was created by extents split. Update some related comments too. Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Yang Erkun <yangerkun@huawei.com> Message-ID: <20251112084538.1658232-2-yangerkun@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-11-25mm/memory-failure: remove the selection of RASXie Yuanbin1-0/+98
commit 97f0b13452198290799f ("tracing: add trace event for memory-failure") introduces the selection of RAS in memory-failure. This commit is just a tracing feature; in reality, there is no dependency between memory-failure and RAS. RAS increases the size of the bzImage image by 8k, which is very valuable for embedded devices. Move the memory-failure traceing code from ras_event.h to memory-failure.h and remove the selection of RAS. Link: https://lkml.kernel.org/r/20251119095943.67125-1-xieyuanbin1@huawei.com Signed-off-by: Xie Yuanbin <xieyuanbin1@huawei.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Borislav Petkov <bp@alien8.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-25mm/khugepaged: unify SCAN_PMD_NONE and SCAN_PMD_NULL into SCAN_NO_PTE_TABLEWei Yang1-2/+1
The current hugepage collapse scan results include two separate values, SCAN_PMD_NONE and SCAN_PMD_NULL, which are handled identically by the consuming code. To reduce confusion and improve long-term maintenance, this commit merges these two functionally equivalent states into a single, clearer identifier: SCAN_NO_PTE_TABLE Link: https://lkml.kernel.org/r/20251114030028.7035-4-richard.weiyang@gmail.com Suggested-by: "David Hildenbrand (Red Hat)" <david@kernel.org> Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Nico Pache <npache@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-21mm: introduce VM_MAYBE_GUARD and make visible in /proc/$pid/smapsLorenzo Stoakes1-0/+1
Patch series "introduce VM_MAYBE_GUARD and make it sticky", v4. Currently, guard regions are not visible to users except through /proc/$pid/pagemap, with no explicit visibility at the VMA level. This makes the feature less useful, as it isn't entirely apparent which VMAs may have these entries present, especially when performing actions which walk through memory regions such as those performed by CRIU. This series addresses this issue by introducing the VM_MAYBE_GUARD flag which fulfils this role, updating the smaps logic to display an entry for these. The semantics of this flag are that a guard region MAY be present if set (we cannot be sure, as we can't efficiently track whether an MADV_GUARD_REMOVE finally removes all the guard regions in a VMA) - but if not set the VMA definitely does NOT have any guard regions present. It's problematic to establish this flag without further action, because that means that VMAs with guard regions in them become non-mergeable with adjacent VMAs for no especially good reason. To work around this, this series also introduces the concept of 'sticky' VMA flags - that is flags which: a. if set in one VMA and not in another still permit those VMAs to be merged (if otherwise compatible). b. When they are merged, the resultant VMA must have the flag set. The VMA logic is updated to propagate these flags correctly. Additionally, VM_MAYBE_GUARD being an explicit VMA flag allows us to solve an issue with file-backed guard regions - previously these established an anon_vma object for file-backed mappings solely to have vma_needs_copy() correctly propagate guard region mappings to child processes. We introduce a new flag alias VM_COPY_ON_FORK (which currently only specifies VM_MAYBE_GUARD) and update vma_needs_copy() to check explicitly for this flag and to copy page tables if it is present, which resolves this issue. Additionally, we add the ability for allow-listed VMA flags to be atomically writable with only mmap/VMA read locks held. The only flag we allow so far is VM_MAYBE_GUARD, which we carefully ensure does not cause any races by being allowed to do so. This allows us to maintain guard region installation as a read-locked operation and not endure the overhead of obtaining a write lock here. Finally we introduce extensive VMA userland tests to assert that the sticky VMA logic behaves correctly as well as guard region self tests to assert that smaps visibility is correctly implemented. This patch (of 9): Currently, if a user needs to determine if guard regions are present in a range, they have to scan all VMAs (or have knowledge of which ones might have guard regions). Since commit 8e2f2aeb8b48 ("fs/proc/task_mmu: add guard region bit to pagemap") and the related commit a516403787e0 ("fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions"), users can use either /proc/$pid/pagemap or the PAGEMAP_SCAN functionality to perform this operation at a virtual address level. This is not ideal, and it gives no visibility at a /proc/$pid/smaps level that guard regions exist in ranges. This patch remedies the situation by establishing a new VMA flag, VM_MAYBE_GUARD, to indicate that a VMA may contain guard regions (it is uncertain because we cannot reasonably determine whether a MADV_GUARD_REMOVE call has removed all of the guard regions in a VMA, and additionally VMAs may change across merge/split). We utilise 0x800 for this flag which makes it available to 32-bit architectures also, a flag that was previously used by VM_DENYWRITE, which was removed in commit 8d0920bde5eb ("mm: remove VM_DENYWRITE") and hasn't bee reused yet. We also update the smaps logic and documentation to identify these VMAs. Another major use of this functionality is that we can use it to identify that we ought to copy page tables on fork. We do not actually implement usage of this flag in mm/madvise.c yet as we need to allow some VMA flags to be applied atomically under mmap/VMA read lock in order to avoid the need to acquire a write lock for this purpose. Link: https://lkml.kernel.org/r/cover.1763460113.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/cf8ef821eba29b6c5b5e138fffe95d6dcabdedb9.1763460113.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lance Yang <lance.yang@linux.dev> Cc: Andrei Vagin <avagin@gmail.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20timers/migration: Rename 'online' bit to 'available'Gabriele Monaco1-2/+2
The timer migration hierarchy excludes offline CPUs via the tmigr_is_not_available function, which is essentially checking the online bit for the CPU. Rename the online bit to available and all references in function names and tracepoint to generalise the concept of available CPUs. Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251120145653.296659-2-gmonaco@redhat.com
2025-11-17Merge back earlier material related to system sleep for 6.19Rafael J. Wysocki1-1/+2
2025-11-17ASoC: asoc.h: convert to snd_soc_dapm_xxx()Kuninori Morimoto1-2/+2
This patch converts below functions. dapm->dev -> snd_soc_dapm_to_dev() dapm->card -> snd_soc_dapm_to_card() dapm->component -> snd_soc_dapm_to_component() dapm_kcontrol_get_value() -> snd_soc_dapm_kcontrol_get_value() snd_soc_component_enable_pin() -> snd_soc_dapm_enable_pin() snd_soc_component_enable_pin_unlocked() -> snd_soc_dapm_enable_pin_unlocked() snd_soc_component_disable_pin() -> snd_soc_dapm_disable_pin() snd_soc_component_disable_pin_unlocked() -> snd_soc_dapm_disable_pin_unlocked() snd_soc_component_nc_pin() -> snd_soc_dapm_nc_pin() snd_soc_component_nc_pin_unlocked() -> snd_soc_dapm_nc_pin_unlocked() snd_soc_component_get_pin_status() -> snd_soc_dapm_get_pin_status() snd_soc_component_force_enable_pin() -> snd_soc_dapm_force_enable_pin() snd_soc_component_force_enable_pin_unlocked() -> snd_soc_dapm_force_enable_pin_unlocked() snd_soc_component_force_bias_level() -> snd_soc_dapm_force_bias_level() snd_soc_component_get_bias_level() -> snd_soc_dapm_get_bias_level() snd_soc_component_init_bias_level() -> snd_soc_dapm_init_bias_level() snd_soc_component_get_dapm() -> snd_soc_component_to_dapm() snd_soc_dapm_kcontrol_component() -> snd_soc_dapm_kcontrol_to_component() snd_soc_dapm_kcontrol_widget() -> snd_soc_dapm_kcontrol_to_widget() snd_soc_dapm_kcontrol_dapm() -> snd_soc_dapm_kcontrol_to_dapm() snd_soc_dapm_np_pin() -> snd_soc_dapm_disable_pin() Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com> Link: https://patch.msgid.link/87346la0cv.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-11-14PM: Introduce new PMSG_POWEROFF eventMario Limonciello (AMD)1-1/+2
PMSG_POWEROFF will be used for the PM core to allow differentiating between a hibernation or shutdown sequence when re-using callbacks for common code. Hibernation is started by writing a hibernation method (such as 'platform' 'shutdown', or 'reboot') to use into /sys/power/disk and writing 'disk' to /sys/power/state. Shutdown is initiated with the reboot() syscall with arguments on whether to halt the system or power it off. Tested-by: Eric Naim <dnaim@cachyos.org> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Link: https://patch.msgid.link/20251112224025.2051702-2-superm1@kernel.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-12sched_ext: Implement load balancer for bypass modeTejun Heo1-0/+39
In bypass mode, tasks are queued on per-CPU bypass DSQs. While this works well in most cases, there is a failure mode where a BPF scheduler can skew task placement severely before triggering bypass in highly over-saturated systems. If most tasks end up concentrated on a few CPUs, those CPUs can accumulate queues that are too long to drain in a reasonable time, leading to RCU stalls and hung tasks. Implement a simple timer-based load balancer that redistributes tasks across CPUs within each NUMA node. The balancer runs periodically (default 500ms, tunable via bypass_lb_intv_us module parameter) and moves tasks from overloaded CPUs to underloaded ones. When moving tasks between bypass DSQs, the load balancer holds nested DSQ locks to avoid dropping and reacquiring the donor DSQ lock on each iteration, as donor DSQs can be very long and highly contended. Add the SCX_ENQ_NESTED flag and use raw_spin_lock_nested() in dispatch_enqueue() to support this. The load balancer timer function reads scx_bypass_depth locklessly to check whether bypass mode is active. Use WRITE_ONCE() when updating scx_bypass_depth to pair with the READ_ONCE() in the timer function. This has been tested on a 192 CPU dual socket AMD EPYC machine with ~20k runnable tasks running scx_cpu0. As scx_cpu0 queues all tasks to CPU0, almost all tasks end up on CPU0 creating severe imbalance. Without the load balancer, disabling the scheduler can lead to RCU stalls and hung tasks, taking a very long time to complete. With the load balancer, disable completes in about a second. The load balancing operation can be monitored using the sched_ext_bypass_lb tracepoint and disabled by setting bypass_lb_intv_us to 0. v2: Lock both rq and DSQ in bypass_lb_cpu() and use dispatch_dequeue_locked() to prevent races with dispatch_dequeue() (Andrea Righi). Cc: Andrea Righi <arighi@nvidia.com> Cc: Dan Schatzberg <schatzberg.dan@gmail.com> Cc: Emil Tsalapatis <etsal@meta.com> Reviewed_by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-06ext4: add two trace points for moving extentsZhang Yi1-0/+74
To facilitate tracking the length, type, and outcome of the move extent, add a trace point at both the entry and exit of mext_move_extent(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Message-ID: <20251013015128.499308-13-yi.zhang@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-11-06ext4: introduce seq counter for the extent status entryZhang Yi1-8/+15
In the iomap_write_iter(), the iomap buffered write frame does not hold any locks between querying the inode extent mapping info and performing page cache writes. As a result, the extent mapping can be changed due to concurrent I/O in flight. Similarly, in the iomap_writepage_map(), the write-back process faces a similar problem: concurrent changes can invalidate the extent mapping before the I/O is submitted. Therefore, both of these processes must recheck the mapping info after acquiring the folio lock. To address this, similar to XFS, we propose introducing an extent sequence number to serve as a validity cookie for the extent. After commit 24b7a2331fcd ("ext4: clairfy the rules for modifying extents"), we can ensure the extent information should always be processed through the extent status tree, and the extent status tree is always uptodate under i_rwsem or invalidate_lock or folio lock, so it's safe to introduce this sequence number. The sequence number will be increased whenever the extent status tree changes, preparing for the buffered write iomap conversion. Besides, this mechanism is also applicable for the moving extents case. In move_extent_per_page(), it also needs to reacquire data_sem and check the mapping info again under the folio lock. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Message-ID: <20251013015128.499308-3-yi.zhang@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-11-05spi: tegra210-quad: Improve timeout handling underMark Brown1-0/+9
Merge series from Vishwaroop A <va@nvidia.com>: This patch series addresses timeout handling issues in the Tegra QSPI driver that occur under high system load conditions. We've observed that when CPUs are saturated (due to error injection, RAS firmware activity, or general CPU contention), QSPI interrupt handlers can be delayed, causing spurious transfer failures even though the hardware completed the operation successfully. These changes have been tested in production environments under various high load scenarios including RAS testing and CPU saturation workloads.
2025-11-04net: add net cookie for net device trace eventsTonghao Zhang1-10/+27
In a multi-network card or container environment, this is needed in order to differentiate between trace events relating to net devices that exist in different network namespaces and share the same name. for xmit_timeout trace events: [002] ..s1. 1838.311662: net_dev_xmit_timeout: dev=eth0 driver=virtio_net queue=10 net_cookie=3 [007] ..s1. 1839.335650: net_dev_xmit_timeout: dev=eth0 driver=virtio_net queue=10 net_cookie=4100 [007] ..s1. 1844.455659: net_dev_xmit_timeout: dev=eth0 driver=virtio_net queue=10 net_cookie=3 [002] ..s1. 1850.087647: net_dev_xmit_timeout: dev=eth0 driver=virtio_net queue=10 net_cookie=3 Cc: Eran Ben Elisha <eranbe@mellanox.com> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Simon Horman <horms@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20251028043244.82288-1-tonghao@bamaicloud.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-04rseq: Cache CPU ID and MM CID valuesThomas Gleixner1-2/+2
In preparation for rewriting RSEQ exit to user space handling provide storage to cache the CPU ID and MM CID values which were written to user space. That prepares for a quick check, which avoids the update when nothing changed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084306.841964081@linutronix.de
2025-10-30trace: tcp: add three metrics to trace_tcp_rcvbuf_grow()Eric Dumazet1-0/+9
While chasing yet another receive autotuning bug, I found useful to add rcv_ssthresh, window_clamp and rcv_wnd. tcp_stream 40597 [068] 2172.978198: tcp:tcp_rcvbuf_grow: time=50307 rtt_us=50179 copied=77824 inq=0 space=40960 ooo=0 scaling_ratio=219 rcvbuf=131072 rcv_ssthresh=107474 window_clamp=112128 rcv_wnd=110592 tcp_stream 40597 [068] 2173.028528: tcp:tcp_rcvbuf_grow: time=50336 rtt_us=50206 copied=110592 inq=0 space=77824 ooo=0 scaling_ratio=219 rcvbuf=509444 rcv_ssthresh=328658 window_clamp=435813 rcv_wnd=331776 tcp_stream 40597 [068] 2173.078830: tcp:tcp_rcvbuf_grow: time=50305 rtt_us=50070 copied=270336 inq=0 space=110592 ooo=0 scaling_ratio=219 rcvbuf=509444 rcv_ssthresh=431159 window_clamp=435813 rcv_wnd=434176 tcp_stream 40597 [068] 2173.129137: tcp:tcp_rcvbuf_grow: time=50313 rtt_us=50118 copied=434176 inq=0 space=270336 ooo=0 scaling_ratio=219 rcvbuf=2457847 rcv_ssthresh=1299511 window_clamp=2102611 rcv_wnd=1302528 tcp_stream 40597 [068] 2173.179451: tcp:tcp_rcvbuf_grow: time=50318 rtt_us=50041 copied=1019904 inq=0 space=434176 ooo=0 scaling_ratio=219 rcvbuf=2457847 rcv_ssthresh=2087445 window_clamp=2102611 rcv_wnd=2088960 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20251028-net-tcp-recv-autotune-v3-2-74b43ba4c84c@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-29tracing: Display some syscall arrays as stringsSteven Rostedt1-1/+3
Some of the system calls that read a fixed length of memory from the user space address are not arrays but strings. Take a bit away from the nb_args field in the syscall meta data to use as a flag to denote that the system call's user_arg_size is being used as a string. The nb_args should never be more than 6, so 7 bits is plenty to hold that number. When the user_arg_is_str flag that, when set, will display the data array from the user space address as a string and not an array. This will allow the output to look like this: sys_sethostname(name: 0x5584310eb2a0 "debian", len: 6) Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takaya Saeki <takayas@google.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ian Rogers <irogers@google.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/20251028231147.930550359@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-29tracing: Have system call events record user array dataSteven Rostedt1-1/+3
For system call events that have a length field, add a "user_arg_size" parameter to the system call meta data that denotes the index of the args array that holds the size of arg that the user_mask field has a bit set for. The "user_mask" has a bit set that denotes the arg that points to an array in the user space address space and if a system call event has the user_mask field set and the user_arg_size set, it will then record the content of that address into the trace event, up to the size defined by SYSCALL_FAULT_BUF_SZ - 1. This allows the output to look like: sys_write(fd: 0xa, buf: 0x5646978d13c0 (01:00:05:00:00:00:00:00:01:87:55:89:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00), count: 0x20) Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takaya Saeki <takayas@google.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ian Rogers <irogers@google.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/20251028231147.763528474@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-29tracing: Have syscall trace events read user space stringSteven Rostedt1-1/+3
As of commit 654ced4a1377 ("tracing: Introduce tracepoint_is_faultable()") system call trace events allow faulting in user space memory. Have some of the system call trace events take advantage of this. Use the trace_user_fault_read() logic to read the user space buffer from user space and instead of just saving the pointer to the buffer in the system call event, also save the string that is passed in. The syscall event has its nb_args shorten from an int to a short (where even u8 is plenty big enough) and the freed two bytes are used for "user_mask". The new "user_mask" field is used to store the index of the "args" field array that has the address to read from user space. This value is set to 0 if the system call event does not need to read user space for a field. This mask can be used to know if the event may fault or not. Only one bit set in user_mask is supported at this time. This allows the output to look like this: sys_access(filename: 0x7f8c55368470 "/etc/ld.so.preload", mode: 4) sys_execve(filename: 0x564ebcf5a6b8 "/usr/bin/emacs", argv: 0x7fff357c0300, envp: 0x564ebc4a4820) Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takaya Saeki <takayas@google.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ian Rogers <irogers@google.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/20251028231147.261867956@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-27spi: spi-mem: Trace exec_opSean Anderson1-0/+106
The spi subsystem has tracing, which is very convenient when debugging problems. Add tracing for spi-mem too so that accesses that skip the spi subsystem can still be seen. The format is roughly based on the existing spi tracing. We don't bother tracing the op's address because the tracing happens while the memory is locked, so there can be no confusion about the matching of start and stop. The conversion of cmd/addr/dummy to an array is directly analogous to the conversion in the latter half of spi_mem_exec_op. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20251021144702.1582397-1-sean.anderson@linux.dev Signed-off-by: Mark Brown <broonie@kernel.org>
2025-10-20Manual conversion to use ->i_state accessors of all places not covered by ↵Mateusz Guzik1-4/+4
coccinelle Nothing to look at apart from iput_final(). Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-07Merge tag 'dma-mapping-6.18-2025-10-07' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fixes from Marek Szyprowski: "Two small fixes for the recently performed code refactoring (Shigeru Yoshida) and missing handling of direction parameter in DMA debug code (Petr Tesarik)" * tag 'dma-mapping-6.18-2025-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma-mapping: fix direction in dma_alloc direction traces kmsan: fix kmsan_handle_dma() to avoid false positives
2025-10-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-35/+0
Pull kvm updates from Paolo Bonzini: "This excludes the bulk of the x86 changes, which I will send separately. They have two not complex but relatively unusual conflicts so I will wait for other dust to settle. guest_memfd: - Add support for host userspace mapping of guest_memfd-backed memory for VM types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE (which isn't precisely the same thing as CoCo VMs, since x86's SEV-MEM and SEV-ES have no way to detect private vs. shared). This lays the groundwork for removal of guest memory from the kernel direct map, as well as for limited mmap() for guest_memfd-backed memory. For more information see: - commit a6ad54137af9 ("Merge branch 'guest-memfd-mmap' into HEAD") - guest_memfd in Firecracker: https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding - direct map removal: https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/ - mmap support: https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/ ARM: - Add support for FF-A 1.2 as the secure memory conduit for pKVM, allowing more registers to be used as part of the message payload. - Change the way pKVM allocates its VM handles, making sure that the privileged hypervisor is never tricked into using uninitialised data. - Speed up MMIO range registration by avoiding unnecessary RCU synchronisation, which results in VMs starting much quicker. - Add the dump of the instruction stream when panic-ing in the EL2 payload, just like the rest of the kernel has always done. This will hopefully help debugging non-VHE setups. - Add 52bit PA support to the stage-1 page-table walker, and make use of it to populate the fault level reported to the guest on failing to translate a stage-1 walk. - Add NV support to the GICv3-on-GICv5 emulation code, ensuring feature parity for guests, irrespective of the host platform. - Fix some really ugly architecture problems when dealing with debug in a nested VM. This has some bad performance impacts, but is at least correct. - Add enough infrastructure to be able to disable EL2 features and give effective values to the EL2 control registers. This then allows a bunch of features to be turned off, which helps cross-host migration. - Large rework of the selftest infrastructure to allow most tests to transparently run at EL2. This is the first step towards enabling NV testing. - Various fixes and improvements all over the map, including one BE fix, just in time for the removal of the feature. LoongArch: - Detect page table walk feature on new hardware - Add sign extension with kernel MMIO/IOCSR emulation - Improve in-kernel IPI emulation - Improve in-kernel PCH-PIC emulation - Move kvm_iocsr tracepoint out of generic code RISC-V: - Added SBI FWFT extension for Guest/VM with misaligned delegation and pointer masking PMLEN features - Added ONE_REG interface for SBI FWFT extension - Added Zicbop and bfloat16 extensions for Guest/VM - Enabled more common KVM selftests for RISC-V - Added SBI v3.0 PMU enhancements in KVM and perf driver s390: - Improve interrupt cpu for wakeup, in particular the heuristic to decide which vCPU to deliver a floating interrupt to. - Clear the PTE when discarding a swapped page because of CMMA; this bug was introduced in 6.16 when refactoring gmap code. x86 selftests: - Add #DE coverage in the fastops test (the only exception that's guest- triggerable in fastop-emulated instructions). - Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra Forest (SRF) and Clearwater Forest (CWF). - Minor cleanups and improvements x86 (guest side): - For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when overriding guest MTRR for TDX/SNP to fix an issue where ACPI auto-mapping could map devices as WB and prevent the device drivers from mapping their devices with UC/UC-. - Make kvm_async_pf_task_wake() a local static helper and remove its export. - Use native qspinlocks when running in a VM with dedicated vCPU=>pCPU bindings even when PV_UNHALT is unsupported. Generic: - Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as __GFP_NOWARN is now included in GFP_NOWAIT. * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (178 commits) KVM: s390: Fix to clear PTE when discarding a swapped page KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs KVM: arm64: selftests: Cope with arch silliness in EL2 selftest KVM: arm64: selftests: Add basic test for running in VHE EL2 KVM: arm64: selftests: Enable EL2 by default KVM: arm64: selftests: Initialize HCR_EL2 KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2 KVM: arm64: selftests: Select SMCCC conduit based on current EL KVM: arm64: selftests: Provide helper for getting default vCPU target KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts KVM: arm64: selftests: Create a VGICv3 for 'default' VMs KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation KVM: arm64: selftests: Add helper to check for VGICv3 support KVM: arm64: selftests: Initialize VGICv3 only once KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code KVM: selftests: Add ex_str() to print human friendly name of exception vectors selftests/kvm: remove stale TODO in xapic_state_test KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount ...
2025-10-04Merge tag 'dma-mapping-6.18-2025-09-30' of ↵Linus Torvalds1-5/+4
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping updates from Marek Szyprowski: - Refactoring of DMA mapping API to physical addresses as the primary interface instead of page+offset parameters This gets much closer to Matthew Wilcox's long term wish for struct-pageless IO to cacheable DRAM and is supporting memdesc project which seeks to substantially transform how struct page works. An advantage of this approach is the possibility of introducing DMA_ATTR_MMIO, which covers existing 'dma_map_resource' flow in the common paths, what in turn lets to use recently introduced dma_iova_link() API to map PCI P2P MMIO without creating struct page Developped by Leon Romanovsky and Jason Gunthorpe - Minor clean-up by Petr Tesarik and Qianfeng Rong * tag 'dma-mapping-6.18-2025-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: kmsan: fix missed kmsan_handle_dma() signature conversion mm/hmm: properly take MMIO path mm/hmm: migrate to physical address-based DMA mapping API dma-mapping: export new dma_*map_phys() interface xen: swiotlb: Open code map_resource callback dma-mapping: implement DMA_ATTR_MMIO for dma_(un)map_page_attrs() kmsan: convert kmsan_handle_dma to use physical addresses dma-mapping: convert dma_direct_*map_page to be phys_addr_t based iommu/dma: implement DMA_ATTR_MMIO for iommu_dma_(un)map_phys() iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys dma-debug: refactor to use physical addresses for page mapping iommu/dma: implement DMA_ATTR_MMIO for dma_iova_link(). dma-mapping: introduce new DMA attribute to indicate MMIO memory swiotlb: Remove redundant __GFP_NOWARN dma-direct: clean up the logic in __dma_direct_alloc_pages()
2025-10-04Merge tag 'nfs-for-6.18-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds1-0/+22
Pull NFS client updates from Anna Schumaker: "New Features: - Add a Kconfig option to redirect dfprintk() to the trace buffer - Enable use of the RWF_DONTCACHE flag on the NFS client - Add striped layout handling to pNFS flexfiles - Add proper localio handling for READ and WRITE O_DIRECT Bugfixes: - Handle NFS4ERR_GRACE errors during delegation recall - Fix NFSv4.1 backchannel max_resp_sz verification check - Fix mount hang after CREATE_SESSION failure - Fix d_parent->d_inode locking in nfs4_setup_readdir() Other Cleanups and Improvements: - Improvements to write handling tracepoints - Fix a few trivial spelling mistakes - Cleanups to the rpcbind cleanup call sites - Convert the SUNRPC xdr_buf to use a scratch folio instead of scratch page - Remove unused NFS_WBACK_BUSY() macro - Remove __GFP_NOWARN flags - Unexport rpc_malloc() and rpc_free()" * tag 'nfs-for-6.18-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (46 commits) NFS: add basic STATX_DIOALIGN and STATX_DIO_READ_ALIGN support nfs/localio: add tracepoints for misaligned DIO READ and WRITE support nfs/localio: add proper O_DIRECT support for READ and WRITE nfs/localio: refactor iocb initialization nfs/localio: refactor iocb and iov_iter_bvec initialization nfs/localio: avoid issuing misaligned IO using O_DIRECT nfs/localio: make trace_nfs_local_open_fh more useful NFSD: filecache: add STATX_DIOALIGN and STATX_DIO_READ_ALIGN support sunrpc: unexport rpc_malloc() and rpc_free() NFSv4/flexfiles: Add support for striped layouts NFSv4/flexfiles: Update layout stats & error paths for striped layouts NFSv4/flexfiles: Write path updates for striped layouts NFSv4/flexfiles: Commit path updates for striped layouts NFSv4/flexfiles: Read path updates for striped layouts NFSv4/flexfiles: Update low level helper functions to be DS stripe aware. NFSv4/flexfiles: Add data structure support for striped layouts NFSv4/flexfiles: Use ds_commit_idx when marking a write commit NFSv4/flexfiles: Remove cred local variable dependency nfs4_setup_readdir(): insufficient locking for ->d_parent->d_inode dereferencing NFS: Enable use of the RWF_DONTCACHE flag on the NFS client ...
2025-10-03dma-mapping: fix direction in dma_alloc direction tracesPetr Tesarik1-0/+1
Set __entry->dir to the actual "dir" parameter of all trace events in dma_alloc_class. This struct member was left uninitialized by mistake. Signed-off-by: Petr Tesarik <ptesarik@suse.com> Fixes: 3afff779a725 ("dma-mapping: trace dma_alloc/free direction") Cc: stable@vger.kernel.org Reviewed-by: Sean Anderson <sean.anderson@linux.dev> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20251001061028.412258-1-ptesarik@suse.com
2025-10-03Merge tag 'mm-stable-2025-10-01-19-00' of ↵Linus Torvalds5-22/+157
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation - "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps - "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code - "mm: vm_normal_page*() improvements" from David Hildenbrand provides code cleanup in the pagemap code - "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionalls making the huge_zero_pagepersistent, instead of releasing it when its refcount falls to zero - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code - "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system". It's a long story - the [1/N] changelog spells out the considerations - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc - "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path - "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code - "test that rmap behaves as expected" from Wei Yang adds some rmap selftests - "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues - "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks - "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem - "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/ - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c - "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation - "mm: remove zpool" from Johannes Weiner removes zspool - an indirection layer which now only redirects to a single thing (zsmalloc) - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN read-only mode for ARM, using that architecture's memory tagging feature. It is felt that a read-only mode KASAN is suitable for use in production systems rather than debug-only - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code - "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improving their own const/non-const accuracy - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages() - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver - "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code - "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. This patchset itself yields a 5-20% performance benefit in some situations - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code - "Minor fixes for memory allocation profiling" from Suren Baghdasaryan addresses a couple of minor issues in relatively new memory allocation profiling feature - "Small cleanups" from Matthew Wilcox has a few cleanups in preparation for more memdesc work - "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in furtherance of supporting arm highmem - "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad Anjum adds that compiler check to selftests code and fixes the fallout, by removing dead code - "Improvements to Victim Process Thawing and OOM Reaper Traversal Order" from zhongjinji makes a number of improvements in the OOM killer: mainly thawing a more appropriate group of victim threads so they can release resources - "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park is a bunch of small and unrelated fixups for DAMON - "mm/damon: define and use DAMON initialization check function" from SeongJae Park implement reliability and maintainability improvements to a recently-added bug fix - "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from SeongJae Park provides additional transparency to userspace clients of the DAMON_STAT information - "Expand scope of khugepaged anonymous collapse" from Dev Jain removes some constraints on khubepaged's collapsing of anon VMAs. It also increases the success rate of MADV_COLLAPSE against an anon vma - "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards removal of file_operations.mmap(). This patchset concentrates upon clearing up the treatment of stacked filesystems - "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau provides some fixes and improvements to mlock's tracking of large folios. /proc/meminfo's "Mlocked" field became more accurate - "mm/ksm: Fix incorrect accounting of KSM counters during fork" from Donet Tom fixes several user-visible KSM stats inaccuracies across forks and adds selftest code to verify these counters - "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses some potential but presently benign issues in KSM's mm_slot handling * tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits) mm: swap: check for stable address space before operating on the VMA mm: convert folio_page() back to a macro mm/khugepaged: use start_addr/addr for improved readability hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list alloc_tag: fix boot failure due to NULL pointer dereference mm: silence data-race in update_hiwater_rss mm/memory-failure: don't select MEMORY_ISOLATION mm/khugepaged: remove definition of struct khugepaged_mm_slot mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL hugetlb: increase number of reserving hugepages via cmdline selftests/mm: add fork inheritance test for ksm_merging_pages counter mm/ksm: fix incorrect KSM counter handling in mm_struct during fork drivers/base/node: fix double free in register_one_node() mm: remove PMD alignment constraint in execmem_vmalloc() mm/memory_hotplug: fix typo 'esecially' -> 'especially' mm/rmap: improve mlock tracking for large folios mm/filemap: map entire large folio faultaround mm/fault: try to map the entire file folio in finish_fault() mm/rmap: mlock large folios in try_to_unmap_one() mm/rmap: fix a mlock race condition in folio_referenced_one() ...
2025-10-03Merge tag 'net-next-6.18' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core & protocols: - Improve drop account scalability on NUMA hosts for RAW and UDP sockets and the backlog, almost doubling the Pps capacity under DoS - Optimize the UDP RX performance under stress, reducing contention, revisiting the binary layout of the involved data structs and implementing NUMA-aware locking. This improves UDP RX performance by an additional 50%, even more under extreme conditions - Add support for PSP encryption of TCP connections; this mechanism has some similarities with IPsec and TLS, but offers superior HW offloads capabilities - Ongoing work to support Accurate ECN for TCP. AccECN allows more than one congestion notification signal per RTT and is a building block for Low Latency, Low Loss, and Scalable Throughput (L4S) - Reorganize the TCP socket binary layout for data locality, reducing the number of touched cachelines in the fastpath - Refactor skb deferral free to better scale on large multi-NUMA hosts, this improves TCP and UDP RX performances significantly on such HW - Increase the default socket memory buffer limits from 256K to 4M to better fit modern link speeds - Improve handling of setups with a large number of nexthop, making dump operating scaling linearly and avoiding unneeded synchronize_rcu() on delete - Improve bridge handling of VLAN FDB, storing a single entry per bridge instead of one entry per port; this makes the dump order of magnitude faster on large switches - Restore IP ID correctly for encapsulated packets at GSO segmentation time, allowing GRO to merge packets in more scenarios - Improve netfilter matching performance on large sets - Improve MPTCP receive path performance by leveraging recently introduced core infrastructure (skb deferral free) and adopting recent TCP autotuning changes - Allow bridges to redirect to a backup port when the bridge port is administratively down - Introduce MPTCP 'laminar' endpoint that con be used only once per connection and simplify common MPTCP setups - Add RCU safety to dst->dev, closing a lot of possible races - A significant crypto library API for SCTP, MPTCP and IPv6 SR, reducing code duplication - Supports pulling data from an skb frag into the linear area of an XDP buffer Things we sprinkled into general kernel code: - Generate netlink documentation from YAML using an integrated YAML parser Driver API: - Support using IPv6 Flow Label in Rx hash computation and RSS queue selection - Introduce API for fetching the DMA device for a given queue, allowing TCP zerocopy RX on more H/W setups - Make XDP helpers compatible with unreadable memory, allowing more easily building DevMem-enabled drivers with a unified XDP/skbs datapath - Add a new dedicated ethtool callback enabling drivers to provide the number of RX rings directly, improving efficiency and clarity in RX ring queries and RSS configuration - Introduce a burst period for the health reporter, allowing better handling of multiple errors due to the same root cause - Support for DPLL phase offset exponential moving average, controlling the average smoothing factor Device drivers: - Add a new Huawei driver for 3rd gen NIC (hinic3) - Add a new SpacemiT driver for K1 ethernet MAC - Add a generic abstraction for shared memory communication devices (dibps) - Ethernet high-speed NICs: - nVidia/Mellanox: - Use multiple per-queue doorbell, to avoid MMIO contention issues - support adjacent functions, allowing them to delegate their SR-IOV VFs to sibling PFs - support RSS for IPSec offload - support exposing raw cycle counters in PTP and mlx5 - support for disabling host PFs. - Intel (100G, ice, idpf): - ice: support for SRIOV VFs over an Active-Active link aggregate - ice: support for firmware logging via debugfs - ice: support for Earliest TxTime First (ETF) hardware offload - idpf: support basic XDP functionalities and XSk - Broadcom (bnxt): - support Hyper-V VF ID - dynamic SRIOV resource allocations for RoCE - Meta (fbnic): - support queue API, zero-copy Rx and Tx - support basic XDP functionalities - devlink health support for FW crashes and OTP mem corruptions - expand hardware stats coverage to FEC, PHY, and Pause - Wangxun: - support ethtool coalesce options - support for multiple RSS contexts - Ethernet virtual: - Macsec: - replace custom netlink attribute checks with policy-level checks - Bonding: - support aggregator selection based on port priority - Microsoft vNIC: - use page pool fragments for RX buffers instead of full pages to improve memory efficiency - Ethernet NICs consumer, and embedded: - Qualcomm: support Ethernet function for IPQ9574 SoC - Airoha: implement wlan offloading via NPU - Freescale - enetc: add NETC timer PTP driver and add PTP support - fec: enable the Jumbo frame support for i.MX8QM - Renesas (R-Car S4): - support HW offloading for layer 2 switching - support for RZ/{T2H, N2H} SoCs - Cadence (macb): support TAPRIO traffic scheduling - TI: - support for Gigabit ICSS ethernet SoC (icssm-prueth) - Synopsys (stmmac): a lot of cleanups - Ethernet PHYs: - Support 10g-qxgmi phy-mode for AQR412C, Felix DSA and Lynx PCS driver - Support bcm63268 GPHY power control - Support for Micrel lan8842 PHY and PTP - Support for Aquantia AQR412 and AQR115 - CAN: - a large CAN-XL preparation work - reorganize raw_sock and uniqframe struct to minimize memory usage - rcar_canfd: update the CAN-FD handling - WiFi: - extended Neighbor Awareness Networking (NAN) support - S1G channel representation cleanup - improve S1G support - WiFi drivers: - Intel (iwlwifi): - major refactor and cleanup - Broadcom (brcm80211): - support for AP isolation - RealTek (rtw88/89) rtw88/89: - preparation work for RTL8922DE support - MediaTek (mt76): - HW restart improvements - MLO support - Qualcomm/Atheros (ath10k): - GTK rekey fixes - Bluetooth drivers: - btusb: support for several new IDs for MT7925 - btintel: support for BlazarIW core - btintel_pcie: support for _suspend() / _resume() - btintel_pcie: support for Scorpious, Panther Lake-H484 IDs" * tag 'net-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1536 commits) net: stmmac: Add support for Allwinner A523 GMAC200 dt-bindings: net: sun8i-emac: Add A523 GMAC200 compatible Revert "Documentation: net: add flow control guide and document ethtool API" octeontx2-pf: fix bitmap leak octeontx2-vf: fix bitmap leak net/mlx5e: Use extack in set rxfh callback net/mlx5e: Introduce mlx5e_rss_params for RSS configuration net/mlx5e: Introduce mlx5e_rss_init_params net/mlx5e: Remove unused mdev param from RSS indir init net/mlx5: Improve QoS error messages with actual depth values net/mlx5e: Prevent entering switchdev mode with inconsistent netns net/mlx5: HWS, Generalize complex matchers net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs selftests/net: add tcp_port_share to .gitignore Revert "net/mlx5e: Update and set Xon/Xoff upon MTU set" net: add NUMA awareness to skb_attempt_defer_free() net: use llist for sd->defer_list net: make softnet_data.defer_count an atomic selftests: drv-net: psp: add tests for destroying devices selftests: drv-net: psp: add test for auto-adjusting TCP MSS ...
2025-10-02Merge tag 'drm-next-2025-10-01' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds1-1/+1
Pull drm updates from Dave Airlie: "cross-subsystem: - i2c-hid: Make elan touch controllers power on after panel is enabled - dt bindings for STM32MP25 SoC - pci vgaarb: use screen_info helpers - rust pin-init updates - add MEI driver for late binding firmware update/load uapi: - add ioctl for reassigning GEM handles - provide boot_display attribute on boot-up devices core: - document DRM_MODE_PAGE_FLIP_EVENT - add vendor specific recovery method to drm device wedged uevent gem: - Simplify gpuvm locking ttm: - add interface to populate buffers sched: - Fix race condition in trace code atomic: - Reallow no-op async page flips display: - dp: Fix command length video: - Improve pixel-format handling for struct screen_info rust: - drop Opaque<> from ioctl args - Alloc: - BorrowedPage type and AsPageIter traits - Implement Vmalloc::to_page() and VmallocPageIter - DMA/Scatterlist: - Add dma::DataDirection and type alias for dma_addr_t - Abstraction for struct scatterlist and sg_table - DRM: - simplify use of generics - add DriverFile type alias - drop Object::SIZE - Rust: - pin-init tree merge - Various methods for AsBytes and FromBytes traits gpuvm: - Support madvice in Xe driver gpusvm: - fix hmm_pfn_to_map_order usage in gpusvm bridge: - Improve and fix ref counting on bridge management - cdns-dsi: Various improvements to mode setting - Support Solomon SSD2825 plus DT bindings - Support Waveshare DSI2DPI plus DT bindings - Support Content Protection property - display-connector: Improve DP display detection - Add support for Radxa Ra620 plus DT bindings - adv7511: Provide SPD and HDMI infoframes - it6505: Replace crypto_shash with sha() - synopsys: Add support for DW DPTX Controller plus DT bindings - adv7511: Write full Audio infoframe - ite6263: Support vendor-specific infoframes - simple: Add support for Realtek RTD2171 DP-to-HDMI plus DT bindings panel: - panel-edp: Support mt8189 Chromebooks; Support BOE NV140WUM-N64; Support SHP LQ134Z1; Fixes - panel-simple: Support Olimex LCD-OLinuXino-5CTS plus DT bindings - Support Samsung AMS561RA01 - Support Hydis HV101HD1 plus DT bindings - ilitek-ili9881c: Refactor mode setting; Add support for Bestar BSD1218-A101KL68 LCD plus DT bindings - lvds: Add support for Ampire AMP19201200B5TZQW-T03 to DT bindings - edp: Add support for additonal mt8189 Chromebook panels - lvds: Add DT bindings for EDT ETML0700Z8DHA amdgpu: - add CRIU support for gem objects - RAS updates - VCN SRAM load fixes - EDID read fixes - eDP ALPM support - Documentation updates - Rework PTE flag generation - DCE6 fixes - VCN devcoredump cleanup - MMHUB client id fixes - VCN 5.0.1 RAS support - SMU 13.0.x updates - Expanded PCIe DPC support - Expanded VCN reset support - VPE per queue reset support - give kernel jobs unique id for tracing - pre-populate exported buffers - cyan skillfish updates - make vbios build number available in sysfs - userq updates - HDCP updates - support MMIO remap page as ttm pool - JPEG parser updates - DCE6 DC updates - use devm for i2c buses - GPUVM locking updates - Drop non-DC DCE11 code - improve fallback handling for pixel encoding amdkfd: - SVM/page migration fixes - debugfs fixes - add CRIO support for gem objects - SVM updates radeon: - use dev_warn_once in CS parsers xe: - add madvise interface - add DRM_IOCTL_XE_VM_QUERY_MEMORY_RANGE_ATTRS to query VMA count and memory attributes - drop L# bank mask reporting from media GT3 on Xe3+. - add SLPC power_profile sysfs interface - add configs attribs to add post/mid context-switch commands - handle firmware reported hardware errors notifying userspace with device wedged uevent - use same dir structure across sysfs/debugfs - cleanup and future proof vram region init - add G-states and PCI link states to debugfs - Add SRIOV support for CCS surfaces on Xe2+ - Enable SRIOV PF mode by default on supported platforms - move flush to common code - extended core workarounds for Xe2/3 - use DRM scheduler for delayed GT TLB invalidations - configs improvements and allow VF device enablement - prep work to expose mmio regions to userspace - VF migration support added - prepare GPU SVM for THP migration - start fixing XE_PAGE_SIZE vs PAGE_SIZE - add PSMI support for hw validation - resize VF bars to max possible size according to number of VFs - Ensure GT is in C0 during resume - pre-populate exported buffers - replace xe_hmm with gpusvm - add more SVM GT stats to debugfs - improve fake pci and WA kunnit handle for new platform testing - Test GuC to GuC comms to add debugging - use attribute groups to simplify sysfs registration - add Late Binding firmware code to interact with MEI i915: - apply multiple JSL/EHL/Gen7/Gen6 workarounds properly - protect against overflow in active_engine() - Use try_cmpxchg64() in __active_lookup() - include GuC registers in error state - get rid of dev->struct_mutex - iopoll: generalize read_poll_timout - lots more display refactoring - Reject HBR3 in any eDP Panel - Prune modes for YUV420 - Display Wa fix, additions, and updates - DP: Fix 2.7 Gbps link training on g4x - DP: Adjust the idle pattern handling - DP: Shuffle the link training code a bit - Don't set/read the DSI C clock divider on GLK - Enable_psr kernel parameter changes - Type-C enabled/disconnected dp-alt sink - Wildcat Lake enabling - DP HDR updates - DRAM detection - wait PSR idle on dsb commit - Remove FBC modulo 4 restriction for ADL-P+ - panic: refactor framebuffer allocation habanalabs: - debug/visibility improvements - vmalloc-backed coherent mmap support - HLDIO infrastructure nova-core: - various register!() macro improvements - minor vbios/firmware fixes/refactoring - advance firmware boot stages; process Booter and patch signatures - process GSP and GSP bootloader - Add r570.144 firmware bindings and update to it - Move GSP boot code to own module - Use new pin-init features to store driver's private data in a single allocation - Update ARef import from sync::aref nova-drm: - Update ARef import from sync::aref tyr: - initial driver skeleton for a rust driver for ARM Mali GPUs - capable of powering up, query metadata and provide it to userspace. msm: - GPU and Core: - in DT bindings describe clocks per GPU type - GMU bandwidth voting for x1-85 - a623/a663 speedbins - cleanup some remaining no-iommu leftovers after VM_BIND conversion - fix GEM obj 32b size truncation - add missing VM_BIND param validation - IFPC for x1-85 and a750 - register xml and gen_header.py sync from mesa - Display: - add missing bindings for display on SC8180X - added DisplayPort MST bindings - conversion from round_rate() to determine_rate() amdxdna: - add IOCTL_AMDXDNA_GET_ARRAY - support user space allocated buffers - streamline PM interfaces - Refactoring wrt. hardware contexts - improve error reporting nouveau: - use GSP firmware by default - improve error reporting - Pre-populate exported buffers ast: - Clean up detection of DRAM config exynos: - add DSIM bridge driver support for Exynos7870 - Document Exynos7870 DSIM compatible in dt-binding panthor: - Print task/pid on errors - Add support for Mali G710, G510, G310, Gx15, Gx20, Gx25 - Improve cache flushing - Fail VM bind if BO has offset renesas: - convert to RUNTIME_PM_OPS rcar-du: - Make number of lanes configurable - Use RUNTIME_PM_OPS - Add support for DSI commands rocket: - Add driver for Rockchip NPU plus DT bindings - Use kfree() and sizeof() correctly - Test DMA status rockchip: - dsi2: Add support for RK3576 plus DT bindings - Add support for RK3588 DPTX output tidss: - Use crtc_ fields for programming display mode - Remove other drivers from aperture pixpaper: - Add support for Mayqueen Pixpaper plus DT bindings v3d: - Support querying nubmer of GPU resets for KHR_robustness stm: - Clean up logging - ltdc: Add support support for STM32MP257F-EV1 plus DT bindings sitronix: - st7571-i2c: Add support for inverted displays and 2-bit grayscale tidss: - Convert to kernel's FIELD_ macros vesadrm: - Support 8-bit palette mode imagination: - Improve power management - Add support for TH1520 GPU - Support Risc-V architectures v3d: - Improve job management and locking vkms: - Support variants of ARGB8888, ARGB16161616, RGB565, RGB888 and P01x - Spport YUV with 16-bit components" * tag 'drm-next-2025-10-01' of https://gitlab.freedesktop.org/drm/kernel: (1455 commits) drm/amd: Add name to modes from amdgpu_connector_add_common_modes() drm/amd: Drop some common modes from amdgpu_connector_add_common_modes() drm/amdgpu: update MODULE_PARM_DESC for freesync_video drm/amd: Use dynamic array size declaration for amdgpu_connector_add_common_modes() drm/amd/display: Share dce100_validate_global with DCE6-8 drm/amd/display: Share dce100_validate_bandwidth with DCE6-8 drm/amdgpu: Fix fence signaling race condition in userqueue amd/amdkfd: enhance kfd process check in switch partition amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw drm/amd/display: Reject modes with too high pixel clock on DCE6-10 drm/amd: Drop unnecessary check in amdgpu_connector_add_common_modes() drm/amd/display: Only enable common modes for eDP and LVDS drm/amdgpu: remove the redeclaration of variable i drm/amdgpu/userq: assign an error code for invalid userq va drm/amdgpu: revert "rework reserved VMID handling" v2 drm/amdgpu: remove leftover from enforcing isolation by VMID drm/amdgpu: Add fallback to pipe reset if KCQ ring reset fails accel/habanalabs: add Infineon version check accel/habanalabs/gaudi2: read preboot status after recovering from dirty state accel/habanalabs: add HL_GET_P_STATE passthrough type ...
2025-10-02Merge tag 'for-6.18/io_uring-20250929' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring updates from Jens Axboe: - Store ring provided buffers locally for the users, rather than stuff them into struct io_kiocb. These types of buffers must always be fully consumed or recycled in the current context, and leaving them in struct io_kiocb is hence not a good ideas as that struct has a vastly different life time. Basically just an architecture cleanup that can help prevent issues with ring provided buffers in the future. - Support for mixed CQE sizes in the same ring. Before this change, a CQ ring either used the default 16b CQEs, or it was setup with 32b CQE using IORING_SETUP_CQE32. For use cases where a few 32b CQEs were needed, this caused everything else to use big CQEs. This is wasteful both in terms of memory usage, but also memory bandwidth for the posted CQEs. With IORING_SETUP_CQE_MIXED, applications may use request types that post both normal 16b and big 32b CQEs on the same ring. - Add helpers for async data management, to make it harder for opcode handlers to mess it up. - Add support for multishot for uring_cmd, which ublk can use. This helps improve efficiency, by providing a persistent request type that can trigger multiple CQEs. - Add initial support for ring feature querying. We had basic support for probe operations, but the API isn't great. Rather than expand that, add support for QUERY which is easily expandable and can cover a lot more cases than the existing probe support. This will help applications get a better idea of what operations are supported on a given host. - zcrx improvements from Pavel: - Improve refill entry alignment for better caching - Various cleanups, especially around deduplicating normal memory vs dmabuf setup. - Generalisation of the niov size (Patch 12). It's still hard coded to PAGE_SIZE on init, but will let the user to specify the rx buffer length on setup. - Syscall / synchronous bufer return. It'll be used as a slow fallback path for returning buffers when the refill queue is full. Useful for tolerating slight queue size misconfiguration or with inconsistent load. - Accounting more memory to cgroups. - Additional independent cleanups that will also be useful for mutli-area support. - Various fixes and cleanups * tag 'for-6.18/io_uring-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (68 commits) io_uring/cmd: drop unused res2 param from io_uring_cmd_done() io_uring: fix nvme's 32b cqes on mixed cq io_uring/query: cap number of queries io_uring/query: prevent infinite loops io_uring/zcrx: account niov arrays to cgroup io_uring/zcrx: allow synchronous buffer return io_uring/zcrx: introduce io_parse_rqe() io_uring/zcrx: don't adjust free cache space io_uring/zcrx: use guards for the refill lock io_uring/zcrx: reduce netmem scope in refill io_uring/zcrx: protect netdev with pp_lock io_uring/zcrx: rename dma lock io_uring/zcrx: make niov size variable io_uring/zcrx: set sgt for umem area io_uring/zcrx: remove dmabuf_offset io_uring/zcrx: deduplicate area mapping io_uring/zcrx: pass ifq to io_zcrx_alloc_fallback() io_uring/zcrx: check all niovs filled with dma addresses io_uring/zcrx: move area reg checks into io_import_area io_uring/zcrx: don't pass slot to io_zcrx_create_area ...