summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
44 hoursMerge tag 'mm-hotfixes-stable-2026-04-19-00-14' of ↵HEADmasterLinus Torvalds8-12/+27
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM fixes from Andrew Morton: "7 hotfixes. 6 are cc:stable and all are for MM. Please see the individual changelogs for details" * tag 'mm-hotfixes-stable-2026-04-19-00-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/damon/core: disallow non-power of two min_region_sz on damon_start() mm/vmalloc: take vmap_purge_lock in shrinker mm: call ->free_folio() directly in folio_unmap_invalidate() mm: blk-cgroup: fix use-after-free in cgwb_release_workfn() mm/zone_device: do not touch device folio after calling ->folio_free() mm/damon/core: disallow time-quota setting zero esz mm/mempolicy: fix weighted interleave auto sysfs name
46 hoursMerge tag 'driver-core-7.1-rc1-2' of ↵Linus Torvalds6-3/+88
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core fixes from Danilo Krummrich: - Prevent a device from being probed before device_add() has finished initializing it; gate probe with a "ready_to_probe" device flag to avoid races with concurrent driver_register() calls - Fix a kernel-doc warning for DEV_FLAG_COUNT introduced by the above - Return -ENOTCONN from software_node_get_reference_args() when a referenced software node is known but not yet registered, allowing callers to defer probe - In sysfs_group_attrs_change_owner(), also check is_visible_const(); missed when the const variant was introduced * tag 'driver-core-7.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: driver core: Add kernel-doc for DEV_FLAG_COUNT enum value sysfs: attribute_group: Respect is_visible_const() when changing owner software node: return -ENOTCONN when referenced swnode is not registered yet driver core: Don't let a device probe until it's ready
2 daysMerge tag 'staging-7.1-rc1' of ↵Linus Torvalds91-2251/+1719
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver updates from Greg KH: "Here is the "big" set of staging driver changes for 7.1-rc1. Nothing major in here at all, just lots of little cleanups for the staging drivers, driven by new developers getting their feet wet in kernel development. "Largest" thing in here is the change of some of the octeon variable types into proper kernel ones. Full details are in the shortlog. All of these have been in linux-next for a while with no reported issues" * tag 'staging-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (154 commits) staging: rtl8723bs: remove redundant & parentheses staging: most: dim2: replace BUG_ON() in poison_channel() staging: most: dim2: replace BUG_ON() in enqueue() staging: most: dim2: replace BUG_ON() in configure_channel() staging: most: dim2: replace BUG_ON() in service_done_flag() staging: most: dim2: replace BUG_ON() in try_start_dim_transfer() staging: rtl8723bs: remove unused RTL8188E antenna selection macros staging: rtl8723bs: remove redundant blank lines in basic_types.h staging: rtl8723bs: wrap complex macros with parentheses staging: rtl8723bs: remove unused WRITEEF/READEF byte macros staging: rtl8723bs: rename camelCase variable staging: greybus: audio: fix error message for BTN_3 button staging: rtl8723bs: rename variables to snake_case staging: rtl8723bs: fix spelling in comment staging: rtl8723bs: cleanup return in sdio_init() staging: rtl8723bs: use direct returns in sdio_dvobj_init() staging: rtl8723bs: remove unused arg at odm_interface.h greybus: raw: fix use-after-free if write is called after disconnect greybus: raw: fix use-after-free on cdev close staging: rtl8723bs: fix logical continuations in xmit_linux.c ...
2 daysMerge tag 'usb-7.1-rc1' of ↵Linus Torvalds134-1381/+4763
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB / Thunderbolt updates from Greg KH: "Here is the big set of USB and Thunderbolt changes for 7.1-rc1. Lots of little things in here, nothing major, just constant improvements, updates, and new features. Highlights are: - new USB power supply driver support. These changes did touch outside of drivers/usb/ but got acks from the relevant mantainers for them. - dts file updates and conversions - string function conversions into "safer" ones - new device quirks - xhci driver updates - usb gadget driver minor fixes - typec driver additions and updates - small number of thunderbolt driver changes - dwc3 driver updates and additions of new hardware support - other minor driver updates All of these have been in the linux-next tree for a while with no reported issues" * tag 'usb-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (176 commits) usb: dwc3: starfive: Add JHB100 USB 2.0 DRD controller dt-bindings: usb: dwc3: add support for StarFive JHB100 dt-bindings: usb: atmel,at91sam9rl-udc: convert to DT schema dt-bindings: usb: atmel,at91rm9200-udc: convert to DT schema dt-bindings: usb: generic-ehci: fix schema structure and add at91sam9g45 constraints dt-bindings: usb: generic-ohci: add AT91RM9200 OHCI binding support arm: dts: at91: remove unused #address-cells/#size-cells from sam9x60 udc node drivers/usb/host: Fix spelling error 'seperate' -> 'separate' usbip: tools: add hint when no exported devices are found USB: serial: iuu_phoenix: fix iuutool author name usb: gadget: f_ncm: validate minimum block_len in ncm_unwrap_ntb() usb: gadget: f_phonet: fix skb frags[] overflow in pn_rx_complete() usb: gadget: f_hid: Add missing error code usb: typec: cros_ec_ucsi: Load driver from OF and ACPI definitions dt-bindings: chrome: Add cros-ec-ucsi compatibility to typec binding USB: of: Simplify with scoped for each OF child loop usbip: validate number_of_packets in usbip_pack_ret_submit() usb: gadget: renesas_usb3: validate endpoint index in standard request handlers usb: core: config: reverse the size check of the SSP isoc endpoint descriptor usb: typec: ucsi: Set usb mode on partner change ...
2 daysMerge tag 'tty-7.1-rc1' of ↵Linus Torvalds43-1364/+532
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial updates from Greg KH: "Here is the set of tty and serial driver changes for 7.1-rc1. Not much here this cycle, biggest thing is the removal of an old driver that never got any actual hardware support (esp32), and the second try to moving the tty ports to their own workqueues (first try was in 7.0-rc1 but was reverted due to problems) Otherwise it's just a small set of driver updates and some vt modifier key enhancements. All have been in linux-next for a while with no reported issues" * tag 'tty-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (35 commits) tty: serial: ip22zilog: Fix section mispatch warning hvc/xen: Check console connection flag serial: sh-sci: Add support for RZ/G3L RSCI dt-bindings: serial: renesas,rsci: Document RZ/G3L SoC tty: atmel_serial: update outdated reference to atmel_tasklet_func() serial: xilinx_uartps: Drop unused include serial: qcom-geni: drop stray newline format specifier serial: 8250: loongson: Enable building on MIPS Loongson64 dt-bindings: serial: 8250: Add Loongson 3A4000 uart compatible serial: 8250_fintek: Add support for F81214E tty: tty_port: add workqueue to flip TTY buffer vt: support ITU-T T.416 color subparameters serial: qcom-geni: Fix RTS behavior with flow control tty: serial: imx: keep dma request disabled before dma transfer setup tty: serial: 8250: Add SystemBase Multi I/O cards serial: pic32_uart: allow driver to be compiled on all architectures with COMPILE_TEST serial: tegra: remove Kconfig dependency on APB DMA controller dt-bindings: serial: amlogic,meson-uart: Add compatible string for A9 dt-bindings: serial: atmel,at91-usart: add microchip,lan9691-usart serial: auart: check clk_enable() return in console write ...
2 daysMerge tag 'mm-stable-2026-04-18-02-14' of ↵Linus Torvalds84-1675/+2814
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: - "Eliminate Dying Memory Cgroup" (Qi Zheng and Muchun Song) Address the longstanding "dying memcg problem". A situation wherein a no-longer-used memory control group will hang around for an extended period pointlessly consuming memory - "fix unexpected type conversions and potential overflows" (Qi Zheng) Fix a couple of potential 32-bit/64-bit issues which were identified during review of the "Eliminate Dying Memory Cgroup" series - "kho: history: track previous kernel version and kexec boot count" (Breno Leitao) Use Kexec Handover (KHO) to pass the previous kernel's version string and the number of kexec reboots since the last cold boot to the next kernel, and print it at boot time - "liveupdate: prevent double preservation" (Pasha Tatashin) Teach LUO to avoid managing the same file across different active sessions - "liveupdate: Fix module unloading and unregister API" (Pasha Tatashin) Address an issue with how LUO handles module reference counting and unregistration during module unloading - "zswap pool per-CPU acomp_ctx simplifications" (Kanchana Sridhar) Simplify and clean up the zswap crypto compression handling and improve the lifecycle management of zswap pool's per-CPU acomp_ctx resources - "mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race" (SeongJae Park) Address unlikely but possible leaks and deadlocks in damon_call() and damon_walk() - "mm/damon/core: validate damos_quota_goal->nid" (SeongJae Park) Fix a couple of root-only wild pointer dereferences - "Docs/admin-guide/mm/damon: warn commit_inputs vs other params race" (SeongJae Park) Update the DAMON documentation to warn operators about potential races which can occur if the commit_inputs parameter is altered at the wrong time - "Minor hmm_test fixes and cleanups" (Alistair Popple) Bugfixes and a cleanup for the HMM kernel selftests - "Modify memfd_luo code" (Chenghao Duan) Cleanups, simplifications and speedups to the memfd_lou code - "mm, kvm: allow uffd support in guest_memfd" (Mike Rapoport) Support for userfaultfd in guest_memfd - "selftests/mm: skip several tests when thp is not available" (Chunyu Hu) Fix several issues in the selftests code which were causing breakage when the tests were run on CONFIG_THP=n kernels - "mm/mprotect: micro-optimization work" (Pedro Falcato) A couple of nice speedups for mprotect() - "MAINTAINERS: update KHO and LIVE UPDATE entries" (Pratyush Yadav) Document upcoming changes in the maintenance of KHO, LUO, memfd_luo, kexec, crash, kdump and probably other kexec-based things - they are being moved out of mm.git and into a new git tree * tag 'mm-stable-2026-04-18-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (121 commits) MAINTAINERS: add page cache reviewer mm/vmscan: avoid false-positive -Wuninitialized warning MAINTAINERS: update Dave's kdump reviewer email address MAINTAINERS: drop include/linux/liveupdate from LIVE UPDATE MAINTAINERS: drop include/linux/kho/abi/ from KHO MAINTAINERS: update KHO and LIVE UPDATE maintainers MAINTAINERS: update kexec/kdump maintainers entries mm/migrate_device: remove dead migration entry check in migrate_vma_collect_huge_pmd() selftests: mm: skip charge_reserved_hugetlb without killall userfaultfd: allow registration of ranges below mmap_min_addr mm/vmstat: fix vmstat_shepherd double-scheduling vmstat_update mm/hugetlb: fix early boot crash on parameters without '=' separator zram: reject unrecognized type= values in recompress_store() docs: proc: document ProtectionKey in smaps mm/mprotect: special-case small folios when applying permissions mm/mprotect: move softleaf code out of the main function mm: remove '!root_reclaim' checking in should_abort_scan() mm/sparse: fix comment for section map alignment mm/page_io: use sio->len for PSWPIN accounting in sio_read_complete() selftests/mm: transhuge_stress: skip the test when thp not available ...
2 daysmm/damon/core: disallow non-power of two min_region_sz on damon_start()SeongJae Park1-0/+5
Commit d8f867fa0825 ("mm/damon: add damon_ctx->min_sz_region") introduced a bug that allows unaligned DAMON region address ranges. Commit c80f46ac228b ("mm/damon/core: disallow non-power of two min_region_sz") fixed it, but only for damon_commit_ctx() use case. Still, DAMON sysfs interface can emit non-power of two min_region_sz via damon_start(). Fix the path by adding the is_power_of_2() check on damon_start(). The issue was discovered by sashiko [1]. Link: https://lore.kernel.org/20260411213638.77768-1-sj@kernel.org Link: https://lore.kernel.org/20260403155530.64647-1-sj@kernel.org [1] Fixes: d8f867fa0825 ("mm/damon: add damon_ctx->min_sz_region") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> # 6.18.x Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 daysmm/vmalloc: take vmap_purge_lock in shrinkerUladzislau Rezki (Sony)1-0/+1
decay_va_pool_node() can be invoked concurrently from two paths: __purge_vmap_area_lazy() when pools are being purged, and the shrinker via vmap_node_shrink_scan(). However, decay_va_pool_node() is not safe to run concurrently, and the shrinker path currently lacks serialization, leading to races and possible leaks. Protect decay_va_pool_node() by taking vmap_purge_lock in the shrinker path to ensure serialization with purge users. Link: https://lore.kernel.org/20260413192646.14683-1-urezki@gmail.com Fixes: 7679ba6b36db ("mm: vmalloc: add a shrinker to drain vmap pools") Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Baoquan He <baoquan.he@linux.dev> Cc: chenyichong <chenyichong@uniontech.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 daysmm: call ->free_folio() directly in folio_unmap_invalidate()Matthew Wilcox (Oracle)3-3/+7
We can only call filemap_free_folio() if we have a reference to (or hold a lock on) the mapping. Otherwise, we've already removed the folio from the mapping so it no longer pins the mapping and the mapping can be removed, causing a use-after-free when accessing mapping->a_ops. Follow the same pattern as __remove_mapping() and load the free_folio function pointer before dropping the lock on the mapping. That lets us make filemap_free_folio() static as this was the only caller outside filemap.c. Link: https://lore.kernel.org/20260413184314.3419945-1-willy@infradead.org Fixes: fb7d3bc41493 ("mm/filemap: drop streaming/uncached pages when writeback completes") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: Google Big Sleep <big-sleep-vuln-reports+bigsleep-501448199@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jan Kara <jack@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 daysmm: blk-cgroup: fix use-after-free in cgwb_release_workfn()Breno Leitao1-2/+3
cgwb_release_workfn() calls css_put(wb->blkcg_css) and then later accesses wb->blkcg_css again via blkcg_unpin_online(). If css_put() drops the last reference, the blkcg can be freed asynchronously (css_free_rwork_fn -> blkcg_css_free -> kfree) before blkcg_unpin_online() dereferences the pointer to access blkcg->online_pin, resulting in a use-after-free: BUG: KASAN: slab-use-after-free in blkcg_unpin_online (./include/linux/instrumented.h:112 ./include/linux/atomic/atomic-instrumented.h:400 ./include/linux/refcount.h:389 ./include/linux/refcount.h:432 ./include/linux/refcount.h:450 block/blk-cgroup.c:1367) Write of size 4 at addr ff11000117aa6160 by task kworker/71:1/531 Workqueue: cgwb_release cgwb_release_workfn Call Trace: <TASK> blkcg_unpin_online (./include/linux/instrumented.h:112 ./include/linux/atomic/atomic-instrumented.h:400 ./include/linux/refcount.h:389 ./include/linux/refcount.h:432 ./include/linux/refcount.h:450 block/blk-cgroup.c:1367) cgwb_release_workfn (mm/backing-dev.c:629) process_scheduled_works (kernel/workqueue.c:3278 kernel/workqueue.c:3385) Freed by task 1016: kfree (./include/linux/kasan.h:235 mm/slub.c:2689 mm/slub.c:6246 mm/slub.c:6561) css_free_rwork_fn (kernel/cgroup/cgroup.c:5542) process_scheduled_works (kernel/workqueue.c:3302 kernel/workqueue.c:3385) ** Stack based on commit 66672af7a095 ("Add linux-next specific files for 20260410") I am seeing this crash sporadically in Meta fleet across multiple kernel versions. A full reproducer is available at: https://github.com/leitao/debug/blob/main/reproducers/repro_blkcg_uaf.sh (The race window is narrow. To make it easily reproducible, inject a msleep(100) between css_put() and blkcg_unpin_online() in cgwb_release_workfn(). With that delay and a KASAN-enabled kernel, the reproducer triggers the splat reliably in less than a second.) Fix this by moving blkcg_unpin_online() before css_put(), so the cgwb's CSS reference keeps the blkcg alive while blkcg_unpin_online() accesses it. Link: https://lore.kernel.org/20260413-blkcg-v1-1-35b72622d16c@debian.org Fixes: 59b57717fff8 ("blkcg: delay blkg destruction until after writeback has finished") Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: JP Kobryn <inwardvessel@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 daysmm/zone_device: do not touch device folio after calling ->folio_free()Matthew Brost1-1/+1
The contents of a device folio can immediately change after calling ->folio_free(), as the folio may be reallocated by a driver with a different order. Instead of touching the folio again to extract the pgmap, use the local stack variable when calling percpu_ref_put_many(). Link: https://lore.kernel.org/20260410230346.4009855-1-matthew.brost@intel.com Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Balbir Singh <balbirs@nvidia.com> Reviewed-by: Vishal Moola <vishal.moola@gmail.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Cc: David Hildenbrand <david@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 daysmm/damon/core: disallow time-quota setting zero eszSeongJae Park1-3/+5
When the throughput of a DAMOS scheme is very slow, DAMOS time quota can make the effective size quota smaller than damon_ctx->min_region_sz. In the case, damos_apply_scheme() will skip applying the action, because the action is tried at region level, which requires >=min_region_sz size. That is, the quota is effectively exceeded for the quota charge window. Because no action will be applied, the total_charged_sz and total_charged_ns are also not updated. damos_set_effective_quota() will try to update the effective size quota before starting the next charge window. However, because the total_charged_sz and total_charged_ns have not updated, the throughput and effective size quota are also not changed. Since effective size quota can only be decreased, other effective size quota update factors including DAMOS quota goals and size quota cannot make any change, either. As a result, the scheme is unexpectedly deactivated until the user notices and mitigates the situation. The users can mitigate this situation by changing the time quota online or re-install the scheme. While the mitigation is somewhat straightforward, finding the situation would be challenging, because DAMON is not providing good observabilities for that. Even if such observability is provided, doing the additional monitoring and the mitigation is somewhat cumbersome and not aligned to the intention of the time quota. The time quota was intended to help reduce the user's administration overhead. Fix the problem by setting time quota-modified effective size quota be at least min_region_sz always. The issue was discovered [1] by sashiko. Link: https://lore.kernel.org/20260407003153.79589-1-sj@kernel.org Link: https://lore.kernel.org/20260405192504.110014-1-sj@kernel.org [1] Fixes: 1cd243030059 ("mm/damon/schemes: implement time quota") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> # 5.16.x Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 daysmm/mempolicy: fix weighted interleave auto sysfs nameJoshua Hahn1-3/+5
The __ATTR macro is a utility that makes defining kobj_attributes easier by stringfying the name, verifying the mode, and setting the show/store fields in a single initializer. It takes a raw token as the first value, rather than a string, so that __ATTR family macros like __ATTR_RW can token-paste it for inferring the _show / _store function names. Commit e341f9c3c841 ("mm/mempolicy: Weighted Interleave Auto-tuning") used the __ATTR macro to define the "auto" sysfs for weighted interleave. A few months later, commit 2fb6915fa22d ("compiler_types.h: add "auto" as a macro for "__auto_type"") introduced a #define macro which expanded auto into __auto_type. This led to the "auto" token passed into __ATTR to be expanded out into __auto_type, and the sysfs entry to be displayed as __auto_type as well. Expand out the __ATTR macro and directly pass a string "auto" instead of the raw token 'auto' to prevent it from being expanded out. Also bypass the VERIFY_OCTAL_PERMISSIONS check by triple checking that 0664 is indeed the intended permissions for this sysfs file. Before: $ ls /sys/kernel/mm/mempolicy/weighted_interleave __auto_type node0 After: $ ls /sys/kernel/mm/mempolicy/weighted_interleave/ auto node0 Link: https://lore.kernel.org/20260407141415.3080960-1-joshua.hahnjy@gmail.com Fixes: 2fb6915fa22d ("compiler_types.h: add "auto" as a macro for "__auto_type"") Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> Reviewed-by: Gregory Price <gourry@gourry.net> Reviewed-by: Rakie Kim <rakie.kim@sk.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Zi Yan <ziy@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Byungchul Park <byungchul@sk.com> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Ying Huang <ying.huang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysMerge tag 'pinctrl-v7.1-1' of ↵Linus Torvalds120-606/+9879
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control updates from Linus Walleij: "Core changes: - Perform basic checks on pin config properties so as not to allow directly contradictory settings such as setting a pin to more than one bias or drive mode - Handle input-threshold-voltage-microvolt property - Introduce pinctrl_gpio_get_config() handling in the core for SCMI GPIO using pin control New drivers: - GPIO-by-pin control driver (also appearing in the GPIO pull request) fulfilling a promise on a comment from Grant Likely many years ago: "can't GPIO just be a front-end for pin control?" it turns out it can, if and only if you design something new from scratch, such as SCMI - Broadcom BCM7038 as a pinctrl-single delegate - Mobileye EyeQ6Lplus OLB pin controller - Qualcomm Eliza and Hawi families TLMM pin controllers - Qualcomm SDM670 and Milos family LPASS LPI pin controllers - Qualcomm IPQ5210 pin controller - Realtek RTD1625 pin controller support - Rockchip RV1103B pin controller support - Texas Instruments AM62L as a pinctrl-single delegate Improvements: - Set config implementation for the Spacemit K1 pin controller" * tag 'pinctrl-v7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (84 commits) pinctrl: qcom: Add Hawi pinctrl driver dt-bindings: pinctrl: qcom: Describe Hawi TLMM block dt-bindings: pinctrl: pinctrl-max77620: convert to DT schema pinctrl: single: Add bcm7038-padconf compatible matching dt-bindings: pinctrl: pinctrl-single: Add brcm,bcm7038-padconf dt-bindings: pinctrl: apple,pinctrl: Add t8122 compatible pinctrl: qcom: sdm670-lpass-lpi: label variables as static pinctrl: sophgo: pinctrl-sg2044: Fix wrong module description pinctrl: sophgo: pinctrl-sg2042: Fix wrong module description pinctrl: qcom: add sdm670 lpi tlmm dt-bindings: pinctrl: qcom: Add SDM670 LPASS LPI pinctrl dt-bindings: qcom: lpass-lpi-common: add reserved GPIOs property pinctrl: qcom: Introduce IPQ5210 TLMM driver dt-bindings: pinctrl: qcom: add IPQ5210 pinctrl pinctrl: qcom: Drop redundant intr_target_reg on modern SoCs pinctrl: qcom: eliza: Fix interrupt target bit pinctrl: core: Don't use "proxy" headers pinctrl: amd: Support new ACPI ID AMDI0033 pinctrl: renesas: rzg2l: Drop superfluous blank line pinctrl: renesas: rzg2l: Fix save/restore of {IOLH,IEN,PUPD,SMT} registers ...
3 daysMerge tag 'i3c/for-7.1' of ↵Linus Torvalds11-100/+293
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux Pull i3c updates from Alexandre Belloni: "Subsystem: - add sysfs option to rescan bus via entdaa - fix error code handling for send_ccc_cmd Drivers: - mipi-i3c-hci-pci: Intel Nova Lake-H I3C support" * tag 'i3c/for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: (22 commits) i3c: mipi-i3c-hci: fix IBI payload length calculation for final status i3c: master: adi: Fix error propagation for CCCs i3c: master: Fix error codes at send_ccc_cmd i3c: master: Move bus_init error suppression i3c: master: Move entdaa error suppression i3c: master: Move rstdaa error suppression i3c: dw: Simplify xfer cleanup with __free(kfree) i3c: dw: Fix memory leak in dw_i3c_master_i3c_xfers() i3c: master: renesas: Use __free(kfree) for xfer cleanup in renesas_i3c_send_ccc_cmd() i3c: master: renesas: Fix memory leak in renesas_i3c_i3c_xfers() i3c: master: dw-i3c: Balance PM runtime usage count on probe failure i3c: master: dw-i3c: Fix missing reset assertion in remove() callback i3c: mipi-i3c-hci-pci: Enable IBI while runtime suspended for Intel controllers i3c: mipi-i3c-hci-pci: Add optional ability to manage child runtime PM i3c: mipi-i3c-hci: Allow parent to manage runtime PM i3c: mipi-i3c-hci: Add quirk to allow IBI while runtime suspended i3c: mipi-i3c-hci-pci: Set d3hot_delay to 0 for Intel controllers i3c: fix missing newline in dev_err messages i3c: master: use kzalloc_flex i3c: mipi-i3c-hci-pci: Add support for Intel Nova Lake-H I3C ...
3 daysMerge tag 'parisc-for-7.1-rc1' of ↵Linus Torvalds16-212/+56
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc architecture updates from Helge Deller: - A fix to make modules on 32-bit parisc architecture work again - Drop ip_fast_csum() inline assembly to avoid unaligned memory accesses - Allow to build kernel without 32-bit VDSO - Reference leak fix in error path in LED driver * tag 'parisc-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: led: fix reference leak on failed device registration module.lds.S: Fix modules on 32-bit parisc architecture parisc: Allow to build without VDSO32 parisc: Include 32-bit VDSO only when building for 32-bit or compat mode parisc: Allow to disable COMPAT mode on 64-bit kernel parisc: Fix default stack size when COMPAT=n parisc: Fix signal code to depend on CONFIG_COMPAT instead of CONFIG_64BIT parisc: is_compat_task() shall return false for COMPAT=n parisc: Avoid compat syscalls when COMPAT=n parisc: _llseek syscall is only available for 32-bit userspace parisc: Drop ip_fast_csum() inline assembly implementation parisc: update outdated comments for renamed ccio_alloc_consistent()
3 daysMerge tag 'memblock-v7.1-rc1' of ↵Linus Torvalds25-198/+271
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock updates from Mike Rapoport: - improve debuggability of reserve_mem kernel parameter handling with print outs in case of a failure and debugfs info showing what was actually reserved - Make memblock_free_late() and free_reserved_area() use the same core logic for freeing the memory to buddy and ensure it takes care of updating memblock arrays when ARCH_KEEP_MEMBLOCK is enabled. * tag 'memblock-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: x86/alternative: delay freeing of smp_locks section memblock: warn when freeing reserved memory before memory map is initialized memblock, treewide: make memblock_free() handle late freeing memblock: make free_reserved_area() update memblock if ARCH_KEEP_MEMBLOCK=y memblock: extract page freeing from free_reserved_area() into a helper memblock: make free_reserved_area() more robust mm: move free_reserved_area() to mm/memblock.c powerpc: opal-core: pair alloc_pages_exact() with free_pages_exact() powerpc: fadump: pair alloc_pages_exact() with free_pages_exact() memblock: reserve_mem: fix end caclulation in reserve_mem_release_by_name() memblock: move reserve_bootmem_range() to memblock.c and make it static memblock: Add reserve_mem debugfs info memblock: Print out errors on reserve_mem parser
3 daysMerge tag 'i2c-for-7.1-rc1-part1' of ↵Linus Torvalds24-393/+970
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: "The biggest news in this pull request is that it will start the last cycle of me handling the I2C subsystem. From 7.2. on, I will pass maintainership to Andi Shyti who has been maintaining the I2C drivers for a while now and who has done a great job in doing so. We will use this cycle for a hopefully smooth transition. Thanks must go to Andi for stepping up! I will still be around for guidance. Updates: - generic cleanups in npcm7xx, qcom-cci, xiic and designware DT bindings - atr: use kzalloc_flex for alias pool allocation - ixp4xx: convert bindings to DT schema - ocores: use read_poll_timeout_atomic() for polling waits - qcom-geni: skip extra TX DMA TRE for single read messages - s3c24xx: validate SMBus block length before using it - spacemit: refactor xfer path and add K1 PIO support - tegra: identify DVC and VI with SoC data variants - tegra: support SoC-specific register offsets - xiic: switch to devres and generic fw properties - xiic: skip input clock setup on non-OF systems - various minor improvements in other drivers rtl9300: - add per-SoC callbacks and clock support for RTL9607C - add support for new 50 kHz and 2.5 MHz bus speeds - general refactoring in preparation for RTL9607C support New support: - DesignWare GOOG5000 (ACPI HID) - Intel Nova Lake (ACPI ID) - Realtek RTL9607C - SpacemiT K3 binding - Tegra410 register layout support" * tag 'i2c-for-7.1-rc1-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (40 commits) i2c: usbio: Add ACPI device-id for NVL platforms i2c: qcom-geni: Avoid extra TX DMA TRE for single read message in GPI mode i2c: atr: use kzalloc_flex i2c: spacemit: introduce pio for k1 i2c: spacemit: move i2c_xfer_msg() i2c: xiic: skip input clock setup on non-OF systems i2c: xiic: use numbered adapter registration i2c: xiic: cosmetic: use resource format specifier in debug log i2c: xiic: cosmetic cleanup i2c: xiic: switch to generic device property accessors i2c: xiic: remove duplicate error message i2c: xiic: switch to devres managed APIs i2c: rtl9300: add RTL9607C i2c controller support i2c: rtl9300: introduce new function properties to driver data i2c: rtl9300: introduce clk struct for upcoming rtl9607 support dt-bindings: i2c: realtek,rtl9301-i2c: extend for clocks and RTL9607C support i2c: rtl9300: introduce a property for 8 bit width reg address i2c: rtl9300: introduce F_BUSY to the reg_fields struct i2c: rtl9300: introduce max length property to driver data i2c: rtl9300: split data_reg into read and write reg ...
3 daysMerge tag 'for-linus-7.1-1' of https://github.com/cminyard/linux-ipmiLinus Torvalds4-31/+435
Pull ipmi updates from Corey Minyard: "Small updates and fixes (mostly to the BMC software): - Fix one issue in the host side driver where a kthread can be left running on a specific memory allocation failre at probe time - Replace system_wq with system_percpu_wq so system_wq can eventually go away" * tag 'for-linus-7.1-1' of https://github.com/cminyard/linux-ipmi: ipmi:ssif: Clean up kthread on errors ipmi:ssif: Remove unnecessary indention ipmi: ssif_bmc: Fix KUnit test link failure when KUNIT=m ipmi: ssif_bmc: add unit test for state machine ipmi: ssif_bmc: change log level to dbg in irq callback ipmi: ssif_bmc: fix message desynchronization after truncated response ipmi: ssif_bmc: fix missing check for copy_to_user() partial failure ipmi: ssif_bmc: cancel response timer on remove ipmi: Replace use of system_wq with system_percpu_wq
3 daysMerge tag 'perf-tools-for-v7.1-2026-04-17' of ↵Linus Torvalds245-1676/+6797
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Namhyung Kim: "perf report: - Add 'comm_nodigit' sort key to combine similar threads that only have different numbers in the comm. In the following example, the 'comm_nodigit' will have samples from all threads starting with "bpfrb/" into an entry "bpfrb/<N>". $ perf report -s comm_nodigit,comm -H ... # # Overhead CommandNoDigit / Command # ........... ........................ # 20.30% swapper 20.30% swapper 13.37% chrome 13.37% chrome 10.07% bpfrb/<N> 7.47% bpfrb/0 0.70% bpfrb/1 0.47% bpfrb/3 0.46% bpfrb/2 0.25% bpfrb/4 0.23% bpfrb/5 0.20% bpfrb/6 0.14% bpfrb/10 0.07% bpfrb/7 - Support flat layout for symfs. The --symfs option is to specify the location of debugging symbol files. The default 'hierarchy' layout would search the symbol file using the same path of the original file under the symfs root. The new 'flat' layout would search only in the root directory. - Update 'simd' sort key for ARM SIMD flags to cover ASE/SME and more predicate flags. perf stat: - Add --pmu-filter option to select specific PMUs. This would be useful when you measure metrics from multiple instance of uncore PMUs with similar names. # perf stat -M cpa_p0_avg_bw Performance counter stats for 'system wide': 19,417,779,115 hisi_sicl0_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw 0 hisi_sicl0_cpa0/cpa_p0_wr_dat/ 0 hisi_sicl0_cpa0/cpa_p0_rd_dat_64b/ 0 hisi_sicl0_cpa0/cpa_p0_rd_dat_32b/ 19,417,751,103 hisi_sicl10_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw 0 hisi_sicl10_cpa0/cpa_p0_wr_dat/ 0 hisi_sicl10_cpa0/cpa_p0_rd_dat_64b/ 0 hisi_sicl10_cpa0/cpa_p0_rd_dat_32b/ 19,417,730,679 hisi_sicl2_cpa0/cpa_cycles/ # 0.31 cpa_p0_avg_bw 75,635,749 hisi_sicl2_cpa0/cpa_p0_wr_dat/ 18,520,640 hisi_sicl2_cpa0/cpa_p0_rd_dat_64b/ 0 hisi_sicl2_cpa0/cpa_p0_rd_dat_32b/ 19,417,674,227 hisi_sicl8_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw 0 hisi_sicl8_cpa0/cpa_p0_wr_dat/ 0 hisi_sicl8_cpa0/cpa_p0_rd_dat_64b/ 0 hisi_sicl8_cpa0/cpa_p0_rd_dat_32b/ 19.417734480 seconds time elapsed With --pmu-filter, users can select only hisi_sicl2_cpa0 PMU. # perf stat --pmu-filter hisi_sicl2_cpa0 -M cpa_p0_avg_bw Performance counter stats for 'system wide': 6,234,093,559 cpa_cycles # 0.60 cpa_p0_avg_bw 50,548,465 cpa_p0_wr_dat 7,552,182 cpa_p0_rd_dat_64b 0 cpa_p0_rd_dat_32b 6.234139320 seconds time elapsed Data type profiling: - Quality improvements by tracking register state more precisely - Ensure array members to get the type - Handle more cases for global variables Vendor event/metric updates: - Update various Intel events and metrics - Add NVIDIA Tegra 410 Olympus events Internal changes: - Verify perf.data header for maliciously crafted files - Update perf test to cover more usages and make them robust - Move a couple of copied kernel headers not to annoy objtool build - Fix a bug in map sorting in name order - Remove some unused codes Misc: - Fix module symbol resolution with non-zero text address - Add -t/--threads option to `perf bench mem mmap` - Track duration of exit*() syscall by `perf trace -s` - Add core.addr2line-timeout and core.addr2line-disable-warn config items" * tag 'perf-tools-for-v7.1-2026-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (131 commits) perf loongarch: Fix build failure with CONFIG_LIBDW_DWARF_UNWIND perf annotate: Use jump__delete when freeing LoongArch jumps perf test: Fixes for check branch stack sampling perf test: Fix inet_pton probe failure and unroll call graph perf build: fix "argument list too long" in second location perf header: Add sanity checks to HEADER_BPF_BTF processing perf header: Sanity check HEADER_BPF_PROG_INFO perf header: Sanity check HEADER_PMU_CAPS perf header: Sanity check HEADER_HYBRID_TOPOLOGY perf header: Sanity check HEADER_CACHE perf header: Sanity check HEADER_GROUP_DESC perf header: Sanity check HEADER_PMU_MAPPINGS perf header: Sanity check HEADER_MEM_TOPOLOGY perf header: Sanity check HEADER_NUMA_TOPOLOGY perf header: Sanity check HEADER_CPU_TOPOLOGY perf header: Sanity check HEADER_NRCPUS and HEADER_CPU_DOMAIN_INFO perf header: Bump up the max number of command line args allowed perf header: Validate nr_domains when reading HEADER_CPU_DOMAIN_INFO perf sample: Fix documentation typo perf arm_spe: Improve SIMD flags setting ...
3 daysMAINTAINERS: add page cache reviewerJan Kara1-0/+2
Add myself as a page cache reviewer since I tend to review changes in these areas anyway. [akpm@linux-foundation.org: add linux-mm@kvack.org] Link: https://lore.kernel.org/20260415174039.13016-2-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Lorenzo Stoakes <ljs@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/vmscan: avoid false-positive -Wuninitialized warningArnd Bergmann1-2/+2
When the -fsanitize=bounds sanitizer is enabled, gcc-16 sometimes runs into a corner case in the read_ctrl_pos() pos function, where it sees possible undefined behavior from the 'tier' index overflowing, presumably in the case that this was called with a negative tier: In function 'get_tier_idx', inlined from 'isolate_folios' at mm/vmscan.c:4671:14: mm/vmscan.c: In function 'isolate_folios': mm/vmscan.c:4645:29: error: 'pv.refaulted' is used uninitialized [-Werror=uninitialized] Part of the problem seems to be that read_ctrl_pos() has unusual calling conventions since commit 37a260870f2c ("mm/mglru: rework type selection") where passing MAX_NR_TIERS makes it accumulate all tiers but passing a smaller positive number makes it read a single tier instead. Shut up the warning by adding a fake initialization to the two instances of this variable that can run into that corner case. Link: https://lore.kernel.org/all/CAJHvVcjtFW86o5FoQC8MMEXCHAC0FviggaQsd5EmiCHP+1fBpg@mail.gmail.com/ Link: https://lore.kernel.org/20260414065206.3236176-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kairui Song <kasong@tencent.com> Cc: Koichiro Den <koichiro.den@canonical.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysMAINTAINERS: update Dave's kdump reviewer email addressDave Young1-1/+1
Use my personal email address due to the Red Hat work will stop soon Link: https://lore.kernel.org/ad8GFhh3SI1wb7IC@darkstar.users.ipa.redhat.com Signed-off-by: Dave Young <ruirui.yang@linux.dev> Acked-by: Dave Young <dyoung@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysMAINTAINERS: drop include/linux/liveupdate from LIVE UPDATEPratyush Yadav (Google)1-1/+0
The directory does not exist any more. Link: https://lore.kernel.org/20260414121752.1912847-4-pratyush@kernel.org Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysMAINTAINERS: drop include/linux/kho/abi/ from KHOPratyush Yadav (Google)1-1/+0
The KHO entry already includes include/linux/kho. Listing its subdirectory is redundant. Link: https://lore.kernel.org/20260414121752.1912847-3-pratyush@kernel.org Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysMAINTAINERS: update KHO and LIVE UPDATE maintainersPratyush Yadav (Google)2-3/+9
Patch series "MAINTAINERS: update KHO and LIVE UPDATE entries". This series contains some updates for the Kexec Handover (KHO) and Live update entries. Patch 1 updates the maintainers list and adds the liveupdate tree. Patches 2 and 3 clean up stale files in the list. This patch (of 3): I have been helping out with reviewing and developing KHO. I would also like to help maintain it. Change my entry from R to M for KHO and live update. Alex has been inactive for a while, so to avoid over-crowding the KHO entry and to keep the information up-to-date, move his entry from M to R. We also now have a tree for KHO and live update at liveupdate/linux.git where we plan to start maintaining those subsystems and start queuing the patches. List that in the entries as well. Link: https://lore.kernel.org/20260414121752.1912847-1-pratyush@kernel.org Link: https://lore.kernel.org/20260414121752.1912847-2-pratyush@kernel.org Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org> Reviewed-by: Alexander Graf <graf@amazon.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: David Hildenbrand <david@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysMAINTAINERS: update kexec/kdump maintainers entriesPasha Tatashin2-1/+10
Update KEXEC and KDUMP maintainer entries by adding the live update group maintainers. Remove Vivek Goyal due to inactivity to keep the MAINTAINERS file up-to-date, and add Vivek to the CREDITS file to recognize their contributions. Link: https://lore.kernel.org/20260413121146.49215-1-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Pratyush Yadav <pratyush@kernel.org> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Diego Viola <diego.viola@gmail.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Cc: Mark Brown <broonie@kernel.org> Cc: Martin Kepplinger <martink@posteo.de> Cc: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/migrate_device: remove dead migration entry check in ↵Davidlohr Bueso1-6/+0
migrate_vma_collect_huge_pmd() The softleaf_is_migration() check is unreachable as entries that are not device_private are filtered out. Similarly, the PTE-level equivalent in migrate_vma_collect_pmd() skips migration entries. This dead branch also contained a double spin_unlock(ptl) bug. Link: https://lore.kernel.org/20260212014611.416695-1-dave@stgolabs.net Fixes: a30b48bf1b244 ("mm/migrate_device: implement THP migration of zone device pages") Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Suggested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Acked-by: Balbir Singh <balbirs@nvidia.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Byungchul Park <byungchul@sk.com> Cc: Gregory Price <gourry@gourry.net> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Ying Huang <ying.huang@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests: mm: skip charge_reserved_hugetlb without killallCao Ruichuang1-0/+5
charge_reserved_hugetlb.sh tears down background writers with killall from psmisc. Minimal Ubuntu images do not always provide that tool, so the selftest fails in cleanup for an environment reason rather than for the hugetlb behavior it is trying to cover. Skip the test when killall is unavailable, similar to the existing root check, so these environments report the dependency clearly instead of failing the test. Link: https://lore.kernel.org/20260410044139.67480-1-create0818@163.com Signed-off-by: Cao Ruichuang <create0818@163.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysuserfaultfd: allow registration of ranges below mmap_min_addrDenis M. Karpov1-2/+0
The current implementation of validate_range() in fs/userfaultfd.c performs a hard check against mmap_min_addr. This is redundant because UFFDIO_REGISTER operates on memory ranges that must already be backed by a VMA. Enforcing mmap_min_addr or capability checks again in userfaultfd is unnecessary and prevents applications like binary compilers from using UFFD for valid memory regions mapped by application. Remove the redundant check for mmap_min_addr. We started using UFFD instead of the classic mprotect approach in the binary translator to track application writes. During development, we encountered this bug. The translator cannot control where the translated application chooses to map its memory and if the app requires a low-address area, UFFD fails, whereas mprotect would work just fine. I believe this is a genuine logic bug rather than an improvement, and I would appreciate including the fix in stable. Link: https://lore.kernel.org/20260409103345.15044-1-komlomal@gmail.com Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by: Denis M. Karpov <komlomal@gmail.com> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Acked-by: Harry Yoo (Oracle) <harry@kernel.org> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/vmstat: fix vmstat_shepherd double-scheduling vmstat_updateBreno Leitao1-1/+1
vmstat_shepherd uses delayed_work_pending() to check whether vmstat_update is already scheduled for a given CPU before queuing it. However, delayed_work_pending() only tests WORK_STRUCT_PENDING_BIT, which is cleared the moment a worker thread picks up the work to execute it. This means that while vmstat_update is actively running on a CPU, delayed_work_pending() returns false. If need_update() also returns true at that point (per-cpu counters not yet zeroed mid-flush), the shepherd queues a second invocation with delay=0, causing vmstat_update to run again immediately after finishing. On a 72-CPU system this race is readily observable: before the fix, many CPUs show invocation gaps well below 500 jiffies (the minimum round_jiffies_relative() can produce), with the most extreme cases reaching 0 jiffies—vmstat_update called twice within the same jiffy. Fix this by replacing delayed_work_pending() with work_busy(), which returns non-zero for both WORK_BUSY_PENDING (timer armed or work queued) and WORK_BUSY_RUNNING (work currently executing). The shepherd now correctly skips a CPU in all busy states. After the fix, all sub-jiffy and most sub-100-jiffie gaps disappear. The remaining early invocations have gaps in the 700–999 jiffie range, attributable to round_jiffies_relative() aligning to a nearer jiffie-second boundary rather than to this race. Each spurious vmstat_update invocation has a measurable side effect: refresh_cpu_vm_stats() calls decay_pcp_high() for every zone, which drains idle per-CPU pages back to the buddy allocator via free_pcppages_bulk(), taking the zone spinlock each time. Eliminating the double-scheduling therefore reduces zone lock contention directly. On a 72-CPU stress-ng workload measured with perf lock contention: free_pcppages_bulk contention count: ~55% reduction free_pcppages_bulk total wait time: ~57% reduction free_pcppages_bulk max wait time: ~47% reduction Note: work_busy() is inherently racy—between the check and the subsequent queue_delayed_work_on() call, vmstat_update can finish execution, leaving the work neither pending nor running. In that narrow window the shepherd can still queue a second invocation. After the fix, this residual race is rare and produces only occasional small gaps, a significant improvement over the systematic double-scheduling seen with delayed_work_pending(). Link: https://lore.kernel.org/20260409-vmstat-v2-1-e9d9a6db08ad@debian.org Fixes: 7b8da4c7f07774 ("vmstat: get rid of the ugly cpu_stat_off variable") Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Dmitry Ilvokhin <d@ilvokhin.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Hildenbrand <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/hugetlb: fix early boot crash on parameters without '=' separatorThorsten Blum1-0/+3
If hugepages, hugepagesz, or default_hugepagesz are specified on the kernel command line without the '=' separator, early parameter parsing passes NULL to hugetlb_add_param(), which dereferences it in strlen() and can crash the system during early boot. Reject NULL values in hugetlb_add_param() and return -EINVAL instead. Link: https://lore.kernel.org/20260409105437.108686-4-thorsten.blum@linux.dev Fixes: 5b47c02967ab ("mm/hugetlb: convert cmdline parameters from setup to early") Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Muchun Song <muchun.song@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: Frank van der Linden <fvdl@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 dayszram: reject unrecognized type= values in recompress_store()Andrew Stellman1-0/+2
recompress_store() parses the type= parameter with three if statements checking for "idle", "huge", and "huge_idle". An unrecognized value silently falls through with mode left at 0, causing the recompression pass to run with no slot filter — processing all slots instead of the intended subset. Add a !mode check after the type parsing block to return -EINVAL for unrecognized values, consistent with the function's other parameter validation. Link: https://lore.kernel.org/20260407153027.42425-1-astellman@stellman-greene.com Signed-off-by: Andrew Stellman <astellman@stellman-greene.com> Suggested-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysdocs: proc: document ProtectionKey in smapsKevin Brodsky1-0/+4
The ProtectionKey entry was added in v4.9; back then it was x86-specific, but it now lives in generic code and applies to all architectures supporting pkeys (currently x86, power, arm64). Time to document it: add a paragraph to proc.rst about the ProtectionKey entry. [akpm@linux-foundation.org: s/system/hardware/, per review discussion] [akpm@linux-foundation.org: s/hardware/CPU/] Link: https://lore.kernel.org/20260407125133.564182-1-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Reported-by: Yury Khrustalev <yury.khrustalev@arm.com> Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Marc Rutland <mark.rutland@arm.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/mprotect: special-case small folios when applying permissionsPedro Falcato1-34/+57
The common order-0 case is important enough to want its own branch, and avoids the hairy, large loop logic that the CPU does not seem to handle particularly well. While at it, encourage the compiler to inline batch PTE logic and resolve constant branches by adding __always_inline strategically. Link: https://lore.kernel.org/20260402141628.3367596-3-pfalcato@suse.de Signed-off-by: Pedro Falcato <pfalcato@suse.de> Suggested-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Tested-by: Luke Yang <luyang@redhat.com> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Jiri Hladky <jhladky@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/mprotect: move softleaf code out of the main functionPedro Falcato1-60/+67
Patch series "mm/mprotect: micro-optimization work", v3. Micro-optimize the change_protection functionality and the change_pte_range() routine. This set of functions works in an incredibly tight loop, and even small inefficiencies are incredibly evident when spun hundreds, thousands or hundreds of thousands of times. There was an attempt to keep the batching functionality as much as possible, which introduced some part of the slowness, but not all of it. Removing it for !arm64 architectures would speed mprotect() up even further, but could easily pessimize cases where large folios are mapped (which is not as rare as it seems, particularly when it comes to the page cache these days). The micro-benchmark used for the tests was [0] (usable using google/benchmark and g++ -O2 -lbenchmark repro.cpp) This resulted in the following (first entry is baseline): --------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------- mprotect_bench 85967 ns 85967 ns 6935 mprotect_bench 70684 ns 70684 ns 9887 After the patchset we can observe an ~18% speedup in mprotect. Wonderful for the elusive mprotect-based workloads! Testing & more ideas welcome. I suspect there is plenty of improvement possible but it would require more time than what I have on my hands right now. The entire inlined function (which inlines into change_protection()) is gigantic - I'm not surprised this is so finnicky. Note: per my profiling, the next _big_ bottleneck here is modify_prot_start_ptes, exactly on the xchg() done by x86. ptep_get_and_clear() is _expensive_. I don't think there's a properly safe way to go about it since we do depend on the D bit quite a lot. This might not be such an issue on other architectures. Luke Yang reported [1]: : On average, we see improvements ranging from a minimum of 5% to a : maximum of 55%, with most improvements showing around a 25% speed up in : the libmicro/mprot_tw4m micro benchmark. This patch (of 2): Move softleaf change_pte_range code into a separate function. This makes the change_pte_range() function a good bit smaller, and lessens cognitive load when reading through the function. Link: https://lore.kernel.org/20260402141628.3367596-1-pfalcato@suse.de Link: https://lore.kernel.org/20260402141628.3367596-2-pfalcato@suse.de Link: https://lore.kernel.org/all/aY8-XuFZ7zCvXulB@luyang-thinkpadp1gen7.toromso.csb/ Link: https://gist.github.com/heatd/1450d273005aba91fa5744f44dfcd933 [0] Link: https://lore.kernel.org/CAL2CeBxT4jtJ+LxYb6=BNxNMGinpgD_HYH5gGxOP-45Q2OncqQ@mail.gmail.com [1] Signed-off-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Tested-by: Luke Yang <luyang@redhat.com> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Jiri Hladky <jhladky@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm: remove '!root_reclaim' checking in should_abort_scan()Zhaoyang Huang1-4/+0
Android systems usually use memory.reclaim interface to implement user space memory management which expects that the requested reclaim target and actually reclaimed amount memory are not diverging by too much. With the current MGRLU implementation there is, however, no bail out when the reclaim target is reached and this could lead to an excessive reclaim that scales with the reclaim hierarchy size.For example, we can get a nr_reclaimed=394/nr_to_reclaim=32 proactive reclaim under a common 1-N cgroup hierarchy. This defect arose from the goal of keeping fairness among memcgs that is, for try_to_free_mem_cgroup_pages -> shrink_node_memcgs -> shrink_lruvec -> lru_gen_shrink_lruvec -> try_to_shrink_lruvec, the !root_reclaim(sc) check was there for reclaim fairness, which was necessary before commit b82b530740b9 ("mm: vmscan: restore incremental cgroup iteration") because the fairness depended on attempted proportional reclaim from every memcg under the target memcg. However after commit b82b530740b9 there is no longer a need to visit every memcg to ensure fairness. Let's have try_to_shrink_lruvec bail out when the nr_reclaimed achieved. Link: https://lore.kernel.org/20260318011558.1696310-1-zhaoyang.huang@unisoc.com Link: https://lore.kernel.org/20260212032111.408865-1-zhaoyang.huang@unisoc.com Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Suggested-by: T.J.Mercier <tjmercier@google.com> Reviewed-by: T.J. Mercier <tjmercier@google.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Qi Zheng <qi.zheng@linux.dev> Reviewed-by: Barry Song <baohua@kernel.org> Reviewed-by: Kairui Song <kasong@tencent.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Yu Zhao <yuzhao@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Wei Xu <weixugc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/sparse: fix comment for section map alignmentMuchun Song1-15/+10
The comment in mmzone.h currently details exhaustive per-architecture bit-width lists and explains alignment using min(PAGE_SHIFT, PFN_SECTION_SHIFT). Such details risk falling out of date over time and may inadvertently be left un-updated. We always expect a single section to cover full pages. Therefore, we can safely assume that PFN_SECTION_SHIFT is large enough to accommodate SECTION_MAP_LAST_BIT. We use BUILD_BUG_ON() to ensure this. Update the comment to accurately reflect this consensus, making it clear that we rely on a single section covering full pages. Link: https://lore.kernel.org/20260402102320.3617578-1-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Petr Tesarik <ptesarik@suse.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/page_io: use sio->len for PSWPIN accounting in sio_read_complete()David Carlier1-1/+1
sio_read_complete() uses sio->pages to account global PSWPIN vm events, but sio->pages tracks the number of bvec entries (folios), not base pages. While large folios cannot currently reach this path (SWP_FS_OPS and SWP_SYNCHRONOUS_IO are mutually exclusive, and mTHP swap-in allocation is gated on SWP_SYNCHRONOUS_IO), the accounting is semantically inconsistent with the per-memcg path which correctly uses folio_nr_pages(). Use sio->len >> PAGE_SHIFT instead, which gives the correct base page count since sio->len is accumulated via folio_size(folio). Link: https://lore.kernel.org/20260402061408.36119-1-devnexen@gmail.com Signed-off-by: David Carlier <devnexen@gmail.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Chris Li <chrisl@kernel.org> Cc: Kairui Song <kasong@tencent.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: NeilBrown <neil@brown.name> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: transhuge_stress: skip the test when thp not availableChunyu Hu1-0/+4
The test requires thp, skip the test when thp is not available to avoid false positive. Tested with thp disabled kernel. Before the fix: # -------------------------------- # running ./transhuge-stress -d 20 # -------------------------------- # TAP version 13 # 1..1 # transhuge-stress: allocate 1453 transhuge pages, using 2907 MiB virtual memory and 11 MiB of ram # Bail out! MADV_HUGEPAGE# Planned tests != run tests (1 != 0) # # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0 # [FAIL] not ok 60 transhuge-stress -d 20 # exit=1 After the fix: # -------------------------------- # running ./transhuge-stress -d 20 # -------------------------------- # TAP version 13 # 1..0 # SKIP Transparent Hugepages not available # [SKIP] ok 5 transhuge-stress -d 20 # SKIP Link: https://lore.kernel.org/20260402014543.1671131-7-chuhu@redhat.com Signed-off-by: Chunyu Hu <chuhu@redhat.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Li Wang <liwang@redhat.com> Cc: Nico Pache <npache@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: split_huge_page_test: skip the test when thp is not availableChunyu Hu1-0/+4
When thp is not enabled on some kernel config such as realtime kernel, the test will report failure. Fix the false positive by skipping the test directly when thp is not enabled. Tested with thp disabled kernel: Before The fix: # -------------------------------------------------- # running ./split_huge_page_test /tmp/xfs_dir_Ywup9p # -------------------------------------------------- # TAP version 13 # Bail out! Reading PMD pagesize failed # # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0 # [FAIL] not ok 61 split_huge_page_test /tmp/xfs_dir_Ywup9p # exit=1 After the fix: # -------------------------------------------------- # running ./split_huge_page_test /tmp/xfs_dir_YHPUPl # -------------------------------------------------- # TAP version 13 # 1..0 # SKIP Transparent Hugepages not available # [SKIP] ok 6 split_huge_page_test /tmp/xfs_dir_YHPUPl # SKIP Link: https://lore.kernel.org/20260402014543.1671131-6-chuhu@redhat.com Signed-off-by: Chunyu Hu <chuhu@redhat.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Li Wang <liwang@redhat.com> Cc: Nico Pache <npache@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm/vm_util: robust write_file()Chunyu Hu1-3/+12
Add three more checks for buflen and numwritten. The buflen should be at least two, that means at least one char and the null-end. The error case check is added by checking numwriten < 0 instead of numwritten < 1. And the truncate case is checked. The test will exit if any of these conditions aren't met. Additionally, add more print information when a write failure occurs or a truncated write happens, providing clearer diagnostics. Link: https://lore.kernel.org/20260402014543.1671131-5-chuhu@redhat.com Signed-off-by: Chunyu Hu <chuhu@redhat.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Cc: Nico Pache <npache@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: move write_file helper to vm_utilChunyu Hu5-48/+20
thp_settings provides write_file() helper for safely writing to a file and exit when write failure happens. It's a very low level helper and many sub tests need such a helper, not only thp tests. split_huge_page_test also defines a write_file locally. The two have minior differences in return type and used exit api. And there would be conflicts if split_huge_page_test wanted to include thp_settings.h because of different prototype, making it less convenient. It's possisble to merge the two, although some tests don't use the kselftest infrastrucutre for testing. It would also work when using the ksft_exit_msg() to exit in my test, as the counters are all zero. Output will be like: TAP version 13 1..62 Bail out! /proc/sys/vm/drop_caches1 open failed: No such file or directory # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0 So here we just keep the version in split_huge_page_test, and move it into the vm_util. This makes it easy to maitain and user could just include one vm_util.h when they don't need thp setting helpers. Keep the prototype of void return as the function will exit on any error, return value is not necessary, and will simply the callers like write_num() and write_string(). Link: https://lore.kernel.org/20260402014543.1671131-4-chuhu@redhat.com Signed-off-by: Chunyu Hu <chuhu@redhat.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Suggested-by: Mike Rapoport <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: soft-dirty: skip two tests when thp is not availableChunyu Hu1-1/+3
The test_hugepage test contain two sub tests. If just reporting one skip when thp not available, there will be error in the log because the test count don't match the test plan. Change to skip two tests by running the ksft_test_result_skip twice in this case. Without the fix (run test on thp disabled kernel): ./run_vmtests.sh -t soft_dirty # -------------------- # running ./soft-dirty # -------------------- # TAP version 13 # 1..19 # ok 1 Test test_simple # ok 2 Test test_vma_reuse dirty bit of allocated page # ok 3 Test test_vma_reuse dirty bit of reused address page # ok 4 # SKIP Transparent Hugepages not available # ok 5 Test test_mprotect-anon dirty bit of new written page # ok 6 Test test_mprotect-anon soft-dirty clear after clear_refs # ok 7 Test test_mprotect-anon soft-dirty clear after marking RO # ok 8 Test test_mprotect-anon soft-dirty clear after marking RW # ok 9 Test test_mprotect-anon soft-dirty after rewritten # ok 10 Test test_mprotect-file dirty bit of new written page # ok 11 Test test_mprotect-file soft-dirty clear after clear_refs # ok 12 Test test_mprotect-file soft-dirty clear after marking RO # ok 13 Test test_mprotect-file soft-dirty clear after marking RW # ok 14 Test test_mprotect-file soft-dirty after rewritten # ok 15 Test test_merge-anon soft-dirty after remap merge 1st pg # ok 16 Test test_merge-anon soft-dirty after remap merge 2nd pg # ok 17 Test test_merge-anon soft-dirty after mprotect merge 1st pg # ok 18 Test test_merge-anon soft-dirty after mprotect merge 2nd pg # # 1 skipped test(s) detected. Consider enabling relevant config options to improve coverage. # # Planned tests != run tests (19 != 18) # # Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0 # [FAIL] not ok 52 soft-dirty # exit=1 With the fix (run test on thp disabled kernel): ./run_vmtests.sh -t soft_dirty # -------------------- # running ./soft-dirty # TAP version 13 # -------------------- # running ./soft-dirty # -------------------- # TAP version 13 # 1..19 # ok 1 Test test_simple # ok 2 Test test_vma_reuse dirty bit of allocated page # ok 3 Test test_vma_reuse dirty bit of reused address page # # Transparent Hugepages not available # ok 4 # SKIP Test test_hugepage huge page allocation # ok 5 # SKIP Test test_hugepage huge page dirty bit # ok 6 Test test_mprotect-anon dirty bit of new written page # ok 7 Test test_mprotect-anon soft-dirty clear after clear_refs # ok 8 Test test_mprotect-anon soft-dirty clear after marking RO # ok 9 Test test_mprotect-anon soft-dirty clear after marking RW # ok 10 Test test_mprotect-anon soft-dirty after rewritten # ok 11 Test test_mprotect-file dirty bit of new written page # ok 12 Test test_mprotect-file soft-dirty clear after clear_refs # ok 13 Test test_mprotect-file soft-dirty clear after marking RO # ok 14 Test test_mprotect-file soft-dirty clear after marking RW # ok 15 Test test_mprotect-file soft-dirty after rewritten # ok 16 Test test_merge-anon soft-dirty after remap merge 1st pg # ok 17 Test test_merge-anon soft-dirty after remap merge 2nd pg # ok 18 Test test_merge-anon soft-dirty after mprotect merge 1st pg # ok 19 Test test_merge-anon soft-dirty after mprotect merge 2nd pg # # 2 skipped test(s) detected. Consider enabling relevant config options to improve coverage. # # Totals: pass:17 fail:0 xfail:0 xpass:0 skip:2 error:0 # [PASS] ok 1 soft-dirty hwpoison_inject # SUMMARY: PASS=1 SKIP=0 FAIL=0 1..1 Link: https://lore.kernel.org/20260402014543.1671131-3-chuhu@redhat.com Signed-off-by: Chunyu Hu <chuhu@redhat.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Li Wang <liwang@redhat.com> Cc: Nico Pache <npache@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm/guard-regions: skip collapse test when thp not enabledChunyu Hu1-0/+4
Patch series "selftests/mm: skip several tests when thp is not available", v8. There are several tests requires transprarent hugepages, when run on thp disabled kernel such as realtime kernel, there will be false negative. Mark those tests as skip when thp is not available. This patch (of 6): When thp is not available, just skip the collape tests to avoid the false negative. Without the change, run with a thp disabled kernel: ./run_vmtests.sh -t madv_guard -n 1 <snip/> # RUN guard_regions.anon.collapse ... # guard-regions.c:2217:collapse:Expected madvise(ptr, size, MADV_NOHUGEPAGE) (-1) == 0 (0) # collapse: Test terminated by assertion # FAIL guard_regions.anon.collapse not ok 2 guard_regions.anon.collapse <snip/> # RUN guard_regions.shmem.collapse ... # guard-regions.c:2217:collapse:Expected madvise(ptr, size, MADV_NOHUGEPAGE) (-1) == 0 (0) # collapse: Test terminated by assertion # FAIL guard_regions.shmem.collapse not ok 32 guard_regions.shmem.collapse <snip/> # RUN guard_regions.file.collapse ... # guard-regions.c:2217:collapse:Expected madvise(ptr, size, MADV_NOHUGEPAGE) (-1) == 0 (0) # collapse: Test terminated by assertion # FAIL guard_regions.file.collapse not ok 62 guard_regions.file.collapse <snip/> # FAILED: 87 / 90 tests passed. # 17 skipped test(s) detected. Consider enabling relevant config options to improve coverage. # Totals: pass:70 fail:3 xfail:0 xpass:0 skip:17 error:0 With this change, run with thp disabled kernel: ./run_vmtests.sh -t madv_guard -n 1 <snip/> # RUN guard_regions.anon.collapse ... # SKIP Transparent Hugepages not available # OK guard_regions.anon.collapse ok 2 guard_regions.anon.collapse # SKIP Transparent Hugepages not available <snip/> # RUN guard_regions.file.collapse ... # SKIP Transparent Hugepages not available # OK guard_regions.file.collapse ok 62 guard_regions.file.collapse # SKIP Transparent Hugepages not available <snip/> # RUN guard_regions.shmem.collapse ... # SKIP Transparent Hugepages not available # OK guard_regions.shmem.collapse ok 32 guard_regions.shmem.collapse # SKIP Transparent Hugepages not available <snip/> # PASSED: 90 / 90 tests passed. # 20 skipped test(s) detected. Consider enabling relevant config options to improve coverage. # Totals: pass:70 fail:0 xfail:0 xpass:0 skip:20 error:0 Link: https://lore.kernel.org/20260402014543.1671131-1-chuhu@redhat.com Link: https://lore.kernel.org/20260402014543.1671131-2-chuhu@redhat.com Signed-off-by: Chunyu Hu <chuhu@redhat.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Li Wang <liwang@redhat.com> Cc: Nico Pache <npache@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysuserfaultfd: mfill_atomic(): remove retry logicMike Rapoport (Microsoft)1-27/+0
Since __mfill_atomic_pte() handles the retry for both anonymous and shmem, there is no need to retry copying the date from the userspace in the loop in mfill_atomic(). Drop the retry logic from mfill_atomic(). [rppt@kernel.org: remove safety measure of not returning ENOENT from _copy] Link: https://lore.kernel.org/ac5zcDUY8CFHr6Lw@kernel.org Link: https://lore.kernel.org/20260402041156.1377214-12-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrei Vagin <avagin@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: Harry Yoo (Oracle) <harry@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: James Houghton <jthoughton@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nikita Kalyazin <kalyazin@amazon.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Carlier <devnexen@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysshmem, userfaultfd: implement shmem uffd operations using vm_uffd_opsMike Rapoport (Microsoft)4-155/+106
Add filemap_add() and filemap_remove() methods to vm_uffd_ops and use them in __mfill_atomic_pte() to add shmem folios to page cache and remove them in case of error. Implement these methods in shmem along with vm_uffd_ops->alloc_folio() and drop shmem_mfill_atomic_pte(). Since userfaultfd now does not reference any functions from shmem, drop include if linux/shmem_fs.h from mm/userfaultfd.c mfill_atomic_install_pte() is not used anywhere outside of mm/userfaultfd, make it static. Link: https://lore.kernel.org/20260402041156.1377214-11-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: James Houghton <jthoughton@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrei Vagin <avagin@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: Harry Yoo (Oracle) <harry@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nikita Kalyazin <kalyazin@amazon.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Carlier <devnexen@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysuserfaultfd: introduce vm_uffd_ops->alloc_folio()Mike Rapoport (Microsoft)2-44/+54
and use it to refactor mfill_atomic_pte_zeroed_folio() and mfill_atomic_pte_copy(). mfill_atomic_pte_zeroed_folio() and mfill_atomic_pte_copy() perform almost identical actions: * allocate a folio * update folio contents (either copy from userspace of fill with zeros) * update page tables with the new folio Split a __mfill_atomic_pte() helper that handles both cases and uses newly introduced vm_uffd_ops->alloc_folio() to allocate the folio. Pass the ops structure from the callers to __mfill_atomic_pte() to later allow using anon_uffd_ops for MAP_PRIVATE mappings of file-backed VMAs. Note, that the new ops method is called alloc_folio() rather than folio_alloc() to avoid clash with alloc_tag macro folio_alloc(). Link: https://lore.kernel.org/20260402041156.1377214-10-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: James Houghton <jthoughton@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrei Vagin <avagin@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: Harry Yoo (Oracle) <harry@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nikita Kalyazin <kalyazin@amazon.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Carlier <devnexen@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysshmem, userfaultfd: use a VMA callback to handle UFFDIO_CONTINUEMike Rapoport (Microsoft)3-17/+39
When userspace resolves a page fault in a shmem VMA with UFFDIO_CONTINUE it needs to get a folio that already exists in the pagecache backing that VMA. Instead of using shmem_get_folio() for that, add a get_folio_noalloc() method to 'struct vm_uffd_ops' that will return a folio if it exists in the VMA's pagecache at given pgoff. Implement get_folio_noalloc() method for shmem and slightly refactor userfaultfd's mfill_get_vma() and mfill_atomic_pte_continue() to support this new API. Link: https://lore.kernel.org/20260402041156.1377214-9-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: James Houghton <jthoughton@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrei Vagin <avagin@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: Harry Yoo (Oracle) <harry@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nikita Kalyazin <kalyazin@amazon.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Carlier <devnexen@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysuserfaultfd: introduce vm_uffd_opsMike Rapoport (Microsoft)5-10/+69
Current userfaultfd implementation works only with memory managed by core MM: anonymous, shmem and hugetlb. First, there is no fundamental reason to limit userfaultfd support only to the core memory types and userfaults can be handled similarly to regular page faults provided a VMA owner implements appropriate callbacks. Second, historically various code paths were conditioned on vma_is_anonymous(), vma_is_shmem() and is_vm_hugetlb_page() and some of these conditions can be expressed as operations implemented by a particular memory type. Introduce vm_uffd_ops extension to vm_operations_struct that will delegate memory type specific operations to a VMA owner. Operations for anonymous memory are handled internally in userfaultfd using anon_uffd_ops that implicitly assigned to anonymous VMAs. Start with a single operation, ->can_userfault() that will verify that a VMA meets requirements for userfaultfd support at registration time. Implement that method for anonymous, shmem and hugetlb and move relevant parts of vma_can_userfault() into the new callbacks. [rppt@kernel.org: relocate VM_DROPPABLE test, per Tal] Link: https://lore.kernel.org/adffgfM5ANxtPIEF@kernel.org Link: https://lore.kernel.org/20260402041156.1377214-8-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrei Vagin <avagin@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: Harry Yoo (Oracle) <harry@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: James Houghton <jthoughton@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nikita Kalyazin <kalyazin@amazon.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Carlier <devnexen@gmail.com> Cc: Tal Zussman <tz2294@columbia.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>