| Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM fixes from Andrew Morton:
"7 hotfixes. 6 are cc:stable and all are for MM. Please see the
individual changelogs for details"
* tag 'mm-hotfixes-stable-2026-04-19-00-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/damon/core: disallow non-power of two min_region_sz on damon_start()
mm/vmalloc: take vmap_purge_lock in shrinker
mm: call ->free_folio() directly in folio_unmap_invalidate()
mm: blk-cgroup: fix use-after-free in cgwb_release_workfn()
mm/zone_device: do not touch device folio after calling ->folio_free()
mm/damon/core: disallow time-quota setting zero esz
mm/mempolicy: fix weighted interleave auto sysfs name
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fixes from Danilo Krummrich:
- Prevent a device from being probed before device_add() has finished
initializing it; gate probe with a "ready_to_probe" device flag to
avoid races with concurrent driver_register() calls
- Fix a kernel-doc warning for DEV_FLAG_COUNT introduced by the above
- Return -ENOTCONN from software_node_get_reference_args() when a
referenced software node is known but not yet registered, allowing
callers to defer probe
- In sysfs_group_attrs_change_owner(), also check is_visible_const();
missed when the const variant was introduced
* tag 'driver-core-7.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
driver core: Add kernel-doc for DEV_FLAG_COUNT enum value
sysfs: attribute_group: Respect is_visible_const() when changing owner
software node: return -ENOTCONN when referenced swnode is not registered yet
driver core: Don't let a device probe until it's ready
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver updates from Greg KH:
"Here is the "big" set of staging driver changes for 7.1-rc1.
Nothing major in here at all, just lots of little cleanups for the
staging drivers, driven by new developers getting their feet wet in
kernel development. "Largest" thing in here is the change of some of
the octeon variable types into proper kernel ones.
Full details are in the shortlog.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (154 commits)
staging: rtl8723bs: remove redundant & parentheses
staging: most: dim2: replace BUG_ON() in poison_channel()
staging: most: dim2: replace BUG_ON() in enqueue()
staging: most: dim2: replace BUG_ON() in configure_channel()
staging: most: dim2: replace BUG_ON() in service_done_flag()
staging: most: dim2: replace BUG_ON() in try_start_dim_transfer()
staging: rtl8723bs: remove unused RTL8188E antenna selection macros
staging: rtl8723bs: remove redundant blank lines in basic_types.h
staging: rtl8723bs: wrap complex macros with parentheses
staging: rtl8723bs: remove unused WRITEEF/READEF byte macros
staging: rtl8723bs: rename camelCase variable
staging: greybus: audio: fix error message for BTN_3 button
staging: rtl8723bs: rename variables to snake_case
staging: rtl8723bs: fix spelling in comment
staging: rtl8723bs: cleanup return in sdio_init()
staging: rtl8723bs: use direct returns in sdio_dvobj_init()
staging: rtl8723bs: remove unused arg at odm_interface.h
greybus: raw: fix use-after-free if write is called after disconnect
greybus: raw: fix use-after-free on cdev close
staging: rtl8723bs: fix logical continuations in xmit_linux.c
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt updates from Greg KH:
"Here is the big set of USB and Thunderbolt changes for 7.1-rc1.
Lots of little things in here, nothing major, just constant
improvements, updates, and new features. Highlights are:
- new USB power supply driver support.
These changes did touch outside of drivers/usb/ but got acks from
the relevant mantainers for them.
- dts file updates and conversions
- string function conversions into "safer" ones
- new device quirks
- xhci driver updates
- usb gadget driver minor fixes
- typec driver additions and updates
- small number of thunderbolt driver changes
- dwc3 driver updates and additions of new hardware support
- other minor driver updates
All of these have been in the linux-next tree for a while with no
reported issues"
* tag 'usb-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (176 commits)
usb: dwc3: starfive: Add JHB100 USB 2.0 DRD controller
dt-bindings: usb: dwc3: add support for StarFive JHB100
dt-bindings: usb: atmel,at91sam9rl-udc: convert to DT schema
dt-bindings: usb: atmel,at91rm9200-udc: convert to DT schema
dt-bindings: usb: generic-ehci: fix schema structure and add at91sam9g45 constraints
dt-bindings: usb: generic-ohci: add AT91RM9200 OHCI binding support
arm: dts: at91: remove unused #address-cells/#size-cells from sam9x60 udc node
drivers/usb/host: Fix spelling error 'seperate' -> 'separate'
usbip: tools: add hint when no exported devices are found
USB: serial: iuu_phoenix: fix iuutool author name
usb: gadget: f_ncm: validate minimum block_len in ncm_unwrap_ntb()
usb: gadget: f_phonet: fix skb frags[] overflow in pn_rx_complete()
usb: gadget: f_hid: Add missing error code
usb: typec: cros_ec_ucsi: Load driver from OF and ACPI definitions
dt-bindings: chrome: Add cros-ec-ucsi compatibility to typec binding
USB: of: Simplify with scoped for each OF child loop
usbip: validate number_of_packets in usbip_pack_ret_submit()
usb: gadget: renesas_usb3: validate endpoint index in standard request handlers
usb: core: config: reverse the size check of the SSP isoc endpoint descriptor
usb: typec: ucsi: Set usb mode on partner change
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial updates from Greg KH:
"Here is the set of tty and serial driver changes for 7.1-rc1.
Not much here this cycle, biggest thing is the removal of an old
driver that never got any actual hardware support (esp32), and the
second try to moving the tty ports to their own workqueues (first try
was in 7.0-rc1 but was reverted due to problems)
Otherwise it's just a small set of driver updates and some vt modifier
key enhancements.
All have been in linux-next for a while with no reported issues"
* tag 'tty-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (35 commits)
tty: serial: ip22zilog: Fix section mispatch warning
hvc/xen: Check console connection flag
serial: sh-sci: Add support for RZ/G3L RSCI
dt-bindings: serial: renesas,rsci: Document RZ/G3L SoC
tty: atmel_serial: update outdated reference to atmel_tasklet_func()
serial: xilinx_uartps: Drop unused include
serial: qcom-geni: drop stray newline format specifier
serial: 8250: loongson: Enable building on MIPS Loongson64
dt-bindings: serial: 8250: Add Loongson 3A4000 uart compatible
serial: 8250_fintek: Add support for F81214E
tty: tty_port: add workqueue to flip TTY buffer
vt: support ITU-T T.416 color subparameters
serial: qcom-geni: Fix RTS behavior with flow control
tty: serial: imx: keep dma request disabled before dma transfer setup
tty: serial: 8250: Add SystemBase Multi I/O cards
serial: pic32_uart: allow driver to be compiled on all architectures with COMPILE_TEST
serial: tegra: remove Kconfig dependency on APB DMA controller
dt-bindings: serial: amlogic,meson-uart: Add compatible string for A9
dt-bindings: serial: atmel,at91-usart: add microchip,lan9691-usart
serial: auart: check clk_enable() return in console write
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- "Eliminate Dying Memory Cgroup" (Qi Zheng and Muchun Song)
Address the longstanding "dying memcg problem". A situation wherein a
no-longer-used memory control group will hang around for an extended
period pointlessly consuming memory
- "fix unexpected type conversions and potential overflows" (Qi Zheng)
Fix a couple of potential 32-bit/64-bit issues which were identified
during review of the "Eliminate Dying Memory Cgroup" series
- "kho: history: track previous kernel version and kexec boot count"
(Breno Leitao)
Use Kexec Handover (KHO) to pass the previous kernel's version string
and the number of kexec reboots since the last cold boot to the next
kernel, and print it at boot time
- "liveupdate: prevent double preservation" (Pasha Tatashin)
Teach LUO to avoid managing the same file across different active
sessions
- "liveupdate: Fix module unloading and unregister API" (Pasha
Tatashin)
Address an issue with how LUO handles module reference counting and
unregistration during module unloading
- "zswap pool per-CPU acomp_ctx simplifications" (Kanchana Sridhar)
Simplify and clean up the zswap crypto compression handling and
improve the lifecycle management of zswap pool's per-CPU acomp_ctx
resources
- "mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race"
(SeongJae Park)
Address unlikely but possible leaks and deadlocks in damon_call() and
damon_walk()
- "mm/damon/core: validate damos_quota_goal->nid" (SeongJae Park)
Fix a couple of root-only wild pointer dereferences
- "Docs/admin-guide/mm/damon: warn commit_inputs vs other params race"
(SeongJae Park)
Update the DAMON documentation to warn operators about potential
races which can occur if the commit_inputs parameter is altered at
the wrong time
- "Minor hmm_test fixes and cleanups" (Alistair Popple)
Bugfixes and a cleanup for the HMM kernel selftests
- "Modify memfd_luo code" (Chenghao Duan)
Cleanups, simplifications and speedups to the memfd_lou code
- "mm, kvm: allow uffd support in guest_memfd" (Mike Rapoport)
Support for userfaultfd in guest_memfd
- "selftests/mm: skip several tests when thp is not available" (Chunyu
Hu)
Fix several issues in the selftests code which were causing breakage
when the tests were run on CONFIG_THP=n kernels
- "mm/mprotect: micro-optimization work" (Pedro Falcato)
A couple of nice speedups for mprotect()
- "MAINTAINERS: update KHO and LIVE UPDATE entries" (Pratyush Yadav)
Document upcoming changes in the maintenance of KHO, LUO, memfd_luo,
kexec, crash, kdump and probably other kexec-based things - they are
being moved out of mm.git and into a new git tree
* tag 'mm-stable-2026-04-18-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (121 commits)
MAINTAINERS: add page cache reviewer
mm/vmscan: avoid false-positive -Wuninitialized warning
MAINTAINERS: update Dave's kdump reviewer email address
MAINTAINERS: drop include/linux/liveupdate from LIVE UPDATE
MAINTAINERS: drop include/linux/kho/abi/ from KHO
MAINTAINERS: update KHO and LIVE UPDATE maintainers
MAINTAINERS: update kexec/kdump maintainers entries
mm/migrate_device: remove dead migration entry check in migrate_vma_collect_huge_pmd()
selftests: mm: skip charge_reserved_hugetlb without killall
userfaultfd: allow registration of ranges below mmap_min_addr
mm/vmstat: fix vmstat_shepherd double-scheduling vmstat_update
mm/hugetlb: fix early boot crash on parameters without '=' separator
zram: reject unrecognized type= values in recompress_store()
docs: proc: document ProtectionKey in smaps
mm/mprotect: special-case small folios when applying permissions
mm/mprotect: move softleaf code out of the main function
mm: remove '!root_reclaim' checking in should_abort_scan()
mm/sparse: fix comment for section map alignment
mm/page_io: use sio->len for PSWPIN accounting in sio_read_complete()
selftests/mm: transhuge_stress: skip the test when thp not available
...
|
|
Commit d8f867fa0825 ("mm/damon: add damon_ctx->min_sz_region") introduced
a bug that allows unaligned DAMON region address ranges. Commit
c80f46ac228b ("mm/damon/core: disallow non-power of two min_region_sz")
fixed it, but only for damon_commit_ctx() use case. Still, DAMON sysfs
interface can emit non-power of two min_region_sz via damon_start(). Fix
the path by adding the is_power_of_2() check on damon_start().
The issue was discovered by sashiko [1].
Link: https://lore.kernel.org/20260411213638.77768-1-sj@kernel.org
Link: https://lore.kernel.org/20260403155530.64647-1-sj@kernel.org [1]
Fixes: d8f867fa0825 ("mm/damon: add damon_ctx->min_sz_region")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 6.18.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
decay_va_pool_node() can be invoked concurrently from two paths:
__purge_vmap_area_lazy() when pools are being purged, and the shrinker via
vmap_node_shrink_scan().
However, decay_va_pool_node() is not safe to run concurrently, and the
shrinker path currently lacks serialization, leading to races and possible
leaks.
Protect decay_va_pool_node() by taking vmap_purge_lock in the shrinker
path to ensure serialization with purge users.
Link: https://lore.kernel.org/20260413192646.14683-1-urezki@gmail.com
Fixes: 7679ba6b36db ("mm: vmalloc: add a shrinker to drain vmap pools")
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Baoquan He <baoquan.he@linux.dev>
Cc: chenyichong <chenyichong@uniontech.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We can only call filemap_free_folio() if we have a reference to (or hold a
lock on) the mapping. Otherwise, we've already removed the folio from the
mapping so it no longer pins the mapping and the mapping can be removed,
causing a use-after-free when accessing mapping->a_ops.
Follow the same pattern as __remove_mapping() and load the free_folio
function pointer before dropping the lock on the mapping. That lets us
make filemap_free_folio() static as this was the only caller outside
filemap.c.
Link: https://lore.kernel.org/20260413184314.3419945-1-willy@infradead.org
Fixes: fb7d3bc41493 ("mm/filemap: drop streaming/uncached pages when writeback completes")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Google Big Sleep <big-sleep-vuln-reports+bigsleep-501448199@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
cgwb_release_workfn() calls css_put(wb->blkcg_css) and then later accesses
wb->blkcg_css again via blkcg_unpin_online(). If css_put() drops the last
reference, the blkcg can be freed asynchronously (css_free_rwork_fn ->
blkcg_css_free -> kfree) before blkcg_unpin_online() dereferences the
pointer to access blkcg->online_pin, resulting in a use-after-free:
BUG: KASAN: slab-use-after-free in blkcg_unpin_online (./include/linux/instrumented.h:112 ./include/linux/atomic/atomic-instrumented.h:400 ./include/linux/refcount.h:389 ./include/linux/refcount.h:432 ./include/linux/refcount.h:450 block/blk-cgroup.c:1367)
Write of size 4 at addr ff11000117aa6160 by task kworker/71:1/531
Workqueue: cgwb_release cgwb_release_workfn
Call Trace:
<TASK>
blkcg_unpin_online (./include/linux/instrumented.h:112 ./include/linux/atomic/atomic-instrumented.h:400 ./include/linux/refcount.h:389 ./include/linux/refcount.h:432 ./include/linux/refcount.h:450 block/blk-cgroup.c:1367)
cgwb_release_workfn (mm/backing-dev.c:629)
process_scheduled_works (kernel/workqueue.c:3278 kernel/workqueue.c:3385)
Freed by task 1016:
kfree (./include/linux/kasan.h:235 mm/slub.c:2689 mm/slub.c:6246 mm/slub.c:6561)
css_free_rwork_fn (kernel/cgroup/cgroup.c:5542)
process_scheduled_works (kernel/workqueue.c:3302 kernel/workqueue.c:3385)
** Stack based on commit 66672af7a095 ("Add linux-next specific files
for 20260410")
I am seeing this crash sporadically in Meta fleet across multiple kernel
versions. A full reproducer is available at:
https://github.com/leitao/debug/blob/main/reproducers/repro_blkcg_uaf.sh
(The race window is narrow. To make it easily reproducible, inject a
msleep(100) between css_put() and blkcg_unpin_online() in
cgwb_release_workfn(). With that delay and a KASAN-enabled kernel, the
reproducer triggers the splat reliably in less than a second.)
Fix this by moving blkcg_unpin_online() before css_put(), so the
cgwb's CSS reference keeps the blkcg alive while blkcg_unpin_online()
accesses it.
Link: https://lore.kernel.org/20260413-blkcg-v1-1-35b72622d16c@debian.org
Fixes: 59b57717fff8 ("blkcg: delay blkg destruction until after writeback has finished")
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: JP Kobryn <inwardvessel@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The contents of a device folio can immediately change after calling
->folio_free(), as the folio may be reallocated by a driver with a
different order. Instead of touching the folio again to extract the
pgmap, use the local stack variable when calling percpu_ref_put_many().
Link: https://lore.kernel.org/20260410230346.4009855-1-matthew.brost@intel.com
Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Balbir Singh <balbirs@nvidia.com>
Reviewed-by: Vishal Moola <vishal.moola@gmail.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When the throughput of a DAMOS scheme is very slow, DAMOS time quota can
make the effective size quota smaller than damon_ctx->min_region_sz. In
the case, damos_apply_scheme() will skip applying the action, because the
action is tried at region level, which requires >=min_region_sz size.
That is, the quota is effectively exceeded for the quota charge window.
Because no action will be applied, the total_charged_sz and
total_charged_ns are also not updated. damos_set_effective_quota() will
try to update the effective size quota before starting the next charge
window. However, because the total_charged_sz and total_charged_ns have
not updated, the throughput and effective size quota are also not changed.
Since effective size quota can only be decreased, other effective size
quota update factors including DAMOS quota goals and size quota cannot
make any change, either.
As a result, the scheme is unexpectedly deactivated until the user notices
and mitigates the situation. The users can mitigate this situation by
changing the time quota online or re-install the scheme. While the
mitigation is somewhat straightforward, finding the situation would be
challenging, because DAMON is not providing good observabilities for that.
Even if such observability is provided, doing the additional monitoring
and the mitigation is somewhat cumbersome and not aligned to the intention
of the time quota. The time quota was intended to help reduce the user's
administration overhead.
Fix the problem by setting time quota-modified effective size quota be at
least min_region_sz always.
The issue was discovered [1] by sashiko.
Link: https://lore.kernel.org/20260407003153.79589-1-sj@kernel.org
Link: https://lore.kernel.org/20260405192504.110014-1-sj@kernel.org [1]
Fixes: 1cd243030059 ("mm/damon/schemes: implement time quota")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 5.16.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The __ATTR macro is a utility that makes defining kobj_attributes easier
by stringfying the name, verifying the mode, and setting the show/store
fields in a single initializer. It takes a raw token as the first value,
rather than a string, so that __ATTR family macros like __ATTR_RW can
token-paste it for inferring the _show / _store function names.
Commit e341f9c3c841 ("mm/mempolicy: Weighted Interleave Auto-tuning") used
the __ATTR macro to define the "auto" sysfs for weighted interleave. A
few months later, commit 2fb6915fa22d ("compiler_types.h: add "auto" as a
macro for "__auto_type"") introduced a #define macro which expanded auto
into __auto_type.
This led to the "auto" token passed into __ATTR to be expanded out into
__auto_type, and the sysfs entry to be displayed as __auto_type as well.
Expand out the __ATTR macro and directly pass a string "auto" instead of
the raw token 'auto' to prevent it from being expanded out. Also bypass
the VERIFY_OCTAL_PERMISSIONS check by triple checking that 0664 is indeed
the intended permissions for this sysfs file.
Before:
$ ls /sys/kernel/mm/mempolicy/weighted_interleave
__auto_type node0
After:
$ ls /sys/kernel/mm/mempolicy/weighted_interleave/
auto node0
Link: https://lore.kernel.org/20260407141415.3080960-1-joshua.hahnjy@gmail.com
Fixes: 2fb6915fa22d ("compiler_types.h: add "auto" as a macro for "__auto_type"")
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Rakie Kim <rakie.kim@sk.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control updates from Linus Walleij:
"Core changes:
- Perform basic checks on pin config properties so as not to allow
directly contradictory settings such as setting a pin to more than
one bias or drive mode
- Handle input-threshold-voltage-microvolt property
- Introduce pinctrl_gpio_get_config() handling in the core for SCMI
GPIO using pin control
New drivers:
- GPIO-by-pin control driver (also appearing in the GPIO pull
request) fulfilling a promise on a comment from Grant Likely many
years ago: "can't GPIO just be a front-end for pin control?" it
turns out it can, if and only if you design something new from
scratch, such as SCMI
- Broadcom BCM7038 as a pinctrl-single delegate
- Mobileye EyeQ6Lplus OLB pin controller
- Qualcomm Eliza and Hawi families TLMM pin controllers
- Qualcomm SDM670 and Milos family LPASS LPI pin controllers
- Qualcomm IPQ5210 pin controller
- Realtek RTD1625 pin controller support
- Rockchip RV1103B pin controller support
- Texas Instruments AM62L as a pinctrl-single delegate
Improvements:
- Set config implementation for the Spacemit K1 pin controller"
* tag 'pinctrl-v7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (84 commits)
pinctrl: qcom: Add Hawi pinctrl driver
dt-bindings: pinctrl: qcom: Describe Hawi TLMM block
dt-bindings: pinctrl: pinctrl-max77620: convert to DT schema
pinctrl: single: Add bcm7038-padconf compatible matching
dt-bindings: pinctrl: pinctrl-single: Add brcm,bcm7038-padconf
dt-bindings: pinctrl: apple,pinctrl: Add t8122 compatible
pinctrl: qcom: sdm670-lpass-lpi: label variables as static
pinctrl: sophgo: pinctrl-sg2044: Fix wrong module description
pinctrl: sophgo: pinctrl-sg2042: Fix wrong module description
pinctrl: qcom: add sdm670 lpi tlmm
dt-bindings: pinctrl: qcom: Add SDM670 LPASS LPI pinctrl
dt-bindings: qcom: lpass-lpi-common: add reserved GPIOs property
pinctrl: qcom: Introduce IPQ5210 TLMM driver
dt-bindings: pinctrl: qcom: add IPQ5210 pinctrl
pinctrl: qcom: Drop redundant intr_target_reg on modern SoCs
pinctrl: qcom: eliza: Fix interrupt target bit
pinctrl: core: Don't use "proxy" headers
pinctrl: amd: Support new ACPI ID AMDI0033
pinctrl: renesas: rzg2l: Drop superfluous blank line
pinctrl: renesas: rzg2l: Fix save/restore of {IOLH,IEN,PUPD,SMT} registers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Pull i3c updates from Alexandre Belloni:
"Subsystem:
- add sysfs option to rescan bus via entdaa
- fix error code handling for send_ccc_cmd
Drivers:
- mipi-i3c-hci-pci: Intel Nova Lake-H I3C support"
* tag 'i3c/for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: (22 commits)
i3c: mipi-i3c-hci: fix IBI payload length calculation for final status
i3c: master: adi: Fix error propagation for CCCs
i3c: master: Fix error codes at send_ccc_cmd
i3c: master: Move bus_init error suppression
i3c: master: Move entdaa error suppression
i3c: master: Move rstdaa error suppression
i3c: dw: Simplify xfer cleanup with __free(kfree)
i3c: dw: Fix memory leak in dw_i3c_master_i3c_xfers()
i3c: master: renesas: Use __free(kfree) for xfer cleanup in renesas_i3c_send_ccc_cmd()
i3c: master: renesas: Fix memory leak in renesas_i3c_i3c_xfers()
i3c: master: dw-i3c: Balance PM runtime usage count on probe failure
i3c: master: dw-i3c: Fix missing reset assertion in remove() callback
i3c: mipi-i3c-hci-pci: Enable IBI while runtime suspended for Intel controllers
i3c: mipi-i3c-hci-pci: Add optional ability to manage child runtime PM
i3c: mipi-i3c-hci: Allow parent to manage runtime PM
i3c: mipi-i3c-hci: Add quirk to allow IBI while runtime suspended
i3c: mipi-i3c-hci-pci: Set d3hot_delay to 0 for Intel controllers
i3c: fix missing newline in dev_err messages
i3c: master: use kzalloc_flex
i3c: mipi-i3c-hci-pci: Add support for Intel Nova Lake-H I3C
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc architecture updates from Helge Deller:
- A fix to make modules on 32-bit parisc architecture work again
- Drop ip_fast_csum() inline assembly to avoid unaligned memory
accesses
- Allow to build kernel without 32-bit VDSO
- Reference leak fix in error path in LED driver
* tag 'parisc-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: led: fix reference leak on failed device registration
module.lds.S: Fix modules on 32-bit parisc architecture
parisc: Allow to build without VDSO32
parisc: Include 32-bit VDSO only when building for 32-bit or compat mode
parisc: Allow to disable COMPAT mode on 64-bit kernel
parisc: Fix default stack size when COMPAT=n
parisc: Fix signal code to depend on CONFIG_COMPAT instead of CONFIG_64BIT
parisc: is_compat_task() shall return false for COMPAT=n
parisc: Avoid compat syscalls when COMPAT=n
parisc: _llseek syscall is only available for 32-bit userspace
parisc: Drop ip_fast_csum() inline assembly implementation
parisc: update outdated comments for renamed ccio_alloc_consistent()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock updates from Mike Rapoport:
- improve debuggability of reserve_mem kernel parameter handling with
print outs in case of a failure and debugfs info showing what was
actually reserved
- Make memblock_free_late() and free_reserved_area() use the same core
logic for freeing the memory to buddy and ensure it takes care of
updating memblock arrays when ARCH_KEEP_MEMBLOCK is enabled.
* tag 'memblock-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
x86/alternative: delay freeing of smp_locks section
memblock: warn when freeing reserved memory before memory map is initialized
memblock, treewide: make memblock_free() handle late freeing
memblock: make free_reserved_area() update memblock if ARCH_KEEP_MEMBLOCK=y
memblock: extract page freeing from free_reserved_area() into a helper
memblock: make free_reserved_area() more robust
mm: move free_reserved_area() to mm/memblock.c
powerpc: opal-core: pair alloc_pages_exact() with free_pages_exact()
powerpc: fadump: pair alloc_pages_exact() with free_pages_exact()
memblock: reserve_mem: fix end caclulation in reserve_mem_release_by_name()
memblock: move reserve_bootmem_range() to memblock.c and make it static
memblock: Add reserve_mem debugfs info
memblock: Print out errors on reserve_mem parser
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
"The biggest news in this pull request is that it will start the last
cycle of me handling the I2C subsystem. From 7.2. on, I will pass
maintainership to Andi Shyti who has been maintaining the I2C drivers
for a while now and who has done a great job in doing so.
We will use this cycle for a hopefully smooth transition. Thanks must
go to Andi for stepping up! I will still be around for guidance.
Updates:
- generic cleanups in npcm7xx, qcom-cci, xiic and designware DT
bindings
- atr: use kzalloc_flex for alias pool allocation
- ixp4xx: convert bindings to DT schema
- ocores: use read_poll_timeout_atomic() for polling waits
- qcom-geni: skip extra TX DMA TRE for single read messages
- s3c24xx: validate SMBus block length before using it
- spacemit: refactor xfer path and add K1 PIO support
- tegra: identify DVC and VI with SoC data variants
- tegra: support SoC-specific register offsets
- xiic: switch to devres and generic fw properties
- xiic: skip input clock setup on non-OF systems
- various minor improvements in other drivers
rtl9300:
- add per-SoC callbacks and clock support for RTL9607C
- add support for new 50 kHz and 2.5 MHz bus speeds
- general refactoring in preparation for RTL9607C support
New support:
- DesignWare GOOG5000 (ACPI HID)
- Intel Nova Lake (ACPI ID)
- Realtek RTL9607C
- SpacemiT K3 binding
- Tegra410 register layout support"
* tag 'i2c-for-7.1-rc1-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (40 commits)
i2c: usbio: Add ACPI device-id for NVL platforms
i2c: qcom-geni: Avoid extra TX DMA TRE for single read message in GPI mode
i2c: atr: use kzalloc_flex
i2c: spacemit: introduce pio for k1
i2c: spacemit: move i2c_xfer_msg()
i2c: xiic: skip input clock setup on non-OF systems
i2c: xiic: use numbered adapter registration
i2c: xiic: cosmetic: use resource format specifier in debug log
i2c: xiic: cosmetic cleanup
i2c: xiic: switch to generic device property accessors
i2c: xiic: remove duplicate error message
i2c: xiic: switch to devres managed APIs
i2c: rtl9300: add RTL9607C i2c controller support
i2c: rtl9300: introduce new function properties to driver data
i2c: rtl9300: introduce clk struct for upcoming rtl9607 support
dt-bindings: i2c: realtek,rtl9301-i2c: extend for clocks and RTL9607C support
i2c: rtl9300: introduce a property for 8 bit width reg address
i2c: rtl9300: introduce F_BUSY to the reg_fields struct
i2c: rtl9300: introduce max length property to driver data
i2c: rtl9300: split data_reg into read and write reg
...
|
|
Pull ipmi updates from Corey Minyard:
"Small updates and fixes (mostly to the BMC software):
- Fix one issue in the host side driver where a kthread can be left
running on a specific memory allocation failre at probe time
- Replace system_wq with system_percpu_wq so system_wq can eventually
go away"
* tag 'for-linus-7.1-1' of https://github.com/cminyard/linux-ipmi:
ipmi:ssif: Clean up kthread on errors
ipmi:ssif: Remove unnecessary indention
ipmi: ssif_bmc: Fix KUnit test link failure when KUNIT=m
ipmi: ssif_bmc: add unit test for state machine
ipmi: ssif_bmc: change log level to dbg in irq callback
ipmi: ssif_bmc: fix message desynchronization after truncated response
ipmi: ssif_bmc: fix missing check for copy_to_user() partial failure
ipmi: ssif_bmc: cancel response timer on remove
ipmi: Replace use of system_wq with system_percpu_wq
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Namhyung Kim:
"perf report:
- Add 'comm_nodigit' sort key to combine similar threads that only
have different numbers in the comm. In the following example, the
'comm_nodigit' will have samples from all threads starting with
"bpfrb/" into an entry "bpfrb/<N>".
$ perf report -s comm_nodigit,comm -H
...
#
# Overhead CommandNoDigit / Command
# ........... ........................
#
20.30% swapper
20.30% swapper
13.37% chrome
13.37% chrome
10.07% bpfrb/<N>
7.47% bpfrb/0
0.70% bpfrb/1
0.47% bpfrb/3
0.46% bpfrb/2
0.25% bpfrb/4
0.23% bpfrb/5
0.20% bpfrb/6
0.14% bpfrb/10
0.07% bpfrb/7
- Support flat layout for symfs. The --symfs option is to specify the
location of debugging symbol files. The default 'hierarchy' layout
would search the symbol file using the same path of the original
file under the symfs root. The new 'flat' layout would search only
in the root directory.
- Update 'simd' sort key for ARM SIMD flags to cover ASE/SME and more
predicate flags.
perf stat:
- Add --pmu-filter option to select specific PMUs. This would be
useful when you measure metrics from multiple instance of uncore
PMUs with similar names.
# perf stat -M cpa_p0_avg_bw
Performance counter stats for 'system wide':
19,417,779,115 hisi_sicl0_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw
0 hisi_sicl0_cpa0/cpa_p0_wr_dat/
0 hisi_sicl0_cpa0/cpa_p0_rd_dat_64b/
0 hisi_sicl0_cpa0/cpa_p0_rd_dat_32b/
19,417,751,103 hisi_sicl10_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw
0 hisi_sicl10_cpa0/cpa_p0_wr_dat/
0 hisi_sicl10_cpa0/cpa_p0_rd_dat_64b/
0 hisi_sicl10_cpa0/cpa_p0_rd_dat_32b/
19,417,730,679 hisi_sicl2_cpa0/cpa_cycles/ # 0.31 cpa_p0_avg_bw
75,635,749 hisi_sicl2_cpa0/cpa_p0_wr_dat/
18,520,640 hisi_sicl2_cpa0/cpa_p0_rd_dat_64b/
0 hisi_sicl2_cpa0/cpa_p0_rd_dat_32b/
19,417,674,227 hisi_sicl8_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw
0 hisi_sicl8_cpa0/cpa_p0_wr_dat/
0 hisi_sicl8_cpa0/cpa_p0_rd_dat_64b/
0 hisi_sicl8_cpa0/cpa_p0_rd_dat_32b/
19.417734480 seconds time elapsed
With --pmu-filter, users can select only hisi_sicl2_cpa0 PMU.
# perf stat --pmu-filter hisi_sicl2_cpa0 -M cpa_p0_avg_bw
Performance counter stats for 'system wide':
6,234,093,559 cpa_cycles # 0.60 cpa_p0_avg_bw
50,548,465 cpa_p0_wr_dat
7,552,182 cpa_p0_rd_dat_64b
0 cpa_p0_rd_dat_32b
6.234139320 seconds time elapsed
Data type profiling:
- Quality improvements by tracking register state more precisely
- Ensure array members to get the type
- Handle more cases for global variables
Vendor event/metric updates:
- Update various Intel events and metrics
- Add NVIDIA Tegra 410 Olympus events
Internal changes:
- Verify perf.data header for maliciously crafted files
- Update perf test to cover more usages and make them robust
- Move a couple of copied kernel headers not to annoy objtool build
- Fix a bug in map sorting in name order
- Remove some unused codes
Misc:
- Fix module symbol resolution with non-zero text address
- Add -t/--threads option to `perf bench mem mmap`
- Track duration of exit*() syscall by `perf trace -s`
- Add core.addr2line-timeout and core.addr2line-disable-warn config
items"
* tag 'perf-tools-for-v7.1-2026-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (131 commits)
perf loongarch: Fix build failure with CONFIG_LIBDW_DWARF_UNWIND
perf annotate: Use jump__delete when freeing LoongArch jumps
perf test: Fixes for check branch stack sampling
perf test: Fix inet_pton probe failure and unroll call graph
perf build: fix "argument list too long" in second location
perf header: Add sanity checks to HEADER_BPF_BTF processing
perf header: Sanity check HEADER_BPF_PROG_INFO
perf header: Sanity check HEADER_PMU_CAPS
perf header: Sanity check HEADER_HYBRID_TOPOLOGY
perf header: Sanity check HEADER_CACHE
perf header: Sanity check HEADER_GROUP_DESC
perf header: Sanity check HEADER_PMU_MAPPINGS
perf header: Sanity check HEADER_MEM_TOPOLOGY
perf header: Sanity check HEADER_NUMA_TOPOLOGY
perf header: Sanity check HEADER_CPU_TOPOLOGY
perf header: Sanity check HEADER_NRCPUS and HEADER_CPU_DOMAIN_INFO
perf header: Bump up the max number of command line args allowed
perf header: Validate nr_domains when reading HEADER_CPU_DOMAIN_INFO
perf sample: Fix documentation typo
perf arm_spe: Improve SIMD flags setting
...
|
|
Add myself as a page cache reviewer since I tend to review changes in
these areas anyway.
[akpm@linux-foundation.org: add linux-mm@kvack.org]
Link: https://lore.kernel.org/20260415174039.13016-2-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Lorenzo Stoakes <ljs@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When the -fsanitize=bounds sanitizer is enabled, gcc-16 sometimes runs
into a corner case in the read_ctrl_pos() pos function, where it sees
possible undefined behavior from the 'tier' index overflowing, presumably
in the case that this was called with a negative tier:
In function 'get_tier_idx',
inlined from 'isolate_folios' at mm/vmscan.c:4671:14:
mm/vmscan.c: In function 'isolate_folios':
mm/vmscan.c:4645:29: error: 'pv.refaulted' is used uninitialized [-Werror=uninitialized]
Part of the problem seems to be that read_ctrl_pos() has unusual calling
conventions since commit 37a260870f2c ("mm/mglru: rework type selection")
where passing MAX_NR_TIERS makes it accumulate all tiers but passing a
smaller positive number makes it read a single tier instead.
Shut up the warning by adding a fake initialization to the two instances
of this variable that can run into that corner case.
Link: https://lore.kernel.org/all/CAJHvVcjtFW86o5FoQC8MMEXCHAC0FviggaQsd5EmiCHP+1fBpg@mail.gmail.com/
Link: https://lore.kernel.org/20260414065206.3236176-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kairui Song <kasong@tencent.com>
Cc: Koichiro Den <koichiro.den@canonical.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Use my personal email address due to the Red Hat work will stop soon
Link: https://lore.kernel.org/ad8GFhh3SI1wb7IC@darkstar.users.ipa.redhat.com
Signed-off-by: Dave Young <ruirui.yang@linux.dev>
Acked-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The directory does not exist any more.
Link: https://lore.kernel.org/20260414121752.1912847-4-pratyush@kernel.org
Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The KHO entry already includes include/linux/kho. Listing its
subdirectory is redundant.
Link: https://lore.kernel.org/20260414121752.1912847-3-pratyush@kernel.org
Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "MAINTAINERS: update KHO and LIVE UPDATE entries".
This series contains some updates for the Kexec Handover (KHO) and Live
update entries. Patch 1 updates the maintainers list and adds the
liveupdate tree. Patches 2 and 3 clean up stale files in the list.
This patch (of 3):
I have been helping out with reviewing and developing KHO. I would also
like to help maintain it. Change my entry from R to M for KHO and live
update. Alex has been inactive for a while, so to avoid over-crowding the
KHO entry and to keep the information up-to-date, move his entry from M to
R.
We also now have a tree for KHO and live update at liveupdate/linux.git
where we plan to start maintaining those subsystems and start queuing the
patches. List that in the entries as well.
Link: https://lore.kernel.org/20260414121752.1912847-1-pratyush@kernel.org
Link: https://lore.kernel.org/20260414121752.1912847-2-pratyush@kernel.org
Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org>
Reviewed-by: Alexander Graf <graf@amazon.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: David Hildenbrand <david@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Update KEXEC and KDUMP maintainer entries by adding the live update group
maintainers. Remove Vivek Goyal due to inactivity to keep the MAINTAINERS
file up-to-date, and add Vivek to the CREDITS file to recognize their
contributions.
Link: https://lore.kernel.org/20260413121146.49215-1-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Pratyush Yadav <pratyush@kernel.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Diego Viola <diego.viola@gmail.com>
Cc: Jakub Kacinski <kuba@kernel.org>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Martin Kepplinger <martink@posteo.de>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
migrate_vma_collect_huge_pmd()
The softleaf_is_migration() check is unreachable as entries that are not
device_private are filtered out. Similarly, the PTE-level equivalent in
migrate_vma_collect_pmd() skips migration entries.
This dead branch also contained a double spin_unlock(ptl) bug.
Link: https://lore.kernel.org/20260212014611.416695-1-dave@stgolabs.net
Fixes: a30b48bf1b244 ("mm/migrate_device: implement THP migration of zone device pages")
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Balbir Singh <balbirs@nvidia.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
charge_reserved_hugetlb.sh tears down background writers with killall from
psmisc. Minimal Ubuntu images do not always provide that tool, so the
selftest fails in cleanup for an environment reason rather than for the
hugetlb behavior it is trying to cover.
Skip the test when killall is unavailable, similar to the existing root
check, so these environments report the dependency clearly instead of
failing the test.
Link: https://lore.kernel.org/20260410044139.67480-1-create0818@163.com
Signed-off-by: Cao Ruichuang <create0818@163.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The current implementation of validate_range() in fs/userfaultfd.c
performs a hard check against mmap_min_addr. This is redundant because
UFFDIO_REGISTER operates on memory ranges that must already be backed by a
VMA.
Enforcing mmap_min_addr or capability checks again in userfaultfd is
unnecessary and prevents applications like binary compilers from using
UFFD for valid memory regions mapped by application.
Remove the redundant check for mmap_min_addr.
We started using UFFD instead of the classic mprotect approach in the
binary translator to track application writes. During development, we
encountered this bug. The translator cannot control where the translated
application chooses to map its memory and if the app requires a
low-address area, UFFD fails, whereas mprotect would work just fine. I
believe this is a genuine logic bug rather than an improvement, and I
would appreciate including the fix in stable.
Link: https://lore.kernel.org/20260409103345.15044-1-komlomal@gmail.com
Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Denis M. Karpov <komlomal@gmail.com>
Reviewed-by: Lorenzo Stoakes <ljs@kernel.org>
Acked-by: Harry Yoo (Oracle) <harry@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
vmstat_shepherd uses delayed_work_pending() to check whether vmstat_update
is already scheduled for a given CPU before queuing it. However,
delayed_work_pending() only tests WORK_STRUCT_PENDING_BIT, which is
cleared the moment a worker thread picks up the work to execute it.
This means that while vmstat_update is actively running on a CPU,
delayed_work_pending() returns false. If need_update() also returns true
at that point (per-cpu counters not yet zeroed mid-flush), the shepherd
queues a second invocation with delay=0, causing vmstat_update to run
again immediately after finishing.
On a 72-CPU system this race is readily observable: before the fix, many
CPUs show invocation gaps well below 500 jiffies (the minimum
round_jiffies_relative() can produce), with the most extreme cases
reaching 0 jiffies—vmstat_update called twice within the same jiffy.
Fix this by replacing delayed_work_pending() with work_busy(), which
returns non-zero for both WORK_BUSY_PENDING (timer armed or work queued)
and WORK_BUSY_RUNNING (work currently executing). The shepherd now
correctly skips a CPU in all busy states.
After the fix, all sub-jiffy and most sub-100-jiffie gaps disappear. The
remaining early invocations have gaps in the 700–999 jiffie range,
attributable to round_jiffies_relative() aligning to a nearer
jiffie-second boundary rather than to this race.
Each spurious vmstat_update invocation has a measurable side effect:
refresh_cpu_vm_stats() calls decay_pcp_high() for every zone, which drains
idle per-CPU pages back to the buddy allocator via free_pcppages_bulk(),
taking the zone spinlock each time. Eliminating the double-scheduling
therefore reduces zone lock contention directly. On a 72-CPU stress-ng
workload measured with perf lock contention:
free_pcppages_bulk contention count: ~55% reduction
free_pcppages_bulk total wait time: ~57% reduction
free_pcppages_bulk max wait time: ~47% reduction
Note: work_busy() is inherently racy—between the check and the
subsequent queue_delayed_work_on() call, vmstat_update can finish
execution, leaving the work neither pending nor running. In that narrow
window the shepherd can still queue a second invocation. After the fix,
this residual race is rare and produces only occasional small gaps, a
significant improvement over the systematic double-scheduling seen with
delayed_work_pending().
Link: https://lore.kernel.org/20260409-vmstat-v2-1-e9d9a6db08ad@debian.org
Fixes: 7b8da4c7f07774 ("vmstat: get rid of the ugly cpu_stat_off variable")
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Dmitry Ilvokhin <d@ilvokhin.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
If hugepages, hugepagesz, or default_hugepagesz are specified on the
kernel command line without the '=' separator, early parameter parsing
passes NULL to hugetlb_add_param(), which dereferences it in strlen() and
can crash the system during early boot.
Reject NULL values in hugetlb_add_param() and return -EINVAL instead.
Link: https://lore.kernel.org/20260409105437.108686-4-thorsten.blum@linux.dev
Fixes: 5b47c02967ab ("mm/hugetlb: convert cmdline parameters from setup to early")
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Cc: David Hildenbrand <david@kernel.org>
Cc: Frank van der Linden <fvdl@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
recompress_store() parses the type= parameter with three if statements
checking for "idle", "huge", and "huge_idle". An unrecognized value
silently falls through with mode left at 0, causing the recompression pass
to run with no slot filter — processing all slots instead of the
intended subset.
Add a !mode check after the type parsing block to return -EINVAL for
unrecognized values, consistent with the function's other parameter
validation.
Link: https://lore.kernel.org/20260407153027.42425-1-astellman@stellman-greene.com
Signed-off-by: Andrew Stellman <astellman@stellman-greene.com>
Suggested-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The ProtectionKey entry was added in v4.9; back then it was x86-specific,
but it now lives in generic code and applies to all architectures
supporting pkeys (currently x86, power, arm64).
Time to document it: add a paragraph to proc.rst about the ProtectionKey
entry.
[akpm@linux-foundation.org: s/system/hardware/, per review discussion]
[akpm@linux-foundation.org: s/hardware/CPU/]
Link: https://lore.kernel.org/20260407125133.564182-1-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reported-by: Yury Khrustalev <yury.khrustalev@arm.com>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Lorenzo Stoakes <ljs@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The common order-0 case is important enough to want its own branch, and
avoids the hairy, large loop logic that the CPU does not seem to handle
particularly well.
While at it, encourage the compiler to inline batch PTE logic and resolve
constant branches by adding __always_inline strategically.
Link: https://lore.kernel.org/20260402141628.3367596-3-pfalcato@suse.de
Signed-off-by: Pedro Falcato <pfalcato@suse.de>
Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Tested-by: Luke Yang <luyang@redhat.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jiri Hladky <jhladky@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm/mprotect: micro-optimization work", v3.
Micro-optimize the change_protection functionality and the
change_pte_range() routine. This set of functions works in an incredibly
tight loop, and even small inefficiencies are incredibly evident when spun
hundreds, thousands or hundreds of thousands of times.
There was an attempt to keep the batching functionality as much as
possible, which introduced some part of the slowness, but not all of it.
Removing it for !arm64 architectures would speed mprotect() up even
further, but could easily pessimize cases where large folios are mapped
(which is not as rare as it seems, particularly when it comes to the page
cache these days).
The micro-benchmark used for the tests was [0] (usable using
google/benchmark and g++ -O2 -lbenchmark repro.cpp)
This resulted in the following (first entry is baseline):
---------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------
mprotect_bench 85967 ns 85967 ns 6935
mprotect_bench 70684 ns 70684 ns 9887
After the patchset we can observe an ~18% speedup in mprotect. Wonderful
for the elusive mprotect-based workloads!
Testing & more ideas welcome. I suspect there is plenty of improvement
possible but it would require more time than what I have on my hands right
now. The entire inlined function (which inlines into change_protection())
is gigantic - I'm not surprised this is so finnicky.
Note: per my profiling, the next _big_ bottleneck here is
modify_prot_start_ptes, exactly on the xchg() done by x86.
ptep_get_and_clear() is _expensive_. I don't think there's a properly
safe way to go about it since we do depend on the D bit quite a lot. This
might not be such an issue on other architectures.
Luke Yang reported [1]:
: On average, we see improvements ranging from a minimum of 5% to a
: maximum of 55%, with most improvements showing around a 25% speed up in
: the libmicro/mprot_tw4m micro benchmark.
This patch (of 2):
Move softleaf change_pte_range code into a separate function. This makes
the change_pte_range() function a good bit smaller, and lessens cognitive
load when reading through the function.
Link: https://lore.kernel.org/20260402141628.3367596-1-pfalcato@suse.de
Link: https://lore.kernel.org/20260402141628.3367596-2-pfalcato@suse.de
Link: https://lore.kernel.org/all/aY8-XuFZ7zCvXulB@luyang-thinkpadp1gen7.toromso.csb/
Link: https://gist.github.com/heatd/1450d273005aba91fa5744f44dfcd933 [0]
Link: https://lore.kernel.org/CAL2CeBxT4jtJ+LxYb6=BNxNMGinpgD_HYH5gGxOP-45Q2OncqQ@mail.gmail.com [1]
Signed-off-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Tested-by: Luke Yang <luyang@redhat.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jiri Hladky <jhladky@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Android systems usually use memory.reclaim interface to implement user
space memory management which expects that the requested reclaim target
and actually reclaimed amount memory are not diverging by too much. With
the current MGRLU implementation there is, however, no bail out when the
reclaim target is reached and this could lead to an excessive reclaim
that scales with the reclaim hierarchy size.For example, we can get a
nr_reclaimed=394/nr_to_reclaim=32 proactive reclaim under a common 1-N
cgroup hierarchy.
This defect arose from the goal of keeping fairness among memcgs that is,
for try_to_free_mem_cgroup_pages -> shrink_node_memcgs -> shrink_lruvec ->
lru_gen_shrink_lruvec -> try_to_shrink_lruvec, the !root_reclaim(sc) check
was there for reclaim fairness, which was necessary before commit
b82b530740b9 ("mm: vmscan: restore incremental cgroup iteration") because
the fairness depended on attempted proportional reclaim from every memcg
under the target memcg. However after commit b82b530740b9 there is no
longer a need to visit every memcg to ensure fairness. Let's have
try_to_shrink_lruvec bail out when the nr_reclaimed achieved.
Link: https://lore.kernel.org/20260318011558.1696310-1-zhaoyang.huang@unisoc.com
Link: https://lore.kernel.org/20260212032111.408865-1-zhaoyang.huang@unisoc.com
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Suggested-by: T.J.Mercier <tjmercier@google.com>
Reviewed-by: T.J. Mercier <tjmercier@google.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Qi Zheng <qi.zheng@linux.dev>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Kairui Song <kasong@tencent.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Wei Xu <weixugc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The comment in mmzone.h currently details exhaustive per-architecture
bit-width lists and explains alignment using min(PAGE_SHIFT,
PFN_SECTION_SHIFT). Such details risk falling out of date over time and
may inadvertently be left un-updated.
We always expect a single section to cover full pages. Therefore, we can
safely assume that PFN_SECTION_SHIFT is large enough to accommodate
SECTION_MAP_LAST_BIT. We use BUILD_BUG_ON() to ensure this.
Update the comment to accurately reflect this consensus, making it clear
that we rely on a single section covering full pages.
Link: https://lore.kernel.org/20260402102320.3617578-1-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Petr Tesarik <ptesarik@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
sio_read_complete() uses sio->pages to account global PSWPIN vm events,
but sio->pages tracks the number of bvec entries (folios), not base pages.
While large folios cannot currently reach this path (SWP_FS_OPS and
SWP_SYNCHRONOUS_IO are mutually exclusive, and mTHP swap-in allocation is
gated on SWP_SYNCHRONOUS_IO), the accounting is semantically inconsistent
with the per-memcg path which correctly uses folio_nr_pages().
Use sio->len >> PAGE_SHIFT instead, which gives the correct base page
count since sio->len is accumulated via folio_size(folio).
Link: https://lore.kernel.org/20260402061408.36119-1-devnexen@gmail.com
Signed-off-by: David Carlier <devnexen@gmail.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: NeilBrown <neil@brown.name>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The test requires thp, skip the test when thp is not available to avoid
false positive.
Tested with thp disabled kernel.
Before the fix:
# --------------------------------
# running ./transhuge-stress -d 20
# --------------------------------
# TAP version 13
# 1..1
# transhuge-stress: allocate 1453 transhuge pages, using 2907 MiB virtual memory and 11 MiB of ram
# Bail out! MADV_HUGEPAGE# Planned tests != run tests (1 != 0)
# # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
# [FAIL]
not ok 60 transhuge-stress -d 20 # exit=1
After the fix:
# --------------------------------
# running ./transhuge-stress -d 20
# --------------------------------
# TAP version 13
# 1..0 # SKIP Transparent Hugepages not available
# [SKIP]
ok 5 transhuge-stress -d 20 # SKIP
Link: https://lore.kernel.org/20260402014543.1671131-7-chuhu@redhat.com
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Li Wang <liwang@redhat.com>
Cc: Nico Pache <npache@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When thp is not enabled on some kernel config such as realtime kernel, the
test will report failure. Fix the false positive by skipping the test
directly when thp is not enabled.
Tested with thp disabled kernel:
Before The fix:
# --------------------------------------------------
# running ./split_huge_page_test /tmp/xfs_dir_Ywup9p
# --------------------------------------------------
# TAP version 13
# Bail out! Reading PMD pagesize failed
# # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
# [FAIL]
not ok 61 split_huge_page_test /tmp/xfs_dir_Ywup9p # exit=1
After the fix:
# --------------------------------------------------
# running ./split_huge_page_test /tmp/xfs_dir_YHPUPl
# --------------------------------------------------
# TAP version 13
# 1..0 # SKIP Transparent Hugepages not available
# [SKIP]
ok 6 split_huge_page_test /tmp/xfs_dir_YHPUPl # SKIP
Link: https://lore.kernel.org/20260402014543.1671131-6-chuhu@redhat.com
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Li Wang <liwang@redhat.com>
Cc: Nico Pache <npache@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add three more checks for buflen and numwritten. The buflen should be at
least two, that means at least one char and the null-end. The error case
check is added by checking numwriten < 0 instead of numwritten < 1. And
the truncate case is checked. The test will exit if any of these
conditions aren't met.
Additionally, add more print information when a write failure occurs or a
truncated write happens, providing clearer diagnostics.
Link: https://lore.kernel.org/20260402014543.1671131-5-chuhu@redhat.com
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Lorenzo Stoakes <ljs@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
thp_settings provides write_file() helper for safely writing to a file and
exit when write failure happens. It's a very low level helper and many
sub tests need such a helper, not only thp tests.
split_huge_page_test also defines a write_file locally. The two have
minior differences in return type and used exit api. And there would be
conflicts if split_huge_page_test wanted to include thp_settings.h because
of different prototype, making it less convenient.
It's possisble to merge the two, although some tests don't use the
kselftest infrastrucutre for testing. It would also work when using the
ksft_exit_msg() to exit in my test, as the counters are all zero. Output
will be like:
TAP version 13
1..62
Bail out! /proc/sys/vm/drop_caches1 open failed: No such file or directory
# Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
So here we just keep the version in split_huge_page_test, and move it into
the vm_util. This makes it easy to maitain and user could just include
one vm_util.h when they don't need thp setting helpers. Keep the
prototype of void return as the function will exit on any error, return
value is not necessary, and will simply the callers like write_num() and
write_string().
Link: https://lore.kernel.org/20260402014543.1671131-4-chuhu@redhat.com
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Suggested-by: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The test_hugepage test contain two sub tests. If just reporting one skip
when thp not available, there will be error in the log because the test
count don't match the test plan. Change to skip two tests by running the
ksft_test_result_skip twice in this case.
Without the fix (run test on thp disabled kernel):
./run_vmtests.sh -t soft_dirty
# --------------------
# running ./soft-dirty
# --------------------
# TAP version 13
# 1..19
# ok 1 Test test_simple
# ok 2 Test test_vma_reuse dirty bit of allocated page
# ok 3 Test test_vma_reuse dirty bit of reused address page
# ok 4 # SKIP Transparent Hugepages not available
# ok 5 Test test_mprotect-anon dirty bit of new written page
# ok 6 Test test_mprotect-anon soft-dirty clear after clear_refs
# ok 7 Test test_mprotect-anon soft-dirty clear after marking RO
# ok 8 Test test_mprotect-anon soft-dirty clear after marking RW
# ok 9 Test test_mprotect-anon soft-dirty after rewritten
# ok 10 Test test_mprotect-file dirty bit of new written page
# ok 11 Test test_mprotect-file soft-dirty clear after clear_refs
# ok 12 Test test_mprotect-file soft-dirty clear after marking RO
# ok 13 Test test_mprotect-file soft-dirty clear after marking RW
# ok 14 Test test_mprotect-file soft-dirty after rewritten
# ok 15 Test test_merge-anon soft-dirty after remap merge 1st pg
# ok 16 Test test_merge-anon soft-dirty after remap merge 2nd pg
# ok 17 Test test_merge-anon soft-dirty after mprotect merge 1st pg
# ok 18 Test test_merge-anon soft-dirty after mprotect merge 2nd pg
# # 1 skipped test(s) detected. Consider enabling relevant config options to improve coverage.
# # Planned tests != run tests (19 != 18)
# # Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0
# [FAIL]
not ok 52 soft-dirty # exit=1
With the fix (run test on thp disabled kernel):
./run_vmtests.sh -t soft_dirty
# --------------------
# running ./soft-dirty
# TAP version 13
# --------------------
# running ./soft-dirty
# --------------------
# TAP version 13
# 1..19
# ok 1 Test test_simple
# ok 2 Test test_vma_reuse dirty bit of allocated page
# ok 3 Test test_vma_reuse dirty bit of reused address page
# # Transparent Hugepages not available
# ok 4 # SKIP Test test_hugepage huge page allocation
# ok 5 # SKIP Test test_hugepage huge page dirty bit
# ok 6 Test test_mprotect-anon dirty bit of new written page
# ok 7 Test test_mprotect-anon soft-dirty clear after clear_refs
# ok 8 Test test_mprotect-anon soft-dirty clear after marking RO
# ok 9 Test test_mprotect-anon soft-dirty clear after marking RW
# ok 10 Test test_mprotect-anon soft-dirty after rewritten
# ok 11 Test test_mprotect-file dirty bit of new written page
# ok 12 Test test_mprotect-file soft-dirty clear after clear_refs
# ok 13 Test test_mprotect-file soft-dirty clear after marking RO
# ok 14 Test test_mprotect-file soft-dirty clear after marking RW
# ok 15 Test test_mprotect-file soft-dirty after rewritten
# ok 16 Test test_merge-anon soft-dirty after remap merge 1st pg
# ok 17 Test test_merge-anon soft-dirty after remap merge 2nd pg
# ok 18 Test test_merge-anon soft-dirty after mprotect merge 1st pg
# ok 19 Test test_merge-anon soft-dirty after mprotect merge 2nd pg
# # 2 skipped test(s) detected. Consider enabling relevant config options to improve coverage.
# # Totals: pass:17 fail:0 xfail:0 xpass:0 skip:2 error:0
# [PASS]
ok 1 soft-dirty
hwpoison_inject
# SUMMARY: PASS=1 SKIP=0 FAIL=0
1..1
Link: https://lore.kernel.org/20260402014543.1671131-3-chuhu@redhat.com
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Li Wang <liwang@redhat.com>
Cc: Nico Pache <npache@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "selftests/mm: skip several tests when thp is not available",
v8.
There are several tests requires transprarent hugepages, when run on thp
disabled kernel such as realtime kernel, there will be false negative.
Mark those tests as skip when thp is not available.
This patch (of 6):
When thp is not available, just skip the collape tests to avoid the false
negative.
Without the change, run with a thp disabled kernel:
./run_vmtests.sh -t madv_guard -n 1
<snip/>
# RUN guard_regions.anon.collapse ...
# guard-regions.c:2217:collapse:Expected madvise(ptr, size, MADV_NOHUGEPAGE) (-1) == 0 (0)
# collapse: Test terminated by assertion
# FAIL guard_regions.anon.collapse
not ok 2 guard_regions.anon.collapse
<snip/>
# RUN guard_regions.shmem.collapse ...
# guard-regions.c:2217:collapse:Expected madvise(ptr, size, MADV_NOHUGEPAGE) (-1) == 0 (0)
# collapse: Test terminated by assertion
# FAIL guard_regions.shmem.collapse
not ok 32 guard_regions.shmem.collapse
<snip/>
# RUN guard_regions.file.collapse ...
# guard-regions.c:2217:collapse:Expected madvise(ptr, size, MADV_NOHUGEPAGE) (-1) == 0 (0)
# collapse: Test terminated by assertion
# FAIL guard_regions.file.collapse
not ok 62 guard_regions.file.collapse
<snip/>
# FAILED: 87 / 90 tests passed.
# 17 skipped test(s) detected. Consider enabling relevant config options to improve coverage.
# Totals: pass:70 fail:3 xfail:0 xpass:0 skip:17 error:0
With this change, run with thp disabled kernel:
./run_vmtests.sh -t madv_guard -n 1
<snip/>
# RUN guard_regions.anon.collapse ...
# SKIP Transparent Hugepages not available
# OK guard_regions.anon.collapse
ok 2 guard_regions.anon.collapse # SKIP Transparent Hugepages not available
<snip/>
# RUN guard_regions.file.collapse ...
# SKIP Transparent Hugepages not available
# OK guard_regions.file.collapse
ok 62 guard_regions.file.collapse # SKIP Transparent Hugepages not available
<snip/>
# RUN guard_regions.shmem.collapse ...
# SKIP Transparent Hugepages not available
# OK guard_regions.shmem.collapse
ok 32 guard_regions.shmem.collapse # SKIP Transparent Hugepages not available
<snip/>
# PASSED: 90 / 90 tests passed.
# 20 skipped test(s) detected. Consider enabling relevant config options to improve coverage.
# Totals: pass:70 fail:0 xfail:0 xpass:0 skip:20 error:0
Link: https://lore.kernel.org/20260402014543.1671131-1-chuhu@redhat.com
Link: https://lore.kernel.org/20260402014543.1671131-2-chuhu@redhat.com
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Li Wang <liwang@redhat.com>
Cc: Nico Pache <npache@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Since __mfill_atomic_pte() handles the retry for both anonymous and shmem,
there is no need to retry copying the date from the userspace in the loop
in mfill_atomic().
Drop the retry logic from mfill_atomic().
[rppt@kernel.org: remove safety measure of not returning ENOENT from _copy]
Link: https://lore.kernel.org/ac5zcDUY8CFHr6Lw@kernel.org
Link: https://lore.kernel.org/20260402041156.1377214-12-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand (Arm) <david@kernel.org>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Harry Yoo (Oracle) <harry@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nikita Kalyazin <kalyazin@amazon.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Carlier <devnexen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add filemap_add() and filemap_remove() methods to vm_uffd_ops and use them
in __mfill_atomic_pte() to add shmem folios to page cache and remove them
in case of error.
Implement these methods in shmem along with vm_uffd_ops->alloc_folio() and
drop shmem_mfill_atomic_pte().
Since userfaultfd now does not reference any functions from shmem, drop
include if linux/shmem_fs.h from mm/userfaultfd.c
mfill_atomic_install_pte() is not used anywhere outside of mm/userfaultfd,
make it static.
Link: https://lore.kernel.org/20260402041156.1377214-11-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: James Houghton <jthoughton@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand (Arm) <david@kernel.org>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Harry Yoo (Oracle) <harry@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nikita Kalyazin <kalyazin@amazon.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Carlier <devnexen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
and use it to refactor mfill_atomic_pte_zeroed_folio() and
mfill_atomic_pte_copy().
mfill_atomic_pte_zeroed_folio() and mfill_atomic_pte_copy() perform
almost identical actions:
* allocate a folio
* update folio contents (either copy from userspace of fill with zeros)
* update page tables with the new folio
Split a __mfill_atomic_pte() helper that handles both cases and uses newly
introduced vm_uffd_ops->alloc_folio() to allocate the folio.
Pass the ops structure from the callers to __mfill_atomic_pte() to later
allow using anon_uffd_ops for MAP_PRIVATE mappings of file-backed VMAs.
Note, that the new ops method is called alloc_folio() rather than
folio_alloc() to avoid clash with alloc_tag macro folio_alloc().
Link: https://lore.kernel.org/20260402041156.1377214-10-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: James Houghton <jthoughton@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand (Arm) <david@kernel.org>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Harry Yoo (Oracle) <harry@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nikita Kalyazin <kalyazin@amazon.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Carlier <devnexen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When userspace resolves a page fault in a shmem VMA with UFFDIO_CONTINUE
it needs to get a folio that already exists in the pagecache backing that
VMA.
Instead of using shmem_get_folio() for that, add a get_folio_noalloc()
method to 'struct vm_uffd_ops' that will return a folio if it exists in
the VMA's pagecache at given pgoff.
Implement get_folio_noalloc() method for shmem and slightly refactor
userfaultfd's mfill_get_vma() and mfill_atomic_pte_continue() to support
this new API.
Link: https://lore.kernel.org/20260402041156.1377214-9-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: James Houghton <jthoughton@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand (Arm) <david@kernel.org>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Harry Yoo (Oracle) <harry@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nikita Kalyazin <kalyazin@amazon.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Carlier <devnexen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Current userfaultfd implementation works only with memory managed by core
MM: anonymous, shmem and hugetlb.
First, there is no fundamental reason to limit userfaultfd support only to
the core memory types and userfaults can be handled similarly to regular
page faults provided a VMA owner implements appropriate callbacks.
Second, historically various code paths were conditioned on
vma_is_anonymous(), vma_is_shmem() and is_vm_hugetlb_page() and some of
these conditions can be expressed as operations implemented by a
particular memory type.
Introduce vm_uffd_ops extension to vm_operations_struct that will delegate
memory type specific operations to a VMA owner.
Operations for anonymous memory are handled internally in userfaultfd
using anon_uffd_ops that implicitly assigned to anonymous VMAs.
Start with a single operation, ->can_userfault() that will verify that a
VMA meets requirements for userfaultfd support at registration time.
Implement that method for anonymous, shmem and hugetlb and move relevant
parts of vma_can_userfault() into the new callbacks.
[rppt@kernel.org: relocate VM_DROPPABLE test, per Tal]
Link: https://lore.kernel.org/adffgfM5ANxtPIEF@kernel.org
Link: https://lore.kernel.org/20260402041156.1377214-8-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand (Arm) <david@kernel.org>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Harry Yoo (Oracle) <harry@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nikita Kalyazin <kalyazin@amazon.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Carlier <devnexen@gmail.com>
Cc: Tal Zussman <tz2294@columbia.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|