| Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, the changes primarily focus on resolving race
conditions, memory safety issues (UAF), and improving the robustness
of garbage collection (GC), and folio management.
Enhancements:
- add page-order information for large folio reads in iostat
- add defrag_blocks sysfs node
Bug fixes:
- fix uninitialized kobject put in f2fs_init_sysfs()
- disallow setting an extension to both cold and hot
- fix node_cnt race between extent node destroy and writeback
- preserve previous reserve_{blocks,node} value when remount
- freeze GC and discard threads quickly
- fix false alarm of lockdep on cp_global_sem lock
- fix data loss caused by incorrect use of nat_entry flag
- skip empty sections in f2fs_get_victim
- fix inline data not being written to disk in writeback path
- fix fsck inconsistency caused by FGGC of node block
- fix fsck inconsistency caused by incorrect nat_entry flag usage
- call f2fs_handle_critical_error() to set cp_error flag
- fix fiemap boundary handling when read extent cache is incomplete
- fix use-after-free of sbi in f2fs_compress_write_end_io()
- fix UAF caused by decrementing sbi->nr_pages[] in f2fs_write_end_io()
- fix incorrect file address mapping when inline inode is unwritten
- fix incomplete search range in f2fs_get_victim when f2fs_need_rand_seg is enabled
- avoid memory leak in f2fs_rename()"
* tag 'f2fs-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (35 commits)
f2fs: add page-order information for large folio reads in iostat
f2fs: do not support mmap write for large folio
f2fs: fix uninitialized kobject put in f2fs_init_sysfs()
f2fs: protect extension_list reading with sb_lock in f2fs_sbi_show()
f2fs: disallow setting an extension to both cold and hot
f2fs: fix node_cnt race between extent node destroy and writeback
f2fs: allow empty mount string for Opt_usr|grp|projjquota
f2fs: fix to preserve previous reserve_{blocks,node} value when remount
f2fs: invalidate block device page cache on umount
f2fs: fix to freeze GC and discard threads quickly
f2fs: fix to avoid uninit-value access in f2fs_sanity_check_node_footer
f2fs: fix false alarm of lockdep on cp_global_sem lock
f2fs: fix data loss caused by incorrect use of nat_entry flag
f2fs: fix to skip empty sections in f2fs_get_victim
f2fs: fix inline data not being written to disk in writeback path
f2fs: fix fsck inconsistency caused by FGGC of node block
f2fs: fix fsck inconsistency caused by incorrect nat_entry flag usage
f2fs: fix to do sanity check on dcc->discard_cmd_cnt conditionally
f2fs: refactor node footer flag setting related code
f2fs: refactor f2fs_move_node_folio function
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull dax updates from Ira Weiny:
"The series adds DAX support required for the upcoming fuse/famfs file
system.[1] The support here is required because famfs is backed by
devdax rather than pmem. This all lays the groundwork for using shared
memory as a file system"
Link: https://lore.kernel.org/all/0100019d43e5f632-f5862a3e-361c-4b54-a9a6-96c242a8f17a-000000@email.amazonses.com/ [1]
* tag 'libnvdimm-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax/fsdev: fix uninitialized kaddr in fsdev_dax_zero_page_range()
dax: export dax_dev_get()
dax: Add fs_dax_get() func to prepare dax for fs-dax usage
dax: Add dax_set_ops() for setting dax_operations at bind time
dax: Add dax_operations for use by fs-dax on fsdev dax
dax: Save the kva from memremap
dax: add fsdev.c driver for fs-dax on character dax
dax: Factor out dax_folio_reset_order() helper
dax: move dax_pgoff_to_phys from [drivers/dax/] device.c to bus.c
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"We've finally gotten rid of the struct clk_ops::round_rate() code
after months of effort from Brian Masney. Now the only option is to
use determine_rate(), which is good because that takes a struct
argument instead of just a couple unsigned longs, allowing us to
easily modify the way we determine and set rates in the clk tree.
Beyond that core framework change we've got the typical pile of new
SoC clk driver additions, fixes for clk data and/or adding missing
clks because the consumer driver using those clks wasn't ready, etc.
The usual suspects are all here: Qualcomm, Samsung, Mediatek, and
Rockchip along with some newcomers making RISC-V SoCs like ESWIN's
eic700 and Tenstorrent's Atlantis. The clk driver side of this looks
pretty normal.
Core:
- Remove the round_rate() clk op (yay!)
New Drivers:
- ESWIN eic700 SoC clk support
- Econet EN751221 SoC clock/reset support
- Global TCSR, RPMh, and display clock controller support for the
Qualcomm Eliza platform
- TCSR, the multiple global, and the RPMh clock controller support
for the Qualcomm Nord platform
- GPU clock controller support for Qualcomm SM8750
- Video and GPU clock controller support for Qualcomm Glymur
- Global clock controller support for Qualcomm IPQ5210
- Axis ARTPEC-9: Add new PLL clocks and new drivers for eight clock
controllers on the SoC
- ExynosAutov920: Add G3D (GPU) clock controller
- Clock driver for the Rockchip RV1103B SoC
- Initial support for the Renesas RZ/G3L (R9A08G046) SoC
- Clock and reset controllers (e.g. PRCM) in the Tenstorrent Atlantis SoC"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (132 commits)
clk: visconti: pll: initialize clk_init_data to zero
clk: fsl-sai: Add MCLK generation support
clk: fsl-sai: Extract clock setup into fsl_sai_clk_register()
dt-bindings: clock: fsl-sai: Document clock-cells = <1> support
clk: fsl-sai: Add i.MX8M support with 8 byte register offset
clk: fsl-sai: Sort the headers
dt-bindings: clock: fsl-sai: Document i.MX8M support
clk: qcom: gcc: Add multiple global clock controller driver for Nord SoC
clk: qcom: rpmh: Add support for Nord rpmh clocks
clk: qcom: Add TCSR clock driver for Nord SoC
dt-bindings: clock: qcom: Add Nord Global Clock Controller
dt-bindings: clock: qcom-rpmhcc: Add support for Nord SoCs
dt-bindings: clock: qcom: Document the Nord SoC TCSR Clock Controller
clk: qcom: gcc-x1e80100: Keep GCC USB QTB clock always ON
clk: qcom: Constify list of critical CBCR registers
clk: qcom: Constify qcom_cc_driver_data
clk: qcom: videocc-glymur: Constify qcom_cc_desc
clk: qcom: Add a driver for SM8750 GPU clocks
dt-bindings: clock: qcom: Add SM8750 GPU clocks
clk: qcom: ipq-cmn-pll: Add IPQ8074 SoC support
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull dcache busy loop updates from Al Viro:
"Fix livelocks in shrink_dcache_tree()
If shrink_dcache_tree() finds a dentry in the middle of being killed
by another thread, it has to wait until the victim finishes dying,
gets detached from the tree and ceases to pin its parent.
The way we used to deal with that amounted to busy-wait;
unfortunately, it's not just inefficient but can lead to reliably
reproducible hard livelocks.
Solved by having shrink_dentry_tree() attach a completion to such
dentry, with dentry_unlist() calling complete() on all objects
attached to it. With a bit of care it can be done without growing
struct dentry or adding overhead in normal case"
* tag 'pull-dcache-busy-wait' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
get rid of busy-waiting in shrink_dcache_tree()
dcache.c: more idiomatic "positives are not allowed" sanity checks
struct dentry: make ->d_u anonymous
for_each_alias(): helper macro for iterating through dentries of given inode
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Fix printk ring buffer initialization and sanity checks
- Workaround printf kunit test compilation with gcc < 12.1
- Add IPv6 address printf format tests
- Misc code and documentation cleanup
* tag 'printk-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printf: Compile the kunit test with DISABLE_BRANCH_PROFILING DISABLE_BRANCH_PROFILING
lib/vsprintf: use bool for local decode variable
lib/hexdump: print_hex_dump_bytes() calls print_hex_dump_debug()
printk: ringbuffer: fix errors in comments
printk_ringbuffer: Add sanity check for 0-size data
printk_ringbuffer: Fix get_data() size sanity check
printf: add IPv6 address format tests
printk: Fix _DESCS_COUNT type for 64-bit systems
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull entry cleanup from Ingo Molnar:
"Remove the unused ARCH_SYSCALL_WORK_{ENTER,EXIT} flags"
* tag 'core-urgent-2026-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
entry: Kill ARCH_SYSCALL_WORK_{ENTER,EXIT}
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Ilpo Järvinen:
"asus-wmi:
- Retain battery charge threshold during boot which avoids
unsolicited change to 100%. Return -ENODATA when the limit
is not yet known
- Improve screenpad power/brightness handling consistency
- Fix screenpad brightness range
barco-p50-gpio:
- Normalize gpio_get return values
bitland-mifs-wmi:
- Add driver for Bitland laptops (supports platform profile,
hwmon, kbd backlight, gpu mode, hotkeys, and fan boost)
dell_rbu:
- Fix using uninitialized value in sysfs write function
dell-wmi-sysman:
- Respect destination length when constructing enum strings
hp-wmi:
- Propagate fan setting apply failures and log an error
- Fix sysfs write vs work handler cancel_delayed_work_sync() deadlock
- Correct keepalive schedule_delayed_work() to mod_delayed_work()
- Fix u8 underflows in GPU delta calculation
- Use mutex to protect fan pwm/mode
- Ignore kbd backlight and FnLock key events that are handled by FW
- Fix fan table parsing (use correct field)
- Add support for Omen 14-fb0xxx, 16-n0xxx, 16-wf1xxx, and
Omen MAX 16-ak0xxxx
input: trackpoint & thinkpad_acpi:
- Enable doubletap by default and add sysfs enable/disable
int3472:
- Add support for GPIO type 0x02 (IR flood LED)
intel-speed-select: (updated to v1.26)
- Avoid using current base frequency as maximum
- Fix CPU extended family ID decoding
- Fix exit code
- Improve error reporting
intel/vsec:
- Refactor to support ACPI-enumerated PMT endpoints.
pcengines-apuv2:
- Attach software node to the gpiochip
uniwill:
- Refactor hwmon to smaller parts to accomodate HW diversity
- Support USB-C power/performance priority switch through sysfs
- Add another XMG Fusion 15 (L19) DMI vendor
- Enable fine-grained features to device lineup mapping
wmi:
- Perform output size check within WMI core to allow simpler WMI
drivers
misc:
- acpi_driver -> platform driver conversions (a large number of
changes from Rafael J. Wysocki)
- cleanups / refactoring / improvements"
* tag 'platform-drivers-x86-v7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (106 commits)
platform/x86: hp-wmi: Add support for Omen 16-wf1xxx (8C77)
platform/x86: hp-wmi: Add support for Omen 16-n0xxx (8A44)
platform/x86: hp-wmi: Add support for OMEN MAX 16-ak0xxx (8D87)
platform/x86: hp-wmi: fix fan table parsing
platform/x86: hp-wmi: add Omen 14-fb0xxx (board 8C58) support
platform/wmi: Replace .no_notify_data with .min_event_size
platform/wmi: Extend wmidev_query_block() to reject undersized data
platform/wmi: Extend wmidev_invoke_method() to reject undersized data
platform/wmi: Prepare to reject undersized unmarshalling results
platform/wmi: Convert drivers to use wmidev_invoke_procedure()
platform/wmi: Add wmidev_invoke_procedure()
platform/x86: int3472: Add support for GPIO type 0x02 (IR flood LED)
platform/x86: int3472: Parameterize LED con_id in registration
platform/x86: int3472: Rename pled to led in LED registration code
platform/x86: int3472: Use local variable for LED struct access
platform/x86: thinkpad_acpi: remove obsolete TODO comment
platform/x86: dell-wmi-sysman: bound enumeration string aggregation
platform/x86: hp-wmi: Ignore backlight and FnLock events
platform/x86: uniwill-laptop: Fix signedness bug
platform/x86: dell_rbu: avoid uninit value usage in packet_size_write()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
"Core:
- Add a resource-managed version of alloc_workqueue()
(`devm_alloc_workqueue()`)
- Preserve the Open Firmware (OF) node when an ACPI handle
is present
Apple SMC:
- Wire up the Apple SMC power driver by adding a new MFD cell
Atmel HLCDC:
- Fetch the LVDS PLL clock as a fallback if the generic sys_clk
is unavailable
Broadcom BCM2835 PM:
- Add support for the BCM2712 power management device
- Introduce a hardware type identifier to distinguish SoC variants
Congatec CGBC, KEMPLD, RSMU, Si476x:
- Fix various kernel-doc warnings and correct struct member names
DLN2:
- Drop redundant USB device references and switch to managed
resource allocations
- Update bare 'unsigned' types to 'unsigned int'
ENE KB3930:
- Use the of_device_is_system_power_controller() wrapper
EZX PCAP:
- Avoid rescheduling after destroying the workqueue by switching
to a device-managed workqueue
- Drop redundant memory allocation error messages
- Return directly instead of using empty goto statements
Freescale i.MX25 TSADC:
- Convert devicetree bindings from TXT to YAML format
Freescale MC13xxx:
- Fix a memory leak in subdevice platform data allocation by
using devm_kmemdup()
Intel LPC ICH:
- Expose a software node for the GPIO controller cell to fix
GPIO lookups
Intel LPSS:
- Add PCI IDs for the Intel Nova Lake-H platform
Maxim MAX77620:
- Convert devicetree bindings from TXT to YAML format
- Document an optional I2C address for the MAX77663 RTC device
Maxim MAX77705:
- Make the max77705_pm_ops variable static to resolve a
sparse warning
MediaTek MT6397:
- Correct the hardware CIDs for the MT6328, MT6331, and MT6332
PMICs to allow proper driver binding
ROHM BD71828:
- Enable system wakeup via the power button
ROHM BD72720:
- Add a new compatible string for the ROHM BD73900 PMIC
SpacemiT P1:
- Drop the deprecated "vin-supply" property from the devicetree
bindings
- Add individual regulator supply properties to match actual
hardware topology
STMicroelectronics STPMIC1:
- Attempt system shutdown a second time to handle transient I2C
communication failures
Viperboard:
- Drop redundant USB device references"
* tag 'mfd-next-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (28 commits)
mfd: core: Preserve OF node when ACPI handle is present
mfd: ene-kb3930: Use of_device_is_system_power_controller() wrapper
mfd: intel-lpss: Add Intel Nova Lake-H PCI IDs
dt-bindings: mfd: max77620: Document optional RTC address for MAX77663
dt-bindings: mfd: max77620: Convert to DT schema
mfd: ezx-pcap: Avoid rescheduling after destroying workqueue
mfd: ezx-pcap: Return directly instead of empty gotos
mfd: ezx-pcap: Drop memory allocation error message
mfd: bcm2835-pm: Add BCM2712 PM device support
mfd: bcm2835-pm: Introduce SoC-specific type identifier
dt-bindings: mfd: bd72720: Add ROHM BD73900
mfd: si476x: Fix kernel-doc warnings
mfd: rsmu: Remove a empty kernel-doc line
mfd: kempld: Fix kernel-doc struct member names
mfd: congatec: Fix kernel-doc struct member names
dt-bindings: mfd: Convert fsl-imx25-tsadc.txt to yaml format
mfd: viperboard: Drop redundant device reference
mfd: dln2: Switch to managed resources and fix bare unsigned types
mfd: macsmc: Wire up Apple SMC power driver
mfd: mt6397: Properly fix CID of MT6328, MT6331 and MT6332
...
|
|
Pull rdma updates from Jason Gunthorpe:
"The usual collection of driver changes, more core infrastructure
updates that typical this cycle:
- Minor cleanups and kernel-doc fixes in bnxt_re, hns, rdmavt, efa,
ocrdma, erdma, rtrs, hfi1, ionic, and pvrdma
- New udata validation framework and driver updates
- Modernize CQ creation interface in mlx4 and mlx5, manage CQ umem in
core
- Promote UMEM to a core component, split out DMA block iterator
logic
- Introduce FRMR pools with aging, statistics, pinned handles, and
netlink control and use it in mlx5
- Add PCIe TLP emulation support in mlx5
- Extend umem to work with revocable pinned dmabuf's and use it in
irdma
- More net namespace improvements for rxe
- GEN4 hardware support in irdma
- First steps to MW and UC support in mana_ib
- Support for CQ umem and doorbells in bnxt_re
- Drop opa_vnic driver from hfi1
Fixes:
- IB/core zero dmac neighbor resolution race
- GID table memory free
- rxe pad/ICRC validation and r_key async errors
- mlx4 external umem for CQ
- umem DMA attributes on unmap
- mana_ib RX steering on RSS QP destroy"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (116 commits)
RDMA/core: Fix user CQ creation for drivers without create_cq
RDMA/ionic: bound node_desc sysfs read with %.64s
IB/core: Fix zero dmac race in neighbor resolution
RDMA/mana_ib: Support memory windows
RDMA/rxe: Validate pad and ICRC before payload_size() in rxe_rcv
RDMA/core: Prefer NLA_NUL_STRING
RDMA/core: Fix memory free for GID table
RDMA/hns: Remove the duplicate calls to ib_copy_validate_udata_in()
RDMA: Remove redundant = {} for udata req structs
RDMA/irdma: Add missing comp_mask check in alloc_ucontext
RDMA/hns: Add missing comp_mask check in create_qp
RDMA/mlx5: Pull comp_mask validation into ib_copy_validate_udata_in_cm()
RDMA: Use ib_copy_validate_udata_in_cm() for zero comp_mask
RDMA/hns: Use ib_copy_validate_udata_in()
RDMA/mlx4: Use ib_copy_validate_udata_in() for QP
RDMA/mlx4: Use ib_copy_validate_udata_in()
RDMA/mlx5: Use ib_copy_validate_udata_in() for MW
RDMA/mlx5: Use ib_copy_validate_udata_in() for SRQ
RDMA/pvrdma: Use ib_copy_validate_udata_in() for srq
RDMA: Use ib_copy_validate_udata_in() for implicit full structs
...
|
|
Pull nfsd updates from Chuck Lever:
- filehandle signing to defend against filehandle-guessing attacks
(Benjamin Coddington)
The server now appends a SipHash-2-4 MAC to each filehandle when
the new "sign_fh" export option is enabled. NFSD then verifies
filehandles received from clients against the expected MAC;
mismatches return NFS error STALE
- convert the entire NLMv4 server-side XDR layer from hand-written C to
xdrgen-generated code, spanning roughly thirty patches (Chuck Lever)
XDR functions are generally boilerplate code and are easy to get
wrong. The goals of this conversion are improved memory safety, lower
maintenance burden, and groundwork for eventual Rust code generation
for these functions.
- improve pNFS block/SCSI layout robustness with two related changes
(Dai Ngo)
SCSI persistent reservation fencing is now tracked per client and
per device via an xarray, to avoid both redundant preempt operations
on devices already fenced and a potential NFSD deadlock when all nfsd
threads are waiting for a layout return.
- scalability and infrastructure improvements
Sincere thanks to all contributors, reviewers, testers, and bug
reporters who participated in the v7.1 NFSD development cycle.
* tag 'nfsd-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (83 commits)
NFSD: Docs: clean up pnfs server timeout docs
nfsd: fix comment typo in nfsxdr
nfsd: fix comment typo in nfs3xdr
NFSD: convert callback RPC program to per-net namespace
NFSD: use per-operation statidx for callback procedures
svcrdma: Use contiguous pages for RDMA Read sink buffers
SUNRPC: Add svc_rqst_page_release() helper
SUNRPC: xdr.h: fix all kernel-doc warnings
svcrdma: Factor out WR chain linking into helper
svcrdma: Add Write chunk WRs to the RPC's Send WR chain
svcrdma: Clean up use of rdma->sc_pd->device
svcrdma: Clean up use of rdma->sc_pd->device in Receive paths
svcrdma: Add fair queuing for Send Queue access
SUNRPC: Optimize rq_respages allocation in svc_alloc_arg
SUNRPC: Track consumed rq_pages entries
svcrdma: preserve rq_next_page in svc_rdma_save_io_pages
SUNRPC: Handle NULL entries in svc_rqst_release_pages
SUNRPC: Allocate a separate Reply page array
SUNRPC: Tighten bounds checking in svc_rqst_replace_page
NFSD: Sign filehandles
...
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fixes from Danilo Krummrich:
- Prevent a device from being probed before device_add() has finished
initializing it; gate probe with a "ready_to_probe" device flag to
avoid races with concurrent driver_register() calls
- Fix a kernel-doc warning for DEV_FLAG_COUNT introduced by the above
- Return -ENOTCONN from software_node_get_reference_args() when a
referenced software node is known but not yet registered, allowing
callers to defer probe
- In sysfs_group_attrs_change_owner(), also check is_visible_const();
missed when the const variant was introduced
* tag 'driver-core-7.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
driver core: Add kernel-doc for DEV_FLAG_COUNT enum value
sysfs: attribute_group: Respect is_visible_const() when changing owner
software node: return -ENOTCONN when referenced swnode is not registered yet
driver core: Don't let a device probe until it's ready
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt updates from Greg KH:
"Here is the big set of USB and Thunderbolt changes for 7.1-rc1.
Lots of little things in here, nothing major, just constant
improvements, updates, and new features. Highlights are:
- new USB power supply driver support.
These changes did touch outside of drivers/usb/ but got acks from
the relevant mantainers for them.
- dts file updates and conversions
- string function conversions into "safer" ones
- new device quirks
- xhci driver updates
- usb gadget driver minor fixes
- typec driver additions and updates
- small number of thunderbolt driver changes
- dwc3 driver updates and additions of new hardware support
- other minor driver updates
All of these have been in the linux-next tree for a while with no
reported issues"
* tag 'usb-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (176 commits)
usb: dwc3: starfive: Add JHB100 USB 2.0 DRD controller
dt-bindings: usb: dwc3: add support for StarFive JHB100
dt-bindings: usb: atmel,at91sam9rl-udc: convert to DT schema
dt-bindings: usb: atmel,at91rm9200-udc: convert to DT schema
dt-bindings: usb: generic-ehci: fix schema structure and add at91sam9g45 constraints
dt-bindings: usb: generic-ohci: add AT91RM9200 OHCI binding support
arm: dts: at91: remove unused #address-cells/#size-cells from sam9x60 udc node
drivers/usb/host: Fix spelling error 'seperate' -> 'separate'
usbip: tools: add hint when no exported devices are found
USB: serial: iuu_phoenix: fix iuutool author name
usb: gadget: f_ncm: validate minimum block_len in ncm_unwrap_ntb()
usb: gadget: f_phonet: fix skb frags[] overflow in pn_rx_complete()
usb: gadget: f_hid: Add missing error code
usb: typec: cros_ec_ucsi: Load driver from OF and ACPI definitions
dt-bindings: chrome: Add cros-ec-ucsi compatibility to typec binding
USB: of: Simplify with scoped for each OF child loop
usbip: validate number_of_packets in usbip_pack_ret_submit()
usb: gadget: renesas_usb3: validate endpoint index in standard request handlers
usb: core: config: reverse the size check of the SSP isoc endpoint descriptor
usb: typec: ucsi: Set usb mode on partner change
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial updates from Greg KH:
"Here is the set of tty and serial driver changes for 7.1-rc1.
Not much here this cycle, biggest thing is the removal of an old
driver that never got any actual hardware support (esp32), and the
second try to moving the tty ports to their own workqueues (first try
was in 7.0-rc1 but was reverted due to problems)
Otherwise it's just a small set of driver updates and some vt modifier
key enhancements.
All have been in linux-next for a while with no reported issues"
* tag 'tty-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (35 commits)
tty: serial: ip22zilog: Fix section mispatch warning
hvc/xen: Check console connection flag
serial: sh-sci: Add support for RZ/G3L RSCI
dt-bindings: serial: renesas,rsci: Document RZ/G3L SoC
tty: atmel_serial: update outdated reference to atmel_tasklet_func()
serial: xilinx_uartps: Drop unused include
serial: qcom-geni: drop stray newline format specifier
serial: 8250: loongson: Enable building on MIPS Loongson64
dt-bindings: serial: 8250: Add Loongson 3A4000 uart compatible
serial: 8250_fintek: Add support for F81214E
tty: tty_port: add workqueue to flip TTY buffer
vt: support ITU-T T.416 color subparameters
serial: qcom-geni: Fix RTS behavior with flow control
tty: serial: imx: keep dma request disabled before dma transfer setup
tty: serial: 8250: Add SystemBase Multi I/O cards
serial: pic32_uart: allow driver to be compiled on all architectures with COMPILE_TEST
serial: tegra: remove Kconfig dependency on APB DMA controller
dt-bindings: serial: amlogic,meson-uart: Add compatible string for A9
dt-bindings: serial: atmel,at91-usart: add microchip,lan9691-usart
serial: auart: check clk_enable() return in console write
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- "Eliminate Dying Memory Cgroup" (Qi Zheng and Muchun Song)
Address the longstanding "dying memcg problem". A situation wherein a
no-longer-used memory control group will hang around for an extended
period pointlessly consuming memory
- "fix unexpected type conversions and potential overflows" (Qi Zheng)
Fix a couple of potential 32-bit/64-bit issues which were identified
during review of the "Eliminate Dying Memory Cgroup" series
- "kho: history: track previous kernel version and kexec boot count"
(Breno Leitao)
Use Kexec Handover (KHO) to pass the previous kernel's version string
and the number of kexec reboots since the last cold boot to the next
kernel, and print it at boot time
- "liveupdate: prevent double preservation" (Pasha Tatashin)
Teach LUO to avoid managing the same file across different active
sessions
- "liveupdate: Fix module unloading and unregister API" (Pasha
Tatashin)
Address an issue with how LUO handles module reference counting and
unregistration during module unloading
- "zswap pool per-CPU acomp_ctx simplifications" (Kanchana Sridhar)
Simplify and clean up the zswap crypto compression handling and
improve the lifecycle management of zswap pool's per-CPU acomp_ctx
resources
- "mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race"
(SeongJae Park)
Address unlikely but possible leaks and deadlocks in damon_call() and
damon_walk()
- "mm/damon/core: validate damos_quota_goal->nid" (SeongJae Park)
Fix a couple of root-only wild pointer dereferences
- "Docs/admin-guide/mm/damon: warn commit_inputs vs other params race"
(SeongJae Park)
Update the DAMON documentation to warn operators about potential
races which can occur if the commit_inputs parameter is altered at
the wrong time
- "Minor hmm_test fixes and cleanups" (Alistair Popple)
Bugfixes and a cleanup for the HMM kernel selftests
- "Modify memfd_luo code" (Chenghao Duan)
Cleanups, simplifications and speedups to the memfd_lou code
- "mm, kvm: allow uffd support in guest_memfd" (Mike Rapoport)
Support for userfaultfd in guest_memfd
- "selftests/mm: skip several tests when thp is not available" (Chunyu
Hu)
Fix several issues in the selftests code which were causing breakage
when the tests were run on CONFIG_THP=n kernels
- "mm/mprotect: micro-optimization work" (Pedro Falcato)
A couple of nice speedups for mprotect()
- "MAINTAINERS: update KHO and LIVE UPDATE entries" (Pratyush Yadav)
Document upcoming changes in the maintenance of KHO, LUO, memfd_luo,
kexec, crash, kdump and probably other kexec-based things - they are
being moved out of mm.git and into a new git tree
* tag 'mm-stable-2026-04-18-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (121 commits)
MAINTAINERS: add page cache reviewer
mm/vmscan: avoid false-positive -Wuninitialized warning
MAINTAINERS: update Dave's kdump reviewer email address
MAINTAINERS: drop include/linux/liveupdate from LIVE UPDATE
MAINTAINERS: drop include/linux/kho/abi/ from KHO
MAINTAINERS: update KHO and LIVE UPDATE maintainers
MAINTAINERS: update kexec/kdump maintainers entries
mm/migrate_device: remove dead migration entry check in migrate_vma_collect_huge_pmd()
selftests: mm: skip charge_reserved_hugetlb without killall
userfaultfd: allow registration of ranges below mmap_min_addr
mm/vmstat: fix vmstat_shepherd double-scheduling vmstat_update
mm/hugetlb: fix early boot crash on parameters without '=' separator
zram: reject unrecognized type= values in recompress_store()
docs: proc: document ProtectionKey in smaps
mm/mprotect: special-case small folios when applying permissions
mm/mprotect: move softleaf code out of the main function
mm: remove '!root_reclaim' checking in should_abort_scan()
mm/sparse: fix comment for section map alignment
mm/page_io: use sio->len for PSWPIN accounting in sio_read_complete()
selftests/mm: transhuge_stress: skip the test when thp not available
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control updates from Linus Walleij:
"Core changes:
- Perform basic checks on pin config properties so as not to allow
directly contradictory settings such as setting a pin to more than
one bias or drive mode
- Handle input-threshold-voltage-microvolt property
- Introduce pinctrl_gpio_get_config() handling in the core for SCMI
GPIO using pin control
New drivers:
- GPIO-by-pin control driver (also appearing in the GPIO pull
request) fulfilling a promise on a comment from Grant Likely many
years ago: "can't GPIO just be a front-end for pin control?" it
turns out it can, if and only if you design something new from
scratch, such as SCMI
- Broadcom BCM7038 as a pinctrl-single delegate
- Mobileye EyeQ6Lplus OLB pin controller
- Qualcomm Eliza and Hawi families TLMM pin controllers
- Qualcomm SDM670 and Milos family LPASS LPI pin controllers
- Qualcomm IPQ5210 pin controller
- Realtek RTD1625 pin controller support
- Rockchip RV1103B pin controller support
- Texas Instruments AM62L as a pinctrl-single delegate
Improvements:
- Set config implementation for the Spacemit K1 pin controller"
* tag 'pinctrl-v7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (84 commits)
pinctrl: qcom: Add Hawi pinctrl driver
dt-bindings: pinctrl: qcom: Describe Hawi TLMM block
dt-bindings: pinctrl: pinctrl-max77620: convert to DT schema
pinctrl: single: Add bcm7038-padconf compatible matching
dt-bindings: pinctrl: pinctrl-single: Add brcm,bcm7038-padconf
dt-bindings: pinctrl: apple,pinctrl: Add t8122 compatible
pinctrl: qcom: sdm670-lpass-lpi: label variables as static
pinctrl: sophgo: pinctrl-sg2044: Fix wrong module description
pinctrl: sophgo: pinctrl-sg2042: Fix wrong module description
pinctrl: qcom: add sdm670 lpi tlmm
dt-bindings: pinctrl: qcom: Add SDM670 LPASS LPI pinctrl
dt-bindings: qcom: lpass-lpi-common: add reserved GPIOs property
pinctrl: qcom: Introduce IPQ5210 TLMM driver
dt-bindings: pinctrl: qcom: add IPQ5210 pinctrl
pinctrl: qcom: Drop redundant intr_target_reg on modern SoCs
pinctrl: qcom: eliza: Fix interrupt target bit
pinctrl: core: Don't use "proxy" headers
pinctrl: amd: Support new ACPI ID AMDI0033
pinctrl: renesas: rzg2l: Drop superfluous blank line
pinctrl: renesas: rzg2l: Fix save/restore of {IOLH,IEN,PUPD,SMT} registers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock updates from Mike Rapoport:
- improve debuggability of reserve_mem kernel parameter handling with
print outs in case of a failure and debugfs info showing what was
actually reserved
- Make memblock_free_late() and free_reserved_area() use the same core
logic for freeing the memory to buddy and ensure it takes care of
updating memblock arrays when ARCH_KEEP_MEMBLOCK is enabled.
* tag 'memblock-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
x86/alternative: delay freeing of smp_locks section
memblock: warn when freeing reserved memory before memory map is initialized
memblock, treewide: make memblock_free() handle late freeing
memblock: make free_reserved_area() update memblock if ARCH_KEEP_MEMBLOCK=y
memblock: extract page freeing from free_reserved_area() into a helper
memblock: make free_reserved_area() more robust
mm: move free_reserved_area() to mm/memblock.c
powerpc: opal-core: pair alloc_pages_exact() with free_pages_exact()
powerpc: fadump: pair alloc_pages_exact() with free_pages_exact()
memblock: reserve_mem: fix end caclulation in reserve_mem_release_by_name()
memblock: move reserve_bootmem_range() to memblock.c and make it static
memblock: Add reserve_mem debugfs info
memblock: Print out errors on reserve_mem parser
|
|
The comment in mmzone.h currently details exhaustive per-architecture
bit-width lists and explains alignment using min(PAGE_SHIFT,
PFN_SECTION_SHIFT). Such details risk falling out of date over time and
may inadvertently be left un-updated.
We always expect a single section to cover full pages. Therefore, we can
safely assume that PFN_SECTION_SHIFT is large enough to accommodate
SECTION_MAP_LAST_BIT. We use BUILD_BUG_ON() to ensure this.
Update the comment to accurately reflect this consensus, making it clear
that we rely on a single section covering full pages.
Link: https://lore.kernel.org/20260402102320.3617578-1-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Petr Tesarik <ptesarik@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add filemap_add() and filemap_remove() methods to vm_uffd_ops and use them
in __mfill_atomic_pte() to add shmem folios to page cache and remove them
in case of error.
Implement these methods in shmem along with vm_uffd_ops->alloc_folio() and
drop shmem_mfill_atomic_pte().
Since userfaultfd now does not reference any functions from shmem, drop
include if linux/shmem_fs.h from mm/userfaultfd.c
mfill_atomic_install_pte() is not used anywhere outside of mm/userfaultfd,
make it static.
Link: https://lore.kernel.org/20260402041156.1377214-11-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: James Houghton <jthoughton@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand (Arm) <david@kernel.org>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Harry Yoo (Oracle) <harry@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nikita Kalyazin <kalyazin@amazon.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Carlier <devnexen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
and use it to refactor mfill_atomic_pte_zeroed_folio() and
mfill_atomic_pte_copy().
mfill_atomic_pte_zeroed_folio() and mfill_atomic_pte_copy() perform
almost identical actions:
* allocate a folio
* update folio contents (either copy from userspace of fill with zeros)
* update page tables with the new folio
Split a __mfill_atomic_pte() helper that handles both cases and uses newly
introduced vm_uffd_ops->alloc_folio() to allocate the folio.
Pass the ops structure from the callers to __mfill_atomic_pte() to later
allow using anon_uffd_ops for MAP_PRIVATE mappings of file-backed VMAs.
Note, that the new ops method is called alloc_folio() rather than
folio_alloc() to avoid clash with alloc_tag macro folio_alloc().
Link: https://lore.kernel.org/20260402041156.1377214-10-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: James Houghton <jthoughton@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand (Arm) <david@kernel.org>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Harry Yoo (Oracle) <harry@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nikita Kalyazin <kalyazin@amazon.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Carlier <devnexen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When userspace resolves a page fault in a shmem VMA with UFFDIO_CONTINUE
it needs to get a folio that already exists in the pagecache backing that
VMA.
Instead of using shmem_get_folio() for that, add a get_folio_noalloc()
method to 'struct vm_uffd_ops' that will return a folio if it exists in
the VMA's pagecache at given pgoff.
Implement get_folio_noalloc() method for shmem and slightly refactor
userfaultfd's mfill_get_vma() and mfill_atomic_pte_continue() to support
this new API.
Link: https://lore.kernel.org/20260402041156.1377214-9-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: James Houghton <jthoughton@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand (Arm) <david@kernel.org>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Harry Yoo (Oracle) <harry@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nikita Kalyazin <kalyazin@amazon.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Carlier <devnexen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Current userfaultfd implementation works only with memory managed by core
MM: anonymous, shmem and hugetlb.
First, there is no fundamental reason to limit userfaultfd support only to
the core memory types and userfaults can be handled similarly to regular
page faults provided a VMA owner implements appropriate callbacks.
Second, historically various code paths were conditioned on
vma_is_anonymous(), vma_is_shmem() and is_vm_hugetlb_page() and some of
these conditions can be expressed as operations implemented by a
particular memory type.
Introduce vm_uffd_ops extension to vm_operations_struct that will delegate
memory type specific operations to a VMA owner.
Operations for anonymous memory are handled internally in userfaultfd
using anon_uffd_ops that implicitly assigned to anonymous VMAs.
Start with a single operation, ->can_userfault() that will verify that a
VMA meets requirements for userfaultfd support at registration time.
Implement that method for anonymous, shmem and hugetlb and move relevant
parts of vma_can_userfault() into the new callbacks.
[rppt@kernel.org: relocate VM_DROPPABLE test, per Tal]
Link: https://lore.kernel.org/adffgfM5ANxtPIEF@kernel.org
Link: https://lore.kernel.org/20260402041156.1377214-8-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand (Arm) <david@kernel.org>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Harry Yoo (Oracle) <harry@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nikita Kalyazin <kalyazin@amazon.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Carlier <devnexen@gmail.com>
Cc: Tal Zussman <tz2294@columbia.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
vma_can_userfault() has grown pretty big and it's not called on
performance critical path.
Move it out of line.
No functional changes.
Link: https://lore.kernel.org/20260402041156.1377214-7-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Harry Yoo (Oracle) <harry@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nikita Kalyazin <kalyazin@amazon.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Carlier <devnexen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When kdamond_fn() main loop is finished, the function cancels remaining
damos_walk() request and unset the damon_ctx->kdamond so that API callers
and API functions themselves can show the context is terminated.
damos_walk() adds the caller's request to the queue first. After that, it
shows if the kdamond of the damon_ctx is still running (damon_ctx->kdamond
is set). Only if the kdamond is running, damos_walk() starts waiting for
the kdamond's handling of the newly added request.
The damos_walk() requests registration and damon_ctx->kdamond unset are
protected by different mutexes, though. Hence, damos_walk() could race
with damon_ctx->kdamond unset, and result in deadlocks.
For example, let's suppose kdamond successfully finished the damow_walk()
request cancelling. Right after that, damos_walk() is called for the
context. It registers the new request, and shows the context is still
running, because damon_ctx->kdamond unset is not yet done. Hence the
damos_walk() caller starts waiting for the handling of the request.
However, the kdamond is already on the termination steps, so it never
handles the new request. As a result, the damos_walk() caller thread
infinitely waits.
Fix this by introducing another damon_ctx field, namely
walk_control_obsolete. It is protected by the
damon_ctx->walk_control_lock, which protects damos_walk() request
registration. Initialize (unset) it in kdamond_fn() before letting
damon_start() returns and set it just before the cancelling of the
remaining damos_walk() request is executed. damos_walk() reads the
obsolete field under the lock and avoids adding a new request.
After this change, only requests that are guaranteed to be handled or
cancelled are registered. Hence the after-registration DAMON context
termination check is no longer needed. Remove it together.
The issue is found by sashiko [1].
Link: https://lore.kernel.org/20260327233319.3528-3-sj@kernel.org
Link: https://lore.kernel.org/20260325141956.87144-1-sj@kernel.org [1]
Fixes: bf0eaba0ff9c ("mm/damon/core: implement damos_walk()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 6.14.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit
race".
damon_call() and damos_walk() can leak memory and/or deadlock when they
race with kdamond terminations. Fix those.
This patch (of 2);
When kdamond_fn() main loop is finished, the function cancels all
remaining damon_call() requests and unset the damon_ctx->kdamond so that
API callers and API functions themselves can know the context is
terminated. damon_call() adds the caller's request to the queue first.
After that, it shows if the kdamond of the damon_ctx is still running
(damon_ctx->kdamond is set). Only if the kdamond is running, damon_call()
starts waiting for the kdamond's handling of the newly added request.
The damon_call() requests registration and damon_ctx->kdamond unset are
protected by different mutexes, though. Hence, damon_call() could race
with damon_ctx->kdamond unset, and result in deadlocks.
For example, let's suppose kdamond successfully finished the damon_call()
requests cancelling. Right after that, damon_call() is called for the
context. It registers the new request, and shows the context is still
running, because damon_ctx->kdamond unset is not yet done. Hence the
damon_call() caller starts waiting for the handling of the request.
However, the kdamond is already on the termination steps, so it never
handles the new request. As a result, the damon_call() caller threads
infinitely waits.
Fix this by introducing another damon_ctx field, namely
call_controls_obsolete. It is protected by the
damon_ctx->call_controls_lock, which protects damon_call() requests
registration. Initialize (unset) it in kdamond_fn() before letting
damon_start() returns and set it just before the cancelling of remaining
damon_call() requests is executed. damon_call() reads the obsolete field
under the lock and avoids adding a new request.
After this change, only requests that are guaranteed to be handled or
cancelled are registered. Hence the after-registration DAMON context
termination check is no longer needed. Remove it together.
Note that the deadlock will not happen when damon_call() is called for
repeat mode request. In tis case, damon_call() returns instead of waiting
for the handling when the request registration succeeds and it shows the
kdamond is running. However, if the request also has dealloc_on_cancel,
the request memory would be leaked.
The issue is found by sashiko [1].
Link: https://lore.kernel.org/20260327233319.3528-1-sj@kernel.org
Link: https://lore.kernel.org/20260327233319.3528-2-sj@kernel.org
Link: https://lore.kernel.org/20260325141956.87144-1-sj@kernel.org [1]
Fixes: 42b7491af14c ("mm/damon/core: introduce damon_call()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 6.14.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Due to initialization ordering, page_ext is allocated and initialized
relatively late during boot. Some pages have already been allocated and
freed before page_ext becomes available, leaving their codetag
uninitialized.
A clear example is in init_section_page_ext(): alloc_page_ext() calls
kmemleak_alloc(). If the slab cache has no free objects, it falls back to
the buddy allocator to allocate memory. However, at this point page_ext
is not yet fully initialized, so these newly allocated pages have no
codetag set. These pages may later be reclaimed by KASAN, which causes
the warning to trigger when they are freed because their codetag ref is
still empty.
Use a global array to track pages allocated before page_ext is fully
initialized. The array size is fixed at 8192 entries, and will emit a
warning if this limit is exceeded. When page_ext initialization
completes, set their codetag to empty to avoid warnings when they are
freed later.
This warning is only observed with CONFIG_MEM_ALLOC_PROFILING_DEBUG=Y and
mem_profiling_compressed disabled:
[ 9.582133] ------------[ cut here ]------------
[ 9.582137] alloc_tag was not set
[ 9.582139] WARNING: ./include/linux/alloc_tag.h:164 at __pgalloc_tag_sub+0x40f/0x550, CPU#5: systemd/1
[ 9.582190] CPU: 5 UID: 0 PID: 1 Comm: systemd Not tainted 7.0.0-rc4 #1 PREEMPT(lazy)
[ 9.582192] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 9.582194] RIP: 0010:__pgalloc_tag_sub+0x40f/0x550
[ 9.582196] Code: 00 00 4c 29 e5 48 8b 05 1f 88 56 05 48 8d 4c ad 00 48 8d 2c c8 e9 87 fd ff ff 0f 0b 0f 0b e9 f3 fe ff ff 48 8d 3d 61 2f ed 03 <67> 48 0f b9 3a e9 b3 fd ff ff 0f 0b eb e4 e8 5e cd 14 02 4c 89 c7
[ 9.582197] RSP: 0018:ffffc9000001f940 EFLAGS: 00010246
[ 9.582200] RAX: dffffc0000000000 RBX: 1ffff92000003f2b RCX: 1ffff110200d806c
[ 9.582201] RDX: ffff8881006c0360 RSI: 0000000000000004 RDI: ffffffff9bc7b460
[ 9.582202] RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff3a62324
[ 9.582203] R10: ffffffff9d311923 R11: 0000000000000000 R12: ffffea0004001b00
[ 9.582204] R13: 0000000000002000 R14: ffffea0000000000 R15: ffff8881006c0360
[ 9.582206] FS: 00007ffbbcf2d940(0000) GS:ffff888450479000(0000) knlGS:0000000000000000
[ 9.582208] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9.582210] CR2: 000055ee3aa260d0 CR3: 0000000148b67005 CR4: 0000000000770ef0
[ 9.582211] PKRU: 55555554
[ 9.582212] Call Trace:
[ 9.582213] <TASK>
[ 9.582214] ? __pfx___pgalloc_tag_sub+0x10/0x10
[ 9.582216] ? check_bytes_and_report+0x68/0x140
[ 9.582219] __free_frozen_pages+0x2e4/0x1150
[ 9.582221] ? __free_slab+0xc2/0x2b0
[ 9.582224] qlist_free_all+0x4c/0xf0
[ 9.582227] kasan_quarantine_reduce+0x15d/0x180
[ 9.582229] __kasan_slab_alloc+0x69/0x90
[ 9.582232] kmem_cache_alloc_noprof+0x14a/0x500
[ 9.582234] do_getname+0x96/0x310
[ 9.582237] do_readlinkat+0x91/0x2f0
[ 9.582239] ? __pfx_do_readlinkat+0x10/0x10
[ 9.582240] ? get_random_bytes_user+0x1df/0x2c0
[ 9.582244] __x64_sys_readlinkat+0x96/0x100
[ 9.582246] do_syscall_64+0xce/0x650
[ 9.582250] ? __x64_sys_getrandom+0x13a/0x1e0
[ 9.582252] ? __pfx___x64_sys_getrandom+0x10/0x10
[ 9.582254] ? do_syscall_64+0x114/0x650
[ 9.582255] ? ksys_read+0xfc/0x1d0
[ 9.582258] ? __pfx_ksys_read+0x10/0x10
[ 9.582260] ? do_syscall_64+0x114/0x650
[ 9.582262] ? do_syscall_64+0x114/0x650
[ 9.582264] ? __pfx_fput_close_sync+0x10/0x10
[ 9.582266] ? file_close_fd_locked+0x178/0x2a0
[ 9.582268] ? __x64_sys_faccessat2+0x96/0x100
[ 9.582269] ? __x64_sys_close+0x7d/0xd0
[ 9.582271] ? do_syscall_64+0x114/0x650
[ 9.582273] ? do_syscall_64+0x114/0x650
[ 9.582275] ? clear_bhb_loop+0x50/0xa0
[ 9.582277] ? clear_bhb_loop+0x50/0xa0
[ 9.582279] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 9.582280] RIP: 0033:0x7ffbbda345ee
[ 9.582282] Code: 0f 1f 40 00 48 8b 15 29 38 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 0f 1f 40 00 f3 0f 1e fa 49 89 ca b8 0b 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fa 37 0d 00 f7 d8 64 89 01 48
[ 9.582284] RSP: 002b:00007ffe2ad8de58 EFLAGS: 00000202 ORIG_RAX: 000000000000010b
[ 9.582286] RAX: ffffffffffffffda RBX: 000055ee3aa25570 RCX: 00007ffbbda345ee
[ 9.582287] RDX: 000055ee3aa25570 RSI: 00007ffe2ad8dee0 RDI: 00000000ffffff9c
[ 9.582288] RBP: 0000000000001000 R08: 0000000000000003 R09: 0000000000001001
[ 9.582289] R10: 0000000000001000 R11: 0000000000000202 R12: 0000000000000033
[ 9.582290] R13: 00007ffe2ad8dee0 R14: 00000000ffffff9c R15: 00007ffe2ad8deb0
[ 9.582292] </TASK>
[ 9.582293] ---[ end trace 0000000000000000 ]---
Link: https://lore.kernel.org/20260331081312.123719-1-hao.ge@linux.dev
Fixes: dcfe378c81f72 ("lib: introduce support for page allocation tagging")
Signed-off-by: Hao Ge <hao.ge@linux.dev>
Suggested-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Change liveupdate_unregister_file_handler and liveupdate_unregister_flb to
return void instead of an error code. This follows the design principle
that unregistration during module unload should not fail, as the unload
cannot be stopped at that point.
Link: https://lore.kernel.org/20260327033335.696621-10-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org>
Cc: David Matlack <dmatlack@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Samiullah Khawaja <skhawaja@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Because liveupdate FLB objects will soon drop their persistent module
references when registered, list traversals must be protected against
concurrent module unloading.
To provide this protection, utilize the global luo_register_rwlock. It
protects the global registry of FLBs and the handler's specific list of
FLB dependencies.
Read locks are used during concurrent list traversals (e.g., during
preservation and serialization). Write locks are taken during
registration and unregistration.
Link: https://lore.kernel.org/20260327033335.696621-5-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org>
Cc: David Matlack <dmatlack@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Samiullah Khawaja <skhawaja@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "liveupdate: prevent double preservation", v4.
Currently, LUO does not prevent the same file from being managed twice
across different active sessions.
Because LUO preserves files of absolutely different types: memfd, and
upcoming vfiofd [1], iommufd [2], guestmefd (and possible kvmfd/cpufd).
There is no common private data or guarantee on how to prevent that the
same file is not preserved twice beside using inode or some slower and
expensive method like hashtables.
This patch (of 4)
Currently, LUO does not prevent the same file from being managed twice
across different active sessions.
Use a global xarray luo_preserved_files to keep track of file identifiers
being preserved by LUO. Update luo_preserve_file() to check and insert
the file identifier into this xarray when it is preserved, and erase it in
luo_file_unpreserve_files() when it is released.
To allow handlers to define what constitutes a "unique" file (e.g.,
different struct file objects pointing to the same hardware resource), add
a get_id() callback to struct liveupdate_file_ops. If not provided, the
default identifier is the struct file pointer itself.
This ensures that the same file (or resource) cannot be managed by
multiple sessions. If another session attempts to preserve an already
managed file, it will now fail with -EBUSY.
Link: https://lore.kernel.org/20260326163943.574070-1-pasha.tatashin@soleen.com
Link: https://lore.kernel.org/20260326163943.574070-2-pasha.tatashin@soleen.com
Link: https://lore.kernel.org/all/20260129212510.967611-1-dmatlack@google.com [1]
Link: https://lore.kernel.org/all/20260203220948.2176157-1-skhawaja@google.com [2]
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: David Matlack <dmatlack@google.com>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Use Kexec Handover (KHO) to pass the previous kernel's version string and
the number of kexec reboots since the last cold boot to the next kernel,
and print it at boot time.
Example output:
[ 0.000000] KHO: exec from: 6.19.0-rc4-next-20260107 (count 1)
Motivation
==========
Bugs that only reproduce when kexecing from specific kernel versions are
difficult to diagnose. These issues occur when a buggy kernel kexecs into
a new kernel, with the bug manifesting only in the second kernel.
Recent examples include the following commits:
* commit eb2266312507 ("x86/boot: Fix page table access in
5-level to 4-level paging transition")
* commit 77d48d39e991 ("efistub/tpm: Use ACPI reclaim memory
for event log to avoid corruption")
* commit 64b45dd46e15 ("x86/efi: skip memattr table on kexec
boot")
As kexec-based reboots become more common, these version-dependent bugs
are appearing more frequently. At scale, correlating crashes to the
previous kernel version is challenging, especially when issues only occur
in specific transition scenarios.
Implementation
==============
The kexec metadata is stored as a plain C struct (struct
kho_kexec_metadata) rather than FDT format, for simplicity and direct
field access. It is registered via kho_add_subtree() as a separate
subtree, keeping it independent from the core KHO ABI. This design
choice:
- Keeps the core KHO ABI minimal and stable
- Allows the metadata format to evolve independently
- Avoids requiring version bumps for all KHO consumers (LUO, etc.)
when the metadata format changes
The struct kho_kexec_metadata contains two fields:
- previous_release: The kernel version that initiated the kexec
- kexec_count: Number of kexec boots since last cold boot
On cold boot, kexec_count starts at 0 and increments with each kexec. The
count helps identify issues that only manifest after multiple consecutive
kexec reboots.
[leitao@debian.org: call kho_kexec_metadata_init() for both boot paths]
Link: https://lore.kernel.org/all/20260309-kho-v8-5-c3abcf4ac750@debian.org/ [1]
Link: https://lore.kernel.org/20260409-kho_fix_merge_issue-v1-1-710c84ceaa85@debian.org
Link: https://lore.kernel.org/20260316-kho-v9-5-ed6dcd951988@debian.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
kho_add_subtree() accepts a size parameter but only forwards it to
debugfs. The size is not persisted in the KHO FDT, so it is lost across
kexec. This makes it impossible for the incoming kernel to determine the
blob size without understanding the blob format.
Store the blob size as a "blob-size" property in the KHO FDT alongside the
"preserved-data" physical address. This allows the receiving kernel to
recover the size for any blob regardless of format.
Also extend kho_retrieve_subtree() with an optional size output parameter
so callers can learn the blob size without needing to understand the blob
format. Update all callers to pass NULL for the new parameter.
Link: https://lore.kernel.org/20260316-kho-v9-3-ed6dcd951988@debian.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Since kho_add_subtree() now accepts arbitrary data blobs (not just FDTs),
rename the parameter from 'fdt' to 'blob' to better reflect its purpose.
Apply the same rename to kho_remove_subtree() for consistency.
Also rename kho_debugfs_fdt_add() and kho_debugfs_fdt_remove() to
kho_debugfs_blob_add() and kho_debugfs_blob_remove() respectively, with
the same parameter rename from 'fdt' to 'blob'.
Link: https://lore.kernel.org/20260316-kho-v9-2-ed6dcd951988@debian.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "kho: history: track previous kernel version and kexec boot
count", v9.
Use Kexec Handover (KHO) to pass the previous kernel's version string and
the number of kexec reboots since the last cold boot to the next kernel,
and print it at boot time.
Example
=======
[ 0.000000] Linux version 6.19.0-rc3-upstream-00047-ge5d992347849
...
[ 0.000000] KHO: exec from: 6.19.0-rc4-next-20260107upstream-00004-g3071b0dc4498 (count 1)
Motivation
==========
Bugs that only reproduce when kexecing from specific kernel versions are
difficult to diagnose. These issues occur when a buggy kernel kexecs into
a new kernel, with the bug manifesting only in the second kernel.
Recent examples include:
* eb2266312507 ("x86/boot: Fix page table access in 5-level to 4-level paging transition")
* 77d48d39e991 ("efistub/tpm: Use ACPI reclaim memory for event log to avoid corruption")
* 64b45dd46e15 ("x86/efi: skip memattr table on kexec boot")
As kexec-based reboots become more common, these version-dependent bugs
are appearing more frequently. At scale, correlating crashes to the
previous kernel version is challenging, especially when issues only occur
in specific transition scenarios.
Some bugs manifest only after multiple consecutive kexec reboots.
Tracking the kexec count helps identify these cases (this metric is
already used by live update sub-system).
KHO provides a reliable mechanism to pass information between kernels. By
carrying the previous kernel's release string and kexec count forward, we
can print this context at boot time to aid debugging.
The goal of this feature is to have this information being printed in
early boot, so, users can trace back kernel releases in kexec. Systemd is
not helpful because we cannot assume that the previous kernel has systemd
or even write access to the disk (common when using Linux as bootloaders)
This patch (of 6):
kho_add_subtree() assumes the fdt argument is always an FDT and calls
fdt_totalsize() on it in the debugfs code path. This assumption will
break if a caller passes arbitrary data instead of an FDT.
When CONFIG_KEXEC_HANDOVER_DEBUGFS is enabled, kho_debugfs_fdt_add() calls
__kho_debugfs_fdt_add(), which executes:
f->wrapper.size = fdt_totalsize(fdt);
Fix this by adding an explicit size parameter to kho_add_subtree() so
callers specify the blob size. This allows subtrees to contain arbitrary
data formats, not just FDTs. Update all callers:
- memblock.c: use fdt_totalsize(fdt)
- luo_core.c: use fdt_totalsize(fdt_out)
- test_kho.c: use fdt_totalsize()
- kexec_handover.c (root fdt): use fdt_totalsize(kho_out.fdt)
Also update __kho_debugfs_fdt_add() to receive the size explicitly instead
of computing it internally via fdt_totalsize(). In kho_in_debugfs_init(),
pass fdt_totalsize() for the root FDT and sub-blobs since all current
users are FDTs. A subsequent patch will persist the size in the KHO FDT
so the incoming side can handle non-FDT blobs correctly.
Link: https://lore.kernel.org/20260323110747.193569-1-duanchenghao@kylinos.cn
Link: https://lore.kernel.org/20260316-kho-v9-1-ed6dcd951988@debian.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Suggested-by: Pratyush Yadav <pratyush@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
mem_cgroup_update_lru_size()
The nr_pages parameter of mem_cgroup_update_lru_size() represents a page
count. During the reparenting of LRU folios, the value passed to it can
potentially exceed the maximum value of a 32-bit integer. It should be
declared as long instead of int to match the types used in lruvec size
accounting and to prevent possible overflow.
Update the parameter type to long to ensure correctness.
Link: https://lore.kernel.org/fd4140de44fa0a3978e4e2426731187fe8625f0b.1774604356.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
Cc: Allen Pais <apais@linux.microsoft.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We must ensure the folio is deleted from or added to the correct lruvec
list. So, add VM_WARN_ON_ONCE_FOLIO() to catch invalid users. The
VM_BUG_ON_PAGE() in move_pages_to_lru() can be removed as
add_page_to_lru_list() will perform the necessary check.
Link: https://lore.kernel.org/2c90fc006d9d730331a3caeef96f7e5dabe2036d.1772711148.git.zhengqi.arch@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Allen Pais <apais@linux.microsoft.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Wei Xu <weixugc@google.com>
Cc: Yosry Ahmed <yosry@kernel.org>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Now that everything is set up, switch folio->memcg_data pointers to
objcgs, update the accessors, and execute reparenting on cgroup death.
Finally, folio->memcg_data of LRU folios and kmem folios will always point
to an object cgroup pointer. The folio->memcg_data of slab folios will
point to an vector of object cgroups.
Link: https://lore.kernel.org/80cb7af198dc6f2173fe616d1207a4c315ece141.1772711148.git.zhengqi.arch@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Allen Pais <apais@linux.microsoft.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Wei Xu <weixugc@google.com>
Cc: Yosry Ahmed <yosry@kernel.org>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Convert objcg to be per-memcg per-node type, so that when reparent LRU
folios later, we can hold the lru lock at the node level, thus avoiding
holding too many lru locks at once.
[zhengqi.arch@bytedance.com: reset pn->orig_objcg to NULL]
Link: https://lore.kernel.org/20260309112939.31937-1-qi.zheng@linux.dev
[akpm@linux-foundation.org: fix comment typo, per Usama. Reflow comment to 80 cols]
[devnexen@gmail.com: fix obj_cgroup leak in mem_cgroup_css_online() error path]
Link: https://lore.kernel.org/20260322193631.45457-1-devnexen@gmail.com
[devnexen@gmail.com: add newline, per Qi Zheng]
Link: https://lore.kernel.org/20260323063007.7783-1-devnexen@gmail.com
Link: https://lore.kernel.org/56c04b1c5d54f75ccdc12896df6c1ca35403ecc3.1772711148.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: David Carlier <devnexen@gmail.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Allen Pais <apais@linux.microsoft.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Wei Xu <weixugc@google.com>
Cc: Yosry Ahmed <yosry@kernel.org>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Usama Arif <usama.arif@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
For cgroup v2, count_shadow_nodes() is the only place to read
non-hierarchical stats (lruvec_stats->state_local). To avoid the need to
consider cgroup v2 during subsequent non-hierarchical stats reparenting,
use lruvec_lru_size() instead of lruvec_page_state_local() to get the
number of lru pages.
For NR_SLAB_RECLAIMABLE_B and NR_SLAB_UNRECLAIMABLE_B cases, it appears
that the statistics here have already been problematic for a while since
slab pages have been reparented. So just ignore it for now.
Link: https://lore.kernel.org/b1d448c667a8fb377c3390d9aba43bdb7e4d5739.1772711148.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Muchun Song <muchun.song@linux.dev>
Cc: Allen Pais <apais@linux.microsoft.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Wei Xu <weixugc@google.com>
Cc: Yosry Ahmed <yosry@kernel.org>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Similar to traditional LRU folios, in order to solve the dying memcg
problem, we also need to reparenting MGLRU folios to the parent memcg when
memcg offline.
However, there are the following challenges:
1. Each lruvec has between MIN_NR_GENS and MAX_NR_GENS generations, the
number of generations of the parent and child memcg may be different,
so we cannot simply transfer MGLRU folios in the child memcg to the
parent memcg as we did for traditional LRU folios.
2. The generation information is stored in folio->flags, but we cannot
traverse these folios while holding the lru lock, otherwise it may
cause softlockup.
3. In walk_update_folio(), the gen of folio and corresponding lru size
may be updated, but the folio is not immediately moved to the
corresponding lru list. Therefore, there may be folios of different
generations on an LRU list.
4. In lru_gen_del_folio(), the generation to which the folio belongs is
found based on the generation information in folio->flags, and the
corresponding LRU size will be updated. Therefore, we need to update
the lru size correctly during reparenting, otherwise the lru size may
be updated incorrectly in lru_gen_del_folio().
Finally, this patch chose a compromise method, which is to splice the lru
list in the child memcg to the lru list of the same generation in the
parent memcg during reparenting. And in order to ensure that the parent
memcg has the same generation, we need to increase the generations in the
parent memcg to the MAX_NR_GENS before reparenting.
Of course, the same generation has different meanings in the parent and
child memcg, this will cause confusion in the hot and cold information of
folios. But other than that, this method is simple enough, the lru size
is correct, and there is no need to consider some concurrency issues (such
as lru_gen_del_folio()).
To prepare for the above work, this commit implements the specific
functions, which will be used during reparenting.
[zhengqi.arch@bytedance.com: use list_splice_tail_init() to reparent child folios]
Link: https://lore.kernel.org/20260324114937.28569-1-qi.zheng@linux.dev
Link: https://lore.kernel.org/e75050354cdbc42221a04f7cf133292b61105548.1772711148.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Suggested-by: Harry Yoo <harry.yoo@oracle.com>
Suggested-by: Imran Khan <imran.f.khan@oracle.com>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Allen Pais <apais@linux.microsoft.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Wei Xu <weixugc@google.com>
Cc: Yosry Ahmed <yosry@kernel.org>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
To resolve the dying memcg issue, we need to reparent LRU folios of child
memcg to its parent memcg. For traditional LRU list, each lruvec of every
memcg comprises four LRU lists. Due to the symmetry of the LRU lists, it
is feasible to transfer the LRU lists from a memcg to its parent memcg
during the reparenting process.
This commit implements the specific function, which will be used during
the reparenting process.
Link: https://lore.kernel.org/a92d217a9fc82bd0c401210204a095caaf615b1c.1772711148.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Muchun Song <muchun.song@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Allen Pais <apais@linux.microsoft.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Wei Xu <weixugc@google.com>
Cc: Yosry Ahmed <yosry@kernel.org>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The following diagram illustrates how to ensure the safety of the folio
lruvec lock when LRU folios undergo reparenting.
In the folio_lruvec_lock(folio) function:
rcu_read_lock();
retry:
lruvec = folio_lruvec(folio);
/* There is a possibility of folio reparenting at this point. */
spin_lock(&lruvec->lru_lock);
if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
/*
* The wrong lruvec lock was acquired, and a retry is required.
* This is because the folio resides on the parent memcg lruvec
* list.
*/
spin_unlock(&lruvec->lru_lock);
goto retry;
}
/* Reaching here indicates that folio_memcg() is stable. */
In the memcg_reparent_objcgs(memcg) function:
spin_lock(&lruvec->lru_lock);
spin_lock(&lruvec_parent->lru_lock);
/* Transfer folios from the lruvec list to the parent's. */
spin_unlock(&lruvec_parent->lru_lock);
spin_unlock(&lruvec->lru_lock);
After acquiring the lruvec lock, it is necessary to verify whether the
folio has been reparented. If reparenting has occurred, the new lruvec
lock must be reacquired. During the LRU folio reparenting process, the
lruvec lock will also be acquired (this will be implemented in a
subsequent patch). Therefore, folio_memcg() remains unchanged while the
lruvec lock is held.
Given that lruvec_memcg(lruvec) is always equal to folio_memcg(folio)
after the lruvec lock is acquired, the lruvec_memcg_debug() check is
redundant. Hence, it is removed.
This patch serves as a preparation for the reparenting of LRU folios.
Link: https://lore.kernel.org/23f22cbb1419f277a3483018b32158ae2b86c666.1772711148.git.zhengqi.arch@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Allen Pais <apais@linux.microsoft.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Wei Xu <weixugc@google.com>
Cc: Yosry Ahmed <yosry@kernel.org>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Now we have lruvec_unlock(), lruvec_unlock_irq() and
lruvec_unlock_irqrestore(), but no the paired lruvec_lock(),
lruvec_lock_irq() and lruvec_lock_irqsave().
There is currently no use case for lruvec_lock_irqsave(), so only
introduce lruvec_lock_irq(), and change all open-code places to use this
helper function. This looks cleaner and prepares for reparenting LRU
pages, preventing user from missing RCU lock calls due to open-code lruvec
lock.
Link: https://lore.kernel.org/2d0bafe7564e17ece46dfd58197af22ce57017dc.1772711148.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Muchun Song <muchun.song@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Allen Pais <apais@linux.microsoft.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Wei Xu <weixugc@google.com>
Cc: Yosry Ahmed <yosry@kernel.org>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu
read lock or acquire a reference to the memory cgroup returned by
folio_memcg(), thereby preventing it from being released.
In the current patch, the rcu read lock is employed to safeguard against
the release of the memory cgroup in count_memcg_folio_events().
This serves as a preparatory measure for the reparenting of the
LRU pages.
Link: https://lore.kernel.org/dea6aa0389367f7fd6b715c8837a2cf7506bd889.1772711148.git.zhengqi.arch@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Allen Pais <apais@linux.microsoft.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Wei Xu <weixugc@google.com>
Cc: Yosry Ahmed <yosry@kernel.org>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu
read lock or acquire a reference to the memory cgroup returned by
folio_memcg(), thereby preventing it from being released.
In the current patch, the function get_mem_cgroup_css_from_folio() and the
rcu read lock are employed to safeguard against the release of the memory
cgroup.
This serves as a preparatory measure for the reparenting of the
LRU pages.
Link: https://lore.kernel.org/645f99bc344575417f67def3744f975596df2793.1772711148.git.zhengqi.arch@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Allen Pais <apais@linux.microsoft.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Wei Xu <weixugc@google.com>
Cc: Yosry Ahmed <yosry@kernel.org>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Memory cgroup functions such as get_mem_cgroup_from_folio() and
get_mem_cgroup_from_mm() return a valid memory cgroup pointer, even for
the root memory cgroup. In contrast, the situation for object cgroups has
been different.
Previously, the root object cgroup couldn't be returned because it didn't
exist. Now that a valid root object cgroup exists, for the sake of
consistency, it's necessary to align the behavior of object-cgroup-related
operations with that of memory cgroup APIs.
Link: https://lore.kernel.org/e9c3f40ba7681d9753372d4ee2ac7a0216848b95.1772711148.git.zhengqi.arch@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Allen Pais <apais@linux.microsoft.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Wei Xu <weixugc@google.com>
Cc: Yosry Ahmed <yosry@kernel.org>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
It is inappropriate to use folio_lruvec_lock() variants in conjunction
with unlock_page_lruvec() variants, as this involves the inconsistent
operation of locking a folio while unlocking a page. To rectify this, the
functions unlock_page_lruvec{_irq, _irqrestore} are renamed to
lruvec_unlock{_irq,_irqrestore}.
Link: https://lore.kernel.org/4e5e05271a250df4d1812e1832be65636a78c957.1772711148.git.zhengqi.arch@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Chen Ridong <chenridong@huawei.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Allen Pais <apais@linux.microsoft.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Wei Xu <weixugc@google.com>
Cc: Yosry Ahmed <yosry@kernel.org>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Commit c50ca15dd496 ("mm: add vm_ops->mapped hook") introduced
__vma_check_mmap_hook() in order to assert that a driver doesn't
incorrectly implement both an f_op->mmap() and a vm_ops->mapped hook, the
latter of which would not ultimately get invoked.
However, this did not correctly account for stacked drivers (or drivers
that otherwise use the compatibility layer) which might recursively call
an mmap_prepare hook via the compatibility layer.
Thus the nested mmap_prepare() invocation might result in a VMA which has
vm_ops->mapped set with an overlaying mmap() hook, causing the
__vma_check_mmap_hook() to fail in vfs_mmap(), wrongly failing the
operation.
This patch resolves this by simply removing the check, as we can't be
certain that an mmap() hook doesn't at some point invoke the compatibility
layer, and it's not worth trying to track it.
Link: https://lore.kernel.org/20260413105713.92625-1-ljs@kernel.org
Fixes: c50ca15dd496 ("mm: add vm_ops->mapped hook")
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Closes: https://lore.kernel.org/all/adx2ws5z0NMIe5Yj@shinmob/
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Tested-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull MTD updates from Miquel Raynal:
"MTD changes:
- mtdconcat finally makes it in, after several years of being merged
and reverted
- Baikal SoC support is being removed, so MTD bits are being removed
as well
- misc cleanups
NAND changes:
- SunXi driver support for new versions of the Allwinner NAND
controller.
- DT-binding improvements and cleanups.
- A few fixes (Realtek ECC and Winbond SPI NAND), aside with the
usual load of misc changes.
SPI NOR fixes:
- Enable die erase on MT35XU02GCBA. We knew this flash needed this
fixup since 7f77c561e227 ("mtd: spi-nor: micron-st: add TODO for
fixing mt35xu02gcba") but did not add it due to lack of hardware to
test on.
- Fix locking on some Winbond w25q series flashes.
- Fix Auto Address Increment (AAI) writes on SST that flashes that
start on odd address. The write enable latch needs to be set again
after the single byte program"
* tag 'mtd/for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (44 commits)
mtd: spinand: winbond: Declare the QE bit on W25NxxJW
mtd: spi-nor: micron-st: Enable die erase support for MT35XU02GCBA
mtd: spi-nor: winbond: Fix locking support for w25q256jw
mtd: spi-nor: sst: Fix write enable before AAI sequence
mtd: spi-nor: winbond: Fix locking support for w25q64jvm
mtd: spi-nor: winbond: Fix locking support for w25q256jwm
dt-bindings: mtd: mxc-nand: add missing compatible string and ref to nand-controller-legacy.yaml
dt-bindings: mtd: gpmi-nand: ref to nand-controller-legacy.yaml
dt-bindings: mtd: refactor NAND bindings and add nand-controller-legacy.yaml
mtd: spinand: winbond: Clarify when to enable the HS bit
mtd: rawnand: sunxi: introduce maximize variable user data length
mtd: rawnand: sunxi: fix typos in comments
mtd: rawnand: sunxi: change error prone variable name
mtd: rawnand: sunxi: remove dead code
mtd: rawnand: sunxi: make the code more self-explanatory
mtd: rawnand: sunxi: replace hard coded value by a define - take2
mtd: rawnand: sunxi: do not count BBM bytes twice
mtd: rawnand: sunxi: fix sunxi_nfc_hw_ecc_read_extra_oob
mtd: rawnand: sunxi: sunxi_nand_ooblayout_free code clarification
mtd: cmdlinepart: use a flexible array member
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
- Refactor code paths involved with partial block zero-out in
prearation for converting ext4 to use iomap for buffered writes
- Remove use of d_alloc() from ext4 in preparation for the deprecation
of this interface
- Replace some J_ASSERTS with a journal abort so we can avoid a kernel
panic for a localized file system error
- Simplify various code paths in mballoc, move_extent, and fast commit
- Fix rare deadlock in jbd2_journal_cancel_revoke() that can be
triggered by generic/013 when blocksize < pagesize
- Fix memory leak when releasing an extended attribute when its value
is stored in an ea_inode
- Fix various potential kunit test bugs in fs/ext4/extents.c
- Fix potential out-of-bounds access in check_xattr() with a corrupted
file system
- Make the jbd2_inode dirty range tracking safe for lockless reads
- Avoid a WARN_ON when writeback files due to a corrupted file system;
we already print an ext4 warning indicatign that data will be lost,
so the WARN_ON is not necessary and doesn't add any new information
* tag 'ext4_for_linux-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (37 commits)
jbd2: fix deadlock in jbd2_journal_cancel_revoke()
ext4: fix missing brelse() in ext4_xattr_inode_dec_ref_all()
ext4: fix possible null-ptr-deref in mbt_kunit_exit()
ext4: fix possible null-ptr-deref in extents_kunit_exit()
ext4: fix the error handling process in extents_kunit_init).
ext4: call deactivate_super() in extents_kunit_exit()
ext4: fix miss unlock 'sb->s_umount' in extents_kunit_init()
ext4: fix bounds check in check_xattrs() to prevent out-of-bounds access
ext4: zero post-EOF partial block before appending write
ext4: move pagecache_isize_extended() out of active handle
ext4: remove ctime/mtime update from ext4_alloc_file_blocks()
ext4: unify SYNC mode checks in fallocate paths
ext4: ensure zeroed partial blocks are persisted in SYNC mode
ext4: move zero partial block range functions out of active handle
ext4: pass allocate range as loff_t to ext4_alloc_file_blocks()
ext4: remove handle parameters from zero partial block functions
ext4: move ordered data handling out of ext4_block_do_zero_range()
ext4: rename ext4_block_zero_page_range() to ext4_block_zero_range()
ext4: factor out journalled block zeroing range
ext4: rename and extend ext4_block_truncate_page()
...
|
|
Pull bpf fixes from Alexei Starovoitov:
"Most of the diff stat comes from Xu Kuohai's fix to emit ENDBR/BTI,
since all JITs had to be touched to move constant blinding out and
pass bpf_verifier_env in.
- Fix use-after-free in arena_vm_close on fork (Alexei Starovoitov)
- Dissociate struct_ops program with map if map_update fails (Amery
Hung)
- Fix out-of-range and off-by-one bugs in arm64 JIT (Daniel Borkmann)
- Fix precedence bug in convert_bpf_ld_abs alignment check (Daniel
Borkmann)
- Fix arg tracking for imprecise/multi-offset in BPF_ST/STX insns
(Eduard Zingerman)
- Copy token from main to subprogs to fix missing kallsyms (Eduard
Zingerman)
- Prevent double close and leak of btf objects in libbpf (Jiri Olsa)
- Fix af_unix null-ptr-deref in sockmap (Michal Luczaj)
- Fix NULL deref in map_kptr_match_type for scalar regs (Mykyta
Yatsenko)
- Avoid unnecessary IPIs. Remove redundant bpf_flush_icache() in
arm64 and riscv JITs (Puranjay Mohan)
- Fix out of bounds access. Validate node_id in arena_alloc_pages()
(Puranjay Mohan)
- Reject BPF-to-BPF calls and callbacks in arm32 JIT (Puranjay Mohan)
- Refactor all JITs to pass bpf_verifier_env to emit ENDBR/BTI for
indirect jump targets on x86-64, arm64 JITs (Xu Kuohai)
- Allow UTF-8 literals in bpf_bprintf_prepare() (Yihan Ding)"
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (32 commits)
bpf, arm32: Reject BPF-to-BPF calls and callbacks in the JIT
bpf: Dissociate struct_ops program with map if map_update fails
bpf: Validate node_id in arena_alloc_pages()
libbpf: Prevent double close and leak of btf objects
selftests/bpf: cover UTF-8 trace_printk output
bpf: allow UTF-8 literals in bpf_bprintf_prepare()
selftests/bpf: Reject scalar store into kptr slot
bpf: Fix NULL deref in map_kptr_match_type for scalar regs
bpf: Fix precedence bug in convert_bpf_ld_abs alignment check
bpf, arm64: Emit BTI for indirect jump target
bpf, x86: Emit ENDBR for indirect jump targets
bpf: Add helper to detect indirect jump targets
bpf: Pass bpf_verifier_env to JIT
bpf: Move constants blinding out of arch-specific JITs
bpf, sockmap: Take state lock for af_unix iter
bpf, sockmap: Fix af_unix null-ptr-deref in proto update
selftests/bpf: Extend bpf_iter_unix to attempt deadlocking
bpf, sockmap: Fix af_unix iter deadlock
bpf, sockmap: Annotate af_unix sock:: Sk_state data-races
selftests/bpf: verify kallsyms entries for token-loaded subprograms
...
|