| Age | Commit message (Collapse) | Author | Files | Lines |
|
Introduce a bus specific probe and remove function. For now this only
allows to get rid of a cast of the generic device to an fsi device in
the drivers and changes the remove prototype to return void---a non-zero
return value is ignored anyhow.
The objective is to get rid of users of struct device callbacks
.probe(), .remove() and .shutdown() to eventually remove these.
Until all fsi drivers are converted this results in a runtime warning
about the drivers needing an update because there is a bus probe
function and a driver probe function.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Acked-by: Eddie James <eajames@linux.ibm.com>
Link: https://patch.msgid.link/3b53adb75a5ae7894736d46cb6eb85f5ef36520e.1765279318.git.u.kleine-koenig@baylibre.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
There are no users of fsi_bus_type outside of fsi-core.c, so make that
variable static, don't export it and drop the declaration from the
public header file.
As there is a usage of fsi_bus_type in fsi_create_device() the
definition of that variable must happen further up in the file to not
have to add a local declaration.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Acked-by: Eddie James <eajames@linux.ibm.com>
Link: https://patch.msgid.link/bfd83034dec04d5a6b01a234988377fc6224614d.1765279318.git.u.kleine-koenig@baylibre.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Similar to wrappers for other subsystems provide inline functions for
fsi devices to store driver data.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Acked-by: Eddie James <eajames@linux.ibm.com>
Link: https://patch.msgid.link/5de7a7cbb30918b3503235130bd8aa1a9a63d71c.1765279318.git.u.kleine-koenig@baylibre.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add GPL-2.0 license id to some files related to kdb and kgdb,
replacing references to GPL or COPYING.
These files were introduced into the kernel in 2008 and 2010.
Signed-off-by: Tim Bird <tim.bird@sony.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Call the parent driver pcode functions through the parent interface
function pointers instead of expecting both to have functions of the
same name.
In i915, add the interface to existing intel_pcode.[ch], while in xe
move them to new display/xe_display_pcode.[ch] and build it only for
CONFIG_DRM_XE_DISPLAY=y.
Do not add separate write and write_timeout calls in the
interface. Instead, handle the default 1 ms timeout in the
intel_parent.c glue layer.
This drops the last intel_pcode.h includes from display, and allows us
to remove the corresponding xe compat header.
v2: initialize .pcode in i915
Reviewed-by: Michał Grzelak <michal.grzelak@intel.com>
Link: https://patch.msgid.link/20260126112925.2452171-1-jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
To probe an IWB in an ACPI based system it is required:
- to implement the IORT functions handling the IWB IORT node and create
functions to retrieve IWB firmware information
- to augment the driver to match the DSDT ACPI "ARMH0003" device and
retrieve the IWB wire and trigger mask from the GSI interrupt descriptor
in the IWB msi_domain_ops.msi_translate() function
Make the required driver changes to enable IWB probing in ACPI systems.
The GICv5 GSI format requires special handling for IWB routed IRQs.
Add IWB GSI detection to the top level driver gic_v5_get_gsi_domain_id()
function so that the correct IRQ domain for a GSI can be detected by
parsing the GSI and check whether it is an IWB-backed IRQ or not.
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260115-gicv5-host-acpi-v3-6-c13a9a150388@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
On ACPI ARM64 systems the GICv5 ITS configuration and translate frames
are described in the MADT table.
Refactor the current GICv5 ITS driver code to share common functions
between ACPI and OF and implement ACPI probing in the GICv5 ITS driver.
Add iort_msi_xlate() to map a device ID and retrieve an MSI controller
fwnode node for ACPI systems and update pci_msi_map_rid_ctlr_node() to
use it in its ACPI code path.
Add the required functions to IORT code for deviceID retrieval and IRQ
domain registration and look-up so that the GICv5 ITS driver in an
ACPI based system can be successfully probed.
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260115-gicv5-host-acpi-v3-5-c13a9a150388@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
On ARM64 ACPI systems GICv5 IRSes are described in MADT sub-entries.
Add the required plumbing to parse MADT IRS firmware table entries and
probe the IRS components in ACPI.
Augment the irqdomain_ops.translate() for PPI and SPI IRQs in order to
provide support for their ACPI based firmware translation.
Implement an irqchip ACPI based callback to initialize the global GSI
domain upon an MADT IRS detection.
The IRQCHIP_ACPI_DECLARE() entry in the top level GICv5 driver is only used
to trigger the IRS probing (ie the global GSI domain is initialized once on
the first call on multi-IRS systems); IRS probing takes place by calling
acpi_table_parse_madt() in the IRS sub-driver, that probes all IRSes
in sequence.
Add a new ACPI interrupt model so that it can be detected at runtime and
distinguished from previous GIC architecture models.
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260115-gicv5-host-acpi-v3-4-c13a9a150388@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
To support booting with OF and ACPI seamlessly, GIC ITS parent code
requires the PCI/MSI irqdomain layer to implement a function to retrieve
an MSI controller fwnode and map an RID in a firmware agnostic way
(ie pci_msi_map_rid_ctlr_node()).
Convert pci_msi_map_rid_ctlr_node() to an OF agnostic interface
(fwnode_handle based) and update the GIC ITS MSI parent code to reflect
the pci_msi_map_rid_ctlr_node() change.
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260115-gicv5-host-acpi-v3-2-c13a9a150388@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The GICv5 driver IRQ domain hierarchy requires adding a parent field to
struct irqchip_fwid so that core code can reference a fwnode_handle parent
for a given fwnode.
Add a parent field to struct irqchip_fwid and update the related kernel API
functions to initialize and handle it.
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260115-gicv5-host-acpi-v3-1-c13a9a150388@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
|
|
Introduce a basic DM implementation that enables creating and
registering device memory, and using the associated memory keys
for networking operations.
Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://patch.msgid.link/20260127082649.429018-1-kotaranov@linux.microsoft.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull CPUFreq Arm updates for 7.0 from Viresh Kumar:
"- Update cpufreq-dt-platdev list for tegra, qcom, TI (Aaron Kling,
Dhruva Gole, and Konrad Dybcio).
- Minor improvements to the cpufreq / cpumask rust implementation
(Alexandre Courbot, Alice Ryhl, Tamir Duberstein, and Yilin Chen).
- Add support for AM62L3 SoC to ti-cpufreq driver (Dhruva Gole).
- Update FIE arch_freq_scale in ticks for non-PCC regs (Jie Zhan).
- Other minor cleanups / improvements (Felix Gu, Juan Martinez, Luca
Weiss, and Sergey Shtylyov)."
* tag 'cpufreq-arm-updates-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: scmi: Fix device_node reference leak in scmi_cpu_domain_id()
cpufreq: ti-cpufreq: add support for AM62L3 SoC
cpufreq: dt-platdev: Add ti,am62l3 to blocklist
cpufreq/amd-pstate: Add comment explaining nominal_perf usage for performance policy
cpufreq: scmi: correct SCMI explanation
cpufreq: dt-platdev: Block the driver from probing on more QC platforms
rust: cpumask: rename methods of Cpumask for clarity and consistency
cpufreq: CPPC: Update FIE arch_freq_scale in ticks for non-PCC regs
cpufreq: CPPC: Factor out cppc_fie_kworker_init()
ACPI: CPPC: Factor out and export per-cpu cppc_perf_ctrs_in_pcc_cpu()
rust: cpufreq: replace `kernel::c_str!` with C-Strings
cpufreq: Add Tegra186 and Tegra194 to cpufreq-dt-platdev blocklist
dt-bindings: cpufreq: qcom-hw: document Milos CPUFREQ Hardware
rust: cpufreq: add __rust_helper to helpers
rust: cpufreq: always inline functions using build_assert with arguments
|
|
This reverts commit d000422a46aad32217cf1be747eb61d641baae2f.
It is reported by many to cause boot failures, so must be reverted.
Cc: Xin Zhao <jackzxcui1989@163.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/d1942304-ee30-478d-90fb-279519f3ae81@samsung.com
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reported-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Commit e062bdfdd6ad ("regmap: warn users about uninitialized flat cache")
warns when REGCACHE_FLAT is used without full defaults. This causes
false positives on hardware where many registers reset to zero but are
not listed in reg_defaults, forcing drivers to maintain large tables
just to silence the warning.
Add a reg_default_cb() hook so drivers can supply defaults for registers
not present in reg_defaults when populating REGCACHE_FLAT. This keeps
the warning quiet for known zero-reset registers without bloating
tables. Provide a generic regmap_default_zero_cb() helper for drivers
that need zero defaults.
The hook is only used for REGCACHE_FLAT; the core does not
check readable/writeable access, so drivers must provide readable_reg/
writeable_reg callbacks and handle holes in the register map.
Signed-off-by: Sheetal <sheetal@nvidia.com>
Link: https://patch.msgid.link/20260123095346.1258556-3-sheetal@nvidia.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Since commit d6df7df7ae5a0 ("ARM: pxa: remove unused board files"), there
has been no in-tree user of the AK4641 codec driver. The last user
(HP iPAQ hx4700) was a non-DT PXA board file that instantiated the device
via I2C board data; that code was removed as part of the PXA board-file
purge.
The AK4641 driver was introduced ~2011 and still probes only via the I2C
device-ID table ('.id_table'), without an 'of_match_table', so there are
no upstream Devicetree users to retain. With no in-tree users left, remove
the driver.
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Acked-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Link: https://patch.msgid.link/20260122-sound-cleanup-v1-1-0a91901609b8@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Although value 4 (INDOOR_SP_AP_OLD) is deprecated in IEEE standards,
existing APs may still use this control value. Since this value is
based on the old specification, we cannot trust such APs implement
proper power controls.
Therefore, move IEEE80211_6GHZ_CTRL_REG_INDOOR_SP_AP_OLD case
from SP_AP to LPI_AP power type handling to prevent potential
power limit violations.
Signed-off-by: Pagadala Yesu Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260111163601.6b5a36d3601e.I1704ee575fd25edb0d56f48a0a3169b44ef72ad0@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
To prepare sdio drivers to migrate away from struct device_driver::shutdown
(and then eventually remove that callback) create a serdev driver shutdown
callback and migration code to keep the existing behaviour. Note this
introduces a warning for each driver that isn't converted yet to that
callback at register time.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/397f45c2818f6632151f92b70e547262f373c3b6.1768232321.git.u.kleine-koenig@baylibre.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add an option to operate as the RSTA in an FTM measurement request.
When requested, the device will dwell on the requested channel until
the peer starts the FTM negotiation. This option is only valid for
trigger-based/non trigger-based measurement with LMR feedback which
will allow the RSTA to receive the results of the measurement.
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260111190221.1f95fc0afab4.Iae2d32783b8e7c4a29089fec0f4c6bce94d303cc@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The FTM result includes some of the periodic measurement negotiated
parameters (like the burst duration and number of bursts), but it
doesn't include the burst period. Add it to the FTM result
notification.
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260111190221.e0778f86edef.I3c98c1933eb639963bc3ffdef81a8788b59f2188@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
ranging
Periodic FTM request attributes are defined based on the periodic
parameters used in EDCA-based ranging negotiation. However, non-EDCA
based ranging (trigger-based/non-trigger-based) does not include
periodic parameters in the negotiation protocol, even though upper
layers may still request periodic measurements.
Clarify the semantics of periodic ranging attributes when used with
non-EDCA based ranging.
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260111190221.b89cb3f68e1a.I7a9d8c6d1c66c77f1b43120a841101c96c3f19ad@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add new capabilities to the PMSR FTM capabilities list. The new
capabilities include 6 GHz support, supported number of spatial streams
and supported number of LTF repetitions.
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Tested-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260111190221.bf43785c18f6.Ic98cf9790ddee84bf88e5720b93c46c23af3c96c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add netns support to loopback and vhost. Keep netns disabled for
virtio-vsock, but add necessary changes to comply with common API
updates.
This is the patch in the series when vhost-vsock namespaces actually
come online.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-3-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add netns logic to vsock core. Additionally, modify transport hook
prototypes to be used by later transport-specific patches (e.g.,
*_seqpacket_allow()).
Namespaces are supported primarily by changing socket lookup functions
(e.g., vsock_find_connected_socket()) to take into account the socket
namespace and the namespace mode before considering a candidate socket a
"match".
This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode to
report the mode and /proc/sys/net/vsock/child_ns_mode to set the mode
for new namespaces.
Add netns functionality (initialization, passing to transports, procfs,
etc...) to the af_vsock socket layer. Later patches that add netns
support to transports depend on this patch.
This patch changes the allocation of random ports for connectible vsocks
in order to avoid leaking the random port range starting point to other
namespaces.
dgram_allow(), stream_allow(), and seqpacket_allow() callbacks are
modified to take a vsk in order to perform logic on namespace modes. In
future patches, the net will also be used for socket
lookups in these functions.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-1-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Along with renaming the .move_notify() callback, rename the corresponding
dma-buf core function. This makes the expected behavior clear to exporters
calling this function.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20260124-dmabuf-revoke-v5-2-f98fca917e96@nvidia.com
Signed-off-by: Christian König <christian.koenig@amd.com>
|
|
Rename the .move_notify() callback to .invalidate_mappings() to make its
purpose explicit and highlight that it is responsible for invalidating
existing mappings.
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20260124-dmabuf-revoke-v5-1-f98fca917e96@nvidia.com
Signed-off-by: Christian König <christian.koenig@amd.com>
|
|
The helper makes it easier to handle optional GPIOs and simplifies the
error handling code.
Signed-off-by: Stefan Kerkmann <s.kerkmann@pengutronix.de>
Signed-off-by: Michael Tretter <m.tretter@pengutronix.de>
Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com>
Link: https://lore.kernel.org/r/20260126-gpio-devm_fwnode_gpiod_get_optional-v2-1-ec34f8e35077@pengutronix.de
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
|
|
Factor out cppc_perf_ctrs_in_pcc_cpu() for checking whether per-cpu CPC
regs are defined in PCC channels, and export it out for further use.
Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Reviewed-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
There are no more callers of this function after commit f6b2d8b134b2413
("btrfs: track the next file offset in struct btrfs_bio_ctrl"), so
remove the function.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There is no DAMON API caller that directly access 'kdamond' and
'kdamond_lock' fields of 'struct damon_ctx'. Keeping those exposed could
only encourage creative but error-prone usages. Hide them from DAMON API
callers by marking those as private fields.
Link: https://lkml.kernel.org/r/20260115152047.68415-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm/damon: hide kdamond and kdamond_lock from API callers".
'kdamond' and 'kdamond_lock' fields initially exposed to DAMON API callers
for flexible synchronization and use cases. As DAMON API became somewhat
complicated compared to the early days, Keeping those exposed could only
encourage the API callers to invent more creative but complicated and
difficult-to-debug use cases.
Fortunately DAMON API callers didn't invent that many creative use cases.
There exist only two use cases of 'kdamond' and 'kdamond_lock'. Finding
whether the kdamond is actively running, and getting the pid of the
kdamond. For the first use case, a dedicated API function, namely
'damon_is_running()' is provided, and all DAMON API callers are using the
function for the use case. Hence only the second use case is where the
fields are directly being used by DAMON API callers.
To prevent future invention of complicated and erroneous use cases of the
fields, hide the fields from the API callers. For that, provide new
dedicated DAMON API functions for the remaining use case, namely
damon_kdamond_pid(), migrate DAMON API callers to use the new function,
and mark the fields as private fields.
This patch (of 5):
'kdamond' and 'kdamond_lock' are directly being used by DAMON API callers
for getting the pid of the corresponding kdamond. To discourage invention
of creative but complicated and erroneous new usages of the fields that
require careful synchronization, implement a new API function that can
simply be used without the manual synchronizations.
Link: https://lkml.kernel.org/r/20260115152047.68415-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20260115152047.68415-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "nodemask: align nodes_and{,not} with underlying bitmap ops".
nodes_and{,not} are void despite that underlying bitmap_and(,not) return
boolean, true if the result bitmap is non-empty. Align nodemask API, and
simplify client code.
This patch (of 3):
Bitmap functions bitmap_and{,not} return boolean depending on emptiness of
the result bitmap. The corresponding nodemask helpers ignore the returned
value.
Propagate the underlying bitmaps result to nodemasks users, as it
simplifies user code.
Link: https://lkml.kernel.org/r/20260114172217.861204-1-ynorov@nvidia.com
Link: https://lkml.kernel.org/r/20260114172217.861204-2-ynorov@nvidia.com
Signed-off-by: Yury Norov <ynorov@nvidia.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Waiman Long <longman@redhat.com>
Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
[__]page_table_check_pte_clear()
This reverts commit aa232204c468 ("mm/page_table_check: remove unused
parameter in [__]page_table_check_pte_clear").
Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the page
in the pte, but instead in the address of the access.
[ajd@linux.ibm.com: rebase, fix additional occurrence and loop handling]
Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-8-755bc151a50b@linux.ibm.com
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Ingo Molnar <mingo@kernel.org> # x86
Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Guo Weikang <guoweikang.kernel@gmail.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
[__]page_table_check_pmd_clear()
This reverts commit 1831414cd729 ("mm/page_table_check: remove unused
parameter in [__]page_table_check_pmd_clear").
Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the page
in the pte, but instead in the address of the access.
[ajd@linux.ibm.com: rebase on arm64 changes]
Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-7-755bc151a50b@linux.ibm.com
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Ingo Molnar <mingo@kernel.org> # x86
Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Guo Weikang <guoweikang.kernel@gmail.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
[__]page_table_check_pud_clear()
This reverts commit 931c38e16499 ("mm/page_table_check: remove unused
parameter in [__]page_table_check_pud_clear").
Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the page
in the pte, but instead in the address of the access.
[ajd@linux.ibm.com: rebase on arm64 changes]
Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-6-755bc151a50b@linux.ibm.com
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Ingo Molnar <mingo@kernel.org> # x86
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Guo Weikang <guoweikang.kernel@gmail.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
To provide support for powerpc platforms, provide an addr parameter to the
__page_table_check_ptes_set() and page_table_check_ptes_set() routines.
This parameter is needed on some powerpc platforms which do not encode
whether a mapping is for user or kernel in the pte. On such platforms,
this can be inferred from the addr parameter.
[ajd@linux.ibm.com: rebase on arm64 + riscv changes, update commit message]
Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-5-755bc151a50b@linux.ibm.com
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Guo Weikang <guoweikang.kernel@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
[__]page_table_check_pmd[s]_set()
This reverts commit a3b837130b58 ("mm/page_table_check: remove unused
parameter in [__]page_table_check_pmd_set").
Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.
Apply this to __page_table_check_pmds_set(), page_table_check_pmd_set(), and
the page_table_check_pmd_set() wrapper macro.
[ajd@linux.ibm.com: rebase on arm64 + riscv changes, update commit message]
Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-4-755bc151a50b@linux.ibm.com
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Ingo Molnar <mingo@kernel.org> # x86
Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Guo Weikang <guoweikang.kernel@gmail.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
[__]page_table_check_pud[s]_set()
This reverts commit 6d144436d954 ("mm/page_table_check: remove unused
parameter in [__]page_table_check_pud_set").
Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the page
in the pte, but instead in the address of the access.
Apply this to __page_table_check_puds_set(), page_table_check_puds_set()
and the page_table_check_pud_set() wrapper macro.
[ajd@linux.ibm.com: rebase on riscv + arm64 changes, update commit message]
Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-3-755bc151a50b@linux.ibm.com
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Ingo Molnar <mingo@kernel.org> # x86
Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Guo Weikang <guoweikang.kernel@gmail.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Instead of translating between RMP_ and TTU_ flags, remove the RMP_ flags
and just use the TTU_ flag space; there's plenty available.
Possibly we should rename these to RMAP_ flags, and maybe even pass them
in through rmap_walk_arg, but that can be done later.
Link: https://lkml.kernel.org/r/20260109041345.3863089-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Jann Horn <jannh@google.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
commit bda420b98505 ("numa balancing: migrate on fault among multiple
bound nodes") adds new flag MPOL_F_NUMA_BALANCING to enable NUMA balancing
for MPOL_BIND memory policy.
When the cpuset of tasks changes, the mempolicy of the task is rebound by
mpol_rebind_nodemask(). When MPOL_F_STATIC_NODES and
MPOL_F_RELATIVE_NODES are both not set, the behaviour of rebinding should
be same whenever MPOL_F_NUMA_BALANCING is set or not. So, when an
application calls set_mempolicy() with MPOL_F_NUMA_BALANCING set but both
MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES cleared,
mempolicy.w.cpuset_mems_allowed should be set to
cpuset_current_mems_allowed nodemask. However, in current implementation,
mpol_store_user_nodemask() wrongly returns true, causing
mempolicy->w.user_nodemask to be incorrectly set to the user-specified
nodemask. Later, when the cpuset of the application changes,
mpol_rebind_nodemask() ends up rebinding based on the user-specified
nodemask rather than the cpuset_mems_allowed nodemask as intended.
I can reproduce with the following steps in qemu with 4 NUMA nodes:
1. echo '+cpuset' > /sys/fs/cgroup/cgroup.subtree_control
2. mkdir /sys/fs/cgroup/test
3. ./reproducer &
4. cat /proc/$pid/numa_maps, the task is bound to NUMA 1
5. echo $pid > /sys/fs/cgroup/test/cgroup.procs
6. cat /proc/$pid/numa_maps, the task is bound to NUMA 0 now.
The reproducer code:
int main()
{
struct bitmask *bmp;
int ret;
bmp = numa_parse_nodestring("1");
ret = set_mempolicy(MPOL_BIND | MPOL_F_NUMA_BALANCING,
bmp->maskp, bmp->size + 1);
if (ret < 0) {
perror("Failed to call set_mempolicy");
exit(-1);
}
while (1);
return 0;
}
If I call set_mempolicy() without MPOL_F_NUMA_BALANCING in the reproducer
code. After step 5, the task is still bound to NUMA 1.
To fix this, only set mempolicy->w.user_nodemask to the user-specified
nodemask if MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES is present.
Link: https://lkml.kernel.org/r/20260120011018.1256654-1-tujinjiang@huawei.com
Link: https://lkml.kernel.org/r/20251223110523.1161421-1-tujinjiang@huawei.com
Fixes: bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently, zsmalloc performs address linearization on read (which
sometimes requires memcpy() to a local buffer). Not all zsmalloc users
need a linear address. For example, Crypto API supports SG-list,
performing linearization under the hood, if needed. In addition, some
compressors can have native SG-list support, completely avoiding the
linearization step.
Provide an SG-list based zsmalloc read API:
- zs_obj_read_sg_begin()
- zs_obj_read_sg_end()
This API allows callers to obtain an SG representation of the object (one
entry for objects that are contained in a single page and two entries for
spanning objects), avoiding the need for a bounce buffer and memcpy.
[senozhatsky@chromium.org: make zs_obj_read_sg_begin() return void, per Yosry]
Link: https://lkml.kernel.org/r/20260117024900.792237-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20260113034645.2729998-1-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm/damon: advance DAMOS-based LRU sorting".
DAMOS_LRU_[DE]PRIO actions were added to DAMOS for more access-aware LRU
lists sorting. For simple usage, a specialized kernel module, namely
DAMON_LRU_SORT, has also been introduced. After the introduction of the
module, DAMON got a few important new features, including the aim-based
quota auto-tuning, age tracking, young page filter, and monitoring
intervals auto-tuning. Meanwhile, DAMOS-based LRU sorting had no direct
updates. Now we show some rooms to advance for DAMOS-based LRU sorting.
Firstly, the aim-oriented quota auto-tuning can simplify the LRU sorting
parameters tuning. But there is no good auto-tuning target metric for LRU
sorting use case. Secondly, the behavior of DAMOS_LRU_[DE]PRIO are not
very symmetric. DAMOS_LRU_DEPRIO directly moves the pages to inactive LRU
list, while DAMOS_LRU_PRIO only marks the page as accessed, so that the
page can not directly but only eventually moved to the active LRU list.
Finally, DAMON_LRU_SORT users cannot utilize the modern features that can
be useful for them, too.
Improve the situation with the following changes. First, introduce a new
DAMOS quota auto-tuning target metric for active:inactive memory size
ratio. Since LRU sorting is a kind of balancing of active and inactive
pages, the active:inactive memory size ratio can be intuitively set.
Second, update DAMOS_LRU_[DE]PRIO behaviors to be more intuitive and
symmetric, by letting them directly move the pages to [in]active LRU list.
Third, update the DAMON_LRU_SORT module user interface to be able to
fully utilize the modern features including the [in]active memory size
ratio-based quota auto-tuning, young page filter, and monitoring intervals
auto-tuning.
With these changes, for example, users can now ask DAMON to "find hot/cold
memory regions with auto-tuned monitoring intervals, do one more page
level access check for found hot/cold memory, and move pages of those to
active or inactive LRU lists accordingly, aiming X:Y active to inactive
memory ratio." For example, if they know 30% of the memory is better to be
protected from reclamation, 30:70 can be set as the target ratio.
Test Results
------------
I ran DAMON_LRU_SORT with the features introduced by this series, on a
real world server workload. For the active:inactive ratio goal, I set
50:50. I confirmed it achieves the target active:inactive ratio, without
manual tuning of the monitoring intervals and the hot/coldness thresholds.
The baseline system that was not running the DAMON_LRU_SORT was keeping
active:inactive ratio of about 1:10.
Note that the test didn't show a clear performance difference, though. I
believe that was mainly because the workload was not very memory
intensive. Also, whether the 50:50 target ratio was optimum is unclear.
Nonetheless, the positive performance impact of the basic LRU sorting idea
is already confirmed with the initial DAMON_LRU_SORT introduction patch
series. The goal of this patch series is simplifying the parameters
tuning of DAMOS-based LRU sorting, and the test confirmed the aimed goals
are achieved.
Patches Sequence
----------------
First three patches extend DAMOS quota auto-tuning to support [in]active
memory ratio target metric type. Those (patches 1-3) introduce new
metrics, implement DAMON sysfs support, and update the documentation,
respectively.
Following patch (patch 4) makes DAMOS_LRU_PRIO action to directly move
target pages to active LRU list, instead of only marking them accessed.
Following seven patches (patches 5-11) updates DAMON_LRU_SORT to support
modern DAMON features. Patch 5 makes it uses not only access frequency
but also age at under-quota regions prioritization. Patches 6-11 add the
support for young page filtering, active:inactive memory ratio based quota
auto-tuning, and monitoring intervals auto-tuning, with appropriate
document updates.
This patch (of 11):
DAMOS_LRU_[DE]PRIO are DAMOS actions for making balance of active and
inactive memory size. There is no appropriate DAMOS quota auto-tuning
target metric for the use case. Add two new DAMOS quota goal metrics for
the purpose, namely DAMOS_QUOTA_[IN]ACTIVE_MEM_BP. Those will represent
the ratio of [in]active memory to total (inactive + active) memory.
Hence, users will be able to ask DAMON to, for example, "find hot and cold
memory, and move pages of those to active and inactive LRU lists,
adjusting the hot/cold thresholds aiming 50:50 active:inactive memory
ratio."
Link: https://lkml.kernel.org/r/20260113152717.70459-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20260113152717.70459-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Introduce cma_alloc_frozen{_compound}() helper to alloc pages without
incrementing their refcount, then convert hugetlb cma to use the
cma_alloc_frozen_compound() and cma_release_frozen() and remove the unused
cma_{alloc,free}_folio(), also move the cma_validate_zones() into
mm/internal.h since no outside user.
The set_pages_refcounted() is only called to set non-compound pages after
above changes, so remove the processing about PageHead.
Link: https://lkml.kernel.org/r/20260109093136.1491549-6-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In order to allocate given range of pages or allocate compound pages
without incrementing their refcount, adding two new helper
alloc_contig_frozen_{range,pages}() which may be beneficial to some users
(eg hugetlb).
The new alloc_contig_{range,pages} only take !__GFP_COMP gfp now, and the
free_contig_range() is refactored to only free non-compound pages, the
only caller to free compound pages in cma_free_folio() is changed
accordingly, and the free_contig_frozen_range() is provided to match the
alloc_contig_frozen_range(), which is used to free frozen pages.
Link: https://lkml.kernel.org/r/20260109093136.1491549-5-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Kill cma_pages_valid() which only used in cma_release(), also cleanup code
duplication between cma pages valid checking and cma memrange finding.
Link: https://lkml.kernel.org/r/20260109093136.1491549-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Factor out the splitting of non-compound page from make_alloc_exact() and
split_page() into a new helper function __split_page().
While at it, convert the VM_BUG_ON_PAGE() into a VM_WARN_ON_PAGE().
Link: https://lkml.kernel.org/r/20260109093136.1491549-3-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm: hugetlb: allocate frozen gigantic folio", v6.
Introduce alloc_contig_frozen_pages() and cma_alloc_frozen_compound()
which avoid atomic operation about page refcount, and then convert to
allocate frozen gigantic folio by the new helpers in hugetlb to cleanup
the alloc_gigantic_folio().
This patch (of 6):
Add a new helper to free huge page to be consistency to
debug_vm_pgtable_alloc_huge_page(), and use HPAGE_PUD_ORDER instead of
open-code.
Also move the free_contig_range() under CONFIG_ALLOC_CONTIG since all
caller are built with CONFIG_ALLOC_CONTIG.
Link: https://lkml.kernel.org/r/20260109093136.1491549-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Using class->size to detect spanning objects is not entirely correct,
because some size classes can hold a range of object sizes of up to
class->size bytes in length, due to size-classes merge. Such classes use
padding for cases when actually written objects are smaller than
class->size. zs_obj_read_begin() can incorrectly hit the slow path and
perform memcpy of such objects, basically copying padding bytes. Instead
of class->size zs_obj_read_begin() should use the actual compressed object
length (both zram and zswap know it) so that it can correctly handle
situations when a written object is small enough to fit into the first
physical page.
Link: https://lkml.kernel.org/r/20260107052145.3586917-1-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> [zsmalloc & zswap]
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Rename mem_cgroup_ino() to mem_cgroup_id() and mem_cgroup_get_from_ino()
to mem_cgroup_get_from_id(). These functions now use cgroup IDs (from
cgroup_id()) rather than inode numbers, so the names should reflect that.
[shakeel.butt@linux.dev: replace ino with id, per SeongJae]
Link: https://lkml.kernel.org/r/flkqanhyettp5uq22bjwg37rtmnpeg3mghznsylxcxxgaafpl4@nov2x7tagma7
[akpm@linux-foundation.org: build fix]
Link: https://lkml.kernel.org/r/20251225232116.294540-9-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Now that all callers have been converted to use either:
- The private ID APIs (mem_cgroup_private_id/mem_cgroup_from_private_id)
for internal kernel objects that outlive their cgroup
- The public cgroup ID APIs (mem_cgroup_ino/mem_cgroup_get_from_ino)
for external interfaces
Remove the unused wrapper functions mem_cgroup_id() and
mem_cgroup_from_id() along with their !CONFIG_MEMCG stubs.
Link: https://lkml.kernel.org/r/20251225232116.294540-8-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: SeongJae Park <sj@kernel.org>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|