kernel/linux.git/drivers/base/cacheinfo.c, branch v6.18.21

cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id

2025-07-16T13:04:27+00:00

Filesystems like resctrl use the cache-id exposed via sysfs to identify groups of CPUs. The value is also used for PCIe cache steering tags. On DT platforms cache-id is not something that is described in the device-tree, but instead generated from the smallest CPU h/w id of the CPUs associated with that cache. CPU h/w ids may be larger than 32 bits. Add a hook to allow architectures to compress the value from the devicetree into 32 bits. Returning the same value is always safe as cache_of_set_id() will stop if a value larger than 32 bits is seen. For example, on arm64 the value is the MPIDR affinity register, which only has 32 bits of affinity data, but spread accross the 64 bit field. An arch-specific bit swizzle gives a 32 bit value. Signed-off-by: James Morse Reviewed-by: Jonathan Cameron Reviewed-by: Gavin Shan Link: https://lore.kernel.org/r/20250711182743.30141-3-james.morse@arm.com Signed-off-by: Greg Kroah-Hartman

cacheinfo: Set cache 'id' based on DT data

2025-07-16T13:04:16+00:00

Use the minimum CPU h/w id of the CPUs associated with the cache for the cache 'id'. This will provide a stable id value for a given system. As we need to check all possible CPUs, we can't use the shared_cpu_map which is just online CPUs. As there's not a cache to CPUs mapping in DT, we have to walk all CPU nodes and then walk cache levels. The cache_id exposed to user-space has historically been 32 bits, and is too late to change. This value is parsed into a u32 by user-space libraries such as libvirt: https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588 Give up on assigning cache-id's if a CPU h/w id greater than 32 bits is found. match_cache_node() does not make use of the __free() cleanup helpers because of_find_next_cache_node(prev) does not drop a reference to prev, and its too easy to accidentally drop the reference on cpu, which belongs to for_each_of_cpu_node(). Cc: "Rafael J. Wysocki" Signed-off-by: Rob Herring [ ben: converted to use the __free cleanup idiom ] Signed-off-by: Ben Horgan [ morse: Add checks to give up if a value larger than 32 bits is seen. ] Signed-off-by: James Morse Reviewed-by: Jonathan Cameron Reviewed-by: Gavin Shan Link: https://lore.kernel.org/r/20250711182743.30141-2-james.morse@arm.com Signed-off-by: Greg Kroah-Hartman

cacheinfo: Allocate memory during CPU hotplug if not done from the primary CPU

2024-12-06T12:07:47+00:00

Commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU") adds functionality that architectures can use to optionally allocate and build cacheinfo early during boot. Commit 6539cffa9495 ("cacheinfo: Add arch specific early level initializer") lets secondary CPUs correct (and reallocate memory) cacheinfo data if needed. If the early build functionality is not used and cacheinfo does not need correction, memory for cacheinfo is never allocated. x86 does not use the early build functionality. Consequently, during the cacheinfo CPU hotplug callback, last_level_cache_is_valid() attempts to dereference a NULL pointer: BUG: kernel NULL pointer dereference, address: 0000000000000100 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not present page PGD 0 P4D 0 Oops: 0000 [#1] PREEPMT SMP NOPTI CPU: 0 PID 19 Comm: cpuhp/0 Not tainted 6.4.0-rc2 #1 RIP: 0010: last_level_cache_is_valid+0x95/0xe0a Allocate memory for cacheinfo during the cacheinfo CPU hotplug callback if not done earlier. Moreover, before determining the validity of the last-level cache info, ensure that it has been allocated. Simply checking for non-zero cache_leaves() is not sufficient, as some architectures (e.g., Intel processors) have non-zero cache_leaves() before allocation. Dereferencing NULL cacheinfo can occur in update_per_cpu_data_slice_size(). This function iterates over all online CPUs. However, a CPU may have come online recently, but its cacheinfo may not have been allocated yet. While here, remove an unnecessary indentation in allocate_cache_info(). [ bp: Massage. ] Fixes: 6539cffa9495 ("cacheinfo: Add arch specific early level initializer") Signed-off-by: Ricardo Neri Signed-off-by: Borislav Petkov (AMD) Reviewed-by: Radu Rendec Reviewed-by: Nikolay Borisov Reviewed-by: Andreas Herrmann Reviewed-by: Sudeep Holla Cc: stable@vger.kernel.org # 6.3+ Link: https://lore.kernel.org/r/20241128002247.26726-2-ricardo.neri-calderon@linux.intel.com

cacheinfo: Use of_property_present() for non-boolean properties

2024-11-10T09:54:07+00:00

The use of of_property_read_bool() for non-boolean properties is deprecated in favor of of_property_present() when testing for property presence. Signed-off-by: Rob Herring (Arm) Link: https://lore.kernel.org/r/20241104190342.270883-1-robh@kernel.org Signed-off-by: Greg Kroah-Hartman

cacheinfo: Don't opencode per_cpu_cacheinfo()

2024-11-04T00:59:06+00:00

That file contains a local helper that returns ->info_list, just use it. No functional changes. Signed-off-by: Nikolay Borisov Link: https://lore.kernel.org/r/20241023051118.888065-1-nik.borisov@suse.com Signed-off-by: Greg Kroah-Hartman

drivers: cacheinfo: use __free attribute instead of of_node_put()

2024-07-31T11:56:34+00:00

Introduce the __free attribute for scope-based resource management. Resources allocated with __free are automatically released at the end of the scope. This enhancement aims to mitigate memory management issues associated with forgetting to release resources by utilizing __free instead of of_node_put(). To introduce this feature, some modifications to the code structure were necessary. The original pattern: ``` prev = np; while(...) { [...] np = of_find_next_cache_node(np); of_node_put(prev); prev = np; [...] } ``` has been updated to: ``` while(...) { [...] struct device_node __free(device_node) *prev = np; np = of_find_next_cache_node(np) [...] } ``` With this change, the previous node is automatically cleaned up at the end of each iteration, allowing the elimination of all of_node_put() calls and some goto statements. Suggested-by: Julia Lawall Signed-off-by: Vincenzo Mezzela Link: https://lore.kernel.org/r/20240719151335.869145-1-vincenzo.mezzela@gmail.com Signed-off-by: Greg Kroah-Hartman

mm and cache_info: remove unnecessary CPU cache info update

2024-02-22T18:24:41+00:00

For each CPU hotplug event, we will update per-CPU data slice size and corresponding PCP configuration for every online CPU to make the implementation simple. But, Kyle reported that this takes tens seconds during boot on a machine with 34 zones and 3840 CPUs. So, in this patch, for each CPU hotplug event, we only update per-CPU data slice size and corresponding PCP configuration for the CPUs that share caches with the hotplugged CPU. With the patch, the system boot time reduces 67 seconds on the machine. Link: https://lkml.kernel.org/r/20240126081944.414520-1-ying.huang@intel.com Fixes: 362d37a106dd ("mm, pcp: reduce lock contention for draining high-order pages") Signed-off-by: "Huang, Ying" Originally-by: Kyle Meyer Reported-and-tested-by: Kyle Meyer Cc: Sudeep Holla Cc: Mel Gorman Signed-off-by: Andrew Morton

mm, pcp: reduce lock contention for draining high-order pages

2023-10-25T23:47:10+00:00

In commit f26b3fa04611 ("mm/page_alloc: limit number of high-order pages on PCP during bulk free"), the PCP (Per-CPU Pageset) will be drained when PCP is mostly used for high-order pages freeing to improve the cache-hot pages reusing between page allocating and freeing CPUs. On system with small per-CPU data cache slice, pages shouldn't be cached before draining to guarantee cache-hot. But on a system with large per-CPU data cache slice, some pages can be cached before draining to reduce zone lock contention. So, in this patch, instead of draining without any caching, "pcp->batch" pages will be cached in PCP before draining if the size of the per-CPU data cache slice is more than "3 * batch". In theory, if the size of per-CPU data cache slice is more than "2 * batch", we can reuse cache-hot pages between CPUs. But considering the other usage of cache (code, other data accessing, etc.), "3 * batch" is used. Note: "3 * batch" is chosen to make sure the optimization works on recent x86_64 server CPUs. If you want to increase it, please check whether it breaks the optimization. On a 2-socket Intel server with 128 logical CPU, with the patch, the network bandwidth of the UNIX (AF_UNIX) test case of lmbench test suite with 16-pair processes increase 70.5%. The cycles% of the spinlock contention (mostly for zone lock) decreases from 46.1% to 21.3%. The number of PCP draining for high order pages freeing (free_high) decreases 89.9%. The cache miss rate keeps 0.2%. Link: https://lkml.kernel.org/r/20231016053002.756205-4-ying.huang@intel.com Signed-off-by: "Huang, Ying" Acked-by: Mel Gorman Cc: Sudeep Holla Cc: Vlastimil Babka Cc: David Hildenbrand Cc: Johannes Weiner Cc: Dave Hansen Cc: Michal Hocko Cc: Pavel Tatashin Cc: Matthew Wilcox Cc: Christoph Lameter Cc: Arjan van de Ven Signed-off-by: Andrew Morton

cacheinfo: calculate size of per-CPU data cache slice

2023-10-25T23:47:10+00:00

This can be used to estimate the size of the data cache slice that can be used by one CPU under ideal circumstances. Both DATA caches and UNIFIED caches are used in calculation. So, the users need to consider the impact of the code cache usage. Because the cache inclusive/non-inclusive information isn't available now, we just use the size of the per-CPU slice of LLC to make the result more predictable across architectures. This may be improved when more cache information is available in the future. A brute-force algorithm to iterate all online CPUs is used to avoid to allocate an extra cpumask, especially in offline callback. Link: https://lkml.kernel.org/r/20231016053002.756205-3-ying.huang@intel.com Signed-off-by: "Huang, Ying" Acked-by: Mel Gorman Cc: Sudeep Holla Cc: Vlastimil Babka Cc: David Hildenbrand Cc: Johannes Weiner Cc: Dave Hansen Cc: Michal Hocko Cc: Pavel Tatashin Cc: Matthew Wilcox Cc: Christoph Lameter Cc: Arjan van de Ven Signed-off-by: Andrew Morton

drivers: base: cacheinfo: Update cpu_map_populated during CPU Hotplug

2023-05-31T19:36:47+00:00

Until commit 5c2712387d48 ("cacheinfo: Fix LLC is not exported through sysfs"), cacheinfo called populate_cache_leaves() for CPU coming online which let the arch specific functions handle (at least on x86) populating the shared_cpu_map. However, with the changes in the aforementioned commit, populate_cache_leaves() is not called when a CPU comes online as a result of hotplug since last_level_cache_is_valid() returns true as the cacheinfo data is not discarded. The CPU coming online is not present in shared_cpu_map, however, it will not be added since the cpu_cacheinfo->cpu_map_populated flag is set (it is set in populate_cache_leaves() when cacheinfo is first populated for x86) This can lead to inconsistencies in the shared_cpu_map when an offlined CPU comes online again. Example below depicts the inconsistency in the shared_cpu_list in cacheinfo when CPU8 is offlined and onlined again on a 3rd Generation EPYC processor: # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136 /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136 /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136 /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143 # echo 0 > /sys/devices/system/cpu/cpu8/online # echo 1 > /sys/devices/system/cpu/cpu8/online # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8 /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8 /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8 /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8 # cat /sys/devices/system/cpu/cpu136/cache/index0/shared_cpu_list 136 # cat /sys/devices/system/cpu/cpu136/cache/index3/shared_cpu_list 9-15,136-143 Clear the flag when the CPU is removed from shared_cpu_map when cache_shared_cpu_map_remove() is called during CPU hotplug. This will allow cache_shared_cpu_map_setup() to add the CPU coming back online in the shared_cpu_map. Set the flag again when the shared_cpu_map is setup. Following are results of performing the same test as described above with the changes: # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136 /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136 /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136 /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143 # echo 0 > /sys/devices/system/cpu/cpu8/online # echo 1 > /sys/devices/system/cpu/cpu8/online # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136 /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136 /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136 /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143 # cat /sys/devices/system/cpu/cpu136/cache/index0/shared_cpu_list 8,136 # cat /sys/devices/system/cpu/cpu136/cache/index3/shared_cpu_list 8-15,136-143 Fixes: 5c2712387d48 ("cacheinfo: Fix LLC is not exported through sysfs") Signed-off-by: K Prateek Nayak Reviewed-by: Yicong Yang Reviewed-by: Sudeep Holla Link: https://lore.kernel.org/r/20230508084115.1157-3-kprateek.nayak@amd.com Signed-off-by: Greg Kroah-Hartman