kernel/linux.git/mm/slab.c, branch v6.6.133

Randomized slab caches for kmalloc()

2023-07-18T08:07:47+00:00

When exploiting memory vulnerabilities, "heap spraying" is a common technique targeting those related to dynamic memory allocation (i.e. the "heap"), and it plays an important role in a successful exploitation. Basically, it is to overwrite the memory area of vulnerable object by triggering allocation in other subsystems or modules and therefore getting a reference to the targeted memory location. It's usable on various types of vulnerablity including use after free (UAF), heap out- of-bound write and etc. There are (at least) two reasons why the heap can be sprayed: 1) generic slab caches are shared among different subsystems and modules, and 2) dedicated slab caches could be merged with the generic ones. Currently these two factors cannot be prevented at a low cost: the first one is a widely used memory allocation mechanism, and shutting down slab merging completely via `slub_nomerge` would be overkill. To efficiently prevent heap spraying, we propose the following approach: to create multiple copies of generic slab caches that will never be merged, and random one of them will be used at allocation. The random selection is based on the address of code that calls `kmalloc()`, which means it is static at runtime (rather than dynamically determined at each time of allocation, which could be bypassed by repeatedly spraying in brute force). In other words, the randomness of cache selection will be with respect to the code address rather than time, i.e. allocations in different code paths would most likely pick different caches, although kmalloc() at each place would use the same cache copy whenever it is executed. In this way, the vulnerable object and memory allocated in other subsystems and modules will (most probably) be on different slab caches, which prevents the object from being sprayed. Meanwhile, the static random selection is further enhanced with a per-boot random seed, which prevents the attacker from finding a usable kmalloc that happens to pick the same cache with the vulnerable subsystem/module by analyzing the open source code. In other words, with the per-boot seed, the random selection is static during each time the system starts and runs, but not across different system startups. The overhead of performance has been tested on a 40-core x86 server by comparing the results of `perf bench all` between the kernels with and without this patch based on the latest linux-next kernel, which shows minor difference. A subset of benchmarks are listed below: sched/ sched/ syscall/ mem/ mem/ messaging pipe basic memcpy memset (sec) (sec) (sec) (GB/sec) (GB/sec) control1 0.019 5.459 0.733 15.258789 51.398026 control2 0.019 5.439 0.730 16.009221 48.828125 control3 0.019 5.282 0.735 16.009221 48.828125 control_avg 0.019 5.393 0.733 15.759077 49.684759 experiment1 0.019 5.374 0.741 15.500992 46.502976 experiment2 0.019 5.440 0.746 16.276042 51.398026 experiment3 0.019 5.242 0.752 15.258789 51.398026 experiment_avg 0.019 5.352 0.746 15.678608 49.766343 The overhead of memory usage was measured by executing `free` after boot on a QEMU VM with 1GB total memory, and as expected, it's positively correlated with # of cache copies: control 4 copies 8 copies 16 copies total 969.8M 968.2M 968.2M 968.2M used 20.0M 21.9M 24.1M 26.7M free 936.9M 933.6M 931.4M 928.6M available 932.2M 928.8M 926.6M 923.9M Co-developed-by: Xiu Jianfeng Signed-off-by: Xiu Jianfeng Signed-off-by: GONG, Ruiqi Reviewed-by: Kees Cook Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: Dennis Zhou # percpu Signed-off-by: Vlastimil Babka

Merge tag 'slab-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

2023-06-29T23:34:12+00:00

Pull slab updates from Vlastimil Babka: - SLAB deprecation: Following the discussion at LSF/MM 2023 [1] and no objections, the SLAB allocator is deprecated by renaming the config option (to make its users notice) to CONFIG_SLAB_DEPRECATED with updated help text. SLUB should be used instead. Existing defconfigs with CONFIG_SLAB are also updated. - SLAB_NO_MERGE kmem_cache flag (Jesper Dangaard Brouer): There are (very limited) cases where kmem_cache merging is undesirable, and existing ways to prevent it are hacky. Introduce a new flag to do that cleanly and convert the existing hacky users. Btrfs plans to use this for debug kernel builds (that use case is always fine), networking for performance reasons (that should be very rare). - Replace the usage of weak PRNGs (David Keisar Schmidt): In addition to using stronger RNGs for the security related features, the code is a bit cleaner. - Misc code cleanups (SeongJae Parki, Xiongwei Song, Zhen Lei, and zhaoxinchao) Link: https://lwn.net/Articles/932201/ [1] * tag 'slab-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/slab_common: use SLAB_NO_MERGE instead of negative refcount mm/slab: break up RCU readers on SLAB_TYPESAFE_BY_RCU example code mm/slab: add a missing semicolon on SLAB_TYPESAFE_BY_RCU example code mm/slab_common: reduce an if statement in create_cache() mm/slab: introduce kmem_cache flag SLAB_NO_MERGE mm/slab: rename CONFIG_SLAB to CONFIG_SLAB_DEPRECATED mm/slab: remove HAVE_HARDENED_USERCOPY_ALLOCATOR mm/slab_common: Replace invocation of weak PRNG mm/slab: Replace invocation of weak PRNG slub: Don't read nr_slabs and total_objects directly slub: Remove slabs_node() function slub: Remove CONFIG_SMP defined check slub: Put objects_show() into CONFIG_SLUB_DEBUG enabled block slub: Correct the error code when slab_kset is NULL mm/slab: correct return values in comment for _kmem_cache_create()

mm/slab: simplify create_kmalloc_cache() args and make it static

2023-06-19T23:19:20+00:00

In the slab variant of kmem_cache_init(), call new_kmalloc_cache() instead of initialising the kmalloc_caches array directly. With this, create_kmalloc_cache() is now only called from new_kmalloc_cache() in the same file, so make it static. In addition, the useroffset argument is always 0 while usersize is the same as size. Remove them. Link: https://lkml.kernel.org/r/20230612153201.554742-4-catalin.marinas@arm.com Signed-off-by: Catalin Marinas Reviewed-by: Vlastimil Babka Tested-by: Isaac J. Manjarres Cc: Alasdair Kergon Cc: Ard Biesheuvel Cc: Arnd Bergmann Cc: Christoph Hellwig Cc: Daniel Vetter Cc: Greg Kroah-Hartman Cc: Herbert Xu Cc: Jerry Snitselaar Cc: Joerg Roedel Cc: Jonathan Cameron Cc: Jonathan Cameron Cc: Lars-Peter Clausen Cc: Logan Gunthorpe Cc: Marc Zyngier Cc: Mark Brown Cc: Mike Snitzer Cc: "Rafael J. Wysocki" Cc: Robin Murphy Cc: Saravana Kannan Cc: Will Deacon Signed-off-by: Andrew Morton

Merge branches 'slab/for-6.5/prandom', 'slab/for-6.5/slab_no_merge' and 'slab/for-6.5/slab-deprecate' into slab/for-next

2023-06-16T09:05:59+00:00

Merge the feature branches scheduled for 6.5: - replace the usage of weak PRNGs, by David Keisar Schmidt - introduce the SLAB_NO_MERGE kmem_cache flag, by Jesper Dangaard Brouer - deprecate CONFIG_SLAB, with a planned removal, by myself

mm/slab: Replace invocation of weak PRNG

2023-05-22T13:22:08+00:00

The Slab allocator randomization uses the prandom_u32 PRNG. That was added to prevent attackers to obtain information on the heap state, by randomizing the freelists state. However, this PRNG turned out to be weak, as noted in commit c51f8f88d705 To fix it, we have changed the invocation of prandom_u32_state to get_random_u32 to ensure the PRNG is strong. Since a modulo operation is applied right after that, we used get_random_u32_below, to achieve uniformity. In addition, we changed the freelist_init_state union to struct, since the rnd_state inside which is used to store the state of prandom_u32, is not needed anymore, since get_random_u32 maintains its own state. Signed-off-by: David Keisar Schmidt Signed-off-by: Vlastimil Babka

mm/slab: correct return values in comment for _kmem_cache_create()

2023-05-22T13:17:19+00:00

__kmem_cache_create() returns 0 on success and non-zero on failure. The comment is wrong in two instances, so fix the first one and remove the second one. Also make the comment non-doc, because it doesn't describe an API function, but SLAB-specific implementation. Signed-off-by: zhaoxinchao Signed-off-by: Vlastimil Babka

mm: vmscan: refactor updating current->reclaim_state

2023-04-18T23:30:10+00:00

During reclaim, we keep track of pages reclaimed from other means than LRU-based reclaim through scan_control->reclaim_state->reclaimed_slab, which we stash a pointer to in current task_struct. However, we keep track of more than just reclaimed slab pages through this. We also use it for clean file pages dropped through pruned inodes, and xfs buffer pages freed. Rename reclaimed_slab to reclaimed, and add a helper function that wraps updating it through current, so that future changes to this logic are contained within include/linux/swap.h. Link: https://lkml.kernel.org/r/20230413104034.1086717-4-yosryahmed@google.com Signed-off-by: Yosry Ahmed Acked-by: Michal Hocko Cc: Alexander Viro Cc: Christoph Lameter Cc: Darrick J. Wong Cc: Dave Chinner Cc: David Hildenbrand Cc: David Rientjes Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Johannes Weiner Cc: Joonsoo Kim Cc: Matthew Wilcox Cc: Miaohe Lin Cc: NeilBrown Cc: Peter Xu Cc: Roman Gushchin Cc: Shakeel Butt Cc: Tim Chen Cc: Vlastimil Babka Cc: Yu Zhao Signed-off-by: Andrew Morton

mm, treewide: redefine MAX_ORDER sanely

2023-04-06T02:42:46+00:00

MAX_ORDER currently defined as number of orders page allocator supports: user can ask buddy allocator for page order between 0 and MAX_ORDER-1. This definition is counter-intuitive and lead to number of bugs all over the kernel. Change the definition of MAX_ORDER to be inclusive: the range of orders user can ask from buddy allocator is 0..MAX_ORDER now. [kirill@shutemov.name: fix min() warning] Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box [akpm@linux-foundation.org: fix another min_t warning] [kirill@shutemov.name: fixups per Zi Yan] Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name [akpm@linux-foundation.org: fix underlining in docs] Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/ Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov Reviewed-by: Michael Ellerman [powerpc] Cc: "Kirill A. Shutemov" Cc: Zi Yan Signed-off-by: Andrew Morton

Merge tag 'slab-fix-for-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

2023-03-24T17:12:14+00:00

Pull slab fix from Vlastimil Babka: "A single build fix for a corner case configuration that is apparently possible to achieve on some arches, from Geert" * tag 'slab-fix-for-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/slab: Fix undefined init_cache_node_node() for NUMA and !SMP

mm/slab: Fix undefined init_cache_node_node() for NUMA and !SMP

2023-03-22T11:11:43+00:00

sh/migor_defconfig: mm/slab.c: In function ‘slab_memory_callback’: mm/slab.c:1127:23: error: implicit declaration of function ‘init_cache_node_node’; did you mean ‘drain_cache_node_node’? [-Werror=implicit-function-declaration] 1127 | ret = init_cache_node_node(nid); | ^~~~~~~~~~~~~~~~~~~~ | drain_cache_node_node The #ifdef condition protecting the definition of init_cache_node_node() no longer matches the conditions protecting the (multiple) users. Fix this by syncing the conditions. Fixes: 76af6a054da40553 ("mm/migrate: add CPU hotplug to demotion #ifdef") Reported-by: Randy Dunlap Link: https://lore.kernel.org/r/b5bdea22-ed2f-3187-6efe-0c72330270a4@infradead.org Signed-off-by: Geert Uytterhoeven Reviewed-by: John Paul Adrian Glaubitz Acked-by: Randy Dunlap Cc: Signed-off-by: Vlastimil Babka