|
Slab objects that are allocated with kmalloc_nolock() must be freed
using kfree_nolock() because only a subset of alloc hooks are called,
since kmalloc_nolock() can't spin on a lock during allocation.
This imposes a limitation: such objects cannot be freed with kfree_rcu(),
forcing users to work around this limitation by calling call_rcu()
with a callback that frees the object using kfree_nolock().
Remove this limitation by teaching kmemleak to gracefully ignore cases
when kmemleak_free() or kmemleak_ignore() is called without a prior
kmemleak_alloc().
Unlike kmemleak, kfence already handles this case because, by design,
only a subset of allocations are served from kfence.
With this change, kfree() and kfree_rcu() can be used to free objects
that are allocated using kmalloc_nolock().
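For illustration, a rough before/after sketch of what this enables; the
struct foo object and the helper names are hypothetical, not taken from
any patch:

  struct foo {
          struct rcu_head rcu;
          int data;
  };

  /* before: a manual call_rcu() callback was needed */
  static void foo_free_rcu(struct rcu_head *head)
  {
          kfree_nolock(container_of(head, struct foo, rcu));
  }

  static void foo_release_old(struct foo *f)
  {
          call_rcu(&f->rcu, foo_free_rcu);
  }

  /* after this change: the regular helper can be used directly */
  static void foo_release_new(struct foo *f)
  {
          kfree_rcu(f, rcu);
  }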
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260210044642.139482-2-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
When CONFIG_SLAB_FREELIST_RANDOM is enabled and get_random_u32()
is called in an NMI context, lockdep complains because it acquires
a local_lock:
================================
WARNING: inconsistent lock state
6.19.0-rc5-slab-for-next+ #325 Tainted: G N
--------------------------------
inconsistent {INITIAL USE} -> {IN-NMI} usage.
kunit_try_catch/8312 [HC2[2]:SC0[0]:HE0:SE1] takes:
ffff88a02ec49cc0 (batched_entropy_u32.lock){-.-.}-{3:3}, at: get_random_u32+0x7f/0x2e0
{INITIAL USE} state was registered at:
lock_acquire+0xd9/0x2f0
get_random_u32+0x93/0x2e0
__get_random_u32_below+0x17/0x70
cache_random_seq_create+0x121/0x1c0
init_cache_random_seq+0x5d/0x110
do_kmem_cache_create+0x1e0/0xa30
__kmem_cache_create_args+0x4ec/0x830
create_kmalloc_caches+0xe6/0x130
kmem_cache_init+0x1b1/0x660
mm_core_init+0x1d8/0x4b0
start_kernel+0x620/0xcd0
x86_64_start_reservations+0x18/0x30
x86_64_start_kernel+0xf3/0x140
common_startup_64+0x13e/0x148
irq event stamp: 76
hardirqs last enabled at (75): [<ffffffff8298b77a>] exc_nmi+0x11a/0x240
hardirqs last disabled at (76): [<ffffffff8298b991>] sysvec_irq_work+0x11/0x110
softirqs last enabled at (0): [<ffffffff813b2dda>] copy_process+0xc7a/0x2350
softirqs last disabled at (0): [<0000000000000000>] 0x0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(batched_entropy_u32.lock);
<Interrupt>
lock(batched_entropy_u32.lock);
*** DEADLOCK ***
Fix this by using a pseudo-random number generator if !allow_spin.
This means kmalloc_nolock() users won't get truly random numbers,
but there is not much we can do about it.
Note that an NMI handler might interrupt prandom_u32_state() and
change the random state, but that's safe.
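A minimal sketch of the described fallback (not the actual patch; the
per-CPU state, its seeding, and the helper name are assumptions):

  static DEFINE_PER_CPU(struct rnd_state, slab_rnd_state);

  static u32 slab_random_u32(bool allow_spin)
  {
          if (allow_spin)
                  return get_random_u32();
          /*
           * NMI/nolock path: prandom_u32_state() takes no locks. A racing
           * NMI may advance the per-CPU state concurrently, which is
           * harmless for freelist randomization.
           */
          return prandom_u32_state(this_cpu_ptr(&slab_rnd_state));
  }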
Link: https://lore.kernel.org/all/0c33bdee-6de8-4d9f-92ca-4f72c1b6fb9f@suse.cz
Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
Cc: stable@vger.kernel.org
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260210081900.329447-3-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Lockdep complains when get_from_any_partial() is called in an NMI
context, because current->mems_allowed_seq is seqcount_spinlock_t and
not NMI-safe:
================================
WARNING: inconsistent lock state
6.19.0-rc5-kfree-rcu+ #315 Tainted: G N
--------------------------------
inconsistent {INITIAL USE} -> {IN-NMI} usage.
kunit_try_catch/9989 [HC1[1]:SC0[0]:HE0:SE1] takes:
ffff889085799820 (&____s->seqcount#3){.-.-}-{0:0}, at: ___slab_alloc+0x58f/0xc00
{INITIAL USE} state was registered at:
lock_acquire+0x185/0x320
kernel_init_freeable+0x391/0x1150
kernel_init+0x1f/0x220
ret_from_fork+0x736/0x8f0
ret_from_fork_asm+0x1a/0x30
irq event stamp: 56
hardirqs last enabled at (55): [<ffffffff850a68d7>] _raw_spin_unlock_irq+0x27/0x70
hardirqs last disabled at (56): [<ffffffff850858ca>] __schedule+0x2a8a/0x6630
softirqs last enabled at (0): [<ffffffff81536711>] copy_process+0x1dc1/0x6a10
softirqs last disabled at (0): [<0000000000000000>] 0x0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&____s->seqcount#3);
<Interrupt>
lock(&____s->seqcount#3);
*** DEADLOCK ***
According to Documentation/locking/seqlock.rst, seqcount_t is not
NMI-safe and seqcount_latch_t should be used when the read path can interrupt
the write-side critical section. In this case, do not access
current->mems_allowed_seq and avoid retry.
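Roughly, the change has this shape (sketch only; it follows the existing
get_any_partial() pattern, and the allow_spin plumbing is assumed):

  unsigned int cpuset_mems_cookie;

  do {
          /* seqcount_spinlock_t is not NMI-safe: only sample it if we may spin */
          cpuset_mems_cookie = allow_spin ? read_mems_allowed_begin() : 0;

          /* ... scan the other nodes' partial lists ... */

  } while (allow_spin && read_mems_allowed_retry(cpuset_mems_cookie));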
Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
Cc: stable@vger.kernel.org
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260210081900.329447-2-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Merge series "slab: replace cpu (partial) slabs with sheaves".
The percpu sheaves caching layer was introduced as opt-in but the goal
was to eventually move all caches to them. This is the next step,
enabling sheaves for all caches (except the two bootstrap ones) and then
removing the per cpu (partial) slabs and lots of associated code.
Besides the lower locking overhead and much more likely fastpath when
freeing, this removes the rather complicated code related to the cpu
slab lockless fastpaths (using this_cpu_try_cmpxchg128/64) and all its
complications for PREEMPT_RT or kmalloc_nolock().
The lockless slab freelist+counters update operation using
try_cmpxchg128/64 remains and is crucial for freeing remote NUMA objects
and to allow flushing objects from sheaves to slabs mostly without the
node list_lock.
Link: https://lore.kernel.org/all/20260123-sheaves-for-all-v4-0-041323d506f7@suse.cz/
|
|
SLAB_NO_OBJ_EXT is set for boot caches, but need_slab_obj_exts() doesn't
check this flag. We should return false unconditionally when
SLAB_NO_OBJ_EXT is set.
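The shape of the fix is roughly the following (sketch; the actual
function in mm/slub.c may differ in signature and surrounding checks):

  static inline bool need_slab_obj_exts(struct kmem_cache *s)
  {
          if (s->flags & SLAB_NO_OBJ_EXT)
                  return false;
          /* the existing conditions (e.g. allocation profiling) stay as-is */
          return mem_alloc_profiling_enabled();
  }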
Signed-off-by: Hao Li <hao.li@linux.dev>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260205120709.425719-1-hao.li@linux.dev
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
While SLAB_OBJ_EXT_IN_OBJ reduces the memory overhead of accounting
slab objects, it prevents slab merging because merging can change
the metadata layout.
As pointed out by Vlastimil Babka, disabling merging solely for this memory
optimization may not be a net win, because disabling slab merging tends
to increase overall memory usage.
Restrict SLAB_OBJ_EXT_IN_OBJ to caches that are already unmergeable for
other reasons (e.g., those with constructors or SLAB_TYPESAFE_BY_RCU).
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260127103151.21883-3-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
When a cache has a high s->align value and s->object_size is not aligned
to it, each object ends up with some unused space because of alignment.
If this wasted space is big enough, we can use it to store the
slabobj_ext metadata instead of wasting it.
On my system, this happens with caches like kmem_cache, mm_struct, pid,
task_struct, sighand_cache, xfs_inode, and others.
To place the slabobj_ext metadata within each object, the existing
slab_obj_ext() logic can still be used by setting:
- slab->obj_exts = slab_address(slab) + (slabobj_ext offset)
- stride = s->size
slab_obj_ext() doesn't need to know where the metadata is stored,
so this method works without adding extra overhead to slab_obj_ext().
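As a sketch of the setup described above (the offset variable is
illustrative, and the obj_exts flag bits are omitted for clarity):

  /* point obj_exts into the slab itself */
  slab->obj_exts = (unsigned long)slab_address(slab) + obj_exts_offset;

  /* with stride == s->size, element i resolves to the area inside object i */
  struct slabobj_ext *ext = (struct slabobj_ext *)(slab->obj_exts + i * s->size);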
A good example benefiting from this optimization is xfs_inode
(object_size: 992, align: 64). To measure memory savings, 2 million
files were created on XFS.
[ MEMCG=y, MEM_ALLOC_PROFILING=n ]
Before patch (creating ~2.64M directories on xfs):
Slab: 5175976 kB
SReclaimable: 3837524 kB
SUnreclaim: 1338452 kB
After patch (creating ~2.64M directories on xfs):
Slab: 5152912 kB
SReclaimable: 3838568 kB
SUnreclaim: 1314344 kB (-23.54 MiB)
Enjoy the memory savings!
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260113061845.159790-10-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
To access SLUB's internal implementation details beyond cache flags in
ksize(), move __ksize(), ksize(), and slab_ksize() to mm/slub.c.
[vbabka@suse.cz: also make __ksize() static and move its kerneldoc to
ksize() ]
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260113061845.159790-9-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
The leftover space in a slab is always smaller than s->size, and
kmem caches for large objects that are not power-of-two sizes tend to have
a greater amount of leftover space per slab. In some cases, the leftover
space is larger than the size of the slabobj_ext array for the slab.
An excellent example of such a cache is ext4_inode_cache. On my system,
the object size is 1136, with a preferred order of 3, 28 objects per slab,
and 960 bytes of leftover space per slab.
Since the size of the slabobj_ext array is only 224 bytes (w/o mem
profiling) or 448 bytes (w/ mem profiling) per slab, the entire array
fits within the leftover space.
Allocate the slabobj_exts array from this unused space instead of using
kcalloc() when it is large enough. The array is allocated from unused
space only when creating new slabs, and it doesn't try to utilize unused
space if alloc_slab_obj_exts() is called after slab creation because
implementing lazy allocation involves more expensive synchronization.
The implementation and evaluation of lazy allocation from unused space
is left as future work. As pointed out by Vlastimil Babka [1], it could be
beneficial when a slab cache without SLAB_ACCOUNT can be created, and
some of the allocations from the cache use __GFP_ACCOUNT. For example,
xarray does that.
To avoid unnecessary overhead when MEMCG (with SLAB_ACCOUNT) and
MEM_ALLOC_PROFILING are not used for the cache, allocate the slabobj_ext
array only when either of them is enabled on slab allocation.
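A back-of-the-envelope version of the check described above, using the
ext4_inode_cache numbers as an assumed example (helper names as in
mm/slub.c; the exact condition in the patch may differ):

  unsigned int slab_bytes  = PAGE_SIZE << oo_order(s->oo);              /* 32768 */
  unsigned int leftover    = slab_bytes - oo_objects(s->oo) * s->size;  /* 960 */
  size_t       array_bytes = oo_objects(s->oo) * sizeof(struct slabobj_ext);
  bool         fits        = leftover >= array_bytes;  /* 224 (448 w/ profiling) */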
[ MEMCG=y, MEM_ALLOC_PROFILING=n ]
Before patch (creating ~2.64M directories on ext4):
Slab: 4747880 kB
SReclaimable: 4169652 kB
SUnreclaim: 578228 kB
After patch (creating ~2.64M directories on ext4):
Slab: 4724020 kB
SReclaimable: 4169188 kB
SUnreclaim: 554832 kB (-22.84 MiB)
Enjoy the memory savings!
Link: https://lore.kernel.org/linux-mm/48029aab-20ea-4d90-bfd1-255592b2018e@suse.cz [1]
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260113061845.159790-8-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
In the near future, slabobj_ext may reside outside the allocated slab
object range within a slab, which could be reported as an out-of-bounds
access by KASAN.
As suggested by Andrey Konovalov [1], explicitly disable KASAN and KMSAN
checks when accessing slabobj_ext within slab allocator, memory profiling,
and memory cgroup code. While an alternative approach could be to unpoison
slabobj_ext, out-of-bounds accesses outside the slab allocator are
generally more common.
Move metadata_access_enable()/disable() helpers to mm/slab.h so that
they can be used outside mm/slub.c. However, as suggested by Suren
Baghdasaryan [2], instead of calling them directly from mm code (which is
more prone to errors), change users to access slabobj_ext via get/put
APIs:
- Users should call get_slab_obj_exts() to access slabobj_ext metadata
and call put_slab_obj_exts() when it's done.
- From now on, accessing it outside the section covered by
get_slab_obj_exts() ~ put_slab_obj_exts() is illegal.
This ensures that accesses to slabobj_ext metadata won't be reported
as access violations.
Call kasan_reset_tag() in slab_obj_ext() before returning the address to
prevent SW or HW tag-based KASAN from reporting false positives.
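A usage sketch of the API described above (the exact signatures, the
obj_index variable and the field accesses are assumptions, not the
patch itself):

  unsigned long obj_exts = get_slab_obj_exts(slab);   /* opens the access window */

  if (obj_exts) {
          struct slabobj_ext *ext = slab_obj_ext(slab, obj_exts, obj_index);

          /* read or update ext->objcg / ext->ref here */
  }
  put_slab_obj_exts(slab);                             /* closes the window */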
Suggested-by: Andrey Konovalov <andreyknvl@gmail.com>
Suggested-by: Suren Baghdasaryan <surenb@google.com>
Link: https://lore.kernel.org/linux-mm/CA+fCnZezoWn40BaS3cgmCeLwjT+5AndzcQLc=wH3BjMCu6_YCw@mail.gmail.com [1]
Link: https://lore.kernel.org/linux-mm/CAJuCfpG=Lb4WhYuPkSpdNO4Ehtjm1YcEEK0OM=3g9i=LxmpHSQ@mail.gmail.com [2]
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260113061845.159790-7-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Use a configurable stride value when accessing slab object extension
metadata instead of assuming a fixed sizeof(struct slabobj_ext).
Store the stride value in free bits of the slab->counters field. This
allows for flexibility in cases where the extension is embedded within
slab objects.
Since these free bits exist only on 64-bit, any future optimizations
that need to change the stride value cannot be enabled on 32-bit
architectures.
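Conceptually, the accessor then looks roughly like this (sketch; the
stride helper name is an assumption):

  static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
                                                 unsigned long obj_exts,
                                                 unsigned int index)
  {
          /* stride is sizeof(struct slabobj_ext) for a plain array, or
           * s->size when the metadata is embedded in the objects */
          return (struct slabobj_ext *)(obj_exts + index * obj_ext_stride(slab));
  }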
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260113061845.159790-6-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Currently, the slab allocator assumes that slab->obj_exts is a pointer
to an array of struct slabobj_ext objects. However, to support storage
methods where struct slabobj_ext is embedded within objects, the slab
allocator should not make this assumption. Instead of directly
dereferencing the slabobj_exts array, abstract access to
struct slabobj_ext via helper functions.
Introduce a new API for slabobj_ext metadata access:
slab_obj_ext(slab, obj_exts, index) - returns the pointer to the
struct slabobj_ext element at the given index.
Directly dereferencing the return value of slab_obj_exts() is no longer
allowed. Instead, slab_obj_ext() must always be used to access
individual struct slabobj_ext objects.
Convert all users to use these APIs.
No functional changes intended.
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260113061845.159790-5-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Convert ext4_inode_cache to use the kmem_cache_args interface and
specify a free pointer offset.
Since ext4_inode_cache uses a constructor, the free pointer would be
placed after the object to prevent overwriting fields used by the
constructor.
However, some fields such as ->i_flags are not used by the constructor
and can safely be repurposed for the free pointer.
Specify the free pointer offset at i_flags to reduce the object size.
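An illustrative sketch of the conversion (the usercopy region and other
details are elided; names and flags are assumptions based on fs/ext4):

  struct kmem_cache_args args = {
          .ctor               = init_once,
          .freeptr_offset     = offsetof(struct ext4_inode_info, i_flags),
          .use_freeptr_offset = true,
  };

  ext4_inode_cachep = kmem_cache_create("ext4_inode_cache",
                                        sizeof(struct ext4_inode_info), &args,
                                        SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT);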
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260113061845.159790-4-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
When a slab cache has a constructor, the free pointer is placed after the
object because certain fields must not be overwritten even after the
object is freed.
However, some fields that the constructor does not initialize can safely
be overwritten after free. Allow specifying the free pointer offset within
the object, reducing the overall object size when some fields can be reused
for the free pointer.
Adjust the documentation accordingly.
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260113061845.159790-3-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
When both KASAN and SLAB_STORE_USER are enabled, accesses to
struct kasan_alloc_meta fields can be misaligned on 64-bit architectures.
This occurs because orig_size is currently defined as unsigned int,
which only guarantees 4-byte alignment. When struct kasan_alloc_meta is
placed after orig_size, it may end up at a 4-byte boundary rather than
the required 8-byte boundary on 64-bit systems.
Note that 64-bit architectures without HAVE_EFFICIENT_UNALIGNED_ACCESS
are assumed to require 64-bit accesses to be 64-bit aligned.
See HAVE_64BIT_ALIGNED_ACCESS and commit adab66b71abf ("Revert:
"ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"") for more details.
Change orig_size from unsigned int to unsigned long to ensure proper
alignment for any subsequent metadata. This should not waste additional
memory because kmalloc objects are already aligned to at least
ARCH_KMALLOC_MINALIGN.
Suggested-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: stable@vger.kernel.org
Fixes: 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc")
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Closes: https://lore.kernel.org/all/aPrLF0OUK651M4dk@hyeyoo/
Link: https://patch.msgid.link/20260113061845.159790-2-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
The comments above check_pad_bytes() document the field layout of a
single object. Rewrite them to improve clarity and precision.
Also update an outdated comment in calculate_sizes().
Suggested-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Hao Li <hao.li@linux.dev>
Link: https://patch.msgid.link/20251229122415.192377-1-hao.li@linux.dev
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
When allocating slabobj_ext array in alloc_slab_obj_exts(), the array
can be allocated from the same slab we're allocating the array for.
This led to obj_exts_in_slab() incorrectly returning true [1],
although the array is not allocated from wasted space of the slab.
Vlastimil Babka observed that this problem should be fixed even when
ignoring its incompatibility with obj_exts_in_slab(), because it creates
slabs that are never freed as there is always at least one allocated
object.
To avoid this, use the next kmalloc size or large kmalloc when
the array can be allocated from the same cache we're allocating
the array for.
In case of random kmalloc caches, there are multiple kmalloc caches
for the same size and the cache is selected based on the caller address.
Because it is fragile to ensure the same caller address is passed to
kmalloc_slab(), kmalloc_noprof(), and kmalloc_node_noprof(), bump the
size to (s->object_size + 1) when the sizes are equal, instead of
directly comparing the kmem_cache pointers.
Note that this doesn't happen when memory allocation profiling is
disabled, as when the allocation of the array is triggered by memory
cgroup (KMALLOC_CGROUP), the array is allocated from KMALLOC_NORMAL.
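The idea, sketched (simplified; the real check compares the size of the
kmalloc bucket that would serve the array against s->object_size):

  size_t sz = objects * sizeof(struct slabobj_ext);

  if (is_kmalloc_cache(s) && kmalloc_size_roundup(sz) == s->object_size)
          sz = s->object_size + 1;  /* next kmalloc size or large kmalloc */

  /* sz is then passed to the kmalloc call allocating the array */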
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202601231457.f7b31e09-lkp@intel.com [1]
Cc: stable@vger.kernel.org
Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths")
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260126125714.88008-1-harry.yoo@oracle.com
Reviewed-by: Hao Li <hao.li@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
The kernel test robot has reported a regression in the patch "slab: refill
sheaves from all nodes". When taken in isolation like this, there is
indeed a tradeoff - we prefer to use remote objects prior to allocating
new local slabs. This replicates the behavior that existed before
sheaves for replenishing cpu (partial) slabs - now
get_from_any_partial(), which allocates a single object.
So the possibility of allocating remote objects is intended even if
remote accesses are then slower. But the profiles in the report also
suggested contention on the list_lock spinlock. And that's something
we can try to avoid without much tradeoff - if someone else holds the
spin_lock, it's more likely they are allocating from the node than
freeing to it, so we can skip it even if it means allocating a new local
slab - contributing to that lock's contention isn't worth it. It should
not result in partial slabs accumulating on the remote node.
Thus add an allow_spin parameter to __refill_objects_node() and
get_partial_node_bulk() to make the attempts from __refill_objects_any()
use only a trylock.
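The shape of the change, as a sketch (the parameter plumbing and the
surrounding node scan are omitted):

  if (allow_spin) {
          spin_lock_irqsave(&n->list_lock, flags);
  } else if (!spin_trylock_irqsave(&n->list_lock, flags)) {
          /* contended remote node: skip it rather than add to the contention */
          return 0;
  }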
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/oe-lkp/202601132136.77efd6d7-lkp@intel.com
Link: https://patch.msgid.link/20260129-b4-refill_any_trylock-v1-1-de7420b25840@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
A number of stat items related to cpu slabs became unused; remove them.
Two of those were ALLOC_FASTPATH and FREE_FASTPATH. But instead of
removing those, use them in place of ALLOC_PCS and FREE_PCS, since
sheaves are the new (and only) fastpaths, and remove the recently added
_PCS variants instead.
Change where FREE_SLOWPATH is counted so that it only counts freeing of
objects by slab users that (for whatever reason) do not go to a percpu
sheaf, and not all (including internal) callers of __slab_free(). Thus
sheaf flushing (already counted by SHEAF_FLUSH) does not affect
FREE_SLOWPATH anymore. This matches how ALLOC_SLOWPATH doesn't count
sheaf refills (counted by SHEAF_REFILL).
Reviewed-by: Hao Li <hao.li@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
The cpu slabs and their deactivations were removed, so remove the unused
stat items. Weirdly enough, the values were also used to control whether
__add_partial() adds to the head or tail of the list, so replace that with
a new enum add_mode, which is cleaner.
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Currently slabs are only frozen after consistency checks have failed. This
can happen only in caches with debugging enabled, and those use
free_to_partial_list() for freeing. The non-debug operation of
__slab_free() can thus stop considering the frozen field, and we can
remove the FREE_FROZEN stat.
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
The changes related to sheaves made the description of locking and other
details outdated. Update it to reflect the current state.
Also add a new copyright line due to major changes.
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
__refill_objects() currently only attempts to get partial slabs from the
local node and then allocates new slab(s). Expand it to also try
other nodes while observing the remote node defrag ratio, similarly to
get_any_partial().
This will prevent allocating new slabs on a node while other nodes have
many free slabs. It does mean sheaves will contain non-local objects in
that case. Allocations that care about a specific node will still be
served appropriately, but might get a slowpath allocation.
Like get_any_partial(), we observe cpuset_zone_allowed(), although we
might be refilling a sheaf that will then be used from a different
allocation context.
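Sketched, the any-node pass follows the existing get_any_partial()
pattern (simplified; the per-node refill call is only indicated by a
comment since its exact signature is not shown here):

  struct zonelist *zonelist = node_zonelist(mempolicy_slab_node(), gfp);
  struct zoneref *z;
  struct zone *zone;

  for_each_zone_zonelist(zone, z, zonelist, gfp_zone(gfp)) {
          struct kmem_cache_node *n = get_node(s, zone_to_nid(zone));

          if (n && cpuset_zone_allowed(zone, gfp) &&
              n->nr_partial > s->min_partial) {
                  /* try __refill_objects_node() on this node */
          }
  }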
We can also use the resulting refill_objects() in
__kmem_cache_alloc_bulk() for non-debug caches. This means
kmem_cache_alloc_bulk() will get better performance when sheaves are
exhausted. kmem_cache_alloc_bulk() cannot indicate a preferred node so
it's compatible with sheaves refill in preferring the local node.
Its users also have gfp flags that allow spinning, so document that
as a requirement.
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Hao Li <hao.li@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
The macros slub_get_cpu_ptr()/slub_put_cpu_ptr() are now unused, remove
them. USE_LOCKLESS_FAST_PATH() has lost its true meaning with the code
being removed. The only remaining usage is in fact testing whether we
can assert irqs disabled, because spin_lock_irqsave() only does that on
!RT. Test for CONFIG_PREEMPT_RT instead.
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
The cpu slab is not used anymore for allocation or freeing; the
remaining code is for flushing, but it's effectively dead. Remove the
whole struct kmem_cache_cpu, the flushing code and other orphaned
functions.
The only remaining used field of kmem_cache_cpu is the stat array with
CONFIG_SLUB_STATS. Put it instead in a new struct kmem_cache_stats.
In struct kmem_cache, the field is named cpu_stats and is placed near
the end of the struct.
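A sketch of the resulting layout (field placement per the description;
other kmem_cache fields are elided):

  struct kmem_cache_stats {
          unsigned int stat[NR_SLUB_STAT_ITEMS];
  };

  struct kmem_cache {
          /* ... existing fields ... */
  #ifdef CONFIG_SLUB_STATS
          struct kmem_cache_stats __percpu *cpu_stats;  /* near the end */
  #endif
  };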
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
The kmalloc_nolock() implementation has several complications and
restrictions due to SLUB's cpu slab locking, lockless fastpath and
PREEMPT_RT differences. With cpu slab usage removed, we can simplify
things:
- relax the PREEMPT_RT context checks as they were before commit
99a3e3a1cfc9 ("slab: fix kmalloc_nolock() context check for
PREEMPT_RT") and also reference the explanation comment in the page
allocator
- the local_lock_cpu_slab() macros became unused, remove them
- we no longer need to set up lockdep classes on PREEMPT_RT
- we no longer need to annotate ___slab_alloc as NOKPROBE_SYMBOL
since there's no lockless cpu freelist manipulation anymore
- __slab_alloc_node() can be called from kmalloc_nolock_noprof()
unconditionally. It can also no longer return EBUSY. But trylock
failures can still happen so retry with the larger bucket if the
allocation fails for any reason.
Note that we still need __CMPXCHG_DOUBLE: while we no longer use
cmpxchg16b on the cpu freelist, we still use it on the slab freelist,
and the alternative is slab_lock(), which can be interrupted by an NMI.
Clarify the comment to mention it specifically.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
There are no more cpu slabs so we don't need their deferred
deactivation. The function is now only used from places where we
allocate a new slab but then can't spin on node list_lock to put it on
the partial list. Instead of the deferred action we can free it directly
via __free_slab(); we just need to tell it to use _nolock() freeing of
the underlying pages and take care of the accounting.
Since free_frozen_pages_nolock() variant does not yet exist for code
outside of the page allocator, create it as a trivial wrapper for
__free_frozen_pages(..., FPI_TRYLOCK).
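The wrapper described above is essentially (sketch; exact signature
assumed to mirror free_frozen_pages()):

  void free_frozen_pages_nolock(struct page *page, unsigned int order)
  {
          __free_frozen_pages(page, order, FPI_TRYLOCK);
  }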
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
We have removed cpu slab usage from allocation paths. Now remove
do_slab_free() which was freeing objects to the cpu slab when
the object belonged to it. Instead call __slab_free() directly,
which was previously the fallback.
This simplifies kfree_nolock() - when freeing to percpu sheaf
fails, we can call defer_free() directly.
Also remove functions that became unused.
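The simplified tail of kfree_nolock() then reads roughly as follows
(sketch; the helper signatures are assumptions):

  /* try the percpu sheaf first; if that fails, defer the free */
  if (!free_to_pcs(s, x))
          defer_free(s, x);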
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
We have removed the partial slab usage from allocation paths. Now remove
the whole config option and associated code.
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
We now rely on sheaves as the percpu caching layer and can refill them
directly from partial or newly allocated slabs. Start removing the cpu
(partial) slabs code, first from allocation paths.
This means that any allocation not satisfied from percpu sheaves will
end up in ___slab_alloc(), where we remove the usage of cpu (partial)
slabs, so it will only perform get_partial() or new_slab(). In the
latter case we reuse alloc_from_new_slab() (when we don't use
the debug/tiny alloc_single_from_new_slab() variant).
In get_partial_node() we used to return a slab for freezing as the cpu
slab and to refill the cpu partial slabs. Now we only want to return a single
object and leave the slab on the list (unless it became full). We can't
simply reuse alloc_single_from_partial() as that assumes freeing uses
free_to_partial_list(). Instead we need to use __slab_update_freelist()
to work properly against a racing __slab_free().
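A heavily simplified sketch of taking one object while leaving the slab
on the list (the real code also handles removing a now-full slab from
the partial list):

  void *object, *freelist_new;
  unsigned long counters;
  struct slab new;

  do {
          object       = slab->freelist;
          counters     = slab->counters;
          freelist_new = get_freepointer(s, object);  /* pop one object */
          new.counters = counters;
          new.inuse++;
  } while (!__slab_update_freelist(s, slab, object, counters,
                                   freelist_new, new.counters,
                                   "get_from_partial_node"));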
To reflect the new purpose of get_partial() functions, rename them to
get_from_partial(), get_from_partial_node(), and get_from_any_partial().
The rest of the changes consists of removing functions that no longer have any
callers.
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
At this point we have sheaves enabled for all caches, but their refill
is done via __kmem_cache_alloc_bulk() which relies on cpu (partial)
slabs - now a redundant caching layer that we are about to remove.
The refill will thus be done from slabs on the node partial list.
Introduce new functions that can do that in an optimized way as it's
easier than modifying the __kmem_cache_alloc_bulk() call chain.
Introduce struct partial_bulk_context, a variant of struct
partial_context that can return a list of slabs from the partial list
with the sum of free objects in them within the requested min and max.
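The context struct might look roughly like this (the field names are
assumptions based on the description above):

  struct partial_bulk_context {
          struct list_head slabs;      /* slabs taken off the partial list */
          unsigned int min_objects;    /* requested minimum */
          unsigned int max_objects;    /* requested maximum */
          unsigned int nr_objects;     /* sum of free objects in the slabs */
  };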
Introduce get_partial_node_bulk() that removes the slabs from freelist
and returns them in the list. There is a racy read of slab->counters
so make sure the non-atomic write in __update_freelist_slow() is not
tearing.
Introduce get_freelist_nofreeze() which grabs the freelist without
freezing the slab.
Introduce alloc_from_new_slab() which can allocate multiple objects from
a newly allocated slab where we don't need to synchronize with freeing.
In some aspects it's similar to alloc_single_from_new_slab() but assumes
the cache is a non-debug one so it can avoid some actions. It supports
the allow_spin parameter, which we always set true here, but the
followup change will reuse the function in a context where it may be
false.
Introduce __refill_objects() that uses the functions above to fill an
array of objects. It has to handle the possibility that the slabs will
contain more objects than were requested, due to concurrent freeing of
objects to those slabs. When no more slabs on partial lists are
available, it will allocate new slabs. It is intended to be only used
in context where spinning is allowed, so add a WARN_ON_ONCE check there.
Finally, switch refill_sheaf() to use __refill_objects(). Sheaves are
only refilled from contexts that allow spinning, or even blocking.
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Enable sheaves for kmalloc caches. For other types than KMALLOC_NORMAL,
we can simply allow them in calculate_sizes() as they are created later
than KMALLOC_NORMAL caches and can allocate sheaves and barns from
those.
For KMALLOC_NORMAL caches we perform an additional step after first
creating them without sheaves. Then bootstrap_cache_sheaves() simply
allocates and initializes barns and sheaves and finally sets
s->sheaf_capacity to make them actually used.
Afterwards the only caches left without sheaves (unless SLUB_TINY or
debugging is enabled) are kmem_cache and kmem_cache_node. These are only
used when creating or destroying other kmem_caches. Thus they are not
performance critical and we can simply leave it that way.
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Before we enable percpu sheaves for kmalloc caches, we need to make sure
kmalloc_nolock() and kfree_nolock() will continue working properly and
not spin when not allowed to.
Percpu sheaves themselves use local_trylock() so they are already
compatible. We just need to be careful with the barn->lock spin_lock.
Pass a new allow_spin parameter where necessary to use
spin_trylock_irqsave().
In kmalloc_nolock_noprof() we can now attempt alloc_from_pcs() safely,
for now it will always fail until we enable sheaves for kmalloc caches
next. Similarly in kfree_nolock() we can attempt free_to_pcs().
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Until now, kmem_cache->cpu_sheaves was !NULL only for caches with
sheaves enabled. Since we want to enable them for almost all caches,
it's suboptimal to test the pointer in the fast paths, so instead
allocate it for all caches in do_kmem_cache_create(). Instead of testing
the cpu_sheaves pointer to recognize caches (yet) without sheaves, test
kmem_cache->sheaf_capacity for being 0, where needed, using a new
cache_has_sheaves() helper.
However, for the fast paths sake we also assume that the main sheaf
always exists (pcs->main is !NULL), and during bootstrap we cannot
allocate sheaves yet.
Solve this by introducing a single static bootstrap_sheaf that's
assigned as pcs->main during bootstrap. It has a size of 0, so during
allocations, the fast path will find it's empty. Since the size of 0
matches sheaf_capacity of 0, the freeing fast paths will find it's
"full". In the slow path handlers, we use cache_has_sheaves() to
recognize that the cache doesn't (yet) have real sheaves, and fall back.
Thus sharing the single bootstrap sheaf like this for multiple caches
and cpus is safe.
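Sketched, the bootstrap arrangement looks like this (simplified; exact
field layout per the sheaves code):

  static struct slab_sheaf bootstrap_sheaf;   /* .size == 0, no objects */

  static inline bool cache_has_sheaves(struct kmem_cache *s)
  {
          return s->sheaf_capacity != 0;
  }

  /* during early cache creation, before real sheaves can be allocated: */
  int cpu;

  for_each_possible_cpu(cpu)
          per_cpu_ptr(s->cpu_sheaves, cpu)->main = &bootstrap_sheaf;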
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
In the first step to replace cpu (partial) slabs with sheaves, enable
sheaves for almost all caches. Treat args->sheaf_capacity as a minimum,
and calculate sheaf capacity with a formula that roughly follows the
formula for number of objects in cpu partial slabs in set_cpu_partial().
This should achieve roughly similar contention on the barn spin lock as
there's currently for node list_lock without sheaves, to make
benchmarking results comparable. It can be further tuned later.
Don't enable sheaves for bootstrap caches as that wouldn't work. In
order to recognize them by SLAB_NO_OBJ_EXT, make sure the flag exists
even for !CONFIG_SLAB_OBJ_EXT.
This limitation will be lifted for kmalloc caches after the necessary
bootstrapping changes.
Also do not enable sheaves for SLAB_NOLEAKTRACE caches to avoid
recursion with kmemleak tracking (thanks to Breno Leitao).
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Hao Li <hao.li@linux.dev>
Tested-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Tested-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
When __pcs_replace_empty_main() fails to obtain a full sheaf directly
from the barn, it may either:
- Refill an empty sheaf obtained via barn_get_empty_sheaf(), or
- Allocate a brand new full sheaf via alloc_full_sheaf().
After reacquiring the per-CPU lock, if pcs->main is still empty and
pcs->spare is NULL, the current code donates the empty main sheaf to
the barn via barn_put_empty_sheaf() and installs the full sheaf as
pcs->main, leaving pcs->spare unpopulated.
Instead, keep the existing empty main sheaf locally as the spare:
pcs->spare = pcs->main;
pcs->main = full;
This populates pcs->spare earlier, which can reduce future barn traffic.
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Hao Li <haolee.swjtu@gmail.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Zhao Liu <zhao1.liu@intel.com>
|
|
slab_mergeable() determines whether a slab cache can be merged, but it
should not be used when the cache is not fully created yet.
Extract the pre-cache-creation mergeability checks into
slab_args_unmergeable(), which evaluates kmem_cache_args, slab flags,
and slab_nomerge to determine if a cache will be mergeable before it is
created.
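Roughly, the extracted check has this shape (sketch; the exact set of
conditions it evaluates is illustrative):

  static bool slab_args_unmergeable(slab_flags_t flags,
                                    const struct kmem_cache_args *args)
  {
          if (slab_nomerge || (flags & SLAB_NEVER_MERGE))
                  return true;

          return args->ctor || args->usersize;
  }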
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260127103151.21883-2-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Before enabling sheaves for all caches (with automatically determined
capacity), their enablement should no longer prevent merging of caches.
Limit this merge prevention only to caches that were created with a
specific sheaf capacity, by adding the SLAB_NO_MERGE flag to them.
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Move __kmem_cache_alias() to slab_common.c since it's called by
__kmem_cache_create_args() and calls find_mergeable(), both of which
are in this file. We can remove two slab.h declarations and make
them static. Instead declare sysfs_slab_alias() from slub.c so
that __kmem_cache_alias() can keep calling it.
Add an args parameter to __kmem_cache_alias() and find_mergeable() instead
of align and ctor. With that we can also move the checks for usersize
and sheaf_capacity there from __kmem_cache_create_args() and make the
result more symmetric with slab_unmergeable().
No functional changes intended.
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
All the debug flags prevent merging, except SLAB_CONSISTENCY_CHECKS. This
is suboptimal because this flag (like any debug flag) prevents the
usage of any fastpaths, and thus affects performance of any aliased
cache. Also, objects from an aliased cache other than the one specified
for debugging could interfere with the debugging efforts.
Fix this by adding the whole SLAB_DEBUG_FLAGS collection to
SLAB_NEVER_MERGE instead of individual debug flags, so it now also
includes SLAB_CONSISTENCY_CHECKS.
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
kvfree_call_rcu() can be called while holding a raw_spinlock_t.
Since __kfree_rcu_sheaf() may acquire a spinlock_t (which becomes a
sleeping lock on PREEMPT_RT) and violate lock nesting rules,
kvfree_call_rcu() bypasses the sheaves layer entirely on PREEMPT_RT.
However, lockdep still complains about acquiring spinlock_t while holding
raw_spinlock_t, even on !PREEMPT_RT where spinlock_t is a spinning lock.
This causes a false lockdep warning [1]:
=============================
[ BUG: Invalid wait context ]
6.19.0-rc6-next-20260120 #21508 Not tainted
-----------------------------
migration/1/23 is trying to lock:
ffff8afd01054e98 (&barn->lock){..-.}-{3:3}, at: barn_get_empty_sheaf+0x1d/0xb0
other info that might help us debug this:
context-{5:5}
3 locks held by migration/1/23:
#0: ffff8afd01fd89a8 (&p->pi_lock){-.-.}-{2:2}, at: __balance_push_cpu_stop+0x3f/0x200
#1: ffffffff9f15c5c8 (rcu_read_lock){....}-{1:3}, at: cpuset_cpus_allowed_fallback+0x27/0x250
#2: ffff8afd1f470be0 ((local_lock_t *)&pcs->lock){+.+.}-{3:3}, at: __kfree_rcu_sheaf+0x52/0x3d0
stack backtrace:
CPU: 1 UID: 0 PID: 23 Comm: migration/1 Not tainted 6.19.0-rc6-next-20260120 #21508 PREEMPTLAZY
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Stopper: __balance_push_cpu_stop+0x0/0x200 <- balance_push+0x118/0x170
Call Trace:
<TASK>
__dump_stack+0x22/0x30
dump_stack_lvl+0x60/0x80
dump_stack+0x19/0x24
__lock_acquire+0xd3a/0x28e0
? __lock_acquire+0x5a9/0x28e0
? __lock_acquire+0x5a9/0x28e0
? barn_get_empty_sheaf+0x1d/0xb0
lock_acquire+0xc3/0x270
? barn_get_empty_sheaf+0x1d/0xb0
? __kfree_rcu_sheaf+0x52/0x3d0
_raw_spin_lock_irqsave+0x47/0x70
? barn_get_empty_sheaf+0x1d/0xb0
barn_get_empty_sheaf+0x1d/0xb0
? __kfree_rcu_sheaf+0x52/0x3d0
__kfree_rcu_sheaf+0x19f/0x3d0
kvfree_call_rcu+0xaf/0x390
set_cpus_allowed_force+0xc8/0xf0
[...]
</TASK>
This wasn't triggered until sheaves were enabled for all slab caches,
since kfree_rcu() wasn't being called with a raw spinlock held for
caches with sheaves (vma, maple node).
As suggested by Vlastimil Babka, fix this by using a lockdep map with
LD_WAIT_CONFIG wait type to tell lockdep that acquiring spinlock_t is valid
in this case, as those spinlocks won't be used on PREEMPT_RT.
Note that kfree_rcu_sheaf_map should be acquired using the _try() variant,
otherwise the acquisition of the lockdep map itself will trigger an invalid
wait context warning.
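The annotation described above looks roughly like this (sketch; its
exact placement within __kfree_rcu_sheaf() is simplified):

  static DEFINE_WAIT_OVERRIDE_MAP(kfree_rcu_sheaf_map, LD_WAIT_CONFIG);

  /* in __kfree_rcu_sheaf(): the spinlock_t locks taken below are never
   * used on PREEMPT_RT, so override lockdep's wait-type check */
  lock_map_acquire_try(&kfree_rcu_sheaf_map);
  /* ... take pcs->lock / barn->lock as needed ... */
  lock_map_release(&kfree_rcu_sheaf_map);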
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Closes: https://lore.kernel.org/linux-mm/c858b9af-2510-448b-9ab3-058f7b80dd42@paulmck-laptop [1]
Fixes: ec66e0d59952 ("slab: add sheaf support for batching kfree_rcu() operations")
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
After we submit the rcu_free sheaves to call_rcu() we need to make sure
the rcu callbacks complete. kvfree_rcu_barrier() does that via
flush_all_rcu_sheaves() but kvfree_rcu_barrier_on_cache() doesn't. Fix
that.
This currently causes no issues because the caches with sheaves we have
are never destroyed. The problem flagged by kernel test robot was
reported for a patch that enables sheaves for (almost) all caches, and
occurred only with CONFIG_KASAN. Harry Yoo found the root cause [1]:
It turns out the object freed by sheaf_flush_unused() was in KASAN
percpu quarantine list (confirmed by dumping the list) by the time
__kmem_cache_shutdown() returns an error.
Quarantined objects are supposed to be flushed by kasan_cache_shutdown(),
but things go wrong if the rcu callback (rcu_free_sheaf_nobarn()) is
processed after kasan_cache_shutdown() finishes.
That's why rcu_barrier() in __kmem_cache_shutdown() didn't help,
because it's called after kasan_cache_shutdown().
Calling rcu_barrier() in kvfree_rcu_barrier_on_cache() guarantees
that it'll be added to the quarantine list before kasan_cache_shutdown()
is called. So it's a valid fix!
[1] https://lore.kernel.org/all/aWd6f3jERlrB5yeF@hyeyoo/
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202601121442.c530bed3-lkp@intel.com
Fixes: 0f35040de593 ("mm/slab: introduce kvfree_rcu_barrier_on_cache() for cache destruction")
Cc: stable@vger.kernel.org
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Tested-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Eric Dumazet has noticed cache_from_obj() is not inlined with clang and
suggested splitting it into two functions, where the smaller inlined one
assumes the fastpath is !CONFIG_SLAB_FREELIST_HARDENED. However most
distros enable it these days and so this would likely add a function
call to the object free fastpaths.
Instead take a step back and consider that cache_from_obj() is a relic
from when memcgs created their separate kmem_cache copies, as the
outdated comment in build_detached_freelist() reminds us.
Meanwhile hardening/debugging had reused cache_from_obj() to validate
that the freed object really belongs to a slab from the cache we think
we are freeing from.
In build_detached_freelist() simply remove this, because it did not
handle the NULL result from cache_from_obj() failure properly, nor
validate objects (for the NULL slab->slab_cache pointer) when called via
kfree_bulk(). If anyone is motivated to implement it properly, it should
be possible in a similar way to kmem_cache_free().
In kmem_cache_free(), do the hardening/debugging checks directly so they
are inlined by definition and virt_to_slab(obj) is performed just once.
In case they fail, call a newly introduced warn_free_bad_obj() that
performs the warnings outside of the fastpath, and leak the object.
As an intentional change, leak the object when slab->slab_cache differs
from the cache given to kmem_cache_free(). Previously we would only leak
when the object is not in a valid slab page or the slab->slab_cache
pointer is NULL, and otherwise trust the slab->slab_cache over the
kmem_cache_free() argument. But if those differ, it means something went
wrong enough that it's best not to continue freeing.
As a result the fastpath should be inlined in all configs and the
warnings are moved away.
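A sketch of the inlined check described above (heavily simplified; in
practice the check is only compiled in for hardening/debug configs and
warn_free_bad_obj()'s signature is an assumption):

  void kmem_cache_free(struct kmem_cache *s, void *x)
  {
          struct slab *slab = virt_to_slab(x);

          if (unlikely(!slab || slab->slab_cache != s)) {
                  /* out-of-line warning; the object is intentionally leaked */
                  warn_free_bad_obj(s, x);
                  return;
          }
          /* ... proceed with the fast free path ... */
  }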
Reported-by: Eric Dumazet <edumazet@google.com>
Closes: https://lore.kernel.org/all/20260115130642.3419324-1-edumazet@google.com/
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Hao Li <hao.li@linux.dev>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
On PREEMPT_RT kernels, local_lock becomes a sleeping lock. The current
check in kmalloc_nolock() only verifies we're not in NMI or hard IRQ
context, but misses the case where preemption is disabled.
When a BPF program runs from a tracepoint with preemption disabled
(preempt_count > 0), kmalloc_nolock() proceeds to call
local_lock_irqsave() which attempts to acquire a sleeping lock,
triggering:
BUG: sleeping function called from invalid context
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6128
preempt_count: 2, expected: 0
Fix this by checking !preemptible() on PREEMPT_RT, which directly
expresses the constraint that we cannot take a sleeping lock when
preemption is disabled. This encompasses the previous checks for NMI
and hard IRQ contexts while also catching cases where preemption is
disabled.
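The check, sketched (simplified from the description; the surrounding
function is kmalloc_nolock()'s entry path):

  if (IS_ENABLED(CONFIG_PREEMPT_RT) && !preemptible())
          return NULL;   /* local_lock sleeps on RT; cannot take it here */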
Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
Reported-by: syzbot+b1546ad4a95331b2101e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b1546ad4a95331b2101e
Signed-off-by: Swaraj Gaikwad <swarajgaikwad1925@gmail.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260113150639.48407-1-swarajgaikwad1925@gmail.co
Cc: <stable@vger.kernel.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
|
|
Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library fixes from Eric Biggers:
- A couple more fixes for the lib/crypto KUnit tests
- Fix missing MMU protection for the AES S-box
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto: aes: Fix missing MMU protection for AES S-box
MAINTAINERS: add test vector generation scripts to "CRYPTO LIBRARY"
lib/crypto: tests: Fix syntax error for old python versions
lib/crypto: tests: polyval_kunit: Increase iterations for preparekey in IRQs
|
|
Merge tag 'char-misc-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc driver fixes for some reported issues.
Included in here is:
- much reported rust_binder fix
- counter driver fixes
- new device ids for the mei driver
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
rust_binder: remove spin_lock() in rust_shrink_free_page()
mei: me: add nova lake point S DID
counter: 104-quad-8: Fix incorrect return value in IRQ handler
counter: interrupt-cnt: Drop IRQF_NO_THREAD flag
|
|
Merge tag 'x86-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
"Disable GCOV instrumentation in the SEV noinstr.c collection of SEV
noinstr methods, to further robustify the code"
* tag 'x86-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Disable GCOV on noinstr object
|
|
Merge tag 'sched-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Fix a crash in sched_mm_cid_after_execve()"
* tag 'sched-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/mm_cid: Prevent NULL mm dereference in sched_mm_cid_after_execve()
|
|
Merge tag 'perf-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fix from Ingo Molnar:
"Fix perf swevent hrtimer deinit regression"
* tag 'perf-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Ensure swevent hrtimer is properly destroyed
|