kernel/linux.git/lib/debugobjects.c, branch v6.18.21

debugobject: Make it work with deferred page initialization - again

2026-03-12T11:09:08+00:00

[ Upstream commit fd3634312a04f336dcbfb481060219f0cd320738 ] debugobjects uses __GFP_HIGH for allocations as it might be invoked within locked regions. That worked perfectly fine until v6.18. It still works correctly when deferred page initialization is disabled and works by chance when no page allocation is required before deferred page initialization has completed. Since v6.18 allocations w/o a reclaim flag cause new_slab() to end up in alloc_frozen_pages_nolock_noprof(), which returns early when deferred page initialization has not yet completed. As the deferred page initialization takes quite a while the debugobject pool is depleted and debugobjects are disabled. This can be worked around when PREEMPT_COUNT is enabled as that allows debugobjects to add __GFP_KSWAPD_RECLAIM to the GFP flags when the context is preemtible. When PREEMPT_COUNT is disabled the context is unknown and the reclaim bit can't be set because the caller might hold locks which might deadlock in the allocator. In preemptible context the reclaim bit is harmless and not a performance issue as that's usually invoked from slow path initialization context. That makes debugobjects depend on PREEMPT_COUNT || !DEFERRED_STRUCT_PAGE_INIT. Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().") Signed-off-by: Thomas Gleixner Tested-by: Sebastian Andrzej Siewior Acked-by: Alexei Starovoitov Acked-by: Vlastimil Babka Link: https://patch.msgid.link/87pl6gznti.ffs@tglx Signed-off-by: Sasha Levin

debugobjects: Track object usage to avoid premature freeing of objects

2024-10-15T15:30:33+00:00

The freelist is freed at a constant rate independent of the actual usage requirements. That's bad in scenarios where usage comes in bursts. The end of a burst puts the objects on the free list and freeing proceeds even when the next burst which requires objects started again. Keep track of the usage with a exponentially wheighted moving average and take that into account in the worker function which frees objects from the free list. This further reduces the kmem_cache allocation/free rate for a full kernel compile: kmem_cache_alloc() kmem_cache_free() Baseline: 225k 173k Usage: 170k 117k Signed-off-by: Thomas Gleixner Reviewed-by: Zhen Lei Link: https://lore.kernel.org/all/87bjznhme2.ffs@tglx

debugobjects: Refill per CPU pool more agressively

2024-10-15T15:30:33+00:00

Right now the per CPU pools are only refilled when they become empty. That's suboptimal especially when there are still non-freed objects in the to free list. Check whether an allocation from the per CPU pool emptied a batch and try to allocate from the free pool if that still has objects available. kmem_cache_alloc() kmem_cache_free() Baseline: 295k 245k Refill: 225k 173k Signed-off-by: Thomas Gleixner Reviewed-by: Zhen Lei Link: https://lore.kernel.org/all/20241007164914.439053085@linutronix.de

debugobjects: Double the per CPU slots

2024-10-15T15:30:33+00:00

In situations where objects are rapidly allocated from the pool and handed back, the size of the per CPU pool turns out to be too small. Double the size of the per CPU pool. This reduces the kmem cache allocation and free operations during a kernel compile: alloc free Baseline: 380k 330k Double size: 295k 245k Especially the reduction of allocations is important because that happens in the hot path when objects are initialized. The maximum increase in per CPU pool memory consumption is about 2.5K per online CPU, which is acceptable. Signed-off-by: Thomas Gleixner Reviewed-by: Zhen Lei Link: https://lore.kernel.org/all/20241007164914.378676302@linutronix.de

debugobjects: Move pool statistics into global_pool struct

2024-10-15T15:30:33+00:00

Keep it along with the pool as that's a hot cache line anyway and it makes the code more comprehensible. Signed-off-by: Thomas Gleixner Reviewed-by: Zhen Lei Link: https://lore.kernel.org/all/20241007164914.318776207@linutronix.de

debugobjects: Implement batch processing

2024-10-15T15:30:33+00:00

Adding and removing single objects in a loop is bad in terms of lock contention and cache line accesses. To implement batching, record the last object in a batch in the object itself. This is trivialy possible as hlists are strictly stacks. At a batch boundary, when the first object is added to the list the object stores a pointer to itself in debug_obj::batch_last. When the next object is added to the list then the batch_last pointer is retrieved from the first object in the list and stored in the to be added one. That means for batch processing the first object always has a pointer to the last object in a batch, which allows to move batches in a cache line efficient way and reduces the lock held time. Signed-off-by: Thomas Gleixner Reviewed-by: Zhen Lei Link: https://lore.kernel.org/all/20241007164914.258995000@linutronix.de

debugobjects: Prepare kmem_cache allocations for batching

2024-10-15T15:30:32+00:00

Allocate a batch and then push it into the pool. Utilize the debug_obj::last_node pointer for keeping track of the batch boundary. Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/all/20241007164914.198647184@linutronix.de

debugobjects: Prepare for batching

2024-10-15T15:30:32+00:00

Move the debug_obj::object pointer into a union and add a pointer to the last node in a batch. That allows to implement batch processing efficiently by utilizing the stack property of hlist: When the first object of a batch is added to the list, then the batch pointer is set to the hlist node of the object itself. Any subsequent add retrieves the pointer to the last node from the first object in the list and uses that for storing the last node pointer in the newly added object. Add the pointer to the data structure and ensure that all relevant pool sizes are strictly batch sized. The actual batching implementation follows in subsequent changes. Signed-off-by: Thomas Gleixner Reviewed-by: Zhen Lei Link: https://lore.kernel.org/all/20241007164914.139204961@linutronix.de

debugobjects: Use static key for boot pool selection

2024-10-15T15:30:32+00:00

Get rid of the conditional in the hot path. Signed-off-by: Thomas Gleixner Reviewed-by: Zhen Lei Link: https://lore.kernel.org/all/20241007164914.077247071@linutronix.de

debugobjects: Rework free_object_work()

2024-10-15T15:30:32+00:00

Convert it to batch processing with intermediate helper functions. This reduces the final changes for batch processing. Signed-off-by: Thomas Gleixner Reviewed-by: Zhen Lei Link: https://lore.kernel.org/all/20241007164914.015906394@linutronix.de