kernel/linux.git/ipc/shm.c, branch v6.6.131

ipc: fix to protect IPCS lookups using RCU

2025-06-27T10:08:49+00:00

commit d66adabe91803ef34a8b90613c81267b5ded1472 upstream. syzbot reported that it discovered a use-after-free vulnerability, [0] [0]: https://lore.kernel.org/all/67af13f8.050a0220.21dd3.0038.GAE@google.com/ idr_for_each() is protected by rwsem, but this is not enough. If it is not protected by RCU read-critical region, when idr_for_each() calls radix_tree_node_free() through call_rcu() to free the radix_tree_node structure, the node will be freed immediately, and when reading the next node in radix_tree_for_each_slot(), the already freed memory may be read. Therefore, we need to add code to make sure that idr_for_each() is protected within the RCU read-critical region when we call it in shm_destroy_orphaned(). Link: https://lkml.kernel.org/r/20250424143322.18830-1-aha310510@gmail.com Fixes: b34a6b1da371 ("ipc: introduce shm_rmid_forced sysctl") Signed-off-by: Jeongjun Park Reported-by: syzbot+a2b84e569d06ca3a949c@syzkaller.appspotmail.com Cc: Jeongjun Park Cc: Liam Howlett Cc: Lorenzo Stoakes Cc: Matthew Wilcox (Oracle) Cc: Vasiliy Kulikov Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman

mm: Re-introduce vm_flags to do_mmap()

2023-07-11T21:12:18+00:00

There was no more caller passing vm_flags to do_mmap(), and vm_flags was removed from the function's input by: commit 45e55300f114 ("mm: remove unnecessary wrapper function do_mmap_pgoff()"). There is a new user now. Shadow stack allocation passes VM_SHADOW_STACK to do_mmap(). Thus, re-introduce vm_flags to do_mmap(). Co-developed-by: Rick Edgecombe Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Signed-off-by: Dave Hansen Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Peter Collingbourne Reviewed-by: Kees Cook Reviewed-by: Kirill A. Shutemov Reviewed-by: Mark Brown Acked-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Tested-by: Mark Brown Link: https://lore.kernel.org/all/20230613001108.3040476-5-rick.p.edgecombe%40intel.com

ipc/shm: introduce new do_vma_munmap() to munmap

2023-02-10T00:51:33+00:00

The shm already has the vma iterator in position for a write. do_vmi_munmap() searches for the correct position and aligns the write, so it is not the right function to use in this case. The shm VMA tree modification is similar to the brk munmap situation, the vma iterator is in position and the VMA is already known. This patch generalizes the brk munmap function do_brk_munmap() to be used for any other callers with the vma iterator already in position to munmap a VMA. Link: https://lkml.kernel.org/r/20230126212049.980501-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett Reported-by: Sven Schnelle Link: https://lore.kernel.org/linux-mm/yt9dh6wec21a.fsf@linux.ibm.com/ Cc: Arnd Bergmann Signed-off-by: Andrew Morton

ipc/shm: use the vma iterator for munmap calls

2023-02-10T00:51:32+00:00

Pass through the vma iterator to do_vmi_munmap() to handle the iterator state internally Link: https://lkml.kernel.org/r/20230120162650.984577-16-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett Signed-off-by: Andrew Morton

ipc/shm: call underlying open/close vm_ops

2022-11-23T02:50:42+00:00

Shared memory segments can be created that are backed by hugetlb pages. When this happens, the vmas associated with any mappings (shmat) are marked VM_HUGETLB, yet the vm_ops for such mappings are provided by ipc/shm (shm_vm_ops). There is a mechanism to call the underlying hugetlb vm_ops, and this is done for most operations. However, it is not done for open and close. This was not an issue until the introduction of the hugetlb vma_lock. This lock structure is pointed to by vm_private_data and the open/close vm_ops help maintain this structure. The special hugetlb routine called at fork took care of structure updates at fork time. However, vma_splitting is not properly handled for ipc shared memory mappings backed by hugetlb pages. This can result in a "kernel NULL pointer dereference" BUG or use after free as two vmas point to the same lock structure. Update the shm open and close routines to always call the underlying open and close routines. Link: https://lkml.kernel.org/r/20221114210018.49346-1-mike.kravetz@oracle.com Fixes: 8d9bfb260814 ("hugetlb: add vma based lock for pmd sharing") Signed-off-by: Mike Kravetz Reported-by: Doug Nelson Reported-by: Cc: Alexander Mikhalitsyn Cc: "Eric W . Biederman" Cc: Manfred Spraul Cc: Matthew Wilcox Cc: Miaohe Lin Cc: Michal Hocko Signed-off-by: Andrew Morton

ipc/shm: use VMA iterator instead of linked list

2022-09-27T02:46:21+00:00

The VMA iterator is faster than the linked llist, and it can be walked even when VMAs are being removed from the address space, so there's no need to keep track of 'next'. Link: https://lkml.kernel.org/r/20220906194824.2110408-46-Liam.Howlett@oracle.com Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Liam R. Howlett Tested-by: Yu Zhao Cc: Catalin Marinas Cc: David Hildenbrand Cc: David Howells Cc: Davidlohr Bueso Cc: SeongJae Park Cc: Sven Schnelle Cc: Vlastimil Babka Cc: Will Deacon Signed-off-by: Andrew Morton

shm: extend forced shm destroy to support objects from several IPC nses

2021-11-20T18:35:54+00:00

Currently, the exit_shm() function not designed to work properly when task->sysvshm.shm_clist holds shm objects from different IPC namespaces. This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it leads to use-after-free (reproducer exists). This is an attempt to fix the problem by extending exit_shm mechanism to handle shm's destroy from several IPC ns'es. To achieve that we do several things: 1. add a namespace (non-refcounted) pointer to the struct shmid_kernel 2. during new shm object creation (newseg()/shmget syscall) we initialize this pointer by current task IPC ns 3. exit_shm() fully reworked such that it traverses over all shp's in task->sysvshm.shm_clist and gets IPC namespace not from current task as it was before but from shp's object itself, then call shm_destroy(shp, ns). Note: We need to be really careful here, because as it was said before (1), our pointer to IPC ns non-refcnt'ed. To be on the safe side we using special helper get_ipc_ns_not_zero() which allows to get IPC ns refcounter only if IPC ns not in the "state of destruction". Q/A Q: Why can we access shp->ns memory using non-refcounted pointer? A: Because shp object lifetime is always shorther than IPC namespace lifetime, so, if we get shp object from the task->sysvshm.shm_clist while holding task_lock(task) nobody can steal our namespace. Q: Does this patch change semantics of unshare/setns/clone syscalls? A: No. It's just fixes non-covered case when process may leave IPC namespace without getting task->sysvshm.shm_clist list cleaned up. Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife.com Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com Fixes: ab602f79915 ("shm: make exit_shm work proportional to task activity") Co-developed-by: Manfred Spraul Signed-off-by: Manfred Spraul Signed-off-by: Alexander Mikhalitsyn Cc: "Eric W. Biederman" Cc: Davidlohr Bueso Cc: Greg KH Cc: Andrei Vagin Cc: Pavel Tikhomirov Cc: Vasily Averin Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm,hugetlb: remove mlock ulimit for SHM_HUGETLB

2021-11-09T18:02:48+00:00

Commit 21a3c273f88c ("mm, hugetlb: add thread name and pid to SHM_HUGETLB mlock rlimit warning") marked this as deprecated in 2012, but it is not deleted yet. Mike says he still sees that message in log files on occasion, so maybe we should preserve this warning. Also remove hugetlbfs related user_shm_unlock in ipc/shm.c and remove the user_shm_unlock after out. Link: https://lkml.kernel.org/r/20211103105857.25041-1-zhangyiru3@huawei.com Signed-off-by: zhangyiru Reviewed-by: Mike Kravetz Cc: Hugh Dickins Cc: Liu Zixian Cc: Michal Hocko Cc: wuxu.wu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

memcg: enable accounting of ipc resources

2021-09-03T16:58:12+00:00

When user creates IPC objects it forces kernel to allocate memory for these long-living objects. It makes sense to account them to restrict the host's memory consumption from inside the memcg-limited container. This patch enables accounting for IPC shared memory segments, messages semaphores and semaphore's undo lists. Link: https://lkml.kernel.org/r/d6507b06-4df6-78f8-6c54-3ae86e3b5339@virtuozzo.com Signed-off-by: Vasily Averin Reviewed-by: Shakeel Butt Cc: Alexander Viro Cc: Alexey Dobriyan Cc: Andrei Vagin Cc: Borislav Petkov Cc: Borislav Petkov Cc: Christian Brauner Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: "Eric W. Biederman" Cc: Greg Kroah-Hartman Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: "J. Bruce Fields" Cc: Jeff Layton Cc: Jens Axboe Cc: Jiri Slaby Cc: Johannes Weiner Cc: Kirill Tkhai Cc: Michal Hocko Cc: Oleg Nesterov Cc: Roman Gushchin Cc: Serge Hallyn Cc: Tejun Heo Cc: Thomas Gleixner Cc: Vladimir Davydov Cc: Yutian Yang Cc: Zefan Li Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

Merge branch 'akpm' (patches from Andrew)

2021-07-02T19:08:10+00:00

Merge more updates from Andrew Morton: "190 patches. Subsystems affected by this patch series: mm (hugetlb, userfaultfd, vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock, migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap, zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc, core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs, signals, exec, kcov, selftests, compress/decompress, and ipc" * emailed patches from Andrew Morton : (190 commits) ipc/util.c: use binary search for max_idx ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock ipc: use kmalloc for msg_queue and shmid_kernel ipc sem: use kvmalloc for sem_undo allocation lib/decompressors: remove set but not used variabled 'level' selftests/vm/pkeys: exercise x86 XSAVE init state selftests/vm/pkeys: refill shadow register after implicit kernel write selftests/vm/pkeys: handle negative sys_pkey_alloc() return code selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random kcov: add __no_sanitize_coverage to fix noinstr for all architectures exec: remove checks in __register_bimfmt() x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned hfsplus: report create_date to kstat.btime hfsplus: remove unnecessary oom message nilfs2: remove redundant continue statement in a while-loop kprobes: remove duplicated strong free_insn_page in x86 and s390 init: print out unknown kernel parameters checkpatch: do not complain about positive return values starting with EPOLL checkpatch: improve the indented label test checkpatch: scripts/spdxcheck.py now requires python3 ...