summaryrefslogtreecommitdiff
path: root/drivers/android
AgeCommit message (Collapse)AuthorFilesLines
2025-07-29Merge tag 'char-misc-6.17-rc1' of ↵Linus Torvalds12-405/+657
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc / IIO / other driver updates from Greg KH: "Here is the big set of char/misc/iio and other smaller driver subsystems for 6.17-rc1. It's a big set this time around, with the huge majority being in the iio subsystem with new drivers and dts files being added there. Highlights include: - IIO driver updates, additions, and changes making more code const and cleaning up some init logic - bus_type constant conversion changes - misc device test functions added - rust miscdevice minor fixup - unused function removals for some drivers - mei driver updates - mhi driver updates - interconnect driver updates - Android binder updates and test infrastructure added - small cdx driver updates - small comedi fixes - small nvmem driver updates - small pps driver updates - some acrn virt driver fixes for printk messages - other small driver updates All of these have been in linux-next with no reported issues" * tag 'char-misc-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (292 commits) binder: Use seq_buf in binder_alloc kunit tests binder: Add copyright notice to new kunit files misc: ti_fpc202: Switch to of_fwnode_handle() bus: moxtet: Use dev_fwnode() pc104: move PC104 option to drivers/Kconfig drivers: virt: acrn: Don't use %pK through printk comedi: fix race between polling and detaching interconnect: qcom: Add Milos interconnect provider driver dt-bindings: interconnect: document the RPMh Network-On-Chip Interconnect in Qualcomm Milos SoC mei: more prints with client prefix mei: bus: use cldev in prints bus: mhi: host: pci_generic: Add Telit FN990B40 modem support bus: mhi: host: Detect events pointing to unexpected TREs bus: mhi: host: pci_generic: Add Foxconn T99W696 modem bus: mhi: host: Use str_true_false() helper bus: mhi: host: pci_generic: Add support for EM929x and set MRU to 32768 for better performance. bus: mhi: host: Fix endianness of BHI vector table bus: mhi: host: pci_generic: Disable runtime PM for QDU100 bus: mhi: host: pci_generic: Fix the modem name of Foxconn T99W640 dt-bindings: interconnect: qcom,msm8998-bwmon: Allow 'nonposted-mmio' ...
2025-07-28Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-14/+6
Pull misc VFS updates from Al Viro: "VFS-related cleanups in various places (mostly of the "that really can't happen" or "there's a better way to do it" variety)" * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: gpib: use file_inode() binder_ioctl_write_read(): simplify control flow a bit secretmem: move setting O_LARGEFILE and bumping users' count to the place where we create the file apparmor: file never has NULL f_path.mnt landlock: opened file never has a negative dentry
2025-07-24binder: Use seq_buf in binder_alloc kunit testsTiffany Yang1-25/+21
Replace instances of snprintf with seq_buf functions, as suggested by Kees [1]. [1] https://lore.kernel.org/all/202507160743.15E8044@keescook/ Fixes: d1934ed9803c ("binder: encapsulate individual alloc test cases") Suggested-by: Kees Cook <kees@kernel.org> Cc: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Tiffany Yang <ynaffit@google.com> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250722234508.232228-2-ynaffit@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-24binder: Add copyright notice to new kunit filesTiffany Yang3-1/+11
Clean up for the binder_alloc kunit test series. Add a copyright notice to new files, as suggested by Carlos [1]. [1] https://lore.kernel.org/all/CAFuZdDLD=3CBOLSWw3VxCf7Nkf884SSNmt1wresQgxgBwED=eQ@mail.gmail.com/ Fixes: 5e024582f494 ("binder: Scaffolding for binder_alloc KUnit tests") Suggested-by: Carlos Llamas <cmllamas@google.com> Cc: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Tiffany Yang <ynaffit@google.com> Link: https://lore.kernel.org/r/20250722234508.232228-1-ynaffit@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-16binder: encapsulate individual alloc test casesTiffany Yang1-53/+181
Each case tested by the binder allocator test is defined by 3 parameters: the end alignment type of each requested buffer allocation, whether those buffers share the front or back pages of the allotted address space, and the order in which those buffers should be released. The alignment type represents how a binder buffer may be laid out within or across page boundaries and relative to other buffers, and it's used along with whether the buffers cover part (sharing the front pages) of or all (sharing the back pages) of the vma to calculate the sizes passed into each test. binder_alloc_test_alloc recursively generates each possible arrangement of alignment types and then tests that the binder_alloc code tracks pages correctly when those buffers are allocated and then freed in every possible order at both ends of the address space. While they provide comprehensive coverage, they are poor candidates to be represented as KUnit test cases, which must be statically enumerated. For 5 buffers and 5 end alignment types, the test case array would have 750,000 entries. This change structures the recursive calls into meaningful test cases so that failures are easier to interpret. Signed-off-by: Tiffany Yang <ynaffit@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250714185321.2417234-7-ynaffit@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-16binder: Convert binder_alloc selftests to KUnitTiffany Yang7-366/+282
Convert the existing binder_alloc_selftest tests into KUnit tests. These tests allocate and free an exhaustive combination of buffers with various sizes and alignments. This change allows them to be run without blocking or otherwise interfering with other processes in binder. This test is refactored into more meaningful cases in the subsequent patch. Signed-off-by: Tiffany Yang <ynaffit@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250714185321.2417234-6-ynaffit@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-16binder: Scaffolding for binder_alloc KUnit testsTiffany Yang9-6/+208
Add setup and teardown for testing binder allocator code with KUnit. Include minimal test cases to verify that tests are initialized correctly. Tested-by: Rae Moar <rmoar@google.com> Signed-off-by: Tiffany Yang <ynaffit@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250714185321.2417234-5-ynaffit@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-16binder: Store lru freelist in binder_allocTiffany Yang3-20/+67
Store a pointer to the free pages list that the binder allocator should use for a process inside of struct binder_alloc. This change allows binder allocator code to be tested and debugged deterministically while a system is using binder; i.e., without interfering with other binder processes and independently of the shrinker. This is necessary to convert the current binder_alloc_selftest into a kunit test that does not rely on hijacking an existing binder_proc to run. A binder process's binder_alloc->freelist should not be changed after it is initialized. A sole exception is the process that runs the existing binder_alloc selftest. Its freelist can be temporarily replaced for the duration of the test because it runs as a single thread before any pages can be added to the global binder freelist, and the test frees every page it allocates before dropping the binder_selftest_lock. This exception allows the existing selftest to be used to check for regressions, but it will be dropped when the binder_alloc tests are converted to kunit in a subsequent patch in this series. Signed-off-by: Tiffany Yang <ynaffit@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250714185321.2417234-3-ynaffit@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-16binder: Fix selftest page indexingTiffany Yang1-1/+1
The binder allocator selftest was only checking the last page of buffers that ended on a page boundary. Correct the page indexing to account for buffers that are not page-aligned. Signed-off-by: Tiffany Yang <ynaffit@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250714185321.2417234-2-ynaffit@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-16binder: use guards for plain mutex- and spinlock-protected sectionsDmitry Antipov3-38/+17
Use 'guard(mutex)' and 'guard(spinlock)' for plain (i.e. non-scoped) mutex- and spinlock-protected sections, respectively, thus making locking a bit simpler. Briefly tested with 'stress-ng --binderfs'. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250626073054.7706-2-dmantipov@yandex.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-16binder: use kstrdup() in binderfs_binder_device_create()Dmitry Antipov1-4/+1
In 'binderfs_binder_device_create()', use 'kstrdup()' to copy the newly created device's name, thus making the former a bit simpler. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Acked-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: "Tiffany Y. Yang" <ynaffit@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250626073054.7706-1-dmantipov@yandex.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-03kill binderfs_remove_file()Al Viro3-18/+1
don't try to open-code simple_recursive_removal(), especially when you miss things like d_invalidate()... Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-24binder: Remove unused binder lock eventsSteven Rostedt1-21/+0
Trace events can take up to 5K each when they are defined, regardless if they are used or not. The binder lock events: binder_lock, binder_locked and binder_unlock are no longer used. Remove them. Fixes: a60b890f607d ("binder: remove global binder lock") Signed-off-by: "Steven Rostedt (Google)" <rostedt@goodmis.org> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250612093408.3b7320fa@batman.local.home Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-24binder: fix reversed pid/tid in logCarlos Llamas1-4/+2
The "pid:tid" format is used consistently throughout the driver's logs with the exception of this one place where the arguments are reversed. Let's fix that. Also, collapse a multi-line comment into a single line. Cc: Steven Moreland <smoreland@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250605141930.1069438-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-18binder_ioctl_write_read(): simplify control flow a bitAl Viro1-14/+6
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-06Merge tag 'char-misc-6.16-rc1' of ↵Linus Torvalds3-89/+169
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc / iio driver updates from Greg KH: "Here is the big char/misc/iio and other small driver subsystem pull request for 6.16-rc1. Overall, a lot of individual changes, but nothing major, just the normal constant forward progress of new device support and cleanups to existing subsystems. Highlights in here are: - Large IIO driver updates and additions and device tree changes - Android binder bugfixes and logfile fixes - mhi driver updates - comedi driver updates - counter driver updates and additions - coresight driver updates and additions - echo driver removal as there are no in-kernel users of it - nvmem driver updates - spmi driver updates - new amd-sbi driver "subsystem" and drivers added - rust miscdriver binding documentation fix - other small driver fixes and updates (uio, w1, acrn, hpet, xillybus, cardreader drivers, fastrpc and others) All of these have been in linux-next for quite a while with no reported problems" * tag 'char-misc-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (390 commits) binder: fix yet another UAF in binder_devices counter: microchip-tcb-capture: Add watch validation support dt-bindings: iio: adc: Add ROHM BD79100G iio: adc: add support for Nuvoton NCT7201 dt-bindings: iio: adc: add NCT7201 ADCs iio: chemical: Add driver for SEN0322 dt-bindings: trivial-devices: Document SEN0322 iio: adc: ad7768-1: reorganize driver headers iio: bmp280: zero-init buffer iio: ssp_sensors: optimalize -> optimize HID: sensor-hub: Fix typo and improve documentation iio: admv1013: replace redundant ternary operator with just len iio: chemical: mhz19b: Fix error code in probe() iio: adc: at91-sama5d2: use IIO_DECLARE_BUFFER_WITH_TS iio: accel: sca3300: use IIO_DECLARE_BUFFER_WITH_TS iio: adc: ad7380: use IIO_DECLARE_DMA_BUFFER_WITH_TS iio: adc: ad4695: rename AD4695_MAX_VIN_CHANNELS iio: adc: ad4695: use IIO_DECLARE_DMA_BUFFER_WITH_TS iio: introduce IIO_DECLARE_BUFFER_WITH_TS macros iio: make IIO_DMA_MINALIGN minimum of 8 bytes ...
2025-05-26Merge tag 'vfs-6.16-rc1.async.dir' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs directory lookup updates from Christian Brauner: "This contains cleanups for the lookup_one*() family of helpers. We expose a set of functions with names containing "lookup_one_len" and others without the "_len". This difference has nothing to do with "len". It's rater a historical accident that can be confusing. The functions without "_len" take a "mnt_idmap" pointer. This is found in the "vfsmount" and that is an important question when choosing which to use: do you have a vfsmount, or are you "inside" the filesystem. A related question is "is permission checking relevant here?". nfsd and cachefiles *do* have a vfsmount but *don't* use the non-_len functions. They pass nop_mnt_idmap and refuse to work on filesystems which have any other idmap. This work changes nfsd and cachefile to use the lookup_one family of functions and to explictily pass &nop_mnt_idmap which is consistent with all other vfs interfaces used where &nop_mnt_idmap is explicitly passed. The remaining uses of the "_one" functions do not require permission checks so these are renamed to be "_noperm" and the permission checking is removed. This series also changes these lookup function to take a qstr instead of separate name and len. In many cases this simplifies the call" * tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: VFS: change lookup_one_common and lookup_noperm_common to take a qstr Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS VFS: rename lookup_one_len family to lookup_noperm and remove permission check cachefiles: Use lookup_one() rather than lookup_one_len() nfsd: Use lookup_one() rather than lookup_one_len() VFS: improve interface for lookup_one functions
2025-05-25binder: fix yet another UAF in binder_devicesCarlos Llamas1-0/+1
Commit e77aff5528a18 ("binderfs: fix use-after-free in binder_devices") addressed a use-after-free where devices could be released without first being removed from the binder_devices list. However, there is a similar path in binder_free_proc() that was missed: ================================================================== BUG: KASAN: slab-use-after-free in binder_remove_device+0xd4/0x100 Write of size 8 at addr ffff0000c773b900 by task umount/467 CPU: 12 UID: 0 PID: 467 Comm: umount Not tainted 6.15.0-rc7-00138-g57483a362741 #9 PREEMPT Hardware name: linux,dummy-virt (DT) Call trace: binder_remove_device+0xd4/0x100 binderfs_evict_inode+0x230/0x2f0 evict+0x25c/0x5dc iput+0x304/0x480 dentry_unlink_inode+0x208/0x46c __dentry_kill+0x154/0x530 [...] Allocated by task 463: __kmalloc_cache_noprof+0x13c/0x324 binderfs_binder_device_create.isra.0+0x138/0xa60 binder_ctl_ioctl+0x1ac/0x230 [...] Freed by task 215: kfree+0x184/0x31c binder_proc_dec_tmpref+0x33c/0x4ac binder_deferred_func+0xc10/0x1108 process_one_work+0x520/0xba4 [...] ================================================================== Call binder_remove_device() within binder_free_proc() to ensure the device is removed from the binder_devices list before being kfreed. Cc: stable@vger.kernel.org Fixes: 12d909cac1e1 ("binderfs: add new binder devices to binder_devices") Reported-by: syzbot+4af454407ec393de51d6@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=4af454407ec393de51d6 Tested-by: syzbot+4af454407ec393de51d6@syzkaller.appspotmail.com Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250524220758.915028-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-21binder: Create safe versions of binder log filesTiffany Y. Yang1-27/+79
Binder defines several seq_files that can be accessed via debugfs or binderfs. Some of these files (e.g., 'state' and 'transactions') contain more granular information about binder's internal state that is helpful for debugging, but they also leak userspace address data through user-defined 'cookie' or 'ptr' values. Consequently, access to these files must be heavily restricted. Add two new files, 'state_hashed' and 'transactions_hashed', that reproduce the information in the original files but use the kernel's raw pointer obfuscation to hash any potential user addresses. This approach allows systems to grant broader access to the new files without having to change the security policy around the existing ones. In practice, userspace populates these fields with user addresses, but within the driver, these values only serve as unique identifiers for their associated binder objects. Consequently, binder logs can obfuscate these values and still retain meaning. While this strategy prevents leaking information about the userspace memory layout in the existing log files, it also decouples log messages about binder objects from their user-defined identifiers. Acked-by: Carlos Llamas <cmllamas@google.com> Tested-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: "Tiffany Y. Yang" <ynaffit@google.com> Link: https://lore.kernel.org/r/20250510013435.1520671-7-ynaffit@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-21binder: Refactor binder_node print synchronizationTiffany Y. Yang1-51/+68
The binder driver outputs information about each dead binder node by iterating over the dead nodes list, and it prints the state of each live node in the system by traversing each binder_proc's proc->nodes tree. Both cases require similar logic to maintain the global lock ordering while accessing each node. Create a helper function to synchronize around printing binder nodes in a list. Opportunistically make minor cosmetic changes to binder print functions. Acked-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: "Tiffany Y. Yang" <ynaffit@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250510013435.1520671-5-ynaffit@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-21binder: fix use-after-free in binderfs_evict_inode()Dmitry Antipov3-5/+20
Running 'stress-ng --binderfs 16 --timeout 300' under KASAN-enabled kernel, I've noticed the following: BUG: KASAN: slab-use-after-free in binderfs_evict_inode+0x1de/0x2d0 Write of size 8 at addr ffff88807379bc08 by task stress-ng-binde/1699 CPU: 0 UID: 0 PID: 1699 Comm: stress-ng-binde Not tainted 6.14.0-rc7-g586de92313fc-dirty #13 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x1c2/0x2a0 ? __pfx_dump_stack_lvl+0x10/0x10 ? __pfx__printk+0x10/0x10 ? __pfx_lock_release+0x10/0x10 ? __virt_addr_valid+0x18c/0x540 ? __virt_addr_valid+0x469/0x540 print_report+0x155/0x840 ? __virt_addr_valid+0x18c/0x540 ? __virt_addr_valid+0x469/0x540 ? __phys_addr+0xba/0x170 ? binderfs_evict_inode+0x1de/0x2d0 kasan_report+0x147/0x180 ? binderfs_evict_inode+0x1de/0x2d0 binderfs_evict_inode+0x1de/0x2d0 ? __pfx_binderfs_evict_inode+0x10/0x10 evict+0x524/0x9f0 ? __pfx_lock_release+0x10/0x10 ? __pfx_evict+0x10/0x10 ? do_raw_spin_unlock+0x4d/0x210 ? _raw_spin_unlock+0x28/0x50 ? iput+0x697/0x9b0 __dentry_kill+0x209/0x660 ? shrink_kill+0x8d/0x2c0 shrink_kill+0xa9/0x2c0 shrink_dentry_list+0x2e0/0x5e0 shrink_dcache_parent+0xa2/0x2c0 ? __pfx_shrink_dcache_parent+0x10/0x10 ? __pfx_lock_release+0x10/0x10 ? __pfx_do_raw_spin_lock+0x10/0x10 do_one_tree+0x23/0xe0 shrink_dcache_for_umount+0xa0/0x170 generic_shutdown_super+0x67/0x390 kill_litter_super+0x76/0xb0 binderfs_kill_super+0x44/0x90 deactivate_locked_super+0xb9/0x130 cleanup_mnt+0x422/0x4c0 ? lockdep_hardirqs_on+0x9d/0x150 task_work_run+0x1d2/0x260 ? __pfx_task_work_run+0x10/0x10 resume_user_mode_work+0x52/0x60 syscall_exit_to_user_mode+0x9a/0x120 do_syscall_64+0x103/0x210 ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0xcac57b Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 RSP: 002b:00007ffecf4226a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 00007ffecf422720 RCX: 0000000000cac57b RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007ffecf422850 RBP: 00007ffecf422850 R08: 0000000028d06ab1 R09: 7fffffffffffffff R10: 3fffffffffffffff R11: 0000000000000246 R12: 00007ffecf422718 R13: 00007ffecf422710 R14: 00007f478f87b658 R15: 00007ffecf422830 </TASK> Allocated by task 1705: kasan_save_track+0x3e/0x80 __kasan_kmalloc+0x8f/0xa0 __kmalloc_cache_noprof+0x213/0x3e0 binderfs_binder_device_create+0x183/0xa80 binder_ctl_ioctl+0x138/0x190 __x64_sys_ioctl+0x120/0x1b0 do_syscall_64+0xf6/0x210 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 1705: kasan_save_track+0x3e/0x80 kasan_save_free_info+0x46/0x50 __kasan_slab_free+0x62/0x70 kfree+0x194/0x440 evict+0x524/0x9f0 do_unlinkat+0x390/0x5b0 __x64_sys_unlink+0x47/0x50 do_syscall_64+0xf6/0x210 entry_SYSCALL_64_after_hwframe+0x77/0x7f This 'stress-ng' workload causes the concurrent deletions from 'binder_devices' and so requires full-featured synchronization to prevent list corruption. I've found this issue independently but pretty sure that syzbot did the same, so Reported-by: and Closes: should be applicable here as well. Cc: stable@vger.kernel.org Reported-by: syzbot+353d7b75658a95aa955a@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=353d7b75658a95aa955a Fixes: e77aff5528a18 ("binderfs: fix use-after-free in binder_devices") Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Acked-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250517170957.1317876-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-28Merge 6.15-rc4 into char-misc-nextGreg Kroah-Hartman1-1/+1
We need the char-misc fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-15binder: use buffer offsets in debug logsTiffany Y. Yang1-16/+11
Identify buffer addresses using vma offsets instead of full user addresses in debug logs or drop them if they are not useful. Signed-off-by: Tiffany Y. Yang <ynaffit@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20250401202846.3510162-2-ynaffit@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-15binder: fix offset calculation in debug logCarlos Llamas1-1/+1
The vma start address should be substracted from the buffer's user data address and not the other way around. Cc: Tiffany Y. Yang <ynaffit@google.com> Cc: stable <stable@kernel.org> Fixes: 162c79731448 ("binder: avoid user addresses in debug logs") Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Tiffany Y. Yang <ynaffit@google.com> Link: https://lore.kernel.org/r/20250325184902.587138-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-08VFS: rename lookup_one_len family to lookup_noperm and remove permission checkNeilBrown1-2/+2
The lookup_one_len family of functions is (now) only used internally by a filesystem on itself either - in a context where permission checking is irrelevant such as by a virtual filesystem populating itself, or xfs accessing its ORPHANAGE or dquota accessing the quota file; or - in a context where a permission check (MAY_EXEC on the parent) has just been performed such as a network filesystem finding in "silly-rename" file in the same directory. This is also the context after the _parentat() functions where currently lookup_one_qstr_excl() is used. So the permission check is pointless. The name "one_len" is unhelpful in understanding the purpose of these functions and should be changed. Most of the callers pass the len as "strlen()" so using a qstr and QSTR() can simplify the code. This patch renames these functions (include lookup_positive_unlocked() which is part of the family despite the name) to have a name based on "lookup_noperm". They are changed to receive a 'struct qstr' instead of separate name and len. In a few cases the use of QSTR() results in a new call to strlen(). try_lookup_noperm() takes a pointer to a qstr instead of the whole qstr. This is consistent with d_hash_and_lookup() (which is nearly identical) and useful for lookup_noperm_unlocked(). The new lookup_noperm_common() doesn't take a qstr yet. That will be tidied up in a subsequent patch. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/r/20250319031545.2999807-5-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-10Merge 6.14-rc6 into char-misc-nextGreg Kroah-Hartman1-0/+1
We need the fixes in here as well to build on top of. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-20binder: remove unneeded <linux/export.h> inclusion from binder_internal.hMasahiro Yamada1-1/+0
binder_internal.h is included only in the following two C files: $ git grep binder_internal.h drivers/android/binder.c:#include "binder_internal.h" drivers/android/binderfs.c:#include "binder_internal.h" Neither of these files use the EXPORT_SYMBOL macro, so including <linux/export.h> is unnecessary. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20250217112756.1011333-1-masahiroy@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-20binderfs: fix use-after-free in binder_devicesCarlos Llamas1-0/+1
Devices created through binderfs are added to the global binder_devices list but are not removed before being destroyed. This leads to dangling pointers in the list and subsequent use-after-free errors: ================================================================== BUG: KASAN: slab-use-after-free in binder_add_device+0x5c/0x9c Write of size 8 at addr ffff0000c258d708 by task mount/653 CPU: 7 UID: 0 PID: 653 Comm: mount Not tainted 6.13.0-09030-g6d61a53dd6f5 #1 Hardware name: linux,dummy-virt (DT) Call trace: binder_add_device+0x5c/0x9c binderfs_binder_device_create+0x690/0x84c [...] __arm64_sys_mount+0x324/0x3bc Allocated by task 632: binderfs_binder_device_create+0x168/0x84c binder_ctl_ioctl+0xfc/0x184 [...] __arm64_sys_ioctl+0x110/0x150 Freed by task 649: kfree+0xe0/0x338 binderfs_evict_inode+0x138/0x1dc [...] ================================================================== Remove devices from binder_devices before destroying them. Cc: Li Li <dualli@google.com> Reported-by: syzbot+7015dcf45953112c8b45@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7015dcf45953112c8b45 Fixes: 12d909cac1e1 ("binderfs: add new binder devices to binder_devices") Signed-off-by: Carlos Llamas <cmllamas@google.com> Tested-by: syzbot+7015dcf45953112c8b45@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20250130215823.1518990-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-28Merge tag 'char-misc-6.14-rc1' of ↵Linus Torvalds7-169/+288
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull Char/Misc/IIO driver updates from Greg KH: "Here is the "big" set of char/misc/iio and other smaller driver subsystem updates for 6.14-rc1. Loads of different things in here this development cycle, highlights are: - ntsync "driver" to handle Windows locking types enabling Wine to work much better on many workloads (i.e. games). The driver framework was in 6.13, but now it's enabled and fully working properly. Should make many SteamOS users happy. Even comes with tests! - Large IIO driver updates and bugfixes - FPGA driver updates - Coresight driver updates - MHI driver updates - PPS driver updatesa - const bin_attribute reworking for many drivers - binder driver updates - smaller driver updates and fixes All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (311 commits) ntsync: Fix reference leaks in the remaining create ioctls. spmi: hisi-spmi-controller: Drop duplicated OF node assignment in spmi_controller_probe() spmi: Set fwnode for spmi devices ntsync: fix a file reference leak in drivers/misc/ntsync.c scripts/tags.sh: Don't tag usages of DECLARE_BITMAP dt-bindings: interconnect: qcom,msm8998-bwmon: Add SM8750 CPU BWMONs dt-bindings: interconnect: OSM L3: Document sm8650 OSM L3 compatible dt-bindings: interconnect: qcom-bwmon: Document QCS615 bwmon compatibles interconnect: sm8750: Add missing const to static qcom_icc_desc memstick: core: fix kernel-doc notation intel_th: core: fix kernel-doc warnings binder: log transaction code on failure iio: dac: ad3552r-hs: clear reset status flag iio: dac: ad3552r-common: fix ad3541/2r ranges iio: chemical: bme680: Fix uninitialized variable in __bme680_read_raw() misc: fastrpc: Fix copy buffer page size misc: fastrpc: Fix registered buffer page address misc: fastrpc: Deregister device nodes properly in error scenarios nvmem: core: improve range check for nvmem_cell_write() nvmem: qcom-spmi-sdam: Set size in struct nvmem_config ...
2025-01-13binder: log transaction code on failureCarlos Llamas1-2/+2
When a transaction fails, log the 'tr->code' to help indentify the problematic userspace call path. This additional information will simplify debugging efforts. Cc: Steven Moreland <smoreland@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250110175051.2656975-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08binder: fix kernel-doc warning of 'file' memberCarlos Llamas1-1/+1
The 'struct file' member in 'binder_task_work_cb' definition was renamed to 'file' between patch versions but its kernel-doc reference kept the old name 'fd'. Update the naming to fix the W=1 build warning. Cc: Todd Kjos <tkjos@google.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501031535.erbln3A2-lkp@intel.com/ Signed-off-by: Carlos Llamas <cmllamas@google.com> Acked-by: Todd Kjos <tkjos@google.com> Link: https://lore.kernel.org/r/20250106192608.1107362-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08binderfs: add new binder devices to binder_devicesLi Li3-2/+16
When binderfs is not enabled, the binder driver parses the kernel config to create all binder devices. All of the new binder devices are stored in the list binder_devices. When binderfs is enabled, the binder driver creates new binder devices dynamically when userspace applications call BINDER_CTL_ADD ioctl. But the devices created in this way are not stored in the same list. This patch fixes that. Signed-off-by: Li Li <dualli@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20241218212935.4162907-2-dualli@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24binder: use per-vma lock in page reclaimingCarlos Llamas1-7/+22
Use per-vma locking in the shrinker's callback when reclaiming pages, similar to the page installation logic. This minimizes contention with unrelated vmas improving performance. The mmap_sem is still acquired if the per-vma lock cannot be obtained. Cc: Suren Baghdasaryan <surenb@google.com> Suggested-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20241210143114.661252-10-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24binder: propagate vm_insert_page() errorsCarlos Llamas1-1/+0
Instead of always overriding errors with -ENOMEM, propagate the specific error code returned by vm_insert_page(). This allows for more accurate error logs and handling. Cc: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20241210143114.661252-9-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24binder: use per-vma lock in page installationCarlos Llamas1-17/+50
Use per-vma locking for concurrent page installations, this minimizes contention with unrelated vmas improving performance. The mmap_lock is still acquired when needed though, e.g. before get_user_pages_remote(). Many thanks to Barry Song who posted a similar approach [1]. Link: https://lore.kernel.org/all/20240902225009.34576-1-21cnbao@gmail.com/ [1] Cc: Nhat Pham <nphamcs@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Barry Song <v-songbaohua@oppo.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20241210143114.661252-8-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24binder: rename alloc->buffer to vm_startCarlos Llamas5-19/+19
The alloc->buffer field in struct binder_alloc stores the starting address of the mapped vma, rename this field to alloc->vm_start to better reflect its purpose. It also avoids confusion with the binder buffer concept, e.g. transaction->buffer. No functional changes in this patch. Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20241210143114.661252-7-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24binder: replace alloc->vma with alloc->mappedCarlos Llamas3-26/+30
It is unsafe to use alloc->vma outside of the mmap_sem. Instead, add a new boolean alloc->mapped to save the vma state (mapped or unmmaped) and use this as a replacement for alloc->vma to validate several paths. Using the alloc->vma caused several performance and security issues in the past. Now that it has been replaced with either vm_lookup() or the alloc->mapped state, we can finally remove it. Cc: Minchan Kim <minchan@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20241210143114.661252-6-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24binder: store shrinker metadata under page->privateCarlos Llamas3-70/+99
Instead of pre-allocating an entire array of struct binder_lru_page in alloc->pages, install the shrinker metadata under page->private. This ensures the memory is allocated and released as needed alongside pages. By converting the alloc->pages[] into an array of struct page pointers, we can access these pages directly and only reference the shrinker metadata where it's being used (e.g. inside the shrinker's callback). Rename struct binder_lru_page to struct binder_shrinker_mdata to better reflect its purpose. Add convenience functions that wrap the allocation and freeing of pages along with their shrinker metadata. Note I've reworked this patch to avoid using page->lru and page->index directly, as Matthew pointed out that these are being removed [1]. Link: https://lore.kernel.org/all/ZzziucEm3np6e7a0@casper.infradead.org/ [1] Cc: Matthew Wilcox <willy@infradead.org> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20241210143114.661252-5-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24binder: select correct nid for pages in LRUCarlos Llamas1-4/+12
The numa node id for binder pages is currently being derived from the lru entry under struct binder_lru_page. However, this object doesn't reflect the node id of the struct page items allocated separately. Instead, select the correct node id from the page itself. This was made possible since commit 0a97c01cd20b ("list_lru: allow explicit memcg and NUMA node selection"). Cc: Nhat Pham <nphamcs@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20241210143114.661252-4-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24binder: concurrent page installationCarlos Llamas1-24/+41
Allow multiple callers to install pages simultaneously by switching the mmap_sem from write-mode to read-mode. Races to the same PTE are handled using get_user_pages_remote() to retrieve the already installed page. This method significantly reduces contention in the mmap semaphore. To ensure safety, vma_lookup() is used (instead of alloc->vma) to avoid operating on an isolated VMA. In addition, zap_page_range_single() is called under the alloc->mutex to avoid racing with the shrinker. Many thanks to Barry Song who posted a similar approach [1]. Link: https://lore.kernel.org/all/20240902225009.34576-1-21cnbao@gmail.com/ [1] Cc: David Hildenbrand <david@redhat.com> Cc: Barry Song <v-songbaohua@oppo.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20241210143114.661252-3-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24Revert "binder: switch alloc->mutex to spinlock_t"Carlos Llamas2-28/+28
This reverts commit 7710e2cca32e7f3958480e8bd44f50e29d0c2509. In preparation for concurrent page installations, restore the original alloc->mutex which will serialize zap_page_range_single() against page installations in subsequent patches (instead of the mmap_sem). Resolved trivial conflicts with commit 2c10a20f5e84a ("binder_alloc: Fix sleeping function called from invalid context") and commit da0c02516c50 ("mm/list_lru: simplify the list_lru walk callback function"). Cc: Mukesh Ojha <quic_mojha@quicinc.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20241210143114.661252-2-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-16binder: initialize lsm_context structureCasey Schaufler1-1/+1
It is possible to reach the end of binder_transaction() without having set lsmctx. As the variable value is checked there it needs to be initialized. Suggested-by: Kees Bakker <kees@ijzerbout.nl> [PM: subj tweak to fit convention] Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-12-04lsm: replace context+len with lsm_contextCasey Schaufler1-3/+2
Replace the (secctx,seclen) pointer pair with a single lsm_context pointer to allow return of the LSM identifier along with the context and context length. This allows security_release_secctx() to know how to release the context. Callers have been modified to use or save the returned data from the new structure. security_secid_to_secctx() and security_lsmproc_to_secctx() will now return the length value on success instead of 0. Cc: netdev@vger.kernel.org Cc: audit@vger.kernel.org Cc: netfilter-devel@vger.kernel.org Cc: Todd Kjos <tkjos@google.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subject tweak, kdoc fix, signedness fix from Dan Carpenter] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-12-04lsm: ensure the correct LSM context releaserCasey Schaufler1-12/+12
Add a new lsm_context data structure to hold all the information about a "security context", including the string, its size and which LSM allocated the string. The allocation information is necessary because LSMs have different policies regarding the lifecycle of these strings. SELinux allocates and destroys them on each use, whereas Smack provides a pointer to an entry in a list that never goes away. Update security_release_secctx() to use the lsm_context instead of a (char *, len) pair. Change its callers to do likewise. The LSMs supporting this hook have had comments added to remind the developer that there is more work to be done. The BPF security module provides all LSM hooks. While there has yet to be a known instance of a BPF configuration that uses security contexts, the possibility is real. In the existing implementation there is potential for multiple frees in that case. Cc: linux-integrity@vger.kernel.org Cc: netdev@vger.kernel.org Cc: audit@vger.kernel.org Cc: netfilter-devel@vger.kernel.org To: Pablo Neira Ayuso <pablo@netfilter.org> Cc: linux-nfs@vger.kernel.org Cc: Todd Kjos <tkjos@google.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subject tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-11-29Merge tag 'char-misc-6.13-rc1' of ↵Linus Torvalds1-15/+49
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc/IIO/whatever driver subsystem updates from Greg KH: "Here is the 'big and hairy' char/misc/iio and other small driver subsystem updates for 6.13-rc1. Loads of things in here, and even a fun merge conflict! - rust misc driver bindings and other rust changes to make misc drivers actually possible. I think this is the tipping point, expect to see way more rust drivers going forward now that these bindings are present. Next merge window hopefully we will have pci and platform drivers working, which will fully enable almost all driver subsystems to start accepting (or at least getting) rust drivers. This is the end result of a lot of work from a lot of people, congrats to all of them for getting this far, you've proved many of us wrong in the best way possible, working code :) - IIO driver updates, too many to list individually, that subsystem keeps growing and growing... - Interconnect driver updates - nvmem driver updates - pwm driver updates - platform_driver::remove() fixups, loads of them - counter driver updates - misc driver updates (keba?) - binder driver updates and fixes - loads of other small char/misc/etc driver updates and additions, full details in the shortlog. All of these have been in linux-next for a while, with no other reported issues other than that merge conflict" * tag 'char-misc-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (401 commits) mei: vsc: Fix typo "maintstepping" -> "mainstepping" firmware: Switch back to struct platform_driver::remove() misc: isl29020: Fix the wrong format specifier scripts/tags.sh: Don't tag usages of DEFINE_MUTEX fpga: Switch back to struct platform_driver::remove() mei: vsc: Improve error logging in vsc_identify_silicon() mei: vsc: Do not re-enable interrupt from vsc_tp_reset() dt-bindings: spmi: qcom,x1e80100-spmi-pmic-arb: Add SAR2130P compatible dt-bindings: spmi: spmi-mtk-pmif: Add compatible for MT8188 spmi: pmic-arb: fix return path in for_each_available_child_of_node() iio: Move __private marking before struct element priv in struct iio_dev docs: iio: ad7380: add adaq4370-4 and adaq4380-4 iio: adc: ad7380: add support for adaq4370-4 and adaq4380-4 iio: adc: ad7380: use local dev variable to shorten long lines iio: adc: ad7380: fix oversampling formula dt-bindings: iio: adc: ad7380: add adaq4370-4 and adaq4380-4 compatible parts bus: mhi: host: pci_generic: Use pcim_iomap_region() to request and map MHI BAR bus: mhi: host: Switch trace_mhi_gen_tre fields to native endian misc: atmel-ssc: Use of_property_present() for non-boolean properties misc: keba: Add hardware dependency ...
2024-11-12mm/list_lru: simplify the list_lru walk callback functionKairui Song2-5/+4
Now isolation no longer takes the list_lru global node lock, only use the per-cgroup lock instead. And this lock is inside the list_lru_one being walked, no longer needed to pass the lock explicitly. Link: https://lkml.kernel.org/r/20241104175257.60853-7-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Cc: Chengming Zhou <zhouchengming@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-12mm/list_lru: split the lock to per-cgroup scopeKairui Song1-1/+0
Currently, every list_lru has a per-node lock that protects adding, deletion, isolation, and reparenting of all list_lru_one instances belonging to this list_lru on this node. This lock contention is heavy when multiple cgroups modify the same list_lru. This lock can be split into per-cgroup scope to reduce contention. To achieve this, we need a stable list_lru_one for every cgroup. This commit adds a lock to each list_lru_one and introduced a helper function lock_list_lru_of_memcg, making it possible to pin the list_lru of a memcg. Then reworked the reparenting process. Reparenting will switch the list_lru_one instances one by one. By locking each instance and marking it dead using the nr_items counter, reparenting ensures that all items in the corresponding cgroup (on-list or not, because items have a stable cgroup, see below) will see the list_lru_one switch synchronously. Objcg reparent is also moved after list_lru reparent so items will have a stable mem cgroup until all list_lru_one instances are drained. The only caller that doesn't work the *_obj interfaces are direct calls to list_lru_{add,del}. But it's only used by zswap and that's also based on objcg, so it's fine. This also changes the bahaviour of the isolation function when LRU_RETRY or LRU_REMOVED_RETRY is returned, because now releasing the lock could unblock reparenting and free the list_lru_one, isolation function will have to return withoug re-lock the lru. prepare() { mkdir /tmp/test-fs modprobe brd rd_nr=1 rd_size=33554432 mkfs.xfs -f /dev/ram0 mount -t xfs /dev/ram0 /tmp/test-fs for i in $(seq 1 512); do mkdir "/tmp/test-fs/$i" for j in $(seq 1 10240); do echo TEST-CONTENT > "/tmp/test-fs/$i/$j" done & done; wait } do_test() { read_worker() { sleep 1 tar -cv "$1" &>/dev/null } read_in_all() { cd "/tmp/test-fs" && ls for i in $(seq 1 512); do (exec sh -c 'echo "$PPID"') > "/sys/fs/cgroup/benchmark/$i/cgroup.procs" read_worker "$i" & done; wait } for i in $(seq 1 512); do mkdir -p "/sys/fs/cgroup/benchmark/$i" done echo +memory > /sys/fs/cgroup/benchmark/cgroup.subtree_control echo 512M > /sys/fs/cgroup/benchmark/memory.max echo 3 > /proc/sys/vm/drop_caches time read_in_all } Above script simulates compression of small files in multiple cgroups with memory pressure. Run prepare() then do_test for 6 times: Before: real 0m7.762s user 0m11.340s sys 3m11.224s real 0m8.123s user 0m11.548s sys 3m2.549s real 0m7.736s user 0m11.515s sys 3m11.171s real 0m8.539s user 0m11.508s sys 3m7.618s real 0m7.928s user 0m11.349s sys 3m13.063s real 0m8.105s user 0m11.128s sys 3m14.313s After this commit (about ~15% faster): real 0m6.953s user 0m11.327s sys 2m42.912s real 0m7.453s user 0m11.343s sys 2m51.942s real 0m6.916s user 0m11.269s sys 2m43.957s real 0m6.894s user 0m11.528s sys 2m45.346s real 0m6.911s user 0m11.095s sys 2m43.168s real 0m6.773s user 0m11.518s sys 2m40.774s Link: https://lkml.kernel.org/r/20241104175257.60853-6-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Cc: Chengming Zhou <zhouchengming@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-13binder: add delivered_freeze to debugfs outputCarlos Llamas1-0/+4
Add the pending proc->delivered_freeze work to the debugfs output. This information was omitted in the original implementation of the freeze notification and can be valuable for debugging issues. Fixes: d579b04a52a1 ("binder: frozen notification") Cc: stable@vger.kernel.org Signed-off-by: Carlos Llamas <cmllamas@google.com> Acked-by: Todd Kjos <tkjos@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20240926233632.821189-9-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-13binder: fix memleak of proc->delivered_freezeCarlos Llamas1-0/+11
If a freeze notification is cleared with BC_CLEAR_FREEZE_NOTIFICATION before calling binder_freeze_notification_done(), then it is detached from its reference (e.g. ref->freeze) but the work remains queued in proc->delivered_freeze. This leads to a memory leak when the process exits as any pending entries in proc->delivered_freeze are not freed: unreferenced object 0xffff38e8cfa36180 (size 64): comm "binder-util", pid 655, jiffies 4294936641 hex dump (first 32 bytes): b8 e9 9e c8 e8 38 ff ff b8 e9 9e c8 e8 38 ff ff .....8.......8.. 0b 00 00 00 00 00 00 00 3c 1f 4b 00 00 00 00 00 ........<.K..... backtrace (crc 95983b32): [<000000000d0582cf>] kmemleak_alloc+0x34/0x40 [<000000009c99a513>] __kmalloc_cache_noprof+0x208/0x280 [<00000000313b1704>] binder_thread_write+0xdec/0x439c [<000000000cbd33bb>] binder_ioctl+0x1b68/0x22cc [<000000002bbedeeb>] __arm64_sys_ioctl+0x124/0x190 [<00000000b439adee>] invoke_syscall+0x6c/0x254 [<00000000173558fc>] el0_svc_common.constprop.0+0xac/0x230 [<0000000084f72311>] do_el0_svc+0x40/0x58 [<000000008b872457>] el0_svc+0x38/0x78 [<00000000ee778653>] el0t_64_sync_handler+0x120/0x12c [<00000000a8ec61bf>] el0t_64_sync+0x190/0x194 This patch fixes the leak by ensuring that any pending entries in proc->delivered_freeze are freed during binder_deferred_release(). Fixes: d579b04a52a1 ("binder: frozen notification") Cc: stable@vger.kernel.org Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Todd Kjos <tkjos@google.com> Link: https://lore.kernel.org/r/20240926233632.821189-8-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-13binder: allow freeze notification for dead nodesCarlos Llamas1-15/+13
Alice points out that binder_request_freeze_notification() should not return EINVAL when the relevant node is dead [1]. The node can die at any point even if the user input is valid. Instead, allow the request to be allocated but skip the initial notification for dead nodes. This avoids propagating unnecessary errors back to userspace. Fixes: d579b04a52a1 ("binder: frozen notification") Cc: stable@vger.kernel.org Suggested-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/all/CAH5fLghapZJ4PbbkC8V5A6Zay-_sgTzwVpwqk6RWWUNKKyJC_Q@mail.gmail.com/ [1] Signed-off-by: Carlos Llamas <cmllamas@google.com> Acked-by: Todd Kjos <tkjos@google.com> Link: https://lore.kernel.org/r/20240926233632.821189-7-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>