summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2025-05-29Merge tag 'v6.16-p2' of ↵Linus Torvalds1-4/+5
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "Fix a buffer overflow regression in shash" * tag 'v6.16-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: shash - Fix buffer overrun in import function
2025-05-28Merge tag 'jfs-6.16' of github.com:kleikamp/linux-shaggyLinus Torvalds3-5/+22
Pull jfs updates from David Kleikamp: "A few small fixes for jfs" * tag 'jfs-6.16' of github.com:kleikamp/linux-shaggy: jfs: fix array-index-out-of-bounds read in add_missing_indices jfs: Fix null-ptr-deref in jfs_ioc_trim jfs: validate AG parameters in dbMount() to prevent crashes
2025-05-28tracing: Fix compilation warning on arm32Pan Taixi1-1/+1
On arm32, size_t is defined to be unsigned int, while PAGE_SIZE is unsigned long. This hence triggers a compilation warning as min() asserts the type of two operands to be equal. Casting PAGE_SIZE to size_t solves this issue and works on other target architectures as well. Compilation warning details: kernel/trace/trace.c: In function 'tracing_splice_read_pipe': ./include/linux/minmax.h:20:28: warning: comparison of distinct pointer types lacks a cast (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) ^ ./include/linux/minmax.h:26:4: note: in expansion of macro '__typecheck' (__typecheck(x, y) && __no_side_effects(x, y)) ^~~~~~~~~~~ ... kernel/trace/trace.c:6771:8: note: in expansion of macro 'min' min((size_t)trace_seq_used(&iter->seq), ^~~ Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250526013731.1198030-1-pantaixi@huaweicloud.com Fixes: f5178c41bb43 ("tracing: Fix oob write in trace_seq_to_buffer()") Reviewed-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Pan Taixi <pantaixi@huaweicloud.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-28Merge tag 'dlm-6.16' of ↵Linus Torvalds3-3/+8
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This fixes delays when shutting down SCTP connections, and updates dlm Kconfig for SCTP" * tag 'dlm-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: drop SCTP Kconfig dependency dlm: reject SCTP configuration if not enabled dlm: use SHUT_RDWR for SCTP shutdown dlm: mask sk_shutdown value
2025-05-28Merge tag 'nfsd-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds41-381/+848
Pull nfsd updates from Chuck Lever: "The marquee feature for this release is that the limit on the maximum rsize and wsize has been raised to 4MB. The default remains at 1MB, but risk-seeking administrators now have the ability to try larger I/O sizes with NFS clients that support them. Eventually the default setting will be increased when we have confidence that this change will not have negative impact. With v6.16, NFSD now has its own debugfs file system where we can add experimental features and make them available outside of our development community without impacting production deployments. The first experimental setting added is one that makes all NFS READ operations use vfs_iter_read() instead of the NFSD splice actor. The plan is to eventually retire the splice actor, as that will enable a number of new capabilities such as the use of struct bio_vec from the top to the bottom of the NFSD stack. Jeff Layton contributed a number of observability improvements. The use of dprintk() in a number of high-traffic code paths has been replaced with static trace points. This release sees the continuation of efforts to harden the NFSv4.2 COPY operation. Soon, the restriction on async COPY operations can be lifted. Many thanks to the contributors, reviewers, testers, and bug reporters who participated during the v6.16 development cycle" * tag 'nfsd-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (60 commits) xdrgen: Fix code generated for counted arrays SUNRPC: Bump the maximum payload size for the server NFSD: Add a "default" block size NFSD: Remove NFSSVC_MAXBLKSIZE_V2 macro NFSD: Remove NFSD_BUFSIZE sunrpc: Remove the RPCSVC_MAXPAGES macro svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages sunrpc: Adjust size of socket's receive page array dynamically SUNRPC: Remove svc_rqst :: rq_vec SUNRPC: Remove svc_fill_write_vector() NFSD: Use rqstp->rq_bvec in nfsd_iter_write() SUNRPC: Export xdr_buf_to_bvec() NFSD: De-duplicate the svc_fill_write_vector() call sites NFSD: Use rqstp->rq_bvec in nfsd_iter_read() sunrpc: Replace the rq_bvec array with dynamically-allocated memory sunrpc: Replace the rq_pages array with dynamically-allocated memory sunrpc: Remove backchannel check in svc_init_buffer() sunrpc: Add a helper to derive maxpages from sv_max_mesg svcrdma: Reduce the number of rdma_rw contexts per-QP ...
2025-05-28Merge tag 'ext4_for_linus-6.16-rc1' of ↵Linus Torvalds27-474/+1290
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "New ext4 features and performance improvements: - Fast commit performance improvements - Multi-fsblock atomic write support for bigalloc file systems - Large folio support for regular files This last can result in really stupendous performance for the right workloads. For example, see [1] where the Kernel Test Robot reported over 37% improvement on a large sequential I/O workload. There are also the usual bug fixes and cleanups. Of note are cleanups of the extent status tree to fix potential races that could result in the extent status tree getting corrupted under heavy simultaneous allocation and deallocation to a single file" Link: https://lore.kernel.org/all/202505161418.ec0d753f-lkp@intel.com/ [1] * tag 'ext4_for_linus-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (52 commits) ext4: Add a WARN_ON_ONCE for querying LAST_IN_LEAF instead ext4: Simplify flags in ext4_map_query_blocks() ext4: Rename and document EXT4_EX_FILTER to EXT4_EX_QUERY_FILTER ext4: Simplify last in leaf check in ext4_map_query_blocks ext4: Unwritten to written conversion requires EXT4_EX_NOCACHE ext4: only dirty folios when data journaling regular files ext4: Add atomic block write documentation ext4: Enable support for ext4 multi-fsblock atomic write using bigalloc ext4: Add multi-fsblock atomic write support with bigalloc ext4: Add support for EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS ext4: Make ext4_meta_trans_blocks() non-static for later use ext4: Check if inode uses extents in ext4_inode_can_atomic_write() ext4: Document an edge case for overwrites jbd2: remove journal_t argument from jbd2_superblock_csum() jbd2: remove journal_t argument from jbd2_chksum() ext4: remove sb argument from ext4_superblock_csum() ext4: remove sbi argument from ext4_chksum() ext4: enable large folio for regular file ext4: make online defragmentation support large folios ext4: make the writeback path support large folios ...
2025-05-28Merge tag 'ntfs3_for_6.16' of ↵Linus Torvalds8-257/+28
https://github.com/Paragon-Software-Group/linux-ntfs3 Pull ntfs updates from Konstantin Komarov: "Added: - missing direct_IO in ntfs_aops_cmpr - handling of hdr_first_de() return value Fixed: - handling of InitializeFileRecordSegment operation. Removed: - ability to change compression on mounted volume - redundant NULL check" * tag 'ntfs3_for_6.16' of https://github.com/Paragon-Software-Group/linux-ntfs3: fs/ntfs3: remove ability to change compression on mounted volume fs/ntfs3: Fix handling of InitializeFileRecordSegment fs/ntfs3: Add missing direct_IO in ntfs_aops_cmpr fs/ntfs3: handle hdr_first_de() return value fs/ntfs3: Drop redundant NULL check
2025-05-28Merge tag 'for-linus-6.16-ofs1' of ↵Linus Torvalds3-98/+102
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs update from Mike Marshall: "Convert to use the new mount API. Code from Eric Sandeen at redhat that converts orangefs over to the new mount API" * tag 'for-linus-6.16-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: Convert to use the new mount API
2025-05-28Merge tag 'exfat-for-6.16-rc1' of ↵Linus Torvalds2-23/+8
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Fix xfstests generic/482 test failure - Fix double free in delayed_free * tag 'exfat-for-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: do not clear volume dirty flag during sync exfat: fix double free in delayed_free
2025-05-28Merge tag 'for-6.16-tag' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "A fixup to the xarray conversion sent in the main 6.16 batch. It was not included because it would cause rebase/refresh of like 80 patches, right before sending the early pull request last week. It's fixing a bug when zoned mode is enabled on btrfs so it's not affecting most people" * tag 'for-6.16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: don't drop a reference if btrfs_check_write_meta_pointer() fails
2025-05-28Merge tag 'kvm-s390-next-6.16-1' of ↵Paolo Bonzini25-392/+482
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD * Fix interaction between some filesystems and Secure Execution * Some cleanups and refactorings, preparing for an upcoming big series
2025-05-28Merge tag 'drm-next-2025-05-28' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds1726-36208/+71418
Pull drm updates from Dave Airlie: "As part of building up nova-core/nova-drm pieces we've brought in some rust abstractions through this tree, aux bus being the main one, with devres changes also in the driver-core tree. Along with the drm core abstractions and enough nova-core/nova-drm to use them. This is still all stub work under construction, to build the nova driver upstream. The other big NVIDIA related one is nouveau adds support for Hopper/Blackwell GPUs, this required a new GSP firmware update to 570.144, and a bunch of rework in order to support multiple fw interfaces. There is also the introduction of an asahi uapi header file as a precursor to getting the real driver in later, but to unblock userspace mesa packages while the driver is trapped behind rust enablement. Otherwise it's the usual mixture of stuff all over, amdgpu, i915/xe, and msm being the main ones, and some changes to vsprintf. new drivers: - bring in the asahi uapi header standalone - nova-drm: stub driver rust dependencies (for nova-core): - auxiliary - bus abstractions - driver registration - sample driver - devres changes from driver-core - revocable changes core: - add Apple fourcc modifiers - add virtio capset definitions - extend EXPORT_SYNC_FILE for timeline syncobjs - convert to devm_platform_ioremap_resource - refactor shmem helper page pinning - DP powerup/down link helpers - extended %p4cc in vsprintf.c to support fourcc prints - change vsprintf %p4cn to %p4chR, remove %p4cn - Add drm_file_err function - IN_FORMATS_ASYNC property - move sitronix from tiny to their own subdir rust: - add drm core infrastructure rust abstractions (device/driver, ioctl, file, gem) dma-buf: - adjust sg handling to not cache map on attach - allow setting dma-device for import - Add a helper to sort and deduplicate dma_fence arrays docs: - updated drm scheduler docs - fbdev todo update - fb rendering - actual brightness ttm: - fix delayed destroy resv object bridge: - add kunit tests - convert tc358775 to atomic - convert drivers to devm_drm_bridge_alloc - convert rk3066_hdmi to bridge driver scheduler: - add kunit tests panel: - refcount panels to improve lifetime handling - Powertip PH128800T004-ZZA01 - NLT NL13676BC25-03F, Tianma TM070JDHG34-00 - Himax HX8279/HX8279-D DDIC - Visionox G2647FB105 - Sitronix ST7571 - ZOTAC rotation quirk vkms: - allow attaching more displays i915: - xe3lpd display updates - vrr refactor - intel_display struct conversions - xe2hpd memory type identification - add link rate/count to i915_display_info - cleanup VGA plane handling - refactor HDCP GSC - fix SLPC wait boosting reference counting - add 20ms delay to engine reset - fix fence release on early probe errors xe: - SRIOV updates - BMG PCI ID update - support separate firmware for each GT - SVM fix, prelim SVM multi-device work - export fan speed - temp disable d3cold on BMG - backup VRAM in PM notifier instead of suspend/freeze - update xe_ttm_access_memory to use GPU for non-visible access - fix guc_info debugfs for VFs - use copy_from_user instead of __copy_from_user - append PCIe gen5 limitations to xe_firmware document amdgpu: - DSC cleanup - DC Scaling updates - Fused I2C-over-AUX updates - DMUB updates - Use drm_file_err in amdgpu - Enforce isolation updates - Use new dma_fence helpers - USERQ fixes - Documentation updates - SR-IOV updates - RAS updates - PSP 12 cleanups - GC 9.5 updates - SMU 13.x updates - VCN / JPEG SR-IOV updates amdkfd: - Update error messages for SDMA - Userptr updates - XNACK fixes radeon: - CIK doorbell cleanup nouveau: - add support for NVIDIA r570 GSP firmware - enable Hopper/Blackwell support nova-core: - fix task list - register definition infrastructure - move firmware into own rust module - register auxiliary device for nova-drm nova-drm: - initial driver skeleton msm: - GPU: - ACD (adaptive clock distribution) for X1-85 - drop fictional address_space_size - improve GMU HFI response time out robustness - fix crash when throttling during boot - DPU: - use single CTL path for flushing on DPU 5.x+ - improve SSPP allocation code for better sharing - Enabled SmartDMA on SM8150, SC8180X, SC8280XP, SM8550 - Added SAR2130P support - Disabled DSC support on MSM8937, MSM8917, MSM8953, SDM660 - DP: - switch to new audio helpers - better LTTPR handling - DSI: - Added support for SA8775P - Added SAR2130P support - HDMI: - Switched to use new helpers for ACR data - Fixed old standing issue of HPD not working in some cases amdxdna: - add dma-buf support - allow empty command submits renesas: - add dma-buf support - add zpos, alpha, blend support panthor: - fail properly for NO_MMAP bos - add SET_LABEL ioctl - debugfs BO dumping support imagination: - update DT bindings - support TI AM68 GPU hibmc: - improve interrupt handling and HPD support virtio: - add panic handler support rockchip: - add RK3588 support - add DP AUX bus panel support ivpu: - add heartbeat based hangcheck mediatek: - prepares support for MT8195/99 HDMIv2/DDCv2 anx7625: - improve HPD tegra: - speed up firmware loading * tag 'drm-next-2025-05-28' of https://gitlab.freedesktop.org/drm/kernel: (1627 commits) drm/nouveau/tegra: Fix error pointer vs NULL return in nvkm_device_tegra_resource_addr() drm/xe: Default auto_link_downgrade status to false drm/xe/guc: Make creation of SLPC debugfs files conditional drm/i915/display: Add check for alloc_ordered_workqueue() and alloc_workqueue() drm/i915/dp_mst: Work around Thunderbolt sink disconnect after SINK_COUNT_ESI read drm/i915/ptl: Use everywhere the correct DDI port clock select mask drm/nouveau/kms: add support for GB20x drm/dp: add option to disable zero sized address only transactions. drm/nouveau: add support for GB20x drm/nouveau/gsp: add hal for fifo.chan.doorbell_handle drm/nouveau: add support for GB10x drm/nouveau/gf100-: track chan progress with non-WFI semaphore release drm/nouveau/nv50-: separate CHANNEL_GPFIFO handling out from CHANNEL_DMA drm/nouveau: add helper functions for allocating pinned/cpu-mapped bos drm/nouveau: add support for GH100 drm/nouveau: improve handling of 64-bit BARs drm/nouveau/gv100-: switch to volta semaphore methods drm/nouveau/gsp: support deeper page tables in COPY_SERVER_RESERVED_PDES drm/nouveau/gsp: init client VMMs with NV0080_CTRL_DMA_SET_PAGE_DIRECTORY drm/nouveau/gsp: fetch level shift and PDE from BAR2 VMM ...
2025-05-28Merge tag 'media/v6.16-1' of ↵Linus Torvalds320-9032/+24119
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - v4l2-core fix: V4L2_BUF_TYPE_VIDEO_OVERLAY is capture, not output - New driver: Amlogic C3 ISP - New sensor drivers: ST VD55G1 and VD56G3, OmniVision OV02C10 - amlogic: c3-mipi-csi2: Handle 64-bits division - a fix for 64-bits division at the amlogic c3-mipi-csi2 driver - Changes at atomisp to support mainline mt9m114 driver and remove deprecated GPIO APIs - various cleanups, fixes and enhancements * tag 'media/v6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (314 commits) media: rkvdec: h264: Support High 10 and 4:2:2 profiles media: rkvdec: Add get_image_fmt ops media: rkvdec: Initialize the m2m context before the controls media: rkvdec: h264: Limit minimum profile to constrained baseline media: mediatek: jpeg: support 34bits media: verisilicon: Free post processor buffers on error media: platform: mtk-mdp3: Remove unused mdp_get_plat_device media: amlogic: c3-mipi-csi2: Handle 64-bits division media: uvcvideo: Use dev_err_probe for devm_gpiod_get_optional media: uvcvideo: Fix deferred probing error media: uvcvideo: Rollback non processed entities on error media: uvcvideo: Send control events for partial succeeds media: uvcvideo: Return the number of processed controls media: uvcvideo: Do not turn on the camera for some ioctls media: uvcvideo: Make power management granular media: uvcvideo: Increase/decrease the PM counter per IOCTL media: uvcvideo: Create uvc_pm_(get|put) functions media: uvcvideo: Keep streaming state in the file handle Documentation: media: Add documentation file c3-isp.rst Documentation: media: Add documentation file metafmt-c3-isp.rst ...
2025-05-28f2fs: fix to correct check conditions in f2fs_cross_renameZhiguo Niu1-1/+1
Should be "old_dir" here. Fixes: 5c57132eaf52 ("f2fs: support project quota") Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-28f2fs: use d_inode(dentry) cleanup dentry->d_inodeZhiguo Niu2-6/+6
no logic changes. Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-28f2fs: fix to skip f2fs_balance_fs() if checkpoint is disabledChao Yu1-1/+1
Syzbot reports a f2fs bug as below: INFO: task syz-executor328:5856 blocked for more than 144 seconds. Not tainted 6.15.0-rc6-syzkaller-00208-g3c21441eeffc #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-executor328 state:D stack:24392 pid:5856 tgid:5832 ppid:5826 task_flags:0x400040 flags:0x00004006 Call Trace: <TASK> context_switch kernel/sched/core.c:5382 [inline] __schedule+0x168f/0x4c70 kernel/sched/core.c:6767 __schedule_loop kernel/sched/core.c:6845 [inline] schedule+0x165/0x360 kernel/sched/core.c:6860 io_schedule+0x81/0xe0 kernel/sched/core.c:7742 f2fs_balance_fs+0x4b4/0x780 fs/f2fs/segment.c:444 f2fs_map_blocks+0x3af1/0x43b0 fs/f2fs/data.c:1791 f2fs_expand_inode_data+0x653/0xaf0 fs/f2fs/file.c:1872 f2fs_fallocate+0x4f5/0x990 fs/f2fs/file.c:1975 vfs_fallocate+0x6a0/0x830 fs/open.c:338 ioctl_preallocate fs/ioctl.c:290 [inline] file_ioctl fs/ioctl.c:-1 [inline] do_vfs_ioctl+0x1b8f/0x1eb0 fs/ioctl.c:885 __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0x82/0x170 fs/ioctl.c:892 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xf6/0x210 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f The root cause is after commit 84b5bb8bf0f6 ("f2fs: modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable"), we will get chance to allow f2fs_is_checkpoint_ready() to return true once below conditions are all true: 1. checkpoint is disabled 2. there are not enough free segments 3. there are enough free blocks Then it will cause f2fs_balance_fs() to trigger foreground GC. void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need) ... if (!f2fs_is_checkpoint_ready(sbi)) return; And the testcase mounts f2fs image w/ gc_merge,checkpoint=disable, so deadloop will happen through below race condition: - f2fs_do_shutdown - vfs_fallocate - gc_thread_func - file_start_write - __sb_start_write(SB_FREEZE_WRITE) - f2fs_fallocate - f2fs_expand_inode_data - f2fs_map_blocks - f2fs_balance_fs - prepare_to_wait - wake_up(gc_wait_queue_head) - io_schedule - bdev_freeze - freeze_super - sb->s_writers.frozen = SB_FREEZE_WRITE; - sb_wait_write(sb, SB_FREEZE_WRITE); - if (sbi->sb->s_writers.frozen >= SB_FREEZE_WRITE) continue; : cause deadloop This patch fix to add check condition in f2fs_balance_fs(), so that if checkpoint is disabled, we will just skip trigger foreground GC to avoid such deadloop issue. Meanwhile let's remove f2fs_is_checkpoint_ready() check condition in f2fs_balance_fs(), since it's redundant, due to the main logic in the function is to check: a) whether checkpoint is disabled b) there is enough free segments f2fs_balance_fs() still has all logics after f2fs_is_checkpoint_ready()'s removal. Reported-by: syzbot+aa5bb5f6860e08a60450@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/682d743a.a00a0220.29bc26.0289.GAE@google.com Fixes: 84b5bb8bf0f6 ("f2fs: modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable") Cc: Qi Han <hanqi@vivo.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-28f2fs: clean up to check bi_status w/ BLK_STS_OKChao Yu2-4/+4
Check bi_status w/ BLK_STS_OK instead of 0 for cleanup. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-28f2fs: introduce is_{meta,node}_folioChao Yu5-15/+24
Just cleanup, no changes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-28f2fs: add ckpt_valid_blocks to the section entryyohan.joung2-18/+85
when performing buffered writes in a large section, overhead is incurred due to the iteration through ckpt_valid_blocks within the section. when SEGS_PER_SEC is 128, this overhead accounts for 20% within the f2fs_write_single_data_page routine. as the size of the section increases, the overhead also grows. to handle this problem ckpt_valid_blocks is added within the section entries. Test insmod null_blk.ko nr_devices=1 completion_nsec=1 submit_queues=8 hw_queue_depth=64 max_sectors=512 bs=4096 memory_backed=1 make_f2fs /dev/block/nullb0 make_f2fs -s 128 /dev/block/nullb0 fio --bs=512k --size=1536M --rw=write --name=1 --filename=/mnt/test_dir/seq_write --ioengine=io_uring --iodepth=64 --end_fsync=1 before SEGS_PER_SEC 1 2556MiB/s SEGS_PER_SEC 128 2145MiB/s after SEGS_PER_SEC 1 2556MiB/s SEGS_PER_SEC 128 2556MiB/s Signed-off-by: yohan.joung <yohan.joung@sk.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-28f2fs: add a method for calculating the remaining blocks in the current ↵yohan.joung1-4/+19
segment in LFS mode. In LFS mode, the previous segment cannot use invalid blocks, so the remaining blocks from the next_blkoff of the current segment to the end of the section are calculated. Signed-off-by: yohan.joung <yohan.joung@sk.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-28KVM: s390: Simplify and move pv codeClaudio Imbrenda11-182/+133
All functions in kvm/gmap.c fit better in kvm/pv.c instead. Move and rename them appropriately, then delete the now empty kvm/gmap.c and kvm/gmap.h. Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com> Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20250528095502.226213-5-imbrenda@linux.ibm.com Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20250528095502.226213-5-imbrenda@linux.ibm.com>
2025-05-28KVM: s390: Refactor and split some gmap helpersClaudio Imbrenda8-187/+274
Refactor some gmap functions; move the implementation into a separate file with only helper functions. The new helper functions work on vm addresses, leaving all gmap logic in the gmap functions, which mostly become just wrappers. The whole gmap handling is going to be moved inside KVM soon, but the helper functions need to touch core mm functions, and thus need to stay in the core of kernel. Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20250528095502.226213-4-imbrenda@linux.ibm.com Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20250528095502.226213-4-imbrenda@linux.ibm.com>
2025-05-28KVM: s390: Remove unneeded srcu lockClaudio Imbrenda1-4/+2
All paths leading to handle_essa() already hold the kvm->srcu. Remove unneeded srcu locking from handle_essa(). Add lockdep assertion to make sure we will always be holding kvm->srcu when entering handle_essa(). Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Link: https://lore.kernel.org/r/20250528095502.226213-3-imbrenda@linux.ibm.com Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20250528095502.226213-3-imbrenda@linux.ibm.com>
2025-05-28s390: Remove unneeded includesClaudio Imbrenda8-7/+2
Many files don't need to include asm/tlb.h or asm/gmap.h. On the other hand, asm/tlb.h does need to include asm/gmap.h. Remove all unneeded includes so that asm/tlb.h is not directly used by s390 arch code anymore. Remove asm/gmap.h from a few other files as well, so that now only KVM code, mm/gmap.c, and asm/tlb.h include it. Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20250528095502.226213-2-imbrenda@linux.ibm.com Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20250528095502.226213-2-imbrenda@linux.ibm.com>
2025-05-28s390/uv: Improve splitting of large folios that cannot be split while dirtyDavid Hildenbrand1-6/+60
Currently, starting a PV VM on an iomap-based filesystem with large folio support, such as XFS, will not work. We'll be stuck in unpack_one()->gmap_make_secure(), because we can't seem to make progress splitting the large folio. The problem is that we require a writable PTE but a writable PTE under such filesystems will imply a dirty folio. So whenever we have a writable PTE, we'll have a dirty folio, and dirty iomap folios cannot currently get split, because split_folio()->split_huge_page_to_list_to_order()->filemap_release_folio() will fail in iomap_release_folio(). So we will not make any progress splitting such large folios. Until dirty folios can be split more reliably, let's manually trigger writeback of the problematic folio using filemap_write_and_wait_range(), and retry the split immediately afterwards exactly once, before looking up the folio again. Should this logic be part of split_folio()? Likely not; most split users don't have to split so eagerly to make any progress. For now, this seems to affect xfs, zonefs and erofs, and this patch makes it work again (tested on xfs only). While this could be considered a fix for commit 6795801366da ("xfs: Support large folios"), commit df2f9708ff1f ("zonefs: enable support for large folios") and commit ce529cc25b18 ("erofs: enable large folios for iomap mode"), before commit eef88fe45ac9 ("s390/uv: Split large folios in gmap_make_secure()"), we did not try splitting large folios at all. So it's all rather part of making SE compatible with file systems that support large folios. But to have some "Fixes:" tag, let's just use eef88fe45ac9. Not CCing stable, because there are a lot of dependencies, and it simply not working is not critical in stable kernels. Reported-by: Sebastian Mitterle <smitterl@redhat.com> Closes: https://issues.redhat.com/browse/RHEL-58218 Fixes: eef88fe45ac9 ("s390/uv: Split large folios in gmap_make_secure()") Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20250516123946.1648026-4-david@redhat.com Message-ID: <20250516123946.1648026-4-david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2025-05-28Merge tag 'audit-pr-20250527' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: - Always record AUDIT_ANOM events when auditing is enabled. Prior to this patch we only recorded AUDIT_ANOM events if auditing was enabled and the admin/distro had explicitly configured audit beyond the defaults. Considering that AUDIT_ANOM events are anomolous events considered to be "security relevant", it seems wise to record these events as long as auditing is enabled, even if the system is running with a default audit configuration. - Mark the audit_log_vformat() function with the __printf() attribute to quiet GCC. * tag 'audit-pr-20250527' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: record AUDIT_ANOM_* events regardless of presence of rules audit: mark audit_log_vformat() with __printf() attribute
2025-05-28Merge tag 'selinux-pr-20250527' of ↵Linus Torvalds11-85/+232
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: - Reduce the SELinux impact on path walks. Add a small directory access cache to the per-task SELinux state. This cache allows SELinux to cache the most recently used directory access decisions in order to avoid repeatedly querying the AVC on path walks where the majority of the directories have similar security contexts/labels. My performance measurements are crude, but prior to this patch the time spent in SELinux code on a 'make allmodconfig' run was 103% that of __d_lookup_rcu(), and with this patch the time spent in SELinux code dropped to 63% of __d_lookup_rcu(), a ~40% improvement. Additional improvments can be expected in the future, but those will require additional SELinux policy/toolchain support. - Add support for wildcards in genfscon policy statements. This patch allows for wildcards in the genfscon patch matching logic as opposed to the prefix matching that was used prior to this change. Adding wilcard support allows for more expressive and efficient path matching in the policy which is especially helpful for sysfs, and has resulted in a ~15% boot time reduction in Android. SELinux policies can opt into wilcard matching by using the "genfs_seclabel_wildcard" policy capability. - Unify the error/OOM handling of the SELinux network caches. A failure to allocate memory for the SELinux network caches isn't fatal as the object label can still be safely returned to the caller, it simply means that we cannot add the new data to the cache, at least temporarily. This patch corrects this behavior for the InfiniBand cache and does some minor cleanup. - Minor improvements around constification, 'likely' annotations, and removal of bogus comments. * tag 'selinux-pr-20250527' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: fix the kdoc header for task_avdcache_update selinux: remove a duplicated include selinux: reduce path walk overhead selinux: support wildcard match in genfscon selinux: drop copy-paste comment selinux: unify OOM handling in network hashtables selinux: add likely hints for fast paths selinux: contify network namespace pointer selinux: constify network address pointer
2025-05-28Merge tag 'lsm-pr-20250527' of ↵Linus Torvalds2-24/+24
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm update from Paul Moore: "One minor LSM framework patch to move the selinux_netlink_send() hook under the CONFIG_SECURITY_NETWORK Kconfig knob" * tag 'lsm-pr-20250527' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: lsm: Move security_netlink_send to under CONFIG_SECURITY_NETWORK
2025-05-28Merge tag 'integrity-v6.16' of ↵Linus Torvalds8-34/+283
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity updates from Mimi Zohar: "Carrying the IMA measurement list across kexec is not a new feature, but is updated to address a couple of issues: - Carrying the IMA measurement list across kexec required knowing apriori all the file measurements between the "kexec load" and "kexec execute" in order to measure them before the "kexec load". Any delay between the "kexec load" and "kexec exec" exacerbated the problem. - Any file measurements post "kexec load" were not carried across kexec, resulting in the measurement list being out of sync with the TPM PCR. With these changes, the buffer for the IMA measurement list is still allocated at "kexec load", but copying the IMA measurement list is deferred to after quiescing the TPM. Two new kexec critical data records are defined" * tag 'integrity-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: ima: do not copy measurement list to kdump kernel ima: measure kexec load and exec events as critical data ima: make the kexec extra memory configurable ima: verify if the segment size has changed ima: kexec: move IMA log copy from kexec load to execute ima: kexec: define functions to copy IMA log at soft boot ima: kexec: skip IMA segment validation after kexec soft reboot kexec: define functions to map and unmap segments ima: define and call ima_alloc_kexec_file_buf() ima: rename variable the seq_file "file" to "ima_kexec_file"
2025-05-28Merge tag 'Smack-for-6.16' of https://github.com/cschaufler/smack-nextLinus Torvalds1-7/+5
Pull smack update from Casey Schaufler: "One trivial kernel doc fix" * tag 'Smack-for-6.16' of https://github.com/cschaufler/smack-next: security/smack/smackfs: small kernel-doc fixes
2025-05-28Merge tag 'hardening-v6.16-rc1' of ↵Linus Torvalds30-169/+459
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: - Update overflow helpers to ease refactoring of on-stack flex array instances (Gustavo A. R. Silva, Kees Cook) - lkdtm: Use SLAB_NO_MERGE instead of constructors (Harry Yoo) - Simplify CONFIG_CC_HAS_COUNTED_BY (Jan Hendrik Farr) - Disable u64 usercopy KUnit test on 32-bit SPARC (Thomas Weißschuh) - Add missed designated initializers now exposed by fixed randstruct (Nathan Chancellor, Kees Cook) - Document compilers versions for __builtin_dynamic_object_size - Remove ARM_SSP_PER_TASK GCC plugin - Fix GCC plugin randstruct, add selftests, and restore COMPILE_TEST builds - Kbuild: induce full rebuilds when dependencies change with GCC plugins, the Clang sanitizer .scl file, or the randstruct seed. - Kbuild: Switch from -Wvla to -Wvla-larger-than=1 - Correct several __nonstring uses for -Wunterminated-string-initialization * tag 'hardening-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (23 commits) Revert "hardening: Disable GCC randstruct for COMPILE_TEST" lib/tests: randstruct: Add deep function pointer layout test lib/tests: Add randstruct KUnit test randstruct: gcc-plugin: Remove bogus void member net: qede: Initialize qede_ll_ops with designated initializer scsi: qedf: Use designated initializer for struct qed_fcoe_cb_ops md/bcache: Mark __nonstring look-up table integer-wrap: Force full rebuild when .scl file changes randstruct: Force full rebuild when seed changes gcc-plugins: Force full rebuild when plugins change kbuild: Switch from -Wvla to -Wvla-larger-than=1 hardening: simplify CONFIG_CC_HAS_COUNTED_BY overflow: Fix direct struct member initialization in _DEFINE_FLEX() kunit/overflow: Add tests for STACK_FLEX_ARRAY_SIZE() helper overflow: Add STACK_FLEX_ARRAY_SIZE() helper input/joystick: magellan: Mark __nonstring look-up table const watchdog: exar: Shorten identity name to fit correctly mod_devicetable: Enlarge the maximum platform_device_id name length overflow: Clarify expectations for getting DEFINE_FLEX variable sizes compiler_types: Identify compiler versions for __builtin_dynamic_object_size ...
2025-05-28Merge tag 'seccomp-v6.16-rc1' of ↵Linus Torvalds2-9/+6
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: - selftest fixes for arm32 (Neill Kapron, Terry Tritton) - documentation typo fix (Sumanth Gavini) * tag 'seccomp-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: selftests: seccomp: Fix "performace" to "performance" selftests/seccomp: fix negative_ENOSYS tracer tests on arm32 selftests/seccomp: fix syscall_restart test for arm compat
2025-05-28dt-bindings: timer: Add fsl,vf610-pit.yamlFrank Li1-0/+54
Add binding doc fsl,vf610-pit.yaml to fix below CHECK_DTB warnings: arch/arm/boot/dts/nxp/vf/vf610m4-colibri.dtb: /soc/bus@40000000/pit@40037000: failed to match any schema with compatible: ['fsl,vf610-pit'] Signed-off-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20250522205710.502779-1-Frank.Li@nxp.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2025-05-28dt-bindings: gpu: mali-bifrost: Add compatible for RZ/G3E SoCTommaso Merciai1-0/+2
Add a compatible string for the Renesas RZ/G3E SoC variants that include a Mali-G52 GPU. These variants share the same restrictions on interrupts, clocks, and power domains as the RZ/G2L SoC, so extend the existing schema validation accordingly. Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com> Link: https://lore.kernel.org/r/20250528073040.904033-1-tommaso.merciai.xr@bp.renesas.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2025-05-28ASoC: dt-bindings: qcom,sm8250: Add Fairphone 5 sound cardLuca Weiss1-0/+1
Document the bindings for the sound card on Fairphone 5 which uses the older non-audioreach audio architecture. Acked-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Luca Weiss <luca.weiss@fairphone.com> Link: https://lore.kernel.org/r/20250507-fp5-dp-sound-v4-1-4098e918a29e@fairphone.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2025-05-28s390/uv: Always return 0 from s390_wiggle_split_folio() if successfulDavid Hildenbrand1-10/+12
Let's consistently return 0 if the operation was successful, and just detect ourselves whether splitting is required -- folio_test_large() is a cheap operation. Update the documentation. Should we simply always return -EAGAIN instead of 0, so we don't have to handle it in the caller? Not sure, staring at the documentation, this way looks a bit cleaner. Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20250516123946.1648026-3-david@redhat.com Message-ID: <20250516123946.1648026-3-david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2025-05-28s390/uv: Don't return 0 from make_hva_secure() if the operation was not ↵David Hildenbrand1-1/+4
successful If s390_wiggle_split_folio() returns 0 because splitting a large folio succeeded, we will return 0 from make_hva_secure() even though a retry is required. Return -EAGAIN in that case. Otherwise, we'll return 0 from gmap_make_secure(), and consequently from unpack_one(). In kvm_s390_pv_unpack(), we assume that unpacking succeeded and skip unpacking this page. Later on, we run into issues and fail booting the VM. So far, this issue was only observed with follow-up patches where we split large pagecache XFS folios. Maybe it can also be triggered with shmem? We'll cleanup s390_wiggle_split_folio() a bit next, to also return 0 if no split was required. Fixes: d8dfda5af0be ("KVM: s390: pv: fix race when making a page secure") Cc: stable@vger.kernel.org Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20250516123946.1648026-2-david@redhat.com Message-ID: <20250516123946.1648026-2-david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2025-05-28Merge branch 'kvm-lockdep-common' into HEADPaolo Bonzini13-170/+131
Introduce new mutex locking functions mutex_trylock_nest_lock() and mutex_lock_killable_nest_lock() and use them to clean up locking of all vCPUs for a VM. For x86, this removes some complex code that was used instead of lockdep's "nest_lock" feature. For ARM and RISC-V, this removes a lockdep warning when the VM is configured to have more than MAX_LOCK_DEPTH vCPUs, and removes a fair amount of duplicate code by sharing the logic across all architectures. Signed-off-by: Paolo BOnzini <pbonzini@redhat.com>
2025-05-28rust: add helper for mutex_trylockPaolo Bonzini1-0/+5
After commit c5b6ababd21a ("locking/mutex: implement mutex_trylock_nested", currently in the KVM tree) mutex_trylock() will be a macro when lockdep is enabled. Rust therefore needs the corresponding helper. Just add it and the rust/bindings/bindings_helpers_generated.rs Makefile rules will do their thing. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20250528083431.1875345-1-pbonzini@redhat.com> Acked-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni24-61/+210
Merge in late fixes to prepare for the 6.16 net-next PR. No conflicts nor adjacent changes. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-28selftests/bpf: Fix bpf selftest build warningSaket Kumar Bhaskar1-3/+3
On linux-next, build for bpf selftest displays a warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/if_xdp.h' differs from latest version at 'include/uapi/linux/if_xdp.h'. Commit 8066e388be48 ("net: add UAPI to the header guard in various network headers") changed the header guard from _LINUX_IF_XDP_H to _UAPI_LINUX_IF_XDP_H in include/uapi/linux/if_xdp.h. To resolve the warning, update tools/include/uapi/linux/if_xdp.h to align with the changes in include/uapi/linux/if_xdp.h Fixes: 8066e388be48 ("net: add UAPI to the header guard in various network headers") Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Closes: https://lore.kernel.org/all/c2bc466d-dff2-4d0d-a797-9af7f676c065@linux.ibm.com/ Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://patch.msgid.link/20250527054138.1086006-1-skb99@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-28selftests: netfilter: Fix skip of wildcard interface testPhil Sutter1-2/+5
The script is supposed to skip wildcard interface testing if unsupported by the host's nft tool. The failing check caused script abort due to 'set -e' though. Fix this by running the potentially failing nft command inside the if-conditional pipe. Fixes: 73db1b5dab6f ("selftests: netfilter: Torture nftables netdev hooks") Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://patch.msgid.link/20250527094117.18589-1-phil@nwl.cc Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-28iomap: don't lose folio dropbehind state for overwritesJens Axboe3-3/+26
DONTCACHE I/O must have the completion punted to a workqueue, just like what is done for unwritten extents, as the completion needs task context to perform the invalidation of the folio(s). However, if writeback is started off filemap_fdatawrite_range() off generic_sync() and it's an overwrite, then the DONTCACHE marking gets lost as iomap_add_to_ioend() don't look at the folio being added and no further state is passed down to help it know that this is a dropbehind/DONTCACHE write. Check if the folio being added is marked as dropbehind, and set IOMAP_IOEND_DONTCACHE if that is the case. Then XFS can factor this into the decision making of completion context in xfs_submit_ioend(). Additionally include this ioend flag in the NOMERGE flags, to avoid mixing it with unrelated IO. Since this is the 3rd flag that will cause XFS to punt the completion to a workqueue, add a helper so that each one of them can get appropriately commented. This fixes extra page cache being instantiated when the write performed is an overwrite, rather than newly instantiated blocks. Fixes: b2cd5ae693a3 ("iomap: make buffered writes work with RWF_DONTCACHE") Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/5153f6e8-274d-4546-bf55-30a5018e0d03@kernel.dk Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-28virtio: reject shm region if length is zeroSami Uddin1-0/+2
Prevent usage of shared memory regions where the length is zero, as such configurations are not valid and may lead to unexpected behavior. Signed-off-by: Sami Uddin <sami.md.ko@gmail.com> Message-Id: <20250511222153.2332-1-sami.md.ko@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-05-28net: phy: mscc: Stop clearing the the UDPv4 checksum for L2 framesHoratiu Vultur1-1/+3
We have noticed that when PHY timestamping is enabled, L2 frames seems to be modified by changing two 2 bytes with a value of 0. The place were these 2 bytes seems to be random(or I couldn't find a pattern). In most of the cases the userspace can ignore these frames but if for example those 2 bytes are in the correction field there is nothing to do. This seems to happen when configuring the HW for IPv4 even that the flow is not enabled. These 2 bytes correspond to the UDPv4 checksum and once we don't enable clearing the checksum when using L2 frames then the frame doesn't seem to be changed anymore. Fixes: 7d272e63e0979d ("net: phy: mscc: timestamping and PHC support") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://patch.msgid.link/20250523082716.2935895-1-horatiu.vultur@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-28net: openvswitch: Fix the dead loop of MPLS parseFaicker Mo1-1/+1
The unexpected MPLS packet may not end with the bottom label stack. When there are many stacks, The label count value has wrapped around. A dead loop occurs, soft lockup/CPU stuck finally. stack backtrace: UBSAN: array-index-out-of-bounds in /build/linux-0Pa0xK/linux-5.15.0/net/openvswitch/flow.c:662:26 index -1 is out of range for type '__be32 [3]' CPU: 34 PID: 0 Comm: swapper/34 Kdump: loaded Tainted: G OE 5.15.0-121-generic #131-Ubuntu Hardware name: Dell Inc. PowerEdge C6420/0JP9TF, BIOS 2.12.2 07/14/2021 Call Trace: <IRQ> show_stack+0x52/0x5c dump_stack_lvl+0x4a/0x63 dump_stack+0x10/0x16 ubsan_epilogue+0x9/0x36 __ubsan_handle_out_of_bounds.cold+0x44/0x49 key_extract_l3l4+0x82a/0x840 [openvswitch] ? kfree_skbmem+0x52/0xa0 key_extract+0x9c/0x2b0 [openvswitch] ovs_flow_key_extract+0x124/0x350 [openvswitch] ovs_vport_receive+0x61/0xd0 [openvswitch] ? kernel_init_free_pages.part.0+0x4a/0x70 ? get_page_from_freelist+0x353/0x540 netdev_port_receive+0xc4/0x180 [openvswitch] ? netdev_port_receive+0x180/0x180 [openvswitch] netdev_frame_hook+0x1f/0x40 [openvswitch] __netif_receive_skb_core.constprop.0+0x23a/0xf00 __netif_receive_skb_list_core+0xfa/0x240 netif_receive_skb_list_internal+0x18e/0x2a0 napi_complete_done+0x7a/0x1c0 bnxt_poll+0x155/0x1c0 [bnxt_en] __napi_poll+0x30/0x180 net_rx_action+0x126/0x280 ? bnxt_msix+0x67/0x80 [bnxt_en] handle_softirqs+0xda/0x2d0 irq_exit_rcu+0x96/0xc0 common_interrupt+0x8e/0xa0 </IRQ> Fixes: fbdcdd78da7c ("Change in Openvswitch to support MPLS label depth of 3 in ingress direction") Signed-off-by: Faicker Mo <faicker.mo@zenlayer.com> Acked-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/259D3404-575D-4A6D-B263-1DF59A67CF89@zenlayer.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-28calipso: Don't call calipso functions for AF_INET sk.Kuniyuki Iwashima1-0/+3
syzkaller reported a null-ptr-deref in txopt_get(). [0] The offset 0x70 was of struct ipv6_txoptions in struct ipv6_pinfo, so struct ipv6_pinfo was NULL there. However, this never happens for IPv6 sockets as inet_sk(sk)->pinet6 is always set in inet6_create(), meaning the socket was not IPv6 one. The root cause is missing validation in netlbl_conn_setattr(). netlbl_conn_setattr() switches branches based on struct sockaddr.sa_family, which is passed from userspace. However, netlbl_conn_setattr() does not check if the address family matches the socket. The syzkaller must have called connect() for an IPv6 address on an IPv4 socket. We have a proper validation in tcp_v[46]_connect(), but security_socket_connect() is called in the earlier stage. Let's copy the validation to netlbl_conn_setattr(). [0]: Oops: general protection fault, probably for non-canonical address 0xdffffc000000000e: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077] CPU: 2 UID: 0 PID: 12928 Comm: syz.9.1677 Not tainted 6.12.0 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:txopt_get include/net/ipv6.h:390 [inline] RIP: 0010: Code: 02 00 00 49 8b ac 24 f8 02 00 00 e8 84 69 2a fd e8 ff 00 16 fd 48 8d 7d 70 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 53 02 00 00 48 8b 6d 70 48 85 ed 0f 84 ab 01 00 RSP: 0018:ffff88811b8afc48 EFLAGS: 00010212 RAX: dffffc0000000000 RBX: 1ffff11023715f8a RCX: ffffffff841ab00c RDX: 000000000000000e RSI: ffffc90007d9e000 RDI: 0000000000000070 RBP: 0000000000000000 R08: ffffed1023715f9d R09: ffffed1023715f9e R10: ffffed1023715f9d R11: 0000000000000003 R12: ffff888123075f00 R13: ffff88810245bd80 R14: ffff888113646780 R15: ffff888100578a80 FS: 00007f9019bd7640(0000) GS:ffff8882d2d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f901b927bac CR3: 0000000104788003 CR4: 0000000000770ef0 PKRU: 80000000 Call Trace: <TASK> calipso_sock_setattr+0x56/0x80 net/netlabel/netlabel_calipso.c:557 netlbl_conn_setattr+0x10c/0x280 net/netlabel/netlabel_kapi.c:1177 selinux_netlbl_socket_connect_helper+0xd3/0x1b0 security/selinux/netlabel.c:569 selinux_netlbl_socket_connect_locked security/selinux/netlabel.c:597 [inline] selinux_netlbl_socket_connect+0xb6/0x100 security/selinux/netlabel.c:615 selinux_socket_connect+0x5f/0x80 security/selinux/hooks.c:4931 security_socket_connect+0x50/0xa0 security/security.c:4598 __sys_connect_file+0xa4/0x190 net/socket.c:2067 __sys_connect+0x12c/0x170 net/socket.c:2088 __do_sys_connect net/socket.c:2098 [inline] __se_sys_connect net/socket.c:2095 [inline] __x64_sys_connect+0x73/0xb0 net/socket.c:2095 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xaa/0x1b0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f901b61a12d Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f9019bd6fa8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 00007f901b925fa0 RCX: 00007f901b61a12d RDX: 000000000000001c RSI: 0000200000000140 RDI: 0000000000000003 RBP: 00007f901b701505 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00007f901b5b62a0 R15: 00007f9019bb7000 </TASK> Modules linked in: Fixes: ceba1832b1b2 ("calipso: Set the calipso socket label to match the secattr.") Reported-by: syzkaller <syzkaller@googlegroups.com> Reported-by: John Cheung <john.cs.hey@gmail.com> Closes: https://lore.kernel.org/netdev/CAP=Rh=M1LzunrcQB1fSGauMrJrhL6GGps5cPAKzHJXj6GQV+-g@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://patch.msgid.link/20250522221858.91240-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-28Merge branch ↵Paolo Abeni2-1/+43
'net_sched-hfsc-address-reentrant-enqueue-adding-class-to-eltree-twice' Pedro Tammela says: ==================== net_sched: hfsc: Address reentrant enqueue adding class to eltree twice Savino says: "We are writing to report that this recent patch (141d34391abbb315d68556b7c67ad97885407547) can be bypassed, and a UAF can still occur when HFSC is utilized with NETEM. The patch only checks the cl->cl_nactive field to determine whether it is the first insertion or not, but this field is only incremented by init_vf. By using HFSC_RSC (which uses init_ed), it is possible to bypass the check and insert the class twice in the eltree. Under normal conditions, this would lead to an infinite loop in hfsc_dequeue for the reasons we already explained in this report. However, if TBF is added as root qdisc and it is configured with a very low rate, it can be utilized to prevent packets from being dequeued. This behavior can be exploited to perform subsequent insertions in the HFSC eltree and cause a UAF." To fix both the UAF and the infinite loop, with netem as an hfsc child, check explicitly in hfsc_enqueue whether the class is already in the eltree whenever the HFSC_RSC flag is set. Also add a TDC test to reproduce the UAF scenario. ==================== Link: https://patch.msgid.link/20250522181448.1439717-1-pctammela@mojatatu.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-28selftests/tc-testing: Add a test for HFSC eltree double add with reentrant ↵Pedro Tammela1-0/+35
enqueue behaviour on netem Reproduce the UAF scenario where netem is a child of HFSC and HFSC is configured to use the eltree. In such case, this TDC test would cause the HFSC class to be added to the eltree twice resulting in a UAF. Reviewed-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Link: https://patch.msgid.link/20250522181448.1439717-3-pctammela@mojatatu.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-28net_sched: hfsc: Address reentrant enqueue adding class to eltree twicePedro Tammela1-1/+8
Savino says: "We are writing to report that this recent patch (141d34391abbb315d68556b7c67ad97885407547) [1] can be bypassed, and a UAF can still occur when HFSC is utilized with NETEM. The patch only checks the cl->cl_nactive field to determine whether it is the first insertion or not [2], but this field is only incremented by init_vf [3]. By using HFSC_RSC (which uses init_ed) [4], it is possible to bypass the check and insert the class twice in the eltree. Under normal conditions, this would lead to an infinite loop in hfsc_dequeue for the reasons we already explained in this report [5]. However, if TBF is added as root qdisc and it is configured with a very low rate, it can be utilized to prevent packets from being dequeued. This behavior can be exploited to perform subsequent insertions in the HFSC eltree and cause a UAF." To fix both the UAF and the infinite loop, with netem as an hfsc child, check explicitly in hfsc_enqueue whether the class is already in the eltree whenever the HFSC_RSC flag is set. [1] https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=141d34391abbb315d68556b7c67ad97885407547 [2] https://elixir.bootlin.com/linux/v6.15-rc5/source/net/sched/sch_hfsc.c#L1572 [3] https://elixir.bootlin.com/linux/v6.15-rc5/source/net/sched/sch_hfsc.c#L677 [4] https://elixir.bootlin.com/linux/v6.15-rc5/source/net/sched/sch_hfsc.c#L1574 [5] https://lore.kernel.org/netdev/8DuRWwfqjoRDLDmBMlIfbrsZg9Gx50DHJc1ilxsEBNe2D6NMoigR_eIRIG0LOjMc3r10nUUZtArXx4oZBIdUfZQrwjcQhdinnMis_0G7VEk=@willsroot.io/T/#u Fixes: 37d9cf1a3ce3 ("sched: Fix detection of empty queues in child qdiscs") Reported-by: Savino Dicanosa <savy@syst3mfailure.io> Reported-by: William Liu <will@willsroot.io> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Tested-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Link: https://patch.msgid.link/20250522181448.1439717-2-pctammela@mojatatu.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>