summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-12-18NFSD: Revert 6c41d9a9bd0298002805758216a9c44e38a8500dChuck Lever3-118/+14
For some reason, the wait_on_bit() in nfsd4_deleg_getattr_conflict() is waiting forever, preventing a clean server shutdown. The requesting client might also hang waiting for a reply to the conflicting GETATTR. Invoking wait_on_bit() in an nfsd thread context is a hazard. The correct fix is to replace this wait_on_bit() call site with a mechanism that defers the conflicting GETATTR until the CB_GETATTR completes or is known to have failed. That will require some surgery and extended testing and it's late in the v6.7-rc cycle, so I'm reverting now in favor of trying again in a subsequent kernel release. This is my fault: I should have recognized the ramifications of calling wait_on_bit() in here before accepting this patch. Thanks to Dai Ngo <dai.ngo@oracle.com> for diagnosing the issue. Reported-by: Wolfgang Walter <linux-nfs@stwm.de> Closes: https://lore.kernel.org/linux-nfs/e3d43ecdad554fbdcaa7181833834f78@stwm.de/ Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-12-18platform/x86/amd/pmc: Disable keyboard wakeup on AMD Framework 13Mario Limonciello1-0/+17
The Laptop 13 (AMD Ryzen 7040Series) BIOS 03.03 has a workaround included in the EC firmware that will cause the EC to emit a "spurious" keypress during the resume from s0i3 [1]. This series of keypress events can be observed in the kernel log on resume. ``` atkbd serio0: Unknown key pressed (translated set 2, code 0x6b on isa0060/serio0). atkbd serio0: Use 'setkeycodes 6b <keycode>' to make it known. atkbd serio0: Unknown key released (translated set 2, code 0x6b on isa0060/serio0). atkbd serio0: Use 'setkeycodes 6b <keycode>' to make it known. ``` In some user flows this is harmless, but if a user has specifically suspended the laptop and then closed the lid it will cause the laptop to wakeup. The laptop wakes up because the ACPI SCI triggers when the lid is closed and when the kernel sees that IRQ1 is "also" active. The kernel can't distinguish from a real keyboard keypress and wakes the system. Add the model into the list of quirks to disable keyboard wakeup source. This is intentionally only matching the production BIOS version in hopes that a newer EC firmware included in a newer BIOS can avoid this behavior. Cc: Kieran Levin <ktl@framework.net> Link: https://github.com/FrameworkComputer/EmbeddedController/blob/lotus-zephyr/zephyr/program/lotus/azalea/src/power_sequence.c#L313 [1] Link: https://community.frame.work/t/amd-wont-sleep-properly/41755 Link: https://community.frame.work/t/tracking-framework-amd-ryzen-7040-series-lid-wakeup-behavior-feedback/39128 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20231212045006.97581-5-mario.limonciello@amd.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2023-12-18platform/x86/amd/pmc: Move keyboard wakeup disablement detection to pmc-quirksMario Limonciello3-1/+5
Other platforms may need to disable keyboard wakeup besides Cezanne, so move the detection into amd_pmc_quirks_init() where it may be applied to multiple platforms. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20231212045006.97581-4-mario.limonciello@amd.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2023-12-18platform/x86/amd/pmc: Only run IRQ1 firmware version check on CezanneMario Limonciello1-9/+12
amd_pmc_wa_czn_irq1() only runs on Cezanne platforms currently but may be extended to other platforms in the future. Rename the function and only check platform firmware version when it's called for a Cezanne based platform. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20231212045006.97581-3-mario.limonciello@amd.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2023-12-18platform/x86/amd/pmc: Move platform defines to headerMario Limonciello2-10/+11
The platform defines will be used by the quirks in the future, so move them to the common header to allow use by both source files. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20231212045006.97581-2-mario.limonciello@amd.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2023-12-18platform/x86/intel/pmc: Fix hang in pmc_core_send_ltr_ignore()Rajvi Jingar1-1/+1
For input value 0, PMC stays unassigned which causes crash while trying to access PMC for register read/write. Include LTR index 0 in pmc_index and ltr_index calculation. Fixes: 2bcef4529222 ("platform/x86:intel/pmc: Enable debugfs multiple PMC support") Signed-off-by: Rajvi Jingar <rajvi.jingar@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20231216011650.1973941-1-rajvi.jingar@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2023-12-18platform/x86: thinkpad_acpi: fix for incorrect fan reporting on some ↵Vishnu Sankar1-13/+85
ThinkPad systems Some ThinkPad systems ECFW use non-standard addresses for fan control and reporting. This patch adds support for such ECFW so that it can report the correct fan values. Tested on Thinkpads L13 Yoga Gen 2 and X13 Yoga Gen 2. Suggested-by: Mark Pearson <mpearson-lenovo@squebb.ca> Signed-off-by: Vishnu Sankar <vishnuocv@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20231214134702.166464-1-vishnuocv@gmail.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2023-12-18s390/bpf: Fix indirect trampoline generationAlexei Starovoitov2-3/+2
The func_addr used to be NULL for indirect trampolines used by struct_ops. Now func_addr is a valid function pointer. Hence use BPF_TRAMP_F_INDIRECT flag to detect such condition. Fixes: 2cd3e3772e41 ("x86/cfi,bpf: Fix bpf_struct_ops CFI") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/bpf/20231216004549.78355-1-alexei.starovoitov@gmail.com
2023-12-18s390/vx: fix save/restore of fpu kernel contextHeiko Carstens1-1/+1
The KERNEL_FPR mask only contains a flag for the first eight vector registers. However floating point registers overlay parts of the first sixteen vector registers. This could lead to vector register corruption if a kernel fpu context uses any of the vector registers 8 to 15 and is interrupted or calls a KERNEL_FPR context. If that context uses also vector registers 8 to 15, their contents will be corrupted on return. Luckily this is currently not a real bug, since the kernel has only one KERNEL_FPR user with s390_adjust_jiffies() and it is only using floating point registers 0 to 2. Fix this by using the correct bits for KERNEL_FPR. Fixes: 7f79695cc1b6 ("s390/fpu: improve kernel_fpu_[begin|end]") Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-12-18HID: nintendo: fix initializer element is not constant errorRyan McClelland1-22/+22
With gcc-7 builds, an error happens with the controller button values being defined as const. Change to a define. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312141227.C2h1IzfI-lkp@intel.com/ Signed-off-by: Ryan McClelland <rymcclel@gmail.com> Reviewed-by: Daniel J. Ogorchock <djogorchock@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
2023-12-18bcachefs: print explicit recovery pass message only onceKent Overstreet1-0/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-18Merge branch 'rtnl-rcu'David S. Miller2-9/+15
Pedro Tammela says: ==================== net: rtnl: introduce rcu_replace_pointer_rtnl Introduce the rcu_replace_pointer_rtnl helper to lockdep check rtnl lock rcu replacements, alongside the already existing helpers. Patch 2 uses the new helper in the rtnl_unregister_* functions. Originally this change was part of the P4TC series, as it's a recurrent pattern there, but since it has a use case in mainline we are pushing it separately. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-18net: rtnl: use rcu_replace_pointer_rtnl in rtnl_unregister_*Pedro Tammela1-9/+3
With the introduction of the rcu_replace_pointer_rtnl helper, cleanup the rtnl_unregister_* functions to use the helper instead of open coding it. Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-18net: rtnl: introduce rcu_replace_pointer_rtnlJamal Hadi Salim1-0/+12
Introduce the rcu_replace_pointer_rtnl helper to lockdep check rtnl lock rcu replacements, alongside the already existing helpers. This is a quality of life helper so instead of using: rcu_replace_pointer(rp, p, lockdep_rtnl_is_held()) .. or the open coded.. rtnl_dereference() / rcu_assign_pointer() .. or the lazy check version .. rcu_replace_pointer(rp, p, 1) Use: rcu_replace_pointer_rtnl(rp, p) Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-18smb: client: fix potential OOB in cifs_dump_detail()Paulo Alcantara1-5/+7
Validate SMB message with ->check_message() before calling ->calc_smb_size(). Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-18smb: client: fix OOB in smbCalcSize()Paulo Alcantara1-0/+4
Validate @smb->WordCount to avoid reading off the end of @smb and thus causing the following KASAN splat: BUG: KASAN: slab-out-of-bounds in smbCalcSize+0x32/0x40 [cifs] Read of size 2 at addr ffff88801c024ec5 by task cifsd/1328 CPU: 1 PID: 1328 Comm: cifsd Not tainted 6.7.0-rc5 #9 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x4a/0x80 print_report+0xcf/0x650 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? __phys_addr+0x46/0x90 kasan_report+0xd8/0x110 ? smbCalcSize+0x32/0x40 [cifs] ? smbCalcSize+0x32/0x40 [cifs] kasan_check_range+0x105/0x1b0 smbCalcSize+0x32/0x40 [cifs] checkSMB+0x162/0x370 [cifs] ? __pfx_checkSMB+0x10/0x10 [cifs] cifs_handle_standard+0xbc/0x2f0 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 cifs_demultiplex_thread+0xed1/0x1360 [cifs] ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? lockdep_hardirqs_on_prepare+0x136/0x210 ? __pfx_lock_release+0x10/0x10 ? srso_alias_return_thunk+0x5/0xfbef5 ? mark_held_locks+0x1a/0x90 ? lockdep_hardirqs_on_prepare+0x136/0x210 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? __kthread_parkme+0xce/0xf0 ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs] kthread+0x18d/0x1d0 ? kthread+0xdb/0x1d0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> This fixes CVE-2023-6606. Reported-by: j51569436@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218218 Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-18smb: client: fix OOB in SMB2_query_info_init()Paulo Alcantara1-7/+22
A small CIFS buffer (448 bytes) isn't big enough to hold SMB2_QUERY_INFO request along with user's input data from CIFS_QUERY_INFO ioctl. That is, if the user passed an input buffer > 344 bytes, the client will memcpy() off the end of @req->Buffer in SMB2_query_info_init() thus causing the following KASAN splat: BUG: KASAN: slab-out-of-bounds in SMB2_query_info_init+0x242/0x250 [cifs] Write of size 1023 at addr ffff88801308c5a8 by task a.out/1240 CPU: 1 PID: 1240 Comm: a.out Not tainted 6.7.0-rc4 #5 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x4a/0x80 print_report+0xcf/0x650 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? __phys_addr+0x46/0x90 kasan_report+0xd8/0x110 ? SMB2_query_info_init+0x242/0x250 [cifs] ? SMB2_query_info_init+0x242/0x250 [cifs] kasan_check_range+0x105/0x1b0 __asan_memcpy+0x3c/0x60 SMB2_query_info_init+0x242/0x250 [cifs] ? __pfx_SMB2_query_info_init+0x10/0x10 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? smb_rqst_len+0xa6/0xc0 [cifs] smb2_ioctl_query_info+0x4f4/0x9a0 [cifs] ? __pfx_smb2_ioctl_query_info+0x10/0x10 [cifs] ? __pfx_cifsConvertToUTF16+0x10/0x10 [cifs] ? kasan_set_track+0x25/0x30 ? srso_alias_return_thunk+0x5/0xfbef5 ? __kasan_kmalloc+0x8f/0xa0 ? srso_alias_return_thunk+0x5/0xfbef5 ? cifs_strndup_to_utf16+0x12d/0x1a0 [cifs] ? __build_path_from_dentry_optional_prefix+0x19d/0x2d0 [cifs] ? __pfx_smb2_ioctl_query_info+0x10/0x10 [cifs] cifs_ioctl+0x11c7/0x1de0 [cifs] ? __pfx_cifs_ioctl+0x10/0x10 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? rcu_is_watching+0x23/0x50 ? srso_alias_return_thunk+0x5/0xfbef5 ? __rseq_handle_notify_resume+0x6cd/0x850 ? __pfx___schedule+0x10/0x10 ? blkcg_iostat_update+0x250/0x290 ? srso_alias_return_thunk+0x5/0xfbef5 ? ksys_write+0xe9/0x170 __x64_sys_ioctl+0xc9/0x100 do_syscall_64+0x47/0xf0 entry_SYSCALL_64_after_hwframe+0x6f/0x77 RIP: 0033:0x7f893dde49cf Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00 RSP: 002b:00007ffc03ff4160 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffc03ff4378 RCX: 00007f893dde49cf RDX: 00007ffc03ff41d0 RSI: 00000000c018cf07 RDI: 0000000000000003 RBP: 00007ffc03ff4260 R08: 0000000000000410 R09: 0000000000000001 R10: 00007f893dce7300 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffc03ff4388 R14: 00007f893df15000 R15: 0000000000406de0 </TASK> Fix this by increasing size of SMB2_QUERY_INFO request buffers and validating input length to prevent other callers from overflowing @req in SMB2_query_info_init() as well. Fixes: f5b05d622a3e ("cifs: add IOCTL for QUERY_INFO passthrough to userspace") Cc: stable@vger.kernel.org Reported-by: Robert Morris <rtm@csail.mit.edu> Signed-off-by: Paulo Alcantara <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-18smb: client: fix OOB in cifsd when receiving compounded respsPaulo Alcantara3-9/+20
Validate next header's offset in ->next_header() so that it isn't smaller than MID_HEADER_SIZE(server) and then standard_receive3() or ->receive() ends up writing off the end of the buffer because 'pdu_length - MID_HEADER_SIZE(server)' wraps up to a huge length: BUG: KASAN: slab-out-of-bounds in _copy_to_iter+0x4fc/0x840 Write of size 701 at addr ffff88800caf407f by task cifsd/1090 CPU: 0 PID: 1090 Comm: cifsd Not tainted 6.7.0-rc4 #5 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x4a/0x80 print_report+0xcf/0x650 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? __phys_addr+0x46/0x90 kasan_report+0xd8/0x110 ? _copy_to_iter+0x4fc/0x840 ? _copy_to_iter+0x4fc/0x840 kasan_check_range+0x105/0x1b0 __asan_memcpy+0x3c/0x60 _copy_to_iter+0x4fc/0x840 ? srso_alias_return_thunk+0x5/0xfbef5 ? hlock_class+0x32/0xc0 ? srso_alias_return_thunk+0x5/0xfbef5 ? __pfx__copy_to_iter+0x10/0x10 ? srso_alias_return_thunk+0x5/0xfbef5 ? lock_is_held_type+0x90/0x100 ? srso_alias_return_thunk+0x5/0xfbef5 ? __might_resched+0x278/0x360 ? __pfx___might_resched+0x10/0x10 ? srso_alias_return_thunk+0x5/0xfbef5 __skb_datagram_iter+0x2c2/0x460 ? __pfx_simple_copy_to_iter+0x10/0x10 skb_copy_datagram_iter+0x6c/0x110 tcp_recvmsg_locked+0x9be/0xf40 ? __pfx_tcp_recvmsg_locked+0x10/0x10 ? mark_held_locks+0x5d/0x90 ? srso_alias_return_thunk+0x5/0xfbef5 tcp_recvmsg+0xe2/0x310 ? __pfx_tcp_recvmsg+0x10/0x10 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? lock_acquire+0x14a/0x3a0 ? srso_alias_return_thunk+0x5/0xfbef5 inet_recvmsg+0xd0/0x370 ? __pfx_inet_recvmsg+0x10/0x10 ? __pfx_lock_release+0x10/0x10 ? do_raw_spin_trylock+0xd1/0x120 sock_recvmsg+0x10d/0x150 cifs_readv_from_socket+0x25a/0x490 [cifs] ? __pfx_cifs_readv_from_socket+0x10/0x10 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 cifs_read_from_socket+0xb5/0x100 [cifs] ? __pfx_cifs_read_from_socket+0x10/0x10 [cifs] ? __pfx_lock_release+0x10/0x10 ? do_raw_spin_trylock+0xd1/0x120 ? _raw_spin_unlock+0x23/0x40 ? srso_alias_return_thunk+0x5/0xfbef5 ? __smb2_find_mid+0x126/0x230 [cifs] cifs_demultiplex_thread+0xd39/0x1270 [cifs] ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs] ? __pfx_lock_release+0x10/0x10 ? srso_alias_return_thunk+0x5/0xfbef5 ? mark_held_locks+0x1a/0x90 ? lockdep_hardirqs_on_prepare+0x136/0x210 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? __kthread_parkme+0xce/0xf0 ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs] kthread+0x18d/0x1d0 ? kthread+0xdb/0x1d0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> Fixes: 8ce79ec359ad ("cifs: update multiplex loop to handle compounded responses") Cc: stable@vger.kernel.org Reported-by: Robert Morris <rtm@csail.mit.edu> Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-18Linux 6.7-rc6Linus Torvalds1-1/+1
2023-12-18Merge tag 'perf_urgent_for_v6.7_rc6' of ↵Linus Torvalds1-0/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Borislav Petkov: - Avoid iterating over newly created group leader event's siblings because there are none, and thus prevent a lockdep splat * tag 'perf_urgent_for_v6.7_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix perf_event_validate_size() lockdep splat
2023-12-17Merge branch 'mptcp-misc-fixes'David S. Miller7-21/+36
Matthieu Baerts says: ==================== mptcp: misc. fixes for v6.7 Here are a few fixes related to MPTCP: Patch 1 avoids skipping some subtests of the MPTCP Join selftest by mistake when using older versions of GCC. This fixes a patch introduced in v6.4, backported up to v6.1. Patch 2 fixes an inconsistent state when using MPTCP + FastOpen. A fix for v6.2. Patch 3 adds a description for MPTCP Kunit test modules to avoid a warning. Patch 4 adds an entry to the mailmap file for Geliang's email addresses. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
2023-12-17mailmap: add entries for Geliang TangGeliang Tang1-0/+4
Map Geliang's old mail addresses to his @linux.dev one. Suggested-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@linux.dev> Reviewed-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17mptcp: fill in missing MODULE_DESCRIPTION()Matthieu Baerts2-0/+2
W=1 builds warn on missing MODULE_DESCRIPTION, add them here in MPTCP. Only two were missing: two modules with different KUnit tests for MPTCP. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17mptcp: fix inconsistent state on fastopen racePaolo Abeni3-17/+26
The netlink PM can race with fastopen self-connect attempts, shutting down the first subflow via: MPTCP_PM_CMD_DEL_ADDR -> mptcp_nl_remove_id_zero_address -> mptcp_pm_nl_rm_subflow_received -> mptcp_close_ssk and transitioning such subflow to FIN_WAIT1 status before the syn-ack packet is processed. The MPTCP code does not react to such state change, leaving the connection in not-fallback status and the subflow handshake uncompleted, triggering the following splat: WARNING: CPU: 0 PID: 10630 at net/mptcp/subflow.c:1405 subflow_data_ready+0x39f/0x690 net/mptcp/subflow.c:1405 Modules linked in: CPU: 0 PID: 10630 Comm: kworker/u4:11 Not tainted 6.6.0-syzkaller-14500-g1c41041124bd #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023 Workqueue: bat_events batadv_nc_worker RIP: 0010:subflow_data_ready+0x39f/0x690 net/mptcp/subflow.c:1405 Code: 18 89 ee e8 e3 d2 21 f7 40 84 ed 75 1f e8 a9 d7 21 f7 44 89 fe bf 07 00 00 00 e8 0c d3 21 f7 41 83 ff 07 74 07 e8 91 d7 21 f7 <0f> 0b e8 8a d7 21 f7 48 89 df e8 d2 b2 ff ff 31 ff 89 c5 89 c6 e8 RSP: 0018:ffffc90000007448 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888031efc700 RCX: ffffffff8a65baf4 RDX: ffff888043222140 RSI: ffffffff8a65baff RDI: 0000000000000005 RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000007 R10: 000000000000000b R11: 0000000000000000 R12: 1ffff92000000e89 R13: ffff88807a534d80 R14: ffff888021c11a00 R15: 000000000000000b FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa19a0ffc81 CR3: 000000007a2db000 CR4: 00000000003506f0 DR0: 000000000000d8dd DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: <IRQ> tcp_data_ready+0x14c/0x5b0 net/ipv4/tcp_input.c:5128 tcp_data_queue+0x19c3/0x5190 net/ipv4/tcp_input.c:5208 tcp_rcv_state_process+0x11ef/0x4e10 net/ipv4/tcp_input.c:6844 tcp_v4_do_rcv+0x369/0xa10 net/ipv4/tcp_ipv4.c:1929 tcp_v4_rcv+0x3888/0x3b30 net/ipv4/tcp_ipv4.c:2329 ip_protocol_deliver_rcu+0x9f/0x480 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x2e4/0x510 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip_local_deliver+0x1b6/0x550 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:461 [inline] ip_rcv_finish+0x1c4/0x2e0 net/ipv4/ip_input.c:449 NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip_rcv+0xce/0x440 net/ipv4/ip_input.c:569 __netif_receive_skb_one_core+0x115/0x180 net/core/dev.c:5527 __netif_receive_skb+0x1f/0x1b0 net/core/dev.c:5641 process_backlog+0x101/0x6b0 net/core/dev.c:5969 __napi_poll.constprop.0+0xb4/0x540 net/core/dev.c:6531 napi_poll net/core/dev.c:6600 [inline] net_rx_action+0x956/0xe90 net/core/dev.c:6733 __do_softirq+0x21a/0x968 kernel/softirq.c:553 do_softirq kernel/softirq.c:454 [inline] do_softirq+0xaa/0xe0 kernel/softirq.c:441 </IRQ> <TASK> __local_bh_enable_ip+0xf8/0x120 kernel/softirq.c:381 spin_unlock_bh include/linux/spinlock.h:396 [inline] batadv_nc_purge_paths+0x1ce/0x3c0 net/batman-adv/network-coding.c:471 batadv_nc_worker+0x9b1/0x10e0 net/batman-adv/network-coding.c:722 process_one_work+0x884/0x15c0 kernel/workqueue.c:2630 process_scheduled_works kernel/workqueue.c:2703 [inline] worker_thread+0x8b9/0x1290 kernel/workqueue.c:2784 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242 </TASK> To address the issue, catch the racing subflow state change and use it to cause the MPTCP fallback. Such fallback is also used to cause the first subflow state propagation to the msk socket via mptcp_set_connected(). After this change, the first subflow can additionally propagate the TCP_FIN_WAIT1 state, so rename the helper accordingly. Finally, if the state propagation is delayed to the msk release callback, the first subflow can change to a different state in between. Cache the relevant target state in a new msk-level field and use such value to update the msk state at release time. Fixes: 1e777f39b4d7 ("mptcp: add MSG_FASTOPEN sendmsg flag support") Cc: stable@vger.kernel.org Reported-by: <syzbot+c53d4d3ddb327e80bc51@syzkaller.appspotmail.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/458 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17selftests: mptcp: join: fix subflow_send_ack lookupGeliang Tang1-4/+4
MPC backups tests will skip unexpected sometimes (For example, when compiling kernel with an older version of gcc, such as gcc-8), since static functions like mptcp_subflow_send_ack also be listed in /proc/kallsyms, with a 't' in front of it, not 'T' ('T' is for a global function): > grep "mptcp_subflow_send_ack" /proc/kallsyms 0000000000000000 T __pfx___mptcp_subflow_send_ack 0000000000000000 T __mptcp_subflow_send_ack 0000000000000000 t __pfx_mptcp_subflow_send_ack 0000000000000000 t mptcp_subflow_send_ack In this case, mptcp_lib_kallsyms_doesnt_have "mptcp_subflow_send_ack$" will be false, MPC backups tests will skip. This is not what we expected. The correct logic here should be: if mptcp_subflow_send_ack is not a global function in /proc/kallsyms, do these MPC backups tests. So a 'T' must be added in front of mptcp_subflow_send_ack. Fixes: 632978f0a961 ("selftests: mptcp: join: skip MPC backups tests if not supported") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang <geliang.tang@linux.dev> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17Merge branch 'phy-ackage-addr-mmd-apis'David S. Miller6-78/+266
Christian Marangi says: ==================== net: phy: add PHY package base addr + mmd APIs This small series is required for the upcoming qca807x PHY that will make use of PHY package mmd API and the new implementation with read/write based on base addr. The MMD PHY package patch currently has no use but it will be used in the upcoming patch and it does complete what a PHY package may require in addition to basic read/write to setup global PHY address. (Changelog for all the revision is present in the single patch) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17net: phy: add support for PHY package MMD read/writeChristian Marangi2-0/+156
Some PHY in PHY package may require to read/write MMD regs to correctly configure the PHY package. Add support for these additional required function in both lock and no lock variant. It's assumed that the entire PHY package is either C22 or C45. We use C22 or C45 way of writing/reading to mmd regs based on the passed phydev whether it's C22 or C45. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17net: phy: restructure __phy_write/read_mmd to helper and phydev userChristian Marangi1-34/+30
Restructure phy_write_mmd and phy_read_mmd to implement generic helper for direct mdiobus access for mmd and use these helper for phydev user. This is needed in preparation of PHY package API that requires generic access to the mdiobus and are deatched from phydev struct but instead access them based on PHY package base_addr and offsets. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17net: phy: extend PHY package API to support multiple global addressChristian Marangi5-44/+80
Current API for PHY package are limited to single address to configure global settings for the PHY package. It was found that some PHY package (for example the qca807x, a PHY package that is shipped with a bundle of 5 PHY) requires multiple PHY address to configure global settings. An example scenario is a PHY that have a dedicated PHY for PSGMII/serdes calibrarion and have a specific PHY in the package where the global PHY mode is set and affects every other PHY in the package. Change the API in the following way: - Change phy_package_join() to take the base addr of the PHY package instead of the global PHY addr. - Make __/phy_package_write/read() require an additional arg that select what global PHY address to use by passing the offset from the base addr passed on phy_package_join(). Each user of this API is updated to follow this new implementation following a pattern where an enum is defined to declare the offset of the addr. We also drop the check if shared is defined as any user of the phy_package_read/write is expected to use phy_package_join first. Misuse of this will correctly trigger a kernel panic for NULL pointer exception. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17net: phy: make addr type u8 in phy_package_shared structChristian Marangi1-1/+1
Switch addr type in phy_package_shared struct to u8. The value is already checked to be non negative and to be less than PHY_MAX_ADDR, hence u8 is better suited than using int. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17octeontx2-af: Add new devlink param to configure maximum usable NIX block LFsSuman Ghosh3-24/+133
On some silicon variants the number of available CAM entries are less. Reserving one entry for each NIX-LF for default DMAC based pkt forwarding rules will reduce the number of available CAM entries further. Hence add configurability via devlink to set maximum number of NIX-LFs needed which inturn frees up some CAM entries. Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17Merge tag 'for-6.7-rc5-tag' of ↵Linus Torvalds1-0/+9
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "One more fix that verifies that the snapshot source is a root, same check is also done in user space but should be done by the ioctl as well" * tag 'for-6.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: do not allow non subvolume root targets for snapshot
2023-12-17Merge tag 'soundwire-6.7-fixes' of ↵Linus Torvalds2-4/+6
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire Pull soundwire fixes from Vinod Koul: - Null pointer dereference for mult link in core - AC timing fix in intel driver * tag 'soundwire-6.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: soundwire: intel_ace2x: fix AC timing setting for ACE2.x soundwire: stream: fix NULL pointer dereference for multi_link
2023-12-17Merge tag 'phy-fixes-6.7' of ↵Linus Torvalds3-3/+6
git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy Pull phy fixes from Vinod Koul: - register offset fix for TI driver - mediatek driver minimal supported frequency fix - negative error code in probe fix for sunplus driver * tag 'phy-fixes-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: phy: sunplus: return negative error code in sp_usb_phy_probe phy: mediatek: mipi: mt8183: fix minimal supported frequency phy: ti: gmii-sel: Fix register offset when parent is not a syscon node
2023-12-17Merge tag 'dmaengine-fix-6.7' of ↵Linus Torvalds7-30/+41
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine fixes from Vinod Koul: - SPI PDMA data fix for TI k3-psil drivers - suspend fix, pointer check, logic for arbitration fix and channel leak fix in fsl-edma driver - couple of fixes in idxd driver for GRPCFG descriptions and int_handle field handling - single fix for stm32 driver for bitfield overflow * tag 'dmaengine-fix-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: dmaengine: fsl-edma: fix DMA channel leak in eDMAv4 dmaengine: fsl-edma: fix wrong pointer check in fsl_edma3_attach_pd() dmaengine: idxd: Fix incorrect descriptions for GRPCFG register dmaengine: idxd: Protect int_handle field in hw descriptor dmaengine: stm32-dma: avoid bitfield overflow assertion dmaengine: fsl-edma: Add judgment on enabling round robin arbitration dmaengine: fsl-edma: Do not suspend and resume the masked dma channel when the system is sleeping dmaengine: ti: k3-psil-am62a: Fix SPI PDMA data dmaengine: ti: k3-psil-am62: Fix SPI PDMA data
2023-12-17Merge tag 'cxl-fixes-6.7-rc6' of ↵Linus Torvalds10-24/+49
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull CXL (Compute Express Link) fixes from Dan Williams: "A collection of CXL fixes. The touch outside of drivers/cxl/ is for a helper that allocates physical address space. Device hotplug tests showed that the driver failed to utilize (skipped over) valid capacity when allocating a new memory region. Outside of that, new tests uncovered a small crop of lockdep reports. There is also some miscellaneous error path and leak fixups that are not urgent, but useful to cleanup now. - Fix alloc_free_mem_region()'s scan for address space, prevent false negative out-of-space events - Fix sleeping lock acquisition from CXL trace event (atomic context) - Fix put_device() like for the new CXL PMU driver - Fix wrong pointer freed on error path - Fixup several lockdep reports (missing lock hold) from new assertion in cxl_num_decoders_committed() and new tests" * tag 'cxl-fixes-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/pmu: Ensure put_device on pmu devices cxl/cdat: Free correct buffer on checksum error cxl/hdm: Fix dpa translation locking kernel/resource: Increment by align value in get_free_mem_region() cxl: Add cxl_num_decoders_committed() usage to cxl_test cxl/memdev: Hold region_rwsem during inject and clear poison ops cxl/core: Always hold region_rwsem while reading poison lists cxl/hdm: Fix a benign lockdep splat
2023-12-17Merge tag 'edac_urgent_for_v6.7_rc6' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fix from Borislav Petkov: - A single fix for the EDAC Versal driver to read out register fields properly * tag 'edac_urgent_for_v6.7_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/versal: Read num_csrows and num_chans using the correct bitfield macro
2023-12-17Merge tag 'powerpc-6.7-5' of ↵Linus Torvalds3-7/+48
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix a bug where heavy VAS (accelerator) usage could race with partition migration and prevent the migration from completing. - Update MAINTAINERS to add Aneesh & Naveen. Thanks to Haren Myneni. * tag 'powerpc-6.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: MAINTAINERS: powerpc: Add Aneesh & Naveen powerpc/pseries/vas: Migration suspend waits for no in-progress open windows
2023-12-17ovl: fix dentry reference leak after changes to underlying layersAmir Goldstein1-2/+3
syzbot excercised the forbidden practice of moving the workdir under lowerdir while overlayfs is mounted and tripped a dentry reference leak. Fixes: c63e56a4a652 ("ovl: do not open/llseek lower file with upper sb_writers held") Reported-and-tested-by: syzbot+8608bb4553edb8c78f41@syzkaller.appspotmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-12-17Merge tag 'ath-next-20231215' of ↵Kalle Valo128-465/+630
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath ath.git patches for v6.8. We have new features only for ath12k but lots of small cleanup for ath10k, ath11k and ath12k. And of course smaller fixes to several drivers. Major changes: ath12k * support one MSI vector * WCN7850: support AP mode
2023-12-17wifi: mt76: mt7996: Use DECLARE_FLEX_ARRAY() and fix -Warray-bounds warningsGustavo A. R. Silva1-5/+5
Transform zero-length arrays `rate`, `adm_stat` and `msdu_cnt` into proper flexible-array members in anonymous union in `struct mt7996_mcu_all_sta_info_event` via the DECLARE_FLEX_ARRAY() helper; and fix multiple -Warray-bounds warnings: drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:544:61: warning: array subscript <unknown> is outside array bounds of 'struct <anonymous>[0]' [-Warray-bounds=] drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:551:58: warning: array subscript <unknown> is outside array bounds of 'struct <anonymous>[0]' [-Warray-bounds=] drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:553:58: warning: array subscript <unknown> is outside array bounds of 'struct <anonymous>[0]' [-Warray-bounds=] drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:530:61: warning: array subscript <unknown> is outside array bounds of 'struct <anonymous>[0]' [-Warray-bounds=] drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:538:66: warning: array subscript <unknown> is outside array bounds of 'struct <anonymous>[0]' [-Warray-bounds=] drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:540:66: warning: array subscript <unknown> is outside array bounds of 'struct <anonymous>[0]' [-Warray-bounds=] drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:520:57: warning: array subscript <unknown> is outside array bounds of 'struct all_sta_trx_rate[0]' [-Warray-bounds=] drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:526:76: warning: array subscript <unknown> is outside array bounds of 'struct all_sta_trx_rate[0]' [-Warray-bounds=] drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:526:76: warning: array subscript <unknown> is outside array bounds of 'struct all_sta_trx_rate[0]' [-Warray-bounds=] drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:526:76: warning: array subscript <unknown> is outside array bounds of 'struct all_sta_trx_rate[0]' [-Warray-bounds=] drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:526:76: warning: array subscript <unknown> is outside array bounds of 'struct all_sta_trx_rate[0]' [-Warray-bounds=] This results in no differences in binary output, helps with the ongoing efforts to globally enable -Warray-bounds. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://msgid.link/ZXiU9ayVCslt3qiI@work
2023-12-17Merge branch 'skb-coalescing-page_pool'David S. Miller3-14/+52
Liang Chen says: ==================== skbuff: Optimize SKB coalescing for page pool The combination of the following condition was excluded from skb coalescing: from->pp_recycle = 1 from->cloned = 1 to->pp_recycle = 1 With page pool in use, this combination can be quite common(ex. NetworkMananger may lead to the additional packet_type being registered, thus the cloning). In scenarios with a higher number of small packets, it can significantly affect the success rate of coalescing. This patchset aims to optimize this scenario and enable coalescing of this particular combination. That also involves supporting multiple users referencing the same fragment of a pp page to accomondate the need to increment the "from" SKB page's pp page reference count. Changes from v10: - re-number patches to 1/3, 2/3, 3/3 Changes from v9: - patch 1 was already applied - imporve description for patch 2 - make sure skb_pp_frag_ref only work for pp aware skbs ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17skbuff: Optimization of SKB coalescing for page poolLiang Chen2-12/+45
In order to address the issues encountered with commit 1effe8ca4e34 ("skbuff: fix coalescing for page_pool fragment recycling"), the combination of the following condition was excluded from skb coalescing: from->pp_recycle = 1 from->cloned = 1 to->pp_recycle = 1 However, with page pool environments, the aforementioned combination can be quite common(ex. NetworkMananger may lead to the additional packet_type being registered, thus the cloning). In scenarios with a higher number of small packets, it can significantly affect the success rate of coalescing. For example, considering packets of 256 bytes size, our comparison of coalescing success rate is as follows: Without page pool: 70% With page pool: 13% Consequently, this has an impact on performance: Without page pool: 2.57 Gbits/sec With page pool: 2.26 Gbits/sec Therefore, it seems worthwhile to optimize this scenario and enable coalescing of this particular combination. To achieve this, we need to ensure the correct increment of the "from" SKB page's page pool reference count (pp_ref_count). Following this optimization, the success rate of coalescing measured in our environment has improved as follows: With page pool: 60% This success rate is approaching the rate achieved without using page pool, and the performance has also been improved: With page pool: 2.52 Gbits/sec Below is the performance comparison for small packets before and after this optimization. We observe no impact to packets larger than 4K. packet size before after improved (bytes) (Gbits/sec) (Gbits/sec) 128 1.19 1.27 7.13% 256 2.26 2.52 11.75% 512 4.13 4.81 16.50% 1024 6.17 6.73 9.05% 2048 14.54 15.47 6.45% 4096 25.44 27.87 9.52% Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> Suggested-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17skbuff: Add a function to check if a page belongs to page_poolLiang Chen1-1/+6
Wrap code for checking if a page is a page_pool page into a function for better readability and ease of reuse. Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Reviewed-by: Mina Almasry <almarsymina@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17page_pool: halve BIAS_MAX for multiple user references of a fragmentLiang Chen1-1/+1
Up to now, we were only subtracting from the number of used page fragments to figure out when a page could be freed or recycled. A following patch introduces support for multiple users referencing the same fragment. So reduce the initial page fragments value to half to avoid overflowing. Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> Reviewed-by: Mina Almasry <almarsymina@google.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17Merge branch 'tcp-ao-selftests'David S. Miller23-0/+7733
Dmitry Safonov says: ==================== selftests/net: Add TCP-AO tests An essential part of any big kernel submissions is selftests. At the beginning of TCP-AO project, I made patches to fcnal-test.sh and nettest.c to have the benefits of easy refactoring, early noticing breakages, putting a moat around the code, documenting and designing uAPI. While tests based on fcnal-test.sh/nettest.c provided initial testing* and were very easy to add, the pile of TCP-AO quickly grew out of one-binary + shell-script testing. The design of the TCP-AO testing is a bit different than one-big selftest binary as I did previously in net/ipsec.c. I found it beneficial to avoid implementing a tests runner/scheduler and delegate it to the user or Makefile. The approach is very influenced by CRIU/ZDTM testing[1]: it provides a static library with helper functions and selftest binaries that create specific scenarios. I also tried to utilize kselftest.h. test_init() function does all needed preparations. To not leave any traces after a selftest exists, it creates a network namespace and if the test wants to establish a TCP connection, a child netns. The parent and child netns have veth pair with proper ip addresses and routes set up. Both peers, the client and server are different pthreads. The treading model was chosen over forking mostly by easiness of cleanup on a failure: no need to search for children, handle SIGCHLD, make sure not to wait for a dead peer to perform anything, etc. Any thread that does exit() naturally kills the tests, sweet! The selftests are compiled currently in two variants: ipv4 and ipv6. Ipv4-mapped-ipv6 addresses might be a third variant to add, but it's not there in this version. As pretty much all tests are shared between two address families, most of the code can be shared, too. To differ in code what kind of test is running, Makefile supplies -DIPV6_TEST to compiler and ifdeffery in tests can do things that have to be different between address families. This is similar to TARGETS_C_BOTHBITS in x86 selftests and also to tests code sharing in CRIU/ZDTM. The total number of tests is 832. From them rst_ipv{4,6} has currently one flaky subtest, that may fail: > not ok 9 client connection was not reset: 0 I'll investigate what happens there. Also, unsigned-md5_ipv{4,6} are flaky because of netns counter checks: it doesn't expect that there may be retransmitted TCP segments from a previous sub-selftest. That will be fixed. Besides, key-management_ipv{4,6} has 3 sub-tests passing with XFAIL: > ok 15 # XFAIL listen() after current/rnext keys set: the socket has current/rnext keys: 100:200 > ok 16 # XFAIL listen socket, delete current key from before listen(): failed to delete the key 100:100 -16 > ok 17 # XFAIL listen socket, delete rnext key from before listen(): failed to delete the key 200:200 -16 ... > # Totals: pass:117 fail:0 xfail:3 xpass:0 skip:0 error:0 Those need some more kernel work to pass instead of xfail. The overview of selftests (see the diffstat at the bottom): ├── lib │ ├── aolib.h │ │ The header for all selftests to include. │ ├── kconfig.c │ │ Kernel kconfig detector to SKIP tests that depend on something. │ ├── netlink.c │ │ Netlink helper to add/modify/delete VETH/IPs/routes/VRFs │ │ I considered just using libmnl, but this is around 400 lines │ │ and avoids selftests dependency on out-of-tree sources/packets. │ ├── proc.c │ │ SNMP/netstat procfs parser and the counters comparator. │ ├── repair.c │ │ Heavily influenced by libsoccr and reduced to minimum TCP │ │ socket checkpoint/repair. Shouldn't be used out of selftests, │ │ though. │ ├── setup.c │ │ All the needed netns/veth/ips/etc preparations for test init. │ ├── sock.c │ │ Socket helpers: {s,g}etsockopt()s/connect()/listen()/etc. │ └── utils.c │ Random stuff (a pun intended). ├── bench-lookups.c │ The only benchmark in selftests currently: checks how well TCP-AO │ setsockopt()s perform, depending on the amount of keys on a socket. ├── connect.c │ Trivial sample, can be used as a boilerplate to write a new test. ├── connect-deny.c │ More-or-less what could be expected for TCP-AO in fcnal-test.sh ├── icmps-accept.c -> icmps-discard.c ├── icmps-discard.c │ Verifies RFC5925 (7.8) by checking that TCP-AO connection can be │ broken if ICMPs are accepted and survives when ::accept_icmps = 0 ├── key-management.c │ Key manipulations, rotations between randomized hashing algorithms │ and counter checks for those scenarios. ├── restore.c │ TCP_AO_REPAIR: verifies that a socket can be re-created without │ TCP-AO connection being interrupted. ├── rst.c │ As RST segments are signed on a separate code-path in kernel, │ verifies passive/active TCP send_reset(). ├── self-connect.c │ Verifies that TCP self-connect and also simultaneous open work. ├── seq-ext.c │ Utilizes TCP_AO_REPAIR to check that on SEQ roll-over SNE │ increment is performed and segments with different SNEs fail to │ pass verification. ├── setsockopt-closed.c │ Checks that {s,g}etsockopt()s are extendable syscalls and common │ error-paths for them. └── unsigned-md5.c Checks listen() socket for (non-)matching peers with: AO/MD5/none keys. As well as their interaction with VRFs and AO_REQUIRED flag. There are certainly more test scenarios that can be added, but even so, I'm pretty happy that this much of TCP-AO functionality and uAPIs got covered. These selftests were iteratively developed by me during TCP-AO kernel upstreaming and the resulting kernel patches would have been worse without having these tests. They provided the user-side perspective but also allowed safer refactoring with less possibility of introducing a regression. Now it's time to use them to dig a moat around the TCP-AO code! There are also people from other network companies that work on TCP-AO (+testing), so sharing these selftests will allow them to contribute and may benefit from their efforts. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17selftests/net: Add TCP-AO key-management testDmitry Safonov2-0/+1181
Check multiple keys on a socket: - rotation on closed socket - current/rnext operations shouldn't be possible on listen sockets - current/rnext key set should be the one, that's used on connect() - key rotations with pseudo-random generated keys - copying matching keys on connect() and on accept() At this moment there are 3 tests that are "expected" to fail: a kernel fix is needed to improve the situation, they are marked XFAIL. Sample output: > # ./key-management_ipv4 > 1..120 > # 1601[lib/setup.c:239] rand seed 1700526653 > TAP version 13 > ok 1 closed socket, delete a key: the key was deleted > ok 2 closed socket, delete all keys: the key was deleted > ok 3 closed socket, delete current key: key deletion was prevented > ok 4 closed socket, delete rnext key: key deletion was prevented > ok 5 closed socket, delete a key + set current/rnext: the key was deleted > ok 6 closed socket, force-delete current key: the key was deleted > ok 7 closed socket, force-delete rnext key: the key was deleted > ok 8 closed socket, delete current+rnext key: key deletion was prevented > ok 9 closed socket, add + change current key > ok 10 closed socket, add + change rnext key > ok 11 listen socket, delete a key: the key was deleted > ok 12 listen socket, delete all keys: the key was deleted > ok 13 listen socket, setting current key not allowed > ok 14 listen socket, setting rnext key not allowed > ok 15 # XFAIL listen() after current/rnext keys set: the socket has current/rnext keys: 100:200 > ok 16 # XFAIL listen socket, delete current key from before listen(): failed to delete the key 100:100 -16 > ok 17 # XFAIL listen socket, delete rnext key from before listen(): failed to delete the key 200:200 -16 > ok 18 listen socket, getsockopt(TCP_AO_REPAIR) is restricted > ok 19 listen socket, setsockopt(TCP_AO_REPAIR) is restricted > ok 20 listen socket, delete a key + set current/rnext: key deletion was prevented > ok 21 listen socket, force-delete current key: key deletion was prevented > ok 22 listen socket, force-delete rnext key: key deletion was prevented > ok 23 listen socket, delete a key: the key was deleted > ok 24 listen socket, add + change current key > ok 25 listen socket, add + change rnext key > ok 26 server: Check current/rnext keys unset before connect(): The socket keys are consistent with the expectations > ok 27 client: Check current/rnext keys unset before connect(): current key 19 as expected > ok 28 client: Check current/rnext keys unset before connect(): rnext key 146 as expected > ok 29 server: Check current/rnext keys unset before connect(): server alive > ok 30 server: Check current/rnext keys unset before connect(): passed counters checks > ok 31 client: Check current/rnext keys unset before connect(): The socket keys are consistent with the expectations > ok 32 server: Check current/rnext keys unset before connect(): The socket keys are consistent with the expectations > ok 33 server: Check current/rnext keys unset before connect(): passed counters checks > ok 34 client: Check current/rnext keys unset before connect(): passed counters checks > ok 35 server: Check current/rnext keys set before connect(): The socket keys are consistent with the expectations > ok 36 server: Check current/rnext keys set before connect(): server alive > ok 37 server: Check current/rnext keys set before connect(): passed counters checks > ok 38 client: Check current/rnext keys set before connect(): current key 10 as expected > ok 39 client: Check current/rnext keys set before connect(): rnext key 137 as expected > ok 40 server: Check current/rnext keys set before connect(): The socket keys are consistent with the expectations > ok 41 client: Check current/rnext keys set before connect(): The socket keys are consistent with the expectations > ok 42 client: Check current/rnext keys set before connect(): passed counters checks > ok 43 server: Check current/rnext keys set before connect(): passed counters checks > ok 44 server: Check current != rnext keys set before connect(): The socket keys are consistent with the expectations > ok 45 server: Check current != rnext keys set before connect(): server alive > ok 46 server: Check current != rnext keys set before connect(): passed counters checks > ok 47 client: Check current != rnext keys set before connect(): current key 10 as expected > ok 48 client: Check current != rnext keys set before connect(): rnext key 132 as expected > ok 49 server: Check current != rnext keys set before connect(): The socket keys are consistent with the expectations > ok 50 client: Check current != rnext keys set before connect(): The socket keys are consistent with the expectations > ok 51 client: Check current != rnext keys set before connect(): passed counters checks > ok 52 server: Check current != rnext keys set before connect(): passed counters checks > ok 53 server: Check current flapping back on peer's RnextKey request: The socket keys are consistent with the expectations > ok 54 server: Check current flapping back on peer's RnextKey request: server alive > ok 55 server: Check current flapping back on peer's RnextKey request: passed counters checks > ok 56 client: Check current flapping back on peer's RnextKey request: current key 10 as expected > ok 57 client: Check current flapping back on peer's RnextKey request: rnext key 132 as expected > ok 58 server: Check current flapping back on peer's RnextKey request: The socket keys are consistent with the expectations > ok 59 client: Check current flapping back on peer's RnextKey request: The socket keys are consistent with the expectations > ok 60 server: Check current flapping back on peer's RnextKey request: passed counters checks > ok 61 client: Check current flapping back on peer's RnextKey request: passed counters checks > ok 62 server: Rotate over all different keys: The socket keys are consistent with the expectations > ok 63 server: Rotate over all different keys: server alive > ok 64 server: Rotate over all different keys: passed counters checks > ok 65 server: Rotate over all different keys: current key 128 as expected > ok 66 client: Rotate over all different keys: rnext key 128 as expected > ok 67 server: Rotate over all different keys: current key 129 as expected > ok 68 client: Rotate over all different keys: rnext key 129 as expected > ok 69 server: Rotate over all different keys: current key 130 as expected > ok 70 client: Rotate over all different keys: rnext key 130 as expected > ok 71 server: Rotate over all different keys: current key 131 as expected > ok 72 client: Rotate over all different keys: rnext key 131 as expected > ok 73 server: Rotate over all different keys: current key 132 as expected > ok 74 client: Rotate over all different keys: rnext key 132 as expected > ok 75 server: Rotate over all different keys: current key 133 as expected > ok 76 client: Rotate over all different keys: rnext key 133 as expected > ok 77 server: Rotate over all different keys: current key 134 as expected > ok 78 client: Rotate over all different keys: rnext key 134 as expected > ok 79 server: Rotate over all different keys: current key 135 as expected > ok 80 client: Rotate over all different keys: rnext key 135 as expected > ok 81 server: Rotate over all different keys: current key 136 as expected > ok 82 client: Rotate over all different keys: rnext key 136 as expected > ok 83 server: Rotate over all different keys: current key 137 as expected > ok 84 client: Rotate over all different keys: rnext key 137 as expected > ok 85 server: Rotate over all different keys: current key 138 as expected > ok 86 client: Rotate over all different keys: rnext key 138 as expected > ok 87 server: Rotate over all different keys: current key 139 as expected > ok 88 client: Rotate over all different keys: rnext key 139 as expected > ok 89 server: Rotate over all different keys: current key 140 as expected > ok 90 client: Rotate over all different keys: rnext key 140 as expected > ok 91 server: Rotate over all different keys: current key 141 as expected > ok 92 client: Rotate over all different keys: rnext key 141 as expected > ok 93 server: Rotate over all different keys: current key 142 as expected > ok 94 client: Rotate over all different keys: rnext key 142 as expected > ok 95 server: Rotate over all different keys: current key 143 as expected > ok 96 client: Rotate over all different keys: rnext key 143 as expected > ok 97 server: Rotate over all different keys: current key 144 as expected > ok 98 client: Rotate over all different keys: rnext key 144 as expected > ok 99 server: Rotate over all different keys: current key 145 as expected > ok 100 client: Rotate over all different keys: rnext key 145 as expected > ok 101 server: Rotate over all different keys: current key 146 as expected > ok 102 client: Rotate over all different keys: rnext key 146 as expected > ok 103 server: Rotate over all different keys: current key 127 as expected > ok 104 client: Rotate over all different keys: rnext key 127 as expected > ok 105 client: Rotate over all different keys: current key 0 as expected > ok 106 client: Rotate over all different keys: rnext key 127 as expected > ok 107 server: Rotate over all different keys: The socket keys are consistent with the expectations > ok 108 client: Rotate over all different keys: The socket keys are consistent with the expectations > ok 109 client: Rotate over all different keys: passed counters checks > ok 110 server: Rotate over all different keys: passed counters checks > ok 111 server: Check accept() => established key matching: The socket keys are consistent with the expectations > ok 112 Can't add a key with non-matching ip-address for established sk > ok 113 Can't add a key with non-matching VRF for established sk > ok 114 server: Check accept() => established key matching: server alive > ok 115 server: Check accept() => established key matching: passed counters checks > ok 116 client: Check connect() => established key matching: current key 0 as expected > ok 117 client: Check connect() => established key matching: rnext key 128 as expected > ok 118 client: Check connect() => established key matching: The socket keys are consistent with the expectations > ok 119 server: Check accept() => established key matching: The socket keys are consistent with the expectations > ok 120 server: Check accept() => established key matching: passed counters checks > # Totals: pass:120 fail:0 xfail:0 xpass:0 skip:0 error:0 Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17selftests/net: Add TCP-AO selfconnect/simultaneous connect testDmitry Safonov2-0/+198
Check that a rare functionality of TCP named self-connect works with TCP-AO. This "under the cover" also checks TCP simultaneous connect (TCP_SYN_RECV socket state), which would be harder to check other ways. In order to verify that it's indeed TCP simultaneous connect, check the counters TCPChallengeACK and TCPSYNChallenge. Sample of the output: > # ./self-connect_ipv6 > 1..4 > # 1738[lib/setup.c:254] rand seed 1696451931 > TAP version 13 > ok 1 self-connect(same keyids): connect TCPAOGood 0 => 24 > ok 2 self-connect(different keyids): connect TCPAOGood 26 => 50 > ok 3 self-connect(restore): connect TCPAOGood 52 => 97 > ok 4 self-connect(restore, different keyids): connect TCPAOGood 99 => 144 > # Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0 Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17selftests/net: Add TCP-AO RST testDmitry Safonov3-1/+417
Check that both active and passive reset works and correctly sign segments with TCP-AO or don't send RSTs if not possible to sign. A listening socket with backlog = 0 gets one connection in accept queue, another in syn queue. Once the server/listener socket is forcibly closed, client sockets aren't connected to anything. In regular situation they would receive RST on any segment, but with TCP-AO as there's no listener, no AO-key and unknown ISNs, no RST should be sent. And "passive" reset, where RST is sent on reply for some segment (tcp_v{4,6}_send_reset()) - there use TCP_REPAIR to corrupt SEQ numbers, which later results in TCP-AO signed RST, which will be verified and client socket will get EPIPE. No TCPAORequired/TCPAOBad segments are expected during these tests. Sample of the output: > # ./rst_ipv4 > 1..15 > # 1462[lib/setup.c:254] rand seed 1686611171 > TAP version 13 > ok 1 servered 1000 bytes > ok 2 Verified established tcp connection > ok 3 sk[0] = 7, connection was reset > ok 4 sk[1] = 8, connection was reset > ok 5 sk[2] = 9 > ok 6 MKT counters are good on server > ok 7 Verified established tcp connection > ok 8 client connection broken post-seq-adjust > ok 9 client connection was reset > ok 10 No segments without AO sign (server) > ok 11 Signed AO segments (server): 0 => 30 > ok 12 No segments with bad AO sign (server) > ok 13 No segments without AO sign (client) > ok 14 Signed AO segments (client): 0 => 30 > ok 15 No segments with bad AO sign (client) > # Totals: pass:15 fail:0 xfail:0 xpass:0 skip:0 error:0 Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17selftests/net: Add SEQ number extension testDmitry Safonov2-0/+246
Check that on SEQ number wraparound there is no disruption or TCPAOBad segments produced. Sample of expected output: > # ./seq-ext_ipv4 > 1..7 > # 1436[lib/setup.c:254] rand seed 1686611079 > TAP version 13 > ok 1 server alive > ok 2 post-migrate connection alive > ok 3 TCPAOGood counter increased 1002 => 3002 > ok 4 TCPAOGood counter increased 1003 => 3003 > ok 5 TCPAOBad counter didn't increase > ok 6 TCPAOBad counter didn't increase > ok 7 SEQ extension incremented: 1/1999, 1/998999 > # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0 Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>