summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-04-24net: dsa: mt7530: disable EEE abilities on failure on MT7531 and MT7988Arınç ÜNAL1-1/+5
The MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 bits let the PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 bits determine the 1G/100 EEE abilities of the MAC. If MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 are unset, the abilities are left to be determined by PHY auto polling. The commit 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features") made it so that the PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 bits are set on mt753x_phylink_mac_link_up(). But it did not set the MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 bits. Because of this, the EEE abilities will be determined by PHY auto polling, regardless of the result of phy_init_eee(). Define these bits and add them to the MT7531_FORCE_MODE mask which is set in mt7531_setup_common(). With this, there won't be any EEE abilities set when phy_init_eee() returns a negative value. Thanks to Russell for explaining when phy_init_eee() could return a negative value below. Looking at phy_init_eee(), it could return a negative value when: 1. phydev->drv is NULL 2. if genphy_c45_eee_is_active() returns negative 3. if genphy_c45_eee_is_active() returns zero, it returns -EPROTONOSUPPORT 4. if phy_set_bits_mmd() fails (e.g. communication error with the PHY) If we then look at genphy_c45_eee_is_active(), then: genphy_c45_read_eee_adv() and genphy_c45_read_eee_lpa() propagate their non-zero return values, otherwise this function returns zero or positive integer. If we then look at genphy_c45_read_eee_adv(), then a failure of phy_read_mmd() would cause a negative value to be returned. Looking at genphy_c45_read_eee_lpa(), the same is true. So, it can be summarised as: - phydev->drv is NULL - there is a communication error accessing the PHY - EEE is not active otherwise, it returns zero on success. If one wishes to determine whether an error occurred vs EEE not being supported through negotiation for the negotiated speed, if it returns -EPROTONOSUPPORT in the latter case. Other error codes mean either the driver has been unloaded or communication error. In conclusion, determining the EEE abilities by PHY auto polling shouldn't result in having any EEE abilities enabled, when one of the last two situations in the summary happens. And it seems that if phydev->drv is NULL, there would be bigger problems with the device than a broken link. So this is not a bugfix. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: phy: mediatek-ge-soc: follow netdev LED trigger semanticsDaniel Golle1-17/+26
Only blink if the link is up on a LED which is programmed to also indicate link-status. Otherwise, if both LEDs are in use to indicate different speeds, the resulting blinking being inverted on LEDs which aren't switched on at a specific speed is quite counter-intuitive. Also make sure that state left behind by reset or the bootloader is recognized correctly including the half-duplex and full-duplex bits as well as the (unsupported by Linux netdev trigger semantics) link-down bit. Fixes: c66937b0f8db ("net: phy: mediatek-ge-soc: support PHY LEDs") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: gtp: Fix Use-After-Free in gtp_dellinkHyunwoo Kim1-1/+2
Since call_rcu, which is called in the hlist_for_each_entry_rcu traversal of gtp_dellink, is not part of the RCU read critical section, it is possible that the RCU grace period will pass during the traversal and the key will be free. To prevent this, it should be changed to hlist_for_each_entry_safe. Fixes: 94dc550a5062 ("gtp: fix an use-after-free in ipv4_pdp_find()") Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24Merge branch 'introduce-bpf_wq'Alexei Starovoitov16-103/+889
Benjamin Tissoires says: ==================== Introduce bpf_wq This is a followup of sleepable bpf_timer[0]. When discussing sleepable bpf_timer, it was thought that we should give a try to bpf_wq, as the 2 APIs are similar but distinct enough to justify a new one. So here it is. I tried to keep as much as possible common code in kernel/bpf/helpers.c but I couldn't get away with code duplication in kernel/bpf/verifier.c. This series introduces a basic bpf_wq support: - creation is supported - assignment is supported - running a simple bpf_wq is also supported. We will probably need to extend the API further with: - a full delayed_work API (can be piggy backed on top with a correct flag) - bpf_wq_cancel() <- apparently not, this is shooting ourself in the foot - bpf_wq_cancel_sync() (for sleepable programs) - documentation --- For reference, the use cases I have in mind: --- Basically, I need to be able to defer a HID-BPF program for the following reasons (from the aforementioned patch): 1. defer an event: Sometimes we receive an out of proximity event, but the device can not be trusted enough, and we need to ensure that we won't receive another one in the following n milliseconds. So we need to wait those n milliseconds, and eventually re-inject that event in the stack. 2. inject new events in reaction to one given event: We might want to transform one given event into several. This is the case for macro keys where a single key press is supposed to send a sequence of key presses. But this could also be used to patch a faulty behavior, if a device forgets to send a release event. 3. communicate with the device in reaction to one event: We might want to communicate back to the device after a given event. For example a device might send us an event saying that it came back from sleeping state and needs to be re-initialized. Currently we can achieve that by keeping a userspace program around, raise a bpf event, and let that userspace program inject the events and commands. However, we are just keeping that program alive as a daemon for just scheduling commands. There is no logic in it, so it doesn't really justify an actual userspace wakeup. So a kernel workqueue seems simpler to handle. bpf_timers are currently running in a soft IRQ context, this patch series implements a sleppable context for them. Cheers, Benjamin [0] https://lore.kernel.org/all/20240408-hid-bpf-sleepable-v6-0-0499ddd91b94@kernel.org/ Changes in v2: - took previous review into account - mainly dropped BPF_F_WQ_SLEEPABLE - Link to v1: https://lore.kernel.org/r/20240416-bpf_wq-v1-0-c9e66092f842@kernel.org ==================== Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-0-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24selftests/bpf: wq: add bpf_wq_start() checksBenjamin Tissoires3-3/+40
Allows to test if allocation/free works Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-16-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24bpf: add bpf_wq_startBenjamin Tissoires1-0/+18
again, copy/paste from bpf_timer_start(). Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-15-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24selftests/bpf: add checks for bpf_wq_set_callback()Benjamin Tissoires5-7/+111
We assign the callback and set everything up. The actual tests of these callbacks will be done when bpf_wq_start() is available. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-14-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24bpf: wq: add bpf_wq_set_callback_implBenjamin Tissoires3-6/+70
To support sleepable async callbacks, we need to tell push_async_cb() whether the cb is sleepable or not. The verifier now detects that we are in bpf_wq_set_callback_impl and can allow a sleepable callback to happen. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-13-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24selftests/bpf: wq: add bpf_wq_init() checksBenjamin Tissoires4-0/+97
Allows to test if allocation/free works Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-12-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24bpf: wq: add bpf_wq_initBenjamin Tissoires1-2/+102
We need to teach the verifier about the second argument which is declared as void * but which is of type KF_ARG_PTR_TO_MAP. We could have dropped this extra case if we declared the second argument as struct bpf_map *, but that means users will have to do extra casting to have their program compile. We also need to duplicate the timer code for the checking if the map argument is matching the provided workqueue. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-11-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24selftests/bpf: add bpf_wq testsBenjamin Tissoires2-0/+140
We simply try in all supported map types if we can store/load a bpf_wq. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-10-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24tcp: Fix Use-After-Free in tcp_ao_connect_initHyunwoo Kim1-1/+2
Since call_rcu, which is called in the hlist_for_each_entry_rcu traversal of tcp_ao_connect_init, is not part of the RCU read critical section, it is possible that the RCU grace period will pass during the traversal and the key will be free. To prevent this, it should be changed to hlist_for_each_entry_safe. Fixes: 7c2ffaf21bd6 ("net/tcp: Calculate TCP-AO traffic keys") Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://lore.kernel.org/r/ZiYu9NJ/ClR8uSkH@v4bel-B760M-AORUS-ELITE-AX Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-24neighbour: fix neigh_master_filtered()Eric Dumazet1-1/+1
If we no longer hold RTNL, we must use netdev_master_upper_dev_get_rcu() instead of netdev_master_upper_dev_get(). Fixes: ba0f78069423 ("neighbour: no longer hold RTNL in neigh_dump_info()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240421185753.1808077-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-24net: usb: ax88179_178a: stop lying about skb->truesizeEric Dumazet1-8/+3
Some usb drivers try to set small skb->truesize and break core networking stacks. In this patch, I removed one of the skb->truesize overide. I also replaced one skb_clone() by an allocation of a fresh and small skb, to get minimally sized skbs, like we did in commit 1e2c61172342 ("net: cdc_ncm: reduce skb truesize in rx path") Fixes: f8ebb3ac881b ("net: usb: ax88179_178a: Fix packet receiving") Reported-by: shironeko <shironeko@tesaguri.club> Closes: https://lore.kernel.org/netdev/c110f41a0d2776b525930f213ca9715c@tesaguri.club/ Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jose Alonso <joalonsof@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240421193828.1966195-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-24ipv4: check for NULL idev in ip_route_use_hint()Eric Dumazet1-0/+3
syzbot was able to trigger a NULL deref in fib_validate_source() in an old tree [1]. It appears the bug exists in latest trees. All calls to __in_dev_get_rcu() must be checked for a NULL result. [1] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 2 PID: 3257 Comm: syz-executor.3 Not tainted 5.10.0-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:fib_validate_source+0xbf/0x15a0 net/ipv4/fib_frontend.c:425 Code: 18 f2 f2 f2 f2 42 c7 44 20 23 f3 f3 f3 f3 48 89 44 24 78 42 c6 44 20 27 f3 e8 5d 88 48 fc 4c 89 e8 48 c1 e8 03 48 89 44 24 18 <42> 80 3c 20 00 74 08 4c 89 ef e8 d2 15 98 fc 48 89 5c 24 10 41 bf RSP: 0018:ffffc900015fee40 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88800f7a4000 RCX: ffff88800f4f90c0 RDX: 0000000000000000 RSI: 0000000004001eac RDI: ffff8880160c64c0 RBP: ffffc900015ff060 R08: 0000000000000000 R09: ffff88800f7a4000 R10: 0000000000000002 R11: ffff88800f4f90c0 R12: dffffc0000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88800f7a4000 FS: 00007f938acfe6c0(0000) GS:ffff888058c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f938acddd58 CR3: 000000001248e000 CR4: 0000000000352ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ip_route_use_hint+0x410/0x9b0 net/ipv4/route.c:2231 ip_rcv_finish_core+0x2c4/0x1a30 net/ipv4/ip_input.c:327 ip_list_rcv_finish net/ipv4/ip_input.c:612 [inline] ip_sublist_rcv+0x3ed/0xe50 net/ipv4/ip_input.c:638 ip_list_rcv+0x422/0x470 net/ipv4/ip_input.c:673 __netif_receive_skb_list_ptype net/core/dev.c:5572 [inline] __netif_receive_skb_list_core+0x6b1/0x890 net/core/dev.c:5620 __netif_receive_skb_list net/core/dev.c:5672 [inline] netif_receive_skb_list_internal+0x9f9/0xdc0 net/core/dev.c:5764 netif_receive_skb_list+0x55/0x3e0 net/core/dev.c:5816 xdp_recv_frames net/bpf/test_run.c:257 [inline] xdp_test_run_batch net/bpf/test_run.c:335 [inline] bpf_test_run_xdp_live+0x1818/0x1d00 net/bpf/test_run.c:363 bpf_prog_test_run_xdp+0x81f/0x1170 net/bpf/test_run.c:1376 bpf_prog_test_run+0x349/0x3c0 kernel/bpf/syscall.c:3736 __sys_bpf+0x45c/0x710 kernel/bpf/syscall.c:5115 __do_sys_bpf kernel/bpf/syscall.c:5201 [inline] __se_sys_bpf kernel/bpf/syscall.c:5199 [inline] __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5199 Fixes: 02b24941619f ("ipv4: use dst hint for ipv4 list receive") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240421184326.1704930-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-24net: fix sk_memory_allocated_{add|sub} vs softirqsEric Dumazet1-18/+20
Jonathan Heathcote reported a regression caused by blamed commit on aarch64 architecture. x86 happens to have irq-safe __this_cpu_add_return() and __this_cpu_sub(), but this is not generic. I think my confusion came from "struct sock" argument, because these helpers are called with a locked socket. But the memory accounting is per-proto (and per-cpu after the blamed commit). We might cleanup these helpers later to directly accept a "struct proto *proto" argument. Switch to this_cpu_add_return() and this_cpu_xchg() operations, and get rid of preempt_disable()/preempt_enable() pairs. Fast path becomes a bit faster as a result :) Many thanks to Jonathan Heathcote for his awesome report and investigations. Fixes: 3cd3399dd7a8 ("net: implement per-cpu reserves for memory_allocated") Reported-by: Jonathan Heathcote <jonathan.heathcote@bbc.co.uk> Closes: https://lore.kernel.org/netdev/VI1PR01MB42407D7947B2EA448F1E04EFD10D2@VI1PR01MB4240.eurprd01.prod.exchangelabs.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://lore.kernel.org/r/20240421175248.1692552-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-24bpf: allow struct bpf_wq to be embedded in arraymaps and hashmapsBenjamin Tissoires5-19/+73
Currently bpf_wq_cancel_and_free() is just a placeholder as there is no memory allocation for bpf_wq just yet. Again, duplication of the bpf_timer approach Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-9-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24bpf: add support for KF_ARG_PTR_TO_WORKQUEUEBenjamin Tissoires1-0/+65
Introduce support for KF_ARG_PTR_TO_WORKQUEUE. The kfuncs will use bpf_wq as argument and that will be recognized as workqueue argument by verifier. bpf_wq_kern casting can happen inside kfunc, but using bpf_wq in argument makes life easier for users who work with non-kern type in BPF progs. Duplicate process_timer_func into process_wq_func. meta argument is only needed to ensure bpf_wq_init's workqueue and map arguments are coming from the same map (map_uid logic is necessary for correct inner-map handling), so also amend check_kfunc_args() to match what helpers functions check is doing. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-8-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24bpf: verifier: bail out if the argument is not a mapBenjamin Tissoires1-0/+5
When a kfunc is declared with a KF_ARG_PTR_TO_MAP, we should have reg->map_ptr set to a non NULL value, otherwise, that means that the underlying type is not a map. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-7-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24tools: sync include/uapi/linux/bpf.hBenjamin Tissoires1-0/+4
cp include/uapi/linux/bpf.h tools/include/uapi/linux/bpf.h Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-6-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24bpf: add support for bpf_wq user typeBenjamin Tissoires5-2/+45
Mostly a copy/paste from the bpf_timer API, without the initialization and free, as they will be done in a separate patch. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-5-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24bpf: replace bpf_timer_cancel_and_free with a generic helperBenjamin Tissoires1-17/+25
Same reason than most bpf_timer* functions, we need almost the same for workqueues. So extract the generic part out of it so bpf_wq_cancel_and_free can reuse it. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-4-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24bpf: replace bpf_timer_set_callback with a generic helperBenjamin Tissoires1-11/+18
In the same way we have a generic __bpf_async_init(), we also need to share code between timer and workqueue for the set_callback call. We just add an unused flags parameter, as it will be used for workqueues. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-3-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24bpf: replace bpf_timer_init with a generic helperBenjamin Tissoires1-28/+63
No code change except for the new flags argument being stored in the local data struct. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-2-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24bpf: make timer data struct more genericBenjamin Tissoires1-33/+38
To be able to add workqueues and reuse most of the timer code, we need to make bpf_hrtimer more generic. There is no code change except that the new struct gets a new u64 flags attribute. We are still below 2 cache lines, so this shouldn't impact the current running codes. The ordering is also changed. Everything related to async callback is now on top of bpf_hrtimer. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-1-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24Revert "NFSD: Convert the callback workqueue to use delayed_work"Chuck Lever2-4/+4
This commit was a pre-requisite for commit c1ccfcf1a9bf ("NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down"), which has already been reverted. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-24Revert "NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down"Chuck Lever1-18/+2
The reverted commit attempted to enable NFSD to retransmit pending callback operations if an NFS client disconnects, but unintentionally introduces a hazardous behavior regression if the client becomes permanently unreachable while callback operations are still pending. A disconnect can occur due to network partition or if the NFS server needs to force the NFS client to retransmit (for example, if a GSS window under-run occurs). Reverting the commit will make NFSD behave the same as it did in v6.8 and before. Pending callback operations are permanently lost if the client connection is terminated before the client receives them. For some callback operations, this loss is not harmful. However, for CB_RECALL, the loss means a delegation might be revoked unnecessarily. For CB_OFFLOAD, pending COPY operations will never complete unless the NFS client subsequently sends an OFFLOAD_STATUS operation, which the Linux NFS client does not currently implement. These issues still need to be addressed somehow. Reported-by: Dai Ngo <dai.ngo@oracle.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=218735 Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-23Merge branch 'selftests-drv-net-support-testing-with-a-remote-system'Jakub Kicinski12-30/+417
Jakub Kicinski says: ==================== selftests: drv-net: support testing with a remote system Implement support for tests which require access to a remote system / endpoint which can generate traffic. This series concludes the "groundwork" for upstream driver tests. I wanted to support the three models which came up in discussions: - SW testing with netdevsim - "local" testing with two ports on the same system in a loopback - "remote" testing via SSH so there is a tiny bit of an abstraction which wraps up how "remote" commands are executed. Otherwise hopefully there's nothing surprising. I'm only adding a ping test. I had a bigger one written but I was worried we'll get into discussing the details of the test itself and how I chose to hack up netdevsim, instead of the test infra... So that test will be a follow up :) v4: https://lore.kernel.org/all/20240418233844.2762396-1-kuba@kernel.org v3: https://lore.kernel.org/all/20240417231146.2435572-1-kuba@kernel.org v2: https://lore.kernel.org/all/20240416004556.1618804-1-kuba@kernel.org v1: https://lore.kernel.org/all/20240412233705.1066444-1-kuba@kernel.org ==================== Link: https://lore.kernel.org/r/20240420025237.3309296-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23selftests: drv-net: add require_XYZ() helpers for validating envJakub Kicinski2-1/+34
Wrap typical checks like whether given command used by the test is available in helpers. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240420025237.3309296-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23selftests: drv-net: add a TCP ping test case (and useful helpers)Jakub Kicinski3-1/+68
More complex tests often have to spawn a background process, like a server which will respond to requests or tcpdump. Add support for creating such processes using the with keyword: with bkg("my-daemon", ..): # my-daemon is alive in this block My initial thought was to add this support to cmd() directly but it runs the command in the constructor, so by the time we __enter__ it's too late to make sure we used "background=True". Second useful helper transplanted from net_helper.sh is wait_port_listen(). The test itself uses socat, which insists on v6 addresses being wrapped in [], it's not the only command which requires this format, so add the wrapped address to env. The hope is to save test code from checking if address is v6. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240420025237.3309296-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23selftests: net: support matching cases by name prefixJakub Kicinski2-3/+13
While writing tests with a lot more cases I got tired of having to jump back and forth to add the name of the test to the ksft_run() list. Most unittest frameworks do some name matching, e.g. assume that functions with names starting with test_ are test cases. Support similar flow in ksft_run(). Let the author list the desired prefixes. globals() need to be passed explicitly, IDK how to work around that. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240420025237.3309296-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23selftests: drv-net: add a trivial ping testJakub Kicinski2-1/+31
Add a very simple test for testing with a remote system. Both IPv4 and IPv6 connectivity is optional, later change will add checks to skip tests based on available addresses. Using netdevsim: $ ./run_kselftest.sh -t drivers/net:ping.py TAP version 13 1..1 # timeout set to 45 # selftests: drivers/net: ping.py # KTAP version 1 # 1..2 # ok 1 ping.test_v4 # ok 2 ping.test_v6 # # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0 ok 1 selftests: drivers/net: ping.py Command line SSH: $ NETIF=virbr0 REMOTE_TYPE=ssh REMOTE_ARGS=root@192.168.122.123 \ LOCAL_V4=192.168.122.1 REMOTE_V4=192.168.122.123 \ ./tools/testing/selftests/drivers/net/ping.py KTAP version 1 1..2 ok 1 ping.test_v4 ok 2 ping.test_v6 # SKIP Test requires IPv6 connectivity # Totals: pass:1 fail:0 xfail:1 xpass:0 skip:0 error:0 Existing devices placed in netns (and using net.config): $ cat drivers/net/net.config NETIF=veth0 REMOTE_TYPE=netns REMOTE_ARGS=red LOCAL_V4="192.168.1.1" REMOTE_V4="192.168.1.2" $ ./run_kselftest.sh -t drivers/net:ping.py TAP version 13 1..1 # timeout set to 45 # selftests: drivers/net: ping.py # KTAP version 1 # 1..2 # ok 1 ping.test_v4 # ok 2 ping.test_v6 # SKIP Test requires IPv6 connectivity # # Totals: pass:1 fail:0 xfail:1 xpass:0 skip:0 error:0 Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240420025237.3309296-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23selftests: drv-net: construct environment for running tests which require an ↵Jakub Kicinski4-1/+162
endpoint Nothing surprising here, hopefully. Wrap the variables from the environment into a class or spawn a netdevsim based env and pass it to the tests. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240420025237.3309296-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23selftests: drv-net: factor out parsing of the envJakub Kicinski1-18/+27
The tests with a remote end will use a different class, for clarity, but will also need to parse the env. So factor parsing the env out to a function. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240420025237.3309296-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23selftests: drv-net: define endpoint structuresJakub Kicinski5-8/+85
Define the remote endpoint "model". To execute most meaningful device driver tests we need to be able to communicate with a remote system, and have it send traffic to the device under test. Various test environments will have different requirements. 0) "Local" netdevsim-based testing can simply use net namespaces. netdevsim supports connecting two devices now, to form a veth-like construct. 1) Similarly on hosts with multiple NICs, the NICs may be connected together with a loopback cable or internal device loopback. One interface may be placed into separate netns, and tests would proceed much like in the netdevsim case. Note that the loopback config or the moving of one interface into a netns is not expected to be part of selftest code. 2) Some systems may need to communicate with the remote endpoint via SSH. 3) Last but not least environment may have its own custom communication method. Fundamentally we only need two operations: - run a command remotely - deploy a binary (if some tool we need is built as part of kselftests) Wrap these two in a class. Use dynamic loading to load the Remote class. This will allow very easy definition of other communication methods without bothering upstream code base. Stick to the "simple" / "no unnecessary abstractions" model for referring to the remote endpoints. The host / remote object are passed as an argument to the usual cmd() or ip() invocation. For example: ip("link show", json=True, host=remote) Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240420025237.3309296-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23Merge branch 'netdev-support-dumping-a-single-netdev-in-qstats'Jakub Kicinski6-81/+188
Jakub Kicinski says: ==================== netdev: support dumping a single netdev in qstats I was writing a test for page pool which depended on qstats, and got tired of having to filter dumps in user space. Add support for dumping stats for a single netdev. To get there we first need to add full support for extack in dumps (and fix a dump error handling bug in YNL, sent separately to the net tree). ==================== Link: https://lore.kernel.org/r/20240420023543.3300306-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23selftests: drv-net: test dumping qstats per deviceJakub Kicinski2-3/+77
Add a test for dumping qstats device by device. ksft framework grows a ksft_raises() helper, to be used under with, which should be familiar to unittest users. Link: https://lore.kernel.org/r/20240420023543.3300306-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23netlink: support all extack types in dumpsJakub Kicinski1-5/+10
Note that when this commit message refers to netlink dump it only means the actual dumping part, the parsing / dump start is handled by the same code as "doit". Commit 4a19edb60d02 ("netlink: Pass extack to dump handlers") added support for returning extack messages from dump handlers, but left out other extack info, e.g. bad attribute. This used to be fine because until YNL we had little practical use for the machine readable attributes, and only messages were used in practice. YNL flips the preference 180 degrees, it's now much more useful to point to a bad attr with NL_SET_BAD_ATTR() than type an English message saying "attribute XYZ is $reason-why-bad". Support all of extack. The fact that extack only gets added if it fits remains unaddressed. Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240420023543.3300306-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23netlink: move extack writing helpersJakub Kicinski1-63/+63
Next change will need them in netlink_dump_done(), pure move. Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240420023543.3300306-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23netdev: support dumping a single netdev in qstatsJakub Kicinski3-13/+41
Having to filter the right ifindex in the tests is a bit tedious. Add support for dumping qstats for a single ifindex. Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240420023543.3300306-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23Merge tag '6.9-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds13-26/+176
Pull smb client fixes from Steve French: - fscache fix - fix for case where we could use uninitialized lease - add tracepoint for debugging refcounting of tcon - fix mount option regression (e.g. forceuid vs. noforceuid when uid= specified) caused by conversion to the new mount API * tag '6.9-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: reinstate original behavior again for forceuid/forcegid smb: client: fix rename(2) regression against samba cifs: Add tracing for the cifs_tcon struct refcounting cifs: Fix reacquisition of volume cookie on still-live connection
2024-04-23tools: ynl: don't ignore errors in NLMSG_DONE messagesJakub Kicinski1-0/+1
NLMSG_DONE contains an error code, it has to be extracted. Prior to this change all dumps will end in success, and in case of failure the result is silently truncated. Fixes: e4b48ed460d3 ("tools: ynl: add a completely generic client") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240420020827.3288615-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23af_unix: Don't access successor in unix_del_edges() during GC.Kuniyuki Iwashima1-5/+12
syzbot reported use-after-free in unix_del_edges(). [0] What the repro does is basically repeat the following quickly. 1. pass a fd of an AF_UNIX socket to itself socketpair(AF_UNIX, SOCK_DGRAM, 0, [3, 4]) = 0 sendmsg(3, {..., msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[4]}], ...}, 0) = 0 2. pass other fds of AF_UNIX sockets to the socket above socketpair(AF_UNIX, SOCK_SEQPACKET, 0, [5, 6]) = 0 sendmsg(3, {..., msg_control=[{cmsg_len=48, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[5, 6]}], ...}, 0) = 0 3. close all sockets Here, two skb are created, and every unix_edge->successor is the first socket. Then, __unix_gc() will garbage-collect the two skb: (a) free skb with self-referencing fd (b) free skb holding other sockets After (a), the self-referencing socket will be scheduled to be freed later by the delayed_fput() task. syzbot repeated the sequences above (1. ~ 3.) quickly and triggered the task concurrently while GC was running. So, at (b), the socket was already freed, and accessing it was illegal. unix_del_edges() accesses the receiver socket as edge->successor to optimise GC. However, we should not do it during GC. Garbage-collecting sockets does not change the shape of the rest of the graph, so we need not call unix_update_graph() to update unix_graph_grouped when we purge skb. However, if we clean up all loops in the unix_walk_scc_fast() path, unix_graph_maybe_cyclic remains unchanged (true), and __unix_gc() will call unix_walk_scc_fast() continuously even though there is no socket to garbage-collect. To keep that optimisation while fixing UAF, let's add the same updating logic of unix_graph_maybe_cyclic in unix_walk_scc_fast() as done in unix_walk_scc() and __unix_walk_scc(). Note that when unix_del_edges() is called from other places, the receiver socket is always alive: - sendmsg: the successor's sk_refcnt is bumped by sock_hold() unix_find_other() for SOCK_DGRAM, connect() for SOCK_STREAM - recvmsg: the successor is the receiver, and its fd is alive [0]: BUG: KASAN: slab-use-after-free in unix_edge_successor net/unix/garbage.c:109 [inline] BUG: KASAN: slab-use-after-free in unix_del_edge net/unix/garbage.c:165 [inline] BUG: KASAN: slab-use-after-free in unix_del_edges+0x148/0x630 net/unix/garbage.c:237 Read of size 8 at addr ffff888079c6e640 by task kworker/u8:6/1099 CPU: 0 PID: 1099 Comm: kworker/u8:6 Not tainted 6.9.0-rc4-next-20240418-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Workqueue: events_unbound __unix_gc Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 unix_edge_successor net/unix/garbage.c:109 [inline] unix_del_edge net/unix/garbage.c:165 [inline] unix_del_edges+0x148/0x630 net/unix/garbage.c:237 unix_destroy_fpl+0x59/0x210 net/unix/garbage.c:298 unix_detach_fds net/unix/af_unix.c:1811 [inline] unix_destruct_scm+0x13e/0x210 net/unix/af_unix.c:1826 skb_release_head_state+0x100/0x250 net/core/skbuff.c:1127 skb_release_all net/core/skbuff.c:1138 [inline] __kfree_skb net/core/skbuff.c:1154 [inline] kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1190 __skb_queue_purge_reason include/linux/skbuff.h:3251 [inline] __skb_queue_purge include/linux/skbuff.h:3256 [inline] __unix_gc+0x1732/0x1830 net/unix/garbage.c:575 process_one_work kernel/workqueue.c:3218 [inline] process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299 worker_thread+0x86d/0xd70 kernel/workqueue.c:3380 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Allocated by task 14427: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 unpoison_slab_object mm/kasan/common.c:312 [inline] __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:338 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slub.c:3897 [inline] slab_alloc_node mm/slub.c:3957 [inline] kmem_cache_alloc_noprof+0x135/0x290 mm/slub.c:3964 sk_prot_alloc+0x58/0x210 net/core/sock.c:2074 sk_alloc+0x38/0x370 net/core/sock.c:2133 unix_create1+0xb4/0x770 unix_create+0x14e/0x200 net/unix/af_unix.c:1034 __sock_create+0x490/0x920 net/socket.c:1571 sock_create net/socket.c:1622 [inline] __sys_socketpair+0x33e/0x720 net/socket.c:1773 __do_sys_socketpair net/socket.c:1822 [inline] __se_sys_socketpair net/socket.c:1819 [inline] __x64_sys_socketpair+0x9b/0xb0 net/socket.c:1819 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 1805: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579 poison_slab_object+0xe0/0x150 mm/kasan/common.c:240 __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256 kasan_slab_free include/linux/kasan.h:184 [inline] slab_free_hook mm/slub.c:2190 [inline] slab_free mm/slub.c:4393 [inline] kmem_cache_free+0x145/0x340 mm/slub.c:4468 sk_prot_free net/core/sock.c:2114 [inline] __sk_destruct+0x467/0x5f0 net/core/sock.c:2208 sock_put include/net/sock.h:1948 [inline] unix_release_sock+0xa8b/0xd20 net/unix/af_unix.c:665 unix_release+0x91/0xc0 net/unix/af_unix.c:1049 __sock_release net/socket.c:659 [inline] sock_close+0xbc/0x240 net/socket.c:1421 __fput+0x406/0x8b0 fs/file_table.c:422 delayed_fput+0x59/0x80 fs/file_table.c:445 process_one_work kernel/workqueue.c:3218 [inline] process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299 worker_thread+0x86d/0xd70 kernel/workqueue.c:3380 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 The buggy address belongs to the object at ffff888079c6e000 which belongs to the cache UNIX of size 1920 The buggy address is located 1600 bytes inside of freed 1920-byte region [ffff888079c6e000, ffff888079c6e780) Reported-by: syzbot+f3f3eef1d2100200e593@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f3f3eef1d2100200e593 Fixes: 77e5593aebba ("af_unix: Skip GC if no cycle exists.") Fixes: fd86344823b5 ("af_unix: Try not to hold unix_gc_lock during accept().") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240419235102.31707-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23Merge branch 'net-ipa-eight-simple-cleanups'Paolo Abeni15-167/+51
Alex Elder says: ==================== net: ipa: eight simple cleanups This series contains a mix of cleanups, some dating back to December, 2022. Version 1 was based on an older version of net-next/main; this version has simply been rebased. The first two make it so the IPA SUSPEND interrupt only gets enabled when necessary. That make it possible in the third patch to call device_init_wakeup() during an earlier phase of initialization, and remove two functions. The next patch removes IPA register definitions that are never used. The fifth patch makes ipa_table_hash_support() a real function, so the IPA structure only needs to be declared rather than defined when that file is parsed. The sixth patch fixes improper argument names in two function declarations. The seventh removes the declaration for a function that does not exist, and makes ipa_cmd_init() actually get called. And the last one eliminates ipa_version_supported(), in favor of just deciding that if a device is probed because its compatible matches, that device is assumed to be supported. ==================== Link: https://lore.kernel.org/r/20240419151800.2168903-1-elder@linaro.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ipa: kill ipa_version_supported()Alex Elder2-23/+0
The only place ipa_version_supported() is called is in the probe function. The version comes from the match data. Rather than checking the version validity separately, just consider anything that has match data to be supported. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ipa: fix two minor ipa_cmd problemsAlex Elder2-8/+4
In "ipa_cmd.h", ipa_cmd_data_valid() is declared, but that function does not exist. So delete that declaration. Also, for some reason ipa_cmd_init() never gets called. It isn't really critical--it just validates that some memory offsets and a size can be represented in some register fields, and they won't fail with current data. Regardless, call the function in ipa_probe(). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ipa: fix two bogus argument namesAlex Elder1-3/+3
In "ipa_endpoint.h", two function declarations have bogus argument names. Fix these. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ipa: make ipa_table_hash_support() a real functionAlex Elder2-6/+9
With the exception of ipa_table_hash_support(), nothing defined in "ipa_table.h" requires the full definition of the IPA structure. Change that function to be a "real" function rather than an inline, to avoid requring the IPA structure to be defined. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ipa: remove unneeded FILT_ROUT_HASH_EN definitionsAlex Elder6-84/+0
The FILT_ROUT_HASH_EN register is only used for IPA v4.2. There, routing and filter table hashing are not supported, and so the register must be written to disable the feature. No other version uses this register, so its definition can be removed. If we need to use these some day (for example, explicitly enable the feature) this commit can be reverted. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ipa: call device_init_wakeup() earlierAlex Elder4-32/+10
Currently, enabling wakeup for the IPA device doesn't occur until the setup phase of initialization (in ipa_power_setup()). There is no need to delay doing that, however. We can conveniently do it during the config phase, in ipa_interrupt_config(), where we enable power management wakeup mode for the IPA interrupt. Moving the device_init_wakeup() out of ipa_power_setup() leaves that function empty, so it can just be eliminated. Similarly, rearrange all of the matching inverse calls, disabling device wakeup in ipa_interrupt_deconfig() and removing that function as well. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>