summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-04-23netdev: support dumping a single netdev in qstatsJakub Kicinski3-13/+41
Having to filter the right ifindex in the tests is a bit tedious. Add support for dumping qstats for a single ifindex. Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240420023543.3300306-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23Merge tag '6.9-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds13-26/+176
Pull smb client fixes from Steve French: - fscache fix - fix for case where we could use uninitialized lease - add tracepoint for debugging refcounting of tcon - fix mount option regression (e.g. forceuid vs. noforceuid when uid= specified) caused by conversion to the new mount API * tag '6.9-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: reinstate original behavior again for forceuid/forcegid smb: client: fix rename(2) regression against samba cifs: Add tracing for the cifs_tcon struct refcounting cifs: Fix reacquisition of volume cookie on still-live connection
2024-04-23workqueue: Fix selection of wake_cpu in kick_pool()Sven Schnelle1-2/+6
With cpu_possible_mask=0-63 and cpu_online_mask=0-7 the following kernel oops was observed: smp: Bringing up secondary CPUs ... smp: Brought up 1 node, 8 CPUs Unable to handle kernel pointer dereference in virtual kernel address space Failing address: 0000000000000000 TEID: 0000000000000803 [..] Call Trace: arch_vcpu_is_preempted+0x12/0x80 select_idle_sibling+0x42/0x560 select_task_rq_fair+0x29a/0x3b0 try_to_wake_up+0x38e/0x6e0 kick_pool+0xa4/0x198 __queue_work.part.0+0x2bc/0x3a8 call_timer_fn+0x36/0x160 __run_timers+0x1e2/0x328 __run_timer_base+0x5a/0x88 run_timer_softirq+0x40/0x78 __do_softirq+0x118/0x388 irq_exit_rcu+0xc0/0xd8 do_ext_irq+0xae/0x168 ext_int_handler+0xbe/0xf0 psw_idle_exit+0x0/0xc default_idle_call+0x3c/0x110 do_idle+0xd4/0x158 cpu_startup_entry+0x40/0x48 rest_init+0xc6/0xc8 start_kernel+0x3c4/0x5e0 startup_continue+0x3c/0x50 The crash is caused by calling arch_vcpu_is_preempted() for an offline CPU. To avoid this, select the cpu with cpumask_any_and_distribute() to mask __pod_cpumask with cpu_online_mask. In case no cpu is left in the pool, skip the assignment. tj: This doesn't fully fix the bug as CPUs can still go down between picking the target CPU and the wake call. Fixing that likely requires adding cpu_online() test to either the sched or s390 arch code. However, regardless of how that is fixed, workqueue shouldn't be picking a CPU which isn't online as that would result in unpredictable and worse behavior. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Fixes: 8639ecebc9b1 ("workqueue: Implement non-strict affinity scope for unbound workqueues") Cc: stable@vger.kernel.org # v6.6+ Signed-off-by: Tejun Heo <tj@kernel.org>
2024-04-23riscv: hwprobe: fix invalid sign extension for RISCV_HWPROBE_EXT_ZVFHMINClément Léger1-1/+1
The current definition yields a negative 32bits signed value which result in a mask with is obviously incorrect. Replace it by using a 1ULL bit shift value to obtain a single set bit mask. Fixes: 5dadda5e6a59 ("riscv: hwprobe: export Zvfh[min] ISA extensions") Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240409143839.558784-1-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-23tools: ynl: don't ignore errors in NLMSG_DONE messagesJakub Kicinski1-0/+1
NLMSG_DONE contains an error code, it has to be extracted. Prior to this change all dumps will end in success, and in case of failure the result is silently truncated. Fixes: e4b48ed460d3 ("tools: ynl: add a completely generic client") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240420020827.3288615-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23af_unix: Don't access successor in unix_del_edges() during GC.Kuniyuki Iwashima1-5/+12
syzbot reported use-after-free in unix_del_edges(). [0] What the repro does is basically repeat the following quickly. 1. pass a fd of an AF_UNIX socket to itself socketpair(AF_UNIX, SOCK_DGRAM, 0, [3, 4]) = 0 sendmsg(3, {..., msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[4]}], ...}, 0) = 0 2. pass other fds of AF_UNIX sockets to the socket above socketpair(AF_UNIX, SOCK_SEQPACKET, 0, [5, 6]) = 0 sendmsg(3, {..., msg_control=[{cmsg_len=48, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[5, 6]}], ...}, 0) = 0 3. close all sockets Here, two skb are created, and every unix_edge->successor is the first socket. Then, __unix_gc() will garbage-collect the two skb: (a) free skb with self-referencing fd (b) free skb holding other sockets After (a), the self-referencing socket will be scheduled to be freed later by the delayed_fput() task. syzbot repeated the sequences above (1. ~ 3.) quickly and triggered the task concurrently while GC was running. So, at (b), the socket was already freed, and accessing it was illegal. unix_del_edges() accesses the receiver socket as edge->successor to optimise GC. However, we should not do it during GC. Garbage-collecting sockets does not change the shape of the rest of the graph, so we need not call unix_update_graph() to update unix_graph_grouped when we purge skb. However, if we clean up all loops in the unix_walk_scc_fast() path, unix_graph_maybe_cyclic remains unchanged (true), and __unix_gc() will call unix_walk_scc_fast() continuously even though there is no socket to garbage-collect. To keep that optimisation while fixing UAF, let's add the same updating logic of unix_graph_maybe_cyclic in unix_walk_scc_fast() as done in unix_walk_scc() and __unix_walk_scc(). Note that when unix_del_edges() is called from other places, the receiver socket is always alive: - sendmsg: the successor's sk_refcnt is bumped by sock_hold() unix_find_other() for SOCK_DGRAM, connect() for SOCK_STREAM - recvmsg: the successor is the receiver, and its fd is alive [0]: BUG: KASAN: slab-use-after-free in unix_edge_successor net/unix/garbage.c:109 [inline] BUG: KASAN: slab-use-after-free in unix_del_edge net/unix/garbage.c:165 [inline] BUG: KASAN: slab-use-after-free in unix_del_edges+0x148/0x630 net/unix/garbage.c:237 Read of size 8 at addr ffff888079c6e640 by task kworker/u8:6/1099 CPU: 0 PID: 1099 Comm: kworker/u8:6 Not tainted 6.9.0-rc4-next-20240418-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Workqueue: events_unbound __unix_gc Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 unix_edge_successor net/unix/garbage.c:109 [inline] unix_del_edge net/unix/garbage.c:165 [inline] unix_del_edges+0x148/0x630 net/unix/garbage.c:237 unix_destroy_fpl+0x59/0x210 net/unix/garbage.c:298 unix_detach_fds net/unix/af_unix.c:1811 [inline] unix_destruct_scm+0x13e/0x210 net/unix/af_unix.c:1826 skb_release_head_state+0x100/0x250 net/core/skbuff.c:1127 skb_release_all net/core/skbuff.c:1138 [inline] __kfree_skb net/core/skbuff.c:1154 [inline] kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1190 __skb_queue_purge_reason include/linux/skbuff.h:3251 [inline] __skb_queue_purge include/linux/skbuff.h:3256 [inline] __unix_gc+0x1732/0x1830 net/unix/garbage.c:575 process_one_work kernel/workqueue.c:3218 [inline] process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299 worker_thread+0x86d/0xd70 kernel/workqueue.c:3380 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Allocated by task 14427: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 unpoison_slab_object mm/kasan/common.c:312 [inline] __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:338 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slub.c:3897 [inline] slab_alloc_node mm/slub.c:3957 [inline] kmem_cache_alloc_noprof+0x135/0x290 mm/slub.c:3964 sk_prot_alloc+0x58/0x210 net/core/sock.c:2074 sk_alloc+0x38/0x370 net/core/sock.c:2133 unix_create1+0xb4/0x770 unix_create+0x14e/0x200 net/unix/af_unix.c:1034 __sock_create+0x490/0x920 net/socket.c:1571 sock_create net/socket.c:1622 [inline] __sys_socketpair+0x33e/0x720 net/socket.c:1773 __do_sys_socketpair net/socket.c:1822 [inline] __se_sys_socketpair net/socket.c:1819 [inline] __x64_sys_socketpair+0x9b/0xb0 net/socket.c:1819 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 1805: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579 poison_slab_object+0xe0/0x150 mm/kasan/common.c:240 __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256 kasan_slab_free include/linux/kasan.h:184 [inline] slab_free_hook mm/slub.c:2190 [inline] slab_free mm/slub.c:4393 [inline] kmem_cache_free+0x145/0x340 mm/slub.c:4468 sk_prot_free net/core/sock.c:2114 [inline] __sk_destruct+0x467/0x5f0 net/core/sock.c:2208 sock_put include/net/sock.h:1948 [inline] unix_release_sock+0xa8b/0xd20 net/unix/af_unix.c:665 unix_release+0x91/0xc0 net/unix/af_unix.c:1049 __sock_release net/socket.c:659 [inline] sock_close+0xbc/0x240 net/socket.c:1421 __fput+0x406/0x8b0 fs/file_table.c:422 delayed_fput+0x59/0x80 fs/file_table.c:445 process_one_work kernel/workqueue.c:3218 [inline] process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299 worker_thread+0x86d/0xd70 kernel/workqueue.c:3380 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 The buggy address belongs to the object at ffff888079c6e000 which belongs to the cache UNIX of size 1920 The buggy address is located 1600 bytes inside of freed 1920-byte region [ffff888079c6e000, ffff888079c6e780) Reported-by: syzbot+f3f3eef1d2100200e593@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f3f3eef1d2100200e593 Fixes: 77e5593aebba ("af_unix: Skip GC if no cycle exists.") Fixes: fd86344823b5 ("af_unix: Try not to hold unix_gc_lock during accept().") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240419235102.31707-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23Merge branch 'net-ipa-eight-simple-cleanups'Paolo Abeni15-167/+51
Alex Elder says: ==================== net: ipa: eight simple cleanups This series contains a mix of cleanups, some dating back to December, 2022. Version 1 was based on an older version of net-next/main; this version has simply been rebased. The first two make it so the IPA SUSPEND interrupt only gets enabled when necessary. That make it possible in the third patch to call device_init_wakeup() during an earlier phase of initialization, and remove two functions. The next patch removes IPA register definitions that are never used. The fifth patch makes ipa_table_hash_support() a real function, so the IPA structure only needs to be declared rather than defined when that file is parsed. The sixth patch fixes improper argument names in two function declarations. The seventh removes the declaration for a function that does not exist, and makes ipa_cmd_init() actually get called. And the last one eliminates ipa_version_supported(), in favor of just deciding that if a device is probed because its compatible matches, that device is assumed to be supported. ==================== Link: https://lore.kernel.org/r/20240419151800.2168903-1-elder@linaro.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ipa: kill ipa_version_supported()Alex Elder2-23/+0
The only place ipa_version_supported() is called is in the probe function. The version comes from the match data. Rather than checking the version validity separately, just consider anything that has match data to be supported. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ipa: fix two minor ipa_cmd problemsAlex Elder2-8/+4
In "ipa_cmd.h", ipa_cmd_data_valid() is declared, but that function does not exist. So delete that declaration. Also, for some reason ipa_cmd_init() never gets called. It isn't really critical--it just validates that some memory offsets and a size can be represented in some register fields, and they won't fail with current data. Regardless, call the function in ipa_probe(). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ipa: fix two bogus argument namesAlex Elder1-3/+3
In "ipa_endpoint.h", two function declarations have bogus argument names. Fix these. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ipa: make ipa_table_hash_support() a real functionAlex Elder2-6/+9
With the exception of ipa_table_hash_support(), nothing defined in "ipa_table.h" requires the full definition of the IPA structure. Change that function to be a "real" function rather than an inline, to avoid requring the IPA structure to be defined. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ipa: remove unneeded FILT_ROUT_HASH_EN definitionsAlex Elder6-84/+0
The FILT_ROUT_HASH_EN register is only used for IPA v4.2. There, routing and filter table hashing are not supported, and so the register must be written to disable the feature. No other version uses this register, so its definition can be removed. If we need to use these some day (for example, explicitly enable the feature) this commit can be reverted. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ipa: call device_init_wakeup() earlierAlex Elder4-32/+10
Currently, enabling wakeup for the IPA device doesn't occur until the setup phase of initialization (in ipa_power_setup()). There is no need to delay doing that, however. We can conveniently do it during the config phase, in ipa_interrupt_config(), where we enable power management wakeup mode for the IPA interrupt. Moving the device_init_wakeup() out of ipa_power_setup() leaves that function empty, so it can just be eliminated. Similarly, rearrange all of the matching inverse calls, disabling device wakeup in ipa_interrupt_deconfig() and removing that function as well. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ipa: only enable the SUSPEND IPA interrupt when neededAlex Elder2-10/+9
Only enable the SUSPEND IPA interrupt type when at least one endpoint has that interrupt enabled. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ipa: maintain bitmap of suspend-enabled endpointsAlex Elder1-2/+17
Keep track of which endpoints have the SUSPEND IPA interrupt enabled in a variable-length bitmap. This will be used in the next patch to allow the SUSPEND interrupt type to be disabled except when needed. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23Merge branch 'net-stmmac-fix-mac-capabilities-procedure'Paolo Abeni3-27/+25
Serge Semin says: ==================== net: stmmac: Fix MAC-capabilities procedure The series got born as a result of the discussions around the recent Yanteng' series adding the Loongson LS7A1000, LS2K1000, LS7A2000, LS2K2000 MACs support: Link: https://lore.kernel.org/netdev/fu3f6uoakylnb6eijllakeu5i4okcyqq7sfafhp5efaocbsrwe@w74xe7gb6x7p In particular the Yanteng' patchset needed to implement the Loongson MAC-specific constraints applied to the link speed and link duplex mode. As a result of the discussion with Russel the next preliminary patch was born: Link: https://lore.kernel.org/netdev/df31e8bcf74b3b4ddb7ddf5a1c371390f16a2ad5.1712917541.git.siyanteng@loongson.cn The patch above was a temporal solution utilized by Yanteng for further developments and to move on with the on-going review. This patchset is a refactored version of that single patch with formatting required for the fixes patches. The main part of the series has already been merged in on v1 stage. The leftover is the cleanup patches which rename stmmac_ops::phylink_get_caps() callback to stmmac_ops::update_caps() and move the MAC-capabilities init/re-init to the phylink MAC-capabilities getter. Link: https://lore.kernel.org/netdev/20240412180340.7965-1-fancer.lancer@gmail.com/ Changelog v2: - Add a new patch (Romain): [PATCH net-next v2 1/2] net: stmmac: Rename phylink_get_caps() callback to update_caps() - Resubmit the leftover patches to net-next tree (Paolo). Link: https://lore.kernel.org/netdev/20240417140013.12575-1-fancer.lancer@gmail.com/ Changelog v3: - Just resubmit (Jakub). Signed-off-by: Serge Semin <fancer.lancer@gmail.com> ==================== Link: https://lore.kernel.org/r/20240419090357.5547-1-fancer.lancer@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: stmmac: Move MAC caps init to phylink MAC caps getterSerge Semin1-19/+17
After a set of recent fixes the stmmac_phy_setup() and stmmac_reinit_queues() methods have turned to having some duplicated code. Let's get rid from the duplication by moving the MAC-capabilities initialization to the PHYLINK MAC-capabilities getter. The getter is called during each network device interface open/close cycle. So the MAC-capabilities will be initialized in generic device open procedure and in case of the Tx/Rx queues re-initialization as the original code semantics implies. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Reviewed-by: Romain Gantois <romain.gantois@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: stmmac: Rename phylink_get_caps() callback to update_caps()Serge Semin3-11/+11
Since recent commits the stmmac_ops::phylink_get_caps() callback has no longer been responsible for the phylink MAC capabilities getting, but merely updates the MAC capabilities in the mac_device_info::link::caps field. Rename the callback to comply with the what the method does now. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Reviewed-by: Romain Gantois <romain.gantois@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23soc: mediatek: mtk-socinfo: depends on CONFIG_SOC_BUSDaniel Golle1-0/+1
The mtk-socinfo driver uses symbols 'soc_device_register' and 'soc_device_unregister' which are part of the bus driver for System-on-Chip devices. Select SOC_BUS to make sure that driver is built and the symbols are available. Fixes: 423a54da3c7e ("soc: mediatek: mtk-socinfo: Add driver for getting chip information") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/cc8f7f7da5bdccce514a320e0ae7468659cf7346.1707327680.git.daniel@makrotopia.org Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2024-04-23soc: mediatek: mtk-svs: Append "-thermal" to thermal zone namesAngeloGioacchino Del Regno1-2/+5
The thermal framework registers thermal zones as specified in DT and including the "-thermal" suffix: append that to the driver specified tzone_name to actually match the thermal zone name as registered by the thermal API. Fixes: 2bfbf82956e2 ("soc: mediatek: mtk-svs: Constify runtime-immutable members of svs_bank") Link: https://lore.kernel.org/r/20240318113237.125802-1-angelogioacchino.delregno@collabora.com Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2024-04-23Merge branch 'enable-rx-hw-timestamp-for-ptp-packets-using-cpts-fifo'Paolo Abeni4-64/+118
Chintan Vankar says: ==================== Enable RX HW timestamp for PTP packets using CPTS FIFO The CPSW offers two mechanisms for communicating packet ingress timestamp information to the host. The first mechanism is via the CPTS Event FIFO which records timestamp when triggered by certain events. One such event is the reception of an Ethernet packet with a specified EtherType field. This is used to capture ingress timestamps for PTP packets. With this mechanism the host must read the timestamp (from the CPTS FIFO) separately from the packet payload which is delivered via DMA. In the second mechanism of timestamping, CPSW driver enables hardware timestamping for all received packets by setting the TSTAMP_EN bit in CPTS_CONTROL register, which directs the CPTS module to timestamp all received packets, followed by passing timestamp via DMA descriptors. This mechanism is responsible for triggering errata i2401: "CPSW: Host Timestamps Cause CPSW Port to Lock up." The errata affects all K3 SoCs. Link to errata for AM64x: https://www.ti.com/lit/er/sprz457h/sprz457h.pdf As a workaround we can use first mechanism to timestamp received packets. ==================== Link: https://lore.kernel.org/r/20240419082626.57225-1-c-vankar@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ethernet: ti: am65-cpsw/ethtool: Enable RX HW timestamp only for PTP ↵Chintan Vankar4-57/+35
packets In the current mechanism of timestamping, am65-cpsw-nuss driver enables hardware timestamping for all received packets by setting the TSTAMP_EN bit in CPTS_CONTROL register, which directs the CPTS module to timestamp all received packets, followed by passing timestamp via DMA descriptors. This mechanism causes CPSW Port to Lock up. To prevent port lock up, don't enable rx packet timestamping by setting TSTAMP_EN bit in CPTS_CONTROL register. The workaround for timestamping received packets is to utilize the CPTS Event FIFO that records timestamps corresponding to certain events. The CPTS module is configured to generate timestamps for Multicast Ethernet, UDP/IPv4 and UDP/IPv6 PTP packets. Update supported hwtstamp_rx_filters values for CPSW's timestamping capability. Fixes: b1f66a5bee07 ("net: ethernet: ti: am65-cpsw-nuss: enable packet timestamping support") Signed-off-by: Chintan Vankar <c-vankar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ethernet: ti: am65-cpts: Enable RX HW timestamp for PTP packets using ↵Chintan Vankar2-7/+83
CPTS FIFO Add a new function "am65_cpts_rx_timestamp()" which checks for PTP packets from header and timestamps them. Add another function "am65_cpts_find_rx_ts()" which finds CPTS FIFO Event to get the timestamp of received PTP packet. Signed-off-by: Chintan Vankar <c-vankar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23ax25: Fix netdev refcount issueDuoming Zhou1-1/+1
The dev_tracker is added to ax25_cb in ax25_bind(). When the ax25 device is detaching, the dev_tracker of ax25_cb should be deallocated in ax25_kill_by_device() instead of the dev_tracker of ax25_dev. The log reported by ref_tracker is shown below: [ 80.884935] ref_tracker: reference already released. [ 80.885150] ref_tracker: allocated in: [ 80.885349] ax25_dev_device_up+0x105/0x540 [ 80.885730] ax25_device_event+0xa4/0x420 [ 80.885730] notifier_call_chain+0xc9/0x1e0 [ 80.885730] __dev_notify_flags+0x138/0x280 [ 80.885730] dev_change_flags+0xd7/0x180 [ 80.885730] dev_ifsioc+0x6a9/0xa30 [ 80.885730] dev_ioctl+0x4d8/0xd90 [ 80.885730] sock_do_ioctl+0x1c2/0x2d0 [ 80.885730] sock_ioctl+0x38b/0x4f0 [ 80.885730] __se_sys_ioctl+0xad/0xf0 [ 80.885730] do_syscall_64+0xc4/0x1b0 [ 80.885730] entry_SYSCALL_64_after_hwframe+0x67/0x6f [ 80.885730] ref_tracker: freed in: [ 80.885730] ax25_device_event+0x272/0x420 [ 80.885730] notifier_call_chain+0xc9/0x1e0 [ 80.885730] dev_close_many+0x272/0x370 [ 80.885730] unregister_netdevice_many_notify+0x3b5/0x1180 [ 80.885730] unregister_netdev+0xcf/0x120 [ 80.885730] sixpack_close+0x11f/0x1b0 [ 80.885730] tty_ldisc_kill+0xcb/0x190 [ 80.885730] tty_ldisc_hangup+0x338/0x3d0 [ 80.885730] __tty_hangup+0x504/0x740 [ 80.885730] tty_release+0x46e/0xd80 [ 80.885730] __fput+0x37f/0x770 [ 80.885730] __x64_sys_close+0x7b/0xb0 [ 80.885730] do_syscall_64+0xc4/0x1b0 [ 80.885730] entry_SYSCALL_64_after_hwframe+0x67/0x6f [ 80.893739] ------------[ cut here ]------------ [ 80.894030] WARNING: CPU: 2 PID: 140 at lib/ref_tracker.c:255 ref_tracker_free+0x47b/0x6b0 [ 80.894297] Modules linked in: [ 80.894929] CPU: 2 PID: 140 Comm: ax25_conn_rel_6 Not tainted 6.9.0-rc4-g8cd26fd90c1a #11 [ 80.895190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qem4 [ 80.895514] RIP: 0010:ref_tracker_free+0x47b/0x6b0 [ 80.895808] Code: 83 c5 18 4c 89 eb 48 c1 eb 03 8a 04 13 84 c0 0f 85 df 01 00 00 41 83 7d 00 00 75 4b 4c 89 ff 9 [ 80.896171] RSP: 0018:ffff888009edf8c0 EFLAGS: 00000286 [ 80.896339] RAX: 1ffff1100141ac00 RBX: 1ffff1100149463b RCX: dffffc0000000000 [ 80.896502] RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffff88800a0d6518 [ 80.896925] RBP: ffff888009edf9b0 R08: ffff88806d3288d3 R09: 1ffff1100da6511a [ 80.897212] R10: dffffc0000000000 R11: ffffed100da6511b R12: ffff88800a4a31d4 [ 80.897859] R13: ffff88800a4a31d8 R14: dffffc0000000000 R15: ffff88800a0d6518 [ 80.898279] FS: 00007fd88b7fe700(0000) GS:ffff88806d300000(0000) knlGS:0000000000000000 [ 80.899436] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 80.900181] CR2: 00007fd88c001d48 CR3: 000000000993e000 CR4: 00000000000006f0 ... [ 80.935774] ref_tracker: sp%d@000000000bb9df3d has 1/1 users at [ 80.935774] ax25_bind+0x424/0x4e0 [ 80.935774] __sys_bind+0x1d9/0x270 [ 80.935774] __x64_sys_bind+0x75/0x80 [ 80.935774] do_syscall_64+0xc4/0x1b0 [ 80.935774] entry_SYSCALL_64_after_hwframe+0x67/0x6f Change ax25_dev->dev_tracker to the dev_tracker of ax25_cb in order to mitigate the bug. Fixes: feef318c855a ("ax25: fix UAF bugs of net_device caused by rebinding operation") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Link: https://lore.kernel.org/r/20240419020456.29826-1-duoming@zju.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23Merge branch ↵Paolo Abeni3-84/+75
'read-phy-address-of-switch-from-device-tree-on-mt7530-dsa-subdriver' Arınç ÜNAL says: ==================== Read PHY address of switch from device tree on MT7530 DSA subdriver This patch series makes the driver read the PHY address the switch listens on from the device tree which, in result, brings support for MT7530 switches listening on a different PHY address than 31. And the patch series simplifies the core operations. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> ==================== Link: https://lore.kernel.org/r/20240418-b4-for-netnext-mt7530-phy-addr-from-dt-and-simplify-core-ops-v3-0-3b5fb249b004@arinc9.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: dsa: mt7530: simplify core operationsArınç ÜNAL1-65/+43
The core_rmw() function calls core_read_mmd_indirect() to read the requested register, and then calls core_write_mmd_indirect() to write the requested value to the register. Because Clause 22 is used to access Clause 45 registers, some operations on core_write_mmd_indirect() are unnecessarily run. Get rid of core_read_mmd_indirect() and core_write_mmd_indirect(), and run only the necessary operations on core_write() and core_rmw(). Reviewed-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: dsa: mt7530-mdio: read PHY address of switch from device treeArınç ÜNAL3-28/+41
Read the PHY address the switch listens on from the reg property of the switch node on the device tree. This change brings support for MT7530 switches on boards with such bootstrapping configuration where the switch listens on a different PHY address than the hardcoded PHY address on the driver, 31. As described on the "MT7621 Programming Guide v0.4" document, the MT7530 switch and its PHYs can be configured to listen on the range of 7-12, 15-20, 23-28, and 31 and 0-4 PHY addresses. There are operations where the switch PHY registers are used. For the PHY address of the control PHY, transform the MT753X_CTRL_PHY_ADDR constant into a macro and use it. The PHY address for the control PHY is 0 when the switch listens on 31. In any other case, it is one greater than the PHY address the switch listens on. Reviewed-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23eeprom: at24: fix memory corruption race conditionDaniel Okazaki1-9/+9
If the eeprom is not accessible, an nvmem device will be registered, the read will fail, and the device will be torn down. If another driver accesses the nvmem device after the teardown, it will reference invalid memory. Move the failure point before registering the nvmem device. Signed-off-by: Daniel Okazaki <dtokazaki@google.com> Fixes: b20eb4c1f026 ("eeprom: at24: drop unnecessary label") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240422174337.2487142-1-dtokazaki@google.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-04-23netfs: Fix writethrough-mode error handlingDavid Howells1-4/+6
Fix the error return in netfs_perform_write() acting in writethrough-mode to return any cached error in the case that netfs_end_writethrough() returns 0. This can affect the use of O_SYNC/O_DSYNC/RWF_SYNC/RWF_DSYNC in 9p and afs. Fixes: 41d8e7673a77 ("netfs: Implement a write-through caching option") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/6736.1713343639@warthog.procyon.org.uk Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-23ntfs3: add legacy ntfs file operationsChristian Brauner4-4/+33
To ensure that ioctl()s can't be used to circumvent write restrictions. Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-23ntfs3: enforce read-only when used as legacy ntfs driverChristian Brauner2-4/+34
Ensure that ntfs3 is mounted read-only when it is used to provide the legacy ntfs driver. Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-23regulator: change stubbed devm_regulator_get_enable to return OkMatti Vaittinen1-1/+1
The devm_regulator_get_enable() should be a 'call and forget' API, meaning, when it is used to enable the regulators, the API does not provide a handle to do any further control of the regulators. It gives no real benefit to return an error from the stub if CONFIG_REGULATOR is not set. On the contrary, returning and error is causing problems to drivers when hardware is such it works out just fine with no regulator control. Returning an error forces drivers to specifically handle the case where CONFIG_REGULATOR is not set, making the mere existence of the stub questionalble. Furthermore, the stub of the regulator_enable() seems to be returning Ok. Change the stub implementation for the devm_regulator_get_enable() to return Ok so drivers do not separately handle the case where the CONFIG_REGULATOR is not set. Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com> Reported-by: Aleksander Mazur <deweloper@wp.pl> Suggested-by: Guenter Roeck <linux@roeck-us.net> Fixes: da279e6965b3 ("regulator: Add devm helpers for get and enable") Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/ZiYF6d1V1vSPcsJS@drtxq0yyyyyyyyyyyyyby-3.rev.dnainternet.fi Signed-off-by: Mark Brown <broonie@kernel.org>
2024-04-23net: ethernet: mtk_eth_soc: flower: validate control flagsAsbjørn Sloth Tønnesen1-0/+4
This driver currently doesn't support any control flags. Use flow_rule_has_control_flags() to check for control flags, such as can be set through `tc flower ... ip_flags frag`. In case any control flags are masked, flow_rule_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile-tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240418161821.189263-1-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23dpaa2-switch: flower: validate control flagsAsbjørn Sloth Tønnesen1-0/+6
This driver currently doesn't support any control flags. Use flow_rule_match_has_control_flags() to check for control flags, such as can be set through `tc flower ... ip_flags frag`. In case any control flags are masked, flow_rule_match_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile-tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://lore.kernel.org/r/20240418161802.189247-1-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23cxgb4: flower: validate control flagsAsbjørn Sloth Tønnesen1-0/+3
This driver currently doesn't support any control flags. Use flow_rule_match_has_control_flags() to check for control flags, such as can be set through `tc flower ... ip_flags frag`. In case any control flags are masked, flow_rule_match_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile-tested. Only compile tested, no hardware available. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240418161751.189226-1-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23net: openvswitch: Check vport netdev nameJun Gu1-1/+4
Ensure that the provided netdev name is not one of its aliases to prevent unnecessary creation and destruction of the vport by ovs-vswitchd. Signed-off-by: Jun Gu <jun.gu@easystack.cn> Acked-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/20240419061425.132723-1-jun.gu@easystack.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23Merge branch 'netlink-add-nftables-spec-w-multi-messages'Jakub Kicinski4-30/+1346
Donald Hunter says: ==================== netlink: Add nftables spec w/ multi messages This series adds a ynl spec for nftables and extends ynl with a --multi command line option that makes it possible to send transactional batches for nftables. This series includes a patch for nfnetlink which adds ACK processing for batch begin/end messages. If you'd prefer that to be sent separately to nf-next then I can do so, but I included it here so that it gets seen in context. An example of usage is: ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/nftables.yaml \ --multi batch-begin '{"res-id": 10}' \ --multi newtable '{"name": "test", "nfgen-family": 1}' \ --multi newchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \ --multi batch-end '{"res-id": 10}' [None, None, None, None] It can also be used for bundling get requests: ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/nftables.yaml \ --multi gettable '{"name": "test", "nfgen-family": 1}' \ --multi getchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \ --output-json [{"name": "test", "use": 1, "handle": 1, "flags": [], "nfgen-family": 1, "version": 0, "res-id": 2}, {"table": "test", "name": "chain", "handle": 1, "use": 0, "nfgen-family": 1, "version": 0, "res-id": 2}] There are 2 issues that may be worth resolving: - ynl reports errors by raising an NlError exception so only the first error gets reported. This could be changed to add errors to the list of responses so that multiple errors could be reported. - If any message does not get a response (e.g. batch-begin w/o patch 2) then ynl waits indefinitely. A recv timeout could be added which would allow ynl to terminate. ==================== Link: https://lore.kernel.org/r/20240418104737.77914-1-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23netfilter: nfnetlink: Handle ACK flags for batch messagesDonald Hunter1-0/+5
The NLM_F_ACK flag is ignored for nfnetlink batch begin and end messages. This is a problem for ynl which wants to receive an ack for every message it sends, not just the commands in between the begin/end messages. Add processing for ACKs for begin/end messages and provide responses when requested. I have checked that iproute2, pyroute2 and systemd are unaffected by this change since none of them use NLM_F_ACK for batch begin/end. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240418104737.77914-5-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23tools/net/ynl: Add multi message support to ynlDonald Hunter2-22/+71
Add a "--multi <do-op> <json>" command line to ynl that makes it possible to add several operations to a single netlink request payload. The --multi command line option is repeated for each operation. This is used by the nftables family for transaction batches. For example: ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/nftables.yaml \ --multi batch-begin '{"res-id": 10}' \ --multi newtable '{"name": "test", "nfgen-family": 1}' \ --multi newchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \ --multi batch-end '{"res-id": 10}' [None, None, None, None] It can also be used for bundling get requests: ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/nftables.yaml \ --multi gettable '{"name": "test", "nfgen-family": 1}' \ --multi getchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \ --output-json [{"name": "test", "use": 1, "handle": 1, "flags": [], "nfgen-family": 1, "version": 0, "res-id": 2}, {"table": "test", "name": "chain", "handle": 1, "use": 0, "nfgen-family": 1, "version": 0, "res-id": 2}] Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240418104737.77914-4-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23tools/net/ynl: Fix extack decoding for directional opsDonald Hunter1-8/+6
NetlinkProtocol.decode() was looking up ops by response value which breaks when it is used for extack decoding of directional ops. Instead, pass the op to decode(). Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240418104737.77914-3-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23doc/netlink/specs: Add draft nftables specDonald Hunter1-0/+1264
Add a spec for nftables that has nearly complete coverage of the ops, but limited coverage of rule types and subexpressions. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240418104737.77914-2-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23Merge branch 'for-uring-ubufops' into HEADJakub Kicinski9-36/+69
Pavel Begunkov says: ==================== implement io_uring notification (ubuf_info) stacking (net part) To have per request buffer notifications each zerocopy io_uring send request allocates a new ubuf_info. However, as an skb can carry only one uarg, it may force the stack to create many small skbs hurting performance in many ways. The patchset implements notification, i.e. an io_uring's ubuf_info extension, stacking. It attempts to link ubuf_info's into a list, allowing to have multiple of them per skb. liburing/examples/send-zerocopy shows up 6 times performance improvement for TCP with 4KB bytes per send, and levels it with MSG_ZEROCOPY. Without the patchset it requires much larger sends to utilise all potential. bytes | before | after (Kqps) 1200 | 195 | 1023 4000 | 193 | 1386 8000 | 154 | 1058 ==================== Link: https://lore.kernel.org/all/cover.1713369317.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23Merge tag '6.9-rc5-ksmbd-fixes' of git://git.samba.org/ksmbdLinus Torvalds5-28/+42
Pull smb server fixes from Steve French: "Five ksmbd server fixes, most also for stable: - rename fix - two fixes for potential out of bounds - fix for connections from MacOS (padding in close response) - fix for when to enable persistent handles" * tag '6.9-rc5-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: add continuous availability share parameter ksmbd: common: use struct_group_attr instead of struct_group for network_open_info ksmbd: clear RENAME_NOREPLACE before calling vfs_rename ksmbd: validate request buffer size in smb2_allocate_rsp_buf() ksmbd: fix slab-out-of-bounds in smb2_allocate_rsp_buf
2024-04-23net: add callback for setting a ubuf_info to skbPavel Begunkov2-6/+16
At the moment an skb can only have one ubuf_info associated with it, which might be a performance problem for zerocopy sends in cases like TCP via io_uring. Add a callback for assigning ubuf_info to skb, this way we will implement smarter assignment later like linking ubuf_info together. Note, it's an optional callback, which should be compatible with skb_zcopy_set(), that's because the net stack might potentially decide to clone an skb and take another reference to ubuf_info whenever it wishes. Also, a correct implementation should always be able to bind to an skb without prior ubuf_info, otherwise we could end up in a situation when the send would not be able to progress. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/all/b7918aadffeb787c84c9e72e34c729dc04f3a45d.1713369317.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23net: extend ubuf_info callback to ops structurePavel Begunkov9-30/+53
We'll need to associate additional callbacks with ubuf_info, introduce a structure holding ubuf_info callbacks. Apart from a more smarter io_uring notification management introduced in next patches, it can be used to generalise msg_zerocopy_put_abort() and also store ->sg_from_iter, which is currently passed in struct msghdr. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/all/a62015541de49c0e2a8a0377a1d5d0a5aeb07016.1713369317.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23kbuild: rust: remove unneeded `@rustc_cfg` to avoid ICEMiguel Ojeda1-1/+0
When KUnit tests are enabled, under very big kernel configurations (e.g. `allyesconfig`), we can trigger a `rustdoc` ICE [1]: RUSTDOC TK rust/kernel/lib.rs error: the compiler unexpectedly panicked. this is a bug. The reason is that this build step has a duplicated `@rustc_cfg` argument, which contains the kernel configuration, and thus a lot of arguments. The factor 2 happens to be enough to reach the ICE. Thus remove the unneeded `@rustc_cfg`. By doing so, we clean up the command and workaround the ICE. The ICE has been fixed in the upcoming Rust 1.79 [2]. Cc: stable@vger.kernel.org Fixes: a66d733da801 ("rust: support running Rust documentation tests as KUnit ones") Link: https://github.com/rust-lang/rust/issues/122722 [1] Link: https://github.com/rust-lang/rust/pull/122840 [2] Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20240422091215.526688-1-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-04-23rust: kernel: require `Send` for `Module` implementationsWedson Almeida Filho1-1/+1
The thread that calls the module initialisation code when a module is loaded is not guaranteed [in fact, it is unlikely] to be the same one that calls the module cleanup code on module unload, therefore, `Module` implementations must be `Send` to account for them moving from one thread to another implicitly. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Benno Lossin <benno.lossin@proton.me> Cc: stable@vger.kernel.org # 6.8.x: df70d04d5697: rust: phy: implement `Send` for `Registration` Cc: stable@vger.kernel.org Fixes: 247b365dc8dc ("rust: add `kernel` crate") Link: https://lore.kernel.org/r/20240328195457.225001-3-wedsonaf@gmail.com Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-04-23rust: phy: implement `Send` for `Registration`Wedson Almeida Filho1-0/+4
In preparation for requiring `Send` for `Module` implementations in the next patch. Cc: FUJITA Tomonori <fujita.tomonori@gmail.com> Cc: Trevor Gross <tmgross@umich.edu> Cc: netdev@vger.kernel.org Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20240328195457.225001-2-wedsonaf@gmail.com Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-04-23Merge branch 'tcp-avoid-sending-too-small-packets'Jakub Kicinski1-25/+53
Eric Dumazet says: ==================== tcp: avoid sending too small packets tcp_sendmsg() cooks 'large' skbs, that are later split if needed from tcp_write_xmit(). After a split, the leftover skb size is smaller than the optimal size, and this causes a performance drop. In this series, tcp_grow_skb() helper is added to shift payload from the second skb in the write queue to the first skb to always send optimal sized skbs. This increases TSO efficiency, and decreases number of ACK packets. ==================== Link: https://lore.kernel.org/r/20240418214600.1291486-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23tcp: try to send bigger TSO packetsEric Dumazet1-2/+36
While investigating TCP performance, I found that TCP would sometimes send big skbs followed by a single MSS skb, in a 'locked' pattern. For instance, BIG TCP is enabled, MSS is set to have 4096 bytes of payload per segment. gso_max_size is set to 181000. This means that an optimal TCP packet size should contain 44 * 4096 = 180224 bytes of payload, However, I was seeing packets sizes interleaved in this pattern: 172032, 8192, 172032, 8192, 172032, 8192, <repeat> tcp_tso_should_defer() heuristic is defeated, because after a split of a packet in write queue for whatever reason (this might be a too small CWND or a small enough pacing_rate), the leftover packet in the queue is smaller than the optimal size. It is time to try to make 'leftover packets' bigger so that tcp_tso_should_defer() can give its full potential. After this patch, we can see the following output: 14:13:34.009273 IP6 sender > receiver: Flags [P.], seq 4048380:4098360, ack 1, win 256, options [nop,nop,TS val 3425678144 ecr 1561784500], length 49980 14:13:34.010272 IP6 sender > receiver: Flags [P.], seq 4098360:4148340, ack 1, win 256, options [nop,nop,TS val 3425678145 ecr 1561784501], length 49980 14:13:34.011271 IP6 sender > receiver: Flags [P.], seq 4148340:4198320, ack 1, win 256, options [nop,nop,TS val 3425678146 ecr 1561784502], length 49980 14:13:34.012271 IP6 sender > receiver: Flags [P.], seq 4198320:4248300, ack 1, win 256, options [nop,nop,TS val 3425678147 ecr 1561784503], length 49980 14:13:34.013272 IP6 sender > receiver: Flags [P.], seq 4248300:4298280, ack 1, win 256, options [nop,nop,TS val 3425678148 ecr 1561784504], length 49980 14:13:34.014271 IP6 sender > receiver: Flags [P.], seq 4298280:4348260, ack 1, win 256, options [nop,nop,TS val 3425678149 ecr 1561784505], length 49980 14:13:34.015272 IP6 sender > receiver: Flags [P.], seq 4348260:4398240, ack 1, win 256, options [nop,nop,TS val 3425678150 ecr 1561784506], length 49980 14:13:34.016270 IP6 sender > receiver: Flags [P.], seq 4398240:4448220, ack 1, win 256, options [nop,nop,TS val 3425678151 ecr 1561784507], length 49980 14:13:34.017269 IP6 sender > receiver: Flags [P.], seq 4448220:4498200, ack 1, win 256, options [nop,nop,TS val 3425678152 ecr 1561784508], length 49980 14:13:34.018276 IP6 sender > receiver: Flags [P.], seq 4498200:4548180, ack 1, win 256, options [nop,nop,TS val 3425678153 ecr 1561784509], length 49980 14:13:34.019259 IP6 sender > receiver: Flags [P.], seq 4548180:4598160, ack 1, win 256, options [nop,nop,TS val 3425678154 ecr 1561784510], length 49980 With 200 concurrent flows on a 100Gbit NIC, we can see a reduction of TSO packets (and ACK packets) of about 30 %. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240418214600.1291486-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>