path: root/net
Age | Commit message | Author | Files | Lines
2023-05-01 | net/sched: act_mirred: Add carrier check | Victor Nogueira | 1 | -1/+1
There are cases where the device is administratively UP, but operationally
down. For example, we have a physical device (Nvidia ConnectX-6 Dx, 25Gbps)
whose cable was pulled out. Here is its ip link output:

5: ens2f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether b8:ce:f6:4b:68:35 brd ff:ff:ff:ff:ff:ff
    altname enp179s0f1np1

As you can see, it's administratively UP but operationally down.
In this case, sending a packet to this port caused a nasty kernel hang (so
nasty that we were unable to capture it). Aborting a transmit based on
operational status (in addition to administrative status) fixes the issue.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
v1->v2: Add fixes tag
v2->v3: Remove blank line between tags + add change log, suggested by Leon
Signed-off-by: David S. Miller <davem@davemloft.net>
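The check described in this commit can be sketched in userspace C. This is only a model under stated assumptions: `struct model_dev` and `model_can_transmit()` are hypothetical names, with `admin_up` standing in for the IFF_UP flag and `carrier_ok` for what the kernel reports through netif_carrier_ok(). It is not the actual act_mirred code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct net_device. */
struct model_dev {
    bool admin_up;   /* models the IFF_UP bit in dev->flags */
    bool carrier_ok; /* models the result of netif_carrier_ok(dev) */
};

/* True only when the device is administratively UP and has carrier. */
static bool model_can_transmit(const struct model_dev *dev)
{
    assert(dev != NULL);
    return dev->admin_up && dev->carrier_ok;
}
```

With the cable pulled (`admin_up` true, `carrier_ok` false) the model refuses to transmit, which is the case the commit message describes.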
2023-04-28 | net: ipv6: fix skb hash for some RST packets | Antoine Tenart | 1 | -1/+1
The skb hash comes from sk->sk_txhash when using TCP, except for some IPv6
RST packets. This is because in tcp_v6_send_reset when not in TIME_WAIT the
hash is taken from sk->sk_hash, while it should come from sk->sk_txhash as
those two hashes are not computed the same way.

Packetdrill script to test the above,

   0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
  +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
  +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)

  +0 > (flowlabel 0x1) S 0:0(0) <...>

  // Wrong ack seq, trigger a rst.
  +0 < S. 0:0(0) ack 0 win 4000

  // Check the flowlabel matches prior one from SYN.
  +0 > (flowlabel 0x1) R 0:0(0) <...>

Fixes: 9258b8b1be2e ("ipv6: tcp: send consistent autoflowlabel in RST packets")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-28 | sit: update dev->needed_headroom in ipip6_tunnel_bind_dev() | Cong Wang | 1 | -3/+5
When a tunnel device is bound with the underlying device, its
dev->needed_headroom needs to be updated properly. IPv4 tunnels
already do the same in ip_tunnel_bind_dev(). Otherwise we may
not have enough header room for skb, especially after commit
b17f709a2401 ("gue: TX support for using remote checksum offload option").

Fixes: 32b8a8e59c9c ("sit: add IPv4 over IPv4 support")
Reported-by: Palash Oswal <oswalpalash@gmail.com>
Link: https://lore.kernel.org/netdev/CAGyP=7fDcSPKu6nttbGwt7RXzE3uyYxLjCSE97J64pRxJP8jPA@mail.gmail.com/
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
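To illustrate why the headroom has to follow the underlying device, here is a small userspace sketch. It assumes that the tunnel's required headroom is its own header length plus whatever the lower device needs for its link-layer header and its own headroom. The names `model_lower` and `model_tunnel_headroom()` are hypothetical, and the exact formula used in ipip6_tunnel_bind_dev() is not quoted in the commit message.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical model of the fields of the underlying device that matter here. */
struct model_lower {
    unsigned int hard_header_len; /* link-layer header size */
    unsigned int needed_headroom; /* extra headroom the lower device asks for */
};

/*
 * Headroom a tunnel device should request: its own header length (t_hlen)
 * plus the requirements of the lower device, if one is bound.
 */
static unsigned int model_tunnel_headroom(unsigned int t_hlen,
                                          const struct model_lower *lower)
{
    unsigned int hlen = 0;

    if (lower != NULL)
        hlen = lower->hard_header_len + lower->needed_headroom;

    assert(hlen <= UINT_MAX - t_hlen); /* guard against wraparound */
    return t_hlen + hlen;
}
```

If the lower device is itself an encapsulating device with a large `needed_headroom`, a tunnel that never refreshes this value would under-allocate, which is the situation the commit addresses.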
2023-04-28 | net/sched: cls_api: remove block_cb from driver_list before freeing | Vlad Buslov | 1 | -0/+1
Error handler of tcf_block_bind() frees the whole bo->cb_list on error. However, by that time the flow_block_cb instances are already in the driver list because driver ndo_setup_tc() callback is called before that up the call chain in tcf_block_offload_cmd(). This leaves dangling pointers to freed objects in the list and causes use-after-free[0]. Fix it by also removing flow_block_cb instances from driver_list before deallocating them. [0]: [ 279.868433] ================================================================== [ 279.869964] BUG: KASAN: slab-use-after-free in flow_block_cb_setup_simple+0x631/0x7c0 [ 279.871527] Read of size 8 at addr ffff888147e2bf20 by task tc/2963 [ 279.873151] CPU: 6 PID: 2963 Comm: tc Not tainted 6.3.0-rc6+ #4 [ 279.874273] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 279.876295] Call Trace: [ 279.876882] <TASK> [ 279.877413] dump_stack_lvl+0x33/0x50 [ 279.878198] print_report+0xc2/0x610 [ 279.878987] ? flow_block_cb_setup_simple+0x631/0x7c0 [ 279.879994] kasan_report+0xae/0xe0 [ 279.880750] ? flow_block_cb_setup_simple+0x631/0x7c0 [ 279.881744] ? mlx5e_tc_reoffload_flows_work+0x240/0x240 [mlx5_core] [ 279.883047] flow_block_cb_setup_simple+0x631/0x7c0 [ 279.884027] tcf_block_offload_cmd.isra.0+0x189/0x2d0 [ 279.885037] ? tcf_block_setup+0x6b0/0x6b0 [ 279.885901] ? mutex_lock+0x7d/0xd0 [ 279.886669] ? __mutex_unlock_slowpath.constprop.0+0x2d0/0x2d0 [ 279.887844] ? ingress_init+0x1c0/0x1c0 [sch_ingress] [ 279.888846] tcf_block_get_ext+0x61c/0x1200 [ 279.889711] ingress_init+0x112/0x1c0 [sch_ingress] [ 279.890682] ? clsact_init+0x2b0/0x2b0 [sch_ingress] [ 279.891701] qdisc_create+0x401/0xea0 [ 279.892485] ? qdisc_tree_reduce_backlog+0x470/0x470 [ 279.893473] tc_modify_qdisc+0x6f7/0x16d0 [ 279.894344] ? tc_get_qdisc+0xac0/0xac0 [ 279.895213] ? mutex_lock+0x7d/0xd0 [ 279.896005] ? __mutex_lock_slowpath+0x10/0x10 [ 279.896910] rtnetlink_rcv_msg+0x5fe/0x9d0 [ 279.897770] ? 
rtnl_calcit.isra.0+0x2b0/0x2b0 [ 279.898672] ? __sys_sendmsg+0xb5/0x140 [ 279.899494] ? do_syscall_64+0x3d/0x90 [ 279.900302] ? entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 279.901337] ? kasan_save_stack+0x2e/0x40 [ 279.902177] ? kasan_save_stack+0x1e/0x40 [ 279.903058] ? kasan_set_track+0x21/0x30 [ 279.903913] ? kasan_save_free_info+0x2a/0x40 [ 279.904836] ? ____kasan_slab_free+0x11a/0x1b0 [ 279.905741] ? kmem_cache_free+0x179/0x400 [ 279.906599] netlink_rcv_skb+0x12c/0x360 [ 279.907450] ? rtnl_calcit.isra.0+0x2b0/0x2b0 [ 279.908360] ? netlink_ack+0x1550/0x1550 [ 279.909192] ? rhashtable_walk_peek+0x170/0x170 [ 279.910135] ? kmem_cache_alloc_node+0x1af/0x390 [ 279.911086] ? _copy_from_iter+0x3d6/0xc70 [ 279.912031] netlink_unicast+0x553/0x790 [ 279.912864] ? netlink_attachskb+0x6a0/0x6a0 [ 279.913763] ? netlink_recvmsg+0x416/0xb50 [ 279.914627] netlink_sendmsg+0x7a1/0xcb0 [ 279.915473] ? netlink_unicast+0x790/0x790 [ 279.916334] ? iovec_from_user.part.0+0x4d/0x220 [ 279.917293] ? netlink_unicast+0x790/0x790 [ 279.918159] sock_sendmsg+0xc5/0x190 [ 279.918938] ____sys_sendmsg+0x535/0x6b0 [ 279.919813] ? import_iovec+0x7/0x10 [ 279.920601] ? kernel_sendmsg+0x30/0x30 [ 279.921423] ? __copy_msghdr+0x3c0/0x3c0 [ 279.922254] ? import_iovec+0x7/0x10 [ 279.923041] ___sys_sendmsg+0xeb/0x170 [ 279.923854] ? copy_msghdr_from_user+0x110/0x110 [ 279.924797] ? ___sys_recvmsg+0xd9/0x130 [ 279.925630] ? __perf_event_task_sched_in+0x183/0x470 [ 279.926656] ? ___sys_sendmsg+0x170/0x170 [ 279.927529] ? ctx_sched_in+0x530/0x530 [ 279.928369] ? update_curr+0x283/0x4f0 [ 279.929185] ? perf_event_update_userpage+0x570/0x570 [ 279.930201] ? __fget_light+0x57/0x520 [ 279.931023] ? __switch_to+0x53d/0xe70 [ 279.931846] ? sockfd_lookup_light+0x1a/0x140 [ 279.932761] __sys_sendmsg+0xb5/0x140 [ 279.933560] ? __sys_sendmsg_sock+0x20/0x20 [ 279.934436] ? 
fpregs_assert_state_consistent+0x1d/0xa0 [ 279.935490] do_syscall_64+0x3d/0x90 [ 279.936300] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 279.937311] RIP: 0033:0x7f21c814f887 [ 279.938085] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 279.941448] RSP: 002b:00007fff11efd478 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 279.942964] RAX: ffffffffffffffda RBX: 0000000064401979 RCX: 00007f21c814f887 [ 279.944337] RDX: 0000000000000000 RSI: 00007fff11efd4e0 RDI: 0000000000000003 [ 279.945660] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 [ 279.947003] R10: 00007f21c8008708 R11: 0000000000000246 R12: 0000000000000001 [ 279.948345] R13: 0000000000409980 R14: 000000000047e538 R15: 0000000000485400 [ 279.949690] </TASK> [ 279.950706] Allocated by task 2960: [ 279.951471] kasan_save_stack+0x1e/0x40 [ 279.952338] kasan_set_track+0x21/0x30 [ 279.953165] __kasan_kmalloc+0x77/0x90 [ 279.954006] flow_block_cb_setup_simple+0x3dd/0x7c0 [ 279.955001] tcf_block_offload_cmd.isra.0+0x189/0x2d0 [ 279.956020] tcf_block_get_ext+0x61c/0x1200 [ 279.956881] ingress_init+0x112/0x1c0 [sch_ingress] [ 279.957873] qdisc_create+0x401/0xea0 [ 279.958656] tc_modify_qdisc+0x6f7/0x16d0 [ 279.959506] rtnetlink_rcv_msg+0x5fe/0x9d0 [ 279.960392] netlink_rcv_skb+0x12c/0x360 [ 279.961216] netlink_unicast+0x553/0x790 [ 279.962044] netlink_sendmsg+0x7a1/0xcb0 [ 279.962906] sock_sendmsg+0xc5/0x190 [ 279.963702] ____sys_sendmsg+0x535/0x6b0 [ 279.964534] ___sys_sendmsg+0xeb/0x170 [ 279.965343] __sys_sendmsg+0xb5/0x140 [ 279.966132] do_syscall_64+0x3d/0x90 [ 279.966908] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 279.968407] Freed by task 2960: [ 279.969114] kasan_save_stack+0x1e/0x40 [ 279.969929] kasan_set_track+0x21/0x30 [ 279.970729] kasan_save_free_info+0x2a/0x40 [ 279.971603] ____kasan_slab_free+0x11a/0x1b0 [ 279.972483] 
__kmem_cache_free+0x14d/0x280 [ 279.973337] tcf_block_setup+0x29d/0x6b0 [ 279.974173] tcf_block_offload_cmd.isra.0+0x226/0x2d0 [ 279.975186] tcf_block_get_ext+0x61c/0x1200 [ 279.976080] ingress_init+0x112/0x1c0 [sch_ingress] [ 279.977065] qdisc_create+0x401/0xea0 [ 279.977857] tc_modify_qdisc+0x6f7/0x16d0 [ 279.978695] rtnetlink_rcv_msg+0x5fe/0x9d0 [ 279.979562] netlink_rcv_skb+0x12c/0x360 [ 279.980388] netlink_unicast+0x553/0x790 [ 279.981214] netlink_sendmsg+0x7a1/0xcb0 [ 279.982043] sock_sendmsg+0xc5/0x190 [ 279.982827] ____sys_sendmsg+0x535/0x6b0 [ 279.983703] ___sys_sendmsg+0xeb/0x170 [ 279.984510] __sys_sendmsg+0xb5/0x140 [ 279.985298] do_syscall_64+0x3d/0x90 [ 279.986076] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 279.987532] The buggy address belongs to the object at ffff888147e2bf00 which belongs to the cache kmalloc-192 of size 192 [ 279.989747] The buggy address is located 32 bytes inside of freed 192-byte region [ffff888147e2bf00, ffff888147e2bfc0) [ 279.992367] The buggy address belongs to the physical page: [ 279.993430] page:00000000550f405c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147e2a [ 279.995182] head:00000000550f405c order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 279.996713] anon flags: 0x200000000010200(slab|head|node=0|zone=2) [ 279.997878] raw: 0200000000010200 ffff888100042a00 0000000000000000 dead000000000001 [ 279.999384] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 [ 280.000894] page dumped because: kasan: bad access detected [ 280.002386] Memory state around the buggy address: [ 280.003338] ffff888147e2be00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 280.004781] ffff888147e2be80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 280.006224] >ffff888147e2bf00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 280.007700] ^ [ 280.008592] ffff888147e2bf80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 280.010035] ffff888147e2c000: fa fb fb fb fb fb fb fb fb fb fb fb fb 
fb fb fb [ 280.011564] ================================================================== Fixes: 59094b1e5094 ("net: sched: use flow block API") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
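The use-after-free above comes from an object that sits on two lists but is unlinked from only one before being freed. A minimal userspace sketch of the corrected cleanup follows. The list helpers and the `model_cb` type are hypothetical stand-ins for the kernel's list_head API and struct flow_block_cb, not the actual cls_api code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, modelled loosely on list_head. */
struct node { struct node *prev, *next; };

static void node_init(struct node *n) { n->prev = n->next = n; }

static void node_add(struct node *n, struct node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

static void node_del(struct node *n)
{
    assert(n->prev != NULL && n->next != NULL);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

static int list_is_empty(const struct node *head) { return head->next == head; }

/* Hypothetical model of a callback object that lives on two lists at once. */
struct model_cb {
    struct node list;        /* must stay first: link in the bind request's list */
    struct node driver_list; /* link in the driver's own list */
};

static struct model_cb *model_cb_alloc(struct node *cb_list, struct node *driver_list)
{
    struct model_cb *cb = malloc(sizeof(*cb));

    if (cb == NULL)
        return NULL;
    node_add(&cb->list, cb_list);
    node_add(&cb->driver_list, driver_list);
    return cb;
}

/* Error path: unlink each object from BOTH lists before freeing it. */
static void model_bind_error_cleanup(struct node *cb_list)
{
    while (!list_is_empty(cb_list)) {
        /* 'list' is the first member, so the node address is the object address. */
        struct model_cb *cb = (struct model_cb *)cb_list->next;

        node_del(&cb->list);
        node_del(&cb->driver_list); /* the step that was missing */
        free(cb);
    }
}
```

Without the second `node_del()`, the driver's list would keep pointing at freed memory, and the next traversal of that list would read it, as in the KASAN report.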
2023-04-28 | tcp: fix skb_copy_ubufs() vs BIG TCP | Eric Dumazet | 1 | -6/+14
David Ahern reported crashes in skb_copy_ubufs() caused by TCP tx zerocopy
using hugepages, and skb length bigger than ~68 KB.

skb_copy_ubufs() assumed it could copy all payload using up to
MAX_SKB_FRAGS order-0 pages.

This assumption broke when BIG TCP was able to put up to 512 KB per skb.

We did not hit this bug at Google because we use CONFIG_MAX_SKB_FRAGS=45
and limit gso_max_size to 180000.

A solution is to use higher order pages if needed.

v2: add missing __GFP_COMP, or we leak memory.

Fixes: 7c4e983c4f3c ("net: allow gso_max_size to exceed 65536")
Reported-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/netdev/c70000f6-baa4-4a05-46d0-4b3e0dc1ccc8@gmail.com/T/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
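The arithmetic behind the ~68 KB limit and the higher-order-page fix can be worked through in a few lines of userspace C. This is a sketch, assuming 4 KiB pages and a default of 17 fragments per skb (17 x 4096 = 69632 bytes, which matches the ~68 KB figure). The helper names are hypothetical and the code is not the kernel's skb_copy_ubufs().

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12                              /* assumed 4 KiB pages */
#define MODEL_PAGE_SIZE  ((size_t)1 << MODEL_PAGE_SHIFT)

/*
 * Smallest page order such that max_frags pages of that order
 * can hold len bytes of payload.
 */
static unsigned int model_min_order(size_t len, unsigned int max_frags)
{
    unsigned int order = 0;

    assert(max_frags > 0); /* otherwise the loop below would never end */
    while ((MODEL_PAGE_SIZE << order) * max_frags < len)
        order++;
    return order;
}

/* Number of pages of the given order needed to hold len bytes. */
static size_t model_frag_count(size_t len, unsigned int order)
{
    size_t psize = MODEL_PAGE_SIZE << order;

    return (len + psize - 1) / psize;
}
```

Under these assumptions a 512 KB skb needs order-3 pages (32 KiB each) and 16 of them, while the configuration mentioned in the message (45 fragments, gso_max_size 180000) still fits in order-0 pages because 45 x 4096 = 184320.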
2023-04-28 | net/ncsi: clear Tx enable mode when handling a Config required AEN | Cosmo Chou | 1 | -0/+1
ncsi_channel_is_tx() determines whether a given channel should be
used for Tx or not. However, when reconfiguring the channel by
handling a Configuration Required AEN, there is a misjudgment that
the channel Tx has already been enabled, which results in the Enable
Channel Network Tx command not being sent.

Clear the channel Tx enable flag before reconfiguring the channel to
avoid the misjudgment.

Fixes: 8d951a75d022 ("net/ncsi: Configure multi-package, multi-channel modes with failover")
Signed-off-by: Cosmo Chou <chou.cosmo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
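The stale-flag problem can be modelled with a tiny state machine. Everything here is hypothetical (`model_channel`, `model_configure()`, the `clear_flag` switch) and greatly simplified compared to the real NC-SI state handling. It only illustrates why a cached "Tx enabled" flag has to be cleared before reconfiguration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one NC-SI channel. */
struct model_channel {
    bool tx_enabled;    /* cached belief that Tx is enabled on the channel */
    int tx_enable_cmds; /* counts simulated Enable Channel Network Tx commands */
};

/* (Re)configure: send the Tx enable command only if Tx is not marked enabled. */
static void model_configure(struct model_channel *nc)
{
    assert(nc != NULL);
    if (!nc->tx_enabled) {
        nc->tx_enable_cmds++;
        nc->tx_enabled = true;
    }
}

/*
 * Handle a Configuration Required event. When clear_flag is true the cached
 * flag is reset first, so the enable command is sent again.
 */
static void model_handle_config_required(struct model_channel *nc, bool clear_flag)
{
    assert(nc != NULL);
    if (clear_flag)
        nc->tx_enabled = false;
    model_configure(nc);
}
```

Calling the handler with `clear_flag` false on a channel already marked enabled sends nothing, which mirrors the misjudgment described above. With `clear_flag` true the command goes out.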
2023-04-27 | net/sched: act_pedit: free pedit keys on bail from offset check | Pedro Tammela | 1 | -1/+3
Ido Schimmel reports a memleak on a syzkaller instance: BUG: memory leak unreferenced object 0xffff88803d45e400 (size 1024): comm "syz-executor292", pid 563, jiffies 4295025223 (age 51.781s) hex dump (first 32 bytes): 28 bd 70 00 fb db df 25 02 00 14 1f ff 02 00 02 (.p....%........ 00 32 00 00 1f 00 00 00 ac 14 14 3e 08 00 07 00 .2.........>.... backtrace: [<ffffffff81bd0f2c>] kmemleak_alloc_recursive include/linux/kmemleak.h:42 [inline] [<ffffffff81bd0f2c>] slab_post_alloc_hook mm/slab.h:772 [inline] [<ffffffff81bd0f2c>] slab_alloc_node mm/slub.c:3452 [inline] [<ffffffff81bd0f2c>] __kmem_cache_alloc_node+0x25c/0x320 mm/slub.c:3491 [<ffffffff81a865d9>] __do_kmalloc_node mm/slab_common.c:966 [inline] [<ffffffff81a865d9>] __kmalloc+0x59/0x1a0 mm/slab_common.c:980 [<ffffffff83aa85c3>] kmalloc include/linux/slab.h:584 [inline] [<ffffffff83aa85c3>] tcf_pedit_init+0x793/0x1ae0 net/sched/act_pedit.c:245 [<ffffffff83a90623>] tcf_action_init_1+0x453/0x6e0 net/sched/act_api.c:1394 [<ffffffff83a90e58>] tcf_action_init+0x5a8/0x950 net/sched/act_api.c:1459 [<ffffffff83a96258>] tcf_action_add+0x118/0x4e0 net/sched/act_api.c:1985 [<ffffffff83a96997>] tc_ctl_action+0x377/0x490 net/sched/act_api.c:2044 [<ffffffff83920a8d>] rtnetlink_rcv_msg+0x46d/0xd70 net/core/rtnetlink.c:6395 [<ffffffff83b24305>] netlink_rcv_skb+0x185/0x490 net/netlink/af_netlink.c:2575 [<ffffffff83901806>] rtnetlink_rcv+0x26/0x30 net/core/rtnetlink.c:6413 [<ffffffff83b21cae>] netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] [<ffffffff83b21cae>] netlink_unicast+0x5be/0x8a0 net/netlink/af_netlink.c:1365 [<ffffffff83b2293f>] netlink_sendmsg+0x9af/0xed0 net/netlink/af_netlink.c:1942 [<ffffffff8380c39f>] sock_sendmsg_nosec net/socket.c:724 [inline] [<ffffffff8380c39f>] sock_sendmsg net/socket.c:747 [inline] [<ffffffff8380c39f>] ____sys_sendmsg+0x3ef/0xaa0 net/socket.c:2503 [<ffffffff838156d2>] ___sys_sendmsg+0x122/0x1c0 net/socket.c:2557 [<ffffffff8381594f>] __sys_sendmsg+0x11f/0x200 net/socket.c:2586 
[<ffffffff83815ab0>] __do_sys_sendmsg net/socket.c:2595 [inline] [<ffffffff83815ab0>] __se_sys_sendmsg net/socket.c:2593 [inline] [<ffffffff83815ab0>] __x64_sys_sendmsg+0x80/0xc0 net/socket.c:2593 The recently added static offset check missed a free to the key buffer when bailing out on error. Fixes: e1201bc781c2 ("net/sched: act_pedit: check static offsets a priori") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20230425144725.669262-1-pctammela@mojatatu.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
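The leak pattern (a validation step added after an allocation, bailing out without freeing) and its usual fix with a cleanup label can be sketched as follows. The names are hypothetical, and the rule that offsets must be multiples of 4 is an assumption made for the example. The commit message only says that a static offset check was added.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of one edit key. */
struct model_key { unsigned int off; };

/*
 * Allocate a key buffer, then validate every offset. On a validation
 * failure the buffer is released through the cleanup label instead of
 * being leaked.
 */
static int model_init_keys(const unsigned int *offs, size_t n,
                           struct model_key **out)
{
    struct model_key *keys;
    size_t i;
    int err;

    assert(offs != NULL && out != NULL);
    if (n == 0)
        return -EINVAL;

    keys = calloc(n, sizeof(*keys));
    if (keys == NULL)
        return -ENOMEM;

    for (i = 0; i < n; i++) {
        if (offs[i] % 4) {      /* assumed rule: 32-bit aligned offsets */
            err = -EINVAL;
            goto out_free_keys; /* the bail-out path must free the buffer */
        }
        keys[i].off = offs[i];
    }

    *out = keys;
    return 0;

out_free_keys:
    free(keys);
    return err;
}
```

On success the caller owns the buffer and must free it. On failure `*out` is left untouched and nothing remains allocated.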
2023-04-27 | net/sched: flower: Fix wrong handle assignment during filter change | Ivan Vecera | 1 | -1/+1
Commit 08a0063df3ae ("net/sched: flower: Move filter handle initialization
earlier") moved filter handle initialization, but the handle is assigned to
fnew->handle regardless of the value of fold. This is wrong: if fold != NULL
(so fold->handle == handle), no new handle is allocated and the passed handle
is assigned to fnew->handle. If any subsequent action in fl_change() then
fails, the handle value is removed from the IDR. That is incorrect, because
the old filter instance is still valid but its handle is no longer present in
the IDR.

Fix this issue by moving the assignment so it is done only when the passed
fold == NULL.

Prior to the patch:
[root@machine tc-testing]# ./tdc.py -d enp1s0f0np0 -e 14be
Test 14be: Concurrently replace same range of 100k flower filters from 10 tc instances
exit: 123
exit: 0
RTNETLINK answers: Invalid argument
We have an error talking to the kernel
Command failed tmp/replace_6:1885

All test results:

1..1
not ok 1 14be - Concurrently replace same range of 100k flower filters from 10 tc instances
        Command exited with 123, expected 0
RTNETLINK answers: Invalid argument
We have an error talking to the kernel
Command failed tmp/replace_6:1885

After the patch:
[root@machine tc-testing]# ./tdc.py -d enp1s0f0np0 -e 14be
Test 14be: Concurrently replace same range of 100k flower filters from 10 tc instances

All test results:

1..1
ok 1 14be - Concurrently replace same range of 100k flower filters from 10 tc instances

Fixes: 08a0063df3ae ("net/sched: flower: Move filter handle initialization earlier")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230425140604.169881-1-ivecera@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-27 | rxrpc: Fix potential data race in rxrpc_wait_to_be_connected() | David Howells | 1 | -8/+4
Inside the loop in rxrpc_wait_to_be_connected() it checks call->error to see if it should exit the loop without first checking the call state. This is probably safe as if call->error is set, the call is dead anyway, but we should probably wait for the call state to have been set to completion first, lest it cause surprise on the way out. Fix this by only accessing call->error if the call is complete. We don't actually need to access the error inside the loop as we'll do that after. This caused the following report: BUG: KCSAN: data-race in rxrpc_send_data / rxrpc_set_call_completion write to 0xffff888159cf3c50 of 4 bytes by task 25673 on cpu 1: rxrpc_set_call_completion+0x71/0x1c0 net/rxrpc/call_state.c:22 rxrpc_send_data_packet+0xba9/0x1650 net/rxrpc/output.c:479 rxrpc_transmit_one+0x1e/0x130 net/rxrpc/output.c:714 rxrpc_decant_prepared_tx net/rxrpc/call_event.c:326 [inline] rxrpc_transmit_some_data+0x496/0x600 net/rxrpc/call_event.c:350 rxrpc_input_call_event+0x564/0x1220 net/rxrpc/call_event.c:464 rxrpc_io_thread+0x307/0x1d80 net/rxrpc/io_thread.c:461 kthread+0x1ac/0x1e0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 read to 0xffff888159cf3c50 of 4 bytes by task 25672 on cpu 0: rxrpc_send_data+0x29e/0x1950 net/rxrpc/sendmsg.c:296 rxrpc_do_sendmsg+0xb7a/0xc20 net/rxrpc/sendmsg.c:726 rxrpc_sendmsg+0x413/0x520 net/rxrpc/af_rxrpc.c:565 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0x375/0x4c0 net/socket.c:2501 ___sys_sendmsg net/socket.c:2555 [inline] __sys_sendmmsg+0x263/0x500 net/socket.c:2641 __do_sys_sendmmsg net/socket.c:2670 [inline] __se_sys_sendmmsg net/socket.c:2667 [inline] __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2667 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x00000000 -> 0xffffffea Fixes: 9d35d880e0e4 ("rxrpc: Move client call connection to the I/O thread") 
Reported-by: syzbot+ebc945fdb4acd72cba78@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/000000000000e7c6d205fa10a3cd@google.com/ Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Dmitry Vyukov <dvyukov@google.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org cc: netdev@vger.kernel.org Link: https://lore.kernel.org/r/508133.1682427395@warthog.procyon.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
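The ordering rule from this fix (only read the error once the call is known to be complete) can be sketched with C11 atomics in userspace. The types and function names are hypothetical, and the kernel uses its own primitives rather than `<stdatomic.h>`, so this is a model of the idea and not the rxrpc code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum { MODEL_CALL_ACTIVE, MODEL_CALL_COMPLETE };

/* Hypothetical model of a call: a plain error field published via the state. */
struct model_call {
    int error;        /* written before the state flips to COMPLETE */
    atomic_int state; /* release/acquire hand-off point */
};

/* Writer side: store the error first, then publish completion. */
static void model_set_completion(struct model_call *call, int error)
{
    assert(call != NULL);
    call->error = error;
    atomic_store_explicit(&call->state, MODEL_CALL_COMPLETE,
                          memory_order_release);
}

static bool model_call_is_complete(struct model_call *call)
{
    return atomic_load_explicit(&call->state, memory_order_acquire)
           == MODEL_CALL_COMPLETE;
}

/* Reader side: touch call->error only after completion has been observed. */
static int model_wait_result(struct model_call *call)
{
    assert(call != NULL);
    if (!model_call_is_complete(call))
        return 0; /* still in progress: do not look at call->error */
    return call->error;
}
```

The value -22 used below corresponds to the 0xffffffea seen in the KCSAN report when read as a signed 32-bit integer.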
2023-04-27 | Merge tag 'net-next-6.4' of ↵ | Linus Torvalds | 271 | -2997/+10038
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the default value allows for better BIG TCP performances - Reduce compound page head access for zero-copy data transfers - RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when possible - Threaded NAPI improvements, adding defer skb free support and unneeded softirq avoidance - Address dst_entry reference count scalability issues, via false sharing avoidance and optimize refcount tracking - Add lockless accesses annotation to sk_err[_soft] - Optimize again the skb struct layout - Extends the skb drop reasons to make it usable by multiple subsystems - Better const qualifier awareness for socket casts BPF: - Add skb and XDP typed dynptrs which allow BPF programs for more ergonomic and less brittle iteration through data and variable-sized accesses - Add a new BPF netfilter program type and minimal support to hook BPF programs to netfilter hooks such as prerouting or forward - Add more precise memory usage reporting for all BPF map types - Adds support for using {FOU,GUE} encap with an ipip device operating in collect_md mode and add a set of BPF kfuncs for controlling encap params - Allow BPF programs to detect at load time whether a particular kfunc exists or not, and also add support for this in light skeleton - Bigger batch of BPF verifier improvements to prepare for upcoming BPF open-coded iterators allowing for less restrictive looping capabilities - Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF programs to NULL-check before passing such pointers into kfunc - Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in local storage maps - Enable RCU semantics for task BPF kptrs and allow referenced kptr tasks to be stored in BPF maps - Add support for refcounted local kptrs to the verifier for allowing shared ownership, useful for adding a node to both 
the BPF list and rbtree - Add BPF verifier support for ST instructions in convert_ctx_access() which will help new -mcpu=v4 clang flag to start emitting them - Add ARM32 USDT support to libbpf - Improve bpftool's visual program dump which produces the control flow graph in a DOT format by adding C source inline annotations Protocols: - IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value indicates the provenance of the IP address - IPv6: optimize route lookup, dropping unneeded R/W lock acquisition - Add the handshake upcall mechanism, allowing the user-space to implement generic TLS handshake on kernel's behalf - Bridge: support per-{Port, VLAN} neighbor suppression, increasing resilience to nodes failures - SCTP: add support for Fair Capacity and Weighted Fair Queueing schedulers - MPTCP: delay first subflow allocation up to its first usage. This will allow for later better LSM interaction - xfrm: Remove inner/outer modes from input/output path. These are not needed anymore - WiFi: - reduced neighbor report (RNR) handling for AP mode - HW timestamping support - support for randomized auth/deauth TA for PASN privacy - per-link debugfs for multi-link - TC offload support for mac80211 drivers - mac80211 mesh fast-xmit and fast-rx support - enable Wi-Fi 7 (EHT) mesh support Netfilter: - Add nf_tables 'brouting' support, to force a packet to be routed instead of being bridged - Update bridge netfilter and ovs conntrack helpers to handle IPv6 Jumbo packets properly, i.e. fetch the packet length from hop-by-hop extension header. This is needed for BIT TCP support - The iptables 32bit compat interface isn't compiled in by default anymore - Move ip(6)tables builtin icmp matches to the udptcp one. This has the advantage that icmp/icmpv6 match doesn't load the iptables/ip6tables modules anymore when iptables-nft is used - Extended netlink error report for netdevice in flowtables and netdev/chains. Allow for incrementally add/delete devices to netdev basechain. 
Allow to create netdev chain without device Driver API: - Remove redundant Device Control Error Reporting Enable, as PCI core has already error reporting enabled at enumeration time - Move Multicast DB netlink handlers to core, allowing devices other then bridge to use them - Allow the page_pool to directly recycle the pages from safely localized NAPI - Implement lockless TX queue stop/wake combo macros, allowing for further code de-duplication and sanitization - Add YNL support for user headers and struct attrs - Add partial YNL specification for devlink - Add partial YNL specification for ethtool - Add tc-mqprio and tc-taprio support for preemptible traffic classes - Add tx push buf len param to ethtool, specifies the maximum number of bytes of a transmitted packet a driver can push directly to the underlying device - Add basic LED support for switch/phy - Add NAPI documentation, stop relaying on external links - Convert dsa_master_ioctl() to netdev notifier. This is a preparatory work to make the hardware timestamping layer selectable by user space - Add transceiver support and improve the error messages for CAN-FD controllers New hardware / drivers: - Ethernet: - AMD/Pensando core device support - MediaTek MT7981 SoC - MediaTek MT7988 SoC - Broadcom BCM53134 embedded switch - Texas Instruments CPSW9G ethernet switch - Qualcomm EMAC3 DWMAC ethernet - StarFive JH7110 SoC - NXP CBTX ethernet PHY - WiFi: - Apple M1 Pro/Max devices - RealTek rtl8710bu/rtl8188gu - RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset - Bluetooth: - Realtek RTL8821CS, RTL8851B, RTL8852BS - Mediatek MT7663, MT7922 - NXP w8997 - Actions Semi ATS2851 - QTI WCN6855 - Marvell 88W8997 - Can: - STMicroelectronics bxcan stm32f429 Drivers: - Ethernet NICs: - Intel (1G, icg): - add tracking and reporting of QBV config errors - add support for configuring max SDU for each Tx queue - Intel (100G, ice): - refactor mailbox overflow detection to support Scalable IOV - GNSS interface optimization - 
Intel (i40e): - support XDP multi-buffer - nVidia/Mellanox: - add the support for linux bridge multicast offload - enable TC offload for egress and engress MACVLAN over bond - add support for VxLAN GBP encap/decap flows offload - extend packet offload to fully support libreswan - support tunnel mode in mlx5 IPsec packet offload - extend XDP multi-buffer support - support MACsec VLAN offload - add support for dynamic msix vectors allocation - drop RX page_cache and fully use page_pool - implement thermal zone to report NIC temperature - Netronome/Corigine: - add support for multi-zone conntrack offload - Solarflare/Xilinx: - support offloading TC VLAN push/pop actions to the MAE - support TC decap rules - support unicast PTP - Other NICs: - Broadcom (bnxt): enforce software based freq adjustments only on shared PHC NIC - RealTek (r8169): refactor to addess ASPM issues during NAPI poll - Micrel (lan8841): add support for PTP_PF_PEROUT - Cadence (macb): enable PTP unicast - Engleder (tsnep): add XDP socket zero-copy support - virtio-net: implement exact header length guest feature - veth: add page_pool support for page recycling - vxlan: add MDB data path support - gve: add XDP support for GQI-QPL format - geneve: accept every ethertype - macvlan: allow some packets to bypass broadcast queue - mana: add support for jumbo frame - Ethernet high-speed switches: - Microchip (sparx5): Add support for TC flower templates - Ethernet embedded switches: - Broadcom (b54): - configure 6318 and 63268 RGMII ports - Marvell (mv88e6xxx): - faster C45 bus scan - Microchip: - lan966x: - add support for IS1 VCAP - better TX/RX from/to CPU performances - ksz9477: add ETS Qdisc support - ksz8: enhance static MAC table operations and error handling - sama7g5: add PTP capability - NXP (ocelot): - add support for external ports - add support for preemptible traffic classes - Texas Instruments: - add CPSWxG SGMII support for J7200 and J721E - Intel WiFi (iwlwifi): - preparation for Wi-Fi 7 
EHT and multi-link support - EHT (Wi-Fi 7) sniffer support - hardware timestamping support for some devices/firwmares - TX beacon protection on newer hardware - Qualcomm 802.11ax WiFi (ath11k): - MU-MIMO parameters support - ack signal support for management packets - RealTek WiFi (rtw88): - SDIO bus support - better support for some SDIO devices (e.g. MAC address from efuse) - RealTek WiFi (rtw89): - HW scan support for 8852b - better support for 6 GHz scanning - support for various newer firmware APIs - framework firmware backwards compatibility - MediaTek WiFi (mt76): - P2P support - mesh A-MSDU support - EHT (Wi-Fi 7) support - coredump support" * tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits) net: phy: hide the PHYLIB_LEDS knob net: phy: marvell-88x2222: remove unnecessary (void*) conversions tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp. net: amd: Fix link leak when verifying config failed net: phy: marvell: Fix inconsistent indenting in led_blink_set lan966x: Don't use xdp_frame when action is XDP_TX tsnep: Add XDP socket zero-copy TX support tsnep: Add XDP socket zero-copy RX support tsnep: Move skb receive action to separate function tsnep: Add functions for queue enable/disable tsnep: Rework TX/RX queue initialization tsnep: Replace modulo operation with mask net: phy: dp83867: Add led_brightness_set support net: phy: Fix reading LED reg property drivers: nfc: nfcsim: remove return value check of `dev_dir` net: phy: dp83867: Remove unnecessary (void*) conversions net: ethtool: coalesce: try to make user settings stick twice net: mana: Check if netdev/napi_alloc_frag returns single page net: mana: Rename mana_refill_rxoob and remove some empty lines net: veth: add page_pool stats ...
2023-04-26 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Paolo Abeni | 13 | -101/+111
No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-26 | Merge tag 'acpi-6.4-rc1' of ↵ | Linus Torvalds | 1 | -0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These update the ACPICA code in the kernel to upstream revision 20230331, fix the ACPI SBS driver and the evaluation of the _PDC method on Xen dom0 in the ACPI processor driver, update the ACPI driver for Intel SoCs and clean up code in multiple places. Specifics: - Update the ACPICA code in the kernel to upstream revision 20230331 including the following changes: * Delete bogus node_array array of pointers from AEST table (Jessica Clarke) * Add support for trace buffer extension in GICC to the ACPI MADT parser (Xiongfeng Wang) * Add missing macro ACPI_FUNCTION_TRACE() for acpi_ns_repair_HID() (Xiongfeng Wang) * Add missing tables to astable (Pedro Falcato) * Add support for 64 bit loong_arch compilation to ACPICA (Huacai Chen) * Add support for ASPT table in disassembler to ACPICA (Jeremi Piotrowski) * Add support for Arm's MPAM ACPI table version 2 (Hesham Almatary) * Update all copyrights/signons in ACPICA to 2023 (Bob Moore) * Add support for ClockInput resource (v6.5) (Niyas Sait) * Add RISC-V INTC interrupt controller definition to the list of supported interrupt controllers for MADT (Sunil V L) * Add structure definitions for the RISC-V RHCT ACPI table (Sunil V L) * Address several cases in which the ACPICA code might lead to undefined behavior (Tamir Duberstein) * Make ACPICA code support flexible arrays properly (Kees Cook) * Check null return of ACPI_ALLOCATE_ZEROED in acpi_db_display_objects() (void0red) * Add os specific support for Zephyr RTOS to ACPICA (Najumon) * Update version to 20230331 (Bob Moore) - Fix evaluating the _PDC ACPI control method when running as Xen dom0 (Roger Pau Monne) - Use platform devices to load ACPI PPC and PCC drivers (Petr Pavlu) - Check for null return of devm_kzalloc() in fch_misc_setup() (Kang Chen) - Log a message if enable_irq_wake() fails for the ACPI SCI (Simon Gaiser) - Initialize the correct IOMMU fwspec while 
parsing ACPI VIOT (Jean-Philippe Brucker) - Amend indentation and prefix error messages with FW_BUG in the ACPI SPCR parsing code (Andy Shevchenko) - Enable ACPI sysfs support for CCEL records (Kuppuswamy Sathyanarayanan) - Make the APEI error injection code warn on invalid arguments when explicitly indicated by platform (Shuai Xue) - Add CXL error types to the error injection code in APEI (Tony Luck) - Refactor acpi_data_prop_read_single() (Andy Shevchenko) - Fix two issues in the ACPI SBS driver (Armin Wolf) - Replace ternary operator with min_t() in the generic ACPI thermal zone driver (Jiangshan Yi) - Ensure that ACPI notify handlers are not running after removal and clean up code in acpi_sb_notify() (Rafael Wysocki) - Remove register_backlight_delay module option and code and remove quirks for false-positive backlight control support advertised on desktop boards (Hans de Goede) - Replace irqdomain.h include with struct declarations in ACPI headers and update several pieces of code previously including of.h implicitly through those headers (Rob Herring) - Fix acpi_evaluate_dsm_typed() redefinition error (Kiran K) - Update the pm_profile sysfs attribute documentation (Rafael Wysocki) - Add 80862289 ACPI _HID for second PWM controller on Cherry Trail to the ACPI driver for Intel SoCs (Hans de Goede)" * tag 'acpi-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (64 commits) ACPI: LPSS: Add 80862289 ACPI _HID for second PWM controller on Cherry Trail ACPI: bus: Ensure that notify handlers are not running after removal ACPI: bus: Add missing braces to acpi_sb_notify() ACPI: video: Remove desktops without backlight DMI quirks ACPI: video: Remove register_backlight_delay module option and code ACPI: Replace irqdomain.h include with struct declarations fpga: lattice-sysconfig-spi: Add explicit include for of.h tpm: atmel: Add explicit include for of.h virtio-mmio: Add explicit include for of.h pata: ixp4xx: Add explicit include for of.h ata: 
pata_macio: Add explicit include of irqdomain.h serial: 8250_tegra: Add explicit include for of.h net: rfkill-gpio: Add explicit include for of.h staging: iio: resolver: ad2s1210: Add explicit include for of.h iio: adc: ad7292: Add explicit include for of.h ACPICA: Update version to 20230331 ACPICA: add os specific support for Zephyr RTOS ACPICA: ACPICA: check null return of ACPI_ALLOCATE_ZEROED in acpi_db_display_objects ACPICA: acpi_resource_irq: Replace 1-element arrays with flexible array ACPICA: acpi_madt_oem_data: Fix flexible array member definition ...
2023-04-25Merge tag 'asm-generic-6.4' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "These are various cleanups, fixing a number of uapi header files to no longer reference CONFIG_* symbols, and one patch that introduces the new CONFIG_HAS_IOPORT symbol for architectures that provide working inb()/outb() macros, as a preparation for adding driver dependencies on those in the following release" * tag 'asm-generic-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: Kconfig: introduce HAS_IOPORT option and select it as necessary scripts: Update the CONFIG_* ignore list in headers_install.sh pktcdvd: Remove CONFIG_CDROM_PKTCDVD_WCACHE from uapi header Move bp_type_idx to include/linux/hw_breakpoint.h Move ep_take_care_of_epollwakeup() to fs/eventpoll.c Move COMPAT_ATM_ADDPARTY to net/atm/svc.c
2023-04-25tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.Kuniyuki Iwashima1-0/+3
syzkaller reported [0] memory leaks of a UDP socket and ZEROCOPY skbs. We can reproduce the problem with these sequences: sk = socket(AF_INET, SOCK_DGRAM, 0) sk.setsockopt(SOL_SOCKET, SO_TIMESTAMPING, SOF_TIMESTAMPING_TX_SOFTWARE) sk.setsockopt(SOL_SOCKET, SO_ZEROCOPY, 1) sk.sendto(b'', MSG_ZEROCOPY, ('127.0.0.1', 53)) sk.close() sendmsg() calls msg_zerocopy_alloc(), which allocates a skb, sets skb->cb->ubuf.refcnt to 1, and calls sock_hold(). Here, struct ubuf_info_msgzc indirectly holds a refcnt of the socket. When the skb is sent, __skb_tstamp_tx() clones it and puts the clone into the socket's error queue with the TX timestamp. When the original skb is received locally, skb_copy_ubufs() calls skb_unclone(), and pskb_expand_head() increments skb->cb->ubuf.refcnt. This additional count is decremented while freeing the skb, but struct ubuf_info_msgzc still has a refcnt, so __msg_zerocopy_callback() is not called. The last refcnt is not released unless we retrieve the TX timestamped skb by recvmsg(). Since we clear the error queue in inet_sock_destruct() after the socket's refcnt reaches 0, there is a circular dependency. If we close() the socket holding such skbs, we never call sock_put() and leak the count, sk, and skb. TCP has the same problem, and commit e0c8bccd40fc ("net: stream: purge sk_error_queue in sk_stream_kill_queues()") tried to fix it by calling skb_queue_purge() during close(). However, there is a small chance that an skb queued in a qdisc or device could be put into the error queue after the skb_queue_purge() call. In __skb_tstamp_tx(), the cloned skb should not have a reference to the ubuf to remove the circular dependency, but skb_clone() does not call skb_copy_ubufs() for a zerocopy skb. So, we need to call skb_orphan_frags_rx() for the cloned skb to call skb_copy_ubufs(). 
[0]: BUG: memory leak unreferenced object 0xffff88800c6d2d00 (size 1152): comm "syz-executor392", pid 264, jiffies 4294785440 (age 13.044s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 cd af e8 81 00 00 00 00 ................ 02 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00 ...@............ backtrace: [<0000000055636812>] sk_prot_alloc+0x64/0x2a0 net/core/sock.c:2024 [<0000000054d77b7a>] sk_alloc+0x3b/0x800 net/core/sock.c:2083 [<0000000066f3c7e0>] inet_create net/ipv4/af_inet.c:319 [inline] [<0000000066f3c7e0>] inet_create+0x31e/0xe40 net/ipv4/af_inet.c:245 [<000000009b83af97>] __sock_create+0x2ab/0x550 net/socket.c:1515 [<00000000b9b11231>] sock_create net/socket.c:1566 [inline] [<00000000b9b11231>] __sys_socket_create net/socket.c:1603 [inline] [<00000000b9b11231>] __sys_socket_create net/socket.c:1588 [inline] [<00000000b9b11231>] __sys_socket+0x138/0x250 net/socket.c:1636 [<000000004fb45142>] __do_sys_socket net/socket.c:1649 [inline] [<000000004fb45142>] __se_sys_socket net/socket.c:1647 [inline] [<000000004fb45142>] __x64_sys_socket+0x73/0xb0 net/socket.c:1647 [<0000000066999e0e>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<0000000066999e0e>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 [<0000000017f238c1>] entry_SYSCALL_64_after_hwframe+0x63/0xcd BUG: memory leak unreferenced object 0xffff888017633a00 (size 240): comm "syz-executor392", pid 264, jiffies 4294785440 (age 13.044s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 2d 6d 0c 80 88 ff ff .........-m..... 
backtrace: [<000000002b1c4368>] __alloc_skb+0x229/0x320 net/core/skbuff.c:497 [<00000000143579a6>] alloc_skb include/linux/skbuff.h:1265 [inline] [<00000000143579a6>] sock_omalloc+0xaa/0x190 net/core/sock.c:2596 [<00000000be626478>] msg_zerocopy_alloc net/core/skbuff.c:1294 [inline] [<00000000be626478>] msg_zerocopy_realloc+0x1ce/0x7f0 net/core/skbuff.c:1370 [<00000000cbfc9870>] __ip_append_data+0x2adf/0x3b30 net/ipv4/ip_output.c:1037 [<0000000089869146>] ip_make_skb+0x26c/0x2e0 net/ipv4/ip_output.c:1652 [<00000000098015c2>] udp_sendmsg+0x1bac/0x2390 net/ipv4/udp.c:1253 [<0000000045e0e95e>] inet_sendmsg+0x10a/0x150 net/ipv4/af_inet.c:819 [<000000008d31bfde>] sock_sendmsg_nosec net/socket.c:714 [inline] [<000000008d31bfde>] sock_sendmsg+0x141/0x190 net/socket.c:734 [<0000000021e21aa4>] __sys_sendto+0x243/0x360 net/socket.c:2117 [<00000000ac0af00c>] __do_sys_sendto net/socket.c:2129 [inline] [<00000000ac0af00c>] __se_sys_sendto net/socket.c:2125 [inline] [<00000000ac0af00c>] __x64_sys_sendto+0xe1/0x1c0 net/socket.c:2125 [<0000000066999e0e>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<0000000066999e0e>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 [<0000000017f238c1>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY") Fixes: b5947e5d1e71 ("udp: msg_zerocopy") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
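The reproducer sequence quoted in this commit message can be written out as a small script. This is only a sketch: the numeric fallbacks for SO_TIMESTAMPING, SO_ZEROCOPY and MSG_ZEROCOPY are the generic Linux values and are an assumption on other architectures, and the function name reproduce() is made up for illustration.

```python
import socket

# Linux socket constants. Python may not export all of them, so fall back to
# the generic Linux values (assumption: not valid on every architecture).
SO_TIMESTAMPING = getattr(socket, "SO_TIMESTAMPING", 37)
SO_ZEROCOPY = getattr(socket, "SO_ZEROCOPY", 60)
MSG_ZEROCOPY = getattr(socket, "MSG_ZEROCOPY", 0x4000000)
SOF_TIMESTAMPING_TX_SOFTWARE = 1 << 1


def reproduce(addr=("127.0.0.1", 53)):
    """Run the sequence from the commit message (hypothetical helper name)."""
    sk = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, 0)
    try:
        # Request software TX timestamps, which are queued on the error queue.
        sk.setsockopt(socket.SOL_SOCKET, SO_TIMESTAMPING,
                      SOF_TIMESTAMPING_TX_SOFTWARE)
        # Enable MSG_ZEROCOPY on the socket.
        sk.setsockopt(socket.SOL_SOCKET, SO_ZEROCOPY, 1)
        # Send an empty zerocopy datagram to a local address.
        sk.sendto(b"", MSG_ZEROCOPY, addr)
    finally:
        # Close without draining the error queue.
        sk.close()
```

Calling reproduce() on a Linux test machine exercises the path described above. The script itself cannot tell whether anything leaked; that only shows up in a kernel-side report like the one quoted in [0].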
2023-04-25Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-12/+11
Pull vfs fget updates from Al Viro: "fget() to fdget() conversions" * tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fuse_dev_ioctl(): switch to fdget() cgroup_get_from_fd(): switch to fdget_raw() bpf: switch to fdget_raw() build_mount_idmapped(): switch to fdget() kill the last remaining user of proc_ns_fget() SVM-SEV: convert the rest of fget() uses to fdget() in there convert sgx_set_attribute() to fdget()/fdput() convert setns(2) to fdget()/fdput()
2023-04-25net: ethtool: coalesce: try to make user settings stick twiceJakub Kicinski1-11/+43
SET_COALESCE may change operation mode and parameters in one call. Changing operation mode may cause the driver to reset the parameter values to what is a reasonable default for the new operation mode. Since the driver does not know which parameters come from the user and which are echoed back from ->get, it may ignore the parameters when switching operation modes. This used to be inevitable for ioctl() but in netlink we know which parameters are actually specified by the user. We could tell the driver which parameters were set by the user but this would lead to a lot of code duplication in the drivers. Instead try to call the drivers twice if both mode and params are changed. The set method already checks if any params need updating so if the driver did the right thing the first time around, there will be no second call to its ->set method (only an extra call to ->get()). For mlx5 for example before this patch we'd see: # ethtool -C eth0 adaptive-rx on adaptive-tx on # ethtool -C eth0 adaptive-rx off adaptive-tx off \ tx-usecs 123 rx-usecs 123 Adaptive RX: off TX: off rx-usecs: 3 rx-frames: 32 tx-usecs: 16 tx-frames: 32 [...] After the change: # ethtool -C eth0 adaptive-rx on adaptive-tx on # ethtool -C eth0 adaptive-rx off adaptive-tx off \ tx-usecs 123 rx-usecs 123 Adaptive RX: off TX: off rx-usecs: 123 rx-frames: 32 tx-usecs: 123 tx-frames: 32 [...] This only works for netlink, so it's a small discrepancy between netlink and ioctl(). Since we anticipate most users to move to netlink I believe it's worth making their lives easier. Link: https://lore.kernel.org/r/20230420233302.944382-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
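The "call the driver twice" idea can be illustrated with a toy model. Everything here is hypothetical (ToyDriver, set_coalesce, the default values); it is not the ethtool netlink code, and it simplifies the retry condition to "the driver state still differs from what the user asked for".

```python
class ToyDriver:
    """Hypothetical driver that loads defaults whenever the mode changes."""

    DEFAULTS = {
        True: {"rx_usecs": 8, "tx_usecs": 8},    # adaptive on (made-up values)
        False: {"rx_usecs": 3, "tx_usecs": 16},  # adaptive off
    }

    def __init__(self):
        self.state = {"adaptive": True, "rx_usecs": 8, "tx_usecs": 8}
        self.set_calls = 0

    def get(self):
        return dict(self.state)

    def set(self, cfg):
        self.set_calls += 1
        if cfg["adaptive"] != self.state["adaptive"]:
            # Mode switch: the passed parameters are ignored.
            self.state = {"adaptive": cfg["adaptive"],
                          **self.DEFAULTS[cfg["adaptive"]]}
        else:
            self.state = dict(cfg)


def set_coalesce(driver, user_request):
    """Apply the user's keys; try a second time if they did not stick."""
    for _ in range(2):
        current = driver.get()
        wanted = {**current, **user_request}
        if wanted == current:
            break  # nothing left to update, so no further ->set call
        driver.set(wanted)
    return driver.get()
```

With a request that turns adaptive mode off and sets both usecs values to 123, the first set() only switches the mode and the second one applies the values. A driver that honours everything on the first call sees just one set() and an extra get().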
2023-04-25netlink: Use copy_to_user() for optval in netlink_getsockopt().Kuniyuki Iwashima1-52/+23
Brad Spencer provided a detailed report [0] that when calling getsockopt() for AF_NETLINK, some SOL_NETLINK options set only 1 byte even though such options require at least sizeof(int) as length. The options return a flag value that fits into 1 byte, but such behaviour confuses users who do not initialise the variable before calling getsockopt() and do not strictly check the returned value as char. Currently, netlink_getsockopt() uses put_user() to copy data to optlen and optval, but put_user() casts the data based on the pointer, char *optval. As a result, only 1 byte is set to optval. To avoid this behaviour, we need to use copy_to_user() or cast optval for put_user(). Note that this changes the behaviour on big-endian systems, but we document that the size of optval is int in the man page. $ man 7 netlink ... Socket options To set or get a netlink socket option, call getsockopt(2) to read or setsockopt(2) to write the option with the option level argument set to SOL_NETLINK. Unless otherwise noted, optval is a pointer to an int. Fixes: 9a4595bc7e67 ("[NETLINK]: Add set/getsockopt options to support more than 32 groups") Fixes: be0c22a46cfb ("netlink: add NETLINK_BROADCAST_ERROR socket option") Fixes: 38938bfe3489 ("netlink: add NETLINK_NO_ENOBUFS socket flag") Fixes: 0a6a3a23ea6e ("netlink: add NETLINK_CAP_ACK socket option") Fixes: 2d4bc93368f5 ("netlink: extended ACK reporting") Fixes: 89d35528d17d ("netlink: Add new socket option to enable strict checking on dumps") Reported-by: Brad Spencer <bspencer@blackberry.com> Link: https://lore.kernel.org/netdev/ZD7VkNWFfp22kTDt@datsun.rim.net/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Link: https://lore.kernel.org/r/20230421185255.94606-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
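The effect on a caller that does not initialise its int can be modelled in userspace with a byte buffer. The helper names below are made up for illustration; they only imitate the one-byte store versus the full-int copy and are not the kernel functions themselves.

```python
import struct


def put_user_char(buf, value):
    """Model a store through a char pointer: only the first byte is written."""
    buf[0:1] = struct.pack("B", value & 0xFF)


def copy_to_user_int(buf, value):
    """Model copying a whole native int: all four bytes are written."""
    buf[0:4] = struct.pack("=i", value)


def read_int(buf):
    """Read the buffer the way a caller with an int optval would."""
    return struct.unpack("=i", bytes(buf[0:4]))[0]


# An "uninitialised" int: four bytes of stack garbage.
optval = bytearray(b"\xaa\xaa\xaa\xaa")
put_user_char(optval, 1)
garbled = read_int(optval)   # three garbage bytes survive, so this is not 1

optval = bytearray(b"\xaa\xaa\xaa\xaa")
copy_to_user_int(optval, 1)
clean = read_int(optval)     # every byte was overwritten, so this is 1
```

On either byte order the one-byte store leaves three garbage bytes behind, so the caller reads something other than 1 unless it zeroed the variable first or reads it as a char.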
2023-04-25Merge tag 'nf-next-23-04-22' of ↵Jakub Kicinski9-345/+432
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next 1) Reduce jumpstack footprint: Stash chain in last rule marker in blob for tracing. Remove last rule and chain from jumpstack. From Florian Westphal. 2) nf_tables validates all tables before committing the new rules. Unfortunately, this has two drawbacks: - Since addition of the transaction mutex pernet state gets written to outside of the locked section from the cleanup callback, this is wrong so do this cleanup directly after table has passed all checks. - Revalidate tables that saw no changes. This can be avoided by keeping the validation state per table, not per netns. From Florian Westphal. 3) Get rid of a few redundant pointers in the traceinfo structure. The three removed pointers are used in the expression evaluation loop, so gcc keeps them in registers. Passing them to the (inlined) helpers thus doesn't increase nft_do_chain text size, while stack is reduced by another 24 bytes on 64bit arches. From Florian Westphal. 4) IPVS cleanups in several ways without implementing any functional changes, aside from removing some debugging output: - Update width of source for ip_vs_sync_conn_options The operation is safe, use an annotation to describe it properly. - Consistently use array_size() in ip_vs_conn_init() It seems better to use helpers consistently. - Remove {Enter,Leave}Function. These seem to be well past their use-by date. - Correct spelling in comments. From Simon Horman. 5) Extended netlink error report for netdevice in flowtables and netdev/chains. Allow for incrementally add/delete devices to netdev basechain. Allow to create netdev chain without device. 
* tag 'nf-next-23-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: allow to create netdev chain without device netfilter: nf_tables: support for deleting devices in an existing netdev chain netfilter: nf_tables: support for adding new devices to an existing netdev chain netfilter: nf_tables: rename function to destroy hook list netfilter: nf_tables: do not send complete notification of deletions netfilter: nf_tables: extended netlink error reporting for netdevice ipvs: Correct spelling in comments ipvs: Remove {Enter,Leave}Function ipvs: Consistently use array_size() in ip_vs_conn_init() ipvs: Update width of source for ip_vs_sync_conn_options netfilter: nf_tables: do not store rule in traceinfo structure netfilter: nf_tables: do not store verdict in traceinfo structure netfilter: nf_tables: do not store pktinfo in traceinfo structure netfilter: nf_tables: remove unneeded conditional netfilter: nf_tables: make validation state per table netfilter: nf_tables: don't write table validation state without mutex netfilter: nf_tables: don't store chain address on jump netfilter: nf_tables: don't store address of last rule on jump netfilter: nf_tables: merge nft_rules_old structure and end of ruleblob marker ==================== Link: https://lore.kernel.org/r/20230421235021.216950-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-24Merge tag 'rcu.6.4.april5.2023.3' of ↵Linus Torvalds2-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux Pull RCU updates from Joel Fernandes: - Updates and additions to MAINTAINERS files, with Boqun being added to the RCU entry and Zqiang being added as an RCU reviewer. I have also transitioned from reviewer to maintainer; however, Paul will be taking over sending RCU pull-requests for the next merge window. - Resolution of hotplug warning in nohz code, achieved by fixing cpu_is_hotpluggable() through interaction with the nohz subsystem. Tick dependency modifications by Zqiang, focusing on fixing usage of the TICK_DEP_BIT_RCU_EXP bitmask. - Avoid needless calls to the rcu-lazy shrinker for CONFIG_RCU_LAZY=n kernels, fixed by Zqiang. - Improvements to rcu-tasks stall reporting by Neeraj. - Initial renaming of k[v]free_rcu() to k[v]free_rcu_mightsleep() for increased robustness, affecting several components like mac802154, drbd, vmw_vmci, tracing, and more. A report by Eric Dumazet showed that the API could be unknowingly used in an atomic context, so we'd rather make sure they know what they're asking for by being explicit: https://lore.kernel.org/all/20221202052847.2623997-1-edumazet@google.com/ - Documentation updates, including corrections to spelling, clarifications in comments, and improvements to the srcu_size_state comments. - Better srcu_struct cache locality for readers, by adjusting the size of srcu_struct in support of SRCU usage by Christoph Hellwig. - Teach lockdep to detect deadlocks between srcu_read_lock() vs synchronize_srcu() contributed by Boqun. Previously lockdep could not detect such deadlocks, now it can. 
- Integration of rcutorture and rcu-related tools, targeted for v6.4 from Boqun's tree, featuring new SRCU deadlock scenarios, test_nmis module parameter, and more - Miscellaneous changes, various code cleanups and comment improvements * tag 'rcu.6.4.april5.2023.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux: (71 commits) checkpatch: Error out if deprecated RCU API used mac802154: Rename kfree_rcu() to kvfree_rcu_mightsleep() rcuscale: Rename kfree_rcu() to kfree_rcu_mightsleep() ext4/super: Rename kfree_rcu() to kfree_rcu_mightsleep() net/mlx5: Rename kfree_rcu() to kfree_rcu_mightsleep() net/sysctl: Rename kvfree_rcu() to kvfree_rcu_mightsleep() lib/test_vmalloc.c: Rename kvfree_rcu() to kvfree_rcu_mightsleep() tracing: Rename kvfree_rcu() to kvfree_rcu_mightsleep() misc: vmw_vmci: Rename kvfree_rcu() to kvfree_rcu_mightsleep() drbd: Rename kvfree_rcu() to kvfree_rcu_mightsleep() rcu: Protect rcu_print_task_exp_stall() ->exp_tasks access rcu: Avoid stack overflow due to __rcu_irq_enter_check_tick() being kprobe-ed rcu-tasks: Report stalls during synchronize_srcu() in rcu_tasks_postscan() rcu: Permit start_poll_synchronize_rcu_expedited() to be invoked early rcu: Remove never-set needwake assignment from rcu_report_qs_rdp() rcu: Register rcu-lazy shrinker only for CONFIG_RCU_LAZY=y kernels rcu: Fix missing TICK_DEP_MASK_RCU_EXP dependency check rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race rcu/trace: use strscpy() to instead of strncpy() tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem ...
2023-04-24Merge branches 'acpi-bus', 'acpi-video' and 'acpi-misc'Rafael J. Wysocki1-0/+1
Merge ACPI bus type driver changes, ACPI backlight driver updates and a series of cleanups related to of.h for 6.4-rc1: - Ensure that ACPI notify handlers are not running after removal and clean up code in acpi_sb_notify() (Rafael Wysocki). - Remove register_backlight_delay module option and code and remove quirks for false-positive backlight control support advertised on desktop boards (Hans de Goede). - Replace irqdomain.h include with struct declarations in ACPI headers and update several pieces of code previously including of.h implicitly through those headers (Rob Herring). * acpi-bus: ACPI: bus: Ensure that notify handlers are not running after removal ACPI: bus: Add missing braces to acpi_sb_notify() * acpi-video: ACPI: video: Remove desktops without backlight DMI quirks ACPI: video: Remove register_backlight_delay module option and code * acpi-misc: ACPI: Replace irqdomain.h include with struct declarations fpga: lattice-sysconfig-spi: Add explicit include for of.h tpm: atmel: Add explicit include for of.h virtio-mmio: Add explicit include for of.h pata: ixp4xx: Add explicit include for of.h ata: pata_macio: Add explicit include of irqdomain.h serial: 8250_tegra: Add explicit include for of.h net: rfkill-gpio: Add explicit include for of.h staging: iio: resolver: ad2s1210: Add explicit include for of.h iio: adc: ad7292: Add explicit include for of.h
2023-04-24Bluetooth: hci_sync: Only allow hci_cmd_sync_queue if runningLuiz Augusto von Dentz2-6/+31
This makes sure hci_cmd_sync_queue only queues new work if HCI_RUNNING has been set; otherwise there is a risk of commands being sent while turning off. Because hci_cmd_sync_queue can no longer queue work while HCI_RUNNING is not set, it cannot be used to power on adapters, so hci_cmd_sync_submit is introduced instead. It bypasses the HCI_RUNNING check and so behaves like the old implementation. Link: https://lore.kernel.org/all/CAB4PzUpDMvdc8j2MdeSAy1KkAE-D3woprCwAdYWeOc-3v3c9Sw@mail.gmail.com/ Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
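The split between the two entry points can be sketched with a toy object. ToyHciDev and its method names are illustrative only; the real functions take more arguments and report failure with an error code rather than a boolean.

```python
class ToyHciDev:
    """Hypothetical stand-in for an adapter with a sync work queue."""

    def __init__(self):
        self.running = False   # models the HCI_RUNNING flag
        self.work = []

    def cmd_sync_submit(self, name):
        # No running check: usable while the adapter is still powered off.
        self.work.append(name)
        return True

    def cmd_sync_queue(self, name):
        # Refuse new work unless the adapter is running.
        if not self.running:
            return False
        return self.cmd_sync_submit(name)
```

In this model a power-on request has to go through cmd_sync_submit(), while ordinary commands use cmd_sync_queue() and are rejected once the adapter is no longer running.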
2023-04-24Bluetooth: Cancel sync command before suspend and power offArchie Pusaka2-0/+7
Some of the sync commands might take a long time to complete, e.g. LE Create Connection when the peer device isn't responding might take 20 seconds before it times out. If a suspend command is issued during this time, it will need to wait for completion since both commands are using the same sync lock. This patch cancels any running sync commands before attempting to suspend or power off the adapter. Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reviewed-by: Ying Hsu <yinghsu@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-24Bluetooth: Devcoredump: Fix storing u32 without specifying byte order issueZijun Hu1-6/+7
hci_devcd_init() stores its u32 parameter @dump_size into an skb without specifying the byte order used for the integer. Use little endian to both store and parse it. Fixes: f5cc609d09d4 ("Bluetooth: Add support for hci devcoredump") Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
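Why an explicit byte order matters can be shown with a short userspace sketch. The function names are made up; they only mirror the idea of storing and parsing a 32-bit size as little endian.

```python
import struct


def store_dump_size(dump_size):
    """Serialise a 32-bit size with an explicit little-endian layout."""
    return struct.pack("<I", dump_size)


def parse_dump_size(data):
    """Parse the size back using the same explicit layout."""
    return struct.unpack("<I", data[:4])[0]


raw = store_dump_size(0x12345678)
size = parse_dump_size(raw)

# Reading the same bytes with the other byte order gives a different number.
misread = struct.unpack(">I", raw)[0]
```

As long as the writer and the reader agree on the layout, the value round-trips on any host; if either side silently uses native order on a big-endian machine, the size is misread.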
2023-04-24bluetooth: Perform careful capability checks in hci_sock_ioctl()Ruihan Li1-1/+8
Previously, capability was checked using capable(), which verified that the caller of the ioctl system call had the required capability. In addition, the result of the check would be stored in the HCI_SOCK_TRUSTED flag, making it persistent for the socket. However, malicious programs can abuse this approach by deliberately sharing an HCI socket with a privileged task. The HCI socket will be marked as trusted when the privileged task occasionally makes an ioctl call. This problem can be solved by using sk_capable() to check capability, which ensures that not only the current task but also the socket opener has the specified capability, thus reducing the risk of privilege escalation through the previously identified vulnerability. Cc: stable@vger.kernel.org Fixes: f81f5b2db869 ("Bluetooth: Send control open and close messages for HCI raw sockets") Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
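The difference between checking only the caller and checking both the caller and the socket opener can be modelled as follows. This is a toy: Task, Sock and the string capability name are illustrative, and the functions only imitate the semantics described in the commit message.

```python
from dataclasses import dataclass

CAP_NET_ADMIN = "CAP_NET_ADMIN"  # illustrative capability name


@dataclass(frozen=True)
class Task:
    name: str
    caps: frozenset = frozenset()


@dataclass(frozen=True)
class Sock:
    opener: Task  # the task that created the socket


def capable(current, cap):
    """Old check: only the task making the ioctl call matters."""
    return cap in current.caps


def sk_capable(sock, current, cap):
    """New check: the socket opener and the current task must both qualify."""
    return cap in sock.opener.caps and cap in current.caps


attacker = Task("attacker")
helper = Task("privileged-helper", frozenset({CAP_NET_ADMIN}))
shared = Sock(opener=attacker)  # opened unprivileged, then passed to helper
```

When the privileged helper makes an ioctl call on the shared socket, the old check passes and the new one does not, because the opener never had the capability.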
2023-04-24Bluetooth: L2CAP: fix "bad unlock balance" in l2cap_disconnect_rspMin Li1-1/+0
conn->chan_lock isn't acquired before l2cap_get_chan_by_scid is called, so if l2cap_get_chan_by_scid returns NULL, a 'bad unlock balance' is triggered. Reported-by: syzbot+9519d6b5b79cf7787cf3@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/000000000000894f5f05f95e9f4d@google.com/ Signed-off-by: Min Li <lm0963hack@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-24bluetooth: Add cmd validity checks at the start of hci_sock_ioctl()Ruihan Li1-0/+28
Previously, channel open messages were always sent to monitors on the first ioctl() call for unbound HCI sockets, even if the command and arguments were completely invalid. This can leave an exploitable hole with the abuse of invalid ioctl calls. This commit hardens the ioctl processing logic by first checking if the command is valid, and immediately returning with an ENOIOCTLCMD error code if it is not. This ensures that ioctl calls with invalid commands are free of side effects, and increases the difficulty of further exploitation by forcing exploitation to find a way to pass a valid command first. Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn> Co-developed-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-24Bluetooth: Add new quirk for broken set random RPA timeout for ATS2851Raul Cheleguini1-1/+5
The ATS2851 based controller advertises support for the command "LE Set Resolvable Private Address Timeout" but does not actually implement it, which blocks controller initialization. Add the quirk HCI_QUIRK_BROKEN_SET_RPA_TIMEOUT to unblock the controller initialization. < HCI Command: LE Set Resolvable Private... (0x08|0x002e) plen 2 Timeout: 900 seconds > HCI Event: Command Status (0x0f) plen 4 LE Set Resolvable Private Address Timeout (0x08|0x002e) ncmd 1 Status: Unknown HCI Command (0x01) Co-developed-by: imoc <wzj9912@gmail.com> Signed-off-by: imoc <wzj9912@gmail.com> Signed-off-by: Raul Cheleguini <raul.cheleguini@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-24Bluetooth: hci_conn: Fix not waiting for HCI_EVT_LE_CIS_ESTABLISHEDLuiz Augusto von Dentz2-57/+66
When submitting HCI_OP_LE_CREATE_CIS the code shall wait for HCI_EVT_LE_CIS_ESTABLISHED, thus enforcing the serialization of HCI_OP_LE_CREATE_CIS, as the Core spec does not allow sending them in parallel: BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E page 2566: If the Host issues this command before all the HCI_LE_CIS_Established events from the previous use of the command have been generated, the Controller shall return the error code Command Disallowed (0x0C). Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-24Bluetooth: hci_conn: Fix not matching by CIS IDLuiz Augusto von Dentz1-1/+2
Previously a CIS was matched by address only, which prevented creating a new hcon when the upper layer requested a specific CIS ID. Match by CIS ID as well. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-24Bluetooth: hci_conn: Add support for linking multiple hconLuiz Augusto von Dentz3-95/+160
Some configurations require multiple CIS with the same peer. This is now covered by iso-tester in the following test cases: ISO AC 6(i) - Success ISO AC 7(i) - Success ISO AC 8(i) - Success ISO AC 9(i) - Success ISO AC 11(i) - Success Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-24Bluetooth: hci_conn: remove extra line in hci_le_big_create_syncIulia Tanasescu1-1/+0
Remove extra line setting the broadcast code parameter of the hci_cp_le_create_big struct to 0. The broadcast code is copied from the QoS struct. Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-24Bluetooth: fix inconsistent indentingLanzhe Li1-1/+1
Fixed a wrong indentation before "return". This line uses a 7-space indent instead of a tab. Signed-off-by: Lanzhe Li <u202212060@hust.edu.cn> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-24Bluetooth: Enable all supported LE PHY by defaultLuiz Augusto von Dentz2-6/+26
This enables 2M and Coded PHY by default if they are marked as supported in the LE features bits. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-24Bluetooth: Split bt_iso_qos into dedicated structuresIulia Tanasescu3-118/+202
Split bt_iso_qos into dedicated unicast and broadcast structures and add additional broadcast parameters. Fixes: eca0ae4aea66 ("Bluetooth: Add initial implementation of BIS connections") Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-24Bluetooth: Add support for hci devcoredumpAbhishek Pandit-Subedi4-0/+540
Add devcoredump APIs to hci core so that drivers only have to provide the dump skbs instead of managing the synchronization and timeouts. The devcoredump APIs should be used in the following manner: - hci_devcoredump_init is called to allocate the dump. - hci_devcoredump_append is called to append any skbs with dump data OR hci_devcoredump_append_pattern is called to insert a pattern. - hci_devcoredump_complete is called when all dump packets have been sent OR hci_devcoredump_abort is called to indicate an error and cancel an ongoing dump collection. The high level APIs just prepare some skbs with the appropriate data and queue it for the dump to process. Packets part of the crashdump can be intercepted in the driver in interrupt context and forwarded directly to the devcoredump APIs. Internally, there are 5 states for the dump: idle, active, complete, abort and timeout. A devcoredump will only be in active state after it has been initialized. Once active, it accepts data to be appended, patterns to be inserted (i.e. memset) and a completion event or an abort event to generate a devcoredump. The timeout is initialized at the same time the dump is initialized (defaulting to 10s) and will be cleared either when the timeout occurs or the dump is complete or aborted. Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Manish Mandlik <mmandlik@google.com> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
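The five states and the calling order described above can be sketched as a small state machine. This is a toy model under stated assumptions: ToyDevcoredump and its method names are illustrative, the timer is reduced to an explicit timeout() call, and the real APIs queue skbs rather than bytes.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    ACTIVE = "active"
    COMPLETE = "complete"
    ABORT = "abort"
    TIMEOUT = "timeout"


class ToyDevcoredump:
    """Hypothetical model: data is only accepted while the dump is active."""

    def __init__(self):
        self.state = State.IDLE
        self.data = bytearray()

    def init(self):
        if self.state is not State.IDLE:
            return False
        self.state = State.ACTIVE
        self.data = bytearray()
        return True

    def append(self, chunk):
        if self.state is not State.ACTIVE:
            return False
        self.data += chunk
        return True

    def append_pattern(self, byte, length):
        # Insert `length` copies of one byte, like a memset over the dump.
        return self.append(bytes([byte]) * length)

    def _finish(self, final_state):
        if self.state is not State.ACTIVE:
            return False
        self.state = final_state
        return True

    def complete(self):
        return self._finish(State.COMPLETE)

    def abort(self):
        return self._finish(State.ABORT)

    def timeout(self):
        return self._finish(State.TIMEOUT)
```

A driver-side sequence in this model is init(), any number of append() or append_pattern() calls, then complete() or abort(). Calls made outside the active state are rejected.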
2023-04-24Bluetooth: Add new quirk for broken local ext features page 2Vasily Khoruzhick1-2/+7
Some adapters (e.g. RTL8723CS) advertise that they have more than 2 pages for local ext features, but they don't support any features declared in these pages. RTL8723CS reports max_page = 2 and declares support for sync train and secure connection, but it responds with either garbage or with error in status on corresponding commands. Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Bastian Germann <bage@debian.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-24Bluetooth: L2CAP: Delay identity address updatesLuiz Augusto von Dentz2-5/+11
This delays the identity address updates to give userspace time to process the new address; otherwise there is a risk that userspace creates a duplicate device if the MGMT event is delayed for some reason. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-24Bluetooth: hci_sync: Remove duplicate statementInga Stotland1-1/+0
This removes the following duplicate statement in hci_le_ext_directed_advertising_sync(): cp.own_addr_type = own_addr_type; Signed-off-by: Inga Stotland <inga.stotland@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-24Bluetooth: Convert MSFT filter HCI cmd to hci_syncBrian Gix1-25/+11
The msft_set_filter_enable() command was using the deprecated hci_request mechanism rather than hci_sync. This caused the warning error: hci0: HCI_REQ-0xfcf0 Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-24Bluetooth: hci_sync: Don't wait peer's reply when powering offArchie Pusaka1-3/+5
Currently, when we initiate disconnection, we wait for the peer's reply except when we are suspending, where we fire and forget the disconnect request. A similar case is when the adapter is powering off. However, we still wait for the peer's reply in this case. Therefore, if the peer is unresponsive, the command will time out and the power off sequence will fail, which users see as "bluetooth powered on by itself". This patch makes the host not wait for the peer's reply when the disconnection reason is powering off. Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-24Bluetooth: hci_sync: Fix smatch warningLuiz Augusto von Dentz1-1/+1
This fixes the following new warning: net/bluetooth/hci_sync.c:2403 hci_pause_addr_resolution() warn: missing error code? 'err' Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/202302251952.xryXOegd-lkp@intel.com/ Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23net/sched: sch_qfq: refactor parsing of netlink parametersPedro Tammela1-14/+11
Two parameters can be transformed into netlink policies and validated while parsing the netlink message. Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net/sched: sch_qfq: use extack on errors messagesPedro Tammela1-4/+5
Some error messages are still being printed to dmesg. Since extack is available, provide error messages there. Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net/sched: sch_htb: use extack on errors messagesPedro Tammela1-8/+9
Some error messages are still being printed to dmesg. Since extack is available, provide error messages there. Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net/sched: act_pedit: rate limit datapath messagesPedro Tammela1-7/+5
Unbounded info messages in the pedit datapath can flood the printk ring buffer quite easily depending on the action created. As these messages are informational, usually printing some, not all, is enough to bring attention to the real issue. Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
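The idea of printing some, but not all, of the messages can be illustrated with a small user-space sketch. This is not the code from the patch: `sketch_ratelimit` and `sketch_ratelimit_ok` are hypothetical names, and the interval and burst values are assumptions chosen by the caller. It allows at most `burst` messages per `interval_ms` window and suppresses the rest.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter state; not a kernel structure. */
struct sketch_ratelimit {
    uint64_t interval_ms;     /* length of one window */
    unsigned int burst;       /* messages allowed per window */
    uint64_t window_start_ms; /* start time of the current window */
    unsigned int printed;     /* messages emitted in the current window */
};

/* Returns true if a message may be printed at time now_ms. */
static bool sketch_ratelimit_ok(struct sketch_ratelimit *rl, uint64_t now_ms)
{
    /* Open a new window once the current one has expired. */
    if (now_ms - rl->window_start_ms >= rl->interval_ms) {
        rl->window_start_ms = now_ms;
        rl->printed = 0;
    }
    /* Suppress the message if the window's budget is used up. */
    if (rl->printed >= rl->burst)
        return false;
    rl->printed++;
    return true;
}
```

A datapath caller would guard each informational print with `sketch_ratelimit_ok()`, so a misconfigured action produces a few messages per window instead of one per packet.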
2023-04-23net/sched: act_pedit: remove extra check for key typePedro Tammela1-22/+7
The netlink parsing already validates the key 'htype'. Remove the datapath check as it's redundant. Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net/sched: act_pedit: check static offsets a prioriPedro Tammela1-6/+14
Static key offsets should always be on 32-bit boundaries. Validate static offsets at create/update time and restrict the datapath validation to runtime offsets only. iproute2 already errors out if a given offset and data size cannot be packed to a 32-bit boundary. This change makes sure that users who create/update pedit instances directly via netlink also get an error, instead of finding out when packets are traversing. Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
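A minimal user-space sketch of the control-path check described above, under stated assumptions: `sketch_key` and `sketch_validate_keys` are hypothetical and do not mirror the kernel's pedit structures. Static offsets are rejected up front unless they are a multiple of 4 bytes, while keys flagged as runtime offsets are skipped because they can only be checked per packet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one pedit key; not the kernel's layout. */
struct sketch_key {
    uint32_t off;     /* byte offset into the header */
    bool runtime_off; /* true if the offset is computed per packet */
};

/* Returns 0 if every static offset sits on a 32-bit boundary, -1 otherwise. */
static int sketch_validate_keys(const struct sketch_key *keys, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Runtime offsets are only known in the datapath; skip them here. */
        if (keys[i].runtime_off)
            continue;
        /* A 32-bit boundary means the byte offset is a multiple of 4. */
        if (keys[i].off % 4 != 0)
            return -1;
    }
    return 0;
}
```

Failing at create/update time lets the caller report the error to the user immediately, rather than silently skipping edits while packets are traversing.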
2023-04-23net/sched: act_pedit: use extack in 'ex' parsing errorsPedro Tammela1-4/+13
We have extack available when parsing 'ex' keys, so pass it to tcf_pedit_keys_ex_parse and add more detailed error messages. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net/sched: act_pedit: use NLA_POLICY for parsing 'ex' keysPedro Tammela1-8/+3
Transform two checks in the 'ex' key parsing into netlink policies removing extra if checks. Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net: sched: Print msecs when transmit queue time outYajun Deng1-5/+5
The kernel prints several warnings in a short period of time when it stalls. For example:
First warning:
[ 7100.097547] ------------[ cut here ]------------
[ 7100.097550] NETDEV WATCHDOG: eno2 (xxx): transmit queue 8 timed out
[ 7100.097571] WARNING: CPU: 8 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x260/0x270
...
Second warning:
[ 7147.756952] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 7147.756958] rcu: 24-....: (59999 ticks this GP) idle=546/1/0x400000000000000 softirq=3673137/3673146 fqs=13844
[ 7147.756960] (t=60001 jiffies g=4322709 q=133381)
[ 7147.756962] NMI backtrace for cpu 24
...
From watchdog_timeo, we calculate that the transmit queue stall should have started before 7095s, while the RCU stall started at 7087s. These two times are close together, so it is difficult to confirm which happened first. To let users know the exact time the stall started, print the elapsed msecs when the transmit queue times out.
Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
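The arithmetic behind reporting the stall duration can be sketched in user space as follows. The helper names are hypothetical, and the tick rate `hz` is passed explicitly as an assumption; in the kernel the conversion would normally go through `jiffies_to_msecs()`. The sketch converts the ticks elapsed since the last transmit start into milliseconds.

```c
#include <stdint.h>

/* Convert a tick count to milliseconds for a given tick rate (ticks per second). */
static uint32_t sketch_ticks_to_msecs(uint32_t ticks, uint32_t hz)
{
    /* Widen to 64 bits so ticks * 1000 cannot overflow. */
    return (uint32_t)(((uint64_t)ticks * 1000u) / hz);
}

/* Milliseconds elapsed between trans_start and now, both in ticks. */
static uint32_t sketch_stall_msecs(uint32_t now, uint32_t trans_start, uint32_t hz)
{
    /* Unsigned subtraction stays correct across counter wraparound. */
    return sketch_ticks_to_msecs(now - trans_start, hz);
}
```

With a tick rate of 250, for example, 1250 elapsed ticks correspond to 5000 ms, which can be subtracted from the warning's timestamp to estimate when the queue actually stopped making progress.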