Age | Commit message (Collapse) | Author | Files | Lines |
|
There are cases where the device is adminstratively UP, but operationally
down. For example, we have a physical device (Nvidia ConnectX-6 Dx, 25Gbps)
who's cable was pulled out, here is its ip link output:
5: ens2f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
link/ether b8:ce:f6:4b:68:35 brd ff:ff:ff:ff:ff:ff
altname enp179s0f1np1
As you can see, it's administratively UP but operationally down.
In this case, sending a packet to this port caused a nasty kernel hang (so
nasty that we were unable to capture it). Aborting a transmit based on
operational status (in addition to administrative status) fixes the issue.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
v1->v2: Add fixes tag
v2->v3: Remove blank line between tags + add change log, suggested by Leon
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The skb hash comes from sk->sk_txhash when using TCP, except for some
IPv6 RST packets. This is because in tcp_v6_send_reset when not in
TIME_WAIT the hash is taken from sk->sk_hash, while it should come from
sk->sk_txhash as those two hashes are not computed the same way.
Packetdrill script to test the above,
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0 > (flowlabel 0x1) S 0:0(0) <...>
// Wrong ack seq, trigger a rst.
+0 < S. 0:0(0) ack 0 win 4000
// Check the flowlabel matches prior one from SYN.
+0 > (flowlabel 0x1) R 0:0(0) <...>
Fixes: 9258b8b1be2e ("ipv6: tcp: send consistent autoflowlabel in RST packets")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a tunnel device is bound with the underlying device, its
dev->needed_headroom needs to be updated properly. IPv4 tunnels
already do the same in ip_tunnel_bind_dev(). Otherwise we may
not have enough header room for skb, especially after commit
b17f709a2401 ("gue: TX support for using remote checksum offload option").
Fixes: 32b8a8e59c9c ("sit: add IPv4 over IPv4 support")
Reported-by: Palash Oswal <oswalpalash@gmail.com>
Link: https://lore.kernel.org/netdev/CAGyP=7fDcSPKu6nttbGwt7RXzE3uyYxLjCSE97J64pRxJP8jPA@mail.gmail.com/
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Error handler of tcf_block_bind() frees the whole bo->cb_list on error.
However, by that time the flow_block_cb instances are already in the driver
list because driver ndo_setup_tc() callback is called before that up the
call chain in tcf_block_offload_cmd(). This leaves dangling pointers to
freed objects in the list and causes use-after-free[0]. Fix it by also
removing flow_block_cb instances from driver_list before deallocating them.
[0]:
[ 279.868433] ==================================================================
[ 279.869964] BUG: KASAN: slab-use-after-free in flow_block_cb_setup_simple+0x631/0x7c0
[ 279.871527] Read of size 8 at addr ffff888147e2bf20 by task tc/2963
[ 279.873151] CPU: 6 PID: 2963 Comm: tc Not tainted 6.3.0-rc6+ #4
[ 279.874273] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 279.876295] Call Trace:
[ 279.876882] <TASK>
[ 279.877413] dump_stack_lvl+0x33/0x50
[ 279.878198] print_report+0xc2/0x610
[ 279.878987] ? flow_block_cb_setup_simple+0x631/0x7c0
[ 279.879994] kasan_report+0xae/0xe0
[ 279.880750] ? flow_block_cb_setup_simple+0x631/0x7c0
[ 279.881744] ? mlx5e_tc_reoffload_flows_work+0x240/0x240 [mlx5_core]
[ 279.883047] flow_block_cb_setup_simple+0x631/0x7c0
[ 279.884027] tcf_block_offload_cmd.isra.0+0x189/0x2d0
[ 279.885037] ? tcf_block_setup+0x6b0/0x6b0
[ 279.885901] ? mutex_lock+0x7d/0xd0
[ 279.886669] ? __mutex_unlock_slowpath.constprop.0+0x2d0/0x2d0
[ 279.887844] ? ingress_init+0x1c0/0x1c0 [sch_ingress]
[ 279.888846] tcf_block_get_ext+0x61c/0x1200
[ 279.889711] ingress_init+0x112/0x1c0 [sch_ingress]
[ 279.890682] ? clsact_init+0x2b0/0x2b0 [sch_ingress]
[ 279.891701] qdisc_create+0x401/0xea0
[ 279.892485] ? qdisc_tree_reduce_backlog+0x470/0x470
[ 279.893473] tc_modify_qdisc+0x6f7/0x16d0
[ 279.894344] ? tc_get_qdisc+0xac0/0xac0
[ 279.895213] ? mutex_lock+0x7d/0xd0
[ 279.896005] ? __mutex_lock_slowpath+0x10/0x10
[ 279.896910] rtnetlink_rcv_msg+0x5fe/0x9d0
[ 279.897770] ? rtnl_calcit.isra.0+0x2b0/0x2b0
[ 279.898672] ? __sys_sendmsg+0xb5/0x140
[ 279.899494] ? do_syscall_64+0x3d/0x90
[ 279.900302] ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 279.901337] ? kasan_save_stack+0x2e/0x40
[ 279.902177] ? kasan_save_stack+0x1e/0x40
[ 279.903058] ? kasan_set_track+0x21/0x30
[ 279.903913] ? kasan_save_free_info+0x2a/0x40
[ 279.904836] ? ____kasan_slab_free+0x11a/0x1b0
[ 279.905741] ? kmem_cache_free+0x179/0x400
[ 279.906599] netlink_rcv_skb+0x12c/0x360
[ 279.907450] ? rtnl_calcit.isra.0+0x2b0/0x2b0
[ 279.908360] ? netlink_ack+0x1550/0x1550
[ 279.909192] ? rhashtable_walk_peek+0x170/0x170
[ 279.910135] ? kmem_cache_alloc_node+0x1af/0x390
[ 279.911086] ? _copy_from_iter+0x3d6/0xc70
[ 279.912031] netlink_unicast+0x553/0x790
[ 279.912864] ? netlink_attachskb+0x6a0/0x6a0
[ 279.913763] ? netlink_recvmsg+0x416/0xb50
[ 279.914627] netlink_sendmsg+0x7a1/0xcb0
[ 279.915473] ? netlink_unicast+0x790/0x790
[ 279.916334] ? iovec_from_user.part.0+0x4d/0x220
[ 279.917293] ? netlink_unicast+0x790/0x790
[ 279.918159] sock_sendmsg+0xc5/0x190
[ 279.918938] ____sys_sendmsg+0x535/0x6b0
[ 279.919813] ? import_iovec+0x7/0x10
[ 279.920601] ? kernel_sendmsg+0x30/0x30
[ 279.921423] ? __copy_msghdr+0x3c0/0x3c0
[ 279.922254] ? import_iovec+0x7/0x10
[ 279.923041] ___sys_sendmsg+0xeb/0x170
[ 279.923854] ? copy_msghdr_from_user+0x110/0x110
[ 279.924797] ? ___sys_recvmsg+0xd9/0x130
[ 279.925630] ? __perf_event_task_sched_in+0x183/0x470
[ 279.926656] ? ___sys_sendmsg+0x170/0x170
[ 279.927529] ? ctx_sched_in+0x530/0x530
[ 279.928369] ? update_curr+0x283/0x4f0
[ 279.929185] ? perf_event_update_userpage+0x570/0x570
[ 279.930201] ? __fget_light+0x57/0x520
[ 279.931023] ? __switch_to+0x53d/0xe70
[ 279.931846] ? sockfd_lookup_light+0x1a/0x140
[ 279.932761] __sys_sendmsg+0xb5/0x140
[ 279.933560] ? __sys_sendmsg_sock+0x20/0x20
[ 279.934436] ? fpregs_assert_state_consistent+0x1d/0xa0
[ 279.935490] do_syscall_64+0x3d/0x90
[ 279.936300] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 279.937311] RIP: 0033:0x7f21c814f887
[ 279.938085] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[ 279.941448] RSP: 002b:00007fff11efd478 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 279.942964] RAX: ffffffffffffffda RBX: 0000000064401979 RCX: 00007f21c814f887
[ 279.944337] RDX: 0000000000000000 RSI: 00007fff11efd4e0 RDI: 0000000000000003
[ 279.945660] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[ 279.947003] R10: 00007f21c8008708 R11: 0000000000000246 R12: 0000000000000001
[ 279.948345] R13: 0000000000409980 R14: 000000000047e538 R15: 0000000000485400
[ 279.949690] </TASK>
[ 279.950706] Allocated by task 2960:
[ 279.951471] kasan_save_stack+0x1e/0x40
[ 279.952338] kasan_set_track+0x21/0x30
[ 279.953165] __kasan_kmalloc+0x77/0x90
[ 279.954006] flow_block_cb_setup_simple+0x3dd/0x7c0
[ 279.955001] tcf_block_offload_cmd.isra.0+0x189/0x2d0
[ 279.956020] tcf_block_get_ext+0x61c/0x1200
[ 279.956881] ingress_init+0x112/0x1c0 [sch_ingress]
[ 279.957873] qdisc_create+0x401/0xea0
[ 279.958656] tc_modify_qdisc+0x6f7/0x16d0
[ 279.959506] rtnetlink_rcv_msg+0x5fe/0x9d0
[ 279.960392] netlink_rcv_skb+0x12c/0x360
[ 279.961216] netlink_unicast+0x553/0x790
[ 279.962044] netlink_sendmsg+0x7a1/0xcb0
[ 279.962906] sock_sendmsg+0xc5/0x190
[ 279.963702] ____sys_sendmsg+0x535/0x6b0
[ 279.964534] ___sys_sendmsg+0xeb/0x170
[ 279.965343] __sys_sendmsg+0xb5/0x140
[ 279.966132] do_syscall_64+0x3d/0x90
[ 279.966908] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 279.968407] Freed by task 2960:
[ 279.969114] kasan_save_stack+0x1e/0x40
[ 279.969929] kasan_set_track+0x21/0x30
[ 279.970729] kasan_save_free_info+0x2a/0x40
[ 279.971603] ____kasan_slab_free+0x11a/0x1b0
[ 279.972483] __kmem_cache_free+0x14d/0x280
[ 279.973337] tcf_block_setup+0x29d/0x6b0
[ 279.974173] tcf_block_offload_cmd.isra.0+0x226/0x2d0
[ 279.975186] tcf_block_get_ext+0x61c/0x1200
[ 279.976080] ingress_init+0x112/0x1c0 [sch_ingress]
[ 279.977065] qdisc_create+0x401/0xea0
[ 279.977857] tc_modify_qdisc+0x6f7/0x16d0
[ 279.978695] rtnetlink_rcv_msg+0x5fe/0x9d0
[ 279.979562] netlink_rcv_skb+0x12c/0x360
[ 279.980388] netlink_unicast+0x553/0x790
[ 279.981214] netlink_sendmsg+0x7a1/0xcb0
[ 279.982043] sock_sendmsg+0xc5/0x190
[ 279.982827] ____sys_sendmsg+0x535/0x6b0
[ 279.983703] ___sys_sendmsg+0xeb/0x170
[ 279.984510] __sys_sendmsg+0xb5/0x140
[ 279.985298] do_syscall_64+0x3d/0x90
[ 279.986076] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 279.987532] The buggy address belongs to the object at ffff888147e2bf00
which belongs to the cache kmalloc-192 of size 192
[ 279.989747] The buggy address is located 32 bytes inside of
freed 192-byte region [ffff888147e2bf00, ffff888147e2bfc0)
[ 279.992367] The buggy address belongs to the physical page:
[ 279.993430] page:00000000550f405c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147e2a
[ 279.995182] head:00000000550f405c order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 279.996713] anon flags: 0x200000000010200(slab|head|node=0|zone=2)
[ 279.997878] raw: 0200000000010200 ffff888100042a00 0000000000000000 dead000000000001
[ 279.999384] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
[ 280.000894] page dumped because: kasan: bad access detected
[ 280.002386] Memory state around the buggy address:
[ 280.003338] ffff888147e2be00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 280.004781] ffff888147e2be80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 280.006224] >ffff888147e2bf00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 280.007700] ^
[ 280.008592] ffff888147e2bf80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 280.010035] ffff888147e2c000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 280.011564] ==================================================================
Fixes: 59094b1e5094 ("net: sched: use flow block API")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
David Ahern reported crashes in skb_copy_ubufs() caused by TCP tx zerocopy
using hugepages, and skb length bigger than ~68 KB.
skb_copy_ubufs() assumed it could copy all payload using up to
MAX_SKB_FRAGS order-0 pages.
This assumption broke when BIG TCP was able to put up to 512 KB per skb.
We did not hit this bug at Google because we use CONFIG_MAX_SKB_FRAGS=45
and limit gso_max_size to 180000.
A solution is to use higher order pages if needed.
v2: add missing __GFP_COMP, or we leak memory.
Fixes: 7c4e983c4f3c ("net: allow gso_max_size to exceed 65536")
Reported-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/netdev/c70000f6-baa4-4a05-46d0-4b3e0dc1ccc8@gmail.com/T/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ncsi_channel_is_tx() determines whether a given channel should be
used for Tx or not. However, when reconfiguring the channel by
handling a Configuration Required AEN, there is a misjudgment that
the channel Tx has already been enabled, which results in the Enable
Channel Network Tx command not being sent.
Clear the channel Tx enable flag before reconfiguring the channel to
avoid the misjudgment.
Fixes: 8d951a75d022 ("net/ncsi: Configure multi-package, multi-channel modes with failover")
Signed-off-by: Cosmo Chou <chou.cosmo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ido Schimmel reports a memleak on a syzkaller instance:
BUG: memory leak
unreferenced object 0xffff88803d45e400 (size 1024):
comm "syz-executor292", pid 563, jiffies 4295025223 (age 51.781s)
hex dump (first 32 bytes):
28 bd 70 00 fb db df 25 02 00 14 1f ff 02 00 02 (.p....%........
00 32 00 00 1f 00 00 00 ac 14 14 3e 08 00 07 00 .2.........>....
backtrace:
[<ffffffff81bd0f2c>] kmemleak_alloc_recursive include/linux/kmemleak.h:42 [inline]
[<ffffffff81bd0f2c>] slab_post_alloc_hook mm/slab.h:772 [inline]
[<ffffffff81bd0f2c>] slab_alloc_node mm/slub.c:3452 [inline]
[<ffffffff81bd0f2c>] __kmem_cache_alloc_node+0x25c/0x320 mm/slub.c:3491
[<ffffffff81a865d9>] __do_kmalloc_node mm/slab_common.c:966 [inline]
[<ffffffff81a865d9>] __kmalloc+0x59/0x1a0 mm/slab_common.c:980
[<ffffffff83aa85c3>] kmalloc include/linux/slab.h:584 [inline]
[<ffffffff83aa85c3>] tcf_pedit_init+0x793/0x1ae0 net/sched/act_pedit.c:245
[<ffffffff83a90623>] tcf_action_init_1+0x453/0x6e0 net/sched/act_api.c:1394
[<ffffffff83a90e58>] tcf_action_init+0x5a8/0x950 net/sched/act_api.c:1459
[<ffffffff83a96258>] tcf_action_add+0x118/0x4e0 net/sched/act_api.c:1985
[<ffffffff83a96997>] tc_ctl_action+0x377/0x490 net/sched/act_api.c:2044
[<ffffffff83920a8d>] rtnetlink_rcv_msg+0x46d/0xd70 net/core/rtnetlink.c:6395
[<ffffffff83b24305>] netlink_rcv_skb+0x185/0x490 net/netlink/af_netlink.c:2575
[<ffffffff83901806>] rtnetlink_rcv+0x26/0x30 net/core/rtnetlink.c:6413
[<ffffffff83b21cae>] netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
[<ffffffff83b21cae>] netlink_unicast+0x5be/0x8a0 net/netlink/af_netlink.c:1365
[<ffffffff83b2293f>] netlink_sendmsg+0x9af/0xed0 net/netlink/af_netlink.c:1942
[<ffffffff8380c39f>] sock_sendmsg_nosec net/socket.c:724 [inline]
[<ffffffff8380c39f>] sock_sendmsg net/socket.c:747 [inline]
[<ffffffff8380c39f>] ____sys_sendmsg+0x3ef/0xaa0 net/socket.c:2503
[<ffffffff838156d2>] ___sys_sendmsg+0x122/0x1c0 net/socket.c:2557
[<ffffffff8381594f>] __sys_sendmsg+0x11f/0x200 net/socket.c:2586
[<ffffffff83815ab0>] __do_sys_sendmsg net/socket.c:2595 [inline]
[<ffffffff83815ab0>] __se_sys_sendmsg net/socket.c:2593 [inline]
[<ffffffff83815ab0>] __x64_sys_sendmsg+0x80/0xc0 net/socket.c:2593
The recently added static offset check missed a free to the key buffer when
bailing out on error.
Fixes: e1201bc781c2 ("net/sched: act_pedit: check static offsets a priori")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230425144725.669262-1-pctammela@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Commit 08a0063df3ae ("net/sched: flower: Move filter handle initialization
earlier") moved filter handle initialization but an assignment of
the handle to fnew->handle is done regardless of fold value. This is wrong
because if fold != NULL (so fold->handle == handle) no new handle is
allocated and passed handle is assigned to fnew->handle. Then if any
subsequent action in fl_change() fails then the handle value is
removed from IDR that is incorrect as we will have still valid old filter
instance with handle that is not present in IDR.
Fix this issue by moving the assignment so it is done only when passed
fold == NULL.
Prior the patch:
[root@machine tc-testing]# ./tdc.py -d enp1s0f0np0 -e 14be
Test 14be: Concurrently replace same range of 100k flower filters from 10 tc instances
exit: 123
exit: 0
RTNETLINK answers: Invalid argument
We have an error talking to the kernel
Command failed tmp/replace_6:1885
All test results:
1..1
not ok 1 14be - Concurrently replace same range of 100k flower filters from 10 tc instances
Command exited with 123, expected 0
RTNETLINK answers: Invalid argument
We have an error talking to the kernel
Command failed tmp/replace_6:1885
After the patch:
[root@machine tc-testing]# ./tdc.py -d enp1s0f0np0 -e 14be
Test 14be: Concurrently replace same range of 100k flower filters from 10 tc instances
All test results:
1..1
ok 1 14be - Concurrently replace same range of 100k flower filters from 10 tc instances
Fixes: 08a0063df3ae ("net/sched: flower: Move filter handle initialization earlier")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230425140604.169881-1-ivecera@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Inside the loop in rxrpc_wait_to_be_connected() it checks call->error to
see if it should exit the loop without first checking the call state. This
is probably safe as if call->error is set, the call is dead anyway, but we
should probably wait for the call state to have been set to completion
first, lest it cause surprise on the way out.
Fix this by only accessing call->error if the call is complete. We don't
actually need to access the error inside the loop as we'll do that after.
This caused the following report:
BUG: KCSAN: data-race in rxrpc_send_data / rxrpc_set_call_completion
write to 0xffff888159cf3c50 of 4 bytes by task 25673 on cpu 1:
rxrpc_set_call_completion+0x71/0x1c0 net/rxrpc/call_state.c:22
rxrpc_send_data_packet+0xba9/0x1650 net/rxrpc/output.c:479
rxrpc_transmit_one+0x1e/0x130 net/rxrpc/output.c:714
rxrpc_decant_prepared_tx net/rxrpc/call_event.c:326 [inline]
rxrpc_transmit_some_data+0x496/0x600 net/rxrpc/call_event.c:350
rxrpc_input_call_event+0x564/0x1220 net/rxrpc/call_event.c:464
rxrpc_io_thread+0x307/0x1d80 net/rxrpc/io_thread.c:461
kthread+0x1ac/0x1e0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
read to 0xffff888159cf3c50 of 4 bytes by task 25672 on cpu 0:
rxrpc_send_data+0x29e/0x1950 net/rxrpc/sendmsg.c:296
rxrpc_do_sendmsg+0xb7a/0xc20 net/rxrpc/sendmsg.c:726
rxrpc_sendmsg+0x413/0x520 net/rxrpc/af_rxrpc.c:565
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x375/0x4c0 net/socket.c:2501
___sys_sendmsg net/socket.c:2555 [inline]
__sys_sendmmsg+0x263/0x500 net/socket.c:2641
__do_sys_sendmmsg net/socket.c:2670 [inline]
__se_sys_sendmmsg net/socket.c:2667 [inline]
__x64_sys_sendmmsg+0x57/0x60 net/socket.c:2667
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x00000000 -> 0xffffffea
Fixes: 9d35d880e0e4 ("rxrpc: Move client call connection to the I/O thread")
Reported-by: syzbot+ebc945fdb4acd72cba78@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000e7c6d205fa10a3cd@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Dmitry Vyukov <dvyukov@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/508133.1682427395@warthog.procyon.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the
default value allows for better BIG TCP performances
- Reduce compound page head access for zero-copy data transfers
- RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when
possible
- Threaded NAPI improvements, adding defer skb free support and
unneeded softirq avoidance
- Address dst_entry reference count scalability issues, via false
sharing avoidance and optimize refcount tracking
- Add lockless accesses annotation to sk_err[_soft]
- Optimize again the skb struct layout
- Extends the skb drop reasons to make it usable by multiple
subsystems
- Better const qualifier awareness for socket casts
BPF:
- Add skb and XDP typed dynptrs which allow BPF programs for more
ergonomic and less brittle iteration through data and
variable-sized accesses
- Add a new BPF netfilter program type and minimal support to hook
BPF programs to netfilter hooks such as prerouting or forward
- Add more precise memory usage reporting for all BPF map types
- Adds support for using {FOU,GUE} encap with an ipip device
operating in collect_md mode and add a set of BPF kfuncs for
controlling encap params
- Allow BPF programs to detect at load time whether a particular
kfunc exists or not, and also add support for this in light
skeleton
- Bigger batch of BPF verifier improvements to prepare for upcoming
BPF open-coded iterators allowing for less restrictive looping
capabilities
- Rework RCU enforcement in the verifier, add kptr_rcu and enforce
BPF programs to NULL-check before passing such pointers into kfunc
- Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and
in local storage maps
- Enable RCU semantics for task BPF kptrs and allow referenced kptr
tasks to be stored in BPF maps
- Add support for refcounted local kptrs to the verifier for allowing
shared ownership, useful for adding a node to both the BPF list and
rbtree
- Add BPF verifier support for ST instructions in
convert_ctx_access() which will help new -mcpu=v4 clang flag to
start emitting them
- Add ARM32 USDT support to libbpf
- Improve bpftool's visual program dump which produces the control
flow graph in a DOT format by adding C source inline annotations
Protocols:
- IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value
indicates the provenance of the IP address
- IPv6: optimize route lookup, dropping unneeded R/W lock acquisition
- Add the handshake upcall mechanism, allowing the user-space to
implement generic TLS handshake on kernel's behalf
- Bridge: support per-{Port, VLAN} neighbor suppression, increasing
resilience to nodes failures
- SCTP: add support for Fair Capacity and Weighted Fair Queueing
schedulers
- MPTCP: delay first subflow allocation up to its first usage. This
will allow for later better LSM interaction
- xfrm: Remove inner/outer modes from input/output path. These are
not needed anymore
- WiFi:
- reduced neighbor report (RNR) handling for AP mode
- HW timestamping support
- support for randomized auth/deauth TA for PASN privacy
- per-link debugfs for multi-link
- TC offload support for mac80211 drivers
- mac80211 mesh fast-xmit and fast-rx support
- enable Wi-Fi 7 (EHT) mesh support
Netfilter:
- Add nf_tables 'brouting' support, to force a packet to be routed
instead of being bridged
- Update bridge netfilter and ovs conntrack helpers to handle IPv6
Jumbo packets properly, i.e. fetch the packet length from
hop-by-hop extension header. This is needed for BIT TCP support
- The iptables 32bit compat interface isn't compiled in by default
anymore
- Move ip(6)tables builtin icmp matches to the udptcp one. This has
the advantage that icmp/icmpv6 match doesn't load the
iptables/ip6tables modules anymore when iptables-nft is used
- Extended netlink error report for netdevice in flowtables and
netdev/chains. Allow for incrementally add/delete devices to netdev
basechain. Allow to create netdev chain without device
Driver API:
- Remove redundant Device Control Error Reporting Enable, as PCI core
has already error reporting enabled at enumeration time
- Move Multicast DB netlink handlers to core, allowing devices other
then bridge to use them
- Allow the page_pool to directly recycle the pages from safely
localized NAPI
- Implement lockless TX queue stop/wake combo macros, allowing for
further code de-duplication and sanitization
- Add YNL support for user headers and struct attrs
- Add partial YNL specification for devlink
- Add partial YNL specification for ethtool
- Add tc-mqprio and tc-taprio support for preemptible traffic classes
- Add tx push buf len param to ethtool, specifies the maximum number
of bytes of a transmitted packet a driver can push directly to the
underlying device
- Add basic LED support for switch/phy
- Add NAPI documentation, stop relaying on external links
- Convert dsa_master_ioctl() to netdev notifier. This is a
preparatory work to make the hardware timestamping layer selectable
by user space
- Add transceiver support and improve the error messages for CAN-FD
controllers
New hardware / drivers:
- Ethernet:
- AMD/Pensando core device support
- MediaTek MT7981 SoC
- MediaTek MT7988 SoC
- Broadcom BCM53134 embedded switch
- Texas Instruments CPSW9G ethernet switch
- Qualcomm EMAC3 DWMAC ethernet
- StarFive JH7110 SoC
- NXP CBTX ethernet PHY
- WiFi:
- Apple M1 Pro/Max devices
- RealTek rtl8710bu/rtl8188gu
- RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset
- Bluetooth:
- Realtek RTL8821CS, RTL8851B, RTL8852BS
- Mediatek MT7663, MT7922
- NXP w8997
- Actions Semi ATS2851
- QTI WCN6855
- Marvell 88W8997
- Can:
- STMicroelectronics bxcan stm32f429
Drivers:
- Ethernet NICs:
- Intel (1G, icg):
- add tracking and reporting of QBV config errors
- add support for configuring max SDU for each Tx queue
- Intel (100G, ice):
- refactor mailbox overflow detection to support Scalable IOV
- GNSS interface optimization
- Intel (i40e):
- support XDP multi-buffer
- nVidia/Mellanox:
- add the support for linux bridge multicast offload
- enable TC offload for egress and engress MACVLAN over bond
- add support for VxLAN GBP encap/decap flows offload
- extend packet offload to fully support libreswan
- support tunnel mode in mlx5 IPsec packet offload
- extend XDP multi-buffer support
- support MACsec VLAN offload
- add support for dynamic msix vectors allocation
- drop RX page_cache and fully use page_pool
- implement thermal zone to report NIC temperature
- Netronome/Corigine:
- add support for multi-zone conntrack offload
- Solarflare/Xilinx:
- support offloading TC VLAN push/pop actions to the MAE
- support TC decap rules
- support unicast PTP
- Other NICs:
- Broadcom (bnxt): enforce software based freq adjustments only on
shared PHC NIC
- RealTek (r8169): refactor to addess ASPM issues during NAPI poll
- Micrel (lan8841): add support for PTP_PF_PEROUT
- Cadence (macb): enable PTP unicast
- Engleder (tsnep): add XDP socket zero-copy support
- virtio-net: implement exact header length guest feature
- veth: add page_pool support for page recycling
- vxlan: add MDB data path support
- gve: add XDP support for GQI-QPL format
- geneve: accept every ethertype
- macvlan: allow some packets to bypass broadcast queue
- mana: add support for jumbo frame
- Ethernet high-speed switches:
- Microchip (sparx5): Add support for TC flower templates
- Ethernet embedded switches:
- Broadcom (b54):
- configure 6318 and 63268 RGMII ports
- Marvell (mv88e6xxx):
- faster C45 bus scan
- Microchip:
- lan966x:
- add support for IS1 VCAP
- better TX/RX from/to CPU performances
- ksz9477: add ETS Qdisc support
- ksz8: enhance static MAC table operations and error handling
- sama7g5: add PTP capability
- NXP (ocelot):
- add support for external ports
- add support for preemptible traffic classes
- Texas Instruments:
- add CPSWxG SGMII support for J7200 and J721E
- Intel WiFi (iwlwifi):
- preparation for Wi-Fi 7 EHT and multi-link support
- EHT (Wi-Fi 7) sniffer support
- hardware timestamping support for some devices/firwmares
- TX beacon protection on newer hardware
- Qualcomm 802.11ax WiFi (ath11k):
- MU-MIMO parameters support
- ack signal support for management packets
- RealTek WiFi (rtw88):
- SDIO bus support
- better support for some SDIO devices (e.g. MAC address from
efuse)
- RealTek WiFi (rtw89):
- HW scan support for 8852b
- better support for 6 GHz scanning
- support for various newer firmware APIs
- framework firmware backwards compatibility
- MediaTek WiFi (mt76):
- P2P support
- mesh A-MSDU support
- EHT (Wi-Fi 7) support
- coredump support"
* tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits)
net: phy: hide the PHYLIB_LEDS knob
net: phy: marvell-88x2222: remove unnecessary (void*) conversions
tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.
net: amd: Fix link leak when verifying config failed
net: phy: marvell: Fix inconsistent indenting in led_blink_set
lan966x: Don't use xdp_frame when action is XDP_TX
tsnep: Add XDP socket zero-copy TX support
tsnep: Add XDP socket zero-copy RX support
tsnep: Move skb receive action to separate function
tsnep: Add functions for queue enable/disable
tsnep: Rework TX/RX queue initialization
tsnep: Replace modulo operation with mask
net: phy: dp83867: Add led_brightness_set support
net: phy: Fix reading LED reg property
drivers: nfc: nfcsim: remove return value check of `dev_dir`
net: phy: dp83867: Remove unnecessary (void*) conversions
net: ethtool: coalesce: try to make user settings stick twice
net: mana: Check if netdev/napi_alloc_frag returns single page
net: mana: Rename mana_refill_rxoob and remove some empty lines
net: veth: add page_pool stats
...
|
|
No conflicts.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These update the ACPICA code in the kernel to upstream revision
20230331, fix the ACPI SBS driver and the evaluation of the _PDC
method on Xen dom0 in the ACPI processor driver, update the ACPI
driver for Intel SoCs and clean up code in multiple places.
Specifics:
- Update the ACPICA code in the kernel to upstream revision 20230331
including the following changes:
* Delete bogus node_array array of pointers from AEST table
(Jessica Clarke)
* Add support for trace buffer extension in GICC to the ACPI MADT
parser (Xiongfeng Wang)
* Add missing macro ACPI_FUNCTION_TRACE() for
acpi_ns_repair_HID() (Xiongfeng Wang)
* Add missing tables to astable (Pedro Falcato)
* Add support for 64 bit loong_arch compilation to ACPICA (Huacai
Chen)
* Add support for ASPT table in disassembler to ACPICA (Jeremi
Piotrowski)
* Add support for Arm's MPAM ACPI table version 2 (Hesham
Almatary)
* Update all copyrights/signons in ACPICA to 2023 (Bob Moore)
* Add support for ClockInput resource (v6.5) (Niyas Sait)
* Add RISC-V INTC interrupt controller definition to the list of
supported interrupt controllers for MADT (Sunil V L)
* Add structure definitions for the RISC-V RHCT ACPI table (Sunil
V L)
* Address several cases in which the ACPICA code might lead to
undefined behavior (Tamir Duberstein)
* Make ACPICA code support flexible arrays properly (Kees Cook)
* Check null return of ACPI_ALLOCATE_ZEROED in
acpi_db_display_objects() (void0red)
* Add os specific support for Zephyr RTOS to ACPICA (Najumon)
* Update version to 20230331 (Bob Moore)
- Fix evaluating the _PDC ACPI control method when running as Xen
dom0 (Roger Pau Monne)
- Use platform devices to load ACPI PPC and PCC drivers (Petr Pavlu)
- Check for null return of devm_kzalloc() in fch_misc_setup() (Kang
Chen)
- Log a message if enable_irq_wake() fails for the ACPI SCI (Simon
Gaiser)
- Initialize the correct IOMMU fwspec while parsing ACPI VIOT
(Jean-Philippe Brucker)
- Amend indentation and prefix error messages with FW_BUG in the ACPI
SPCR parsing code (Andy Shevchenko)
- Enable ACPI sysfs support for CCEL records (Kuppuswamy
Sathyanarayanan)
- Make the APEI error injection code warn on invalid arguments when
explicitly indicated by platform (Shuai Xue)
- Add CXL error types to the error injection code in APEI (Tony Luck)
- Refactor acpi_data_prop_read_single() (Andy Shevchenko)
- Fix two issues in the ACPI SBS driver (Armin Wolf)
- Replace ternary operator with min_t() in the generic ACPI thermal
zone driver (Jiangshan Yi)
- Ensure that ACPI notify handlers are not running after removal and
clean up code in acpi_sb_notify() (Rafael Wysocki)
- Remove register_backlight_delay module option and code and remove
quirks for false-positive backlight control support advertised on
desktop boards (Hans de Goede)
- Replace irqdomain.h include with struct declarations in ACPI
headers and update several pieces of code previously including of.h
implicitly through those headers (Rob Herring)
- Fix acpi_evaluate_dsm_typed() redefinition error (Kiran K)
- Update the pm_profile sysfs attribute documentation (Rafael
Wysocki)
- Add 80862289 ACPI _HID for second PWM controller on Cherry Trail to
the ACPI driver for Intel SoCs (Hans de Goede)"
* tag 'acpi-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (64 commits)
ACPI: LPSS: Add 80862289 ACPI _HID for second PWM controller on Cherry Trail
ACPI: bus: Ensure that notify handlers are not running after removal
ACPI: bus: Add missing braces to acpi_sb_notify()
ACPI: video: Remove desktops without backlight DMI quirks
ACPI: video: Remove register_backlight_delay module option and code
ACPI: Replace irqdomain.h include with struct declarations
fpga: lattice-sysconfig-spi: Add explicit include for of.h
tpm: atmel: Add explicit include for of.h
virtio-mmio: Add explicit include for of.h
pata: ixp4xx: Add explicit include for of.h
ata: pata_macio: Add explicit include of irqdomain.h
serial: 8250_tegra: Add explicit include for of.h
net: rfkill-gpio: Add explicit include for of.h
staging: iio: resolver: ad2s1210: Add explicit include for of.h
iio: adc: ad7292: Add explicit include for of.h
ACPICA: Update version to 20230331
ACPICA: add os specific support for Zephyr RTOS
ACPICA: ACPICA: check null return of ACPI_ALLOCATE_ZEROED in acpi_db_display_objects
ACPICA: acpi_resource_irq: Replace 1-element arrays with flexible array
ACPICA: acpi_madt_oem_data: Fix flexible array member definition
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"These are various cleanups, fixing a number of uapi header files to no
longer reference CONFIG_* symbols, and one patch that introduces the
new CONFIG_HAS_IOPORT symbol for architectures that provide working
inb()/outb() macros, as a preparation for adding driver dependencies
on those in the following release"
* tag 'asm-generic-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
Kconfig: introduce HAS_IOPORT option and select it as necessary
scripts: Update the CONFIG_* ignore list in headers_install.sh
pktcdvd: Remove CONFIG_CDROM_PKTCDVD_WCACHE from uapi header
Move bp_type_idx to include/linux/hw_breakpoint.h
Move ep_take_care_of_epollwakeup() to fs/eventpoll.c
Move COMPAT_ATM_ADDPARTY to net/atm/svc.c
|
|
syzkaller reported [0] memory leaks of an UDP socket and ZEROCOPY
skbs. We can reproduce the problem with these sequences:
sk = socket(AF_INET, SOCK_DGRAM, 0)
sk.setsockopt(SOL_SOCKET, SO_TIMESTAMPING, SOF_TIMESTAMPING_TX_SOFTWARE)
sk.setsockopt(SOL_SOCKET, SO_ZEROCOPY, 1)
sk.sendto(b'', MSG_ZEROCOPY, ('127.0.0.1', 53))
sk.close()
sendmsg() calls msg_zerocopy_alloc(), which allocates a skb, sets
skb->cb->ubuf.refcnt to 1, and calls sock_hold(). Here, struct
ubuf_info_msgzc indirectly holds a refcnt of the socket. When the
skb is sent, __skb_tstamp_tx() clones it and puts the clone into
the socket's error queue with the TX timestamp.
When the original skb is received locally, skb_copy_ubufs() calls
skb_unclone(), and pskb_expand_head() increments skb->cb->ubuf.refcnt.
This additional count is decremented while freeing the skb, but struct
ubuf_info_msgzc still has a refcnt, so __msg_zerocopy_callback() is
not called.
The last refcnt is not released unless we retrieve the TX timestamped
skb by recvmsg(). Since we clear the error queue in inet_sock_destruct()
after the socket's refcnt reaches 0, there is a circular dependency.
If we close() the socket holding such skbs, we never call sock_put()
and leak the count, sk, and skb.
TCP has the same problem, and commit e0c8bccd40fc ("net: stream:
purge sk_error_queue in sk_stream_kill_queues()") tried to fix it
by calling skb_queue_purge() during close(). However, there is a
small chance that skb queued in a qdisc or device could be put
into the error queue after the skb_queue_purge() call.
In __skb_tstamp_tx(), the cloned skb should not have a reference
to the ubuf to remove the circular dependency, but skb_clone() does
not call skb_copy_ubufs() for zerocopy skb. So, we need to call
skb_orphan_frags_rx() for the cloned skb to call skb_copy_ubufs().
[0]:
BUG: memory leak
unreferenced object 0xffff88800c6d2d00 (size 1152):
comm "syz-executor392", pid 264, jiffies 4294785440 (age 13.044s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 cd af e8 81 00 00 00 00 ................
02 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00 ...@............
backtrace:
[<0000000055636812>] sk_prot_alloc+0x64/0x2a0 net/core/sock.c:2024
[<0000000054d77b7a>] sk_alloc+0x3b/0x800 net/core/sock.c:2083
[<0000000066f3c7e0>] inet_create net/ipv4/af_inet.c:319 [inline]
[<0000000066f3c7e0>] inet_create+0x31e/0xe40 net/ipv4/af_inet.c:245
[<000000009b83af97>] __sock_create+0x2ab/0x550 net/socket.c:1515
[<00000000b9b11231>] sock_create net/socket.c:1566 [inline]
[<00000000b9b11231>] __sys_socket_create net/socket.c:1603 [inline]
[<00000000b9b11231>] __sys_socket_create net/socket.c:1588 [inline]
[<00000000b9b11231>] __sys_socket+0x138/0x250 net/socket.c:1636
[<000000004fb45142>] __do_sys_socket net/socket.c:1649 [inline]
[<000000004fb45142>] __se_sys_socket net/socket.c:1647 [inline]
[<000000004fb45142>] __x64_sys_socket+0x73/0xb0 net/socket.c:1647
[<0000000066999e0e>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<0000000066999e0e>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
[<0000000017f238c1>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
BUG: memory leak
unreferenced object 0xffff888017633a00 (size 240):
comm "syz-executor392", pid 264, jiffies 4294785440 (age 13.044s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 2d 6d 0c 80 88 ff ff .........-m.....
backtrace:
[<000000002b1c4368>] __alloc_skb+0x229/0x320 net/core/skbuff.c:497
[<00000000143579a6>] alloc_skb include/linux/skbuff.h:1265 [inline]
[<00000000143579a6>] sock_omalloc+0xaa/0x190 net/core/sock.c:2596
[<00000000be626478>] msg_zerocopy_alloc net/core/skbuff.c:1294 [inline]
[<00000000be626478>] msg_zerocopy_realloc+0x1ce/0x7f0 net/core/skbuff.c:1370
[<00000000cbfc9870>] __ip_append_data+0x2adf/0x3b30 net/ipv4/ip_output.c:1037
[<0000000089869146>] ip_make_skb+0x26c/0x2e0 net/ipv4/ip_output.c:1652
[<00000000098015c2>] udp_sendmsg+0x1bac/0x2390 net/ipv4/udp.c:1253
[<0000000045e0e95e>] inet_sendmsg+0x10a/0x150 net/ipv4/af_inet.c:819
[<000000008d31bfde>] sock_sendmsg_nosec net/socket.c:714 [inline]
[<000000008d31bfde>] sock_sendmsg+0x141/0x190 net/socket.c:734
[<0000000021e21aa4>] __sys_sendto+0x243/0x360 net/socket.c:2117
[<00000000ac0af00c>] __do_sys_sendto net/socket.c:2129 [inline]
[<00000000ac0af00c>] __se_sys_sendto net/socket.c:2125 [inline]
[<00000000ac0af00c>] __x64_sys_sendto+0xe1/0x1c0 net/socket.c:2125
[<0000000066999e0e>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<0000000066999e0e>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
[<0000000017f238c1>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY")
Fixes: b5947e5d1e71 ("udp: msg_zerocopy")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull vfs fget updates from Al Viro:
"fget() to fdget() conversions"
* tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fuse_dev_ioctl(): switch to fdget()
cgroup_get_from_fd(): switch to fdget_raw()
bpf: switch to fdget_raw()
build_mount_idmapped(): switch to fdget()
kill the last remaining user of proc_ns_fget()
SVM-SEV: convert the rest of fget() uses to fdget() in there
convert sgx_set_attribute() to fdget()/fdput()
convert setns(2) to fdget()/fdput()
|
|
SET_COALESCE may change operation mode and parameters in one call.
Changing operation mode may cause the driver to reset the parameter
values to what is a reasonable default for new operation mode.
Since driver does not know which parameters come from user and which
are echoed back from ->get, driver may ignore the parameters when
switching operation modes.
This used to be inevitable for ioctl() but in netlink we know which
parameters are actually specified by the user.
We could inform which parameters were set by the user but this would
lead to a lot of code duplication in the drivers. Instead try to call
the drivers twice if both mode and params are changed. The set method
already checks if any params need updating so in case the driver did
the right thing the first time around - there will be no second call
to it's ->set method (only an extra call to ->get()).
For mlx5 for example before this patch we'd see:
# ethtool -C eth0 adaptive-rx on adaptive-tx on
# ethtool -C eth0 adaptive-rx off adaptive-tx off \
tx-usecs 123 rx-usecs 123
Adaptive RX: off TX: off
rx-usecs: 3
rx-frames: 32
tx-usecs: 16
tx-frames: 32
[...]
After the change:
# ethtool -C eth0 adaptive-rx on adaptive-tx on
# ethtool -C eth0 adaptive-rx off adaptive-tx off \
tx-usecs 123 rx-usecs 123
Adaptive RX: off TX: off
rx-usecs: 123
rx-frames: 32
tx-usecs: 123
tx-frames: 32
[...]
This only works for netlink, so it's a small discrepancy between
netlink and ioctl(). Since we anticipate most users to move to
netlink I believe it's worth making their lives easier.
Link: https://lore.kernel.org/r/20230420233302.944382-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Brad Spencer provided a detailed report [0] that when calling getsockopt()
for AF_NETLINK, some SOL_NETLINK options set only 1 byte even though such
options require at least sizeof(int) as length.
The options return a flag value that fits into 1 byte, but such behaviour
confuses users who do not initialise the variable before calling
getsockopt() and do not strictly check the returned value as char.
Currently, netlink_getsockopt() uses put_user() to copy data to optlen and
optval, but put_user() casts the data based on the pointer, char *optval.
As a result, only 1 byte is set to optval.
To avoid this behaviour, we need to use copy_to_user() or cast optval for
put_user().
Note that this changes the behaviour on big-endian systems, but we document
that the size of optval is int in the man page.
$ man 7 netlink
...
Socket options
To set or get a netlink socket option, call getsockopt(2) to read
or setsockopt(2) to write the option with the option level argument
set to SOL_NETLINK. Unless otherwise noted, optval is a pointer to
an int.
Fixes: 9a4595bc7e67 ("[NETLINK]: Add set/getsockopt options to support more than 32 groups")
Fixes: be0c22a46cfb ("netlink: add NETLINK_BROADCAST_ERROR socket option")
Fixes: 38938bfe3489 ("netlink: add NETLINK_NO_ENOBUFS socket flag")
Fixes: 0a6a3a23ea6e ("netlink: add NETLINK_CAP_ACK socket option")
Fixes: 2d4bc93368f5 ("netlink: extended ACK reporting")
Fixes: 89d35528d17d ("netlink: Add new socket option to enable strict checking on dumps")
Reported-by: Brad Spencer <bspencer@blackberry.com>
Link: https://lore.kernel.org/netdev/ZD7VkNWFfp22kTDt@datsun.rim.net/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Link: https://lore.kernel.org/r/20230421185255.94606-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
1) Reduce jumpstack footprint: Stash chain in last rule marker in blob for
tracing. Remove last rule and chain from jumpstack. From Florian Westphal.
2) nf_tables validates all tables before committing the new rules.
Unfortunately, this has two drawbacks:
- Since addition of the transaction mutex pernet state gets written to
outside of the locked section from the cleanup callback, this is
wrong so do this cleanup directly after table has passed all checks.
- Revalidate tables that saw no changes. This can be avoided by
keeping the validation state per table, not per netns.
From Florian Westphal.
3) Get rid of a few redundant pointers in the traceinfo structure.
The three removed pointers are used in the expression evaluation loop,
so gcc keeps them in registers. Passing them to the (inlined) helpers
thus doesn't increase nft_do_chain text size, while stack is reduced
by another 24 bytes on 64bit arches. From Florian Westphal.
4) IPVS cleanups in several ways without implementing any functional
changes, aside from removing some debugging output:
- Update width of source for ip_vs_sync_conn_options
The operation is safe, use an annotation to describe it properly.
- Consistently use array_size() in ip_vs_conn_init()
It seems better to use helpers consistently.
- Remove {Enter,Leave}Function. These seem to be well past their
use-by date.
- Correct spelling in comments.
From Simon Horman.
5) Extended netlink error report for netdevice in flowtables and
netdev/chains. Allow for incrementally add/delete devices to netdev
basechain. Allow to create netdev chain without device.
* tag 'nf-next-23-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables: allow to create netdev chain without device
netfilter: nf_tables: support for deleting devices in an existing netdev chain
netfilter: nf_tables: support for adding new devices to an existing netdev chain
netfilter: nf_tables: rename function to destroy hook list
netfilter: nf_tables: do not send complete notification of deletions
netfilter: nf_tables: extended netlink error reporting for netdevice
ipvs: Correct spelling in comments
ipvs: Remove {Enter,Leave}Function
ipvs: Consistently use array_size() in ip_vs_conn_init()
ipvs: Update width of source for ip_vs_sync_conn_options
netfilter: nf_tables: do not store rule in traceinfo structure
netfilter: nf_tables: do not store verdict in traceinfo structure
netfilter: nf_tables: do not store pktinfo in traceinfo structure
netfilter: nf_tables: remove unneeded conditional
netfilter: nf_tables: make validation state per table
netfilter: nf_tables: don't write table validation state without mutex
netfilter: nf_tables: don't store chain address on jump
netfilter: nf_tables: don't store address of last rule on jump
netfilter: nf_tables: merge nft_rules_old structure and end of ruleblob marker
====================
Link: https://lore.kernel.org/r/20230421235021.216950-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux
Pull RCU updates from Joel Fernandes:
- Updates and additions to MAINTAINERS files, with Boqun being added to
the RCU entry and Zqiang being added as an RCU reviewer.
I have also transitioned from reviewer to maintainer; however, Paul
will be taking over sending RCU pull-requests for the next merge
window.
- Resolution of hotplug warning in nohz code, achieved by fixing
cpu_is_hotpluggable() through interaction with the nohz subsystem.
Tick dependency modifications by Zqiang, focusing on fixing usage of
the TICK_DEP_BIT_RCU_EXP bitmask.
- Avoid needless calls to the rcu-lazy shrinker for CONFIG_RCU_LAZY=n
kernels, fixed by Zqiang.
- Improvements to rcu-tasks stall reporting by Neeraj.
- Initial renaming of k[v]free_rcu() to k[v]free_rcu_mightsleep() for
increased robustness, affecting several components like mac802154,
drbd, vmw_vmci, tracing, and more.
A report by Eric Dumazet showed that the API could be unknowingly
used in an atomic context, so we'd rather make sure they know what
they're asking for by being explicit:
https://lore.kernel.org/all/20221202052847.2623997-1-edumazet@google.com/
- Documentation updates, including corrections to spelling,
clarifications in comments, and improvements to the srcu_size_state
comments.
- Better srcu_struct cache locality for readers, by adjusting the size
of srcu_struct in support of SRCU usage by Christoph Hellwig.
- Teach lockdep to detect deadlocks between srcu_read_lock() vs
synchronize_srcu() contributed by Boqun.
Previously lockdep could not detect such deadlocks, now it can.
- Integration of rcutorture and rcu-related tools, targeted for v6.4
from Boqun's tree, featuring new SRCU deadlock scenarios, test_nmis
module parameter, and more
- Miscellaneous changes, various code cleanups and comment improvements
* tag 'rcu.6.4.april5.2023.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux: (71 commits)
checkpatch: Error out if deprecated RCU API used
mac802154: Rename kfree_rcu() to kvfree_rcu_mightsleep()
rcuscale: Rename kfree_rcu() to kfree_rcu_mightsleep()
ext4/super: Rename kfree_rcu() to kfree_rcu_mightsleep()
net/mlx5: Rename kfree_rcu() to kfree_rcu_mightsleep()
net/sysctl: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
lib/test_vmalloc.c: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
tracing: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
misc: vmw_vmci: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
drbd: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
rcu: Protect rcu_print_task_exp_stall() ->exp_tasks access
rcu: Avoid stack overflow due to __rcu_irq_enter_check_tick() being kprobe-ed
rcu-tasks: Report stalls during synchronize_srcu() in rcu_tasks_postscan()
rcu: Permit start_poll_synchronize_rcu_expedited() to be invoked early
rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
rcu: Register rcu-lazy shrinker only for CONFIG_RCU_LAZY=y kernels
rcu: Fix missing TICK_DEP_MASK_RCU_EXP dependency check
rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race
rcu/trace: use strscpy() to instead of strncpy()
tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem
...
|
|
Merge ACPI bus type driver changes, ACPI backlight driver updates and a
series of cleanups related to of.h for 6.4-rc1:
- Ensure that ACPI notify handlers are not running after removal and
clean up code in acpi_sb_notify() (Rafael Wysocki).
- Remove register_backlight_delay module option and code and remove
quirks for false-positive backlight control support advertised on
desktop boards (Hans de Goede).
- Replace irqdomain.h include with struct declarations in ACPI headers
and update several pieces of code previously including of.h
implicitly through those headers (Rob Herring).
* acpi-bus:
ACPI: bus: Ensure that notify handlers are not running after removal
ACPI: bus: Add missing braces to acpi_sb_notify()
* acpi-video:
ACPI: video: Remove desktops without backlight DMI quirks
ACPI: video: Remove register_backlight_delay module option and code
* acpi-misc:
ACPI: Replace irqdomain.h include with struct declarations
fpga: lattice-sysconfig-spi: Add explicit include for of.h
tpm: atmel: Add explicit include for of.h
virtio-mmio: Add explicit include for of.h
pata: ixp4xx: Add explicit include for of.h
ata: pata_macio: Add explicit include of irqdomain.h
serial: 8250_tegra: Add explicit include for of.h
net: rfkill-gpio: Add explicit include for of.h
staging: iio: resolver: ad2s1210: Add explicit include for of.h
iio: adc: ad7292: Add explicit include for of.h
|
|
This makes sure hci_cmd_sync_queue only queue new work if HCI_RUNNING
has been set otherwise there is a risk of commands being sent while
turning off.
Because hci_cmd_sync_queue can no longer queue work while HCI_RUNNING is
not set it cannot be used to power on adapters so instead
hci_cmd_sync_submit is introduced which bypass the HCI_RUNNING check, so
it behaves like the old implementation.
Link: https://lore.kernel.org/all/CAB4PzUpDMvdc8j2MdeSAy1KkAE-D3woprCwAdYWeOc-3v3c9Sw@mail.gmail.com/
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Some of the sync commands might take a long time to complete, e.g.
LE Create Connection when the peer device isn't responding might take
20 seconds before it times out. If suspend command is issued during
this time, it will need to wait for completion since both commands are
using the same sync lock.
This patch cancel any running sync commands before attempting to
suspend or adapter power off.
Signed-off-by: Archie Pusaka <apusaka@chromium.org>
Reviewed-by: Ying Hsu <yinghsu@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
API hci_devcd_init() stores its u32 type parameter @dump_size into
skb, but it does not specify which byte order is used to store the
integer, let us take little endian to store and parse the integer.
Fixes: f5cc609d09d4 ("Bluetooth: Add support for hci devcoredump")
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Previously, capability was checked using capable(), which verified that the
caller of the ioctl system call had the required capability. In addition,
the result of the check would be stored in the HCI_SOCK_TRUSTED flag,
making it persistent for the socket.
However, malicious programs can abuse this approach by deliberately sharing
an HCI socket with a privileged task. The HCI socket will be marked as
trusted when the privileged task occasionally makes an ioctl call.
This problem can be solved by using sk_capable() to check capability, which
ensures that not only the current task but also the socket opener has the
specified capability, thus reducing the risk of privilege escalation
through the previously identified vulnerability.
Cc: stable@vger.kernel.org
Fixes: f81f5b2db869 ("Bluetooth: Send control open and close messages for HCI raw sockets")
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
conn->chan_lock isn't acquired before l2cap_get_chan_by_scid,
if l2cap_get_chan_by_scid returns NULL, then 'bad unlock balance'
is triggered.
Reported-by: syzbot+9519d6b5b79cf7787cf3@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/000000000000894f5f05f95e9f4d@google.com/
Signed-off-by: Min Li <lm0963hack@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Previously, channel open messages were always sent to monitors on the first
ioctl() call for unbound HCI sockets, even if the command and arguments
were completely invalid. This can leave an exploitable hole with the abuse
of invalid ioctl calls.
This commit hardens the ioctl processing logic by first checking if the
command is valid, and immediately returning with an ENOIOCTLCMD error code
if it is not. This ensures that ioctl calls with invalid commands are free
of side effects, and increases the difficulty of further exploitation by
forcing exploitation to find a way to pass a valid command first.
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Co-developed-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
The ATS2851 based controller advertises support for command "LE Set Random
Private Address Timeout" but does not actually implement it, impeding the
controller initialization.
Add the quirk HCI_QUIRK_BROKEN_SET_RPA_TIMEOUT to unblock the controller
initialization.
< HCI Command: LE Set Resolvable Private... (0x08|0x002e) plen 2
Timeout: 900 seconds
> HCI Event: Command Status (0x0f) plen 4
LE Set Resolvable Private Address Timeout (0x08|0x002e) ncmd 1
Status: Unknown HCI Command (0x01)
Co-developed-by: imoc <wzj9912@gmail.com>
Signed-off-by: imoc <wzj9912@gmail.com>
Signed-off-by: Raul Cheleguini <raul.cheleguini@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
When submitting HCI_OP_LE_CREATE_CIS the code shall wait for
HCI_EVT_LE_CIS_ESTABLISHED thus enforcing the serialization of
HCI_OP_LE_CREATE_CIS as the Core spec does not allow to send them in
parallel:
BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E page 2566:
If the Host issues this command before all the HCI_LE_CIS_Established
events from the previous use of the command have been generated, the
Controller shall return the error code Command Disallowed (0x0C).
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
This fixes only matching CIS by address which prevents creating new hcon
if upper layer is requesting a specific CIS ID.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Since it is required for some configurations to have multiple CIS with
the same peer which is now covered by iso-tester in the following test
cases:
ISO AC 6(i) - Success
ISO AC 7(i) - Success
ISO AC 8(i) - Success
ISO AC 9(i) - Success
ISO AC 11(i) - Success
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Remove extra line setting the broadcast code parameter of the
hci_cp_le_create_big struct to 0. The broadcast code is copied
from the QoS struct.
Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Fixed a wrong indentation before "return".This line uses a 7 space
indent instead of a tab.
Signed-off-by: Lanzhe Li <u202212060@hust.edu.cn>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
This enables 2M and Coded PHY by default if they are marked as supported
in the LE features bits.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Split bt_iso_qos into dedicated unicast and broadcast
structures and add additional broadcast parameters.
Fixes: eca0ae4aea66 ("Bluetooth: Add initial implementation of BIS connections")
Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Add devcoredump APIs to hci core so that drivers only have to provide
the dump skbs instead of managing the synchronization and timeouts.
The devcoredump APIs should be used in the following manner:
- hci_devcoredump_init is called to allocate the dump.
- hci_devcoredump_append is called to append any skbs with dump data
OR hci_devcoredump_append_pattern is called to insert a pattern.
- hci_devcoredump_complete is called when all dump packets have been
sent OR hci_devcoredump_abort is called to indicate an error and
cancel an ongoing dump collection.
The high level APIs just prepare some skbs with the appropriate data and
queue it for the dump to process. Packets part of the crashdump can be
intercepted in the driver in interrupt context and forwarded directly to
the devcoredump APIs.
Internally, there are 5 states for the dump: idle, active, complete,
abort and timeout. A devcoredump will only be in active state after it
has been initialized. Once active, it accepts data to be appended,
patterns to be inserted (i.e. memset) and a completion event or an abort
event to generate a devcoredump. The timeout is initialized at the same
time the dump is initialized (defaulting to 10s) and will be cleared
either when the timeout occurs or the dump is complete or aborted.
Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Manish Mandlik <mmandlik@google.com>
Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Some adapters (e.g. RTL8723CS) advertise that they have more than
2 pages for local ext features, but they don't support any features
declared in these pages. RTL8723CS reports max_page = 2 and declares
support for sync train and secure connection, but it responds with
either garbage or with error in status on corresponding commands.
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Bastian Germann <bage@debian.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
This delays the identity address updates to give time for userspace to
process the new address otherwise there is a risk that userspace
creates a duplicated device if the MGMT event is delayed for some
reason.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
This removes the following duplicate statement in
hci_le_ext_directed_advertising_sync():
cp.own_addr_type = own_addr_type;
Signed-off-by: Inga Stotland <inga.stotland@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
The msft_set_filter_enable() command was using the deprecated
hci_request mechanism rather than hci_sync. This caused the warning error:
hci0: HCI_REQ-0xfcf0
Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Currently, when we initiate disconnection, we will wait for the peer's
reply unless when we are suspending, where we fire and forget the
disconnect request.
A similar case is when adapter is powering off. However, we still wait
for the peer's reply in this case. Therefore, if the peer is
unresponsive, the command will time out and the power off sequence
will fail, causing "bluetooth powered on by itself" to users.
This patch makes the host doesn't wait for the peer's reply when the
disconnection reason is powering off.
Signed-off-by: Archie Pusaka <apusaka@chromium.org>
Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
This fixes the following new warning:
net/bluetooth/hci_sync.c:2403 hci_pause_addr_resolution() warn: missing
error code? 'err'
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/202302251952.xryXOegd-lkp@intel.com/
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Two parameters can be transformed into netlink policies and
validated while parsing the netlink message.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some error messages are still being printed to dmesg.
Since extack is available, provide error messages there.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some error messages are still being printed to dmesg.
Since extack is available, provide error messages there.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Unbounded info messages in the pedit datapath can flood the printk
ring buffer quite easily depending on the action created.
As these messages are informational, usually printing some, not all,
is enough to bring attention to the real issue.
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The netlink parsing already validates the key 'htype'.
Remove the datapath check as it's redundant.
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Static key offsets should always be on 32 bit boundaries. Validate them on
create/update time for static offsets and move the datapath validation
for runtime offsets only.
iproute2 already errors out if a given offset and data size cannot be
packed to a 32 bit boundary. This change will make sure users which
create/update pedit instances directly via netlink also error out,
instead of finding out when packets are traversing.
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We have extack available when parsing 'ex' keys, so pass it to
tcf_pedit_keys_ex_parse and add more detailed error messages.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Transform two checks in the 'ex' key parsing into netlink policies
removing extra if checks.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The kernel will print several warnings in a short period of time
when it stalls. Like this:
First warning:
[ 7100.097547] ------------[ cut here ]------------
[ 7100.097550] NETDEV WATCHDOG: eno2 (xxx): transmit queue 8 timed out
[ 7100.097571] WARNING: CPU: 8 PID: 0 at net/sched/sch_generic.c:467
dev_watchdog+0x260/0x270
...
Second warning:
[ 7147.756952] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 7147.756958] rcu: 24-....: (59999 ticks this GP) idle=546/1/0x400000000000000
softirq=367 3137/3673146 fqs=13844
[ 7147.756960] (t=60001 jiffies g=4322709 q=133381)
[ 7147.756962] NMI backtrace for cpu 24
...
We calculate that the transmit queue start stall should occur before
7095s according to watchdog_timeo, the rcu start stall at 7087s.
These two times are close together, it is difficult to confirm which
happened first.
To let users know the exact time the stall started, print msecs when
the transmit queue time out.
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
|