summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2022-02-15mctp: fix use after freeTom Rix1-5/+6
Clang static analysis reports this problem route.c:425:4: warning: Use of memory after it is freed trace_mctp_key_acquire(key); ^~~~~~~~~~~~~~~~~~~~~~~~~~~ When mctp_key_add() fails, key is freed but then is later used in trace_mctp_key_acquire(). Add an else statement to use the key only when mctp_key_add() is successful. Fixes: 4f9e1ba6de45 ("mctp: Add tracepoints for tag/key handling") Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-15ipv4: add description about martian sourceZhang Yunkai1-0/+3
When multiple containers are running in the environment and multiple macvlan network port are configured in each container, a lot of martian source prints will appear after martian_log is enabled. they are almost the same, and printed by net_warn_ratelimited. Each arp message will trigger this print on each network port. Such as: IPv4: martian source 173.254.95.16 from 173.254.100.109, on dev eth0 ll header: 00000000: ff ff ff ff ff ff 40 00 ad fe 64 6d 08 06 ......@...dm.. IPv4: martian source 173.254.95.16 from 173.254.100.109, on dev eth1 ll header: 00000000: ff ff ff ff ff ff 40 00 ad fe 64 6d 08 06 ......@...dm.. There is no description of this kind of source in the RFC1812. Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-15tipc: fix wrong publisher node address in link publicationsJon Maloy1-1/+1
When a link comes up we add its presence to the name table to make it possible for users to subscribe for link up/down events. However, after a previous call signature change the binding is wrongly published with the peer node as publishing node, instead of the own node as it should be. This has the effect that the command 'tipc name table show' will list the link binding (service type 2) with node scope and a peer node as originator, something that obviously is impossible. We correct this bug here. Fixes: 50a3499ab853 ("tipc: simplify signature of tipc_namtbl_publish()") Signed-off-by: Jon Maloy <jmaloy@redhat.com> Link: https://lore.kernel.org/r/20220214013852.2803940-1-jmaloy@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-14net: fix documentation for kernel_getsocknameAlex Maydanik1-2/+2
Fixes return value documentation of kernel_getsockname() and kernel_getpeername() functions. The previous documentation wrongly specified that the return value is 0 in case of success, however sock->ops->getname returns the length of the address in bytes in case of success. Signed-off-by: Alex Maydanik <alexander.maydanik@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14net_sched: add __rcu annotation to netdev->qdiscEric Dumazet4-28/+35
syzbot found a data-race [1] which lead me to add __rcu annotations to netdev->qdisc, and proper accessors to get LOCKDEP support. [1] BUG: KCSAN: data-race in dev_activate / qdisc_lookup_rcu write to 0xffff888168ad6410 of 8 bytes by task 13559 on cpu 1: attach_default_qdiscs net/sched/sch_generic.c:1167 [inline] dev_activate+0x2ed/0x8f0 net/sched/sch_generic.c:1221 __dev_open+0x2e9/0x3a0 net/core/dev.c:1416 __dev_change_flags+0x167/0x3f0 net/core/dev.c:8139 rtnl_configure_link+0xc2/0x150 net/core/rtnetlink.c:3150 __rtnl_newlink net/core/rtnetlink.c:3489 [inline] rtnl_newlink+0xf4d/0x13e0 net/core/rtnetlink.c:3529 rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2413 ___sys_sendmsg net/socket.c:2467 [inline] __sys_sendmsg+0x195/0x230 net/socket.c:2496 __do_sys_sendmsg net/socket.c:2505 [inline] __se_sys_sendmsg net/socket.c:2503 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff888168ad6410 of 8 bytes by task 13560 on cpu 0: qdisc_lookup_rcu+0x30/0x2e0 net/sched/sch_api.c:323 __tcf_qdisc_find+0x74/0x3a0 net/sched/cls_api.c:1050 tc_del_tfilter+0x1c7/0x1350 net/sched/cls_api.c:2211 rtnetlink_rcv_msg+0x5ba/0x7e0 net/core/rtnetlink.c:5585 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2413 ___sys_sendmsg net/socket.c:2467 [inline] __sys_sendmsg+0x195/0x230 net/socket.c:2496 __do_sys_sendmsg net/socket.c:2505 [inline] __se_sys_sendmsg net/socket.c:2503 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0xffffffff85dee080 -> 0xffff88815d96ec00 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 13560 Comm: syz-executor.2 Not tainted 5.17.0-rc3-syzkaller-00116-gf1baf68e1383-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 470502de5bdb ("net: sched: unlock rules update API") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Buslov <vladbu@mellanox.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLANVladimir Oltean2-1/+1
mv88e6xxx is special among DSA drivers in that it requires the VTU to contain the VID of the FDB entry it modifies in mv88e6xxx_port_db_load_purge(), otherwise it will return -EOPNOTSUPP. Sometimes due to races this is not always satisfied even if external code does everything right (first deletes the FDB entries, then the VLAN), because DSA commits to hardware FDB entries asynchronously since commit c9eb3e0f8701 ("net: dsa: Add support for learning FDB through notification"). Therefore, the mv88e6xxx driver must close this race condition by itself, by asking DSA to flush the switchdev workqueue of any FDB deletions in progress, prior to exiting a VLAN. Fixes: c9eb3e0f8701 ("net: dsa: Add support for learning FDB through notification") Reported-by: Rafael Richter <rafael.richter@gin.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()Ignat Korchagin2-3/+3
Some time ago 8965779d2c0e ("ipv6,mcast: always hold idev->lock before mca_lock") switched ipv6_get_lladdr() to __ipv6_get_lladdr(), which is rcu-unsafe version. That was OK, because idev->lock was held for these codepaths. In 88e2ca308094 ("mld: convert ifmcaddr6 to RCU") these external locks were removed, so we probably need to restore the original rcu-safe call. Otherwise, we occasionally get a machine crashed/stalled with the following in dmesg: [ 3405.966610][T230589] general protection fault, probably for non-canonical address 0xdead00000000008c: 0000 [#1] SMP NOPTI [ 3405.982083][T230589] CPU: 44 PID: 230589 Comm: kworker/44:3 Tainted: G O 5.15.19-cloudflare-2022.2.1 #1 [ 3405.998061][T230589] Hardware name: SUPA-COOL-SERV [ 3406.009552][T230589] Workqueue: mld mld_ifc_work [ 3406.017224][T230589] RIP: 0010:__ipv6_get_lladdr+0x34/0x60 [ 3406.025780][T230589] Code: 57 10 48 83 c7 08 48 89 e5 48 39 d7 74 3e 48 8d 82 38 ff ff ff eb 13 48 8b 90 d0 00 00 00 48 8d 82 38 ff ff ff 48 39 d7 74 22 <66> 83 78 32 20 77 1b 75 e4 89 ca 23 50 2c 75 dd 48 8b 50 08 48 8b [ 3406.055748][T230589] RSP: 0018:ffff94e4b3fc3d10 EFLAGS: 00010202 [ 3406.065617][T230589] RAX: dead00000000005a RBX: ffff94e4b3fc3d30 RCX: 0000000000000040 [ 3406.077477][T230589] RDX: dead000000000122 RSI: ffff94e4b3fc3d30 RDI: ffff8c3a31431008 [ 3406.089389][T230589] RBP: ffff94e4b3fc3d10 R08: 0000000000000000 R09: 0000000000000000 [ 3406.101445][T230589] R10: ffff8c3a31430000 R11: 000000000000000b R12: ffff8c2c37887100 [ 3406.113553][T230589] R13: ffff8c3a39537000 R14: 00000000000005dc R15: ffff8c3a31431000 [ 3406.125730][T230589] FS: 0000000000000000(0000) GS:ffff8c3b9fc80000(0000) knlGS:0000000000000000 [ 3406.138992][T230589] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3406.149895][T230589] CR2: 00007f0dfea1db60 CR3: 000000387b5f2000 CR4: 0000000000350ee0 [ 3406.162421][T230589] Call Trace: [ 3406.170235][T230589] <TASK> [ 3406.177736][T230589] mld_newpack+0xfe/0x1a0 [ 3406.186686][T230589] add_grhead+0x87/0xa0 [ 3406.195498][T230589] add_grec+0x485/0x4e0 [ 3406.204310][T230589] ? newidle_balance+0x126/0x3f0 [ 3406.214024][T230589] mld_ifc_work+0x15d/0x450 [ 3406.223279][T230589] process_one_work+0x1e6/0x380 [ 3406.232982][T230589] worker_thread+0x50/0x3a0 [ 3406.242371][T230589] ? rescuer_thread+0x360/0x360 [ 3406.252175][T230589] kthread+0x127/0x150 [ 3406.261197][T230589] ? set_kthread_struct+0x40/0x40 [ 3406.271287][T230589] ret_from_fork+0x22/0x30 [ 3406.280812][T230589] </TASK> [ 3406.288937][T230589] Modules linked in: ... [last unloaded: kheaders] [ 3406.476714][T230589] ---[ end trace 3525a7655f2f3b9e ]--- Fixes: 88e2ca308094 ("mld: convert ifmcaddr6 to RCU") Reported-by: David Pinilla Caparros <dpini@cloudflare.com> Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-13tipc: fix a bit overflow in tipc_crypto_key_rcv()Hangyu Hua1-1/+1
msg_data_sz return a 32bit value, but size is 16bit. This may lead to a bit overflow. Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11Merge tag 'wireless-2022-02-11' of ↵David S. Miller2-21/+25
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless wireless fixes for v5.17 Second set of fixes for v5.17. This is the first pull request with both driver and stack patches. Most important here are a regression fix for brcmfmac USB devices and an iwlwifi fix for use after free when the firmware was missing. We have new maintainers for ath9k and wcn36xx as well as ath6kl is now orphaned. Also smaller fixes to iwlwifi and stack.
2022-02-11Merge ra.kernel.org:/pub/scm/linux/kernel/git/netfilter/nfDavid S. Miller2-2/+4
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Add selftest for nft_synproxy, from Florian Westphal. 2) xt_socket destroy path incorrectly disables IPv4 defrag for IPv6 traffic (typo), from Eric Dumazet. 3) Fix exit value selftest nft_concat_range.sh, from Hangbin Liu. 4) nft_synproxy disables the IPv4 hooks if the IPv6 hooks fail to be registered. 5) disable rp_filter on router in selftest nft_fib.sh, also from Hangbin Liu. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11drop_monitor: fix data-race in dropmon_net_event / trace_napi_poll_hitEric Dumazet1-2/+9
trace_napi_poll_hit() is reading stat->dev while another thread can write on it from dropmon_net_event() Use READ_ONCE()/WRITE_ONCE() here, RCU rules are properly enforced already, we only have to take care of load/store tearing. BUG: KCSAN: data-race in dropmon_net_event / trace_napi_poll_hit write to 0xffff88816f3ab9c0 of 8 bytes by task 20260 on cpu 1: dropmon_net_event+0xb8/0x2b0 net/core/drop_monitor.c:1579 notifier_call_chain kernel/notifier.c:84 [inline] raw_notifier_call_chain+0x53/0xb0 kernel/notifier.c:392 call_netdevice_notifiers_info net/core/dev.c:1919 [inline] call_netdevice_notifiers_extack net/core/dev.c:1931 [inline] call_netdevice_notifiers net/core/dev.c:1945 [inline] unregister_netdevice_many+0x867/0xfb0 net/core/dev.c:10415 ip_tunnel_delete_nets+0x24a/0x280 net/ipv4/ip_tunnel.c:1123 vti_exit_batch_net+0x2a/0x30 net/ipv4/ip_vti.c:515 ops_exit_list net/core/net_namespace.c:173 [inline] cleanup_net+0x4dc/0x8d0 net/core/net_namespace.c:597 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307 worker_thread+0x616/0xa70 kernel/workqueue.c:2454 kthread+0x1bf/0x1e0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 read to 0xffff88816f3ab9c0 of 8 bytes by interrupt on cpu 0: trace_napi_poll_hit+0x89/0x1c0 net/core/drop_monitor.c:292 trace_napi_poll include/trace/events/napi.h:14 [inline] __napi_poll+0x36b/0x3f0 net/core/dev.c:6366 napi_poll net/core/dev.c:6432 [inline] net_rx_action+0x29e/0x650 net/core/dev.c:6519 __do_softirq+0x158/0x2de kernel/softirq.c:558 do_softirq+0xb1/0xf0 kernel/softirq.c:459 __local_bh_enable_ip+0x68/0x70 kernel/softirq.c:383 __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline] _raw_spin_unlock_bh+0x33/0x40 kernel/locking/spinlock.c:210 spin_unlock_bh include/linux/spinlock.h:394 [inline] ptr_ring_consume_bh include/linux/ptr_ring.h:367 [inline] wg_packet_decrypt_worker+0x73c/0x780 drivers/net/wireguard/receive.c:506 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307 worker_thread+0x616/0xa70 kernel/workqueue.c:2454 kthread+0x1bf/0x1e0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 value changed: 0xffff88815883e000 -> 0x0000000000000000 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 26435 Comm: kworker/0:1 Not tainted 5.17.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: wg-crypt-wg2 wg_packet_decrypt_worker Fixes: 4ea7e38696c7 ("dropmon: add ability to detect when hardware dropsrxpackets") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neil Horman <nhorman@tuxdriver.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11net/smc: Avoid overwriting the copies of clcsock callback functionsWen Gu1-3/+7
The callback functions of clcsock will be saved and replaced during the fallback. But if the fallback happens more than once, then the copies of these callback functions will be overwritten incorrectly, resulting in a loop call issue: clcsk->sk_error_report |- smc_fback_error_report() <------------------------------| |- smc_fback_forward_wakeup() | (loop) |- clcsock_callback() (incorrectly overwritten) | |- smc->clcsk_error_report() ------------------| So this patch fixes the issue by saving these function pointers only once in the fallback and avoiding overwriting. Reported-by: syzbot+4de3c0e8a263e1e499bc@syzkaller.appspotmail.com Fixes: 341adeec9ada ("net/smc: Forward wakeup to smc socket waitqueue after fallback") Link: https://lore.kernel.org/r/0000000000006d045e05d78776f6@google.com Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11Merge tag 'net-5.17-rc4' of ↵Linus Torvalds19-82/+144
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter and can. Current release - new code bugs: - sparx5: fix get_stat64 out-of-bound access and crash - smc: fix netdev ref tracker misuse Previous releases - regressions: - eth: ixgbevf: require large buffers for build_skb on 82599VF, avoid overflows - eth: ocelot: fix all IP traffic getting trapped to CPU with PTP over IP - bonding: fix rare link activation misses in 802.3ad mode Previous releases - always broken: - tcp: fix tcp sock mem accounting in zero-copy corner cases - remove the cached dst when uncloning an skb dst and its metadata, since we only have one ref it'd lead to an UaF - netfilter: - conntrack: don't refresh sctp entries in closed state - conntrack: re-init state for retransmitted syn-ack, avoid connection establishment getting stuck with strange stacks - ctnetlink: disable helper autoassign, avoid it getting lost - nft_payload: don't allow transport header access for fragments - dsa: fix use of devres for mdio throughout drivers - eth: amd-xgbe: disable interrupts during pci removal - eth: dpaa2-eth: unregister netdev before disconnecting the PHY - eth: ice: fix IPIP and SIT TSO offload" * tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits) net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister net: mscc: ocelot: fix mutex lock error during ethtool stats read ice: Avoid RTNL lock when re-creating auxiliary device ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler ice: fix IPIP and SIT TSO offload ice: fix an error code in ice_cfg_phy_fec() net: mpls: Fix GCC 12 warning dpaa2-eth: unregister the netdev before disconnecting from the PHY skbuff: cleanup double word in comment net: macb: Align the dma and coherent dma masks mptcp: netlink: process IPv6 addrs in creating listening sockets selftests: mptcp: add missing join check net: usb: qmi_wwan: Add support for Dell DW5829e vlan: move dev_put into vlan_dev_uninit vlan: introduce vlan_dev_free_egress_priority ax25: fix UAF bugs of net_device caused by rebinding operation net: dsa: fix panic when DSA master device unbinds on shutdown net: amd-xgbe: disable interrupts during pci removal tipc: rate limit warning for received illegal binding update net: mdio: aspeed: Add missing MODULE_DEVICE_TABLE ...
2022-02-10netfilter: nft_synproxy: unregister hooks on init error pathPablo Neira Ayuso1-1/+3
Disable the IPv4 hooks if the IPv6 hooks fail to be registered. Fixes: ad49d86e07a4 ("netfilter: nf_tables: Add synproxy support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-10net: mpls: Fix GCC 12 warningVictor Erminpour1-1/+1
When building with automatic stack variable initialization, GCC 12 complains about variables defined outside of switch case statements. Move the variable outside the switch, which silences the warning: ./net/mpls/af_mpls.c:1624:21: error: statement will never be executed [-Werror=switch-unreachable] 1624 | int err; | ^~~ Signed-off-by: Victor Erminpour <victor.erminpour@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-10skbuff: cleanup double word in commentTom Rix1-1/+1
Remove the second 'to'. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-10tipc: improve size validations for received domain recordsJon Maloy2-2/+9
The function tipc_mon_rcv() allows a node to receive and process domain_record structs from peer nodes to track their views of the network topology. This patch verifies that the number of members in a received domain record does not exceed the limit defined by MAX_MON_DOMAIN, something that may otherwise lead to a stack overflow. tipc_mon_rcv() is called from the function tipc_link_proto_rcv(), where we are reading a 32 bit message data length field into a uint16. To avert any risk of bit overflow, we add an extra sanity check for this in that function. We cannot see that happen with the current code, but future designers being unaware of this risk, may introduce it by allowing delivery of very large (> 64k) sk buffers from the bearer layer. This potential problem was identified by Eric Dumazet. This fixes CVE-2022-0435 Reported-by: Samuel Page <samuel.page@appgate.com> Reported-by: Eric Dumazet <edumazet@google.com> Fixes: 35c55c9877f8 ("tipc: add neighbor monitoring framework") Signed-off-by: Jon Maloy <jmaloy@redhat.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Samuel Page <samuel.page@appgate.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-10mptcp: netlink: process IPv6 addrs in creating listening socketsKishen Maloor1-2/+6
This change updates mptcp_pm_nl_create_listen_socket() to create listening sockets bound to IPv6 addresses (where IPv6 is supported). Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port") Acked-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-09vlan: move dev_put into vlan_dev_uninitXin Long1-3/+5
Shuang Li reported an QinQ issue by simply doing: # ip link add dummy0 type dummy # ip link add link dummy0 name dummy0.1 type vlan id 1 # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2 # rmmod 8021q unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1 When rmmods 8021q, all vlan devs are deleted from their real_dev's vlan grp and added into list_kill by unregister_vlan_dev(). dummy0.1 is unregistered before dummy0.1.2, as it's using for_each_netdev() in __rtnl_kill_links(). When unregisters dummy0.1, dummy0.1.2 is not unregistered in the event of NETDEV_UNREGISTER, as it's been deleted from dummy0.1's vlan grp. However, due to dummy0.1.2 still holding dummy0.1, dummy0.1 will keep waiting in netdev_wait_allrefs(), while dummy0.1.2 will never get unregistered and release dummy0.1, as it delays dev_put until calling dev->priv_destructor, vlan_dev_free(). This issue was introduced by Commit 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()"), and this patch is to fix it by moving dev_put() into vlan_dev_uninit(), which is called after NETDEV_UNREGISTER event but before netdev_wait_allrefs(). Fixes: 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09vlan: introduce vlan_dev_free_egress_priorityXin Long3-5/+11
This patch is to introduce vlan_dev_free_egress_priority() to free egress priority for vlan dev, and keep vlan_dev_uninit() static as .ndo_uninit. It makes the code more clear and safer when adding new code in vlan_dev_uninit() in the future. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09ax25: fix UAF bugs of net_device caused by rebinding operationDuoming Zhou1-1/+4
The ax25_kill_by_device() will set s->ax25_dev = NULL and call ax25_disconnect() to change states of ax25_cb and sock, if we call ax25_bind() before ax25_kill_by_device(). However, if we call ax25_bind() again between the window of ax25_kill_by_device() and ax25_dev_device_down(), the values and states changed by ax25_kill_by_device() will be reassigned. Finally, ax25_dev_device_down() will deallocate net_device. If we dereference net_device in syscall functions such as ax25_release(), ax25_sendmsg(), ax25_getsockopt(), ax25_getname() and ax25_info_show(), a UAF bug will occur. One of the possible race conditions is shown below: (USE) | (FREE) ax25_bind() | | ax25_kill_by_device() ax25_bind() | ax25_connect() | ... | ax25_dev_device_down() | ... | dev_put_track(dev, ...) //FREE ax25_release() | ... ax25_send_control() | alloc_skb() //USE | the corresponding fail log is shown below: =============================================================== BUG: KASAN: use-after-free in ax25_send_control+0x43/0x210 ... Call Trace: ... ax25_send_control+0x43/0x210 ax25_release+0x2db/0x3b0 __sock_release+0x6d/0x120 sock_close+0xf/0x20 __fput+0x11f/0x420 ... Allocated by task 1283: ... __kasan_kmalloc+0x81/0xa0 alloc_netdev_mqs+0x5a/0x680 mkiss_open+0x6c/0x380 tty_ldisc_open+0x55/0x90 ... Freed by task 1969: ... kfree+0xa3/0x2c0 device_release+0x54/0xe0 kobject_put+0xa5/0x120 tty_ldisc_kill+0x3e/0x80 ... In order to fix these UAF bugs caused by rebinding operation, this patch adds dev_hold_track() into ax25_bind() and corresponding dev_put_track() into ax25_kill_by_device(). Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09net: dsa: fix panic when DSA master device unbinds on shutdownVladimir Oltean1-19/+6
Rafael reports that on a system with LX2160A and Marvell DSA switches, if a reboot occurs while the DSA master (dpaa2-eth) is up, the following panic can be seen: systemd-shutdown[1]: Rebooting. Unable to handle kernel paging request at virtual address 00a0000800000041 [00a0000800000041] address between user and kernel address ranges Internal error: Oops: 96000004 [#1] PREEMPT SMP CPU: 6 PID: 1 Comm: systemd-shutdow Not tainted 5.16.5-00042-g8f5585009b24 #32 pc : dsa_slave_netdevice_event+0x130/0x3e4 lr : raw_notifier_call_chain+0x50/0x6c Call trace: dsa_slave_netdevice_event+0x130/0x3e4 raw_notifier_call_chain+0x50/0x6c call_netdevice_notifiers_info+0x54/0xa0 __dev_close_many+0x50/0x130 dev_close_many+0x84/0x120 unregister_netdevice_many+0x130/0x710 unregister_netdevice_queue+0x8c/0xd0 unregister_netdev+0x20/0x30 dpaa2_eth_remove+0x68/0x190 fsl_mc_driver_remove+0x20/0x5c __device_release_driver+0x21c/0x220 device_release_driver_internal+0xac/0xb0 device_links_unbind_consumers+0xd4/0x100 __device_release_driver+0x94/0x220 device_release_driver+0x28/0x40 bus_remove_device+0x118/0x124 device_del+0x174/0x420 fsl_mc_device_remove+0x24/0x40 __fsl_mc_device_remove+0xc/0x20 device_for_each_child+0x58/0xa0 dprc_remove+0x90/0xb0 fsl_mc_driver_remove+0x20/0x5c __device_release_driver+0x21c/0x220 device_release_driver+0x28/0x40 bus_remove_device+0x118/0x124 device_del+0x174/0x420 fsl_mc_bus_remove+0x80/0x100 fsl_mc_bus_shutdown+0xc/0x1c platform_shutdown+0x20/0x30 device_shutdown+0x154/0x330 __do_sys_reboot+0x1cc/0x250 __arm64_sys_reboot+0x20/0x30 invoke_syscall.constprop.0+0x4c/0xe0 do_el0_svc+0x4c/0x150 el0_svc+0x24/0xb0 el0t_64_sync_handler+0xa8/0xb0 el0t_64_sync+0x178/0x17c It can be seen from the stack trace that the problem is that the deregistration of the master causes a dev_close(), which gets notified as NETDEV_GOING_DOWN to dsa_slave_netdevice_event(). But dsa_switch_shutdown() has already run, and this has unregistered the DSA slave interfaces, and yet, the NETDEV_GOING_DOWN handler attempts to call dev_close_many() on those slave interfaces, leading to the problem. The previous attempt to avoid the NETDEV_GOING_DOWN on the master after dsa_switch_shutdown() was called seems improper. Unregistering the slave interfaces is unnecessary and unhelpful. Instead, after the slaves have stopped being uppers of the DSA master, we can now reset to NULL the master->dsa_ptr pointer, which will make DSA start ignoring all future notifier events on the master. Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown") Reported-by: Rafael Richter <rafael.richter@gin.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09tipc: rate limit warning for received illegal binding updateJon Maloy1-1/+1
It would be easy to craft a message containing an illegal binding table update operation. This is handled correctly by the code, but the corresponding warning printout is not rate limited as is should be. We fix this now. Fixes: b97bf3fd8f6a ("[TIPC] Initial merge") Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09Merge tag 'linux-can-fixes-for-5.17-20220209' of ↵David S. Miller1-7/+22
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2022-02-09 this is a pull request of 2 patches for net/master. Oliver Hartkopp contributes 2 fixes for the CAN ISOTP protocol. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09ax25: fix NPD bug in ax25_disconnectDuoming Zhou1-1/+1
The ax25_disconnect() in ax25_kill_by_device() is not protected by any locks, thus there is a race condition between ax25_disconnect() and ax25_destroy_socket(). when ax25->sk is assigned as NULL by ax25_destroy_socket(), a NULL pointer dereference bug will occur if site (1) or (2) dereferences ax25->sk. ax25_kill_by_device() | ax25_release() ax25_disconnect() | ax25_destroy_socket() ... | if(ax25->sk != NULL) | ... ... | ax25->sk = NULL; bh_lock_sock(ax25->sk); //(1) | ... ... | bh_unlock_sock(ax25->sk); //(2)| This patch moves ax25_disconnect() into lock_sock(), which can synchronize with ax25_destroy_socket() in ax25_release(). Fail log: =============================================================== BUG: kernel NULL pointer dereference, address: 0000000000000088 ... RIP: 0010:_raw_spin_lock+0x7e/0xd0 ... Call Trace: ax25_disconnect+0xf6/0x220 ax25_device_event+0x187/0x250 raw_notifier_call_chain+0x5e/0x70 dev_close_many+0x17d/0x230 rollback_registered_many+0x1f1/0x950 unregister_netdevice_queue+0x133/0x200 unregister_netdev+0x13/0x20 ... Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09netfilter: xt_socket: fix a typo in socket_mt_destroy()Eric Dumazet1-1/+1
Calling nf_defrag_ipv4_disable() instead of nf_defrag_ipv6_disable() was probably not the intent. I found this by code inspection, while chasing a possible issue in TPROXY. Fixes: de8c12110a13 ("netfilter: disable defrag once its no longer needed") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-09can: isotp: fix error path in isotp_sendmsg() to unlock wait queueOliver Hartkopp1-6/+9
Commit 43a08c3bdac4 ("can: isotp: isotp_sendmsg(): fix TX buffer concurrent access in isotp_sendmsg()") introduced a new locking scheme that may render the userspace application in a locking state when an error is detected. This issue shows up under high load on simultaneously running isotp channels with identical configuration which is against the ISO specification and therefore breaks any reasonable PDU communication anyway. Fixes: 43a08c3bdac4 ("can: isotp: isotp_sendmsg(): fix TX buffer concurrent access in isotp_sendmsg()") Link: https://lore.kernel.org/all/20220209073601.25728-1-socketcan@hartkopp.net Cc: stable@vger.kernel.org Cc: Ziyang Xuan <william.xuanziyang@huawei.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-02-09can: isotp: fix potential CAN frame reception race in isotp_rcv()Oliver Hartkopp1-1/+13
When receiving a CAN frame the current code logic does not consider concurrently receiving processes which do not show up in real world usage. Ziyang Xuan writes: The following syz problem is one of the scenarios. so->rx.len is changed by isotp_rcv_ff() during isotp_rcv_cf(), so->rx.len equals 0 before alloc_skb() and equals 4096 after alloc_skb(). That will trigger skb_over_panic() in skb_put(). ======================================================= CPU: 1 PID: 19 Comm: ksoftirqd/1 Not tainted 5.16.0-rc8-syzkaller #0 RIP: 0010:skb_panic+0x16c/0x16e net/core/skbuff.c:113 Call Trace: <TASK> skb_over_panic net/core/skbuff.c:118 [inline] skb_put.cold+0x24/0x24 net/core/skbuff.c:1990 isotp_rcv_cf net/can/isotp.c:570 [inline] isotp_rcv+0xa38/0x1e30 net/can/isotp.c:668 deliver net/can/af_can.c:574 [inline] can_rcv_filter+0x445/0x8d0 net/can/af_can.c:635 can_receive+0x31d/0x580 net/can/af_can.c:665 can_rcv+0x120/0x1c0 net/can/af_can.c:696 __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5465 __netif_receive_skb+0x24/0x1b0 net/core/dev.c:5579 Therefore we make sure the state changes and data structures stay consistent at CAN frame reception time by adding a spin_lock in isotp_rcv(). This fixes the issue reported by syzkaller but does not affect real world operation. Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Link: https://lore.kernel.org/linux-can/d7e69278-d741-c706-65e1-e87623d9a8e8@huawei.com/T/ Link: https://lore.kernel.org/all/20220208200026.13783-1-socketcan@hartkopp.net Cc: stable@vger.kernel.org Reported-by: syzbot+4c63f36709a642f801c5@syzkaller.appspotmail.com Reported-by: Ziyang Xuan <william.xuanziyang@huawei.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-02-09ipmr,ip6mr: acquire RTNL before calling ip[6]mr_free_table() on failure pathEric Dumazet2-0/+4
ip[6]mr_free_table() can only be called under RTNL lock. RTNL: assertion failed at net/core/dev.c (10367) WARNING: CPU: 1 PID: 5890 at net/core/dev.c:10367 unregister_netdevice_many+0x1246/0x1850 net/core/dev.c:10367 Modules linked in: CPU: 1 PID: 5890 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller-11627-g422ee58dc0ef #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:unregister_netdevice_many+0x1246/0x1850 net/core/dev.c:10367 Code: 0f 85 9b ee ff ff e8 69 07 4b fa ba 7f 28 00 00 48 c7 c6 00 90 ae 8a 48 c7 c7 40 90 ae 8a c6 05 6d b1 51 06 01 e8 8c 90 d8 01 <0f> 0b e9 70 ee ff ff e8 3e 07 4b fa 4c 89 e7 e8 86 2a 59 fa e9 ee RSP: 0018:ffffc900046ff6e0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff888050f51d00 RSI: ffffffff815fa008 RDI: fffff520008dfece RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815f3d6e R11: 0000000000000000 R12: 00000000fffffff4 R13: dffffc0000000000 R14: ffffc900046ff750 R15: ffff88807b7dc000 FS: 00007f4ab736e700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fee0b4f8990 CR3: 000000001e7d2000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> mroute_clean_tables+0x244/0xb40 net/ipv6/ip6mr.c:1509 ip6mr_free_table net/ipv6/ip6mr.c:389 [inline] ip6mr_rules_init net/ipv6/ip6mr.c:246 [inline] ip6mr_net_init net/ipv6/ip6mr.c:1306 [inline] ip6mr_net_init+0x3f0/0x4e0 net/ipv6/ip6mr.c:1298 ops_init+0xaf/0x470 net/core/net_namespace.c:140 setup_net+0x54f/0xbb0 net/core/net_namespace.c:331 copy_net_ns+0x318/0x760 net/core/net_namespace.c:475 create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110 copy_namespaces+0x391/0x450 kernel/nsproxy.c:178 copy_process+0x2e0c/0x7300 kernel/fork.c:2167 kernel_clone+0xe7/0xab0 kernel/fork.c:2555 __do_sys_clone+0xc8/0x110 kernel/fork.c:2672 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f4ab89f9059 Code: Unable to access opcode bytes at RIP 0x7f4ab89f902f. RSP: 002b:00007f4ab736e118 EFLAGS: 00000206 ORIG_RAX: 0000000000000038 RAX: ffffffffffffffda RBX: 00007f4ab8b0bf60 RCX: 00007f4ab89f9059 RDX: 0000000020000280 RSI: 0000000020000270 RDI: 0000000040200000 RBP: 00007f4ab8a5308d R08: 0000000020000300 R09: 0000000020000300 R10: 00000000200002c0 R11: 0000000000000206 R12: 0000000000000000 R13: 00007ffc3977cc1f R14: 00007f4ab736e300 R15: 0000000000022000 </TASK> Fixes: f243e5a7859a ("ipmr,ip6mr: call ip6mr_free_table() on failure path") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <cong.wang@bytedance.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220208053451.2885398-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08Merge tag 'nfs-for-5.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds3-2/+13
Pull NFS client fixes from Anna Schumaker: "Stable Fixes: - Fix initialization of nfs_client cl_flags Other Fixes: - Fix performance issues with uncached readdir calls - Fix potential pointer dereferences in rpcrdma_ep_create - Fix nfs4_proc_get_locations() kernel-doc comment - Fix locking during sunrpc sysfs reads - Update my email address in the MAINTAINERS file to my new kernel.org email" * tag 'nfs-for-5.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC: lock against ->sock changing during sysfs read MAINTAINERS: Update my email address NFS: Fix nfs4_proc_get_locations() kernel-doc comment xprtrdma: fix pointer derefs in error cases of rpcrdma_ep_create NFS: Fix initialisation of nfs_client cl_flags field NFS: Avoid duplicate uncached readdir calls on eof NFS: Don't skip directory entries when doing uncached readdir NFS: Don't overfill uncached readdir pages
2022-02-08SUNRPC: lock against ->sock changing during sysfs readNeilBrown2-2/+10
->sock can be set to NULL asynchronously unless ->recv_mutex is held. So it is important to hold that mutex. Otherwise a sysfs read can trigger an oops. Commit 17f09d3f619a ("SUNRPC: Check if the xprt is connected before handling sysfs reads") appears to attempt to fix this problem, but it only narrows the race window. Fixes: 17f09d3f619a ("SUNRPC: Check if the xprt is connected before handling sysfs reads") Fixes: a8482488a7d6 ("SUNRPC query transport's source port") Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-08xprtrdma: fix pointer derefs in error cases of rpcrdma_ep_createDan Aloni1-0/+3
If there are failures then we must not leave the non-NULL pointers with the error value, otherwise `rpcrdma_ep_destroy` gets confused and tries free them, resulting in an Oops. Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-07net/smc: use GFP_ATOMIC allocation in smc_pnet_add_eth()Eric Dumazet1-1/+1
My last patch moved the netdev_tracker_alloc() call to a section protected by a write_lock(). I should have replaced GFP_KERNEL with GFP_ATOMIC to avoid the infamous: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:256 Fixes: 28f922213886 ("net/smc: fix ref_tracker issue in smc_pnet_add()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-06net/smc: fix ref_tracker issue in smc_pnet_add()Eric Dumazet1-3/+5
I added the netdev_tracker_alloc() right after ndev was stored into the newly allocated object: new_pe->ndev = ndev; if (ndev) netdev_tracker_alloc(ndev, &new_pe->dev_tracker, GFP_KERNEL); But I missed that later, we could end up freeing new_pe, then calling dev_put(ndev) to release the reference on ndev. The new_pe->dev_tracker would not be freed. To solve this issue, move the netdev_tracker_alloc() call to the point we know for sure new_pe will be kept. syzbot report (on net-next tree, but the bug is present in net tree) WARNING: CPU: 0 PID: 6019 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Modules linked in: CPU: 0 PID: 6019 Comm: syz-executor.3 Not tainted 5.17.0-rc2-syzkaller-00650-g5a8fb33e5305 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Code: 1d f4 70 a0 09 31 ff 89 de e8 4d bc 99 fd 84 db 75 e0 e8 64 b8 99 fd 48 c7 c7 20 0c 06 8a c6 05 d4 70 a0 09 01 e8 9e 4e 28 05 <0f> 0b eb c4 e8 48 b8 99 fd 0f b6 1d c3 70 a0 09 31 ff 89 de e8 18 RSP: 0018:ffffc900043b7400 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000040000 RSI: ffffffff815fb318 RDI: fffff52000876e72 RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815f507e R11: 0000000000000000 R12: 1ffff92000876e85 R13: 0000000000000000 R14: ffff88805c1c6600 R15: 0000000000000000 FS: 00007f1ef6feb700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2d02b000 CR3: 00000000223f4000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __refcount_dec include/linux/refcount.h:344 [inline] refcount_dec include/linux/refcount.h:359 [inline] ref_tracker_free+0x53f/0x6c0 lib/ref_tracker.c:119 netdev_tracker_free include/linux/netdevice.h:3867 [inline] dev_put_track include/linux/netdevice.h:3884 [inline] dev_put_track include/linux/netdevice.h:3880 [inline] dev_put include/linux/netdevice.h:3910 [inline] smc_pnet_add_eth net/smc/smc_pnet.c:399 [inline] smc_pnet_enter net/smc/smc_pnet.c:493 [inline] smc_pnet_add+0x5fc/0x15f0 net/smc/smc_pnet.c:556 genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:731 genl_family_rcv_msg net/netlink/genetlink.c:775 [inline] genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:792 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494 genl_rcv+0x24/0x40 net/netlink/genetlink.c:803 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:725 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: b60645248af3 ("net/smc: add net device tracker to struct smc_pnetentry") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05tcp: take care of mixed splice()/sendmsg(MSG_ZEROCOPY) caseEric Dumazet1-14/+19
syzbot found that mixing sendpage() and sendmsg(MSG_ZEROCOPY) calls over the same TCP socket would again trigger the infamous warning in inet_sock_destruct() WARN_ON(sk_forward_alloc_get(sk)); While Talal took into account a mix of regular copied data and MSG_ZEROCOPY one in the same skb, the sendpage() path has been forgotten. We want the charging to happen for sendpage(), because pages could be coming from a pipe. What is missing is the downgrading of pure zerocopy status to make sure sk_forward_alloc will stay synced. Add tcp_downgrade_zcopy_pure() helper so that we can use it from the two callers. Fixes: 9b65b17db723 ("net: avoid double accounting for pure zerocopy skbs") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Talal Ahmad <talalahmad@google.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Link: https://lore.kernel.org/r/20220203225547.665114-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-04Merge tag 'ceph-for-5.17-rc3' of git://github.com/ceph/ceph-clientLinus Torvalds4-70/+245
Pull ceph fixes from Ilya Dryomov: "A patch to make it possible to disable zero copy path in the messenger to avoid checksum or authentication tag mismatches and ensuing session resets in case the destination buffer isn't guaranteed to be stable" * tag 'ceph-for-5.17-rc3' of git://github.com/ceph/ceph-client: libceph: optionally use bounce buffer on recv path in crc mode libceph: make recv path in secure mode work the same as send path
2022-02-04cfg80211: fix race in netlink owner interface destructionJohannes Berg1-13/+4
My previous fix here to fix the deadlock left a race where the exact same deadlock (see the original commit referenced below) can still happen if cfg80211_destroy_ifaces() already runs while nl80211_netlink_notify() is still marking some interfaces as nl_owner_dead. The race happens because we have two loops here - first we dev_close() all the netdevs, and then we destroy them. If we also have two netdevs (first one need only be a wdev though) then we can find one during the first iteration, close it, and go to the second iteration -- but then find two, and try to destroy also the one we didn't close yet. Fix this by only iterating once. Reported-by: Toke Høiland-Jørgensen <toke@redhat.com> Fixes: ea6b2098dd02 ("cfg80211: fix locking in netlink owner interface destruction") Tested-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20220201130951.22093-1-johannes@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-02-04netfilter: ctnetlink: disable helper autoassignFlorian Westphal1-1/+2
When userspace, e.g. conntrackd, inserts an entry with a specified helper, its possible that the helper is lost immediately after its added: ctnetlink_create_conntrack -> nf_ct_helper_ext_add + assign helper -> ctnetlink_setup_nat -> ctnetlink_parse_nat_setup -> parse_nat_setup -> nfnetlink_parse_nat_setup -> nf_nat_setup_info -> nf_conntrack_alter_reply -> __nf_ct_try_assign_helper ... and __nf_ct_try_assign_helper will zero the helper again. Set IPS_HELPER bit to bypass auto-assign logic, its unwanted, just like when helper is assigned via ruleset. Dropped old 'not strictly necessary' comment, it referred to use of rcu_assign_pointer() before it got replaced by RCU_INIT_POINTER(). NB: Fixes tag intentionally incorrect, this extends the referenced commit, but this change won't build without IPS_HELPER introduced there. Fixes: 6714cf5465d280 ("netfilter: nf_conntrack: fix explicit helper attachment and NAT") Reported-by: Pham Thanh Tuyen <phamtyn@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04netfilter: conntrack: re-init state for retransmitted syn-ackFlorian Westphal1-0/+12
TCP conntrack assumes that a syn-ack retransmit is identical to the previous syn-ack. This isn't correct and causes stuck 3whs in some more esoteric scenarios. tcpdump to illustrate the problem: client > server: Flags [S] seq 1365731894, win 29200, [mss 1460,sackOK,TS val 2083035583 ecr 0,wscale 7] server > client: Flags [S.] seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215367629 ecr 2082921663] Note the invalid/outdated synack ack number. Conntrack marks this syn-ack as out-of-window/invalid, but it did initialize the reply direction parameters based on this packets content. client > server: Flags [S] seq 1365731894, win 29200, [mss 1460,sackOK,TS val 2083036623 ecr 0,wscale 7] ... retransmit... server > client: Flags [S.], seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215368644 ecr 2082921663] and another bogus synack. This repeats, then client re-uses for a new attempt: client > server: Flags [S], seq 2375731741, win 29200, [mss 1460,sackOK,TS val 2083100223 ecr 0,wscale 7] server > client: Flags [S.], seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215430754 ecr 2082921663] ... but still gets a invalid syn-ack. This repeats until: server > client: Flags [S.], seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215437785 ecr 2082921663] server > client: Flags [R.], seq 145824454, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215443451 ecr 2082921663] client > server: Flags [S], seq 2375731741, win 29200, [mss 1460,sackOK,TS val 2083115583 ecr 0,wscale 7] server > client: Flags [S.], seq 162602410, ack 2375731742, win 65535, [mss 8952,wscale 5,TS val 3215445754 ecr 2083115583] This syn-ack has the correct ack number, but conntrack flags it as invalid: The internal state was created from the first syn-ack seen so the sequence number of the syn-ack is treated as being outside of the announced window. Don't assume that retransmitted syn-ack is identical to previous one. Treat it like the first syn-ack and reinit state. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04netfilter: conntrack: move synack init code to helperFlorian Westphal1-18/+29
It seems more readable to use a common helper in the followup fix rather than copypaste or goto. No functional change intended. The function is only called for syn-ack or syn in repy direction in case of simultaneous open. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04netfilter: nft_payload: don't allow th access for fragmentsFlorian Westphal2-5/+6
Loads relative to ->thoff naturally expect that this points to the transport header, but this is only true if pkt->fragoff == 0. This has little effect for rulesets with connection tracking/nat because these enable ip defra. For other rulesets this prevents false matches. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04netfilter: conntrack: don't refresh sctp entries in closed stateFlorian Westphal1-0/+9
Vivek Thrivikraman reported: An SCTP server application which is accessed continuously by client application. When the session disconnects the client retries to establish a connection. After restart of SCTP server application the session is not established because of stale conntrack entry with connection state CLOSED as below. (removing this entry manually established new connection): sctp 9 CLOSED src=10.141.189.233 [..] [ASSURED] Just skip timeout update of closed entries, we don't want them to stay around forever. Reported-and-tested-by: Vivek Thrivikraman <vivek.thrivikraman@est.tech> Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1579 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04Merge tag 'net-5.17-rc3' of ↵Linus Torvalds21-69/+247
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf, netfilter, and ieee802154. Current release - regressions: - Partially revert "net/smc: Add netlink net namespace support", fix uABI breakage - netfilter: - nft_ct: fix use after free when attaching zone template - nft_byteorder: track register operations Previous releases - regressions: - ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback - phy: qca8081: fix speeds lower than 2.5Gb/s - sched: fix use-after-free in tc_new_tfilter() Previous releases - always broken: - tcp: fix mem under-charging with zerocopy sendmsg() - tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data() - neigh: do not trigger immediate probes on NUD_FAILED from neigh_managed_work, avoid a deadlock - bpf: use VM_MAP instead of VM_ALLOC for ringbuf, avoid KASAN false-positives - netfilter: nft_reject_bridge: fix for missing reply from prerouting - smc: forward wakeup to smc socket waitqueue after fallback - ieee802154: - return meaningful error codes from the netlink helpers - mcr20a: fix lifs/sifs periods - at86rf230, ca8210: stop leaking skbs on error paths - macsec: add missing un-offload call for NETDEV_UNREGISTER of parent - ax25: add refcount in ax25_dev to avoid UAF bugs - eth: mlx5e: - fix SFP module EEPROM query - fix broken SKB allocation in HW-GRO - IPsec offload: fix tunnel mode crypto for non-TCP/UDP flows - eth: amd-xgbe: - fix skb data length underflow - ensure reset of the tx_timer_active flag, avoid Tx timeouts - eth: stmmac: fix runtime pm use in stmmac_dvr_remove() - eth: e1000e: handshake with CSME starts from Alder Lake platforms" * tag 'net-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits) ax25: fix reference count leaks of ax25_dev net: stmmac: ensure PTP time register reads are consistent net: ipa: request IPA register values be retained dt-bindings: net: qcom,ipa: add optional qcom,qmp property tools/resolve_btfids: Do not print any commands when building silently bpf: Use VM_MAP instead of VM_ALLOC for ringbuf net, neigh: Do not trigger immediate probes on NUD_FAILED from neigh_managed_work tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data() net: sparx5: do not refer to skb after passing it on Partially revert "net/smc: Add netlink net namespace support" net/mlx5e: Avoid field-overflowing memcpy() net/mlx5e: Use struct_group() for memcpy() region net/mlx5e: Avoid implicit modify hdr for decap drop rule net/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP traffic net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic net/mlx5e: Don't treat small ceil values as unlimited in HTB offload net/mlx5: E-Switch, Fix uninitialized variable modact net/mlx5e: Fix handling of wrong devices during bond netevent net/mlx5e: Fix broken SKB allocation in HW-GRO net/mlx5e: Fix wrong calculation of header index in HW_GRO ...
2022-02-04ax25: fix reference count leaks of ax25_devDuoming Zhou3-16/+36
The previous commit d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs") introduces refcount into ax25_dev, but there are reference leak paths in ax25_ctl_ioctl(), ax25_fwd_ioctl(), ax25_rt_add(), ax25_rt_del() and ax25_rt_opt(). This patch uses ax25_dev_put() and adjusts the position of ax25_addr_ax25dev() to fix reference cout leaks of ax25_dev. Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20220203150811.42256-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03net, neigh: Do not trigger immediate probes on NUD_FAILED from ↵Daniel Borkmann1-6/+12
neigh_managed_work syzkaller was able to trigger a deadlock for NTF_MANAGED entries [0]: kworker/0:16/14617 is trying to acquire lock: ffffffff8d4dd370 (&tbl->lock){++-.}-{2:2}, at: ___neigh_create+0x9e1/0x2990 net/core/neighbour.c:652 [...] but task is already holding lock: ffffffff8d4dd370 (&tbl->lock){++-.}-{2:2}, at: neigh_managed_work+0x35/0x250 net/core/neighbour.c:1572 The neighbor entry turned to NUD_FAILED state, where __neigh_event_send() triggered an immediate probe as per commit cd28ca0a3dd1 ("neigh: reduce arp latency") via neigh_probe() given table lock was held. One option to fix this situation is to defer the neigh_probe() back to the neigh_timer_handler() similarly as pre cd28ca0a3dd1. For the case of NTF_MANAGED, this deferral is acceptable given this only happens on actual failure state and regular / expected state is NUD_VALID with the entry already present. The fix adds a parameter to __neigh_event_send() in order to communicate whether immediate probe is allowed or disallowed. Existing call-sites of neigh_event_send() default as-is to immediate probe. However, the neigh_managed_work() disables it via use of neigh_event_send_probe(). [0] <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_deadlock_bug kernel/locking/lockdep.c:2956 [inline] check_deadlock kernel/locking/lockdep.c:2999 [inline] validate_chain kernel/locking/lockdep.c:3788 [inline] __lock_acquire.cold+0x149/0x3ab kernel/locking/lockdep.c:5027 lock_acquire kernel/locking/lockdep.c:5639 [inline] lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604 __raw_write_lock_bh include/linux/rwlock_api_smp.h:202 [inline] _raw_write_lock_bh+0x2f/0x40 kernel/locking/spinlock.c:334 ___neigh_create+0x9e1/0x2990 net/core/neighbour.c:652 ip6_finish_output2+0x1070/0x14f0 net/ipv6/ip6_output.c:123 __ip6_finish_output net/ipv6/ip6_output.c:191 [inline] __ip6_finish_output+0x61e/0xe90 net/ipv6/ip6_output.c:170 ip6_finish_output+0x32/0x200 net/ipv6/ip6_output.c:201 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:224 dst_output include/net/dst.h:451 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] ndisc_send_skb+0xa99/0x17f0 net/ipv6/ndisc.c:508 ndisc_send_ns+0x3a9/0x840 net/ipv6/ndisc.c:650 ndisc_solicit+0x2cd/0x4f0 net/ipv6/ndisc.c:742 neigh_probe+0xc2/0x110 net/core/neighbour.c:1040 __neigh_event_send+0x37d/0x1570 net/core/neighbour.c:1201 neigh_event_send include/net/neighbour.h:470 [inline] neigh_managed_work+0x162/0x250 net/core/neighbour.c:1574 process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307 worker_thread+0x657/0x1110 kernel/workqueue.c:2454 kthread+0x2e9/0x3a0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK> Fixes: 7482e3841d52 ("net, neigh: Add NTF_MANAGED flag for managed neighbor entries") Reported-by: syzbot+5239d0e1778a500d477a@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Roopa Prabhu <roopa@nvidia.com> Tested-by: syzbot+5239d0e1778a500d477a@syzkaller.appspotmail.com Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220201193942.5055-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()Eric Dumazet1-0/+2
tcp_shift_skb_data() might collapse three packets into a larger one. P_A, P_B, P_C -> P_ABC Historically, it used a single tcp_skb_can_collapse_to(P_A) call, because it was enough. In commit 85712484110d ("tcp: coalesce/collapse must respect MPTCP extensions"), this call was replaced by a call to tcp_skb_can_collapse(P_A, P_B) But the now needed test over P_C has been missed. This probably broke MPTCP. Then later, commit 9b65b17db723 ("net: avoid double accounting for pure zerocopy skbs") added an extra condition to tcp_skb_can_collapse(), but the missing call from tcp_shift_skb_data() is also breaking TCP zerocopy, because P_A and P_C might have different skb_zcopy_pure() status. Fixes: 85712484110d ("tcp: coalesce/collapse must respect MPTCP extensions") Fixes: 9b65b17db723 ("net: avoid double accounting for pure zerocopy skbs") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mat Martineau <mathew.j.martineau@linux.intel.com> Cc: Talal Ahmad <talalahmad@google.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20220201184640.756716-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-02libceph: optionally use bounce buffer on recv path in crc modeIlya Dryomov4-20/+103
Both msgr1 and msgr2 in crc mode are zero copy in the sense that message data is read from the socket directly into the destination buffer. We assume that the destination buffer is stable (i.e. remains unchanged while it is being read to) though. Otherwise, CRC errors ensue: libceph: read_partial_message 0000000048edf8ad data crc 1063286393 != exp. 228122706 libceph: osd1 (1)192.168.122.1:6843 bad crc/signature libceph: bad data crc, calculated 57958023, expected 1805382778 libceph: osd2 (2)192.168.122.1:6876 integrity error, bad crc Introduce rxbounce option to enable use of a bounce buffer when receiving message data. In particular this is needed if a mapped image is a Windows VM disk, passed to QEMU. Windows has a system-wide "dummy" page that may be mapped into the destination buffer (potentially more than once into the same buffer) by the Windows Memory Manager in an effort to generate a single large I/O [1][2]. QEMU makes a point of preserving overlap relationships when cloning I/O vectors, so krbd gets exposed to this behaviour. [1] "What Is Really in That MDL?" https://docs.microsoft.com/en-us/previous-versions/windows/hardware/design/dn614012(v=vs.85) [2] https://blogs.msmvps.com/kernelmustard/2005/05/04/dummy-pages/ URL: https://bugzilla.redhat.com/show_bug.cgi?id=1973317 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-02-02libceph: make recv path in secure mode work the same as send pathIlya Dryomov1-62/+154
The recv path of secure mode is intertwined with that of crc mode. While it's slightly more efficient that way (the ciphertext is read into the destination buffer and decrypted in place, thus avoiding two potentially heavy memory allocations for the bounce buffer and the corresponding sg array), it isn't really amenable to changes. Sacrifice that edge and align with the send path which always uses a full-sized bounce buffer (currently there is no other way -- if the kernel crypto API ever grows support for streaming (piecewise) en/decryption for GCM [1], we would be able to easily take advantage of that on both sides). [1] https://lore.kernel.org/all/20141225202830.GA18794@gondor.apana.org.au/ Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-02-02Partially revert "net/smc: Add netlink net namespace support"Dmitry V. Levin1-2/+0
The change of sizeof(struct smc_diag_linkinfo) by commit 79d39fc503b4 ("net/smc: Add netlink net namespace support") introduced an ABI regression: since struct smc_diag_lgrinfo contains an object of type "struct smc_diag_linkinfo", offset of all subsequent members of struct smc_diag_lgrinfo was changed by that change. As result, applications compiled with the old version of struct smc_diag_linkinfo will receive garbage in struct smc_diag_lgrinfo.role if the kernel implements this new version of struct smc_diag_linkinfo. Fix this regression by reverting the part of commit 79d39fc503b4 that changes struct smc_diag_linkinfo. After all, there is SMC_GEN_NETLINK interface which is good enough, so there is probably no need to touch the smc_diag ABI in the first place. Fixes: 79d39fc503b4 ("net/smc: Add netlink net namespace support") Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/20220202030904.GA9742@altlinux.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-02tcp: fix mem under-charging with zerocopy sendmsg()Eric Dumazet1-2/+5
We got reports of following warning in inet_sock_destruct() WARN_ON(sk_forward_alloc_get(sk)); Whenever we add a non zero-copy fragment to a pure zerocopy skb, we have to anticipate that whole skb->truesize will be uncharged when skb is finally freed. skb->data_len is the payload length. But the memory truesize estimated by __zerocopy_sg_from_iter() is page aligned. Fixes: 9b65b17db723 ("net: avoid double accounting for pure zerocopy skbs") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Talal Ahmad <talalahmad@google.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Link: https://lore.kernel.org/r/20220201065254.680532-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>