summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2022-09-20net: dsa: hellcreek: Print warning only onceKurt Kanzenbach1-1/+1
[ Upstream commit 52267ce25f60f37ae40ccbca0b21328ebae5ae75 ] In case the source port cannot be decoded, print the warning only once. This still brings attention to the user and does not spam the logs at the same time. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220830163448.8921-1-kurt@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-20Bluetooth: MGMT: Fix Get Device FlagsLuiz Augusto von Dentz1-29/+42
[ Upstream commit 23b72814da1a094b4c065e0bb598249f310c5577 ] Get Device Flags don't check if device does actually use an RPA in which case it shall only set HCI_CONN_FLAG_REMOTE_WAKEUP if LL Privacy is enabled. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15time64.h: consolidate uses of PSEC_PER_NSECVladimir Oltean1-2/+3
[ Upstream commit 837ced3a1a5d8bb1a637dd584711f31ae6b54d93 ] Time-sensitive networking code needs to work with PTP times expressed in nanoseconds, and with packet transmission times expressed in picoseconds, since those would be fractional at higher than gigabit speed when expressed in nanoseconds. Convert the existing uses in tc-taprio and the ocelot/felix DSA driver to a PSEC_PER_NSEC macro. This macro is placed in include/linux/time64.h as opposed to its relatives (PSEC_PER_SEC etc) from include/vdso/time64.h because the vDSO library does not (yet) need/use it. Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for the vDSO parts Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 11afdc6526de ("net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet") Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15sch_sfb: Also store skb len before calling child enqueueToke Høiland-Jørgensen1-1/+2
[ Upstream commit 2f09707d0c972120bf794cfe0f0c67e2c2ddb252 ] Cong Wang noticed that the previous fix for sch_sfb accessing the queued skb after enqueueing it to a child qdisc was incomplete: the SFB enqueue function was also calling qdisc_qstats_backlog_inc() after enqueue, which reads the pkt len from the skb cb field. Fix this by also storing the skb len, and using the stored value to increment the backlog after enqueueing. Fixes: 9efd23297cca ("sch_sfb: Don't assume the skb is still around after enqueueing to child") Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Acked-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/20220905192137.965549-1-toke@toke.dk Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15net/smc: Fix possible access to freed memory in link clearYacan Liu4-0/+13
[ Upstream commit e9b1a4f867ae9c1dbd1d71cd09cbdb3239fb4968 ] After modifying the QP to the Error state, all RX WR would be completed with WC in IB_WC_WR_FLUSH_ERR status. Current implementation does not wait for it is done, but destroy the QP and free the link group directly. So there is a risk that accessing the freed memory in tasklet context. Here is a crash example: BUG: unable to handle page fault for address: ffffffff8f220860 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD f7300e067 P4D f7300e067 PUD f7300f063 PMD 8c4e45063 PTE 800ffff08c9df060 Oops: 0002 [#1] SMP PTI CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Tainted: G S OE 5.10.0-0607+ #23 Hardware name: Inspur NF5280M4/YZMB-00689-101, BIOS 4.1.20 07/09/2018 RIP: 0010:native_queued_spin_lock_slowpath+0x176/0x1b0 Code: f3 90 48 8b 32 48 85 f6 74 f6 eb d5 c1 ee 12 83 e0 03 83 ee 01 48 c1 e0 05 48 63 f6 48 05 00 c8 02 00 48 03 04 f5 00 09 98 8e <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32 RSP: 0018:ffffb3b6c001ebd8 EFLAGS: 00010086 RAX: ffffffff8f220860 RBX: 0000000000000246 RCX: 0000000000080000 RDX: ffff91db1f86c800 RSI: 000000000000173c RDI: ffff91db62bace00 RBP: ffff91db62bacc00 R08: 0000000000000000 R09: c00000010000028b R10: 0000000000055198 R11: ffffb3b6c001ea58 R12: ffff91db80e05010 R13: 000000000000000a R14: 0000000000000006 R15: 0000000000000040 FS: 0000000000000000(0000) GS:ffff91db1f840000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffff8f220860 CR3: 00000001f9580004 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> _raw_spin_lock_irqsave+0x30/0x40 mlx5_ib_poll_cq+0x4c/0xc50 [mlx5_ib] smc_wr_rx_tasklet_fn+0x56/0xa0 [smc] tasklet_action_common.isra.21+0x66/0x100 __do_softirq+0xd5/0x29c asm_call_irq_on_stack+0x12/0x20 </IRQ> do_softirq_own_stack+0x37/0x40 irq_exit_rcu+0x9d/0xa0 sysvec_call_function_single+0x34/0x80 asm_sysvec_call_function_single+0x12/0x20 Fixes: bd4ad57718cc ("smc: initialize IB transport incl. PD, MR, QP, CQ, event, WR") Signed-off-by: Yacan Liu <liuyacan@corp.netease.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15tcp: fix early ETIMEDOUT after spurious non-SACK RTONeal Cardwell1-7/+18
[ Upstream commit 686dc2db2a0fdc1d34b424ec2c0a735becd8d62b ] Fix a bug reported and analyzed by Nagaraj Arankal, where the handling of a spurious non-SACK RTO could cause a connection to fail to clear retrans_stamp, causing a later RTO to very prematurely time out the connection with ETIMEDOUT. Here is the buggy scenario, expanding upon Nagaraj Arankal's excellent report: (*1) Send one data packet on a non-SACK connection (*2) Because no ACK packet is received, the packet is retransmitted and we enter CA_Loss; but this retransmission is spurious. (*3) The ACK for the original data is received. The transmitted packet is acknowledged. The TCP timestamp is before the retrans_stamp, so tcp_may_undo() returns true, and tcp_try_undo_loss() returns true without changing state to Open (because tcp_is_sack() is false), and tcp_process_loss() returns without calling tcp_try_undo_recovery(). Normally after undoing a CA_Loss episode, tcp_fastretrans_alert() would see that the connection has returned to CA_Open and fall through and call tcp_try_to_open(), which would set retrans_stamp to 0. However, for non-SACK connections we hold the connection in CA_Loss, so do not fall through to call tcp_try_to_open() and do not set retrans_stamp to 0. So retrans_stamp is (erroneously) still non-zero. At this point the first "retransmission event" has passed and been recovered from. Any future retransmission is a completely new "event". However, retrans_stamp is erroneously still set. (And we are still in CA_Loss, which is correct.) (*4) After 16 minutes (to correspond with tcp_retries2=15), a new data packet is sent. Note: No data is transmitted between (*3) and (*4) and we disabled keep alives. The socket's timeout SHOULD be calculated from this point in time, but instead it's calculated from the prior "event" 16 minutes ago (step (*2)). (*5) Because no ACK packet is received, the packet is retransmitted. (*6) At the time of the 2nd retransmission, the socket returns ETIMEDOUT, prematurely, because retrans_stamp is (erroneously) too far in the past (set at the time of (*2)). This commit fixes this bug by ensuring that we reuse in tcp_try_undo_loss() the same careful logic for non-SACK connections that we have in tcp_try_undo_recovery(). To avoid duplicating logic, we factor out that logic into a new tcp_is_non_sack_preventing_reopen() helper and call that helper from both undo functions. Fixes: da34ac7626b5 ("tcp: only undo on partial ACKs in CA_Loss") Reported-by: Nagaraj Arankal <nagaraj.p.arankal@hpe.com> Link: https://lore.kernel.org/all/SJ0PR84MB1847BE6C24D274C46A1B9B0EB27A9@SJ0PR84MB1847.NAMPRD84.PROD.OUTLOOK.COM/ Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220903121023.866900-1-ncardwell.kernel@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15ipv6: sr: fix out-of-bounds read when setting HMAC data.David Lebrun1-0/+5
[ Upstream commit 84a53580c5d2138c7361c7c3eea5b31827e63b35 ] The SRv6 layer allows defining HMAC data that can later be used to sign IPv6 Segment Routing Headers. This configuration is realised via netlink through four attributes: SEG6_ATTR_HMACKEYID, SEG6_ATTR_SECRET, SEG6_ATTR_SECRETLEN and SEG6_ATTR_ALGID. Because the SECRETLEN attribute is decoupled from the actual length of the SECRET attribute, it is possible to provide invalid combinations (e.g., secret = "", secretlen = 64). This case is not checked in the code and with an appropriately crafted netlink message, an out-of-bounds read of up to 64 bytes (max secret length) can occur past the skb end pointer and into skb_shared_info: Breakpoint 1, seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208 208 memcpy(hinfo->secret, secret, slen); (gdb) bt #0 seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208 #1 0xffffffff81e012e9 in genl_family_rcv_msg_doit (skb=skb@entry=0xffff88800b1f9f00, nlh=nlh@entry=0xffff88800b1b7600, extack=extack@entry=0xffffc90000ba7af0, ops=ops@entry=0xffffc90000ba7a80, hdrlen=4, net=0xffffffff84237580 <init_net>, family=<optimized out>, family=<optimized out>) at net/netlink/genetlink.c:731 #2 0xffffffff81e01435 in genl_family_rcv_msg (extack=0xffffc90000ba7af0, nlh=0xffff88800b1b7600, skb=0xffff88800b1f9f00, family=0xffffffff82fef6c0 <seg6_genl_family>) at net/netlink/genetlink.c:775 #3 genl_rcv_msg (skb=0xffff88800b1f9f00, nlh=0xffff88800b1b7600, extack=0xffffc90000ba7af0) at net/netlink/genetlink.c:792 #4 0xffffffff81dfffc3 in netlink_rcv_skb (skb=skb@entry=0xffff88800b1f9f00, cb=cb@entry=0xffffffff81e01350 <genl_rcv_msg>) at net/netlink/af_netlink.c:2501 #5 0xffffffff81e00919 in genl_rcv (skb=0xffff88800b1f9f00) at net/netlink/genetlink.c:803 #6 0xffffffff81dff6ae in netlink_unicast_kernel (ssk=0xffff888010eec800, skb=0xffff88800b1f9f00, sk=0xffff888004aed000) at net/netlink/af_netlink.c:1319 #7 netlink_unicast (ssk=ssk@entry=0xffff888010eec800, skb=skb@entry=0xffff88800b1f9f00, portid=portid@entry=0, nonblock=<optimized out>) at net/netlink/af_netlink.c:1345 #8 0xffffffff81dff9a4 in netlink_sendmsg (sock=<optimized out>, msg=0xffffc90000ba7e48, len=<optimized out>) at net/netlink/af_netlink.c:1921 ... (gdb) p/x ((struct sk_buff *)0xffff88800b1f9f00)->head + ((struct sk_buff *)0xffff88800b1f9f00)->end $1 = 0xffff88800b1b76c0 (gdb) p/x secret $2 = 0xffff88800b1b76c0 (gdb) p slen $3 = 64 '@' The OOB data can then be read back from userspace by dumping HMAC state. This commit fixes this by ensuring SECRETLEN cannot exceed the actual length of SECRET. Reported-by: Lucas Leong <wmliang.tw@gmail.com> Tested: verified that EINVAL is correctly returned when secretlen > len(secret) Fixes: 4f4853dc1c9c1 ("ipv6: sr: implement API to control SR HMAC structure") Signed-off-by: David Lebrun <dlebrun@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15bonding: add all node mcast address when slave upHangbin Liu1-2/+6
[ Upstream commit fd16eb948ea8b28afb03e11a5b11841e6ac2aa2b ] When a link is enslave to bond, it need to set the interface down first. This makes the slave remove mac multicast address 33:33:00:00:00:01(The IPv6 multicast address ff02::1 is kept even when the interface down). When bond set the slave up, ipv6_mc_up() was not called due to commit c2edacf80e15 ("bonding / ipv6: no addrconf for slaves separately from master"). This is not an issue before we adding the lladdr target feature for bonding, as the mac multicast address will be added back when bond interface up and join group ff02::1. But after adding lladdr target feature for bonding. When user set a lladdr target, the unsolicited NA message with all-nodes multicast dest will be dropped as the slave interface never add 33:33:00:00:00:01 back. Fix this by calling ipv6_mc_up() to add 33:33:00:00:00:01 back when the slave interface up. Reported-by: LiLiang <liali@redhat.com> Fixes: 5e1eeef69c0f ("bonding: NS target should accept link local address") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15tcp: TX zerocopy should not sense pfmemalloc statusEric Dumazet2-2/+2
[ Upstream commit 3261400639463a853ba2b3be8bd009c2a8089775 ] We got a recent syzbot report [1] showing a possible misuse of pfmemalloc page status in TCP zerocopy paths. Indeed, for pages coming from user space or other layers, using page_is_pfmemalloc() is moot, and possibly could give false positives. There has been attempts to make page_is_pfmemalloc() more robust, but not using it in the first place in this context is probably better, removing cpu cycles. Note to stable teams : You need to backport 84ce071e38a6 ("net: introduce __skb_fill_page_desc_noacc") as a prereq. Race is more probable after commit c07aea3ef4d4 ("mm: add a signature in struct page") because page_is_pfmemalloc() is now using low order bit from page->lru.next, which can change more often than page->index. Low order bit should never be set for lru.next (when used as an anchor in LRU list), so KCSAN report is mostly a false positive. Backporting to older kernel versions seems not necessary. [1] BUG: KCSAN: data-race in lru_add_fn / tcp_build_frag write to 0xffffea0004a1d2c8 of 8 bytes by task 18600 on cpu 0: __list_add include/linux/list.h:73 [inline] list_add include/linux/list.h:88 [inline] lruvec_add_folio include/linux/mm_inline.h:105 [inline] lru_add_fn+0x440/0x520 mm/swap.c:228 folio_batch_move_lru+0x1e1/0x2a0 mm/swap.c:246 folio_batch_add_and_move mm/swap.c:263 [inline] folio_add_lru+0xf1/0x140 mm/swap.c:490 filemap_add_folio+0xf8/0x150 mm/filemap.c:948 __filemap_get_folio+0x510/0x6d0 mm/filemap.c:1981 pagecache_get_page+0x26/0x190 mm/folio-compat.c:104 grab_cache_page_write_begin+0x2a/0x30 mm/folio-compat.c:116 ext4_da_write_begin+0x2dd/0x5f0 fs/ext4/inode.c:2988 generic_perform_write+0x1d4/0x3f0 mm/filemap.c:3738 ext4_buffered_write_iter+0x235/0x3e0 fs/ext4/file.c:270 ext4_file_write_iter+0x2e3/0x1210 call_write_iter include/linux/fs.h:2187 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x468/0x760 fs/read_write.c:578 ksys_write+0xe8/0x1a0 fs/read_write.c:631 __do_sys_write fs/read_write.c:643 [inline] __se_sys_write fs/read_write.c:640 [inline] __x64_sys_write+0x3e/0x50 fs/read_write.c:640 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffffea0004a1d2c8 of 8 bytes by task 18611 on cpu 1: page_is_pfmemalloc include/linux/mm.h:1740 [inline] __skb_fill_page_desc include/linux/skbuff.h:2422 [inline] skb_fill_page_desc include/linux/skbuff.h:2443 [inline] tcp_build_frag+0x613/0xb20 net/ipv4/tcp.c:1018 do_tcp_sendpages+0x3e8/0xaf0 net/ipv4/tcp.c:1075 tcp_sendpage_locked net/ipv4/tcp.c:1140 [inline] tcp_sendpage+0x89/0xb0 net/ipv4/tcp.c:1150 inet_sendpage+0x7f/0xc0 net/ipv4/af_inet.c:833 kernel_sendpage+0x184/0x300 net/socket.c:3561 sock_sendpage+0x5a/0x70 net/socket.c:1054 pipe_to_sendpage+0x128/0x160 fs/splice.c:361 splice_from_pipe_feed fs/splice.c:415 [inline] __splice_from_pipe+0x222/0x4d0 fs/splice.c:559 splice_from_pipe fs/splice.c:594 [inline] generic_splice_sendpage+0x89/0xc0 fs/splice.c:743 do_splice_from fs/splice.c:764 [inline] direct_splice_actor+0x80/0xa0 fs/splice.c:931 splice_direct_to_actor+0x305/0x620 fs/splice.c:886 do_splice_direct+0xfb/0x180 fs/splice.c:974 do_sendfile+0x3bf/0x910 fs/read_write.c:1249 __do_sys_sendfile64 fs/read_write.c:1317 [inline] __se_sys_sendfile64 fs/read_write.c:1303 [inline] __x64_sys_sendfile64+0x10c/0x150 fs/read_write.c:1303 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x0000000000000000 -> 0xffffea0004a1d288 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 18611 Comm: syz-executor.4 Not tainted 6.0.0-rc2-syzkaller-00248-ge022620b5d05-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022 Fixes: c07aea3ef4d4 ("mm: add a signature in struct page") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Shakeel Butt <shakeelb@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15tipc: fix shift wrapping bug in map_get()Dan Carpenter1-1/+1
[ Upstream commit e2b224abd9bf45dcb55750479fc35970725a430b ] There is a shift wrapping bug in this code so anything thing above 31 will return false. Fixes: 35c55c9877f8 ("tipc: add neighbor monitoring framework") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15sch_sfb: Don't assume the skb is still around after enqueueing to childToke Høiland-Jørgensen1-4/+6
[ Upstream commit 9efd23297cca530bb35e1848665805d3fcdd7889 ] The sch_sfb enqueue() routine assumes the skb is still alive after it has been enqueued into a child qdisc, using the data in the skb cb field in the increment_qlen() routine after enqueue. However, the skb may in fact have been freed, causing a use-after-free in this case. In particular, this happens if sch_cake is used as a child of sfb, and the GSO splitting mode of CAKE is enabled (in which case the skb will be split into segments and the original skb freed). Fix this by copying the sfb cb data to the stack before enqueueing the skb, and using this stack copy in increment_qlen() instead of the skb pointer itself. Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-18231 Fixes: e13e02a3c68d ("net_sched: SFB flow scheduler") Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15rxrpc: Fix an insufficiently large sglist in rxkad_verify_packet_2()David Howells1-1/+1
[ Upstream commit 0d40f728e28393a8817d1fcae923dfa3409e488c ] rxkad_verify_packet_2() has a small stack-allocated sglist of 4 elements, but if that isn't sufficient for the number of fragments in the socket buffer, we try to allocate an sglist large enough to hold all the fragments. However, for large packets with a lot of fragments, this isn't sufficient and we need at least one additional fragment. The problem manifests as skb_to_sgvec() returning -EMSGSIZE and this then getting returned by userspace. Most of the time, this isn't a problem as rxrpc sets a limit of 5692, big enough for 4 jumbo subpackets to be glued together; occasionally, however, the server will ignore the reported limit and give a packet that's a lot bigger - say 19852 bytes with ->nr_frags being 7. skb_to_sgvec() then tries to return a "zeroth" fragment that seems to occur before the fragments counted by ->nr_frags and we hit the end of the sglist too early. Note that __skb_to_sgvec() also has an skb_walk_frags() loop that is recursive up to 24 deep. I'm not sure if I need to take account of that too - or if there's an easy way of counting those frags too. Fix this by counting an extra frag and allocating a larger sglist based on that. Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15rxrpc: Fix ICMP/ICMP6 error handlingDavid Howells6-38/+265
[ Upstream commit ac56a0b48da86fd1b4389632fb7c4c8a5d86eefa ] Because rxrpc pretends to be a tunnel on top of a UDP/UDP6 socket, allowing it to siphon off UDP packets early in the handling of received UDP packets thereby avoiding the packet going through the UDP receive queue, it doesn't get ICMP packets through the UDP ->sk_error_report() callback. In fact, it doesn't appear that there's any usable option for getting hold of ICMP packets. Fix this by adding a new UDP encap hook to distribute error messages for UDP tunnels. If the hook is set, then the tunnel driver will be able to see ICMP packets. The hook provides the offset into the packet of the UDP header of the original packet that caused the notification. An alternative would be to call the ->error_handler() hook - but that requires that the skbuff be cloned (as ip_icmp_error() or ipv6_cmp_error() do, though isn't really necessary or desirable in rxrpc's case is we want to parse them there and then, not queue them). Changes ======= ver #3) - Fixed an uninitialised variable. ver #2) - Fixed some missing CONFIG_AF_RXRPC_IPV6 conditionals. Fixes: 5271953cad31 ("rxrpc: Use the UDP encap_rcv hook") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15netfilter: nf_conntrack_irc: Fix forged IP logicDavid Leadbeater1-2/+3
[ Upstream commit 0efe125cfb99e6773a7434f3463f7c2fa28f3a43 ] Ensure the match happens in the right direction, previously the destination used was the server, not the NAT host, as the comment shows the code intended. Additionally nf_nat_irc uses port 0 as a signal and there's no valid way it can appear in a DCC message, so consider port 0 also forged. Fixes: 869f37d8e48f ("[NETFILTER]: nf_conntrack/nf_nat: add IRC helper port") Signed-off-by: David Leadbeater <dgl@dgl.cx> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15netfilter: nf_tables: clean up hook list when offload flags check failsPablo Neira Ayuso1-1/+3
[ Upstream commit 77972a36ecc4db7fc7c68f0e80714263c5f03f65 ] splice back the hook list so nft_chain_release_hook() has a chance to release the hooks. BUG: memory leak unreferenced object 0xffff88810180b100 (size 96): comm "syz-executor133", pid 3619, jiffies 4294945714 (age 12.690s) hex dump (first 32 bytes): 28 64 23 02 81 88 ff ff 28 64 23 02 81 88 ff ff (d#.....(d#..... 90 a8 aa 83 ff ff ff ff 00 00 b5 0f 81 88 ff ff ................ backtrace: [<ffffffff83a8c59b>] kmalloc include/linux/slab.h:600 [inline] [<ffffffff83a8c59b>] nft_netdev_hook_alloc+0x3b/0xc0 net/netfilter/nf_tables_api.c:1901 [<ffffffff83a9239a>] nft_chain_parse_netdev net/netfilter/nf_tables_api.c:1998 [inline] [<ffffffff83a9239a>] nft_chain_parse_hook+0x33a/0x530 net/netfilter/nf_tables_api.c:2073 [<ffffffff83a9b14b>] nf_tables_addchain.constprop.0+0x10b/0x950 net/netfilter/nf_tables_api.c:2218 [<ffffffff83a9c41b>] nf_tables_newchain+0xa8b/0xc60 net/netfilter/nf_tables_api.c:2593 [<ffffffff83a3d6a6>] nfnetlink_rcv_batch+0xa46/0xd20 net/netfilter/nfnetlink.c:517 [<ffffffff83a3db79>] nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:638 [inline] [<ffffffff83a3db79>] nfnetlink_rcv+0x1f9/0x220 net/netfilter/nfnetlink.c:656 [<ffffffff83a13b17>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] [<ffffffff83a13b17>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345 [<ffffffff83a13fd6>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921 [<ffffffff83865ab6>] sock_sendmsg_nosec net/socket.c:714 [inline] [<ffffffff83865ab6>] sock_sendmsg+0x56/0x80 net/socket.c:734 [<ffffffff8386601c>] ____sys_sendmsg+0x36c/0x390 net/socket.c:2482 [<ffffffff8386a918>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536 [<ffffffff8386aaa8>] __sys_sendmsg+0x88/0x100 net/socket.c:2565 [<ffffffff845e5955>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff845e5955>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: d54725cd11a5 ("netfilter: nf_tables: support for multiple devices per netdev hook") Reported-by: syzbot+5fcdbfab6d6744c57418@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15netfilter: br_netfilter: Drop dst references before setting.Harsh Modi2-0/+3
[ Upstream commit d047283a7034140ea5da759a494fd2274affdd46 ] The IPv6 path already drops dst in the daddr changed case, but the IPv4 path does not. This change makes the two code paths consistent. Further, it is possible that there is already a metadata_dst allocated from ingress that might already be attached to skbuff->dst while following the bridge path. If it is not released before setting a new metadata_dst, it will be leaked. This is similar to what is done in bpf_set_tunnel_key() or ip6_route_input(). It is important to note that the memory being leaked is not the dst being set in the bridge code, but rather memory allocated from some other code path that is not being freed correctly before the skb dst is overwritten. An example of the leakage fixed by this commit found using kmemleak: unreferenced object 0xffff888010112b00 (size 256): comm "softirq", pid 0, jiffies 4294762496 (age 32.012s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 80 16 f1 83 ff ff ff ff ................ e1 4e f6 82 ff ff ff ff 00 00 00 00 00 00 00 00 .N.............. backtrace: [<00000000d79567ea>] metadata_dst_alloc+0x1b/0xe0 [<00000000be113e13>] udp_tun_rx_dst+0x174/0x1f0 [<00000000a36848f4>] geneve_udp_encap_recv+0x350/0x7b0 [<00000000d4afb476>] udp_queue_rcv_one_skb+0x380/0x560 [<00000000ac064aea>] udp_unicast_rcv_skb+0x75/0x90 [<000000009a8ee8c5>] ip_protocol_deliver_rcu+0xd8/0x230 [<00000000ef4980bb>] ip_local_deliver_finish+0x7a/0xa0 [<00000000d7533c8c>] __netif_receive_skb_one_core+0x89/0xa0 [<00000000a879497d>] process_backlog+0x93/0x190 [<00000000e41ade9f>] __napi_poll+0x28/0x170 [<00000000b4c0906b>] net_rx_action+0x14f/0x2a0 [<00000000b20dd5d4>] __do_softirq+0xf4/0x305 [<000000003a7d7e15>] __irq_exit_rcu+0xc3/0x140 [<00000000968d39a2>] sysvec_apic_timer_interrupt+0x9e/0xc0 [<000000009e920794>] asm_sysvec_apic_timer_interrupt+0x16/0x20 [<000000008942add0>] native_safe_halt+0x13/0x20 Florian Westphal says: "Original code was likely fine because nothing ever did set a skb->dst entry earlier than bridge in those days." Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Harsh Modi <harshmodi@google.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15net/core/skbuff: Check the return value of skb_copy_bits()lily1-3/+2
[ Upstream commit c624c58e08b15105662b9ab9be23d14a6b945a49 ] skb_copy_bits() could fail, which requires a check on the return value. Signed-off-by: Li Zhong <floridsleeves@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15netfilter: conntrack: work around exceeded receive windowFlorian Westphal1-0/+31
[ Upstream commit cf97769c761abfeac8931b35fe0e1a8d5fabc9d8 ] When a TCP sends more bytes than allowed by the receive window, all future packets can be marked as invalid. This can clog up the conntrack table because of 5-day default timeout. Sequence of packets: 01 initiator > responder: [S], seq 171, win 5840, options [mss 1330,sackOK,TS val 63 ecr 0,nop,wscale 1] 02 responder > initiator: [S.], seq 33211, ack 172, win 65535, options [mss 1460,sackOK,TS val 010 ecr 63,nop,wscale 8] 03 initiator > responder: [.], ack 33212, win 2920, options [nop,nop,TS val 068 ecr 010], length 0 04 initiator > responder: [P.], seq 172:240, ack 33212, win 2920, options [nop,nop,TS val 279 ecr 010], length 68 Window is 5840 starting from 33212 -> 39052. 05 responder > initiator: [.], ack 240, win 256, options [nop,nop,TS val 872 ecr 279], length 0 06 responder > initiator: [.], seq 33212:34530, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 This is fine, conntrack will flag the connection as having outstanding data (UNACKED), which lowers the conntrack timeout to 300s. 07 responder > initiator: [.], seq 34530:35848, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 08 responder > initiator: [.], seq 35848:37166, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 09 responder > initiator: [.], seq 37166:38484, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 10 responder > initiator: [.], seq 38484:39802, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 Packet 10 is already sending more than permitted, but conntrack doesn't validate this (only seq is tested vs. maxend, not 'seq+len'). 38484 is acceptable, but only up to 39052, so this packet should not have been sent (or only 568 bytes, not 1318). At this point, connection is still in '300s' mode. Next packet however will get flagged: 11 responder > initiator: [P.], seq 39802:40128, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 326 nf_ct_proto_6: SEQ is over the upper bound (over the window of the receiver) .. LEN=378 .. SEQ=39802 ACK=240 ACK PSH .. Now, a couple of replies/acks comes in: 12 initiator > responder: [.], ack 34530, win 4368, [.. irrelevant acks removed ] 16 initiator > responder: [.], ack 39802, win 8712, options [nop,nop,TS val 296201291 ecr 2982371892], length 0 This ack is significant -- this acks the last packet send by the responder that conntrack considered valid. This means that ack == td_end. This will withdraw the 'unacked data' flag, the connection moves back to the 5-day timeout of established conntracks. 17 initiator > responder: ack 40128, win 10030, ... This packet is also flagged as invalid. Because conntrack only updates state based on packets that are considered valid, packet 11 'did not exist' and that gets us: nf_ct_proto_6: ACK is over upper bound 39803 (ACKed data not seen yet) .. SEQ=240 ACK=40128 WINDOW=10030 RES=0x00 ACK URG Because this received and processed by the endpoints, the conntrack entry remains in a bad state, no packets will ever be considered valid again: 30 responder > initiator: [F.], seq 40432, ack 2045, win 391, .. 31 initiator > responder: [.], ack 40433, win 11348, .. 32 initiator > responder: [F.], seq 2045, ack 40433, win 11348 .. ... all trigger 'ACK is over bound' test and we end up with non-early-evictable 5-day default timeout. NB: This patch triggers a bunch of checkpatch warnings because of silly indent. I will resend the cleanup series linked below to reduce the indent level once this change has propagated to net-next. I could route the cleanup via nf but that causes extra backport work for stable maintainers. Link: https://lore.kernel.org/netfilter-devel/20220720175228.17880-1-fw@strlen.de/T/#mb1d7147d36294573cc4f81d00f9f8dadfdd06cd8 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-08net: mac802154: Fix a condition in the receive pathMiquel Raynal1-1/+1
commit f0da47118c7e93cdbbc6fb403dd729a5f2c90ee3 upstream. Upon reception, a packet must be categorized, either it's destination is the host, or it is another host. A packet with no destination addressing fields may be valid in two situations: - the packet has no source field: only ACKs are built like that, we consider the host as the destination. - the packet has a valid source field: it is directed to the PAN coordinator, as for know we don't have this information we consider we are not the PAN coordinator. There was likely a copy/paste error made during a previous cleanup because the if clause is now containing exactly the same condition as in the switch case, which can never be true. In the past the destination address was used in the switch and the source address was used in the if, which matches what the spec says. Cc: stable@vger.kernel.org Fixes: ae531b9475f6 ("ieee802154: use ieee802154_addr instead of *_sa variants") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20220826142954.254853-1-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-08net: Use u64_stats_fetch_begin_irq() for stats fetch.Sebastian Andrzej Siewior2-6/+6
commit 278d3ba61563ceed3cb248383ced19e14ec7bc1f upstream. On 32bit-UP u64_stats_fetch_begin() disables only preemption. If the reader is in preemptible context and the writer side (u64_stats_update_begin*()) runs in an interrupt context (IRQ or softirq) then the writer can update the stats during the read operation. This update remains undetected. Use u64_stats_fetch_begin_irq() to ensure the stats fetch on 32bit-UP are not interrupted by a writer. 32bit-SMP remains unaffected by this change. Cc: "David S. Miller" <davem@davemloft.net> Cc: Catherine Sullivan <csully@google.com> Cc: David Awogbemila <awogbemila@google.com> Cc: Dimitris Michailidis <dmichail@fungible.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Hans Ulli Kroll <ulli.kroll@googlemail.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jeroen de Borst <jeroendb@google.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <simon.horman@corigine.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Cc: oss-drivers@corigine.com Cc: stable@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-08ip: fix triggering of 'icmp redirect'Nicolas Dichtel1-2/+2
commit eb55dc09b5dd040232d5de32812cc83001a23da6 upstream. __mkroute_input() uses fib_validate_source() to trigger an icmp redirect. My understanding is that fib_validate_source() is used to know if the src address and the gateway address are on the same link. For that, fib_validate_source() returns 1 (same link) or 0 (not the same network). __mkroute_input() is the only user of these positive values, all other callers only look if the returned value is negative. Since the below patch, fib_validate_source() didn't return anymore 1 when both addresses are on the same network, because the route lookup returns RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right. Let's adapat the test to return 1 again when both addresses are on the same link. CC: stable@vger.kernel.org Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop") Reported-by: kernel test robot <yujie.liu@intel.com> Reported-by: Heng Qi <hengqi@linux.alibaba.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220829100121.3821-1-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-08wifi: mac80211: Fix UAF in ieee80211_scan_rx()Siddh Raman Pant1-4/+7
commit 60deb9f10eec5c6a20252ed36238b55d8b614a2c upstream. ieee80211_scan_rx() tries to access scan_req->flags after a null check, but a UAF is observed when the scan is completed and __ieee80211_scan_completed() executes, which then calls cfg80211_scan_done() leading to the freeing of scan_req. Since scan_req is rcu_dereference()'d, prevent the racing in __ieee80211_scan_completed() by ensuring that from mac80211's POV it is no longer accessed from an RCU read critical section before we call cfg80211_scan_done(). Cc: stable@vger.kernel.org Link: https://syzkaller.appspot.com/bug?extid=f9acff9bf08a845f225d Reported-by: syzbot+f9acff9bf08a845f225d@syzkaller.appspotmail.com Suggested-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Siddh Raman Pant <code@siddh.me> Link: https://lore.kernel.org/r/20220819200340.34826-1-code@siddh.me Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-08wifi: mac80211: Don't finalize CSA in IBSS mode if state is disconnectedSiddh Raman Pant1-0/+4
commit 15bc8966b6d3a5b9bfe4c9facfa02f2b69b1e5f0 upstream. When we are not connected to a channel, sending channel "switch" announcement doesn't make any sense. The BSS list is empty in that case. This causes the for loop in cfg80211_get_bss() to be bypassed, so the function returns NULL (check line 1424 of net/wireless/scan.c), causing the WARN_ON() in ieee80211_ibss_csa_beacon() to get triggered (check line 500 of net/mac80211/ibss.c), which was consequently reported on the syzkaller dashboard. Thus, check if we have an existing connection before generating the CSA beacon in ieee80211_ibss_finish_csa(). Cc: stable@vger.kernel.org Fixes: cd7760e62c2a ("mac80211: add support for CSA in IBSS mode") Link: https://syzkaller.appspot.com/bug?id=05603ef4ae8926761b678d2939a3b2ad28ab9ca6 Reported-by: syzbot+b6c9fe29aefe68e4ad34@syzkaller.appspotmail.com Signed-off-by: Siddh Raman Pant <code@siddh.me> Tested-by: syzbot+b6c9fe29aefe68e4ad34@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20220814151512.9985-1-code@siddh.me Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-08net/smc: Remove redundant refcount increaseYacan Liu1-1/+0
[ Upstream commit a8424a9b4522a3ab9f32175ad6d848739079071f ] For passive connections, the refcount increment has been done in smc_clcsock_accept()-->smc_sock_alloc(). Fixes: 3b2dec2603d5 ("net/smc: restructure client and server code in af_smc") Signed-off-by: Yacan Liu <liuyacan@corp.netease.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Link: https://lore.kernel.org/r/20220830152314.838736-1-liuyacan@corp.netease.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-08Revert "sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb"Jakub Kicinski1-3/+1
[ Upstream commit 0b4f688d53fdc2a731b9d9cdf0c96255bc024ea6 ] This reverts commit 90fabae8a2c225c4e4936723c38857887edde5cc. Patch was applied hastily, revert and let the v2 be reviewed. Fixes: 90fabae8a2c2 ("sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb") Link: https://lore.kernel.org/all/87wnao2ha3.fsf@toke.dk/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-08tcp: annotate data-race around challenge_timestampEric Dumazet1-2/+2
[ Upstream commit 8c70521238b7863c2af607e20bcba20f974c969b ] challenge_timestamp can be read an written by concurrent threads. This was expected, but we need to annotate the race to avoid potential issues. Following patch moves challenge_timestamp and challenge_count to per-netns storage to provide better isolation. Fixes: 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-08sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skbToke Høiland-Jørgensen1-1/+3
[ Upstream commit 90fabae8a2c225c4e4936723c38857887edde5cc ] When the GSO splitting feature of sch_cake is enabled, GSO superpackets will be broken up and the resulting segments enqueued in place of the original skb. In this case, CAKE calls consume_skb() on the original skb, but still returns NET_XMIT_SUCCESS. This can confuse parent qdiscs into assuming the original skb still exists, when it really has been freed. Fix this by adding the __NET_XMIT_STOLEN flag to the return value in this case. Fixes: 0c850344d388 ("sch_cake: Conditionally split GSO segments") Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-18231 Link: https://lore.kernel.org/r/20220831092103.442868-1-toke@toke.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-08kcm: fix strp_init() order and cleanupCong Wang1-8/+7
[ Upstream commit 8fc29ff3910f3af08a7c40a75d436b5720efe2bf ] strp_init() is called just a few lines above this csk->sk_user_data check, it also initializes strp->work etc., therefore, it is unnecessary to call strp_done() to cancel the freshly initialized work. And if sk_user_data is already used by KCM, psock->strp should not be touched, particularly strp->work state, so we need to move strp_init() after the csk->sk_user_data check. This also makes a lockdep warning reported by syzbot go away. Reported-and-tested-by: syzbot+9fc084a4348493ef65d2@syzkaller.appspotmail.com Reported-by: syzbot+e696806ef96cdd2d87cd@syzkaller.appspotmail.com Fixes: e5571240236c ("kcm: Check if sk_user_data already set in kcm_attach") Fixes: dff8baa26117 ("kcm: Call strp_stop before strp_done in kcm_attach") Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/20220827181314.193710-1-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-08net/sched: fix netdevice reference leaks in attach_default_qdiscs()Wang Hai1-15/+16
[ Upstream commit f612466ebecb12a00d9152344ddda6f6345f04dc ] In attach_default_qdiscs(), if a dev has multiple queues and queue 0 fails to attach qdisc because there is no memory in attach_one_default_qdisc(). Then dev->qdisc will be noop_qdisc by default. But the other queues may be able to successfully attach to default qdisc. In this case, the fallback to noqueue process will be triggered. If the original attached qdisc is not released and a new one is directly attached, this will cause netdevice reference leaks. The following is the bug log: veth0: default qdisc (fq_codel) fail, fallback to noqueue unregister_netdevice: waiting for veth0 to become free. Usage count = 32 leaked reference. qdisc_alloc+0x12e/0x210 qdisc_create_dflt+0x62/0x140 attach_one_default_qdisc.constprop.41+0x44/0x70 dev_activate+0x128/0x290 __dev_open+0x12a/0x190 __dev_change_flags+0x1a2/0x1f0 dev_change_flags+0x23/0x60 do_setlink+0x332/0x1150 __rtnl_newlink+0x52f/0x8e0 rtnl_newlink+0x43/0x70 rtnetlink_rcv_msg+0x140/0x3b0 netlink_rcv_skb+0x50/0x100 netlink_unicast+0x1bb/0x290 netlink_sendmsg+0x37c/0x4e0 sock_sendmsg+0x5f/0x70 ____sys_sendmsg+0x208/0x280 Fix this bug by clearing any non-noop qdiscs that may have been assigned before trying to re-attach. Fixes: bf6dba76d278 ("net: sched: fallback to qdisc noqueue if default qdisc setup fail") Signed-off-by: Wang Hai <wanghai38@huawei.com> Link: https://lore.kernel.org/r/20220826090055.24424-1-wanghai38@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-08net: sched: tbf: don't call qdisc_put() while holding tree lockZhengchao Shao1-1/+3
[ Upstream commit b05972f01e7d30419987a1f221b5593668fd6448 ] The issue is the same to commit c2999f7fb05b ("net: sched: multiq: don't call qdisc_put() while holding tree lock"). Qdiscs call qdisc_put() while holding sch tree spinlock, which results sleeping-while-atomic BUG. Fixes: c266f64dbfa2 ("net: sched: protect block state with mutex") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20220826013930.340121-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-08openvswitch: fix memory leak at failed datapath creationAndrey Zhadchenko1-1/+3
[ Upstream commit a87406f4adee9c53b311d8a1ba2849c69e29a6d0 ] ovs_dp_cmd_new()->ovs_dp_change()->ovs_dp_set_upcall_portids() allocates array via kmalloc. If for some reason new_vport() fails during ovs_dp_cmd_new() dp->upcall_portids must be freed. Add missing kfree. Kmemleak example: unreferenced object 0xffff88800c382500 (size 64): comm "dump_state", pid 323, jiffies 4294955418 (age 104.347s) hex dump (first 32 bytes): 5e c2 79 e4 1f 7a 38 c7 09 21 38 0c 80 88 ff ff ^.y..z8..!8..... 03 00 00 00 0a 00 00 00 14 00 00 00 28 00 00 00 ............(... backtrace: [<0000000071bebc9f>] ovs_dp_set_upcall_portids+0x38/0xa0 [<000000000187d8bd>] ovs_dp_change+0x63/0xe0 [<000000002397e446>] ovs_dp_cmd_new+0x1f0/0x380 [<00000000aa06f36e>] genl_family_rcv_msg_doit+0xea/0x150 [<000000008f583bc4>] genl_rcv_msg+0xdc/0x1e0 [<00000000fa10e377>] netlink_rcv_skb+0x50/0x100 [<000000004959cece>] genl_rcv+0x24/0x40 [<000000004699ac7f>] netlink_unicast+0x23e/0x360 [<00000000c153573e>] netlink_sendmsg+0x24e/0x4b0 [<000000006f4aa380>] sock_sendmsg+0x62/0x70 [<00000000d0068654>] ____sys_sendmsg+0x230/0x270 [<0000000012dacf7d>] ___sys_sendmsg+0x88/0xd0 [<0000000011776020>] __sys_sendmsg+0x59/0xa0 [<000000002e8f2dc1>] do_syscall_64+0x3b/0x90 [<000000003243e7cb>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: b83d23a2a38b ("openvswitch: Introduce per-cpu upcall dispatch") Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> Link: https://lore.kernel.org/r/20220825020326.664073-1-andrey.zhadchenko@virtuozzo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-08Bluetooth: hci_sync: hold hdev->lock when cleanup hci_connZhengping Jiang1-2/+4
[ Upstream commit 2da8eb834b775a9d1acea6214d3e4a78ac841e6e ] When disconnecting all devices, hci_conn_failed is used to cleanup hci_conn object when the hci_conn object cannot be aborted. The function hci_conn_failed requires the caller holds hdev->lock. Fixes: 9b3628d79b46f ("Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted") Signed-off-by: Zhengping Jiang <jiangzp@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-08Bluetooth: hci_event: Fix checking conn for le_conn_complete_evtArchie Pusaka1-1/+1
[ Upstream commit f48735a9aaf8258f39918e13adf464ccd7dce33b ] To prevent multiple conn complete events, we shouldn't look up the conn with hci_lookup_le_connect, since it requires the state to be BT_CONNECT. By the time the duplicate event is processed, the state might have changed, so we end up processing the new event anyway. Change the lookup function to hci_conn_hash_lookup_ba. Fixes: d5ebaa7c5f6f6 ("Bluetooth: hci_event: Ignore multiple conn complete events") Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reviewed-by: Sonny Sasaka <sonnysasaka@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-08Bluetooth: hci_sync: Fix suspend performance regressionLuiz Augusto von Dentz1-10/+14
[ Upstream commit 1fd02d56dae35b08e4cba8e6bd2c2e7ccff68ecc ] This attempts to fix suspend performance when there is no connections by not updating the event mask. Fixes: ef61b6ea1544 ("Bluetooth: Always set event mask on suspend") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-08Bluetooth: hci_event: Fix vendor (unknown) opcode status handlingHans de Goede1-0/+11
[ Upstream commit b82a26d8633cc89367fac75beb3ec33061bea44a ] Commit c8992cffbe74 ("Bluetooth: hci_event: Use of a function table to handle Command Complete") was (presumably) meant to only refactor things without any functional changes. But it does have one undesirable side-effect, before *status would always be set to skb->data[0] and it might be overridden by some of the opcode specific handling. While now it always set by the opcode specific handlers. This means that if the opcode is not known *status does not get set any more at all! This behavior change has broken bluetooth support for BCM4343A0 HCIs, the hci_bcm.c code tries to configure UART attached HCIs at a higher baudraute using vendor specific opcodes. The BCM4343A0 does not support this and this used to simply fail: [ 25.646442] Bluetooth: hci0: BCM: failed to write clock (-56) [ 25.646481] Bluetooth: hci0: Failed to set baudrate After which things would continue with the initial baudraute. But now that hci_cmd_complete_evt() no longer sets status for unknown opcodes *status is left at 0. This causes the hci_bcm.c code to think the baudraute has been changed on the HCI side and to also adjust the UART baudrate, after which communication with the HCI is broken, leading to: [ 28.579042] Bluetooth: hci0: command 0x0c03 tx timeout [ 36.961601] Bluetooth: hci0: BCM: Reset failed (-110) And non working bluetooth. Fix this by restoring the previous default "*status = skb->data[0]" handling for unknown opcodes. Fixes: c8992cffbe74 ("Bluetooth: hci_event: Use of a function table to handle Command Complete") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-08wifi: cfg80211: debugfs: fix return type in ht40allow_map_read()Dan Carpenter1-1/+2
[ Upstream commit d776763f48084926b5d9e25507a3ddb7c9243d5e ] The return type is supposed to be ssize_t, which is signed long, but "r" was declared as unsigned int. This means that on 64 bit systems we return positive values instead of negative error codes. Fixes: 80a3511d70e8 ("cfg80211: add debugfs HT40 allow map") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/YutvOQeJm0UjLhwU@kili Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-08ip_tunnel: Respect tunnel key's "flow_flags" in IP tunnelsEyal Birger2-4/+5
[ Upstream commit 7ec9fce4b31604f8415136a4c07f7dc8ad431aec ] Commit 451ef36bd229 ("ip_tunnels: Add new flow flags field to ip_tunnel_key") added a "flow_flags" member to struct ip_tunnel_key which was later used by the commit in the fixes tag to avoid dropping packets with sources that aren't locally configured when set in bpf_set_tunnel_key(). VXLAN and GENEVE were made to respect this flag, ip tunnels like IPIP and GRE were not. This commit fixes this omission by making ip_tunnel_init_flow() receive the flow flags from the tunnel key in the relevant collect_md paths. Fixes: b8fff748521c ("bpf: Set flow flag to allow any source IP in bpf_tunnel_key") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Paul Chaignon <paul@isovalent.com> Link: https://lore.kernel.org/bpf/20220818074118.726639-1-eyal.birger@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-08skmsg: Fix wrong last sg check in sk_msg_recvmsg()Liu Jian1-2/+2
[ Upstream commit 583585e48d965338e73e1eb383768d16e0922d73 ] Fix one kernel NULL pointer dereference as below: [ 224.462334] Call Trace: [ 224.462394] __tcp_bpf_recvmsg+0xd3/0x380 [ 224.462441] ? sock_has_perm+0x78/0xa0 [ 224.462463] tcp_bpf_recvmsg+0x12e/0x220 [ 224.462494] inet_recvmsg+0x5b/0xd0 [ 224.462534] __sys_recvfrom+0xc8/0x130 [ 224.462574] ? syscall_trace_enter+0x1df/0x2e0 [ 224.462606] ? __do_page_fault+0x2de/0x500 [ 224.462635] __x64_sys_recvfrom+0x24/0x30 [ 224.462660] do_syscall_64+0x5d/0x1d0 [ 224.462709] entry_SYSCALL_64_after_hwframe+0x65/0xca In commit 9974d37ea75f ("skmsg: Fix invalid last sg check in sk_msg_recvmsg()"), we change last sg check to sg_is_last(), but in sockmap redirection case (without stream_parser/stream_verdict/ skb_verdict), we did not mark the end of the scatterlist. Check the sk_msg_alloc, sk_msg_page_add, and bpf_msg_push_data functions, they all do not mark the end of sg. They are expected to use sg.end for end judgment. So the judgment of '(i != msg_rx->sg.end)' is added back here. Fixes: 9974d37ea75f ("skmsg: Fix invalid last sg check in sk_msg_recvmsg()") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20220809094915.150391-1-liujian56@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-08xsk: Fix corrupted packets for XDP_SHARED_UMEMMagnus Karlsson1-6/+10
[ Upstream commit 58ca14ed98c87cfe0d1408cc65a9745d9e9b7a56 ] Fix an issue in XDP_SHARED_UMEM mode together with aligned mode where packets are corrupted for the second and any further sockets bound to the same umem. In other words, this does not affect the first socket bound to the umem. The culprit for this bug is that the initialization of the DMA addresses for the pre-populated xsk buffer pool entries was not performed for any socket but the first one bound to the umem. Only the linear array of DMA addresses was populated. Fix this by populating the DMA addresses in the xsk buffer pool for every socket bound to the same umem. Fixes: 94033cd8e73b8 ("xsk: Optimize for aligned case") Reported-by: Alasdair McWilliam <alasdair.mcwilliam@outlook.com> Reported-by: Intrusion Shield Team <dnevil@intrusion.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Alasdair McWilliam <alasdair.mcwilliam@outlook.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/xdp-newbies/6205E10C-292E-4995-9D10-409649354226@outlook.com/ Link: https://lore.kernel.org/bpf/20220812113259.531-1-magnus.karlsson@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-05net: neigh: don't call kfree_skb() under spin_lock_irqsave()Yang Yingliang1-2/+8
commit d5485d9dd24e1d04e5509916515260186eb1455c upstream. It is not allowed to call kfree_skb() from hardware interrupt context or with interrupts being disabled. So add all skb to a tmp list, then free them after spin_unlock_irqrestore() at once. Fixes: 66ba215cb513 ("neigh: fix possible DoS due to net iface start/stop loop") Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-05net/af_packet: check len when min_header_len equals to 0Zhengchao Shao1-2/+2
commit dc633700f00f726e027846a318c5ffeb8deaaeda upstream. User can use AF_PACKET socket to send packets with the length of 0. When min_header_len equals to 0, packet_snd will call __dev_queue_xmit to send packets, and sock->type can be any type. Reported-by: syzbot+5ea725c25d06fb9114c4@syzkaller.appspotmail.com Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-05netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to yGeert Uytterhoeven1-1/+0
[ Upstream commit aa5762c34213aba7a72dc58e70601370805fa794 ] NF_CONNTRACK_PROCFS was marked obsolete in commit 54b07dca68557b09 ("netfilter: provide config option to disable ancient procfs parts") in v3.3. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-05neigh: fix possible DoS due to net iface start/stop loopDenis V. Lunev1-8/+17
[ Upstream commit 66ba215cb51323e4e55e38fd5f250e0fae0cbc94 ] Normal processing of ARP request (usually this is Ethernet broadcast packet) coming to the host is looking like the following: * the packet comes to arp_process() call and is passed through routing procedure * the request is put into the queue using pneigh_enqueue() if corresponding ARP record is not local (common case for container records on the host) * the request is processed by timer (within 80 jiffies by default) and ARP reply is sent from the same arp_process() using NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED condition (flag is set inside pneigh_enqueue()) And here the problem comes. Linux kernel calls pneigh_queue_purge() which destroys the whole queue of ARP requests on ANY network interface start/stop event through __neigh_ifdown(). This is actually not a problem within the original world as network interface start/stop was accessible to the host 'root' only, which could do more destructive things. But the world is changed and there are Linux containers available. Here container 'root' has an access to this API and could be considered as untrusted user in the hosting (container's) world. Thus there is an attack vector to other containers on node when container's root will endlessly start/stop interfaces. We have observed similar situation on a real production node when docker container was doing such activity and thus other containers on the node become not accessible. The patch proposed doing very simple thing. It drops only packets from the same namespace in the pneigh_queue_purge() where network interface state change is detected. This is enough to prevent the problem for the whole node preserving original semantics of the code. v2: - do del_timer_sync() if queue is empty after pneigh_queue_purge() v3: - rebase to net tree Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@kernel.org> Cc: Yajun Deng <yajun.deng@linux.dev> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Christian Brauner <brauner@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Cc: Konstantin Khorenko <khorenko@virtuozzo.com> Cc: kernel@openvz.org Cc: devel@openvz.org Investigated-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-05bpf: Don't redirect packets with invalid pkt_lenZhengchao Shao2-0/+4
commit fd1894224407c484f652ad456e1ce423e89bb3eb upstream. Syzbot found an issue [1]: fq_codel_drop() try to drop a flow whitout any skbs, that is, the flow->head is null. The root cause, as the [2] says, is because that bpf_prog_test_run_skb() run a bpf prog which redirects empty skbs. So we should determine whether the length of the packet modified by bpf prog or others like bpf_prog_test is valid before forwarding it directly. LINK: [1] https://syzkaller.appspot.com/bug?id=0b84da80c2917757915afa89f7738a9d16ec96c5 LINK: [2] https://www.spinics.net/lists/netdev/msg777503.html Reported-by: syzbot+7a12909485b94426aceb@syzkaller.appspotmail.com Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220715115559.139691-1-shaozhengchao@huawei.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-05net: fix refcount bug in sk_psock_get (2)Hawkins Jiawei1-1/+3
commit 2a0133723f9ebeb751cfce19f74ec07e108bef1f upstream. Syzkaller reports refcount bug as follows: ------------[ cut here ]------------ refcount_t: saturated; leaking memory. WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19 Modules linked in: CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda7d90 #0 <TASK> __refcount_add_not_zero include/linux/refcount.h:163 [inline] __refcount_inc_not_zero include/linux/refcount.h:227 [inline] refcount_inc_not_zero include/linux/refcount.h:245 [inline] sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439 tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091 tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983 tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057 tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659 tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682 sk_backlog_rcv include/net/sock.h:1061 [inline] __release_sock+0x134/0x3b0 net/core/sock.c:2849 release_sock+0x54/0x1b0 net/core/sock.c:3404 inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909 __sys_shutdown_sock net/socket.c:2331 [inline] __sys_shutdown_sock net/socket.c:2325 [inline] __sys_shutdown+0xf1/0x1b0 net/socket.c:2343 __do_sys_shutdown net/socket.c:2351 [inline] __se_sys_shutdown net/socket.c:2349 [inline] __x64_sys_shutdown+0x50/0x70 net/socket.c:2349 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 </TASK> During SMC fallback process in connect syscall, kernel will replaces TCP with SMC. In order to forward wakeup smc socket waitqueue after fallback, kernel will sets clcsk->sk_user_data to origin smc socket in smc_fback_replace_callbacks(). Later, in shutdown syscall, kernel will calls sk_psock_get(), which treats the clcsk->sk_user_data as psock type, triggering the refcnt warning. So, the root cause is that smc and psock, both will use sk_user_data field. So they will mismatch this field easily. This patch solves it by using another bit(defined as SK_USER_DATA_PSOCK) in PTRMASK, to mark whether sk_user_data points to a psock object or not. This patch depends on a PTRMASK introduced in commit f1ff5ce2cd5e ("net, sk_msg: Clear sk_user_data pointer on clone if tagged"). For there will possibly be more flags in the sk_user_data field, this patch also refactor sk_user_data flags code to be more generic to improve its maintainability. Reported-and-tested-by: syzbot+5f26f85569bd179c18ce@syzkaller.appspotmail.com Suggested-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-05Bluetooth: L2CAP: Fix build errors in some archsLuiz Augusto von Dentz1-5/+5
commit b840304fb46cdf7012722f456bce06f151b3e81b upstream. This attempts to fix the follow errors: In function 'memcmp', inlined from 'bacmp' at ./include/net/bluetooth/bluetooth.h:347:9, inlined from 'l2cap_global_chan_by_psm' at net/bluetooth/l2cap_core.c:2003:15: ./include/linux/fortify-string.h:44:33: error: '__builtin_memcmp' specified bound 6 exceeds source size 0 [-Werror=stringop-overread] 44 | #define __underlying_memcmp __builtin_memcmp | ^ ./include/linux/fortify-string.h:420:16: note: in expansion of macro '__underlying_memcmp' 420 | return __underlying_memcmp(p, q, size); | ^~~~~~~~~~~~~~~~~~~ In function 'memcmp', inlined from 'bacmp' at ./include/net/bluetooth/bluetooth.h:347:9, inlined from 'l2cap_global_chan_by_psm' at net/bluetooth/l2cap_core.c:2004:15: ./include/linux/fortify-string.h:44:33: error: '__builtin_memcmp' specified bound 6 exceeds source size 0 [-Werror=stringop-overread] 44 | #define __underlying_memcmp __builtin_memcmp | ^ ./include/linux/fortify-string.h:420:16: note: in expansion of macro '__underlying_memcmp' 420 | return __underlying_memcmp(p, q, size); | ^~~~~~~~~~~~~~~~~~~ Fixes: 332f1795ca20 ("Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm regression") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-31rxrpc: Fix locking in rxrpc's sendmsgDavid Howells2-39/+57
[ Upstream commit b0f571ecd7943423c25947439045f0d352ca3dbf ] Fix three bugs in the rxrpc's sendmsg implementation: (1) rxrpc_new_client_call() should release the socket lock when returning an error from rxrpc_get_call_slot(). (2) rxrpc_wait_for_tx_window_intr() will return without the call mutex held in the event that we're interrupted by a signal whilst waiting for tx space on the socket or relocking the call mutex afterwards. Fix this by: (a) moving the unlock/lock of the call mutex up to rxrpc_send_data() such that the lock is not held around all of rxrpc_wait_for_tx_window*() and (b) indicating to higher callers whether we're return with the lock dropped. Note that this means recvmsg() will not block on this call whilst we're waiting. (3) After dropping and regaining the call mutex, rxrpc_send_data() needs to go and recheck the state of the tx_pending buffer and the tx_total_len check in case we raced with another sendmsg() on the same call. Thinking on this some more, it might make sense to have different locks for sendmsg() and recvmsg(). There's probably no need to make recvmsg() wait for sendmsg(). It does mean that recvmsg() can return MSG_EOR indicating that a call is dead before a sendmsg() to that call returns - but that can currently happen anyway. Without fix (2), something like the following can be induced: WARNING: bad unlock balance detected! 5.16.0-rc6-syzkaller #0 Not tainted ------------------------------------- syz-executor011/3597 is trying to release lock (&call->user_mutex) at: [<ffffffff885163a3>] rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748 but there are no more locks to release! other info that might help us debug this: no locks held by syz-executor011/3597. ... Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_unlock_imbalance_bug include/trace/events/lock.h:58 [inline] __lock_release kernel/locking/lockdep.c:5306 [inline] lock_release.cold+0x49/0x4e kernel/locking/lockdep.c:5657 __mutex_unlock_slowpath+0x99/0x5e0 kernel/locking/mutex.c:900 rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748 rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:561 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae [Thanks to Hawkins Jiawei and Khalid Masum for their attempts to fix this] Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals") Reported-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com cc: Hawkins Jiawei <yin31149@gmail.com> cc: Khalid Masum <khalid.masum.92@gmail.com> cc: Dan Carpenter <dan.carpenter@oracle.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/166135894583.600315.7170979436768124075.stgit@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31net: Fix a data-race around sysctl_somaxconn.Kuniyuki Iwashima1-1/+1
[ Upstream commit 3c9ba81d72047f2e81bb535d42856517b613aba7 ] While reading sysctl_somaxconn, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31net: Fix a data-race around netdev_unregister_timeout_secs.Kuniyuki Iwashima1-1/+1
[ Upstream commit 05e49cfc89e4f325eebbc62d24dd122e55f94c23 ] While reading netdev_unregister_timeout_secs, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 5aa3afe107d9 ("net: make unregister netdev warning timeout configurable") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31net: Fix data-races around sysctl_devconf_inherit_init_net.Kuniyuki Iwashima2-9/+12
[ Upstream commit a5612ca10d1aa05624ebe72633e0c8c792970833 ] While reading sysctl_devconf_inherit_init_net, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 856c395cfa63 ("net: introduce a knob to control whether to inherit devconf config") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>