summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-02-09gve: Recording rx queue before sending to napiTao Liu1-0/+1
This caused a significant performance degredation when using generic XDP with multiple queues. Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support") Signed-off-by: Tao Liu <xliutaox@google.com> Link: https://lore.kernel.org/r/20220207175901.2486596-1-jeroendb@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08net: phy: marvell: Fix RGMII Tx/Rx delays setting in 88e1121-compatible PHYsPavel Parkhomenko1-4/+6
It is mandatory for a software to issue a reset upon modifying RGMII Receive Timing Control and RGMII Transmit Timing Control bit fields of MAC Specific Control register 2 (page 2, register 21) otherwise the changes won't be perceived by the PHY (the same is applicable for a lot of other registers). Not setting the RGMII delays on the platforms that imply it' being done on the PHY side will consequently cause the traffic loss. We discovered that the denoted soft-reset is missing in the m88e1121_config_aneg() method for the case if the RGMII delays are modified but the MDIx polarity isn't changed or the auto-negotiation is left enabled, thus causing the traffic loss on our platform with Marvell Alaska 88E1510 installed. Let's fix that by issuing the soft-reset if the delays have been actually set in the m88e1121_config_aneg_rgmii_delays() method. Cc: stable@vger.kernel.org Fixes: d6ab93364734 ("net: phy: marvell: Avoid unnecessary soft reset") Signed-off-by: Pavel Parkhomenko <Pavel.Parkhomenko@baikalelectronics.ru> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Link: https://lore.kernel.org/r/20220205203932.26899-1-Pavel.Parkhomenko@baikalelectronics.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-07net/smc: use GFP_ATOMIC allocation in smc_pnet_add_eth()Eric Dumazet1-1/+1
My last patch moved the netdev_tracker_alloc() call to a section protected by a write_lock(). I should have replaced GFP_KERNEL with GFP_ATOMIC to avoid the infamous: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:256 Fixes: 28f922213886 ("net/smc: fix ref_tracker issue in smc_pnet_add()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-06net/smc: fix ref_tracker issue in smc_pnet_add()Eric Dumazet1-3/+5
I added the netdev_tracker_alloc() right after ndev was stored into the newly allocated object: new_pe->ndev = ndev; if (ndev) netdev_tracker_alloc(ndev, &new_pe->dev_tracker, GFP_KERNEL); But I missed that later, we could end up freeing new_pe, then calling dev_put(ndev) to release the reference on ndev. The new_pe->dev_tracker would not be freed. To solve this issue, move the netdev_tracker_alloc() call to the point we know for sure new_pe will be kept. syzbot report (on net-next tree, but the bug is present in net tree) WARNING: CPU: 0 PID: 6019 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Modules linked in: CPU: 0 PID: 6019 Comm: syz-executor.3 Not tainted 5.17.0-rc2-syzkaller-00650-g5a8fb33e5305 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Code: 1d f4 70 a0 09 31 ff 89 de e8 4d bc 99 fd 84 db 75 e0 e8 64 b8 99 fd 48 c7 c7 20 0c 06 8a c6 05 d4 70 a0 09 01 e8 9e 4e 28 05 <0f> 0b eb c4 e8 48 b8 99 fd 0f b6 1d c3 70 a0 09 31 ff 89 de e8 18 RSP: 0018:ffffc900043b7400 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000040000 RSI: ffffffff815fb318 RDI: fffff52000876e72 RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815f507e R11: 0000000000000000 R12: 1ffff92000876e85 R13: 0000000000000000 R14: ffff88805c1c6600 R15: 0000000000000000 FS: 00007f1ef6feb700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2d02b000 CR3: 00000000223f4000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __refcount_dec include/linux/refcount.h:344 [inline] refcount_dec include/linux/refcount.h:359 [inline] ref_tracker_free+0x53f/0x6c0 lib/ref_tracker.c:119 netdev_tracker_free include/linux/netdevice.h:3867 [inline] dev_put_track include/linux/netdevice.h:3884 [inline] dev_put_track include/linux/netdevice.h:3880 [inline] dev_put include/linux/netdevice.h:3910 [inline] smc_pnet_add_eth net/smc/smc_pnet.c:399 [inline] smc_pnet_enter net/smc/smc_pnet.c:493 [inline] smc_pnet_add+0x5fc/0x15f0 net/smc/smc_pnet.c:556 genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:731 genl_family_rcv_msg net/netlink/genetlink.c:775 [inline] genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:792 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494 genl_rcv+0x24/0x40 net/netlink/genetlink.c:803 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:725 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: b60645248af3 ("net/smc: add net device tracker to struct smc_pnetentry") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-06net: phy: marvell: Fix MDI-x polarity setting in 88e1118-compatible PHYsPavel Parkhomenko1-4/+3
When setting up autonegotiation for 88E1118R and compatible PHYs, a software reset of PHY is issued before setting up polarity. This is incorrect as changes of MDI Crossover Mode bits are disruptive to the normal operation and must be followed by a software reset to take effect. Let's patch m88e1118_config_aneg() to fix the issue mentioned before by invoking software reset of the PHY just after setting up MDI-x polarity. Fixes: 605f196efbf8 ("phy: Add support for Marvell 88E1118 PHY") Signed-off-by: Pavel Parkhomenko <Pavel.Parkhomenko@baikalelectronics.ru> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Suggested-by: Andrew Lunn <andrew@lunn.ch> Cc: stable@vger.kernel.org Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net: mscc: ocelot: fix all IP traffic getting trapped to CPU with PTP over IPVladimir Oltean1-0/+8
The filters for the PTP trap keys are incorrectly configured, in the sense that is2_entry_set() only looks at trap->key.ipv4.dport or trap->key.ipv6.dport if trap->key.ipv4.proto or trap->key.ipv6.proto is set to IPPROTO_TCP or IPPROTO_UDP. But we don't do that, so is2_entry_set() goes through the "else" branch of the IP protocol check, and ends up installing a rule for "Any IP protocol match" (because msk is also 0). The UDP port is ignored. This means that when we run "ptp4l -i swp0 -4", all IP traffic is trapped to the CPU, which hinders bridging. Fix this by specifying the IP protocol in the VCAP IS2 filters for PTP over UDP. Fixes: 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05tcp: take care of mixed splice()/sendmsg(MSG_ZEROCOPY) caseEric Dumazet1-14/+19
syzbot found that mixing sendpage() and sendmsg(MSG_ZEROCOPY) calls over the same TCP socket would again trigger the infamous warning in inet_sock_destruct() WARN_ON(sk_forward_alloc_get(sk)); While Talal took into account a mix of regular copied data and MSG_ZEROCOPY one in the same skb, the sendpage() path has been forgotten. We want the charging to happen for sendpage(), because pages could be coming from a pipe. What is missing is the downgrading of pure zerocopy status to make sure sk_forward_alloc will stay synced. Add tcp_downgrade_zcopy_pure() helper so that we can use it from the two callers. Fixes: 9b65b17db723 ("net: avoid double accounting for pure zerocopy skbs") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Talal Ahmad <talalahmad@google.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Link: https://lore.kernel.org/r/20220203225547.665114-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski7-27/+61
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Don't refresh timeout for SCTP flows in CLOSED state. 2) Don't allow access to transport header if fragment offset is set on. 3) Reinitialize internal conntrack state for retransmitted TCP syn-ack packet. 4) Update MAINTAINER file to add the Netfilter group tree. Moving forward, Florian Westphal has access to this tree so he can also send pull requests. 5) Set on IPS_HELPER for entries created via ctnetlink, otherwise NAT might zap it. All patches from Florian Westphal. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: ctnetlink: disable helper autoassign MAINTAINERS: netfilter: update git links netfilter: conntrack: re-init state for retransmitted syn-ack netfilter: conntrack: move synack init code to helper netfilter: nft_payload: don't allow th access for fragments netfilter: conntrack: don't refresh sctp entries in closed state ==================== Link: https://lore.kernel.org/r/20220204151903.320786-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-04ixgbevf: Require large buffers for build_skb on 82599VFSamuel Mendoza-Jonas1-6/+7
From 4.17 onwards the ixgbevf driver uses build_skb() to build an skb around new data in the page buffer shared with the ixgbe PF. This uses either a 2K or 3K buffer, and offsets the DMA mapping by NET_SKB_PAD + NET_IP_ALIGN. When using a smaller buffer RXDCTL is set to ensure the PF does not write a full 2K bytes into the buffer, which is actually 2K minus the offset. However on the 82599 virtual function, the RXDCTL mechanism is not available. The driver attempts to work around this by using the SET_LPE mailbox method to lower the maximm frame size, but the ixgbe PF driver ignores this in order to keep the PF and all VFs in sync[0]. This means the PF will write up to the full 2K set in SRRCTL, causing it to write NET_SKB_PAD + NET_IP_ALIGN bytes past the end of the buffer. With 4K pages split into two buffers, this means it either writes NET_SKB_PAD + NET_IP_ALIGN bytes past the first buffer (and into the second), or NET_SKB_PAD + NET_IP_ALIGN bytes past the end of the DMA mapping. Avoid this by only enabling build_skb when using "large" buffers (3K). These are placed in each half of an order-1 page, preventing the PF from writing past the end of the mapping. [0]: Technically it only ever raises the max frame size, see ixgbe_set_vf_lpe() in ixgbe_sriov.c Fixes: f15c5ba5b6cd ("ixgbevf: add support for using order 1 pages to receive large frames") Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04netfilter: ctnetlink: disable helper autoassignFlorian Westphal2-2/+3
When userspace, e.g. conntrackd, inserts an entry with a specified helper, its possible that the helper is lost immediately after its added: ctnetlink_create_conntrack -> nf_ct_helper_ext_add + assign helper -> ctnetlink_setup_nat -> ctnetlink_parse_nat_setup -> parse_nat_setup -> nfnetlink_parse_nat_setup -> nf_nat_setup_info -> nf_conntrack_alter_reply -> __nf_ct_try_assign_helper ... and __nf_ct_try_assign_helper will zero the helper again. Set IPS_HELPER bit to bypass auto-assign logic, its unwanted, just like when helper is assigned via ruleset. Dropped old 'not strictly necessary' comment, it referred to use of rcu_assign_pointer() before it got replaced by RCU_INIT_POINTER(). NB: Fixes tag intentionally incorrect, this extends the referenced commit, but this change won't build without IPS_HELPER introduced there. Fixes: 6714cf5465d280 ("netfilter: nf_conntrack: fix explicit helper attachment and NAT") Reported-by: Pham Thanh Tuyen <phamtyn@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04MAINTAINERS: netfilter: update git linksFlorian Westphal1-2/+2
nf and nf-next have a new location. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04netfilter: conntrack: re-init state for retransmitted syn-ackFlorian Westphal1-0/+12
TCP conntrack assumes that a syn-ack retransmit is identical to the previous syn-ack. This isn't correct and causes stuck 3whs in some more esoteric scenarios. tcpdump to illustrate the problem: client > server: Flags [S] seq 1365731894, win 29200, [mss 1460,sackOK,TS val 2083035583 ecr 0,wscale 7] server > client: Flags [S.] seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215367629 ecr 2082921663] Note the invalid/outdated synack ack number. Conntrack marks this syn-ack as out-of-window/invalid, but it did initialize the reply direction parameters based on this packets content. client > server: Flags [S] seq 1365731894, win 29200, [mss 1460,sackOK,TS val 2083036623 ecr 0,wscale 7] ... retransmit... server > client: Flags [S.], seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215368644 ecr 2082921663] and another bogus synack. This repeats, then client re-uses for a new attempt: client > server: Flags [S], seq 2375731741, win 29200, [mss 1460,sackOK,TS val 2083100223 ecr 0,wscale 7] server > client: Flags [S.], seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215430754 ecr 2082921663] ... but still gets a invalid syn-ack. This repeats until: server > client: Flags [S.], seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215437785 ecr 2082921663] server > client: Flags [R.], seq 145824454, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215443451 ecr 2082921663] client > server: Flags [S], seq 2375731741, win 29200, [mss 1460,sackOK,TS val 2083115583 ecr 0,wscale 7] server > client: Flags [S.], seq 162602410, ack 2375731742, win 65535, [mss 8952,wscale 5,TS val 3215445754 ecr 2083115583] This syn-ack has the correct ack number, but conntrack flags it as invalid: The internal state was created from the first syn-ack seen so the sequence number of the syn-ack is treated as being outside of the announced window. Don't assume that retransmitted syn-ack is identical to previous one. Treat it like the first syn-ack and reinit state. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04netfilter: conntrack: move synack init code to helperFlorian Westphal1-18/+29
It seems more readable to use a common helper in the followup fix rather than copypaste or goto. No functional change intended. The function is only called for syn-ack or syn in repy direction in case of simultaneous open. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04netfilter: nft_payload: don't allow th access for fragmentsFlorian Westphal2-5/+6
Loads relative to ->thoff naturally expect that this points to the transport header, but this is only true if pkt->fragoff == 0. This has little effect for rulesets with connection tracking/nat because these enable ip defra. For other rulesets this prevents false matches. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04netfilter: conntrack: don't refresh sctp entries in closed stateFlorian Westphal1-0/+9
Vivek Thrivikraman reported: An SCTP server application which is accessed continuously by client application. When the session disconnects the client retries to establish a connection. After restart of SCTP server application the session is not established because of stale conntrack entry with connection state CLOSED as below. (removing this entry manually established new connection): sctp 9 CLOSED src=10.141.189.233 [..] [ASSURED] Just skip timeout update of closed entries, we don't want them to stay around forever. Reported-and-tested-by: Vivek Thrivikraman <vivek.thrivikraman@est.tech> Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1579 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04net: sparx5: Fix get_stat64 crash in tcpdumpSteen Hegelund1-1/+1
This problem was found with Sparx5 when the tcpdump tool requests the do_get_stats64 (sparx5_get_stats64) statistic. The portstats pointer was incorrectly incremented when fetching priority based statistics. Fixes: af4b11022e2d (net: sparx5: add ethtool configuration and statistics support) Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Link: https://lore.kernel.org/r/20220203102900.528987-1-steen.hegelund@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-04gcc-plugins/stackleak: Use noinstr in favor of notraceKees Cook1-3/+2
While the stackleak plugin was already using notrace, objtool is now a bit more picky. Update the notrace uses to noinstr. Silences the following objtool warnings when building with: CONFIG_DEBUG_ENTRY=y CONFIG_STACK_VALIDATION=y CONFIG_VMLINUX_VALIDATION=y CONFIG_GCC_PLUGIN_STACKLEAK=y vmlinux.o: warning: objtool: do_syscall_64()+0x9: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: do_int80_syscall_32()+0x9: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: exc_general_protection()+0x22: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: fixup_bad_iret()+0x20: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: do_machine_check()+0x27: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: .text+0x5346e: call to stackleak_erase() leaves .noinstr.text section vmlinux.o: warning: objtool: .entry.text+0x143: call to stackleak_erase() leaves .noinstr.text section vmlinux.o: warning: objtool: .entry.text+0x10eb: call to stackleak_erase() leaves .noinstr.text section vmlinux.o: warning: objtool: .entry.text+0x17f9: call to stackleak_erase() leaves .noinstr.text section Note that the plugin's addition of calls to stackleak_track_stack() from noinstr functions is expected to be safe, as it isn't runtime instrumentation and is self-contained. Cc: Alexander Popov <alex.popov@linux.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-04Merge tag 'net-5.17-rc3' of ↵Linus Torvalds78-453/+877
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf, netfilter, and ieee802154. Current release - regressions: - Partially revert "net/smc: Add netlink net namespace support", fix uABI breakage - netfilter: - nft_ct: fix use after free when attaching zone template - nft_byteorder: track register operations Previous releases - regressions: - ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback - phy: qca8081: fix speeds lower than 2.5Gb/s - sched: fix use-after-free in tc_new_tfilter() Previous releases - always broken: - tcp: fix mem under-charging with zerocopy sendmsg() - tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data() - neigh: do not trigger immediate probes on NUD_FAILED from neigh_managed_work, avoid a deadlock - bpf: use VM_MAP instead of VM_ALLOC for ringbuf, avoid KASAN false-positives - netfilter: nft_reject_bridge: fix for missing reply from prerouting - smc: forward wakeup to smc socket waitqueue after fallback - ieee802154: - return meaningful error codes from the netlink helpers - mcr20a: fix lifs/sifs periods - at86rf230, ca8210: stop leaking skbs on error paths - macsec: add missing un-offload call for NETDEV_UNREGISTER of parent - ax25: add refcount in ax25_dev to avoid UAF bugs - eth: mlx5e: - fix SFP module EEPROM query - fix broken SKB allocation in HW-GRO - IPsec offload: fix tunnel mode crypto for non-TCP/UDP flows - eth: amd-xgbe: - fix skb data length underflow - ensure reset of the tx_timer_active flag, avoid Tx timeouts - eth: stmmac: fix runtime pm use in stmmac_dvr_remove() - eth: e1000e: handshake with CSME starts from Alder Lake platforms" * tag 'net-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits) ax25: fix reference count leaks of ax25_dev net: stmmac: ensure PTP time register reads are consistent net: ipa: request IPA register values be retained dt-bindings: net: qcom,ipa: add optional qcom,qmp property tools/resolve_btfids: Do not print any commands when building silently bpf: Use VM_MAP instead of VM_ALLOC for ringbuf net, neigh: Do not trigger immediate probes on NUD_FAILED from neigh_managed_work tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data() net: sparx5: do not refer to skb after passing it on Partially revert "net/smc: Add netlink net namespace support" net/mlx5e: Avoid field-overflowing memcpy() net/mlx5e: Use struct_group() for memcpy() region net/mlx5e: Avoid implicit modify hdr for decap drop rule net/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP traffic net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic net/mlx5e: Don't treat small ceil values as unlimited in HTB offload net/mlx5: E-Switch, Fix uninitialized variable modact net/mlx5e: Fix handling of wrong devices during bond netevent net/mlx5e: Fix broken SKB allocation in HW-GRO net/mlx5e: Fix wrong calculation of header index in HW_GRO ...
2022-02-04Merge tag 'selinux-pr-20220203' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux fix from Paul Moore: "One small SELinux patch to ensure that a policy structure field is properly reset after freeing so that we don't inadvertently do a double-free on certain error conditions" * tag 'selinux-pr-20220203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: fix double free of cond_list on error paths
2022-02-04Merge tag 'linux-kselftest-fixes-5.17-rc3' of ↵Linus Torvalds15-177/+209
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fixes from Shuah Khan: "Important fixes to several tests and documentation clarification on running mainline kselftest on stable releases. A few notable fixes: - fix kselftest run hang due to child processes that haven't been terminated. Fix signals all child processes - fix false pass/fail results from vdso_test_abi, openat2, mincore - build failures when using -j (multiple jobs) option - exec test build failure due to incorrect build rule for a run-time created "pipe" - zram test fixes related to interaction with zram-generator to make sure zram test to coordinate deleted with zram-generator - zram test compression ratio calculation fix and skipping max_comp_streams. - increasing rtc test timeout - cpufreq test to write test results to stdout which will necessary on automated test systems" * tag 'linux-kselftest-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kselftest: Fix vdso_test_abi return status selftests: skip mincore.check_file_mmap when fs lacks needed support selftests: openat2: Skip testcases that fail with EOPNOTSUPP selftests: openat2: Add missing dependency in Makefile selftests: openat2: Print also errno in failure messages selftests: futex: Use variable MAKE instead of make selftests/exec: Remove pipe from TEST_GEN_FILES selftests/zram: Adapt the situation that /dev/zram0 is being used selftests/zram01.sh: Fix compression ratio calculation selftests/zram: Skip max_comp_streams interface on newer kernel docs/kselftest: clarify running mainline tests on stables kselftest: signal all child processes selftests: cpufreq: Write test output to stdout as well selftests: rtc: Increase test timeout so that all tests run
2022-02-04ax25: fix reference count leaks of ax25_devDuoming Zhou4-19/+41
The previous commit d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs") introduces refcount into ax25_dev, but there are reference leak paths in ax25_ctl_ioctl(), ax25_fwd_ioctl(), ax25_rt_add(), ax25_rt_del() and ax25_rt_opt(). This patch uses ax25_dev_put() and adjusts the position of ax25_addr_ax25dev() to fix reference cout leaks of ax25_dev. Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20220203150811.42256-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-04net: stmmac: ensure PTP time register reads are consistentYannick Vignon1-7/+12
Even if protected from preemption and interrupts, a small time window remains when the 2 register reads could return inconsistent values, each time the "seconds" register changes. This could lead to an about 1-second error in the reported time. Add logic to ensure the "seconds" and "nanoseconds" values are consistent. Fixes: 92ba6888510c ("stmmac: add the support for PTP hw clock driver") Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20220203160025.750632-1-yannick.vignon@oss.nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-04Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski7-236/+11
Daniel Borkmann says: ==================== pull-request: bpf 2022-02-03 We've added 6 non-merge commits during the last 10 day(s) which contain a total of 7 files changed, 11 insertions(+), 236 deletions(-). The main changes are: 1) Fix BPF ringbuf to allocate its area with VM_MAP instead of VM_ALLOC flag which otherwise trips over KASAN, from Hou Tao. 2) Fix unresolved symbol warning in resolve_btfids due to LSM callback rename, from Alexei Starovoitov. 3) Fix a possible race in inc_misses_counter() when IRQ would trigger during counter update, from He Fengqing. 4) Fix tooling infra for cross-building with clang upon probing whether gcc provides the standard libraries, from Jean-Philippe Brucker. 5) Fix silent mode build for resolve_btfids, from Nathan Chancellor. 6) Drop unneeded and outdated lirc.h header copy from tooling infra as BPF does not require it anymore, from Sean Young. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: tools/resolve_btfids: Do not print any commands when building silently bpf: Use VM_MAP instead of VM_ALLOC for ringbuf tools: Ignore errors from `which' when searching a GCC toolchain tools headers UAPI: remove stale lirc.h bpf: Fix possible race in inc_misses_counter bpf: Fix renaming task_getsecid_subj->current_getsecid_subj. ==================== Link: https://lore.kernel.org/r/20220203155815.25689-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03printk: Fix incorrect __user type in proc_dointvec_minmax_sysadmin()Mickaël Salaün1-1/+1
The move of proc_dointvec_minmax_sysadmin() from kernel/sysctl.c to kernel/printk/sysctl.c introduced an incorrect __user attribute to the buffer argument. I spotted this change in [1] as well as the kernel test robot. Revert this change to please sparse: kernel/printk/sysctl.c:20:51: warning: incorrect type in argument 3 (different address spaces) kernel/printk/sysctl.c:20:51: expected void * kernel/printk/sysctl.c:20:51: got void [noderef] __user *buffer Fixes: faaa357a55e0 ("printk: move printk sysctl to printk/sysctl.c") Link: https://lore.kernel.org/r/20220104155024.48023-2-mic@digikod.net [1] Reported-by: kernel test robot <lkp@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com> Link: https://lore.kernel.org/r/20220203145029.272640-1-mic@digikod.net Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-03Revert "module, async: async_synchronize_full() on module init iff async is ↵Igor Pylypiv3-24/+5
used" This reverts commit 774a1221e862b343388347bac9b318767336b20b. We need to finish all async code before the module init sequence is done. In the reverted commit the PF_USED_ASYNC flag was added to mark a thread that called async_schedule(). Then the PF_USED_ASYNC flag was used to determine whether or not async_synchronize_full() needs to be invoked. This works when modprobe thread is calling async_schedule(), but it does not work if module dispatches init code to a worker thread which then calls async_schedule(). For example, PCI driver probing is invoked from a worker thread based on a node where device is attached: if (cpu < nr_cpu_ids) error = work_on_cpu(cpu, local_pci_probe, &ddi); else error = local_pci_probe(&ddi); We end up in a situation where a worker thread gets the PF_USED_ASYNC flag set instead of the modprobe thread. As a result, async_synchronize_full() is not invoked and modprobe completes without waiting for the async code to finish. The issue was discovered while loading the pm80xx driver: (scsi_mod.scan=async) modprobe pm80xx worker ... do_init_module() ... pci_call_probe() work_on_cpu(local_pci_probe) local_pci_probe() pm8001_pci_probe() scsi_scan_host() async_schedule() worker->flags |= PF_USED_ASYNC; ... < return from worker > ... if (current->flags & PF_USED_ASYNC) <--- false async_synchronize_full(); Commit 21c3c5d28007 ("block: don't request module during elevator init") fixed the deadlock issue which the reverted commit 774a1221e862 ("module, async: async_synchronize_full() on module init iff async is used") tried to fix. Since commit 0fdff3ec6d87 ("async, kmod: warn on synchronous request_module() from async workers") synchronous module loading from async is not allowed. Given that the original deadlock issue is fixed and it is no longer allowed to call synchronous request_module() from async we can remove PF_USED_ASYNC flag to make module init consistently invoke async_synchronize_full() unless async module probe is requested. Signed-off-by: Igor Pylypiv <ipylypiv@google.com> Reviewed-by: Changyuan Lyu <changyuanl@google.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-03Merge branch 'for-5.17-fixes' of ↵Linus Torvalds2-14/+65
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Eric's fix for a long standing cgroup1 permission issue where it only checks for uid 0 instead of CAP which inadvertently allows unprivileged userns roots to modify release_agent userhelper - Fixes for the fallout from Waiman's recent cpuset work * 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning cgroup-v1: Require capabilities to set release_agent cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask() cgroup/cpuset: Make child cpusets restrict parents on v1 hierarchy
2022-02-03Merge branch 'net-ipa-enable-register-retention'Jakub Kicinski4-0/+70
Alex Elder says: ==================== net: ipa: enable register retention With runtime power management in place, we sometimes need to issue a command to enable retention of IPA register values before power collapse. This requires a new Device Tree property, whose presence will also be used to signal that the command is required. ==================== Link: https://lore.kernel.org/r/20220201150205.468403-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03net: ipa: request IPA register values be retainedAlex Elder3-0/+64
In some cases, the IPA hardware needs to request the always-on subsystem (AOSS) to coordinate with the IPA microcontroller to retain IPA register values at power collapse. This is done by issuing a QMP request to the AOSS microcontroller. A similar request ondoes that request. We must get and hold the "QMP" handle early, because we might get back EPROBE_DEFER for that. But the actual request should be sent while we know the IPA clock is active, and when we know the microcontroller is operational. Fixes: 1aac309d3207 ("net: ipa: use autosuspend") Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03dt-bindings: net: qcom,ipa: add optional qcom,qmp propertyAlex Elder1-0/+6
For some systems, the IPA driver must make a request to ensure that its registers are retained across power collapse of the IPA hardware. On such systems, we'll use the existence of the "qcom,qmp" property as a signal that this request is required. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03cgroup/cpuset: Fix "suspicious RCU usage" lockdep warningWaiman Long1-0/+10
It was found that a "suspicious RCU usage" lockdep warning was issued with the rcu_read_lock() call in update_sibling_cpumasks(). It is because the update_cpumasks_hier() function may sleep. So we have to release the RCU lock, call update_cpumasks_hier() and reacquire it afterward. Also add a percpu_rwsem_assert_held() in update_sibling_cpumasks() instead of stating that in the comment. Fixes: 4716909cc5c5 ("cpuset: Track cpusets that use parent's effective_cpus") Signed-off-by: Waiman Long <longman@redhat.com> Tested-by: Phil Auld <pauld@redhat.com> Reviewed-by: Phil Auld <pauld@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2022-02-03tools/resolve_btfids: Do not print any commands when building silentlyNathan Chancellor1-1/+5
When building with 'make -s', there is some output from resolve_btfids: $ make -sj"$(nproc)" oldconfig prepare MKDIR .../tools/bpf/resolve_btfids/libbpf/ MKDIR .../tools/bpf/resolve_btfids//libsubcmd LINK resolve_btfids Silent mode means that no information should be emitted about what is currently being done. Use the $(silent) variable from Makefile.include to avoid defining the msg macro so that there is no information printed. Fixes: fbbb68de80a4 ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220201212503.731732-1-nathan@kernel.org
2022-02-03Revert "mm/gup: small refactoring: simplify try_grab_page()"John Hubbard1-5/+30
This reverts commit 54d516b1d62ff8f17cee2da06e5e4706a0d00b8a That commit did a refactoring that effectively combined fast and slow gup paths (again). And that was again incorrect, for two reasons: a) Fast gup and slow gup get reference counts on pages in different ways and with different goals: see Linus' writeup in commit cd1adf1b63a1 ("Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly""), and b) try_grab_compound_head() also has a specific check for "FOLL_LONGTERM && !is_pinned(page)", that assumes that the caller can fall back to slow gup. This resulted in new failures, as recently report by Will McVicker [1]. But (a) has problems too, even though they may not have been reported yet. So just revert this. Link: https://lore.kernel.org/r/20220131203504.3458775-1-willmcvicker@google.com [1] Fixes: 54d516b1d62f ("mm/gup: small refactoring: simplify try_grab_page()") Reported-and-tested-by: Will McVicker <willmcvicker@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Minchan Kim <minchan@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: stable@vger.kernel.org # 5.15 Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-03Merge tag 'mips-fixes-5.17_2' of ↵Linus Torvalds2-4/+10
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Thomas Bogendoerfer: - fix missed change for PTR->PTR_WD conversion - kernel-doc fixes * tag 'mips-fixes-5.17_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: KVM: fix vz.c kernel-doc notation MIPS: octeon: Fix missed PTR->PTR_WD conversion
2022-02-03bpf: Use VM_MAP instead of VM_ALLOC for ringbufHou Tao1-1/+1
After commit 2fd3fb0be1d1 ("kasan, vmalloc: unpoison VM_ALLOC pages after mapping"), non-VM_ALLOC mappings will be marked as accessible in __get_vm_area_node() when KASAN is enabled. But now the flag for ringbuf area is VM_ALLOC, so KASAN will complain out-of-bound access after vmap() returns. Because the ringbuf area is created by mapping allocated pages, so use VM_MAP instead. After the change, info in /proc/vmallocinfo also changes from [start]-[end] 24576 ringbuf_map_alloc+0x171/0x290 vmalloc user to [start]-[end] 24576 ringbuf_map_alloc+0x171/0x290 vmap user Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it") Reported-by: syzbot+5ad567a418794b9b5983@syzkaller.appspotmail.com Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220202060158.6260-1-houtao1@huawei.com
2022-02-03net, neigh: Do not trigger immediate probes on NUD_FAILED from ↵Daniel Borkmann2-11/+25
neigh_managed_work syzkaller was able to trigger a deadlock for NTF_MANAGED entries [0]: kworker/0:16/14617 is trying to acquire lock: ffffffff8d4dd370 (&tbl->lock){++-.}-{2:2}, at: ___neigh_create+0x9e1/0x2990 net/core/neighbour.c:652 [...] but task is already holding lock: ffffffff8d4dd370 (&tbl->lock){++-.}-{2:2}, at: neigh_managed_work+0x35/0x250 net/core/neighbour.c:1572 The neighbor entry turned to NUD_FAILED state, where __neigh_event_send() triggered an immediate probe as per commit cd28ca0a3dd1 ("neigh: reduce arp latency") via neigh_probe() given table lock was held. One option to fix this situation is to defer the neigh_probe() back to the neigh_timer_handler() similarly as pre cd28ca0a3dd1. For the case of NTF_MANAGED, this deferral is acceptable given this only happens on actual failure state and regular / expected state is NUD_VALID with the entry already present. The fix adds a parameter to __neigh_event_send() in order to communicate whether immediate probe is allowed or disallowed. Existing call-sites of neigh_event_send() default as-is to immediate probe. However, the neigh_managed_work() disables it via use of neigh_event_send_probe(). [0] <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_deadlock_bug kernel/locking/lockdep.c:2956 [inline] check_deadlock kernel/locking/lockdep.c:2999 [inline] validate_chain kernel/locking/lockdep.c:3788 [inline] __lock_acquire.cold+0x149/0x3ab kernel/locking/lockdep.c:5027 lock_acquire kernel/locking/lockdep.c:5639 [inline] lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604 __raw_write_lock_bh include/linux/rwlock_api_smp.h:202 [inline] _raw_write_lock_bh+0x2f/0x40 kernel/locking/spinlock.c:334 ___neigh_create+0x9e1/0x2990 net/core/neighbour.c:652 ip6_finish_output2+0x1070/0x14f0 net/ipv6/ip6_output.c:123 __ip6_finish_output net/ipv6/ip6_output.c:191 [inline] __ip6_finish_output+0x61e/0xe90 net/ipv6/ip6_output.c:170 ip6_finish_output+0x32/0x200 net/ipv6/ip6_output.c:201 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:224 dst_output include/net/dst.h:451 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] ndisc_send_skb+0xa99/0x17f0 net/ipv6/ndisc.c:508 ndisc_send_ns+0x3a9/0x840 net/ipv6/ndisc.c:650 ndisc_solicit+0x2cd/0x4f0 net/ipv6/ndisc.c:742 neigh_probe+0xc2/0x110 net/core/neighbour.c:1040 __neigh_event_send+0x37d/0x1570 net/core/neighbour.c:1201 neigh_event_send include/net/neighbour.h:470 [inline] neigh_managed_work+0x162/0x250 net/core/neighbour.c:1574 process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307 worker_thread+0x657/0x1110 kernel/workqueue.c:2454 kthread+0x2e9/0x3a0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK> Fixes: 7482e3841d52 ("net, neigh: Add NTF_MANAGED flag for managed neighbor entries") Reported-by: syzbot+5239d0e1778a500d477a@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Roopa Prabhu <roopa@nvidia.com> Tested-by: syzbot+5239d0e1778a500d477a@syzkaller.appspotmail.com Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220201193942.5055-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()Eric Dumazet1-0/+2
tcp_shift_skb_data() might collapse three packets into a larger one. P_A, P_B, P_C -> P_ABC Historically, it used a single tcp_skb_can_collapse_to(P_A) call, because it was enough. In commit 85712484110d ("tcp: coalesce/collapse must respect MPTCP extensions"), this call was replaced by a call to tcp_skb_can_collapse(P_A, P_B) But the now needed test over P_C has been missed. This probably broke MPTCP. Then later, commit 9b65b17db723 ("net: avoid double accounting for pure zerocopy skbs") added an extra condition to tcp_skb_can_collapse(), but the missing call from tcp_shift_skb_data() is also breaking TCP zerocopy, because P_A and P_C might have different skb_zcopy_pure() status. Fixes: 85712484110d ("tcp: coalesce/collapse must respect MPTCP extensions") Fixes: 9b65b17db723 ("net: avoid double accounting for pure zerocopy skbs") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mat Martineau <mathew.j.martineau@linux.intel.com> Cc: Talal Ahmad <talalahmad@google.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20220201184640.756716-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-02Merge tag 'nfsd-5.17-1' of ↵Linus Torvalds2-9/+13
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Notable bug fixes: - Ensure SM_NOTIFY doesn't crash the NFS server host - Ensure NLM locks are cleaned up after client reboot - Fix a leak of internal NFSv4 lease information" * tag 'nfsd-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: nfsd4_setclientid_confirm mistakenly expires confirmed client. lockd: fix failure to cleanup client locks lockd: fix server crash on reboot of client holding lock
2022-02-02Merge tag 'fsnotify_for_v5.17-rc3' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fanotify fix from Jan Kara: "Fix stale file descriptor in copy_event_to_user" * tag 'fsnotify_for_v5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: Fix stale file descriptor in copy_event_to_user()
2022-02-02Merge tag 'linux-kselftest-kunit-fixes-5.17-rc3' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit fixes from Shuah Khan: "A single fix to an error seen on qemu due to a missing import" * tag 'linux-kselftest-kunit-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: tool: Import missing importlib.abc
2022-02-02Merge tag 'pinctrl-v5.17-2' of ↵Linus Torvalds9-105/+101
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "Most interesting and urgent is the Intel stuff affecting Chromebooks and laptops. - Fix up group name building on the Intel Thunderbay - Fix interrupt problems on the Intel Cherryview - Fix some pin data on the Sunxi H616 - Fix up the CONFIG_PINCTRL_ST Kconfig sort order as noted during the merge window - Fix an unexpected interrupt problem on the Intel Sunrisepoint - Fix a glitch when updating IRQ flags on all Intel pin controllers - Revert a Zynqmp patch to unify the pin naming, let's find some better solution - Fix some error paths in the Broadcom BCM2835 driver - Fix a Kconfig problem pertaining to the BCM63XX drivers - Fix the regmap support in the Microchip SGPIO driver" * tag 'pinctrl-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: microchip-sgpio: Fix support for regmap pinctrl: bcm63xx: fix unmet dependency on REGMAP for GPIO_REGMAP pinctrl: bcm2835: Fix a few error paths pinctrl: zynqmp: Revert "Unify pin naming" pinctrl: intel: Fix a glitch when updating IRQ flags on a preconfigured line pinctrl: intel: fix unexpected interrupt pinctrl: Place correctly CONFIG_PINCTRL_ST in the Makefile pinctrl: sunxi: Fix H616 I2S3 pin data pinctrl: cherryview: Trigger hwirq0 for interrupt-lines without a mapping pinctrl: thunderbay: rework loops looking for groups names pinctrl: thunderbay: comment process of building functions a bit
2022-02-02net: sparx5: do not refer to skb after passing it onSteen Hegelund1-1/+1
Do not try to use any SKB fields after the packet has been passed up in the receive stack. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Link: https://lore.kernel.org/r/20220202083039.3774851-1-steen.hegelund@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-02selinux: fix double free of cond_list on error pathsVratislav Bendel1-1/+2
On error path from cond_read_list() and duplicate_policydb_cond_list() the cond_list_destroy() gets called a second time in caller functions, resulting in NULL pointer deref. Fix this by resetting the cond_list_len to 0 in cond_list_destroy(), making subsequent calls a noop. Also consistently reset the cond_list pointer to NULL after freeing. Cc: stable@vger.kernel.org Signed-off-by: Vratislav Bendel <vbendel@redhat.com> [PM: fix line lengths in the description] Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-02-02Partially revert "net/smc: Add netlink net namespace support"Dmitry V. Levin2-8/+5
The change of sizeof(struct smc_diag_linkinfo) by commit 79d39fc503b4 ("net/smc: Add netlink net namespace support") introduced an ABI regression: since struct smc_diag_lgrinfo contains an object of type "struct smc_diag_linkinfo", offset of all subsequent members of struct smc_diag_lgrinfo was changed by that change. As result, applications compiled with the old version of struct smc_diag_linkinfo will receive garbage in struct smc_diag_lgrinfo.role if the kernel implements this new version of struct smc_diag_linkinfo. Fix this regression by reverting the part of commit 79d39fc503b4 that changes struct smc_diag_linkinfo. After all, there is SMC_GEN_NETLINK interface which is good enough, so there is probably no need to touch the smc_diag ABI in the first place. Fixes: 79d39fc503b4 ("net/smc: Add netlink net namespace support") Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/20220202030904.GA9742@altlinux.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-02Merge tag 'mlx5-fixes-2022-02-01' of ↵David S. Miller17-55/+102
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2022-02-01 This series provides bug fixes to mlx5 driver. Please pull and let me know if there is any problem. Sorry about the long series, but I had to move the top two patches from net-next to net to help avoiding a build break when kspp branch is merged into linus-next on next merge window. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-02Merge branch '1GbE' of ↵Jakub Kicinski3-19/+44
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-02-01 This series contains updates to e1000e driver only. Sasha removes CSME handshake with TGL platform as this is not supported and is causing hardware unit hangs to be reported. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: e1000e: Handshake with CSME starts from ADL platforms e1000e: Separate ADP board type from TGP ==================== Link: https://lore.kernel.org/r/20220201173754.580305-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-02net/mlx5e: Avoid field-overflowing memcpy()Kees Cook2-4/+6
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memcpy(), memmove(), and memset(), avoid intentionally writing across neighboring fields. Use flexible arrays instead of zero-element arrays (which look like they are always overflowing) and split the cross-field memcpy() into two halves that can be appropriately bounds-checked by the compiler. We were doing: #define ETH_HLEN 14 #define VLAN_HLEN 4 ... #define MLX5E_XDP_MIN_INLINE (ETH_HLEN + VLAN_HLEN) ... struct mlx5e_tx_wqe *wqe = mlx5_wq_cyc_get_wqe(wq, pi); ... struct mlx5_wqe_eth_seg *eseg = &wqe->eth; struct mlx5_wqe_data_seg *dseg = wqe->data; ... memcpy(eseg->inline_hdr.start, xdptxd->data, MLX5E_XDP_MIN_INLINE); target is wqe->eth.inline_hdr.start (which the compiler sees as being 2 bytes in size), but copying 18, intending to write across start (really vlan_tci, 2 bytes). The remaining 16 bytes get written into wqe->data[0], covering byte_count (4 bytes), lkey (4 bytes), and addr (8 bytes). struct mlx5e_tx_wqe { struct mlx5_wqe_ctrl_seg ctrl; /* 0 16 */ struct mlx5_wqe_eth_seg eth; /* 16 16 */ struct mlx5_wqe_data_seg data[]; /* 32 0 */ /* size: 32, cachelines: 1, members: 3 */ /* last cacheline: 32 bytes */ }; struct mlx5_wqe_eth_seg { u8 swp_outer_l4_offset; /* 0 1 */ u8 swp_outer_l3_offset; /* 1 1 */ u8 swp_inner_l4_offset; /* 2 1 */ u8 swp_inner_l3_offset; /* 3 1 */ u8 cs_flags; /* 4 1 */ u8 swp_flags; /* 5 1 */ __be16 mss; /* 6 2 */ __be32 flow_table_metadata; /* 8 4 */ union { struct { __be16 sz; /* 12 2 */ u8 start[2]; /* 14 2 */ } inline_hdr; /* 12 4 */ struct { __be16 type; /* 12 2 */ __be16 vlan_tci; /* 14 2 */ } insert; /* 12 4 */ __be32 trailer; /* 12 4 */ }; /* 12 4 */ /* size: 16, cachelines: 1, members: 9 */ /* last cacheline: 16 bytes */ }; struct mlx5_wqe_data_seg { __be32 byte_count; /* 0 4 */ __be32 lkey; /* 4 4 */ __be64 addr; /* 8 8 */ /* size: 16, cachelines: 1, members: 3 */ /* last cacheline: 16 bytes */ }; So, split the memcpy() so the compiler can reason about the buffer sizes. "pahole" shows no size nor member offset changes to struct mlx5e_tx_wqe nor struct mlx5e_umr_wqe. "objdump -d" shows no meaningful object code changes (i.e. only source line number induced differences and optimizations). Fixes: b5503b994ed5 ("net/mlx5e: XDP TX forwarding support") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-02-02net/mlx5e: Use struct_group() for memcpy() regionKees Cook2-3/+5
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memcpy(), memmove(), and memset(), avoid intentionally writing across neighboring fields. Use struct_group() in struct vlan_ethhdr around members h_dest and h_source, so they can be referenced together. This will allow memcpy() and sizeof() to more easily reason about sizes, improve readability, and avoid future warnings about writing beyond the end of h_dest. "pahole" shows no size nor member offset changes to struct vlan_ethhdr. "objdump -d" shows no object code changes. Fixes: 34802a42b352 ("net/mlx5e: Do not modify the TX SKB") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-02-02net/mlx5e: Avoid implicit modify hdr for decap drop ruleRoi Dayan1-1/+2
Currently the driver adds implicit modify hdr action for decap rules on tunnel devices if the port is an ovs port. This is also done if the action is drop and makes the modify hdr redundant and also the FW doesn't support it and will generate a syndrome. kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3) Fix it by adding the implicit modify hdr only for fwd actions. Fixes: b16eb3c81fe2 ("net/mlx5: Support internal port as decap route device") Fixes: 077cdda764c7 ("net/mlx5e: TC, Fix memory leak with rules with internal port") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-02-02net/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP trafficRaed Salem1-2/+11
IPsec Tunnel mode crypto offload software parser (SWP) setting in data path currently always set the inner L4 offset regardless of the encapsulated L4 header type and whether it exists in the first place, this breaks non TCP/UDP traffic as such. Set the SWP inner L4 offset only when the IPsec tunnel encapsulated L4 header protocol is TCP/UDP. While at it fix inner ip protocol read for setting MLX5_ETH_WQE_SWP_INNER_L4_UDP flag to address the case where the ip header protocol is IPv6. Fixes: f1267798c980 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload") Signed-off-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-02-02net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated trafficRaed Salem1-3/+6
IPsec crypto offload always set the ethernet segment checksum flags with the inner L4 header checksum flag enabled for encapsulated IPsec offloaded packet regardless of the encapsulated L4 header type, and even if it doesn't exists in the first place, this breaks non TCP/UDP traffic as such. Set the inner L4 checksum flag only when the encapsulated L4 header protocol is TCP/UDP using software parser swp_inner_l4_offset field as indication. Fixes: 5cfb540ef27b ("net/mlx5e: Set IPsec WAs only in IP's non checksum partial case.") Signed-off-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>