summaryrefslogtreecommitdiff
path: root/drivers/net
AgeCommit message (Collapse)AuthorFilesLines
2024-08-29net: mscc: ocelot: use ocelot_xmit_get_vlan_info() also for FDMA and ↵Vladimir Oltean2-7/+24
register injection [ Upstream commit 67c3ca2c5cfe6a50772514e3349b5e7b3b0fac03 ] Problem description ------------------- On an NXP LS1028A (felix DSA driver) with the following configuration: - ocelot-8021q tagging protocol - VLAN-aware bridge (with STP) spanning at least swp0 and swp1 - 8021q VLAN upper interfaces on swp0 and swp1: swp0.700, swp1.700 - ptp4l on swp0.700 and swp1.700 we see that the ptp4l instances do not see each other's traffic, and they all go to the grand master state due to the ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES condition. Jumping to the conclusion for the impatient ------------------------------------------- There is a zero-day bug in the ocelot switchdev driver in the way it handles VLAN-tagged packet injection. The correct logic already exists in the source code, in function ocelot_xmit_get_vlan_info() added by commit 5ca721c54d86 ("net: dsa: tag_ocelot: set the classified VLAN during xmit"). But it is used only for normal NPI-based injection with the DSA "ocelot" tagging protocol. The other injection code paths (register-based and FDMA-based) roll their own wrong logic. This affects and was noticed on the DSA "ocelot-8021q" protocol because it uses register-based injection. By moving ocelot_xmit_get_vlan_info() to a place that's common for both the DSA tagger and the ocelot switch library, it can also be called from ocelot_port_inject_frame() in ocelot.c. We need to touch the lines with ocelot_ifh_port_set()'s prototype anyway, so let's rename it to something clearer regarding what it does, and add a kernel-doc. ocelot_ifh_set_basic() should do. Investigation notes ------------------- Debugging reveals that PTP event (aka those carrying timestamps, like Sync) frames injected into swp0.700 (but also swp1.700) hit the wire with two VLAN tags: 00000000: 01 1b 19 00 00 00 00 01 02 03 04 05 81 00 02 bc ~~~~~~~~~~~ 00000010: 81 00 02 bc 88 f7 00 12 00 2c 00 00 02 00 00 00 ~~~~~~~~~~~ 00000020: 00 00 00 00 00 00 00 00 00 00 00 01 02 ff fe 03 00000030: 04 05 00 01 00 04 00 00 00 00 00 00 00 00 00 00 00000040: 00 00 The second (unexpected) VLAN tag makes felix_check_xtr_pkt() -> ptp_classify_raw() fail to see these as PTP packets at the link partner's receiving end, and return PTP_CLASS_NONE (because the BPF classifier is not written to expect 2 VLAN tags). The reason why packets have 2 VLAN tags is because the transmission code treats VLAN incorrectly. Neither ocelot switchdev, nor felix DSA, declare the NETIF_F_HW_VLAN_CTAG_TX feature. Therefore, at xmit time, all VLANs should be in the skb head, and none should be in the hwaccel area. This is done by: static struct sk_buff *validate_xmit_vlan(struct sk_buff *skb, netdev_features_t features) { if (skb_vlan_tag_present(skb) && !vlan_hw_offload_capable(features, skb->vlan_proto)) skb = __vlan_hwaccel_push_inside(skb); return skb; } But ocelot_port_inject_frame() handles things incorrectly: ocelot_ifh_port_set(ifh, port, rew_op, skb_vlan_tag_get(skb)); void ocelot_ifh_port_set(struct sk_buff *skb, void *ifh, int port, u32 rew_op) { (...) if (vlan_tag) ocelot_ifh_set_vlan_tci(ifh, vlan_tag); (...) } The way __vlan_hwaccel_push_inside() pushes the tag inside the skb head is by calling: static inline void __vlan_hwaccel_clear_tag(struct sk_buff *skb) { skb->vlan_present = 0; } which does _not_ zero out skb->vlan_tci as seen by skb_vlan_tag_get(). This means that ocelot, when it calls skb_vlan_tag_get(), sees (and uses) a residual skb->vlan_tci, while the same VLAN tag is _already_ in the skb head. The trivial fix for double VLAN headers is to replace the content of ocelot_ifh_port_set() with: if (skb_vlan_tag_present(skb)) ocelot_ifh_set_vlan_tci(ifh, skb_vlan_tag_get(skb)); but this would not be correct either, because, as mentioned, vlan_hw_offload_capable() is false for us, so we'd be inserting dead code and we'd always transmit packets with VID=0 in the injection frame header. I can't actually test the ocelot switchdev driver and rely exclusively on code inspection, but I don't think traffic from 8021q uppers has ever been injected properly, and not double-tagged. Thus I'm blaming the introduction of VLAN fields in the injection header - early driver code. As hinted at in the early conclusion, what we _want_ to happen for VLAN transmission was already described once in commit 5ca721c54d86 ("net: dsa: tag_ocelot: set the classified VLAN during xmit"). ocelot_xmit_get_vlan_info() intends to ensure that if the port through which we're transmitting is under a VLAN-aware bridge, the outer VLAN tag from the skb head is stripped from there and inserted into the injection frame header (so that the packet is processed in hardware through that actual VLAN). And in all other cases, the packet is sent with VID=0 in the injection frame header, since the port is VLAN-unaware and has logic to strip this VID on egress (making it invisible to the wire). Fixes: 08d02364b12f ("net: mscc: fix the injection header") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29gtp: pull network headers in gtp_dev_xmit()Eric Dumazet1-0/+3
commit 3a3be7ff9224f424e485287b54be00d2c6bd9c40 upstream. syzbot/KMSAN reported use of uninit-value in get_dev_xmit() [1] We must make sure the IPv4 or Ipv6 header is pulled in skb->head before accessing fields in them. Use pskb_inet_may_pull() to fix this issue. [1] BUG: KMSAN: uninit-value in ipv6_pdp_find drivers/net/gtp.c:220 [inline] BUG: KMSAN: uninit-value in gtp_build_skb_ip6 drivers/net/gtp.c:1229 [inline] BUG: KMSAN: uninit-value in gtp_dev_xmit+0x1424/0x2540 drivers/net/gtp.c:1281 ipv6_pdp_find drivers/net/gtp.c:220 [inline] gtp_build_skb_ip6 drivers/net/gtp.c:1229 [inline] gtp_dev_xmit+0x1424/0x2540 drivers/net/gtp.c:1281 __netdev_start_xmit include/linux/netdevice.h:4913 [inline] netdev_start_xmit include/linux/netdevice.h:4922 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3596 __dev_queue_xmit+0x358c/0x5610 net/core/dev.c:4423 dev_queue_xmit include/linux/netdevice.h:3105 [inline] packet_xmit+0x9c/0x6c0 net/packet/af_packet.c:276 packet_snd net/packet/af_packet.c:3145 [inline] packet_sendmsg+0x90e3/0xa3a0 net/packet/af_packet.c:3177 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 __sys_sendto+0x685/0x830 net/socket.c:2204 __do_sys_sendto net/socket.c:2216 [inline] __se_sys_sendto net/socket.c:2212 [inline] __x64_sys_sendto+0x125/0x1d0 net/socket.c:2212 x64_sys_call+0x3799/0x3c10 arch/x86/include/generated/asm/syscalls_64.h:45 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was created at: slab_post_alloc_hook mm/slub.c:3994 [inline] slab_alloc_node mm/slub.c:4037 [inline] kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4080 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:583 __alloc_skb+0x363/0x7b0 net/core/skbuff.c:674 alloc_skb include/linux/skbuff.h:1320 [inline] alloc_skb_with_frags+0xc8/0xbf0 net/core/skbuff.c:6526 sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2815 packet_alloc_skb net/packet/af_packet.c:2994 [inline] packet_snd net/packet/af_packet.c:3088 [inline] packet_sendmsg+0x749c/0xa3a0 net/packet/af_packet.c:3177 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 __sys_sendto+0x685/0x830 net/socket.c:2204 __do_sys_sendto net/socket.c:2216 [inline] __se_sys_sendto net/socket.c:2212 [inline] __x64_sys_sendto+0x125/0x1d0 net/socket.c:2212 x64_sys_call+0x3799/0x3c10 arch/x86/include/generated/asm/syscalls_64.h:45 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f CPU: 0 UID: 0 PID: 7115 Comm: syz.1.515 Not tainted 6.11.0-rc1-syzkaller-00043-g94ede2a3e913 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024 Fixes: 999cb275c807 ("gtp: add IPv6 support") Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Harald Welte <laforge@gnumonks.org> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://patch.msgid.link/20240808132455.3413916-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-29ionic: check cmd_regs before copying in or outShannon Nelson4-1/+24
[ Upstream commit 7662fad348ac54120e9e6443cb0bbe4f3b582219 ] Since we now have potential cases of NULL cmd_regs and info_regs during a reset recovery, and left NULL if a reset recovery has failed, we need to check that they exist before we use them. Most of the cases were covered in the original patch where we verify before doing the ioreadb() for health or cmd status. However, we need to protect a few uses of io mem that could be hit in error recovery or asynchronous threads calls as well (e.g. ethtool or devlink handlers). Fixes: 219e183272b4 ("ionic: no fw read when PCI reset failed") Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29ionic: use pci_is_enabled not open codeShannon Nelson1-1/+1
[ Upstream commit 121e4dcba3700b30e63f25203d09ddfccbab4a09 ] Since there is a utility available for this, use the API rather than open code. Fixes: 13943d6c8273 ("ionic: prevent pci disable of already disabled device") Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29net: hns3: add checking for vf id of mailboxJian Shen1-3/+4
[ Upstream commit 4e2969a0d6a7549bc0bc1ebc990588b622c4443d ] Add checking for vf id of mailbox, in order to avoid array out-of-bounds risk. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29net/sun3_82586: Avoid reading past buffer in debug outputKees Cook1-1/+1
[ Upstream commit 4bea747f3fbec33c16d369b2f51e55981d7c78d0 ] Since NUM_XMIT_BUFFS is always 1, building m68k with sun3_defconfig and -Warraybounds, this build warning is visible[1]: drivers/net/ethernet/i825xx/sun3_82586.c: In function 'sun3_82586_timeout': drivers/net/ethernet/i825xx/sun3_82586.c:990:122: warning: array subscript 1 is above array bounds of 'volatile struct transmit_cmd_struct *[1]' [-Warray-bounds=] 990 | printk("%s: command-stats: %04x %04x\n",dev->name,swab16(p->xmit_cmds[0]->cmd_status),swab16(p->xmit_cmds[1]->cmd_status)); | ~~~~~~~~~~~~^~~ ... drivers/net/ethernet/i825xx/sun3_82586.c:156:46: note: while referencing 'xmit_cmds' 156 | volatile struct transmit_cmd_struct *xmit_cmds[NUM_XMIT_BUFFS]; Avoid accessing index 1 since it doesn't exist. Link: https://github.com/KSPP/linux/issues/325 [1] Cc: Sam Creasey <sammy@sammy.net> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> # build-tested Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20240206161651.work.876-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29wifi: iwlwifi: mvm: avoid garbage iPNShaul Triebitz1-0/+3
[ Upstream commit 0c1c91604f3e3fc41f4d77dcfc3753860a9a32c9 ] After waking from D3, we set the iPN given by the firmware. For some reason, CIPHER_SUITE_AES_CMAC was missed. That caused copying garbage to the iPN - causing false replays. (since 'seq' is on the stack, and the iPN from the firmware was not copied into it, it contains garbage which later is copied to the iPN key). Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240205211151.2be5b35be30f.I99db8700d01092d22a6d76f1fc1bd5916c9df784@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29wifi: ath12k: Add missing qmi_txn_cancel() callsJeff Johnson1-0/+7
[ Upstream commit 2e82b5f09a97f1b98b885470c81c1248bec103af ] Per the QMI documentation "A client calling qmi_txn_init() must call either qmi_txn_wait() or qmi_txn_cancel() to free up the allocated resources." Unfortunately, in most of the ath12k messaging functions, when qmi_send_request() fails, the function returns without performing the necessary cleanup. So update those functions to call qmi_txn_cancel() when qmi_send_request() fails. No functional changes, compile tested only. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240111-qmi-cleanup-v2-2-53343af953d5@quicinc.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29ionic: no fw read when PCI reset failedShannon Nelson2-5/+23
[ Upstream commit 219e183272b4a566650a37264aff90a8c613d9b5 ] If there was a failed attempt to reset the PCI connection, don't later try to read from PCI as the space is unmapped and will cause a paging request crash. When clearing the PCI setup we can clear the dev_info register pointer, and check it before using it in the fw_running test. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29ionic: prevent pci disable of already disabled deviceShannon Nelson1-1/+3
[ Upstream commit 13943d6c82730a2a4e40e05d6deaca26a8de0a4d ] If a reset fails, the PCI device is left in a disabled state, so don't try to disable it again on driver remove. This prevents a scary looking WARN trace in the kernel log. ionic 0000:2b:00.0: disabling already-disabled device Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29wifi: iwlwifi: check for kmemdup() return value in iwl_parse_tlv_firmware()Dmitry Antipov1-2/+4
[ Upstream commit 3c8aaaa7557b1e33e6ef95a27a5d8a139dcd0874 ] In 'iwl_parse_tlv_firmware()', check for 'kmemdup()' return value when handling IWL_UCODE_TLV_CURRENT_PC and set the number of parsed entries only if an allocation was successful (just like it does with handling IWL_UCODE_TLV_CMD_VERSIONS above). Compile tested only. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Acked-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20231009170453.149905-1-dmantipov@yandex.ru Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29wifi: iwlwifi: fw: Fix debugfs command sendingMukesh Sisodiya1-1/+5
[ Upstream commit 048449fc666d736a1a17d950fde0b5c5c8fd10cc ] During debugfs command handling transport function is used directly, this bypasses the locking used by runtime operation function and leads to a kernel warning when two commands are sent in parallel. Fix it by using runtime operations function when sending debugfs command. Signed-off-by: Mukesh Sisodiya <mukesh.sisodiya@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20231004123422.4f80ac90658a.Ia1dfa1195c919f3002fe08db3eefbd2bfa921bbf@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29wifi: iwlwifi: abort scan when rfkill on but device enabledMiri Korenblit1-1/+1
[ Upstream commit 3c6a0b1f0add72e7f522bc9145222b86d0a7712a ] In RFKILL we first set the RFKILL bit, then we abort scan (if one exists) by waiting for the notification from FW and notifying mac80211. And then we stop the device. But in case we have a scan ongoing in the period of time between rfkill on and before the device is stopped - we will not wait for the FW notification because of the iwl_mvm_is_radio_killed() condition, and then the scan_status and uid_status are misconfigured, (scan_status is cleared but uid_status not) and when the notification suddenly arrives (before stopping the device) we will get into the assert about scan_status and uid_status mismatch. Fix this by waiting for FW notif when rfkill is on but the device isn't disabled yet. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20231004123422.c43b69aa2c77.Icc7b5efb47974d6f499156ff7510b786e177993b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29wifi: mt76: fix race condition related to checking tx queue fill statusFelix Fietkau10-51/+155
[ Upstream commit 0335c034e7265d36d956e806f33202c94a8a9860 ] When drv_tx calls race against local tx scheduling, the queue fill status checks can potentially race, leading to dma queue entries being overwritten. Fix this by deferring packets from drv_tx calls to the tx worker, in order to ensure that all regular queue tx comes from the same context. Reported-by: Ryder Lee <Ryder.Lee@mediatek.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29wifi: ath11k: fix ath11k_mac_op_remain_on_channel() stack usageDmitry Antipov1-19/+25
[ Upstream commit 4fd15bb705d3faa7e6adab2daba2e3af80d9b6bd ] When compiling with clang 16.0.6, I've noticed the following: drivers/net/wireless/ath/ath11k/mac.c:8903:12: warning: stack frame size (1032) exceeds limit (1024) in 'ath11k_mac_op_remain_on_channel' [-Wframe-larger-than] static int ath11k_mac_op_remain_on_channel(struct ieee80211_hw *hw, ^ 68/1032 (6.59%) spills, 964/1032 (93.41%) variables So switch to kzalloc()'ed instance of 'struct scan_req_params' like it's done in 'ath11k_mac_op_hw_scan()'. Compile tested only. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230926042906.13725-1-dmantipov@yandex.ru Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29wifi: ath12k: fix WARN_ON during ath12k_mac_update_vif_chanManish Dharanenthiran1-6/+21
[ Upstream commit 8b8b990fe495e9be057249e1651b59b5ebacf2ef ] Fix WARN_ON() from ath12k_mac_update_vif_chan() if vdev is not up. Since change_chanctx can be called even before vdev_up. Do vdev stop followed by a vdev start in case of vdev is down. Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0-02903-QCAHKSWPL_SILICONZ-1 Signed-off-by: Manish Dharanenthiran <quic_mdharane@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230802085852.19821-2-quic_mdharane@quicinc.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29wifi: cw1200: Avoid processing an invalid TIM IEJeff Johnson1-1/+1
[ Upstream commit b7bcea9c27b3d87b54075735c870500123582145 ] While converting struct ieee80211_tim_ie::virtual_map to be a flexible array it was observed that the TIM IE processing in cw1200_rx_cb() could potentially process a malformed IE in a manner that could result in a buffer over-read. Add logic to verify that the TIM IE length is large enough to hold a valid TIM payload before processing it. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20230831-ieee80211_tim_ie-v3-1-e10ff584ab5d@quicinc.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29net: ethernet: mtk_wed: check update_wo_rx_stats in mtk_wed_update_rx_stats()Lorenzo Bianconi1-0/+3
[ Upstream commit 486e6ca6b48d68d7fefc99e15cc1865e2210d893 ] Check if update_wo_rx_stats function pointer is properly set in mtk_wed_update_rx_stats routine before accessing it. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/b0d233386e059bccb59f18f69afb79a7806e5ded.1694507226.git.lorenzo@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29wifi: iwlwifi: mvm: fix recovery flow in CSAEmmanuel Grumbach1-1/+6
[ Upstream commit 828c79d9feb000acbd9c15bd1ed7e0914473b363 ] If the firmware crashes in the de-activation / re-activation of the link during CSA, we will not have a valid phy_ctxt pointer in mvmvif. This is a legit case, but when mac80211 removes the station to cleanup our state during the re-configuration, we need to make sure we clear ap_sta otherwise we won't re-add the station after the firmware has been restarted. Later on, we'd activate the link, try to send a TLC command crash again on ASSERT 3508. Fix this by properly cleaning up our state. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230913145231.2651e6f6a55a.I4cd50e88ee5c23c1c8dd5b157a800e4b4c96f236@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29net: hns3: fix a deadlock problem when config TC during resettingJie Wang1-0/+3
[ Upstream commit be5e816d00a506719e9dbb1a9c861c5ced30a109 ] When config TC during the reset process, may cause a deadlock, the flow is as below: pf reset start │ ▼ ...... setup tc │ │ ▼ ▼ DOWN: napi_disable() napi_disable()(skip) │ │ │ ▼ ▼ ...... ...... │ │ ▼ │ napi_enable() │ ▼ UINIT: netif_napi_del() │ ▼ ...... │ ▼ INIT: netif_napi_add() │ ▼ ...... global reset start │ │ ▼ ▼ UP: napi_enable()(skip) ...... │ │ ▼ ▼ ...... napi_disable() In reset process, the driver will DOWN the port and then UINIT, in this case, the setup tc process will UP the port before UINIT, so cause the problem. Adds a DOWN process in UINIT to fix it. Fixes: bb6b94a896d4 ("net: hns3: Add reset interface implementation in client") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29net: hns3: use the user's cfg after resetPeiyang Wang2-6/+21
[ Upstream commit 30545e17eac1f50c5ef49644daf6af205100a965 ] Consider the followed case that the user change speed and reset the net interface. Before the hw change speed successfully, the driver get old old speed from hw by timer task. After reset, the previous speed is config to hw. As a result, the new speed is configed successfully but lost after PF reset. The followed pictured shows more dirrectly. +------+ +----+ +----+ | USER | | PF | | HW | +---+--+ +-+--+ +-+--+ | ethtool -s 100G | | +------------------>| set speed 100G | | +--------------------->| | | set successfully | | |<---------------------+---+ | |query cfg (timer task)| | | +--------------------->| | handle speed | | return 200G | | changing event | ethtool --reset |<---------------------+ | (100G) +------------------>| cfg previous speed |<--+ | | after reset (200G) | | +--------------------->| | | +---+ | |query cfg (timer task)| | | +--------------------->| | handle speed | | return 100G | | changing event | |<---------------------+ | (200G) | | |<--+ | |query cfg (timer task)| | +--------------------->| | | return 200G | | |<---------------------+ | | | v v v This patch save new speed if hw change speed successfully, which will be used after reset successfully. Fixes: 2d03eacc0b7e ("net: hns3: Only update mac configuation when necessary") Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29net: hns3: fix wrong use of semaphore upJie Wang2-4/+4
[ Upstream commit 8445d9d3c03101859663d34fda747f6a50947556 ] Currently, if hns3 PF or VF FLR reset failed after five times retry, the reset done process will directly release the semaphore which has already released in hclge_reset_prepare_general. This will cause down operation fail. So this patch fixes it by adding reset state judgement. The up operation is only called after successful PF FLR reset. Fixes: 8627bdedc435 ("net: hns3: refactor the precedure of PF FLR") Fixes: f28368bb4542 ("net: hns3: refactor the procedure of VF FLR") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29mlxbf_gige: disable RX filters until RX path initializedDavid Thompson4-6/+64
[ Upstream commit df934abb185c71c9f2fa07a5013672d0cbd36560 ] A recent change to the driver exposed a bug where the MAC RX filters (unicast MAC, broadcast MAC, and multicast MAC) are configured and enabled before the RX path is fully initialized. The result of this bug is that after the PHY is started packets that match these MAC RX filters start to flow into the RX FIFO. And then, after rx_init() is completed, these packets will go into the driver RX ring as well. If enough packets are received to fill the RX ring (default size is 128 packets) before the call to request_irq() completes, the driver RX function becomes stuck. This bug is intermittent but is most likely to be seen where the oob_net0 interface is connected to a busy network with lots of broadcast and multicast traffic. All the MAC RX filters must be disabled until the RX path is ready, i.e. all initialization is done and all the IRQs are installed. Fixes: f7442a634ac0 ("mlxbf_gige: call request_irq() after NAPI initialized") Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com> Signed-off-by: David Thompson <davthompson@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240809163612.12852-1-davthompson@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29net: ethernet: mtk_wed: fix use-after-free panic in mtk_wed_setup_tc_block_cb()Zheng Zhang1-2/+4
[ Upstream commit db1b4bedb9b97c6d34b03d03815147c04fffe8b4 ] When there are multiple ap interfaces on one band and with WED on, turning the interface down will cause a kernel panic on MT798X. Previously, cb_priv was freed in mtk_wed_setup_tc_block() without marking NULL,and mtk_wed_setup_tc_block_cb() didn't check the value, too. Assign NULL after free cb_priv in mtk_wed_setup_tc_block() and check NULL in mtk_wed_setup_tc_block_cb(). ---------- Unable to handle kernel paging request at virtual address 0072460bca32b4f5 Call trace: mtk_wed_setup_tc_block_cb+0x4/0x38 0xffffffc0794084bc tcf_block_playback_offloads+0x70/0x1e8 tcf_block_unbind+0x6c/0xc8 ... --------- Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support") Signed-off-by: Zheng Zhang <everything411@qq.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29net: dsa: vsc73xx: check busy flag in MDIO operationsPawel Dembicki1-1/+36
[ Upstream commit fa63c6434b6f6aaf9d8d599dc899bc0a074cc0ad ] The VSC73xx has a busy flag used during MDIO operations. It is raised when MDIO read/write operations are in progress. Without it, PHYs are misconfigured and bus operations do not work as expected. Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver") Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29net: dsa: vsc73xx: use read_poll_timeout instead delay loopPawel Dembicki1-14/+16
[ Upstream commit eb7e33d01db3aec128590391b2397384bab406b6 ] Switch the delay loop during the Arbiter empty check from vsc73xx_adjust_link() to use read_poll_timeout(). Functionally, one msleep() call is eliminated at the end of the loop in the timeout case. As Russell King suggested: "This [change] avoids the issue that on the last iteration, the code reads the register, tests it, finds the condition that's being waiting for is false, _then_ waits and end up printing the error message - that last wait is rather useless, and as the arbiter state isn't checked after waiting, it could be that we had success during the last wait." Suggested-by: Russell King <linux@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Link: https://lore.kernel.org/r/20240417205048.3542839-2-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: fa63c6434b6f ("net: dsa: vsc73xx: check busy flag in MDIO operations") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29net: dsa: vsc73xx: pass value in phy_write operationPawel Dembicki1-1/+1
[ Upstream commit 5b9eebc2c7a5f0cc7950d918c1e8a4ad4bed5010 ] In the 'vsc73xx_phy_write' function, the register value is missing, and the phy write operation always sends zeros. This commit passes the value variable into the proper register. Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver") Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29net: axienet: Fix register defines comment descriptionRadhey Shyam Pandey1-8/+8
[ Upstream commit 9ff2f816e2aa65ca9a1cdf0954842f8173c0f48d ] In axiethernet header fix register defines comment description to be inline with IP documentation. It updates MAC configuration register, MDIO configuration register and frame filter control description. Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29net/mlx5e: Correctly report errors for ethtool rx flowsCosmin Ratiu1-1/+1
[ Upstream commit cbc796be1779c4dbc9a482c7233995e2a8b6bfb3 ] Previously, an ethtool rx flow with no attrs would not be added to the NIC as it has no rules to configure the hw with, but it would be reported as successful to the caller (return code 0). This is confusing for the user as ethtool then reports "Added rule $num", but no rule was actually added. This change corrects that by instead reporting these wrong rules as -EINVAL. Fixes: b29c61dac3a2 ("net/mlx5e: Ethtool steering flow validation refactoring") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808144107.2095424-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29net/mlx5e: Take state lock during tx timeout reporterDragos Tatulea1-0/+2
[ Upstream commit e6b5afd30b99b43682a7764e1a74a42fe4d5f4b3 ] mlx5e_safe_reopen_channels() requires the state lock taken. The referenced changed in the Fixes tag removed the lock to fix another issue. This patch adds it back but at a later point (when calling mlx5e_safe_reopen_channels()) to avoid the deadlock referenced in the Fixes tag. Fixes: eab0da38912e ("net/mlx5e: Fix possible deadlock on mlx5e_tx_timeout_work") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/all/ZplpKq8FKi3vwfxv@gmail.com/T/ Reviewed-by: Breno Leitao <leitao@debian.org> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808144107.2095424-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29igc: Fix reset adapter logics when tx mode changeFaizal Rahim1-5/+19
[ Upstream commit 0afeaeb5dae86aceded0d5f0c3a54d27858c0c6f ] Following the "igc: Fix TX Hang issue when QBV Gate is close" changes, remaining issues with the reset adapter logic in igc_tsn_offload_apply() have been observed: 1. The reset adapter logics for i225 and i226 differ, although they should be the same according to the guidelines in I225/6 HW Design Section 7.5.2.1 on software initialization during tx mode changes. 2. The i225 resets adapter every time, even though tx mode doesn't change. This occurs solely based on the condition igc_is_device_id_i225() when calling schedule_work(). 3. i226 doesn't reset adapter for tsn->legacy tx mode changes. It only resets adapter for legacy->tsn tx mode transitions. 4. qbv_count introduced in the patch is actually not needed; in this context, a non-zero value of qbv_count is used to indicate if tx mode was unconditionally set to tsn in igc_tsn_enable_offload(). This could be replaced by checking the existing register IGC_TQAVCTRL_TRANSMIT_MODE_TSN bit. This patch resolves all issues and enters schedule_work() to reset the adapter only when changing tx mode. It also removes reliance on qbv_count. qbv_count field will be removed in a future patch. Test ran: 1. Verify reset adapter behaviour in i225/6: a) Enrol a new GCL Reset adapter observed (tx mode change legacy->tsn) b) Enrol a new GCL without deleting qdisc No reset adapter observed (tx mode remain tsn->tsn) c) Delete qdisc Reset adapter observed (tx mode change tsn->legacy) 2. Tested scenario from "igc: Fix TX Hang issue when QBV Gate is closed" to confirm it remains resolved. Fixes: 175c241288c0 ("igc: Fix TX Hang issue when QBV Gate is closed") Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29igc: Fix qbv_config_change_errors logicsFaizal Rahim3-10/+15
[ Upstream commit f8d6acaee9d35cbff3c3cfad94641666c596f8da ] When user issues these cmds: 1. Either a) or b) a) mqprio with hardware offload disabled b) taprio with txtime-assist feature enabled 2. etf 3. tc qdisc delete 4. taprio with base time in the past At step 4, qbv_config_change_errors wrongly increased by 1. Excerpt from IEEE 802.1Q-2018 8.6.9.3.1: "If AdminBaseTime specifies a time in the past, and the current schedule is running, then: Increment ConfigChangeError counter" qbv_config_change_errors should only increase if base time is in the past and no taprio is active. In user perspective, taprio was not active when first triggered at step 4. However, i225/6 reuses qbv for etf, so qbv is enabled with a dummy schedule at step 2 where it enters igc_tsn_enable_offload() and qbv_count got incremented to 1. At step 4, it enters igc_tsn_enable_offload() again, qbv_count is incremented to 2. Because taprio is running, tc_setup_type is TC_SETUP_QDISC_ETF and qbv_count > 1, qbv_config_change_errors value got incremented. This issue happens due to reliance on qbv_count field where a non-zero value indicates that taprio is running. But qbv_count increases regardless if taprio is triggered by user or by other tsn feature. It does not align with qbv_config_change_errors expectation where it is only concerned with taprio triggered by user. Fixing this by relocating the qbv_config_change_errors logic to igc_save_qbv_schedule(), eliminating reliance on qbv_count and its inaccuracies from i225/6's multiple uses of qbv feature for other TSN features. The new function created: igc_tsn_is_taprio_activated_by_user() uses taprio_offload_enable field to indicate that the current running taprio was triggered by user, instead of triggered by non-qbv feature like etf. Fixes: ae4fe4698300 ("igc: Add qbv_config_change_errors counter") Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29igc: Fix packet still tx after gate close by reducing i226 MAC retry bufferFaizal Rahim2-0/+40
[ Upstream commit e037a26ead187901f83cad9c503ccece5ff6817a ] Testing uncovered that even when the taprio gate is closed, some packets still transmit. According to i225/6 hardware errata [1], traffic might overflow the planned QBV window. This happens because MAC maintains an internal buffer, primarily for supporting half duplex retries. Therefore, even when the gate closes, residual MAC data in the buffer may still transmit. To mitigate this for i226, reduce the MAC's internal buffer from 192 bytes to the recommended 88 bytes by modifying the RETX_CTL register value. This follows guidelines from: [1] Ethernet Controller I225/I22 Spec Update Rev 2.1 Errata Item 9: TSN: Packet Transmission Might Cross Qbv Window [2] I225/6 SW User Manual Rev 1.2.4: Section 8.11.5 Retry Buffer Control Note that the RETX_CTL register can't be used in TSN mode because half duplex feature cannot coexist with TSN. Test Steps: 1. Send taprio cmd to board A: tc qdisc replace dev enp1s0 parent root handle 100 taprio \ num_tc 4 \ map 3 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3 \ queues 1@0 1@1 1@2 1@3 \ base-time 0 \ sched-entry S 0x07 500000 \ sched-entry S 0x0f 500000 \ flags 0x2 \ txtime-delay 0 Note that for TC3, gate should open for 500us and close for another 500us. 3. Take tcpdump log on Board B. 4. Send udp packets via UDP tai app from Board A to Board B. 5. Analyze tcpdump log via wireshark log on Board B. Ensure that the total time from the first to the last packet received during one cycle for TC3 does not exceed 500us. Fixes: 43546211738e ("igc: Add new device ID's") Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29wifi: brcmfmac: cfg80211: Handle SSID based pmksa deletionJanne Grunau1-3/+10
commit 2ad4e1ada8eebafa2d75a4b75eeeca882de6ada1 upstream. wpa_supplicant 2.11 sends since 1efdba5fdc2c ("Handle PMKSA flush in the driver for SAE/OWE offload cases") SSID based PMKSA del commands. brcmfmac is not prepared and tries to dereference the NULL bssid and pmkid pointers in cfg80211_pmksa. PMKID_V3 operations support SSID based updates so copy the SSID. Fixes: a96202acaea4 ("wifi: brcmfmac: cfg80211: Add support for PMKID_V3 operations") Cc: stable@vger.kernel.org # 6.4.x Signed-off-by: Janne Grunau <j@jannau.net> Reviewed-by: Neal Gompa <neal@gompa.dev> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://patch.msgid.link/20240803-brcmfmac_pmksa_del_ssid-v1-1-4e85f19135e1@jannau.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-29net: mana: Fix doorbell out of order violation and avoid unnecessary ↵Long Li1-9/+15
doorbell rings commit 58a63729c957621f1990c3494c702711188ca347 upstream. After napi_complete_done() is called when NAPI is polling in the current process context, another NAPI may be scheduled and start running in softirq on another CPU and may ring the doorbell before the current CPU does. When combined with unnecessary rings when there is no need to arm the CQ, it triggers error paths in the hardware. This patch fixes this by calling napi_complete_done() after doorbell rings. It limits the number of unnecessary rings when there is no need to arm. MANA hardware specifies that there must be one doorbell ring every 8 CQ wraparounds. This driver guarantees one doorbell ring as soon as the number of consumed CQEs exceeds 4 CQ wraparounds. In practical workloads, the 4 CQ wraparounds proves to be big enough that it rarely exceeds this limit before all the napi weight is consumed. To implement this, add a per-CQ counter cq->work_done_since_doorbell, and make sure the CQ is armed as soon as passing 4 wraparounds of the CQ. Cc: stable@vger.kernel.org Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Link: https://patch.msgid.link/1723219138-29887-1-git-send-email-longli@linuxonhyperv.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-29net: mana: Fix RX buf alloc_size alignment and atomic op panicHaiyang Zhang1-1/+5
commit 32316f676b4ee87c0404d333d248ccf777f739bc upstream. The MANA driver's RX buffer alloc_size is passed into napi_build_skb() to create SKB. skb_shinfo(skb) is located at the end of skb, and its alignment is affected by the alloc_size passed into napi_build_skb(). The size needs to be aligned properly for better performance and atomic operations. Otherwise, on ARM64 CPU, for certain MTU settings like 4000, atomic operations may panic on the skb_shinfo(skb)->dataref due to alignment fault. To fix this bug, add proper alignment to the alloc_size calculation. Sample panic info: [ 253.298819] Unable to handle kernel paging request at virtual address ffff000129ba5cce [ 253.300900] Mem abort info: [ 253.301760] ESR = 0x0000000096000021 [ 253.302825] EC = 0x25: DABT (current EL), IL = 32 bits [ 253.304268] SET = 0, FnV = 0 [ 253.305172] EA = 0, S1PTW = 0 [ 253.306103] FSC = 0x21: alignment fault Call trace: __skb_clone+0xfc/0x198 skb_clone+0x78/0xe0 raw6_local_deliver+0xfc/0x228 ip6_protocol_deliver_rcu+0x80/0x500 ip6_input_finish+0x48/0x80 ip6_input+0x48/0xc0 ip6_sublist_rcv_finish+0x50/0x78 ip6_sublist_rcv+0x1cc/0x2b8 ipv6_list_rcv+0x100/0x150 __netif_receive_skb_list_core+0x180/0x220 netif_receive_skb_list_internal+0x198/0x2a8 __napi_poll+0x138/0x250 net_rx_action+0x148/0x330 handle_softirqs+0x12c/0x3a0 Cc: stable@vger.kernel.org Fixes: 80f6215b450e ("net: mana: Add support for jumbo frame") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-19pppoe: Fix memory leak in pppoe_sendmsg()Gavrilov Ilia1-14/+9
[ Upstream commit dc34ebd5c018b0edf47f39d11083ad8312733034 ] syzbot reports a memory leak in pppoe_sendmsg [1]. The problem is in the pppoe_recvmsg() function that handles errors in the wrong order. For the skb_recv_datagram() function, check the pointer to skb for NULL first, and then check the 'error' variable, because the skb_recv_datagram() function can set 'error' to -EAGAIN in a loop but return a correct pointer to socket buffer after a number of attempts, though 'error' remains set to -EAGAIN. skb_recv_datagram __skb_recv_datagram // Loop. if (err == -EAGAIN) then // go to the next loop iteration __skb_try_recv_datagram // if (skb != NULL) then return 'skb' // else if a signal is received then // return -EAGAIN Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with Syzkaller. Link: https://syzkaller.appspot.com/bug?extid=6bdfd184eac7709e5cc9 [1] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+6bdfd184eac7709e5cc9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6bdfd184eac7709e5cc9 Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru> Reviewed-by: Guillaume Nault <gnault@redhat.com> Link: https://lore.kernel.org/r/20240214085814.3894917-1-Ilia.Gavrilov@infotecs.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14net: stmmac: qcom-ethqos: enable SGMII loopback during DMA reset on ↵Bartosz Golaszewski1-0/+23
sa8775p-ride-r3 [ Upstream commit 3c466d6537b99f801b3f68af3d8124d4312437a0 ] On sa8775p-ride-r3 the RX clocks from the AQR115C PHY are not available at the time of the DMA reset. We can however extract the RX clock from the internal SERDES block. Once the link is up, we can revert to the previous state. The AQR115C PHY doesn't support in-band signalling so we can count on getting the link up notification and safely reuse existing callbacks which are already used by another HW quirk workaround which enables the functional clock to avoid a DMA reset due to timeout. Only enable loopback on revision 3 of the board - check the phy_mode to make sure. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240703181500.28491-3-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14can: mcp251xfd: tef: update workaround for erratum DS80000789E 6 of mcp2518fdMarc Kleine-Budde1-44/+27
[ Upstream commit 3a0a88fcbaf9e027ecca3fe8775be9700b4d6460 ] This patch updates the workaround for a problem similar to erratum DS80000789E 6 of the mcp2518fd, the other variants of the chip family (mcp2517fd and mcp251863) are probably also affected. Erratum DS80000789E 6 says "reading of the FIFOCI bits in the FIFOSTA register for an RX FIFO may be corrupted". However observation shows that this problem is not limited to RX FIFOs but also effects the TEF FIFO. In the bad case, the driver reads a too large head index. As the FIFO is implemented as a ring buffer, this results in re-handling old CAN transmit complete events. Every transmit complete event contains with a sequence number that equals to the sequence number of the corresponding TX request. This way old TX complete events can be detected. If the original driver detects a non matching sequence number, it prints an info message and tries again later. As wrong sequence numbers can be explained by the erratum DS80000789E 6, demote the info message to debug level, streamline the code and update the comments. Keep the behavior: If an old CAN TX complete event is detected, abort the iteration and mark the number of valid CAN TX complete events as processed in the chip by incrementing the FIFO's tail index. Cc: Stefan Althöfer <Stefan.Althoefer@janztec.com> Cc: Thomas Kopp <thomas.kopp@microchip.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratumMarc Kleine-Budde3-26/+43
[ Upstream commit b8e0ddd36ce9536ad7478dd27df06c9ae92370ba ] This is a preparatory patch to work around a problem similar to erratum DS80000789E 6 of the mcp2518fd, the other variants of the chip family (mcp2517fd and mcp251863) are probably also affected. Erratum DS80000789E 6 says "reading of the FIFOCI bits in the FIFOSTA register for an RX FIFO may be corrupted". However observation shows that this problem is not limited to RX FIFOs but also effects the TEF FIFO. When handling the TEF interrupt, the driver reads the FIFO header index from the TEF FIFO STA register of the chip. In the bad case, the driver reads a too large head index. In the original code, the driver always trusted the read value, which caused old CAN transmit complete events that were already processed to be re-processed. Instead of reading and trusting the head index, read the head index and calculate the number of CAN frames that were supposedly received - replace mcp251xfd_tef_ring_update() with mcp251xfd_get_tef_len(). The mcp251xfd_handle_tefif() function reads the CAN transmit complete events from the chip, iterates over them and pushes them into the network stack. The original driver already contains code to detect old CAN transmit complete events, that will be updated in the next patch. Cc: Stefan Althöfer <Stefan.Althoefer@janztec.com> Cc: Thomas Kopp <thomas.kopp@microchip.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14net/mlx5e: SHAMPO, Fix invalid WQ linked list unlinkDragos Tatulea1-0/+3
[ Upstream commit fba8334721e266f92079632598e46e5f89082f30 ] When all the strides in a WQE have been consumed, the WQE is unlinked from the WQ linked list (mlx5_wq_ll_pop()). For SHAMPO, it is possible to receive CQEs with 0 consumed strides for the same WQE even after the WQE is fully consumed and unlinked. This triggers an additional unlink for the same wqe which corrupts the linked list. Fix this scenario by accepting 0 sized consumed strides without unlinking the WQE again. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240603212219.1037656-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14wifi: ath12k: fix memory leak in ath12k_dp_rx_peer_frag_setup()Baochen Qiang1-0/+1
[ Upstream commit 3d60041543189438cd1b03a1fa40ff6681c77970 ] Currently the resource allocated by crypto_alloc_shash() is not freed in case ath12k_peer_find() fails, resulting in memory leak. Add crypto_free_shash() to fix it. This is found during code review, compile tested only. Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240526124226.24661-1-quic_bqiang@quicinc.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14net: fec: Stop PPS on driver removeCsókás, Bence1-0/+3
[ Upstream commit 8fee6d5ad5fa18c270eedb2a2cdf58dbadefb94b ] PPS was not stopped in `fec_ptp_stop()`, called when the adapter was removed. Consequentially, you couldn't safely reload the driver with the PPS signal on. Fixes: 32cba57ba74b ("net: fec: introduce fec_ptp_stop and use in probe fail path") Reviewed-by: Fabio Estevam <festevam@gmail.com> Link: https://lore.kernel.org/netdev/CAOMZO5BzcZR8PwKKwBssQq_wAGzVgf1ffwe_nhpQJjviTdxy-w@mail.gmail.com/T/#m01dcb810bfc451a492140f6797ca77443d0cb79f Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20240807080956.2556602-1-csokas.bence@prolan.hu Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14net: bcmgenet: Properly overlay PHY and MAC Wake-on-LAN capabilitiesFlorian Fainelli1-9/+5
[ Upstream commit 9ee09edc05f20422e7ced84b1f8a5d3359926ac8 ] Some Wake-on-LAN modes such as WAKE_FILTER may only be supported by the MAC, while others might be only supported by the PHY. Make sure that the .get_wol() returns the union of both rather than only that of the PHY if the PHY supports Wake-on-LAN. Fixes: 7e400ff35cbe ("net: bcmgenet: Add support for PHY-based Wake-on-LAN") Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20240806175659.3232204-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14net: dsa: bcm_sf2: Fix a possible memory leak in bcm_sf2_mdio_register()Joe Hattori1-1/+3
[ Upstream commit e3862093ee93fcfbdadcb7957f5f8974fffa806a ] bcm_sf2_mdio_register() calls of_phy_find_device() and then phy_device_remove() in a loop to remove existing PHY devices. of_phy_find_device() eventually calls bus_find_device(), which calls get_device() on the returned struct device * to increment the refcount. The current implementation does not decrement the refcount, which causes memory leak. This commit adds the missing phy_device_free() call to decrement the refcount via put_device() to balance the refcount. Fixes: 771089c2a485 ("net: dsa: bcm_sf2: Ensure that MDIO diversion is used") Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20240806011327.3817861-1-joe@pf.is.s.u-tokyo.ac.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14ice: Fix reset handlerGrzegorz Nitka1-0/+2
[ Upstream commit 25a7123579ecac9a89a7e5b8d8a580bee4b68acd ] Synchronize OICR IRQ when preparing for reset to avoid potential race conditions between the reset procedure and OICR Fixes: 4aad5335969f ("ice: add individual interrupt allocation") Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14net: usb: qmi_wwan: fix memory leak for not ip packetsDaniele Palmas1-0/+1
[ Upstream commit 7ab107544b777c3bd7feb9fe447367d8edd5b202 ] Free the unused skb when not ip packets arrive. Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support") Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14wifi: ath12k: fix soft lockup on suspendJohan Hovold1-1/+2
[ Upstream commit a47f3320bb4ba6714abe8dddb36399367b491358 ] The ext interrupts are enabled when the firmware has been started, but this may never happen, for example, if the board configuration file is missing. When the system is later suspended, the driver unconditionally tries to disable interrupts, which results in an irq disable imbalance and causes the driver to spin indefinitely in napi_synchronize(). Make sure that the interrupts have been enabled before attempting to disable them. Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices") Cc: stable@vger.kernel.org # 6.3 Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://patch.msgid.link/20240709073132.9168-1-johan+linaro@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14wifi: ath12k: add CE and ext IRQ flag to indicate irq_handlerKang Yang2-0/+18
[ Upstream commit 604308a34487eaa382c50fcdb4396c435030b4fa ] Add two flags to indicate whether IRQ handler for CE and DP can be called. This is because in one MSI vector case, interrupt is not disabled in hif_stop and hif_irq_disable. So if interrupt is disabled, MHI interrupt is disabled too. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Kang Yang <quic_kangyang@quicinc.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20231121021304.12966-3-quic_kangyang@quicinc.com Stable-dep-of: a47f3320bb4b ("wifi: ath12k: fix soft lockup on suspend") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14wifi: ath12k: rename the sc naming convention to abKarthikeyan Periyasamy2-11/+11
[ Upstream commit cda8607e824b8f4f1e5f26fef17736c8be4358f8 ] In PCI and HAL interface layer module, the identifier sc is used to represent an instance of ath12k_base structure. However, within ath12k, the convention is to use "ab" to represent an SoC "base" struct. So change the all instances of sc to ab. Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.1.1-00125-QCAHKSWPL_SILICONZ-1 Signed-off-by: Karthikeyan Periyasamy <quic_periyasa@quicinc.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20231018153008.29820-3-quic_periyasa@quicinc.com Stable-dep-of: a47f3320bb4b ("wifi: ath12k: fix soft lockup on suspend") Signed-off-by: Sasha Levin <sashal@kernel.org>