BMC/Intel-BMC/linux.git - Intel OpenBMC Linux kernel source tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2022-01-27	net: phy: marvell: configure RGMII delays for 88E1118	Russell King (Oracle)	1	-0/+6
	[ Upstream commit f22725c95ececb703c3f741e8f946d23705630b7 ] Corentin Labbe reports that the SSI 1328 does not work when allowing the PHY to operate at gigabit speeds, but does work with the generic PHY driver. This appears to be because m88e1118_config_init() writes a fixed value to the MSCR register, claiming that this is to enable 1G speeds. However, this always sets bits 4 and 5, enabling RGMII transmit and receive delays. The suspicion is that the original board this was added for required the delays to make 1G speeds work. Add the necessary configuration for RGMII delays for the 88E1118 to bring this into line with the requirements for RGMII support, and thus make the SSI 1328 work. Corentin Labbe has tested this on gemini-ssi1328 and gemini-ns2502. Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com> Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	mlxsw: pci: Avoid flow control for EMAD packets	Danielle Ratson	2	-1/+17
	[ Upstream commit d43e4271747ace01a27a49a97a397cb4219f6487 ] Locally generated packets ingress the device through its CPU port. When the CPU port is congested and there are not enough credits in its headroom buffer, packets can be dropped. While this might be acceptable for data packets that traverse the network, configuration packets exchanged between the host and the device (EMADs) should not be subjected to this flow control. The "sdq_lp" bit in the SDQ (Send Descriptor Queue) context allows the host to instruct the device to treat packets sent on this queue as "local processing" and always process them, regardless of the state of the CPU port's headroom. Add the definition of this bit and set it for the dedicated SDQ reserved for the transmission of EMAD packets. This makes the "local processing" bit in the WQE (Work Queue Element) redundant, so clear it. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	net: mdio: Demote probed message to debug print	Florian Fainelli	1	-1/+1
	[ Upstream commit 7590fc6f80ac2cbf23e6b42b668bbeded070850b ] On systems with large numbers of MDIO bus/muxes the message indicating that a given MDIO bus has been successfully probed is repeated for as many buses we have, which can eat up substantial boot time for no reason, demote to a debug print. Reported-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220103194024.2620-1-f.fainelli@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	ath11k: Fix napi related hang	Ben Greear	3	-7/+19
	[ Upstream commit d943fdad7589653065be0e20aadc6dff37725ed4 ] Similar to the same bug in ath10k, a napi disable w/out it being enabled will hang forever. I believe I saw this while trying rmmod after driver had some failure on startup. Fix it by keeping state on whether napi is enabled or not. And, remove un-used napi pointer in ath11k driver base struct. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20200903195254.29379-1-greearb@candelatech.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	iwlwifi: pcie: make sure prph_info is set when treating wakeup IRQ	Luca Coelho	1	-1/+6
	[ Upstream commit 459fc0f2c6b0f6e280bfa0f230c100c9dfe3a199 ] In some rare cases when the HW is in a bad state, we may get this interrupt when prph_info is not set yet. Then we will try to dereference it to check the sleep_notif element, which will cause an oops. Fix that by ignoring the interrupt if prph_info is not set yet. Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20211219132536.0537aa562313.I183bb336345b9b3da196ba9e596a6f189fbcbd09@changeid Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	iwlwifi: mvm: fix AUX ROC removal	Avraham Stern	1	-3/+6
	[ Upstream commit f0337cb48f3bf5f0bbccc985d8a0a8c4aa4934b7 ] When IWL_UCODE_TLV_CAPA_SESSION_PROT_CMD is set, removing a time event always tries to cancel session protection. However, AUX ROC does not use session protection so it was never removed. As a result, if the driver tries to allocate another AUX ROC event right after cancelling the first one, it will fail with a warning. In addition, the time event data passed to iwl_mvm_remove_aux_roc_te() is incorrect. Fix it. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20211219132536.915e1f69f062.Id837e917f1c2beaca7c1eb33333d622548918a76@changeid Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	iwlwifi: mvm: Fix calculation of frame length	Ilan Peer	1	-0/+27
	[ Upstream commit 40a0b38d7a7f91a6027287e0df54f5f547e8d27e ] The RADA might include in the Rx frame the MIC and CRC bytes. These bytes should be removed for non monitor interfaces and should not be passed to mac80211. Fix the Rx processing to remove the extra bytes on non monitor cases. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20211219121514.098be12c801e.I1d81733d8a75b84c3b20eb6e0d14ab3405ca6a86@changeid Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	iwlwifi: remove module loading failure message	Johannes Berg	1	-8/+1
	[ Upstream commit 6518f83ffa51131daaf439b66094f684da3fb0ae ] When CONFIG_DEBUG_TEST_DRIVER_REMOVE is set, iwlwifi crashes when the opmode module cannot be loaded, due to completing the completion before using drv->dev, which can then already be freed. Fix this by removing the (fairly useless) message. Moving the completion later causes a deadlock instead, so that's not an option. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20211210091245.289008-2-luca@coelho.fi Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	iwlwifi: fix leaks/bad data after failed firmware load	Johannes Berg	1	-0/+8
	[ Upstream commit ab07506b0454bea606095951e19e72c282bfbb42 ] If firmware load fails after having loaded some parts of the firmware, e.g. the IML image, then this would leak. For the host command list we'd end up running into a WARN on the next attempt to load another firmware image. Fix this by calling iwl_dealloc_ucode() on failures, and make that also clear the data so we start fresh on the next round. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20211210110539.1f742f0eb58a.I1315f22f6aa632d94ae2069f85e1bca5e734dce0@changeid Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	rtw88: 8822c: update rx settings to prevent potential hw deadlock	Po-Hao Huang	4	-4/+4
	[ Upstream commit c1afb26727d9e507d3e17a9890e7aaf7fc85cd55 ] These settings enables mac to detect and recover when rx fifo circuit deadlock occurs. Previous version missed this, so we fix it. Signed-off-by: Po-Hao Huang <phhuang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20211217012708.8623-1-pkshih@realtek.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	ath9k: Fix out-of-bound memcpy in ath9k_hif_usb_rx_stream	Zekun Shen	1	-0/+7
	[ Upstream commit 6ce708f54cc8d73beca213cec66ede5ce100a781 ] Large pkt_len can lead to out-out-bound memcpy. Current ath9k_hif_usb_rx_stream allows combining the content of two urb inputs to one pkt. The first input can indicate the size of the pkt. Any remaining size is saved in hif_dev->rx_remain_len. While processing the next input, memcpy is used with rx_remain_len. 4-byte pkt_len can go up to 0xffff, while a single input is 0x4000 maximum in size (MAX_RX_BUF_SIZE). Thus, the patch adds a check for pkt_len which must not exceed 2 * MAX_RX_BUG_SIZE. BUG: KASAN: slab-out-of-bounds in ath9k_hif_usb_rx_cb+0x490/0xed7 [ath9k_htc] Read of size 46393 at addr ffff888018798000 by task kworker/0:1/23 CPU: 0 PID: 23 Comm: kworker/0:1 Not tainted 5.6.0 #63 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 Workqueue: events request_firmware_work_func Call Trace: <IRQ> dump_stack+0x76/0xa0 print_address_description.constprop.0+0x16/0x200 ? ath9k_hif_usb_rx_cb+0x490/0xed7 [ath9k_htc] ? ath9k_hif_usb_rx_cb+0x490/0xed7 [ath9k_htc] __kasan_report.cold+0x37/0x7c ? ath9k_hif_usb_rx_cb+0x490/0xed7 [ath9k_htc] kasan_report+0xe/0x20 check_memory_region+0x15a/0x1d0 memcpy+0x20/0x50 ath9k_hif_usb_rx_cb+0x490/0xed7 [ath9k_htc] ? hif_usb_mgmt_cb+0x2d9/0x2d9 [ath9k_htc] ? _raw_spin_lock_irqsave+0x7b/0xd0 ? _raw_spin_trylock_bh+0x120/0x120 ? __usb_unanchor_urb+0x12f/0x210 __usb_hcd_giveback_urb+0x1e4/0x380 usb_giveback_urb_bh+0x241/0x4f0 ? __hrtimer_run_queues+0x316/0x740 ? __usb_hcd_giveback_urb+0x380/0x380 tasklet_action_common.isra.0+0x135/0x330 __do_softirq+0x18c/0x634 irq_exit+0x114/0x140 smp_apic_timer_interrupt+0xde/0x380 apic_timer_interrupt+0xf/0x20 I found the bug using a custome USBFuzz port. It's a research work to fuzz USB stack/drivers. I modified it to fuzz ath9k driver only, providing hand-crafted usb descriptors to QEMU. After fixing the value of pkt_tag to ATH_USB_RX_STREAM_MODE_TAG in QEMU emulation, I found the KASAN report. The bug is triggerable whenever pkt_len is above two MAX_RX_BUG_SIZE. I used the same input that crashes to test the driver works when applying the patch. Signed-off-by: Zekun Shen <bruceshenzk@gmail.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/YXsidrRuK6zBJicZ@10-18-43-117.dynapool.wireless.nyu.edu Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	ath9k_htc: fix NULL pointer dereference at ath9k_htc_tx_get_packet()	Tetsuo Handa	3	-0/+10
	[ Upstream commit 8b3046abc99eefe11438090bcc4ec3a3994b55d0 ] syzbot is reporting lockdep warning at ath9k_wmi_event_tasklet() followed by kernel panic at get_htc_epid_queue() from ath9k_htc_tx_get_packet() from ath9k_htc_txstatus() [1], for ath9k_wmi_event_tasklet(WMI_TXSTATUS_EVENTID) depends on spin_lock_init() from ath9k_init_priv() being already completed. Since ath9k_wmi_event_tasklet() is set by ath9k_init_wmi() from ath9k_htc_probe_device(), it is possible that ath9k_wmi_event_tasklet() is called via tasklet interrupt before spin_lock_init() from ath9k_init_priv() from ath9k_init_device() from ath9k_htc_probe_device() is called. Let's hold ath9k_wmi_event_tasklet(WMI_TXSTATUS_EVENTID) no-op until ath9k_tx_init() completes. Link: https://syzkaller.appspot.com/bug?extid=31d54c60c5b254d6f75b [1] Reported-by: syzbot <syzbot+31d54c60c5b254d6f75b@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: syzbot <syzbot+31d54c60c5b254d6f75b@syzkaller.appspotmail.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/77b76ac8-2bee-6444-d26c-8c30858b8daa@i-love.sakura.ne.jp Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	ath9k_htc: fix NULL pointer dereference at ath9k_htc_rxep()	Tetsuo Handa	2	-0/+9
	[ Upstream commit b0ec7e55fce65f125bd1d7f02e2dc4de62abee34 ] syzbot is reporting lockdep warning followed by kernel panic at ath9k_htc_rxep() [1], for ath9k_htc_rxep() depends on ath9k_rx_init() being already completed. Since ath9k_htc_rxep() is set by ath9k_htc_connect_svc(WMI_BEACON_SVC) from ath9k_init_htc_services(), it is possible that ath9k_htc_rxep() is called via timer interrupt before ath9k_rx_init() from ath9k_init_device() is called. Since we can't call ath9k_init_device() before ath9k_init_htc_services(), let's hold ath9k_htc_rxep() no-op until ath9k_rx_init() completes. Link: https://syzkaller.appspot.com/bug?extid=4d2d56175b934b9a7bf9 [1] Reported-by: syzbot <syzbot+4d2d56175b934b9a7bf9@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: syzbot <syzbot+4d2d56175b934b9a7bf9@syzkaller.appspotmail.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/2b88f416-b2cb-7a18-d688-951e6dc3fe92@i-love.sakura.ne.jp Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	mt76: mt7615: improve wmm index allocation	Felix Fietkau	1	-5/+3
	[ Upstream commit 70fb028707c8871ef9e56b3ffa3d895e14956cc4 ] Typically all AP interfaces on a PHY will share the same WMM settings, while sta/mesh interfaces will usually inherit the settings from a remote device. In order minimize the likelihood of conflicting WMM settings, make all AP interfaces share one slot, and all non-AP interfaces another one. This also fixes running multiple AP interfaces on MT7613, which only has 3 WMM slots. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	mt76: do not pass the received frame with decryption error	Xing Song	4	-3/+28
	[ Upstream commit dd28dea52ad9376d2b243a8981726646e1f60b1a ] MAC80211 doesn't care any decryption error in 802.3 path, so received frame will be dropped if HW tell us that the cipher configuration is not matched as well as the header has been translated to 802.3. This case only appears when IEEE80211_FCTL_PROTECTED is 0 and cipher suit is not none in the corresponding HW entry. The received frame is only reported to monitor interface if HW decryption block tell us there is ICV error or CCMP/BIP/WPI MIC error. Note in this case the reported frame is decrypted 802.11 frame and the payload may be malformed due to mismatched key. Signed-off-by: Xing Song <xing.song@mediatek.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	mt76: mt7615: fix possible deadlock while mt7615_register_ext_phy()	Peter Chiu	1	-2/+6
	[ Upstream commit 8c55516de3f9b76b9d9444e7890682ec2efc809f ] ieee80211_register_hw() is called with rtnl_lock held, and this could be caused lockdep from a work item that's on a workqueue that is flushed with the rtnl held. Move mt7615_register_ext_phy() outside the init_work(). Signed-off-by: Peter Chiu <chui-hao.chiu@mediatek.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	net: bonding: debug: avoid printing debug logs when bond is not notifying peers	Suresh Kumar	1	-3/+3
	[ Upstream commit fee32de284ac277ba434a2d59f8ce46528ff3946 ] Currently "bond_should_notify_peers: slave ..." messages are printed whenever "bond_should_notify_peers" function is called. +++ Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 Dec 12 12:33:26 node1 kernel: bond0: (slave enp0s25): Received LACPDU on port 1 Dec 12 12:33:26 node1 kernel: bond0: (slave enp0s25): Rx Machine: Port=1, Last State=6, Curr State=6 Dec 12 12:33:26 node1 kernel: bond0: (slave enp0s25): partner sync=1 Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 ... Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 Dec 12 12:33:30 node1 kernel: bond0: (slave enp4s3): Received LACPDU on port 2 Dec 12 12:33:30 node1 kernel: bond0: (slave enp4s3): Rx Machine: Port=2, Last State=6, Curr State=6 Dec 12 12:33:30 node1 kernel: bond0: (slave enp4s3): partner sync=1 Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 +++ This is confusing and can also clutter up debug logs. Print logs only when the peer notification happens. Signed-off-by: Suresh Kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	ath11k: Avoid false DEADLOCK warning reported by lockdep	Baochen Qiang	2	-0/+24
	[ Upstream commit 767c94caf0efad136157110787fe221b74cb5c8a ] With CONFIG_LOCKDEP=y and CONFIG_DEBUG_SPINLOCK=y, lockdep reports below warning: [ 166.059415] ============================================ [ 166.059416] WARNING: possible recursive locking detected [ 166.059418] 5.15.0-wt-ath+ #10 Tainted: G W O [ 166.059420] -------------------------------------------- [ 166.059421] kworker/0:2/116 is trying to acquire lock: [ 166.059423] ffff9905f2083160 (&srng->lock){+.-.}-{2:2}, at: ath11k_hal_reo_cmd_send+0x20/0x490 [ath11k] [ 166.059440] but task is already holding lock: [ 166.059442] ffff9905f2083230 (&srng->lock){+.-.}-{2:2}, at: ath11k_dp_process_reo_status+0x95/0x2d0 [ath11k] [ 166.059491] other info that might help us debug this: [ 166.059492] Possible unsafe locking scenario: [ 166.059493] CPU0 [ 166.059494] ---- [ 166.059495] lock(&srng->lock); [ 166.059498] lock(&srng->lock); [ 166.059500] * DEADLOCK * [ 166.059501] May be due to missing lock nesting notation [ 166.059502] 3 locks held by kworker/0:2/116: [ 166.059504] #0: ffff9905c0081548 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1f6/0x660 [ 166.059511] #1: ffff9d2400a5fe68 ((debug_obj_work).work){+.+.}-{0:0}, at: process_one_work+0x1f6/0x660 [ 166.059517] #2: ffff9905f2083230 (&srng->lock){+.-.}-{2:2}, at: ath11k_dp_process_reo_status+0x95/0x2d0 [ath11k] [ 166.059532] stack backtrace: [ 166.059534] CPU: 0 PID: 116 Comm: kworker/0:2 Kdump: loaded Tainted: G W O 5.15.0-wt-ath+ #10 [ 166.059537] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0059.2019.1112.1124 11/12/2019 [ 166.059539] Workqueue: events free_obj_work [ 166.059543] Call Trace: [ 166.059545] <IRQ> [ 166.059547] dump_stack_lvl+0x56/0x7b [ 166.059552] __lock_acquire+0xb9a/0x1a50 [ 166.059556] lock_acquire+0x1e2/0x330 [ 166.059560] ? ath11k_hal_reo_cmd_send+0x20/0x490 [ath11k] [ 166.059571] _raw_spin_lock_bh+0x33/0x70 [ 166.059574] ? ath11k_hal_reo_cmd_send+0x20/0x490 [ath11k] [ 166.059584] ath11k_hal_reo_cmd_send+0x20/0x490 [ath11k] [ 166.059594] ath11k_dp_tx_send_reo_cmd+0x3f/0x130 [ath11k] [ 166.059605] ath11k_dp_rx_tid_del_func+0x221/0x370 [ath11k] [ 166.059618] ath11k_dp_process_reo_status+0x22f/0x2d0 [ath11k] [ 166.059632] ? ath11k_dp_service_srng+0x2ea/0x2f0 [ath11k] [ 166.059643] ath11k_dp_service_srng+0x2ea/0x2f0 [ath11k] [ 166.059655] ath11k_pci_ext_grp_napi_poll+0x1c/0x70 [ath11k_pci] [ 166.059659] __napi_poll+0x28/0x230 [ 166.059664] net_rx_action+0x285/0x310 [ 166.059668] __do_softirq+0xe6/0x4d2 [ 166.059672] irq_exit_rcu+0xd2/0xf0 [ 166.059675] common_interrupt+0xa5/0xc0 [ 166.059678] </IRQ> [ 166.059679] <TASK> [ 166.059680] asm_common_interrupt+0x1e/0x40 [ 166.059683] RIP: 0010:_raw_spin_unlock_irqrestore+0x38/0x70 [ 166.059686] Code: 83 c7 18 e8 2a 95 43 ff 48 89 ef e8 22 d2 43 ff 81 e3 00 02 00 00 75 25 9c 58 f6 c4 02 75 2d 48 85 db 74 01 fb bf 01 00 00 00 <e8> 63 2e 40 ff 65 8b 05 8c 59 97 5c 85 c0 74 0a 5b 5d c3 e8 00 6a [ 166.059689] RSP: 0018:ffff9d2400a5fca0 EFLAGS: 00000206 [ 166.059692] RAX: 0000000000000002 RBX: 0000000000000200 RCX: 0000000000000006 [ 166.059694] RDX: 0000000000000000 RSI: ffffffffa404879b RDI: 0000000000000001 [ 166.059696] RBP: ffff9905c0053000 R08: 0000000000000001 R09: 0000000000000001 [ 166.059698] R10: ffff9d2400a5fc50 R11: 0000000000000001 R12: ffffe186c41e2840 [ 166.059700] R13: 0000000000000001 R14: ffff9905c78a1c68 R15: 0000000000000001 [ 166.059704] free_debug_processing+0x257/0x3d0 [ 166.059708] ? free_obj_work+0x1f5/0x250 [ 166.059712] __slab_free+0x374/0x5a0 [ 166.059718] ? kmem_cache_free+0x2e1/0x370 [ 166.059721] ? free_obj_work+0x1f5/0x250 [ 166.059724] kmem_cache_free+0x2e1/0x370 [ 166.059727] free_obj_work+0x1f5/0x250 [ 166.059731] process_one_work+0x28b/0x660 [ 166.059735] ? process_one_work+0x660/0x660 [ 166.059738] worker_thread+0x37/0x390 [ 166.059741] ? process_one_work+0x660/0x660 [ 166.059743] kthread+0x176/0x1a0 [ 166.059746] ? set_kthread_struct+0x40/0x40 [ 166.059749] ret_from_fork+0x22/0x30 [ 166.059754] </TASK> Since these two lockes are both initialized in ath11k_hal_srng_setup, they are assigned with the same key. As a result lockdep suspects that the task is trying to acquire the same lock (due to same key) while already holding it, and thus reports the DEADLOCK warning. However as they are different spinlock instances, the warning is false positive. On the other hand, even no dead lock indeed, this is a major issue for upstream regression testing as it disables lockdep functionality. Fix it by assigning separate lock class key for each srng->lock. Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20211209011949.151472-1-quic_bqiang@quicinc.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	net: phy: prefer 1000baseT over 1000baseKX	Russell King (Oracle)	1	-1/+1
	[ Upstream commit f20f94f7f52c4685c81754f489ffcc72186e8bdb ] The PHY settings table is supposed to be sorted by descending match priority - in other words, earlier entries are preferred over later entries. The order of 1000baseKX/Full and 1000baseT/Full is such that we prefer 1000baseKX/Full over 1000baseT/Full, but 1000baseKX/Full is a lot rarer than 1000baseT/Full, and thus is much less likely to be preferred. This causes phylink problems - it means a fixed link specifying a speed of 1G and full duplex gets an ethtool linkmode of 1000baseKX/Full rather than 1000baseT/Full as would be expected - and since we offer userspace a software emulation of a conventional copper PHY, we want to offer copper modes in preference to anything else. However, we do still want to allow the rarer modes as well. Hence, let's reorder these two modes to prefer copper. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reported-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/E1muvFO-00F6jY-1K@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	ath10k: Fix tx hanging	Sebastian Gottschall	2	-2/+3
	[ Upstream commit e8a91863eba3966a447d2daa1526082d52b5db2a ] While running stress tests in roaming scenarios (switching ap's every 5 seconds, we discovered a issue which leads to tx hangings of exactly 5 seconds while or after scanning for new accesspoints. We found out that this hanging is triggered by ath10k_mac_wait_tx_complete since the empty_tx_wq was not wake when the num_tx_pending counter reaches zero. To fix this, we simply move the wake_up call to htt_tx_dec_pending, since this call was missed on several locations within the ath10k code. Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20210505085806.11474-1-s.gottschall@dd-wrt.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	ath11k: avoid deadlock by change ieee80211_queue_work for regd_update_work	Wen Gong	1	-1/+1
	[ Upstream commit ed05c7cf1286d7e31e7623bce55ff135723591bf ] When enable debug config, it print below warning while shut down wlan interface shuh as run "ifconfig wlan0 down". The reason is because ar->regd_update_work is ran once, and it is will call wiphy_lock(ar->hw->wiphy) in function ath11k_regd_update() which is running in workqueue of ieee80211_local queued by ieee80211_queue_work(). Another thread from "ifconfig wlan0 down" will also accuqire the lock by wiphy_lock(sdata->local->hw.wiphy) in function ieee80211_stop(), and then it call ieee80211_stop_device() to flush_workqueue(local->workqueue), this will wait the workqueue of ieee80211_local finished. Then deadlock will happen easily if the two thread run meanwhile. Below warning disappeared after this change. [ 914.088798] ath11k_pci 0000:05:00.0: mac remove interface (vdev 0) [ 914.088806] ath11k_pci 0000:05:00.0: mac stop 11d scan [ 914.088810] ath11k_pci 0000:05:00.0: mac stop 11d vdev id 0 [ 914.088827] ath11k_pci 0000:05:00.0: htc ep 2 consumed 1 credits (total 0) [ 914.088841] ath11k_pci 0000:05:00.0: send 11d scan stop vdev id 0 [ 914.088849] ath11k_pci 0000:05:00.0: htc insufficient credits ep 2 required 1 available 0 [ 914.088856] ath11k_pci 0000:05:00.0: htc insufficient credits ep 2 required 1 available 0 [ 914.096434] ath11k_pci 0000:05:00.0: rx ce pipe 2 len 16 [ 914.096442] ath11k_pci 0000:05:00.0: htc ep 2 got 1 credits (total 1) [ 914.096481] ath11k_pci 0000:05:00.0: htc ep 2 consumed 1 credits (total 0) [ 914.096491] ath11k_pci 0000:05:00.0: WMI vdev delete id 0 [ 914.111598] ath11k_pci 0000:05:00.0: rx ce pipe 2 len 16 [ 914.111628] ath11k_pci 0000:05:00.0: htc ep 2 got 1 credits (total 1) [ 914.114659] ath11k_pci 0000:05:00.0: rx ce pipe 2 len 20 [ 914.114742] ath11k_pci 0000:05:00.0: htc rx completion ep 2 skb pK-error [ 914.115977] ath11k_pci 0000:05:00.0: vdev delete resp for vdev id 0 [ 914.116685] ath11k_pci 0000:05:00.0: vdev 00:03:7f:29:61:11 deleted, vdev_id 0 [ 914.117583] ====================================================== [ 914.117592] WARNING: possible circular locking dependency detected [ 914.117600] 5.16.0-rc1-wt-ath+ #1 Tainted: G OE [ 914.117611] ------------------------------------------------------ [ 914.117618] ifconfig/2805 is trying to acquire lock: [ 914.117628] ffff9c00a62bb548 ((wq_completion)phy0){+.+.}-{0:0}, at: flush_workqueue+0x87/0x470 [ 914.117674] but task is already holding lock: [ 914.117682] ffff9c00baea07d0 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: ieee80211_stop+0x38/0x180 [mac80211] [ 914.117872] which lock already depends on the new lock. [ 914.117880] the existing dependency chain (in reverse order) is: [ 914.117888] -> #3 (&rdev->wiphy.mtx){+.+.}-{4:4}: [ 914.117910] __mutex_lock+0xa0/0x9c0 [ 914.117930] mutex_lock_nested+0x1b/0x20 [ 914.117944] reg_process_self_managed_hints+0x3a/0xb0 [cfg80211] [ 914.118093] wiphy_regulatory_register+0x47/0x80 [cfg80211] [ 914.118229] wiphy_register+0x84f/0x9c0 [cfg80211] [ 914.118353] ieee80211_register_hw+0x6b1/0xd90 [mac80211] [ 914.118486] ath11k_mac_register+0x6af/0xb60 [ath11k] [ 914.118550] ath11k_core_qmi_firmware_ready+0x383/0x4a0 [ath11k] [ 914.118598] ath11k_qmi_driver_event_work+0x347/0x4a0 [ath11k] [ 914.118656] process_one_work+0x228/0x670 [ 914.118669] worker_thread+0x4d/0x440 [ 914.118680] kthread+0x16d/0x1b0 [ 914.118697] ret_from_fork+0x22/0x30 [ 914.118714] -> #2 (rtnl_mutex){+.+.}-{4:4}: [ 914.118736] __mutex_lock+0xa0/0x9c0 [ 914.118751] mutex_lock_nested+0x1b/0x20 [ 914.118767] rtnl_lock+0x17/0x20 [ 914.118783] ath11k_regd_update+0x15a/0x260 [ath11k] [ 914.118841] ath11k_regd_update_work+0x15/0x20 [ath11k] [ 914.118897] process_one_work+0x228/0x670 [ 914.118909] worker_thread+0x4d/0x440 [ 914.118920] kthread+0x16d/0x1b0 [ 914.118934] ret_from_fork+0x22/0x30 [ 914.118948] -> #1 ((work_completion)(&ar->regd_update_work)){+.+.}-{0:0}: [ 914.118972] process_one_work+0x1fa/0x670 [ 914.118984] worker_thread+0x4d/0x440 [ 914.118996] kthread+0x16d/0x1b0 [ 914.119010] ret_from_fork+0x22/0x30 [ 914.119023] -> #0 ((wq_completion)phy0){+.+.}-{0:0}: [ 914.119045] __lock_acquire+0x146d/0x1cf0 [ 914.119057] lock_acquire+0x19b/0x360 [ 914.119067] flush_workqueue+0xae/0x470 [ 914.119084] ieee80211_stop_device+0x3b/0x50 [mac80211] [ 914.119260] ieee80211_do_stop+0x5d7/0x830 [mac80211] [ 914.119409] ieee80211_stop+0x45/0x180 [mac80211] [ 914.119557] __dev_close_many+0xb3/0x120 [ 914.119573] __dev_change_flags+0xc3/0x1d0 [ 914.119590] dev_change_flags+0x29/0x70 [ 914.119605] devinet_ioctl+0x653/0x810 [ 914.119620] inet_ioctl+0x193/0x1e0 [ 914.119631] sock_do_ioctl+0x4d/0xf0 [ 914.119649] sock_ioctl+0x262/0x340 [ 914.119665] __x64_sys_ioctl+0x96/0xd0 [ 914.119678] do_syscall_64+0x3d/0xd0 [ 914.119694] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 914.119709] other info that might help us debug this: [ 914.119717] Chain exists of: (wq_completion)phy0 --> rtnl_mutex --> &rdev->wiphy.mtx [ 914.119745] Possible unsafe locking scenario: [ 914.119752] CPU0 CPU1 [ 914.119758] ---- ---- [ 914.119765] lock(&rdev->wiphy.mtx); [ 914.119778] lock(rtnl_mutex); [ 914.119792] lock(&rdev->wiphy.mtx); [ 914.119807] lock((wq_completion)phy0); [ 914.119819] * DEADLOCK * [ 914.119827] 2 locks held by ifconfig/2805: [ 914.119837] #0: ffffffffba3dc010 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x17/0x20 [ 914.119872] #1: ffff9c00baea07d0 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: ieee80211_stop+0x38/0x180 [mac80211] [ 914.120039] stack backtrace: [ 914.120048] CPU: 0 PID: 2805 Comm: ifconfig Tainted: G OE 5.16.0-rc1-wt-ath+ #1 [ 914.120064] Hardware name: LENOVO 418065C/418065C, BIOS 83ET63WW (1.33 ) 07/29/2011 [ 914.120074] Call Trace: [ 914.120084] <TASK> [ 914.120094] dump_stack_lvl+0x73/0xa4 [ 914.120119] dump_stack+0x10/0x12 [ 914.120135] print_circular_bug.isra.44+0x221/0x2e0 [ 914.120165] check_noncircular+0x106/0x150 [ 914.120203] __lock_acquire+0x146d/0x1cf0 [ 914.120215] ? __lock_acquire+0x146d/0x1cf0 [ 914.120245] lock_acquire+0x19b/0x360 [ 914.120259] ? flush_workqueue+0x87/0x470 [ 914.120286] ? lockdep_init_map_type+0x6b/0x250 [ 914.120310] flush_workqueue+0xae/0x470 [ 914.120327] ? flush_workqueue+0x87/0x470 [ 914.120344] ? lockdep_hardirqs_on+0xd7/0x150 [ 914.120391] ieee80211_stop_device+0x3b/0x50 [mac80211] [ 914.120565] ? ieee80211_stop_device+0x3b/0x50 [mac80211] [ 914.120736] ieee80211_do_stop+0x5d7/0x830 [mac80211] [ 914.120906] ieee80211_stop+0x45/0x180 [mac80211] [ 914.121060] __dev_close_many+0xb3/0x120 [ 914.121081] __dev_change_flags+0xc3/0x1d0 [ 914.121109] dev_change_flags+0x29/0x70 [ 914.121131] devinet_ioctl+0x653/0x810 [ 914.121149] ? __might_fault+0x77/0x80 [ 914.121179] inet_ioctl+0x193/0x1e0 [ 914.121194] ? inet_ioctl+0x193/0x1e0 [ 914.121218] ? __might_fault+0x77/0x80 [ 914.121238] ? _copy_to_user+0x68/0x80 [ 914.121266] sock_do_ioctl+0x4d/0xf0 [ 914.121283] ? inet_stream_connect+0x60/0x60 [ 914.121297] ? sock_do_ioctl+0x4d/0xf0 [ 914.121329] sock_ioctl+0x262/0x340 [ 914.121347] ? sock_ioctl+0x262/0x340 [ 914.121362] ? exit_to_user_mode_prepare+0x13b/0x280 [ 914.121388] ? syscall_enter_from_user_mode+0x20/0x50 [ 914.121416] __x64_sys_ioctl+0x96/0xd0 [ 914.121430] ? br_ioctl_call+0x90/0x90 [ 914.121445] ? __x64_sys_ioctl+0x96/0xd0 [ 914.121465] do_syscall_64+0x3d/0xd0 [ 914.121482] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 914.121497] RIP: 0033:0x7f0ed051737b [ 914.121513] Code: 0f 1e fa 48 8b 05 15 3b 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e5 3a 0d 00 f7 d8 64 89 01 48 [ 914.121527] RSP: 002b:00007fff7be38b98 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [ 914.121544] RAX: ffffffffffffffda RBX: 00007fff7be38ba0 RCX: 00007f0ed051737b [ 914.121555] RDX: 00007fff7be38ba0 RSI: 0000000000008914 RDI: 0000000000000004 [ 914.121566] RBP: 00007fff7be38c60 R08: 000000000000000a R09: 0000000000000001 [ 914.121576] R10: 0000000000000000 R11: 0000000000000202 R12: 00000000fffffffe [ 914.121586] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000 [ 914.121620] </TASK> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1 Signed-off-by: Wen Gong <quic_wgong@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20211201071745.17746-2-quic_wgong@quicinc.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	iwlwifi: mvm: avoid clearing a just saved session protection id	Shaul Triebitz	1	-6/+6
	[ Upstream commit 8e967c137df3b236d2075f9538cb888129425d1a ] When scheduling a session protection the id is saved but then it may be cleared when calling iwl_mvm_te_clear_data (if a previous session protection is currently active). Fix it by saving the id after calling iwl_mvm_te_clear_data. Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20211204130722.b0743a588d14.I098fef6677d0dab3ef1b6183ed206a10bab01eb2@changeid Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	iwlwifi: mvm: synchronize with FW after multicast commands	Johannes Berg	1	-0/+17
	[ Upstream commit db66abeea3aefed481391ecc564fb7b7fb31d742 ] If userspace installs a lot of multicast groups very quickly, then we may run out of command queue space as we send the updates in an asynchronous fashion (due to locking concerns), and the CPU can create them faster than the firmware can process them. This is true even when mac80211 has a work struct that gets scheduled. Fix this by synchronizing with the firmware after sending all those commands - outside of the iteration we can send a synchronous echo command that just has the effect of the CPU waiting for the prior asynchronous commands to finish. This also will cause fewer of the commands to be sent to the firmware overall, because the work will only run once when rescheduled multiple times while it's running. Link: https://bugzilla.kernel.org/show_bug.cgi?id=213649 Suggested-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Reported-by: Maximilian Ernestus <maximilian@ernestus.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20211204083238.51aea5b79ea4.I88a44798efda16e9fe480fb3e94224931d311b29@changeid Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	ath11k: Avoid NULL ptr access during mgmt tx cleanup	Sriram R	1	-15/+20
	[ Upstream commit a93789ae541c7d5c1c2a4942013adb6bcc5e2848 ] Currently 'ar' reference is not added in skb_cb during WMI mgmt tx. Though this is generally not used during tx completion callbacks, on interface removal the remaining idr cleanup callback uses the ar ptr from skb_cb from mgmt txmgmt_idr. Hence fill them during tx call for proper usage. Also free the skb which is missing currently in these callbacks. Crash_info: [19282.489476] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [19282.489515] pgd = 91eb8000 [19282.496702] [00000000] *pgd=00000000 [19282.502524] Internal error: Oops: 5 [#1] PREEMPT SMP ARM [19282.783728] PC is at ath11k_mac_vif_txmgmt_idr_remove+0x28/0xd8 [ath11k] [19282.789170] LR is at idr_for_each+0xa0/0xc8 Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.5.0.1-00729-QCAHKSWPL_SILICONZ-3 v2 Signed-off-by: Sriram R <quic_srirrama@quicinc.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/1637832614-13831-1-git-send-email-quic_srirrama@quicinc.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	rsi: Fix out-of-bounds read in rsi_read_pkt()	Zekun Shen	3	-1/+6
	[ Upstream commit f1cb3476e48b60c450ec3a1d7da0805bffc6e43a ] rsi_get_* functions rely on an offset variable from usb input. The size of usb input is RSI_MAX_RX_USB_PKT_SIZE(3000), while 2-byte offset can be up to 0xFFFF. Thus a large offset can cause out-of-bounds read. The patch adds a bound checking condition when rcv_pkt_len is 0, indicating it's USB. It's unclear whether this is triggerable from other type of bus. The following check might help in that case. offset > rcv_pkt_len - FRAME_DESC_SZ The bug is trigerrable with conpromised/malfunctioning USB devices. I tested the patch with the crashing input and got no more bug report. Attached is the KASAN report from fuzzing. BUG: KASAN: slab-out-of-bounds in rsi_read_pkt+0x42e/0x500 [rsi_91x] Read of size 2 at addr ffff888019439fdb by task RX-Thread/227 CPU: 0 PID: 227 Comm: RX-Thread Not tainted 5.6.0 #66 Call Trace: dump_stack+0x76/0xa0 print_address_description.constprop.0+0x16/0x200 ? rsi_read_pkt+0x42e/0x500 [rsi_91x] ? rsi_read_pkt+0x42e/0x500 [rsi_91x] __kasan_report.cold+0x37/0x7c ? rsi_read_pkt+0x42e/0x500 [rsi_91x] kasan_report+0xe/0x20 rsi_read_pkt+0x42e/0x500 [rsi_91x] rsi_usb_rx_thread+0x1b1/0x2fc [rsi_usb] ? rsi_probe+0x16a0/0x16a0 [rsi_usb] ? _raw_spin_lock_irqsave+0x7b/0xd0 ? _raw_spin_trylock_bh+0x120/0x120 ? __wake_up_common+0x10b/0x520 ? rsi_probe+0x16a0/0x16a0 [rsi_usb] kthread+0x2b5/0x3b0 ? kthread_create_on_node+0xd0/0xd0 ret_from_fork+0x22/0x40 Reported-by: Brendan Dolan-Gavitt <brendandg@nyu.edu> Signed-off-by: Zekun Shen <bruceshenzk@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/YXxXS4wgu2OsmlVv@10-18-43-117.dynapool.wireless.nyu.edu Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	rsi: Fix use-after-free in rsi_rx_done_handler()	Zekun Shen	1	-1/+7
	[ Upstream commit b07e3c6ebc0c20c772c0f54042e430acec2945c3 ] When freeing rx_cb->rx_skb, the pointer is not set to NULL, a later rsi_rx_done_handler call will try to read the freed address. This bug will very likley lead to double free, although detected early as use-after-free bug. The bug is triggerable with a compromised/malfunctional usb device. After applying the patch, the same input no longer triggers the use-after-free. Attached is the kasan report from fuzzing. BUG: KASAN: use-after-free in rsi_rx_done_handler+0x354/0x430 [rsi_usb] Read of size 4 at addr ffff8880188e5930 by task modprobe/231 Call Trace: <IRQ> dump_stack+0x76/0xa0 print_address_description.constprop.0+0x16/0x200 ? rsi_rx_done_handler+0x354/0x430 [rsi_usb] ? rsi_rx_done_handler+0x354/0x430 [rsi_usb] __kasan_report.cold+0x37/0x7c ? dma_direct_unmap_page+0x90/0x110 ? rsi_rx_done_handler+0x354/0x430 [rsi_usb] kasan_report+0xe/0x20 rsi_rx_done_handler+0x354/0x430 [rsi_usb] __usb_hcd_giveback_urb+0x1e4/0x380 usb_giveback_urb_bh+0x241/0x4f0 ? __usb_hcd_giveback_urb+0x380/0x380 ? apic_timer_interrupt+0xa/0x20 tasklet_action_common.isra.0+0x135/0x330 __do_softirq+0x18c/0x634 ? handle_irq_event+0xcd/0x157 ? handle_edge_irq+0x1eb/0x7b0 irq_exit+0x114/0x140 do_IRQ+0x91/0x1e0 common_interrupt+0xf/0xf </IRQ> Reported-by: Brendan Dolan-Gavitt <brendandg@nyu.edu> Signed-off-by: Zekun Shen <bruceshenzk@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/YXxQL/vIiYcZUu/j@10-18-43-117.dynapool.wireless.nyu.edu Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	mwifiex: Fix skb_over_panic in mwifiex_usb_recv()	Zekun Shen	1	-1/+2
	[ Upstream commit 04d80663f67ccef893061b49ec8a42ff7045ae84 ] Currently, with an unknown recv_type, mwifiex_usb_recv just return -1 without restoring the skb. Next time mwifiex_usb_rx_complete is invoked with the same skb, calling skb_put causes skb_over_panic. The bug is triggerable with a compromised/malfunctioning usb device. After applying the patch, skb_over_panic no longer shows up with the same input. Attached is the panic report from fuzzing. skbuff: skb_over_panic: text:000000003bf1b5fa len:2048 put:4 head:00000000dd6a115b data:000000000a9445d8 tail:0x844 end:0x840 dev:<NULL> kernel BUG at net/core/skbuff.c:109! invalid opcode: 0000 [#1] SMP KASAN NOPTI CPU: 0 PID: 198 Comm: in:imklog Not tainted 5.6.0 #60 RIP: 0010:skb_panic+0x15f/0x161 Call Trace: <IRQ> ? mwifiex_usb_rx_complete+0x26b/0xfcd [mwifiex_usb] skb_put.cold+0x24/0x24 mwifiex_usb_rx_complete+0x26b/0xfcd [mwifiex_usb] __usb_hcd_giveback_urb+0x1e4/0x380 usb_giveback_urb_bh+0x241/0x4f0 ? __hrtimer_run_queues+0x316/0x740 ? __usb_hcd_giveback_urb+0x380/0x380 tasklet_action_common.isra.0+0x135/0x330 __do_softirq+0x18c/0x634 irq_exit+0x114/0x140 smp_apic_timer_interrupt+0xde/0x380 apic_timer_interrupt+0xf/0x20 </IRQ> Reported-by: Brendan Dolan-Gavitt <brendandg@nyu.edu> Signed-off-by: Zekun Shen <bruceshenzk@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/YX4CqjfRcTa6bVL+@Zekuns-MBP-16.fios-router.home Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	mlxsw: pci: Add shutdown method in PCI driver	Danielle Ratson	1	-0/+1
	[ Upstream commit c1020d3cf4752f61a6a413f632ea2ce2370e150d ] On an arm64 platform with the Spectrum ASIC, after loading and executing a new kernel via kexec, the following trace [1] is observed. This seems to be caused by the fact that the device is not properly shutdown before executing the new kernel. Fix this by implementing a shutdown method which mirrors the remove method, as recommended by the kexec maintainer [2][3]. [1] BUG: Bad page state in process devlink pfn:22f73d page:fffffe00089dcf40 refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0x2ffff00000000000() raw: 2ffff00000000000 0000000000000000 ffffffff089d0201 0000000000000000 raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000 page dumped because: nonzero _refcount Modules linked in: CPU: 1 PID: 16346 Comm: devlink Tainted: G B 5.8.0-rc6-custom-273020-gac6b365b1bf5 #44 Hardware name: Marvell Armada 7040 TX4810M (DT) Call trace: dump_backtrace+0x0/0x1d0 show_stack+0x1c/0x28 dump_stack+0xbc/0x118 bad_page+0xcc/0xf8 check_free_page_bad+0x80/0x88 __free_pages_ok+0x3f8/0x418 __free_pages+0x38/0x60 kmem_freepages+0x200/0x2a8 slab_destroy+0x28/0x68 slabs_destroy+0x60/0x90 ___cache_free+0x1b4/0x358 kfree+0xc0/0x1d0 skb_free_head+0x2c/0x38 skb_release_data+0x110/0x1a0 skb_release_all+0x2c/0x38 consume_skb+0x38/0x130 __dev_kfree_skb_any+0x44/0x50 mlxsw_pci_rdq_fini+0x8c/0xb0 mlxsw_pci_queue_fini.isra.0+0x28/0x58 mlxsw_pci_queue_group_fini+0x58/0x88 mlxsw_pci_aqs_fini+0x2c/0x60 mlxsw_pci_fini+0x34/0x50 mlxsw_core_bus_device_unregister+0x104/0x1d0 mlxsw_devlink_core_bus_device_reload_down+0x2c/0x48 devlink_reload+0x44/0x158 devlink_nl_cmd_reload+0x270/0x290 genl_rcv_msg+0x188/0x2f0 netlink_rcv_skb+0x5c/0x118 genl_rcv+0x3c/0x50 netlink_unicast+0x1bc/0x278 netlink_sendmsg+0x194/0x390 __sys_sendto+0xe0/0x158 __arm64_sys_sendto+0x2c/0x38 el0_svc_common.constprop.0+0x70/0x168 do_el0_svc+0x28/0x88 el0_sync_handler+0x88/0x190 el0_sync+0x140/0x180 [2] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1195432.html [3] https://patchwork.kernel.org/project/linux-scsi/patch/20170212214920.28866-1-anton@ozlabs.org/#20116693 Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	ethernet: renesas: Use div64_ul instead of do_div	Yang Li	1	-4/+2
	[ Upstream commit d9f31aeaa1e5aefa68130878af3c3513d41c1e2d ] do_div() does a 64-by-32 division. Here the divisor is an unsigned long which on some platforms is 64 bit wide. So use div64_ul instead of do_div to avoid a possible truncation. Eliminate the following coccicheck warning: ./drivers/net/ethernet/renesas/ravb_main.c:2492:1-7: WARNING: do_div() does a 64-by-32 division, please consider using div64_ul instead. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/1637228883-100100-1-git-send-email-yang.lee@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	ath11k: Fix crash caused by uninitialized TX ring	Baochen Qiang	1	-2/+0
	[ Upstream commit 273703ebdb01b6c5f1aaf4b98fb57b177609055c ] Commit 31582373a4a8 ("ath11k: Change number of TCL rings to one for QCA6390") avoids initializing the other entries of dp->tx_ring cause the corresponding TX rings on QCA6390/WCN6855 are not used, but leaves those ring masks in ath11k_hw_ring_mask_qca6390.tx unchanged. Normally this is OK because we will only get interrupts from the first TX ring on these chips and thus only the first entry of dp->tx_ring is involved. In case of one MSI vector, all DP rings share the same IRQ. For each interrupt, all rings have to be checked, which means the other entries of dp->tx_ring are involved. However since they are not initialized, system crashes. Fix this issue by simply removing those ring masks. crash stack: [ 102.907438] BUG: kernel NULL pointer dereference, address: 0000000000000028 [ 102.907447] #PF: supervisor read access in kernel mode [ 102.907451] #PF: error_code(0x0000) - not-present page [ 102.907453] PGD 1081f0067 P4D 1081f0067 PUD 1081f1067 PMD 0 [ 102.907460] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI [ 102.907465] CPU: 0 PID: 3511 Comm: apt-check Kdump: loaded Tainted: G E 5.15.0-rc4-wt-ath+ #20 [ 102.907470] Hardware name: AMD Celadon-RN/Celadon-RN, BIOS RCD1005E 10/08/2020 [ 102.907472] RIP: 0010:ath11k_dp_tx_completion_handler+0x201/0x830 [ath11k] [ 102.907497] Code: 3c 24 4e 8d ac 37 10 04 00 00 4a 8d bc 37 68 04 00 00 48 89 3c 24 48 63 c8 89 83 84 18 00 00 48 c1 e1 05 48 03 8b 78 18 00 00 <8b> 51 08 89 d6 83 e6 07 89 74 24 24 83 fe 03 74 04 85 f6 75 63 41 [ 102.907501] RSP: 0000:ffff9b7340003e08 EFLAGS: 00010202 [ 102.907505] RAX: 0000000000000001 RBX: ffff8e21530c0100 RCX: 0000000000000020 [ 102.907508] RDX: 0000000000000000 RSI: 00000000fffffe00 RDI: ffff8e21530c1938 [ 102.907511] RBP: ffff8e21530c0000 R08: 0000000000000001 R09: 0000000000000000 [ 102.907513] R10: ffff8e2145534c10 R11: 0000000000000001 R12: ffff8e21530c2938 [ 102.907515] R13: ffff8e21530c18e0 R14: 0000000000000100 R15: ffff8e21530c2978 [ 102.907518] FS: 00007f5d4297e740(0000) GS:ffff8e243d600000(0000) knlGS:0000000000000000 [ 102.907521] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 102.907524] CR2: 0000000000000028 CR3: 00000001034ea000 CR4: 0000000000350ef0 [ 102.907527] Call Trace: [ 102.907531] <IRQ> [ 102.907537] ath11k_dp_service_srng+0x5c/0x2f0 [ath11k] [ 102.907556] ath11k_pci_ext_grp_napi_poll+0x21/0x70 [ath11k_pci] [ 102.907562] __napi_poll+0x2c/0x160 [ 102.907570] net_rx_action+0x251/0x310 [ 102.907576] __do_softirq+0x107/0x2fc [ 102.907585] irq_exit_rcu+0x74/0x90 [ 102.907593] common_interrupt+0x83/0xa0 [ 102.907600] </IRQ> [ 102.907601] asm_common_interrupt+0x1e/0x40 Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1 Signed-off-by: Baochen Qiang <bqiang@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20211026011605.58615-1-quic_bqiang@quicinc.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	ar5523: Fix null-ptr-deref with unexpected WDCMSG_TARGET_START reply	Zekun Shen	1	-0/+4
	[ Upstream commit ae80b6033834342601e99f74f6a62ff5092b1cee ] Unexpected WDCMSG_TARGET_START replay can lead to null-ptr-deref when ar->tx_cmd->odata is NULL. The patch adds a null check to prevent such case. KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] ar5523_cmd+0x46a/0x581 [ar5523] ar5523_probe.cold+0x1b7/0x18da [ar5523] ? ar5523_cmd_rx_cb+0x7a0/0x7a0 [ar5523] ? __pm_runtime_set_status+0x54a/0x8f0 ? _raw_spin_trylock_bh+0x120/0x120 ? pm_runtime_barrier+0x220/0x220 ? __pm_runtime_resume+0xb1/0xf0 usb_probe_interface+0x25b/0x710 really_probe+0x209/0x5d0 driver_probe_device+0xc6/0x1b0 device_driver_attach+0xe2/0x120 I found the bug using a custome USBFuzz port. It's a research work to fuzz USB stack/drivers. I modified it to fuzz ath9k driver only, providing hand-crafted usb descriptors to QEMU. After fixing the code (fourth byte in usb packet) to WDCMSG_TARGET_START, I got the null-ptr-deref bug. I believe the bug is triggerable whenever cmd->odata is NULL. After patching, I tested with the same input and no longer see the KASAN report. This was NOT tested on a real device. Signed-off-by: Zekun Shen <bruceshenzk@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/YXsmPQ3awHFLuAj2@10-18-43-117.dynapool.wireless.nyu.edu Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	net: mcs7830: handle usb read errors properly	Pavel Skripkin	1	-2/+10
	[ Upstream commit d668769eb9c52b150753f1653f7f5a0aeb8239d2 ] Syzbot reported uninit value in mcs7830_bind(). The problem was in missing validation check for bytes read via usbnet_read_cmd(). usbnet_read_cmd() internally calls usb_control_msg(), that returns number of bytes read. Code should validate that requested number of bytes was actually read. So, this patch adds missing size validation check inside mcs7830_get_reg() to prevent uninit value bugs Reported-and-tested-by: syzbot+003c0a286b9af5412510@syzkaller.appspotmail.com Fixes: 2a36d7083438 ("USB: driver for mcs7830 (aka DeLOCK) USB ethernet adapter") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20220106225716.7425-1-paskripkin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	bnxt_en: use firmware provided max timeout for messages	Edwin Peer	6	-12/+15
	[ Upstream commit bce9a0b7900836df223ab638090df0cb8430d9e8 ] Some older devices cannot accommodate the 40 seconds timeout cap for long running commands (such as NVRAM commands) due to hardware limitations. Allow these devices to request more time for these long running commands, but print a warning, since the longer timeout may cause the hung task watchdog to trigger. In the case of a firmware update operation, this is preferable to failing outright. v2: Use bp->hwrm_cmd_max_timeout directly without the constants. Fixes: 881d8353b05e ("bnxt_en: Add an upper bound for all firmware command timeouts.") Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	bnxt_en: move coredump functions into dedicated file	Edwin Peer	5	-400/+424
	[ Upstream commit b032228e58ea2477955058ad4d70a636ce1dec51 ] Change bnxt_get_coredump() and bnxt_get_coredump_length() to non-static functions. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	bnxt_en: Refactor coredump functions	Edwin Peer	1	-17/+31
	[ Upstream commit 9a575c8c25ae2372112db6d6b3e553cd90e9f02b ] The coredump functionality will be used by devlink health. Refactor these functions that get coredump and coredump length. There is no functional change, but the following checkpatch warnings were addressed: - strscpy is preferred over strlcpy. - sscanf results should be checked, with an additional warning. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	iwlwifi: mvm: Use div_s64 instead of do_div in iwl_mvm_ftm_rtt_smoothing()	Nathan Chancellor	1	-2/+1
	[ Upstream commit 4ccdcc8ffd955490feec05380223db6a48961eb5 ] When building ARCH=arm allmodconfig: drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c: In function ‘iwl_mvm_ftm_rtt_smoothing’: ./include/asm-generic/div64.h:222:35: error: comparison of distinct pointer types lacks a cast [-Werror] 222 \| (void)(((typeof((n)) )0) == ((uint64_t )0)); \ \| ^~ drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c:1070:9: note: in expansion of macro ‘do_div’ 1070 \| do_div(rtt_avg, 100); \| ^~~~~~ do_div() has to be used with an unsigned 64-bit integer dividend but rtt_avg is a signed 64-bit integer. div_s64() expects a signed 64-bit integer dividend and signed 32-bit divisor, which fits this scenario, so use that function here to fix the warning. Fixes: 8b0f92549f2c ("iwlwifi: mvm: fix 32-bit build in FTM") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20211227191757.2354329-1-nathan@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	can: rcar_canfd: rcar_canfd_channel_probe(): make sure we free CAN network ↵	Lad Prabhakar	1	-3/+2
	device [ Upstream commit 72b1e360572f9fa7d08ee554f1da29abce23f288 ] Make sure we free CAN network device in the error path. There are several jumps to fail label after allocating the CAN network device successfully. This patch places the free_candev() under fail label so that in failure path a jump to fail label frees the CAN network device. Fixes: 76e9353a80e9 ("can: rcar_canfd: Add support for RZ/G2L family") Link: https://lore.kernel.org/all/20220106114801.20563-1-prabhakar.mahadev-lad.rj@bp.renesas.com Reported-by: Pavel Machek <pavel@denx.de> Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	can: xilinx_can: xcan_probe(): check for error irq	Jiasheng Jiang	1	-1/+6
	[ Upstream commit c6564c13dae25cd7f8e1de5127b4da4500ee5844 ] For the possible failure of the platform_get_irq(), the returned irq could be error number and will finally cause the failure of the request_irq(). Consider that platform_get_irq() can now in certain cases return -EPROBE_DEFER, and the consequences of letting request_irq() effectively convert that into -EINVAL, even at probe time rather than later on. So it might be better to check just now. Fixes: b1201e44f50b ("can: xilinx CAN controller support") Link: https://lore.kernel.org/all/20211224021324.1447494-1-jiasheng@iscas.ac.cn Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	can: softing: softing_startstop(): fix set but not used variable warning	Marc Kleine-Budde	1	-5/+6
	[ Upstream commit 370d988cc529598ebaec6487d4f84c2115dc696b ] In the function softing_startstop() the variable error_reporting is assigned but not used. The code that uses this variable is commented out. Its stated that the functionality is not finally verified. To fix the warning: \| drivers/net/can/softing/softing_fw.c:424:9: error: variable 'error_reporting' set but not used [-Werror,-Wunused-but-set-variable] remove the comment, activate the code, but add a "0 &&" to the if expression and rely on the optimizer rather than the preprocessor to remove the code. Link: https://lore.kernel.org/all/20220109103126.1872833-1-mkl@pengutronix.de Fixes: 03fd3cf5a179 ("can: add driver for Softing card") Cc: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	can: mcp251xfd: add missing newline to printed strings	Marc Kleine-Budde	1	-2/+2
	[ Upstream commit 3bd9d8ce6f8c5c43ee2f1106021db0f98882cc75 ] This patch adds the missing newline to printed strings. Fixes: 55e5b97f003e ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN") Link: https://lore.kernel.org/all/20220105154300.1258636-4-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	net: mscc: ocelot: fix incorrect balancing with down LAG ports	Vladimir Oltean	1	-15/+11
	[ Upstream commit a14e6b69f393d651913edcbe4ec0dec27b8b4b40 ] Assuming the test setup described here: https://patchwork.kernel.org/project/netdevbpf/cover/20210205130240.4072854-1-vladimir.oltean@nxp.com/ (swp1 and swp2 are in bond0, and bond0 is in a bridge with swp0) it can be seen that when swp1 goes down (on either board A or B), then traffic that should go through that port isn't forwarded anywhere. A dump of the PGID table shows the following: PGID_DST[0] = ports 0 PGID_DST[1] = ports 1 PGID_DST[2] = ports 2 PGID_DST[3] = ports 3 PGID_DST[4] = ports 4 PGID_DST[5] = ports 5 PGID_DST[6] = no ports PGID_AGGR[0] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[1] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[2] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[3] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[4] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[5] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[6] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[7] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[8] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[9] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[10] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[11] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[12] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[13] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[14] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[15] = ports 0, 1, 2, 3, 4, 5 PGID_SRC[0] = ports 1, 2 PGID_SRC[1] = ports 0 PGID_SRC[2] = ports 0 PGID_SRC[3] = no ports PGID_SRC[4] = no ports PGID_SRC[5] = no ports PGID_SRC[6] = ports 0, 1, 2, 3, 4, 5 Whereas a "good" PGID configuration for that setup should have looked like this: PGID_DST[0] = ports 0 PGID_DST[1] = ports 1, 2 PGID_DST[2] = ports 1, 2 PGID_DST[3] = ports 3 PGID_DST[4] = ports 4 PGID_DST[5] = ports 5 PGID_DST[6] = no ports PGID_AGGR[0] = ports 0, 2, 3, 4, 5 PGID_AGGR[1] = ports 0, 2, 3, 4, 5 PGID_AGGR[2] = ports 0, 2, 3, 4, 5 PGID_AGGR[3] = ports 0, 2, 3, 4, 5 PGID_AGGR[4] = ports 0, 2, 3, 4, 5 PGID_AGGR[5] = ports 0, 2, 3, 4, 5 PGID_AGGR[6] = ports 0, 2, 3, 4, 5 PGID_AGGR[7] = ports 0, 2, 3, 4, 5 PGID_AGGR[8] = ports 0, 2, 3, 4, 5 PGID_AGGR[9] = ports 0, 2, 3, 4, 5 PGID_AGGR[10] = ports 0, 2, 3, 4, 5 PGID_AGGR[11] = ports 0, 2, 3, 4, 5 PGID_AGGR[12] = ports 0, 2, 3, 4, 5 PGID_AGGR[13] = ports 0, 2, 3, 4, 5 PGID_AGGR[14] = ports 0, 2, 3, 4, 5 PGID_AGGR[15] = ports 0, 2, 3, 4, 5 PGID_SRC[0] = ports 1, 2 PGID_SRC[1] = ports 0 PGID_SRC[2] = ports 0 PGID_SRC[3] = no ports PGID_SRC[4] = no ports PGID_SRC[5] = no ports PGID_SRC[6] = ports 0, 1, 2, 3, 4, 5 In other words, in the "bad" configuration, the attempt is to remove the inactive swp1 from the destination ports via PGID_DST. But when a MAC table entry is learned, it is learned towards PGID_DST 1, because that is the logical port id of the LAG itself (it is equal to the lowest numbered member port). So when swp1 becomes inactive, if we set PGID_DST[1] to contain just swp1 and not swp2, the packet will not have any chance to reach the destination via swp2. The "correct" way to remove swp1 as a destination is via PGID_AGGR (remove swp1 from the aggregation port groups for all aggregation codes). This means that PGID_DST[1] and PGID_DST[2] must still contain both swp1 and swp2. This makes the MAC table still treat packets destined towards the single-port LAG as "multicast", and the inactive ports are removed via the aggregation code tables. The change presented here is a design one: the ocelot_get_bond_mask() function used to take an "only_active_ports" argument. We don't need that. The only call site that specifies only_active_ports=true, ocelot_set_aggr_pgids(), must retrieve the entire bonding mask, because it must program that into PGID_DST. Additionally, it must also clear the inactive ports from the bond mask here, which it can't do if bond_mask just contains the active ports: ac = ocelot_read_rix(ocelot, ANA_PGID_PGID, i); ac &= ~bond_mask; <---- here /* Don't do division by zero if there was no active * port. Just make all aggregation codes zero. */ if (num_active_ports) ac \|= BIT(aggr_idx[i % num_active_ports]); ocelot_write_rix(ocelot, ac, ANA_PGID_PGID, i); So it becomes the responsibility of ocelot_set_aggr_pgids() to take ocelot_port->lag_tx_active into consideration when populating the aggr_idx array. Fixes: 23ca3b727ee6 ("net: mscc: ocelot: rebalance LAGs on link up/down events") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220107164332.402133-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	octeontx2-af: Increment ptp refcount before use	Subbaraya Sundeep	1	-0/+2
	[ Upstream commit 93440f4888cf049dbd22b41aaf94d2e2153b3eb8 ] Before using the ptp pci device by AF driver increment the reference count of it. Fixes: a8b90c9d26d6 ("octeontx2-af: Add PTP device id for CN10K and 95O silcons") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	net/mlx5: Set command entry semaphore up once got index free	Moshe Shemesh	1	-9/+6
	[ Upstream commit 8e715cd613a1e872b9d918e912d90b399785761a ] Avoid a race where command work handler may fail to allocate command entry index, by holding the command semaphore down till command entry index is being freed. Fixes: 410bd754cd73 ("net/mlx5: Add retry mechanism to the command entry index allocation") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	net/mlx5e: Sync VXLAN udp ports during uplink representor profile change	Maor Dickman	1	-0/+3
	[ Upstream commit 07f6dc4024ea1d2314b9c8b81fd4e492864fcca1 ] Currently during NIC profile disablement all VXLAN udp ports offloaded to the HW are flushed and during its enablement the driver send notification to the stack to inform the core that the entire UDP tunnel port state has been lost, uplink representor doesn't have the same behavior which can cause VXLAN udp ports offload to be in bad state while moving between modes while VXLAN interface exist. Fixed by aligning the uplink representor profile behavior to the NIC behavior. Fixes: 84db66124714 ("net/mlx5e: Move set vxlan nic info to profile init") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	net/mlx5: Fix access to sf_dev_table on allocation failure	Shay Drory	1	-4/+1
	[ Upstream commit a1c7c49c2091926962f8c1c866d386febffec5d8 ] Even when SF devices are supported, the SF device table allocation can still fail. In such case mlx5_sf_dev_supported still reports true, but SF device table is invalid. This can result in NULL table access. Hence, fix it by adding NULL table check. Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	net/mlx5e: Fix matching on modified inner ip_ecn bits	Paul Blakey	1	-4/+116
	[ Upstream commit b6dfff21a170af5c695ebaa153b7f5e297ddca03 ] Tunnel device follows RFC 6040, and during decapsulation inner ip_ecn might change depending on inner and outer ip_ecn as follows: +---------+----------------------------------------+ \|Arriving \| Arriving Outer Header \| \| Inner +---------+---------+---------+----------+ \| Header \| Not-ECT \| ECT(0) \| ECT(1) \| CE \| +---------+---------+---------+---------+----------+ \| Not-ECT \| Not-ECT \| Not-ECT \| Not-ECT \| <drop> \| \| ECT(0) \| ECT(0) \| ECT(0) \| ECT(1) \| CE* \| \| ECT(1) \| ECT(1) \| ECT(1) \| ECT(1)* \| CE* \| \| CE \| CE \| CE \| CE \| CE \| +---------+---------+---------+---------+----------+ Cells marked above are changed from original inner packet ip_ecn value. Tc then matches on the modified inner ip_ecn, but hw offload which matches the inner ip_ecn value before decap, will fail. Fix that by mapping all the cases of outer and inner ip_ecn matching, and only supporting cases where we know inner wouldn't be changed by decap, or in the outer ip_ecn=CE case, inner ip_ecn didn't matter. Fixes: bcef735c59f2 ("net/mlx5e: Offload TC matching on tos/ttl for ip tunnels") Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	Revert "net/mlx5e: Block offload of outer header csum for GRE tunnel"	Aya Levin	1	-3/+6
	[ Upstream commit 01c3fd113ef50490ffd43f78f347ef6bb008510b ] This reverts commit 54e1217b90486c94b26f24dcee1ee5ef5372f832. Although the NIC doesn't support offload of outer header CSUM, using gso_partial_features allows offloading the tunnel's segmentation. The driver relies on the stack CSUM calculation of the outer header. For this, NETIF_F_GSO_GRE_CSUM must be a member of the device's features. Fixes: 54e1217b9048 ("net/mlx5e: Block offload of outer header csum for GRE tunnel") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	Revert "net/mlx5e: Block offload of outer header csum for UDP tunnels"	Aya Levin	1	-3/+7
	[ Upstream commit 64050cdad0983ad8060e33c3f4b5aee2366bcebd ] This reverts commit 6d6727dddc7f93fcc155cb8d0c49c29ae0e71122. Although the NIC doesn't support offload of outer header CSUM, using gso_partial_features allows offloading the tunnel's segmentation. The driver relies on the stack CSUM calculation of the outer header. For this, NETIF_F_GSO_UDP_TUNNEL_CSUM must be a member of the device's features. Fixes: 6d6727dddc7f ("net/mlx5e: Block offload of outer header csum for UDP tunnels") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	net/mlx5e: Don't block routes with nexthop objects in SW	Maor Dickman	1	-4/+2
	[ Upstream commit 9e72a55a3c9d54b38a704bb7292d984574a81d9d ] Routes with nexthop objects is currently not supported by multipath offload and any attempts to use it is blocked, however this also block adding SW routes with nexthop. Resolve this by returning NOTIFY_DONE instead of an error which will allow such a route to be created in SW but not offloaded. This fix also solve an issue which block adding such routes on different devices due to missing check if the route FIB device is one of multipath devices. Fixes: 6a87afc072c3 ("mlx5: Fail attempts to use routes with nexthop objects") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	net/mlx5e: Fix wrong usage of fib_info_nh when routes with nexthop objects ↵	Maor Dickman	1	-0/+2
	are used [ Upstream commit 885751eb1b01d276e38f57d78c583e4ce006c5ed ] Creating routes with nexthop objects while in switchdev mode leads to access to un-allocated memory and trigger bellow call trace due to hitting WARN_ON. This is caused due to illegal usage of fib_info_nh in TC tunnel FIB event handling to resolve the FIB device while fib_info built in with nexthop. Fixed by ignoring attempts to use nexthop objects with routes until support can be properly added. WARNING: CPU: 1 PID: 1724 at include/net/nexthop.h:468 mlx5e_tc_tun_fib_event+0x448/0x570 [mlx5_core] CPU: 1 PID: 1724 Comm: ip Not tainted 5.15.0_for_upstream_min_debug_2021_11_09_02_04 #1 RIP: 0010:mlx5e_tc_tun_fib_event+0x448/0x570 [mlx5_core] RSP: 0018:ffff8881349f7910 EFLAGS: 00010202 RAX: ffff8881492f1980 RBX: ffff8881349f79e8 RCX: 0000000000000000 RDX: ffff8881349f79e8 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff8881349f7950 R08: 00000000000000fe R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: ffff88811e9d0000 R13: ffff88810eb62000 R14: ffff888106710268 R15: 0000000000000018 FS: 00007f1d5ca6e800(0000) GS:ffff88852c880000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffedba44ff8 CR3: 0000000129808004 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> atomic_notifier_call_chain+0x42/0x60 call_fib_notifiers+0x21/0x40 fib_table_insert+0x479/0x6d0 ? try_charge_memcg+0x480/0x6d0 inet_rtm_newroute+0x65/0xb0 rtnetlink_rcv_msg+0x2af/0x360 ? page_add_file_rmap+0x13/0x130 ? do_set_pte+0xcd/0x120 ? rtnl_calcit.isra.0+0x120/0x120 netlink_rcv_skb+0x4e/0xf0 netlink_unicast+0x1ee/0x2b0 netlink_sendmsg+0x22e/0x460 sock_sendmsg+0x33/0x40 ____sys_sendmsg+0x1d1/0x1f0 ___sys_sendmsg+0xab/0xf0 ? __mod_memcg_lruvec_state+0x40/0x60 ? __mod_lruvec_page_state+0x95/0xd0 ? page_add_new_anon_rmap+0x4e/0xf0 ? __handle_mm_fault+0xec6/0x1470 __sys_sendmsg+0x51/0x90 ? internal_get_user_pages_fast+0x480/0xa10 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 8914add2c9e5 ("net/mlx5e: Handle FIB events to update tunnel endpoint device") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>