Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit 5c85f7065718a949902b238a6abd8fc907c5d3e0 ]
in be_lancer_xmit_workarounds(), it should go to label 'tx_drop'
if an unexpected value is returned by pskb_trim().
Fixes: 93040ae5cc8d ("be2net: Fix to trim skb for padded vlan packets to workaround an ASIC Bug")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Link: https://lore.kernel.org/r/20230725032726.15002-1-ruc_gongyuanjun@163.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 15cec633fc7bfe4cd69aa012c3b35b31acfc86f2 ]
According to the clarification [1] in the latest napi.rst, the tx
processing cannot call any XDP (or page pool) APIs if the "budget"
is 0. Because NAPI is called with the budget of 0 (such as netpoll)
indicates we may be in an IRQ context, however, we cannot use the
page pool from IRQ context.
[1] https://lore.kernel.org/all/20230720161323.2025379-1-kuba@kernel.org/
Fixes: 20f797399035 ("net: fec: recycle pages for transmitted XDP frames")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230725074148.2936402-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d4a7ce642100765119a872d4aba1bf63e3a22c8a ]
The Xeon validation group has been carrying out some loaded tests
with various HW configurations, and they have seen some transmit
queue time out happening during the test. This will cause the
reset adapter function to be called by igc_tx_timeout().
Similar race conditions may arise when the interface is being brought
down and up in igc_reinit_locked(), an interrupt being generated, and
igc_clean_tx_irq() being called to complete the TX.
When the igc_tx_timeout() function is invoked, this patch will turn
off all TX ring HW queues during igc_down() process. TX ring HW queues
will be activated again during the igc_configure_tx_ring() process
when performing the igc_up() procedure later.
This patch also moved existing igc_disable_tx_ring_hw() to avoid using
forward declaration.
Kernel trace:
[ 7678.747813] ------------[ cut here ]------------
[ 7678.757914] NETDEV WATCHDOG: enp1s0 (igc): transmit queue 2 timed out
[ 7678.770117] WARNING: CPU: 0 PID: 13 at net/sched/sch_generic.c:525 dev_watchdog+0x1ae/0x1f0
[ 7678.784459] Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE xt_addrtype nft_compat
nf_tables nfnetlink br_netfilter bridge stp llc overlay dm_mod emrcha(PO) emriio(PO) rktpm(PO)
cegbuf_mod(PO) patch_update(PO) se(PO) sgx_tgts(PO) mktme(PO) keylocker(PO) svtdx(PO) svfs_pci_hotplug(PO)
vtd_mod(PO) davemem(PO) svmabort(PO) svindexio(PO) usbx2(PO) ehci_sched(PO) svheartbeat(PO) ioapic(PO)
sv8259(PO) svintr(PO) lt(PO) pcierootport(PO) enginefw_mod(PO) ata(PO) smbus(PO) spiflash_cdf(PO) arden(PO)
dsa_iax(PO) oobmsm_punit(PO) cpm(PO) svkdb(PO) ebg_pch(PO) pch(PO) sviotargets(PO) svbdf(PO) svmem(PO)
svbios(PO) dram(PO) svtsc(PO) targets(PO) superio(PO) svkernel(PO) cswitch(PO) mcf(PO) pentiumIII_mod(PO)
fs_svfs(PO) mdevdefdb(PO) svfs_os_services(O) ixgbe mdio mdio_devres libphy emeraldrapids_svdefs(PO)
regsupport(O) libnvdimm nls_cp437 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel
snd_intel_dspcfg snd_hda_codec snd_hwdep x86_pkg_temp_thermal snd_hda_core snd_pcm snd_timer isst_if_mbox_pci
[ 7678.784496] input_leds isst_if_mmio sg snd isst_if_common soundcore wmi button sad9(O) drm fuse backlight
configfs efivarfs ip_tables x_tables vmd sdhci led_class rtl8150 r8152 hid_generic pegasus mmc_block usbhid
mmc_core hid megaraid_sas ixgb igb i2c_algo_bit ice i40e hpsa scsi_transport_sas e1000e e1000 e100 ax88179_178a
usbnet xhci_pci sd_mod xhci_hcd t10_pi crc32c_intel crc64_rocksoft igc crc64 crc_t10dif usbcore
crct10dif_generic ptp crct10dif_common usb_common pps_core
[ 7679.200403] RIP: 0010:dev_watchdog+0x1ae/0x1f0
[ 7679.210201] Code: 28 e9 53 ff ff ff 4c 89 e7 c6 05 06 42 b9 00 01 e8 17 d1 fb ff 44 89 e9 4c
89 e6 48 c7 c7 40 ad fb 81 48 89 c2 e8 52 62 82 ff <0f> 0b e9 72 ff ff ff 65 8b 05 80 7d 7c 7e
89 c0 48 0f a3 05 0a c1
[ 7679.245438] RSP: 0018:ffa00000001f7d90 EFLAGS: 00010282
[ 7679.256021] RAX: 0000000000000000 RBX: ff11000109938440 RCX: 0000000000000000
[ 7679.268710] RDX: ff11000361e26cd8 RSI: ff11000361e1b880 RDI: ff11000361e1b880
[ 7679.281314] RBP: ffa00000001f7da8 R08: ff1100035f8fffe8 R09: 0000000000027ffb
[ 7679.293840] R10: 0000000000001f0a R11: ff1100035f840000 R12: ff11000109938000
[ 7679.306276] R13: 0000000000000002 R14: dead000000000122 R15: ffa00000001f7e18
[ 7679.318648] FS: 0000000000000000(0000) GS:ff11000361e00000(0000) knlGS:0000000000000000
[ 7679.332064] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7679.342757] CR2: 00007ffff7fca168 CR3: 000000013b08a006 CR4: 0000000000471ef8
[ 7679.354984] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7679.367207] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 7679.379370] PKRU: 55555554
[ 7679.386446] Call Trace:
[ 7679.393152] <TASK>
[ 7679.399363] ? __pfx_dev_watchdog+0x10/0x10
[ 7679.407870] call_timer_fn+0x31/0x110
[ 7679.415698] expire_timers+0xb2/0x120
[ 7679.423403] run_timer_softirq+0x179/0x1e0
[ 7679.431532] ? __schedule+0x2b1/0x820
[ 7679.439078] __do_softirq+0xd1/0x295
[ 7679.446426] ? __pfx_smpboot_thread_fn+0x10/0x10
[ 7679.454867] run_ksoftirqd+0x22/0x30
[ 7679.462058] smpboot_thread_fn+0xb7/0x160
[ 7679.469670] kthread+0xcd/0xf0
[ 7679.476097] ? __pfx_kthread+0x10/0x10
[ 7679.483211] ret_from_fork+0x29/0x50
[ 7679.490047] </TASK>
[ 7679.495204] ---[ end trace 0000000000000000 ]---
[ 7679.503179] igc 0000:01:00.0 enp1s0: Register Dump
[ 7679.511230] igc 0000:01:00.0 enp1s0: Register Name Value
[ 7679.519892] igc 0000:01:00.0 enp1s0: CTRL 181c0641
[ 7679.528782] igc 0000:01:00.0 enp1s0: STATUS 40280683
[ 7679.537551] igc 0000:01:00.0 enp1s0: CTRL_EXT 10000040
[ 7679.546284] igc 0000:01:00.0 enp1s0: MDIC 180a3800
[ 7679.554942] igc 0000:01:00.0 enp1s0: ICR 00000081
[ 7679.563503] igc 0000:01:00.0 enp1s0: RCTL 04408022
[ 7679.571963] igc 0000:01:00.0 enp1s0: RDLEN[0-3] 00001000 00001000 00001000 00001000
[ 7679.583075] igc 0000:01:00.0 enp1s0: RDH[0-3] 00000068 000000b6 0000000f 00000031
[ 7679.594162] igc 0000:01:00.0 enp1s0: RDT[0-3] 00000066 000000b2 0000000e 00000030
[ 7679.605174] igc 0000:01:00.0 enp1s0: RXDCTL[0-3] 02040808 02040808 02040808 02040808
[ 7679.616196] igc 0000:01:00.0 enp1s0: RDBAL[0-3] 1bb7c000 1bb7f000 1bb82000 0ef33000
[ 7679.627242] igc 0000:01:00.0 enp1s0: RDBAH[0-3] 00000001 00000001 00000001 00000001
[ 7679.638256] igc 0000:01:00.0 enp1s0: TCTL a503f0fa
[ 7679.646607] igc 0000:01:00.0 enp1s0: TDBAL[0-3] 2ba4a000 1bb6f000 1bb74000 1bb79000
[ 7679.657609] igc 0000:01:00.0 enp1s0: TDBAH[0-3] 00000001 00000001 00000001 00000001
[ 7679.668551] igc 0000:01:00.0 enp1s0: TDLEN[0-3] 00001000 00001000 00001000 00001000
[ 7679.679470] igc 0000:01:00.0 enp1s0: TDH[0-3] 000000a7 0000002d 000000bf 000000d9
[ 7679.690406] igc 0000:01:00.0 enp1s0: TDT[0-3] 000000a7 0000002d 000000bf 000000d9
[ 7679.701264] igc 0000:01:00.0 enp1s0: TXDCTL[0-3] 02100108 02100108 02100108 02100108
[ 7679.712123] igc 0000:01:00.0 enp1s0: Reset adapter
[ 7683.085967] igc 0000:01:00.0 enp1s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 8086.945561] ------------[ cut here ]------------
Entering kdb (current=0xffffffff8220b200, pid 0) on processor 0
Oops: (null) due to oops @ 0xffffffff81573888
RIP: 0010:dql_completed+0x148/0x160
Code: c9 00 48 89 57 58 e9 46 ff ff ff 45 85 e4 41 0f 95 c4 41 39 db 0f 95
c1 41 84 cc 74 05 45 85 ed 78 0a 44 89 c1 e9 27 ff ff ff <0f> 0b 01 f6 44 89
c1 29 f1 0f 48 ca eb 8c cc cc cc cc cc cc cc cc
RSP: 0018:ffa0000000003e00 EFLAGS: 00010287
RAX: 000000000000006c RBX: ffa0000003eb0f78 RCX: ff11000109938000
RDX: 0000000000000003 RSI: 0000000000000160 RDI: ff110001002e9480
RBP: ffa0000000003ed8 R08: ff110001002e93c0 R09: ffa0000000003d28
R10: 0000000000007cc0 R11: 0000000000007c54 R12: 00000000ffffffd9
R13: ff1100037039cb00 R14: 00000000ffffffd9 R15: ff1100037039c048
FS: 0000000000000000(0000) GS:ff11000361e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffff7fca168 CR3: 000000013b08a003 CR4: 0000000000471ef8
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
? igc_poll+0x1a9/0x14d0 [igc]
__napi_poll+0x2e/0x1b0
net_rx_action+0x126/0x250
__do_softirq+0xd1/0x295
irq_exit_rcu+0xc5/0xf0
common_interrupt+0x86/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x27/0x40
RIP: 0010:cpuidle_enter_state+0xd3/0x3e0
Code: 73 f1 ff ff 49 89 c6 8b 05 e2 ca a7 00 85 c0 0f 8f b3 02 00 00 31 ff e8 1b
de 75 ff 80 7d d7 00 0f 85 cd 01 00 00 fb 45 85 ff <0f> 88 fd 00 00 00 49 63 cf
4c 2b 75 c8 48 8d 04 49 48 89 ca 48 8d
RSP: 0018:ffffffff82203df0 EFLAGS: 00000202
RAX: ff11000361e2a200 RBX: 0000000000000002 RCX: 000000000000001f
RDX: 0000000000000000 RSI: 000000003cf3cf3d RDI: 0000000000000000
RBP: ffffffff82203e28 R08: 0000075ae38471c8 R09: 0000000000000018
R10: 000000000000031a R11: ffffffff8238dca0 R12: ffd1ffffff200000
R13: ffffffff8238dca0 R14: 0000075ae38471c8 R15: 0000000000000002
cpuidle_enter+0x2e/0x50
call_cpuidle+0x23/0x40
do_idle+0x1be/0x220
cpu_startup_entry+0x20/0x30
rest_init+0xb5/0xc0
arch_call_rest_init+0xe/0x30
start_kernel+0x448/0x760
x86_64_start_kernel+0x109/0x150
secondary_startup_64_no_verify+0xe0/0xeb
</TASK>
more>
[0]kdb>
[0]kdb>
[0]kdb> go
Catastrophic error detected
kdb_continue_catastrophic=0, type go a second time if you really want to
continue
[0]kdb> go
Catastrophic error detected
kdb_continue_catastrophic=0, attempting to continue
[ 8086.955689] refcount_t: underflow; use-after-free.
[ 8086.955697] WARNING: CPU: 0 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0xc2/0x110
[ 8086.955706] Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE xt_addrtype nft_compat
nf_tables nfnetlink br_netfilter bridge stp llc overlay dm_mod emrcha(PO) emriio(PO) rktpm(PO)
cegbuf_mod(PO) patch_update(PO) se(PO) sgx_tgts(PO) mktme(PO) keylocker(PO) svtdx(PO)
svfs_pci_hotplug(PO) vtd_mod(PO) davemem(PO) svmabort(PO) svindexio(PO) usbx2(PO) ehci_sched(PO)
svheartbeat(PO) ioapic(PO) sv8259(PO) svintr(PO) lt(PO) pcierootport(PO) enginefw_mod(PO) ata(PO)
smbus(PO) spiflash_cdf(PO) arden(PO) dsa_iax(PO) oobmsm_punit(PO) cpm(PO) svkdb(PO) ebg_pch(PO)
pch(PO) sviotargets(PO) svbdf(PO) svmem(PO) svbios(PO) dram(PO) svtsc(PO) targets(PO) superio(PO)
svkernel(PO) cswitch(PO) mcf(PO) pentiumIII_mod(PO) fs_svfs(PO) mdevdefdb(PO) svfs_os_services(O)
ixgbe mdio mdio_devres libphy emeraldrapids_svdefs(PO) regsupport(O) libnvdimm nls_cp437
snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg
snd_hda_codec snd_hwdep x86_pkg_temp_thermal snd_hda_core snd_pcm snd_timer isst_if_mbox_pci
[ 8086.955751] input_leds isst_if_mmio sg snd isst_if_common soundcore wmi button sad9(O) drm
fuse backlight configfs efivarfs ip_tables x_tables vmd sdhci led_class rtl8150 r8152 hid_generic
pegasus mmc_block usbhid mmc_core hid megaraid_sas ixgb igb i2c_algo_bit ice i40e hpsa
scsi_transport_sas e1000e e1000 e100 ax88179_178a usbnet xhci_pci sd_mod xhci_hcd t10_pi
crc32c_intel crc64_rocksoft igc crc64 crc_t10dif usbcore crct10dif_generic ptp crct10dif_common
usb_common pps_core
[ 8086.955784] RIP: 0010:refcount_warn_saturate+0xc2/0x110
[ 8086.955788] Code: 01 e8 82 e7 b4 ff 0f 0b 5d c3 cc cc cc cc 80 3d 68 c6 eb 00 00 75 81
48 c7 c7 a0 87 f6 81 c6 05 58 c6 eb 00 01 e8 5e e7 b4 ff <0f> 0b 5d c3 cc cc cc cc 80 3d
42 c6 eb 00 00 0f 85 59 ff ff ff 48
[ 8086.955790] RSP: 0018:ffa0000000003da0 EFLAGS: 00010286
[ 8086.955793] RAX: 0000000000000000 RBX: ff1100011da40ee0 RCX: ff11000361e1b888
[ 8086.955794] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ff11000361e1b880
[ 8086.955795] RBP: ffa0000000003da0 R08: 80000000ffff9f45 R09: ffa0000000003d28
[ 8086.955796] R10: ff1100035f840000 R11: 0000000000000028 R12: ff11000319ff8000
[ 8086.955797] R13: ff1100011bb79d60 R14: 00000000ffffffd6 R15: ff1100037039cb00
[ 8086.955798] FS: 0000000000000000(0000) GS:ff11000361e00000(0000) knlGS:0000000000000000
[ 8086.955800] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8086.955801] CR2: 00007ffff7fca168 CR3: 000000013b08a003 CR4: 0000000000471ef8
[ 8086.955803] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8086.955803] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 8086.955804] PKRU: 55555554
[ 8086.955805] Call Trace:
[ 8086.955806] <IRQ>
[ 8086.955808] tcp_wfree+0x112/0x130
[ 8086.955814] skb_release_head_state+0x24/0xa0
[ 8086.955818] napi_consume_skb+0x9c/0x160
[ 8086.955821] igc_poll+0x5d8/0x14d0 [igc]
[ 8086.955835] __napi_poll+0x2e/0x1b0
[ 8086.955839] net_rx_action+0x126/0x250
[ 8086.955843] __do_softirq+0xd1/0x295
[ 8086.955846] irq_exit_rcu+0xc5/0xf0
[ 8086.955851] common_interrupt+0x86/0xa0
[ 8086.955857] </IRQ>
[ 8086.955857] <TASK>
[ 8086.955858] asm_common_interrupt+0x27/0x40
[ 8086.955862] RIP: 0010:cpuidle_enter_state+0xd3/0x3e0
[ 8086.955866] Code: 73 f1 ff ff 49 89 c6 8b 05 e2 ca a7 00 85 c0 0f 8f b3 02 00 00 31 ff e8
1b de 75 ff 80 7d d7 00 0f 85 cd 01 00 00 fb 45 85 ff <0f> 88 fd 00 00 00 49 63 cf 4c 2b 75
c8 48 8d 04 49 48 89 ca 48 8d
[ 8086.955867] RSP: 0018:ffffffff82203df0 EFLAGS: 00000202
[ 8086.955869] RAX: ff11000361e2a200 RBX: 0000000000000002 RCX: 000000000000001f
[ 8086.955870] RDX: 0000000000000000 RSI: 000000003cf3cf3d RDI: 0000000000000000
[ 8086.955871] RBP: ffffffff82203e28 R08: 0000075ae38471c8 R09: 0000000000000018
[ 8086.955872] R10: 000000000000031a R11: ffffffff8238dca0 R12: ffd1ffffff200000
[ 8086.955873] R13: ffffffff8238dca0 R14: 0000075ae38471c8 R15: 0000000000000002
[ 8086.955875] cpuidle_enter+0x2e/0x50
[ 8086.955880] call_cpuidle+0x23/0x40
[ 8086.955884] do_idle+0x1be/0x220
[ 8086.955887] cpu_startup_entry+0x20/0x30
[ 8086.955889] rest_init+0xb5/0xc0
[ 8086.955892] arch_call_rest_init+0xe/0x30
[ 8086.955895] start_kernel+0x448/0x760
[ 8086.955898] x86_64_start_kernel+0x109/0x150
[ 8086.955900] secondary_startup_64_no_verify+0xe0/0xeb
[ 8086.955904] </TASK>
[ 8086.955904] ---[ end trace 0000000000000000 ]---
[ 8086.955912] ------------[ cut here ]------------
[ 8086.955913] kernel BUG at lib/dynamic_queue_limits.c:27!
[ 8086.955918] invalid opcode: 0000 [#1] SMP
[ 8086.955922] RIP: 0010:dql_completed+0x148/0x160
[ 8086.955925] Code: c9 00 48 89 57 58 e9 46 ff ff ff 45 85 e4 41 0f 95 c4 41 39 db
0f 95 c1 41 84 cc 74 05 45 85 ed 78 0a 44 89 c1 e9 27 ff ff ff <0f> 0b 01 f6 44 89
c1 29 f1 0f 48 ca eb 8c cc cc cc cc cc cc cc cc
[ 8086.955927] RSP: 0018:ffa0000000003e00 EFLAGS: 00010287
[ 8086.955928] RAX: 000000000000006c RBX: ffa0000003eb0f78 RCX: ff11000109938000
[ 8086.955929] RDX: 0000000000000003 RSI: 0000000000000160 RDI: ff110001002e9480
[ 8086.955930] RBP: ffa0000000003ed8 R08: ff110001002e93c0 R09: ffa0000000003d28
[ 8086.955931] R10: 0000000000007cc0 R11: 0000000000007c54 R12: 00000000ffffffd9
[ 8086.955932] R13: ff1100037039cb00 R14: 00000000ffffffd9 R15: ff1100037039c048
[ 8086.955933] FS: 0000000000000000(0000) GS:ff11000361e00000(0000) knlGS:0000000000000000
[ 8086.955934] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8086.955935] CR2: 00007ffff7fca168 CR3: 000000013b08a003 CR4: 0000000000471ef8
[ 8086.955936] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8086.955937] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 8086.955938] PKRU: 55555554
[ 8086.955939] Call Trace:
[ 8086.955939] <IRQ>
[ 8086.955940] ? igc_poll+0x1a9/0x14d0 [igc]
[ 8086.955949] __napi_poll+0x2e/0x1b0
[ 8086.955952] net_rx_action+0x126/0x250
[ 8086.955956] __do_softirq+0xd1/0x295
[ 8086.955958] irq_exit_rcu+0xc5/0xf0
[ 8086.955961] common_interrupt+0x86/0xa0
[ 8086.955964] </IRQ>
[ 8086.955965] <TASK>
[ 8086.955965] asm_common_interrupt+0x27/0x40
[ 8086.955968] RIP: 0010:cpuidle_enter_state+0xd3/0x3e0
[ 8086.955971] Code: 73 f1 ff ff 49 89 c6 8b 05 e2 ca a7 00 85 c0 0f 8f b3 02 00 00
31 ff e8 1b de 75 ff 80 7d d7 00 0f 85 cd 01 00 00 fb 45 85 ff <0f> 88 fd 00 00 00
49 63 cf 4c 2b 75 c8 48 8d 04 49 48 89 ca 48 8d
[ 8086.955972] RSP: 0018:ffffffff82203df0 EFLAGS: 00000202
[ 8086.955973] RAX: ff11000361e2a200 RBX: 0000000000000002 RCX: 000000000000001f
[ 8086.955974] RDX: 0000000000000000 RSI: 000000003cf3cf3d RDI: 0000000000000000
[ 8086.955974] RBP: ffffffff82203e28 R08: 0000075ae38471c8 R09: 0000000000000018
[ 8086.955975] R10: 000000000000031a R11: ffffffff8238dca0 R12: ffd1ffffff200000
[ 8086.955976] R13: ffffffff8238dca0 R14: 0000075ae38471c8 R15: 0000000000000002
[ 8086.955978] cpuidle_enter+0x2e/0x50
[ 8086.955981] call_cpuidle+0x23/0x40
[ 8086.955984] do_idle+0x1be/0x220
[ 8086.955985] cpu_startup_entry+0x20/0x30
[ 8086.955987] rest_init+0xb5/0xc0
[ 8086.955990] arch_call_rest_init+0xe/0x30
[ 8086.955992] start_kernel+0x448/0x760
[ 8086.955994] x86_64_start_kernel+0x109/0x150
[ 8086.955996] secondary_startup_64_no_verify+0xe0/0xeb
[ 8086.955998] </TASK>
[ 8086.955999] Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE xt_addrtype
nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay dm_mod emrcha(PO) emriio(PO)
rktpm(PO) cegbuf_mod(PO) patch_update(PO) se(PO) sgx_tgts(PO) mktme(PO) keylocker(PO) svtdx(PO)
svfs_pci_hotplug(PO) vtd_mod(PO) davemem(PO) svmabort(PO) svindexio(PO) usbx2(PO) ehci_sched(PO)
svheartbeat(PO) ioapic(PO) sv8259(PO) svintr(PO) lt(PO) pcierootport(PO) enginefw_mod(PO) ata(PO)
smbus(PO) spiflash_cdf(PO) arden(PO) dsa_iax(PO) oobmsm_punit(PO) cpm(PO) svkdb(PO) ebg_pch(PO)
pch(PO) sviotargets(PO) svbdf(PO) svmem(PO) svbios(PO) dram(PO) svtsc(PO) targets(PO) superio(PO)
svkernel(PO) cswitch(PO) mcf(PO) pentiumIII_mod(PO) fs_svfs(PO) mdevdefdb(PO) svfs_os_services(O)
ixgbe mdio mdio_devres libphy emeraldrapids_svdefs(PO) regsupport(O) libnvdimm nls_cp437
snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg
snd_hda_codec snd_hwdep x86_pkg_temp_thermal snd_hda_core snd_pcm snd_timer isst_if_mbox_pci
[ 8086.956029] input_leds isst_if_mmio sg snd isst_if_common soundcore wmi button sad9(O) drm
fuse backlight configfs efivarfs ip_tables x_tables vmd sdhci led_class rtl8150 r8152 hid_generic
pegasus mmc_block usbhid mmc_core hid megaraid_sas ixgb igb i2c_algo_bit ice i40e hpsa
scsi_transport_sas e1000e e1000 e100 ax88179_178a usbnet xhci_pci sd_mod xhci_hcd t10_pi
crc32c_intel crc64_rocksoft igc crc64 crc_t10dif usbcore crct10dif_generic ptp crct10dif_common
usb_common pps_core
[16762.543675] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.593 msecs
[16762.543678] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.595 msecs
[16762.543673] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.495 msecs
[16762.543679] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.599 msecs
[16762.543678] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.598 msecs
[16762.543690] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.605 msecs
[16762.543684] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.599 msecs
[16762.543693] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.613 msecs
[16762.543784] ---[ end trace 0000000000000000 ]---
[16762.849099] RIP: 0010:dql_completed+0x148/0x160
PANIC: Fatal exception in interrupt
Fixes: 9b275176270e ("igc: Add ndo_tx_timeout support")
Tested-by: Alejandra Victoria Alcaraz <alejandra.victoria.alcaraz@intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 284779dbf4e98753458708783af8c35630674a21 ]
commit a3a57bf07de23fe1ff779e0fdf710aa581c3ff73 ("net: stmmac: work
around sporadic tx issue on link-up") worked around a problem with TX
sometimes not working after a link-up by avoiding a redundant write to
MAC_CTRL_REG (aka GMAC_CONFIG), since the IP appeared to have problems
with handling multiple writes to that register in some cases.
That commit however only added the work around to dwmac_lib.c (apart
from the common code in stmmac_main.c), but my systems with version
4.21a of the IP exhibit the same problem, so add the work around to
dwmac4_lib.c too.
Fixes: a3a57bf07de2 ("net: stmmac: work around sporadic tx issue on link-up")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230721-stmmac-tx-workaround-v1-1-9411cbd5ee07@axis.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4e62c99d71e56817c934caa2a709a775c8cee078 ]
As of today, hash extraction support is enabled for all the silicons.
Because of which we are facing initialization issues when the silicon
does not support hash extraction. During creation of the hardware
parsing table for IPv6 address, we need to consider if hash extraction
is enabled then extract only 32 bit, otherwise 128 bit needs to be
extracted. This patch fixes the issue and configures the hardware parser
based on the availability of the feature.
Fixes: a95ab93550d3 ("octeontx2-af: Use hashed field in MCAM key")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230721061222.2632521-1-sumang@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a3336056504d780590ac6d6ac94fbba829994594 ]
Fix ethtool FDIR logic to not use memory after its release.
In the ice_ethtool_fdir.c file there are 2 spots where code can
refer to pointers which may be missing.
In the ice_cfg_fdir_xtrct_seq() function seg may be freed but
even then may be still used by memcpy(&tun_seg[1], seg, sizeof(*seg)).
In the ice_add_fdir_ethtool() function struct ice_fdir_fltr *input
may first fail to be added via ice_fdir_update_list_entry() but then
may be deleted by ice_fdir_update_list_entry.
Terminate in both cases when the returned value of the previous
operation is other than 0, free memory and don't use it anymore.
Reported-by: Michal Schmidt <mschmidt@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2208423
Fixes: cac2a27cd9ab ("ice: Support IPv4 Flow Director filters")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230721155854.1292805-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bb7a0156365dffe2fcd63e2051145fbe4f8908b4 ]
According to the implementation of XDP of FEC driver, the XDP path
shares the transmit queues with the kernel network stack, so it is
possible to lead to a tx timeout event when XDP uses the tx queue
pretty much exclusively. And this event will cause the reset of the
FEC hardware.
To avoid timeout in this case, we use the txq_trans_cond_update()
interface to update txq->trans_start to jiffies so that watchdog
won't generate a transmit timeout warning.
Fixes: 6d6b39f180b8 ("net: fec: add initial XDP support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://lore.kernel.org/r/20230721083559.2857312-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 69a184f7a372aac588babfb0bd681aaed9779f5b ]
in atl1e_tso_csum, it should check the return value of pskb_trim(),
and return an error code if an unexpected value is returned
by pskb_trim().
Fixes: a6a5325239c2 ("atl1e: Atheros L1E Gigabit Ethernet driver")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230720144219.39285-1-ruc_gongyuanjun@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ed96824b71ed67664390890441b229423a25317f ]
in atl1_tso(), it should check the return value of pskb_trim(),
and return an error code if an unexpected value is returned
by pskb_trim().
Fixes: 401c0aabec4b ("atl1: simplify tx packet descriptor")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Link: https://lore.kernel.org/r/20230722142511.12448-1-ruc_gongyuanjun@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 94d166c5318c6edd1e079df8552233443e909c33 ]
VXLAN-GPE does not add an extra inner Ethernet header. Take that into
account when calculating header length.
This causes problems in skb_tunnel_check_pmtu, where incorrect PMTU is
cached.
In the collect_md mode (which is the only mode that VXLAN-GPE
supports), there's no magic auto-setting of the tunnel interface MTU.
It can't be, since the destination and thus the underlying interface
may be different for each packet.
So, the administrator is responsible for setting the correct tunnel
interface MTU. Apparently, the administrators are capable enough to
calculate that the maximum MTU for VXLAN-GPE is (their_lower_MTU - 36).
They set the tunnel interface MTU to 1464. If you run a TCP stream over
such interface, it's then segmented according to the MTU 1464, i.e.
producing 1514 bytes frames. Which is okay, this still fits the lower
MTU.
However, skb_tunnel_check_pmtu (called from vxlan_xmit_one) uses 50 as
the header size and thus incorrectly calculates the frame size to be
1528. This leads to ICMP too big message being generated (locally),
PMTU of 1450 to be cached and the TCP stream to be resegmented.
The fix is to use the correct actual header size, especially for
skb_tunnel_check_pmtu calculation.
Fixes: e1e5314de08ba ("vxlan: implement GPE")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 882481b1c55fc44861d7e2d54b4e0936b1b39f2c ]
In dwrr mode, the default bandwidth weight of disabled tc is set to 0.
If the bandwidth weight is 0, the mode will change to sp.
Therefore, disabled tc default bandwidth weight need changed to 1,
and 0 is returned when query the bandwidth weight of disabled tc.
In addition, driver need stop configure bandwidth weight if tc is disabled.
Fixes: 848440544b41 ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 116d9f732eef634abbd871f2c6f613a5b4677742 ]
Currently, the weight saved by the driver is used as the query result,
which may be different from the actual weight in the register.
Therefore, the register value read from the firmware is used
as the query result
Fixes: 0e32038dc856 ("net: hns3: refactor dump tc of debugfs")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b27d0232e8897f7c896dc8ad80c9907dd57fd3f3 ]
Current only the first 32 bits of the capability flag bit are considered.
When the matching capability flag bit is greater than 31 bits,
it will get an error bit.This patch use bitmap to solve this issue.
It can handle each capability bit whitout bit width limit.
Fixes: da77aef9cc58 ("net: hns3: create common cmdq resource allocate/free/query APIs")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 91896c8acce23d33ed078cffd46a9534b1f82be5 ]
In iavf_adminq_task(), if the function can't acquire the
adapter->crit_lock, it checks if the driver is removing. If so, it simply
exits without re-enabling the interrupt. This is done to ensure that the
task stops processing as soon as possible once the driver is being removed.
However, if the IAVF_FLAG_PF_COMMS_FAILED is set, the function checks this
before attempting to acquire the lock. In this case, the function exits
early and re-enables the interrupt. This will happen even if the driver is
already removing.
Avoid this, by moving the check to after the adapter->crit_lock is
acquired. This way, if the driver is removing, we will not re-enable the
interrupt.
Fixes: fc2e6b3b132a ("iavf: Rework mutexes for better synchronisation")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a2f054c10bef0b54600ec9cb776508443e941343 ]
In iavf_adminq_task(), if kzalloc() fails to allocate the event.msg_buf,
the function will exit without releasing the adapter->crit_lock.
This is unlikely, but if it happens, the next access to that mutex will
deadlock.
Fix this by moving the unlock to the end of the function, and adding a new
label to allow jumping to the unlock portion of the function exit flow.
Fixes: fc2e6b3b132a ("iavf: Rework mutexes for better synchronisation")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 043b1f185fb0f3939b7427f634787706f45411c4 ]
The debugfs_create_dir() function returns error pointers.
It never returns NULL. Most incorrect error checks were fixed,
but the one in i40e_dbg_init() was forgotten.
Fix the remaining error check.
Fixes: 02e9c290814c ("i40e: debugfs interface")
Signed-off-by: Wang Ming <machel@vivo.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
is disabled during NAPI poll")
commit cf2ffdea0839398cb0551762af7f5efb0a6e0fea upstream.
There have been reports that on a number of systems this change breaks
network connectivity. Therefore effectively revert it. Mainly affected
seem to be systems where BIOS denies ASPM access to OS.
Due to later changes we can't do a direct revert.
Fixes: 2ab19de62d67 ("r8169: remove ASPM restrictions now that ASPM is disabled during NAPI poll")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/netdev/e47bac0d-e802-65e1-b311-6acb26d5cf10@freenet.de/T/
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217596
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/57f13ec0-b216-d5d8-363d-5b05528ec5fb@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9f9d4c1a2e82174a4e799ec405284a2b0de32b6a ]
entries and bind debugfs files would display wrong data on NETSYS_V2 and
later because instead of using mtk_get_ib1_pkt_type the driver would use
MTK_FOE_IB1_PACKET_TYPE which corresponds to NETSYS_V1(.x) SoCs.
Use mtk_get_ib1_pkt_type so entries and bind records display correctly.
Fixes: 03a3180e5c09e ("net: ethernet: mtk_eth_soc: introduce flow offloading support for mt7986")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/c0ae03d0182f4d27b874cbdf0059bc972c317f3c.1689727134.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 78adb4bcf99effbb960c5f9091e2e062509d1030 ]
In normal operation, each populated queue item has
next_to_watch pointing to the last TX desc of the packet,
while each cleaned item has it set to 0. In particular,
next_to_use that points to the next (necessarily clean)
item to use has next_to_watch set to 0.
When the TX queue is used both by an application using
AF_XDP with ZEROCOPY as well as a second non-XDP application
generating high traffic, the queue pointers can get in
an invalid state where next_to_use points to an item
where next_to_watch is NOT set to 0.
However, the implementation assumes at several places
that this is never the case, so if it does hold,
bad things happen. In particular, within the loop inside
of igc_clean_tx_irq(), next_to_clean can overtake next_to_use.
Finally, this prevents any further transmission via
this queue and it never gets unblocked or signaled.
Secondly, if the queue is in this garbled state,
the inner loop of igc_clean_tx_ring() will never terminate,
completely hogging a CPU core.
The reason is that igc_xdp_xmit_zc() reads next_to_use
before acquiring the lock, and writing it back
(potentially unmodified) later. If it got modified
before locking, the outdated next_to_use is written
pointing to an item that was already used elsewhere
(and thus next_to_watch got written).
Fixes: 9acf59a752d4 ("igc: Enable TX via AF_XDP zero-copy")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230717175444.3217831-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 95b681485563c64585de78662ee52d06b7fa47d9 ]
High XDP load triggers the netdev watchdog:
|NETDEV WATCHDOG: enp3s0 (igc): transmit queue 2 timed out
The reason is the Tx queue transmission start (txq->trans_start) is not updated
in XDP code path. Therefore, add it for all XDP transmission functions.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Stable-dep-of: 78adb4bcf99e ("igc: Prevent garbled TX queue with XDP ZEROCOPY")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8fcd7c7b3a38ab5e452f542fda8f7940e77e479a ]
Current driver enables backpressure for LBK interfaces.
But these interfaces do not support this feature.
Hence, this patch fixes the issue by skipping the
backpressure configuration for these interfaces.
Fixes: 75f36270990c ("octeontx2-pf: Support to enable/disable pause frames via ethtool").
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Link: https://lore.kernel.org/r/20230716093741.28063-1-gakula@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c34743daca0eb1dc855831a5210f0800a850088e ]
The reset task is currently scheduled from the watchdog or adminq tasks.
First, all direct calls to schedule the reset task are replaced with the
iavf_schedule_reset(), which is modified to accept the flag showing the
type of reset.
To prevent the reset task from starting once iavf_remove() starts, we need
to check the __IAVF_IN_REMOVE_TASK bit before we schedule it. This is now
easily added to iavf_schedule_reset().
Finally, remove the check for IAVF_FLAG_RESET_NEEDED in the watchdog task.
It is redundant since all callers who set the flag immediately schedules
the reset task.
Fixes: 3ccd54ef44eb ("iavf: Fix init state closure on remove")
Fixes: 14756b2ae265 ("iavf: Fix __IAVF_RESETTING state usage")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d1639a17319ba78a018280cd2df6577a7e5d9fab ]
A driver's lock (crit_lock) is used to serialize all the driver's tasks.
Lockdep, however, shows a circular dependency between rtnl and
crit_lock. This happens when an ndo that already holds the rtnl requests
the driver to reset, since the reset task (in some paths) tries to grab
rtnl to either change real number of queues of update netdev features.
[566.241851] ======================================================
[566.241893] WARNING: possible circular locking dependency detected
[566.241936] 6.2.14-100.fc36.x86_64+debug #1 Tainted: G OE
[566.241984] ------------------------------------------------------
[566.242025] repro.sh/2604 is trying to acquire lock:
[566.242061] ffff9280fc5ceee8 (&adapter->crit_lock){+.+.}-{3:3}, at: iavf_close+0x3c/0x240 [iavf]
[566.242167]
but task is already holding lock:
[566.242209] ffffffff9976d350 (rtnl_mutex){+.+.}-{3:3}, at: iavf_remove+0x6b5/0x730 [iavf]
[566.242300]
which lock already depends on the new lock.
[566.242353]
the existing dependency chain (in reverse order) is:
[566.242401]
-> #1 (rtnl_mutex){+.+.}-{3:3}:
[566.242451] __mutex_lock+0xc1/0xbb0
[566.242489] iavf_init_interrupt_scheme+0x179/0x440 [iavf]
[566.242560] iavf_watchdog_task+0x80b/0x1400 [iavf]
[566.242627] process_one_work+0x2b3/0x560
[566.242663] worker_thread+0x4f/0x3a0
[566.242696] kthread+0xf2/0x120
[566.242730] ret_from_fork+0x29/0x50
[566.242763]
-> #0 (&adapter->crit_lock){+.+.}-{3:3}:
[566.242815] __lock_acquire+0x15ff/0x22b0
[566.242869] lock_acquire+0xd2/0x2c0
[566.242901] __mutex_lock+0xc1/0xbb0
[566.242934] iavf_close+0x3c/0x240 [iavf]
[566.242997] __dev_close_many+0xac/0x120
[566.243036] dev_close_many+0x8b/0x140
[566.243071] unregister_netdevice_many_notify+0x165/0x7c0
[566.243116] unregister_netdevice_queue+0xd3/0x110
[566.243157] iavf_remove+0x6c1/0x730 [iavf]
[566.243217] pci_device_remove+0x33/0xa0
[566.243257] device_release_driver_internal+0x1bc/0x240
[566.243299] pci_stop_bus_device+0x6c/0x90
[566.243338] pci_stop_and_remove_bus_device+0xe/0x20
[566.243380] pci_iov_remove_virtfn+0xd1/0x130
[566.243417] sriov_disable+0x34/0xe0
[566.243448] ice_free_vfs+0x2da/0x330 [ice]
[566.244383] ice_sriov_configure+0x88/0xad0 [ice]
[566.245353] sriov_numvfs_store+0xde/0x1d0
[566.246156] kernfs_fop_write_iter+0x15e/0x210
[566.246921] vfs_write+0x288/0x530
[566.247671] ksys_write+0x74/0xf0
[566.248408] do_syscall_64+0x58/0x80
[566.249145] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[566.249886]
other info that might help us debug this:
[566.252014] Possible unsafe locking scenario:
[566.253432] CPU0 CPU1
[566.254118] ---- ----
[566.254800] lock(rtnl_mutex);
[566.255514] lock(&adapter->crit_lock);
[566.256233] lock(rtnl_mutex);
[566.256897] lock(&adapter->crit_lock);
[566.257388]
*** DEADLOCK ***
The deadlock can be triggered by a script that is continuously resetting
the VF adapter while doing other operations requiring RTNL, e.g:
while :; do
ip link set $VF up
ethtool --set-channels $VF combined 2
ip link set $VF down
ip link set $VF up
ethtool --set-channels $VF combined 4
ip link set $VF down
done
Any operation that triggers a reset can substitute "ethtool --set-channles"
As a fix, add a new task "finish_config" that do all the work which
needs rtnl lock. With the exception of iavf_remove(), all work that
require rtnl should be called from this task.
As for iavf_remove(), at the point where we need to call
unregister_netdevice() (and grab rtnl_lock), we make sure the finish_config
task is not running (cancel_work_sync()) to safely grab rtnl. Subsequent
finish_config work cannot restart after that since the task is guarded
by the __IAVF_IN_REMOVE_TASK bit in iavf_schedule_finish_config().
Fixes: 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c2ed2403f12c74a74a0091ed5d830e72c58406e8 ]
There was a fail when trying to add the interface to bonding
right after changing the MTU on the interface. It was caused
by bonding interface unable to open the interface due to
interface being in __RESETTING state because of MTU change.
Add new reset_waitqueue to indicate that reset has finished.
Add waiting for reset to finish in callbacks which trigger hw reset:
iavf_set_priv_flags(), iavf_change_mtu() and iavf_set_ringparam().
We use a 5000ms timeout period because on Hyper-V based systems,
this operation takes around 3000-4000ms. In normal circumstances,
it doesn't take more than 500ms to complete.
Add a function iavf_wait_for_reset() to reuse waiting for reset code and
use it also in iavf_set_channels(), which already waits for reset.
We don't use error handling in iavf_set_channels() as this could
cause the device to be in incorrect state if the reset was scheduled
but hit timeout or the waitng function was interrupted by a signal.
Fixes: 4e5e6b5d9d13 ("iavf: Fix return of set the new channel count")
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Co-developed-by: Dawid Wesierski <dawidx.wesierski@intel.com>
Signed-off-by: Dawid Wesierski <dawidx.wesierski@intel.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a4aadf0f5905661cd25c366b96cc1c840f05b756 ]
Make all possible functions static.
Move iavf_force_wb() up to avoid forward declaration.
Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Stable-dep-of: c2ed2403f12c ("iavf: Wait for reset in callbacks which trigger it")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a77ed5c5b768e9649be240a2d864e5cd9c6a2015 ]
If the system tries to close the netdev while iavf_reset_task() is
running, __LINK_STATE_START will be cleared and netif_running() will
return false in iavf_reinit_interrupt_scheme(). This will result in
iavf_free_traffic_irqs() not being called and a leak as follows:
[7632.489326] remove_proc_entry: removing non-empty directory 'irq/999', leaking at least 'iavf-enp24s0f0v0-TxRx-0'
[7632.490214] WARNING: CPU: 0 PID: 10 at fs/proc/generic.c:718 remove_proc_entry+0x19b/0x1b0
is shown when pci_disable_msix() is later called. Fix by using the
internal adapter state. The traffic IRQs will always exist if
state == __IAVF_RUNNING.
Fixes: 5b36e8d04b44 ("i40evf: Enable VF to request an alternate queue allocation")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7c4bced3caa749ce468b0c5de711c98476b23a52 ]
If we set channels greater during iavf_remove(), and waiting reset done
would be timeout, then returned with error but changed num_active_queues
directly, that will lead to OOB like the following logs. Because the
num_active_queues is greater than tx/rx_rings[] allocated actually.
Reproducer:
[root@host ~]# cat repro.sh
#!/bin/bash
pf_dbsf="0000:41:00.0"
vf0_dbsf="0000:41:02.0"
g_pids=()
function do_set_numvf()
{
echo 2 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
sleep $((RANDOM%3+1))
echo 0 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
sleep $((RANDOM%3+1))
}
function do_set_channel()
{
local nic=$(ls -1 --indicator-style=none /sys/bus/pci/devices/${vf0_dbsf}/net/)
[ -z "$nic" ] && { sleep $((RANDOM%3)) ; return 1; }
ifconfig $nic 192.168.18.5 netmask 255.255.255.0
ifconfig $nic up
ethtool -L $nic combined 1
ethtool -L $nic combined 4
sleep $((RANDOM%3))
}
function on_exit()
{
local pid
for pid in "${g_pids[@]}"; do
kill -0 "$pid" &>/dev/null && kill "$pid" &>/dev/null
done
g_pids=()
}
trap "on_exit; exit" EXIT
while :; do do_set_numvf ; done &
g_pids+=($!)
while :; do do_set_channel ; done &
g_pids+=($!)
wait
Result:
[ 3506.152887] iavf 0000:41:02.0: Removing device
[ 3510.400799] ==================================================================
[ 3510.400820] BUG: KASAN: slab-out-of-bounds in iavf_free_all_tx_resources+0x156/0x160 [iavf]
[ 3510.400823] Read of size 8 at addr ffff88b6f9311008 by task repro.sh/55536
[ 3510.400823]
[ 3510.400830] CPU: 101 PID: 55536 Comm: repro.sh Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1
[ 3510.400832] Hardware name: Powerleader PR2008AL/H12DSi-N6, BIOS 2.0 04/09/2021
[ 3510.400835] Call Trace:
[ 3510.400851] dump_stack+0x71/0xab
[ 3510.400860] print_address_description+0x6b/0x290
[ 3510.400865] ? iavf_free_all_tx_resources+0x156/0x160 [iavf]
[ 3510.400868] kasan_report+0x14a/0x2b0
[ 3510.400873] iavf_free_all_tx_resources+0x156/0x160 [iavf]
[ 3510.400880] iavf_remove+0x2b6/0xc70 [iavf]
[ 3510.400884] ? iavf_free_all_rx_resources+0x160/0x160 [iavf]
[ 3510.400891] ? wait_woken+0x1d0/0x1d0
[ 3510.400895] ? notifier_call_chain+0xc1/0x130
[ 3510.400903] pci_device_remove+0xa8/0x1f0
[ 3510.400910] device_release_driver_internal+0x1c6/0x460
[ 3510.400916] pci_stop_bus_device+0x101/0x150
[ 3510.400919] pci_stop_and_remove_bus_device+0xe/0x20
[ 3510.400924] pci_iov_remove_virtfn+0x187/0x420
[ 3510.400927] ? pci_iov_add_virtfn+0xe10/0xe10
[ 3510.400929] ? pci_get_subsys+0x90/0x90
[ 3510.400932] sriov_disable+0xed/0x3e0
[ 3510.400936] ? bus_find_device+0x12d/0x1a0
[ 3510.400953] i40e_free_vfs+0x754/0x1210 [i40e]
[ 3510.400966] ? i40e_reset_all_vfs+0x880/0x880 [i40e]
[ 3510.400968] ? pci_get_device+0x7c/0x90
[ 3510.400970] ? pci_get_subsys+0x90/0x90
[ 3510.400982] ? pci_vfs_assigned.part.7+0x144/0x210
[ 3510.400987] ? __mutex_lock_slowpath+0x10/0x10
[ 3510.400996] i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e]
[ 3510.401001] sriov_numvfs_store+0x214/0x290
[ 3510.401005] ? sriov_totalvfs_show+0x30/0x30
[ 3510.401007] ? __mutex_lock_slowpath+0x10/0x10
[ 3510.401011] ? __check_object_size+0x15a/0x350
[ 3510.401018] kernfs_fop_write+0x280/0x3f0
[ 3510.401022] vfs_write+0x145/0x440
[ 3510.401025] ksys_write+0xab/0x160
[ 3510.401028] ? __ia32_sys_read+0xb0/0xb0
[ 3510.401031] ? fput_many+0x1a/0x120
[ 3510.401032] ? filp_close+0xf0/0x130
[ 3510.401038] do_syscall_64+0xa0/0x370
[ 3510.401041] ? page_fault+0x8/0x30
[ 3510.401043] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 3510.401073] RIP: 0033:0x7f3a9bb842c0
[ 3510.401079] Code: 73 01 c3 48 8b 0d d8 cb 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 24 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe dd 01 00 48 89 04 24
[ 3510.401080] RSP: 002b:00007ffc05f1fe18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 3510.401083] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f3a9bb842c0
[ 3510.401085] RDX: 0000000000000002 RSI: 0000000002327408 RDI: 0000000000000001
[ 3510.401086] RBP: 0000000002327408 R08: 00007f3a9be53780 R09: 00007f3a9c8a4700
[ 3510.401086] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002
[ 3510.401087] R13: 0000000000000001 R14: 00007f3a9be52620 R15: 0000000000000001
[ 3510.401090]
[ 3510.401093] Allocated by task 76795:
[ 3510.401098] kasan_kmalloc+0xa6/0xd0
[ 3510.401099] __kmalloc+0xfb/0x200
[ 3510.401104] iavf_init_interrupt_scheme+0x26f/0x1310 [iavf]
[ 3510.401108] iavf_watchdog_task+0x1d58/0x4050 [iavf]
[ 3510.401114] process_one_work+0x56a/0x11f0
[ 3510.401115] worker_thread+0x8f/0xf40
[ 3510.401117] kthread+0x2a0/0x390
[ 3510.401119] ret_from_fork+0x1f/0x40
[ 3510.401122] 0xffffffffffffffff
[ 3510.401123]
In timeout handling, we should keep the original num_active_queues
and reset num_req_queues to 0.
Fixes: 4e5e6b5d9d13 ("iavf: Fix return of set the new channel count")
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Cc: Donglin Peng <pengdonglin@sangfor.com.cn>
Cc: Huang Cun <huangcun@sangfor.com.cn>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5f4fa1672d98fe99d2297b03add35346f1685d6b ]
We do netif_napi_add() for all allocated q_vectors[], but potentially
do netif_napi_del() for part of them, then kfree q_vectors and leave
invalid pointers at dev->napi_list.
Reproducer:
[root@host ~]# cat repro.sh
#!/bin/bash
pf_dbsf="0000:41:00.0"
vf0_dbsf="0000:41:02.0"
g_pids=()
function do_set_numvf()
{
echo 2 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
sleep $((RANDOM%3+1))
echo 0 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
sleep $((RANDOM%3+1))
}
function do_set_channel()
{
local nic=$(ls -1 --indicator-style=none /sys/bus/pci/devices/${vf0_dbsf}/net/)
[ -z "$nic" ] && { sleep $((RANDOM%3)) ; return 1; }
ifconfig $nic 192.168.18.5 netmask 255.255.255.0
ifconfig $nic up
ethtool -L $nic combined 1
ethtool -L $nic combined 4
sleep $((RANDOM%3))
}
function on_exit()
{
local pid
for pid in "${g_pids[@]}"; do
kill -0 "$pid" &>/dev/null && kill "$pid" &>/dev/null
done
g_pids=()
}
trap "on_exit; exit" EXIT
while :; do do_set_numvf ; done &
g_pids+=($!)
while :; do do_set_channel ; done &
g_pids+=($!)
wait
Result:
[ 4093.900222] ==================================================================
[ 4093.900230] BUG: KASAN: use-after-free in free_netdev+0x308/0x390
[ 4093.900232] Read of size 8 at addr ffff88b4dc145640 by task repro.sh/6699
[ 4093.900233]
[ 4093.900236] CPU: 10 PID: 6699 Comm: repro.sh Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1
[ 4093.900238] Hardware name: Powerleader PR2008AL/H12DSi-N6, BIOS 2.0 04/09/2021
[ 4093.900239] Call Trace:
[ 4093.900244] dump_stack+0x71/0xab
[ 4093.900249] print_address_description+0x6b/0x290
[ 4093.900251] ? free_netdev+0x308/0x390
[ 4093.900252] kasan_report+0x14a/0x2b0
[ 4093.900254] free_netdev+0x308/0x390
[ 4093.900261] iavf_remove+0x825/0xd20 [iavf]
[ 4093.900265] pci_device_remove+0xa8/0x1f0
[ 4093.900268] device_release_driver_internal+0x1c6/0x460
[ 4093.900271] pci_stop_bus_device+0x101/0x150
[ 4093.900273] pci_stop_and_remove_bus_device+0xe/0x20
[ 4093.900275] pci_iov_remove_virtfn+0x187/0x420
[ 4093.900277] ? pci_iov_add_virtfn+0xe10/0xe10
[ 4093.900278] ? pci_get_subsys+0x90/0x90
[ 4093.900280] sriov_disable+0xed/0x3e0
[ 4093.900282] ? bus_find_device+0x12d/0x1a0
[ 4093.900290] i40e_free_vfs+0x754/0x1210 [i40e]
[ 4093.900298] ? i40e_reset_all_vfs+0x880/0x880 [i40e]
[ 4093.900299] ? pci_get_device+0x7c/0x90
[ 4093.900300] ? pci_get_subsys+0x90/0x90
[ 4093.900306] ? pci_vfs_assigned.part.7+0x144/0x210
[ 4093.900309] ? __mutex_lock_slowpath+0x10/0x10
[ 4093.900315] i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e]
[ 4093.900318] sriov_numvfs_store+0x214/0x290
[ 4093.900320] ? sriov_totalvfs_show+0x30/0x30
[ 4093.900321] ? __mutex_lock_slowpath+0x10/0x10
[ 4093.900323] ? __check_object_size+0x15a/0x350
[ 4093.900326] kernfs_fop_write+0x280/0x3f0
[ 4093.900329] vfs_write+0x145/0x440
[ 4093.900330] ksys_write+0xab/0x160
[ 4093.900332] ? __ia32_sys_read+0xb0/0xb0
[ 4093.900334] ? fput_many+0x1a/0x120
[ 4093.900335] ? filp_close+0xf0/0x130
[ 4093.900338] do_syscall_64+0xa0/0x370
[ 4093.900339] ? page_fault+0x8/0x30
[ 4093.900341] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 4093.900357] RIP: 0033:0x7f16ad4d22c0
[ 4093.900359] Code: 73 01 c3 48 8b 0d d8 cb 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 24 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe dd 01 00 48 89 04 24
[ 4093.900360] RSP: 002b:00007ffd6491b7f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 4093.900362] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f16ad4d22c0
[ 4093.900363] RDX: 0000000000000002 RSI: 0000000001a41408 RDI: 0000000000000001
[ 4093.900364] RBP: 0000000001a41408 R08: 00007f16ad7a1780 R09: 00007f16ae1f2700
[ 4093.900364] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002
[ 4093.900365] R13: 0000000000000001 R14: 00007f16ad7a0620 R15: 0000000000000001
[ 4093.900367]
[ 4093.900368] Allocated by task 820:
[ 4093.900371] kasan_kmalloc+0xa6/0xd0
[ 4093.900373] __kmalloc+0xfb/0x200
[ 4093.900376] iavf_init_interrupt_scheme+0x63b/0x1320 [iavf]
[ 4093.900380] iavf_watchdog_task+0x3d51/0x52c0 [iavf]
[ 4093.900382] process_one_work+0x56a/0x11f0
[ 4093.900383] worker_thread+0x8f/0xf40
[ 4093.900384] kthread+0x2a0/0x390
[ 4093.900385] ret_from_fork+0x1f/0x40
[ 4093.900387] 0xffffffffffffffff
[ 4093.900387]
[ 4093.900388] Freed by task 6699:
[ 4093.900390] __kasan_slab_free+0x137/0x190
[ 4093.900391] kfree+0x8b/0x1b0
[ 4093.900394] iavf_free_q_vectors+0x11d/0x1a0 [iavf]
[ 4093.900397] iavf_remove+0x35a/0xd20 [iavf]
[ 4093.900399] pci_device_remove+0xa8/0x1f0
[ 4093.900400] device_release_driver_internal+0x1c6/0x460
[ 4093.900401] pci_stop_bus_device+0x101/0x150
[ 4093.900402] pci_stop_and_remove_bus_device+0xe/0x20
[ 4093.900403] pci_iov_remove_virtfn+0x187/0x420
[ 4093.900404] sriov_disable+0xed/0x3e0
[ 4093.900409] i40e_free_vfs+0x754/0x1210 [i40e]
[ 4093.900415] i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e]
[ 4093.900416] sriov_numvfs_store+0x214/0x290
[ 4093.900417] kernfs_fop_write+0x280/0x3f0
[ 4093.900418] vfs_write+0x145/0x440
[ 4093.900419] ksys_write+0xab/0x160
[ 4093.900420] do_syscall_64+0xa0/0x370
[ 4093.900421] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 4093.900422] 0xffffffffffffffff
[ 4093.900422]
[ 4093.900424] The buggy address belongs to the object at ffff88b4dc144200
which belongs to the cache kmalloc-8k of size 8192
[ 4093.900425] The buggy address is located 5184 bytes inside of
8192-byte region [ffff88b4dc144200, ffff88b4dc146200)
[ 4093.900425] The buggy address belongs to the page:
[ 4093.900427] page:ffffea00d3705000 refcount:1 mapcount:0 mapping:ffff88bf04415c80 index:0x0 compound_mapcount: 0
[ 4093.900430] flags: 0x10000000008100(slab|head)
[ 4093.900433] raw: 0010000000008100 dead000000000100 dead000000000200 ffff88bf04415c80
[ 4093.900434] raw: 0000000000000000 0000000000030003 00000001ffffffff 0000000000000000
[ 4093.900434] page dumped because: kasan: bad access detected
[ 4093.900435]
[ 4093.900435] Memory state around the buggy address:
[ 4093.900436] ffff88b4dc145500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 4093.900437] ffff88b4dc145580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 4093.900438] >ffff88b4dc145600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 4093.900438] ^
[ 4093.900439] ffff88b4dc145680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 4093.900440] ffff88b4dc145700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 4093.900440] ==================================================================
Although the patch #2 (of 2) can avoid the issue triggered by this
repro.sh, there still are other potential risks that if num_active_queues
is changed to less than allocated q_vectors[] by unexpected, the
mismatched netif_napi_add/del() can also cause UAF.
Since we actually call netif_napi_add() for all allocated q_vectors
unconditionally in iavf_alloc_q_vectors(), so we should fix it by
letting netif_napi_del() match to netif_napi_add().
Fixes: 5eae00c57f5e ("i40evf: main driver core")
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Cc: Donglin Peng <pengdonglin@sangfor.com.cn>
Cc: Huang Cun <huangcun@sangfor.com.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 162d626f3013215b82b6514ca14f20932c7ccce5 ]
Referenced commit missed that for chip versions 42 and 43 ASPM
remained disabled in the respective rtl_hw_start_...() routines.
This resulted in problems as described in the referenced bug
ticket. Therefore re-instantiate the previous logic.
Fixes: 5fc3f6c90cca ("r8169: consolidate disabling ASPM before EPHY access")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217635
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b3e7b3a6ee92ab927f750a6b19615ce88ece808f ]
Calling ethtool during reload can lead to call trace, because VSI isn't
configured for some time, but netdev is alive.
To fix it add rtnl lock for VSI deconfig and config. Set ::num_q_vectors
to 0 after freeing and add a check for ::tx/rx_rings in ring related
ethtool ops.
Add proper unroll of filters in ice_start_eth().
Reproduction:
$watch -n 0.1 -d 'ethtool -g enp24s0f0np0'
$devlink dev reload pci/0000:18:00.0 action driver_reinit
Call trace before fix:
[66303.926205] BUG: kernel NULL pointer dereference, address: 0000000000000000
[66303.926259] #PF: supervisor read access in kernel mode
[66303.926286] #PF: error_code(0x0000) - not-present page
[66303.926311] PGD 0 P4D 0
[66303.926332] Oops: 0000 [#1] PREEMPT SMP PTI
[66303.926358] CPU: 4 PID: 933821 Comm: ethtool Kdump: loaded Tainted: G OE 6.4.0-rc5+ #1
[66303.926400] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.00.01.0014.070920180847 07/09/2018
[66303.926446] RIP: 0010:ice_get_ringparam+0x22/0x50 [ice]
[66303.926649] Code: 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 87 c0 09 00 00 c7 46 04 e0 1f 00 00 c7 46 10 e0 1f 00 00 48 8b 50 20 <48> 8b 12 0f b7 52 3a 89 56 14 48 8b 40 28 48 8b 00 0f b7 40 58 48
[66303.926722] RSP: 0018:ffffad40472f39c8 EFLAGS: 00010246
[66303.926749] RAX: ffff98a8ada05828 RBX: ffff98a8c46dd060 RCX: ffffad40472f3b48
[66303.926781] RDX: 0000000000000000 RSI: ffff98a8c46dd068 RDI: ffff98a8b23c4000
[66303.926811] RBP: ffffad40472f3b48 R08: 00000000000337b0 R09: 0000000000000000
[66303.926843] R10: 0000000000000001 R11: 0000000000000100 R12: ffff98a8b23c4000
[66303.926874] R13: ffff98a8c46dd060 R14: 000000000000000f R15: ffffad40472f3a50
[66303.926906] FS: 00007f6397966740(0000) GS:ffff98b390900000(0000) knlGS:0000000000000000
[66303.926941] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[66303.926967] CR2: 0000000000000000 CR3: 000000011ac20002 CR4: 00000000007706e0
[66303.926999] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[66303.927029] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[66303.927060] PKRU: 55555554
[66303.927075] Call Trace:
[66303.927094] <TASK>
[66303.927111] ? __die+0x23/0x70
[66303.927140] ? page_fault_oops+0x171/0x4e0
[66303.927176] ? exc_page_fault+0x7f/0x180
[66303.927209] ? asm_exc_page_fault+0x26/0x30
[66303.927244] ? ice_get_ringparam+0x22/0x50 [ice]
[66303.927433] rings_prepare_data+0x62/0x80
[66303.927469] ethnl_default_doit+0xe2/0x350
[66303.927501] genl_family_rcv_msg_doit.isra.0+0xe3/0x140
[66303.927538] genl_rcv_msg+0x1b1/0x2c0
[66303.927561] ? __pfx_ethnl_default_doit+0x10/0x10
[66303.927590] ? __pfx_genl_rcv_msg+0x10/0x10
[66303.927615] netlink_rcv_skb+0x58/0x110
[66303.927644] genl_rcv+0x28/0x40
[66303.927665] netlink_unicast+0x19e/0x290
[66303.927691] netlink_sendmsg+0x254/0x4d0
[66303.927717] sock_sendmsg+0x93/0xa0
[66303.927743] __sys_sendto+0x126/0x170
[66303.927780] __x64_sys_sendto+0x24/0x30
[66303.928593] do_syscall_64+0x5d/0x90
[66303.929370] ? __count_memcg_events+0x60/0xa0
[66303.930146] ? count_memcg_events.constprop.0+0x1a/0x30
[66303.930920] ? handle_mm_fault+0x9e/0x350
[66303.931688] ? do_user_addr_fault+0x258/0x740
[66303.932452] ? exc_page_fault+0x7f/0x180
[66303.933193] entry_SYSCALL_64_after_hwframe+0x72/0xdc
Fixes: 5b246e533d01 ("ice: split probe into smaller functions")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 24a3298ac9e6bd8de838ab79f7868207170d556d ]
Since commit 6624e780a577fc ("ice: split ice_vsi_setup into smaller
functions") ice_vsi_release does things twice. There is unregister
netdev which is unregistered in ice_deinit_eth also.
It also unregisters the devlink_port twice which is also unregistered
in ice_deinit_eth(). This double deregistration is hidden because
devl_port_unregister ignores the return value of xa_erase.
[ 68.642167] Call Trace:
[ 68.650385] ice_devlink_destroy_pf_port+0xe/0x20 [ice]
[ 68.655656] ice_vsi_release+0x445/0x690 [ice]
[ 68.660147] ice_deinit+0x99/0x280 [ice]
[ 68.664117] ice_remove+0x1b6/0x5c0 [ice]
[ 171.103841] Call Trace:
[ 171.109607] ice_devlink_destroy_pf_port+0xf/0x20 [ice]
[ 171.114841] ice_remove+0x158/0x270 [ice]
[ 171.118854] pci_device_remove+0x3b/0xc0
[ 171.122779] device_release_driver_internal+0xc7/0x170
[ 171.127912] driver_detach+0x54/0x8c
[ 171.131491] bus_remove_driver+0x77/0xd1
[ 171.135406] pci_unregister_driver+0x2d/0xb0
[ 171.139670] ice_module_exit+0xc/0x55f [ice]
Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1d6d537dc55d1f42d16290f00157ac387985b95b ]
Move the call to of_get_ethdev_address to mtk_add_mac which is part of
the probe function and can hence itself return -EPROBE_DEFER should
of_get_ethdev_address return -EPROBE_DEFER. This allows us to entirely
get rid of the mtk_init function.
The problem of of_get_ethdev_address returning -EPROBE_DEFER surfaced
in situations in which the NVMEM provider holding the MAC address has
not yet be loaded at the time mtk_eth_soc is initially probed. In this
case probing of mtk_eth_soc should be deferred instead of falling back
to use a random MAC address, so once the NVMEM provider becomes
available probing can be repeated.
Fixes: 656e705243fd ("net-next: mediatek: add support for MT7623 ethernet")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b685f1a58956fa36cc01123f253351b25bfacfda ]
CPSW ALE has 75 bit ALE entries which are stored within three 32 bit words.
The cpsw_ale_get_field() and cpsw_ale_set_field() functions assume that the
field will be strictly contained within one word. However, this is not
guaranteed to be the case and it is possible for ALE field entries to span
across up to two words at the most.
Fix the methods to handle getting/setting fields spanning up to two words.
Fixes: db82173f23c5 ("netdev: driver: ethernet: add cpsw address lookup engine support")
Signed-off-by: Tanmay Patil <t-patil@ti.com>
[s-vadapalli@ti.com: rephrased commit message and added Fixes tag]
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1cf3d5567f273a8746d1bade00633a93204f80f0 ]
Now, strncpy() in hns3_dbg_fill_content() use src-length as copy-length,
it may result in dest-buf overflow.
This patch is to fix intel compile warning for csky-linux-gcc (GCC) 12.1.0
compiler.
The warning reports as below:
hclge_debugfs.c:92:25: warning: 'strncpy' specified bound depends on
the length of the source argument [-Wstringop-truncation]
strncpy(pos, items[i].name, strlen(items[i].name));
hclge_debugfs.c:90:25: warning: 'strncpy' output truncated before
terminating nul copying as many bytes from a string as its length
[-Wstringop-truncation]
strncpy(pos, result[i], strlen(result[i]));
strncpy() use src-length as copy-length, it may result in
dest-buf overflow.
So,this patch add some values check to avoid this issue.
Signed-off-by: Hao Chen <chenhao418@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/lkml/202207170606.7WtHs9yS-lkp@intel.com/T/
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 004d25060c78fc31f66da0fa439c544dda1ac9d5 ]
In a setup where a Thunderbolt hub connects to Ethernet and a display
through USB Type-C, users may experience a hung task timeout when they
remove the cable between the PC and the Thunderbolt hub.
This is because the igb_down function is called multiple times when
the Thunderbolt hub is unplugged. For example, the igb_io_error_detected
triggers the first call, and the igb_remove triggers the second call.
The second call to igb_down will block at napi_synchronize.
Here's the call trace:
__schedule+0x3b0/0xddb
? __mod_timer+0x164/0x5d3
schedule+0x44/0xa8
schedule_timeout+0xb2/0x2a4
? run_local_timers+0x4e/0x4e
msleep+0x31/0x38
igb_down+0x12c/0x22a [igb 6615058754948bfde0bf01429257eb59f13030d4]
__igb_close+0x6f/0x9c [igb 6615058754948bfde0bf01429257eb59f13030d4]
igb_close+0x23/0x2b [igb 6615058754948bfde0bf01429257eb59f13030d4]
__dev_close_many+0x95/0xec
dev_close_many+0x6e/0x103
unregister_netdevice_many+0x105/0x5b1
unregister_netdevice_queue+0xc2/0x10d
unregister_netdev+0x1c/0x23
igb_remove+0xa7/0x11c [igb 6615058754948bfde0bf01429257eb59f13030d4]
pci_device_remove+0x3f/0x9c
device_release_driver_internal+0xfe/0x1b4
pci_stop_bus_device+0x5b/0x7f
pci_stop_bus_device+0x30/0x7f
pci_stop_bus_device+0x30/0x7f
pci_stop_and_remove_bus_device+0x12/0x19
pciehp_unconfigure_device+0x76/0xe9
pciehp_disable_slot+0x6e/0x131
pciehp_handle_presence_or_link_change+0x7a/0x3f7
pciehp_ist+0xbe/0x194
irq_thread_fn+0x22/0x4d
? irq_thread+0x1fd/0x1fd
irq_thread+0x17b/0x1fd
? irq_forced_thread_fn+0x5f/0x5f
kthread+0x142/0x153
? __irq_get_irqchip_state+0x46/0x46
? kthread_associate_blkcg+0x71/0x71
ret_from_fork+0x1f/0x30
In this case, igb_io_error_detected detaches the network interface
and requests a PCIE slot reset, however, the PCIE reset callback is
not being invoked and thus the Ethernet connection breaks down.
As the PCIE error in this case is a non-fatal one, requesting a
slot reset can be avoided.
This patch fixes the task hung issue and preserves Ethernet
connection by ignoring non-fatal PCIE errors.
Signed-off-by: Ying Hsu <yinghsu@chromium.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230620174732.4145155-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 18da174d865a87d47d2f33f5b0a322efcf067728 ]
Implement 64 bit per cpu stats to fix the overflow of netdev->stats
on 32 bit platforms. To simplify the code, we use net core
pcpu_sw_netstats infrastructure. One small drawback is some memory
overhead because litex uses just one queue, but we allocate the
counters per cpu.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Gabriel Somlo <gsomlo@gmail.com>
Link: https://lore.kernel.org/r/20230614162035.300-1-jszhang@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit e31a9fedc7d8d80722b19628e66fcb5a36981780 upstream.
This reverts commit e1ed3e4d91112027b90c7ee61479141b3f948e6a.
Turned out the change causes a performance regression.
Link: https://lore.kernel.org/netdev/20230713124914.GA12924@green245/T/
Cc: stable@vger.kernel.org
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/055c6bc2-74fa-8c67-9897-3f658abb5ae7@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1e9cb763e9bacf0c932aa948f50dcfca6f519a26 upstream.
The ENA adapters on our instances occasionally reset. Once recently
logged a UBSAN failure to console in the process:
UBSAN: shift-out-of-bounds in build/linux/drivers/net/ethernet/amazon/ena/ena_com.c:540:13
shift exponent 32 is too large for 32-bit type 'unsigned int'
CPU: 28 PID: 70012 Comm: kworker/u72:2 Kdump: loaded not tainted 5.15.117
Hardware name: Amazon EC2 c5d.9xlarge/, BIOS 1.0 10/16/2017
Workqueue: ena ena_fw_reset_device [ena]
Call Trace:
<TASK>
dump_stack_lvl+0x4a/0x63
dump_stack+0x10/0x16
ubsan_epilogue+0x9/0x36
__ubsan_handle_shift_out_of_bounds.cold+0x61/0x10e
? __const_udelay+0x43/0x50
ena_delay_exponential_backoff_us.cold+0x16/0x1e [ena]
wait_for_reset_state+0x54/0xa0 [ena]
ena_com_dev_reset+0xc8/0x110 [ena]
ena_down+0x3fe/0x480 [ena]
ena_destroy_device+0xeb/0xf0 [ena]
ena_fw_reset_device+0x30/0x50 [ena]
process_one_work+0x22b/0x3d0
worker_thread+0x4d/0x3f0
? process_one_work+0x3d0/0x3d0
kthread+0x12a/0x150
? set_kthread_struct+0x50/0x50
ret_from_fork+0x22/0x30
</TASK>
Apparently, the reset delays are getting so large they can trigger a
UBSAN panic.
Looking at the code, the current timeout is capped at 5000us. Using a
base value of 100us, the current code will overflow after (1<<29). Even
at values before 32, this function wraps around, perhaps
unintentionally.
Cap the value of the exponent used for this backoff at (1<<16) which is
larger than currently necessary, but large enough to support bigger
values in the future.
Cc: stable@vger.kernel.org
Fixes: 4bb7f4cf60e3 ("net: ena: reduce driver load time")
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Agroskin <shayagr@amazon.com>
Link: https://lore.kernel.org/r/20230711013621.GE1926@templeofstupid.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cc7eab25b1cf3f9594fe61142d3523ce4d14a788 upstream.
When moving devices from one namespace to another, mc addresses are
cleaned in software while not removed from application firmware. Thus
the mc addresses are remained and will cause resource leak.
Now use `__dev_mc_unsync` to clean mc addresses when closing port.
Fixes: e20aa071cd95 ("nfp: fix schedule in atomic context when sync mc address")
Cc: stable@vger.kernel.org
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Message-ID: <20230705052818.7122-1-louis.peens@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1b5ea7ffb7a3bdfffb4b7f40ce0d20a3372ee405 upstream.
With support for Ethernet PHY LEDs having been added, while
unregistering a MDIO bus and its child device liks PHYs there may be
"late" accesses to the MDIO bus. One typical use case is setting the PHY
LEDs brightness to OFF for instance.
We need to ensure that the MDIO bus controller remains entirely
functional since it runs off the main GENET adapter clock.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20230617155500.4005881-1-andrew@lunn.ch/
Fixes: 9a4e79697009 ("net: bcmgenet: utilize generic Broadcom UniMAC MDIO controller driver")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230622103107.1760280-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit aa846677a9fb19a0f2c58154c140398aa92a87ba ]
For some device types like TXGBE_ID_XAUI, *checksum computed in
txgbe_calc_eeprom_checksum() is larger than TXGBE_EEPROM_SUM. Remove the
limit on the size of *checksum.
Fixes: 049fe5365324 ("net: txgbe: Add operations to interact with firmware")
Fixes: 5e2ea7801fac ("net: txgbe: Fix unsigned comparison to zero in txgbe_calc_eeprom_checksum()")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://lore.kernel.org/r/20230711063414.3311-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8278ee2a2646b9acf747317895e47a640ba933c9 ]
Due to hardware limitation, MCAM drop rule with
ether_type == 802.1Q and vlan_id == 0 is not supported. Hence rejecting
such rules.
Fixes: dce677da57c0 ("octeontx2-pf: Add vlan-etype to ntuple filters")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Link: https://lore.kernel.org/r/20230710103027.2244139-1-sumang@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 56b3c6ba53d0e9649ea5e4089b39cadde13aaef8 ]
When the XDP feature is enabled and with heavy XDP frames to be
transmitted, there is a considerable probability that available
tx BDs are insufficient. This will lead to some XDP frames to be
discarded and the "NOT enough BD for SG!" error log will appear
in the console (as shown below).
[ 160.013112] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.023116] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.028926] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.038946] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.044758] fec 30be0000.ethernet eth0: NOT enough BD for SG!
In the case of heavy XDP traffic, sometimes the speed of recycling
tx BDs may be slower than the speed of sending XDP frames. There
may be several specific reasons, such as the interrupt is not
responsed in time, the efficiency of the NAPI callback function is
too low due to all the queues (tx queues and rx queues) share the
same NAPI, and so on.
After trying various methods, I think that increase the size of tx
BD ring is simple and effective. Maybe the best resolution is that
allocate NAPI for each queue to improve the efficiency of the NAPI
callback, but this change is a bit big and I didn't try this method.
Perheps this method will be implemented in a future patch.
This patch also updates the tx_wake_threshold of tx ring which is
related to the size of tx ring in the previous logic. Otherwise,
the tx_wake_threshold will be too high (403 BDs), which is more
likely to impact the slow path in the case of heavy XDP traffic,
because XDP path and slow path share the tx BD rings. According
to Jakub's suggestion, the tx_wake_threshold is at least equal to
tx_stop_threshold + 2 * MAX_SKB_FRAGS, if a queue of hundreds of
entries is overflowing, we should be able to apply a hysteresis
of a few tens of entries.
Fixes: 6d6b39f180b8 ("net: fec: add initial XDP support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 20f797399035a8052dbd7297fdbe094079a9482e ]
Once the XDP frames have been successfully transmitted through the
ndo_xdp_xmit() interface, it's the driver responsibility to free
the frames so that the page_pool can recycle the pages and reuse
them. However, this action is not implemented in the fec driver.
This leads to a user-visible problem that the console will print
the following warning log.
[ 157.568851] page_pool_release_retry() stalled pool shutdown 1389 inflight 60 sec
[ 217.983446] page_pool_release_retry() stalled pool shutdown 1389 inflight 120 sec
[ 278.399006] page_pool_release_retry() stalled pool shutdown 1389 inflight 181 sec
[ 338.812885] page_pool_release_retry() stalled pool shutdown 1389 inflight 241 sec
[ 399.226946] page_pool_release_retry() stalled pool shutdown 1389 inflight 302 sec
Therefore, to solve this issue, we free XDP frames via xdp_return_frame()
while cleaning the tx BD ring.
Fixes: 6d6b39f180b8 ("net: fec: add initial XDP support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bc638eabfed90fdc798fd5765e67e41abea76152 ]
The last_bdp is initialized to bdp, and both last_bdp and bdp are
not changed. That is to say that last_bdp and bdp are always equal.
So bdp can be used directly.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230529022615.669589-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stable-dep-of: 20f797399035 ("net: fec: recycle pages for transmitted XDP frames")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2ae9c66b04554bf5b3eeaab8c12a0bfb9f28ebde ]
This patch is a cleanup for fec driver. The fec_enet_reset_skb()
is used to free skb buffers for tx queues and is only invoked in
fec_restart(). However, fec_enet_bd_init() also resets skb buffers
and is invoked in fec_restart() too. So fec_enet_reset_skb() is
redundant and useless.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 20f797399035 ("net: fec: recycle pages for transmitted XDP frames")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0bcc62858d6ba62cbade957d69745e6adeed5f3d ]
The insertion of an empty frame was introduced with
commit db0b124f02ba ("igc: Enhance Qbv scheduling by using first flag bit")
in order to ensure that the current cycle has at least one packet if
there is some packet to be scheduled for the next cycle.
However, the current implementation does not properly check if
a packet is already scheduled for the current cycle. Currently,
an empty packet is always inserted if and only if
txtime >= end_of_cycle && txtime > last_tx_cycle
but since last_tx_cycle is always either the end of the current
cycle (end_of_cycle) or the end of a previous cycle, the
second part (txtime > last_tx_cycle) is always true unless
txtime == last_tx_cycle.
What actually needs to be checked here is if the last_tx_cycle
was already written within the current cycle, so an empty frame
should only be inserted if and only if
txtime >= end_of_cycle && end_of_cycle > last_tx_cycle.
This patch does not only avoid an unnecessary insertion, but it
can actually be harmful to insert an empty packet if packets
are already scheduled in the current cycle, because it can lead
to a situation where the empty packet is actually processed
as the first packet in the upcoming cycle shifting the packet
with the first_flag even one cycle into the future, finally leading
to a TX hang.
The TX hang can be reproduced on a i225 with:
sudo tc qdisc replace dev enp1s0 parent root handle 100 taprio \
num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 \
base-time 0 \
sched-entry S 01 300000 \
flags 0x1 \
txtime-delay 500000 \
clockid CLOCK_TAI
sudo tc qdisc replace dev enp1s0 parent 100:1 etf \
clockid CLOCK_TAI \
delta 500000 \
offload \
skip_sock_check
and traffic generator
sudo trafgen -i traffic.cfg -o enp1s0 --cpp -n0 -q -t1400ns
with traffic.cfg
#define ETH_P_IP 0x0800
{
/* Ethernet Header */
0x30, 0x1f, 0x9a, 0xd0, 0xf0, 0x0e, # MAC Dest - adapt as needed
0x24, 0x5e, 0xbe, 0x57, 0x2e, 0x36, # MAC Src - adapt as needed
const16(ETH_P_IP),
/* IPv4 Header */
0b01000101, 0, # IPv4 version, IHL, TOS
const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header))
const16(2), # IPv4 ident
0b01000000, 0, # IPv4 flags, fragmentation off
64, # IPv4 TTL
17, # Protocol UDP
csumip(14, 33), # IPv4 checksum
/* UDP Header */
10, 0, 48, 1, # IP Src - adapt as needed
10, 0, 48, 10, # IP Dest - adapt as needed
const16(5555), # UDP Src Port
const16(6666), # UDP Dest Port
const16(1008), # UDP length (UDP header 8 bytes + payload length)
csumudp(14, 34), # UDP checksum
/* Payload */
fill('W', 1000),
}
and the observed message with that is for example
igc 0000:01:00.0 enp1s0: Detected Tx Unit Hang
Tx Queue <0>
TDH <32>
TDT <3c>
next_to_use <3c>
next_to_clean <32>
buffer_info[next_to_clean]
time_stamp <ffff26a8>
next_to_watch <00000000632a1828>
jiffies <ffff27f8>
desc.status <1048000>
Fixes: db0b124f02ba ("igc: Enhance Qbv scheduling by using first flag bit")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c1bca9ac0bcb355be11354c2e68bc7bf31f5ac5a ]
It is possible (verified on a running system) that frames are processed
by igc_tx_launchtime with a txtime before the start of the cycle
(baset_est).
However, the result of txtime - baset_est is written into a u32,
leading to a wrap around to a positive number. The following
launchtime > 0 check will only branch to executing launchtime = 0
if launchtime is already 0.
Fix it by using a s32 before checking launchtime > 0.
Fixes: db0b124f02ba ("igc: Enhance Qbv scheduling by using first flag bit")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8b86f10ab64eca0287ea8f7c94e9ad8b2e101c01 ]
The flags IGC_TXQCTL_STRICT_CYCLE and IGC_TXQCTL_STRICT_END
prevent the packet transmission over slot and cycle boundaries.
This is important for taprio offload where the slots and
cycles correspond to the slots and cycles configured for the
network.
However, the Qbv offload feature of the i225 is also used for
enabling TX launchtime / ETF offload. In that case, however,
the cycle has no meaning for the network and is only used
internally to adapt the base time register after a second has
passed.
Enabling strict mode in this case would unnecessarily prevent
the transmission of certain packets (i.e. at the boundary of a
second) and thus interferes with the ETF qdisc that promises
transmission at a certain point in time.
Similar to ETF, this also applies to CBS offload that also should
not be influenced by strict mode unless taprio offload would be
enabled at the same time.
This fully reverts
commit d8f45be01dd9 ("igc: Use strict cycles for Qbv scheduling")
but its commit message only describes what was already implemented
before that commit. The difference to a plain revert of that commit
is that it now copes with the base_time = 0 case that was fixed with
commit e17090eb2494 ("igc: allow BaseTime 0 enrollment for Qbv")
In particular, enabling strict mode leads to TX hang situations
under high traffic if taprio is applied WITHOUT taprio offload
but WITH ETF offload, e.g. as in
sudo tc qdisc replace dev enp1s0 parent root handle 100 taprio \
num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 \
base-time 0 \
sched-entry S 01 300000 \
flags 0x1 \
txtime-delay 500000 \
clockid CLOCK_TAI
sudo tc qdisc replace dev enp1s0 parent 100:1 etf \
clockid CLOCK_TAI \
delta 500000 \
offload \
skip_sock_check
and traffic generator
sudo trafgen -i traffic.cfg -o enp1s0 --cpp -n0 -q -t1400ns
with traffic.cfg
#define ETH_P_IP 0x0800
{
/* Ethernet Header */
0x30, 0x1f, 0x9a, 0xd0, 0xf0, 0x0e, # MAC Dest - adapt as needed
0x24, 0x5e, 0xbe, 0x57, 0x2e, 0x36, # MAC Src - adapt as needed
const16(ETH_P_IP),
/* IPv4 Header */
0b01000101, 0, # IPv4 version, IHL, TOS
const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header))
const16(2), # IPv4 ident
0b01000000, 0, # IPv4 flags, fragmentation off
64, # IPv4 TTL
17, # Protocol UDP
csumip(14, 33), # IPv4 checksum
/* UDP Header */
10, 0, 48, 1, # IP Src - adapt as needed
10, 0, 48, 10, # IP Dest - adapt as needed
const16(5555), # UDP Src Port
const16(6666), # UDP Dest Port
const16(1008), # UDP length (UDP header 8 bytes + payload length)
csumudp(14, 34), # UDP checksum
/* Payload */
fill('W', 1000),
}
and the observed message with that is for example
igc 0000:01:00.0 enp1s0: Detected Tx Unit Hang
Tx Queue <0>
TDH <d0>
TDT <f0>
next_to_use <f0>
next_to_clean <d0>
buffer_info[next_to_clean]
time_stamp <ffff661f>
next_to_watch <00000000245a4efb>
jiffies <ffff6e48>
desc.status <1048000>
Fixes: d8f45be01dd9 ("igc: Use strict cycles for Qbv scheduling")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e5d88c53d03f8df864776431175d08c053645f50 ]
Since commit e17090eb2494 ("igc: allow BaseTime 0 enrollment for Qbv")
it is possible to enable taprio offload with a basetime of 0.
However, the check if taprio offload is already enabled (and thus -EALREADY
should be returned for igc_save_qbv_schedule) still relied on
adapter->base_time > 0.
This can be reproduced as follows:
# TAPRIO offload (flags == 0x2) and base-time = 0
sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \
num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 \
base-time 0 \
sched-entry S 01 300000 \
flags 0x2
# The second call should fail with "Error: Device failed to setup taprio offload."
# But that only happens if base-time was != 0
sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \
num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 \
base-time 0 \
sched-entry S 01 300000 \
flags 0x2
Fixes: e17090eb2494 ("igc: allow BaseTime 0 enrollment for Qbv")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|