summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/sfc
AgeCommit message (Collapse)AuthorFilesLines
2022-12-02sfc: fix potential memleak in __ef100_hard_start_xmit()Zhang Changzhong1-0/+1
[ Upstream commit aad98abd5cb8133507f22654f56bcb443aaa2d89 ] The __ef100_hard_start_xmit() returns NETDEV_TX_OK without freeing skb in error handling case, add dev_kfree_skb_any() to fix it. Fixes: 51b35a454efd ("sfc: skeleton EF100 PF driver") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/1668671409-10909-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-29sfc: include vport_id in filter spec hash and equal()Pieter Jansen van Vuuren2-6/+7
[ Upstream commit c2bf23e4a5af37a4d77901d9ff14c50a269f143d ] Filters on different vports are qualified by different implicit MACs and/or VLANs, so shouldn't be considered equal even if their other match fields are identical. Fixes: 7c460d9be610 ("sfc: Extend and abstract efx_filter_spec to cover Huntington/EF10") Co-developed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/20221018092841.32206-1-pieter.jansen-van-vuuren@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-29sfc: Change VF mac via PF as first preference if available.Jonathan Cooper1-34/+24
[ Upstream commit a8aed7b35becfd21f22a77c7014029ea837b018f ] Changing a VF's mac address through the VF (rather than via the PF) fails with EPERM because the latter part of efx_ef10_set_mac_address attempts to change the vport mac address list as the VF. Even with this fixed it still fails with EBUSY because the vadaptor is still assigned on the VF - the vadaptor reassignment must be within a section where the VF has torn down its state. A major reason this has broken is because we have two functions that ostensibly do the same thing - have a PF and VF cooperate to change a VF mac address. Rather than do this, if we are changing the mac of a VF that has a link to the PF in the same VM then simply call sriov_set_vf_mac instead, which is a proven working function that does that. If there is no PF available, or that fails non-fatally, then attempt to change the VF's mac address as we would a PF, without updating the PF's data. Test case: Create a VF: echo 1 > /sys/class/net/<if>/device/sriov_numvfs Set the mac address of the VF directly: ip link set <vf> addr 00:11:22:33:44:55 Set the MAC address of the VF via the PF: ip link set <pf> vf 0 mac 00:11:22:33:44:66 Without this patch the last command will fail with ENOENT. Signed-off-by: Jonathan Cooper <jonathan.s.cooper@amd.com> Reported-by: Íñigo Huguet <ihuguet@redhat.com> Fixes: 910c8789a777 ("set the MAC address using MC_CMD_VADAPTOR_SET_MAC") Acked-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-28sfc: fix null pointer dereference in efx_hard_start_xmitÍñigo Huguet1-1/+1
[ Upstream commit 0a242eb2913a4aa3d6fbdb86559f27628e9466f3 ] Trying to get the channel from the tx_queue variable here is wrong because we can only be here if tx_queue is NULL, so we shouldn't dereference it. As the above comment in the code says, this is very unlikely to happen, but it's wrong anyway so let's fix it. I hit this issue because of a different bug that caused tx_queue to be NULL. If that happens, this is the error message that we get here: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [...] RIP: 0010:efx_hard_start_xmit+0x153/0x170 [sfc] Fixes: 12804793b17c ("sfc: decouple TXQ type from label") Reported-by: Tianhao Zhao <tizhao@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/20220914111135.21038-1-ihuguet@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-28sfc: fix TX channel offset when using legacy interruptsÍñigo Huguet1-1/+1
[ Upstream commit f232af4295653afa4ade3230462b3be15ad16419 ] In legacy interrupt mode the tx_channel_offset was hardcoded to 1, but that's not correct if efx_sepparate_tx_channels is false. In that case, the offset is 0 because the tx queues are in the single existing channel at index 0, together with the rx queue. Without this fix, as soon as you try to send any traffic, it tries to get the tx queues from an uninitialized channel getting these errors: WARNING: CPU: 1 PID: 0 at drivers/net/ethernet/sfc/tx.c:540 efx_hard_start_xmit+0x12e/0x170 [sfc] [...] RIP: 0010:efx_hard_start_xmit+0x12e/0x170 [sfc] [...] Call Trace: <IRQ> dev_hard_start_xmit+0xd7/0x230 sch_direct_xmit+0x9f/0x360 __dev_queue_xmit+0x890/0xa40 [...] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [...] RIP: 0010:efx_hard_start_xmit+0x153/0x170 [sfc] [...] Call Trace: <IRQ> dev_hard_start_xmit+0xd7/0x230 sch_direct_xmit+0x9f/0x360 __dev_queue_xmit+0x890/0xa40 [...] Fixes: c308dfd1b43e ("sfc: fix wrong tx channel offset with efx_separate_tx_channels") Reported-by: Tianhao Zhao <tizhao@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/20220914103648.16902-1-ihuguet@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03sfc: disable softirqs for ptp TXAlejandro Lucero1-0/+22
[ Upstream commit 67c3b611d92fc238c43734878bc3e232ab570c79 ] Sending a PTP packet can imply to use the normal TX driver datapath but invoked from the driver's ptp worker. The kernel generic TX code disables softirqs and preemption before calling specific driver TX code, but the ptp worker does not. Although current ptp driver functionality does not require it, there are several reasons for doing so: 1) The invoked code is always executed with softirqs disabled for non PTP packets. 2) Better if a ptp packet transmission is not interrupted by softirq handling which could lead to high latencies. 3) netdev_xmit_more used by the TX code requires preemption to be disabled. Indeed a solution for dealing with kernel preemption state based on static kernel configuration is not possible since the introduction of dynamic preemption level configuration at boot time using the static calls functionality. Fixes: f79c957a0b537 ("drivers: net: sfc: use netdev_xmit_more helper") Signed-off-by: Alejandro Lucero <alejandro.lucero-palau@amd.com> Link: https://lore.kernel.org/r/20220726064504.49613-1-alejandro.lucero-palau@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-21sfc: fix kernel panic when creating VFÍñigo Huguet1-0/+3
[ Upstream commit ada74c5539eba06cf8b47d068f92e0b3963a9a6e ] When creating VFs a kernel panic can happen when calling to efx_ef10_try_update_nic_stats_vf. When releasing a DMA coherent buffer, sometimes, I don't know in what specific circumstances, it has to unmap memory with vunmap. It is disallowed to do that in IRQ context or with BH disabled. Otherwise, we hit this line in vunmap, causing the crash: BUG_ON(in_interrupt()); This patch reenables BH to release the buffer. Log messages when the bug is hit: kernel BUG at mm/vmalloc.c:2727! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 6 PID: 1462 Comm: NetworkManager Kdump: loaded Tainted: G I --------- --- 5.14.0-119.el9.x86_64 #1 Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS 2.8.2 08/27/2020 RIP: 0010:vunmap+0x2e/0x30 ...skip... Call Trace: __iommu_dma_free+0x96/0x100 efx_nic_free_buffer+0x2b/0x40 [sfc] efx_ef10_try_update_nic_stats_vf+0x14a/0x1c0 [sfc] efx_ef10_update_stats_vf+0x18/0x40 [sfc] efx_start_all+0x15e/0x1d0 [sfc] efx_net_open+0x5a/0xe0 [sfc] __dev_open+0xe7/0x1a0 __dev_change_flags+0x1d7/0x240 dev_change_flags+0x21/0x60 ...skip... Fixes: d778819609a2 ("sfc: DMA the VF stats only when requested") Reported-by: Ma Yuying <yuma@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/20220713092116.21238-1-ihuguet@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-21sfc: fix use after free when disabling sriovÍñigo Huguet1-3/+7
[ Upstream commit ebe41da5d47ac0fff877e57bd14c54dccf168827 ] Use after free is detected by kfence when disabling sriov. What was read after being freed was vf->pci_dev: it was freed from pci_disable_sriov and later read in efx_ef10_sriov_free_vf_vports, called from efx_ef10_sriov_free_vf_vswitching. Set the pointer to NULL at release time to not trying to read it later. Reproducer and dmesg log (note that kfence doesn't detect it every time): $ echo 1 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs $ echo 0 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs BUG: KFENCE: use-after-free read in efx_ef10_sriov_free_vf_vswitching+0x82/0x170 [sfc] Use-after-free read at 0x00000000ff3c1ba5 (in kfence-#224): efx_ef10_sriov_free_vf_vswitching+0x82/0x170 [sfc] efx_ef10_pci_sriov_disable+0x38/0x70 [sfc] efx_pci_sriov_configure+0x24/0x40 [sfc] sriov_numvfs_store+0xfe/0x140 kernfs_fop_write_iter+0x11c/0x1b0 new_sync_write+0x11f/0x1b0 vfs_write+0x1eb/0x280 ksys_write+0x5f/0xe0 do_syscall_64+0x5c/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae kfence-#224: 0x00000000edb8ef95-0x00000000671f5ce1, size=2792, cache=kmalloc-4k allocated by task 6771 on cpu 10 at 3137.860196s: pci_alloc_dev+0x21/0x60 pci_iov_add_virtfn+0x2a2/0x320 sriov_enable+0x212/0x3e0 efx_ef10_sriov_configure+0x67/0x80 [sfc] efx_pci_sriov_configure+0x24/0x40 [sfc] sriov_numvfs_store+0xba/0x140 kernfs_fop_write_iter+0x11c/0x1b0 new_sync_write+0x11f/0x1b0 vfs_write+0x1eb/0x280 ksys_write+0x5f/0xe0 do_syscall_64+0x5c/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae freed by task 6771 on cpu 12 at 3170.991309s: device_release+0x34/0x90 kobject_cleanup+0x3a/0x130 pci_iov_remove_virtfn+0xd9/0x120 sriov_disable+0x30/0xe0 efx_ef10_pci_sriov_disable+0x57/0x70 [sfc] efx_pci_sriov_configure+0x24/0x40 [sfc] sriov_numvfs_store+0xfe/0x140 kernfs_fop_write_iter+0x11c/0x1b0 new_sync_write+0x11f/0x1b0 vfs_write+0x1eb/0x280 ksys_write+0x5f/0xe0 do_syscall_64+0x5c/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 3c5eb87605e85 ("sfc: create vports for VFs and assign random MAC addresses") Reported-by: Yanghang Liu <yanghliu@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/20220712062642.6915-1-ihuguet@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14sfc: fix wrong tx channel offset with efx_separate_tx_channelsÍñigo Huguet1-4/+2
[ Upstream commit c308dfd1b43ef0d4c3e57b741bb3462eb7a7f4a2 ] tx_channel_offset is calculated in efx_allocate_msix_channels, but it is also calculated again in efx_set_channels because it was originally done there, and when efx_allocate_msix_channels was introduced it was forgotten to be removed from efx_set_channels. Moreover, the old calculation is wrong when using efx_separate_tx_channels because now we can have XDP channels after the TX channels, so n_channels - n_tx_channels doesn't point to the first TX channel. Remove the old calculation from efx_set_channels, and add the initialization of this variable if MSI or legacy interrupts are used, next to the initialization of the rest of the related variables, where it was missing. Fixes: 3990a8fffbda ("sfc: allocate channels for XDP tx queues") Reported-by: Tianhao Zhao <tizhao@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14sfc: fix considering that all channels have TX queuesMartin Habets1-1/+1
[ Upstream commit 2e102b53f8a778f872dc137f4c7ac548705817aa ] Normally, all channels have RX and TX queues, but this is not true if modparam efx_separate_tx_channels=1 is used. In that cases, some channels only have RX queues and others only TX queues (or more preciselly, they have them allocated, but not initialized). Fix efx_channel_has_tx_queues to return the correct value for this case too. Messages shown at probe time before the fix: sfc 0000:03:00.0 ens6f0np0: MC command 0x82 inlen 544 failed rc=-22 (raw=0) arg=0 ------------[ cut here ]------------ netdevice: ens6f0np0: failed to initialise TXQ -1 WARNING: CPU: 1 PID: 626 at drivers/net/ethernet/sfc/ef10.c:2393 efx_ef10_tx_init+0x201/0x300 [sfc] [...] stripped RIP: 0010:efx_ef10_tx_init+0x201/0x300 [sfc] [...] stripped Call Trace: efx_init_tx_queue+0xaa/0xf0 [sfc] efx_start_channels+0x49/0x120 [sfc] efx_start_all+0x1f8/0x430 [sfc] efx_net_open+0x5a/0xe0 [sfc] __dev_open+0xd0/0x190 __dev_change_flags+0x1b3/0x220 dev_change_flags+0x21/0x60 [...] stripped Messages shown at remove time before the fix: sfc 0000:03:00.0 ens6f0np0: failed to flush 10 queues sfc 0000:03:00.0 ens6f0np0: failed to flush queues Fixes: 8700aff08984 ("sfc: fix channel allocation with brute force") Reported-by: Tianhao Zhao <tizhao@redhat.com> Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Tested-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09sfc: ef10: Fix assigning negative value to unsigned variableHaowen Bai1-1/+1
[ Upstream commit b8ff3395fbdf3b79a99d0ef410fc34c51044121e ] fix warning reported by smatch: 251 drivers/net/ethernet/sfc/ef10.c:2259 efx_ef10_tx_tso_desc() warn: assigning (-208) to unsigned variable 'ip_tot_len' Signed-off-by: Haowen Bai <baihaowen@meizu.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/1649640757-30041-1-git-send-email-baihaowen@meizu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-18net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe()Taehee Yoo1-0/+5
[ Upstream commit 1fa89ffbc04545b7582518e57f4b63e2a062870f ] In the NIC ->probe() callback, ->mtd_probe() callback is called. If NIC has 2 ports, ->probe() is called twice and ->mtd_probe() too. In the ->mtd_probe(), which is efx_ef10_mtd_probe() it allocates and initializes mtd partiion. But mtd partition for sfc is shared data. So that allocated mtd partition data from last called efx_ef10_mtd_probe() will not be used. Therefore it must be freed. But it doesn't free a not used mtd partition data in efx_ef10_mtd_probe(). kmemleak reports: unreferenced object 0xffff88811ddb0000 (size 63168): comm "systemd-udevd", pid 265, jiffies 4294681048 (age 348.586s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffffa3767749>] kmalloc_order_trace+0x19/0x120 [<ffffffffa3873f0e>] __kmalloc+0x20e/0x250 [<ffffffffc041389f>] efx_ef10_mtd_probe+0x11f/0x270 [sfc] [<ffffffffc0484c8a>] efx_pci_probe.cold.17+0x3df/0x53d [sfc] [<ffffffffa414192c>] local_pci_probe+0xdc/0x170 [<ffffffffa4145df5>] pci_device_probe+0x235/0x680 [<ffffffffa443dd52>] really_probe+0x1c2/0x8f0 [<ffffffffa443e72b>] __driver_probe_device+0x2ab/0x460 [<ffffffffa443e92a>] driver_probe_device+0x4a/0x120 [<ffffffffa443f2ae>] __driver_attach+0x16e/0x320 [<ffffffffa4437a90>] bus_for_each_dev+0x110/0x190 [<ffffffffa443b75e>] bus_add_driver+0x39e/0x560 [<ffffffffa4440b1e>] driver_register+0x18e/0x310 [<ffffffffc02e2055>] 0xffffffffc02e2055 [<ffffffffa3001af3>] do_one_initcall+0xc3/0x450 [<ffffffffa33ca574>] do_init_module+0x1b4/0x700 Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Fixes: 8127d661e77f ("sfc: Add support for Solarflare SFC9100 family") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://lore.kernel.org/r/20220512054709.12513-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-18net: sfc: fix memory leak due to ptp channelTaehee Yoo3-2/+20
[ Upstream commit 49e6123c65dac6393b04f39ceabf79c44f66b8be ] It fixes memory leak in ring buffer change logic. When ring buffer size is changed(ethtool -G eth0 rx 4096), sfc driver works like below. 1. stop all channels and remove ring buffers. 2. allocates new buffer array. 3. allocates rx buffers. 4. start channels. While the above steps are working, it skips some steps if the channel doesn't have a ->copy callback function. Due to ptp channel doesn't have ->copy callback, these above steps are skipped for ptp channel. It eventually makes some problems. a. ptp channel's ring buffer size is not changed, it works only 1024(default). b. memory leak. The reason for memory leak is to use the wrong ring buffer values. There are some values, which is related to ring buffer size. a. efx->rxq_entries - This is global value of rx queue size. b. rx_queue->ptr_mask - used for access ring buffer as circular ring. - roundup_pow_of_two(efx->rxq_entries) - 1 c. rx_queue->max_fill - efx->rxq_entries - EFX_RXD_HEAD_ROOM These all values should be based on ring buffer size consistently. But ptp channel's values are not. a. efx->rxq_entries - This is global(for sfc) value, always new ring buffer size. b. rx_queue->ptr_mask - This is always 1023(default). c. rx_queue->max_fill - This is new ring buffer size - EFX_RXD_HEAD_ROOM. Let's assume we set 4096 for rx ring buffer, normal channel ptp channel efx->rxq_entries 4096 4096 rx_queue->ptr_mask 4095 1023 rx_queue->max_fill 4086 4086 sfc driver allocates rx ring buffers based on these values. When it allocates ptp channel's ring buffer, 4086 ring buffers are allocated then, these buffers are attached to the allocated array. But ptp channel's ring buffer array size is still 1024(default) and ptr_mask is still 1023 too. So, 3062 ring buffers will be overwritten to the array. This is the reason for memory leak. Test commands: ethtool -G <interface name> rx 4096 while : do ip link set <interface name> up ip link set <interface name> down done In order to avoid this problem, it adds ->copy callback to ptp channel type. So that rx_queue->ptr_mask value will be updated correctly. Fixes: 7c236c43b838 ("sfc: Add support for IEEE-1588 PTP") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-18sfc: Use swap() instead of open coding itJiapeng Chong1-10/+4
[ Upstream commit 0cf765fb00ce083c017f2571ac449cf7912cdb06 ] Clean the following coccicheck warning: ./drivers/net/ethernet/sfc/efx_channels.c:870:36-37: WARNING opportunity for swap(). ./drivers/net/ethernet/sfc/efx_channels.c:824:36-37: WARNING opportunity for swap(). Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13net: sfc: fix using uninitialized xdp tx_queueTaehee Yoo3-1/+6
[ Upstream commit fb5833d81e4333294add35d3ac7f7f52a7bf107f ] In some cases, xdp tx_queue can get used before initialization. 1. interface up/down 2. ring buffer size change When CPU cores are lower than maximum number of channels of sfc driver, it creates new channels only for XDP. When an interface is up or ring buffer size is changed, all channels are initialized. But xdp channels are always initialized later. So, the below scenario is possible. Packets are received to rx queue of normal channels and it is acted XDP_TX and tx_queue of xdp channels get used. But these tx_queues are not initialized yet. If so, TX DMA or queue error occurs. In order to avoid this problem. 1. initializes xdp tx_queues earlier than other rx_queue in efx_start_channels(). 2. checks whether tx_queue is initialized or not in efx_xdp_tx_buffers(). Splat looks like: sfc 0000:08:00.1 enp8s0f1np1: TX queue 10 spurious TX completion id 250 sfc 0000:08:00.1 enp8s0f1np1: resetting (RECOVER_OR_ALL) sfc 0000:08:00.1 enp8s0f1np1: MC command 0x80 inlen 100 failed rc=-22 (raw=22) arg=789 sfc 0000:08:00.1 enp8s0f1np1: has been disabled Fixes: f28100cb9c96 ("sfc: fix lack of XDP TX queues - error XDP TX failed (-22)") Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13sfc: Do not free an empty page_ringMartin Habets1-0/+3
[ Upstream commit 458f5d92df4807e2a7c803ed928369129996bf96 ] When the page_ring is not used page_ptr_mask is 0. Do not dereference page_ring[0] in this case. Fixes: 2768935a4660 ("sfc: reuse pages to avoid DMA mapping/unmapping costs") Reported-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13net: sfc: add missing xdp queue reinitializationTaehee Yoo1-65/+81
[ Upstream commit 059a47f1da93811d37533556d67e72f2261b1127 ] After rx/tx ring buffer size is changed, kernel panic occurs when it acts XDP_TX or XDP_REDIRECT. When tx/rx ring buffer size is changed(ethtool -G), sfc driver reallocates and reinitializes rx and tx queues and their buffer (tx_queue->buffer). But it misses reinitializing xdp queues(efx->xdp_tx_queues). So, while it is acting XDP_TX or XDP_REDIRECT, it uses the uninitialized tx_queue->buffer. A new function efx_set_xdp_channels() is separated from efx_set_channels() to handle only xdp queues. Splat looks like: BUG: kernel NULL pointer dereference, address: 000000000000002a #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#4] PREEMPT SMP NOPTI RIP: 0010:efx_tx_map_chunk+0x54/0x90 [sfc] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D 5.17.0+ #55 e8beeee8289528f11357029357cf Code: 48 8b 8d a8 01 00 00 48 8d 14 52 4c 8d 2c d0 44 89 e0 48 85 c9 74 0e 44 89 e2 4c 89 f6 48 80 RSP: 0018:ffff92f121e45c60 EFLAGS: 00010297 RIP: 0010:efx_tx_map_chunk+0x54/0x90 [sfc] RAX: 0000000000000040 RBX: ffff92ea506895c0 RCX: ffffffffc0330870 RDX: 0000000000000001 RSI: 00000001139b10ce RDI: ffff92ea506895c0 RBP: ffffffffc0358a80 R08: 00000001139b110d R09: 0000000000000000 R10: 0000000000000001 R11: ffff92ea414c0088 R12: 0000000000000040 R13: 0000000000000018 R14: 00000001139b10ce R15: ffff92ea506895c0 FS: 0000000000000000(0000) GS:ffff92f121ec0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Code: 48 8b 8d a8 01 00 00 48 8d 14 52 4c 8d 2c d0 44 89 e0 48 85 c9 74 0e 44 89 e2 4c 89 f6 48 80 CR2: 000000000000002a CR3: 00000003e6810004 CR4: 00000000007706e0 RSP: 0018:ffff92f121e85c60 EFLAGS: 00010297 PKRU: 55555554 RAX: 0000000000000040 RBX: ffff92ea50689700 RCX: ffffffffc0330870 RDX: 0000000000000001 RSI: 00000001145a90ce RDI: ffff92ea50689700 RBP: ffffffffc0358a80 R08: 00000001145a910d R09: 0000000000000000 R10: 0000000000000001 R11: ffff92ea414c0088 R12: 0000000000000040 R13: 0000000000000018 R14: 00000001145a90ce R15: ffff92ea50689700 FS: 0000000000000000(0000) GS:ffff92f121e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000002a CR3: 00000003e6810005 CR4: 00000000007706e0 PKRU: 55555554 Call Trace: <IRQ> efx_xdp_tx_buffers+0x12b/0x3d0 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] __efx_rx_packet+0x5c3/0x930 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] efx_rx_packet+0x28c/0x2e0 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] efx_ef10_ev_process+0x5f8/0xf40 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] ? enqueue_task_fair+0x95/0x550 efx_poll+0xc4/0x360 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] Fixes: 3990a8fffbda ("sfc: allocate channels for XDP tx queues") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-19sfc: extend the locking on mcdi->seqnoNiels Dossche1-1/+1
[ Upstream commit f1fb205efb0ccca55626fd4ef38570dd16b44719 ] seqno could be read as a stale value outside of the lock. The lock is already acquired to protect the modification of seqno against a possible race condition. Place the reading of this value also inside this locking to protect it against a possible race condition. Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-11sfc: The RX page_ring is optionalMartin Habets2-0/+10
commit 1d5a474240407c38ca8c7484a656ee39f585399c upstream. The RX page_ring is an optional feature that improves performance. When allocation fails the driver can still function, but possibly with a lower bandwidth. Guard against dereferencing a NULL page_ring. Fixes: 2768935a4660 ("sfc: reuse pages to avoid DMA mapping/unmapping costs") Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Reported-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Link: https://lore.kernel.org/r/164111288276.5798.10330502993729113868.stgit@palantir17.mph.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-29sfc: falcon: Check null pointer of rx_queue->page_ringJiasheng Jiang1-1/+4
[ Upstream commit 9b8bdd1eb5890aeeab7391dddcf8bd51f7b07216 ] Because of the possible failure of the kcalloc, it should be better to set rx_queue->page_ptr_mask to 0 when it happens in order to maintain the consistency. Fixes: 5a6681e22c14 ("sfc: separate out SFC4000 ("Falcon") support into new sfc-falcon driver") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/20211220140344.978408-1-jiasheng@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-29sfc: Check null pointer of rx_queue->page_ringJiasheng Jiang1-1/+4
[ Upstream commit bdf1b5c3884f6a0dc91b0dbdb8c3b7d205f449e0 ] Because of the possible failure of the kcalloc, it should be better to set rx_queue->page_ptr_mask to 0 when it happens in order to maintain the consistency. Fixes: 5a6681e22c14 ("sfc: separate out SFC4000 ("Falcon") support into new sfc-falcon driver") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/20211220135603.954944-1-jiasheng@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-22sfc_ef100: potential dereference of null pointerJiasheng Jiang1-0/+3
[ Upstream commit 407ecd1bd726f240123f704620d46e285ff30dd9 ] The return value of kmalloc() needs to be checked. To avoid use in efx_nic_update_stats() in case of the failure of alloc. Fixes: b593b6f1b492 ("sfc_ef100: statistics gathering") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-06sfc: Fix reading non-legacy supported link modesErik Ekman1-8/+2
commit 041c61488236a5a84789083e3d9f0a51139b6edf upstream. Everything except the first 32 bits was lost when the pause flags were added. This makes the 50000baseCR2 mode flag (bit 34) not appear. I have tested this with a 10G card (SFN5122F-R7) by modifying it to return a non-legacy link mode (10000baseCR). Signed-off-by: Erik Ekman <erik@kryo.se> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-21sfc: Don't use netif_info before net_device setupErik Ekman2-3/+3
Use pci_info instead to avoid unnamed/uninitialized noise: [197088.688729] sfc 0000:01:00.0: Solarflare NIC detected [197088.690333] sfc 0000:01:00.0: Part Number : SFN5122F [197088.729061] sfc 0000:01:00.0 (unnamed net_device) (uninitialized): no SR-IOV VFs probed [197088.729071] sfc 0000:01:00.0 (unnamed net_device) (uninitialized): no PTP support Inspired by fa44821a4ddd ("sfc: don't use netif_info et al before net_device is registered") from Heiner Kallweit. Signed-off-by: Erik Ekman <erik@kryo.se> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21sfc: Export fibre-specific supported link modesErik Ekman1-11/+26
The 1/10GbaseT modes were set up for cards with SFP+ cages in 3497ed8c852a5 ("sfc: report supported link speeds on SFP connections"). 10GbaseT was likely used since no 10G fibre mode existed. The missing fibre modes for 1/10G were added to ethtool.h in 5711a9822144 ("net: ethtool: add support for 1000BaseX and missing 10G link modes") shortly thereafter. The user guide available at https://support-nic.xilinx.com/wp/drivers lists support for the following cable and transceiver types in section 2.9: - QSFP28 100G Direct Attach Cables - QSFP28 100G SR Optical Transceivers (with SR4 modules listed) - SFP28 25G Direct Attach Cables - SFP28 25G SR Optical Transceivers - QSFP+ 40G Direct Attach Cables - QSFP+ 40G Active Optical Cables - QSFP+ 40G SR4 Optical Transceivers - QSFP+ to SFP+ Breakout Direct Attach Cables - QSFP+ to SFP+ Breakout Active Optical Cables - SFP+ 10G Direct Attach Cables - SFP+ 10G SR Optical Transceivers - SFP+ 10G LR Optical Transceivers - SFP 1000BASE‐T Transceivers - 1G Optical Transceivers (From user guide issue 28. Issue 16 which also includes older cards like SFN5xxx/SFN6xxx has matching lists for 1/10/40G transceiver types.) Regarding SFP+ 10GBASE‐T transceivers the latest guide says: "Solarflare adapters do not support 10GBASE‐T transceiver modules." Tested using SFN5122F-R7 (with 2 SFP+ ports). Supported link modes do not change depending on module used (tested with 1000BASE-T, 1000BASE-BX10, 10GBASE-LR). Before: $ ethtool ext Settings for ext: Supported ports: [ FIBRE ] Supported link modes: 1000baseT/Full 10000baseT/Full Supported pause frame use: Symmetric Receive-only Supports auto-negotiation: No Supported FEC modes: Not reported Advertised link modes: Not reported Advertised pause frame use: No Advertised auto-negotiation: No Advertised FEC modes: Not reported Link partner advertised link modes: Not reported Link partner advertised pause frame use: No Link partner advertised auto-negotiation: No Link partner advertised FEC modes: Not reported Speed: 1000Mb/s Duplex: Full Auto-negotiation: off Port: FIBRE PHYAD: 255 Transceiver: internal Current message level: 0x000020f7 (8439) drv probe link ifdown ifup rx_err tx_err hw Link detected: yes After: $ ethtool ext Settings for ext: Supported ports: [ FIBRE ] Supported link modes: 1000baseT/Full 1000baseX/Full 10000baseCR/Full 10000baseSR/Full 10000baseLR/Full Supported pause frame use: Symmetric Receive-only Supports auto-negotiation: No Supported FEC modes: Not reported Advertised link modes: Not reported Advertised pause frame use: No Advertised auto-negotiation: No Advertised FEC modes: Not reported Link partner advertised link modes: Not reported Link partner advertised pause frame use: No Link partner advertised auto-negotiation: No Link partner advertised FEC modes: Not reported Speed: 1000Mb/s Duplex: Full Auto-negotiation: off Port: FIBRE PHYAD: 255 Transceiver: internal Supports Wake-on: g Wake-on: d Current message level: 0x000020f7 (8439) drv probe link ifdown ifup rx_err tx_err hw Link detected: yes Signed-off-by: Erik Ekman <erik@kryo.se> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-16Merge tag 'net-5.15-rc2' of ↵Linus Torvalds3-40/+103
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf. Current release - regressions: - vhost_net: fix OoB on sendmsg() failure - mlx5: bridge, fix uninitialized variable usage - bnxt_en: fix error recovery regression Current release - new code bugs: - bpf, mm: fix lockdep warning triggered by stack_map_get_build_id_offset() Previous releases - regressions: - r6040: restore MDIO clock frequency after MAC reset - tcp: fix tp->undo_retrans accounting in tcp_sacktag_one() - dsa: flush switchdev workqueue before tearing down CPU/DSA ports Previous releases - always broken: - ptp: dp83640: don't define PAGE0, avoid compiler warning - igc: fix tunnel segmentation offloads - phylink: update SFP selected interface on advertising changes - stmmac: fix system hang caused by eee_ctrl_timer during suspend/resume - mlx5e: fix mutual exclusion between CQE compression and HW TS Misc: - bpf, cgroups: fix cgroup v2 fallback on v1/v2 mixed mode - sfc: fallback for lack of xdp tx queues - hns3: add option to turn off page pool feature" * tag 'net-5.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (67 commits) mlxbf_gige: clear valid_polarity upon open igc: fix tunnel offloading net/{mlx5|nfp|bnxt}: Remove unnecessary RTNL lock assert net: wan: wanxl: define CROSS_COMPILE_M68K selftests: nci: replace unsigned int with int net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports Revert "net: phy: Uniform PHY driver access" net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup ptp: dp83640: don't define PAGE0 bnx2x: Fix enabling network interfaces without VFs Revert "Revert "ipv4: fix memory leaks in ip_cmsg_send() callers"" tcp: fix tp->undo_retrans accounting in tcp_sacktag_one() net-caif: avoid user-triggerable WARN_ON(1) bpf, selftests: Add test case for mixed cgroup v1/v2 bpf, selftests: Add cgroup v1 net_cls classid helpers bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode bpf: Add oversize check before call kvcalloc() net: hns3: fix the timing issue of VF clearing interrupt sources net: hns3: fix the exception when query imp info net: hns3: disable mac in flr process ...
2021-09-09sfc: last resort fallback for lack of xdp tx queuesÍñigo Huguet3-40/+58
Previous patch addressed the situation of having some free resources for xdp tx but not enough for one tx queue per CPU. This patch address the worst case of not having resources at all for xdp tx. Instead of using queues dedicated to xdp, normal queues used by network stack are shared for both cases, using __netif_tx_lock for synchronization. Also queue stop/restart must be considered in the xdp path to avoid freezing the queue. This is not the ideal situation we might want to be, and a performance penalty is expected both for normal and xdp traffic, but at least XDP will work in all possible situations (with a warning in the logs), improving a bit the pain of not knowing in what situations we can use it and in what situations we cannot. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-09sfc: fallback for lack of xdp tx queuesÍñigo Huguet3-19/+64
If there are not enough resources to allocate one TX queue per core for XDP TX it was completely disabled. This patch implements a fallback solution for sharing the available queues using __netif_tx_lock for synchronization. In the normal case that there is one TX queue per CPU, no locking is done, as it was before. With this fallback solution, XDP TX will work in much more cases that were failing, specially in machines with many CPUs. It's hard for XDP users to know what features are supported across different NICs and configurations, so they will benefit on having wider support. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-08Merge tag 'pci-v5.15-changes' of ↵Linus Torvalds2-117/+40
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Convert controller drivers to generic_handle_domain_irq() (Marc Zyngier) - Simplify VPD (Vital Product Data) access and search (Heiner Kallweit) - Update bnx2, bnx2x, bnxt, cxgb4, cxlflash, sfc, tg3 drivers to use simplified VPD interfaces (Heiner Kallweit) - Run Max Payload Size quirks before configuring MPS; work around ASMedia ASM1062 SATA MPS issue (Marek Behún) Resource management: - Refactor pci_ioremap_bar() and pci_ioremap_wc_bar() (Krzysztof Wilczyński) - Optimize pci_resource_len() to reduce kernel size (Zhen Lei) PCI device hotplug: - Fix a double unmap in ibmphp (Vishal Aslot) PCIe port driver: - Enable Bandwidth Notification only if port supports it (Stuart Hayes) Sysfs/proc/syscalls: - Add schedule point in proc_bus_pci_read() (Krzysztof Wilczyński) - Return ~0 data on pciconfig_read() CAP_SYS_ADMIN failure (Krzysztof Wilczyński) - Return "int" from pciconfig_read() syscall (Krzysztof Wilczyński) Virtualization: - Extend "pci=noats" to also turn on Translation Blocking to protect against some DMA attacks (Alex Williamson) - Add sysfs mechanism to control the type of reset used between device assignments to VMs (Amey Narkhede) - Add support for ACPI _RST reset method (Shanker Donthineni) - Add ACS quirks for Cavium multi-function devices (George Cherian) - Add ACS quirks for NXP LX2xx0 and LX2xx2 platforms (Wasim Khan) - Allow HiSilicon AMBA devices that appear as fake PCI devices to use PASID and SVA (Zhangfei Gao) Endpoint framework: - Add support for SR-IOV Endpoint devices (Kishon Vijay Abraham I) - Zero-initialize endpoint test tool parameters so we don't use random parameters (Shunyong Yang) APM X-Gene PCIe controller driver: - Remove redundant dev_err() call in xgene_msi_probe() (ErKun Yang) Broadcom iProc PCIe controller driver: - Don't fail devm_pci_alloc_host_bridge() on missing 'ranges' because it's optional on BCMA devices (Rob Herring) - Fix BCMA probe resource handling (Rob Herring) Cadence PCIe driver: - Work around J7200 Link training electrical issue by increasing delays in LTSSM (Nadeem Athani) Intel IXP4xx PCI controller driver: - Depend on ARCH_IXP4XX to avoid useless config questions (Geert Uytterhoeven) Intel Keembay PCIe controller driver: - Add Intel Keem Bay PCIe controller (Srikanth Thokala) Marvell Aardvark PCIe controller driver: - Work around config space completion handling issues (Evan Wang) - Increase timeout for config access completions (Pali Rohár) - Emulate CRS Software Visibility bit (Pali Rohár) - Configure resources from DT 'ranges' property to fix I/O space access (Pali Rohár) - Serialize INTx mask/unmask (Pali Rohár) MediaTek PCIe controller driver: - Add MT7629 support in DT (Chuanjia Liu) - Fix an MSI issue (Chuanjia Liu) - Get syscon regmap ("mediatek,generic-pciecfg"), IRQ number ("pci_irq"), PCI domain ("linux,pci-domain") from DT properties if present (Chuanjia Liu) Microsoft Hyper-V host bridge driver: - Add ARM64 support (Boqun Feng) - Support "Create Interrupt v3" message (Sunil Muthuswamy) NVIDIA Tegra PCIe controller driver: - Use seq_puts(), move err_msg from stack to static, fix OF node leak (Christophe JAILLET) NVIDIA Tegra194 PCIe driver: - Disable suspend when in Endpoint mode (Om Prakash Singh) - Fix MSI-X address programming error (Om Prakash Singh) - Disable interrupts during suspend to avoid spurious AER link down (Om Prakash Singh) Renesas R-Car PCIe controller driver: - Work around hardware issue that prevents Link L1->L0 transition (Marek Vasut) - Fix runtime PM refcount leak (Dinghao Liu) Rockchip DesignWare PCIe controller driver: - Add Rockchip RK356X host controller driver (Simon Xue) TI J721E PCIe driver: - Add support for J7200 and AM64 (Kishon Vijay Abraham I) Toshiba Visconti PCIe controller driver: - Add Toshiba Visconti PCIe host controller driver (Nobuhiro Iwamatsu) Xilinx NWL PCIe controller driver: - Enable PCIe reference clock via CCF (Hyun Kwon) Miscellaneous: - Convert sta2x11 from 'pci_' to 'dma_' API (Christophe JAILLET) - Fix pci_dev_str_match_path() alloc while atomic bug (used for kernel parameters that specify devices) (Dan Carpenter) - Remove pointless Precision Time Management warning when PTM is present but not enabled (Jakub Kicinski) - Remove surplus "break" statements (Krzysztof Wilczyński)" * tag 'pci-v5.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (132 commits) PCI: ibmphp: Fix double unmap of io_mem x86/PCI: sta2x11: switch from 'pci_' to 'dma_' API PCI/VPD: Use unaligned access helpers PCI/VPD: Clean up public VPD defines and inline functions cxgb4: Use pci_vpd_find_id_string() to find VPD ID string PCI/VPD: Add pci_vpd_find_id_string() PCI/VPD: Include post-processing in pci_vpd_find_tag() PCI/VPD: Stop exporting pci_vpd_find_info_keyword() PCI/VPD: Stop exporting pci_vpd_find_tag() PCI: Set dma-can-stall for HiSilicon chips PCI: rockchip-dwc: Add Rockchip RK356X host controller driver PCI: dwc: Remove surplus break statement after return PCI: artpec6: Remove local code block from switch statement PCI: artpec6: Remove surplus break statement after return MAINTAINERS: Add entries for Toshiba Visconti PCIe controller PCI: visconti: Add Toshiba Visconti PCIe host controller driver PCI/portdrv: Enable Bandwidth Notification only if port supports it PCI: Allow PASID on fake PCIe devices without TLP prefixes PCI: mediatek: Use PCI domain to handle ports detection PCI: mediatek: Add new method to get irq number ...
2021-08-24sfc: falcon: Search VPD with pci_vpd_find_ro_info_keyword()Heiner Kallweit1-51/+14
Use pci_vpd_find_ro_info_keyword() to search for keywords in VPD to simplify the code. Replace netif_err() with pci_err() because the netdevice isn't registered yet, which results in very ugly messages. Use kmemdup_nul() instead of open-coding it. This is the same as 37838aa437c7 ("sfc: Search VPD with pci_vpd_find_ro_info_keyword()"), just for the falcon chip version. Link: https://lore.kernel.org/r/898282a1-13bd-17bc-2e9a-d3dcd336b46c@gmail.com Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-08-24sfc: falcon: Read VPD with pci_vpd_alloc()Heiner Kallweit1-16/+14
Use pci_vpd_alloc() to dynamically allocate a properly sized buffer and read the full VPD data into it. This avoids having to allocate a buffer on the stack, and we don't have to make any assumptions on VPD size and location of information in VPD. This is the same as 5119e20facfa ("sfc: Read VPD with pci_vpd_alloc()"), just for the falcon chip version. Link: https://lore.kernel.org/r/2a8d069e-9516-50d8-6520-2614222c8f5f@gmail.com Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-08-24ethtool: extend coalesce setting uAPI with CQE modeYufeng Mo2-4/+12
In order to support more coalesce parameters through netlink, add two new parameter kernel_coal and extack for .set_coalesce and .get_coalesce, then some extra info can return to user with the netlink API. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-23Revert "sfc: falcon: Read VPD with pci_vpd_alloc()"David S. Miller1-14/+16
This reverts commit 3873a9a4d8a87d4a15ff0083cf3b173b190c9089. Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-23Revert "sfc: falcon: Search VPD with pci_vpd_find_ro_info_keyword()"David S. Miller1-14/+51
This reverts commit 01dbe7129d9ccd5fe940897888645f06327b34ff. Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-22sfc: falcon: Search VPD with pci_vpd_find_ro_info_keyword()Heiner Kallweit1-51/+14
This is the same as 37838aa437c7 "sfc: Search VPD with pci_vpd_find_ro_info_keyword()", just for the falcon chip version. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-22sfc: falcon: Read VPD with pci_vpd_alloc()Heiner Kallweit1-16/+14
This is the same as 5119e20facfa "sfc: Read VPD with pci_vpd_alloc()", just for the falcon chip version. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20sfc: Search VPD with pci_vpd_find_ro_info_keyword()Heiner Kallweit1-51/+14
Use pci_vpd_find_ro_info_keyword() to search for keywords in VPD to simplify the code. Replace netif_err() with pci_err() because the netdevice isn't registered yet, which results in very ugly messages. Use kmemdup_nul() instead of open-coding it. Link: https://lore.kernel.org/r/bf5d4ba9-61a9-2bfe-19ec-75472732d74d@gmail.com Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-08-20sfc: Read VPD with pci_vpd_alloc()Heiner Kallweit1-15/+14
Use pci_vpd_alloc() to dynamically allocate a properly sized buffer and read the full VPD data into it. This avoids having to allocate a buffer on the stack, and we don't have to make any assumptions on VPD size and location of information in VPD. Link: https://lore.kernel.org/r/e58f1e40-c043-0266-9a0f-e5a7f3f6883c@gmail.com Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-08-14ethernet: fix PTP_1588_CLOCK dependenciesArnd Bergmann1-1/+1
The 'imply' keyword does not do what most people think it does, it only politely asks Kconfig to turn on another symbol, but does not prevent it from being disabled manually or built as a loadable module when the user is built-in. In the ICE driver, the latter now causes a link failure: aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_eth_ioctl': ice_main.c:(.text+0x13b0): undefined reference to `ice_ptp_get_ts_config' ice_main.c:(.text+0x13b0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_get_ts_config' aarch64-linux-ld: ice_main.c:(.text+0x13bc): undefined reference to `ice_ptp_set_ts_config' ice_main.c:(.text+0x13bc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_set_ts_config' aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_prepare_for_reset': ice_main.c:(.text+0x31fc): undefined reference to `ice_ptp_release' ice_main.c:(.text+0x31fc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_release' aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_rebuild': This is a recurring problem in many drivers, and we have discussed it several times befores, without reaching a consensus. I'm providing a link to the previous email thread for reference, which discusses some related problems. To solve the dependency issue better than the 'imply' keyword, introduce a separate Kconfig symbol "CONFIG_PTP_1588_CLOCK_OPTIONAL" that any driver can depend on if it is able to use PTP support when available, but works fine without it. Whenever CONFIG_PTP_1588_CLOCK=m, those drivers are then prevented from being built-in, the same way as with a 'depends on PTP_1588_CLOCK || !PTP_1588_CLOCK' dependency that does the same trick, but that can be rather confusing when you first see it. Since this should cover the dependencies correctly, the IS_REACHABLE() hack in the header is no longer needed now, and can be turned back into a normal IS_ENABLED() check. Any driver that gets the dependency wrong will now cause a link time failure rather than being unable to use PTP support when that is in a loadable module. However, the two recently added ptp_get_vclocks_index() and ptp_convert_timestamp() interfaces are only called from builtin code with ethtool and socket timestamps, so keep the current behavior by stubbing those out completely when PTP is in a loadable module. This should be addressed properly in a follow-up. As Richard suggested, we may want to actually turn PTP support into a 'bool' option later on, preventing it from being a loadable module altogether, which would be one way to solve the problem with the ethtool interface. Fixes: 06c16d89d2cb ("ice: register 1588 PTP clock device object for E810 devices") Link: https://lore.kernel.org/netdev/20210804121318.337276-1-arnd@kernel.org/ Link: https://lore.kernel.org/netdev/CAK8P3a06enZOf=XyZ+zcAwBczv41UuCTz+=0FMf2gBz1_cOnZQ@mail.gmail.com/ Link: https://lore.kernel.org/netdev/CAK8P3a3=eOxE-K25754+fB_-i_0BZzf9a9RfPTX3ppSwu9WZXw@mail.gmail.com/ Link: https://lore.kernel.org/netdev/20210726084540.3282344-1-arnd@kernel.org/ Acked-by: Shannon Nelson <snelson@pensando.io> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20210812183509.1362782-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-27dev_ioctl: split out ndo_eth_ioctlArnd Bergmann2-2/+2
Most users of ndo_do_ioctl are ethernet drivers that implement the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP. Separate these from the few drivers that use ndo_do_ioctl to implement SIOCBOND, SIOCBR and SIOCWANDEV commands. This is a purely cosmetic change intended to help readers find their way through the implementation. Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Vladimir Oltean <olteanv@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-13sfc: add logs explaining XDP_TX/REDIRECT is not availableÍñigo Huguet1-0/+4
If it's not possible to allocate enough channels for XDP, XDP_TX and XDP_REDIRECT don't work. However, only a message saying that not enough channels were available was shown, but not saying what are the consequences in that case. The user didn't know if he/she can use XDP or not, if the performance is reduced, or what. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-13sfc: ensure correct number of XDP queuesÍñigo Huguet1-7/+8
Commit 99ba0ea616aa ("sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues") intended to fix a problem caused by a round up when calculating the number of XDP channels and queues. However, this was not the real problem. The real problem was that the number of XDP TX queues had been reduced to half in commit e26ca4b53582 ("sfc: reduce the number of requested xdp ev queues"), but the variable xdp_tx_queue_count had remained the same. Once the correct number of XDP TX queues is created again in the previous patch of this series, this also can be reverted since the error doesn't actually exist. Only in the case that there is a bug in the code we can have different values in xdp_queue_number and efx->xdp_tx_queue_count. Because of this, and per Edward Cree's suggestion, I add instead a WARN_ON to catch if it happens again in the future. Note that the number of allocated queues can be higher than the number of used ones due to the round up, as explained in the existing comment in the code. That's why we also have to stop increasing xdp_queue_number beyond efx->xdp_tx_queue_count. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-13sfc: fix lack of XDP TX queues - error XDP TX failed (-22)Íñigo Huguet1-1/+2
Fixes: e26ca4b53582 sfc: reduce the number of requested xdp ev queues The buggy commit intended to allocate less channels for XDP in order to be more unlikely to reach the limit of 32 channels of the driver. The idea was to use each IRQ/eventqeue for more XDP TX queues than before, calculating which is the maximum number of TX queues that one event queue can handle. For example, in EF10 each event queue could handle up to 8 queues, better than the 4 they were handling before the change. This way, it would have to allocate half of channels than before for XDP TX. The problem is that the TX queues are also contained inside the channel structs, and there are only 4 queues per channel. Reducing the number of channels means also reducing the number of queues, resulting in not having the desired number of 1 queue per CPU. This leads to getting errors on XDP_TX and XDP_REDIRECT if they're executed from a high numbered CPU, because there only exist queues for the low half of CPUs, actually. If XDP_TX/REDIRECT is executed in a low numbered CPU, the error doesn't happen. This is the error in the logs (repeated many times, even rate limited): sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22) This errors happens in function efx_xdp_tx_buffers, where it expects to have a dedicated XDP TX queue per CPU. Reverting the change makes again more likely to reach the limit of 32 channels in machines with many CPUs. If this happen, no XDP_TX/REDIRECT will be possible at all, and we will have this log error messages: At interface probe: sfc 0000:5e:00.0: Insufficient resources for 12 XDP event queues (24 other channels, max 32) At every subsequent XDP_TX/REDIRECT failure, rate limited: sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22) However, without reverting the change, it makes the user to think that everything is OK at probe time, but later it fails in an unpredictable way, depending on the CPU that handles the packet. It is better to restore the predictable behaviour. If the user sees the error message at probe time, he/she can try to configure the best way it fits his/her needs. At least, he/she will have 2 options: - Accept that XDP_TX/REDIRECT is not available (he/she may not need it) - Load sfc module with modparam 'rss_cpus' with a lower number, thus creating less normal RX queues/channels, letting more free resources for XDP, with some performance penalty. Anyway, let the calculation of maximum TX queues that can be handled by a single event queue, and use it only if it's less than the number of TX queues per channel. This doesn't happen in practice, but could happen if some constant values are tweaked in the future, such us EFX_MAX_TXQ_PER_CHANNEL, EFX_MAX_EVQ_SIZE or EFX_MAX_DMAQ_SIZE. Related mailing list thread: https://lore.kernel.org/bpf/20201215104327.2be76156@carbon/ Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-20/+19
Trivial conflict in net/netfilter/nf_tables_api.c. Duplicate fix in tools/testing/selftests/net/devlink_port_split.py - take the net-next version. skmsg, and L4 bpf - keep the bpf code but remove the flags and err params. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-24sfc: Remove rcu_read_lock() around XDP program invocationToke Høiland-Jørgensen1-7/+2
The sfc driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Cc: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/bpf/20210624160609.292325-17-toke@redhat.com
2021-06-22sfc: avoid duplicated code in ef10_sriovÍñigo Huguet1-3/+1
The fail path of efx_ef10_sriov_alloc_vf_vswitching is identical to the full content of efx_ef10_sriov_free_vf_vswitching, so replace it for a single call to efx_ef10_sriov_free_vf_vswitching. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22sfc: explain that "attached" VFs only refer to XenÍñigo Huguet2-4/+6
During SRIOV disabling it is checked wether any VF is currently attached to a guest, using pci_vfs_assigned function. However, this check only works with VFs attached with Xen, not with vfio/KVM. Added comments clarifying this point. Also, replaced manual check of PCI_DEV_FLAGS_ASSIGNED flag and used the helper function pci_is_dev_assigned instead. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22sfc: error code if SRIOV cannot be disabledÍñigo Huguet1-4/+11
If SRIOV cannot be disabled during device removal or module unloading, return error code so it can be logged properly in the calling function. Note that this can only happen if any VF is currently attached to a guest using Xen, but not with vfio/KVM. Despite that in that case the VFs won't work properly with PF removed and/or the module unloaded, I have let it as is because I don't know what side effects may have changing it, and also it seems to be the same that other drivers are doing in this situation. In the case of being called during SRIOV reconfiguration, the behavior hasn't changed because the function is called with force=false. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22sfc: avoid double pci_remove of VFsÍñigo Huguet1-9/+1
If pci_remove was called for a PF with VFs, the removal of the VFs was called twice from efx_ef10_sriov_fini: one directly with pci_driver->remove and another implicit by calling pci_disable_sriov, which also perform the VFs remove. This was leading to crashing the kernel on the second attempt. Given that pci_disable_sriov already calls to pci remove function, get rid of the direct call to pci_driver->remove from the driver. 2 different ways to trigger the bug: - Create one or more VFs, then attach the PF to a virtual machine (at least with qemu/KVM) - Create one or more VFs, then remove the PF with: echo 1 > /sys/bus/pci/devices/PF_PCI_ID/remove Removing sfc module does not trigger the error, at least for me, because it removes the VF first, and then the PF. Example of a log with the error: list_del corruption, ffff967fd20a8ad0->next is LIST_POISON1 (dead000000000100) ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:47! [...trimmed...] RIP: 0010:__list_del_entry_valid.cold.1+0x12/0x4c [...trimmed...] Call Trace: efx_dissociate+0x1f/0x140 [sfc] efx_pci_remove+0x27/0x150 [sfc] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0x103/0x1f0 pci_stop_bus_device+0x69/0x90 pci_stop_and_remove_bus_device+0xe/0x20 pci_iov_remove_virtfn+0xba/0x120 sriov_disable+0x2f/0xe0 efx_ef10_pci_sriov_disable+0x52/0x80 [sfc] ? pcie_aer_is_native+0x12/0x40 efx_ef10_sriov_fini+0x72/0x110 [sfc] efx_pci_remove+0x62/0x150 [sfc] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0x103/0x1f0 unbind_store+0xf6/0x130 kernfs_fop_write+0x116/0x190 vfs_write+0xa5/0x1a0 ksys_write+0x4f/0xb0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+1
cdc-wdm: s/kill_urbs/poison_urbs/ to fix build Signed-off-by: Jakub Kicinski <kuba@kernel.org>