summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-07-19libbpf: generalize virtual __kconfig externs and use it for USDTAndrii Nakryiko2-45/+66
Libbpf supports single virtual __kconfig extern currently: LINUX_KERNEL_VERSION. LINUX_KERNEL_VERSION isn't coming from /proc/kconfig.gz and is intead customly filled out by libbpf. This patch generalizes this approach to support more such virtual __kconfig externs. One such extern added in this patch is LINUX_HAS_BPF_COOKIE which is used for BPF-side USDT supporting code in usdt.bpf.h instead of using CO-RE-based enum detection approach for detecting bpf_get_attach_cookie() BPF helper. This allows to remove otherwise not needed CO-RE dependency and keeps user-space and BPF-side parts of libbpf's USDT support strictly in sync in terms of their feature detection. We'll use similar approach for syscall wrapper detection for BPF_KSYSCALL() BPF-side macro in follow up patch. Generally, currently libbpf reserves CONFIG_ prefix for Kconfig values and LINUX_ for virtual libbpf-backed externs. In the future we might extend the set of prefixes that are supported. This can be done without any breaking changes, as currently any __kconfig extern with unrecognized name is rejected. For LINUX_xxx externs we support the normal "weak rule": if libbpf doesn't recognize given LINUX_xxx extern but such extern is marked as __weak, it is not rejected and defaults to zero. This follows CONFIG_xxx handling logic and will allow BPF applications to opportunistically use newer libbpf virtual externs without breaking on older libbpf versions unnecessarily. Tested-by: Alan Maguire <alan.maguire@oracle.com> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220714070755.3235561-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-19net: dsa: microchip: fix the missing ksz8_r_mib_cntArun Ramadoss1-0/+1
During the refactoring for the ksz8_dev_ops from ksz8795.c to ksz_common.c, the ksz8_r_mib_cnt has been missed. So this patch adds the missing one. Fixes: 6ec23aaaac43 ("net: dsa: microchip: move ksz_dev_ops to ksz_common.c") Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220718061803.4939-1-arun.ramadoss@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19Merge branch 'amt-fix-validation-and-synchronization-bugs'Paolo Abeni2-52/+207
Taehee Yoo says: ==================== amt: fix validation and synchronization bugs There are some synchronization issues in the amt module. Especially, an amt gateway doesn't well synchronize its own variables and status(amt->status). It tries to use a workqueue for handles in a single thread. A global lock is also good, but it would occur complex locking complex. In this patchset, only the gateway uses workqueue. The reason why only gateway interface uses workqueue is that gateway should manage its own states and variables a little bit statefully. But relay doesn't need to manage tunnels statefully, stateless is okay. So, relay side message handlers are okay to be called concurrently. But it doesn't mean that no lock is needed. Only amt multicast data message type will not be processed by the work queue because It contains actual multicast data. So, it should be processed immediately. When any amt gateway events are triggered(sending discovery message by delayed_work, sending request message by delayed_work and receiving messages), it stores event and skb into the event queue(amt->events[16]). Then, workqueue processes these events one by one. The first patch is to use the work queue. The second patch is to remove unnecessary lock due to a previous patch. The third patch is to use READ_ONCE() in the amt module. Even if the amt module uses a single thread, some variables (ready4, ready6, amt->status) can be accessed concurrently. The fourth patch is to add missing nonce generation logic when it sends a new request message. The fifth patch is to drop unexpected advertisement messages. advertisement message should be received only after the gateway sends a discovery message first. So, the gateway should drop advertisement messages if it has never sent a discovery message and it also should drop duplicate advertisement messages. Using nonce is good to distinguish whether a received message is an expected message or not. The sixth patch is to drop unexpected query messages. This is the same behavior as the fourth patch. Query messages should be received only after the gateway sends a request message first. The nonce variable is used to distinguish whether it is a reply to a previous request message or not. amt->ready4 and amt->ready6 are used to distinguish duplicate messages. The seventh patch is to drop unexpected multicast data. AMT gateway should not receive multicast data message type before establish between gateway and relay. In order to drop unexpected multicast data messages, it checks amt->status. The last patch is to fix a locking problem on the relay side. amt->nr_tunnels variable is protected by amt->lock. But amt_request_handler() doesn't protect this variable. v2: - Use local_bh_disable() instead of rcu_read_lock_bh() in amt_membership_query_handler. - Fix using uninitialized variables. - Fix unexpectedly start the event_wq after stopping. - Fix possible deadlock in amt_event_work(). - Add a limit variable in amt_event_work() to prevent infinite working. - Rename amt_queue_events() to amt_queue_event(). ==================== Link: https://lore.kernel.org/r/20220717160910.19156-1-ap420073@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19amt: do not use amt->nr_tunnels outside of lockTaehee Yoo1-3/+6
amt->nr_tunnels is protected by amt->lock. But, amt_request_handler() has been using this variable without the amt->lock. So, it expands context of amt->lock in the amt_request_handler() to protect amt->nr_tunnels variable. Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19amt: drop unexpected multicast dataTaehee Yoo1-0/+3
AMT gateway interface should not receive unexpected multicast data. Multicast data message type should be received after sending an update message, which means all establishment between gateway and relay is finished. So, amt_multicast_data_handler() checks amt->status. Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19amt: drop unexpected query messageTaehee Yoo1-1/+13
AMT gateway interface should not receive unexpected query messages. In order to drop unexpected query messages, it checks nonce. And it also checks ready4 and ready6 variables to drop duplicated messages. Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19amt: drop unexpected advertisement messageTaehee Yoo1-0/+5
AMT gateway interface should not receive unexpected advertisement messages. In order to drop these packets, it should check nonce and amt->status. Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19amt: add missing regeneration nonce logic in request logicTaehee Yoo1-0/+4
When AMT gateway starts sending a new request message, it should regenerate the nonce variable. Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19amt: use READ_ONCE() in amt moduleTaehee Yoo1-7/+8
There are some data races in the amt module. amt->ready4, amt->ready6, and amt->status can be accessed concurrently without locks. So, it uses READ_ONCE() and WRITE_ONCE(). Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19amt: remove unnecessary locksTaehee Yoo1-27/+5
By the previous patch, amt gateway handlers are changed to worked by a single thread. So, most locks for gateway are not needed. So, it removes. Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19amt: use workqueue for gateway side message handlingTaehee Yoo2-15/+164
There are some synchronization issues(amt->status, amt->req_cnt, etc) if the interface is in gateway mode because gateway message handlers are processed concurrently. This applies a work queue for processing these messages instead of expanding the locking context. So, the purposes of this patch are to fix exist race conditions and to make gateway to be able to validate a gateway status more correctly. When the AMT gateway interface is created, it tries to establish to relay. The establishment step looks stateless, but it should be managed well. In order to handle messages in the gateway, it saves the current status(i.e. AMT_STATUS_XXX). This patch makes gateway code to be worked with a single thread. Now, all messages except the multicast are triggered(received or delay expired), and these messages will be stored in the event queue(amt->events). Then, the single worker processes stored messages asynchronously one by one. The multicast data message type will be still processed immediately. Now, amt->lock is only needed to access the event queue(amt->events) if an interface is the gateway mode. Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19net: dsa: vitesse-vsc73xx: silent spi_device_id warningsOleksij Rempel1-0/+10
Add spi_device_id entries to silent SPI warnings. Fixes: 5fa6863ba692 ("spi: Check we have a spi_device_id for each DT compatible") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220717135831.2492844-2-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19net: dsa: sja1105: silent spi_device_id warningsOleksij Rempel1-0/+16
Add spi_device_id entries to silent following warnings: SPI driver sja1105 has no spi_device_id for nxp,sja1105e SPI driver sja1105 has no spi_device_id for nxp,sja1105t SPI driver sja1105 has no spi_device_id for nxp,sja1105p SPI driver sja1105 has no spi_device_id for nxp,sja1105q SPI driver sja1105 has no spi_device_id for nxp,sja1105r SPI driver sja1105 has no spi_device_id for nxp,sja1105s SPI driver sja1105 has no spi_device_id for nxp,sja1110a SPI driver sja1105 has no spi_device_id for nxp,sja1110b SPI driver sja1105 has no spi_device_id for nxp,sja1110c SPI driver sja1105 has no spi_device_id for nxp,sja1110d Fixes: 5fa6863ba692 ("spi: Check we have a spi_device_id for each DT compatible") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220717135831.2492844-1-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19be2net: Fix buffer overflow in be_get_module_eepromHristo Venev3-18/+25
be_cmd_read_port_transceiver_data assumes that it is given a buffer that is at least PAGE_DATA_LEN long, or twice that if the module supports SFF 8472. However, this is not always the case. Fix this by passing the desired offset and length to be_cmd_read_port_transceiver_data so that we only copy the bytes once. Fixes: e36edd9d26cf ("be2net: add ethtool "-m" option support") Signed-off-by: Hristo Venev <hristo@venev.name> Link: https://lore.kernel.org/r/20220716085134.6095-1-hristo@venev.name Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19net: prestera: acl: add support for 'police' action on egressMaksym Glubokiy1-1/+1
Propagate ingress/egress direction for 'police' action down to hardware. Co-developed-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu> Signed-off-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu> Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu> Link: https://lore.kernel.org/r/20220714083541.1973919-1-maksym.glubokiy@plvision.eu Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19Merge branch '100GbE' of ↵Jakub Kicinski4-11/+15
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-07-15 This series contains updates to ice driver only. Ani updates feature restriction for devices that don't support external time stamping. Zhuo Chen removes unnecessary call to pci_aer_clear_nonfatal_status(). * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: Remove pci_aer_clear_nonfatal_status() call ice: Add EXTTS feature to the feature bitmap ==================== Link: https://lore.kernel.org/r/20220715214642.2968799-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19net: macb: fixup sparse warnings on __be16 portsBen Dooks1-3/+4
The port fields in the ethool flow structures are defined to be __be16 types, so sparse is showing issues where these are being passed to htons(). Fix these warnings by passing them to be16_to_cpu() instead. These are being used in netdev_dbg() so should only effect anyone doing debug. Fixes the following sparse warnings: drivers/net/ethernet/cadence/macb_main.c:3366:9: warning: cast from restricted __be16 drivers/net/ethernet/cadence/macb_main.c:3366:9: warning: cast from restricted __be16 drivers/net/ethernet/cadence/macb_main.c:3366:9: warning: cast from restricted __be16 drivers/net/ethernet/cadence/macb_main.c:3419:25: warning: cast from restricted __be16 drivers/net/ethernet/cadence/macb_main.c:3419:25: warning: cast from restricted __be16 drivers/net/ethernet/cadence/macb_main.c:3419:25: warning: cast from restricted __be16 drivers/net/ethernet/cadence/macb_main.c:3419:25: warning: cast from restricted __be16 Signed-off-by: Ben Dooks <ben.dooks@sifive.com> Link: https://lore.kernel.org/r/20220715173009.526126-1-ben.dooks@sifive.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19net: prestera: acl: fix code formattingMaksym Glubokiy1-4/+4
Make the code look better. Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu> Link: https://lore.kernel.org/r/20220715103806.7108-1-maksym.glubokiy@plvision.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19vmxnet3: Record queue number to incoming packetsAndrey Turkin1-0/+1
Make generic XDP processing attribute packets to their actual queues instead of queue #0. This improves AF_XDP performance considerably since softirq threads no longer fight over single AF_XDP socket spinlock. Signed-off-by: Andrey Turkin <andrey.turkin@gmail.com> Link: https://lore.kernel.org/r/20220717022050.822766-2-andrey.turkin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19net: stmmac: remove redunctant disable xPCS EEE callWong Vee Khee1-8/+0
Disable is done in stmmac_init_eee() on the event of MAC link down. Since setting enable/disable EEE via ethtool will eventually trigger a MAC down, removing this redunctant call in stmmac_ethtool.c to avoid calling xpcs_config_eee() twice. Fixes: d4aeaed80b0e ("net: stmmac: trigger PCS EEE to turn off on link down") Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> Link: https://lore.kernel.org/r/20220715122402.1017470-1-vee.khee.wong@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19Merge branch 'fix-2-dsa-issues-with-vlan_filtering_is_global'Jakub Kicinski1-3/+4
Vladimir Oltean says: ==================== Fix 2 DSA issues with vlan_filtering_is_global This patch set fixes 2 issues with vlan_filtering_is_global switches. Both are regressions introduced by refactoring commit d0004a020bb5 ("net: dsa: remove the "dsa_to_port in a loop" antipattern from the core"), which wasn't tested on a wide enough variety of switches. Tested on the sja1105 driver. ==================== Link: https://lore.kernel.org/r/20220715151659.780544-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19net: dsa: fix NULL pointer dereference in dsa_port_reset_vlan_filteringVladimir Oltean1-2/+3
The "ds" iterator variable used in dsa_port_reset_vlan_filtering() -> dsa_switch_for_each_port() overwrites the "dp" received as argument, which is later used to call dsa_port_vlan_filtering() proper. As a result, switches which do enter that code path (the ones with vlan_filtering_is_global=true) will dereference an invalid dp in dsa_port_reset_vlan_filtering() after leaving a VLAN-aware bridge. Use a dedicated "other_dp" iterator variable to avoid this from happening. Fixes: d0004a020bb5 ("net: dsa: remove the "dsa_to_port in a loop" antipattern from the core") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19net: dsa: fix dsa_port_vlan_filtering when globalVladimir Oltean1-1/+1
The blamed refactoring commit changed a "port" iterator with "other_dp", but still looked at the slave_dev of the dp outside the loop, instead of other_dp->slave from the loop. As a result, dsa_port_vlan_filtering() would not call dsa_slave_manage_vlan_filtering() except for the port in cause, and not for all switch ports as expected. Fixes: d0004a020bb5 ("net: dsa: remove the "dsa_to_port in a loop" antipattern from the core") Reported-by: Lucian Banu <Lucian.Banu@westermo.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19ixgbe: Add locking to prevent panic when setting sriov_numvfs to zeroPiotr Skajewski3-0/+10
It is possible to disable VFs while the PF driver is processing requests from the VF driver. This can result in a panic. BUG: unable to handle kernel paging request at 000000000000106c PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 8 PID: 0 Comm: swapper/8 Kdump: loaded Tainted: G I --------- - Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS 2.8.2 08/27/2020 RIP: 0010:ixgbe_msg_task+0x4c8/0x1690 [ixgbe] Code: 00 00 48 8d 04 40 48 c1 e0 05 89 7c 24 24 89 fd 48 89 44 24 10 83 ff 01 0f 84 b8 04 00 00 4c 8b 64 24 10 4d 03 a5 48 22 00 00 <41> 80 7c 24 4c 00 0f 84 8a 03 00 00 0f b7 c7 83 f8 08 0f 84 8f 0a RSP: 0018:ffffb337869f8df8 EFLAGS: 00010002 RAX: 0000000000001020 RBX: 0000000000000000 RCX: 000000000000002b RDX: 0000000000000002 RSI: 0000000000000008 RDI: 0000000000000006 RBP: 0000000000000006 R08: 0000000000000002 R09: 0000000000029780 R10: 00006957d8f42832 R11: 0000000000000000 R12: 0000000000001020 R13: ffff8a00e8978ac0 R14: 000000000000002b R15: ffff8a00e8979c80 FS: 0000000000000000(0000) GS:ffff8a07dfd00000(0000) knlGS:00000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000106c CR3: 0000000063e10004 CR4: 00000000007726e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> ? ttwu_do_wakeup+0x19/0x140 ? try_to_wake_up+0x1cd/0x550 ? ixgbevf_update_xcast_mode+0x71/0xc0 [ixgbevf] ixgbe_msix_other+0x17e/0x310 [ixgbe] __handle_irq_event_percpu+0x40/0x180 handle_irq_event_percpu+0x30/0x80 handle_irq_event+0x36/0x53 handle_edge_irq+0x82/0x190 handle_irq+0x1c/0x30 do_IRQ+0x49/0xd0 common_interrupt+0xf/0xf This can be eventually be reproduced with the following script: while : do echo 63 > /sys/class/net/<devname>/device/sriov_numvfs sleep 1 echo 0 > /sys/class/net/<devname>/device/sriov_numvfs sleep 1 done Add lock when disabling SR-IOV to prevent process VF mailbox communication. Fixes: d773d1310625 ("ixgbe: Fix memory leak when SR-IOV VFs are direct assigned") Signed-off-by: Piotr Skajewski <piotrx.skajewski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220715214456.2968711-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19i40e: Fix erroneous adapter reinitialization during recovery processDawid Lukwinski1-8/+5
Fix an issue when driver incorrectly detects state of recovery process and erroneously reinitializes interrupts, which results in a kernel error and call trace message. The issue was caused by a combination of two factors: 1. Assuming the EMP reset issued after completing firmware recovery means the whole recovery process is complete. 2. Erroneous reinitialization of interrupt vector after detecting the above mentioned EMP reset. Fixes (1) by changing how recovery state change is detected and (2) by adjusting the conditional expression to ensure using proper interrupt reinitialization method, depending on the situation. Fixes: 4ff0ee1af016 ("i40e: Introduce recovery mode support") Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220715214542.2968762-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19net: ethernet: mtk_eth_soc: fix off by one check of ARRAY_SIZETom Rix1-1/+1
In mtk_wed_tx_ring_setup(.., int idx, ..), idx is used as an index here struct mtk_wed_ring *ring = &dev->tx_ring[idx]; The bounds of idx are checked here BUG_ON(idx > ARRAY_SIZE(dev->tx_ring)); If idx is the size of the array, it will pass this check and overflow. So change the check to >= . Fixes: 804775dfc288 ("net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)") Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20220716214654.1540240-1-trix@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19Merge branch 'devlink-prepare-mlxsw-and-netdevsim-for-locked-reload'Jakub Kicinski16-604/+816
Jiri Pirko says: ==================== devlink: prepare mlxsw and netdevsim for locked reload This is preparation patchset to be able to eventually make a switch and make reload cmd to take devlink->lock as the other commands do. This patchset is preparing 2 major users of devlink API - mlxsw and netdevsim. The sets of functions are similar, therefore taking care of both here. ==================== Link: https://lore.kernel.org/r/20220716110241.3390528-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19net: devlink: remove unused locked functionsJiri Pirko2-188/+0
Remove locked versions of functions that are no longer used by anyone. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19netdevsim: convert driver to use unlocked devlink API during init/finiJiri Pirko6-123/+102
Prepare for devlink reload being called with devlink->lock held and convert the netdevsim driver to use unlocked devlink API during init and fini flows. Take devl_lock() in reload_down() and reload_up() ops in the meantime before reload cmd is converted to take the lock itself. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19net: devlink: add unlocked variants of devlink_region_create/destroy() functionsJiri Pirko2-28/+66
Add unlocked variants of devlink_region_create/destroy() functions to be used in drivers called-in with devlink->lock held. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19mlxsw: convert driver to use unlocked devlink API during init/finiJiri Pirko10-241/+248
Prepare for devlink reload being called with devlink->lock held and convert the mlxsw driver to use unlocked devlink API during init and fini flows. Take devl_lock() in reload_down() and reload_up() ops in the meantime before reload cmd is converted to take the lock itself. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19net: devlink: add unlocked variants of devlink_dpipe*() functionsJiri Pirko2-46/+147
Add unlocked variants of devlink_dpipe*() functions to be used in drivers called-in with devlink->lock held. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19net: devlink: add unlocked variants of devlink_sb*() functionsJiri Pirko2-18/+41
Add unlocked variants of devlink_sb*() functions to be used in drivers called-in with devlink->lock held. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19net: devlink: add unlocked variants of devlink_resource*() functionsJiri Pirko2-61/+173
Add unlocked variants of devlink_resource*() functions to be used in drivers called-in with devlink->lock held. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19net: devlink: add unlocked variants of devling_trap*() functionsJiri Pirko2-32/+168
Add unlocked variants of devl_trap*() functions to be used in drivers called-in with devlink->lock held. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19net: devlink: avoid false DEADLOCK warning reported by lockdepMoshe Shemesh1-0/+4
Add a lock_class_key per devlink instance to avoid DEADLOCK warning by lockdep, while locking more than one devlink instance in driver code, for example in opening VFs flow. Kernel log: [ 101.433802] ============================================ [ 101.433803] WARNING: possible recursive locking detected [ 101.433810] 5.19.0-rc1+ #35 Not tainted [ 101.433812] -------------------------------------------- [ 101.433813] bash/892 is trying to acquire lock: [ 101.433815] ffff888127bfc2f8 (&devlink->lock){+.+.}-{3:3}, at: probe_one+0x3c/0x690 [mlx5_core] [ 101.433909] but task is already holding lock: [ 101.433910] ffff888118f4c2f8 (&devlink->lock){+.+.}-{3:3}, at: mlx5_core_sriov_configure+0x62/0x280 [mlx5_core] [ 101.433989] other info that might help us debug this: [ 101.433990] Possible unsafe locking scenario: [ 101.433991] CPU0 [ 101.433991] ---- [ 101.433992] lock(&devlink->lock); [ 101.433993] lock(&devlink->lock); [ 101.433995] *** DEADLOCK *** [ 101.433996] May be due to missing lock nesting notation [ 101.433996] 6 locks held by bash/892: [ 101.433998] #0: ffff88810eb50448 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0xf3/0x1d0 [ 101.434009] #1: ffff888114777c88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x20d/0x520 [ 101.434017] #2: ffff888102b58660 (kn->active#231){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x230/0x520 [ 101.434023] #3: ffff888102d70198 (&dev->mutex){....}-{3:3}, at: sriov_numvfs_store+0x132/0x310 [ 101.434031] #4: ffff888118f4c2f8 (&devlink->lock){+.+.}-{3:3}, at: mlx5_core_sriov_configure+0x62/0x280 [mlx5_core] [ 101.434108] #5: ffff88812adce198 (&dev->mutex){....}-{3:3}, at: __device_attach+0x76/0x430 [ 101.434116] stack backtrace: [ 101.434118] CPU: 5 PID: 892 Comm: bash Not tainted 5.19.0-rc1+ #35 [ 101.434120] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 101.434130] Call Trace: [ 101.434133] <TASK> [ 101.434135] dump_stack_lvl+0x57/0x7d [ 101.434145] __lock_acquire.cold+0x1df/0x3e7 [ 101.434151] ? register_lock_class+0x1880/0x1880 [ 101.434157] lock_acquire+0x1c1/0x550 [ 101.434160] ? probe_one+0x3c/0x690 [mlx5_core] [ 101.434229] ? lockdep_hardirqs_on_prepare+0x400/0x400 [ 101.434232] ? __xa_alloc+0x1ed/0x2d0 [ 101.434236] ? ksys_write+0xf3/0x1d0 [ 101.434239] __mutex_lock+0x12c/0x14b0 [ 101.434243] ? probe_one+0x3c/0x690 [mlx5_core] [ 101.434312] ? probe_one+0x3c/0x690 [mlx5_core] [ 101.434380] ? devlink_alloc_ns+0x11b/0x910 [ 101.434385] ? mutex_lock_io_nested+0x1320/0x1320 [ 101.434388] ? lockdep_init_map_type+0x21a/0x7d0 [ 101.434391] ? lockdep_init_map_type+0x21a/0x7d0 [ 101.434393] ? __init_swait_queue_head+0x70/0xd0 [ 101.434397] probe_one+0x3c/0x690 [mlx5_core] [ 101.434467] pci_device_probe+0x1b4/0x480 [ 101.434471] really_probe+0x1e0/0xaa0 [ 101.434474] __driver_probe_device+0x219/0x480 [ 101.434478] driver_probe_device+0x49/0x130 [ 101.434481] __device_attach_driver+0x1b8/0x280 [ 101.434484] ? driver_allows_async_probing+0x140/0x140 [ 101.434487] bus_for_each_drv+0x123/0x1a0 [ 101.434489] ? bus_for_each_dev+0x1a0/0x1a0 [ 101.434491] ? lockdep_hardirqs_on_prepare+0x286/0x400 [ 101.434494] ? trace_hardirqs_on+0x2d/0x100 [ 101.434498] __device_attach+0x1a3/0x430 [ 101.434501] ? device_driver_attach+0x1e0/0x1e0 [ 101.434503] ? pci_bridge_d3_possible+0x1e0/0x1e0 [ 101.434506] ? pci_create_resource_files+0xeb/0x190 [ 101.434511] pci_bus_add_device+0x6c/0xa0 [ 101.434514] pci_iov_add_virtfn+0x9e4/0xe00 [ 101.434517] ? trace_hardirqs_on+0x2d/0x100 [ 101.434521] sriov_enable+0x64a/0xca0 [ 101.434524] ? pcibios_sriov_disable+0x10/0x10 [ 101.434528] mlx5_core_sriov_configure+0xab/0x280 [mlx5_core] [ 101.434602] sriov_numvfs_store+0x20a/0x310 [ 101.434605] ? sriov_totalvfs_show+0xc0/0xc0 [ 101.434608] ? sysfs_file_ops+0x170/0x170 [ 101.434611] ? sysfs_file_ops+0x117/0x170 [ 101.434614] ? sysfs_file_ops+0x170/0x170 [ 101.434616] kernfs_fop_write_iter+0x348/0x520 [ 101.434619] new_sync_write+0x2e5/0x520 [ 101.434621] ? new_sync_read+0x520/0x520 [ 101.434624] ? lock_acquire+0x1c1/0x550 [ 101.434626] ? lockdep_hardirqs_on_prepare+0x400/0x400 [ 101.434630] vfs_write+0x5cb/0x8d0 [ 101.434633] ksys_write+0xf3/0x1d0 [ 101.434635] ? __x64_sys_read+0xb0/0xb0 [ 101.434638] ? lockdep_hardirqs_on_prepare+0x286/0x400 [ 101.434640] ? syscall_enter_from_user_mode+0x1d/0x50 [ 101.434643] do_syscall_64+0x3d/0x90 [ 101.434647] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 101.434650] RIP: 0033:0x7f5ff536b2f7 [ 101.434658] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [ 101.434661] RSP: 002b:00007ffd9ea85d58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 101.434664] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f5ff536b2f7 [ 101.434666] RDX: 0000000000000002 RSI: 000055c4c279e230 RDI: 0000000000000001 [ 101.434668] RBP: 000055c4c279e230 R08: 000000000000000a R09: 0000000000000001 [ 101.434669] R10: 000055c4c283cbf0 R11: 0000000000000246 R12: 0000000000000002 [ 101.434670] R13: 00007f5ff543d500 R14: 0000000000000002 R15: 00007f5ff543d700 [ 101.434673] </TASK> Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19Merge branch 'net-lan966x-fix-issues-with-mac-table'Jakub Kicinski1-32/+80
Horatiu Vultur says: ==================== net: lan966x: Fix issues with MAC table The patch series fixes 2 issues: - when an entry was forgotten the irq thread was holding a spin lock and then was talking also rtnl_lock. - the access to the HW MAC table is indirect, so the access to the HW MAC table was not synchronized, which means that there could be race conditions. ==================== Link: https://lore.kernel.org/r/20220714194040.231651-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19net: lan966x: Fix usage of lan966x->mac_lock when used by FDBHoratiu Vultur1-11/+23
When the SW bridge was trying to add/remove entries to/from HW, the access to HW was not protected by any lock. In this way, it was possible to have race conditions. Fix this by using the lan966x->mac_lock to protect parallel access to HW for this cases. Fixes: 25ee9561ec622 ("net: lan966x: More MAC table functionality") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19net: lan966x: Fix usage of lan966x->mac_lock inside lan966x_mac_irq_handlerHoratiu Vultur1-7/+12
The problem with this spin lock is that it was just protecting the list of the MAC entries in SW and not also the access to the MAC entries in HW. Because the access to HW is indirect, then it could happen to have race conditions. For example when SW introduced an entry in MAC table and the irq mac is trying to read something from the MAC. Update such that also the access to MAC entries in HW is protected by this lock. Fixes: 5ccd66e01cbef ("net: lan966x: add support for interrupts from analyzer") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19net: lan966x: Fix usage of lan966x->mac_lock when entry is removedHoratiu Vultur1-4/+20
To remove an entry to the MAC table, it is required first to setup the entry and then issue a command for the MAC to forget the entry. So if it happens for two threads to remove simultaneously an entry in MAC table then it would be a race condition. Fix this by using lan966x->mac_lock to protect the HW access. Fixes: e18aba8941b40 ("net: lan966x: add mactable support") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19net: lan966x: Fix usage of lan966x->mac_lock when entry is addedHoratiu Vultur1-1/+7
To add an entry to the MAC table, it is required first to setup the entry and then issue a command for the MAC to learn the entry. So if it happens for two threads to add simultaneously an entry in MAC table then it would be a race condition. Fix this by using lan966x->mac_lock to protect the HW access. Fixes: fc0c3fe7486f2 ("net: lan966x: Add function lan966x_mac_ip_learn()") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19net: lan966x: Fix taking rtnl_lock while holding spin_lockHoratiu Vultur1-9/+18
When the HW deletes an entry in MAC table then it generates an interrupt. The SW will go through it's own list of MAC entries and if it is not found then it would notify the listeners about this. The problem is that when the SW will go through it's own list it would take a spin lock(lan966x->mac_lock) and when it notifies that the entry is deleted. But to notify the listeners it taking the rtnl_lock which is illegal. This is fixed by instead of notifying right away that the entry is deleted, move the entry on a temp list and once, it checks all the entries then just notify that the entries from temp list are deleted. Fixes: 5ccd66e01cbe ("net: lan966x: add support for interrupts from analyzer") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19skbuff: add SKBFL_DONT_ORPHAN flagPavel Begunkov2-4/+6
We don't want to list every single ubuf_info callback in skb_orphan_frags(), add a flag controlling the behaviour. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19skbuff: don't mix ubuf_info from different sourcesPavel Begunkov1-0/+4
We should not append MSG_ZEROCOPY requests to skbuff with non MSG_ZEROCOPY ubuf_info, they might be not compatible. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19ipv6: avoid partial copy for zcPavel Begunkov1-1/+6
Even when zerocopy transmission is requested and possible, __ip_append_data() will still copy a small chunk of data just because it allocated some extra linear space (e.g. 128 bytes). It wastes CPU cycles on copy and iter manipulations and also misalignes potentially aligned data. Avoid such copies. And as a bonus we can allocate smaller skb. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19ipv4: avoid partial copy for zcPavel Begunkov1-2/+6
Even when zerocopy transmission is requested and possible, __ip_append_data() will still copy a small chunk of data just because it allocated some extra linear space (e.g. 148 bytes). It wastes CPU cycles on copy and iter manipulations and also misalignes potentially aligned data. Avoid such copies. And as a bonus we can allocate smaller skb. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds5-52/+5
Pull rdma fixes from Jason Gunthorpe: "Two bug fixes for irdma: - x722 does not support 1GB pages, trying to configure them will corrupt the dma mapping - Fix a sleep while holding a spinlock" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/irdma: Fix sleep from invalid context BUG RDMA/irdma: Do not advertise 1GB page size for x722
2022-07-18Merge tag 'hte/for-5.19' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux Pull hardware timestamp fix from Thierry Reding: "A single fix for an out-of-sync kerneldoc comment" * tag 'hte/for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux: gpiolib: cdev: Fix kernel doc for struct line
2022-07-18iavf: Fix missing state logsPrzemyslaw Patynowski1-0/+4
Fix debug prints, by adding missing state prints. Extend iavf_state_str by strings for __IAVF_INIT_EXTENDED_CAPS and __IAVF_INIT_CONFIG_ADAPTER. Without this patch, when enabling debug prints for iavf.h, user will see: iavf 0000:06:0e.0: state transition from:__IAVF_INIT_GET_RESOURCES to:__IAVF_UNKNOWN_STATE iavf 0000:06:0e.0: state transition from:__IAVF_UNKNOWN_STATE to:__IAVF_UNKNOWN_STATE Fixes: 605ca7c5c670 ("iavf: Fix kernel BUG in free_msi_irqs") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Jun Zhang <xuejun.zhang@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-18iavf: Fix handling of dummy receive descriptorsPrzemyslaw Patynowski1-3/+2
Fix memory leak caused by not handling dummy receive descriptor properly. iavf_get_rx_buffer now sets the rx_buffer return value for dummy receive descriptors. Without this patch, when the hardware writes a dummy descriptor, iavf would not free the page allocated for the previous receive buffer. This is an unlikely event but can still happen. [Jesse: massaged commit message] Fixes: efa14c398582 ("iavf: allow null RX descriptors") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>