summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-04-04bnxt_en: Allocate page pool per numa nodeSomnath Kotur1-4/+11
Driver's Page Pool allocation code looks at the node local to the PCIe device to determine where to allocate memory. In scenarios where the core count per NUMA node is low (< default rings) it makes sense to exhaust page pool allocations on Node 0 first and then moving on to allocating page pools for the remaining rings from Node 1. With this patch, and the following configuration on the NIC $ ethtool -L ens1f0np0 combined 16 (core count/node = 12, first 12 rings on node#0, last 4 rings node#1) and traffic redirected to a ring on node#1 , we see a performance improvement of ~20% Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240402093753.331120-4-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04bnxt_en: Enable XPS by default on driver loadSomnath Kotur1-1/+45
Enable XPS on default during NIC open. The choice of Tx queue is based on the CPU executing the thread that submits the Tx request. The pool of Tx queues will be spread evenly across both device-attached NUMA nodes(local) and remote NUMA nodes. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240402093753.331120-3-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04bnxt_en: Add delay to handle Downstream Port Containment (DPC) AERVikas Gupta1-0/+4
In case of DPC, after issuing the hot reset, the kernel waits for 100ms for the device to complete the reset. However on some older chips, the firmware may take up to 1 second to complete the reset, only after which the driver can restart the card. Introduce delay of 900ms to handle this scenario on the older chipsets. Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240402093753.331120-2-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04net: ethernet: mtk_eth_soc: Reuse value using READ_ONCE instead of ↵linke li1-1/+1
re-rereading it In mtk_flow_entry_update_l2, the hwe->ib1 is read using READ_ONCE at the beginning of the function, checked, and then re-read from hwe->ib1, may void all guarantees of the checks. Reuse the value that was read by READ_ONCE to ensure the consistency of the ib1 throughout the function. Signed-off-by: linke li <lilinke99@qq.com> Link: https://lore.kernel.org/r/tencent_C699E9540505523424F11A9BD3D21B86840A@qq.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-04net: dsa: sja1105: Fix parameters order in sja1110_pcs_mdio_write_c45()Christophe JAILLET1-1/+1
The definition and declaration of sja1110_pcs_mdio_write_c45() don't have parameters in the same order. Knowing that sja1110_pcs_mdio_write_c45() is used as a function pointer in 'sja1105_info' structure with .pcs_mdio_write_c45, and that we have: int (*pcs_mdio_write_c45)(struct mii_bus *bus, int phy, int mmd, int reg, u16 val); it is likely that the definition is the one to change. Found with cppcheck, funcArgOrderDifferent. Fixes: ae271547bba6 ("net: dsa: sja1105: C45 only transactions for PCS") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Michael Walle <mwalle@kernel.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/ff2a5af67361988b3581831f7bd1eddebfb4c48f.1712082763.git.christophe.jaillet@wanadoo.fr Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-04net: ravb: Always update error countersPaul Barker1-8/+9
The error statistics should be updated each time the poll function is called, even if the full RX work budget has been consumed. This prevents the counts from becoming stuck when RX bandwidth usage is high. This also ensures that error counters are not updated after we've re-enabled interrupts as that could result in a race condition. Also drop an unnecessary space. Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/20240402145305.82148-2-paul.barker.ct@bp.renesas.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-04net: ravb: Always process TX descriptor ringPaul Barker1-2/+5
The TX queue should be serviced each time the poll function is called, even if the full RX work budget has been consumed. This prevents starvation of the TX queue when RX bandwidth usage is high. Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/20240402145305.82148-1-paul.barker.ct@bp.renesas.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-04netfilter: nf_tables: discard table flag update with pending basechain deletionPablo Neira Ayuso1-4/+5
Hook unregistration is deferred to the commit phase, same occurs with hook updates triggered by the table dormant flag. When both commands are combined, this results in deleting a basechain while leaving its hook still registered in the core. Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-04-04netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()Ziyang Xuan1-2/+7
nft_unregister_flowtable_type() within nf_flow_inet_module_exit() can concurrent with __nft_flowtable_type_get() within nf_tables_newflowtable(). And thhere is not any protection when iterate over nf_tables_flowtables list in __nft_flowtable_type_get(). Therefore, there is pertential data-race of nf_tables_flowtables list entry. Use list_for_each_entry_rcu() to iterate over nf_tables_flowtables list in __nft_flowtable_type_get(), and use rcu_read_lock() in the caller nft_flowtable_type_get() to protect the entire type query process. Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-04-04netfilter: nf_tables: reject new basechain after table flag updatePablo Neira Ayuso1-0/+3
When dormant flag is toggled, hooks are disabled in the commit phase by iterating over current chains in table (existing and new). The following configuration allows for an inconsistent state: add table x add chain x y { type filter hook input priority 0; } add table x { flags dormant; } add chain x w { type filter hook input priority 1; } which triggers the following warning when trying to unregister chain w which is already unregistered. [ 127.322252] WARNING: CPU: 7 PID: 1211 at net/netfilter/core.c:50 1 __nf_unregister_net_hook+0x21a/0x260 [...] [ 127.322519] Call Trace: [ 127.322521] <TASK> [ 127.322524] ? __warn+0x9f/0x1a0 [ 127.322531] ? __nf_unregister_net_hook+0x21a/0x260 [ 127.322537] ? report_bug+0x1b1/0x1e0 [ 127.322545] ? handle_bug+0x3c/0x70 [ 127.322552] ? exc_invalid_op+0x17/0x40 [ 127.322556] ? asm_exc_invalid_op+0x1a/0x20 [ 127.322563] ? kasan_save_free_info+0x3b/0x60 [ 127.322570] ? __nf_unregister_net_hook+0x6a/0x260 [ 127.322577] ? __nf_unregister_net_hook+0x21a/0x260 [ 127.322583] ? __nf_unregister_net_hook+0x6a/0x260 [ 127.322590] ? __nf_tables_unregister_hook+0x8a/0xe0 [nf_tables] [ 127.322655] nft_table_disable+0x75/0xf0 [nf_tables] [ 127.322717] nf_tables_commit+0x2571/0x2620 [nf_tables] Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-04-04netfilter: nf_tables: flush pending destroy work before exit_net releasePablo Neira Ayuso1-0/+1
Similar to 2c9f0293280e ("netfilter: nf_tables: flush pending destroy work before netlink notifier") to address a race between exit_net and the destroy workqueue. The trace below shows an element to be released via destroy workqueue while exit_net path (triggered via module removal) has already released the set that is used in such transaction. [ 1360.547789] BUG: KASAN: slab-use-after-free in nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables] [ 1360.547861] Read of size 8 at addr ffff888140500cc0 by task kworker/4:1/152465 [ 1360.547870] CPU: 4 PID: 152465 Comm: kworker/4:1 Not tainted 6.8.0+ #359 [ 1360.547882] Workqueue: events nf_tables_trans_destroy_work [nf_tables] [ 1360.547984] Call Trace: [ 1360.547991] <TASK> [ 1360.547998] dump_stack_lvl+0x53/0x70 [ 1360.548014] print_report+0xc4/0x610 [ 1360.548026] ? __virt_addr_valid+0xba/0x160 [ 1360.548040] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 1360.548054] ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables] [ 1360.548176] kasan_report+0xae/0xe0 [ 1360.548189] ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables] [ 1360.548312] nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables] [ 1360.548447] ? __pfx_nf_tables_trans_destroy_work+0x10/0x10 [nf_tables] [ 1360.548577] ? _raw_spin_unlock_irq+0x18/0x30 [ 1360.548591] process_one_work+0x2f1/0x670 [ 1360.548610] worker_thread+0x4d3/0x760 [ 1360.548627] ? __pfx_worker_thread+0x10/0x10 [ 1360.548640] kthread+0x16b/0x1b0 [ 1360.548653] ? __pfx_kthread+0x10/0x10 [ 1360.548665] ret_from_fork+0x2f/0x50 [ 1360.548679] ? __pfx_kthread+0x10/0x10 [ 1360.548690] ret_from_fork_asm+0x1a/0x30 [ 1360.548707] </TASK> [ 1360.548719] Allocated by task 192061: [ 1360.548726] kasan_save_stack+0x20/0x40 [ 1360.548739] kasan_save_track+0x14/0x30 [ 1360.548750] __kasan_kmalloc+0x8f/0xa0 [ 1360.548760] __kmalloc_node+0x1f1/0x450 [ 1360.548771] nf_tables_newset+0x10c7/0x1b50 [nf_tables] [ 1360.548883] nfnetlink_rcv_batch+0xbc4/0xdc0 [nfnetlink] [ 1360.548909] nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink] [ 1360.548927] netlink_unicast+0x367/0x4f0 [ 1360.548935] netlink_sendmsg+0x34b/0x610 [ 1360.548944] ____sys_sendmsg+0x4d4/0x510 [ 1360.548953] ___sys_sendmsg+0xc9/0x120 [ 1360.548961] __sys_sendmsg+0xbe/0x140 [ 1360.548971] do_syscall_64+0x55/0x120 [ 1360.548982] entry_SYSCALL_64_after_hwframe+0x55/0x5d [ 1360.548994] Freed by task 192222: [ 1360.548999] kasan_save_stack+0x20/0x40 [ 1360.549009] kasan_save_track+0x14/0x30 [ 1360.549019] kasan_save_free_info+0x3b/0x60 [ 1360.549028] poison_slab_object+0x100/0x180 [ 1360.549036] __kasan_slab_free+0x14/0x30 [ 1360.549042] kfree+0xb6/0x260 [ 1360.549049] __nft_release_table+0x473/0x6a0 [nf_tables] [ 1360.549131] nf_tables_exit_net+0x170/0x240 [nf_tables] [ 1360.549221] ops_exit_list+0x50/0xa0 [ 1360.549229] free_exit_list+0x101/0x140 [ 1360.549236] unregister_pernet_operations+0x107/0x160 [ 1360.549245] unregister_pernet_subsys+0x1c/0x30 [ 1360.549254] nf_tables_module_exit+0x43/0x80 [nf_tables] [ 1360.549345] __do_sys_delete_module+0x253/0x370 [ 1360.549352] do_syscall_64+0x55/0x120 [ 1360.549360] entry_SYSCALL_64_after_hwframe+0x55/0x5d (gdb) list *__nft_release_table+0x473 0x1e033 is in __nft_release_table (net/netfilter/nf_tables_api.c:11354). 11349 list_for_each_entry_safe(flowtable, nf, &table->flowtables, list) { 11350 list_del(&flowtable->list); 11351 nft_use_dec(&table->use); 11352 nf_tables_flowtable_destroy(flowtable); 11353 } 11354 list_for_each_entry_safe(set, ns, &table->sets, list) { 11355 list_del(&set->list); 11356 nft_use_dec(&table->use); 11357 if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT)) 11358 nft_map_deactivate(&ctx, set); (gdb) [ 1360.549372] Last potentially related work creation: [ 1360.549376] kasan_save_stack+0x20/0x40 [ 1360.549384] __kasan_record_aux_stack+0x9b/0xb0 [ 1360.549392] __queue_work+0x3fb/0x780 [ 1360.549399] queue_work_on+0x4f/0x60 [ 1360.549407] nft_rhash_remove+0x33b/0x340 [nf_tables] [ 1360.549516] nf_tables_commit+0x1c6a/0x2620 [nf_tables] [ 1360.549625] nfnetlink_rcv_batch+0x728/0xdc0 [nfnetlink] [ 1360.549647] nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink] [ 1360.549671] netlink_unicast+0x367/0x4f0 [ 1360.549680] netlink_sendmsg+0x34b/0x610 [ 1360.549690] ____sys_sendmsg+0x4d4/0x510 [ 1360.549697] ___sys_sendmsg+0xc9/0x120 [ 1360.549706] __sys_sendmsg+0xbe/0x140 [ 1360.549715] do_syscall_64+0x55/0x120 [ 1360.549725] entry_SYSCALL_64_after_hwframe+0x55/0x5d Fixes: 0935d5588400 ("netfilter: nf_tables: asynchronous release") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-04-04netfilter: nf_tables: release mutex after nft_gc_seq_end from abort pathPablo Neira Ayuso1-5/+8
The commit mutex should not be released during the critical section between nft_gc_seq_begin() and nft_gc_seq_end(), otherwise, async GC worker could collect expired objects and get the released commit lock within the same GC sequence. nf_tables_module_autoload() temporarily releases the mutex to load module dependencies, then it goes back to replay the transaction again. Move it at the end of the abort phase after nft_gc_seq_end() is called. Cc: stable@vger.kernel.org Fixes: 720344340fb9 ("netfilter: nf_tables: GC transaction race with abort path") Reported-by: Kuan-Ting Chen <hexrabbit@devco.re> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-04-04netfilter: nf_tables: release batch on table validation from abort pathPablo Neira Ayuso1-5/+10
Unlike early commit path stage which triggers a call to abort, an explicit release of the batch is required on abort, otherwise mutex is released and commit_list remains in place. Add WARN_ON_ONCE to ensure commit_list is empty from the abort path before releasing the mutex. After this patch, commit_list is always assumed to be empty before grabbing the mutex, therefore 03c1f1ef1584 ("netfilter: Cleanup nft_net->module_list from nf_tables_exit_net()") only needs to release the pending modules for registration. Cc: stable@vger.kernel.org Fixes: c0391b6ab810 ("netfilter: nf_tables: missing validation from the abort path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-04-04Revert "tg3: Remove residual error handling in tg3_suspend"Paolo Abeni1-4/+26
This reverts commit 9ab4ad295622a3481818856762471c1f8c830e18. I went out of coffee and applied it to the wrong tree. Blame on me. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-04tg3: Remove residual error handling in tg3_suspendNikita Kiryushin1-26/+4
As of now, tg3_power_down_prepare always ends with success, but the error handling code from former tg3_set_power_state call is still here. This code became unreachable in commit c866b7eac073 ("tg3: Do not use legacy PCI power management"). Remove (now unreachable) error handling code for simplification and change tg3_power_down_prepare to a void function as its result is no more checked. Signed-off-by: Nikita Kiryushin <kiryushin@ancud.ru> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240401191418.361747-1-kiryushin@ancud.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-04tg3: Remove residual error handling in tg3_suspendNikita Kiryushin1-26/+4
As of now, tg3_power_down_prepare always ends with success, but the error handling code from former tg3_set_power_state call is still here. This code became unreachable in commit c866b7eac073 ("tg3: Do not use legacy PCI power management"). Remove (now unreachable) error handling code for simplification and change tg3_power_down_prepare to a void function as its result is no more checked. Signed-off-by: Nikita Kiryushin <kiryushin@ancud.ru> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240401191418.361747-1-kiryushin@ancud.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-04Merge branch 'mlxsw-preparations-for-improving-performance'Jakub Kicinski2-218/+222
Petr Machata says: ==================== mlxsw: Preparations for improving performance Amit Cohen writes: mlxsw driver will use NAPI for event processing in a next patch set. Some additional improvements will be added later. This patch set prepares the code for NAPI usage and refactor some relevant areas. See more details in commit messages. Patch Set overview: Patches #1-#2 are preparations for patch #3 Patch #3 setups tasklets as part of queue initializtion Patch #4 removes handling of unlikely scenario Patch #5 removes unused counters Patch #6 makes style change in mlxsw_pci_eq_tasklet() Patch #7-#10 poll command interface instead of EQ0 usage Patches #11-#12 make style change and break the function mlxsw_pci_cq_tasklet() Patches #13-#14 remove functions which can be replaced by a stored value Patch #15 improves accessing to descriptor queue instance ==================== Link: https://lore.kernel.org/r/cover.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04mlxsw: pci: Store DQ pointer as part of CQ structureAmit Cohen1-13/+28
Currently, for each completion, we check the number of descriptor queue and take it via mlxsw_pci_{sdq,rdq}_get(). This is inefficient, the DQ should be the same for all the completions in CQ, as each CQ handles only one DQ - SDQ or RDQ. This mapping is handled as part of DQ initialization via mlxsw_cmd_mbox_sw2hw_dq_cq_set(). Instead, as part of DQ initialization, set DQ pointer in the appropriate CQ structure. When we handle completions, warn in case that the DQ number that we expect is different from the number we get in the CQE. Call WARN_ON_ONCE() only after checking the value, to avoid calling this method for each completion. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/a5b2559cd6d532c120f3194f89a1e257110318f1.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04mlxsw: pci: Remove mlxsw_pci_cq_count()Amit Cohen1-15/+3
Currently, for each interrupt we call mlxsw_pci_cq_count() to determine the number of CQs. This call makes additional two function's calls. This can be removed by storing this value as part of structure 'mlxsw_pci', as we already do for number of SDQs. Remove the function and __mlxsw_pci_queue_count() which is now not used and store the value instead. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/f08ad113e8160678f3c8d401382a696c6c7f44c7.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04mlxsw: pci: Remove mlxsw_pci_sdq_count()Amit Cohen1-12/+7
The number of SDQs is stored as part of 'mlxsw_pci' structure. In some cases, the driver uses this value and in some cases it calls mlxsw_pci_sdq_count() to get the value. Align the code to use the stored value. This simplifies the code and makes it clearer that the value is always the same. Rename 'mlxsw_pci->num_sdq_cqs' to 'mlxsw_pci->num_sdqs' as now it is used not only in CQ context. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/0c8788506d9af35d589dbf64be35a508fd63d681.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04mlxsw: pci: Break mlxsw_pci_cq_tasklet() into tasklets per queue typeAmit Cohen1-12/+74
Completion queues are used for completions of RDQ or SDQ. Each completion queue is used for one DQ. The first CQs are used for SDQs and the rest are used for RDQs. Currently, for each CQE (completion queue element), we check 'sr' value (send/receive) to know if it is completion of RDQ or SDQ. Actually, we do not really have to check it, as according to the queue number we know if it handles completions of Rx or Tx. Break the tasklet into two - one for Rx (RDQ) and one for Tx (SDQ). Then, setup the appropriate tasklet for each queue as part of queue initialization. Use 'sr' value for unlikely case that we get completion with type that we do not expect. Call WARN_ON_ONCE() only after checking the value, to avoid calling this method for each completion. A next patch set will use NAPI to handle events, then we will have a separate poll method for Rx and Tx. This change is a preparation for NAPI usage. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/50fbc366f8de54cb5dc72a7c4f394333ef71f1d0.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04mlxsw: pci: Make style change in mlxsw_pci_cq_tasklet()Amit Cohen1-2/+2
This function will be broken into several functions later. As preparation, reorder variables to reverse xmas tree. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/7170a8f4429ecb5a539b0374c621697778ff8363.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04mlxsw: pci: Remove unused wait queueAmit Cohen1-8/+4
The previous patch changed the code to do not handle command interface from event queue. With this change the wait queue is not used anymore. Remove it and 'wait_done' variable. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/f3af6a5a9dabd97d2920cefe475c6aa57767f504.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04mlxsw: pci: Use only one event queueAmit Cohen2-40/+16
The device supports two event queues. EQ0 is used for command interface completion events. EQ1 is used for completion events of RDQ or SDQ. Currently, for each EQE (event queue element), we check the queue number and handle accordingly. More than that, for each interrupt we schedule tasklets for both EQs. This is really ineffective, especially because of the fact that EQ0 is used only as part of driver init/fini, when EMADs are not available. There is no point to schedule the tasklet for it and check each EQE. A previous patch changed the code to poll command interface for each use of it. It means that now there is no real reason to use EQ0, as we poll the command interface. Initialize only one event queue and use it as EQ1 (this is determined by queue number). Then, for each interrupt we can schedule the tasklet only for one queue and we do not have to check the queue number. This simplifies the code and should improve performance. Note that polling command interface is ok as we use it only as part of driver init/fini. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/23d764f5c032e4c363b98590b746a4b32d2bf900.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04mlxsw: pci: Rename MLXSW_PCI_EQS_COUNTAmit Cohen2-3/+3
Currently we use MLXSW_PCI_EQS_COUNT event queues. A next patch will change the driver to initialize only EQ1, as EQ0 is not required anymore when we poll command interface. Rename the macro to MLXSW_PCI_EQS_MAX as later we will not initialize the maximum supported EQs, this value represents the maximum and a new macro will be added to represent the actual used queues. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/b08df430b62f23ca1aa3aaa257896d2d95aa7691.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04mlxsw: pci: Poll command interface for each cmd_exec()Amit Cohen1-31/+17
Command interface is used for configuring and querying FW when EMADs are not available. During the time that the driver sets up the asynchronous queues, it polls the command interface for getting completions. Then, there is a short period when asynchronous queues work, but EMADs are not available (marked in the code as nopoll = true). During this time, we send commands via command interface, but we do not poll it, as we can get an interrupt for the completion. Completions of command interface are received from HW in EQ0 (event queue 0). The usage of EQ0 instead of polling is done only 4 times during initialization and one time during tear down, but it makes an overhead during lifetime of the driver. For each interrupt, we have to check if we get events in EQ0 or EQ1 and handle them. This is really ineffective, especially because of the fact that EQ0 is used only as part of driver init/fini. Instead, we can poll command interface for each call of cmd_exec(). It means that when we send a command via command interface (as EMADs are not available), we will poll it, regardless of availability of the asynchronous queues. This will allow us to configure later only EQ1 and simplify the flow. Remove 'nopoll' indication and change mlxsw_pci_cmd_exec() to poll till answer/timeout regardless of queues' state. For now, completions are handled also by EQ0, but it will be removed in next patch. Additional cleanups will be added in next patches. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/e674c70380ceda953e0e45a77334c5d22e69938f.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04mlxsw: pci: Make style changes in mlxsw_pci_eq_tasklet()Amit Cohen1-9/+12
This function will be used later only for EQ1. As preparation, reorder variables to reverse xmas tree and return earlier when it is possible, to simplify the code. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/2412d6c135b2a6aedb4484f5d8baab3aecd7b9ae.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04mlxsw: pci: Remove unused countersAmit Cohen1-31/+18
The structure 'mlxsw_pci_queue' stores several counters which were consumed via debugfs. Since commit 9a32562becd9 ("mlxsw: Remove debugfs interface"), these counters are not used. Remove them. This makes the 'union u' and 'struct eq' redundant. Maintain 'struct cq' as it will be extended later. Replace increasing 'q->u.eq.ev_other_count' with WARN_ON_ONCE(), as it is used in an unreasonable case of receiving event in EQ which is not EQ0 or EQ1. When the queues are initialized, we check number of event queues and fail with the print "Unsupported number of queues" in case that the driver tries to initialize more than two queues. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/ee9e658800aa0390e08342100bc27daff4c176c0.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04mlxsw: pci: Arm CQ doorbell regardless of number of completionsAmit Cohen1-2/+2
Currently, as part of mlxsw_pci_cq_tasklet(), we check if any item was handled, and only in such case we arm doorbell. This is unlikely case, as we schedule tasklet only for CQs that we get an event for them, which means that they contain completions to handle. Remove this check, which is supposed to be true always, and even if it is false, it is not a mistake to ring the doorbell. We can warn on such case, but it is not really worth to add a check which will be run for each CQ handling when we do not expect to reach it and it does not point to logic error that should be handled. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/f8efa481bfe7bebb9f93bb803f44ab7da77f53e6.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04mlxsw: pci: Do not setup tasklet from operationAmit Cohen1-6/+2
Currently, the structure 'mlxsw_pci_queue_ops' holds a pointer to the callback function of tasklet. This is used only for EQ and CQ. mlxsw driver will use NAPI in a following patch set, so CQ will not use tasklet anymore. As preparation, remove this pointer from the shared operation structure and setup the tasklet as part of queue initialization. For now, setup tasklet for EQ and CQ. Later, CQ code will be changed. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/a326cae5fc1ad085a1a063c004983de6fe389414.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04mlxsw: pci: Move mlxsw_pci_cq_{init, fini}()Amit Cohen1-43/+43
Move mlxsw_pci_cq_{init, fini}() after mlxsw_pci_cq_tasklet() as a next patch will setup the tasklet as part of initialization. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/25196cb5baf5acf6ec1e956203790e018ba8e306.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04mlxsw: pci: Move mlxsw_pci_eq_{init, fini}()Amit Cohen1-36/+36
Move mlxsw_pci_eq_{init, fini}() after mlxsw_pci_eq_tasklet() as a next patch will setup the tasklet as part of initialization. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/7ae120a02e1c490084daae7e684a0d40b7cce4e7.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04Merge branch 'mlx5-misc-patches'Jakub Kicinski27-408/+596
Tariq Toukan says: ==================== mlx5 misc patches This patchset includes small features and misc code enhancements for the mlx5 core and EN drivers. Patches 1-4 by Gal improves the mlx5e ethtool stats implementation, for example by using standard helpers ethtool_sprintf/puts. Patch 5 by me adds a reset option for the FW command interface debugfs stats entries. This allows explicit FW command interface stats reset between different runs of a test case. Patches 6 and 7 are simple cleanups. Patch 8 by Gal adds driver support for 800Gbps link modes. Patch 9 by Jianbo enhances the L4 steering abilities. Patches 10-11 by Jianbo save redundant operations. ==================== Link: https://lore.kernel.org/r/20240402133043.56322-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04net/mlx5: Don't call give_pages() if request 0 pageJianbo Liu1-0/+3
Firmware will return 0 on query BOOT/INIT PAGES for non-page supplier functions (external host PF/VF/SF), so no page is needed to be allocated for them. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240402133043.56322-12-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04net/mlx5: Skip pages EQ creation for non-page supplier functionJianbo Liu2-2/+11
Page events are not issued by device on the function if page_request_disable is set, so no need to create pages EQ. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240402133043.56322-11-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04net/mlx5: Support matching on l4_type for ttc_tableJianbo Liu7-67/+244
Replace matching on TCP and UDP protocols with new l4_type field which is parsed by steering for ttc_table. It is enabled by the outer_l4_type or inner_l4_type bits in nic_rx or port_sel flow table capabilities and used only if pcc_ifa2 bit in HCA capabilities is set. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240402133043.56322-10-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04net/mlx5e: Add support for 800Gbps link modesGal Pressman1-0/+7
Add support for 800Gbps speed, link modes of 100Gbps per lane. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240402133043.56322-9-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04net/mlx5: Convert uintX_t to uXGal Pressman11-14/+14
In the kernel, the preferred types are uX. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-8-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04net/mlx5e: XDP, Fix an inconsistent commentCarolina Jubran1-1/+1
Starting from commit eb9b9fdcafe2 ("net/mlx5e: Introduce extended version for mlx5e_xmit_data") sinfo is no longer passed as an argument to mlx5e_xmit_xdp_frame(), the comment is inconsistent. check_result must be zero when the packet is fragmented. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04net/mlx5e: debugfs, Add reset option for command interface statsTariq Toukan1-5/+17
Resetting stats just before some test/debug case allows us to eliminate out the impact of previous commands. Useful in particular for the average latency calculation. The average_write() callback was unreachable, as "average" is a read-only file. Extend, rename, and use it for a newly exposed write-only "reset" file. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04net/mlx5e: Make stats group fill_stats callbacks consistent with the APIGal Pressman7-173/+215
The fill_strings() callbacks were changed to accept a **data pointer, and not rely on propagating the index value. Make a similar change to fill_stats() callbacks to keep the API consistent. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04net/mlx5e: Use ethtool_sprintf/puts() to fill stats stringsGal Pressman7-146/+85
Use ethtool_sprintf/puts() helper functions which handle the common pattern of printing a string into the ethtool strings interface and incrementing the string pointer by ETH_GSTRING_LEN. Change the fill_strings callback to accept a **data pointer, and remove the index and return value. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04net/mlx5e: Use ethtool_sprintf/puts() to fill selftests stringsGal Pressman1-1/+1
Use ethtool_sprintf/puts() helper functions which handle the common pattern of printing a string into the ethtool strings interface and incrementing the string pointer by ETH_GSTRING_LEN. The int return value in mlx5e_self_test_fill_strings() is not removed as it is still used to return the number of selftests. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04net/mlx5e: Use ethtool_sprintf/puts() to fill priv flags stringsGal Pressman1-2/+1
Use ethtool_sprintf/puts() helper functions which handle the common pattern of printing a string into the ethtool strings interface and incrementing the string pointer by ETH_GSTRING_LEN. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04Merge tag 'wireless-next-2024-04-03' of ↵Jakub Kicinski124-1810/+7806
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.10 The first "new features" pull request for v6.10 with changes both in stack and in drivers. The big thing in this pull request is that wireless subsystem is now almost free of sparse warnings. There's only one warning left in ath11k which was introduced in v6.9-rc1 and will be fixed via the wireless tree. Realtek drivers continue to improve, now we have support for RTL8922AE and RTL8723CS devices. ath11k also has long waited support for P2P. This time we have a small conflict in iwlwifi, Stephen has an example merge resolution which should help with fixing the conflict: https://lore.kernel.org/all/20240326100945.765b8caf@canb.auug.org.au/ Major changes: rtw89 * RTL8922AE Wi-Fi 7 PCI device support rtw88 * RTL8723CS SDIO device support iwlwifi * don't support puncturing in 5 GHz * support monitor mode on passive channels * BZ-W device support * P2P with HE/EHT support ath11k * P2P support for QCA6390, WCN6855 and QCA2066 * tag 'wireless-next-2024-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (122 commits) wifi: mt76: mt7915: workaround dubious x | !y warning wifi: mwl8k: Avoid -Wflex-array-member-not-at-end warnings wifi: ti: Avoid a hundred -Wflex-array-member-not-at-end warnings wifi: iwlwifi: mvm: fix check in iwl_mvm_sta_fw_id_mask net: rfkill: gpio: Convert to platform remove callback returning void wifi: mac80211: use kvcalloc() for codel vars wifi: iwlwifi: reconfigure TLC during HW restart wifi: iwlwifi: mvm: don't change BA sessions during restart wifi: iwlwifi: mvm: select STA mask only for active links wifi: iwlwifi: mvm: set wider BW OFDMA ignore correctly wifi: iwlwifi: Add support for LARI_CONFIG_CHANGE_CMD cmd v9 wifi: iwlwifi: mvm: Declare HE/EHT capabilities support for P2P interfaces wifi: iwlwifi: mvm: Remove outdated comment wifi: iwlwifi: add support for BZ_W wifi: iwlwifi: Print a specific device name. wifi: iwlwifi: remove wrong CRF_IDs wifi: iwlwifi: remove devices that never came out wifi: iwlwifi: mvm: mark EMLSR disabled in cleanup iterator wifi: iwlwifi: mvm: fix active link counting during recovery wifi: iwlwifi: mvm: assign link STA ID lookups during restart ... ==================== Link: https://lore.kernel.org/r/20240403093625.CF515C433C7@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04tools: ynl: ethtool.py: Make tool invokable from any CWDRahul Rameshbabu1-2/+6
ethtool.py depends on yml files in a specific location of the linux kernel tree. Using relative lookup for those files means that ethtool.py would need to be run under tools/net/ynl/. Lookup needed yml files without depending on the current working directory that ethtool.py is invoked from. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/20240402204000.115081-1-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04net: phy: marvell: implement cable-test for 88E308X/88E609X familyPawel Dembicki1-0/+208
This commit implements VCT in 88E308X/88E609X Family. It require two workarounds with some magic configuration. Regular use require only one register configuration. But Open Circuit require second workaround. It cause implementation two phases for fault length measuring. Fast Ethernet PHY have implemented very simple version of VCT. It's complitley different than vct5 or vct7. Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240402201123.2961909-3-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04net: ethtool: Add impedance mismatch result code to cable testPawel Dembicki1-0/+4
Some PHYs can recognize during a cable test if the impedance in the cable is okay. They can detect reflections caused by impedance discontinuity between a regular 100 Ohm cable and an abnormal part with a higher or lower impedance. This commit introduces a new result code: ETHTOOL_A_CABLE_RESULT_CODE_IMPEDANCE_MISMATCH, which represents the results of a cable test indicating issues with impedance integrity. Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240402201123.2961909-2-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04net: phy: marvell: add basic support of 88E308X/88E609X familyPawel Dembicki2-0/+14
This patch implements only basic support. It covers PHY used in multiple IC: PHY: 88E3082, 88E3083 Switch: 88E6096, 88E6097 Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20240402201123.2961909-1-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04net: mana: Fix Rx DMA datasize and skb_over_panicHaiyang Zhang2-2/+1
mana_get_rxbuf_cfg() aligns the RX buffer's DMA datasize to be multiple of 64. So a packet slightly bigger than mtu+14, say 1536, can be received and cause skb_over_panic. Sample dmesg: [ 5325.237162] skbuff: skb_over_panic: text:ffffffffc043277a len:1536 put:1536 head:ff1100018b517000 data:ff1100018b517100 tail:0x700 end:0x6ea dev:<NULL> [ 5325.243689] ------------[ cut here ]------------ [ 5325.245748] kernel BUG at net/core/skbuff.c:192! [ 5325.247838] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 5325.258374] RIP: 0010:skb_panic+0x4f/0x60 [ 5325.302941] Call Trace: [ 5325.304389] <IRQ> [ 5325.315794] ? skb_panic+0x4f/0x60 [ 5325.317457] ? asm_exc_invalid_op+0x1f/0x30 [ 5325.319490] ? skb_panic+0x4f/0x60 [ 5325.321161] skb_put+0x4e/0x50 [ 5325.322670] mana_poll+0x6fa/0xb50 [mana] [ 5325.324578] __napi_poll+0x33/0x1e0 [ 5325.326328] net_rx_action+0x12e/0x280 As discussed internally, this alignment is not necessary. To fix this bug, remove it from the code. So oversized packets will be marked as CQE_RX_TRUNCATED by NIC, and dropped. Cc: stable@vger.kernel.org Fixes: 2fbbd712baf1 ("net: mana: Enable RX path to handle various MTU sizes") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1712087316-20886-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>