summaryrefslogtreecommitdiff
path: root/drivers/net
AgeCommit message (Collapse)AuthorFilesLines
2025-04-30ixgbe: create E610 specific ethtool_ops structureJedrzej Jagielski2-6/+56
E610's implementation of various ethtool ops is different than the ones corresponding to ixgbe legacy products. Therefore create separate E610 ethtool_ops struct which will be filled out in the forthcoming patches. Add adequate ops struct basing on MAC type. This step requires changing a bit the flow of probing by placing ixgbe_set_ethtool_ops after hw.mac.type is assigned. So move the whole netdev assignment block after hw.mac.type is known. This step doesn't have any additional impact on probing sequence. Suggested-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Bharath R <bharath.r@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-30igc: Change Tx mode for MQPRIO offloadingKurt Kanzenbach2-20/+4
The current MQPRIO offload implementation uses the legacy TSN Tx mode. In this mode the hardware uses four packet buffers and considers queue priorities. In order to harmonize the TAPRIO implementation with MQPRIO, switch to the regular TSN Tx mode. This mode also uses four packet buffers and considers queue priorities. In addition to the legacy mode, transmission is always coupled to Qbv. The driver already has mechanisms to use a dummy schedule of 1 second with all gates open for ETF. Simply use this for MQPRIO too. This reduces code and makes it easier to add support for frame preemption later. Tested on i225 with real time application using high priority queue, iperf3 using low priority queue and network TAP device. Acked-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-30igc: Limit netdev_tc calls to MQPRIOKurt Kanzenbach2-21/+17
Limit netdev_tc calls to MQPRIO. Currently these calls are made in igc_tsn_enable_offload() and igc_tsn_disable_offload() which are used by TAPRIO and ETF as well. However, these are only required for MQPRIO. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-30igb: Get rid of spurious interruptsKurt Kanzenbach3-5/+28
When running the igc with XDP/ZC in busy polling mode with deferral of hard interrupts, interrupts still happen from time to time. That is caused by the igb task watchdog which triggers Rx interrupts periodically. That mechanism has been introduced to overcome skb/memory allocation failures [1]. So the Rx clean functions stop processing the Rx ring in case of such failure. The task watchdog triggers Rx interrupts periodically in the hope that memory became available in the mean time. The current behavior is undesirable for real time applications, because the driver induced Rx interrupts trigger also the softirq processing. However, all real time packets should be processed by the application which uses the busy polling method. Therefore, only trigger the Rx interrupts in case of real allocation failures. Introduce a new flag for signaling that condition. Follow the same logic as in commit 8dcf2c212078 ("igc: Get rid of spurious interrupts"). [1] - https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=3be507547e6177e5c808544bd6a2efa2c7f1d436 Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Sweta Kumari <sweta.kumari@intel.com> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-30igb: Add support for persistent NAPI configKurt Kanzenbach1-1/+2
Use netif_napi_add_config() to assign persistent per-NAPI config. This is useful for preserving NAPI settings when changing queue counts or for user space programs using SO_INCOMING_NAPI_ID. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-30igb: Link queues to NAPI instancesKurt Kanzenbach2-5/+40
Link queues to NAPI instances via netdev-genl API. This is required to use XDP/ZC busy polling. See commit 5ef44b3cb43b ("xsk: Bring back busy polling support") for details. This also allows users to query the info with netlink: |$ ./tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ | --dump queue-get --json='{"ifindex": 2}' |[{'id': 0, 'ifindex': 2, 'napi-id': 8201, 'type': 'rx'}, | {'id': 1, 'ifindex': 2, 'napi-id': 8202, 'type': 'rx'}, | {'id': 2, 'ifindex': 2, 'napi-id': 8203, 'type': 'rx'}, | {'id': 3, 'ifindex': 2, 'napi-id': 8204, 'type': 'rx'}, | {'id': 0, 'ifindex': 2, 'napi-id': 8201, 'type': 'tx'}, | {'id': 1, 'ifindex': 2, 'napi-id': 8202, 'type': 'tx'}, | {'id': 2, 'ifindex': 2, 'napi-id': 8203, 'type': 'tx'}, | {'id': 3, 'ifindex': 2, 'napi-id': 8204, 'type': 'tx'}] Add rtnl locking to PCI error handlers, because netif_queue_set_napi() requires the lock held. While at __igb_open() use RCT coding style. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Joe Damato <jdamato@fastly.com> Tested-by: Sweta Kumari <sweta.kumari@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-30igb: Link IRQs to NAPI instancesKurt Kanzenbach1-0/+3
Link IRQs to NAPI instances via netdev-genl API. This allows users to query that information via netlink: |$ ./tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ | --dump napi-get --json='{"ifindex": 2}' |[{'defer-hard-irqs': 0, | 'gro-flush-timeout': 0, | 'id': 8204, | 'ifindex': 2, | 'irq': 127, | 'irq-suspend-timeout': 0}, | {'defer-hard-irqs': 0, | 'gro-flush-timeout': 0, | 'id': 8203, | 'ifindex': 2, | 'irq': 126, | 'irq-suspend-timeout': 0}, | {'defer-hard-irqs': 0, | 'gro-flush-timeout': 0, | 'id': 8202, | 'ifindex': 2, | 'irq': 125, | 'irq-suspend-timeout': 0}, | {'defer-hard-irqs': 0, | 'gro-flush-timeout': 0, | 'id': 8201, | 'ifindex': 2, | 'irq': 124, | 'irq-suspend-timeout': 0}] |$ cat /proc/interrupts | grep enp2s0 |123: 0 1 IR-PCI-MSIX-0000:02:00.0 0-edge enp2s0 |124: 0 7 IR-PCI-MSIX-0000:02:00.0 1-edge enp2s0-TxRx-0 |125: 0 0 IR-PCI-MSIX-0000:02:00.0 2-edge enp2s0-TxRx-1 |126: 0 5 IR-PCI-MSIX-0000:02:00.0 3-edge enp2s0-TxRx-2 |127: 0 0 IR-PCI-MSIX-0000:02:00.0 4-edge enp2s0-TxRx-3 Reviewed-by: Joe Damato <jdamato@fastly.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-30net: phy: aquantia: fix commenting formatAryan Srivastava1-4/+2
Comment was erroneously added with /**, amend this to use /* as it is not a kernel-doc. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202504262247.1UBrDBVN-lkp@intel.com/ Signed-off-by: Aryan Srivastava <aryan.srivastava@alliedtelesis.co.nz> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250428214920.813038-1-aryan.srivastava@alliedtelesis.co.nz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-30net: dsa: felix: fix broken taprio gate states after clock jumpVladimir Oltean1-1/+4
Simplest setup to reproduce the issue: connect 2 ports of the LS1028A-RDB together (eno0 with swp0) and run: $ ip link set eno0 up && ip link set swp0 up $ tc qdisc replace dev swp0 parent root handle 100 taprio num_tc 8 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 map 0 1 2 3 4 5 6 7 \ base-time 0 sched-entry S 20 300000 sched-entry S 10 200000 \ sched-entry S 20 300000 sched-entry S 48 200000 \ sched-entry S 20 300000 sched-entry S 83 200000 \ sched-entry S 40 300000 sched-entry S 00 200000 flags 2 $ ptp4l -i eno0 -f /etc/linuxptp/configs/gPTP.cfg -m & $ ptp4l -i swp0 -f /etc/linuxptp/configs/gPTP.cfg -m One will observe that the PTP state machine on swp0 starts synchronizing, then it attempts to do a clock step, and after that, it never fails to recover from the condition below. ptp4l[82.427]: selected best master clock 00049f.fffe.05f627 ptp4l[82.428]: port 1 (swp0): MASTER to UNCALIBRATED on RS_SLAVE ptp4l[83.252]: port 1 (swp0): UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED ptp4l[83.886]: rms 4537731277 max 9075462553 freq -18518 +/- 11467 delay 818 +/- 0 ptp4l[84.170]: timed out while polling for tx timestamp ptp4l[84.171]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[84.172]: port 1 (swp0): send peer delay request failed ptp4l[84.173]: port 1 (swp0): clearing fault immediately ptp4l[84.269]: port 1 (swp0): SLAVE to LISTENING on INIT_COMPLETE ptp4l[85.303]: timed out while polling for tx timestamp ptp4l[84.171]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[84.172]: port 1 (swp0): send peer delay request failed ptp4l[84.173]: port 1 (swp0): clearing fault immediately ptp4l[84.269]: port 1 (swp0): SLAVE to LISTENING on INIT_COMPLETE ptp4l[85.303]: timed out while polling for tx timestamp ptp4l[85.304]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[85.305]: port 1 (swp0): send peer delay response failed ptp4l[85.306]: port 1 (swp0): clearing fault immediately ptp4l[86.304]: timed out while polling for tx timestamp A hint is given by the non-zero statistics for dropped packets which were expecting hardware TX timestamps: $ ethtool --include-statistics -T swp0 (...) Statistics: tx_pkts: 30 tx_lost: 11 tx_err: 0 We know that when PTP clock stepping takes place (from ocelot_ptp_settime64() or from ocelot_ptp_adjtime()), vsc9959_tas_clock_adjust() is called. Another interesting hint is that placing an early return in vsc9959_tas_clock_adjust(), so as to neutralize this function, fixes the issue and TX timestamps are no longer dropped. The debugging function written by me and included below is intended to read the GCL RAM, after the admin schedule became operational, through the two status registers available for this purpose: QSYS_GCL_STATUS_REG_1 and QSYS_GCL_STATUS_REG_2. static void vsc9959_print_tas_gcl(struct ocelot *ocelot) { u32 val, list_length, interval, gate_state; int i, err; err = read_poll_timeout(ocelot_read, val, !(val & QSYS_PARAM_STATUS_REG_8_CONFIG_PENDING), 10, 100000, false, ocelot, QSYS_PARAM_STATUS_REG_8); if (err) { dev_err(ocelot->dev, "Failed to wait for TAS config pending bit to clear: %pe\n", ERR_PTR(err)); return; } val = ocelot_read(ocelot, QSYS_PARAM_STATUS_REG_3); list_length = QSYS_PARAM_STATUS_REG_3_LIST_LENGTH_X(val); dev_info(ocelot->dev, "GCL length: %u\n", list_length); for (i = 0; i < list_length; i++) { ocelot_rmw(ocelot, QSYS_GCL_STATUS_REG_1_GCL_ENTRY_NUM(i), QSYS_GCL_STATUS_REG_1_GCL_ENTRY_NUM_M, QSYS_GCL_STATUS_REG_1); interval = ocelot_read(ocelot, QSYS_GCL_STATUS_REG_2); val = ocelot_read(ocelot, QSYS_GCL_STATUS_REG_1); gate_state = QSYS_GCL_STATUS_REG_1_GATE_STATE_X(val); dev_info(ocelot->dev, "GCL entry %d: states 0x%x interval %u\n", i, gate_state, interval); } } Calling it from two places: after the initial QSYS_TAS_PARAM_CFG_CTRL_CONFIG_CHANGE performed by vsc9959_qos_port_tas_set(), and after the one done by vsc9959_tas_clock_adjust(), I notice the following difference. From the tc-taprio process context, where the schedule was initially configured, the GCL looks like this: mscc_felix 0000:00:00.5: GCL length: 8 mscc_felix 0000:00:00.5: GCL entry 0: states 0x20 interval 300000 mscc_felix 0000:00:00.5: GCL entry 1: states 0x10 interval 200000 mscc_felix 0000:00:00.5: GCL entry 2: states 0x20 interval 300000 mscc_felix 0000:00:00.5: GCL entry 3: states 0x48 interval 200000 mscc_felix 0000:00:00.5: GCL entry 4: states 0x20 interval 300000 mscc_felix 0000:00:00.5: GCL entry 5: states 0x83 interval 200000 mscc_felix 0000:00:00.5: GCL entry 6: states 0x40 interval 300000 mscc_felix 0000:00:00.5: GCL entry 7: states 0x0 interval 200000 But from the ptp4l clock stepping process context, when the vsc9959_tas_clock_adjust() hook is called, the GCL RAM of the operational schedule now looks like this: mscc_felix 0000:00:00.5: GCL length: 8 mscc_felix 0000:00:00.5: GCL entry 0: states 0x0 interval 0 mscc_felix 0000:00:00.5: GCL entry 1: states 0x0 interval 0 mscc_felix 0000:00:00.5: GCL entry 2: states 0x0 interval 0 mscc_felix 0000:00:00.5: GCL entry 3: states 0x0 interval 0 mscc_felix 0000:00:00.5: GCL entry 4: states 0x0 interval 0 mscc_felix 0000:00:00.5: GCL entry 5: states 0x0 interval 0 mscc_felix 0000:00:00.5: GCL entry 6: states 0x0 interval 0 mscc_felix 0000:00:00.5: GCL entry 7: states 0x0 interval 0 I do not have a formal explanation, just experimental conclusions. It appears that after triggering QSYS_TAS_PARAM_CFG_CTRL_CONFIG_CHANGE for a port's TAS, the GCL entry RAM is updated anyway, despite what the documentation claims: "Specify the time interval in QSYS::GCL_CFG_REG_2.TIME_INTERVAL. This triggers the actual RAM write with the gate state and the time interval for the entry number specified". We don't touch that register (through vsc9959_tas_gcl_set()) from vsc9959_tas_clock_adjust(), yet the GCL RAM is updated anyway. It seems to be updated with effectively stale memory, which in my testing can hold a variety of things, including even pieces of the previously applied schedule, for particular schedule lengths. As such, in most circumstances it is very difficult to pinpoint this issue, because the newly updated schedule would "behave strangely", but ultimately might still pass traffic to some extent, due to some gate entries still being present in the stale GCL entry RAM. It is easy to miss. With the particular schedule given at the beginning, the GCL RAM "happens" to be reproducibly rewritten with all zeroes, and this is consistent with what we see: when the time-aware shaper has gate entries with all gates closed, traffic is dropped on TX, no wonder we can't retrieve TX timestamps. Rewriting the GCL entry RAM when reapplying the new base time fixes the observed issue. Fixes: 8670dc33f48b ("net: dsa: felix: update base time of time-aware shaper when adjusting PTP time") Reported-by: Richie Pearn <richard.pearn@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250426144859.3128352-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-30net: ethernet: mtk_eth_soc: fix SER panic with 4GB+ RAMChad Monroe1-5/+9
If the mtk_poll_rx() function detects the MTK_RESETTING flag, it will jump to release_desc and refill the high word of the SDP on the 4GB RFB. Subsequently, mtk_rx_clean will process an incorrect SDP, leading to a panic. Add patch from MediaTek's SDK to resolve this. Fixes: 2d75891ebc09 ("net: ethernet: mtk_eth_soc: support 36-bit DMA addressing on MT7988") Link: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/71f47ea785699c6aa3b922d66c2bdc1a43da25b1 Signed-off-by: Chad Monroe <chad@monroe.io> Link: https://patch.msgid.link/4adc2aaeb0fb1b9cdc56bf21cf8e7fa328daa345.1745715843.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-30igc: fix lock order in igc_ptp_resetJacob Keller1-2/+4
Commit 1a931c4f5e68 ("igc: add lock preventing multiple simultaneous PTM transactions") added a new mutex to protect concurrent PTM transactions. This lock is acquired in igc_ptp_reset() in order to ensure the PTM registers are properly disabled after a device reset. The flow where the lock is acquired already holds a spinlock, so acquiring a mutex leads to a sleep-while-locking bug, reported both by smatch, and the kernel test robot. The critical section in igc_ptp_reset() does correctly use the readx_poll_timeout_atomic variants, but the standard PTM flow uses regular sleeping variants. This makes converting the mutex to a spinlock a bit tricky. Instead, re-order the locking in igc_ptp_reset. Acquire the mutex first, and then the tmreg_lock spinlock. This is safe because there is no other ordering dependency on these locks, as this is the only place where both locks were acquired simultaneously. Indeed, any other flow acquiring locks in that order would be wrong regardless. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Fixes: 1a931c4f5e68 ("igc: add lock preventing multiple simultaneous PTM transactions") Link: https://lore.kernel.org/intel-wired-lan/Z_-P-Hc1yxcw0lTB@stanley.mountain/ Link: https://lore.kernel.org/intel-wired-lan/202504211511.f7738f5d-lkp@intel.com/T/#u Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-30idpf: protect shutdown from resetLarysa Zaremba1-0/+1
Before the referenced commit, the shutdown just called idpf_remove(), this way IDPF_REMOVE_IN_PROG was protecting us from the serv_task rescheduling reset. Without this flag set the shutdown process is vulnerable to HW reset or any other triggering conditions (such as default mailbox being destroyed). When one of conditions checked in idpf_service_task becomes true, vc_event_task can be rescheduled during shutdown, this leads to accessing freed memory e.g. idpf_req_rel_vector_indexes() trying to read vport->q_vector_idxs. This in turn causes the system to become defunct during e.g. systemctl kexec. Considering using IDPF_REMOVE_IN_PROG would lead to more heavy shutdown process, instead just cancel the serv_task before cancelling adapter->serv_task before cancelling adapter->vc_event_task to ensure that reset will not be scheduled while we are doing a shutdown. Fixes: 4c9106f4906a ("idpf: fix adapter NULL pointer dereference on reboot") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-30idpf: fix potential memory leak on kcalloc() failureMichal Swiatkowski1-8/+11
In case of failing on rss_data->rss_key allocation the function is freeing vport without freeing earlier allocated q_vector_idxs. Fix it. Move from freeing in error branch to goto scheme. Fixes: d4d558718266 ("idpf: initialize interrupts and enable vport") Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Suggested-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29net: mdio: mux-meson-gxl: set reversed bit when using internal phyDa Xue1-1/+2
This bit is necessary to receive packets from the internal PHY. Without this bit set, no activity occurs on the interface. Normally u-boot sets this bit, but if u-boot is compiled without net support, the interface will be up but without any activity. If bit is set once, it will work until the IP is powered down or reset. The vendor SDK sets this bit along with the PHY_ID bits. Signed-off-by: Da Xue <da@libre.computer> Fixes: 9a24e1ff4326 ("net: mdio: add amlogic gxl mdio mux support") Link: https://patch.msgid.link/20250425192009.1439508-1-da@libre.computer Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29net: dlink: Correct endianness handling of led_modeSimon Horman2-2/+2
As it's name suggests, parse_eeprom() parses EEPROM data. This is done by reading data, 16 bits at a time as follows: for (i = 0; i < 128; i++) ((__le16 *) sromdata)[i] = cpu_to_le16(read_eeprom(np, i)); sromdata is at the same memory location as psrom. And the type of psrom is a pointer to struct t_SROM. As can be seen in the loop above, data is stored in sromdata, and thus psrom, as 16-bit little-endian values. However, the integer fields of t_SROM are host byte order integers. And in the case of led_mode this leads to a little endian value being incorrectly treated as host byte order. Looking at rio_set_led_mode, this does appear to be a bug as that code masks led_mode with 0x1, 0x2 and 0x8. Logic that would be effected by a reversed byte order. This problem would only manifest on big endian hosts. Found by inspection while investigating a sparse warning regarding the crc field of t_SROM. I believe that warning is a false positive. And although I plan to send a follow-up to use little-endian types for other the integer fields of PSROM_t I do not believe that will involve any bug fixes. Compile tested only. Fixes: c3f45d322cbd ("dl2k: Add support for IP1000A-based cards") Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250425-dlink-led-mode-v1-1-6bae3c36e736@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29net: phylink: Drop unused defines for SUPPORTED/ADVERTISED_INTERFACESAlexander Duyck1-7/+0
The defines for SUPPORTED_INTERFACES and ADVERTISED_INTERFACES both appear to be unused. I couldn't find anything that actually references them in the original diff that added them and it seems like they have persisted despite using deprecated defines that aren't supposed to be used as per the ethtool.h header that defines the bits they are composed of. Since they are unused, and not supposed to be used anymore I am just dropping the lines of code since they seem to just be occupying space. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/174578398922.1580647.9720643128205980455.stgit@ahduyck-xeon-server.home.arpa Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29nfp: xsk: Adjust allocation type for nn->dp.xsk_poolsKees Cook1-1/+1
In preparation for making the kmalloc family of allocators type aware, we need to make sure that the returned type from the allocation matches the type of the variable being assigned. (Before, the allocator would always return "void *", which can be implicitly cast to any pointer type.) The assigned type "struct xsk_buff_pool **", but the returned type will be "struct xsk_buff_pool ***". These are the same allocation size (pointer size), but the types don't match. Adjust the allocation type to match the assignment. Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250426060841.work.016-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29net/mlx4_core: Adjust allocation type for buddy->bitsKees Cook1-1/+1
In preparation for making the kmalloc family of allocators type aware, we need to make sure that the returned type from the allocation matches the type of the variable being assigned. (Before, the allocator would always return "void *", which can be implicitly cast to any pointer type.) The assigned type is "unsigned long **", but the returned type will be "long **". These are the same size allocation (pointer size) but the types do not match. Adjust the allocation type to match the assignment. Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250426060757.work.865-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29pds_core: Allocate pdsc_viftype_defaults copy with ARRAY_SIZE()Kees Cook1-1/+2
In preparation for making the kmalloc family of allocators type aware, we need to make sure that the returned type from the allocation matches the type of the variable being assigned. (Before, the allocator would always return "void *", which can be implicitly cast to any pointer type.) This is allocating a copy of pdsc_viftype_defaults, which is an array of struct pdsc_viftype. To correctly return "struct pdsc_viftype *" in the future, adjust the allocation to allocating ARRAY_SIZE-many entries. The resulting allocation size is the same. Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20250426060712.work.575-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29net: ti: icssg-prueth: Add ICSSG FW StatsMD Danish Anwar5-31/+94
The ICSSG firmware maintains set of stats called PA_STATS. Currently the driver only dumps 4 stats. Add support for dumping more stats. The offset for different stats are defined as MACROs in icssg_switch_map.h file. All the offsets are for Slice0. Slice1 offsets are slice0 + 4. The offset calculation is taken care while reading the stats in emac_update_hardware_stats(). The statistics are documented in Documentation/networking/device_drivers/icssg_prueth.rst Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Link: https://patch.msgid.link/20250424095316.2643573-1-danishanwar@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29rtase: Modify the format specifier in snprintf to %uJustin Lai1-1/+1
Modify the format specifier in snprintf to %u. Signed-off-by: Justin Lai <justinlai0215@realtek.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250425064057.30035-1-justinlai0215@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29net: thunder_bgx: Don't disable PCI device manuallyPhilipp Stanner1-5/+3
thunder_bgx's PCI device is enabled with pcim_enable_device(), a managed devres function which ensures that the device gets enabled on driver detach automatically. Remove the calls to pci_disable_device(). Signed-off-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250425085740.65304-10-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29net: thunder_bgx: Use pure PCI devres APIPhilipp Stanner1-5/+2
The currently used function pci_request_regions() is one of the problematic "hybrid devres" PCI functions, which are sometimes managed through devres, and sometimes not (depending on whether pci_enable_device() or pcim_enable_device() has been called before). The PCI subsystem wants to remove this behavior and, therefore, needs to port all users to functions that don't have this problem. Furthermore, the PCI function being managed implies that it's not necessary to call pci_release_regions() manually. Remove the calls to pci_release_regions(). Replace pci_request_regions() with pcim_request_all_regions(). Signed-off-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250425085740.65304-9-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29net: mdio: thunder: Use pure PCI devres APIPhilipp Stanner1-7/+3
The currently used function pci_request_regions() is one of the problematic "hybrid devres" PCI functions, which are sometimes managed through devres, and sometimes not (depending on whether pci_enable_device() or pcim_enable_device() has been called before). The PCI subsystem wants to remove this behavior and, therefore, needs to port all users to functions that don't have this problem. Furthermore, the PCI function being managed implies that it's not necessary to call pci_release_regions() manually. Remove the calls to pci_release_regions(). Replace pci_request_regions() with pcim_request_all_regions(). Signed-off-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250425085740.65304-8-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29net: ethernet: sis900: Use pure PCI devres APIPhilipp Stanner1-1/+1
The currently used function pci_request_regions() is one of the problematic "hybrid devres" PCI functions, which are sometimes managed through devres, and sometimes not (depending on whether pci_enable_device() or pcim_enable_device() has been called before). The PCI subsystem wants to remove this behavior and, therefore, needs to port all users to functions that don't have this problem. Replace pci_request_regions() with pcim_request_all_regions(). Signed-off-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Daniele Venzano <venza@brownhat.org> Link: https://patch.msgid.link/20250425085740.65304-7-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29net: ethernet: natsemi: Use pure PCI devres APIPhilipp Stanner1-1/+1
The currently used function pci_request_regions() is one of the problematic "hybrid devres" PCI functions, which are sometimes managed through devres, and sometimes not (depending on whether pci_enable_device() or pcim_enable_device() has been called before). The PCI subsystem wants to remove this behavior and, therefore, needs to port all users to functions that don't have this problem. Replace pci_request_regions() with pcim_request_all_regions(). Signed-off-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250425085740.65304-6-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29net: tulip: Use pure PCI devres APIPhilipp Stanner2-2/+2
The currently used function pci_request_regions() is one of the problematic "hybrid devres" PCI functions, which are sometimes managed through devres, and sometimes not (depending on whether pci_enable_device() or pcim_enable_device() has been called before). The PCI subsystem wants to remove this behavior and, therefore, needs to port all users to functions that don't have this problem. Replace pci_request_regions() with pcim_request_all_regions(). Signed-off-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250425085740.65304-5-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29net: octeontx2: Use pure PCI devres APIPhilipp Stanner3-27/+13
The currently used function pci_request_regions() is one of the problematic "hybrid devres" PCI functions, which are sometimes managed through devres, and sometimes not (depending on whether pci_enable_device() or pcim_enable_device() has been called before). The PCI subsystem wants to remove this behavior and, therefore, needs to port all users to functions that don't have this problem. Furthermore, the PCI function being managed implies that it's not necessary to call pci_release_regions() manually. Remove the calls to pci_release_regions(). Replace pci_request_regions() with pcim_request_all_regions(). Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patch.msgid.link/20250425085740.65304-4-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29net: prestera: Use pure PCI devres APIPhilipp Stanner1-4/+2
The currently used function pci_request_regions() is one of the problematic "hybrid devres" PCI functions, which are sometimes managed through devres, and sometimes not (depending on whether pci_enable_device() or pcim_enable_device() has been called before). The PCI subsystem wants to remove this behavior and, therefore, needs to port all users to functions that don't have this problem. Furthermore, the PCI function being managed implies that it's not necessary to call pci_release_regions() manually. Remove the calls to pci_release_regions(). Replace pci_request_regions() with pcim_request_all_regions(). Signed-off-by: Philipp Stanner <phasta@kernel.org> Acked-by: Elad Nachman <enachman@marvell.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250425085740.65304-3-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29idpf: fix offloads support for encapsulated packetsMadhu Chittim2-48/+27
Split offloads into csum, tso and other offloads so that tunneled packets do not by default have all the offloads enabled. Stateless offloads for encapsulated packets are not yet supported in firmware/software but in the driver we were setting the features same as non encapsulated features. Fixed naming to clarify CSUM bits are being checked for Tx. Inherit netdev features to VLAN interfaces as well. Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration") Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Tested-by: Zachary Goldstein <zachmgoldstein@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250425222636.3188441-4-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29ice: Check VF VSI Pointer Value in ice_vc_add_fdir_fltr()Xuanqiang Luo1-0/+5
As mentioned in the commit baeb705fd6a7 ("ice: always check VF VSI pointer values"), we need to perform a null pointer check on the return value of ice_get_vf_vsi() before using it. Fixes: 6ebbe97a4881 ("ice: Add a per-VF limit on number of FDIR filters") Signed-off-by: Xuanqiang Luo <luoxuanqiang@kylinos.cn> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250425222636.3188441-3-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29ice: fix Get Tx Topology AQ command error on E830Paul Greenwalt1-5/+5
The Get Tx Topology AQ command (opcode 0x0418) has different read flag requirements depending on the hardware/firmware. For E810, E822, and E823 firmware the read flag must be set, and for newer hardware (E825 and E830) it must not be set. This results in failure to configure Tx topology and the following warning message during probe: DDP package does not support Tx scheduling layers switching feature - please update to the latest DDP package and try again The current implementation only handles E825-C but not E830. It is confusing as we first check ice_is_e825c() and then set the flag in the set case. Finally, we check ice_is_e825c() again and set the flag for all other hardware in both the set and get case. Instead, notice that we always need the read flag for set, but only need the read flag for get on E810, E822, and E823 firmware. Fix the logic to check the MAC type and set the read flag in get only on the older devices which require it. Fixes: ba1124f58afd ("ice: Add E830 device IDs, MAC type and registers") Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250425222636.3188441-2-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29pds_core: remove write-after-free of client_idShannon Nelson1-1/+0
A use-after-free error popped up in stress testing: [Mon Apr 21 21:21:33 2025] BUG: KFENCE: use-after-free write in pdsc_auxbus_dev_del+0xef/0x160 [pds_core] [Mon Apr 21 21:21:33 2025] Use-after-free write at 0x000000007013ecd1 (in kfence-#47): [Mon Apr 21 21:21:33 2025] pdsc_auxbus_dev_del+0xef/0x160 [pds_core] [Mon Apr 21 21:21:33 2025] pdsc_remove+0xc0/0x1b0 [pds_core] [Mon Apr 21 21:21:33 2025] pci_device_remove+0x24/0x70 [Mon Apr 21 21:21:33 2025] device_release_driver_internal+0x11f/0x180 [Mon Apr 21 21:21:33 2025] driver_detach+0x45/0x80 [Mon Apr 21 21:21:33 2025] bus_remove_driver+0x83/0xe0 [Mon Apr 21 21:21:33 2025] pci_unregister_driver+0x1a/0x80 The actual device uninit usually happens on a separate thread scheduled after this code runs, but there is no guarantee of order of thread execution, so this could be a problem. There's no actual need to clear the client_id at this point, so simply remove the offending code. Fixes: 10659034c622 ("pds_core: add the aux client API") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250425203857.71547-1-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29veth: apply qdisc backpressure on full ptr_ring to reduce TX dropsJesper Dangaard Brouer1-10/+47
In production, we're seeing TX drops on veth devices when the ptr_ring fills up. This can occur when NAPI mode is enabled, though it's relatively rare. However, with threaded NAPI - which we use in production - the drops become significantly more frequent. The underlying issue is that with threaded NAPI, the consumer often runs on a different CPU than the producer. This increases the likelihood of the ring filling up before the consumer gets scheduled, especially under load, leading to drops in veth_xmit() (ndo_start_xmit()). This patch introduces backpressure by returning NETDEV_TX_BUSY when the ring is full, signaling the qdisc layer to requeue the packet. The txq (netdev queue) is stopped in this condition and restarted once veth_poll() drains entries from the ring, ensuring coordination between NAPI and qdisc. Backpressure is only enabled when a qdisc is attached. Without a qdisc, the driver retains its original behavior - dropping packets immediately when the ring is full. This avoids unexpected behavior changes in setups without a configured qdisc. With a qdisc in place (e.g. fq, sfq) this allows Active Queue Management (AQM) to fairly schedule packets across flows and reduce collateral damage from elephant flows. A known limitation of this approach is that the full ring sits in front of the qdisc layer, effectively forming a FIFO buffer that introduces base latency. While AQM still improves fairness and mitigates flow dominance, the latency impact is measurable. In hardware drivers, this issue is typically addressed using BQL (Byte Queue Limits), which tracks in-flight bytes needed based on physical link rate. However, for virtual drivers like veth, there is no fixed bandwidth constraint - the bottleneck is CPU availability and the scheduler's ability to run the NAPI thread. It is unclear how effective BQL would be in this context. This patch serves as a first step toward addressing TX drops. Future work may explore adapting a BQL-like mechanism to better suit virtual devices like veth. Reported-by: Yan Zhai <yan@cloudflare.com> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Link: https://patch.msgid.link/174559294022.827981.1282809941662942189.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29net: sched: generalize check for no-queue qdisc on TX queueJesper Dangaard Brouer1-3/+1
The "noqueue" qdisc can either be directly attached, or get default attached if net_device priv_flags has IFF_NO_QUEUE. In both cases, the allocated Qdisc structure gets it's enqueue function pointer reset to NULL by noqueue_init() via noqueue_qdisc_ops. This is a common case for software virtual net_devices. For these devices with no-queue, the transmission path in __dev_queue_xmit() will bypass the qdisc layer. Directly invoking device drivers ndo_start_xmit (via dev_hard_start_xmit). In this mode the device driver is not allowed to ask for packets to be queued (either via returning NETDEV_TX_BUSY or stopping the TXQ). The simplest and most reliable way to identify this no-queue case is by checking if enqueue == NULL. The vrf driver currently open-codes this check (!qdisc->enqueue). While functionally correct, this low-level detail is better encapsulated in a dedicated helper for clarity and long-term maintainability. To make this behavior more explicit and reusable, this patch introduce a new helper: qdisc_txq_has_no_queue(). Helper will also be used by the veth driver in the next patch, which introduces optional qdisc-based backpressure. This is a non-functional change. Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://patch.msgid.link/174559293172.827981.7583862632045264175.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29mdio: fix CONFIG_MDIO_DEVRES selectsArnd Bergmann2-4/+6
The newly added rtl9300 driver needs MDIO_DEVRES: x86_64-linux-ld: drivers/net/mdio/mdio-realtek-rtl9300.o: in function `rtl9300_mdiobus_probe': mdio-realtek-rtl9300.c:(.text+0x941): undefined reference to `devm_mdiobus_alloc_size' x86_64-linux-ld: mdio-realtek-rtl9300.c:(.text+0x9e2): undefined reference to `__devm_mdiobus_register' Since this is a hidden symbol, it needs to be selected by each user, rather than the usual 'depends on'. I see that there are a few other drivers that accidentally use 'depends on', so fix these as well for consistency and to avoid dependency loops. Fixes: 37f9b2a6c086 ("net: ethernet: Add missing depends on MDIO_DEVRES") Fixes: 24e31e474769 ("net: mdio: Add RTL9300 MDIO driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Link: https://patch.msgid.link/20250425112819.1645342-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29net: ethernet: mtk_eth_soc: sync mtk_clks_source_name arrayDaniel Golle1-4/+0
When removing the clock bits for clocks which aren't used by the Ethernet driver their names should also have been removed from the mtk_clks_source_name array. Remove them now as enum mtk_clks_map needs to match the mtk_clks_source_name array so the driver can make sure that all required clocks are present and correctly name missing clocks. Fixes: 887b1d1adb2e ("net: ethernet: mtk_eth_soc: drop clocks unused by Ethernet driver") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/d075e706ff1cebc07f9ec666736d0b32782fd487.1745555321.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28amd-xgbe: Fix to ensure dependent features are toggled with RX checksum offloadVishal Badole4-6/+42
According to the XGMAC specification, enabling features such as Layer 3 and Layer 4 Packet Filtering, Split Header and Virtualized Network support automatically selects the IPC Full Checksum Offload Engine on the receive side. When RX checksum offload is disabled, these dependent features must also be disabled to prevent abnormal behavior caused by mismatched feature dependencies. Ensure that toggling RX checksum offload (disabling or enabling) properly disables or enables all dependent features, maintaining consistent and expected behavior in the network device. Cc: stable@vger.kernel.org Fixes: 1a510ccf5869 ("amd-xgbe: Add support for VXLAN offload capabilities") Signed-off-by: Vishal Badole <Vishal.Badole@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250424130248.428865-1-Vishal.Badole@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28net: stmmac: dwmac-loongson: Add new GMAC's PCI device ID supportHuacai Chen1-3/+5
Add a new GMAC's PCI device ID (0x7a23) support which is used in Loongson-2K3000/Loongson-3B6000M. The new GMAC device use external PHY, so it reuses loongson_gmac_data() as the old GMAC device (0x7a03), and the new GMAC device still doesn't support flow control now. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Yanteng Si <si.yanteng@linux.dev> Tested-by: Henry Chen <chenx97@aosc.io> Tested-by: Biao Dong <dongbiao@loongson.cn> Signed-off-by: Baoqi Zhang <zhangbaoqi@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Link: https://patch.msgid.link/20250424072209.3134762-4-chenhuacai@loongson.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28net: stmmac: dwmac-loongson: Add new multi-chan IP core supportHuacai Chen1-25/+37
Add a new multi-chan IP core (0x12) support which is used in Loongson- 2K3000/Loongson-3B6000M. Compared with the 0x10 core, the new 0x12 core reduces channel numbers from 8 to 4, but checksum is supported for all channels. Add a "multichan" flag to loongson_data, so that we can simply use a "if (ld->multichan)" condition rather than the complicated condition "if (ld->loongson_id == DWMAC_CORE_MULTICHAN_V1 || ld->loongson_id == DWMAC_CORE_MULTICHAN_V2)". Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Henry Chen <chenx97@aosc.io> Tested-by: Biao Dong <dongbiao@loongson.cn> Signed-off-by: Baoqi Zhang <zhangbaoqi@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Reviewed-by: Yanteng Si <si.yanteng@linux.dev> Link: https://patch.msgid.link/20250424072209.3134762-3-chenhuacai@loongson.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28net: stmmac: socfpga: Remove unused pcs-mdiodev fieldMaxime Chevallier1-1/+0
When dwmac-socfpga was converted to using the Lynx PCS (previously referred to in the driver as the Altera TSE PCS), the lynx_pcs_create_mdiodev() was used to create the pcs instance. As this function didn't exist in the early versions of the series, a local mdiodev object was stored for PCS creation. It was never used, but still made it into the driver, so remove it. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250424071223.221239-4-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28net: stmmac: socfpga: Don't check for phy to enable the SGMII adapterMaxime Chevallier1-6/+7
The SGMII adapter needs to be enabled for both Cisco SGMII and 1000BaseX operations. It doesn't make sense to check for an attached phydev here, as we simply might not have any, in particular if we're using the 1000BaseX interface mode. Make so that we only re-enable the SGMII adapter when it's present, and when we use a phy_mode that is handled by said adapter. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250424071223.221239-3-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28net: stmmac: socfpga: Enable internal GMII when using 1000BaseXMaxime Chevallier1-0/+1
Dwmac Socfpga may be used with an instance of a Lynx / Altera TSE PCS, in which case it gains support for 1000BaseX. It appears that the PCS is wired to the MAC through an internal GMII bus. Make sure that we enable the GMII_MII mode for the internal MAC when using 1000BaseX. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250424071223.221239-2-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28net: stmmac: dwmac-loongson: Move queue number init to common functionHuacai Chen1-31/+9
Currently, the tx and rx queue number initialization is duplicated in loongson_gmac_data() and loongson_gnet_data(), so move it to the common function loongson_default_data(). This is a preparation for later patches. Reviewed-by: Yanteng Si <si.yanteng@linux.dev> Tested-by: Henry Chen <chenx97@aosc.io> Tested-by: Biao Dong <dongbiao@loongson.cn> Signed-off-by: Baoqi Zhang <zhangbaoqi@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-28bonding: assign random address if device address is same as bondHangbin Liu1-7/+18
This change addresses a MAC address conflict issue in failover scenarios, similar to the problem described in commit a951bc1e6ba5 ("bonding: correct the MAC address for 'follow' fail_over_mac policy"). In fail_over_mac=follow mode, the bonding driver expects the formerly active slave to swap MAC addresses with the newly active slave during failover. However, under certain conditions, two slaves may end up with the same MAC address, which breaks this policy: 1) ip link set eth0 master bond0 -> bond0 adopts eth0's MAC address (MAC0). 2) ip link set eth1 master bond0 -> eth1 is added as a backup with its own MAC (MAC1). 3) ip link set eth0 nomaster -> eth0 is released and restores its MAC (MAC0). -> eth1 becomes the active slave, and bond0 assigns MAC0 to eth1. 4) ip link set eth0 master bond0 -> eth0 is re-added to bond0, now both eth0 and eth1 have MAC0. This results in a MAC address conflict and violates the expected behavior of the failover policy. To fix this, we assign a random MAC address to any newly added slave if its current MAC address matches that of the bond. The original (permanent) MAC address is saved and will be restored when the device is released from the bond. This ensures that each slave has a unique MAC address during failover transitions, preserving the integrity of the fail_over_mac=follow policy. Fixes: 3915c1e8634a ("bonding: Add "follow" option to fail_over_mac") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-28wifi: rtlwifi: disable ASPM for RTL8723BE with subsystem ID 11ad:1723Mingcong Bai1-0/+10
RTL8723BE found on some ASUSTek laptops, such as F441U and X555UQ with subsystem ID 11ad:1723 are known to output large amounts of PCIe AER errors during and after boot up, causing heavy lags and at times lock-ups: pcieport 0000:00:1c.5: AER: Correctable error message received from 0000:00:1c.5 pcieport 0000:00:1c.5: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID) pcieport 0000:00:1c.5: device [8086:9d15] error status/mask=00000001/00002000 pcieport 0000:00:1c.5: [ 0] RxErr Disable ASPM on this combo as a quirk. This patch is a revision of a previous patch (linked below) which attempted to disable ASPM for RTL8723BE on all Intel Skylake and Kaby Lake PCIe bridges. I take a more conservative approach as all known reports point to ASUSTek laptops of these two generations with this particular wireless card. Please note, however, before the rtl8723be finishes probing, the AER errors remained. After the module finishes probing, all AER errors would indeed be eliminated, along with heavy lags, poor network throughput, and/or occasional lock-ups. Cc: <stable@vger.kernel.org> Fixes: a619d1abe20c ("rtlwifi: rtl8723be: Add new driver") Reported-by: Liangliang Zou <rawdiamondmc@outlook.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=218127 Link: https://lore.kernel.org/lkml/05390e0b-27fd-4190-971e-e70a498c8221@lwfinger.net/T/ Tested-by: Liangliang Zou <rawdiamondmc@outlook.com> Signed-off-by: Mingcong Bai <jeffbai@aosc.io> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://patch.msgid.link/20250422061755.356535-1-jeffbai@aosc.io
2025-04-28wifi: rtw89: mcc: avoid that loose pattern sets negative timing for auxiliary GOZong-Zhe Yang1-1/+2
A MCC (multi-channel concurrency) schedule is like the following. |< mcc interval >| |< duration ref >|< duration aux >| |< tob ref >|< toa ref >|< tob aux >|< toa aux >| V V tbtt ref tbtt aux |< beacon offset >| Original logic might unexpectedly calculate toa (time offset ahead) of auxiliary role to be negative even when there is no role timing limit. If toa-aux is negative, TBTT-aux would in logic fall into duration-ref. Then, if auxiliary role is GO unfortunately, it cannot guarantee that beacons will TX well. So now, when deciding the lower bound of toa-ref, take toa-aux into account. Make toa-aux at least be zero in normal cases. Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://patch.msgid.link/20250422014620.18421-13-pkshih@realtek.com
2025-04-28wifi: rtw89: mcc: refine filling function of start TSFZong-Zhe Yang1-4/+5
Since tob (time offset behind) could be negative, change the type of tob in microsecond to s32. And, refine accumulation with while-loop by calculation with roundup_u64(). Besides, as long as one of the MCC roles is GO, use the short MCC start time. Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://patch.msgid.link/20250422014620.18421-12-pkshih@realtek.com
2025-04-28wifi: rtw89: mcc: support courtesy mechanism on both roles at the same timeZong-Zhe Yang3-43/+58
MCC has a courtesy mechanism which allows one role to use another's duration in a given cycle. Originally, this courtesy mechanism only supports in one direction. However, in some field cases, both of MCC roles may simultaneously have timing configurations that are not good enough. So, support courtesy mechanism in both directions. Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://patch.msgid.link/20250422014620.18421-11-pkshih@realtek.com
2025-04-28wifi: rtw89: mcc: update entire plan when courtesy config changesZong-Zhe Yang1-1/+9
MCC has a courtesy mechanism which allows one role to use another's duration in a given cycle. Courtesy mechanism will be enabled when one role has a not perfect duration. Otherwise, not. When MCC updates, duration of each role will be re-calculated. And then, the new courtesy config might be different from the old one. However, to change courtesy config, the entire MCC plan requires to be renewed when MCC updates. Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://patch.msgid.link/20250422014620.18421-10-pkshih@realtek.com