Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit 1703b2e0de653b459ca6230be32ce7f2ea0ae7ee ]
When users attempt to obtain the coalesce setting using the
ethtool command, current code always returns 0 for tx-usecs.
This is because I225/6 always uses a queue pair setting, hence
tx_coalesce_usecs does not return a value during the
igc_ethtool_get_coalesce() callback process. The pair queue
condition checking in igc_ethtool_get_coalesce() is removed by
this patch so that the user gets information of the value of tx-usecs.
Even if i225/6 is using queue pair setting, there is no harm in
notifying the user of the tx-usecs. The implementation of the current
code may have previously been a copy of the legacy code i210.
Since I225 has the queue pair setting enabled, tx-usecs will always adhere
to the user-set rx-usecs value. An error message will appear when the user
attempts to set the tx-usecs value for the input parameters because,
by default, they should only set the rx-usecs value.
This patch also adds the helper function to get the
previous rx coalesce value similar to tx coalesce.
How to test:
User can get the coalesce value using ethtool command.
Example command:
Get: ethtool -c <interface>
Previous output:
rx-usecs: 3
rx-frames: n/a
rx-usecs-irq: n/a
rx-frames-irq: n/a
tx-usecs: 0
tx-frames: n/a
tx-usecs-irq: n/a
tx-frames-irq: n/a
New output:
rx-usecs: 3
rx-frames: n/a
rx-usecs-irq: n/a
rx-frames-irq: n/a
tx-usecs: 3
tx-frames: n/a
tx-usecs-irq: n/a
tx-frames-irq: n/a
Fixes: 8c5ad0dae93c ("igc: Add ethtool support")
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230919170331.1581031-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit cb47b1f679c4d83a5fa5f1852e472f844e41a3da ]
When an XDP redirect happens before the link is ready, that
transmission will not finish and will timeout, causing an adapter
reset. If the redirects do not stop, the adapter will not stop
resetting.
Wait for the driver to signal that there's a carrier before allowing
transmissions to proceed.
Previous code was relying that when __IGC_DOWN is cleared, the NIC is
ready to transmit as all the queues are ready, what happens is that
the carrier presence will only be signaled later, after the watchdog
workqueue has a chance to run. And during this interval (between
clearing __IGC_DOWN and the watchdog running) if any transmission
happens the timeout is emitted (detected by igc_tx_timeout()) which
causes the reset, with the potential for the infinite loop.
Fixes: 4ff320361092 ("igc: Add support for XDP_REDIRECT action")
Reported-by: Ferenc Fejes <ferenc.fejes@ericsson.com>
Closes: https://lore.kernel.org/netdev/0caf33cf6adb3a5bf137eeaa20e89b167c9986d5.camel@ericsson.com/
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Ferenc Fejes <ferenc.fejes@ericsson.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d0d362ffa33da4acdcf7aee2116ceef8c8fef658 ]
If port VLAN is configured on a VF then any other VLANs on top of this VF
are broken.
During i40e_ndo_set_vf_port_vlan() call the i40e driver reset the VF and
iavf driver asks PF (using VIRTCHNL_OP_GET_VF_RESOURCES) for VF capabilities
but this reset occurs too early, prior setting of vf->info.pvid field
and because this field can be zero during i40e_vc_get_vf_resources_msg()
then VIRTCHNL_VF_OFFLOAD_VLAN capability is reported to iavf driver.
This is wrong because iavf driver should not report VLAN offloading
capability when port VLAN is configured as i40e does not support QinQ
offloading.
Fix the issue by moving VF reset after setting of vf->port_vlan_id
field.
Without this patch:
$ echo 1 > /sys/class/net/enp2s0f0/device/sriov_numvfs
$ ip link set enp2s0f0 vf 0 vlan 3
$ ip link set enp2s0f0v0 up
$ ip link add link enp2s0f0v0 name vlan4 type vlan id 4
$ ip link set vlan4 up
...
$ ethtool -k enp2s0f0v0 | grep vlan-offload
rx-vlan-offload: on
tx-vlan-offload: on
$ dmesg -l err | grep iavf
[1292500.742914] iavf 0000:02:02.0: Failed to add VLAN filter, error IAVF_ERR_INVALID_QP_ID
With this patch:
$ echo 1 > /sys/class/net/enp2s0f0/device/sriov_numvfs
$ ip link set enp2s0f0 vf 0 vlan 3
$ ip link set enp2s0f0v0 up
$ ip link add link enp2s0f0v0 name vlan4 type vlan id 4
$ ip link set vlan4 up
...
$ ethtool -k enp2s0f0v0 | grep vlan-offload
rx-vlan-offload: off [requested on]
tx-vlan-offload: off [requested on]
$ dmesg -l err | grep iavf
Fixes: f9b4b6278d51 ("i40e: Reset the VF upon conflicting VLAN configuration")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5f3d319a248654a805bafc9e7094bcea47dac6c7 ]
When the iavf driver wants to reconfigure the VLAN filters
(iavf_add_vlan, iavf_del_vlan), it sets a flag in
aq_required:
adapter->aq_required |= IAVF_FLAG_AQ_ADD_VLAN_FILTER;
or:
adapter->aq_required |= IAVF_FLAG_AQ_DEL_VLAN_FILTER;
This is later processed by the watchdog_task, but it runs periodically
every 2 seconds, so it can be a long time before it processes the request.
In the worst case, the interface is unable to receive traffic for more
than 2 seconds for no objective reason.
Fixes: 5eae00c57f5e ("i40evf: main driver core")
Signed-off-by: Petr Oros <poros@redhat.com>
Co-developed-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Co-developed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ed4cad33df9e272feaa6698b33359b29c2929564 ]
Add helper for set iavf aq request AVF_FLAG_AQ_* and immediately
schedule watchdog_task. Helper will be used in cases where it is
necessary to run aq requests asap
Signed-off-by: Petr Oros <poros@redhat.com>
Co-developed-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Co-developed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Stable-dep-of: 5f3d319a2486 ("iavf: schedule a request immediately after add/delete vlan")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c8de44b577eb540e8bfea55afe1d0904bb571b7a ]
Prevent schedule operations for adminq during device remove and when
__IAVF_IN_REMOVE_TASK flag is set. Currently, the iavf_down function
adds operations for adminq that shouldn't be processed when the device
is in the __IAVF_REMOVE state.
Reproduction:
echo 4 > /sys/bus/pci/devices/0000:17:00.0/sriov_numvfs
ip link set dev ens1f0 vf 0 trust on
ip link set dev ens1f0 vf 1 trust on
ip link set dev ens1f0 vf 2 trust on
ip link set dev ens1f0 vf 3 trust on
ip link set dev ens1f0 vf 0 mac 00:22:33:44:55:66
ip link set dev ens1f0 vf 1 mac 00:22:33:44:55:67
ip link set dev ens1f0 vf 2 mac 00:22:33:44:55:68
ip link set dev ens1f0 vf 3 mac 00:22:33:44:55:69
echo 0000:17:02.0 > /sys/bus/pci/devices/0000\:17\:02.0/driver/unbind
echo 0000:17:02.1 > /sys/bus/pci/devices/0000\:17\:02.1/driver/unbind
echo 0000:17:02.2 > /sys/bus/pci/devices/0000\:17\:02.2/driver/unbind
echo 0000:17:02.3 > /sys/bus/pci/devices/0000\:17\:02.3/driver/unbind
sleep 10
echo 0000:17:02.0 > /sys/bus/pci/drivers/iavf/bind
echo 0000:17:02.1 > /sys/bus/pci/drivers/iavf/bind
echo 0000:17:02.2 > /sys/bus/pci/drivers/iavf/bind
echo 0000:17:02.3 > /sys/bus/pci/drivers/iavf/bind
modprobe vfio-pci
echo 8086 154c > /sys/bus/pci/drivers/vfio-pci/new_id
qemu-system-x86_64 -accel kvm -m 4096 -cpu host \
-drive file=centos9.qcow2,if=none,id=virtio-disk0 \
-device virtio-blk-pci,drive=virtio-disk0,bootindex=0 -smp 4 \
-device vfio-pci,host=17:02.0 -net none \
-device vfio-pci,host=17:02.1 -net none \
-device vfio-pci,host=17:02.2 -net none \
-device vfio-pci,host=17:02.3 -net none \
-daemonize -vnc :5
Current result:
There is a probability that the mac of VF in guest is inconsistent with
it in host
Expected result:
When passthrough NIC VF to guest, the VF in guest should always get
the same mac as it in host.
Fixes: 14756b2ae265 ("iavf: Fix __IAVF_RESETTING state usage")
Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7aa529a69e92b9aff585e569d5003f7c15d8d60b ]
There is possibility that ice_eswitch_port_start_xmit might be
called while some resources are still not allocated which might
cause NULL pointer dereference. Fix this by checking if switchdev
configuration was finished.
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bc6ed2fa24b14e40e1005488bbe11268ce7108fa ]
After commit 50f303496d92 ("igb: Enable SR-IOV after reinit"), removing
the igb module could hang or crash (depending on the machine) when the
module has been loaded with the max_vfs parameter set to some value != 0.
In case of one test machine with a dual port 82580, this hang occurred:
[ 232.480687] igb 0000:41:00.1: removed PHC on enp65s0f1
[ 233.093257] igb 0000:41:00.1: IOV Disabled
[ 233.329969] pcieport 0000:40:01.0: AER: Multiple Uncorrected (Non-Fatal) err0
[ 233.340302] igb 0000:41:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fata)
[ 233.352248] igb 0000:41:00.0: device [8086:1516] error status/mask=00100000
[ 233.361088] igb 0000:41:00.0: [20] UnsupReq (First)
[ 233.368183] igb 0000:41:00.0: AER: TLP Header: 40000001 0000040f cdbfc00c c
[ 233.376846] igb 0000:41:00.1: PCIe Bus Error: severity=Uncorrected (Non-Fata)
[ 233.388779] igb 0000:41:00.1: device [8086:1516] error status/mask=00100000
[ 233.397629] igb 0000:41:00.1: [20] UnsupReq (First)
[ 233.404736] igb 0000:41:00.1: AER: TLP Header: 40000001 0000040f cdbfc00c c
[ 233.538214] pci 0000:41:00.1: AER: can't recover (no error_detected callback)
[ 233.538401] igb 0000:41:00.0: removed PHC on enp65s0f0
[ 233.546197] pcieport 0000:40:01.0: AER: device recovery failed
[ 234.157244] igb 0000:41:00.0: IOV Disabled
[ 371.619705] INFO: task irq/35-aerdrv:257 blocked for more than 122 seconds.
[ 371.627489] Not tainted 6.4.0-dirty #2
[ 371.632257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
[ 371.641000] task:irq/35-aerdrv state:D stack:0 pid:257 ppid:2 f0
[ 371.650330] Call Trace:
[ 371.653061] <TASK>
[ 371.655407] __schedule+0x20e/0x660
[ 371.659313] schedule+0x5a/0xd0
[ 371.662824] schedule_preempt_disabled+0x11/0x20
[ 371.667983] __mutex_lock.constprop.0+0x372/0x6c0
[ 371.673237] ? __pfx_aer_root_reset+0x10/0x10
[ 371.678105] report_error_detected+0x25/0x1c0
[ 371.682974] ? __pfx_report_normal_detected+0x10/0x10
[ 371.688618] pci_walk_bus+0x72/0x90
[ 371.692519] pcie_do_recovery+0xb2/0x330
[ 371.696899] aer_process_err_devices+0x117/0x170
[ 371.702055] aer_isr+0x1c0/0x1e0
[ 371.705661] ? __set_cpus_allowed_ptr+0x54/0xa0
[ 371.710723] ? __pfx_irq_thread_fn+0x10/0x10
[ 371.715496] irq_thread_fn+0x20/0x60
[ 371.719491] irq_thread+0xe6/0x1b0
[ 371.723291] ? __pfx_irq_thread_dtor+0x10/0x10
[ 371.728255] ? __pfx_irq_thread+0x10/0x10
[ 371.732731] kthread+0xe2/0x110
[ 371.736243] ? __pfx_kthread+0x10/0x10
[ 371.740430] ret_from_fork+0x2c/0x50
[ 371.744428] </TASK>
The reproducer was a simple script:
#!/bin/sh
for i in `seq 1 5`; do
modprobe -rv igb
modprobe -v igb max_vfs=1
sleep 1
modprobe -rv igb
done
It turned out that this could only be reproduce on 82580 (quad and
dual-port), but not on 82576, i350 and i210. Further debugging showed
that igb_enable_sriov()'s call to pci_enable_sriov() is failing, because
dev->is_physfn is 0 on 82580.
Prior to commit 50f303496d92 ("igb: Enable SR-IOV after reinit"),
igb_enable_sriov() jumped into the "err_out" cleanup branch. After this
commit it only returned the error code.
So the cleanup didn't take place, and the incorrect VF setup in the
igb_adapter structure fooled the igb driver into assuming that VFs have
been set up where no VF actually existed.
Fix this problem by cleaning up again if pci_enable_sriov() fails.
Fixes: 50f303496d92 ("igb: Enable SR-IOV after reinit")
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3c44191dd76cf9c0cc49adaf34384cbd42ef8ad2 ]
The commit in fixes introduced flags to control the status of hardware
configuration while processing packets. At the same time another structure
is used to provide configuration of timestamper to user-space applications.
The way it was coded makes this structures go out of sync easily. The
repro is easy for 82599 chips:
[root@hostname ~]# hwstamp_ctl -i eth0 -r 12 -t 1
current settings:
tx_type 0
rx_filter 0
new settings:
tx_type 1
rx_filter 12
The eth0 device is properly configured to timestamp any PTPv2 events.
[root@hostname ~]# hwstamp_ctl -i eth0 -r 1 -t 1
current settings:
tx_type 1
rx_filter 12
SIOCSHWTSTAMP failed: Numerical result out of range
The requested time stamping mode is not supported by the hardware.
The error is properly returned because HW doesn't support all packets
timestamping. But the adapter->flags is cleared of timestamp flags
even though no HW configuration was done. From that point no RX timestamps
are received by user-space application. But configuration shows good
values:
[root@hostname ~]# hwstamp_ctl -i eth0
current settings:
tx_type 1
rx_filter 12
Fix the issue by applying new flags only when the HW was actually
configured.
Fixes: a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x devices")
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6319685bdc8ad5310890add907b7c42f89302886 ]
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx
value between 64 and 80. All igb devices can use as low as 64 descriptors.
This change will unify igb with other drivers.
Based on commit 7b1be1987c1e ("e1000e: lower ring minimum size to 64")
Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver")
Signed-off-by: Olga Zaborska <olga.zaborska@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8360717524a24a421c36ef8eb512406dbd42160a ]
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx
value between 64 and 80. All igbvf devices can use as low as 64 descriptors.
This change will unify igbvf with other drivers.
Based on commit 7b1be1987c1e ("e1000e: lower ring minimum size to 64")
Fixes: d4e0fe01a38a ("igbvf: add new driver to support 82576 virtual functions")
Signed-off-by: Olga Zaborska <olga.zaborska@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5aa48279712e1f134aac908acde4df798955a955 ]
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx
value between 64 and 80. All igc devices can use as low as 64 descriptors.
This change will unify igc with other drivers.
Based on commit 7b1be1987c1e ("e1000e: lower ring minimum size to 64")
Fixes: 0507ef8a0372 ("igc: Add transmit and receive fastpath and interrupt handlers")
Signed-off-by: Olga Zaborska <olga.zaborska@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit fa09bc40b21a33937872c4c4cf0f266ec9fa4869 ]
Disable virtualization features on 82580 just as on i210/i211.
This avoids that virt functions are acidentally called on 82850.
Fixes: 55cac248caa4 ("igb: Add full support for 82580 devices")
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit bb5ed01cd2428cd25b1c88a3a9cba87055eb289f upstream.
Increase the RX buffer size to 3K when the SBP bit is on. The size of
the RX buffer determines the number of pages allocated which may not
be sufficient for receive frames larger than the set MTU size.
Cc: stable@vger.kernel.org
Fixes: 89eaefb61dc9 ("igb: Support RX-ALL feature flag.")
Reported-by: Manfred Rudigier <manfred.rudigier@omicronenergy.com>
Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0aacec49c29e7c5b1487e859b0c0a42388c34092 ]
The ice hardware has a synchronization mechanism used to drive the
simultaneous application of commands on both PHY ports and the source timer
in the MAC.
When issuing a sync via ice_ptp_exec_tmr_cmd(), the hardware will
simultaneously apply the commands programmed for the main timer and each
PHY port. Neither the main timer command register, nor the PHY port command
registers auto clear on command execution.
During the execution of a timer command intended for a single port on E822
devices, such as those used to configure a PHY during link up, the driver
is not correctly clearing the previous commands.
This results in unintentionally executing the last programmed command on
the main timer and other PHY ports whenever performing reconfiguration on
E822 ports after link up. This results in unintended side effects on other
timers, depending on what command was previously programmed.
To fix this, the driver must ensure that the main timer and all other PHY
ports are properly initialized to perform no action.
The enumeration for timer commands does not include an enumeration value
for doing nothing. Introduce ICE_PTP_NOP for this purpose. When writing a
timer command to hardware, leave the command bits set to zero which
indicates that no operation should be performed on that port.
Modify ice_ptp_one_port_cmd() to always initialize all ports. For all ports
other than the one being configured, write their timer command register to
ICE_PTP_NOP. This ensures that no side effect happens on the timer command.
To fix this for the PHY ports, modify ice_ptp_one_port_cmd() to always
initialize all other ports to ICE_PTP_NOP. This ensures that no side
effects happen on the other ports.
Call ice_ptp_src_cmd() with a command value if ICE_PTP_NOP in
ice_sync_phy_timer_e822() and ice_start_phy_timer_e822().
With both of these changes, the driver should no longer execute a stale
command on the main timer or another PHY port when reconfiguring one of the
PHY ports after link up.
Fixes: 3a7496234d17 ("ice: implement basic E822 PTP support")
Signed-off-by: Siddaraju DH <siddaraju.dh@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e1e8a142c43336e3d25bfa1cb3a4ae7d00875c48 ]
Allow task's event buffer to be filled also in the case that it's size
is exactly the size of the message.
Fixes: d69ea414c9b4 ("ice: implement device flash update via devlink")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
Add check for pf->vf not being NULL before dereferencing
pf->vf[vsi->vf_id] in updating VSI filter sync.
Add a similar check before dereferencing !pf->vf[vsi->vf_id].trusted
in the condition for clearing promisc mode bit.
Fixes: c87c938f62d8 ("i40e: Add VF VLAN pruning")
Signed-off-by: Andrii Staikov <andrii.staikov@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The IGC_PTM_CTRL_SHRT_CYC defines the time between two consecutive PTM
requests. The bit resolution of this field is six bits. That bit five was
missing in the mask. This patch comes to correct the typo in the
IGC_PTM_CTRL_SHRT_CYC macro.
Fixes: a90ec8483732 ("igc: Add support for PTP getcrosststamp()")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://lore.kernel.org/r/20230821171721.2203572-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If ptp_clock_register() fails or CONFIG_PTP isn't enabled, avoid starting
PTP related workqueues.
In this way we can fix this:
BUG: unable to handle page fault for address: ffffc9000440b6f8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 100000067 P4D 100000067 PUD 1001e0067 PMD 107dc5067 PTE 0
Oops: 0000 [#1] PREEMPT SMP
[...]
Workqueue: events igb_ptp_overflow_check
RIP: 0010:igb_rd32+0x1f/0x60
[...]
Call Trace:
igb_ptp_read_82580+0x20/0x50
timecounter_read+0x15/0x60
igb_ptp_overflow_check+0x1a/0x50
process_one_work+0x1cb/0x3c0
worker_thread+0x53/0x3f0
? rescuer_thread+0x370/0x370
kthread+0x142/0x160
? kthread_associate_blkcg+0xc0/0xc0
ret_from_fork+0x1f/0x30
Fixes: 1f6e8178d685 ("igb: Prevent dropped Tx timestamps via work items and interrupts.")
Fixes: d339b1331616 ("igb: add PTP Hardware Clock code")
Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230821171927.2203644-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
During stress test with attaching and detaching VF from KVM and
simultaneously changing VFs spoofcheck and trust there was a
NULL pointer dereference in ice_reset_vf that VF's VSI is null.
More than one instance of ice_reset_vf() can be running at a given
time. When we rebuild the VSI in ice_reset_vf, another reset can be
triaged from ice_service_task. In this case we can access the currently
uninitialized VSI and cause panic. The window for this racing condition
has been around for a long time but it's much worse after commit
227bf4500aaa ("ice: move VSI delete outside deconfig") because
the reset runs faster. ice_reset_vf() using vf->cfg_lock and when
we move this lock before accessing to the VF VSI, we can fix
BUG for all cases.
Panic occurs sometimes in ice_vsi_is_rx_queue_active() and sometimes
in ice_vsi_stop_all_rx_rings()
With our reproducer, we can hit BUG:
~8h before commit 227bf4500aaa ("ice: move VSI delete outside deconfig").
~20m after commit 227bf4500aaa ("ice: move VSI delete outside deconfig").
After this fix we are not able to reproduce it after ~48h
There was commit cf90b74341ee ("ice: Fix call trace with null VSI during
VF reset") which also tried to fix this issue, but it was only
partially resolved and the bug still exists.
[ 6420.658415] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 6420.665382] #PF: supervisor read access in kernel mode
[ 6420.670521] #PF: error_code(0x0000) - not-present page
[ 6420.675659] PGD 0
[ 6420.677679] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 6420.682038] CPU: 53 PID: 326472 Comm: kworker/53:0 Kdump: loaded Not tainted 5.14.0-317.el9.x86_64 #1
[ 6420.691250] Hardware name: Dell Inc. PowerEdge R750/04V528, BIOS 1.6.5 04/15/2022
[ 6420.698729] Workqueue: ice ice_service_task [ice]
[ 6420.703462] RIP: 0010:ice_vsi_is_rx_queue_active+0x2d/0x60 [ice]
[ 6420.705860] ice 0000:ca:00.0: VF 0 is now untrusted
[ 6420.709494] Code: 00 00 66 83 bf 76 04 00 00 00 48 8b 77 10 74 3e 31 c0 eb 0f 0f b7 97 76 04 00 00 48 83 c0 01 39 c2 7e 2b 48 8b 97 68 04 00 00 <0f> b7 0c 42 48 8b 96 20 13 00 00 48 8d 94 8a 00 00 12 00 8b 12 83
[ 6420.714426] ice 0000:ca:00.0 ens7f0: Setting MAC 22:22:22:22:22:00 on VF 0. VF driver will be reinitialized
[ 6420.733120] RSP: 0018:ff778d2ff383fdd8 EFLAGS: 00010246
[ 6420.733123] RAX: 0000000000000000 RBX: ff2acf1916294000 RCX: 0000000000000000
[ 6420.733125] RDX: 0000000000000000 RSI: ff2acf1f2c6401a0 RDI: ff2acf1a27301828
[ 6420.762346] RBP: ff2acf1a27301828 R08: 0000000000000010 R09: 0000000000001000
[ 6420.769476] R10: ff2acf1916286000 R11: 00000000019eba3f R12: ff2acf19066460d0
[ 6420.776611] R13: ff2acf1f2c6401a0 R14: ff2acf1f2c6401a0 R15: 00000000ffffffff
[ 6420.783742] FS: 0000000000000000(0000) GS:ff2acf28ffa80000(0000) knlGS:0000000000000000
[ 6420.791829] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6420.797575] CR2: 0000000000000000 CR3: 00000016ad410003 CR4: 0000000000773ee0
[ 6420.804708] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6420.811034] vfio-pci 0000:ca:01.0: enabling device (0000 -> 0002)
[ 6420.811840] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 6420.811841] PKRU: 55555554
[ 6420.811842] Call Trace:
[ 6420.811843] <TASK>
[ 6420.811844] ice_reset_vf+0x9a/0x450 [ice]
[ 6420.811876] ice_process_vflr_event+0x8f/0xc0 [ice]
[ 6420.841343] ice_service_task+0x23b/0x600 [ice]
[ 6420.845884] ? __schedule+0x212/0x550
[ 6420.849550] process_one_work+0x1e2/0x3b0
[ 6420.853563] ? rescuer_thread+0x390/0x390
[ 6420.857577] worker_thread+0x50/0x3a0
[ 6420.861242] ? rescuer_thread+0x390/0x390
[ 6420.865253] kthread+0xdd/0x100
[ 6420.868400] ? kthread_complete_and_exit+0x20/0x20
[ 6420.873194] ret_from_fork+0x1f/0x30
[ 6420.876774] </TASK>
[ 6420.878967] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iavf vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables bridge stp llc sctp ip6_udp_tunnel udp_tunnel nfp tls nfnetlink bluetooth mlx4_en mlx4_core rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common i10nm_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp irdma kvm_intel i40e kvm iTCO_wdt dcdbas ib_uverbs irqbypass iTCO_vendor_support mgag200 mei_me ib_core dell_smbios isst_if_mmio isst_if_mbox_pci rapl i2c_algo_bit drm_shmem_helper intel_cstate drm_kms_helper syscopyarea sysfillrect isst_if_common sysimgblt intel_uncore fb_sys_fops dell_wmi_descriptor wmi_bmof intel_vsec mei i2c_i801 acpi_ipmi ipmi_si i2c_smbus ipmi_devintf intel_pch_thermal acpi_power_meter pcspk
r
Fixes: efe41860008e ("ice: Fix memory corruption in VF driver")
Fixes: f23df5220d2b ("ice: Fix spurious interrupt during removal of trusted VF")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
This reverts commit 7255355a0636b4eff08d5e8139c77d98f151c4fc.
After this commit we are not able to attach VF to VM:
virsh attach-interface v0 hostdev --managed 0000:41:01.0 --mac 52:52:52:52:52:52
error: Failed to attach interface
error: Cannot set interface MAC to 52:52:52:52:52:52 for ifname enp65s0f0np0 vf 0: Resource temporarily unavailable
ice_check_vf_ready_for_cfg() already contain waiting for reset.
New condition in ice_check_vf_ready_for_reset() causing only problems.
Fixes: 7255355a0636 ("ice: Fix ice VF reset during iavf initialization")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The driver is misconfiguring the hardware for some values of MTU such that
it could use multiple descriptors to receive a packet when it could have
simply used one.
Change the driver to use a round-up instead of the result of a shift, as
the shift can truncate the lower bits of the size, and result in the
problem noted above. It also aligns this driver with similar code in i40e.
The insidiousness of this problem is that everything works with the wrong
size, it's just not working as well as it could, as some MTU sizes end up
using two or more descriptors, and there is no way to tell that is
happening without looking at ice_trace or a bus analyzer.
Fixes: efc2214b6047 ("ice: Add support for XDP")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-08-16 (iavf, i40e)
This series contains updates to iavf and i40e drivers.
Piotr adds checks for unsupported Flow Director rules on iavf.
Andrii replaces incorrect 'write' messaging on read operations for i40e.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: fix misleading debug logs
iavf: fix FDIR rule fields masks validation
====================
Link: https://lore.kernel.org/r/20230816193308.1307535-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ADQ and switchdev are not supported simultaneously. Enabling both at the
same time can result in nullptr dereference.
To prevent this, check if ADQ is active when changing devlink mode to
switchdev mode, and check if switchdev is active when enabling ADQ.
Fixes: fbc7b27af0f9 ("ice: enable ndo_setup_tc support for mqprio_qdisc")
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230816193405.1307580-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Change "write" into the actual "read" word.
Change parameters description.
Fixes: 7073f46e443e ("i40e: Add AQ commands for NVM Update for X722")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Andrii Staikov <andrii.staikov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Return an error if a field's mask is neither full nor empty. When a mask
is only partial the field is not being used for rule programming but it
gives a wrong impression it is used. Fix by returning an error on any
partial mask to make it clear they are not supported.
The ip_ver assignment is moved earlier in code to allow using it in
iavf_validate_fdir_fltr_masks.
Fixes: 527691bf0682 ("iavf: Support IPv4 Flow Director filters")
Fixes: e90cbc257a6f ("iavf: Support IPv6 Flow Director filters")
Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add fdir_fltr_lock locking in unprotected places.
The change in iavf_fdir_is_dup_fltr adds a spinlock around a loop which
iterates over all filters and looks for a duplicate. The filter can be
removed from list and freed from memory at the same time it's being
compared. All other places where filters are deleted are already
protected with spinlock.
The remaining changes protect adapter->fdir_active_fltr variable so now
all its uses are under a spinlock.
Fixes: 527691bf0682 ("iavf: Support IPv4 Flow Director filters")
Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230807205011.3129224-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Access to shared variables through hrtimer requires locking in order
to protect the variables because actions to write into these variables
(oper_gate_closed, admin_gate_closed, and qbv_transition) might potentially
occur simultaneously. This patch provides a locking mechanisms to avoid
such scenarios.
Fixes: 175c241288c0 ("igc: Fix TX Hang issue when QBV Gate is closed")
Suggested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230807205129.3129346-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
During qdisc create/delete, it is necessary to rebuild the queue
of VSIs. An error occurred because the VSIs created by RDMA were
still active.
Added check if RDMA is active. If yes, it disallows qdisc changes
and writes a message in the system logs.
Fixes: 348048e724a0 ("ice: Implement iidc operations")
Signed-off-by: Rafal Rogalski <rafalx.rogalski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230728171243.2446101-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The Xeon validation group has been carrying out some loaded tests
with various HW configurations, and they have seen some transmit
queue time out happening during the test. This will cause the
reset adapter function to be called by igc_tx_timeout().
Similar race conditions may arise when the interface is being brought
down and up in igc_reinit_locked(), an interrupt being generated, and
igc_clean_tx_irq() being called to complete the TX.
When the igc_tx_timeout() function is invoked, this patch will turn
off all TX ring HW queues during igc_down() process. TX ring HW queues
will be activated again during the igc_configure_tx_ring() process
when performing the igc_up() procedure later.
This patch also moved existing igc_disable_tx_ring_hw() to avoid using
forward declaration.
Kernel trace:
[ 7678.747813] ------------[ cut here ]------------
[ 7678.757914] NETDEV WATCHDOG: enp1s0 (igc): transmit queue 2 timed out
[ 7678.770117] WARNING: CPU: 0 PID: 13 at net/sched/sch_generic.c:525 dev_watchdog+0x1ae/0x1f0
[ 7678.784459] Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE xt_addrtype nft_compat
nf_tables nfnetlink br_netfilter bridge stp llc overlay dm_mod emrcha(PO) emriio(PO) rktpm(PO)
cegbuf_mod(PO) patch_update(PO) se(PO) sgx_tgts(PO) mktme(PO) keylocker(PO) svtdx(PO) svfs_pci_hotplug(PO)
vtd_mod(PO) davemem(PO) svmabort(PO) svindexio(PO) usbx2(PO) ehci_sched(PO) svheartbeat(PO) ioapic(PO)
sv8259(PO) svintr(PO) lt(PO) pcierootport(PO) enginefw_mod(PO) ata(PO) smbus(PO) spiflash_cdf(PO) arden(PO)
dsa_iax(PO) oobmsm_punit(PO) cpm(PO) svkdb(PO) ebg_pch(PO) pch(PO) sviotargets(PO) svbdf(PO) svmem(PO)
svbios(PO) dram(PO) svtsc(PO) targets(PO) superio(PO) svkernel(PO) cswitch(PO) mcf(PO) pentiumIII_mod(PO)
fs_svfs(PO) mdevdefdb(PO) svfs_os_services(O) ixgbe mdio mdio_devres libphy emeraldrapids_svdefs(PO)
regsupport(O) libnvdimm nls_cp437 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel
snd_intel_dspcfg snd_hda_codec snd_hwdep x86_pkg_temp_thermal snd_hda_core snd_pcm snd_timer isst_if_mbox_pci
[ 7678.784496] input_leds isst_if_mmio sg snd isst_if_common soundcore wmi button sad9(O) drm fuse backlight
configfs efivarfs ip_tables x_tables vmd sdhci led_class rtl8150 r8152 hid_generic pegasus mmc_block usbhid
mmc_core hid megaraid_sas ixgb igb i2c_algo_bit ice i40e hpsa scsi_transport_sas e1000e e1000 e100 ax88179_178a
usbnet xhci_pci sd_mod xhci_hcd t10_pi crc32c_intel crc64_rocksoft igc crc64 crc_t10dif usbcore
crct10dif_generic ptp crct10dif_common usb_common pps_core
[ 7679.200403] RIP: 0010:dev_watchdog+0x1ae/0x1f0
[ 7679.210201] Code: 28 e9 53 ff ff ff 4c 89 e7 c6 05 06 42 b9 00 01 e8 17 d1 fb ff 44 89 e9 4c
89 e6 48 c7 c7 40 ad fb 81 48 89 c2 e8 52 62 82 ff <0f> 0b e9 72 ff ff ff 65 8b 05 80 7d 7c 7e
89 c0 48 0f a3 05 0a c1
[ 7679.245438] RSP: 0018:ffa00000001f7d90 EFLAGS: 00010282
[ 7679.256021] RAX: 0000000000000000 RBX: ff11000109938440 RCX: 0000000000000000
[ 7679.268710] RDX: ff11000361e26cd8 RSI: ff11000361e1b880 RDI: ff11000361e1b880
[ 7679.281314] RBP: ffa00000001f7da8 R08: ff1100035f8fffe8 R09: 0000000000027ffb
[ 7679.293840] R10: 0000000000001f0a R11: ff1100035f840000 R12: ff11000109938000
[ 7679.306276] R13: 0000000000000002 R14: dead000000000122 R15: ffa00000001f7e18
[ 7679.318648] FS: 0000000000000000(0000) GS:ff11000361e00000(0000) knlGS:0000000000000000
[ 7679.332064] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7679.342757] CR2: 00007ffff7fca168 CR3: 000000013b08a006 CR4: 0000000000471ef8
[ 7679.354984] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7679.367207] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 7679.379370] PKRU: 55555554
[ 7679.386446] Call Trace:
[ 7679.393152] <TASK>
[ 7679.399363] ? __pfx_dev_watchdog+0x10/0x10
[ 7679.407870] call_timer_fn+0x31/0x110
[ 7679.415698] expire_timers+0xb2/0x120
[ 7679.423403] run_timer_softirq+0x179/0x1e0
[ 7679.431532] ? __schedule+0x2b1/0x820
[ 7679.439078] __do_softirq+0xd1/0x295
[ 7679.446426] ? __pfx_smpboot_thread_fn+0x10/0x10
[ 7679.454867] run_ksoftirqd+0x22/0x30
[ 7679.462058] smpboot_thread_fn+0xb7/0x160
[ 7679.469670] kthread+0xcd/0xf0
[ 7679.476097] ? __pfx_kthread+0x10/0x10
[ 7679.483211] ret_from_fork+0x29/0x50
[ 7679.490047] </TASK>
[ 7679.495204] ---[ end trace 0000000000000000 ]---
[ 7679.503179] igc 0000:01:00.0 enp1s0: Register Dump
[ 7679.511230] igc 0000:01:00.0 enp1s0: Register Name Value
[ 7679.519892] igc 0000:01:00.0 enp1s0: CTRL 181c0641
[ 7679.528782] igc 0000:01:00.0 enp1s0: STATUS 40280683
[ 7679.537551] igc 0000:01:00.0 enp1s0: CTRL_EXT 10000040
[ 7679.546284] igc 0000:01:00.0 enp1s0: MDIC 180a3800
[ 7679.554942] igc 0000:01:00.0 enp1s0: ICR 00000081
[ 7679.563503] igc 0000:01:00.0 enp1s0: RCTL 04408022
[ 7679.571963] igc 0000:01:00.0 enp1s0: RDLEN[0-3] 00001000 00001000 00001000 00001000
[ 7679.583075] igc 0000:01:00.0 enp1s0: RDH[0-3] 00000068 000000b6 0000000f 00000031
[ 7679.594162] igc 0000:01:00.0 enp1s0: RDT[0-3] 00000066 000000b2 0000000e 00000030
[ 7679.605174] igc 0000:01:00.0 enp1s0: RXDCTL[0-3] 02040808 02040808 02040808 02040808
[ 7679.616196] igc 0000:01:00.0 enp1s0: RDBAL[0-3] 1bb7c000 1bb7f000 1bb82000 0ef33000
[ 7679.627242] igc 0000:01:00.0 enp1s0: RDBAH[0-3] 00000001 00000001 00000001 00000001
[ 7679.638256] igc 0000:01:00.0 enp1s0: TCTL a503f0fa
[ 7679.646607] igc 0000:01:00.0 enp1s0: TDBAL[0-3] 2ba4a000 1bb6f000 1bb74000 1bb79000
[ 7679.657609] igc 0000:01:00.0 enp1s0: TDBAH[0-3] 00000001 00000001 00000001 00000001
[ 7679.668551] igc 0000:01:00.0 enp1s0: TDLEN[0-3] 00001000 00001000 00001000 00001000
[ 7679.679470] igc 0000:01:00.0 enp1s0: TDH[0-3] 000000a7 0000002d 000000bf 000000d9
[ 7679.690406] igc 0000:01:00.0 enp1s0: TDT[0-3] 000000a7 0000002d 000000bf 000000d9
[ 7679.701264] igc 0000:01:00.0 enp1s0: TXDCTL[0-3] 02100108 02100108 02100108 02100108
[ 7679.712123] igc 0000:01:00.0 enp1s0: Reset adapter
[ 7683.085967] igc 0000:01:00.0 enp1s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 8086.945561] ------------[ cut here ]------------
Entering kdb (current=0xffffffff8220b200, pid 0) on processor 0
Oops: (null) due to oops @ 0xffffffff81573888
RIP: 0010:dql_completed+0x148/0x160
Code: c9 00 48 89 57 58 e9 46 ff ff ff 45 85 e4 41 0f 95 c4 41 39 db 0f 95
c1 41 84 cc 74 05 45 85 ed 78 0a 44 89 c1 e9 27 ff ff ff <0f> 0b 01 f6 44 89
c1 29 f1 0f 48 ca eb 8c cc cc cc cc cc cc cc cc
RSP: 0018:ffa0000000003e00 EFLAGS: 00010287
RAX: 000000000000006c RBX: ffa0000003eb0f78 RCX: ff11000109938000
RDX: 0000000000000003 RSI: 0000000000000160 RDI: ff110001002e9480
RBP: ffa0000000003ed8 R08: ff110001002e93c0 R09: ffa0000000003d28
R10: 0000000000007cc0 R11: 0000000000007c54 R12: 00000000ffffffd9
R13: ff1100037039cb00 R14: 00000000ffffffd9 R15: ff1100037039c048
FS: 0000000000000000(0000) GS:ff11000361e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffff7fca168 CR3: 000000013b08a003 CR4: 0000000000471ef8
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
? igc_poll+0x1a9/0x14d0 [igc]
__napi_poll+0x2e/0x1b0
net_rx_action+0x126/0x250
__do_softirq+0xd1/0x295
irq_exit_rcu+0xc5/0xf0
common_interrupt+0x86/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x27/0x40
RIP: 0010:cpuidle_enter_state+0xd3/0x3e0
Code: 73 f1 ff ff 49 89 c6 8b 05 e2 ca a7 00 85 c0 0f 8f b3 02 00 00 31 ff e8 1b
de 75 ff 80 7d d7 00 0f 85 cd 01 00 00 fb 45 85 ff <0f> 88 fd 00 00 00 49 63 cf
4c 2b 75 c8 48 8d 04 49 48 89 ca 48 8d
RSP: 0018:ffffffff82203df0 EFLAGS: 00000202
RAX: ff11000361e2a200 RBX: 0000000000000002 RCX: 000000000000001f
RDX: 0000000000000000 RSI: 000000003cf3cf3d RDI: 0000000000000000
RBP: ffffffff82203e28 R08: 0000075ae38471c8 R09: 0000000000000018
R10: 000000000000031a R11: ffffffff8238dca0 R12: ffd1ffffff200000
R13: ffffffff8238dca0 R14: 0000075ae38471c8 R15: 0000000000000002
cpuidle_enter+0x2e/0x50
call_cpuidle+0x23/0x40
do_idle+0x1be/0x220
cpu_startup_entry+0x20/0x30
rest_init+0xb5/0xc0
arch_call_rest_init+0xe/0x30
start_kernel+0x448/0x760
x86_64_start_kernel+0x109/0x150
secondary_startup_64_no_verify+0xe0/0xeb
</TASK>
more>
[0]kdb>
[0]kdb>
[0]kdb> go
Catastrophic error detected
kdb_continue_catastrophic=0, type go a second time if you really want to
continue
[0]kdb> go
Catastrophic error detected
kdb_continue_catastrophic=0, attempting to continue
[ 8086.955689] refcount_t: underflow; use-after-free.
[ 8086.955697] WARNING: CPU: 0 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0xc2/0x110
[ 8086.955706] Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE xt_addrtype nft_compat
nf_tables nfnetlink br_netfilter bridge stp llc overlay dm_mod emrcha(PO) emriio(PO) rktpm(PO)
cegbuf_mod(PO) patch_update(PO) se(PO) sgx_tgts(PO) mktme(PO) keylocker(PO) svtdx(PO)
svfs_pci_hotplug(PO) vtd_mod(PO) davemem(PO) svmabort(PO) svindexio(PO) usbx2(PO) ehci_sched(PO)
svheartbeat(PO) ioapic(PO) sv8259(PO) svintr(PO) lt(PO) pcierootport(PO) enginefw_mod(PO) ata(PO)
smbus(PO) spiflash_cdf(PO) arden(PO) dsa_iax(PO) oobmsm_punit(PO) cpm(PO) svkdb(PO) ebg_pch(PO)
pch(PO) sviotargets(PO) svbdf(PO) svmem(PO) svbios(PO) dram(PO) svtsc(PO) targets(PO) superio(PO)
svkernel(PO) cswitch(PO) mcf(PO) pentiumIII_mod(PO) fs_svfs(PO) mdevdefdb(PO) svfs_os_services(O)
ixgbe mdio mdio_devres libphy emeraldrapids_svdefs(PO) regsupport(O) libnvdimm nls_cp437
snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg
snd_hda_codec snd_hwdep x86_pkg_temp_thermal snd_hda_core snd_pcm snd_timer isst_if_mbox_pci
[ 8086.955751] input_leds isst_if_mmio sg snd isst_if_common soundcore wmi button sad9(O) drm
fuse backlight configfs efivarfs ip_tables x_tables vmd sdhci led_class rtl8150 r8152 hid_generic
pegasus mmc_block usbhid mmc_core hid megaraid_sas ixgb igb i2c_algo_bit ice i40e hpsa
scsi_transport_sas e1000e e1000 e100 ax88179_178a usbnet xhci_pci sd_mod xhci_hcd t10_pi
crc32c_intel crc64_rocksoft igc crc64 crc_t10dif usbcore crct10dif_generic ptp crct10dif_common
usb_common pps_core
[ 8086.955784] RIP: 0010:refcount_warn_saturate+0xc2/0x110
[ 8086.955788] Code: 01 e8 82 e7 b4 ff 0f 0b 5d c3 cc cc cc cc 80 3d 68 c6 eb 00 00 75 81
48 c7 c7 a0 87 f6 81 c6 05 58 c6 eb 00 01 e8 5e e7 b4 ff <0f> 0b 5d c3 cc cc cc cc 80 3d
42 c6 eb 00 00 0f 85 59 ff ff ff 48
[ 8086.955790] RSP: 0018:ffa0000000003da0 EFLAGS: 00010286
[ 8086.955793] RAX: 0000000000000000 RBX: ff1100011da40ee0 RCX: ff11000361e1b888
[ 8086.955794] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ff11000361e1b880
[ 8086.955795] RBP: ffa0000000003da0 R08: 80000000ffff9f45 R09: ffa0000000003d28
[ 8086.955796] R10: ff1100035f840000 R11: 0000000000000028 R12: ff11000319ff8000
[ 8086.955797] R13: ff1100011bb79d60 R14: 00000000ffffffd6 R15: ff1100037039cb00
[ 8086.955798] FS: 0000000000000000(0000) GS:ff11000361e00000(0000) knlGS:0000000000000000
[ 8086.955800] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8086.955801] CR2: 00007ffff7fca168 CR3: 000000013b08a003 CR4: 0000000000471ef8
[ 8086.955803] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8086.955803] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 8086.955804] PKRU: 55555554
[ 8086.955805] Call Trace:
[ 8086.955806] <IRQ>
[ 8086.955808] tcp_wfree+0x112/0x130
[ 8086.955814] skb_release_head_state+0x24/0xa0
[ 8086.955818] napi_consume_skb+0x9c/0x160
[ 8086.955821] igc_poll+0x5d8/0x14d0 [igc]
[ 8086.955835] __napi_poll+0x2e/0x1b0
[ 8086.955839] net_rx_action+0x126/0x250
[ 8086.955843] __do_softirq+0xd1/0x295
[ 8086.955846] irq_exit_rcu+0xc5/0xf0
[ 8086.955851] common_interrupt+0x86/0xa0
[ 8086.955857] </IRQ>
[ 8086.955857] <TASK>
[ 8086.955858] asm_common_interrupt+0x27/0x40
[ 8086.955862] RIP: 0010:cpuidle_enter_state+0xd3/0x3e0
[ 8086.955866] Code: 73 f1 ff ff 49 89 c6 8b 05 e2 ca a7 00 85 c0 0f 8f b3 02 00 00 31 ff e8
1b de 75 ff 80 7d d7 00 0f 85 cd 01 00 00 fb 45 85 ff <0f> 88 fd 00 00 00 49 63 cf 4c 2b 75
c8 48 8d 04 49 48 89 ca 48 8d
[ 8086.955867] RSP: 0018:ffffffff82203df0 EFLAGS: 00000202
[ 8086.955869] RAX: ff11000361e2a200 RBX: 0000000000000002 RCX: 000000000000001f
[ 8086.955870] RDX: 0000000000000000 RSI: 000000003cf3cf3d RDI: 0000000000000000
[ 8086.955871] RBP: ffffffff82203e28 R08: 0000075ae38471c8 R09: 0000000000000018
[ 8086.955872] R10: 000000000000031a R11: ffffffff8238dca0 R12: ffd1ffffff200000
[ 8086.955873] R13: ffffffff8238dca0 R14: 0000075ae38471c8 R15: 0000000000000002
[ 8086.955875] cpuidle_enter+0x2e/0x50
[ 8086.955880] call_cpuidle+0x23/0x40
[ 8086.955884] do_idle+0x1be/0x220
[ 8086.955887] cpu_startup_entry+0x20/0x30
[ 8086.955889] rest_init+0xb5/0xc0
[ 8086.955892] arch_call_rest_init+0xe/0x30
[ 8086.955895] start_kernel+0x448/0x760
[ 8086.955898] x86_64_start_kernel+0x109/0x150
[ 8086.955900] secondary_startup_64_no_verify+0xe0/0xeb
[ 8086.955904] </TASK>
[ 8086.955904] ---[ end trace 0000000000000000 ]---
[ 8086.955912] ------------[ cut here ]------------
[ 8086.955913] kernel BUG at lib/dynamic_queue_limits.c:27!
[ 8086.955918] invalid opcode: 0000 [#1] SMP
[ 8086.955922] RIP: 0010:dql_completed+0x148/0x160
[ 8086.955925] Code: c9 00 48 89 57 58 e9 46 ff ff ff 45 85 e4 41 0f 95 c4 41 39 db
0f 95 c1 41 84 cc 74 05 45 85 ed 78 0a 44 89 c1 e9 27 ff ff ff <0f> 0b 01 f6 44 89
c1 29 f1 0f 48 ca eb 8c cc cc cc cc cc cc cc cc
[ 8086.955927] RSP: 0018:ffa0000000003e00 EFLAGS: 00010287
[ 8086.955928] RAX: 000000000000006c RBX: ffa0000003eb0f78 RCX: ff11000109938000
[ 8086.955929] RDX: 0000000000000003 RSI: 0000000000000160 RDI: ff110001002e9480
[ 8086.955930] RBP: ffa0000000003ed8 R08: ff110001002e93c0 R09: ffa0000000003d28
[ 8086.955931] R10: 0000000000007cc0 R11: 0000000000007c54 R12: 00000000ffffffd9
[ 8086.955932] R13: ff1100037039cb00 R14: 00000000ffffffd9 R15: ff1100037039c048
[ 8086.955933] FS: 0000000000000000(0000) GS:ff11000361e00000(0000) knlGS:0000000000000000
[ 8086.955934] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8086.955935] CR2: 00007ffff7fca168 CR3: 000000013b08a003 CR4: 0000000000471ef8
[ 8086.955936] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8086.955937] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 8086.955938] PKRU: 55555554
[ 8086.955939] Call Trace:
[ 8086.955939] <IRQ>
[ 8086.955940] ? igc_poll+0x1a9/0x14d0 [igc]
[ 8086.955949] __napi_poll+0x2e/0x1b0
[ 8086.955952] net_rx_action+0x126/0x250
[ 8086.955956] __do_softirq+0xd1/0x295
[ 8086.955958] irq_exit_rcu+0xc5/0xf0
[ 8086.955961] common_interrupt+0x86/0xa0
[ 8086.955964] </IRQ>
[ 8086.955965] <TASK>
[ 8086.955965] asm_common_interrupt+0x27/0x40
[ 8086.955968] RIP: 0010:cpuidle_enter_state+0xd3/0x3e0
[ 8086.955971] Code: 73 f1 ff ff 49 89 c6 8b 05 e2 ca a7 00 85 c0 0f 8f b3 02 00 00
31 ff e8 1b de 75 ff 80 7d d7 00 0f 85 cd 01 00 00 fb 45 85 ff <0f> 88 fd 00 00 00
49 63 cf 4c 2b 75 c8 48 8d 04 49 48 89 ca 48 8d
[ 8086.955972] RSP: 0018:ffffffff82203df0 EFLAGS: 00000202
[ 8086.955973] RAX: ff11000361e2a200 RBX: 0000000000000002 RCX: 000000000000001f
[ 8086.955974] RDX: 0000000000000000 RSI: 000000003cf3cf3d RDI: 0000000000000000
[ 8086.955974] RBP: ffffffff82203e28 R08: 0000075ae38471c8 R09: 0000000000000018
[ 8086.955975] R10: 000000000000031a R11: ffffffff8238dca0 R12: ffd1ffffff200000
[ 8086.955976] R13: ffffffff8238dca0 R14: 0000075ae38471c8 R15: 0000000000000002
[ 8086.955978] cpuidle_enter+0x2e/0x50
[ 8086.955981] call_cpuidle+0x23/0x40
[ 8086.955984] do_idle+0x1be/0x220
[ 8086.955985] cpu_startup_entry+0x20/0x30
[ 8086.955987] rest_init+0xb5/0xc0
[ 8086.955990] arch_call_rest_init+0xe/0x30
[ 8086.955992] start_kernel+0x448/0x760
[ 8086.955994] x86_64_start_kernel+0x109/0x150
[ 8086.955996] secondary_startup_64_no_verify+0xe0/0xeb
[ 8086.955998] </TASK>
[ 8086.955999] Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE xt_addrtype
nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay dm_mod emrcha(PO) emriio(PO)
rktpm(PO) cegbuf_mod(PO) patch_update(PO) se(PO) sgx_tgts(PO) mktme(PO) keylocker(PO) svtdx(PO)
svfs_pci_hotplug(PO) vtd_mod(PO) davemem(PO) svmabort(PO) svindexio(PO) usbx2(PO) ehci_sched(PO)
svheartbeat(PO) ioapic(PO) sv8259(PO) svintr(PO) lt(PO) pcierootport(PO) enginefw_mod(PO) ata(PO)
smbus(PO) spiflash_cdf(PO) arden(PO) dsa_iax(PO) oobmsm_punit(PO) cpm(PO) svkdb(PO) ebg_pch(PO)
pch(PO) sviotargets(PO) svbdf(PO) svmem(PO) svbios(PO) dram(PO) svtsc(PO) targets(PO) superio(PO)
svkernel(PO) cswitch(PO) mcf(PO) pentiumIII_mod(PO) fs_svfs(PO) mdevdefdb(PO) svfs_os_services(O)
ixgbe mdio mdio_devres libphy emeraldrapids_svdefs(PO) regsupport(O) libnvdimm nls_cp437
snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg
snd_hda_codec snd_hwdep x86_pkg_temp_thermal snd_hda_core snd_pcm snd_timer isst_if_mbox_pci
[ 8086.956029] input_leds isst_if_mmio sg snd isst_if_common soundcore wmi button sad9(O) drm
fuse backlight configfs efivarfs ip_tables x_tables vmd sdhci led_class rtl8150 r8152 hid_generic
pegasus mmc_block usbhid mmc_core hid megaraid_sas ixgb igb i2c_algo_bit ice i40e hpsa
scsi_transport_sas e1000e e1000 e100 ax88179_178a usbnet xhci_pci sd_mod xhci_hcd t10_pi
crc32c_intel crc64_rocksoft igc crc64 crc_t10dif usbcore crct10dif_generic ptp crct10dif_common
usb_common pps_core
[16762.543675] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.593 msecs
[16762.543678] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.595 msecs
[16762.543673] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.495 msecs
[16762.543679] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.599 msecs
[16762.543678] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.598 msecs
[16762.543690] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.605 msecs
[16762.543684] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.599 msecs
[16762.543693] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.613 msecs
[16762.543784] ---[ end trace 0000000000000000 ]---
[16762.849099] RIP: 0010:dql_completed+0x148/0x160
PANIC: Fatal exception in interrupt
Fixes: 9b275176270e ("igc: Add ndo_tx_timeout support")
Tested-by: Alejandra Victoria Alcaraz <alejandra.victoria.alcaraz@intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-07-21 (i40e, iavf)
This series contains updates to i40e and iavf drivers.
Wang Ming corrects an error check on i40e.
Jake unlocks crit_lock on allocation failure to prevent deadlock and
stops re-enabling of interrupts when it's not intended for iavf.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: check for removal state before IAVF_FLAG_PF_COMMS_FAILED
iavf: fix potential deadlock on allocation failure
i40e: Fix an NULL vs IS_ERR() bug for debugfs_create_dir()
====================
Link: https://lore.kernel.org/r/20230721155812.1292752-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix ethtool FDIR logic to not use memory after its release.
In the ice_ethtool_fdir.c file there are 2 spots where code can
refer to pointers which may be missing.
In the ice_cfg_fdir_xtrct_seq() function seg may be freed but
even then may be still used by memcpy(&tun_seg[1], seg, sizeof(*seg)).
In the ice_add_fdir_ethtool() function struct ice_fdir_fltr *input
may first fail to be added via ice_fdir_update_list_entry() but then
may be deleted by ice_fdir_update_list_entry.
Terminate in both cases when the returned value of the previous
operation is other than 0, free memory and don't use it anymore.
Reported-by: Michal Schmidt <mschmidt@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2208423
Fixes: cac2a27cd9ab ("ice: Support IPv4 Flow Director filters")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230721155854.1292805-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
VXLAN-GPE does not add an extra inner Ethernet header. Take that into
account when calculating header length.
This causes problems in skb_tunnel_check_pmtu, where incorrect PMTU is
cached.
In the collect_md mode (which is the only mode that VXLAN-GPE
supports), there's no magic auto-setting of the tunnel interface MTU.
It can't be, since the destination and thus the underlying interface
may be different for each packet.
So, the administrator is responsible for setting the correct tunnel
interface MTU. Apparently, the administrators are capable enough to
calculate that the maximum MTU for VXLAN-GPE is (their_lower_MTU - 36).
They set the tunnel interface MTU to 1464. If you run a TCP stream over
such interface, it's then segmented according to the MTU 1464, i.e.
producing 1514 bytes frames. Which is okay, this still fits the lower
MTU.
However, skb_tunnel_check_pmtu (called from vxlan_xmit_one) uses 50 as
the header size and thus incorrectly calculates the frame size to be
1528. This leads to ICMP too big message being generated (locally),
PMTU of 1450 to be cached and the TCP stream to be resegmented.
The fix is to use the correct actual header size, especially for
skb_tunnel_check_pmtu calculation.
Fixes: e1e5314de08ba ("vxlan: implement GPE")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In iavf_adminq_task(), if the function can't acquire the
adapter->crit_lock, it checks if the driver is removing. If so, it simply
exits without re-enabling the interrupt. This is done to ensure that the
task stops processing as soon as possible once the driver is being removed.
However, if the IAVF_FLAG_PF_COMMS_FAILED is set, the function checks this
before attempting to acquire the lock. In this case, the function exits
early and re-enables the interrupt. This will happen even if the driver is
already removing.
Avoid this, by moving the check to after the adapter->crit_lock is
acquired. This way, if the driver is removing, we will not re-enable the
interrupt.
Fixes: fc2e6b3b132a ("iavf: Rework mutexes for better synchronisation")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In iavf_adminq_task(), if kzalloc() fails to allocate the event.msg_buf,
the function will exit without releasing the adapter->crit_lock.
This is unlikely, but if it happens, the next access to that mutex will
deadlock.
Fix this by moving the unlock to the end of the function, and adding a new
label to allow jumping to the unlock portion of the function exit flow.
Fixes: fc2e6b3b132a ("iavf: Rework mutexes for better synchronisation")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The debugfs_create_dir() function returns error pointers.
It never returns NULL. Most incorrect error checks were fixed,
but the one in i40e_dbg_init() was forgotten.
Fix the remaining error check.
Fixes: 02e9c290814c ("i40e: debugfs interface")
Signed-off-by: Wang Ming <machel@vivo.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-07-17 (iavf)
This series contains updates to iavf driver only.
Ding Hui fixes use-after-free issue by calling netif_napi_del() for all
allocated q_vectors. He also resolves out-of-bounds issue by not
updating to new values when timeout is encountered.
Marcin and Ahmed change the way resets are handled so that the callback
operating under the RTNL lock will wait for the reset to finish, the
rtnl_lock sensitive functions in reset flow will schedule the netdev update
for later in order to remove circular dependency with the critical lock.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: fix reset task race with iavf_remove()
iavf: fix a deadlock caused by rtnl and driver's lock circular dependencies
Revert "iavf: Do not restart Tx queues after reset task failure"
Revert "iavf: Detach device during reset task"
iavf: Wait for reset in callbacks which trigger it
iavf: use internal state to free traffic IRQs
iavf: Fix out-of-bounds when setting channels on remove
iavf: Fix use-after-free in free_netdev
====================
Link: https://lore.kernel.org/r/20230717175205.3217774-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In normal operation, each populated queue item has
next_to_watch pointing to the last TX desc of the packet,
while each cleaned item has it set to 0. In particular,
next_to_use that points to the next (necessarily clean)
item to use has next_to_watch set to 0.
When the TX queue is used both by an application using
AF_XDP with ZEROCOPY as well as a second non-XDP application
generating high traffic, the queue pointers can get in
an invalid state where next_to_use points to an item
where next_to_watch is NOT set to 0.
However, the implementation assumes at several places
that this is never the case, so if it does hold,
bad things happen. In particular, within the loop inside
of igc_clean_tx_irq(), next_to_clean can overtake next_to_use.
Finally, this prevents any further transmission via
this queue and it never gets unblocked or signaled.
Secondly, if the queue is in this garbled state,
the inner loop of igc_clean_tx_ring() will never terminate,
completely hogging a CPU core.
The reason is that igc_xdp_xmit_zc() reads next_to_use
before acquiring the lock, and writing it back
(potentially unmodified) later. If it got modified
before locking, the outdated next_to_use is written
pointing to an item that was already used elsewhere
(and thus next_to_watch got written).
Fixes: 9acf59a752d4 ("igc: Enable TX via AF_XDP zero-copy")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230717175444.3217831-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The reset task is currently scheduled from the watchdog or adminq tasks.
First, all direct calls to schedule the reset task are replaced with the
iavf_schedule_reset(), which is modified to accept the flag showing the
type of reset.
To prevent the reset task from starting once iavf_remove() starts, we need
to check the __IAVF_IN_REMOVE_TASK bit before we schedule it. This is now
easily added to iavf_schedule_reset().
Finally, remove the check for IAVF_FLAG_RESET_NEEDED in the watchdog task.
It is redundant since all callers who set the flag immediately schedules
the reset task.
Fixes: 3ccd54ef44eb ("iavf: Fix init state closure on remove")
Fixes: 14756b2ae265 ("iavf: Fix __IAVF_RESETTING state usage")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
A driver's lock (crit_lock) is used to serialize all the driver's tasks.
Lockdep, however, shows a circular dependency between rtnl and
crit_lock. This happens when an ndo that already holds the rtnl requests
the driver to reset, since the reset task (in some paths) tries to grab
rtnl to either change real number of queues of update netdev features.
[566.241851] ======================================================
[566.241893] WARNING: possible circular locking dependency detected
[566.241936] 6.2.14-100.fc36.x86_64+debug #1 Tainted: G OE
[566.241984] ------------------------------------------------------
[566.242025] repro.sh/2604 is trying to acquire lock:
[566.242061] ffff9280fc5ceee8 (&adapter->crit_lock){+.+.}-{3:3}, at: iavf_close+0x3c/0x240 [iavf]
[566.242167]
but task is already holding lock:
[566.242209] ffffffff9976d350 (rtnl_mutex){+.+.}-{3:3}, at: iavf_remove+0x6b5/0x730 [iavf]
[566.242300]
which lock already depends on the new lock.
[566.242353]
the existing dependency chain (in reverse order) is:
[566.242401]
-> #1 (rtnl_mutex){+.+.}-{3:3}:
[566.242451] __mutex_lock+0xc1/0xbb0
[566.242489] iavf_init_interrupt_scheme+0x179/0x440 [iavf]
[566.242560] iavf_watchdog_task+0x80b/0x1400 [iavf]
[566.242627] process_one_work+0x2b3/0x560
[566.242663] worker_thread+0x4f/0x3a0
[566.242696] kthread+0xf2/0x120
[566.242730] ret_from_fork+0x29/0x50
[566.242763]
-> #0 (&adapter->crit_lock){+.+.}-{3:3}:
[566.242815] __lock_acquire+0x15ff/0x22b0
[566.242869] lock_acquire+0xd2/0x2c0
[566.242901] __mutex_lock+0xc1/0xbb0
[566.242934] iavf_close+0x3c/0x240 [iavf]
[566.242997] __dev_close_many+0xac/0x120
[566.243036] dev_close_many+0x8b/0x140
[566.243071] unregister_netdevice_many_notify+0x165/0x7c0
[566.243116] unregister_netdevice_queue+0xd3/0x110
[566.243157] iavf_remove+0x6c1/0x730 [iavf]
[566.243217] pci_device_remove+0x33/0xa0
[566.243257] device_release_driver_internal+0x1bc/0x240
[566.243299] pci_stop_bus_device+0x6c/0x90
[566.243338] pci_stop_and_remove_bus_device+0xe/0x20
[566.243380] pci_iov_remove_virtfn+0xd1/0x130
[566.243417] sriov_disable+0x34/0xe0
[566.243448] ice_free_vfs+0x2da/0x330 [ice]
[566.244383] ice_sriov_configure+0x88/0xad0 [ice]
[566.245353] sriov_numvfs_store+0xde/0x1d0
[566.246156] kernfs_fop_write_iter+0x15e/0x210
[566.246921] vfs_write+0x288/0x530
[566.247671] ksys_write+0x74/0xf0
[566.248408] do_syscall_64+0x58/0x80
[566.249145] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[566.249886]
other info that might help us debug this:
[566.252014] Possible unsafe locking scenario:
[566.253432] CPU0 CPU1
[566.254118] ---- ----
[566.254800] lock(rtnl_mutex);
[566.255514] lock(&adapter->crit_lock);
[566.256233] lock(rtnl_mutex);
[566.256897] lock(&adapter->crit_lock);
[566.257388]
*** DEADLOCK ***
The deadlock can be triggered by a script that is continuously resetting
the VF adapter while doing other operations requiring RTNL, e.g:
while :; do
ip link set $VF up
ethtool --set-channels $VF combined 2
ip link set $VF down
ip link set $VF up
ethtool --set-channels $VF combined 4
ip link set $VF down
done
Any operation that triggers a reset can substitute "ethtool --set-channles"
As a fix, add a new task "finish_config" that do all the work which
needs rtnl lock. With the exception of iavf_remove(), all work that
require rtnl should be called from this task.
As for iavf_remove(), at the point where we need to call
unregister_netdevice() (and grab rtnl_lock), we make sure the finish_config
task is not running (cancel_work_sync()) to safely grab rtnl. Subsequent
finish_config work cannot restart after that since the task is guarded
by the __IAVF_IN_REMOVE_TASK bit in iavf_schedule_finish_config().
Fixes: 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
This reverts commit 08f1c147b7265245d67321585c68a27e990e0c4b.
Netdev is no longer being detached during reset, so this fix can be
reverted. We leave the removal of "hacky" IFF_UP flag update.
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
This reverts commit aa626da947e9cd30c4cf727493903e1adbb2c0a0.
Detaching device during reset was not fully fixing the rtnl locking issue,
as there could be a situation where callback was already in progress before
detaching netdev.
Furthermore, detaching netdevice causes TX timeouts if traffic is running.
To reproduce:
ip netns exec ns1 iperf3 -c $PEER_IP -t 600 --logfile /dev/null &
while :; do
for i in 200 7000 400 5000 300 3000 ; do
ip netns exec ns1 ip link set $VF1 mtu $i
sleep 2
done
sleep 10
done
Currently, callbacks such as iavf_change_mtu() wait for the reset.
If the reset fails to acquire the rtnl_lock, they schedule the netdev
update for later while continuing the reset flow. Operations like MTU
changes are performed under the rtnl_lock. Therefore, when the operation
finishes, another callback that uses rtnl_lock can start.
Signed-off-by: Dawid Wesierski <dawidx.wesierski@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
There was a fail when trying to add the interface to bonding
right after changing the MTU on the interface. It was caused
by bonding interface unable to open the interface due to
interface being in __RESETTING state because of MTU change.
Add new reset_waitqueue to indicate that reset has finished.
Add waiting for reset to finish in callbacks which trigger hw reset:
iavf_set_priv_flags(), iavf_change_mtu() and iavf_set_ringparam().
We use a 5000ms timeout period because on Hyper-V based systems,
this operation takes around 3000-4000ms. In normal circumstances,
it doesn't take more than 500ms to complete.
Add a function iavf_wait_for_reset() to reuse waiting for reset code and
use it also in iavf_set_channels(), which already waits for reset.
We don't use error handling in iavf_set_channels() as this could
cause the device to be in incorrect state if the reset was scheduled
but hit timeout or the waitng function was interrupted by a signal.
Fixes: 4e5e6b5d9d13 ("iavf: Fix return of set the new channel count")
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Co-developed-by: Dawid Wesierski <dawidx.wesierski@intel.com>
Signed-off-by: Dawid Wesierski <dawidx.wesierski@intel.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
If the system tries to close the netdev while iavf_reset_task() is
running, __LINK_STATE_START will be cleared and netif_running() will
return false in iavf_reinit_interrupt_scheme(). This will result in
iavf_free_traffic_irqs() not being called and a leak as follows:
[7632.489326] remove_proc_entry: removing non-empty directory 'irq/999', leaking at least 'iavf-enp24s0f0v0-TxRx-0'
[7632.490214] WARNING: CPU: 0 PID: 10 at fs/proc/generic.c:718 remove_proc_entry+0x19b/0x1b0
is shown when pci_disable_msix() is later called. Fix by using the
internal adapter state. The traffic IRQs will always exist if
state == __IAVF_RUNNING.
Fixes: 5b36e8d04b44 ("i40evf: Enable VF to request an alternate queue allocation")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
If we set channels greater during iavf_remove(), and waiting reset done
would be timeout, then returned with error but changed num_active_queues
directly, that will lead to OOB like the following logs. Because the
num_active_queues is greater than tx/rx_rings[] allocated actually.
Reproducer:
[root@host ~]# cat repro.sh
#!/bin/bash
pf_dbsf="0000:41:00.0"
vf0_dbsf="0000:41:02.0"
g_pids=()
function do_set_numvf()
{
echo 2 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
sleep $((RANDOM%3+1))
echo 0 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
sleep $((RANDOM%3+1))
}
function do_set_channel()
{
local nic=$(ls -1 --indicator-style=none /sys/bus/pci/devices/${vf0_dbsf}/net/)
[ -z "$nic" ] && { sleep $((RANDOM%3)) ; return 1; }
ifconfig $nic 192.168.18.5 netmask 255.255.255.0
ifconfig $nic up
ethtool -L $nic combined 1
ethtool -L $nic combined 4
sleep $((RANDOM%3))
}
function on_exit()
{
local pid
for pid in "${g_pids[@]}"; do
kill -0 "$pid" &>/dev/null && kill "$pid" &>/dev/null
done
g_pids=()
}
trap "on_exit; exit" EXIT
while :; do do_set_numvf ; done &
g_pids+=($!)
while :; do do_set_channel ; done &
g_pids+=($!)
wait
Result:
[ 3506.152887] iavf 0000:41:02.0: Removing device
[ 3510.400799] ==================================================================
[ 3510.400820] BUG: KASAN: slab-out-of-bounds in iavf_free_all_tx_resources+0x156/0x160 [iavf]
[ 3510.400823] Read of size 8 at addr ffff88b6f9311008 by task repro.sh/55536
[ 3510.400823]
[ 3510.400830] CPU: 101 PID: 55536 Comm: repro.sh Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1
[ 3510.400832] Hardware name: Powerleader PR2008AL/H12DSi-N6, BIOS 2.0 04/09/2021
[ 3510.400835] Call Trace:
[ 3510.400851] dump_stack+0x71/0xab
[ 3510.400860] print_address_description+0x6b/0x290
[ 3510.400865] ? iavf_free_all_tx_resources+0x156/0x160 [iavf]
[ 3510.400868] kasan_report+0x14a/0x2b0
[ 3510.400873] iavf_free_all_tx_resources+0x156/0x160 [iavf]
[ 3510.400880] iavf_remove+0x2b6/0xc70 [iavf]
[ 3510.400884] ? iavf_free_all_rx_resources+0x160/0x160 [iavf]
[ 3510.400891] ? wait_woken+0x1d0/0x1d0
[ 3510.400895] ? notifier_call_chain+0xc1/0x130
[ 3510.400903] pci_device_remove+0xa8/0x1f0
[ 3510.400910] device_release_driver_internal+0x1c6/0x460
[ 3510.400916] pci_stop_bus_device+0x101/0x150
[ 3510.400919] pci_stop_and_remove_bus_device+0xe/0x20
[ 3510.400924] pci_iov_remove_virtfn+0x187/0x420
[ 3510.400927] ? pci_iov_add_virtfn+0xe10/0xe10
[ 3510.400929] ? pci_get_subsys+0x90/0x90
[ 3510.400932] sriov_disable+0xed/0x3e0
[ 3510.400936] ? bus_find_device+0x12d/0x1a0
[ 3510.400953] i40e_free_vfs+0x754/0x1210 [i40e]
[ 3510.400966] ? i40e_reset_all_vfs+0x880/0x880 [i40e]
[ 3510.400968] ? pci_get_device+0x7c/0x90
[ 3510.400970] ? pci_get_subsys+0x90/0x90
[ 3510.400982] ? pci_vfs_assigned.part.7+0x144/0x210
[ 3510.400987] ? __mutex_lock_slowpath+0x10/0x10
[ 3510.400996] i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e]
[ 3510.401001] sriov_numvfs_store+0x214/0x290
[ 3510.401005] ? sriov_totalvfs_show+0x30/0x30
[ 3510.401007] ? __mutex_lock_slowpath+0x10/0x10
[ 3510.401011] ? __check_object_size+0x15a/0x350
[ 3510.401018] kernfs_fop_write+0x280/0x3f0
[ 3510.401022] vfs_write+0x145/0x440
[ 3510.401025] ksys_write+0xab/0x160
[ 3510.401028] ? __ia32_sys_read+0xb0/0xb0
[ 3510.401031] ? fput_many+0x1a/0x120
[ 3510.401032] ? filp_close+0xf0/0x130
[ 3510.401038] do_syscall_64+0xa0/0x370
[ 3510.401041] ? page_fault+0x8/0x30
[ 3510.401043] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 3510.401073] RIP: 0033:0x7f3a9bb842c0
[ 3510.401079] Code: 73 01 c3 48 8b 0d d8 cb 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 24 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe dd 01 00 48 89 04 24
[ 3510.401080] RSP: 002b:00007ffc05f1fe18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 3510.401083] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f3a9bb842c0
[ 3510.401085] RDX: 0000000000000002 RSI: 0000000002327408 RDI: 0000000000000001
[ 3510.401086] RBP: 0000000002327408 R08: 00007f3a9be53780 R09: 00007f3a9c8a4700
[ 3510.401086] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002
[ 3510.401087] R13: 0000000000000001 R14: 00007f3a9be52620 R15: 0000000000000001
[ 3510.401090]
[ 3510.401093] Allocated by task 76795:
[ 3510.401098] kasan_kmalloc+0xa6/0xd0
[ 3510.401099] __kmalloc+0xfb/0x200
[ 3510.401104] iavf_init_interrupt_scheme+0x26f/0x1310 [iavf]
[ 3510.401108] iavf_watchdog_task+0x1d58/0x4050 [iavf]
[ 3510.401114] process_one_work+0x56a/0x11f0
[ 3510.401115] worker_thread+0x8f/0xf40
[ 3510.401117] kthread+0x2a0/0x390
[ 3510.401119] ret_from_fork+0x1f/0x40
[ 3510.401122] 0xffffffffffffffff
[ 3510.401123]
In timeout handling, we should keep the original num_active_queues
and reset num_req_queues to 0.
Fixes: 4e5e6b5d9d13 ("iavf: Fix return of set the new channel count")
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Cc: Donglin Peng <pengdonglin@sangfor.com.cn>
Cc: Huang Cun <huangcun@sangfor.com.cn>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
We do netif_napi_add() for all allocated q_vectors[], but potentially
do netif_napi_del() for part of them, then kfree q_vectors and leave
invalid pointers at dev->napi_list.
Reproducer:
[root@host ~]# cat repro.sh
#!/bin/bash
pf_dbsf="0000:41:00.0"
vf0_dbsf="0000:41:02.0"
g_pids=()
function do_set_numvf()
{
echo 2 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
sleep $((RANDOM%3+1))
echo 0 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
sleep $((RANDOM%3+1))
}
function do_set_channel()
{
local nic=$(ls -1 --indicator-style=none /sys/bus/pci/devices/${vf0_dbsf}/net/)
[ -z "$nic" ] && { sleep $((RANDOM%3)) ; return 1; }
ifconfig $nic 192.168.18.5 netmask 255.255.255.0
ifconfig $nic up
ethtool -L $nic combined 1
ethtool -L $nic combined 4
sleep $((RANDOM%3))
}
function on_exit()
{
local pid
for pid in "${g_pids[@]}"; do
kill -0 "$pid" &>/dev/null && kill "$pid" &>/dev/null
done
g_pids=()
}
trap "on_exit; exit" EXIT
while :; do do_set_numvf ; done &
g_pids+=($!)
while :; do do_set_channel ; done &
g_pids+=($!)
wait
Result:
[ 4093.900222] ==================================================================
[ 4093.900230] BUG: KASAN: use-after-free in free_netdev+0x308/0x390
[ 4093.900232] Read of size 8 at addr ffff88b4dc145640 by task repro.sh/6699
[ 4093.900233]
[ 4093.900236] CPU: 10 PID: 6699 Comm: repro.sh Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1
[ 4093.900238] Hardware name: Powerleader PR2008AL/H12DSi-N6, BIOS 2.0 04/09/2021
[ 4093.900239] Call Trace:
[ 4093.900244] dump_stack+0x71/0xab
[ 4093.900249] print_address_description+0x6b/0x290
[ 4093.900251] ? free_netdev+0x308/0x390
[ 4093.900252] kasan_report+0x14a/0x2b0
[ 4093.900254] free_netdev+0x308/0x390
[ 4093.900261] iavf_remove+0x825/0xd20 [iavf]
[ 4093.900265] pci_device_remove+0xa8/0x1f0
[ 4093.900268] device_release_driver_internal+0x1c6/0x460
[ 4093.900271] pci_stop_bus_device+0x101/0x150
[ 4093.900273] pci_stop_and_remove_bus_device+0xe/0x20
[ 4093.900275] pci_iov_remove_virtfn+0x187/0x420
[ 4093.900277] ? pci_iov_add_virtfn+0xe10/0xe10
[ 4093.900278] ? pci_get_subsys+0x90/0x90
[ 4093.900280] sriov_disable+0xed/0x3e0
[ 4093.900282] ? bus_find_device+0x12d/0x1a0
[ 4093.900290] i40e_free_vfs+0x754/0x1210 [i40e]
[ 4093.900298] ? i40e_reset_all_vfs+0x880/0x880 [i40e]
[ 4093.900299] ? pci_get_device+0x7c/0x90
[ 4093.900300] ? pci_get_subsys+0x90/0x90
[ 4093.900306] ? pci_vfs_assigned.part.7+0x144/0x210
[ 4093.900309] ? __mutex_lock_slowpath+0x10/0x10
[ 4093.900315] i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e]
[ 4093.900318] sriov_numvfs_store+0x214/0x290
[ 4093.900320] ? sriov_totalvfs_show+0x30/0x30
[ 4093.900321] ? __mutex_lock_slowpath+0x10/0x10
[ 4093.900323] ? __check_object_size+0x15a/0x350
[ 4093.900326] kernfs_fop_write+0x280/0x3f0
[ 4093.900329] vfs_write+0x145/0x440
[ 4093.900330] ksys_write+0xab/0x160
[ 4093.900332] ? __ia32_sys_read+0xb0/0xb0
[ 4093.900334] ? fput_many+0x1a/0x120
[ 4093.900335] ? filp_close+0xf0/0x130
[ 4093.900338] do_syscall_64+0xa0/0x370
[ 4093.900339] ? page_fault+0x8/0x30
[ 4093.900341] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 4093.900357] RIP: 0033:0x7f16ad4d22c0
[ 4093.900359] Code: 73 01 c3 48 8b 0d d8 cb 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 24 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe dd 01 00 48 89 04 24
[ 4093.900360] RSP: 002b:00007ffd6491b7f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 4093.900362] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f16ad4d22c0
[ 4093.900363] RDX: 0000000000000002 RSI: 0000000001a41408 RDI: 0000000000000001
[ 4093.900364] RBP: 0000000001a41408 R08: 00007f16ad7a1780 R09: 00007f16ae1f2700
[ 4093.900364] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002
[ 4093.900365] R13: 0000000000000001 R14: 00007f16ad7a0620 R15: 0000000000000001
[ 4093.900367]
[ 4093.900368] Allocated by task 820:
[ 4093.900371] kasan_kmalloc+0xa6/0xd0
[ 4093.900373] __kmalloc+0xfb/0x200
[ 4093.900376] iavf_init_interrupt_scheme+0x63b/0x1320 [iavf]
[ 4093.900380] iavf_watchdog_task+0x3d51/0x52c0 [iavf]
[ 4093.900382] process_one_work+0x56a/0x11f0
[ 4093.900383] worker_thread+0x8f/0xf40
[ 4093.900384] kthread+0x2a0/0x390
[ 4093.900385] ret_from_fork+0x1f/0x40
[ 4093.900387] 0xffffffffffffffff
[ 4093.900387]
[ 4093.900388] Freed by task 6699:
[ 4093.900390] __kasan_slab_free+0x137/0x190
[ 4093.900391] kfree+0x8b/0x1b0
[ 4093.900394] iavf_free_q_vectors+0x11d/0x1a0 [iavf]
[ 4093.900397] iavf_remove+0x35a/0xd20 [iavf]
[ 4093.900399] pci_device_remove+0xa8/0x1f0
[ 4093.900400] device_release_driver_internal+0x1c6/0x460
[ 4093.900401] pci_stop_bus_device+0x101/0x150
[ 4093.900402] pci_stop_and_remove_bus_device+0xe/0x20
[ 4093.900403] pci_iov_remove_virtfn+0x187/0x420
[ 4093.900404] sriov_disable+0xed/0x3e0
[ 4093.900409] i40e_free_vfs+0x754/0x1210 [i40e]
[ 4093.900415] i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e]
[ 4093.900416] sriov_numvfs_store+0x214/0x290
[ 4093.900417] kernfs_fop_write+0x280/0x3f0
[ 4093.900418] vfs_write+0x145/0x440
[ 4093.900419] ksys_write+0xab/0x160
[ 4093.900420] do_syscall_64+0xa0/0x370
[ 4093.900421] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 4093.900422] 0xffffffffffffffff
[ 4093.900422]
[ 4093.900424] The buggy address belongs to the object at ffff88b4dc144200
which belongs to the cache kmalloc-8k of size 8192
[ 4093.900425] The buggy address is located 5184 bytes inside of
8192-byte region [ffff88b4dc144200, ffff88b4dc146200)
[ 4093.900425] The buggy address belongs to the page:
[ 4093.900427] page:ffffea00d3705000 refcount:1 mapcount:0 mapping:ffff88bf04415c80 index:0x0 compound_mapcount: 0
[ 4093.900430] flags: 0x10000000008100(slab|head)
[ 4093.900433] raw: 0010000000008100 dead000000000100 dead000000000200 ffff88bf04415c80
[ 4093.900434] raw: 0000000000000000 0000000000030003 00000001ffffffff 0000000000000000
[ 4093.900434] page dumped because: kasan: bad access detected
[ 4093.900435]
[ 4093.900435] Memory state around the buggy address:
[ 4093.900436] ffff88b4dc145500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 4093.900437] ffff88b4dc145580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 4093.900438] >ffff88b4dc145600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 4093.900438] ^
[ 4093.900439] ffff88b4dc145680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 4093.900440] ffff88b4dc145700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 4093.900440] ==================================================================
Although the patch #2 (of 2) can avoid the issue triggered by this
repro.sh, there still are other potential risks that if num_active_queues
is changed to less than allocated q_vectors[] by unexpected, the
mismatched netif_napi_add/del() can also cause UAF.
Since we actually call netif_napi_add() for all allocated q_vectors
unconditionally in iavf_alloc_q_vectors(), so we should fix it by
letting netif_napi_del() match to netif_napi_add().
Fixes: 5eae00c57f5e ("i40evf: main driver core")
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Cc: Donglin Peng <pengdonglin@sangfor.com.cn>
Cc: Huang Cun <huangcun@sangfor.com.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Calling ethtool during reload can lead to call trace, because VSI isn't
configured for some time, but netdev is alive.
To fix it add rtnl lock for VSI deconfig and config. Set ::num_q_vectors
to 0 after freeing and add a check for ::tx/rx_rings in ring related
ethtool ops.
Add proper unroll of filters in ice_start_eth().
Reproduction:
$watch -n 0.1 -d 'ethtool -g enp24s0f0np0'
$devlink dev reload pci/0000:18:00.0 action driver_reinit
Call trace before fix:
[66303.926205] BUG: kernel NULL pointer dereference, address: 0000000000000000
[66303.926259] #PF: supervisor read access in kernel mode
[66303.926286] #PF: error_code(0x0000) - not-present page
[66303.926311] PGD 0 P4D 0
[66303.926332] Oops: 0000 [#1] PREEMPT SMP PTI
[66303.926358] CPU: 4 PID: 933821 Comm: ethtool Kdump: loaded Tainted: G OE 6.4.0-rc5+ #1
[66303.926400] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.00.01.0014.070920180847 07/09/2018
[66303.926446] RIP: 0010:ice_get_ringparam+0x22/0x50 [ice]
[66303.926649] Code: 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 87 c0 09 00 00 c7 46 04 e0 1f 00 00 c7 46 10 e0 1f 00 00 48 8b 50 20 <48> 8b 12 0f b7 52 3a 89 56 14 48 8b 40 28 48 8b 00 0f b7 40 58 48
[66303.926722] RSP: 0018:ffffad40472f39c8 EFLAGS: 00010246
[66303.926749] RAX: ffff98a8ada05828 RBX: ffff98a8c46dd060 RCX: ffffad40472f3b48
[66303.926781] RDX: 0000000000000000 RSI: ffff98a8c46dd068 RDI: ffff98a8b23c4000
[66303.926811] RBP: ffffad40472f3b48 R08: 00000000000337b0 R09: 0000000000000000
[66303.926843] R10: 0000000000000001 R11: 0000000000000100 R12: ffff98a8b23c4000
[66303.926874] R13: ffff98a8c46dd060 R14: 000000000000000f R15: ffffad40472f3a50
[66303.926906] FS: 00007f6397966740(0000) GS:ffff98b390900000(0000) knlGS:0000000000000000
[66303.926941] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[66303.926967] CR2: 0000000000000000 CR3: 000000011ac20002 CR4: 00000000007706e0
[66303.926999] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[66303.927029] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[66303.927060] PKRU: 55555554
[66303.927075] Call Trace:
[66303.927094] <TASK>
[66303.927111] ? __die+0x23/0x70
[66303.927140] ? page_fault_oops+0x171/0x4e0
[66303.927176] ? exc_page_fault+0x7f/0x180
[66303.927209] ? asm_exc_page_fault+0x26/0x30
[66303.927244] ? ice_get_ringparam+0x22/0x50 [ice]
[66303.927433] rings_prepare_data+0x62/0x80
[66303.927469] ethnl_default_doit+0xe2/0x350
[66303.927501] genl_family_rcv_msg_doit.isra.0+0xe3/0x140
[66303.927538] genl_rcv_msg+0x1b1/0x2c0
[66303.927561] ? __pfx_ethnl_default_doit+0x10/0x10
[66303.927590] ? __pfx_genl_rcv_msg+0x10/0x10
[66303.927615] netlink_rcv_skb+0x58/0x110
[66303.927644] genl_rcv+0x28/0x40
[66303.927665] netlink_unicast+0x19e/0x290
[66303.927691] netlink_sendmsg+0x254/0x4d0
[66303.927717] sock_sendmsg+0x93/0xa0
[66303.927743] __sys_sendto+0x126/0x170
[66303.927780] __x64_sys_sendto+0x24/0x30
[66303.928593] do_syscall_64+0x5d/0x90
[66303.929370] ? __count_memcg_events+0x60/0xa0
[66303.930146] ? count_memcg_events.constprop.0+0x1a/0x30
[66303.930920] ? handle_mm_fault+0x9e/0x350
[66303.931688] ? do_user_addr_fault+0x258/0x740
[66303.932452] ? exc_page_fault+0x7f/0x180
[66303.933193] entry_SYSCALL_64_after_hwframe+0x72/0xdc
Fixes: 5b246e533d01 ("ice: split probe into smaller functions")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Since commit 6624e780a577fc ("ice: split ice_vsi_setup into smaller
functions") ice_vsi_release does things twice. There is unregister
netdev which is unregistered in ice_deinit_eth also.
It also unregisters the devlink_port twice which is also unregistered
in ice_deinit_eth(). This double deregistration is hidden because
devl_port_unregister ignores the return value of xa_erase.
[ 68.642167] Call Trace:
[ 68.650385] ice_devlink_destroy_pf_port+0xe/0x20 [ice]
[ 68.655656] ice_vsi_release+0x445/0x690 [ice]
[ 68.660147] ice_deinit+0x99/0x280 [ice]
[ 68.664117] ice_remove+0x1b6/0x5c0 [ice]
[ 171.103841] Call Trace:
[ 171.109607] ice_devlink_destroy_pf_port+0xf/0x20 [ice]
[ 171.114841] ice_remove+0x158/0x270 [ice]
[ 171.118854] pci_device_remove+0x3b/0xc0
[ 171.122779] device_release_driver_internal+0xc7/0x170
[ 171.127912] driver_detach+0x54/0x8c
[ 171.131491] bus_remove_driver+0x77/0xd1
[ 171.135406] pci_unregister_driver+0x2d/0xb0
[ 171.139670] ice_module_exit+0xc/0x55f [ice]
Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The insertion of an empty frame was introduced with
commit db0b124f02ba ("igc: Enhance Qbv scheduling by using first flag bit")
in order to ensure that the current cycle has at least one packet if
there is some packet to be scheduled for the next cycle.
However, the current implementation does not properly check if
a packet is already scheduled for the current cycle. Currently,
an empty packet is always inserted if and only if
txtime >= end_of_cycle && txtime > last_tx_cycle
but since last_tx_cycle is always either the end of the current
cycle (end_of_cycle) or the end of a previous cycle, the
second part (txtime > last_tx_cycle) is always true unless
txtime == last_tx_cycle.
What actually needs to be checked here is if the last_tx_cycle
was already written within the current cycle, so an empty frame
should only be inserted if and only if
txtime >= end_of_cycle && end_of_cycle > last_tx_cycle.
This patch does not only avoid an unnecessary insertion, but it
can actually be harmful to insert an empty packet if packets
are already scheduled in the current cycle, because it can lead
to a situation where the empty packet is actually processed
as the first packet in the upcoming cycle shifting the packet
with the first_flag even one cycle into the future, finally leading
to a TX hang.
The TX hang can be reproduced on a i225 with:
sudo tc qdisc replace dev enp1s0 parent root handle 100 taprio \
num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 \
base-time 0 \
sched-entry S 01 300000 \
flags 0x1 \
txtime-delay 500000 \
clockid CLOCK_TAI
sudo tc qdisc replace dev enp1s0 parent 100:1 etf \
clockid CLOCK_TAI \
delta 500000 \
offload \
skip_sock_check
and traffic generator
sudo trafgen -i traffic.cfg -o enp1s0 --cpp -n0 -q -t1400ns
with traffic.cfg
#define ETH_P_IP 0x0800
{
/* Ethernet Header */
0x30, 0x1f, 0x9a, 0xd0, 0xf0, 0x0e, # MAC Dest - adapt as needed
0x24, 0x5e, 0xbe, 0x57, 0x2e, 0x36, # MAC Src - adapt as needed
const16(ETH_P_IP),
/* IPv4 Header */
0b01000101, 0, # IPv4 version, IHL, TOS
const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header))
const16(2), # IPv4 ident
0b01000000, 0, # IPv4 flags, fragmentation off
64, # IPv4 TTL
17, # Protocol UDP
csumip(14, 33), # IPv4 checksum
/* UDP Header */
10, 0, 48, 1, # IP Src - adapt as needed
10, 0, 48, 10, # IP Dest - adapt as needed
const16(5555), # UDP Src Port
const16(6666), # UDP Dest Port
const16(1008), # UDP length (UDP header 8 bytes + payload length)
csumudp(14, 34), # UDP checksum
/* Payload */
fill('W', 1000),
}
and the observed message with that is for example
igc 0000:01:00.0 enp1s0: Detected Tx Unit Hang
Tx Queue <0>
TDH <32>
TDT <3c>
next_to_use <3c>
next_to_clean <32>
buffer_info[next_to_clean]
time_stamp <ffff26a8>
next_to_watch <00000000632a1828>
jiffies <ffff27f8>
desc.status <1048000>
Fixes: db0b124f02ba ("igc: Enhance Qbv scheduling by using first flag bit")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
It is possible (verified on a running system) that frames are processed
by igc_tx_launchtime with a txtime before the start of the cycle
(baset_est).
However, the result of txtime - baset_est is written into a u32,
leading to a wrap around to a positive number. The following
launchtime > 0 check will only branch to executing launchtime = 0
if launchtime is already 0.
Fix it by using a s32 before checking launchtime > 0.
Fixes: db0b124f02ba ("igc: Enhance Qbv scheduling by using first flag bit")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|