summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel
AgeCommit message (Collapse)AuthorFilesLines
2022-04-14ice: allow creating VFs for !CONFIG_NET_SWITCHDEVMaciej Fijalkowski1-1/+1
Currently for !CONFIG_NET_SWITCHDEV kernel builds it is not possible to create VFs properly as call to ice_eswitch_configure() returns -EOPNOTSUPP for us. This is because CONFIG_ICE_SWITCHDEV depends on CONFIG_NET_SWITCHDEV. Change the ice_eswitch_configure() implementation for !CONFIG_ICE_SWITCHDEV to return 0 instead -EOPNOTSUPP and let ice_ena_vfs() finish its work properly. CC: Grzegorz Nitka <grzegorz.nitka@intel.com> Fixes: 1a1c40df2e80 ("ice: set and release switchdev environment") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-14ice: xsk: check if Rx ring was filled up to the endMaciej Fijalkowski1-1/+6
__ice_alloc_rx_bufs_zc() checks if a number of the descriptors to be allocated would cause the ring wrap. In that case, driver will issue two calls to xsk_buff_alloc_batch() - one that will fill the ring up to the end and the second one that will start with filling descriptors from the beginning of the ring. ice_fill_rx_descs() is a wrapper for taking care of what xsk_buff_alloc_batch() gave back to the driver. It works in a best effort approach, so for example when driver asks for 64 buffers, ice_fill_rx_descs() could assign only 32. Such case needs to be checked when ring is being filled up to the end, because in that situation ntu might not reached the end of the ring. Fix the ring wrap by checking if nb_buffs_extra has the expected value. If not, bump ntu and go directly to tail update. Fixes: 3876ff525de7 ("ice: xsk: Handle SW XDP ring wrap and bump tail more often") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Shwetha Nagaraju <Shwetha.nagaraju@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-13e1000e: Fix possible overflow in LTR decodingSasha Neftin1-2/+2
When we decode the latency and the max_latency, u16 value may not fit the required size and could lead to the wrong LTR representation. Scaling is represented as: scale 0 - 1 (2^(5*0)) = 2^0 scale 1 - 32 (2^(5 *1))= 2^5 scale 2 - 1024 (2^(5 *2)) =2^10 scale 3 - 32768 (2^(5 *3)) =2^15 scale 4 - 1048576 (2^(5 *4)) = 2^20 scale 5 - 33554432 (2^(5 *4)) = 2^25 scale 4 and scale 5 required 20 and 25 bits respectively. scale 6 reserved. Replace the u16 type with the u32 type and allow corrected LTR representation. Cc: stable@vger.kernel.org Fixes: 44a13a5d99c7 ("e1000e: Fix the max snoop/no-snoop latency for 10M") Reported-by: James Hutchinson <jahutchinson99@googlemail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215689 Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Tested-by: James Hutchinson <jahutchinson99@googlemail.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-13igc: Fix suspending when PTM is activeVinicius Costa Gomes1-1/+14
Some mainboard/CPU combinations, in particular, Alder Lake-S with a W680 mainboard, have shown problems (system hangs usually, no kernel logs) with suspend/resume when PCIe PTM is enabled and active. In some cases, it could be reproduced when removing the igc module. The best we can do is to stop PTM dialogs from the downstream/device side before the interface is brought down. PCIe PTM will be re-enabled when the interface is being brought up. Fixes: a90ec8483732 ("igc: Add support for PTP getcrosststamp()") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Acked-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-13igc: Fix BUG: scheduling while atomicSasha Neftin1-2/+2
Replace usleep_range() method with udelay() method to allow atomic contexts in low-level MDIO access functions. The following issue can be seen by doing the following: $ modprobe -r bonding $ modprobe -v bonding max_bonds=1 mode=1 miimon=100 use_carrier=0 $ ip link set bond0 up $ ifenslave bond0 eth0 eth1 [ 982.357308] BUG: scheduling while atomic: kworker/u64:0/9/0x00000002 [ 982.364431] INFO: lockdep is turned off. [ 982.368824] Modules linked in: bonding sctp ip6_udp_tunnel udp_tunnel mlx4_ib ib_uverbs ib_core mlx4_en mlx4_core nfp tls sunrpc intel_rapl_msr iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate intel_uncore pcspkr lpc_ich mei_me ipmi_ssif mei ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter xfs libcrc32c sr_mod cdrom sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci crc32c_intel libata i2c_algo_bit tg3 megaraid_sas igc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: bonding] [ 982.437941] CPU: 25 PID: 9 Comm: kworker/u64:0 Kdump: loaded Tainted: G W --------- - - 4.18.0-348.el8.x86_64+debug #1 [ 982.451333] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 2.7.0 12/005/2017 [ 982.459791] Workqueue: bond0 bond_mii_monitor [bonding] [ 982.465622] Call Trace: [ 982.468355] dump_stack+0x8e/0xd0 [ 982.472056] __schedule_bug.cold.60+0x3a/0x60 [ 982.476919] __schedule+0x147b/0x1bc0 [ 982.481007] ? firmware_map_remove+0x16b/0x16b [ 982.485967] ? hrtimer_fixup_init+0x40/0x40 [ 982.490625] schedule+0xd9/0x250 [ 982.494227] schedule_hrtimeout_range_clock+0x10d/0x2c0 [ 982.500058] ? hrtimer_nanosleep_restart+0x130/0x130 [ 982.505598] ? hrtimer_init_sleeper_on_stack+0x90/0x90 [ 982.511332] ? usleep_range+0x88/0x130 [ 982.515514] ? recalibrate_cpu_khz+0x10/0x10 [ 982.520279] ? ktime_get+0xab/0x1c0 [ 982.524175] ? usleep_range+0x88/0x130 [ 982.528355] usleep_range+0xdd/0x130 [ 982.532344] ? console_conditional_schedule+0x30/0x30 [ 982.537987] ? igc_put_hw_semaphore+0x17/0x60 [igc] [ 982.543432] igc_read_phy_reg_gpy+0x111/0x2b0 [igc] [ 982.548887] igc_phy_has_link+0xfa/0x260 [igc] [ 982.553847] ? igc_get_phy_id+0x210/0x210 [igc] [ 982.558894] ? lock_acquire+0x34d/0x890 [ 982.563187] ? lock_downgrade+0x710/0x710 [ 982.567659] ? rcu_read_unlock+0x50/0x50 [ 982.572039] igc_check_for_copper_link+0x106/0x210 [igc] [ 982.577970] ? igc_config_fc_after_link_up+0x840/0x840 [igc] [ 982.584286] ? rcu_read_unlock+0x50/0x50 [ 982.588661] ? lock_release+0x591/0xb80 [ 982.592939] ? lock_release+0x591/0xb80 [ 982.597220] igc_has_link+0x113/0x330 [igc] [ 982.601887] ? lock_downgrade+0x710/0x710 [ 982.606362] igc_ethtool_get_link+0x6d/0x90 [igc] [ 982.611614] bond_check_dev_link+0x131/0x2c0 [bonding] [ 982.617350] ? bond_time_in_interval+0xd0/0xd0 [bonding] [ 982.623277] ? rcu_read_lock_held+0x62/0xc0 [ 982.627944] ? rcu_read_lock_sched_held+0xe0/0xe0 [ 982.633198] bond_mii_monitor+0x314/0x2500 [bonding] [ 982.638738] ? lock_contended+0x880/0x880 [ 982.643214] ? bond_miimon_link_change+0xa0/0xa0 [bonding] [ 982.649336] ? lock_acquire+0x34d/0x890 [ 982.653615] ? lock_downgrade+0x710/0x710 [ 982.658089] ? debug_object_deactivate+0x221/0x340 [ 982.663436] ? rcu_read_unlock+0x50/0x50 [ 982.667811] ? debug_print_object+0x2b0/0x2b0 [ 982.672672] ? __switch_to_asm+0x41/0x70 [ 982.677049] ? __switch_to_asm+0x35/0x70 [ 982.681426] ? _raw_spin_unlock_irq+0x24/0x40 [ 982.686288] ? trace_hardirqs_on+0x20/0x195 [ 982.690956] ? _raw_spin_unlock_irq+0x24/0x40 [ 982.695818] process_one_work+0x8f0/0x1770 [ 982.700390] ? pwq_dec_nr_in_flight+0x320/0x320 [ 982.705443] ? debug_show_held_locks+0x50/0x50 [ 982.710403] worker_thread+0x87/0xb40 [ 982.714489] ? process_one_work+0x1770/0x1770 [ 982.719349] kthread+0x344/0x410 [ 982.722950] ? kthread_insert_work_sanity_check+0xd0/0xd0 [ 982.728975] ret_from_fork+0x3a/0x50 Fixes: 5586838fe9ce ("igc: Add code for PHY support") Reported-by: Corinna Vinschen <vinschen@redhat.com> Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Corinna Vinschen <vinschen@redhat.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-13igc: Fix infinite loop in release_swfw_syncSasha Neftin1-2/+9
An infinite loop may occur if we fail to acquire the HW semaphore, which is needed for resource release. This will typically happen if the hardware is surprise-removed. At this stage there is nothing to do, except log an error and quit. Fixes: c0071c7aa5fe ("igc: Add HW initialization code") Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-12i40e: Add Ethernet Connection X722 for 10GbE SFP+ supportMateusz Palczewski3-0/+3
Add support for Ethernet Connection X722 for 10GbE SFP+ cards. Make possible for the driver to bind to the card. Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-12i40e: Add vsi.tx_restart to i40e ethtool statsNabil S. Alramli1-0/+1
Add vsi.tx_restart to the i40e driver ethtool statistics Signed-off-by: Nabil S. Alramli <dev@nalramli.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-12i40e: Add tx_stopped statJoe Damato6-2/+12
Track TX queue stop events and export the new stat with ethtool. Signed-off-by: Joe Damato <jdamato@fastly.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-12ice: Add mpls+tso supportJoe Damato2-9/+24
Attempt to add mpls+tso support. I don't have ice hardware available to test myself, but I just implemented this feature in i40e and thought it might be useful to implement for ice while this is fresh in my brain. Hoping some one at intel will be able to test this on my behalf. Signed-off-by: Joe Damato <jdamato@fastly.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-12i40e: Add support for MPLS + TSOJoe Damato2-19/+48
This change adds support for TSO of MPLS packets. In my tests with tcpdump it seems to work. Note this test setup has a 9000 byte MTU: MPLS (label 100, exp 0, [S], ttl 64) IP srcip.50086 > dstip.1234: Flags [P.], seq 593345:644401, ack 0, win 420, options [nop,nop,TS val 45022534 ecr 1722291395], length 51056 IP dstip.1234 > srcip.50086: Flags [.], ack 593345, win 122, options [nop,nop,TS val 1722291395 ecr 45022534], length 0 IP dstip.1234 > srcip.50086: Flags [.], ack 602289, win 105, options [nop,nop,TS val 1722291395 ecr 45022534], length 0 IP dstip.1234 > srcip.50086: Flags [.], ack 620177, win 71, options [nop,nop,TS val 1722291395 ecr 45022534], length 0 MPLS (label 100, exp 0, [S], ttl 64) IP srcip.50086 > dstip.1234: Flags [P.], seq 644401:655953, ack 0, win 420, options [nop,nop,TS val 45022534 ecr 1722291395], length 11552 IP dstip.1234 > srcip.50086: Flags [.], ack 638065, win 37, options [nop,nop,TS val 1722291395 ecr 45022534], length 0 IP dstip.1234 > srcip.50086: Flags [.], ack 644401, win 25, options [nop,nop,TS val 1722291395 ecr 45022534], length 0 IP dstip.1234 > srcip.50086: Flags [.], ack 653345, win 8, options [nop,nop,TS val 1722291395 ecr 45022534], length 0 IP dstip.1234 > srcip.50086: Flags [.], ack 655953, win 3, options [nop,nop,TS val 1722291395 ecr 45022534], length 0 Signed-off-by: Joe Damato <jdamato@fastly.com> Co-developed-by: Mike Gallo <mgallo@fastly.com> Signed-off-by: Mike Gallo <mgallo@fastly.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-08Revert "iavf: Fix deadlock occurrence during resetting VF interface"Mateusz Palczewski1-5/+2
This change caused a regression with resetting while changing network namespaces. By clearing the IFF_UP flag, the kernel now thinks it has fully closed the device. This reverts commit 0cc318d2e8408bc0ffb4662a0c3e5e57005ac6ff. Fixes: 0cc318d2e840 ("iavf: Fix deadlock occurrence during resetting VF interface") Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-08ice: arfs: fix use-after-free when freeing @rx_cpu_rmapAlexander Lobakin3-18/+14
The CI testing bots triggered the following splat: [ 718.203054] BUG: KASAN: use-after-free in free_irq_cpu_rmap+0x53/0x80 [ 718.206349] Read of size 4 at addr ffff8881bd127e00 by task sh/20834 [ 718.212852] CPU: 28 PID: 20834 Comm: sh Kdump: loaded Tainted: G S W IOE 5.17.0-rc8_nextqueue-devqueue-02643-g23f3121aca93 #1 [ 718.219695] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 [ 718.223418] Call Trace: [ 718.227139] [ 718.230783] dump_stack_lvl+0x33/0x42 [ 718.234431] print_address_description.constprop.9+0x21/0x170 [ 718.238177] ? free_irq_cpu_rmap+0x53/0x80 [ 718.241885] ? free_irq_cpu_rmap+0x53/0x80 [ 718.245539] kasan_report.cold.18+0x7f/0x11b [ 718.249197] ? free_irq_cpu_rmap+0x53/0x80 [ 718.252852] free_irq_cpu_rmap+0x53/0x80 [ 718.256471] ice_free_cpu_rx_rmap.part.11+0x37/0x50 [ice] [ 718.260174] ice_remove_arfs+0x5f/0x70 [ice] [ 718.263810] ice_rebuild_arfs+0x3b/0x70 [ice] [ 718.267419] ice_rebuild+0x39c/0xb60 [ice] [ 718.270974] ? asm_sysvec_apic_timer_interrupt+0x12/0x20 [ 718.274472] ? ice_init_phy_user_cfg+0x360/0x360 [ice] [ 718.278033] ? delay_tsc+0x4a/0xb0 [ 718.281513] ? preempt_count_sub+0x14/0xc0 [ 718.284984] ? delay_tsc+0x8f/0xb0 [ 718.288463] ice_do_reset+0x92/0xf0 [ice] [ 718.292014] ice_pci_err_resume+0x91/0xf0 [ice] [ 718.295561] pci_reset_function+0x53/0x80 <...> [ 718.393035] Allocated by task 690: [ 718.433497] Freed by task 20834: [ 718.495688] Last potentially related work creation: [ 718.568966] The buggy address belongs to the object at ffff8881bd127e00 which belongs to the cache kmalloc-96 of size 96 [ 718.574085] The buggy address is located 0 bytes inside of 96-byte region [ffff8881bd127e00, ffff8881bd127e60) [ 718.579265] The buggy address belongs to the page: [ 718.598905] Memory state around the buggy address: [ 718.601809] ffff8881bd127d00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ 718.604796] ffff8881bd127d80: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc [ 718.607794] >ffff8881bd127e00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ 718.610811] ^ [ 718.613819] ffff8881bd127e80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc [ 718.617107] ffff8881bd127f00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc This is due to that free_irq_cpu_rmap() is always being called *after* (devm_)free_irq() and thus it tries to work with IRQ descs already freed. For example, on device reset the driver frees the rmap right before allocating a new one (the splat above). Make rmap creation and freeing function symmetrical with {request,free}_irq() calls i.e. do that on ifup/ifdown instead of device probe/remove/resume. These operations can be performed independently from the actual device aRFS configuration. Also, make sure ice_vsi_free_irq() clears IRQ affinity notifiers only when aRFS is disabled -- otherwise, CPU rmap sets and clears its own and they must not be touched manually. Fixes: 28bf26724fdb0 ("ice: Implement aRFS") Co-developed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-08Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/nexDavid S. Miller2-290/+211
t-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-04-07 Alexander Lobakin says: This hunts down several places around packet templates/dummies for switch rules which are either repetitive, fragile or just not really readable code. It's a common need to add new packet templates and to review such changes as well, try to simplify both with the help of a pair macros and aliases. ice_find_dummy_packet() became very complex at this point with tons of nested if-elses. It clearly showed this approach does not scale, so convert its logics to the simple mask-match + static const array. bloat-o-meter is happy about that (built w/ LLVM 13): add/remove: 0/1 grow/shrink: 1/1 up/down: 2/-1058 (-1056) Function old new delta ice_fill_adv_dummy_packet 289 291 +2 ice_adv_add_update_vsi_list 201 - -201 ice_add_adv_rule 2950 2093 -857 Total: Before=414512, After=413456, chg -0.25% add/remove: 53/52 grow/shrink: 0/0 up/down: 4660/-3988 (672) RO Data old new delta ice_dummy_pkt_profiles - 672 +672 Total: Before=37895, After=38567, chg +1.77% Diffstat also looks nice, and adding new packet templates now takes less lines. We'll probably come out with dynamic template crafting in a while, but for now let's improve what we have currently. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-07ice: switch: convert packet template match code to rodataAlexander Lobakin1-107/+108
Trade text size for rodata size and replace tons of nested if-elses to the const mask match based structs. The almost entire ice_find_dummy_packet() now becomes just one plain while-increment loop. The order in ice_dummy_pkt_profiles[] should be same with the if-elses order previously, as masks become less and less strict through the array to follow the original code flow. Apart from removing 80 locs of 4-level if-elses, it brings a solid text size optimization: add/remove: 0/1 grow/shrink: 1/1 up/down: 2/-1058 (-1056) Function old new delta ice_fill_adv_dummy_packet 289 291 +2 ice_adv_add_update_vsi_list 201 - -201 ice_add_adv_rule 2950 2093 -857 Total: Before=414512, After=413456, chg -0.25% add/remove: 53/52 grow/shrink: 0/0 up/down: 4660/-3988 (672) RO Data old new delta ice_dummy_pkt_profiles - 672 +672 Total: Before=37895, After=38567, chg +1.77% Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-07ice: switch: use convenience macros to declare dummy pkt templatesAlexander Lobakin1-71/+62
Declarations of dummy/template packet headers and offsets can be minified to improve readability and simplify adding new templates. Move all the repetitive constructions into two macros and let them do the name and type expansions. Linewrap removal is yet another positive side effect. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-07ice: switch: use a struct to pass packet template paramsAlexander Lobakin1-174/+94
ice_find_dummy_packet() contains a lot of boilerplate code and a nice room for copy-paste mistakes. Instead of passing 3 separate pointers back and forth to get packet template (dummy) params, directly return a structure containing them. Then, use a macro to compose compound literals and avoid code duplication on return path. Now, dummy packet type/name is needed only once to return a full correct triple pkt-pkt_len-offsets, and those are all one-liners. dummy_ipv4_gtpu_ipv4_packet_offsets is just moved around and renamed (as well as dummy_ipv6_gtp_packet_offsets) with no function changes. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-07ice: switch: unobscurify bitops loop in ice_fill_adv_dummy_packet()Alexander Lobakin1-7/+9
A loop performing header modification according to the provided mask in ice_fill_adv_dummy_packet() is very cryptic (and error-prone). Replace two identical cast-deferences with a variable. Replace three struct-member-array-accesses with a variable. Invert the condition, reduce the indentation by one -> eliminate line wraps. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-07ice: switch: add and use u16[] aliases to ice_adv_lkup_elem::{h, m}_uAlexander Lobakin2-10/+17
ice_adv_lkup_elem fields h_u and m_u are being accessed as raw u16 arrays in several places. To reduce cast and braces burden, add permanent array-of-u16 aliases with the same size as the `union ice_prot_hdr` itself via anonymous unions to the actual struct declaration, and just access them directly. This: - removes the need to cast the union to u16[] and then dereference it each time -> reduces the horizon for potential bugs; - improves -Warray-bounds coverage -- the array size is now known at compilation time; - addresses cppcheck complaints. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-05ice: clear cmd_type_offset_bsz for TX ringsMaciej Fijalkowski1-1/+1
Currently when XDP rings are created, each descriptor gets its DD bit set, which turns out to be the wrong approach as it can lead to a situation where more descriptors get cleaned than it was supposed to, e.g. when AF_XDP busy poll is run with a large batch size. In this situation, the driver would request for more buffers than it is able to handle. Fix this by not setting the DD bits in ice_xdp_alloc_setup_rings(). They should be initialized to zero instead. Fixes: 9610bd988df9 ("ice: optimize XDP_TX workloads") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Shwetha Nagaraju <shwetha.nagaraju@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-05ice: xsk: fix VSI state check in ice_xsk_wakeup()Maciej Fijalkowski1-1/+1
ICE_DOWN is dedicated for pf->state. Check for ICE_VSI_DOWN being set on vsi->state in ice_xsk_wakeup(). Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Shwetha Nagaraju <shwetha.nagaraju@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-05ice: synchronize_rcu() when terminating ringsMaciej Fijalkowski3-3/+7
Unfortunately, the ice driver doesn't respect the RCU critical section that XSK wakeup is surrounded with. To fix this, add synchronize_rcu() calls to paths that destroy resources that might be in use. This was addressed in other AF_XDP ZC enabled drivers, for reference see for example commit b3873a5be757 ("net/i40e: Fix concurrency issues between config flow and XSK") Fixes: efc2214b6047 ("ice: Add support for XDP") Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Shwetha Nagaraju <shwetha.nagaraju@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-05ice: Do not skip not enabled queues in ice_vc_dis_qs_msgAnatolii Gerasymenko1-2/+2
Disable check for queue being enabled in ice_vc_dis_qs_msg, because there could be a case when queues were created, but were not enabled. We still need to delete those queues. Normal workflow for VF looks like: Enable path: VIRTCHNL_OP_ADD_ETH_ADDR (opcode 10) VIRTCHNL_OP_CONFIG_VSI_QUEUES (opcode 6) VIRTCHNL_OP_ENABLE_QUEUES (opcode 8) Disable path: VIRTCHNL_OP_DISABLE_QUEUES (opcode 9) VIRTCHNL_OP_DEL_ETH_ADDR (opcode 11) The issue appears only in stress conditions when VF is enabled and disabled very fast. Eventually there will be a case, when queues are created by VIRTCHNL_OP_CONFIG_VSI_QUEUES, but are not enabled by VIRTCHNL_OP_ENABLE_QUEUES. In turn, these queues are not deleted by VIRTCHNL_OP_DISABLE_QUEUES, because there is a check whether queues are enabled in ice_vc_dis_qs_msg. When we bring up the VF again, we will see the "Failed to set LAN Tx queue context" error during VIRTCHNL_OP_CONFIG_VSI_QUEUES step. This happens because old 16 queues were not deleted and VF requests to create 16 more, but ice_sched_get_free_qparent in ice_ena_vsi_txq would fail to find a parent node for first newly requested queue (because all nodes are allocated to 16 old queues). Testing Hints: Just enable and disable VF fast enough, so it would be disabled before reaching VIRTCHNL_OP_ENABLE_QUEUES. while true; do ip link set dev ens785f0v0 up sleep 0.065 # adjust delay value for you machine ip link set dev ens785f0v0 down done Fixes: 77ca27c41705 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap") Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-05ice: Set txq_teid to ICE_INVAL_TEID on ring creationAnatolii Gerasymenko1-0/+1
When VF is freshly created, but not brought up, ring->txq_teid value is by default set to 0. But 0 is a valid TEID. On some platforms the Root Node of Tx scheduler has a TEID = 0. This can cause issues as shown below. The proper way is to set ring->txq_teid to ICE_INVAL_TEID (0xFFFFFFFF). Testing Hints: echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs ip link set dev ens785f0v0 up ip link set dev ens785f0v0 down If we have freshly created VF and quickly turn it on and off, so there would be no time to reach VIRTCHNL_OP_CONFIG_VSI_QUEUES stage, then VIRTCHNL_OP_DISABLE_QUEUES stage will fail with error: [ 639.531454] disable queue 89 failed 14 [ 639.532233] Failed to disable LAN Tx queues, error: ICE_ERR_AQ_ERROR [ 639.533107] ice 0000:02:00.0: Failed to stop Tx ring 0 on VSI 5 The reason for the fail is that we are trying to send AQ command to delete queue 89, which has never been created and receive an "invalid argument" error from firmware. As this queue has never been created, it's teid and ring->txq_teid have default value 0. ice_dis_vsi_txq has a check against non-existent queues: node = ice_sched_find_node_by_teid(pi->root, q_teids[i]); if (!node) continue; But on some platforms the Root Node of Tx scheduler has a teid = 0. Hence, ice_sched_find_node_by_teid finds a node with teid = 0 (it is pi->root), and we go further to submit an erroneous request to firmware. Fixes: 37bb83901286 ("ice: Move common functions out of ice_main.c part 7/7") Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-01ice: Fix broken IFF_ALLMULTI handlingIvan Vecera3-38/+121
Handling of all-multicast flag and associated multicast promiscuous mode is broken in ice driver. When an user switches allmulticast flag on or off the driver checks whether any VLANs are configured over the interface (except default VLAN 0). If any extra VLANs are registered it enables multicast promiscuous mode for all these VLANs (including default VLAN 0) using ICE_SW_LKUP_PROMISC_VLAN look-up type. In this situation all multicast packets tagged with known VLAN ID or untagged are received and multicast packets tagged with unknown VLAN ID ignored. If no extra VLANs are registered (so only VLAN 0 exists) it enables multicast promiscuous mode for VLAN 0 and uses ICE_SW_LKUP_PROMISC look-up type. In this situation any multicast packets including tagged ones are received. The driver handles IFF_ALLMULTI in ice_vsi_sync_fltr() this way: ice_vsi_sync_fltr() { ... if (changed_flags & IFF_ALLMULTI) { if (netdev->flags & IFF_ALLMULTI) { if (vsi->num_vlans > 1) ice_set_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS); else ice_set_promisc(..., ICE_MCAST_PROMISC_BITS); } else { if (vsi->num_vlans > 1) ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS); else ice_clear_promisc(..., ICE_MCAST_PROMISC_BITS); } } ... } The code above depends on value vsi->num_vlan that specifies number of VLANs configured over the interface (including VLAN 0) and this is problem because that value is modified in NDO callbacks ice_vlan_rx_add_vid() and ice_vlan_rx_kill_vid(). Scenario 1: 1. ip link set ens7f0 allmulticast on 2. ip link add vlan10 link ens7f0 type vlan id 10 3. ip link set ens7f0 allmulticast off 4. ip link set ens7f0 allmulticast on [1] In this scenario IFF_ALLMULTI is enabled and the driver calls ice_set_promisc(..., ICE_MCAST_PROMISC_BITS) that installs multicast promisc rule with non-VLAN look-up type. [2] Then VLAN with ID 10 is added and vsi->num_vlan incremented to 2 [3] Command switches IFF_ALLMULTI off and the driver calls ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS) but this call is effectively NOP because it looks for multicast promisc rules for VLAN 0 and VLAN 10 with VLAN look-up type but no such rules exist. So the all-multicast remains enabled silently in hardware. [4] Command tries to switch IFF_ALLMULTI on and the driver calls ice_clear_promisc(..., ICE_MCAST_PROMISC_BITS) but this call fails (-EEXIST) because non-VLAN multicast promisc rule already exists. Scenario 2: 1. ip link add vlan10 link ens7f0 type vlan id 10 2. ip link set ens7f0 allmulticast on 3. ip link add vlan20 link ens7f0 type vlan id 20 4. ip link del vlan10 ; ip link del vlan20 5. ip link set ens7f0 allmulticast off [1] VLAN with ID 10 is added and vsi->num_vlan==2 [2] Command switches IFF_ALLMULTI on and driver installs multicast promisc rules with VLAN look-up type for VLAN 0 and 10 [3] VLAN with ID 20 is added and vsi->num_vlan==3 but no multicast promisc rules is added for this new VLAN so the interface does not receive MC packets from VLAN 20 [4] Both VLANs are removed but multicast rule for VLAN 10 remains installed so interface receives multicast packets from VLAN 10 [5] Command switches IFF_ALLMULTI off and because vsi->num_vlan is 1 the driver tries to remove multicast promisc rule for VLAN 0 with non-VLAN look-up that does not exist. All-multicast looks disabled from user point of view but it is partially enabled in HW (interface receives all multicast packets either untagged or tagged with VLAN ID 10) To resolve these issues the patch introduces these changes: 1. Adds handling for IFF_ALLMULTI to ice_vlan_rx_add_vid() and ice_vlan_rx_kill_vid() callbacks. So when VLAN is added/removed and IFF_ALLMULTI is enabled an appropriate multicast promisc rule for that VLAN ID is added/removed. 2. In ice_vlan_rx_add_vid() when first VLAN besides VLAN 0 is added so (vsi->num_vlan == 2) and IFF_ALLMULTI is enabled then look-up type for existing multicast promisc rule for VLAN 0 is updated to ICE_MCAST_VLAN_PROMISC_BITS. 3. In ice_vlan_rx_kill_vid() when last VLAN besides VLAN 0 is removed so (vsi->num_vlan == 1) and IFF_ALLMULTI is enabled then look-up type for existing multicast promisc rule for VLAN 0 is updated to ICE_MCAST_PROMISC_BITS. 4. Both ice_vlan_rx_{add,kill}_vid() have to run under ICE_CFG_BUSY bit protection to avoid races with ice_vsi_sync_fltr() that runs in ice_service_task() context. 5. Bit ICE_VSI_VLAN_FLTR_CHANGED is use-less and can be removed. 6. Error messages added to ice_fltr_*_vsi_promisc() helper functions to avoid them in their callers 7. Small improvements to increase readability Fixes: 5eda8afd6bcc ("ice: Add support for PF/VF promiscuous mode") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-01ice: Fix MAC address settingIvan Vecera1-2/+5
Commit 2ccc1c1ccc671b ("ice: Remove excess error variables") merged the usage of 'status' and 'err' variables into single one in function ice_set_mac_address(). Unfortunately this causes a regression when call of ice_fltr_add_mac() returns -EEXIST because this return value does not indicate an error in this case but value of 'err' remains to be -EEXIST till the end of the function and is returned to caller. Prior mentioned commit this does not happen because return value of ice_fltr_add_mac() was stored to 'status' variable first and if it was -EEXIST then 'err' remains to be zero. Fix the problem by reset 'err' to zero when ice_fltr_add_mac() returns -EEXIST. Fixes: 2ccc1c1ccc671b ("ice: Remove excess error variables") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-01ice: Clear default forwarding VSI during VSI releaseIvan Vecera1-0/+2
VSI is set as default forwarding one when promisc mode is set for PF interface, when PF is switched to switchdev mode or when VF driver asks to enable allmulticast or promisc mode for the VF interface (when vf-true-promisc-support priv flag is off). The third case is buggy because in that case VSI associated with VF remains as default one after VF removal. Reproducer: 1. Create VF echo 1 > sys/class/net/ens7f0/device/sriov_numvfs 2. Enable allmulticast or promisc mode on VF ip link set ens7f0v0 allmulticast on ip link set ens7f0v0 promisc on 3. Delete VF echo 0 > sys/class/net/ens7f0/device/sriov_numvfs 4. Try to enable promisc mode on PF ip link set ens7f0 promisc on Although it looks that promisc mode on PF is enabled the opposite is true because ice_vsi_sync_fltr() responsible for IFF_PROMISC handling first checks if any other VSI is set as default forwarding one and if so the function does not do anything. At this point it is not possible to enable promisc mode on PF without re-probe device. To resolve the issue this patch clear default forwarding VSI during ice_vsi_release() when the VSI to be released is the default one. Fixes: 01b5e89aab49 ("ice: Add VF promiscuous support") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-29ice: xsk: Fix indexing in ice_tx_xsk_pool()Maciej Fijalkowski1-1/+1
Ice driver tries to always create XDP rings array to be num_possible_cpus() sized, regardless of user's queue count setting that can be changed via ethtool -L for example. Currently, ice_tx_xsk_pool() calculates the qid by decrementing the ring->q_index by the count of XDP queues, but ring->q_index is set to 'i + vsi->alloc_txq'. When user did ethtool -L $IFACE combined 1, alloc_txq is 1, but vsi->num_xdp_txq is still num_possible_cpus(). Then, ice_tx_xsk_pool() will do OOB access and in the final result ring would not get xsk_pool pointer assigned. Then, each ice_xsk_wakeup() call will fail with error and it will not be possible to get into NAPI and do the processing from driver side. Fix this by decrementing vsi->alloc_txq instead of vsi->num_xdp_txq from ring-q_index in ice_tx_xsk_pool() so the calculation is reflected to the setting of ring->q_index. Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220328142123.170157-5-maciej.fijalkowski@intel.com
2022-03-29ice: xsk: Stop Rx processing when ntc catches ntuMaciej Fijalkowski1-0/+3
This can happen with big budget values and some breakage of re-filling descriptors as we do not clear the entry that ntu is pointing at the end of ice_alloc_rx_bufs_zc. So if ntc is at ntu then it might be the case that status_error0 has an old, uncleared value and ntc would go over with processing which would result in false results. Break Rx loop when ntc == ntu to avoid broken behavior. Fixes: 3876ff525de7 ("ice: xsk: Handle SW XDP ring wrap and bump tail more often") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220328142123.170157-4-maciej.fijalkowski@intel.com
2022-03-29ice: xsk: Eliminate unnecessary loop iterationMagnus Karlsson1-1/+1
The NIC Tx ring completion routine cleans entries from the ring in batches. However, it processes one more batch than it is supposed to. Note that this does not matter from a functionality point of view since it will not find a set DD bit for the next batch and just exit the loop. But from a performance perspective, it is faster to terminate the loop before and not issue an expensive read over PCIe to get the DD bit. Fixes: 126cdfe1007a ("ice: xsk: Improve AF_XDP ZC Tx and use batching API") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220328142123.170157-3-maciej.fijalkowski@intel.com
2022-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-10/+20
Merge in overtime fixes, no conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-23ice: don't allow to run ice_send_event_to_aux() in atomic ctxAlexander Lobakin1-0/+3
ice_send_event_to_aux() eventually descends to mutex_lock() (-> might_sched()), so it must not be called under non-task context. However, at least two fixes have happened already for the bug splats occurred due to this function being called from atomic context. To make the emergency landings softer, bail out early when executed in non-task context emitting a warn splat only once. This way we trade some events being potentially lost for system stability and avoid any related hangs and crashes. Fixes: 348048e724a0e ("ice: Implement iidc operations") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-23ice: fix 'scheduling while atomic' on aux critical err interruptAlexander Lobakin2-10/+17
There's a kernel BUG splat on processing aux critical error interrupts in ice_misc_intr(): [ 2100.917085] BUG: scheduling while atomic: swapper/15/0/0x00010000 ... [ 2101.060770] Call Trace: [ 2101.063229] <IRQ> [ 2101.065252] dump_stack+0x41/0x60 [ 2101.068587] __schedule_bug.cold.100+0x4c/0x58 [ 2101.073060] __schedule+0x6a4/0x830 [ 2101.076570] schedule+0x35/0xa0 [ 2101.079727] schedule_preempt_disabled+0xa/0x10 [ 2101.084284] __mutex_lock.isra.7+0x310/0x420 [ 2101.088580] ? ice_misc_intr+0x201/0x2e0 [ice] [ 2101.093078] ice_send_event_to_aux+0x25/0x70 [ice] [ 2101.097921] ice_misc_intr+0x220/0x2e0 [ice] [ 2101.102232] __handle_irq_event_percpu+0x40/0x180 [ 2101.106965] handle_irq_event_percpu+0x30/0x80 [ 2101.111434] handle_irq_event+0x36/0x53 [ 2101.115292] handle_edge_irq+0x82/0x190 [ 2101.119148] handle_irq+0x1c/0x30 [ 2101.122480] do_IRQ+0x49/0xd0 [ 2101.125465] common_interrupt+0xf/0xf [ 2101.129146] </IRQ> ... As Andrew correctly mentioned previously[0], the following call ladder happens: ice_misc_intr() <- hardirq ice_send_event_to_aux() device_lock() mutex_lock() might_sleep() might_resched() <- oops Add a new PF state bit which indicates that an aux critical error occurred and serve it in ice_service_task() in process context. The new ice_pf::oicr_err_reg is read-write in both hardirq and process contexts, but only 3 bits of non-critical data probably aren't worth explicit synchronizing (and they're even in the same byte [31:24]). [0] https://lore.kernel.org/all/YeSRUVmrdmlUXHDn@lunn.ch Fixes: 348048e724a0e ("ice: Implement iidc operations") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Michal Kubiak <michal.kubiak@intel.com> Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-19Merge branch '40GbE' of ↵Jakub Kicinski2-6/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 40GbE Intel Wired LAN Driver Updates 2022-03-17 This series contains updates to i40e and igb drivers. Tom Rix moves a conversion to little endian to occur only when the value is used for i40e. He also zeros out a structure to resolve possible use of garbage value for igb as reported by clang. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: igb: zero hwtstamp by default i40e: little endian only valid checksums ==================== Link: https://lore.kernel.org/r/20220317160236.3534321-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-18Merge branch '100GbE' of ↵Jakub Kicinski4-3/+35
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-03-16 This series contains updates to gtp and ice driver. Wojciech fixes smatch reported inconsistent indenting for gtp and ice. Yang Yingliang fixes a couple of return value checks for GNSS to IS_PTR instead of null. Jacob adds support for trace events on tx timestamps. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: add trace events for tx timestamps ice: fix return value check in ice_gnss.c ice: Fix inconsistent indenting in ice_switch gtp: Fix inconsistent indenting ==================== Link: https://lore.kernel.org/r/20220316204024.3201500-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-4/+18
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17iavf: Fix hang during reboot/shutdownIvan Vecera1-0/+7
Recent commit 974578017fc1 ("iavf: Add waiting so the port is initialized in remove") adds a wait-loop at the beginning of iavf_remove() to ensure that port initialization is finished prior unregistering net device. This causes a regression in reboot/shutdown scenario because in this case callback iavf_shutdown() is called and this callback detaches the device, makes it down if it is running and sets its state to __IAVF_REMOVE. Later shutdown callback of associated PF driver (e.g. ice_shutdown) is called. That callback calls among other things sriov_disable() that calls indirectly iavf_remove() (see stack trace below). As the adapter state is already __IAVF_REMOVE then the mentioned loop is end-less and shutdown process hangs. The patch fixes this by checking adapter's state at the beginning of iavf_remove() and skips the rest of the function if the adapter is already in remove state (shutdown is in progress). Reproducer: 1. Create VF on PF driven by ice or i40e driver 2. Ensure that the VF is bound to iavf driver 3. Reboot [52625.981294] sysrq: SysRq : Show Blocked State [52625.988377] task:reboot state:D stack: 0 pid:17359 ppid: 1 f2 [52625.996732] Call Trace: [52625.999187] __schedule+0x2d1/0x830 [52626.007400] schedule+0x35/0xa0 [52626.010545] schedule_hrtimeout_range_clock+0x83/0x100 [52626.020046] usleep_range+0x5b/0x80 [52626.023540] iavf_remove+0x63/0x5b0 [iavf] [52626.027645] pci_device_remove+0x3b/0xc0 [52626.031572] device_release_driver_internal+0x103/0x1f0 [52626.036805] pci_stop_bus_device+0x72/0xa0 [52626.040904] pci_stop_and_remove_bus_device+0xe/0x20 [52626.045870] pci_iov_remove_virtfn+0xba/0x120 [52626.050232] sriov_disable+0x2f/0xe0 [52626.053813] ice_free_vfs+0x7c/0x340 [ice] [52626.057946] ice_remove+0x220/0x240 [ice] [52626.061967] ice_shutdown+0x16/0x50 [ice] [52626.065987] pci_device_shutdown+0x34/0x60 [52626.070086] device_shutdown+0x165/0x1c5 [52626.074011] kernel_restart+0xe/0x30 [52626.077593] __do_sys_reboot+0x1d2/0x210 [52626.093815] do_syscall_64+0x5b/0x1a0 [52626.097483] entry_SYSCALL_64_after_hwframe+0x65/0xca Fixes: 974578017fc1 ("iavf: Add waiting so the port is initialized in remove") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://lore.kernel.org/r/20220317104524.2802848-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17igb: zero hwtstamp by defaultTom Rix1-4/+2
Clang static analysis reports this representative issue igb_ptp.c:997:3: warning: The left operand of '+' is a garbage value ktime_add_ns(shhwtstamps.hwtstamp, adjust); ^ ~~~~~~~~~~~~~~~~~~~~ shhwtstamps.hwtstamp is set by a call to igb_ptp_systim_to_hwtstamp(). In the switch-statement for the hw type, the hwtstamp is zeroed for matches but not the default case. Move the memset out of switch-statement. This degarbages the default case and reduces the size. Some whitespace cleanup of empty lines Signed-off-by: Tom Rix <trix@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-17i40e: little endian only valid checksumsTom Rix1-2/+3
The calculation of the checksum can fail. So move converting the checksum to little endian to inside the return status check. Signed-off-by: Tom Rix <trix@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-16ice: add trace events for tx timestampsJacob Keller2-0/+32
We've previously run into many issues related to the latency of a Tx timestamp completion with the ice hardware. It can be difficult to determine the root cause of a slow Tx timestamp. To aid in this, introduce new trace events which capture timing data about when the driver reaches certain points while processing a transmit timestamp * ice_tx_tstamp_request: Trace when the stack initiates a new timestamp request. * ice_tx_tstamp_fw_req: Trace when the driver begins a read of the timestamp register in the work thread. * ice_tx_tstamp_fw_done: Trace when the driver finishes reading a timestamp register in the work thread. * ice_tx_tstamp_complete: Trace when the driver submits the skb back to the stack with a completed Tx timestamp. These trace events can be enabled using the standard trace event subsystem exposed by the ice driver. If they are disabled, they become no-ops with no run time cost. The following is a simple GNU AWK script which can highlight one potential way to use the trace events to capture latency data from the trace buffer about how long the driver takes to process a timestamp: ----- BEGIN { PREC=256 } # Detect requests /tx_tstamp_request/ { time=strtonum($4) skb=$7 # Store the time of request for this skb requests[skb] = time printf("skb %s: idx %d at %.6f\n", skb, idx, time) } # Detect completions /tx_tstamp_complete/ { time=strtonum($4) skb=$7 idx=$9 if (skb in requests) { latency = (time - requests[skb]) * 1000 printf("skb %s: %.3f to complete\n", skb, latency) if (latency > 4) { printf(">>> HIGH LATENCY <<<\n") } printf("\n") } else { printf("!!! skb %s (idx %d) at %.6f\n", skb, idx, time) } } ----- Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-16ice: fix return value check in ice_gnss.cYang Yingliang1-2/+2
kthread_create_worker() and tty_alloc_driver() return ERR_PTR() and never return NULL. The NULL test in the return value check should be replaced with IS_ERR(). Fixes: 43113ff73453 ("ice: add TTY for GNSS module for E810T device") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-16ice: Fix inconsistent indenting in ice_switchWojciech Drewek1-1/+1
Fix the following warning as reported by smatch: smatch warnings: drivers/net/ethernet/intel/ice/ice_switch.c:5568 ice_find_dummy_packet() warn: inconsistent indenting Fixes: 9a225f81f540 ("ice: Support GTP-U and GTP-C offload in switchdev") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15iavf: Fix double free in iavf_reset_taskPrzemyslaw Patynowski1-1/+7
Fix double free possibility in iavf_disable_vf, as crit_lock is freed in caller, iavf_reset_task. Add kernel-doc for iavf_disable_vf. Remove mutex_unlock in iavf_disable_vf. Without this patch there is double free scenario, when calling iavf_reset_task. Fixes: e85ff9c631e1 ("iavf: Fix deadlock in iavf_reset_task") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: destroy flow director filter mutex after releasing VSIsSudheer Mogilappagari1-1/+1
Currently fdir_fltr_lock is accessed in ice_vsi_release_all() function after it is destroyed. Instead destroy mutex after ice_vsi_release_all. Fixes: 40319796b732 ("ice: Add flow director support for channel mode") Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: fix NULL pointer dereference in ice_update_vsi_tx_ring_stats()Maciej Fijalkowski1-2/+3
It is possible to do NULL pointer dereference in routine that updates Tx ring stats. Currently only stats and bytes are updated when ring pointer is valid, but later on ring is accessed to propagate gathered Tx stats onto VSI stats. Change the existing logic to move to next ring when ring is NULL. Fixes: e72bba21355d ("ice: split ice_ring onto Tx/Rx separate structs") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: remove PF pointer from ice_check_vf_initJacob Keller3-16/+14
The ice_check_vf_init function takes both a PF and a VF pointer. Every caller looks up the PF pointer from the VF structure. Some callers only use of the PF pointer is call this function. Move the lookup inside ice_check_vf_init and drop the unnecessary argument. Cleanup the callers to drop the now unnecessary local variables. In particular, replace the local PF pointer with a HW structure pointer in ice_vc_get_vf_res_msg which simplifies a few accesses to the HW structure in that function. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: introduce ice_virtchnl.c and ice_virtchnl.hJacob Keller5-3825/+3869
Just as we moved the generic virtualization library logic into ice_vf_lib.c, move the virtchnl message handling into ice_virtchnl.c Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: cleanup long lines in ice_sriov.cJacob Keller1-12/+27
Before we move the virtchnl message handling from ice_sriov.c into ice_virtchnl.c, cleanup some long line warnings to avoid checkpatch.pl complaints. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: introduce ICE_VF_RESET_LOCK flagJacob Keller4-16/+19
The ice_reset_vf function performs actions which must be taken only while holding the VF configuration lock. Some flows already acquired the lock, while other flows must acquire it just for the reset function. Add the ICE_VF_RESET_LOCK flag to the function so that it can handle taking and releasing the lock instead at the appropriate scope. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: introduce ICE_VF_RESET_NOTIFY flagJacob Keller3-39/+34
In some cases of resetting a VF, the PF would like to first notify the VF that a reset is impending. This is currently done via ice_vc_notify_vf_reset. A wrapper to ice_reset_vf, ice_vf_reset_vf, is used to call this function first before calling ice_reset_vf. In fact, every single call to ice_vc_notify_vf_reset occurs just prior to a call to ice_vc_reset_vf. Now that ice_reset_vf has flags, replace this separate call with an ICE_VF_RESET_NOTIFY flag. This removes an unnecessary exported function of ice_vc_notify_vf_reset, and also makes there be a single function to reset VFs (ice_reset_vf). Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>