summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/ice
AgeCommit message (Collapse)AuthorFilesLines
2022-10-26ice: set tx_tstamps when creating new Tx rings via ethtoolJacob Keller1-0/+1
[ Upstream commit b3b173745c8cab1e24d6821488b60abed3acb24d ] When the user changes the number of queues via ethtool, the driver allocates new rings. This allocation did not initialize tx_tstamps. This results in the tx_tstamps field being zero (due to kcalloc allocation), and would result in a NULL pointer dereference when attempting a transmit timestamp on the new ring. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-28ice: Don't double unplug aux on peer initiated resetDave Ertman1-2/+0
[ Upstream commit 23c619190318376769ad7b61504c2ea0703fb783 ] In the IDC callback that is accessed when the aux drivers request a reset, the function to unplug the aux devices is called. This function is also called in the ice_prepare_for_reset function. This double call is causing a "scheduling while atomic" BUG. [ 662.676430] ice 0000:4c:00.0 rocep76s0: cqp opcode = 0x1 maj_err_code = 0xffff min_err_code = 0x8003 [ 662.676609] ice 0000:4c:00.0 rocep76s0: [Modify QP Cmd Error][op_code=8] status=-29 waiting=1 completion_err=1 maj=0xffff min=0x8003 [ 662.815006] ice 0000:4c:00.0 rocep76s0: ICE OICR event notification: oicr = 0x10000003 [ 662.815014] ice 0000:4c:00.0 rocep76s0: critical PE Error, GLPE_CRITERR=0x00011424 [ 662.815017] ice 0000:4c:00.0 rocep76s0: Requesting a reset [ 662.815475] BUG: scheduling while atomic: swapper/37/0/0x00010002 [ 662.815475] BUG: scheduling while atomic: swapper/37/0/0x00010002 [ 662.815477] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill 8021q garp mrp stp llc vfat fat rpcrdma intel_rapl_msr intel_rapl_common sunrpc i10nm_edac rdma_ucm nfit ib_srpt libnvdimm ib_isert iscsi_target_mod x86_pkg_temp_thermal intel_powerclamp coretemp target_core_mod snd_hda_intel ib_iser snd_intel_dspcfg libiscsi snd_intel_sdw_acpi scsi_transport_iscsi kvm_intel iTCO_wdt rdma_cm snd_hda_codec kvm iw_cm ipmi_ssif iTCO_vendor_support snd_hda_core irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hwdep snd_seq snd_seq_device rapl snd_pcm snd_timer isst_if_mbox_pci pcspkr isst_if_mmio irdma intel_uncore idxd acpi_ipmi joydev isst_if_common snd mei_me idxd_bus ipmi_si soundcore i2c_i801 mei ipmi_devintf i2c_smbus i2c_ismt ipmi_msghandler acpi_power_meter acpi_pad rv(OE) ib_uverbs ib_cm ib_core xfs libcrc32c ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helpe r ttm [ 662.815546] nvme nvme_core ice drm crc32c_intel i40e t10_pi wmi pinctrl_emmitsburg dm_mirror dm_region_hash dm_log dm_mod fuse [ 662.815557] Preemption disabled at: [ 662.815558] [<0000000000000000>] 0x0 [ 662.815563] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: G S OE 5.17.1 #2 [ 662.815566] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS SE5C6301.86B.6624.D18.2111021741 11/02/2021 [ 662.815568] Call Trace: [ 662.815572] <IRQ> [ 662.815574] dump_stack_lvl+0x33/0x42 [ 662.815581] __schedule_bug.cold.147+0x7d/0x8a [ 662.815588] __schedule+0x798/0x990 [ 662.815595] schedule+0x44/0xc0 [ 662.815597] schedule_preempt_disabled+0x14/0x20 [ 662.815600] __mutex_lock.isra.11+0x46c/0x490 [ 662.815603] ? __ibdev_printk+0x76/0xc0 [ib_core] [ 662.815633] device_del+0x37/0x3d0 [ 662.815639] ice_unplug_aux_dev+0x1a/0x40 [ice] [ 662.815674] ice_schedule_reset+0x3c/0xd0 [ice] [ 662.815693] irdma_iidc_event_handler.cold.7+0xb6/0xd3 [irdma] [ 662.815712] ? bitmap_find_next_zero_area_off+0x45/0xa0 [ 662.815719] ice_send_event_to_aux+0x54/0x70 [ice] [ 662.815741] ice_misc_intr+0x21d/0x2d0 [ice] [ 662.815756] __handle_irq_event_percpu+0x4c/0x180 [ 662.815762] handle_irq_event_percpu+0xf/0x40 [ 662.815764] handle_irq_event+0x34/0x60 [ 662.815766] handle_edge_irq+0x9a/0x1c0 [ 662.815770] __common_interrupt+0x62/0x100 [ 662.815774] common_interrupt+0xb4/0xd0 [ 662.815779] </IRQ> [ 662.815780] <TASK> [ 662.815780] asm_common_interrupt+0x1e/0x40 [ 662.815785] RIP: 0010:cpuidle_enter_state+0xd6/0x380 [ 662.815789] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 65 d7 95 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 64 02 00 00 31 ff e8 ae c5 9c ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49 [ 662.815791] RSP: 0018:ff2c2c4f18edbe80 EFLAGS: 00000202 [ 662.815793] RAX: ff280805df140000 RBX: 0000000000000002 RCX: 000000000000001f [ 662.815795] RDX: 0000009a52da2d08 RSI: ffffffff93f8240b RDI: ffffffff93f53ee7 [ 662.815796] RBP: ff5e2bd11ff41928 R08: 0000000000000000 R09: 000000000002f8c0 [ 662.815797] R10: 0000010c3f18e2cf R11: 000000000000000f R12: 0000009a52da2d08 [ 662.815798] R13: ffffffff94ad7e20 R14: 0000000000000002 R15: 0000000000000000 [ 662.815801] cpuidle_enter+0x29/0x40 [ 662.815803] do_idle+0x261/0x2b0 [ 662.815807] cpu_startup_entry+0x19/0x20 [ 662.815809] start_secondary+0x114/0x150 [ 662.815813] secondary_startup_64_no_verify+0xd5/0xdb [ 662.815818] </TASK> [ 662.815846] bad: scheduling from the idle thread! [ 662.815849] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: G S W OE 5.17.1 #2 [ 662.815852] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS SE5C6301.86B.6624.D18.2111021741 11/02/2021 [ 662.815853] Call Trace: [ 662.815855] <IRQ> [ 662.815856] dump_stack_lvl+0x33/0x42 [ 662.815860] dequeue_task_idle+0x20/0x30 [ 662.815863] __schedule+0x1c3/0x990 [ 662.815868] schedule+0x44/0xc0 [ 662.815871] schedule_preempt_disabled+0x14/0x20 [ 662.815873] __mutex_lock.isra.11+0x3a8/0x490 [ 662.815876] ? __ibdev_printk+0x76/0xc0 [ib_core] [ 662.815904] device_del+0x37/0x3d0 [ 662.815909] ice_unplug_aux_dev+0x1a/0x40 [ice] [ 662.815937] ice_schedule_reset+0x3c/0xd0 [ice] [ 662.815961] irdma_iidc_event_handler.cold.7+0xb6/0xd3 [irdma] [ 662.815979] ? bitmap_find_next_zero_area_off+0x45/0xa0 [ 662.815985] ice_send_event_to_aux+0x54/0x70 [ice] [ 662.816011] ice_misc_intr+0x21d/0x2d0 [ice] [ 662.816033] __handle_irq_event_percpu+0x4c/0x180 [ 662.816037] handle_irq_event_percpu+0xf/0x40 [ 662.816039] handle_irq_event+0x34/0x60 [ 662.816042] handle_edge_irq+0x9a/0x1c0 [ 662.816045] __common_interrupt+0x62/0x100 [ 662.816048] common_interrupt+0xb4/0xd0 [ 662.816052] </IRQ> [ 662.816053] <TASK> [ 662.816054] asm_common_interrupt+0x1e/0x40 [ 662.816057] RIP: 0010:cpuidle_enter_state+0xd6/0x380 [ 662.816060] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 65 d7 95 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 64 02 00 00 31 ff e8 ae c5 9c ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49 [ 662.816063] RSP: 0018:ff2c2c4f18edbe80 EFLAGS: 00000202 [ 662.816065] RAX: ff280805df140000 RBX: 0000000000000002 RCX: 000000000000001f [ 662.816067] RDX: 0000009a52da2d08 RSI: ffffffff93f8240b RDI: ffffffff93f53ee7 [ 662.816068] RBP: ff5e2bd11ff41928 R08: 0000000000000000 R09: 000000000002f8c0 [ 662.816070] R10: 0000010c3f18e2cf R11: 000000000000000f R12: 0000009a52da2d08 [ 662.816071] R13: ffffffff94ad7e20 R14: 0000000000000002 R15: 0000000000000000 [ 662.816075] cpuidle_enter+0x29/0x40 [ 662.816077] do_idle+0x261/0x2b0 [ 662.816080] cpu_startup_entry+0x19/0x20 [ 662.816083] start_secondary+0x114/0x150 [ 662.816087] secondary_startup_64_no_verify+0xd5/0xdb [ 662.816091] </TASK> [ 662.816169] bad: scheduling from the idle thread! The correct place to unplug the aux devices for a reset is in the prepare_for_reset function, as this is a common place for all reset flows. It also has built in protection from being called twice in a single reset instance before the aux devices are replugged. Fixes: f9f5301e7e2d4 ("ice: Register auxiliary device to provide RDMA") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Helena Anna Dubel <helena.anna.dubel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15ice: use bitmap_free instead of devm_kfreeMichal Swiatkowski1-1/+1
[ Upstream commit 59ac325557b6c14f1f793b90d3946bc145ffa085 ] pf->avail_txqs was allocated using bitmap_zalloc, bitmap_free should be used to free this memory. Fixes: 78b5713ac1241 ("ice: Alloc queue management bitmaps and arrays dynamically") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31ice: xsk: prohibit usage of non-balanced queue idMaciej Fijalkowski1-0/+6
[ Upstream commit 5a42f112d367bb4700a8a41f5c12724fde6bfbb9 ] Fix the following scenario: 1. ethtool -L $IFACE rx 8 tx 96 2. xdpsock -q 10 -t -z Above refers to a case where user would like to attach XSK socket in txonly mode at a queue id that does not have a corresponding Rx queue. At this moment ice's XSK logic is tightly bound to act on a "queue pair", e.g. both Tx and Rx queues at a given queue id are disabled/enabled and both of them will get XSK pool assigned, which is broken for the presented queue configuration. This results in the splat included at the bottom, which is basically an OOB access to Rx ring array. To fix this, allow using the ids only in scope of "combined" queues reported by ethtool. However, logic should be rewritten to allow such configurations later on, which would end up as a complete rewrite of the control path, so let us go with this temporary fix. [420160.558008] BUG: kernel NULL pointer dereference, address: 0000000000000082 [420160.566359] #PF: supervisor read access in kernel mode [420160.572657] #PF: error_code(0x0000) - not-present page [420160.579002] PGD 0 P4D 0 [420160.582756] Oops: 0000 [#1] PREEMPT SMP NOPTI [420160.588396] CPU: 10 PID: 21232 Comm: xdpsock Tainted: G OE 5.19.0-rc7+ #10 [420160.597893] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [420160.609894] RIP: 0010:ice_xsk_pool_setup+0x44/0x7d0 [ice] [420160.616968] Code: f3 48 83 ec 40 48 8b 4f 20 48 8b 3f 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 48 8d 04 ed 00 00 00 00 48 01 c1 48 8b 11 <0f> b7 92 82 00 00 00 48 85 d2 0f 84 2d 75 00 00 48 8d 72 ff 48 85 [420160.639421] RSP: 0018:ffffc9002d2afd48 EFLAGS: 00010282 [420160.646650] RAX: 0000000000000050 RBX: ffff88811d8bdd00 RCX: ffff888112c14ff8 [420160.655893] RDX: 0000000000000000 RSI: ffff88811d8bdd00 RDI: ffff888109861000 [420160.665166] RBP: 000000000000000a R08: 000000000000000a R09: 0000000000000000 [420160.674493] R10: 000000000000889f R11: 0000000000000000 R12: 000000000000000a [420160.683833] R13: 000000000000000a R14: 0000000000000000 R15: ffff888117611828 [420160.693211] FS: 00007fa869fc1f80(0000) GS:ffff8897e0880000(0000) knlGS:0000000000000000 [420160.703645] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [420160.711783] CR2: 0000000000000082 CR3: 00000001d076c001 CR4: 00000000007706e0 [420160.721399] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [420160.731045] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [420160.740707] PKRU: 55555554 [420160.745960] Call Trace: [420160.750962] <TASK> [420160.755597] ? kmalloc_large_node+0x79/0x90 [420160.762703] ? __kmalloc_node+0x3f5/0x4b0 [420160.769341] xp_assign_dev+0xfd/0x210 [420160.775661] ? shmem_file_read_iter+0x29a/0x420 [420160.782896] xsk_bind+0x152/0x490 [420160.788943] __sys_bind+0xd0/0x100 [420160.795097] ? exit_to_user_mode_prepare+0x20/0x120 [420160.802801] __x64_sys_bind+0x16/0x20 [420160.809298] do_syscall_64+0x38/0x90 [420160.815741] entry_SYSCALL_64_after_hwframe+0x63/0xcd [420160.823731] RIP: 0033:0x7fa86a0dd2fb [420160.830264] Code: c3 66 0f 1f 44 00 00 48 8b 15 69 8b 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 8b 0c 00 f7 d8 64 89 01 48 [420160.855410] RSP: 002b:00007ffc1146f618 EFLAGS: 00000246 ORIG_RAX: 0000000000000031 [420160.866366] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa86a0dd2fb [420160.876957] RDX: 0000000000000010 RSI: 00007ffc1146f680 RDI: 0000000000000003 [420160.887604] RBP: 000055d7113a0520 R08: 00007fa868fb8000 R09: 0000000080000000 [420160.898293] R10: 0000000000008001 R11: 0000000000000246 R12: 000055d7113a04e0 [420160.909038] R13: 000055d7113a0320 R14: 000000000000000a R15: 0000000000000000 [420160.919817] </TASK> [420160.925659] Modules linked in: ice(OE) af_packet binfmt_misc nls_iso8859_1 ipmi_ssif intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp mei_me coretemp ioatdma mei ipmi_si wmi ipmi_msghandler acpi_pad acpi_power_meter ip_tables x_tables autofs4 ixgbe i40e crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ahci mdio dca libahci lpc_ich [last unloaded: ice] [420160.977576] CR2: 0000000000000082 [420160.985037] ---[ end trace 0000000000000000 ]--- [420161.097724] RIP: 0010:ice_xsk_pool_setup+0x44/0x7d0 [ice] [420161.107341] Code: f3 48 83 ec 40 48 8b 4f 20 48 8b 3f 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 48 8d 04 ed 00 00 00 00 48 01 c1 48 8b 11 <0f> b7 92 82 00 00 00 48 85 d2 0f 84 2d 75 00 00 48 8d 72 ff 48 85 [420161.134741] RSP: 0018:ffffc9002d2afd48 EFLAGS: 00010282 [420161.144274] RAX: 0000000000000050 RBX: ffff88811d8bdd00 RCX: ffff888112c14ff8 [420161.155690] RDX: 0000000000000000 RSI: ffff88811d8bdd00 RDI: ffff888109861000 [420161.168088] RBP: 000000000000000a R08: 000000000000000a R09: 0000000000000000 [420161.179295] R10: 000000000000889f R11: 0000000000000000 R12: 000000000000000a [420161.190420] R13: 000000000000000a R14: 0000000000000000 R15: ffff888117611828 [420161.201505] FS: 00007fa869fc1f80(0000) GS:ffff8897e0880000(0000) knlGS:0000000000000000 [420161.213628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [420161.223413] CR2: 0000000000000082 CR3: 00000001d076c001 CR4: 00000000007706e0 [420161.234653] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [420161.245893] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [420161.257052] PKRU: 55555554 Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31ice: xsk: Force rings to be sized to power of 2Maciej Fijalkowski1-0/+8
[ Upstream commit 296f13ff3854535009a185aaf8e3603266d39d94 ] With the upcoming introduction of batching to XSK data path, performance wise it will be the best to have the ring descriptor count to be aligned to power of 2. Check if ring sizes that user is going to attach the XSK socket fulfill the condition above. For Tx side, although check is being done against the Tx queue and in the end the socket will be attached to the XDP queue, it is fine since XDP queues get the ring->count setting from Tx queues. Suggested-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220125160446.78976-3-maciej.fijalkowski@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25ice: Ignore EEXIST when setting promisc modeGrzegorz Siwik1-1/+1
commit 11e551a2efa4481bd4f616ab75374a2710b480e9 upstream. Ignore EEXIST error when setting promiscuous mode. This fix is needed because the driver could set promiscuous mode when it still has not cleared properly. Promiscuous mode could be set only once, so setting it second time will be rejected. Fixes: 5eda8afd6bcc ("ice: Add support for PF/VF promiscuous mode") Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com> Link: https://lore.kernel.org/all/CAK8fFZ7m-KR57M_rYX6xZN39K89O=LGooYkKsu6HKt0Bs+x6xQ@mail.gmail.com/ Tested-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> Tested-by: Igor Raits <igor@gooddata.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03ice: do not setup vlan for loopback VSIMaciej Fijalkowski1-3/+5
commit cc019545a238518fa9da1e2a889f6e1bb1005a63 upstream. Currently loopback test is failiing due to the error returned from ice_vsi_vlan_setup(). Skip calling it when preparing loopback VSI. Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)Maciej Fijalkowski1-1/+2
commit 283d736ff7c7e96ac5b32c6c0de40372f8eb171e upstream. Tx side sets EOP and RS bits on descriptors to indicate that a particular descriptor is the last one and needs to generate an irq when it was sent. These bits should not be checked on completion path regardless whether it's the Tx or the Rx. DD bit serves this purpose and it indicates that a particular descriptor is either for Rx or was successfully Txed. EOF is also set as loopback test does not xmit fragmented frames. Look at (DD | EOF) bits setting in ice_lbtest_receive_frames() instead of EOP and RS pair. Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-29ice: ethtool: advertise 1000M speeds properlyAnatolii Gerasymenko1-1/+38
[ Upstream commit c3d184c83ff4b80167e34edfc3d21df424bf27ff ] In current implementation ice_update_phy_type enables all link modes for selected speed. This approach doesn't work for 1000M speeds, because both copper (1000baseT) and optical (1000baseX) standards cannot be enabled at once. Fix this, by adding the function `ice_set_phy_type_from_speed()` for 1000M speeds. Fixes: 48cb27f2fd18 ("ice: Implement handlers for ethtool PHY/link operations") Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-30ice: fix crash at allocation failureMagnus Karlsson1-1/+1
Fix a crash in the zero-copy driver that occurs when it fails to allocate buffers from user-space. This crash can easily be triggered by a malicious program that does not provide any buffers in the fill ring for the kernel to use. Note that this bug does not exist in upstream since the batched buffer allocation interface got introduced in 5.16 and replaced this code. Reported-by: Jeff Shaw <jeffrey.b.shaw@intel.com> Tested-by: Jeff Shaw <jeffrey.b.shaw@intel.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-25ice: Fix interrupt moderation settings getting clearedMichal Wilczynski2-11/+16
[ Upstream commit bf13502ed5f941b0777b3fd1e24dac5d93f3886c ] Adaptive-rx and Adaptive-tx are interrupt moderation settings that can be enabled/disabled using ethtool: ethtool -C ethX adaptive-rx on/off adaptive-tx on/off Unfortunately those settings are getting cleared after changing number of queues, or in ethtool world 'channels': ethtool -L ethX rx 1 tx 1 Clearing was happening due to introduction of bit fields in ice_ring_container struct. This way only itr_setting bits were rebuilt during ice_vsi_rebuild_set_coalesce(). Introduce an anonymous struct of bitfields and create a union to refer to them as a single variable. This way variable can be easily saved and restored. Fixes: 61dc79ced7aa ("ice: Restore interrupt throttle settings after VSI rebuild") Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-25ice: move ice_container_type onto ice_ring_containerMaciej Fijalkowski3-23/+23
[ Upstream commit dc23715cf30a9acb808f5b08962877c390d3e6ea ] Currently ice_container_type is scoped only for ice_ethtool.c. Next commit that will split the ice_ring struct onto Rx/Tx specific ring structs is going to also modify the type of linked list of rings that is within ice_ring_container. Therefore, the functions that are taking the ice_ring_container as an input argument will need to be aware of a ring type that will be looked up. Embed ice_container_type within ice_ring_container and initialize it properly when allocating the q_vectors. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-25ice: fix possible under reporting of ethtool Tx and Rx statisticsPaul Greenwalt1-3/+4
[ Upstream commit 31b6298fd8e29effe9ed6b77351ac5969be56ce0 ] The hardware statistics counters are not cleared during resets so the drivers first access is to initialize the baseline and then subsequent reads are for reporting the counters. The statistics counters are read during the watchdog subtask when the interface is up. If the baseline is not initialized before the interface is up, then there can be a brief window in which some traffic can be transmitted/received before the initial baseline reading takes place. Directly initialize ethtool statistics in driver open so the baseline will be initialized when the interface is up, and any dropped packets incremented before the interface is up won't be reported. Fixes: 28dc1b86f8ea9 ("ice: ignore dropped packets during init") Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-25ice: fix crash when writing timestamp on RX ringsArkadiusz Kubalewski1-4/+15
[ Upstream commit 4503cc7fdf9a84cd631b0cb8ecb3c9b1bdbf3594 ] Do not allow to write timestamps on RX rings if PF is being configured. When PF is being configured RX rings can be freed or rebuilt. If at the same time timestamps are updated, the kernel will crash by dereferencing null RX ring pointer. PID: 1449 TASK: ff187d28ed658040 CPU: 34 COMMAND: "ice-ptp-0000:51" #0 [ff1966a94a713bb0] machine_kexec at ffffffff9d05a0be #1 [ff1966a94a713c08] __crash_kexec at ffffffff9d192e9d #2 [ff1966a94a713cd0] crash_kexec at ffffffff9d1941bd #3 [ff1966a94a713ce8] oops_end at ffffffff9d01bd54 #4 [ff1966a94a713d08] no_context at ffffffff9d06bda4 #5 [ff1966a94a713d60] __bad_area_nosemaphore at ffffffff9d06c10c #6 [ff1966a94a713da8] do_page_fault at ffffffff9d06cae4 #7 [ff1966a94a713de0] page_fault at ffffffff9da0107e [exception RIP: ice_ptp_update_cached_phctime+91] RIP: ffffffffc076db8b RSP: ff1966a94a713e98 RFLAGS: 00010246 RAX: 16e3db9c6b7ccae4 RBX: ff187d269dd3c180 RCX: ff187d269cd4d018 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ff187d269cfcc644 R8: ff187d339b9641b0 R9: 0000000000000000 R10: 0000000000000002 R11: 0000000000000000 R12: ff187d269cfcc648 R13: ffffffff9f128784 R14: ffffffff9d101b70 R15: ff187d269cfcc640 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ff1966a94a713ea0] ice_ptp_periodic_work at ffffffffc076dbef [ice] #9 [ff1966a94a713ee0] kthread_worker_fn at ffffffff9d101c1b #10 [ff1966a94a713f10] kthread at ffffffff9d101b4d #11 [ff1966a94a713f50] ret_from_fork at ffffffff9da0023f Fixes: 77a781155a65 ("ice: enable receive hardware timestamping") Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Dave Cain <dcain@redhat.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-18ice: fix PTP stale Tx timestamps cleanupMichal Michalik1-2/+8
[ Upstream commit a11b6c1a383ff092f432e040c20e032503785d47 ] Read stale PTP Tx timestamps from PHY on cleanup. After running out of Tx timestamps request handlers, hardware (HW) stops reporting finished requests. Function ice_ptp_tx_tstamp_cleanup() used to only clean up stale handlers in driver and was leaving the hardware registers not read. Not reading stale PTP Tx timestamps prevents next interrupts from arriving and makes timestamping unusable. Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Michal Michalik <michal.michalik@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-18ice: Fix race during aux device (un)pluggingIvan Vecera3-8/+20
[ Upstream commit 486b9eee57ddca5c9a2d59fc41153f36002e0a00 ] Function ice_plug_aux_dev() assigns pf->adev field too early prior aux device initialization and on other side ice_unplug_aux_dev() starts aux device deinit and at the end assigns NULL to pf->adev. This is wrong because pf->adev should always be non-NULL only when aux device is fully initialized and ready. This wrong order causes a crash when ice_send_event_to_aux() call occurs because that function depends on non-NULL value of pf->adev and does not assume that aux device is half-initialized or half-destroyed. After order correction the race window is tiny but it is still there, as Leon mentioned and manipulation with pf->adev needs to be protected by mutex. Fix (un-)plugging functions so pf->adev field is set after aux device init and prior aux device destroy and protect pf->adev assignment by new mutex. This mutex is also held during ice_send_event_to_aux() call to ensure that aux device is valid during that call. Note that device lock used ice_send_event_to_aux() needs to be kept to avoid race with aux drv unload. Reproducer: cycle=1 while :;do echo "#### Cycle: $cycle" ip link set ens7f0 mtu 9000 ip link add bond0 type bond mode 1 miimon 100 ip link set bond0 up ifenslave bond0 ens7f0 ip link set bond0 mtu 9000 ethtool -L ens7f0 combined 1 ip link del bond0 ip link set ens7f0 mtu 1500 sleep 1 let cycle++ done In short when the device is added/removed to/from bond the aux device is unplugged/plugged. When MTU of the device is changed an event is sent to aux device asynchronously. This can race with (un)plugging operation and because pf->adev is set too early (plug) or too late (unplug) the function ice_send_event_to_aux() can touch uninitialized or destroyed fields. In the case of crash below pf->adev->dev.mutex. Crash: [ 53.372066] bond0: (slave ens7f0): making interface the new active one [ 53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an u p link [ 53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready [ 53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up link [ 54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed inval idating tc mappings. Priority traffic classification disabled! [ 54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed inval idating tc mappings. Priority traffic classification disabled! [ 54.248204] bond0: (slave ens7f0): Releasing backup interface [ 54.253955] bond0: (slave ens7f1): making interface the new active one [ 54.274875] bond0: (slave ens7f1): Releasing backup interface [ 54.289153] bond0 (unregistering): Released all slaves [ 55.383179] MII link monitoring set to 100 ms [ 55.398696] bond0: (slave ens7f0): making interface the new active one [ 55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080 [ 55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an u p link [ 55.412198] #PF: supervisor write access in kernel mode [ 55.412200] #PF: error_code(0x0002) - not-present page [ 55.412201] PGD 25d2ad067 P4D 0 [ 55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S 5.17.0-13579-g57f2d6540f03 #1 [ 55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up link [ 55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/ 2021 [ 55.430226] Workqueue: ice ice_service_task [ice] [ 55.468169] RIP: 0010:mutex_unlock+0x10/0x20 [ 55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 <f0> 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48 [ 55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246 [ 55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001 [ 55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080 [ 55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041 [ 55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0 [ 55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000 [ 55.532076] FS: 0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000 [ 55.540163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0 [ 55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.567305] PKRU: 55555554 [ 55.570018] Call Trace: [ 55.572474] <TASK> [ 55.574579] ice_service_task+0xaab/0xef0 [ice] [ 55.579130] process_one_work+0x1c5/0x390 [ 55.583141] ? process_one_work+0x390/0x390 [ 55.587326] worker_thread+0x30/0x360 [ 55.590994] ? process_one_work+0x390/0x390 [ 55.595180] kthread+0xe6/0x110 [ 55.598325] ? kthread_complete_and_exit+0x20/0x20 [ 55.603116] ret_from_fork+0x1f/0x30 [ 55.606698] </TASK> Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA") Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13ice: xsk: fix VSI state check in ice_xsk_wakeup()Maciej Fijalkowski1-1/+1
[ Upstream commit 72b915a2b444e9247c9d424a840e94263db07c27 ] ICE_DOWN is dedicated for pf->state. Check for ICE_VSI_DOWN being set on vsi->state in ice_xsk_wakeup(). Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Shwetha Nagaraju <shwetha.nagaraju@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13ice: synchronize_rcu() when terminating ringsMaciej Fijalkowski3-3/+7
[ Upstream commit f9124c68f05ffdb87a47e3ea6d5fae9dad7cb6eb ] Unfortunately, the ice driver doesn't respect the RCU critical section that XSK wakeup is surrounded with. To fix this, add synchronize_rcu() calls to paths that destroy resources that might be in use. This was addressed in other AF_XDP ZC enabled drivers, for reference see for example commit b3873a5be757 ("net/i40e: Fix concurrency issues between config flow and XSK") Fixes: efc2214b6047 ("ice: Add support for XDP") Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Shwetha Nagaraju <shwetha.nagaraju@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13ice: Do not skip not enabled queues in ice_vc_dis_qs_msgAnatolii Gerasymenko1-2/+2
[ Upstream commit 05ef6813b234db3196f083b91db3963f040b65bb ] Disable check for queue being enabled in ice_vc_dis_qs_msg, because there could be a case when queues were created, but were not enabled. We still need to delete those queues. Normal workflow for VF looks like: Enable path: VIRTCHNL_OP_ADD_ETH_ADDR (opcode 10) VIRTCHNL_OP_CONFIG_VSI_QUEUES (opcode 6) VIRTCHNL_OP_ENABLE_QUEUES (opcode 8) Disable path: VIRTCHNL_OP_DISABLE_QUEUES (opcode 9) VIRTCHNL_OP_DEL_ETH_ADDR (opcode 11) The issue appears only in stress conditions when VF is enabled and disabled very fast. Eventually there will be a case, when queues are created by VIRTCHNL_OP_CONFIG_VSI_QUEUES, but are not enabled by VIRTCHNL_OP_ENABLE_QUEUES. In turn, these queues are not deleted by VIRTCHNL_OP_DISABLE_QUEUES, because there is a check whether queues are enabled in ice_vc_dis_qs_msg. When we bring up the VF again, we will see the "Failed to set LAN Tx queue context" error during VIRTCHNL_OP_CONFIG_VSI_QUEUES step. This happens because old 16 queues were not deleted and VF requests to create 16 more, but ice_sched_get_free_qparent in ice_ena_vsi_txq would fail to find a parent node for first newly requested queue (because all nodes are allocated to 16 old queues). Testing Hints: Just enable and disable VF fast enough, so it would be disabled before reaching VIRTCHNL_OP_ENABLE_QUEUES. while true; do ip link set dev ens785f0v0 up sleep 0.065 # adjust delay value for you machine ip link set dev ens785f0v0 down done Fixes: 77ca27c41705 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap") Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13ice: Set txq_teid to ICE_INVAL_TEID on ring creationAnatolii Gerasymenko1-0/+1
[ Upstream commit ccfee1822042b87e5135d33cad8ea353e64612d2 ] When VF is freshly created, but not brought up, ring->txq_teid value is by default set to 0. But 0 is a valid TEID. On some platforms the Root Node of Tx scheduler has a TEID = 0. This can cause issues as shown below. The proper way is to set ring->txq_teid to ICE_INVAL_TEID (0xFFFFFFFF). Testing Hints: echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs ip link set dev ens785f0v0 up ip link set dev ens785f0v0 down If we have freshly created VF and quickly turn it on and off, so there would be no time to reach VIRTCHNL_OP_CONFIG_VSI_QUEUES stage, then VIRTCHNL_OP_DISABLE_QUEUES stage will fail with error: [ 639.531454] disable queue 89 failed 14 [ 639.532233] Failed to disable LAN Tx queues, error: ICE_ERR_AQ_ERROR [ 639.533107] ice 0000:02:00.0: Failed to stop Tx ring 0 on VSI 5 The reason for the fail is that we are trying to send AQ command to delete queue 89, which has never been created and receive an "invalid argument" error from firmware. As this queue has never been created, it's teid and ring->txq_teid have default value 0. ice_dis_vsi_txq has a check against non-existent queues: node = ice_sched_find_node_by_teid(pi->root, q_teids[i]); if (!node) continue; But on some platforms the Root Node of Tx scheduler has a teid = 0. Hence, ice_sched_find_node_by_teid finds a node with teid = 0 (it is pi->root), and we go further to submit an erroneous request to firmware. Fixes: 37bb83901286 ("ice: Move common functions out of ice_main.c part 7/7") Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13ice: Clear default forwarding VSI during VSI releaseIvan Vecera1-0/+2
[ Upstream commit bd8c624c0cd59de0032752ba3001c107bba97f7b ] VSI is set as default forwarding one when promisc mode is set for PF interface, when PF is switched to switchdev mode or when VF driver asks to enable allmulticast or promisc mode for the VF interface (when vf-true-promisc-support priv flag is off). The third case is buggy because in that case VSI associated with VF remains as default one after VF removal. Reproducer: 1. Create VF echo 1 > sys/class/net/ens7f0/device/sriov_numvfs 2. Enable allmulticast or promisc mode on VF ip link set ens7f0v0 allmulticast on ip link set ens7f0v0 promisc on 3. Delete VF echo 0 > sys/class/net/ens7f0/device/sriov_numvfs 4. Try to enable promisc mode on PF ip link set ens7f0 promisc on Although it looks that promisc mode on PF is enabled the opposite is true because ice_vsi_sync_fltr() responsible for IFF_PROMISC handling first checks if any other VSI is set as default forwarding one and if so the function does not do anything. At this point it is not possible to enable promisc mode on PF without re-probe device. To resolve the issue this patch clear default forwarding VSI during ice_vsi_release() when the VSI to be released is the default one. Fixes: 01b5e89aab49 ("ice: Add VF promiscuous support") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08ice: don't allow to run ice_send_event_to_aux() in atomic ctxAlexander Lobakin1-0/+3
[ Upstream commit 5a3156932da06f09953764de113419f254086faf ] ice_send_event_to_aux() eventually descends to mutex_lock() (-> might_sched()), so it must not be called under non-task context. However, at least two fixes have happened already for the bug splats occurred due to this function being called from atomic context. To make the emergency landings softer, bail out early when executed in non-task context emitting a warn splat only once. This way we trade some events being potentially lost for system stability and avoid any related hangs and crashes. Fixes: 348048e724a0e ("ice: Implement iidc operations") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08ice: fix 'scheduling while atomic' on aux critical err interruptAlexander Lobakin2-10/+17
[ Upstream commit 32d53c0aa3a7b727243473949bad2a830b908edc ] There's a kernel BUG splat on processing aux critical error interrupts in ice_misc_intr(): [ 2100.917085] BUG: scheduling while atomic: swapper/15/0/0x00010000 ... [ 2101.060770] Call Trace: [ 2101.063229] <IRQ> [ 2101.065252] dump_stack+0x41/0x60 [ 2101.068587] __schedule_bug.cold.100+0x4c/0x58 [ 2101.073060] __schedule+0x6a4/0x830 [ 2101.076570] schedule+0x35/0xa0 [ 2101.079727] schedule_preempt_disabled+0xa/0x10 [ 2101.084284] __mutex_lock.isra.7+0x310/0x420 [ 2101.088580] ? ice_misc_intr+0x201/0x2e0 [ice] [ 2101.093078] ice_send_event_to_aux+0x25/0x70 [ice] [ 2101.097921] ice_misc_intr+0x220/0x2e0 [ice] [ 2101.102232] __handle_irq_event_percpu+0x40/0x180 [ 2101.106965] handle_irq_event_percpu+0x30/0x80 [ 2101.111434] handle_irq_event+0x36/0x53 [ 2101.115292] handle_edge_irq+0x82/0x190 [ 2101.119148] handle_irq+0x1c/0x30 [ 2101.122480] do_IRQ+0x49/0xd0 [ 2101.125465] common_interrupt+0xf/0xf [ 2101.129146] </IRQ> ... As Andrew correctly mentioned previously[0], the following call ladder happens: ice_misc_intr() <- hardirq ice_send_event_to_aux() device_lock() mutex_lock() might_sleep() might_resched() <- oops Add a new PF state bit which indicates that an aux critical error occurred and serve it in ice_service_task() in process context. The new ice_pf::oicr_err_reg is read-write in both hardirq and process contexts, but only 3 bits of non-critical data probably aren't worth explicit synchronizing (and they're even in the same byte [31:24]). [0] https://lore.kernel.org/all/YeSRUVmrdmlUXHDn@lunn.ch Fixes: 348048e724a0e ("ice: Implement iidc operations") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Michal Kubiak <michal.kubiak@intel.com> Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-19ice: Fix race condition during interface enslaveIvan Vecera2-2/+21
commit 5cb1ebdbc4342b1c2ce89516e19808d64417bdbc upstream. Commit 5dbbbd01cbba83 ("ice: Avoid RTNL lock when re-creating auxiliary device") changes a process of re-creation of aux device so ice_plug_aux_dev() is called from ice_service_task() context. This unfortunately opens a race window that can result in dead-lock when interface has left LAG and immediately enters LAG again. Reproducer: ``` #!/bin/sh ip link add lag0 type bond mode 1 miimon 100 ip link set lag0 for n in {1..10}; do echo Cycle: $n ip link set ens7f0 master lag0 sleep 1 ip link set ens7f0 nomaster done ``` This results in: [20976.208697] Workqueue: ice ice_service_task [ice] [20976.213422] Call Trace: [20976.215871] __schedule+0x2d1/0x830 [20976.219364] schedule+0x35/0xa0 [20976.222510] schedule_preempt_disabled+0xa/0x10 [20976.227043] __mutex_lock.isra.7+0x310/0x420 [20976.235071] enum_all_gids_of_dev_cb+0x1c/0x100 [ib_core] [20976.251215] ib_enum_roce_netdev+0xa4/0xe0 [ib_core] [20976.256192] ib_cache_setup_one+0x33/0xa0 [ib_core] [20976.261079] ib_register_device+0x40d/0x580 [ib_core] [20976.266139] irdma_ib_register_device+0x129/0x250 [irdma] [20976.281409] irdma_probe+0x2c1/0x360 [irdma] [20976.285691] auxiliary_bus_probe+0x45/0x70 [20976.289790] really_probe+0x1f2/0x480 [20976.298509] driver_probe_device+0x49/0xc0 [20976.302609] bus_for_each_drv+0x79/0xc0 [20976.306448] __device_attach+0xdc/0x160 [20976.310286] bus_probe_device+0x9d/0xb0 [20976.314128] device_add+0x43c/0x890 [20976.321287] __auxiliary_device_add+0x43/0x60 [20976.325644] ice_plug_aux_dev+0xb2/0x100 [ice] [20976.330109] ice_service_task+0xd0c/0xed0 [ice] [20976.342591] process_one_work+0x1a7/0x360 [20976.350536] worker_thread+0x30/0x390 [20976.358128] kthread+0x10a/0x120 [20976.365547] ret_from_fork+0x1f/0x40 ... [20976.438030] task:ip state:D stack: 0 pid:213658 ppid:213627 flags:0x00004084 [20976.446469] Call Trace: [20976.448921] __schedule+0x2d1/0x830 [20976.452414] schedule+0x35/0xa0 [20976.455559] schedule_preempt_disabled+0xa/0x10 [20976.460090] __mutex_lock.isra.7+0x310/0x420 [20976.464364] device_del+0x36/0x3c0 [20976.467772] ice_unplug_aux_dev+0x1a/0x40 [ice] [20976.472313] ice_lag_event_handler+0x2a2/0x520 [ice] [20976.477288] notifier_call_chain+0x47/0x70 [20976.481386] __netdev_upper_dev_link+0x18b/0x280 [20976.489845] bond_enslave+0xe05/0x1790 [bonding] [20976.494475] do_setlink+0x336/0xf50 [20976.502517] __rtnl_newlink+0x529/0x8b0 [20976.543441] rtnl_newlink+0x43/0x60 [20976.546934] rtnetlink_rcv_msg+0x2b1/0x360 [20976.559238] netlink_rcv_skb+0x4c/0x120 [20976.563079] netlink_unicast+0x196/0x230 [20976.567005] netlink_sendmsg+0x204/0x3d0 [20976.570930] sock_sendmsg+0x4c/0x50 [20976.574423] ____sys_sendmsg+0x1eb/0x250 [20976.586807] ___sys_sendmsg+0x7c/0xc0 [20976.606353] __sys_sendmsg+0x57/0xa0 [20976.609930] do_syscall_64+0x5b/0x1a0 [20976.613598] entry_SYSCALL_64_after_hwframe+0x65/0xca 1. Command 'ip link ... set nomaster' causes that ice_plug_aux_dev() is called from ice_service_task() context, aux device is created and associated device->lock is taken. 2. Command 'ip link ... set master...' calls ice's notifier under RTNL lock and that notifier calls ice_unplug_aux_dev(). That function tries to take aux device->lock but this is already taken by ice_plug_aux_dev() in step 1 3. Later ice_plug_aux_dev() tries to take RTNL lock but this is already taken in step 2 4. Dead-lock The patch fixes this issue by following changes: - Bit ICE_FLAG_PLUG_AUX_DEV is kept to be set during ice_plug_aux_dev() call in ice_service_task() - The bit is checked in ice_clear_rdma_cap() and only if it is not set then ice_unplug_aux_dev() is called. If it is set (in other words plugging of aux device was requested and ice_plug_aux_dev() is potentially running) then the function only clears the bit - Once ice_plug_aux_dev() call (in ice_service_task) is finished the bit ICE_FLAG_PLUG_AUX_DEV is cleared but it is also checked whether it was already cleared by ice_clear_rdma_cap(). If so then aux device is unplugged. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Co-developed-by: Petr Oros <poros@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Dave Ertman <david.m.ertman@intel.com> Link: https://lore.kernel.org/r/20220310171641.3863659-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-16ice: Fix curr_link_speed advertised speedJedrzej Jagielski1-1/+1
[ Upstream commit ad35ffa252af67d4cc7c744b9377a2b577748e3f ] Change curr_link_speed advertised speed, due to link_info.link_speed is not equal phy.curr_user_speed_req. Without this patch it is impossible to set advertised speed to same as link_speed. Testing Hints: Try to set advertised speed to 25G only with 25G default link (use ethtool -s 0x80000000) Fixes: 48cb27f2fd18 ("ice: Implement handlers for ethtool PHY/link operations") Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-16ice: Don't use GFP_KERNEL in atomic contextChristophe JAILLET1-1/+1
[ Upstream commit 3d97f1afd8d831e0c0dc1157418f94b8faa97b54 ] ice_misc_intr() is an irq handler. It should not sleep. Use GFP_ATOMIC instead of GFP_KERNEL when allocating some memory. Fixes: 348048e724a0 ("ice: Implement iidc operations") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Tested-by: Leszek Kaliszczuk <leszek.kaliszczuk@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-16ice: Fix error with handling of bonding MTUDave Ertman2-15/+15
[ Upstream commit 97b0129146b1544bbb0773585327896da3bb4e0a ] When a bonded interface is destroyed, .ndo_change_mtu can be called during the tear-down process while the RTNL lock is held. This is a problem since the auxiliary driver linked to the LAN driver needs to be notified of the MTU change, and this requires grabbing a device_lock on the auxiliary_device's dev. Currently this is being attempted in the same execution context as the call to .ndo_change_mtu which is causing a dead-lock. Move the notification of the changed MTU to a separate execution context (watchdog service task) and eliminate the "before" notification. Fixes: 348048e724a0e ("ice: Implement iidc operations") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Jonathan Toppins <jtoppins@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-16ice: stop disabling VFs due to PF error responsesJacob Keller2-21/+0
[ Upstream commit 79498d5af8e458102242d1667cf44df1f1564e63 ] The ice_vc_send_msg_to_vf function has logic to detect "failure" responses being sent to a VF. If a VF is sent more than ICE_DFLT_NUM_INVAL_MSGS_ALLOWED then the VF is marked as disabled. Almost identical logic also existed in the i40e driver. This logic was added to the ice driver in commit 1071a8358a28 ("ice: Implement virtchnl commands for AVF support") which itself copied from the i40e implementation in commit 5c3c48ac6bf5 ("i40e: implement virtual device interface"). Neither commit provides a proper explanation or justification of the check. In fact, later commits to i40e changed the logic to allow bypassing the check in some specific instances. The "logic" for this seems to be that error responses somehow indicate a malicious VF. This is not really true. The PF might be sending an error for any number of reasons such as lack of resources, etc. Additionally, this causes the PF to log an info message for every failed VF response which may confuse users, and can spam the kernel log. This behavior is not documented as part of any requirement for our products and other operating system drivers such as the FreeBSD implementation of our drivers do not include this type of check. In fact, the change from dev_err to dev_info in i40e commit 18b7af57d9c1 ("i40e: Lower some message levels") explains that these messages typically don't actually indicate a real issue. It is quite likely that a user who hits this in practice will be very confused as the VF will be disabled without an obvious way to recover. We already have robust malicious driver detection logic using actual hardware detection mechanisms that detect and prevent invalid device usage. Remove the logic since its not a documented requirement and the behavior is not intuitive. Fixes: 1071a8358a28 ("ice: Implement virtchnl commands for AVF support") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-02ice: fix concurrent reset and removal of VFsJacob Keller3-18/+27
commit fadead80fe4c033b5e514fcbadd20b55c4494112 upstream. Commit c503e63200c6 ("ice: Stop processing VF messages during teardown") introduced a driver state flag, ICE_VF_DEINIT_IN_PROGRESS, which is intended to prevent some issues with concurrently handling messages from VFs while tearing down the VFs. This change was motivated by crashes caused while tearing down and bringing up VFs in rapid succession. It turns out that the fix actually introduces issues with the VF driver caused because the PF no longer responds to any messages sent by the VF during its .remove routine. This results in the VF potentially removing its DMA memory before the PF has shut down the device queues. Additionally, the fix doesn't actually resolve concurrency issues within the ice driver. It is possible for a VF to initiate a reset just prior to the ice driver removing VFs. This can result in the remove task concurrently operating while the VF is being reset. This results in similar memory corruption and panics purportedly fixed by that commit. Fix this concurrency at its root by protecting both the reset and removal flows using the existing VF cfg_lock. This ensures that we cannot remove the VF while any outstanding critical tasks such as a virtchnl message or a reset are occurring. This locking change also fixes the root cause originally fixed by commit c503e63200c6 ("ice: Stop processing VF messages during teardown"), so we can simply revert it. Note that I kept these two changes together because simply reverting the original commit alone would leave the driver vulnerable to worse race conditions. Fixes: c503e63200c6 ("ice: Stop processing VF messages during teardown") Cc: <stable@vger.kernel.org> # e6ba5273d4ed: ice: Fix race conditions between virtchnl handling and VF ndo ops Cc: <stable@vger.kernel.org> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02ice: Fix race conditions between virtchnl handling and VF ndo opsBrett Creeley2-0/+30
commit e6ba5273d4ede03d075d7a116b8edad1f6115f4d upstream. The VF can be configured via the PF's ndo ops at the same time the PF is receiving/handling virtchnl messages. This has many issues, with one of them being the ndo op could be actively resetting a VF (i.e. resetting it to the default state and deleting/re-adding the VF's VSI) while a virtchnl message is being handled. The following error was seen because a VF ndo op was used to change a VF's trust setting while the VIRTCHNL_OP_CONFIG_VSI_QUEUES was ongoing: [35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM [35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5 [35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6 Fix this by making sure the virtchnl handling and VF ndo ops that trigger VF resets cannot run concurrently. This is done by adding a struct mutex cfg_lock to each VF structure. For VF ndo ops, the mutex will be locked around the critical operations and VFR. Since the ndo ops will trigger a VFR, the virtchnl thread will use mutex_trylock(). This is done because if any other thread (i.e. VF ndo op) has the mutex, then that means the current VF message being handled is no longer valid, so just ignore it. This issue can be seen using the following commands: for i in {0..50}; do rmmod ice modprobe ice sleep 1 echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs ip link set ens785f1 vf 0 trust on ip link set ens785f0 vf 0 trust on sleep 2 echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs sleep 1 echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs ip link set ens785f1 vf 0 trust on ip link set ens785f0 vf 0 trust on done Fixes: 7c710869d64e ("ice: Add handlers for VF netdevice operations") Cc: <stable@vger.kernel.org> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> [I had to fix the cherry-pick manually as the patch added a line around some context that was missing.] Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02ice: initialize local variable 'tlv'Tom Rix1-1/+1
commit 5950bdc88dd1d158f2845fdff8fb1de86476806c upstream. Clang static analysis reports this issues ice_common.c:5008:21: warning: The left expression of the compound assignment is an uninitialized value. The computed value will also be garbage ldo->phy_type_low |= ((u64)buf << (i * 16)); ~~~~~~~~~~~~~~~~~ ^ When called from ice_cfg_phy_fec() ldo is the uninitialized local variable tlv. So initialize. Fixes: ea78ce4dab05 ("ice: add link lenient and default override support") Signed-off-by: Tom Rix <trix@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02ice: check the return of ice_ptp_gettimex64Tom Rix1-1/+4
commit ed22d9c8d128293fc7b0b086c7d3654bcb99a8dd upstream. Clang static analysis reports this issue time64.h:69:50: warning: The left operand of '+' is a garbage value set_normalized_timespec64(&ts_delta, lhs.tv_sec + rhs.tv_sec, ~~~~~~~~~~ ^ In ice_ptp_adjtime_nonatomic(), the timespec64 variable 'now' is set by ice_ptp_gettimex64(). This function can fail with -EBUSY, so 'now' can have a gargbage value. So check the return. Fixes: 06c16d89d2cb ("ice: register 1588 PTP clock device object for E810 devices") Signed-off-by: Tom Rix <trix@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23ice: enable parsing IPSEC SPI headers for RSSJesse Brandeburg1-0/+6
commit 86006f996346e8a5a1ea80637ec949ceeea4ecbc upstream. The COMMS package can enable the hardware parser to recognize IPSEC frames with ESP header and SPI identifier. If this package is available and configured for loading in /lib/firmware, then the driver will succeed in enabling this protocol type for RSS. This in turn allows the hardware to hash over the SPI and use it to pick a consistent receive queue for the same secure flow. Without this all traffic is steered to the same queue for multiple traffic threads from the same IP address. For that reason this is marked as a fix, as the driver supports the model, but it wasn't enabled. If the package is not available, adding this type will fail, but the failure is ignored on purpose as it has no negative affect. Fixes: c90ed40cefe1 ("ice: Enable writing hardware filtering tables") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-16ice: Avoid RTNL lock when re-creating auxiliary deviceDave Ertman2-1/+5
[ Upstream commit 5dbbbd01cbba831233c6ea9a3e6bfa133606d3c0 ] If a call to re-create the auxiliary device happens in a context that has already taken the RTNL lock, then the call flow that recreates auxiliary device can hang if there is another attempt to claim the RTNL lock by the auxiliary driver. To avoid this, any call to re-create auxiliary devices that comes from an source that is holding the RTNL lock (e.g. netdev notifier when interface exits a bond) should execute in a separate thread. To accomplish this, add a flag to the PF that will be evaluated in the service task and dealt with there. Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Reviewed-by: Jonathan Toppins <jtoppins@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16ice: Fix KASAN error in LAG NETDEV_UNREGISTER handlerDave Ertman1-6/+28
[ Upstream commit bea1898f65b9b7096cb4e73e97c83b94718f1fa1 ] Currently, the same handler is called for both a NETDEV_BONDING_INFO LAG unlink notification as for a NETDEV_UNREGISTER call. This is causing a problem though, since the netdev_notifier_info passed has a different structure depending on which event is passed. The problem manifests as a call trace from a BUG: KASAN stack-out-of-bounds error. Fix this by creating a handler specific to NETDEV_UNREGISTER that only is passed valid elements in the netdev_notifier_info struct for the NETDEV_UNREGISTER event. Also included is the removal of an unbalanced dev_put on the peer_netdev and related braces. Fixes: 6a8b357278f5 ("ice: Respond to a NETDEV_UNREGISTER event for LAG") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16ice: fix IPIP and SIT TSO offloadJesse Brandeburg2-8/+18
[ Upstream commit 46b699c50c0304cdbd725d7740073a7f9d5edb10 ] The driver was avoiding offload for IPIP (at least) frames due to parsing the inner header offsets incorrectly when trying to check lengths. This length check works for VXLAN frames but fails on IPIP frames because skb_transport_offset points to the inner header in IPIP frames, which meant the subtraction of transport_header from inner_network_header returns a negative value (-20). With the code before this patch, everything continued to work, but GSO was being used to segment, causing throughputs of 1.5Gb/s per thread. After this patch, throughput is more like 10Gb/s per thread for IPIP traffic. Fixes: e94d44786693 ("ice: Implement filter sync, NDO operations and bump version") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16ice: fix an error code in ice_cfg_phy_fec()Dan Carpenter1-1/+2
[ Upstream commit 21338d58736ef70eaae5fd75d567a358ff7902f9 ] Propagate the error code from ice_get_link_default_override() instead of returning success. Fixes: ea78ce4dab05 ("ice: add link lenient and default override support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-29ice: xsk: return xsk buffers back to pool when cleaning the ringMaciej Fijalkowski1-6/+6
[ Upstream commit afe8a3ba85ec2a6b6849367e25c06a2f8e0ddd05 ] Currently we only NULL the xdp_buff pointer in the internal SW ring but we never give it back to the xsk buffer pool. This means that buffers can be leaked out of the buff pool and never be used again. Add missing xsk_buff_free() call to the routine that is supposed to clean the entries that are left in the ring so that these buffers in the umem can be used by other sockets. Also, only go through the space that is actually left to be cleaned instead of a whole ring. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-29ice: Use xdp_buf instead of rx_buf for xsk zero-copyMagnus Karlsson2-39/+33
[ Upstream commit 57f7f8b6bc0bc80d94443f94fe5f21f266499a2b ] In order to use the new xsk batched buffer allocation interface, a pointer to an array of struct xsk_buff pointers need to be provided so that the function can put the result of the allocation there. In the ice driver, we already have a ring that stores pointers to xdp_buffs. This is only used for the xsk zero-copy driver and is a union with the structure that is used for the regular non zero-copy path. Unfortunately, that structure is larger than the xdp_buffs pointers which mean that there will be a stride (of 20 bytes) between each xdp_buff pointer. And feeding this into the xsk_buff_alloc_batch interface will not work since it assumes a regular array of xdp_buff pointers (each 8 bytes with 0 bytes in-between them on a 64-bit system). To fix this, remove the xdp_buff pointer from the rx_buf union and move it one step higher to the union above which only has pointers to arrays in it. This solves the problem and we can directly feed the SW ring of xdp_buff pointers straight into the allocation function in the next patch when that interface is used. This will improve performance. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-4-magnus.karlsson@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-22ice: Don't put stale timestamps in the skbKarol Kolacinski2-7/+10
[ Upstream commit 37e738b6fdb14529534dca441e0222313688fde3 ] The driver has to check if it does not accidentally put the timestamp in the SKB before previous timestamp gets overwritten. Timestamp values in the PHY are read only and do not get cleared except at hardware reset or when a new timestamp value is captured. The cached_tstamp field is used to detect the case where a new timestamp has not yet been captured, ensuring that we avoid sending stale timestamp data to the stack. Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-22ice: Use div64_u64 instead of div_u64 in adjfineKarol Kolacinski1-1/+1
[ Upstream commit 0013881c1145d36bf26165bb70fdd7560a5507a3 ] Change the division in ice_ptp_adjfine from div_u64 to div64_u64. div_u64 is used when the divisor is 32 bit but in this case incval is 64 bit and it caused incorrect calculations and incval adjustments. Fixes: 06c16d89d2cb ("ice: register 1588 PTP clock device object for E810 devices") Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-17ice: fix FDIR init missing when reset VFYahui Cao1-0/+2
[ Upstream commit f23ab04dd6f703e282bb2d51fe3ae14f4b88a628 ] When VF is being reset, ice_reset_vf() will be called and FDIR resource should be released and initialized again. Fixes: 1f7ea1cd6a37 ("ice: Enable FDIR Configure for AVF") Signed-off-by: Yahui Cao <yahui.cao@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-14ice: ignore dropped packets during initJesse Brandeburg1-0/+3
commit 28dc1b86f8ea9fd6f4c9e0b363db73ecabf84e22 upstream. If the hardware is constantly receiving unicast or broadcast packets during driver load, the device previously counted many GLV_RDPC (VSI dropped packets) events during init. This causes confusing dropped packet statistics during driver load. The dropped packets counter incrementing does stop once the driver finishes loading. Avoid this problem by baselining our statistics at the end of driver open instead of the end of probe. Fixes: cdedef59deb0 ("ice: Configure VSIs for Tx/Rx") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01ice: avoid bpf_prog refcount underflowMarta Plantykow1-1/+17
[ Upstream commit f65ee535df775a13a1046c0a0b2d72db342f8a5b ] Ice driver has the routines for managing XDP resources that are shared between ndo_bpf op and VSI rebuild flow. The latter takes place for example when user changes queue count on an interface via ethtool's set_channels(). There is an issue around the bpf_prog refcounting when VSI is being rebuilt - since ice_prepare_xdp_rings() is called with vsi->xdp_prog as an argument that is used later on by ice_vsi_assign_bpf_prog(), same bpf_prog pointers are swapped with each other. Then it is also interpreted as an 'old_prog' which in turn causes us to call bpf_prog_put on it that will decrement its refcount. Below splat can be interpreted in a way that due to zero refcount of a bpf_prog it is wiped out from the system while kernel still tries to refer to it: [ 481.069429] BUG: unable to handle page fault for address: ffffc9000640f038 [ 481.077390] #PF: supervisor read access in kernel mode [ 481.083335] #PF: error_code(0x0000) - not-present page [ 481.089276] PGD 100000067 P4D 100000067 PUD 1001cb067 PMD 106d2b067 PTE 0 [ 481.097141] Oops: 0000 [#1] PREEMPT SMP PTI [ 481.101980] CPU: 12 PID: 3339 Comm: sudo Tainted: G OE 5.15.0-rc5+ #1 [ 481.110840] Hardware name: Intel Corp. GRANTLEY/GRANTLEY, BIOS GRRFCRB1.86B.0276.D07.1605190235 05/19/2016 [ 481.122021] RIP: 0010:dev_xdp_prog_id+0x25/0x40 [ 481.127265] Code: 80 00 00 00 00 0f 1f 44 00 00 89 f6 48 c1 e6 04 48 01 fe 48 8b 86 98 08 00 00 48 85 c0 74 13 48 8b 50 18 31 c0 48 85 d2 74 07 <48> 8b 42 38 8b 40 20 c3 48 8b 96 90 08 00 00 eb e8 66 2e 0f 1f 84 [ 481.148991] RSP: 0018:ffffc90007b63868 EFLAGS: 00010286 [ 481.155034] RAX: 0000000000000000 RBX: ffff889080824000 RCX: 0000000000000000 [ 481.163278] RDX: ffffc9000640f000 RSI: ffff889080824010 RDI: ffff889080824000 [ 481.171527] RBP: ffff888107af7d00 R08: 0000000000000000 R09: ffff88810db5f6e0 [ 481.179776] R10: 0000000000000000 R11: ffff8890885b9988 R12: ffff88810db5f4bc [ 481.188026] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 481.196276] FS: 00007f5466d5bec0(0000) GS:ffff88903fb00000(0000) knlGS:0000000000000000 [ 481.205633] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 481.212279] CR2: ffffc9000640f038 CR3: 000000014429c006 CR4: 00000000003706e0 [ 481.220530] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 481.228771] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 481.237029] Call Trace: [ 481.239856] rtnl_fill_ifinfo+0x768/0x12e0 [ 481.244602] rtnl_dump_ifinfo+0x525/0x650 [ 481.249246] ? __alloc_skb+0xa5/0x280 [ 481.253484] netlink_dump+0x168/0x3c0 [ 481.257725] netlink_recvmsg+0x21e/0x3e0 [ 481.262263] ____sys_recvmsg+0x87/0x170 [ 481.266707] ? __might_fault+0x20/0x30 [ 481.271046] ? _copy_from_user+0x66/0xa0 [ 481.275591] ? iovec_from_user+0xf6/0x1c0 [ 481.280226] ___sys_recvmsg+0x82/0x100 [ 481.284566] ? sock_sendmsg+0x5e/0x60 [ 481.288791] ? __sys_sendto+0xee/0x150 [ 481.293129] __sys_recvmsg+0x56/0xa0 [ 481.297267] do_syscall_64+0x3b/0xc0 [ 481.301395] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 481.307238] RIP: 0033:0x7f5466f39617 [ 481.311373] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bd 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2f 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 481.342944] RSP: 002b:00007ffedc7f4308 EFLAGS: 00000246 ORIG_RAX: 000000000000002f [ 481.361783] RAX: ffffffffffffffda RBX: 00007ffedc7f5460 RCX: 00007f5466f39617 [ 481.380278] RDX: 0000000000000000 RSI: 00007ffedc7f5360 RDI: 0000000000000003 [ 481.398500] RBP: 00007ffedc7f53f0 R08: 0000000000000000 R09: 000055d556f04d50 [ 481.416463] R10: 0000000000000077 R11: 0000000000000246 R12: 00007ffedc7f5360 [ 481.434131] R13: 00007ffedc7f5350 R14: 00007ffedc7f5344 R15: 0000000000000e98 [ 481.451520] Modules linked in: ice(OE) af_packet binfmt_misc nls_iso8859_1 ipmi_ssif intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp mxm_wmi mei_me coretemp mei ipmi_si ipmi_msghandler wmi acpi_pad acpi_power_meter ip_tables x_tables autofs4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel ahci crypto_simd cryptd libahci lpc_ich [last unloaded: ice] [ 481.528558] CR2: ffffc9000640f038 [ 481.542041] ---[ end trace d1f24c9ecf5b61c1 ]--- Fix this by only calling ice_vsi_assign_bpf_prog() inside ice_prepare_xdp_rings() when current vsi->xdp_prog pointer is NULL. This way set_channels() flow will not attempt to swap the vsi->xdp_prog pointers with itself. Also, sprinkle around some comments that provide a reasoning about correlation between driver and kernel in terms of bpf_prog refcount. Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Marta Plantykow <marta.a.plantykow@intel.com> Co-developed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-01ice: fix vsi->txq_map sizingMaciej Fijalkowski1-2/+7
[ Upstream commit 792b2086584f25d84081a526beee80d103c2a913 ] The approach of having XDP queue per CPU regardless of user's setting exposed a hidden bug that could occur in case when Rx queue count differ from Tx queue count. Currently vsi->txq_map's size is equal to the doubled vsi->alloc_txq, which is not correct due to the fact that XDP rings were previously based on the Rx queue count. Below splat can be seen when ethtool -L is used and XDP rings are configured: [ 682.875339] BUG: kernel NULL pointer dereference, address: 000000000000000f [ 682.883403] #PF: supervisor read access in kernel mode [ 682.889345] #PF: error_code(0x0000) - not-present page [ 682.895289] PGD 0 P4D 0 [ 682.898218] Oops: 0000 [#1] PREEMPT SMP PTI [ 682.903055] CPU: 42 PID: 2878 Comm: ethtool Tainted: G OE 5.15.0-rc5+ #1 [ 682.912214] Hardware name: Intel Corp. GRANTLEY/GRANTLEY, BIOS GRRFCRB1.86B.0276.D07.1605190235 05/19/2016 [ 682.923380] RIP: 0010:devres_remove+0x44/0x130 [ 682.928527] Code: 49 89 f4 55 48 89 fd 4c 89 ff 53 48 83 ec 10 e8 92 b9 49 00 48 8b 9d a8 02 00 00 48 8d 8d a0 02 00 00 49 89 c2 48 39 cb 74 0f <4c> 3b 63 10 74 25 48 8b 5b 08 48 39 cb 75 f1 4c 89 ff 4c 89 d6 e8 [ 682.950237] RSP: 0018:ffffc90006a679f0 EFLAGS: 00010002 [ 682.956285] RAX: 0000000000000286 RBX: ffffffffffffffff RCX: ffff88908343a370 [ 682.964538] RDX: 0000000000000001 RSI: ffffffff81690d60 RDI: 0000000000000000 [ 682.972789] RBP: ffff88908343a0d0 R08: 0000000000000000 R09: 0000000000000000 [ 682.981040] R10: 0000000000000286 R11: 3fffffffffffffff R12: ffffffff81690d60 [ 682.989282] R13: ffffffff81690a00 R14: ffff8890819807a8 R15: ffff88908343a36c [ 682.997535] FS: 00007f08c7bfa740(0000) GS:ffff88a03fd00000(0000) knlGS:0000000000000000 [ 683.006910] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 683.013557] CR2: 000000000000000f CR3: 0000001080a66003 CR4: 00000000003706e0 [ 683.021819] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 683.030075] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 683.038336] Call Trace: [ 683.041167] devm_kfree+0x33/0x50 [ 683.045004] ice_vsi_free_arrays+0x5e/0xc0 [ice] [ 683.050380] ice_vsi_rebuild+0x4c8/0x750 [ice] [ 683.055543] ice_vsi_recfg_qs+0x9a/0x110 [ice] [ 683.060697] ice_set_channels+0x14f/0x290 [ice] [ 683.065962] ethnl_set_channels+0x333/0x3f0 [ 683.070807] genl_family_rcv_msg_doit+0xea/0x150 [ 683.076152] genl_rcv_msg+0xde/0x1d0 [ 683.080289] ? channels_prepare_data+0x60/0x60 [ 683.085432] ? genl_get_cmd+0xd0/0xd0 [ 683.089667] netlink_rcv_skb+0x50/0xf0 [ 683.094006] genl_rcv+0x24/0x40 [ 683.097638] netlink_unicast+0x239/0x340 [ 683.102177] netlink_sendmsg+0x22e/0x470 [ 683.106717] sock_sendmsg+0x5e/0x60 [ 683.110756] __sys_sendto+0xee/0x150 [ 683.114894] ? handle_mm_fault+0xd0/0x2a0 [ 683.119535] ? do_user_addr_fault+0x1f3/0x690 [ 683.134173] __x64_sys_sendto+0x25/0x30 [ 683.148231] do_syscall_64+0x3b/0xc0 [ 683.161992] entry_SYSCALL_64_after_hwframe+0x44/0xae Fix this by taking into account the value that num_possible_cpus() yields in addition to vsi->alloc_txq instead of doubling the latter. Fixes: efc2214b6047 ("ice: Add support for XDP") Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path") Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25ice: Delete always true check of PF pointerLeon Romanovsky1-3/+0
commit 2ff04286a9569675948f39cec2c6ad47c3584633 upstream. PF pointer is always valid when PCI core calls its .shutdown() and .remove() callbacks. There is no need to check it again. Fixes: 837f08fdecbe ("ice: Add basic driver framework for Intel(R) E800 Series") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25ice: Fix VF true promiscuous modeBrett Creeley2-43/+40
commit 1a8c7778bcde5981463a5b9f9b2caa44a327ff93 upstream. When a VF requests promiscuous mode and it's trusted and true promiscuous mode is enabled the PF driver attempts to enable unicast and/or multicast promiscuous mode filters based on the request. This is fine, but there are a couple issues with the current code. [1] The define to configure the unicast promiscuous mode mask also includes bits to configure the multicast promiscuous mode mask, which causes multicast to be set/cleared unintentionally. [2] All 4 cases for enable/disable unicast/multicast mode are not handled in the promiscuous mode message handler, which causes unexpected results regarding the current promiscuous mode settings. To fix [1] make sure any promiscuous mask defines include the correct bits for each of the promiscuous modes. To fix [2] make sure that all 4 cases are handled since there are 2 bits (FLAG_VF_UNICAST_PROMISC and FLAG_VF_MULTICAST_PROMISC) that can be either set or cleared. Also, since either unicast and/or multicast promiscuous configuration can fail, introduce two separate error values to handle each of these cases. Fixes: 01b5e89aab49 ("ice: Add VF promiscuous support") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18ice: Fix not stopping Tx queues for VFsBrett Creeley2-5/+3
[ Upstream commit b385cca47363316c6d9a74ae9db407bbc281f815 ] When a VF is removed and/or reset its Tx queues need to be stopped from the PF. This is done by calling the ice_dis_vf_qs() function, which calls ice_vsi_stop_lan_tx_rings(). Currently ice_dis_vf_qs() is protected by the VF state bit ICE_VF_STATE_QS_ENA. Unfortunately, this is causing the Tx queues to not be disabled in some cases and when the VF tries to re-enable/reconfigure its Tx queues over virtchnl the op is failing. This is because a VF can be reset and/or removed before the ICE_VF_STATE_QS_ENA bit is set, but the Tx queues were already configured via ice_vsi_cfg_single_txq() in the VIRTCHNL_OP_CONFIG_VSI_QUEUES op. However, the ICE_VF_STATE_QS_ENA bit is set on a successful VIRTCHNL_OP_ENABLE_QUEUES, which will always happen after the VIRTCHNL_OP_CONFIG_VSI_QUEUES op. This was causing the following error message when loading the ice driver, creating VFs, and modifying VF trust in an endless loop: [35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM [35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5 [35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6 Fix this by always calling ice_dis_vf_qs() and silencing the error message in ice_vsi_stop_tx_ring() since the calling code ignores the return anyway. Also, all other places that call ice_vsi_stop_tx_ring() catch the error, so this doesn't affect those flows since there was no change to the values the function returns. Other solutions were considered (i.e. tracking which VF queues had been "started/configured" in VIRTCHNL_OP_CONFIG_VSI_QUEUES, but it seemed more complicated than it was worth. This solution also brings in the chance for other unexpected conditions due to invalid state bit checks. So, the proposed solution seemed like the best option since there is no harm in failing to stop Tx queues that were never started. This issue can be seen using the following commands: for i in {0..50}; do rmmod ice modprobe ice sleep 1 echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs ip link set ens785f1 vf 0 trust on ip link set ens785f0 vf 0 trust on sleep 2 echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs sleep 1 echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs ip link set ens785f1 vf 0 trust on ip link set ens785f0 vf 0 trust on done Fixes: 77ca27c41705 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18ice: Fix replacing VF hardware MAC to existing MAC filterSylwester Dziedziuch1-5/+9
[ Upstream commit ce572a5b88d5ca6737b5e23da9892792fd708ad3 ] VF was not able to change its hardware MAC address in case the new address was already present in the MAC filter list. Change the handling of VF add mac request to not return if requested MAC address is already present on the list and check if its hardware MAC needs to be updated in this case. Fixes: ed4c068d46f6 ("ice: Enable ip link show on the PF to display VF unicast MAC(s)") Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18ice: Move devlink port to PF/VF structWojciech Drewek7-37/+103
[ Upstream commit 2ae0aa4758b0f4a247d45cb3bf01548a7f396751 ] Keeping devlink port inside VSI data structure causes some issues. Since VF VSI is released during reset that means that we have to unregister devlink port and register it again every time reset is triggered. With the new changes in devlink API it might cause deadlock issues. After calling devlink_port_register/devlink_port_unregister devlink API is going to lock rtnl_mutex. It's an issue when VF reset is triggered in netlink operation context (like setting VF MAC address or VLAN), because rtnl_lock is already taken by netlink. Another call of rtnl_lock from devlink API results in dead-lock. By moving devlink port to PF/VF we avoid creating/destroying it during reset. Since this patch, devlink ports are created during ice_probe, destroyed during ice_remove for PF and created during ice_repr_add, destroyed during ice_repr_rem for VF. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>