BMC/Intel-BMC/linux.git - Intel OpenBMC Linux kernel source tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2019-07-31	ice: Remove duplicate code in ice_alloc_rx_bufs	Brett Creeley	1	-11/+3
	Currently if the call to ice_alloc_mapped_page() fails we jump to the no_buf label, possibly call ice_release_rx_desc(), and return true indicating that there is more work to do. In the success case we just fall out of the while loop, possibly call ice_alloc_mapped_page(), and return false saying we exhausted cleaned_count. This flow can be improved by breaking if ice_alloc_mapped_page() fails and then the flow outside of the while loop is the same for the failure and success case. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31	ice: Add stats for Rx drops at the port level	Brett Creeley	2	-0/+7
	Currently we are not reporting dropped counts at the port level to ethtool or netlink. This was found when debugging Rx dropped issues and the total packets sent did not equal the total packets received minus the rx_dropped, which was very confusing. To determine dropped counts at the port level we need to read the PRTRPB_RDPC register. To fix reporting we will store the dropped counts in the PF's rx_discards. This will be reported to netlink by storing it in the PF VSI's rx_missed_errors signaling that the receiver missed the packet. Also, we will report this to ethtool in the rx_dropped.nic field. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31	ice: Update number of VF queue before setting VSI resources	Akeem G Abodunrin	1	-5/+5
	In case there is a request from a VF to change its number of queues, and the request was successful, we need to update number of queues configured on the VF before updating corresponding VSI for that VF, especially LAN Tx queue tree and TC update, otherwise, we would continued to use old value of vf->num_vf_qs for allocated Tx/Rx queues... Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31	ice: Set up Tx scheduling tree based on alloc VSI Tx queues	Akeem G Abodunrin	1	-3/+3
	This patch uses allocated number of Tx queues per VSI to set up its scheduling tree instead of using total number of available Tx queues. Only PF VSIs have total number of allocated Tx queues equal to number of available Tx queues, other VSIs have different number of queues configured. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31	ice: Only bump Rx tail and release buffers once per napi_poll	Brett Creeley	1	-15/+27
	Currently we bump the Rx tail and release/give buffers to hardware every 16 descriptors. This causes us to bump Rx tail up to 4 times per napi_poll call. Also we are always bumping tail on an odd index and this is a problem because hardware ignores the lower 3 bits in the QRX_TAIL register. This is making it so hardware sees tail bumps only every 8 descriptors. Instead lets only bump Rx tail once per napi_poll if the value aligns with hardware's expectations of the lower 3 bits being cleared. Also only release/give Rx buffers once per napi_poll call. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31	ice: Disable VFs until reset is completed	Akeem G Abodunrin	1	-0/+5
	This patch adds code to clear VFs enable status until reset is completed, and Tx/Rx rings are setup. Without this patch, the code flow request Tx queues to be disabled after reset, especially PFR - where VF VSI Tx rings have already been wiped off in the NVM and result to adminq error based on the call to disable Tx LAN queue in ice_reset_all_vfs function call. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31	ice: Do not configure port with no media	Tony Nguyen	2	-82/+158
	The firmware reports an error when trying to configure a port with no media. Instead of always configuring the port, check for media before attempting to configure it. In the absence of media, turn off link and poll for media to become available before re-enabling link. Move ice_force_phys_link_state() up to avoid forward declaration. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31	ice: separate out control queue lock creation	Jacob Keller	3	-29/+91
	The ice_init_all_ctrlq and ice_shutdown_all_ctrlq functions create and destroy the locks used to protect the send and receive process of each control queue. This is problematic, as the driver may use these functions to shutdown and re-initialize the control queues at run time. For example, it may do this in response to a device reset. If the driver failed to recover from a reset, it might leave the control queues offline. In this case, the locks will no longer be initialized. A later call to ice_sq_send_cmd will then attempt to acquire a lock that has been destroyed. It is incorrect behavior to access a lock that has been destroyed. Indeed, ice_aq_send_cmd already tries to avoid accessing an offline control queue, but the check occurs inside the lock. The root of the problem is that the locks are destroyed at run time. Modify ice_init_all_ctrlq and ice_shutdown_all_ctrlq such that they no longer create or destroy the locks. Introduce new functions, ice_create_all_ctrlq and ice_destroy_all_ctrlq. Call these functions in ice_init_hw and ice_deinit_hw. Now, the control queue locks will remain valid for the life of the driver, and will not be destroyed until the driver unloads. This also allows removing a duplicate check of the sq.count and rq.count values when shutting down the controlqs. The ice_shutdown_ctrlq function already checks this value under the lock. Previously commit dec64ff10ed9 ("ice: use [sr]q.count when checking if queue is initialized") needed this check to happen outside the lock, because it prevented duplicate attempts at destroying the locks. The driver may now safely use ice_init_all_ctrlq and ice_shutdown_all_ctrlq while handling reset events, without causing the locks to be invalid. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31	ice: Always set prefena when configuring an Rx queue	Brett Creeley	2	-1/+9
	Currently we are always setting prefena to 0. This is causing the hardware to only fetch descriptors when there are none free in the cache for a received packet instead of prefetching when it has used the last descriptor regardless of incoming packets. Fix this by allowing the hardware to prefetch Rx descriptors. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31	ice: Move vector base setup to PF VSI	Tony Nguyen	1	-4/+4
	When interrupt tracking was refactored, during rebuild, the call to ice_vsi_setup_vector_base() was inadvertently removed from the PF VSI instead of being removed from the VF VSI. During reset, the failure to properly setup the vector base generates a call trace. Correct this so that resets/rebuilds properly complete. Fixes: cbe66bfee6a0 ("ice: Refactor interrupt tracking") Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31	ice: track hardware stat registers past rollover	Jacob Keller	5	-130/+91
	Currently, ice_stat_update32 and ice_stat_update40 will limit the value of the software statistic to 32 or 40 bits wide, depending on which register is being read. This means that if a driver is running for a long time, the displayed software register values will roll over to zero at 40 bits or 32 bits. This occurs because the functions directly assign the difference between the previous value and current value of the hardware statistic. Instead, add this value to the current software statistic, and then update the previous value. In this way, each time ice_stat_update40 or ice_stat_update32 are called, they will increment the software tracking value by the difference of the hardware register from its last read. The software tracking value will correctly count up until it overflows a u64. The only requirement is that the ice_stat_update functions be called at least once each time the hardware register overflows. While we're fixing ice_stat_update40, modify it to use rd64 instead of two calls to rd32. Additionally, drop the now unnecessary hireg function parameter. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31	ice: add lp_advertising flow control support	Paul Greenwalt	1	-32/+72
	Add support for reporting link partner advertising when ETHTOOL_GLINKSETTINGS defined. Get pause param reports the Tx/Rx pause configured, and then ethtool issues ETHTOOL_GSET ioctl and ice_get_settings_link_up reports the negotiated Tx/Rx pause. Negotiated pause frame report per IEEE 802.3-2005 table 288-3. $ ethtool --show-pause ens6f0 Pause parameters for ens6f0: Autonegotiate: on RX: on TX: on RX negotiated: on TX negotiated: on $ ethtool ens6f0 Settings for ens6f0: Supported ports: [ FIBRE ] Supported link modes: 25000baseCR/Full Supported pause frame use: Symmetric Supports auto-negotiation: Yes Supported FEC modes: None BaseR RS Advertised link modes: 25000baseCR/Full Advertised pause frame use: Symmetric Receive-only Advertised auto-negotiation: Yes Advertised FEC modes: None BaseR RS Link partner advertised link modes: Not reported Link partner advertised pause frame use: Symmetric Link partner advertised auto-negotiation: Yes Link partner advertised FEC modes: Not reported Speed: 25000Mb/s Duplex: Full Port: Direct Attach Copper PHYAD: 0 Transceiver: internal Auto-negotiation: on Supports Wake-on: g Wake-on: g Current message level: 0x00000007 (7) drv probe link Link detected: yes When ETHTOOL_GLINKSETTINGS is not defined, get pause param reports the negotiated Tx/Rx pause. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31	net: Use skb_frag_off accessors	Jonathan Lemon	3	-4/+4
	Use accessor functions for skb fragment's page_offset instead of direct references, in preparation for bvec conversion. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-25	Merge branch '1GbE' of ↵	David S. Miller	6	-14/+22
	git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 1GbE Intel Wired LAN Driver Updates 2019-07-24 This series contains updates to igc and e1000e client drivers only. Sasha provides a couple of cleanups to remove code that is not needed and reduce structure sizes. Updated the MAC reset flow to use the device reset flow instead of a port reset flow. Added addition device id's that will be supported. Kai-Heng Feng provides a workaround for a possible stalled packet issue in our ICH devices due to a clock recovery from the PCH being too slow. v2: removed the last patch in the series that supposedly fixed a MAC/PHY de-sync potential issue while waiting for additional information from hardware engineers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-25	net/ixgbevf: fix a compilation error of skb_frag_t	Qian Cai	1	-2/+5
	The linux-next commit "net: Rename skb_frag_t size to bv_len" [1] introduced a compilation error on powerpc as it forgot to deal with the renaming from "size" to "bv_len" for ixgbevf. [1] https://lore.kernel.org/netdev/20190723030831.11879-1-willy@infradead.org/T/#md052f1c7de965ccd1bdcb6f92e1990a52298eac5 In file included from ./include/linux/cache.h:5, from ./include/linux/printk.h:9, from ./include/linux/kernel.h:15, from ./include/linux/list.h:9, from ./include/linux/module.h:9, from drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:12: drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c: In function 'ixgbevf_xmit_frame_ring': drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:4138:51: error: 'skb_frag_t' {aka 'struct bio_vec'} has no member named 'size' count += TXD_USE_COUNT(skb_shinfo(skb)->frags[f].size); ^ ./include/uapi/linux/kernel.h:13:40: note: in definition of macro '__KERNEL_DIV_ROUND_UP' #define __KERNEL_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d)) ^ drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:4138:12: note: in expansion of macro 'TXD_USE_COUNT' count += TXD_USE_COUNT(skb_shinfo(skb)->frags[f].size); Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-24	e1000e: add workaround for possible stalled packet	Kai-Heng Feng	2	-1/+11
	This works around a possible stalled packet issue, which may occur due to clock recovery from the PCH being too slow, when the LAN is transitioning from K1 at 1G link speed. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204057 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-24	igc: Add more SKUs for i225 device	Sasha Neftin	3	-0/+9
	Add support for more SKUs. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-24	igc: Update the MAC reset flow	Sasha Neftin	2	-2/+2
	Use Device Reset flow instead of Port Reset flow. This flow performs a reset of the entire controller device, resulting in a state nearly approximating the state following a power-up reset or internal PCIe reset, except for system PCI configuration. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-24	igc: Remove the unused field from a device specification structure	Sasha Neftin	1	-5/+0
	This patch comes to clean up the device specification structure. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-24	igc: Remove the polarity field from a PHY information structure	Sasha Neftin	1	-6/+0
	Polarity and cable length fields is not applicable for the i225 device. This patch comes to clean up PHY information structure. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-23	igb: Use dev_get_drvdata where possible	Chuhong Yuan	1	-2/+1
	Instead of using to_pci_dev + pci_get_drvdata, use dev_get_drvdata to make code simpler. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-23	i40e: Use dev_get_drvdata	Chuhong Yuan	1	-5/+3
	Instead of using to_pci_dev + pci_get_drvdata, use dev_get_drvdata to make code simpler. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-23	fm10k: Use dev_get_drvdata	Chuhong Yuan	1	-2/+2
	Instead of using to_pci_dev + pci_get_drvdata, use dev_get_drvdata to make code simpler. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-23	e1000e: Use dev_get_drvdata where possible	Chuhong Yuan	1	-4/+3
	Instead of using to_pci_dev + pci_get_drvdata, use dev_get_drvdata to make code simpler. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-23	net: Use skb accessors in network drivers	Matthew Wilcox (Oracle)	14	-28/+28
	In preparation for unifying the skb_frag and bio_vec, use the fine accessors which already exist and use skb_frag_t instead of struct skb_frag_struct. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21	igc: Prefer pcie_capability_read_word()	Frederick Lawler	1	-8/+4
	Commit 8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability") added accessors for the PCI Express Capability so that drivers didn't need to be aware of differences between v1 and v2 of the PCI Express Capability. Replace pci_read_config_word() and pci_write_config_word() calls with pcie_capability_read_word() and pcie_capability_write_word(). Signed-off-by: Frederick Lawler <fred@fredlawl.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-15	docs: driver-model: move it to the driver-api book	Mauro Carvalho Chehab	1	-1/+1
	The audience for the Kernel driver-model is clearly Kernel hackers. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> # ice driver changes
2019-07-12	Merge tag 'driver-core-5.3-rc1' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core and debugfs updates from Greg KH: "Here is the "big" driver core and debugfs changes for 5.3-rc1 It's a lot of different patches, all across the tree due to some api changes and lots of debugfs cleanups. Other than the debugfs cleanups, in this set of changes we have: - bus iteration function cleanups - scripts/get_abi.pl tool to display and parse Documentation/ABI entries in a simple way - cleanups to Documenatation/ABI/ entries to make them parse easier due to typos and other minor things - default_attrs use for some ktype users - driver model documentation file conversions to .rst - compressed firmware file loading - deferred probe fixes All of these have been in linux-next for a while, with a bunch of merge issues that Stephen has been patient with me for" * tag 'driver-core-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (102 commits) debugfs: make error message a bit more verbose orangefs: fix build warning from debugfs cleanup patch ubifs: fix build warning after debugfs cleanup patch driver: core: Allow subsystems to continue deferring probe drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDT arch_topology: Remove error messages on out-of-memory conditions lib: notifier-error-inject: no need to check return value of debugfs_create functions swiotlb: no need to check return value of debugfs_create functions ceph: no need to check return value of debugfs_create functions sunrpc: no need to check return value of debugfs_create functions ubifs: no need to check return value of debugfs_create functions orangefs: no need to check return value of debugfs_create functions nfsd: no need to check return value of debugfs_create functions lib: 842: no need to check return value of debugfs_create functions debugfs: provide pr_fmt() macro debugfs: log errors when something goes wrong drivers: s390/cio: Fix compilation warning about const qualifiers drivers: Add generic helper to match by of_node driver_find_device: Unify the match function with class_find_device() bus_find_device: Unify the match callback with class_find_device ...
2019-07-10	net: flow_offload: rename tc_cls_flower_offload to flow_cls_offload	Pablo Neira Ayuso	3	-30/+30
	And any other existing fields in this structure that refer to tc. Specifically: * tc_cls_flower_offload_flow_rule() to flow_cls_offload_flow_rule(). * TC_CLSFLOWER_* to FLOW_CLS_. tc_cls_common_offload to tc_cls_common_offload. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-10	drivers: net: use flow block API	Pablo Neira Ayuso	4	-4/+16
	This patch updates flow_block_cb_setup_simple() to use the flow block API. Several drivers are also adjusted to use it. This patch introduces the per-driver list of flow blocks to account for blocks that are already in use. Remove tc_block_offload alias. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-10	net: flow_offload: add flow_block_cb_setup_simple()	Pablo Neira Ayuso	4	-93/+19
	Most drivers do the same thing to set up the flow block callbacks, this patch adds a helper function to do this. This preparation patch reduces the number of changes to adapt the existing drivers to use the flow block callback API. This new helper function takes a flow block list per-driver, which is set to NULL until this driver list is used. This patch also introduces the flow_block_command and flow_block_binder_type enumerations, which are renamed to use FLOW_BLOCK_* in follow up patches. There are three definitions (aliases) in order to reduce the number of updates in this patch, which go away once drivers are fully adapted to use this flow block API. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-04	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next	David S. Miller	2	-11/+16
	Daniel Borkmann says: ==================== pull-request: bpf-next 2019-07-03 The following pull-request contains BPF updates for your net-next tree. There is a minor merge conflict in mlx5 due to 8960b38932be ("linux/dim: Rename externally used net_dim members") which has been pulled into your tree in the meantime, but resolution seems not that bad ... getting current bpf-next out now before there's coming more on mlx5. ;) I'm Cc'ing Saeed just so he's aware of the resolution below: ** First conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c: <<<<<<< HEAD static int mlx5e_open_cq(struct mlx5e_channel c, struct dim_cq_moder moder, struct mlx5e_cq_param param, struct mlx5e_cq cq) ======= int mlx5e_open_cq(struct mlx5e_channel c, struct net_dim_cq_moder moder, struct mlx5e_cq_param param, struct mlx5e_cq cq) >>>>>>> e5a3e259ef239f443951d401db10db7d426c9497 Resolution is to take the second chunk and rename net_dim_cq_moder into dim_cq_moder. Also the signature for mlx5e_open_cq() in ... drivers/net/ethernet/mellanox/mlx5/core/en.h +977 ... and in mlx5e_open_xsk() ... drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +64 ... needs the same rename from net_dim_cq_moder into dim_cq_moder. ** Second conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c: <<<<<<< HEAD int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix)); struct dim_cq_moder icocq_moder = {0, 0}; struct net_device netdev = priv->netdev; struct mlx5e_channel c; unsigned int irq; ======= struct net_dim_cq_moder icocq_moder = {0, 0}; >>>>>>> e5a3e259ef239f443951d401db10db7d426c9497 Take the second chunk and rename net_dim_cq_moder into dim_cq_moder as well. Let me know if you run into any issues. Anyway, the main changes are: 1) Long-awaited AF_XDP support for mlx5e driver, from Maxim. 2) Addition of two new per-cgroup BPF hooks for getsockopt and setsockopt along with a new sockopt program type which allows more fine-grained pass/reject settings for containers. Also add a sock_ops callback that can be selectively enabled on a per-socket basis and is executed for every RTT to help tracking TCP statistics, both features from Stanislav. 3) Follow-up fix from loops in precision tracking which was not propagating precision marks and as a result verifier assumed that some branches were not taken and therefore wrongly removed as dead code, from Alexei. 4) Fix BPF cgroup release synchronization race which could lead to a double-free if a leaf's cgroup_bpf object is released and a new BPF program is attached to the one of ancestor cgroups in parallel, from Roman. 5) Support for bulking XDP_TX on veth devices which improves performance in some cases by around 9%, from Toshiaki. 6) Allow for lookups into BPF devmap and improve feedback when calling into bpf_redirect_map() as lookup is now performed right away in the helper itself, from Toke. 7) Add support for fq's Earliest Departure Time to the Host Bandwidth Manager (HBM) sample BPF program, from Lawrence. 8) Various cleanups and minor fixes all over the place from many others. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01	Merge branch '10GbE' of ↵	David S. Miller	21	-145/+686
	git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2019-06-28 This series contains a smorgasbord of updates to many of the Intel drivers. Gustavo A. R. Silva updates the ice and iavf drivers to use the strcut_size() helper where possible. Miguel increases the pause and refresh time for flow control in the e1000e driver during reset for certain devices. Dann Frazier fixes a potential NULL pointer dereference in ixgbe driver when using non-IPSec enabled devices. Colin Ian King fixes a potential overflow during a shift in the ixgbe driver. Also fixes a potential NULL pointer dereference in the iavf driver by adding a check. Venkatesh Srinivas converts the e1000 driver to use dma_wmb() instead of wmb() for doorbell writes to avoid SFENCEs in the transmit and receive paths. Arjan updates the e1000e driver to improve boot time by over 100 msec by reducing the usleep ranges suring system startup. Artem updates the igb driver register dump in ethtool, first prepares the register dump for future additions of registers in the dump, then secondly, adds the RR2DCDELAY register to the dump. When dealing with time-sensitive networks, this register is helpful in determining your latency from the device to the ring. Alex fixes the ixgbevf driver to use the current cached link state, rather than trying to re-check the value from the PF. Harshitha adds support for MACVLAN offloads in i40e by using channels as MACVLAN interfaces. Detlev Casanova updates the e1000e driver to use delayed work instead of timers to run the watchdog. Vitaly fixes an issue in e1000e, where when disconnecting and reconnecting the physical cable connection, the NIC enters a DMoff state. This state causes a mismatch in link and duplexing, so check the PCIm function state and perform a PHY reset when in this state to resolve the issue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29	e1000e: PCIm function state support	Vitaly Lifshits	2	-1/+20
	Due to commit: 5d8682588605 ("[misc] mei: me: allow runtime pm for platform with D0i3") When disconnecting the cable and reconnecting it the NIC enters DMoff state. This caused wrong link indication and duplex mismatch. This bug is described in: https://bugzilla.redhat.com/show_bug.cgi?id=1689436 Checking PCIm function state and performing PHY reset after a timeout in watchdog task solves this issue. Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29	e1000e: Make watchdog use delayed work	Detlev Casanova	2	-27/+32
	Use delayed work instead of timers to run the watchdog of the e1000e driver. Simplify the code with one less middle function. Signed-off-by: Detlev Casanova <detlev.casanova@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29	i40e: Add macvlan support on i40e	Harshitha Ramamurthy	2	-2/+522
	This patch enables macvlan offloads for i40e. The idea is to use channels as macvlan interfaces. The channels are VSIs of type VMDQ. When the first macvlan is created, the maximum number of channels possible are created. From then on, as a macvlan interface is created, a macvlan filter is added to these already created channels (VSIs). This patch utilizes subordinate device traffic classes to make queue groups(channels) available for an upper device like a macvlan. Steps to configure macvlan offloads: 1. ethtool -K ethx l2-fwd-offload on 2. ip link add link ethx name macvlan1 type macvlan 3. ip addr add <address> dev macvlan1 4. ip link set macvlan1 up Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29	ixgbevf: Use cached link state instead of re-reading the value for ethtool	Alexander Duyck	1	-8/+2
	Change the ethtool link settings call to just read the cached state out of the adapter structure instead of trying to recheck the value from the PF. Doing this should prevent excessive reading of the mailbox. Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Reviewed-by: "Guilherme G. Piccoli" <gpiccoli@canonical.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29	iavf: fix dereference of null rx_buffer pointer	Colin Ian King	1	-2/+4
	A recent commit efa14c3985828d ("iavf: allow null RX descriptors") added a null pointer sanity check on rx_buffer, however, rx_buffer is being dereferenced before that check, which implies a null pointer dereference bug can potentially occur. Fix this by only dereferencing rx_buffer until after the null pointer check. Addresses-Coverity: ("Dereference before null check") Signed-off-by: Colin Ian King <colin.king@canonical.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29	igb: add RR2DCDELAY to ethtool registers dump	Artem Bityutskiy	2	-1/+6
	This patch adds the RR2DCDELAY register to the ethtool registers dump. RR2DCDELAY exists on I210 and I211 Intel Gigabit Ethernet chips and it stands for "Read Request To Data Completion Delay". Here is how this register is described in the I210 datasheet: "This field captures the maximum PCIe split time in 16 ns units, which is the maximum delay between the read request to the first data completion. This is giving an estimation of the PCIe round trip time." In other words, whenever I210 reads from the host memory (e.g., fetches a descriptor from the ring), the chip measures every PCI DMA read transaction and captures the maximum value. So it ends up containing the longest DMA transaction time. This register is very useful for troubleshooting and research purposes. If you are dealing with time-sensitive networks, this register can help you get an idea of your "I210-to-ring" latency. This helps answering questions like "should I have PCIe ASPM enabled?" or "should I enable deep C-states?" on my system. It is safe to read this register at any point, reading it has no effect on the I210 chip functionality. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29	igb: minor ethool regdump amendment	Artem Bityutskiy	1	-35/+35
	This patch has no functional impact and it is just a preparation for the following patch. It removes an early return from the 'igb_get_regs()' function by moving the 82576-only registers dump into an "if" block. With this preparation, we can dump more non-82576 registers at the end of this function. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29	iavf: Fix up debug print macro	Jeff Kirsher	1	-3/+7
	This aligns the iavf_debug() macro with the other Intel drivers. Add the bus number, bus_id field to i40e_bus_info so output shows each physical port(i.e func) in following format: [[[[<domain>]:]<bus>]:][<slot>][.[<func>]] domains are numbered from 0 to ffff), bus (0-ff), slot (0-1f) and function (0-7). Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
2019-06-29	e1000e: Reduce boot time by tightening sleep ranges	Arjan van de Ven	7	-28/+28
	The e1000e driver is a great user of the usleep_range() API, and has nice ranges that in principle help power management. However the ranges that are used only during system startup are very long (and can add easily 100 msec to the boot time) while the power savings of such long ranges is irrelevant due to the one-off, boot only, nature of these functions. This patch shrinks some of the longest ranges to be shorter (while still using a power friendly 1 msec range); this saves 100msec+ of boot time on my BDW NUCs Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29	iavf: use struct_size() helper	Gustavo A. R. Silva	1	-21/+16
	Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes, in particular in the context in which this code is being used. So, replace code of the following form: sizeof(struct virtchnl_ether_addr_list) + (count * sizeof(struct virtchnl_ether_addr)) with: struct_size(veal, list, count) and so on... This code was detected with the help of Coccinelle. Signed-off-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29	e1000: Use dma_wmb() instead of wmb() before doorbell writes	Venkatesh Srinivas	1	-3/+3
	e1000 writes to doorbells to post transmit descriptors and fill the receive ring. After writing descriptors to memory but before writing to doorbells, use dma_wmb() rather than wmb(). wmb() is more heavyweight than necessary for a device to see descriptor writes. On x86, this avoids SFENCEs before doorbell writes in both the Tx and Rx paths. On ARM, this converts DSB ST -> DMB OSHST. Tested: 82576EB / x86; QEMU (qemu emulates an 8257x) Signed-off-by: Venkatesh Srinivas <venkateshs@google.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29	ixgbe: fix potential u32 overflow on shift	Colin Ian King	1	-10/+4
	The u32 variable rem is being shifted using u32 arithmetic however it is being passed to div_u64 that expects the expression to be a u64. The 32 bit shift may potentially overflow, so cast rem to a u64 before shifting to avoid this. Also remove comment about overflow. Addresses-Coverity: ("Unintentional integer overflow") Fixes: cd4583206990 ("ixgbe: implement support for SDP/PPS output on X550 hardware") Fixes: 68d9676fc04e ("ixgbe: fix PTP SDP pin setup on X540 hardware") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29	ixgbe: Avoid NULL pointer dereference with VF on non-IPsec hw	Dann Frazier	1	-0/+3
	An ipsec structure will not be allocated if the hardware does not support offload. Fixes the following Oops: [ 191.045452] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 191.054232] Mem abort info: [ 191.057014] ESR = 0x96000004 [ 191.060057] Exception class = DABT (current EL), IL = 32 bits [ 191.065963] SET = 0, FnV = 0 [ 191.069004] EA = 0, S1PTW = 0 [ 191.072132] Data abort info: [ 191.074999] ISV = 0, ISS = 0x00000004 [ 191.078822] CM = 0, WnR = 0 [ 191.081780] user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000043d9e467 [ 191.088382] [0000000000000000] pgd=0000000000000000 [ 191.093252] Internal error: Oops: 96000004 [#1] SMP [ 191.098119] Modules linked in: vhost_net vhost tap vfio_pci vfio_virqfd vfio_iommu_type1 vfio xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter devlink ebtables ip6table_filter ip6_tables iptable_filter bpfilter ipmi_ssif nls_iso8859_1 input_leds joydev ipmi_si hns_roce_hw_v2 ipmi_devintf hns_roce ipmi_msghandler cppc_cpufreq sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 ses enclosure btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid raid6_pq libcrc32c raid1 raid0 multipath linear ixgbevf hibmc_drm ttm [ 191.168607] drm_kms_helper aes_ce_blk aes_ce_cipher syscopyarea crct10dif_ce sysfillrect ghash_ce qla2xxx sysimgblt sha2_ce sha256_arm64 hisi_sas_v3_hw fb_sys_fops sha1_ce uas nvme_fc mpt3sas ixgbe drm hisi_sas_main nvme_fabrics usb_storage hclge scsi_transport_fc ahci libsas hnae3 raid_class libahci xfrm_algo scsi_transport_sas mdio aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64 [ 191.202952] CPU: 94 PID: 0 Comm: swapper/94 Not tainted 4.19.0-rc1+ #11 [ 191.209553] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.20.01 04/26/2019 [ 191.218064] pstate: 20400089 (nzCv daIf +PAN -UAO) [ 191.222873] pc : ixgbe_ipsec_vf_clear+0x60/0xd0 [ixgbe] [ 191.228093] lr : ixgbe_msg_task+0x2d0/0x1088 [ixgbe] [ 191.233044] sp : ffff000009b3bcd0 [ 191.236346] x29: ffff000009b3bcd0 x28: 0000000000000000 [ 191.241647] x27: ffff000009628000 x26: 0000000000000000 [ 191.246946] x25: ffff803f652d7600 x24: 0000000000000004 [ 191.252246] x23: ffff803f6a718900 x22: 0000000000000000 [ 191.257546] x21: 0000000000000000 x20: 0000000000000000 [ 191.262845] x19: 0000000000000000 x18: 0000000000000000 [ 191.268144] x17: 0000000000000000 x16: 0000000000000000 [ 191.273443] x15: 0000000000000000 x14: 0000000100000026 [ 191.278742] x13: 0000000100000025 x12: ffff8a5f7fbe0df0 [ 191.284042] x11: 000000010000000b x10: 0000000000000040 [ 191.289341] x9 : 0000000000001100 x8 : ffff803f6a824fd8 [ 191.294640] x7 : ffff803f6a825098 x6 : 0000000000000001 [ 191.299939] x5 : ffff000000f0ffc0 x4 : 0000000000000000 [ 191.305238] x3 : ffff000028c00000 x2 : ffff803f652d7600 [ 191.310538] x1 : 0000000000000000 x0 : ffff000000f205f0 [ 191.315838] Process swapper/94 (pid: 0, stack limit = 0x00000000addfed5a) [ 191.322613] Call trace: [ 191.325055] ixgbe_ipsec_vf_clear+0x60/0xd0 [ixgbe] [ 191.329927] ixgbe_msg_task+0x2d0/0x1088 [ixgbe] [ 191.334536] ixgbe_msix_other+0x274/0x330 [ixgbe] [ 191.339233] __handle_irq_event_percpu+0x78/0x270 [ 191.343924] handle_irq_event_percpu+0x40/0x98 [ 191.348355] handle_irq_event+0x50/0xa8 [ 191.352180] handle_fasteoi_irq+0xbc/0x148 [ 191.356263] generic_handle_irq+0x34/0x50 [ 191.360259] __handle_domain_irq+0x68/0xc0 [ 191.364343] gic_handle_irq+0x84/0x180 [ 191.368079] el1_irq+0xe8/0x180 [ 191.371208] arch_cpu_idle+0x30/0x1a8 [ 191.374860] do_idle+0x1dc/0x2a0 [ 191.378077] cpu_startup_entry+0x2c/0x30 [ 191.381988] secondary_start_kernel+0x150/0x1e0 [ 191.386506] Code: 6b15003f 54000320 f1404a9f 54000060 (79400260) Fixes: eda0333ac2930 ("ixgbe: add VF IPsec management") Signed-off-by: Dann Frazier <dann.frazier@canonical.com> Acked-by: Shannon Nelson <snelson@pensando.io> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29	e1000e: Increase pause and refresh time	Miguel Bernal Marin	1	-2/+2
	Suggested-by: Tim Pepper <timothy.c.pepper@linux.intel.com> Signed-off-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com> Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29	ice: Use struct_size() helper	Gustavo A. R. Silva	1	-2/+2
	One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; size = sizeof(struct foo) + count * sizeof(struct boo); instance = alloc(size, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: size = struct_size(instance, entry, count); This code was detected with the help of Coccinelle. Signed-off-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29	igb: clear out skb->tstamp after reading the txtime	Vedang Patel	1	-0/+1
	If a packet which is utilizing the launchtime feature (via SO_TXTIME socket option) also requests the hardware transmit timestamp, the hardware timestamp is not delivered to the userspace. This is because the value in skb->tstamp is mistaken as the software timestamp. Applications, like ptp4l, request a hardware timestamp by setting the SOF_TIMESTAMPING_TX_HARDWARE socket option. Whenever a new timestamp is detected by the driver (this work is done in igb_ptp_tx_work() which calls igb_ptp_tx_hwtstamps() in igb_ptp.c[1]), it will queue the timestamp in the ERR_QUEUE for the userspace to read. When the userspace is ready, it will issue a recvmsg() call to collect this timestamp. The problem is in this recvmsg() call. If the skb->tstamp is not cleared out, it will be interpreted as a software timestamp and the hardware tx timestamp will not be successfully sent to the userspace. Look at skb_is_swtx_tstamp() and the callee function __sock_recv_timestamp() in net/socket.c for more details. Signed-off-by: Vedang Patel <vedang.patel@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27	xsk: Return the whole xdp_desc from xsk_umem_consume_tx	Maxim Mikityanskiy	2	-11/+16
	Some drivers want to access the data transmitted in order to implement acceleration features of the NICs. It is also useful in AF_XDP TX flow. Change the xsk_umem_consume_tx API to return the whole xdp_desc, that contains the data pointer, length and DMA address, instead of only the latter two. Adapt the implementation of i40e and ixgbe to this change. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>