summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-06-29iwlwifi: dbg_ini: abort region collection in case the size is 0Shahar S Matityahu1-20/+33
Allows to abort region collection in case the region size is 0. It is needed for future regions that their size might be 0. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: update CSI APIJohannes Berg1-3/+8
Update the CSI API to the new version supported by the firmware. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: dbg_ini: dump headers cleanupShahar S Matityahu3-67/+52
Unite dump memory ranges under a single struct and add a specific header for each type of memory. Also, maintain a single version to all dump structures. This cleanup is also needed for the future addition of FW notification regions and others. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: dbg: allow dump collection in case of an early errorShahar S Matityahu2-6/+22
Improve the robustness of the dump collection flow in case of an early error: 1. in iwl_trans_pcie_sync_nmi, disable and enable interrupts only if they were already enabled 2. attempt to initiate dump collection in iwl_fw_dbg_error_collect only if the device is enabled 3. check Tx command queue was already allocated before trying to collect it Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: iwl_mvm_tx_mpdu() must be called with BH disabledJiri Kosina1-1/+3
As iwl_mvm_tx_mpdu() is not disabling BH while obtaining iwl_mvm_sta->lock (which is being taken from BH context as well), it has to be always invoked with BH disabled. Make that clear in a comment. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29bpf: fix uapi bpf_prog_info fields alignmentBaruch Siach2-0/+2
Merge commit 1c8c5a9d38f60 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next") undid the fix from commit 36f9814a494 ("bpf: fix uapi hole for 32 bit compat applications") by taking the gpl_compatible 1-bit field definition from commit b85fab0e67b162 ("bpf: Add gpl_compatible flag to struct bpf_prog_info") as is. That breaks architectures with 16-bit alignment like m68k. Add 31-bit pad after gpl_compatible to restore alignment of following fields. Thanks to Dmitry V. Levin his analysis of this bug history. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Acked-by: Song Liu <songliubraving@fb.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-29Merge branch 'bpf-lookup-devmap'Daniel Borkmann9-159/+157
Toke Høiland-Jørgensen says: ==================== When using the bpf_redirect_map() helper to redirect packets from XDP, the eBPF program cannot currently know whether the redirect will succeed, which makes it impossible to gracefully handle errors. To properly fix this will probably require deeper changes to the way TX resources are allocated, but one thing that is fairly straight forward to fix is to allow lookups into devmaps, so programs can at least know when a redirect is *guaranteed* to fail because there is no entry in the map. Currently, programs work around this by keeping a shadow map of another type which indicates whether a map index is valid. This series contains two changes that are complementary ways to fix this issue: - Moving the map lookup into the bpf_redirect_map() helper (and caching the result), so the helper can return an error if no value is found in the map. This includes a refactoring of the devmap and cpumap code to not care about the index on enqueue. - Allowing regular lookups into devmaps from eBPF programs, using the read-only flag to make sure they don't change the values. The performance impact of the series is negligible, in the sense that I cannot measure it because the variance between test runs is higher than the difference pre/post series. Changelog: v6: - Factor out list handling in maps to a helper in list.h (new patch 1) - Rename variables in struct bpf_redirect_info (new patch 3 + patch 4) - Explain why we are clearing out the map in the info struct on lookup failure - Remove unneeded check for forwarding target in tracepoint macro v5: - Rebase on latest bpf-next. - Update documentation for bpf_redirect_map() with the new meaning of flags. v4: - Fix a few nits from Andrii - Lose the #defines in bpf.h and just compare the flags argument directly to XDP_TX in bpf_xdp_redirect_map(). v3: - Adopt Jonathan's idea of using the lower two bits of the flag value as the return code. - Always do the lookup, and cache the result for use in xdp_do_redirect(); to achieve this, refactor the devmap and cpumap code to get rid the bitmap for selecting which devices to flush. v2: - For patch 1, make it clear that the change works for any map type. - For patch 2, just use the new BPF_F_RDONLY_PROG flag to make the return value read-only. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-29devmap: Allow map lookups from eBPFToke Høiland-Jørgensen2-5/+7
We don't currently allow lookups into a devmap from eBPF, because the map lookup returns a pointer directly to the dev->ifindex, which shouldn't be modifiable from eBPF. However, being able to do lookups in devmaps is useful to know (e.g.) whether forwarding to a specific interface is enabled. Currently, programs work around this by keeping a shadow map of another type which indicates whether a map index is valid. Since we now have a flag to make maps read-only from the eBPF side, we can simply lift the lookup restriction if we make sure this flag is always set. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-29bpf_xdp_redirect_map: Perform map lookup in eBPF helperToke Høiland-Jørgensen4-19/+26
The bpf_redirect_map() helper used by XDP programs doesn't return any indication of whether it can successfully redirect to the map index it was given. Instead, BPF programs have to track this themselves, leading to programs using duplicate maps to track which entries are populated in the devmap. This patch fixes this by moving the map lookup into the bpf_redirect_map() helper, which makes it possible to return failure to the eBPF program. The lower bits of the flags argument is used as the return code, which means that existing users who pass a '0' flag argument will get XDP_ABORTED. With this, a BPF program can check the return code from the helper call and react by, for instance, substituting a different redirect. This works for any type of map used for redirect. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-29devmap: Rename ifindex member in bpf_redirect_infoToke Høiland-Jørgensen2-14/+14
The bpf_redirect_info struct has an 'ifindex' member which was named back when the redirects could only target egress interfaces. Now that we can also redirect to sockets and CPUs, this is a bit misleading, so rename the member to tgt_index. Reorder the struct members so we can have 'tgt_index' and 'tgt_value' next to each other in a subsequent patch. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-29devmap/cpumap: Use flush list instead of bitmapToke Høiland-Jørgensen3-119/+95
The socket map uses a linked list instead of a bitmap to keep track of which entries to flush. Do the same for devmap and cpumap, as this means we don't have to care about the map index when enqueueing things into the map (and so we can cache the map lookup). Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-29xskmap: Move non-standard list manipulation to helperToke Høiland-Jørgensen2-2/+15
Add a helper in list.h for the non-standard way of clearing a list that is used in xskmap. This makes it easier to reuse it in the other map types, and also makes sure this usage is not forgotten in any list refactorings in the future. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-29selftests/bpf: fix -Wstrict-aliasing in test_sockopt_sk.cStanislav Fomichev1-27/+24
Let's use union with u8[4] and u32 members for sockopt buffer, that should fix any possible aliasing issues. test_sockopt_sk.c: In function ‘getsetsockopt’: test_sockopt_sk.c:115:2: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] if (*(__u32 *)buf != 0x55AA*2) { ^~ test_sockopt_sk.c:116:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] log_err("Unexpected getsockopt(SO_SNDBUF) 0x%x != 0x55AA*2", ^~~~~~~ Fixes: 8a027dc0d8f5 ("selftests/bpf: add sockopt test that exercises sk helpers") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-29net/mlx5e: Disallow tc redirect offload cases we don't supportPaul Blakey3-5/+24
After changing the parent_id to be the same for both NICs of same the hardware device, netdev_port_same_parent_id now returns true for more cases (all the lower devices in the hierarchy are on the same hardware device). If merged eswitch isn't enabled, these cases aren't supported, so disallow them. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-29net/mlx5e: Expose same physical switch_id for all representorsPaul Blakey1-20/+9
Report system_image_guid as the E-Switch switch_id, this ensures that when a NIC contains multiple PCI functions and which has merged eswitch capability, all representors from multiple PFs publish same switch_id. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-29net/mlx5e: Don't refresh TIRs when updating representor SQsGavi Teitz5-4/+21
Refreshing TIRs is done in order to update the TIRs with the current state of SQs in the transport domain, so that the TIRs can filter out undesired self-loopback packets based on the source SQ of the packet. Representor TIRs will only receive packets that originate from their associated vport, due to dedicated steering, and therefore will never receive self-loopback packets, whose source vport will be the vport of the E-Switch manager, and therefore not the vport associated with the representor. As such, it is not necessary to refresh the representors' TIRs, since self-loopback packets can't reach them. Since representors only exist in switchdev mode, and there is no scenario in which a representor will exist in the transport domain alongside a non-representor, it is not necessary to refresh the transport domain's TIRs upon changing the state of a representor's queues. Therefore, do not refresh TIRs upon such a change. Achieve this by adding an update_rx callback to the mlx5e_profile, which refreshes TIRs for non-representors and does nothing for representors, and replace instances of mlx5e_refresh_tirs() upon changing the state of the queues with update_rx(). Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-29net/mlx5e: reduce stack usage in mlx5_eswitch_termtbl_createArnd Bergmann3-12/+12
Putting an empty 'mlx5_flow_spec' structure on the stack is a bit wasteful and causes a warning on 32-bit architectures when building with clang -fsanitize-coverage: drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c: In function 'mlx5_eswitch_termtbl_create': drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c:90:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] Since the structure is never written to, we can statically allocate it to avoid the stack usage. To be on the safe side, mark all subsequent function arguments that we pass it into as 'const' as well. Fixes: 10caabdaad5a ("net/mlx5e: Use termination table for VLAN push actions") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-29net/mlx5e: Set drvinfo in generic mannerParav Pandit1-1/+1
Consider PCI and non PCI device types while setting device name in get_drvinfo() callback using existing generic device. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-29net/mlx5e: Correct phys_port_name for PF portParav Pandit1-0/+2
Currently PF phys_port_name is named as pfNvf-1 as vport number for PF vport is 65535. Correct PF's phys_port name as agreed upon name as pfN. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-29net/mlx5e: Report netdevice MPLS featuresAriel Levkovich1-0/+5
Set supported device features in the netdevice MPLS features mask. This will enable HW checksumming and TSO for MPLS tagged traffic. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-29net/mlx5e: Move to HW checksumming advertisingAriel Levkovich1-4/+2
This patch changes the way the driver advertises its checksum offload capabilities within the net device features bit mask. Instead of advertising protocol specific checksumming capabilities which are limited today to IPv4 and IPv6, we move to reporing generic HW checksumming capabilities. This will allow the network stack to let mlx5 device offload checksum for cases where the IP header is encapsulated within another protocol and the skb->protocol doesn't indicate one of the IP versions protocol, specifically in the case of MPLS label encapsulating the IP header and the skb->protocol indiciates MPLS ethertype rather than IP. Moving the HW_CSUM reporting is required in the basic net device hw features mask and also in the extensions (vlan and encpasulation features) since the extensions are always multiplied by the basic features set during the packet's traversal through the stack's tx flow. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-29net/mlx5: MPFS, Allow adding the same MAC more than onceGavi Teitz1-1/+6
Remove the limitation preventing adding a vport's MAC address to the Multi-Physical Function Switch (MPFS) more than once per E-switch, as there is no difference in the MPFS if an address is being used by an E-switch more than once. This allows the E-switch to have multiple vports with the same MAC address, allowing vports to be classified by VLAN id instead of by MAC if desired. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-29net/mlx5: MPFS, Cleanup add MAC flowGavi Teitz1-11/+15
Unify and isolate the error handling flow in mlx5_mpfs_add_mac(), removing code duplication. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-29Merge branch 'mlx5-next' of ↵Saeed Mahameed33-630/+1367
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Misc updates from mlx5-next branch: 1) E-Switch vport metadata support for source vport matching 2) Convert mkey_table to XArray 3) Shared IRQs and to use single IRQ for all async EQs Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-29e1000e: PCIm function state supportVitaly Lifshits2-1/+20
Due to commit: 5d8682588605 ("[misc] mei: me: allow runtime pm for platform with D0i3") When disconnecting the cable and reconnecting it the NIC enters DMoff state. This caused wrong link indication and duplex mismatch. This bug is described in: https://bugzilla.redhat.com/show_bug.cgi?id=1689436 Checking PCIm function state and performing PHY reset after a timeout in watchdog task solves this issue. Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29e1000e: Make watchdog use delayed workDetlev Casanova2-27/+32
Use delayed work instead of timers to run the watchdog of the e1000e driver. Simplify the code with one less middle function. Signed-off-by: Detlev Casanova <detlev.casanova@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29i40e: Add macvlan support on i40eHarshitha Ramamurthy2-2/+522
This patch enables macvlan offloads for i40e. The idea is to use channels as macvlan interfaces. The channels are VSIs of type VMDQ. When the first macvlan is created, the maximum number of channels possible are created. From then on, as a macvlan interface is created, a macvlan filter is added to these already created channels (VSIs). This patch utilizes subordinate device traffic classes to make queue groups(channels) available for an upper device like a macvlan. Steps to configure macvlan offloads: 1. ethtool -K ethx l2-fwd-offload on 2. ip link add link ethx name macvlan1 type macvlan 3. ip addr add <address> dev macvlan1 4. ip link set macvlan1 up Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29ixgbevf: Use cached link state instead of re-reading the value for ethtoolAlexander Duyck1-8/+2
Change the ethtool link settings call to just read the cached state out of the adapter structure instead of trying to recheck the value from the PF. Doing this should prevent excessive reading of the mailbox. Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Reviewed-by: "Guilherme G. Piccoli" <gpiccoli@canonical.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29iavf: fix dereference of null rx_buffer pointerColin Ian King1-2/+4
A recent commit efa14c3985828d ("iavf: allow null RX descriptors") added a null pointer sanity check on rx_buffer, however, rx_buffer is being dereferenced before that check, which implies a null pointer dereference bug can potentially occur. Fix this by only dereferencing rx_buffer until after the null pointer check. Addresses-Coverity: ("Dereference before null check") Signed-off-by: Colin Ian King <colin.king@canonical.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29igb: add RR2DCDELAY to ethtool registers dumpArtem Bityutskiy2-1/+6
This patch adds the RR2DCDELAY register to the ethtool registers dump. RR2DCDELAY exists on I210 and I211 Intel Gigabit Ethernet chips and it stands for "Read Request To Data Completion Delay". Here is how this register is described in the I210 datasheet: "This field captures the maximum PCIe split time in 16 ns units, which is the maximum delay between the read request to the first data completion. This is giving an estimation of the PCIe round trip time." In other words, whenever I210 reads from the host memory (e.g., fetches a descriptor from the ring), the chip measures every PCI DMA read transaction and captures the maximum value. So it ends up containing the longest DMA transaction time. This register is very useful for troubleshooting and research purposes. If you are dealing with time-sensitive networks, this register can help you get an idea of your "I210-to-ring" latency. This helps answering questions like "should I have PCIe ASPM enabled?" or "should I enable deep C-states?" on my system. It is safe to read this register at any point, reading it has no effect on the I210 chip functionality. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29igb: minor ethool regdump amendmentArtem Bityutskiy1-35/+35
This patch has no functional impact and it is just a preparation for the following patch. It removes an early return from the 'igb_get_regs()' function by moving the 82576-only registers dump into an "if" block. With this preparation, we can dump more non-82576 registers at the end of this function. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29iavf: Fix up debug print macroJeff Kirsher1-3/+7
This aligns the iavf_debug() macro with the other Intel drivers. Add the bus number, bus_id field to i40e_bus_info so output shows each physical port(i.e func) in following format: [[[[<domain>]:]<bus>]:][<slot>][.[<func>]] domains are numbered from 0 to ffff), bus (0-ff), slot (0-1f) and function (0-7). Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
2019-06-29e1000e: Reduce boot time by tightening sleep rangesArjan van de Ven7-28/+28
The e1000e driver is a great user of the usleep_range() API, and has nice ranges that in principle help power management. However the ranges that are used only during system startup are very long (and can add easily 100 msec to the boot time) while the power savings of such long ranges is irrelevant due to the one-off, boot only, nature of these functions. This patch shrinks some of the longest ranges to be shorter (while still using a power friendly 1 msec range); this saves 100msec+ of boot time on my BDW NUCs Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29iavf: use struct_size() helperGustavo A. R. Silva1-21/+16
Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes, in particular in the context in which this code is being used. So, replace code of the following form: sizeof(struct virtchnl_ether_addr_list) + (count * sizeof(struct virtchnl_ether_addr)) with: struct_size(veal, list, count) and so on... This code was detected with the help of Coccinelle. Signed-off-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29e1000: Use dma_wmb() instead of wmb() before doorbell writesVenkatesh Srinivas1-3/+3
e1000 writes to doorbells to post transmit descriptors and fill the receive ring. After writing descriptors to memory but before writing to doorbells, use dma_wmb() rather than wmb(). wmb() is more heavyweight than necessary for a device to see descriptor writes. On x86, this avoids SFENCEs before doorbell writes in both the Tx and Rx paths. On ARM, this converts DSB ST -> DMB OSHST. Tested: 82576EB / x86; QEMU (qemu emulates an 8257x) Signed-off-by: Venkatesh Srinivas <venkateshs@google.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29ixgbe: fix potential u32 overflow on shiftColin Ian King1-10/+4
The u32 variable rem is being shifted using u32 arithmetic however it is being passed to div_u64 that expects the expression to be a u64. The 32 bit shift may potentially overflow, so cast rem to a u64 before shifting to avoid this. Also remove comment about overflow. Addresses-Coverity: ("Unintentional integer overflow") Fixes: cd4583206990 ("ixgbe: implement support for SDP/PPS output on X550 hardware") Fixes: 68d9676fc04e ("ixgbe: fix PTP SDP pin setup on X540 hardware") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29ixgbe: Avoid NULL pointer dereference with VF on non-IPsec hwDann Frazier1-0/+3
An ipsec structure will not be allocated if the hardware does not support offload. Fixes the following Oops: [ 191.045452] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 191.054232] Mem abort info: [ 191.057014] ESR = 0x96000004 [ 191.060057] Exception class = DABT (current EL), IL = 32 bits [ 191.065963] SET = 0, FnV = 0 [ 191.069004] EA = 0, S1PTW = 0 [ 191.072132] Data abort info: [ 191.074999] ISV = 0, ISS = 0x00000004 [ 191.078822] CM = 0, WnR = 0 [ 191.081780] user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000043d9e467 [ 191.088382] [0000000000000000] pgd=0000000000000000 [ 191.093252] Internal error: Oops: 96000004 [#1] SMP [ 191.098119] Modules linked in: vhost_net vhost tap vfio_pci vfio_virqfd vfio_iommu_type1 vfio xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter devlink ebtables ip6table_filter ip6_tables iptable_filter bpfilter ipmi_ssif nls_iso8859_1 input_leds joydev ipmi_si hns_roce_hw_v2 ipmi_devintf hns_roce ipmi_msghandler cppc_cpufreq sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 ses enclosure btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid raid6_pq libcrc32c raid1 raid0 multipath linear ixgbevf hibmc_drm ttm [ 191.168607] drm_kms_helper aes_ce_blk aes_ce_cipher syscopyarea crct10dif_ce sysfillrect ghash_ce qla2xxx sysimgblt sha2_ce sha256_arm64 hisi_sas_v3_hw fb_sys_fops sha1_ce uas nvme_fc mpt3sas ixgbe drm hisi_sas_main nvme_fabrics usb_storage hclge scsi_transport_fc ahci libsas hnae3 raid_class libahci xfrm_algo scsi_transport_sas mdio aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64 [ 191.202952] CPU: 94 PID: 0 Comm: swapper/94 Not tainted 4.19.0-rc1+ #11 [ 191.209553] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.20.01 04/26/2019 [ 191.218064] pstate: 20400089 (nzCv daIf +PAN -UAO) [ 191.222873] pc : ixgbe_ipsec_vf_clear+0x60/0xd0 [ixgbe] [ 191.228093] lr : ixgbe_msg_task+0x2d0/0x1088 [ixgbe] [ 191.233044] sp : ffff000009b3bcd0 [ 191.236346] x29: ffff000009b3bcd0 x28: 0000000000000000 [ 191.241647] x27: ffff000009628000 x26: 0000000000000000 [ 191.246946] x25: ffff803f652d7600 x24: 0000000000000004 [ 191.252246] x23: ffff803f6a718900 x22: 0000000000000000 [ 191.257546] x21: 0000000000000000 x20: 0000000000000000 [ 191.262845] x19: 0000000000000000 x18: 0000000000000000 [ 191.268144] x17: 0000000000000000 x16: 0000000000000000 [ 191.273443] x15: 0000000000000000 x14: 0000000100000026 [ 191.278742] x13: 0000000100000025 x12: ffff8a5f7fbe0df0 [ 191.284042] x11: 000000010000000b x10: 0000000000000040 [ 191.289341] x9 : 0000000000001100 x8 : ffff803f6a824fd8 [ 191.294640] x7 : ffff803f6a825098 x6 : 0000000000000001 [ 191.299939] x5 : ffff000000f0ffc0 x4 : 0000000000000000 [ 191.305238] x3 : ffff000028c00000 x2 : ffff803f652d7600 [ 191.310538] x1 : 0000000000000000 x0 : ffff000000f205f0 [ 191.315838] Process swapper/94 (pid: 0, stack limit = 0x00000000addfed5a) [ 191.322613] Call trace: [ 191.325055] ixgbe_ipsec_vf_clear+0x60/0xd0 [ixgbe] [ 191.329927] ixgbe_msg_task+0x2d0/0x1088 [ixgbe] [ 191.334536] ixgbe_msix_other+0x274/0x330 [ixgbe] [ 191.339233] __handle_irq_event_percpu+0x78/0x270 [ 191.343924] handle_irq_event_percpu+0x40/0x98 [ 191.348355] handle_irq_event+0x50/0xa8 [ 191.352180] handle_fasteoi_irq+0xbc/0x148 [ 191.356263] generic_handle_irq+0x34/0x50 [ 191.360259] __handle_domain_irq+0x68/0xc0 [ 191.364343] gic_handle_irq+0x84/0x180 [ 191.368079] el1_irq+0xe8/0x180 [ 191.371208] arch_cpu_idle+0x30/0x1a8 [ 191.374860] do_idle+0x1dc/0x2a0 [ 191.378077] cpu_startup_entry+0x2c/0x30 [ 191.381988] secondary_start_kernel+0x150/0x1e0 [ 191.386506] Code: 6b15003f 54000320 f1404a9f 54000060 (79400260) Fixes: eda0333ac2930 ("ixgbe: add VF IPsec management") Signed-off-by: Dann Frazier <dann.frazier@canonical.com> Acked-by: Shannon Nelson <snelson@pensando.io> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29e1000e: Increase pause and refresh timeMiguel Bernal Marin1-2/+2
Suggested-by: Tim Pepper <timothy.c.pepper@linux.intel.com> Signed-off-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com> Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29ice: Use struct_size() helperGustavo A. R. Silva1-2/+2
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; size = sizeof(struct foo) + count * sizeof(struct boo); instance = alloc(size, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: size = struct_size(instance, entry, count); This code was detected with the help of Coccinelle. Signed-off-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-29Merge branch 'net-sched-Add-txtime-assist-support-for-taprio'David S. Miller4-36/+405
Vedang Patel says: ==================== net/sched: Add txtime-assist support for taprio. Changes in v6: - Use _BITUL() instead of BIT() in UAPI for etf. (patch #1) - Fix a bug reported by kbuild test bot in length_to_duration(). (patch #6) - Remove an unused function (get_cycle_start()). (Patch #6) Changes in v5: - Commit message improved for the igb patch (patch #1). - Fixed typo in commit message for etf patch (patch #2). Changes in v4: - Remove inline directive from functions in foo.c. - Fix spacing in pkt_sched.h (for etf patch). Changes in v3: - Simplify implementation for taprio flags. - txtime_delay can only be set if txtime-assist mode is enabled. - txtime_delay and flags will only be visible in tc output if set by user. - Minor changes in error reporting. Changes in v2: - Txtime-offload has now been renamed to txtime-assist mode. - Renamed the offload parameter to flags. - Removed the code which introduced the hardware offloading functionality. Original Cover letter (with above changes included) -------------------------------------------------- Currently, we are seeing packets being transmitted outside their timeslices. We can confirm that the packets are being dequeued at the right time. So, the delay is induced after the packet is dequeued, because taprio, without any offloading, has no control of when a packet is actually transmitted. In order to solve this, we are making use of the txtime feature provided by ETF qdisc. Hardware offloading needs to be supported by the ETF qdisc in order to take advantage of this feature. The taprio qdisc will assign txtime (in skb->tstamp) for all the packets which do not have the txtime allocated via the SO_TXTIME socket option. For the packets which already have SO_TXTIME set, taprio will validate whether the packet will be transmitted in the correct interval. In order to support this, the following parameters have been added: - flags (taprio): This is added in order to support different offloading modes which will be added in the future. - txtime-delay (taprio): This indicates the minimum time it will take for the packet to hit the wire after it reaches taprio_enqueue(). This is useful in determining whether we can transmit the packet in the remaining time if the gate corresponding to the packet is currently open. - skip_skb_check (ETF): ETF currently drops any packet which does not have the SO_TXTIME socket option set. This check can be skipped by specifying this option. Following is an example configuration: tc qdisc replace dev $IFACE parent root handle 100 taprio \\ num_tc 3 \\ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \\ queues 1@0 1@0 1@0 \\ base-time $BASE_TIME \\ sched-entry S 01 300000 \\ sched-entry S 02 300000 \\ sched-entry S 04 400000 \\ flags 0x1 \\ txtime-delay 200000 \\ clockid CLOCK_TAI tc qdisc replace dev $IFACE parent 100:1 etf \\ offload delta 200000 clockid CLOCK_TAI skip_skb_check Here, the "flags" parameter is indicating that the txtime-assist mode is enabled. Also, all the traffic classes have been assigned the same queue. This is to prevent the traffic classes in the lower priority queues from getting starved. Note that this configuration is specific to the i210 ethernet card. Other network cards where the hardware queues are given the same priority, might be able to utilize more than one queue. Following are some of the other highlights of the series: - Fix a bug where hardware timestamping and SO_TXTIME options cannot be used together. (Patch 1) - Introduces the skip_skb_check option. (Patch 2) - Make TxTime assist mode work with TCP packets (Patch 7). The following changes are recommended to be done in order to get the best performance from taprio in this mode: ip link set dev enp1s0 mtu 1514 ethtool -K eth0 gso off ethtool -K eth0 tso off ethtool --set-eee eth0 eee off ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29taprio: Adjust timestamps for TCP packetsVedang Patel1-1/+40
When the taprio qdisc is running in "txtime offload" mode, it will set the launchtime value (in skb->tstamp) for all the packets which do not have the SO_TXTIME socket option. But, the TCP packets already have this value set and it indicates the earliest departure time represented in CLOCK_MONOTONIC clock. We need to respect the timestamp set by the TCP subsystem. So, convert this time to the clock which taprio is using and ensure that the packet is not transmitted before the deadline set by TCP. Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29taprio: make clock reference conversions easierVedang Patel1-8/+22
Later in this series we will need to transform from CLOCK_MONOTONIC (used in TCP) to the clock reference used in TAPRIO. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29taprio: Add support for txtime-assist modeVedang Patel2-17/+328
Currently, we are seeing non-critical packets being transmitted outside of their timeslice. We can confirm that the packets are being dequeued at the right time. So, the delay is induced in the hardware side. The most likely reason is the hardware queues are starving the lower priority queues. In order to improve the performance of taprio, we will be making use of the txtime feature provided by the ETF qdisc. For all the packets which do not have the SO_TXTIME option set, taprio will set the transmit timestamp (set in skb->tstamp) in this mode. TAPrio Qdisc will ensure that the transmit time for the packet is set to when the gate is open. If SO_TXTIME is set, the TAPrio qdisc will validate whether the timestamp (in skb->tstamp) occurs when the gate corresponding to skb's traffic class is open. Following two parameters added to support this mode: - flags: used to enable txtime-assist mode. Will also be used to enable other modes (like hardware offloading) later. - txtime-delay: This indicates the minimum time it will take for the packet to hit the wire. This is useful in determining whether we can transmit the packet in the remaining time if the gate corresponding to the packet is currently open. An example configuration for enabling txtime-assist: tc qdisc replace dev eth0 parent root handle 100 taprio \\ num_tc 3 \\ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \\ queues 1@0 1@0 1@0 \\ base-time 1558653424279842568 \\ sched-entry S 01 300000 \\ sched-entry S 02 300000 \\ sched-entry S 04 400000 \\ flags 0x1 \\ txtime-delay 40000 \\ clockid CLOCK_TAI tc qdisc replace dev $IFACE parent 100:1 etf skip_sock_check \\ offload delta 200000 clockid CLOCK_TAI Note that all the traffic classes are mapped to the same queue. This is only possible in taprio when txtime-assist is enabled. Also, note that the ETF Qdisc is enabled with offload mode set. In this mode, if the packet's traffic class is open and the complete packet can be transmitted, taprio will try to transmit the packet immediately. This will be done by setting skb->tstamp to current_time + the time delta indicated in the txtime-delay parameter. This parameter indicates the time taken (in software) for packet to reach the network adapter. If the packet cannot be transmitted in the current interval or if the packet's traffic is not currently transmitting, the skb->tstamp is set to the next available timestamp value. This is tracked in the next_launchtime parameter in the struct sched_entry. The behaviour w.r.t admin and oper schedules is not changed from what is present in software mode. The transmit time is already known in advance. So, we do not need the HR timers to advance the schedule and wakeup the dequeue side of taprio. So, HR timer won't be run when this mode is enabled. Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29taprio: Remove inline directiveVedang Patel1-1/+1
Remove inline directive from length_to_duration(). We will let the compiler make the decisions. Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29taprio: calculate cycle_time when schedule is installedVedang Patel1-18/+11
cycle time for a particular schedule is calculated only when it is first installed. So, it makes sense to just calculate it once right after the 'cycle_time' parameter has been parsed and store it in cycle_time. Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29etf: Add skip_sock_checkVedang Patel2-0/+11
Currently, etf expects a socket with SO_TXTIME option set for each packet it encounters. So, it will drop all other packets. But, in the future commits we are planning to add functionality where tstamp value will be set by another qdisc. Also, some packets which are generated from within the kernel (e.g. ICMP packets) do not have any socket associated with them. So, this commit adds support for skip_sock_check. When this option is set, etf will skip checking for a socket and other associated options for all skbs. Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29etf: Don't use BIT() in UAPI headers.Vedang Patel1-2/+2
The BIT() macro isn't exported as part of the UAPI interface. So, the compile-test to ensure they are self contained fails. So, use _BITUL() instead. Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29igb: clear out skb->tstamp after reading the txtimeVedang Patel1-0/+1
If a packet which is utilizing the launchtime feature (via SO_TXTIME socket option) also requests the hardware transmit timestamp, the hardware timestamp is not delivered to the userspace. This is because the value in skb->tstamp is mistaken as the software timestamp. Applications, like ptp4l, request a hardware timestamp by setting the SOF_TIMESTAMPING_TX_HARDWARE socket option. Whenever a new timestamp is detected by the driver (this work is done in igb_ptp_tx_work() which calls igb_ptp_tx_hwtstamps() in igb_ptp.c[1]), it will queue the timestamp in the ERR_QUEUE for the userspace to read. When the userspace is ready, it will issue a recvmsg() call to collect this timestamp. The problem is in this recvmsg() call. If the skb->tstamp is not cleared out, it will be interpreted as a software timestamp and the hardware tx timestamp will not be successfully sent to the userspace. Look at skb_is_swtx_tstamp() and the callee function __sock_recv_timestamp() in net/socket.c for more details. Signed-off-by: Vedang Patel <vedang.patel@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29net: mvpp2: prs: Don't override the sign bit in SRAM parser shiftMaxime Chevallier1-1/+2
The Header Parser allows identifying various fields in the packet headers, used for various kind of filtering and classification steps. This is a re-entrant process, where the offset in the packet header depends on the previous lookup results. This offset is represented in the SRAM results of the TCAM, as a shift to be operated. This shift can be negative in some cases, such as in IPv6 parsing. This commit prevents overriding the sign bit when setting the shift value, which could cause instabilities when parsing IPv6 flows. Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Suggested-by: Alan Winkowski <walan@marvell.com> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29net: phylink: further documentation clarificationsRussell King1-2/+9
Clarify the validate() behaviour in a few cases which weren't mentioned in the documentation, but which are necessary for users to get the correct behaviour. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>