summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-03-30netfilter: nft_exthdr: fix endianness of tcp option castSergey Marinkevich1-5/+3
I got a problem on MIPS with Big-Endian is turned on: every time when NF trying to change TCP MSS it returns because of new.v16 was greater than old.v16. But real MSS was 1460 and my rule was like this: add rule table chain tcp option maxseg size set 1400 And 1400 is lesser that 1460, not greater. Later I founded that main causer is cast from u32 to __be16. Debugging: In example MSS = 1400(HEX: 0x578). Here is representation of each byte like it is in memory by addresses from left to right(e.g. [0x0 0x1 0x2 0x3]). LE — Little-Endian system, BE — Big-Endian, left column is type. LE BE u32: [78 05 00 00] [00 00 05 78] As you can see, u32 representation will be casted to u16 from different half of 4-byte address range. But actually nf_tables uses registers and store data of various size. Actually TCP MSS stored in 2 bytes. But registers are still u32 in definition: struct nft_regs { union { u32 data[20]; struct nft_verdict verdict; }; }; So, access like regs->data[priv->sreg] exactly u32. So, according to table presents above, per-byte representation of stored TCP MSS in register will be: LE BE (u32)regs->data[]: [78 05 00 00] [05 78 00 00] ^^ ^^ We see that register uses just half of u32 and other 2 bytes may be used for some another data. But in nft_exthdr_tcp_set_eval() it casted just like u32 -> __be16: new.v16 = src But u32 overfill __be16, so it get 2 low bytes. For clarity draw one more table(<xx xx> means that bytes will be used for cast). LE BE u32: [<78 05> 00 00] [00 00 <05 78>] (u32)regs->data[]: [<78 05> 00 00] [05 78 <00 00>] As you can see, for Little-Endian nothing changes, but for Big-endian we take the wrong half. In my case there is some other data instead of zeros, so new MSS was wrongly greater. For shooting this bug I used solution for ports ranges. Applying of this patch does not affect Little-Endian systems. Signed-off-by: Sergey Marinkevich <sergey.marinkevich@eltex-co.ru> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30netfilter: flowtable: add counter support in HW offloadwenxu1-0/+12
Store the conntrack counters to the conntrack entry in the HW flowtable offload. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30netfilter: conntrack: add nf_ct_acct_add()wenxu2-4/+14
Add nf_ct_acct_add function to update the conntrack counter with packets and bytes. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30netfilter: nf_tables: skip set types that do not support for expressionsPablo Neira Ayuso3-0/+7
The bitmap set does not support for expressions, skip it from the estimation step. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30netfilter: nft_dynset: validate set expression definitionPablo Neira Ayuso1-2/+7
If the global set expression definition mismatches the dynset expression, then bail out. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30netfilter: nft_set_bitmap: initialize set element extension in lookupsPablo Neira Ayuso1-0/+1
Otherwise, nft_lookup might dereference an uninitialized pointer to the element extension. Fixes: 665153ff5752 ("netfilter: nf_tables: add bitmap set type") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30netfilter: ctnetlink: be more strict when NF_CONNTRACK_MARK is not setRomain Bellan1-1/+1
When CONFIG_NF_CONNTRACK_MARK is not set, any CTA_MARK or CTA_MARK_MASK in netlink message are not supported. We should return an error when one of them is set, not both Fixes: 9306425b70bf ("netfilter: ctnetlink: must check mark attributes vs NULL") Signed-off-by: Romain Bellan <romain.bellan@wifirst.fr> Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-29netfilter: nf_queue: prefer nf_queue_entry_freeFlorian Westphal1-18/+9
Instead of dropping refs+kfree, use the helper added in previous patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-29netfilter: nf_queue: do not release refcouts until nf_reinject is doneFlorian Westphal1-4/+2
nf_queue is problematic when another NF_QUEUE invocation happens from nf_reinject(). 1. nf_queue is invoked, increments state->sk refcount. 2. skb is queued, waiting for verdict. 3. sk is closed/released. 3. verdict comes back, nf_reinject is called. 4. nf_reinject drops the reference -- refcount can now drop to 0 Instead of get_ref/release_ref pattern, we need to nest the get_ref calls: get_ref get_ref release_ref release_ref So that when we invoke the next processing stage (another netfilter or the okfn()), we hold at least one reference count on the devices/socket. After previous patch, it is now safe to put the entry even after okfn() has potentially free'd the skb. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-29netfilter: nf_queue: place bridge physports into queue_entry structFlorian Westphal2-31/+27
The refcount is done via entry->skb, which does work fine. Major problem: When putting the refcount of the bridge ports, we must always put the references while the skb is still around. However, we will need to put the references after okfn() to avoid a possible 1 -> 0 -> 1 refcount transition, so we cannot use the skb pointer anymore. Place the physports in the queue entry structure instead to allow for refcounting changes in the next patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-29netfilter: nf_queue: make nf_queue_entry_release_refs staticFlorian Westphal3-11/+11
This is a preparation patch, no logical changes. Move free_entry into core and rename it to something more sensible. Will ease followup patches which will complicate the refcount handling. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27netfilter: flowtable: Use work entry per offload commandPaul Blakey1-31/+15
To allow offload commands to execute in parallel, create workqueue for flow table offload, and use a work entry per offload command. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27netfilter: flowtable: Use rw sem as flow block lockPaul Blakey3-9/+8
Currently flow offload threads are synchronized by the flow block mutex. Use rw lock instead to increase flow insertion (read) concurrency. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27netfilter: nf_tables: silence a RCU-list warning in nft_table_lookup()Qian Cai1-1/+2
It is safe to traverse &net->nft.tables with &net->nft.commit_mutex held using list_for_each_entry_rcu(). Silence the PROVE_RCU_LIST false positive, WARNING: suspicious RCU usage net/netfilter/nf_tables_api.c:523 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by iptables/1384: #0: ffffffff9745c4a8 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x25/0x60 [nf_tables] Call Trace: dump_stack+0xa1/0xea lockdep_rcu_suspicious+0x103/0x10d nft_table_lookup.part.0+0x116/0x120 [nf_tables] nf_tables_newtable+0x12c/0x7d0 [nf_tables] nfnetlink_rcv_batch+0x559/0x1190 [nfnetlink] nfnetlink_rcv+0x1da/0x210 [nfnetlink] netlink_unicast+0x306/0x460 netlink_sendmsg+0x44b/0x770 ____sys_sendmsg+0x46b/0x4a0 ___sys_sendmsg+0x138/0x1a0 __sys_sendmsg+0xb6/0x130 __x64_sys_sendmsg+0x48/0x50 do_syscall_64+0x69/0xf4 entry_SYSCALL_64_after_hwframe+0x49/0xb3 Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27netfilter: flowtable: Fix incorrect tc_setup_type typewenxu5-7/+8
The indirect block setup should use TC_SETUP_FT as the type instead of TC_SETUP_BLOCK. Adjust existing users of the indirect flow block infrastructure. Fixes: b5140a36da78 ("netfilter: flowtable: add indr block setup support") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27netfilter: flowtable: add counter supportPablo Neira Ayuso3-1/+12
Add a new flag to turn on flowtable counters which are stored in the conntrack entry. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27netfilter: nf_tables: add enum nft_flowtable_flags to uapiPablo Neira Ayuso3-2/+12
Expose the NFT_FLOWTABLE_HW_OFFLOAD flag through uapi. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27netfilter: conntrack: export nf_ct_acct_update()Pablo Neira Ayuso2-8/+9
This function allows you to update the conntrack counters. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27ipvs: optimize tunnel dumps for icmp errorsHaishuang Yan1-20/+26
After strip GRE/UDP tunnel header for icmp errors, it's better to show "GRE/UDP" instead of "IPIP" in debug message. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27netfilter: conntrack: Add missing annotations for nf_conntrack_all_lock() ↵Jules Irenge1-0/+2
and nf_conntrack_all_unlock() Sparse reports warnings at nf_conntrack_all_lock() and nf_conntrack_all_unlock() warning: context imbalance in nf_conntrack_all_lock() - wrong count at exit warning: context imbalance in nf_conntrack_all_unlock() - unexpected unlock Add the missing __acquires(&nf_conntrack_locks_all_lock) Add missing __releases(&nf_conntrack_locks_all_lock) Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27netfilter: ctnetlink: Add missing annotation for ctnetlink_parse_nat_setup()Jules Irenge1-0/+1
Sparse reports a warning at ctnetlink_parse_nat_setup() warning: context imbalance in ctnetlink_parse_nat_setup() - unexpected unlock The root cause is the missing annotation at ctnetlink_parse_nat_setup() Add the missing __must_hold(RCU) annotation Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-19netfilter: flowtable: fix NULL pointer dereference in tunnel offload supportwenxu1-3/+3
The tc ct action does not cache the route in the flowtable entry. Fixes: 88bf6e4114d5 ("netfilter: flowtable: add tunnel encap/decap action offload support") Fixes: cfab6dbd0ecf ("netfilter: flowtable: add tunnel match offload support") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-19netfilter: nf_tables: add nft_set_elem_expr_destroy() and use itPablo Neira Ayuso1-11/+17
This patch adds nft_set_elem_expr_destroy() to destroy stateful expressions in set elements. This patch also updates the commit path to call this function to invoke expr->ops->destroy_clone when required. This is implicitly fixing up a module reference counter leak and a memory leak in expressions that allocated internal state, e.g. nft_counter. Fixes: 409444522976 ("netfilter: nf_tables: add elements with stateful expressions") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-19netfilter: nf_tables: fix double-free on set expression from the error pathPablo Neira Ayuso1-0/+1
After copying the expression to the set element extension, release the expression and reset the pointer to avoid a double-free from the error path. Fixes: 409444522976 ("netfilter: nf_tables: add elements with stateful expressions") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-19netfilter: nf_tables: allow to specify stateful expression in set definitionPablo Neira Ayuso3-12/+52
This patch allows users to specify the stateful expression for the elements in this set via NFTA_SET_EXPR. This new feature allows you to turn on counters for all of the elements in this set. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-19netfilter: nf_tables: pass context to nft_set_destroy()Pablo Neira Ayuso1-5/+5
The patch that adds support for stateful expressions in set definitions require this. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-19netfilter: nf_tables: move nft_expr_clone() to nf_tables_api.cPablo Neira Ayuso3-17/+19
Move the nft_expr_clone() helper function to the core. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-19Merge tag 'mlx5-updates-2020-03-17' of ↵David S. Miller12-105/+218
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2020-03-17 1) Compiler warnings and cleanup for the connection tracking series 2) Bug fixes for the connection tracking series 3) Fix devlink port register sequence 4) Last five patches in the series, By Eli cohen Add the support for forwarding traffic between two eswitch uplink representors (Hairpin for eswitch), using mlx5 termination tables to change the direction of a packet in hw from RX to TX pipeline. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19net: phy: realtek: read actual speed to detect downshiftHeiner Kallweit1-1/+59
At least some integrated PHY's in RTL8168/RTL8125 chip versions support downshift, and the actual link speed can be read from a vendor-specific register. Info about this register was provided by Realtek. More details about downshift configuration (e.g. number of attempts) aren't available, therefore the downshift tunable is not implemented. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19net: sched: Fix hw_stats_type setting in pedit loopPetr Machata1-1/+1
In the commit referenced below, hw_stats_type of an entry is set for every entry that corresponds to a pedit action. However, the assignment is only done after the entry pointer is bumped, and therefore could overwrite memory outside of the entries array. The reason for this positioning may have been that the current entry's hw_stats_type is already set above, before the action-type dispatch. However, if there are no more actions, the assignment is wrong. And if there are, the next round of the for_each_action loop will make the assignment before the action-type dispatch anyway. Therefore fix this issue by simply reordering the two lines. Fixes: 74522e7baae2 ("net: sched: set the hw_stats_type in pedit loop") Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19Merge branch 'mlxsw-spectrum_cnt-Expose-counter-resources'David S. Miller8-64/+343
Ido Schimmel says: ==================== mlxsw: spectrum_cnt: Expose counter resources Jiri says: Capacity and utilization of existing flow and RIF counters are currently unavailable to be seen by the user. Use the existing devlink resources API to expose the information: $ sudo devlink resource show pci/0000:00:10.0 -v pci/0000:00:10.0: name kvd resource_path /kvd size 524288 unit entry dpipe_tables none name span_agents resource_path /span_agents size 8 occ 0 unit entry dpipe_tables none name counters resource_path /counters size 79872 occ 44 unit entry dpipe_tables none resources: name flow resource_path /counters/flow size 61440 occ 4 unit entry dpipe_tables none name rif resource_path /counters/rif size 18432 occ 40 unit entry dpipe_tables none ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19selftests: mlxsw: Add tc action hw_stats testsJiri Pirko2-0/+139
Add tests for mlxsw hw_stats types. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19mlxsw: spectrum_cnt: Expose devlink resource occupancy for countersJiri Pirko1-1/+62
Implement occupancy counting for counters and expose over devlink resource API. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19mlxsw: spectrum_cnt: Consolidate subpools initializationJiri Pirko1-22/+16
Put all init operations related to subpools into mlxsw_sp_counter_sub_pools_init(). Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19mlxsw: spectrum_cnt: Move config validation along with resource registerJiri Pirko1-23/+8
Move the validation of subpools configuration, to avoid possible over commitment to resource registration. Add WARN_ON to indicate bug in the code. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19mlxsw: spectrum_cnt: Expose subpool sizes over devlink resourcesJiri Pirko4-17/+101
Implement devlink resources support for counter pools. Move the subpool sizes calculations into the new resources register function. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19mlxsw: spectrum_cnt: Add entry_size_res_id for each subpool and use it to ↵Jiri Pirko1-12/+14
query entry size Add new field to subpool struct that would indicate which resource id should be used to query the entry size for the subpool from the device. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19mlxsw: spectrum_cnt: Move sub_pools under per-instance pool structJiri Pirko1-18/+27
Currently, the global static array of subpools is used. Make it per-instance as multiple instances of the mlxsw driver can have different values. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19selftests: spectrum-2: Adjust tc_flower_scale limit according to current ↵Jiri Pirko1-2/+2
counter count With the change that made the code to query counter bank size from device instead of using hard-coded value, the number of available counters changed for Spectrum-2. Adjust the limit in the selftests. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19mlxsw: spectrum_cnt: Query bank size from FW resourcesJiri Pirko2-6/+11
The bank size is different between Spectrum versions. Also it is a resource that can be queried. So instead of hard coding the value in code, query it from the firmware. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19cxgb4: rework TC filter rule insertion across regionsRahul Lakkireddy6-152/+381
Chelsio NICs have 3 filter regions, in following order of priority: 1. High Priority (HPFILTER) region (Highest Priority). 2. HASH region. 3. Normal FILTER region (Lowest Priority). Currently, there's a 1-to-1 mapping between the prio value passed by TC and the filter region index. However, it's possible to have multiple TC rules with the same prio value. In this case, if a region is exhausted, no attempt is made to try inserting the rule in the next available region. So, rework and remove the 1-to-1 mapping. Instead, dynamically select the region to insert the filter rule, as long as the new rule's prio value doesn't conflict with existing rules across all the 3 regions. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19netfilter: revert introduction of egress hookDaniel Borkmann8-160/+68
This reverts the following commits: 8537f78647c0 ("netfilter: Introduce egress hook") 5418d3881e1f ("netfilter: Generalize ingress hook") b030f194aed2 ("netfilter: Rename ingress hook include file") >From the discussion in [0], the author's main motivation to add a hook in fast path is for an out of tree kernel module, which is a red flag to begin with. Other mentioned potential use cases like NAT{64,46} is on future extensions w/o concrete code in the tree yet. Revert as suggested [1] given the weak justification to add more hooks to critical fast-path. [0] https://lore.kernel.org/netdev/cover.1583927267.git.lukas@wunner.de/ [1] https://lore.kernel.org/netdev/20200318.011152.72770718915606186.davem@davemloft.net/ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: David Miller <davem@davemloft.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Alexei Starovoitov <ast@kernel.org> Nacked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19Merge branch 's390-qeth-next'David S. Miller6-70/+174
Julian Wiedmann says: ==================== s390/qeth: updates 2020-03-18 please apply the following patch series for qeth to netdev's net-next tree. This consists of three parts: 1) support for __GFP_MEMALLOC, 2) several ethtool enhancements (.set_channels, SW Timestamping), 3) the usual cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19s390/qeth: use dev->reg_stateJulian Wiedmann3-29/+17
To check whether a netdevice has already been registered, look at NETREG_REGISTERED to replace some hacks I added a while ago. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19s390/qeth: remove gratuitous NULL checksJulian Wiedmann2-5/+0
qeth_do_ioctl() is only reached through our own net_device_ops, so we can trust that dev->ml_priv still contains what we put there earlier. qeth_bridgeport_an_set() is an internal function that doesn't require such sanity checks. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19s390/qeth: add phys_to_virt() translation for AOBJulian Wiedmann1-3/+4
Data addresses in the AOB are absolute, and need to be translated before being fed into kmem_cache_free(). Currently this phys_to_virt() is a no-op. Also see commit 2db01da8d25f ("s390/qdio: fill SBALEs with absolute addresses"). Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19s390/qeth: don't report hard-coded driver versionJulian Wiedmann1-1/+0
Versions are meaningless for an in-kernel driver. Instead use the UTS_RELEASE that is set by ethtool_get_drvinfo(). Cc: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19s390/qeth: add SW timestamping support for IQD devicesJulian Wiedmann2-1/+17
This adds support for SOF_TIMESTAMPING_TX_SOFTWARE. No support for non-IQD devices, since they orphan the skb in their xmit path. To play nice with TX bulking, set the timestamp when the buffer that contains the skb(s) is actually flushed out to HW. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19s390/qeth: balance the TX queue selection for IQD devicesJulian Wiedmann3-2/+46
For ucast traffic, qeth_iqd_select_queue() falls back to netdev_pick_tx(). This will potentially use skb_tx_hash() to distribute the flow over all active TX queues - so txq 0 is a valid selection, and qeth_iqd_select_queue() needs to check for this and put it on some other queue. As a result, the distribution for ucast flows is unbalanced and hits QETH_IQD_MIN_UCAST_TXQ heavier than the other queues. Open-coding a custom variant of skb_tx_hash() isn't an option, since netdev_pick_tx() also gives us eg. access to XPS. But we can pull a little trick: add a single TC class that excludes the mcast txq, and thus encourage skb_tx_hash() to not pick the mcast txq. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19s390/qeth: allow configuration of TX queues for IQD devicesJulian Wiedmann2-4/+21
Similar to the support for z/VM NICs, but we need to take extra care about the dedicated mcast queue: 1. netdev_pick_tx() is unaware of this limitation and might select the mcast txq. Catch this. 2. require at least _two_ TX queues - one for ucast, one for mcast. 3. when reducing the number of TX queues, there's a potential race where netdev_cap_txqueue() over-rules the selected txq index and falls back to index 0. This would place ucast traffic on the mcast queue, and result in TX errors. So for IQD, reject a reduction while the interface is running. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>