path: root/net
Age | Commit message | Author | Files | Lines
2019-05-01 | net: dsa: Remove legacy probing support | Andrew Lunn | 5 | -774/+0
Now that all drivers can be probed using more traditional methods, remove the legacy probe code. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-01 | net: dsa: Add more convenient functions for installing port VLANs | Vladimir Oltean | 3 | -21/+36
This hides the need to perform a two-phase transaction and construct a switchdev_obj_port_vlan struct. Call graph (including a function that will be introduced in a follow-up patch) looks like this now (same for the *_vlan_del function):

    dsa_slave_vlan_rx_add_vid     dsa_port_setup_8021q_tagging
               |                              |
               +--------------+---------------+
                              v
                      dsa_port_vid_add        dsa_slave_port_obj_add
                              |                          |
                              +------------+-------------+
                                           v
                                   dsa_port_vlan_add

Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
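[Editor's sketch] A minimal sketch of how a helper like dsa_port_vid_add() can hide the two-phase switchdev transaction described above; it follows the switchdev API of that era, but the exact body is an assumption rather than the verbatim patch:

    int dsa_port_vid_add(struct dsa_port *dp, u16 vid, u16 flags)
    {
            struct switchdev_obj_port_vlan vlan = {
                    .obj.id = SWITCHDEV_OBJ_ID_PORT_VLAN,
                    .flags = flags,
                    .vid_begin = vid,
                    .vid_end = vid,
            };
            struct switchdev_trans trans;
            int err;

            /* Prepare phase: give the driver a chance to refuse. */
            trans.ph_prepare = true;
            err = dsa_port_vlan_add(dp, &vlan, &trans);
            if (err)
                    return err;

            /* Commit phase: the driver is expected not to fail here. */
            trans.ph_prepare = false;
            return dsa_port_vlan_add(dp, &vlan, &trans);
    }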
2019-05-01 | net: dsa: Skip calling .port_vlan_filtering on no change | Vladimir Oltean | 1 | -0/+3
Even if VLAN filtering is global, DSA will call this callback once for each port. Drivers should not have to compare the global state with the requested change. So let DSA do it. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
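[Editor's sketch] An illustrative sketch of the kind of early-return check implied here; function shape and field names are assumptions based on the neighbouring commits in this series, not the exact patch:

    int dsa_port_vlan_filtering(struct dsa_port *dp, bool vlan_filtering)
    {
            struct dsa_switch *ds = dp->ds;
            int err;

            /* Nothing to do: don't bother the driver at all. */
            if (dp->vlan_filtering == vlan_filtering)
                    return 0;

            if (!ds->ops->port_vlan_filtering)
                    return -EOPNOTSUPP;

            err = ds->ops->port_vlan_filtering(ds, dp->index, vlan_filtering);
            if (err)
                    return err;

            dp->vlan_filtering = vlan_filtering;
            return 0;
    }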
2019-05-01 | net: dsa: Keep the vlan_filtering setting in dsa_switch if it's global | Vladimir Oltean | 1 | -1/+4
The current behavior is not as obvious as one would assume (which is that, if the driver set vlan_filtering_is_global = 1, then checking any dp->vlan_filtering would yield the same result). Only the ports which are actively enslaved into a bridge would have vlan_filtering set. This makes it tricky for drivers to check what the global state is. So fix this and make the struct dsa_switch hold this global setting. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-01 | net: dsa: Unset vlan_filtering when ports leave the bridge | Vladimir Oltean | 1 | -0/+29
When ports are standalone (after they left the bridge), they should have no VLAN filtering semantics (they should pass all traffic to the CPU). Currently this is not true for switchdev drivers, because the bridge "forgets" to unset that. Normally one would think that doing this at the bridge layer would be a better idea, i.e. call br_vlan_filter_toggle() from br_del_if(), similar to how nbp_vlan_init() is called from br_add_if(). However what complicates that approach, and makes this one preferable, is the fact that for the bridge core, vlan_filtering is a per-bridge setting, whereas for switchdev/DSA it is per-port. Also there are switches where the setting is per the entire device, and unsetting vlan_filtering one by one, for each leaving port, would not be possible from the bridge core without a certain level of awareness. So do this in DSA and let drivers be unaware of it. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-01 | net: dsa: Be aware of switches where VLAN filtering is a global setting | Vladimir Oltean | 1 | -7/+45
On some switches, the action of whether to parse VLAN frame headers and use that information for ingress admission is configurable, but not per port. Such is the case for the Broadcom BCM53xx and the NXP SJA1105 families, for example. In that case, DSA can prevent the bridge core from trying to apply different VLAN filtering settings on net devices that belong to the same switch. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-01 | net: dsa: Store vlan_filtering as a property of dsa_port | Vladimir Oltean | 1 | -4/+8
This allows drivers to query the VLAN setting imposed by the bridge driver directly from DSA, instead of keeping their own state based on the .port_vlan_filtering callback. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-01 | net: dsa: Fix pharse -> phase typo | Vladimir Oltean | 1 | -1/+1
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-30 | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next | David S. Miller | 30 | -1499/+1226
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2019-04-30

1) A lot of work to remove indirections from the xfrm code. From Florian Westphal.

2) Support ESP offload in combination with gso partial. From Boris Pismenny.

3) Remove some duplicated code from vti4. From Jeremy Sowden.

Please note that there is a merge conflict between commit 8742dc86d0c7 ("xfrm4: Fix uninitialized memory read in _decode_session4") from the ipsec tree and commit c53ac41e3720 ("xfrm: remove decode_session indirection from afinfo_policy") from the ipsec-next tree. The merge conflict will appear when those trees get merged during the merge window. The conflict can be solved as it is done in linux-next: https://lkml.org/lkml/2019/4/25/1207

Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-29 | dsa: Allow tag drivers to be built as modules | Andrew Lunn | 2 | -29/+73
Make the CONFIG symbols tristate and add help text. The Broadcom and Microchip KSZ tag drivers support two different tagging protocols in one driver. Add a configuration option for the drivers, and then options to select the protocol. Create a submenu for the tagging drivers. Signed-off-by: Andrew Lunn <andrew@lunn.ch>
v2: tab/space cleanup; help text wording; NET_DSA_TAG_BRCM_COMMON and NET_DSA_TAG_KSZ_COMMON hidden
v3: more tabification; punctuation
v4: trailler->trailer
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-29 | dsa: tag_brcm: Avoid unused symbols | Andrew Lunn | 1 | -2/+6
It is possible that the driver is compiled with both CONFIG_NET_DSA_TAG_BRCM and CONFIG_NET_DSA_TAG_BRCM_PREPEND disabled. This results in warnings about unused symbols. Add some conditional compilation to avoid this. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> v2 Reorder patch to before tag drivers can be modules Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-29 | dsa: Cleanup unneeded table and make tag structures static | Andrew Lunn | 11 | -76/+11
Now that tag drivers dynamically register, we don't need the static table. Remove it. This also means the tag driver structures can be made static. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-29 | dsa: Make use of the list of tag drivers | Andrew Lunn | 1 | -5/+34
Implement the _get and _put functions to make use of the list of tag drivers. Also, trigger the loading of the module, based on the alias information. The _get function takes a reference on the tag driver, so it cannot be unloaded, and the _put function releases the reference. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> v2: Make tag_driver_register void Signed-off-by: David S. Miller <davem@davemloft.net>
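[Editor's sketch] A hedged sketch of what such a _get/_put pair can look like; the list and lock names mirror the registration patch below in this series, and the alias format passed to request_module() is an assumption:

    const struct dsa_device_ops *dsa_tag_driver_get(int tag_protocol)
    {
            const struct dsa_device_ops *ops = ERR_PTR(-ENOPROTOOPT);
            struct dsa_tag_driver *driver;

            /* Ask for the module implementing this protocol, by alias. */
            request_module("dsa_tag-%d", tag_protocol);

            mutex_lock(&dsa_tag_drivers_lock);
            list_for_each_entry(driver, &dsa_tag_drivers_list, list) {
                    if (driver->ops->proto != tag_protocol)
                            continue;
                    /* Pin the module so it cannot be unloaded while in use. */
                    if (try_module_get(driver->owner))
                            ops = driver->ops;
                    break;
            }
            mutex_unlock(&dsa_tag_drivers_lock);

            return ops;
    }

    void dsa_tag_driver_put(const struct dsa_device_ops *ops)
    {
            /* Find the owning dsa_tag_driver and module_put(driver->owner). */
    }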
2019-04-29 | dsa: Add stub tag driver put method | Andrew Lunn | 4 | -0/+9
When a DSA switch driver is unloaded, the lock on the tag driver should be released so the module can be unloaded. Add the needed calls, but leave the actual release code as a stub. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> v2 Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-29 | dsa: Rename dsa_resolve_tag_protocol() to _get ready for locking | Andrew Lunn | 4 | -4/+5
dsa_resolve_tag_protocol() is used to find the tagging driver needed by a switch driver. When the tagging drivers become modules, it will be necessary to take a reference on the module to prevent it being unloaded. So rename this function to _get() to indicate it has some locking properties. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-29 | dsa: Register the none tagger ops | Andrew Lunn | 1 | -0/+7
The none tagger is special in that it does not live in a tag_*.c file, but is within the core. Register/unregister when DSA is loaded/unloaded. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-29 | dsa: Keep link list of tag drivers | Andrew Lunn | 1 | -0/+28
Let the tag drivers register themselves with the DSA core, keeping them in a linked list. Signed-off-by: Andrew Lunn <andrew@lunn.ch> v2 Signed-off-by: David S. Miller <davem@davemloft.net>
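[Editor's sketch] A minimal sketch of the registration side assumed here; the structure layout and function signatures are assumptions kept consistent with the other sketches in this series:

    static LIST_HEAD(dsa_tag_drivers_list);
    static DEFINE_MUTEX(dsa_tag_drivers_lock);

    void dsa_tag_driver_register(struct dsa_tag_driver *tag_driver,
                                 struct module *owner)
    {
            tag_driver->owner = owner;

            mutex_lock(&dsa_tag_drivers_lock);
            list_add_tail(&tag_driver->list, &dsa_tag_drivers_list);
            mutex_unlock(&dsa_tag_drivers_lock);
    }

    void dsa_tag_driver_unregister(struct dsa_tag_driver *tag_driver)
    {
            mutex_lock(&dsa_tag_drivers_lock);
            list_del(&tag_driver->list);
            mutex_unlock(&dsa_tag_drivers_lock);
    }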
2019-04-29 | dsa: Add boilerplate helper to register DSA tag driver modules | Andrew Lunn | 10 | -2/+52
A DSA tag driver module will need to register the tag protocols it implements with the DSA core. Add macros containing this boilerplate. The registration/unregistration code is currently just a stub. A later patch will add the real implementation. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
v2: fix indent of #endif; rewrite to move list pointer into a new structure
v3: move kdoc next to macro; fix THIS_MODULE indentation
Signed-off-by: David S. Miller <davem@davemloft.net>
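[Editor's sketch] A hedged sketch of what such boilerplate macros can look like; the macro names and exact expansion are assumptions, and the single-protocol form is shown for brevity:

    #define DSA_TAG_DRIVER_NAME(__ops) dsa_tag_driver ## _ ## __ops

    #define DSA_TAG_DRIVER(__ops)                                          \
    static struct dsa_tag_driver DSA_TAG_DRIVER_NAME(__ops) = {            \
            .ops = &__ops,                                                  \
    }

    #define module_dsa_tag_driver(__ops)                                   \
    DSA_TAG_DRIVER(__ops);                                                 \
    static int __init dsa_tag_driver_module_init(void)                     \
    {                                                                       \
            dsa_tag_driver_register(&DSA_TAG_DRIVER_NAME(__ops),           \
                                    THIS_MODULE);                           \
            return 0;                                                       \
    }                                                                       \
    module_init(dsa_tag_driver_module_init);                                \
                                                                            \
    static void __exit dsa_tag_driver_module_exit(void)                     \
    {                                                                       \
            dsa_tag_driver_unregister(&DSA_TAG_DRIVER_NAME(__ops));        \
    }                                                                       \
    module_exit(dsa_tag_driver_module_exit)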
2019-04-29 | dsa: Add TAG protocol to tag ops | Andrew Lunn | 10 | -0/+12
In order to match the tagging protocol a switch driver requests to the tagger, we need to know what protocol the tagger supports. Add this information to the ops structure. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
v2: move tag protocol to end of structure to keep hot members at the beginning.
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-29 | dsa: Add MODULE_LICENSE to tag drivers | Andrew Lunn | 9 | -0/+9
All the tag drivers are some variant of GPL. Add a MODULE_LICENSE() indicating this, so the drivers can later be compiled as modules. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-29 | dsa: Add MODULE_ALIAS to taggers in preparation to become modules | Andrew Lunn | 9 | -0/+22
When the tag drivers become modules, we will need to dynamically load them based on what the switch drivers need. Add aliases to map between the TAG protocol and the driver. In order to do this, we need the tag protocol number as something which the C pre-processor can stringify. Only the compiler knows the value of an enum; CPP cannot use them. So add #defines. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
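[Editor's sketch] A sketch of the aliasing idea; the alias prefix, macro names and the example value are assumptions used purely for illustration:

    /* cpp cannot stringify an enum value, so mirror it as a #define; the
     * number must match the corresponding enum dsa_tag_protocol entry. */
    #define DSA_TAG_PROTO_EDSA_VALUE        5       /* illustrative value */

    #define DSA_TAG_DRIVER_ALIAS            "dsa_tag-"
    #define MODULE_ALIAS_DSA_TAG_DRIVER(__proto)                             \
            MODULE_ALIAS(DSA_TAG_DRIVER_ALIAS __stringify(__proto##_VALUE))

    /* In a tagger such as net/dsa/tag_edsa.c: */
    MODULE_ALIAS_DSA_TAG_DRIVER(DSA_TAG_PROTO_EDSA);

    /* The core can then load it with something like:
     *   request_module(DSA_TAG_DRIVER_ALIAS "%d", tag_protocol);
     */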
2019-04-29 | dsa: Move tagger name into its ops structure | Andrew Lunn | 10 | -43/+13
Rather than keep a list to map a tagger ops to a name, place the name into the ops structure. This removes the hard coded list, a step towards making the taggers more dynamic. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> v2: Move name to end of structure, keeping the hot entries at the beginning. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-29 | dsa: Add SPDX header to tag drivers. | Andrew Lunn | 8 | -52/+8
Add an SPDX header, and remove the license boilerplate text. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next | David S. Miller | 7 | -105/+1021
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-04-28

The following pull-request contains BPF updates for your *net-next* tree. The main changes are:

1) Introduce BPF socket local storage map so that BPF programs can store private data they associate with a socket (instead of e.g. separate hash table), from Martin.

2) Add support for bpftool to dump BTF types. This is done through a new `bpftool btf dump` sub-command, from Andrii.

3) Enable BPF-based flow dissector for skb-less eth_get_headlen() calls which was currently not supported since skb was used to lookup netns, from Stanislav.

4) Add an opt-in interface for tracepoints to expose a writable context for attached BPF programs, used here for NBD sockets, from Matt.

5) BPF xadd related arm64 JIT fixes and scalability improvements, from Daniel.

6) Change the skb->protocol for bpf_skb_adjust_room() helper in order to support tunnels such as sit. Add selftests as well, from Willem.

7) Various smaller misc fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28 | genetlink: optionally validate strictly/dumps | Johannes Berg | 27 | -3/+358
Add options to strictly validate messages and dump messages; sometimes validating dump messages non-strictly may be required, so add an option for that as well. Since none of this can really be applied to existing commands, set the options everywhere using the following spatch:

    @@
    identifier ops;
    expression X;
    @@
    struct genl_ops ops[] = {
    ...,
     {
            .cmd = X,
    +       .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
            ...
     },
    ...
    };

For new commands one should just not copy the .validate 'opt-out' flags and thus get strict validation. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
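[Editor's sketch] For illustration, a genetlink ops entry as it would look after that spatch; the command and handler names here are made up:

    static const struct genl_ops example_genl_ops[] = {
            {
                    .cmd = EXAMPLE_CMD_GET,
                    /* Pre-existing command: opt out of the new strict checks. */
                    .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
                    .doit = example_get_doit,
                    .dumpit = example_get_dumpit,
            },
            /* A newly added command would simply omit .validate and get
             * strict validation by default. */
    };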
2019-04-28 | netlink: make validation more configurable for future strictness | Johannes Berg | 122 | -709/+860
We currently have two levels of strict validation:

1) liberal (default)
   - undefined (type >= max) & NLA_UNSPEC attributes accepted
   - attribute length >= expected accepted
   - garbage at end of message accepted

2) strict (opt-in)
   - NLA_UNSPEC attributes accepted
   - attribute length >= expected accepted

Split out parsing strictness into four different options:
 * TRAILING     - check that there's no trailing data after parsing attributes (in message or nested)
 * MAXTYPE      - reject attrs > max known type
 * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
 * STRICT_ATTRS - strictly validate attribute size

The default for future things should be *everything*. The current *_strict() is a combination of TRAILING and MAXTYPE, and is renamed to _deprecated_strict(). The current regular parsing has none of this, and is renamed to *_parse_deprecated(). Additionally it allows us to selectively set one of the new flags even on old policies. Notably, the UNSPEC flag could be useful in this case, since it can be arranged (by filling in the policy) to not be an incompatible userspace ABI change, but would then going forward prevent forgetting attribute entries. Similar can apply to the POLICY flag.

We end up with the following renames:
 * nla_parse           -> nla_parse_deprecated
 * nla_parse_strict    -> nla_parse_deprecated_strict
 * nlmsg_parse         -> nlmsg_parse_deprecated
 * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
 * nla_parse_nested    -> nla_parse_nested_deprecated
 * nla_validate_nested -> nla_validate_nested_deprecated

Using spatch, of course:

    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

For this patch, don't actually add the strict, non-renamed versions yet so that it breaks compile if I get it wrong. Also, while at it, make nla_validate and nla_parse go down to a common __nla_validate_parse() function to avoid code duplication. Ultimately, this allows us to have very strict validation for every new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the next patch, while existing things will continue to work as is. In effect then, this adds fully strict validation for any new command.

Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28 | net: fix two coding style issues | Michal Kubecek | 2 | -3/+4
This is a simple cleanup addressing two coding style issues found by checkpatch.pl in an earlier patch. It's submitted as a separate patch to keep the original patch as it was generated by spatch. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28 | ipset: drop ipset_nest_start() and ipset_nest_end() | Michal Kubecek | 3 | -21/+21
After the previous commit, both ipset_nest_start() and ipset_nest_end() are just aliases for nla_nest_start() and nla_nest_end() so that there is no need to keep them. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28 | netlink: make nla_nest_start() add NLA_F_NESTED flag | Michal Kubecek | 98 | -422/+469
Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most netlink based interfaces (including recently added ones) are still not setting it in kernel generated messages. Without the flag, message parsers not aware of attribute semantics (e.g. wireshark dissector or libmnl's mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display the structure of their contents.

Unfortunately we cannot just add the flag everywhere as there may be userspace applications which check nlattr::nla_type directly rather than through a helper masking out the flags. Therefore the patch renames nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start() as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually are rewritten to use nla_nest_start().

Except for changes in include/net/netlink.h, the patch was generated using this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
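[Editor's sketch] The resulting pair of helpers is small; a sketch consistent with the description above (the canonical definitions live in include/net/netlink.h, so treat this as illustrative):

    static inline struct nlattr *nla_nest_start_noflag(struct sk_buff *skb,
                                                       int attrtype)
    {
            struct nlattr *start = (struct nlattr *)skb_tail_pointer(skb);

            /* Open the nest: an empty attribute whose length is patched
             * up later by nla_nest_end(). */
            if (nla_put(skb, attrtype, 0, NULL) < 0)
                    return NULL;

            return start;
    }

    static inline struct nlattr *nla_nest_start(struct sk_buff *skb, int attrtype)
    {
            /* New behaviour: nested attributes are flagged as such. */
            return nla_nest_start_noflag(skb, attrtype | NLA_F_NESTED);
    }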
2019-04-27 | net/tls: byte swap device req TCP seq no upon setting | Jakub Kicinski | 1 | -1/+1
To avoid a sparse warning byteswap the be32 sequence number before it's stored in the atomic value. While at it drop unnecessary brackets and use kernel's u64 type. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27 | net/tls: remove old exports of sk_destruct functions | Jakub Kicinski | 1 | -18/+17
tls_device_sk_destruct being set on a socket used to indicate that socket is a kTLS device one. That is no longer true - now we use sk_validate_xmit_skb pointer for that purpose. Remove the export. tls_device_attach() needs to be moved. While at it, remove the dead declaration of tls_sk_destruct(). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27 | net/tls: don't log errors every time offload can't proceed | Jakub Kicinski | 1 | -6/+1
Currently when CONFIG_TLS_DEVICE is set, each time a kTLS connection is opened and the offload is not successful (either because the underlying device doesn't support it or e.g. its tables are full) a rate limited error will be printed to the logs. There is nothing wrong with failing TLS offload. The SW path will process the packets just fine, so drop the noisy messages. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27 | bpf: Introduce bpf sk local storage | Martin KaFai Lau | 5 | -0/+824
After allowing a bpf prog to
 - directly read the skb->sk ptr
 - get the fullsock bpf_sock by "bpf_sk_fullsock()"
 - get the bpf_tcp_sock by "bpf_tcp_sock()"
 - get the listener sock by "bpf_get_listener_sock()"
 - avoid duplicating the fields of "(bpf_)sock" and "(bpf_)tcp_sock" into different bpf running contexts,
this patch is another effort to make bpf's network programming more intuitive to do (together with memory and performance benefits).

When a bpf prog needs to store data for a sk, the current practice is to define a map with the usual 4-tuples (src/dst ip/port) as the key. If multiple bpf progs require to store different sk data, multiple maps have to be defined. Hence, memory is wasted storing the duplicated keys (i.e. 4 tuples here) in each of the bpf maps. [ The smallest key could be the sk pointer itself, which requires some enhancement in the verifier and is a separate topic. ] Also, the bpf prog needs to clean up the elem when the sk is freed. Otherwise, the bpf map will become full and unusable quickly. The sk-free tracking currently could be done during sk state transition (e.g. BPF_SOCK_OPS_STATE_CB). The size of the map needs to be predefined, which usually ends up with an over-provisioned map in production. Even if the map were resizable, since the sk naturally come and go away already, this potential resize operation is arguably redundant if the data can be directly connected to the sk itself instead of proxying through a bpf map.

This patch introduces sk->sk_bpf_storage to provide local storage space at the sk for a bpf prog to use. The space will be allocated when the first bpf prog has created data for this particular sk. The design optimizes the bpf prog's lookup (optionally followed by an inline update). bpf_spin_lock should be used if the inline update needs to be protected.

BPF_MAP_TYPE_SK_STORAGE:
------------------------
To define a bpf "sk-local-storage", a BPF_MAP_TYPE_SK_STORAGE map (new in this patch) needs to be created. Multiple BPF_MAP_TYPE_SK_STORAGE maps can be created to fit different bpf progs' needs. The map enforces BTF to allow printing the sk-local-storage during a system-wide sk dump (e.g. "ss -ta") in the future. The purpose of a BPF_MAP_TYPE_SK_STORAGE map is not to lookup/update/delete "sk-local-storage" data from a particular sk. Think of the map as the meta-data (or "type") of a "sk-local-storage". This particular "type" of "sk-local-storage" data can then be stored in any sk. The main purposes of this map are mostly:
1. Define the size of a "sk-local-storage" type.
2. Provide a similar syscall userspace API as the map (e.g. lookup/update, map-id, map-btf...etc.)
3. Keep track of all sk's storages of this "type" and clean them up when the map is freed.

sk->sk_bpf_storage:
-------------------
The main lookup/update/delete is done on sk->sk_bpf_storage (which is a "struct bpf_sk_storage"). When doing a lookup, the "map" pointer is now used as the "key" to search the sk_storage->list. The "map" pointer is actually serving as the "type" of the "sk-local-storage" that is being requested. To allow very fast lookup, it should be as fast as looking up an array at a stable offset. At the same time, it is not ideal to set a hard limit on the number of sk-local-storage "types" that the system can have. Hence, this patch takes a cache approach. The last search result from sk_storage->list is cached in sk_storage->cache[], which is a stably sized array. Each "sk-local-storage" type has a stable offset into the cache[] array.
In the future, a map flag could be introduced for cache opt-out/enforcement if it became necessary. The cache size is 16 (i.e. 16 types of "sk-local-storage"). Programs can share a map. On the program side, having a few bpf_progs running in the networking hotpath is already a lot. The bpf_prog should have already consolidated the existing sock-keyed map usage to minimize the map lookup penalty. 16 has enough runway to grow. All sk-local-storage data will be removed from sk->sk_bpf_storage during sk destruction.

bpf_sk_storage_get() and bpf_sk_storage_delete():
-------------------------------------------------
Instead of using bpf_map_(lookup|update|delete)_elem(), the bpf prog needs to use the new helpers bpf_sk_storage_get() and bpf_sk_storage_delete(). The verifier can then enforce the ARG_PTR_TO_SOCKET argument. bpf_sk_storage_get() also allows "creating" a new elem if one does not exist in the sk. This is done by the new BPF_SK_STORAGE_GET_F_CREATE flag. An optional value can also be provided as the initial value during BPF_SK_STORAGE_GET_F_CREATE. BPF_MAP_TYPE_SK_STORAGE also supports bpf_spin_lock. Together, this has eliminated the potential use cases for an equivalent bpf_map_update_elem() API (for bpf_prog) in this patch.

Misc notes:
-----------
1. map_get_next_key is not supported. From the userspace syscall perspective, the map has the socket fd as the key while the map can be shared by pinned-file or map-id. Since btf is enforced, the existing "ss" could be enhanced to pretty print the local-storage. Supporting a kernel defined btf with 4 tuples as the return key could be explored later also.
2. The sk->sk_lock cannot be acquired. Atomic operations are used instead, e.g. cmpxchg is done on the sk->sk_bpf_storage ptr. Please refer to the source code comments for the details on synchronization cases and considerations.
3. The mem is charged to sk->sk_omem_alloc as the sk filter does.

Benchmark:
----------
Here is the benchmark data collected by turning on the "kernel.bpf_stats_enabled" sysctl. Two bpf progs are tested: one with the usual bpf hashmap (max_entries = 8192) with the sk ptr as the key (the verifier is modified to support the sk ptr as the key, which should have shortened the key lookup time), and another with the new BPF_MAP_TYPE_SK_STORAGE. Both store a "u32 cnt", do a lookup on "egress_skb/cgroup" for each egress skb and then bump the cnt. netperf is used to drive data with 4096 connected UDP sockets.
BPF_MAP_TYPE_HASH with a modified verifier (152ns per bpf run):
    27: cgroup_skb  name egress_sk_map  tag 74f56e832918070b
        run_time_ns 58280107540  run_cnt 381347633
        loaded_at 2019-04-15T13:46:39-0700  uid 0
        xlated 344B  jited 258B  memlock 4096B  map_ids 16  btf_id 5

BPF_MAP_TYPE_SK_STORAGE in this patch (66ns per bpf run):
    30: cgroup_skb  name egress_sk_stora  tag d4aa70984cc7bbf6
        run_time_ns 25617093319  run_cnt 390989739
        loaded_at 2019-04-15T13:47:54-0700  uid 0
        xlated 168B  jited 156B  memlock 4096B  map_ids 17  btf_id 6

Here is a high-level picture of how the objects are organized (the original commit carries an ASCII diagram that did not survive extraction): each sk's sk_bpf_storage points to a bpf_sk_storage holding a list of elems (snode + data + map_node); every elem is additionally linked into the list of its owning BPF_MAP_TYPE_SK_STORAGE map through its map_node, so storages can be walked both per socket and per map. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
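[Editor's sketch] For reference, a hedged usage sketch of the new map type and helper from BPF C. It is written in the later BTF-defined map style of libbpf, which postdates this patch; the program and map names are made up:

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
            __uint(type, BPF_MAP_TYPE_SK_STORAGE);
            __uint(map_flags, BPF_F_NO_PREALLOC);
            __type(key, int);
            __type(value, __u32);
    } sk_pkt_cnt SEC(".maps");

    SEC("cgroup_skb/egress")
    int count_egress_pkts(struct __sk_buff *skb)
    {
            struct bpf_sock *sk = skb->sk;
            __u32 zero = 0, *cnt;

            if (!sk)
                    return 1;       /* allow the packet */

            sk = bpf_sk_fullsock(sk);
            if (!sk)
                    return 1;

            /* Create the per-socket counter on first use, then bump it. */
            cnt = bpf_sk_storage_get(&sk_pkt_cnt, sk, &zero,
                                     BPF_SK_STORAGE_GET_F_CREATE);
            if (cnt)
                    __sync_fetch_and_add(cnt, 1);

            return 1;
    }

    char _license[] SEC("license") = "GPL";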
2019-04-27 | selftests: bpf: test writable buffers in raw tps | Matt Mullins | 1 | -0/+4
This tests that: * a BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE cannot be attached if it uses either: * a variable offset to the tracepoint buffer, or * an offset beyond the size of the tracepoint buffer * a tracer can modify the buffer provided when attached to a writable tracepoint in bpf_prog_test_run Signed-off-by: Matt Mullins <mmullins@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-26 | Merge tag 'mac80211-next-for-davem-2019-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next | David S. Miller | 30 | -345/+1220
Johannes Berg says:
====================
Various updates, notably:

 * extended key ID support (from 802.11-2016)
 * per-STA TX power control support
 * mac80211 TX performance improvements
 * HE (802.11ax) updates
 * mesh link probing support
 * enhancements of multi-BSSID support (also related to HE)
 * OWE userspace processing support
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26 | tipc: remove rcu_read_unlock() left in tipc_udp_recv() | Eric Dumazet | 1 | -1/+0
I forgot to remove one rcu_read_unlock() before a return statement. Joy of mixing goto and return styles in a function :) Fixes: 4109a2c3b91e ("tipc: tipc_udp_recv() cleanup vs rcu verbs") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26 | ipv6: Initialize fib6_result in bpf_ipv6_fib_lookup | David Ahern | 1 | -1/+1
fib6_result is not initialized in bpf_ipv6_fib_lookup and potentially passes garbage to the fib lookup, which triggers a KASAN warning:

    [ 262.055450] ==================================================================
    [ 262.057640] BUG: KASAN: user-memory-access in fib6_rule_suppress+0x4b/0xce
    [ 262.059488] Read of size 8 at addr 00000a20000000b0 by task swapper/1/0
    [ 262.061238]
    [ 262.061673] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.1.0-rc5+ #56
    [ 262.063493] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
    [ 262.065593] Call Trace:
    [ 262.066277]  <IRQ>
    [ 262.066848]  dump_stack+0x7e/0xbb
    [ 262.067764]  kasan_report+0x18b/0x1b5
    [ 262.069921]  __asan_load8+0x7f/0x81
    [ 262.070879]  fib6_rule_suppress+0x4b/0xce
    [ 262.071980]  fib_rules_lookup+0x275/0x2cd
    [ 262.073090]  fib6_lookup+0x119/0x218
    [ 262.076457]  bpf_ipv6_fib_lookup+0x39d/0x664
    ...

Initialize fib6_result to 0. Fixes: b1d40991506aa ("ipv6: Rename fib6_multipath_select and pass fib6_result") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26 | net: socket: Fix missing break in switch statement | Gustavo A. R. Silva | 1 | -0/+1
Add missing break statement in order to prevent the code from falling through to cases SIOCGSTAMP_NEW and SIOCGSTAMPNS_NEW. This bug was found thanks to the ongoing efforts to enable -Wimplicit-fallthrough. Fixes: 0768e17073dc ("net: socket: implement 64-bit timestamps") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26 | mac80211: probe unexercised mesh links | Rajkumar Manoharan | 4 | -0/+43
The requirement for mesh link metric refreshing is that from one mesh point we be able to send some data frames to other mesh points which are not currently selected as a primary traffic path, but which are only 1 hop away. The absence of the primary path to the chosen node makes it necessary to apply some form of marking on a chosen packet stream so that the packets can be properly steered to the selected node for testing, and not by the regular mesh path lookup. Tested-by: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org> Signed-off-by: Rajkumar Manoharan <rmanohar@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-04-26 | mac80211: add option for setting control flags | Rajkumar Manoharan | 3 | -9/+14
Allows setting of control flags of skb cb - if needed - when calling ieee80211_subif_start_xmit(). Tested-by: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org> Signed-off-by: Rajkumar Manoharan <rmanohar@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-04-26 | cfg80211: add support to probe unexercised mesh link | Rajkumar Manoharan | 3 | -0/+79
Adding support to allow mesh HWMP to measure link metrics on unexercised direct mesh path by sending some data frames to other mesh points which are not currently selected as a primary traffic path but only 1 hop away. The absence of the primary path to the chosen node makes it necessary to apply some form of marking on a chosen packet stream so that the packets can be properly steered to the selected node for testing, and not by the regular mesh path lookup. Tested-by: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org> Signed-off-by: Rajkumar Manoharan <rmanohar@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-04-26 | mac80211: Set CAN_REPLACE_PTK0 for SW crypto only drivers | Alexander Wetzel | 1 | -0/+7
Mac80211 SW crypto handles replacing PTK keys correctly. Don't trigger needless warnings or workarounds when the driver can only use the known good SW crypto provided by mac80211. Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-04-26 | nl80211: do a struct assignment to radar_chandef instead of memcpy() | Luca Coelho | 1 | -1/+1
We are copying one entire structure to another of the same type in nl80211_notify_radar_detection, so it's simpler and safer to do a struct assignment instead of memcpy(). Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-04-26 | mac80211: Fix Extended Key ID auto activation | Alexander Wetzel | 1 | -1/+5
Only enable Extended Key ID support for drivers which are not supporting crypto offload and also do not support A-MPDU. While any driver using SW crypto from mac80211 is generally able to also support Extended Key ID, these drivers are likely to mix keyIDs in A-MPDUs when rekeying. According to IEEE 802.11-2016 "9.7.3 A-MPDU contents" this is not allowed. Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> [reword comment a bit, move ! into logic expression] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-04-26 | cfg80211: don't pass pointer to pointer unnecessarily | Dan Carpenter | 2 | -8/+8
The cfg80211_merge_profile() and ieee802_11_find_bssid_profile() are a bit cleaner if we just pass the merged_ie pointer instead of a pointer to the pointer. This isn't a functional change, it's just a clean up. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-04-26 | mac80211: store tx power value from user to station | Ashok Raj Nagarajan | 4 | -0/+65
This patch introduces a new driver callback drv_sta_set_txpwr. This API will copy the transmit power value passed from user space and call the driver callback to set the tx power for the station. Co-developed-by: Balaji Pothunoori <bpothuno@codeaurora.org> Signed-off-by: Ashok Raj Nagarajan <arnagara@codeaurora.org> Signed-off-by: Balaji Pothunoori <bpothuno@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-04-26 | cfg80211: Add support to set tx power for a station associated | Ashok Raj Nagarajan | 1 | -0/+43
This patch adds support to set transmit power setting type and transmit power level attributes to NL80211_CMD_SET_STATION in order to facilitate adjusting the transmit power level of a station associated to the AP. The added attributes allow selection of automatic and limited transmit power level, with the level defined in dBm format. Co-developed-by: Balaji Pothunoori <bpothuno@codeaurora.org> Signed-off-by: Ashok Raj Nagarajan <arnagara@codeaurora.org> Signed-off-by: Balaji Pothunoori <bpothuno@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-04-26 | mac80211: only allocate one queue when using iTXQs | Johannes Berg | 1 | -5/+5
There's no need to allocate more than one queue in the iTXQs case now that we no longer use ndo_select_queue to assign the AC. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-04-26 | nl80211: Use struct_size() in kzalloc() | Gustavo A. R. Silva | 1 | -7/+3
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example:

    struct foo {
        int stuff;
        struct boo entry[];
    };

    size = sizeof(struct foo) + count * sizeof(struct boo);
    instance = kzalloc(size, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper:

    instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

Notice that, in this case, variable size_of_regd is not necessary, hence it is removed. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-04-26 | cfg80211: Use struct_size() in kzalloc() | Gustavo A. R. Silva | 1 | -16/+7
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example:

    struct foo {
        int stuff;
        struct boo entry[];
    };

    size = sizeof(struct foo) + count * sizeof(struct boo);
    instance = kzalloc(size, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper:

    instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

Notice that, in this case, variable size_of_regd is not necessary, hence it is removed. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>