2019-04-28  net: ethernet: ti: ale: use define for host port in cpsw_ale_set_allmulti()  (Grygorii Strashko, 1 file, -2/+2)
Use ALE_PORT_HOST define for host port in cpsw_ale_set_allmulti() instead of constants. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28  net: ethernet: ti: ale: fix mcast super setting  (Grygorii Strashko, 1 file, -1/+1)
Use the correct define, ALE_SUPER, for the ALE Multicast Address Table Entry Supervisory Packet (SUPER) bit setting instead of ALE_BLOCKED. No issues have been observed so far since the bit has never been set, but it is going to be used by the new CPSW switch driver. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28  net: ethernet: ti: cpsw: drop cpsw_tx_packet_submit()  (Grygorii Strashko, 1 file, -12/+3)
Drop unnecessary wrapper function cpsw_tx_packet_submit() which is used only in one place. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28  net: ethernet: ti: cpsw: use devm_alloc_etherdev_mqs()  (Grygorii Strashko, 1 file, -12/+6)
Use devm_alloc_etherdev_mqs() and simplify code. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28  net: ethernet: ti: cpsw: drop pinctrl_pm_select_default_state call  (Grygorii Strashko, 1 file, -3/+0)
Drop the pinctrl_pm_select_default_state() call from probe, as the default pinctrl state is set by the device driver core. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28  net: ethernet: ti: cpsw: use local var dev in probe  (Grygorii Strashko, 1 file, -32/+33)
Use local variable struct device *dev in probe to simplify code. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28  net: ethernet: ti: cpsw: update cpsw_split_res() to accept cpsw_common  (Grygorii Strashko, 1 file, -8/+6)
Update cpsw_split_res() to accept struct cpsw_common instead of struct net_device to simplify code. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28  net: ethernet: ti: cpsw: drop CONFIG_TI_CPSW_ALE config option  (Grygorii Strashko, 3 files, -28/+2)
None of the TI CPSW/NETCP drivers can work without ALE, so simplify the build of those drivers by always linking cpsw_ale, and drop the CONFIG_TI_CPSW_ALE config option. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28  net: ethernet: ti: cpsw: drop TI_DAVINCI_CPDMA config option  (Grygorii Strashko, 3 files, -43/+3)
Both the CPSW and EMAC drivers can't work without CPDMA, so simplify the build of those drivers by always linking davinci_cpdma, and drop the TI_DAVINCI_CPDMA config option. Note: the davinci_emac driver module was renamed to "ti_davinci_emac" to make the build work. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28  net: ethernet: ti: convert to SPDX license identifiers  (Grygorii Strashko, 18 files, -180/+18)
Replace textual license with SPDX-License-Identifier. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28  Merge branch 'strict-netlink-validation'  (David S. Miller, 163 files, -923/+1764)
Johannes Berg says:

====================
strict netlink validation

Here's a respin, with the following changes:
 * change message when rejecting unknown attribute types (David Ahern)
 * drop nl80211 patch - I'll apply it separately
 * remove NL_VALIDATE_POLICY - we have a lot of calls to nla_parse() that really should be without a policy as it has previously been validated - need to find a good way to handle this later
 * include the correct generic netlink change (d'oh, sorry)
====================

Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28  genetlink: optionally validate strictly/dumps  (Johannes Berg, 37 files, -3/+405)
Add options to strictly validate messages and dump messages. Sometimes validating dump messages non-strictly may be required, so add an option for that as well. Since none of this can really be applied to existing commands, set the options everywhere using the following spatch:

    @@
    identifier ops;
    expression X;
    @@
    struct genl_ops ops[] = {
    ..., {
            .cmd = X,
    +       .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
            ...
    },
    ...
    };

For new commands one should just not copy the .validate 'opt-out' flags and thus get strict validation.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
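A hedged illustration of the resulting convention (command names and handlers here are hypothetical, not from this series):

    static const struct genl_ops demo_genl_ops[] = {
            {
                    /* pre-existing command: spatch added the opt-out flags */
                    .cmd = DEMO_CMD_OLD,
                    .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
                    .doit = demo_old_doit,
            },
            {
                    /* new command: no .validate flags, so it gets strict
                     * validation by default */
                    .cmd = DEMO_CMD_NEW,
                    .doit = demo_new_doit,
            },
    };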
2019-04-28  netlink: add strict parsing for future attributes  (Johannes Berg, 2 files, -0/+22)
Unfortunately, we cannot add strict parsing for all attributes, as that would break existing userspace. We currently warn about it, but that's about all we can do.

For new attributes, however, the story is better: nobody is using them, so we can reject bad sizes. Also, for new attributes, we need not accept them when the policy doesn't declare their usage.

David Ahern and I went back and forth on how to best encode this, and the best way we found was to have a "boundary type", from which point on new attributes have all possible validation applied, and NLA_UNSPEC is rejected.

As we didn't want to add another argument to all functions that get a netlink policy, the workaround is to encode that boundary in the first entry of the policy array (which is for type 0 and thus probably not really valid anyway). I put it into the validation union for the rare possibility that somebody is actually using attribute 0, which would continue to work fine unless they tried to use the extended validation, which isn't likely. We also didn't find any in-tree users with type 0.

The reason for setting the "start strict here" attribute is that we never really need to start strict from 0, which is invalid anyway (or in legacy families where that isn't true, it cannot be set to strict), so we can thus reserve the value 0 for "don't do this check" and don't have to add the tag to all policies right now.

Thus, policies can now opt in to this validation, which we should do for all existing policies, at least when adding new attributes. Note that entirely *new* policies won't need to set it, as the use of that should be using nla_parse()/nlmsg_parse() etc. which anyway do fully strict validation now, regardless of this. So in effect, this patch only covers the "existing command with new attribute" case.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
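As a hedged illustration of the boundary encoding described above (family and attribute names hypothetical), a policy might opt in like this:

    /* attributes below DEMO_ATTR_NEW keep the legacy liberal checks;
     * DEMO_ATTR_NEW and above get full strict validation, and an
     * NLA_UNSPEC entry there is rejected */
    static const struct nla_policy demo_policy[DEMO_ATTR_MAX + 1] = {
            [0] = { .strict_start_type = DEMO_ATTR_NEW },
            [DEMO_ATTR_OLD] = { .type = NLA_U32 },
            [DEMO_ATTR_NEW] = { .type = NLA_U32 },
    };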
2019-04-28  netlink: re-add parse/validate functions in strict mode  (Johannes Berg, 2 files, -0/+106)
This re-adds the parse and validate functions like nla_parse() that are now actually strict after the previous rename; they were split out just to make sure everything is converted (had anything been missed, compilation of the previous patch would have failed). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28  netlink: make validation more configurable for future strictness  (Johannes Berg, 145 files, -933/+1233)
We currently have two levels of strict validation:

 1) liberal (default)
     - undefined (type >= max) & NLA_UNSPEC attributes accepted
     - attribute length >= expected accepted
     - garbage at end of message accepted
 2) strict (opt-in)
     - NLA_UNSPEC attributes accepted
     - attribute length >= expected accepted

Split out parsing strictness into four different options:
 * TRAILING     - check that there's no trailing data after parsing attributes (in message or nested)
 * MAXTYPE      - reject attrs > max known type
 * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
 * STRICT_ATTRS - strictly validate attribute size

The default for future things should be *everything*. The current *_strict() is a combination of TRAILING and MAXTYPE, and is renamed to _deprecated_strict(). The current regular parsing has none of this, and is renamed to *_parse_deprecated().

Additionally, this allows us to selectively set one of the new flags even on old policies. Notably, the UNSPEC flag could be useful in this case, since it can be arranged (by filling in the policy) to not be an incompatible userspace ABI change, but would then going forward prevent forgetting attribute entries. Similar can apply to the POLICY flag.

We end up with the following renames:
 * nla_parse           -> nla_parse_deprecated
 * nla_parse_strict    -> nla_parse_deprecated_strict
 * nlmsg_parse         -> nlmsg_parse_deprecated
 * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
 * nla_parse_nested    -> nla_parse_nested_deprecated
 * nla_validate_nested -> nla_validate_nested_deprecated

Using spatch, of course:

    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

For this patch, don't actually add the strict, non-renamed versions yet so that it breaks compile if I get it wrong. Also, while at it, make nla_validate and nla_parse go down to a common __nla_validate_parse() function to avoid code duplication.

Ultimately, this allows us to have very strict validation for every new caller of nla_parse()/nlmsg_parse() etc. as re-introduced in the next patch, while existing things will continue to work as is. In effect then, this adds fully strict validation for any new command.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
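To make the new calling convention concrete, a hedged before/after for one caller (rtnetlink-style arguments used purely for illustration):

    /* before: liberal parsing under the original name */
    err = nlmsg_parse(nlh, sizeof(struct ifinfomsg), tb, IFLA_MAX,
                      ifla_policy, extack);

    /* after this patch: identical liberal behavior, renamed so the
     * deprecated level of checking is visible at the call site */
    err = nlmsg_parse_deprecated(nlh, sizeof(struct ifinfomsg), tb, IFLA_MAX,
                                 ifla_policy, extack);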
2019-04-28  netlink: add NLA_MIN_LEN  (Johannes Berg, 2 files, -2/+13)
Rather than using NLA_UNSPEC for this type of thing, use NLA_MIN_LEN so we can make NLA_UNSPEC be NLA_REJECT under certain conditions for future attributes. While at it, also use NLA_EXACT_LEN for the struct example. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
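A hedged sketch of the two policy entry styles this enables (attribute names and the struct are hypothetical):

    struct demo_config { __u32 a; __u32 b; };

    static const struct nla_policy demo_len_policy[DEMO_ATTR_MAX + 1] = {
            /* any payload of at least this many bytes is accepted */
            [DEMO_ATTR_BLOB]   = { .type = NLA_MIN_LEN, .len = 16 },
            /* fixed-size binary struct: the exact length is required */
            [DEMO_ATTR_CONFIG] = { .type = NLA_EXACT_LEN,
                                   .len = sizeof(struct demo_config) },
    };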
2019-04-28  Merge branch 'nla_nest_start'  (David S. Miller, 114 files, -494/+565)
Michal Kubecek says:

====================
make nla_nest_start() add NLA_F_NESTED flag

One of the comments in the recent review of the ethtool netlink series pointed out that the proposed ethnl_nest_start() helper, which adds NLA_F_NESTED to the second argument of nla_nest_start(), is not really specific to ethtool netlink code. That is hard to argue with, as closer inspection revealed that exactly the same helper already exists in ipset code (except it's a macro rather than an inline function).

Another observation was that even though the NLA_F_NESTED flag was introduced in 2007, only a few netlink based interfaces set it in kernel generated messages, and even many recently added APIs omit it. That is unfortunate, as without the flag, message parsers not familiar with attribute semantics cannot recognize nested attributes and do not see message structure; this affects e.g. the wireshark dissector or mnl_nlmsg_fprintf() from libmnl.

This is why I'm suggesting to rename the existing nla_nest_start() to a different name (nla_nest_start_noflag) and reintroduce nla_nest_start() as a wrapper adding the NLA_F_NESTED flag. This is implemented in the first patch, which is mostly generated by spatch. The second patch drops the ipset helper macros, which lose their purpose. The third patch cleans up minor coding style issues found by checkpatch.pl in the first patch.

We could leave nla_nest_start() untouched and simply add a wrapper adding NLA_F_NESTED, but that would probably preserve the state where even most new code doesn't set the flag.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28  net: fix two coding style issues  (Michal Kubecek, 2 files, -3/+4)
This is a simple cleanup addressing two coding style issues found by checkpatch.pl in an earlier patch. It's submitted as a separate patch to keep the original patch as it was generated by spatch. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28  ipset: drop ipset_nest_start() and ipset_nest_end()  (Michal Kubecek, 4 files, -28/+25)
After the previous commit, both ipset_nest_start() and ipset_nest_end() are just aliases for nla_nest_start() and nla_nest_end() so that there is no need to keep them. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28  netlink: make nla_nest_start() add NLA_F_NESTED flag  (Michal Kubecek, 111 files, -466/+539)
Even though the NLA_F_NESTED flag was introduced more than 11 years ago, most netlink based interfaces (including recently added ones) are still not setting it in kernel generated messages. Without the flag, message parsers not aware of attribute semantics (e.g. the wireshark dissector or libmnl's mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display the structure of their contents.

Unfortunately we cannot just add the flag everywhere, as there may be userspace applications which check nlattr::nla_type directly rather than through a helper masking out the flags. Therefore the patch renames nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start() as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually are rewritten to use nla_nest_start().

Except for changes in include/net/netlink.h, the patch was generated using this semantic patch:

    @@
    expression E1, E2;
    @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@
    expression E1, E2;
    @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
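A hedged sketch of the resulting idiom (attribute names hypothetical):

    static int demo_fill_stats(struct sk_buff *skb, u32 rx_packets)
    {
            struct nlattr *nest;

            /* nla_nest_start() now ORs NLA_F_NESTED into the type for us */
            nest = nla_nest_start(skb, DEMO_ATTR_STATS);
            if (!nest)
                    return -EMSGSIZE;
            if (nla_put_u32(skb, DEMO_ATTR_RX_PACKETS, rx_packets)) {
                    nla_nest_cancel(skb, nest);
                    return -EMSGSIZE;
            }
            nla_nest_end(skb, nest);
            return 0;
    }

A legacy interface whose userspace checks nla_type with the flag bits included would call nla_nest_start_noflag() instead, preserving its wire format.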
2019-04-27  Merge branch 'net-tls-small-code-cleanup'  (David S. Miller, 3 files, -50/+38)
Jakub Kicinski says:

====================
net/tls: small code cleanup

This small patch set cleans up tls (mostly the offload parts). Other than avoiding unnecessary error messages, there are no functional changes here.

v2 (Saeed):
 - fix up Review tags;
 - remove the warning on failure completely.
====================

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27  net/tls: byte swap device req TCP seq no upon setting  (Jakub Kicinski, 2 files, -2/+2)
To avoid a sparse warning, byte-swap the be32 sequence number before it's stored in the atomic value. While at it, drop unnecessary brackets and use the kernel's u64 type. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
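A hedged sketch of the shape of the fix (helper and field names illustrative, not necessarily the driver's exact ones): swap once at the boundary so the atomic holds a plain integer and sparse no longer sees a __be32 mixed into a u64.

    /* ntohl() consumes the __be32 annotation; the atomic then stores
     * host-endian data with a request flag in the low bit */
    static void demo_rx_resync_request(atomic64_t *resync_req, __be32 seq)
    {
            atomic64_set(resync_req, ((u64)ntohl(seq) << 32) | 1);
    }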
2019-04-27  net/tls: move definition of tls ops into net/tls.h  (Jakub Kicinski, 2 files, -22/+18)
There seems to be no reason for tls_ops to be defined in netdevice.h, which is included in a lot of places. Don't wrap the struct/enum declaration in ifdefs; that trickles unnecessary ifdefs down into driver code. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27  net/tls: remove old exports of sk_destruct functions  (Jakub Kicinski, 2 files, -20/+17)
tls_device_sk_destruct being set on a socket used to indicate that the socket is a kTLS device socket. That is no longer true; now we use the sk_validate_xmit_skb pointer for that purpose. Remove the export. tls_device_attach() needs to be moved. While at it, remove the dead declaration of tls_sk_destruct(). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27  net/tls: don't log errors every time offload can't proceed  (Jakub Kicinski, 1 file, -6/+1)
Currently, when CONFIG_TLS_DEVICE is set, each time a kTLS connection is opened and the offload is not successful (either because the underlying device doesn't support it or e.g. its tables are full) a rate-limited error is printed to the logs. There is nothing wrong with failing TLS offload; the SW path will process the packets just fine. Drop the noisy messages. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27  Merge branch 'sk-local-storage'  (Alexei Starovoitov, 26 files, -114/+2089)
Martin KaFai Lau says:

====================
v4:
 - Move checks to map_alloc_check in patch 1 (Stanislav Fomichev)
 - Refactor BTF encoding macros to test_btf.h in a new patch 4 (Stanislav Fomichev)
 - Refactor getenv and print a PASS message at the end of the test in patch 6 (Yonghong Song)

v3:
 - Replace spinlock_types.h with spinlock.h in patch 1 (kbuild test robot <lkp@intel.com>)

v2:
 - Add the "test_maps.h" file in patch 5

This series introduces the BPF sk local storage. The details are in the patch 1 commit message.
====================

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-27  bpf: Add end-to-end test for bpf_sk_storage_* helpers  (Martin KaFai Lau, 3 files, -16/+157)
This patch rides on an existing BPF_PROG_TYPE_CGROUP_SKB test (test_sock_fields.c) to do a TCP end-to-end test on the new bpf_sk_storage_* helpers. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-27  bpf: Add BPF_MAP_TYPE_SK_STORAGE test to test_maps  (Martin KaFai Lau, 4 files, -10/+679)
This patch adds a BPF_MAP_TYPE_SK_STORAGE test to test_maps. The source file is rather long, so it is put into a separate dir, map_tests/, and compiled the way the current prog_tests/ are. Other existing tests in test_maps can also be refactored into map_tests/ in the future. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-27  bpf: Add verifier tests for the bpf_sk_storage  (Martin KaFai Lau, 2 files, -19/+152)
This patch adds verifier tests for the bpf_sk_storage:
 1. ARG_PTR_TO_MAP_VALUE_OR_NULL
 2. Map and helper compatibility (e.g. disallow bpf_map_lookup_elem)

It also takes this chance to remove the unused struct btf_raw_data and uses the BTF encoding macros from "test_btf.h".

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-27  bpf: Refactor BTF encoding macro to test_btf.h  (Martin KaFai Lau, 2 files, -62/+70)
Refactor common BTF encoding macros for other tests to use. libbpf may reuse some of them in the future, which requires some more thought before publishing them as a libbpf API. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-27  bpf: Support BPF_MAP_TYPE_SK_STORAGE in bpf map probing  (Martin KaFai Lau, 2 files, -1/+74)
This patch supports probing for the new BPF_MAP_TYPE_SK_STORAGE. BPF_MAP_TYPE_SK_STORAGE enforces BTF usage, so the new probe also needs to create and load a BTF. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-27  bpf: Sync bpf.h to tools  (Martin KaFai Lau, 1 file, -1/+43)
This patch syncs bpf.h to tools/. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-27  bpf: Introduce bpf sk local storage  (Martin KaFai Lau, 12 files, -5/+914)
After allowing a bpf prog to
 - directly read the skb->sk ptr
 - get the fullsock bpf_sock by "bpf_sk_fullsock()"
 - get the bpf_tcp_sock by "bpf_tcp_sock()"
 - get the listener sock by "bpf_get_listener_sock()"
 - avoid duplicating the fields of "(bpf_)sock" and "(bpf_)tcp_sock" into different bpf running contexts,
this patch is another effort to make bpf's network programming more intuitive (together with memory and performance benefits).

When a bpf prog needs to store data for a sk, the current practice is to define a map with the usual 4-tuple (src/dst ip/port) as the key. If multiple bpf progs need to store different sk data, multiple maps have to be defined, wasting memory on the duplicated keys (i.e. the 4-tuple) in each bpf map. [The smallest key could be the sk pointer itself, which requires some enhancement in the verifier and is a separate topic.] Also, the bpf prog needs to clean up the elem when the sk is freed; otherwise, the bpf map will become full and unusable quickly. The sk-free tracking can currently be done during sk state transitions (e.g. BPF_SOCK_OPS_STATE_CB). The size of the map needs to be predefined, which usually ends up with an over-provisioned map in production. Even if the map were re-sizable, since sockets naturally come and go, this potential re-size operation is arguably redundant if the data can be attached directly to the sk itself instead of proxying through a bpf map.

This patch introduces sk->sk_bpf_storage to provide local storage space at the sk for a bpf prog to use. The space will be allocated when the first bpf prog creates data for this particular sk. The design optimizes the bpf prog's lookup (optionally followed by an inline update). bpf_spin_lock should be used if the inline update needs to be protected.

BPF_MAP_TYPE_SK_STORAGE:
------------------------
To define a bpf "sk-local-storage", a BPF_MAP_TYPE_SK_STORAGE map (new in this patch) needs to be created. Multiple BPF_MAP_TYPE_SK_STORAGE maps can be created to fit different bpf progs' needs. The map enforces BTF to allow printing the sk-local-storage during a system-wide sk dump (e.g. "ss -ta") in the future.

The purpose of a BPF_MAP_TYPE_SK_STORAGE map is not to lookup/update/delete "sk-local-storage" data from a particular sk. Think of the map as the meta-data (or "type") of a "sk-local-storage". This particular "type" of "sk-local-storage" data can then be stored in any sk. The main purposes of this map are:
 1. Define the size of a "sk-local-storage" type.
 2. Provide a similar syscall userspace API as other maps (e.g. lookup/update, map-id, map-btf...etc.)
 3. Keep track of all sk's storages of this "type" and clean them up when the map is freed.

sk->sk_bpf_storage:
-------------------
The main lookup/update/delete is done on sk->sk_bpf_storage (which is a "struct bpf_sk_storage"). When doing a lookup, the "map" pointer is now used as the "key" to search sk_storage->list. The "map" pointer actually serves as the "type" of the "sk-local-storage" that is being requested.

To allow very fast lookup, it should be as fast as looking up an array at a stable offset. At the same time, it is not ideal to set a hard limit on the number of sk-local-storage "types" the system can have. Hence, this patch takes a cache approach: the last search result from sk_storage->list is cached in sk_storage->cache[], a fixed-size array, and each "sk-local-storage" type has a stable offset into the cache[] array. In the future, a map flag could be introduced for cache opt-out/enforcement if it becomes necessary. The cache size is 16 (i.e. 16 types of "sk-local-storage"). Programs can share a map. On the program side, having a few bpf_progs running in the networking hotpath is already a lot, and the bpf_prog should have already consolidated the existing sock-key-ed map usage to minimize the map lookup penalty. 16 has enough runway to grow.

All sk-local-storage data will be removed from sk->sk_bpf_storage during sk destruction.

bpf_sk_storage_get() and bpf_sk_storage_delete():
-------------------------------------------------
Instead of using bpf_map_(lookup|update|delete)_elem(), the bpf prog needs to use the new helpers bpf_sk_storage_get() and bpf_sk_storage_delete(). The verifier can then enforce the ARG_PTR_TO_SOCKET argument. bpf_sk_storage_get() also allows "creating" a new elem if one does not exist in the sk, via the new BPF_SK_STORAGE_GET_F_CREATE flag. An optional value can also be provided as the initial value with BPF_SK_STORAGE_GET_F_CREATE. BPF_MAP_TYPE_SK_STORAGE also supports bpf_spin_lock. Together, this has eliminated the potential use cases for an equivalent bpf_map_update_elem() API (for bpf_prog) in this patch.

Misc notes:
-----------
 1. map_get_next_key is not supported. From the userspace syscall perspective, the map has the socket fd as the key while the map can be shared by pinned file or map-id. Since BTF is enforced, the existing "ss" could be enhanced to pretty print the local storage. Supporting a kernel-defined BTF with the 4-tuple as the return key could be explored later.
 2. The sk->sk_lock cannot be acquired; atomic operations are used instead, e.g. cmpxchg is done on the sk->sk_bpf_storage ptr. Please refer to the source code comments for details on the synchronization cases and considerations.
 3. The mem is charged to sk->sk_omem_alloc, as the sk filter does.

Benchmark:
----------
Here is the benchmark data collected by turning on the "kernel.bpf_stats_enabled" sysctl. Two bpf progs are tested: one with the usual bpf hashmap (max_entries = 8192) with the sk ptr as the key (the verifier is modified to support the sk ptr as the key, which should have shortened the key lookup time), and another with the new BPF_MAP_TYPE_SK_STORAGE. Both store a "u32 cnt", do a lookup on "egress_skb/cgroup" for each egress skb, and then bump the cnt. netperf is used to drive data over 4096 connected UDP sockets.

BPF_MAP_TYPE_HASH with a modified verifier (152ns per bpf run):
    27: cgroup_skb  name egress_sk_map  tag 74f56e832918070b
        run_time_ns 58280107540  run_cnt 381347633
        loaded_at 2019-04-15T13:46:39-0700  uid 0
        xlated 344B  jited 258B  memlock 4096B  map_ids 16  btf_id 5

BPF_MAP_TYPE_SK_STORAGE in this patch (66ns per bpf run):
    30: cgroup_skb  name egress_sk_stora  tag d4aa70984cc7bbf6
        run_time_ns 25617093319  run_cnt 390989739
        loaded_at 2019-04-15T13:47:54-0700  uid 0
        xlated 168B  jited 156B  memlock 4096B  map_ids 17  btf_id 6

[Diagram: each sk's *sk_bpf_storage points to a bpf_sk_storage; its list links that socket's elems (snode / data / map_node), and each bpf_map's list links the map_node of every elem of that map's type across sockets.]

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
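A hedged sketch of the program-side flow described above, written in today's libbpf BTF-defined-map style (section names, field names, and the cgroup_skb hook choice are illustrative, not taken from this patch's selftests):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* one "type" of sk-local-storage: a per-socket packet counter;
     * BPF_F_NO_PREALLOC is required for sk-storage maps */
    struct {
            __uint(type, BPF_MAP_TYPE_SK_STORAGE);
            __uint(map_flags, BPF_F_NO_PREALLOC);
            __type(key, int);       /* socket fd on the syscall side */
            __type(value, __u32);
    } sk_pkt_cnt SEC(".maps");

    SEC("cgroup_skb/egress")
    int count_egress(struct __sk_buff *skb)
    {
            struct bpf_sock *sk = skb->sk;
            __u32 *cnt;

            if (!sk)
                    return 1;
            sk = bpf_sk_fullsock(sk);
            if (!sk)
                    return 1;

            /* lookup; create a zero-initialized elem for this sk on first
             * use, instead of going through a 4-tuple keyed map */
            cnt = bpf_sk_storage_get(&sk_pkt_cnt, sk, NULL,
                                     BPF_SK_STORAGE_GET_F_CREATE);
            if (cnt)
                    __sync_fetch_and_add(cnt, 1);
            return 1;
    }

    char _license[] SEC("license") = "GPL";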
2019-04-27  Merge branch 'writeable-bpf-tracepoints'  (Alexei Starovoitov, 19 files, -5/+433)
Matt Mullins says:

====================
This adds an opt-in interface for tracepoints to expose a writable context to BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that are attached, while supporting read-only access from existing BPF_PROG_TYPE_RAW_TRACEPOINT programs, as well as from non-BPF-based tracepoints.

The initial motivation is to support tracing that can be observed from the remote end of an NBD socket, e.g. by adding flags to the struct nbd_request header. Earlier attempts included adding an NBD-specific tracepoint fd, but in code review, I was recommended to implement it more generically; as a result, this patchset is far simpler than my initial try.

v4->v5:
 * rebased onto bpf-next/master and fixed merge conflicts
 * "tools: sync bpf.h" also syncs comments that have previously changed in bpf-next

v3->v4:
 * fixed a silly copy/paste typo in include/trace/events/bpf_test_run.h (_TRACE_NBD_H -> _TRACE_BPF_TEST_RUN_H)
 * fixed incorrect/misleading wording in patch 1's commit message, since the pointer cannot be directly dereferenced in a BPF_PROG_TYPE_RAW_TRACEPOINT
 * cleaned up the error message wording if the prog_tests fail
 * Addressed feedback from Yonghong:
   * reject non-pointer-sized accesses to the buffer pointer
   * use sizeof(struct nbd_request) as one-byte-past-the-end in raw_tp_writable_reject_nbd_invalid.c
   * use BPF_MOV64_IMM instead of BPF_LD_IMM64

v2->v3:
 * Andrew addressed Josef's comments:
   * C-style commenting in nbd.c
   * Collapsed identical events into a single DECLARE_EVENT_CLASS. This saves about 2kB of kernel text

v1->v2:
 * add selftests
 * sync tools/include/uapi/linux/bpf.h
 * reject variable offset into the buffer
 * add string representation of PTR_TO_TP_BUFFER to reg_type_str
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-27  selftests: bpf: test writable buffers in raw tps  (Matt Mullins, 5 files, -0/+210)
This tests that:
 * a BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE program cannot be attached if it uses either:
   * a variable offset to the tracepoint buffer, or
   * an offset beyond the size of the tracepoint buffer
 * a tracer can modify the buffer provided when attached to a writable tracepoint in bpf_prog_test_run

Signed-off-by: Matt Mullins <mmullins@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-27  tools: sync bpf.h  (Matt Mullins, 3 files, -1/+11)
This adds BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE and fixes up the "enumeration value 'BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE' not handled in switch [-Werror=switch-enum]" build error it would otherwise cause in libbpf. Signed-off-by: Matt Mullins <mmullins@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-27  nbd: add tracepoints for send/receive timing  (Andrew Hall, 2 files, -0/+59)
This adds four tracepoints to nbd, enabling separate tracing of payload and header sending/receipt. In the send path for headers that have already been sent, we also explicitly initialize the handle so it can be referenced by the later tracepoint. Signed-off-by: Andrew Hall <hall@fb.com> Signed-off-by: Matt Mullins <mmullins@fb.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-27  nbd: trace sending nbd requests  (Matt Mullins, 3 files, -0/+62)
This adds a tracepoint that can both observe the nbd request being sent to the server and modify that request, e.g. by setting a flag in the request that will cause the server to collect detailed tracing data. The struct request * being handled is included to permit correlation with the block tracepoints. Signed-off-by: Matt Mullins <mmullins@fb.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-27  bpf: add writable context for raw tracepoints  (Matt Mullins, 8 files, -4/+91)
This is an opt-in interface that allows a tracepoint to provide a safe buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program. The size of the buffer must be a compile-time constant, and is checked before allowing a BPF program to attach to a tracepoint that uses this feature. The pointer to this buffer will be the first argument of tracepoints that opt in; the pointer is valid and can be bpf_probe_read() by both BPF_PROG_TYPE_RAW_TRACEPOINT and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that attach to such a tracepoint, but the buffer to which it points may only be written by the latter. Signed-off-by: Matt Mullins <mmullins@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
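A hedged sketch of the consumer side, assuming libbpf's raw_tracepoint.w section convention and the nbd_send_request tracepoint from this series; the flag bit is purely hypothetical and only illustrates the "mark the request header" idea from the cover letter:

    #include <linux/bpf.h>
    #include <linux/nbd.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    SEC("raw_tracepoint.w/nbd_send_request")
    int mark_nbd_request(struct bpf_raw_tracepoint_args *ctx)
    {
            /* for writable tracepoints, the first argument is the writable
             * buffer; the verifier bounds all accesses by its declared
             * compile-time size */
            struct nbd_request *req = (void *)ctx->args[0];

            req->type |= bpf_htonl(1U << 16);   /* hypothetical flag bit */
            return 0;
    }

    char _license[] SEC("license") = "GPL";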
2019-04-27  bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd  (Daniel Borkmann, 4 files, -9/+71)
Since the ARMv8.1 supplement introduced LSE atomic instructions back in 2016, let's add support for STADD and use it in favor of the LDXR / STXR loop for the XADD mapping if available. STADD is encoded as an alias for LDADD with XZR as the destination register; therefore add LDADD to the instruction encoder, with STADD as a special case, and use it in the JIT for CPUs that advertise LSE atomics in the CPUID register. If the immediate offset in the BPF XADD insn is 0, use the dst register directly instead of a temporary one. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
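A hedged sketch of the JIT decision being described; the helper and macro names follow the arm64 JIT's existing conventions (emit(), A64_*), but the exact spellings here are assumptions rather than quotes from the patch:

    /* BPF_XADD: *(size *)(dst + off) += src */
    if (cpus_have_cap(ARM64_HAS_LSE_ATOMICS)) {
            /* single LSE instruction, no retry loop */
            emit(A64_STADD(isdw, reg, src), ctx);
    } else {
            /* classic exclusive-monitor loop */
            emit(A64_LDXR(isdw, tmp2, reg), ctx);
            emit(A64_ADD(isdw, tmp2, tmp2, src), ctx);
            emit(A64_STXR(isdw, tmp2, reg, tmp3), ctx);
            jmp_offset = -3;
            check_imm19(jmp_offset);
            emit(A64_CBNZ(0, tmp3, jmp_offset), ctx);
    }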
2019-04-27  bpf, arm64: remove prefetch insn in xadd mapping  (Daniel Borkmann, 2 files, -7/+0)
Prefetch-with-intent-to-write is currently part of the XADD mapping in the AArch64 JIT and follows the kernel's implementation of atomic_add. This may interfere with other threads executing the LDXR/STXR loop, leading to potential starvation and fairness issues. Drop the optional prefetch instruction. Fixes: 85f68fe89832 ("bpf, arm64: implement jiting of BPF_XADD") Reported-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-26  Merge tag 'mac80211-next-for-davem-2019-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next  (David S. Miller, 36 files, -360/+1465)
Johannes Berg says:

====================
Various updates, notably:
 * extended key ID support (from 802.11-2016)
 * per-STA TX power control support
 * mac80211 TX performance improvements
 * HE (802.11ax) updates
 * mesh link probing support
 * enhancements of multi-BSSID support (also related to HE)
 * OWE userspace processing support
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26  Merge branch 'hns3-next'  (David S. Miller, 10 files, -55/+69)
Huazhong Tan says:

====================
code optimizations & bugfixes for HNS3 driver

This patch-set includes code optimizations and bugfixes for the HNS3 ethernet controller driver.

[patch 1/11 - 3/11] fixes some bugs about the IO path
[patch 4/11 - 6/11] includes some optimization and bugfixes about mailbox handling
[patch 7/11 - 11/11] adds misc code optimizations and bugfixes.

Change log:
V2->V3: fixes comments from Neil Horman, removes [patch 8/12]
V1->V2: adds modification on [patch 8/12]
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26  net: hns3: remove reset after command send failed  (Weihang Li, 1 file, -10/+0)
It's meaningless to trigger a reset when sending a command to the IMP fails, because the failure is usually caused by lack of authority, an illegal command, and so on. When that happens, we just need to return the status code for further debugging. Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26  net: hns3: prevent double free in hns3_put_ring_config()  (Huazhong Tan, 1 file, -5/+5)
This patch adds a check to hns3_put_ring_config() to prevent a double free and, for readability, moves the NULL assignment of priv->ring_data into hns3_put_ring_config(). Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
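A hedged sketch of the guard pattern described (the body is abridged; only the early check and the moved NULL assignment are the point):

    static void hns3_put_ring_config(struct hns3_nic_priv *priv)
    {
            if (!priv->ring_data)
                    return;         /* already released: double free prevented */

            /* ... free the per-queue ring structures ... */

            kfree(priv->ring_data);
            priv->ring_data = NULL; /* NULL assignment now lives here, not in callers */
    }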
2019-04-26  net: hns3: extend the loopback state acquisition time  (liuzhongzhu, 1 file, -2/+2)
Test results show that the hardware may take up to 500 ms to return the MAC link state, so the software needs to wait twice that maximum hardware return time (1000 ms). Without this change, the loopback test fails intermittently. Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26  net: hns3: fix pause configure fail problem  (Huazhong Tan, 1 file, -1/+4)
When configuring pause, the current implementation returns directly after setting up PFC, without setting up BP, which is not sufficient. This patch fixes it by returning early only when setting up PFC fails. Fixes: 44e59e375bf7 ("net: hns3: do not return GE PFC setting err when initializing") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26  net: hns3: do not reset TQP on DOWN while VF is resetting  (Huazhong Tan, 1 file, -2/+4)
Since the hardware does not handle mailboxes during reset and the hardware reset includes a TQP reset, it is unnecessary to reset the TQPs in hclgevf_ae_stop() while doing a VF reset. It is also unnecessary to reset the remaining TQPs when resetting one of them fails. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26  net: hns3: use a reserved byte to identify need_resp flag  (Huazhong Tan, 3 files, -5/+9)
This patch uses a reserved byte in the hclge_mbx_vf_to_pf_cmd to save the need_resp flag, so when the PF receives the mailbox it can use it to decide whether to send a response to the VF. hclge_set_vf_uc_mac_addr() should use the mbx_need_resp flag to decide whether to send a response to the VF. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26  net: hns3: use atomic_t to replace u32 for arq's count  (Huazhong Tan, 3 files, -5/+6)
Since the irq handler and the mailbox task both update the arq's count, it should use atomic_t instead of u32; otherwise its value may eventually go wrong. Fixes: 07a0556a3a73 ("net: hns3: Changes to support ARQ(Asynchronous Receive Queue)") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
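A hedged sketch of the race this fixes (names illustrative): the irq handler and the mailbox task both touch the counter, and a plain u32 increment/decrement is a non-atomic read-modify-write that can lose updates.

    static atomic_t arq_count = ATOMIC_INIT(0);

    static irqreturn_t demo_mbx_irq(int irq, void *data)
    {
            atomic_inc(&arq_count);         /* producer: irq context */
            return IRQ_HANDLED;
    }

    static void demo_mbx_task(struct work_struct *work)
    {
            while (atomic_read(&arq_count)) {
                    /* ... consume one queued ARQ message ... */
                    atomic_dec(&arq_count); /* consumer: task context */
            }
    }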