Age | Commit message (Collapse) | Author | Files | Lines |
|
Number of .dumpit functions try to ignore -EOPNOTSUPP errors.
Recent change missed that, and started reporting all errors
but -EMSGSIZE back from dumps. This leads to situation like
this:
$ devlink dev info
devlink answers: Operation not supported
Dump should not report an error just because the last device
to be queried could not provide an answer.
To fix this and avoid similar confusion make sure we clear
err properly, and not leave it set to an error if we don't
terminate the iteration.
Fixes: c62c2cfb801b ("net: devlink: don't ignore errors during dumpit")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This lets us use socket storage from the following hooks:
* BPF_CGROUP_INET_SOCK_CREATE
* BPF_CGROUP_INET_SOCK_RELEASE
* BPF_CGROUP_INET4_POST_BIND
* BPF_CGROUP_INET6_POST_BIND
Using existing 'bpf_sk_storage_get_proto' doesn't work because
second argument is ARG_PTR_TO_SOCKET. Even though
BPF_PROG_TYPE_CGROUP_SOCK hooks operate on 'struct bpf_sock',
the verifier still considers it as a PTR_TO_CTX.
That's why I'm adding another 'bpf_sk_storage_get_cg_sock_proto'
definition strictly for BPF_PROG_TYPE_CGROUP_SOCK which accepts
ARG_PTR_TO_CTX which is really 'struct sock' for this program type.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200729003104.1280813-1-sdf@google.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull the v5.9 RCU bits from Paul E. McKenney:
- Documentation updates
- Miscellaneous fixes
- kfree_rcu updates
- RCU tasks updates
- Read-side scalability tests
- SRCU updates
- Torture-test updates
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This patch fixes:
commit b9aaec8f0be5 ("fib: use indirect call wrappers in the most common
fib_rules_ops") which didn't consider the case when
CONFIG_IPV6_MULTIPLE_TABLES is not set.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: b9aaec8f0be5 ("fib: use indirect call wrappers in the most common fib_rules_ops")
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cited commit mistakenly removed the trap group for externally routed
packets (e.g., via the management interface) and grouped locally routed
and externally routed packet traps under the same group, thereby
subjecting them to the same policer.
This can result in problems, for example, when FRR is restarted and
suddenly all transient traffic is trapped to the CPU because of a
default route through the management interface. Locally routed packets
required to re-establish a BGP connection will never reach the CPU and
the routing tables will not be re-populated.
Fix this by using a different trap group for externally routed packets.
Fixes: 8110668ecd9a ("mlxsw: spectrum_trap: Register layer 3 control traps")
Reported-by: Alex Veber <alexve@mellanox.com>
Tested-by: Alex Veber <alexve@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This avoids another inderect call per RX packet which save us around
20-40 ns.
Changelog:
v1 -> v2:
- Move declaraions to fib_rules.h to remove warnings
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that BPF program/link management is centralized in generic net_device
code, kernel code never queries program id from drivers, so
XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary.
This patch removes all the implementations of those commands in kernel, along
the xdp_attachment_query().
This patch was compile-tested on allyesconfig.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com
|
|
Implement XDP link-specific show_fdinfo and link_info to emit ifindex.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-7-andriin@fb.com
|
|
Add support for LINK_UPDATE command for BPF XDP link to enable reliable
replacement of underlying BPF program.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-6-andriin@fb.com
|
|
Add bpf_link-based API (bpf_xdp_link) to attach BPF XDP program through
BPF_LINK_CREATE command.
bpf_xdp_link is mutually exclusive with direct BPF program attachment,
previous BPF program should be detached prior to attempting to create a new
bpf_xdp_link attachment (for a given XDP mode). Once BPF link is attached, it
can't be replaced by other BPF program attachment or link attachment. It will
be detached only when the last BPF link FD is closed.
bpf_xdp_link will be auto-detached when net_device is shutdown, similarly to
how other BPF links behave (cgroup, flow_dissector). At that point bpf_link
will become defunct, but won't be destroyed until last FD is closed.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-5-andriin@fb.com
|
|
Further refactor XDP attachment code. dev_change_xdp_fd() is split into two
parts: getting bpf_progs from FDs and attachment logic, working with
bpf_progs. This makes attachment logic a bit more straightforward and
prepares code for bpf_xdp_link inclusion, which will share the common logic.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-4-andriin@fb.com
|
|
Instead of delegating to drivers, maintain information about which BPF
programs are attached in which XDP modes (generic/skb, driver, or hardware)
locally in net_device. This effectively obsoletes XDP_QUERY_PROG command.
Such re-organization simplifies existing code already. But it also allows to
further add bpf_link-based XDP attachments without drivers having to know
about any of this at all, which seems like a good setup.
XDP_SETUP_PROG/XDP_SETUP_PROG_HW are just low-level commands to driver to
install/uninstall active BPF program. All the higher-level concerns about
prog/link interaction will be contained within generic driver-agnostic logic.
All the XDP_QUERY_PROG calls to driver in dev_xdp_uninstall() were removed.
It's not clear for me why dev_xdp_uninstall() were passing previous prog_flags
when resetting installed programs. That seems unnecessary, plus most drivers
don't populate prog_flags anyways. Having XDP_SETUP_PROG vs XDP_SETUP_PROG_HW
should be enough of an indicator of what is required of driver to correctly
reset active BPF program. dev_xdp_uninstall() is also generalized as an
iteration over all three supported mode.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-3-andriin@fb.com
|
|
The bpf iterator for bpf sock local storage map
is implemented. User space interacts with sock
local storage map with fd as a key and storage value.
In kernel, passing fd to the bpf program does not
really make sense. In this case, the sock itself is
passed to bpf program.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184116.590602-1-yhs@fb.com
|
|
The UDP reuseport conflict was a little bit tricky.
The net-next code, via bpf-next, extracted the reuseport handling
into a helper so that the BPF sk lookup code could invoke it.
At the same time, the logic for reuseport handling of unconnected
sockets changed via commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace
which changed the logic to carry on the reuseport result into the
rest of the lookup loop if we do not return immediately.
This requires moving the reuseport_has_conns() logic into the callers.
While we are here, get rid of inline directives as they do not belong
in foo.c files.
The other changes were cases of more straightforward overlapping
modifications.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
IRQs are disabled when freeing skbs in input queue.
Use the IRQ safe variant to free skbs here.
Fixes: 145dd5f9c88f ("net: flush the softnet backlog in process context")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rework the remaining setsockopt code to pass a sockptr_t instead of a
plain user pointer. This removes the last remaining set_fs(KERNEL_DS)
outside of architecture specific code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> [ieee802154]
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Retreive a hash value from the SKB and store it
in the dissector key for future matching.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I noticed that touching linux/rhashtable.h causes lib/vsprintf.c to
be rebuilt. This dependency came through a bogus inclusion in the
file net/flow_offload.h. This patch moves it to the right place.
This patch also removes a lingering rhashtable inclusion in cls_api
created by the same commit.
Fixes: 4e481908c51b ("flow_offload: move tc indirect block to...")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The purpose of this override is to give the user an indication of what
the number of the CPU port is (in DSA, the CPU port is a hardware
implementation detail and not a network interface capable of traffic).
However, it has always failed (by design) at providing this information
to the user in a reliable fashion.
Prior to commit 3369afba1e46 ("net: Call into DSA netdevice_ops
wrappers"), the behavior was to only override this callback if it was
not provided by the DSA master.
That was its first failure: if the DSA master itself was a DSA port or a
switchdev, then the user would not see the number of the CPU port in
/sys/class/net/eth0/phys_port_name, but the number of the DSA master
port within its respective physical switch.
But that was actually ok in a way. The commit mentioned above changed
that behavior, and now overrides the master's ndo_get_phys_port_name
unconditionally. That comes with problems of its own, which are worse in
a way.
The idea is that it's typical for switchdev users to have udev rules for
consistent interface naming. These are based, among other things, on
the phys_port_name attribute. If we let the DSA switch at the bottom
to start randomly overriding ndo_get_phys_port_name with its own CPU
port, we basically lose any predictability in interface naming, or even
uniqueness, for that matter.
So, there are reasons to let DSA override the master's callback (to
provide a consistent interface, a number which has a clear meaning and
must not be interpreted according to context), and there are reasons to
not let DSA override it (it breaks udev matching for the DSA master).
But, there is an alternative method for users to retrieve the number of
the CPU port of each DSA switch in the system:
$ devlink port
pci/0000:00:00.5/0: type eth netdev swp0 flavour physical port 0
pci/0000:00:00.5/2: type eth netdev swp2 flavour physical port 2
pci/0000:00:00.5/4: type notset flavour cpu port 4
spi/spi2.0/0: type eth netdev sw0p0 flavour physical port 0
spi/spi2.0/1: type eth netdev sw0p1 flavour physical port 1
spi/spi2.0/2: type eth netdev sw0p2 flavour physical port 2
spi/spi2.0/4: type notset flavour cpu port 4
spi/spi2.1/0: type eth netdev sw1p0 flavour physical port 0
spi/spi2.1/1: type eth netdev sw1p1 flavour physical port 1
spi/spi2.1/2: type eth netdev sw1p2 flavour physical port 2
spi/spi2.1/3: type eth netdev sw1p3 flavour physical port 3
spi/spi2.1/4: type notset flavour cpu port 4
So remove this duplicated, unreliable and troublesome method. From this
patch on, the phys_port_name attribute of the DSA master will only
contain information about itself (if at all). If the users need reliable
information about the CPU port they're probably using devlink anyway.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently devlink instance is searched on all doit() operations.
But it is optionally stored into user_ptr[0]. This requires
rediscovering devlink again doing post_doit().
Few devlink commands related to port shared buffers needs 3 pointers
(devlink, devlink_port, and devlink_sb) while executing doit commands.
Though devlink pointer can be derived from the devlink_port during
post_doit() operation when doit() callback has acquired devlink
instance lock, relying on such scheme to access devlik pointer makes
code very fragile.
Hence, to avoid ambiguity in post_doit() and to avoid searching
devlink instance again, simplify code by always storing devlink
instance in user_ptr[0] and derive devlink_sb pointer in their
respective callback routines.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The buildbot found a config where the header isn't already implicitly
pulled in, so add an explicit include as well.
Fixes: 8c918ffbbad4 ("net: remove compat_sock_common_{get,set}sockopt")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-07-21
The following pull-request contains BPF updates for your *net-next* tree.
We've added 46 non-merge commits during the last 6 day(s) which contain
a total of 68 files changed, 4929 insertions(+), 526 deletions(-).
The main changes are:
1) Run BPF program on socket lookup, from Jakub.
2) Introduce cpumap, from Lorenzo.
3) s390 JIT fixes, from Ilya.
4) teach riscv JIT to emit compressed insns, from Luke.
5) use build time computed BTF ids in bpf iter, from Yonghong.
====================
Purely independent overlapping changes in both filter.h and xdp.h
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
Constify devlink instance pointer while checking if reload operation is
supported or not.
This helps to review the scope of checks done in reload.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Reload operation is enabled or not is already checked by
devlink_reload(). Hence, remove the duplicate check from
devlink_nl_cmd_reload().
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is no need to hold a device global lock when initializing
devlink device fields of a devlink instance which is not yet part of the
devices list.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When I cat 'tx_timeout' by sysfs, it displays as follows. It's better to
add a newline for easy reading.
root@syzkaller:~# cat /sys/devices/virtual/net/lo/queues/tx-0/tx_timeout
0root@syzkaller:~#
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If an unconnected socket in a UDP reuseport group connect()s, has_conns is
set to 1. Then, when a packet is received, udp[46]_lib_lookup2() scans all
sockets in udp_hslot looking for the connected socket with the highest
score.
However, when the number of sockets bound to the port exceeds max_socks,
reuseport_grow() resets has_conns to 0. It can cause udp[46]_lib_lookup2()
to return without scanning all sockets, resulting in that packets sent to
connected sockets may be distributed to unconnected sockets.
Therefore, reuseport_grow() should copy has_conns.
Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets")
CC: Willem de Bruijn <willemb@google.com>
Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tcp and udp bpf_iter can reuse some socket ids in
btf_sock_ids, so make it global.
I put the extern definition in btf_ids.h as a central
place so it can be easily discovered by developers.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720163402.1393427-1-yhs@fb.com
|
|
Currently, socket types (struct tcp_sock, udp_sock, etc.)
used by bpf_skc_to_*() helpers are computed when vmlinux_btf
is first built in the kernel.
Commit 5a2798ab32ba
("bpf: Add BTF_ID_LIST/BTF_ID/BTF_ID_UNUSED macros")
implemented a mechanism to compute btf_ids at kernel build
time which can simplify kernel implementation and reduce
runtime overhead by removing in-kernel btf_id calculation.
This patch did exactly this, removing in-kernel btf_id
computation and utilizing build-time btf_id computation.
If CONFIG_DEBUG_INFO_BTF is not defined, BTF_ID_LIST will
define an array with size of 5, which is not enough for
btf_sock_ids. So define its own static array if
CONFIG_DEBUG_INFO_BTF is not defined.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720163358.1393023-1-yhs@fb.com
|
|
Make the core net_device code call into our ndo_do_ioctl() and
ndo_get_phys_port_name() functions via the wrappers defined previously
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation for adding another layer of call into a DSA stacked ops
singleton, wrap the ndo_do_ioctl() call into dev_do_ioctl().
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Just check for a NULL method instead of wiring up
sock_no_{get,set}sockopt.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Handle the few cases that need special treatment in-line using
in_compat_syscall(). This also removes all the now unused
compat_{get,set}sockopt methods.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add the compat handling to sock_common_{get,set}sockopt instead,
keyed of in_compat_syscall(). This allow to remove the now unused
->compat_{get,set}sockopt methods from struct proto_ops.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a helper that copies either a native or compat bpf_fprog from
userspace after verifying the length, and remove the compat setsockopt
handlers that now aren't required.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Run a BPF program before looking up a listening socket on the receive path.
Program selects a listening socket to yield as result of socket lookup by
calling bpf_sk_assign() helper and returning SK_PASS code. Program can
revert its decision by assigning a NULL socket with bpf_sk_assign().
Alternatively, BPF program can also fail the lookup by returning with
SK_DROP, or let the lookup continue as usual with SK_PASS on return, when
no socket has been selected with bpf_sk_assign().
This lets the user match packets with listening sockets freely at the last
possible point on the receive path, where we know that packets are destined
for local delivery after undergoing policing, filtering, and routing.
With BPF code selecting the socket, directing packets destined to an IP
range or to a port range to a single socket becomes possible.
In case multiple programs are attached, they are run in series in the order
in which they were attached. The end result is determined from return codes
of all the programs according to following rules:
1. If any program returned SK_PASS and selected a valid socket, the socket
is used as result of socket lookup.
2. If more than one program returned SK_PASS and selected a socket,
last selection takes effect.
3. If any program returned SK_DROP, and no program returned SK_PASS and
selected a socket, socket lookup fails with -ECONNREFUSED.
4. If all programs returned SK_PASS and none of them selected a socket,
socket lookup continues to htable-based lookup.
Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-5-jakub@cloudflare.com
|
|
Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type
BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer
when looking up a listening socket for a new connection request for
connection oriented protocols, or when looking up an unconnected socket for
a packet for connection-less protocols.
When called, SK_LOOKUP BPF program can select a socket that will receive
the packet. This serves as a mechanism to overcome the limits of what
bind() API allows to express. Two use-cases driving this work are:
(1) steer packets destined to an IP range, on fixed port to a socket
192.0.2.0/24, port 80 -> NGINX socket
(2) steer packets destined to an IP address, on any port to a socket
198.51.100.1, any port -> L7 proxy socket
In its run-time context program receives information about the packet that
triggered the socket lookup. Namely IP version, L4 protocol identifier, and
address 4-tuple. Context can be further extended to include ingress
interface identifier.
To select a socket BPF program fetches it from a map holding socket
references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...)
helper to record the selection. Transport layer then uses the selected
socket as a result of socket lookup.
In its basic form, SK_LOOKUP acts as a filter and hence must return either
SK_PASS or SK_DROP. If the program returns with SK_PASS, transport should
look for a socket to receive the packet, or use the one selected by the
program if available, while SK_DROP informs the transport layer that the
lookup should fail.
This patch only enables the user to attach an SK_LOOKUP program to a
network namespace. Subsequent patches hook it up to run on local delivery
path in ipv4 and ipv6 stacks.
Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-3-jakub@cloudflare.com
|
|
When vlan_newlink call register_vlan_dev fails, it might return error
with dev->reg_state = NETREG_UNREGISTERED. The rtnl_newlink should
free the memory. But currently rtnl_newlink only free the memory which
state is NETREG_UNINITIALIZED.
BUG: memory leak
unreferenced object 0xffff8881051de000 (size 4096):
comm "syz-executor139", pid 560, jiffies 4294745346 (age 32.445s)
hex dump (first 32 bytes):
76 6c 61 6e 32 00 00 00 00 00 00 00 00 00 00 00 vlan2...........
00 45 28 03 81 88 ff ff 00 00 00 00 00 00 00 00 .E(.............
backtrace:
[<0000000047527e31>] kmalloc_node include/linux/slab.h:578 [inline]
[<0000000047527e31>] kvmalloc_node+0x33/0xd0 mm/util.c:574
[<000000002b59e3bc>] kvmalloc include/linux/mm.h:753 [inline]
[<000000002b59e3bc>] kvzalloc include/linux/mm.h:761 [inline]
[<000000002b59e3bc>] alloc_netdev_mqs+0x83/0xd90 net/core/dev.c:9929
[<000000006076752a>] rtnl_create_link+0x2c0/0xa20 net/core/rtnetlink.c:3067
[<00000000572b3be5>] __rtnl_newlink+0xc9c/0x1330 net/core/rtnetlink.c:3329
[<00000000e84ea553>] rtnl_newlink+0x66/0x90 net/core/rtnetlink.c:3397
[<0000000052c7c0a9>] rtnetlink_rcv_msg+0x540/0x990 net/core/rtnetlink.c:5460
[<000000004b5cb379>] netlink_rcv_skb+0x12b/0x3a0 net/netlink/af_netlink.c:2469
[<00000000c71c20d3>] netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
[<00000000c71c20d3>] netlink_unicast+0x4c6/0x690 net/netlink/af_netlink.c:1329
[<00000000cca72fa9>] netlink_sendmsg+0x735/0xcc0 net/netlink/af_netlink.c:1918
[<000000009221ebf7>] sock_sendmsg_nosec net/socket.c:652 [inline]
[<000000009221ebf7>] sock_sendmsg+0x109/0x140 net/socket.c:672
[<000000001c30ffe4>] ____sys_sendmsg+0x5f5/0x780 net/socket.c:2352
[<00000000b71ca6f3>] ___sys_sendmsg+0x11d/0x1a0 net/socket.c:2406
[<0000000007297384>] __sys_sendmsg+0xeb/0x1b0 net/socket.c:2439
[<000000000eb29b11>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
[<000000006839b4d0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: cb626bf566eb ("net-sysfs: Fix reference count leak")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit aebe4426ccaa4838f36ea805cdf7d76503e65117.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce the capability to attach an eBPF program to cpumap entries.
The idea behind this feature is to add the possibility to define on
which CPU run the eBPF program if the underlying hw does not support
RSS. Current supported verdicts are XDP_DROP and XDP_PASS.
This patch has been tested on Marvell ESPRESSObin using xdp_redirect_cpu
sample available in the kernel tree to identify possible performance
regressions. Results show there are no observable differences in
packet-per-second:
$./xdp_redirect_cpu --progname xdp_cpu_map0 --dev eth0 --cpu 1
rx: 354.8 Kpps
rx: 356.0 Kpps
rx: 356.8 Kpps
rx: 356.3 Kpps
rx: 356.6 Kpps
rx: 356.6 Kpps
rx: 356.7 Kpps
rx: 355.8 Kpps
rx: 356.8 Kpps
rx: 356.8 Kpps
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/5c9febdf903d810b3415732e5cd98491d7d9067a.1594734381.git.lorenzo@kernel.org
|
|
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-07-13
The following pull-request contains BPF updates for your *net-next* tree.
We've added 36 non-merge commits during the last 7 day(s) which contain
a total of 62 files changed, 2242 insertions(+), 468 deletions(-).
The main changes are:
1) Avoid trace_printk warning banner by switching bpf_trace_printk to use
its own tracing event, from Alan.
2) Better libbpf support on older kernels, from Andrii.
3) Additional AF_XDP stats, from Ciara.
4) build time resolution of BTF IDs, from Jiri.
5) BPF_CGROUP_INET_SOCK_RELEASE hook, from Stanislav.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Dereferencing the reporter after it was destroyed in order to unlock the
reporters lock results in a use-after-free [1].
Fix this by storing a pointer to the lock in a local variable before
destroying the reporter.
[1]
==================================================================
BUG: KASAN: use-after-free in devlink_health_reporter_destroy+0x15c/0x1b0 net/core/devlink.c:5476
Read of size 8 at addr ffff8880650fd020 by task syz-executor.1/904
CPU: 0 PID: 904 Comm: syz-executor.1 Not tainted 5.8.0-rc2+ #35
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xf6/0x16e lib/dump_stack.c:118
print_address_description.constprop.0+0x1c/0x250 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
devlink_health_reporter_destroy+0x15c/0x1b0 net/core/devlink.c:5476
nsim_dev_health_exit+0x8b/0xe0 drivers/net/netdevsim/health.c:317
nsim_dev_reload_destroy+0x7f/0x110 drivers/net/netdevsim/dev.c:1134
nsim_dev_reload_down+0x6e/0xd0 drivers/net/netdevsim/dev.c:712
devlink_reload+0xc6/0x3b0 net/core/devlink.c:2952
devlink_nl_cmd_reload+0x2f1/0x7c0 net/core/devlink.c:2987
genl_family_rcv_msg_doit net/netlink/genetlink.c:691 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:736 [inline]
genl_rcv_msg+0x611/0x9d0 net/netlink/genetlink.c:753
netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2469
genl_rcv+0x24/0x40 net/netlink/genetlink.c:764
netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1329
netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1918
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0x150/0x190 net/socket.c:672
____sys_sendmsg+0x6d8/0x840 net/socket.c:2363
___sys_sendmsg+0xff/0x170 net/socket.c:2417
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2450
do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x4748ad
Code: Bad RIP value.
RSP: 002b:00007fd0358adc38 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000000056bf00 RCX: 00000000004748ad
RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000003
RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00000000004d1a4b R14: 00007fd0358ae6b4 R15: 00007fd0358add80
Allocated by task 539:
save_stack+0x1b/0x40 mm/kasan/common.c:48
set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc mm/kasan/common.c:494 [inline]
__kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467
kmalloc include/linux/slab.h:555 [inline]
kzalloc include/linux/slab.h:669 [inline]
__devlink_health_reporter_create+0x91/0x2f0 net/core/devlink.c:5359
devlink_health_reporter_create+0xa1/0x170 net/core/devlink.c:5431
nsim_dev_health_init+0x95/0x3a0 drivers/net/netdevsim/health.c:279
nsim_dev_probe+0xb1e/0xeb0 drivers/net/netdevsim/dev.c:1086
really_probe+0x287/0x6d0 drivers/base/dd.c:525
driver_probe_device+0xfe/0x1d0 drivers/base/dd.c:701
__device_attach_driver+0x21e/0x290 drivers/base/dd.c:807
bus_for_each_drv+0x161/0x1e0 drivers/base/bus.c:431
__device_attach+0x21a/0x360 drivers/base/dd.c:873
bus_probe_device+0x1e6/0x290 drivers/base/bus.c:491
device_add+0xaf2/0x1b00 drivers/base/core.c:2680
nsim_bus_dev_new drivers/net/netdevsim/bus.c:336 [inline]
new_device_store+0x374/0x590 drivers/net/netdevsim/bus.c:215
bus_attr_store+0x75/0xa0 drivers/base/bus.c:122
sysfs_kf_write+0x113/0x170 fs/sysfs/file.c:138
kernfs_fop_write+0x25d/0x480 fs/kernfs/file.c:315
__vfs_write+0x7c/0x100 fs/read_write.c:495
vfs_write+0x265/0x5e0 fs/read_write.c:559
ksys_write+0x12d/0x250 fs/read_write.c:612
do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 904:
save_stack+0x1b/0x40 mm/kasan/common.c:48
set_track mm/kasan/common.c:56 [inline]
kasan_set_free_info mm/kasan/common.c:316 [inline]
__kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455
slab_free_hook mm/slub.c:1474 [inline]
slab_free_freelist_hook mm/slub.c:1507 [inline]
slab_free mm/slub.c:3072 [inline]
kfree+0xe6/0x320 mm/slub.c:4063
devlink_health_reporter_free net/core/devlink.c:5449 [inline]
devlink_health_reporter_put+0xb7/0xf0 net/core/devlink.c:5456
__devlink_health_reporter_destroy net/core/devlink.c:5463 [inline]
devlink_health_reporter_destroy+0x11b/0x1b0 net/core/devlink.c:5475
nsim_dev_health_exit+0x8b/0xe0 drivers/net/netdevsim/health.c:317
nsim_dev_reload_destroy+0x7f/0x110 drivers/net/netdevsim/dev.c:1134
nsim_dev_reload_down+0x6e/0xd0 drivers/net/netdevsim/dev.c:712
devlink_reload+0xc6/0x3b0 net/core/devlink.c:2952
devlink_nl_cmd_reload+0x2f1/0x7c0 net/core/devlink.c:2987
genl_family_rcv_msg_doit net/netlink/genetlink.c:691 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:736 [inline]
genl_rcv_msg+0x611/0x9d0 net/netlink/genetlink.c:753
netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2469
genl_rcv+0x24/0x40 net/netlink/genetlink.c:764
netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1329
netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1918
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0x150/0x190 net/socket.c:672
____sys_sendmsg+0x6d8/0x840 net/socket.c:2363
___sys_sendmsg+0xff/0x170 net/socket.c:2417
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2450
do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff8880650fd000
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 32 bytes inside of
512-byte region [ffff8880650fd000, ffff8880650fd200)
The buggy address belongs to the page:
page:ffffea0001943f00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff8880650ff800 head:ffffea0001943f00 order:2 compound_mapcount:0 compound_pincount:0
flags: 0x100000000010200(slab|head)
raw: 0100000000010200 ffffea0001a06a08 ffffea00010ad308 ffff88806c402500
raw: ffff8880650ff800 0000000000100009 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8880650fcf00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8880650fcf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8880650fd000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8880650fd080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880650fd100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Fixes: 3c5584bf0a04 ("devlink: Rework devlink health reporter destructor")
Fixes: 15c724b997a8 ("devlink: Add devlink health port reporters API")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The sparse tool complains as follows:
net/core/dev.c:5594:1: warning:
symbol '__pcpu_scope_flush_works' was not declared. Should it be static?
'flush_works' is not used outside of dev.c, so marks
it static.
Fixes: 41852497a9205 ("net: batch calls to flush_all_backlogs()")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Previously, shared blocks were only relevant for the pseudo-qdiscs ingress
and clsact. Recently, a qevent facility was introduced, which allows to
bind blocks to well-defined slots of a qdisc instance. RED in particular
got two qevents: early_drop and mark. Drivers that wish to offload these
blocks will be sent the usual notification, and need to know which qdisc it
is related to.
To that end, extend flow_block_offload with a "sch" pointer, and initialize
as appropriate. This prompts changes in the indirect block facility, which
now tracks the scheduler in addition to the netdevice. Update signatures of
several functions similarly.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Simple fixes which require no deep knowledge of the code.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|