summaryrefslogtreecommitdiff
path: root/include/uapi/linux
AgeCommit message (Collapse)AuthorFilesLines
2023-02-13net: phy: add genphy_c45_ethtool_get/set_eee() supportOleksij Rempel1-0/+8
Add replacement for phy_ethtool_get/set_eee() functions. Current phy_ethtool_get/set_eee() implementation is great and it is possible to make it even better: - this functionality is for devices implementing parts of IEEE 802.3 specification beyond Clause 22. The better place for this code is phy-c45.c - currently it is able to do read/write operations on PHYs with different abilities to not existing registers. It is better to use stored supported_eee abilities to avoid false read/write operations. - the eee_active detection will provide wrong results on not supported link modes. It is better to validate speed/duplex properties against supported EEE link modes. - it is able to support only limited amount of link modes. We have more EEE link modes... By refactoring this code I address most of this point except of the last one. Adding additional EEE link modes will need more work. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13net: ethtool: extend ringparam set/get APIs for rx_pushShannon Nelson1-0/+1
Similar to what was done for TX_PUSH, add an RX_PUSH concept to the ethtool interfaces. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-11Daniel Borkmann says:Jakub Kicinski2-1/+60
==================== pull-request: bpf-next 2023-02-11 We've added 96 non-merge commits during the last 14 day(s) which contain a total of 152 files changed, 4884 insertions(+), 962 deletions(-). There is a minor conflict in drivers/net/ethernet/intel/ice/ice_main.c between commit 5b246e533d01 ("ice: split probe into smaller functions") from the net-next tree and commit 66c0e13ad236 ("drivers: net: turn on XDP features") from the bpf-next tree. Remove the hunk given ice_cfg_netdev() is otherwise there a 2nd time, and add XDP features to the existing ice_cfg_netdev() one: [...] ice_set_netdev_features(netdev); netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | NETDEV_XDP_ACT_XSK_ZEROCOPY; ice_set_ops(netdev); [...] Stephen's merge conflict mail: https://lore.kernel.org/bpf/20230207101951.21a114fa@canb.auug.org.au/ The main changes are: 1) Add support for BPF trampoline on s390x which finally allows to remove many test cases from the BPF CI's DENYLIST.s390x, from Ilya Leoshkevich. 2) Add multi-buffer XDP support to ice driver, from Maciej Fijalkowski. 3) Add capability to export the XDP features supported by the NIC. Along with that, add a XDP compliance test tool, from Lorenzo Bianconi & Marek Majtyka. 4) Add __bpf_kfunc tag for marking kernel functions as kfuncs, from David Vernet. 5) Add a deep dive documentation about the verifier's register liveness tracking algorithm, from Eduard Zingerman. 6) Fix and follow-up cleanups for resolve_btfids to be compiled as a host program to avoid cross compile issues, from Jiri Olsa & Ian Rogers. 7) Batch of fixes to the BPF selftest for xdp_hw_metadata which resulted when testing on different NICs, from Jesper Dangaard Brouer. 8) Fix libbpf to better detect kernel version code on Debian, from Hao Xiang. 9) Extend libbpf to add an option for when the perf buffer should wake up, from Jon Doron. 10) Follow-up fix on xdp_metadata selftest to just consume on TX completion, from Stanislav Fomichev. 11) Extend the kfuncs.rst document with description on kfunc lifecycle & stability expectations, from David Vernet. 12) Fix bpftool prog profile to skip attaching to offline CPUs, from Tonghao Zhang. ==================== Link: https://lore.kernel.org/r/20230211002037.8489-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-0/+2
net/devlink/leftover.c / net/core/devlink.c: 565b4824c39f ("devlink: change port event netdev notifier from per-net to global") f05bd8ebeb69 ("devlink: move code to a dedicated directory") 687125b5799c ("devlink: split out core code") https://lore.kernel.org/all/20230208094657.379f2b1a@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-06uapi: add missing ip/ipv6 header dependencies for linux/stddef.hHerton R. Krzesinski2-0/+2
Since commit 58e0be1ef6118 ("net: use struct_group to copy ip/ipv6 header addresses"), ip and ipv6 headers started to use the __struct_group definition, which is defined at include/uapi/linux/stddef.h. However, linux/stddef.h isn't explicitly included in include/uapi/linux/{ip,ipv6}.h, which breaks build of xskxceiver bpf selftest if you install the uapi headers in the system: $ make V=1 xskxceiver -C tools/testing/selftests/bpf ... make: Entering directory '(...)/tools/testing/selftests/bpf' gcc -g -O0 -rdynamic -Wall -Werror (...) In file included from xskxceiver.c:79: /usr/include/linux/ip.h:103:9: error: expected specifier-qualifier-list before ‘__struct_group’ 103 | __struct_group(/* no tag */, addrs, /* no attrs */, | ^~~~~~~~~~~~~~ ... Include the missing <linux/stddef.h> dependency in ip.h and do the same for the ipv6.h header. Fixes: 58e0be1ef611 ("net: use struct_group to copy ip/ipv6 header addresses") Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06net: bridge: Add netlink knobs for number / maximum MDB entriesPetr Machata2-0/+4
The previous patch added accounting for number of MDB entries per port and per port-VLAN, and the logic to verify that these values stay within configured bounds. However it didn't provide means to actually configure those bounds or read the occupancy. This patch does that. Two new netlink attributes are added for the MDB occupancy: IFLA_BRPORT_MCAST_N_GROUPS for the per-port occupancy and BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS for the per-port-VLAN occupancy. And another two for the maximum number of MDB entries: IFLA_BRPORT_MCAST_MAX_GROUPS for the per-port maximum, and BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS for the per-port-VLAN one. Note that the two new IFLA_BRPORT_ attributes prompt bumping of RTNL_SLAVE_MAX_TYPE to size the slave attribute tables large enough. The new attributes are used like this: # ip link add name br up type bridge vlan_filtering 1 mcast_snooping 1 \ mcast_vlan_snooping 1 mcast_querier 1 # ip link set dev v1 master br # bridge vlan add dev v1 vid 2 # bridge vlan set dev v1 vid 1 mcast_max_groups 1 # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 1 # bridge mdb add dev br port v1 grp 230.1.2.4 temp vid 1 Error: bridge: Port-VLAN is already in 1 groups, and mcast_max_groups=1. # bridge link set dev v1 mcast_max_groups 1 # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 2 Error: bridge: Port is already in 1 groups, and mcast_max_groups=1. # bridge -d link show 5: v1@v2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br [...] [...] mcast_n_groups 1 mcast_max_groups 1 # bridge -d vlan show port vlan-id br 1 PVID Egress Untagged state forwarding mcast_router 1 v1 1 PVID Egress Untagged [...] mcast_n_groups 1 mcast_max_groups 1 2 [...] mcast_n_groups 0 mcast_max_groups 0 Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-04bpf: fix typo in header for bpf_perf_prog_read_valueFlorian Lehner1-1/+1
Fix a simple typo in the documentation for bpf_perf_prog_read_value. Signed-off-by: Florian Lehner <dev@der-flo.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20230203121439.25884-1-dev@der-flo.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-03netdev-genl: create a simple family for netdev stuffJakub Kicinski1-0/+59
Add a Netlink spec-compatible family for netdevs. This is a very simple implementation without much thought going into it. It allows us to reap all the benefits of Netlink specs, one can use the generic client to issue the commands: $ ./cli.py --spec netdev.yaml --dump dev_get [{'ifindex': 1, 'xdp-features': set()}, {'ifindex': 2, 'xdp-features': {'basic', 'ndo-xmit', 'redirect'}}, {'ifindex': 3, 'xdp-features': {'rx-sg'}}] the generic python library does not have flags-by-name support, yet, but we also don't have to carry strings in the messages, as user space can get the names from the spec. Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Co-developed-by: Marek Majtyka <alardam@gmail.com> Signed-off-by: Marek Majtyka <alardam@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/327ad9c9868becbe1e601b580c962549c8cd81f2.1675245258.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-02net: add gso_ipv4_max_size and gro_ipv4_max_size per deviceXin Long1-0/+3
This patch introduces gso_ipv4_max_size and gro_ipv4_max_size per device and adds netlink attributes for them, so that IPV4 BIG TCP can be guarded by a separate tunable in the next patch. To not break the old application using "gso/gro_max_size" for IPv4 GSO packets, this patch updates "gso/gro_ipv4_max_size" in netif_set_gso/gro_max_size() if the new size isn't greater than GSO_LEGACY_MAX_SIZE, so that nothing will change even if userspace doesn't realize the new netlink attributes. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02packet: add TP_STATUS_GSO_TCP for tp_statusXin Long1-0/+1
Introduce TP_STATUS_GSO_TCP tp_status flag to tell the af_packet user that this is a TCP GSO packet. When parsing IPv4 BIG TCP packets in tcpdump/libpcap, it can use tp_len as the IPv4 packet len when this flag is set, as iph tot_len is set to 0 for IPv4 BIG TCP packets. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-30Merge tag 'batadv-next-pullrequest-20230127' of ↵David S. Miller1-0/+2
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature/cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - drop prandom.h includes, by Sven Eckelmann - fix mailing list address, by Sven Eckelmann - multicast feature preparation, by Linus Lüssing (2 patches) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-28Merge tag 'for-netdev' of ↵Jakub Kicinski1-0/+12
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== bpf-next 2023-01-28 We've added 124 non-merge commits during the last 22 day(s) which contain a total of 124 files changed, 6386 insertions(+), 1827 deletions(-). The main changes are: 1) Implement XDP hints via kfuncs with initial support for RX hash and timestamp metadata kfuncs, from Stanislav Fomichev and Toke Høiland-Jørgensen. Measurements on overhead: https://lore.kernel.org/bpf/875yellcx6.fsf@toke.dk 2) Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case, from Andrii Nakryiko. 3) Significantly reduce the search time for module symbols by livepatch and BPF, from Jiri Olsa and Zhen Lei. 4) Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals, from David Vernet. 5) Fix several issues in the dynptr processing such as stack slot liveness propagation, missing checks for PTR_TO_STACK variable offset, etc, from Kumar Kartikeya Dwivedi. 6) Various performance improvements, fixes, and introduction of more than just one XDP program to XSK selftests, from Magnus Karlsson. 7) Big batch to BPF samples to reduce deprecated functionality, from Daniel T. Lee. 8) Enable struct_ops programs to be sleepable in verifier, from David Vernet. 9) Reduce pr_warn() noise on BTF mismatches when they are expected under the CONFIG_MODULE_ALLOW_BTF_MISMATCH config anyway, from Connor O'Brien. 10) Describe modulo and division by zero behavior of the BPF runtime in BPF's instruction specification document, from Dave Thaler. 11) Several improvements to libbpf API documentation in libbpf.h, from Grant Seltzer. 12) Improve resolve_btfids header dependencies related to subcmd and add proper support for HOSTCC, from Ian Rogers. 13) Add ipip6 and ip6ip decapsulation support for bpf_skb_adjust_room() helper along with BPF selftests, from Ziyang Xuan. 14) Simplify the parsing logic of structure parameters for BPF trampoline in the x86-64 JIT compiler, from Pu Lehui. 15) Get BTF working for kernels with CONFIG_RUST enabled by excluding Rust compilation units with pahole, from Martin Rodriguez Reboredo. 16) Get bpf_setsockopt() working for kTLS on top of TCP sockets, from Kui-Feng Lee. 17) Disable stack protection for BPF objects in bpftool given BPF backends don't support it, from Holger Hoffstätte. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (124 commits) selftest/bpf: Make crashes more debuggable in test_progs libbpf: Add documentation to map pinning API functions libbpf: Fix malformed documentation formatting selftests/bpf: Properly enable hwtstamp in xdp_hw_metadata selftests/bpf: Calls bpf_setsockopt() on a ktls enabled socket. bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt(). bpf/selftests: Verify struct_ops prog sleepable behavior bpf: Pass const struct bpf_prog * to .check_member libbpf: Support sleepable struct_ops.s section bpf: Allow BPF_PROG_TYPE_STRUCT_OPS programs to be sleepable selftests/bpf: Fix vmtest static compilation error tools/resolve_btfids: Alter how HOSTCC is forced tools/resolve_btfids: Install subcmd headers bpf/docs: Document the nocast aliasing behavior of ___init bpf/docs: Document how nested trusted fields may be defined bpf/docs: Document cpumask kfuncs in a new file selftests/bpf: Add selftest suite for cpumask kfuncs selftests/bpf: Add nested trust selftests suite bpf: Enable cpumasks to be queried and used as kptrs bpf: Disallow NULLable pointers for trusted kfuncs ... ==================== Link: https://lore.kernel.org/r/20230128004827.21371-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-4/+2
Conflicts: drivers/net/ethernet/intel/ice/ice_main.c 418e53401e47 ("ice: move devlink port creation/deletion") 643ef23bd9dd ("ice: Introduce local var for readability") https://lore.kernel.org/all/20230127124025.0dacef40@canb.auug.org.au/ https://lore.kernel.org/all/20230124005714.3996270-1-anthony.l.nguyen@intel.com/ drivers/net/ethernet/engleder/tsnep_main.c 3d53aaef4332 ("tsnep: Fix TX queue stop/wake for multiple queues") 25faa6a4c5ca ("tsnep: Replace TX spin_lock with __netif_tx_lock") https://lore.kernel.org/all/20230127123604.36bb3e99@canb.auug.org.au/ net/netfilter/nf_conntrack_proto_sctp.c 13bd9b31a969 ("Revert "netfilter: conntrack: add sctp DATA_SENT state"") a44b7651489f ("netfilter: conntrack: unify established states for SCTP paths") f71cb8f45d09 ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets") https://lore.kernel.org/all/20230127125052.674281f9@canb.auug.org.au/ https://lore.kernel.org/all/d36076f3-6add-a442-6d4b-ead9f7ffff86@tessares.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-26icmp: Add counters for rate limitsJamie Bainbridge1-0/+3
There are multiple ICMP rate limiting mechanisms: * Global limits: net.ipv4.icmp_msgs_burst/icmp_msgs_per_sec * v4 per-host limits: net.ipv4.icmp_ratelimit/ratemask * v6 per-host limits: net.ipv6.icmp_ratelimit/ratemask However, when ICMP output is limited, there is no way to tell which limit has been hit or even if the limits are responsible for the lack of ICMP output. Add counters for each of the cases above. As we are within local_bh_disable(), use the __INC stats variant. Example output: # nstat -sz "*RateLimit*" IcmpOutRateLimitGlobal 134 0.0 IcmpOutRateLimitHost 770 0.0 Icmp6OutRateLimitHost 84 0.0 Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Suggested-by: Abhishek Rawal <rawal.abhishek92@gmail.com> Link: https://lore.kernel.org/r/273b32241e6b7fdc5c609e6f5ebc68caf3994342.1674605770.git.jamie.bainbridge@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26inet: Add IP_LOCAL_PORT_RANGE socket optionJakub Sitnicki1-0/+1
Users who want to share a single public IP address for outgoing connections between several hosts traditionally reach for SNAT. However, SNAT requires state keeping on the node(s) performing the NAT. A stateless alternative exists, where a single IP address used for egress can be shared between several hosts by partitioning the available ephemeral port range. In such a setup: 1. Each host gets assigned a disjoint range of ephemeral ports. 2. Applications open connections from the host-assigned port range. 3. Return traffic gets routed to the host based on both, the destination IP and the destination port. An application which wants to open an outgoing connection (connect) from a given port range today can choose between two solutions: 1. Manually pick the source port by bind()'ing to it before connect()'ing the socket. This approach has a couple of downsides: a) Search for a free port has to be implemented in the user-space. If the chosen 4-tuple happens to be busy, the application needs to retry from a different local port number. Detecting if 4-tuple is busy can be either easy (TCP) or hard (UDP). In TCP case, the application simply has to check if connect() returned an error (EADDRNOTAVAIL). That is assuming that the local port sharing was enabled (REUSEADDR) by all the sockets. # Assume desired local port range is 60_000-60_511 s = socket(AF_INET, SOCK_STREAM) s.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1) s.bind(("192.0.2.1", 60_000)) s.connect(("1.1.1.1", 53)) # Fails only if 192.0.2.1:60000 -> 1.1.1.1:53 is busy # Application must retry with another local port In case of UDP, the network stack allows binding more than one socket to the same 4-tuple, when local port sharing is enabled (REUSEADDR). Hence detecting the conflict is much harder and involves querying sock_diag and toggling the REUSEADDR flag [1]. b) For TCP, bind()-ing to a port within the ephemeral port range means that no connecting sockets, that is those which leave it to the network stack to find a free local port at connect() time, can use the this port. IOW, the bind hash bucket tb->fastreuse will be 0 or 1, and the port will be skipped during the free port search at connect() time. 2. Isolate the app in a dedicated netns and use the use the per-netns ip_local_port_range sysctl to adjust the ephemeral port range bounds. The per-netns setting affects all sockets, so this approach can be used only if: - there is just one egress IP address, or - the desired egress port range is the same for all egress IP addresses used by the application. For TCP, this approach avoids the downsides of (1). Free port search and 4-tuple conflict detection is done by the network stack: system("sysctl -w net.ipv4.ip_local_port_range='60000 60511'") s = socket(AF_INET, SOCK_STREAM) s.setsockopt(SOL_IP, IP_BIND_ADDRESS_NO_PORT, 1) s.bind(("192.0.2.1", 0)) s.connect(("1.1.1.1", 53)) # Fails if all 4-tuples 192.0.2.1:60000-60511 -> 1.1.1.1:53 are busy For UDP this approach has limited applicability. Setting the IP_BIND_ADDRESS_NO_PORT socket option does not result in local source port being shared with other connected UDP sockets. Hence relying on the network stack to find a free source port, limits the number of outgoing UDP flows from a single IP address down to the number of available ephemeral ports. To put it another way, partitioning the ephemeral port range between hosts using the existing Linux networking API is cumbersome. To address this use case, add a new socket option at the SOL_IP level, named IP_LOCAL_PORT_RANGE. The new option can be used to clamp down the ephemeral port range for each socket individually. The option can be used only to narrow down the per-netns local port range. If the per-socket range lies outside of the per-netns range, the latter takes precedence. UAPI-wise, the low and high range bounds are passed to the kernel as a pair of u16 values in host byte order packed into a u32. This avoids pointer passing. PORT_LO = 40_000 PORT_HI = 40_511 s = socket(AF_INET, SOCK_STREAM) v = struct.pack("I", PORT_HI << 16 | PORT_LO) s.setsockopt(SOL_IP, IP_LOCAL_PORT_RANGE, v) s.bind(("127.0.0.1", 0)) s.getsockname() # Local address between ("127.0.0.1", 40_000) and ("127.0.0.1", 40_511), # if there is a free port. EADDRINUSE otherwise. [1] https://github.com/cloudflare/cloudflare-blog/blob/232b432c1d57/2022-02-connectx/connectx.py#L116 Reviewed-by: Marek Majkowski <marek@cloudflare.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-24net: fou: regenerate the uAPI from the specJakub Kicinski1-28/+26
Regenerate the FOU uAPI header from the YAML spec. The flags now come before attributes which use them, and the comments for type disappear (coders should look at the spec instead). Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-24netfilter: conntrack: unify established states for SCTP pathsSriram Yagnaraman2-2/+2
An SCTP endpoint can start an association through a path and tear it down over another one. That means the initial path will not see the shutdown sequence, and the conntrack entry will remain in ESTABLISHED state for 5 days. By merging the HEARTBEAT_ACKED and ESTABLISHED states into one ESTABLISHED state, there remains no difference between a primary or secondary path. The timeout for the merged ESTABLISHED state is set to 210 seconds (hb_interval * max_path_retrans + rto_max). So, even if a path doesn't see the shutdown sequence, it will expire in a reasonable amount of time. With this change in place, there is now more than one state from which we can transition to ESTABLISHED, COOKIE_ECHOED and HEARTBEAT_SENT, so handle the setting of ASSURED bit whenever a state change has happened and the new state is ESTABLISHED. Removed the check for dir==REPLY since the transition to ESTABLISHED can happen only in the reply direction. Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.") Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-01-24Revert "netfilter: conntrack: add sctp DATA_SENT state"Sriram Yagnaraman2-2/+0
This reverts commit (bff3d0534804: "netfilter: conntrack: add sctp DATA_SENT state") Using DATA/SACK to detect a new connection on secondary/alternate paths works only on new connections, while a HEARTBEAT is required on connection re-use. It is probably consistent to wait for HEARTBEAT to create a secondary connection in conntrack. Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-01-24Merge tag 'wireless-next-2023-01-23' of ↵Jakub Kicinski1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.3 First set of patches for v6.3. The most important change here is that the old Wireless Extension user space interface is not supported on Wi-Fi 7 devices at all. We also added a warning if anyone with modern drivers (ie. cfg80211 and mac80211 drivers) tries to use Wireless Extensions, everyone should switch to using nl80211 interface instead. Static WEP support is removed, there wasn't any driver using that anyway so there's no user impact. Otherwise it's smaller features and fixes as usual. Note: As mt76 had tricky conflicts due to the fixes in wireless tree, we decided to merge wireless into wireless-next to solve them easily. There should not be any merge problems anymore. Major changes: cfg80211 - remove never used static WEP support - warn if Wireless Extention interface is used with cfg80211/mac80211 drivers - stop supporting Wireless Extensions with Wi-Fi 7 devices - support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting rfkill - add GPIO DT support bitfield - add FIELD_PREP_CONST() mt76 - per-PHY LED support rtw89 - support new Bluetooth co-existance version rtl8xxxu - support RTL8188EU * tag 'wireless-next-2023-01-23' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (123 commits) wifi: wireless: deny wireless extensions on MLO-capable devices wifi: wireless: warn on most wireless extension usage wifi: mac80211: drop extra 'e' from ieeee80211... name wifi: cfg80211: Deduplicate certificate loading bitfield: add FIELD_PREP_CONST() wifi: mac80211: add kernel-doc for EHT structure mac80211: support minimal EHT rate reporting on RX wifi: mac80211: Add HE MU-MIMO related flags in ieee80211_bss_conf wifi: mac80211: Add VHT MU-MIMO related flags in ieee80211_bss_conf wifi: cfg80211: Use MLD address to indicate MLD STA disconnection wifi: cfg80211: Support 32 bytes KCK key in GTK rekey offload wifi: cfg80211: Fix extended KCK key length check in nl80211_set_rekey_data() wifi: cfg80211: remove support for static WEP wifi: rtl8xxxu: Dump the efuse only for untested devices wifi: rtl8xxxu: Print the ROM version too wifi: rtw88: Use non-atomic sta iterator in rtw_ra_mask_info_update() wifi: rtw88: Use rtw_iterate_vifs() for rtw_vif_watch_dog_iter() wifi: rtw88: Move register access from rtw_bf_assoc() outside the RCU wifi: rtl8xxxu: Use a longer retry limit of 48 wifi: rtl8xxxu: Report the RSSI to the firmware ... ==================== Link: https://lore.kernel.org/r/20230123103338.330CBC433EF@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-23bpf: Introduce device-bound XDP programsStanislav Fomichev1-0/+5
New flag BPF_F_XDP_DEV_BOUND_ONLY plus all the infra to have a way to associate a netdev with a BPF program at load time. netdevsim checks are dropped in favor of generic check in dev_xdp_attach. Cc: John Fastabend <john.fastabend@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Anatoly Burakov <anatoly.burakov@intel.com> Cc: Alexander Lobakin <alexandr.lobakin@intel.com> Cc: Magnus Karlsson <magnus.karlsson@gmail.com> Cc: Maryam Tahhan <mtahhan@redhat.com> Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230119221536.3349901-6-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-23net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)Vladimir Oltean2-0/+21
IEEE 802.3-2018 clause 99 defines a MAC Merge sublayer which contains an Express MAC and a Preemptible MAC. Both MACs are hidden to higher and lower layers and visible as a single MAC (packet classification to eMAC or pMAC on TX is done based on priority; classification on RX is done based on SFD). For devices which support a MAC Merge sublayer, it is desirable to retrieve individual packet counters from the eMAC and the pMAC, as well as aggregate statistics (their sum). Introduce a new ETHTOOL_A_STATS_SRC attribute which is part of the policy of ETHTOOL_MSG_STATS_GET and, and an ETHTOOL_A_PAUSE_STATS_SRC which is part of the policy of ETHTOOL_MSG_PAUSE_GET (accepted when ETHTOOL_FLAG_STATS is set in the common ethtool header). Both of these take values from enum ethtool_mac_stats_src, defaulting to "aggregate" in the absence of the attribute. Existing drivers do not need to pay attention to this enum which was added to all driver-facing structures, just the ones which report the MAC merge layer as supported. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net: ethtool: add support for MAC Merge layerVladimir Oltean2-0/+72
The MAC merge sublayer (IEEE 802.3-2018 clause 99) is one of 2 specifications (the other being Frame Preemption; IEEE 802.1Q-2018 clause 6.7.2), which work together to minimize latency caused by frame interference at TX. The overall goal of TSN is for normal traffic and traffic with a bounded deadline to be able to cohabitate on the same L2 network and not bother each other too much. The standards achieve this (partly) by introducing the concept of preemptible traffic, i.e. Ethernet frames that have a custom value for the Start-of-Frame-Delimiter (SFD), and these frames can be fragmented and reassembled at L2 on a link-local basis. The non-preemptible frames are called express traffic, they are transmitted using a normal SFD, and they can preempt preemptible frames, therefore having lower latency, which can matter at lower (100 Mbps) link speeds, or at high MTUs (jumbo frames around 9K). Preemption is not recursive, i.e. a P frame cannot preempt another P frame. Preemption also does not depend upon priority, or otherwise said, an E frame with prio 0 will still preempt a P frame with prio 7. In terms of implementation, the standards talk about the presence of an express MAC (eMAC) which handles express traffic, and a preemptible MAC (pMAC) which handles preemptible traffic, and these MACs are multiplexed on the same MII by a MAC merge layer. To support frame preemption, the definition of the SFD was generalized to SMD (Start-of-mPacket-Delimiter), where an mPacket is essentially an Ethernet frame fragment, or a complete frame. Stations unaware of an SMD value different from the standard SFD will treat P frames as error frames. To prevent that from happening, a negotiation process is defined. On RX, packets are dispatched to the eMAC or pMAC after being filtered by their SMD. On TX, the eMAC/pMAC classification decision is taken by the 802.1Q spec, based on packet priority (each of the 8 user priority values may have an admin-status of preemptible or express). The MAC Merge layer and the Frame Preemption parameters have some degree of independence in terms of how software stacks are supposed to deal with them. The activation of the MM layer is supposed to be controlled by an LLDP daemon (after it has been communicated that the link partner also supports it), after which a (hardware-based or not) verification handshake takes place, before actually enabling the feature. So the process is intended to be relatively plug-and-play. Whereas FP settings are supposed to be coordinated across a network using something approximating NETCONF. The support contained here is exclusively for the 802.3 (MAC Merge) portions and not for the 802.1Q (Frame Preemption) parts. This API is sufficient for an LLDP daemon to do its job. The FP adminStatus variable from 802.1Q is outside the scope of an LLDP daemon. I have taken a few creative licenses and augmented the Linux kernel UAPI compared to the standard managed objects recommended by IEEE 802.3. These are: - ETHTOOL_A_MM_PMAC_ENABLED: According to Figure 99-6: Receive Processing state diagram, a MAC Merge layer is always supposed to be able to receive P frames. However, this implies keeping the pMAC powered on, which will consume needless power in applications where FP will never be used. If LLDP is used, the reception of an Additional Ethernet Capabilities TLV from the link partner is sufficient indication that the pMAC should be enabled. So my proposal is that in Linux, we keep the pMAC turned off by default and that user space turns it on when needed. - ETHTOOL_A_MM_VERIFY_ENABLED: The IEEE managed object is called aMACMergeVerifyDisableTx. I opted for consistency (positive logic) in the boolean netlink attributes offered, so this is also positive here. Other than the meaning being reversed, they correspond to the same thing. - ETHTOOL_A_MM_MAX_VERIFY_TIME: I found it most reasonable for a LLDP daemon to maximize the verifyTime variable (delay between SMD-V transmissions), to maximize its chances that the LP replies. IEEE says that the verifyTime can range between 1 and 128 ms, but the NXP ENETC stupidly keeps this variable in a 7 bit register, so the maximum supported value is 127 ms. I could have chosen to hardcode this in the LLDP daemon to a lower value, but why not let the kernel expose its supported range directly. - ETHTOOL_A_MM_TX_MIN_FRAG_SIZE: the standard managed object is called aMACMergeAddFragSize, and expresses the "additional" fragment size (on top of ETH_ZLEN), whereas this expresses the absolute value of the fragment size. - ETHTOOL_A_MM_RX_MIN_FRAG_SIZE: there doesn't appear to exist a managed object mandated by the standard, but user space clearly needs to know what is the minimum supported fragment size of our local receiver, since LLDP must advertise a value no lower than that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-21batman-adv: tvlv: prepare for tvlv enabled multicast packet typeLinus Lüssing1-0/+2
Prepare TVLV infrastructure for more packet types, in particular the upcoming batman-adv multicast packet type. For that swap the OGM vs. unicast-tvlv packet boolean indicator to an explicit unsigned integer packet type variable. And provide the skb to a call to batadv_tvlv_containers_process(), as later the multicast packet's TVLV handler will need to have access not only to the TVLV but the full skb for forwarding. Forwarding will be invoked from the multicast packet's TVLVs' contents later. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2023-01-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-2/+2
drivers/net/ipa/ipa_interrupt.c drivers/net/ipa/ipa_interrupt.h 9ec9b2a30853 ("net: ipa: disable ipa interrupt during suspend") 8e461e1f092b ("net: ipa: introduce ipa_interrupt_enable()") d50ed3558719 ("net: ipa: enable IPA interrupt handlers separate from registration") https://lore.kernel.org/all/20230119114125.5182c7ab@canb.auug.org.au/ https://lore.kernel.org/all/79e46152-8043-a512-79d9-c3b905462774@tessares.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-20net: dcb: add new rewrite tableDaniel Machon1-0/+2
Add new rewrite table and all the required functions, offload hooks and bookkeeping for maintaining it. The rewrite table reuses the app struct, and the entire set of app selectors. As such, some bookeeping code can be shared between the rewrite- and the APP table. New functions for getting, setting and deleting entries has been added. Apart from operating on the rewrite list, these functions do not emit a DCB_APP_EVENT when the list os modified. The new dcb_getrewr does a lookup based on selector and priority and returns the protocol, so that mappings from priority to protocol, for a given selector and ifindex is obtained. Also, a new nested attribute has been added, that encapsulates one or more app structs. This attribute is used to distinguish the two tables. The dcb_lock used for the APP table is reused for the rewrite table. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-18wifi: cfg80211: Use MLD address to indicate MLD STA disconnectionVeerendranath Jakkam1-1/+2
We use station's MLD address to report disconnection of MLD station. Update the documentation in multiple places to indicate this. Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Link: https://lore.kernel.org/r/20221206080226.1702646-4-quic_vjakkam@quicinc.com [update commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-18wifi: cfg80211: Support 32 bytes KCK key in GTK rekey offloadShivani Baranwal1-0/+1
Currently, maximum KCK key length supported for GTK rekey offload is 24 bytes but with some newer AKMs the KCK key length can be 32 bytes. e.g., 00-0F-AC:24 AKM suite with SAE finite cyclic group 21. Add support to allow 32 bytes KCK keys in GTK rekey offload. Signed-off-by: Shivani Baranwal <quic_shivbara@quicinc.com> Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Link: https://lore.kernel.org/r/20221206143715.1802987-3-quic_vjakkam@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-18netfilter: nf_tables: add support to destroy operationFernando Fernandez Mancera1-0/+14
Introduce NFT_MSG_DESTROY* message type. The destroy operation performs a delete operation but ignoring the ENOENT errors. This is useful for the transaction semantics, where failing to delete an object which does not exist results in aborting the transaction. This new command allows the transaction to proceed in case the object does not exist. Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-01-17sched: add new attr TCA_EXT_WARN_MSG to report tc extact messageHangbin Liu1-0/+1
We will report extack message if there is an error via netlink_ack(). But if the rule is not to be exclusively executed by the hardware, extack is not passed along and offloading failures don't get logged. In commit 81c7288b170a ("sched: cls: enable verbose logging") Marcelo made cls could log verbose info for offloading failures, which helps improving Open vSwitch debuggability when using flower offloading. It would also be helpful if userspace monitor tools, like "tc monitor", could log this kind of message, as it doesn't require vswitchd log level adjusment. Let's add a new tc attributes to report the extack message so the monitor program could receive the failures. e.g. # tc monitor added chain dev enp3s0f1np1 parent ffff: chain 0 added filter dev enp3s0f1np1 ingress protocol all pref 49152 flower chain 0 handle 0x1 ct_state +trk+new not_in_hw action order 1: gact action drop random type none pass val 0 index 1 ref 1 bind 1 Warning: mlx5_core: matching on ct_state +new isn't supported. In this patch I only report the extack message on add/del operations. It doesn't look like we need to report the extack message on get/dump operations. Note this message not only reporte to multicast groups, it could also be reported unicast, which may affect the current usersapce tool's behaivor. Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20230113034353.2766735-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-15bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()Ziyang Xuan1-0/+7
Add ipip6 and ip6ip decap support for bpf_skb_adjust_room(). Main use case is for using cls_bpf on ingress hook to decapsulate IPv4 over IPv6 and IPv6 over IPv4 tunnel packets. Add two new flags BPF_F_ADJ_ROOM_DECAP_L3_IPV{4,6} to indicate the new IP header version after decapsulating the outer IP header. Suggested-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/b268ec7f0ff9431f4f43b1b40ab856ebb28cb4e1.1673574419.git.william.xuanziyang@huawei.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-13Merge tag 'arm64-fixes' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "Here's a sizeable batch of Friday the 13th arm64 fixes for -rc4. What could possibly go wrong? The obvious reason we have so much here is because of the holiday season right after the merge window, but we've also brought back an erratum workaround that was previously dropped at the last minute and there's an MTE coredumping fix that strays outside of the arch/arm64 directory. Summary: - Fix PAGE_TABLE_CHECK failures on hugepage splitting path - Fix PSCI encoding of MEM_PROTECT_RANGE function in UAPI header - Fix NULL deref when accessing debugfs node if PSCI is not present - Fix MTE core dumping when VMA list is being updated concurrently - Fix SME signal frame handling when SVE is not implemented by the CPU - Fix asm constraints for cmpxchg_double() to hazard both words - Fix build failure with stack tracer and older versions of Clang - Bring back workaround for Cortex-A715 erratum 2645198" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Fix build with CC=clang, CONFIG_FTRACE=y and CONFIG_STACK_TRACER=y arm64/mm: Define dummy pud_user_exec() when using 2-level page-table arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption firmware/psci: Don't register with debugfs if PSCI isn't available firmware/psci: Fix MEM_PROTECT_RANGE function numbers arm64/signal: Always allocate SVE signal frames on SME only systems arm64/signal: Always accept SVE signal frames on SME only systems arm64/sme: Fix context switch for SME only systems arm64: cmpxchg_double*: hazard against entire exchange variable arm64/uprobes: change the uprobe_opcode_t typedef to fix the sparse warning arm64: mte: Avoid the racy walk of the vma list during core dump elfcore: Add a cprm parameter to elf_core_extra_{phdrs,data_size} arm64: mte: Fix double-freeing of the temporary tag storage during coredump arm64: ptrace: Use ARM64_SME to guard the SME register enumerations arm64/mm: add pud_user_exec() check in pud_user_accessible_page() arm64/mm: fix incorrect file_map_count for invalid pmd
2023-01-13ethtool: add tx aggregation parametersDaniele Palmas1-0/+3
Add the following ethtool tx aggregation parameters: ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES Maximum size in bytes of a tx aggregated block of frames. ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES Maximum number of frames that can be aggregated into a block. ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS Time in usecs after the first packet arrival in an aggregated block for the block to be sent. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+112
drivers/net/usb/r8152.c be53771c87f4 ("r8152: add vendor/device ID pair for Microsoft Devkit") ec51fbd1b8a2 ("r8152: add USB device driver for config selection") https://lore.kernel.org/all/20230113113339.658c4723@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-11drivers/net/phy: add the link modes for the 10BASE-T1S Ethernet PHYPiergiorgio Beruto1-0/+3
This patch adds the link modes for the IEEE 802.3cg Clause 147 10BASE-T1S Ethernet PHY. According to the specifications, the 10BASE-T1S supports Point-To-Point Full-Duplex, Point-To-Point Half-Duplex and/or Point-To-Multipoint (AKA Multi-Drop) Half-Duplex operations. Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-11net/ethtool: add netlink interface for the PLCA RSPiergiorgio Beruto1-0/+25
Add support for configuring the PLCA Reconciliation Sublayer on multi-drop PHYs that support IEEE802.3cg-2019 Clause 148 (e.g., 10BASE-T1S). This patch adds the appropriate netlink interface to ethtool. Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-07ethtool: Replace 0-length array with flexible arrayKees Cook1-1/+1
Zero-length arrays are deprecated[1]. Replace struct ethtool_rxnfc's "rule_locs" 0-length array with a flexible array. Detected with GCC 13, using -fstrict-flex-arrays=3: net/ethtool/common.c: In function 'ethtool_get_max_rxnfc_channel': net/ethtool/common.c:558:55: warning: array subscript i is outside array bounds of '__u32[0]' {aka 'unsigned int[]'} [-Warray-bounds=] 558 | .fs.location = info->rule_locs[i], | ~~~~~~~~~~~~~~~^~~ In file included from include/linux/ethtool.h:19, from include/uapi/linux/ethtool_netlink.h:12, from include/linux/ethtool_netlink.h:6, from net/ethtool/common.c:3: include/uapi/linux/ethtool.h:1186:41: note: while referencing 'rule_locs' 1186 | __u32 rule_locs[0]; | ^~~~~~~~~ [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays Cc: Andrew Lunn <andrew@lunn.ch> Cc: kernel test robot <lkp@intel.com> Cc: Oleksij Rempel <linux@rempel-privat.de> Cc: Sean Anderson <sean.anderson@seco.com> Cc: Alexandru Tachici <alexandru.tachici@analog.com> Cc: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230106042844.give.885-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-07net: ipv6: rpl_iptunnel: Replace 0-length arrays with flexible arraysKees Cook1-2/+2
Zero-length arrays are deprecated[1]. Replace struct ipv6_rpl_sr_hdr's "segments" union of 0-length arrays with flexible arrays. Detected with GCC 13, using -fstrict-flex-arrays=3: In function 'rpl_validate_srh', inlined from 'rpl_build_state' at ../net/ipv6/rpl_iptunnel.c:96:7: ../net/ipv6/rpl_iptunnel.c:60:28: warning: array subscript <unknown> is outside array bounds of 'struct in6_addr[0]' [-Warray-bounds=] 60 | if (ipv6_addr_type(&srh->rpl_segaddr[srh->segments_left - 1]) & | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from ../include/net/rpl.h:12, from ../net/ipv6/rpl_iptunnel.c:13: ../include/uapi/linux/rpl.h: In function 'rpl_build_state': ../include/uapi/linux/rpl.h:40:33: note: while referencing 'addr' 40 | struct in6_addr addr[0]; | ^~~~ [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230105221533.never.711-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-07ipv6: ioam: Replace 0-length array with flexible arrayKees Cook1-1/+1
Zero-length arrays are deprecated[1]. Replace struct ioam6_trace_hdr's "data" 0-length array with a flexible array. Detected with GCC 13, using -fstrict-flex-arrays=3: net/ipv6/ioam6_iptunnel.c: In function 'ioam6_build_state': net/ipv6/ioam6_iptunnel.c:194:37: warning: array subscript <unknown> is outside array bounds of '__u8[0]' {aka 'unsigned char[]'} [-Warray-bounds=] 194 | tuninfo->traceh.data[trace->remlen * 4] = IPV6_TLV_PADN; | ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~ In file included from include/linux/ioam6.h:11, from net/ipv6/ioam6_iptunnel.c:13: include/uapi/linux/ioam6.h:130:17: note: while referencing 'data' 130 | __u8 data[0]; | ^~~~ [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Justin Iurman <justin.iurman@uliege.be> Tested-by: Justin Iurman <justin.iurman@uliege.be> Link: https://lore.kernel.org/r/20230105222115.never.661-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-07Merge tag 'block-2023-01-06' of git://git.kernel.dk/linuxLinus Torvalds1-0/+112
Pull block fixes from Jens Axboe: "The big change here is obviously the revert of the pktcdvd driver removal. Outside of that, just minor tweaks. In detail: - Re-instate the pktcdvd driver, which necessitates adding back bio_copy_data_iter() and the fops->devnode() hook for now (me) - Fix for splitting of a bio marked as NOWAIT, causing either nowait reads or writes to error with EAGAIN even if parts of the IO completed (me) - Fix for ublk, punting management commands to io-wq as they can all easily block for extended periods of time (Ming) - Removal of SRCU dependency for the block layer (Paul)" * tag 'block-2023-01-06' of git://git.kernel.dk/linux: block: Remove "select SRCU" Revert "pktcdvd: remove driver." Revert "block: remove devnode callback from struct block_device_operations" Revert "block: bio_copy_data_iter" ublk: honor IO_URING_F_NONBLOCK for handling control command block: don't allow splitting of a REQ_NOWAIT bio block: handle bio_split_to_limits() NULL return
2023-01-06firmware/psci: Fix MEM_PROTECT_RANGE function numbersWill Deacon1-2/+2
PSCI v1.1 offers 32-bit and 64-bit variants of the MEM_PROTECT_RANGE call using function identifier 20. Fix the incorrect definitions of the MEM_PROTECT_CHECK_RANGE calls in the PSCI UAPI header. Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Fixes: 3137f2e60098 ("firmware/psci: Add debugfs support to ease debugging") Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20221125101826.22404-1-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-01-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski5-4/+30
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05Merge tag 'net-6.2-rc3' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf, wifi, and netfilter. Current release - regressions: - bpf: fix nullness propagation for reg to reg comparisons, avoid null-deref - inet: control sockets should not use current thread task_frag - bpf: always use maximal size for copy_array() - eth: bnxt_en: don't link netdev to a devlink port for VFs Current release - new code bugs: - rxrpc: fix a couple of potential use-after-frees - netfilter: conntrack: fix IPv6 exthdr error check - wifi: iwlwifi: fw: skip PPAG for JF, avoid FW crashes - eth: dsa: qca8k: various fixes for the in-band register access - eth: nfp: fix schedule in atomic context when sync mc address - eth: renesas: rswitch: fix getting mac address from device tree - mobile: ipa: use proper endpoint mask for suspend Previous releases - regressions: - tcp: add TIME_WAIT sockets in bhash2, fix regression caught by Jiri / python tests - net: tc: don't intepret cls results when asked to drop, fix oob-access - vrf: determine the dst using the original ifindex for multicast - eth: bnxt_en: - fix XDP RX path if BPF adjusted packet length - fix HDS (header placement) and jumbo thresholds for RX packets - eth: ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf, avoid memory corruptions Previous releases - always broken: - ulp: prevent ULP without clone op from entering the LISTEN status - veth: fix race with AF_XDP exposing old or uninitialized descriptors - bpf: - pull before calling skb_postpull_rcsum() (fix checksum support and avoid a WARN()) - fix panic due to wrong pageattr of im->image (when livepatch and kretfunc coexist) - keep a reference to the mm, in case the task is dead - mptcp: fix deadlock in fastopen error path - netfilter: - nf_tables: perform type checking for existing sets - nf_tables: honor set timeout and garbage collection updates - ipset: fix hash:net,port,net hang with /0 subnet - ipset: avoid hung task warning when adding/deleting entries - selftests: net: - fix cmsg_so_mark.sh test hang on non-x86 systems - fix the arp_ndisc_evict_nocarrier test for IPv6 - usb: rndis_host: secure rndis_query check against int overflow - eth: r8169: fix dmar pte write access during suspend/resume with WOL - eth: lan966x: fix configuration of the PCS - eth: sparx5: fix reading of the MAC address - eth: qed: allow sleep in qed_mcp_trace_dump() - eth: hns3: - fix interrupts re-initialization after VF FLR - fix handling of promisc when MAC addr table gets full - refine the handling for VF heartbeat - eth: mlx5: - properly handle ingress QinQ-tagged packets on VST - fix io_eq_size and event_eq_size params validation on big endian - fix RoCE setting at HCA level if not supported at all - don't turn CQE compression on by default for IPoIB - eth: ena: - fix toeplitz initial hash key value - account for the number of XDP-processed bytes in interface stats - fix rx_copybreak value update Misc: - ethtool: harden phy stat handling against buggy drivers - docs: netdev: convert maintainer's doc from FAQ to a normal document" * tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits) caif: fix memory leak in cfctrl_linkup_request() inet: control sockets should not use current thread task_frag net/ulp: prevent ULP without clone op from entering the LISTEN status qed: allow sleep in qed_mcp_trace_dump() MAINTAINERS: Update maintainers for ptp_vmw driver usb: rndis_host: Secure rndis_query check against int overflow net: dpaa: Fix dtsec check for PCS availability octeontx2-pf: Fix lmtst ID used in aura free drivers/net/bonding/bond_3ad: return when there's no aggregator netfilter: ipset: Rework long task execution when adding/deleting entries netfilter: ipset: fix hash:net,port,net hang with /0 subnet net: sparx5: Fix reading of the MAC address vxlan: Fix memory leaks in error path net: sched: htb: fix htb_classify() kernel-doc net: sched: cbq: dont intepret cls results when asked to drop net: sched: atm: dont intepret cls results when asked to drop dt-bindings: net: marvell,orion-mdio: Fix examples dt-bindings: net: sun8i-emac: Add phy-supply property net: ipa: use proper endpoint mask for suspend selftests: net: return non-zero for failures reported in arp_ndisc_evict_nocarrier ...
2023-01-05Merge tag 'for-netdev' of ↵Jakub Kicinski1-0/+4
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== bpf-next 2023-01-04 We've added 45 non-merge commits during the last 21 day(s) which contain a total of 50 files changed, 1454 insertions(+), 375 deletions(-). The main changes are: 1) Fixes, improvements and refactoring of parts of BPF verifier's state equivalence checks, from Andrii Nakryiko. 2) Fix a few corner cases in libbpf's BTF-to-C converter in particular around padding handling and enums, also from Andrii Nakryiko. 3) Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata, from Christian Ehrig. 4) Improve x86 JIT's codegen for PROBE_MEM runtime error checks, from Dave Marchevsky. 5) Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers, from Jiri Olsa. 6) Add proper documentation for BPF_MAP_TYPE_SOCK{MAP,HASH} maps, from Maryam Tahhan. 7) Improvements in libbpf's btf_parse_elf error handling, from Changbin Du. 8) Bigger batch of improvements to BPF tracing code samples, from Daniel T. Lee. 9) Add LoongArch support to libbpf's bpf_tracing helper header, from Hengqi Chen. 10) Fix a libbpf compiler warning in perf_event_open_probe on arm32, from Khem Raj. 11) Optimize bpf_local_storage_elem by removing 56 bytes of padding, from Martin KaFai Lau. 12) Use pkg-config to locate libelf for resolve_btfids build, from Shen Jiamin. 13) Various libbpf improvements around API documentation and errno handling, from Xin Liu. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits) libbpf: Return -ENODATA for missing btf section libbpf: Add LoongArch support to bpf_tracing.h libbpf: Restore errno after pr_warn. libbpf: Added the description of some API functions libbpf: Fix invalid return address register in s390 samples/bpf: Use BPF_KSYSCALL macro in syscall tracing programs samples/bpf: Fix tracex2 by using BPF_KSYSCALL macro samples/bpf: Change _kern suffix to .bpf with syscall tracing program samples/bpf: Use vmlinux.h instead of implicit headers in syscall tracing program samples/bpf: Use kyscall instead of kprobe in syscall tracing program bpf: rename list_head -> graph_root in field info types libbpf: fix errno is overwritten after being closed. bpf: fix regs_exact() logic in regsafe() to remap IDs correctly bpf: perform byte-by-byte comparison only when necessary in regsafe() bpf: reject non-exact register type matches in regsafe() bpf: generalize MAYBE_NULL vs non-MAYBE_NULL rule bpf: reorganize struct bpf_reg_state fields bpf: teach refsafe() to take into account ID remapping bpf: Remove unused field initialization in bpf's ctl_table selftests/bpf: Add jit probe_mem corner case tests to s390x denylist ... ==================== Link: https://lore.kernel.org/r/20230105000926.31350-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-3/+1
Pull virtio updates from Michael Tsirkin: "Mostly fixes all over the place, a couple of cleanups" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (32 commits) virtio_blk: Fix signedness bug in virtblk_prep_rq() vdpa_sim_net: should not drop the multicast/broadcast packet vdpasim: fix memory leak when freeing IOTLBs vdpa: conditionally fill max max queue pair for stats vdpa/vp_vdpa: fix kfree a wrong pointer in vp_vdpa_remove vduse: Validate vq_num in vduse_validate_config() tools/virtio: remove smp_read_barrier_depends() tools/virtio: remove stray characters vhost_vdpa: fix the crash in unmap a large memory virtio: Implementing attribute show with sysfs_emit virtio-crypto: fix memory leak in virtio_crypto_alg_skcipher_close_session() tools/virtio: Variable type completion vdpa_sim: fix vringh initialization in vdpasim_queue_ready() virtio_blk: use UINT_MAX instead of -1U vhost-vdpa: fix an iotlb memory leak vhost: fix range used in translate_desc() vringh: fix range used in iotlb_translate() vhost/vsock: Fix error handling in vhost_vsock_init() vdpa_sim: fix possible memory leak in vdpasim_net_init() and vdpasim_blk_init() tools: Delete the unneeded semicolon after curly braces ...
2023-01-05Revert "pktcdvd: remove driver."Jens Axboe1-0/+112
This reverts commit f40eb99897af665f11858dd7b56edcb62c3f3c67. There are apparently still users out there of this driver. While we'd love to remove it to ease the maintenance burden, let's reinstate it for now until better (userspace) solutions can be developed. Link: https://lore.kernel.org/lkml/20230104190115.ceglfefco475ev6c@pali/ Reported-by: Pali Rohár <pali@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-30Merge tag 'io_uring-6.2-2022-12-29' of git://git.kernel.dk/linuxLinus Torvalds1-0/+8
Pull io_uring fixes from Jens Axboe: - Two fixes for mutex grabbing when the task state is != TASK_RUNNING (me) - Check for invalid opcode in io_uring_register() a bit earlier, to avoid going through the quiesce machinery just to return -EINVAL later in the process (me) - Fix for the uapi io_uring header, skipping including time_types.h when necessary (Stefan) * tag 'io_uring-6.2-2022-12-29' of git://git.kernel.dk/linux: uapi:io_uring.h: allow linux/time_types.h to be skipped io_uring: check for valid register opcode earlier io_uring/cancel: re-grab ctx mutex after finishing wait io_uring: finish waiting before flushing overflow entries
2022-12-28Merge branch 'kvm-late-6.1-fixes' into HEADPaolo Bonzini1-0/+3
x86: * several fixes to nested VMX execution controls * fixes and clarification to the documentation for Xen emulation * do not unnecessarily release a pmu event with zero period * MMU fixes * fix Coverity warning in kvm_hv_flush_tlb() selftests: * fixes for the ucall mechanism in selftests * other fixes mostly related to compilation with clang
2022-12-28vdpa: merge functionally duplicated dev_features attributesSi-Wei Liu1-3/+1
We can merge VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES with VDPA_ATTR_DEV_FEATURES which is functionally equivalent. While at it, tweak the comment in header file to make user provioned device features distinguished from those supported by the parent mgmtdev device: the former of which can be inherited as a whole from the latter, or can be a subset of the latter if explicitly specified. Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Message-Id: <1665422823-18364-1-git-send-email-si-wei.liu@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-12-27uapi:io_uring.h: allow linux/time_types.h to be skippedStefan Metzmacher1-0/+8
include/uapi/linux/io_uring.h is synced 1:1 into liburing:src/include/liburing/io_uring.h. liburing has a configure check to detect the need for linux/time_types.h. It can opt-out by defining UAPI_LINUX_IO_URING_H_SKIP_LINUX_TIME_TYPES_H Fixes: 78a861b94959 ("io_uring: add sync cancelation API through io_uring_register()") Link: https://github.com/axboe/liburing/issues/708 Link: https://github.com/axboe/liburing/pull/709 Link: https://lore.kernel.org/io-uring/20221115212614.1308132-1-ammar.faizi@intel.com/T/#m9f5dd571cd4f6a5dee84452dbbca3b92ba7a4091 CC: Jens Axboe <axboe@kernel.dk> Cc: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Stefan Metzmacher <metze@samba.org> Reviewed-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Link: https://lore.kernel.org/r/7071a0a1d751221538b20b63f9160094fc7e06f4.1668630247.git.metze@samba.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-27KVM: x86/xen: Add KVM_XEN_INVALID_GPA and KVM_XEN_INVALID_GFN to uapiDavid Woodhouse1-0/+3
These are (uint64_t)-1 magic values are a userspace ABI, allowing the shared info pages and other enlightenments to be disabled. This isn't a Xen ABI because Xen doesn't let the guest turn these off except with the full SHUTDOWN_soft_reset mechanism. Under KVM, the userspace VMM is expected to handle soft reset, and tear down the kernel parts of the enlightenments accordingly. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20221226120320.1125390-5-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>