summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2023-02-21net/sched: cls_api: Support hardware miss to tc actionPaul Blakey1-1/+5
For drivers to support partial offload of a filter's action list, add support for action miss to specify an action instance to continue from in sw. CT action in particular can't be fully offloaded, as new connections need to be handled in software. This imposes other limitations on the actions that can be offloaded together with the CT action, such as packet modifications. Assign each action on a filter's action list a unique miss_cookie which drivers can then use to fill action_miss part of the tc skb extension. On getting back this miss_cookie, find the action instance with relevant cookie and continue classifying from there. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-21Merge tag 'ieee802154-for-net-next-2023-02-20' of ↵Jakub Kicinski3-41/+7
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next Stefan Schmidt says: ==================== pull-request: ieee802154-next 2023-02-20 Miquel Raynal build upon his earlier work and introduced two new features into the ieee802154 stack. Beaconing to announce existing PAN's and passive scanning to discover the beacons and associated PAN's. The matching changes to the userspace configuration tool have been posted as well and will be released together with the kernel release. Arnd Bergmann and Dmitry Torokhov worked on converting the at86rf230 and cc2520 drivers away from the unused platform_data usage and towards the new gpiod API. (I had to add a revert as Dmitry found a regression on an already pushed tree on my side). Changes since v1 (pull request 2023-02-02) - Netlink API extack and NLA_POLICY* usage as suggested by Jakub - Removed always true condition found by kernel test robot - Simplify device removal with running background job for scanning - Fix problems with beacon sending in some cases by using the MLME tx path * tag 'ieee802154-for-net-next-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next: ieee802154: Drop device trackers mac802154: Fix an always true condition mac802154: Send beacons using the MLME Tx path ieee802154: Change error code on monitor scan netlink request ieee802154: Convert scan error messages to extack ieee802154: Use netlink policies when relevant on scan parameters ieee802154: at86rf230: switch to using gpiod API ieee802154: at86rf230: drop support for platform data Revert "at86rf230: convert to gpio descriptors" cc2520: move to gpio descriptors mac802154: Avoid superfluous endianness handling at86rf230: convert to gpio descriptors mac802154: Handle basic beaconing ieee802154: Add support for user beaconing requests mac802154: Handle passive scanning mac802154: Add MLME Tx locked helpers mac802154: Prepare forcing specific symbol duration ieee802154: Introduce a helper to validate a channel ieee802154: Define a beacon frame header ieee802154: Add support for user scanning requests ==================== Link: https://lore.kernel.org/r/20230220213749.386451-1-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-21net/mlx4_en: Introduce flexible array to silence overflow warningKees Cook1-0/+1
The call "skb_copy_from_linear_data(skb, inl + 1, spc)" triggers a FORTIFY memcpy() warning on ppc64 platform: In function ‘fortify_memcpy_chk’, inlined from ‘skb_copy_from_linear_data’ at ./include/linux/skbuff.h:4029:2, inlined from ‘build_inline_wqe’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:722:4, inlined from ‘mlx4_en_xmit’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:1066:3: ./include/linux/fortify-string.h:513:25: error: call to ‘__write_overflow_field’ declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning] 513 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Same behaviour on x86 you can get if you use "__always_inline" instead of "inline" for skb_copy_from_linear_data() in skbuff.h The call here copies data into inlined tx destricptor, which has 104 bytes (MAX_INLINE) space for data payload. In this case "spc" is known in compile-time but the destination is used with hidden knowledge (real structure of destination is different from that the compiler can see). That cause the fortify warning because compiler can check bounds, but the real bounds are different. "spc" can't be bigger than 64 bytes (MLX4_INLINE_ALIGN), so the data can always fit into inlined tx descriptor. The fact that "inl" points into inlined tx descriptor is determined earlier in mlx4_en_xmit(). Avoid confusing the compiler with "inl + 1" constructions to get to past the inl header by introducing a flexible array "data" to the struct so that the compiler can see that we are not dealing with an array of inl structs, but rather, arbitrary data following the structure. There are no changes to the structure layout reported by pahole, and the resulting machine code is actually smaller. Reported-by: Josef Oskera <joskera@redhat.com> Link: https://lore.kernel.org/lkml/20230217094541.2362873-1-joskera@redhat.com Fixes: f68f2ff91512 ("fortify: Detect struct member overflows in memcpy() at compile-time") Cc: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20230218183842.never.954-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-21Merge tag 'for-netdev' of ↵Jakub Kicinski3-20/+76
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-02-17 We've added 64 non-merge commits during the last 7 day(s) which contain a total of 158 files changed, 4190 insertions(+), 988 deletions(-). The main changes are: 1) Add a rbtree data structure following the "next-gen data structure" precedent set by recently-added linked-list, that is, by using kfunc + kptr instead of adding a new BPF map type, from Dave Marchevsky. 2) Add a new benchmark for hashmap lookups to BPF selftests, from Anton Protopopov. 3) Fix bpf_fib_lookup to only return valid neighbors and add an option to skip the neigh table lookup, from Martin KaFai Lau. 4) Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accouting for container environments, from Yafang Shao. 5) Batch of ice multi-buffer and driver performance fixes, from Alexander Lobakin. 6) Fix a bug in determining whether global subprog's argument is PTR_TO_CTX, which is based on type names which breaks kprobe progs, from Andrii Nakryiko. 7) Prep work for future -mcpu=v4 LLVM option which includes usage of BPF_ST insn. Thus improve BPF_ST-related value tracking in verifier, from Eduard Zingerman. 8) More prep work for later building selftests with Memory Sanitizer in order to detect usages of undefined memory, from Ilya Leoshkevich. 9) Fix xsk sockets to check IFF_UP earlier to avoid a NULL pointer dereference via sendmsg(), from Maciej Fijalkowski. 10) Implement BPF trampoline for RV64 JIT compiler, from Pu Lehui. 11) Fix BPF memory allocator in combination with BPF hashtab where it could corrupt special fields e.g. used in bpf_spin_lock, from Hou Tao. 12) Fix LoongArch BPF JIT to always use 4 instructions for function address so that instruction sequences don't change between passes, from Hengqi Chen. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits) selftests/bpf: Add bpf_fib_lookup test bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup riscv, bpf: Add bpf trampoline support for RV64 riscv, bpf: Add bpf_arch_text_poke support for RV64 riscv, bpf: Factor out emit_call for kernel and bpf context riscv: Extend patch_text for multiple instructions Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES" selftests/bpf: Add global subprog context passing tests selftests/bpf: Convert test_global_funcs test to test_loader framework bpf: Fix global subprog context argument resolution logic LoongArch, bpf: Use 4 instructions for function address in JIT bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state bpf: Disable bh in bpf_test_run for xdp and tc prog xsk: check IFF_UP earlier in Tx path Fix typos in selftest/bpf files selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd() samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd() bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd() libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd() libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd() ... ==================== Link: https://lore.kernel.org/r/20230217221737.31122-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20net: make default_rps_mask a per netns attributePaolo Abeni1-1/+0
That really was meant to be a per netns attribute from the beginning. The idea is that once proper isolation is in place in the main namespace, additional demux in the child namespaces will be redundant. Let's make child netns default rps mask empty by default. To avoid bloating the netns with a possibly large cpumask, allocate it on-demand during the first write operation. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextDavid S. Miller1-0/+3
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter updates for net-next: 1) Add safeguard to check for NULL tupe in objects updates via NFT_MSG_NEWOBJ, this should not ever happen. From Alok Tiwari. 2) Incorrect pointer check in the new destroy rule command, from Yang Yingliang. 3) Incorrect status bitcheck in nf_conntrack_udp_packet(), from Florian Westphal. 4) Simplify seq_print_acct(), from Ilia Gavrilov. 5) Use 2-arg optimal variant of kfree_rcu() in IPVS, from Julian Anastasov. 6) TCP connection enters CLOSE state in conntrack for locally originated TCP reset packet from the reject target, from Florian Westphal. The fixes #2 and #3 in this series address issues from the previous pull nf-next request in this net-next cycle. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-17netfilter: let reset rules clean out conntrack entriesFlorian Westphal1-0/+3
iptables/nftables support responding to tcp packets with tcp resets. The generated tcp reset packet passes through both output and postrouting netfilter hooks, but conntrack will never see them because the generated skb has its ->nfct pointer copied over from the packet that triggered the reset rule. If the reset rule is used for established connections, this may result in the conntrack entry to be around for a very long time (default timeout is 5 days). One way to avoid this would be to not copy the nf_conn pointer so that the rest packet passes through conntrack too. Problem is that output rules might not have the same conntrack zone setup as the prerouting ones, so its possible that the reset skb won't find the correct entry. Generating a template entry for the skb seems error prone as well. Add an explicit "closing" function that switches a confirmed conntrack entry to closed state and wire this up for tcp. If the entry isn't confirmed, no action is needed because the conntrack entry will never be committed to the table. Reported-by: Russel King <linux@armlinux.org.uk> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-02-17Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/netDavid S. Miller5-7/+14
Some of the devlink bits were tricky, but I think I got it right. Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-17Merge tag 'drm-fixes-2023-02-17' of git://anongit.freedesktop.org/drm/drmLinus Torvalds1-0/+1
Pull drm fixes from Dave Airlie: "Just a final collection of misc fixes, the biggest disables the recently added dynamic debugging support, it has a regression that needs some bigger fixes. Otherwise a bunch of fixes across the board, vc4, amdgpu and vmwgfx mostly, with some smaller i915 and ast fixes. drm: - dynamic debug disable for now fbdev: - deferred i/o device close fix amdgpu: - Fix GC11.x suspend warning - Fix display warning vc4: - YUV planes fix - hdmi display fix - crtc reduced blanking fix ast: - fix start address computation vmwgfx: - fix bo/handle races i915: - gen11 WA fix" * tag 'drm-fixes-2023-02-17' of git://anongit.freedesktop.org/drm/drm: drm/amd/display: Fail atomic_check early on normalize_zpos error drm/amd/amdgpu: fix warning during suspend drm/vmwgfx: Do not drop the reference to the handle too soon drm/vmwgfx: Stop accessing buffer objects which failed init drm/i915/gen11: Wa_1408615072/Wa_1407596294 should be on GT list drm: Disable dynamic debug as broken drm/ast: Fix start address computation fbdev: Fix invalid page access after closing deferred I/O devices drm/vc4: crtc: Increase setup cost in core clock calculation to handle extreme reduced blanking drm/vc4: hdmi: Always enable GCP with AVMUTE cleared drm/vc4: Fix YUV plane handling when planes are in different buffers
2023-02-17Merge tag 'drm-misc-fixes-2023-02-16' of ↵Dave Airlie1-0/+1
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Multiple fixes in vc4 to address issues with YUV planes, HDMI and CRTC; an invalid page access fix for fbdev, mark dynamic debug as broken, a double free and refcounting fix for vmwgfx. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20230216091905.i5wswy4dd74x4br5@houat
2023-02-16Merge tag 'net-6.2-final' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Fixes from the main networking tree only, probably because all sub-trees have backed off and haven't submitted their changes. None of the fixes here are particularly scary and no outstanding regressions. In an ideal world the "current release" sections would be empty at this stage but that never happens. Current release - regressions: - fix unwanted sign extension in netdev_stats_to_stats64() Current release - new code bugs: - initialize net->notrefcnt_tracker earlier - devlink: fix netdev notifier chain corruption - nfp: make sure mbox accesses in IPsec code are atomic - ice: fix check for weight and priority of a scheduling node Previous releases - regressions: - ice: xsk: fix cleaning of XDP_TX frame, prevent inf loop - igb: fix I2C bit banging config with external thermal sensor Previous releases - always broken: - sched: tcindex: update imperfect hash filters respecting rcu - mpls: fix stale pointer if allocation fails during device rename - dccp/tcp: avoid negative sk_forward_alloc by ipv6_pinfo.pktoptions - remove WARN_ON_ONCE(sk->sk_forward_alloc) from sk_stream_kill_queues() - af_key: fix heap information leak - ipv6: fix socket connection with DSCP (correct interpretation of the tclass field vs fib rule matching) - tipc: fix kernel warning when sending SYN message - vmxnet3: read RSS information from the correct descriptor (eop)" * tag 'net-6.2-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits) devlink: Fix netdev notifier chain corruption igb: conditionalize I2C bit banging on external thermal sensor support net: mpls: fix stale pointer if allocation fails during device rename net/sched: tcindex: search key must be 16 bits tipc: fix kernel warning when sending SYN message igb: Fix PPS input and output using 3rd and 4th SDP net: use a bounce buffer for copying skb->mark ixgbe: add double of VLAN header when computing the max MTU i40e: add double of VLAN header when computing the max MTU ixgbe: allow to increase MTU to 3K with XDP enabled net: stmmac: Restrict warning on disabling DMA store and fwd mode net/sched: act_ctinfo: use percpu stats net: stmmac: fix order of dwmac5 FlexPPS parametrization sequence ice: fix lost multicast packets in promisc mode ice: Fix check for weight and priority of a scheduling node bnxt_en: Fix mqprio and XDP ring checking logic net: Fix unwanted sign extension in netdev_stats_to_stats64() net/usb: kalmia: Don't pass act_len in usb_bulk_msg error path net: openvswitch: fix possible memory leak in ovs_meter_cmd_set() af_key: Fix heap information leak ...
2023-02-16Merge branch 'mlx5-next' of ↵Jakub Kicinski2-2/+13
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Leon Romanovsky says: ==================== mlx5-next changes Following previous conversations [1] and our clear commitment to do the TC work [2], please pull mlx5-next shared branch, which includes low-level steering logic to allow RoCEv2 traffic to be encrypted/ decrypted through IPsec. [1] https://lore.kernel.org/all/20230126230815.224239-1-saeed@kernel.org/ [2] https://lore.kernel.org/all/Y+Z7lVVWqnRBiPh2@nvidia.com/ * 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Configure IPsec steering for egress RoCEv2 traffic net/mlx5: Configure IPsec steering for ingress RoCEv2 traffic net/mlx5: Add IPSec priorities in RDMA namespaces net/mlx5: Implement new destination type TABLE_TYPE net/mlx5: Introduce new destination type TABLE_TYPE ==================== Link: https://lore.kernel.org/r/20230215095624.1365200-1-leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-16Merge tag 'wireless-next-2023-03-16' of ↵Jakub Kicinski2-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Major stack changes: * EHT channel puncturing support (client & AP) * some support for AP MLD without mac80211 * fixes for A-MSDU on mesh connections Major driver changes: iwlwifi * EHT rate reporting * Bump FW API to 74 for AX devices * STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware mt76 * switch to using page pool allocator * mt7996 EHT (Wi-Fi 7) support * Wireless Ethernet Dispatch (WED) reset support libertas * WPS enrollee support brcmfmac * Rename Cypress 89459 to BCM4355 * BCM4355 and BCM4377 support mwifiex * SD8978 chipset support rtl8xxxu * LED support ath12k * new driver for Qualcomm Wi-Fi 7 devices ath11k * IPQ5018 support * Fine Timing Measurement (FTM) responder role support * channel 177 support ath10k * store WLAN firmware version in SMEM image table * tag 'wireless-next-2023-03-16' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (207 commits) wifi: brcmfmac: p2p: Introduce generic flexible array frame member wifi: mac80211: add documentation for amsdu_mesh_control wifi: cfg80211: remove gfp parameter from cfg80211_obss_color_collision_notify description wifi: mac80211: always initialize link_sta with sta wifi: mac80211: pass 'sta' to ieee80211_rx_data_set_sta() wifi: cfg80211: Set SSID if it is not already set wifi: rtw89: move H2C of del_pkt_offload before polling FW status ready wifi: rtw89: use readable return 0 in rtw89_mac_cfg_ppdu_status() wifi: rtw88: usb: drop now unnecessary URB size check wifi: rtw88: usb: send Zero length packets if necessary wifi: rtw88: usb: Set qsel correctly wifi: mac80211: fix off-by-one link setting wifi: mac80211: Fix for Rx fragmented action frames wifi: mac80211: avoid u32_encode_bits() warning wifi: mac80211: Don't translate MLD addresses for multicast wifi: cfg80211: call reg_notifier for self managed wiphy from driver hint wifi: cfg80211: get rid of gfp in cfg80211_bss_color_notify wifi: nl80211: Allow authentication frames and set keys on NAN interface wifi: mac80211: fix non-MLO station association wifi: mac80211: Allow NSS change only up to capability ... ==================== Link: https://lore.kernel.org/r/20230216105406.208416-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-16devlink: Fix netdev notifier chain corruptionIdo Schimmel1-2/+0
Cited commit changed devlink to register its netdev notifier block on the global netdev notifier chain instead of on the per network namespace one. However, when changing the network namespace of the devlink instance, devlink still tries to unregister its notifier block from the chain of the old namespace and register it on the chain of the new namespace. This results in corruption of the notifier chains, as the same notifier block is registered on two different chains: The global one and the per network namespace one. In turn, this causes other problems such as the inability to dismantle namespaces due to netdev reference count issues. Fix by preventing devlink from moving its notifier block between namespaces. Reproducer: # echo "10 1" > /sys/bus/netdevsim/new_device # ip netns add test123 # devlink dev reload netdevsim/netdevsim10 netns test123 # ip netns del test123 [ 71.935619] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 71.938348] leaked reference. Fixes: 565b4824c39f ("devlink: change port event netdev notifier from per-net to global") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230215073139.1360108-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16wifi: brcmfmac: p2p: Introduce generic flexible array frame memberKees Cook1-0/+1
Silence run-time memcpy() false positive warning when processing management frames: memcpy: detected field-spanning write (size 27) of single field "&mgmt_frame->u" at drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c:1469 (size 26) Due to this (soon to be fixed) GCC bug[1], FORTIFY_SOURCE (via __builtin_dynamic_object_size) doesn't recognize that the union may end with a flexible array, and returns "26" (the fixed size of the union), rather than the remaining size of the allocation. Add an explicit flexible array member and set it as the destination here, so that we get the correct coverage for the memcpy(). [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832 Reported-by: Ard Biesheuvel <ardb@kernel.org> Cc: Arend van Spriel <aspriel@gmail.com> Cc: Franky Lin <franky.lin@broadcom.com> Cc: Hante Meuleman <hante.meuleman@broadcom.com> Cc: Kalle Valo <kvalo@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Brian Henriquez <brian.henriquez@cypress.com> Cc: linux-wireless@vger.kernel.org Cc: brcm80211-dev-list.pdl@broadcom.com Cc: SHA-cyfmac-dev-list@infineon.com Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230215224110.never.022-kees@kernel.org [rename 'frame' to 'body'] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-16bpf: Zeroing allocated object from slab in bpf memory allocatorHou Tao1-0/+7
Currently the freed element in bpf memory allocator may be immediately reused, for htab map the reuse will reinitialize special fields in map value (e.g., bpf_spin_lock), but lookup procedure may still access these special fields, and it may lead to hard-lockup as shown below: NMI backtrace for cpu 16 CPU: 16 PID: 2574 Comm: htab.bin Tainted: G L 6.1.0+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), RIP: 0010:queued_spin_lock_slowpath+0x283/0x2c0 ...... Call Trace: <TASK> copy_map_value_locked+0xb7/0x170 bpf_map_copy_value+0x113/0x3c0 __sys_bpf+0x1c67/0x2780 __x64_sys_bpf+0x1c/0x20 do_syscall_64+0x30/0x60 entry_SYSCALL_64_after_hwframe+0x46/0xb0 ...... </TASK> For htab map, just like the preallocated case, these is no need to initialize these special fields in map value again once these fields have been initialized. For preallocated htab map, these fields are initialized through __GFP_ZERO in bpf_map_area_alloc(), so do the similar thing for non-preallocated htab in bpf memory allocator. And there is no need to use __GFP_ZERO for per-cpu bpf memory allocator, because __alloc_percpu_gfp() does it implicitly. Fixes: 0fd7c5d43339 ("bpf: Optimize call_rcu in non-preallocated hash map.") Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20230215082132.3856544-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15net/mlx5: Add IPSec priorities in RDMA namespacesMark Zhang1-0/+2
Add IPSec flow steering priorities in RDMA namespaces. This allows adding tables/rules to forward RoCEv2 traffic to the IPSec crypto tables in NIC_TX domain, and accept RoCEv2 traffic from NIC_RX domain. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-15net/mlx5: Implement new destination type TABLE_TYPEMark Zhang1-0/+1
Implement new destination type to support flow transition between different table types. e.g. from NIC_RX to RDMA_RX or from RDMA_TX to NIC_TX. The new destination is described in the tracepoint as follows: "mlx5_fs_add_rule: rule=00000000d53cd0ed fte=0000000048a8a6ed index=0 sw_action=<> [dst] flow_table_type=7 id:262152" Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-15net/mlx5: Introduce new destination type TABLE_TYPEPatrisious Haddad1-2/+10
This new destination type supports flow transition between different table types, e.g. from NIC_RX to RDMA_RX or from RDMA_TX to NIC_TX. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-15net/mlx5e: Move dl_port to struct mlx5e_devJiri Pirko1-1/+0
No need to have dl_port which is tightly coupled with mlx5e code in mlx5 core code. Move it to struct mlx5e_dev and loose mlx5e_devlink_get_dl_port() helper. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-15net/mlx5: Lag, Add single RDMA device in multiport modeMark Bloch1-0/+1
In MultiPort E-Switch mode a single RDMA is created. This device has multiple RDMA ports that represent the uplink ports that are connected to the E-Switch. Account for this when creating the RDMA device so it has an additional port for the non native uplink. As a side effect of this patch, use shared fdb in multiport eswitch mode. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-15net/mlx5: Lag, set different uplink vport metadata in multiport eswitch modeRoi Dayan1-0/+1
In a follow-up commit multiport eswitch mode will use a shared fdb. In shared fdb there is a single eswitch fdb and traffic could come from any port. to distinguish between the ports set a different metadata per uplink port. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-14net/mlx5e: TC, support per action statsOz Shlomo1-0/+2
Extend the action stats callback implementation to update stats for actions that are associated with hw counters. Note that the callback may be called from tc action utility or from tc flower. Both apis expect the driver to return the stats difference from the last update. As such, query the raw counter value and maintain the diff from the last api call in the tc layer, instead of the fs_core layer. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-14net: add pskb_may_pull_reason() helperEric Dumazet1-4/+15
pskb_may_pull() can fail for two different reasons. Provide pskb_may_pull_reason() helper to distinguish between these reasons. It returns: SKB_NOT_DROPPED_YET : Success SKB_DROP_REASON_PKT_TOO_SMALL : packet too small SKB_DROP_REASON_NOMEM : skb->head could not be resized Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-14bpf: Add basic bpf_rb_{root,node} supportDave Marchevsky1-1/+19
This patch adds special BPF_RB_{ROOT,NODE} btf_field_types similar to BPF_LIST_{HEAD,NODE}, adds the necessary plumbing to detect the new types, and adds bpf_rb_root_free function for freeing bpf_rb_root in map_values. structs bpf_rb_root and bpf_rb_node are opaque types meant to obscure structs rb_root_cached rb_node, respectively. btf_struct_access will prevent BPF programs from touching these special fields automatically now that they're recognized. btf_check_and_fixup_fields now groups list_head and rb_root together as "graph root" fields and {list,rb}_node as "graph node", and does same ownership cycle checking as before. Note that this function does _not_ prevent ownership type mixups (e.g. rb_root owning list_node) - that's handled by btf_parse_graph_root. After this patch, a bpf program can have a struct bpf_rb_root in a map_value, but not add anything to nor do anything useful with it. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230214004017.2534011-2-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-14Merge tag 'mm-hotfixes-stable-2023-02-13-13-50' of ↵Linus Torvalds2-5/+12
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Twelve hotfixes, mostly against mm/. Five of these fixes are cc:stable" * tag 'mm-hotfixes-stable-2023-02-13-13-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: of: reserved_mem: Have kmemleak ignore dynamically allocated reserved mem scripts/gdb: fix 'lx-current' for x86 lib: parser: optimize match_NUMBER apis to use local array mm: shrinkers: fix deadlock in shrinker debugfs mm: hwpoison: support recovery from ksm_might_need_to_copy() kasan: fix Oops due to missing calls to kasan_arch_is_ready() revert "squashfs: harden sanity check in squashfs_read_xattr_id_table" fsdax: dax_unshare_iter() should return a valid length mm/gup: add folio to list when folio_isolate_lru() succeed aio: fix mremap after fork null-deref mailmap: add entry for Alexander Mikhalitsyn mm: extend max struct page size for kmsan
2023-02-14bpf: Migrate release_on_unlock logic to non-owning ref semanticsDave Marchevsky2-20/+24
This patch introduces non-owning reference semantics to the verifier, specifically linked_list API kfunc handling. release_on_unlock logic for refs is refactored - with small functional changes - to implement these semantics, and bpf_list_push_{front,back} are migrated to use them. When a list node is pushed to a list, the program still has a pointer to the node: n = bpf_obj_new(typeof(*n)); bpf_spin_lock(&l); bpf_list_push_back(&l, n); /* n still points to the just-added node */ bpf_spin_unlock(&l); What the verifier considers n to be after the push, and thus what can be done with n, are changed by this patch. Common properties both before/after this patch: * After push, n is only a valid reference to the node until end of critical section * After push, n cannot be pushed to any list * After push, the program can read the node's fields using n Before: * After push, n retains the ref_obj_id which it received on bpf_obj_new, but the associated bpf_reference_state's release_on_unlock field is set to true * release_on_unlock field and associated logic is used to implement "n is only a valid ref until end of critical section" * After push, n cannot be written to, the node must be removed from the list before writing to its fields * After push, n is marked PTR_UNTRUSTED After: * After push, n's ref is released and ref_obj_id set to 0. NON_OWN_REF type flag is added to reg's type, indicating that it's a non-owning reference. * NON_OWN_REF flag and logic is used to implement "n is only a valid ref until end of critical section" * n can be written to (except for special fields e.g. bpf_list_node, timer, ...) Summary of specific implementation changes to achieve the above: * release_on_unlock field, ref_set_release_on_unlock helper, and logic to "release on unlock" based on that field are removed * The anonymous active_lock struct used by bpf_verifier_state is pulled out into a named struct bpf_active_lock. * NON_OWN_REF type flag is introduced along with verifier logic changes to handle non-owning refs * Helpers are added to use NON_OWN_REF flag to implement non-owning ref semantics as described above * invalidate_non_owning_refs - helper to clobber all non-owning refs matching a particular bpf_active_lock identity. Replaces release_on_unlock logic in process_spin_lock. * ref_set_non_owning - set NON_OWN_REF type flag after doing some sanity checking * ref_convert_owning_non_owning - convert owning reference w/ specified ref_obj_id to non-owning references. Set NON_OWN_REF flag for each reg with that ref_obj_id and 0-out its ref_obj_id * Update linked_list selftests to account for minor semantic differences introduced by this patch * Writes to a release_on_unlock node ref are not allowed, while writes to non-owning reference pointees are. As a result the linked_list "write after push" failure tests are no longer scenarios that should fail. * The test##missing_lock##op and test##incorrect_lock##op macro-generated failure tests need to have a valid node argument in order to have the same error output as before. Otherwise verification will fail early and the expected error output won't be seen. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230212092715.1422619-2-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-13wifi: mwifiex: Support SD8978 chipsetLukas Wunner1-0/+1
The Marvell SD8978 (aka NXP IW416) uses identical registers as SD8987, so reuse the existing mwifiex_reg_sd8987 definition. Note that mwifiex_reg_sd8977 and mwifiex_reg_sd8997 are likewise identical, save for the fw_dump_ctrl register: They define it as 0xf0 whereas mwifiex_reg_sd8987 defines it as 0xf9. I've verified that 0xf9 is the correct value on SD8978. NXP's out-of-tree driver uses 0xf9 for all of them, so there's a chance that 0xf0 is not correct in the mwifiex_reg_sd8977 and mwifiex_reg_sd8997 definitions. I cannot test that for lack of hardware, hence am leaving it as is. NXP has only released a firmware which runs Bluetooth over UART. Perhaps Bluetooth over SDIO is unsupported by this chipset. Consequently, only an "sdiouart" firmware image is referenced, not an alternative "sdsd" image. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/536b4f17a72ca460ad1b07045757043fb0778988.1674827105.git.lukas@wunner.de
2023-02-13net: phy: add genphy_c45_ethtool_get/set_eee() supportOleksij Rempel2-0/+65
Add replacement for phy_ethtool_get/set_eee() functions. Current phy_ethtool_get/set_eee() implementation is great and it is possible to make it even better: - this functionality is for devices implementing parts of IEEE 802.3 specification beyond Clause 22. The better place for this code is phy-c45.c - currently it is able to do read/write operations on PHYs with different abilities to not existing registers. It is better to use stored supported_eee abilities to avoid false read/write operations. - the eee_active detection will provide wrong results on not supported link modes. It is better to validate speed/duplex properties against supported EEE link modes. - it is able to support only limited amount of link modes. We have more EEE link modes... By refactoring this code I address most of this point except of the last one. Adding additional EEE link modes will need more work. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13net: phy: export phy_check_valid() functionOleksij Rempel1-0/+1
This function will be needed for genphy_c45_ethtool_get_eee() provided by next patch. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13net: phy: add genphy_c45_read_eee_abilities() functionOleksij Rempel2-0/+32
Add generic function for EEE abilities defined by IEEE 802.3 specification. For now following registers are supported: - IEEE 802.3-2018 45.2.3.10 EEE control and capability 1 (Register 3.20) - IEEE 802.3cg-2019 45.2.1.186b 10BASE-T1L PMA status register (Register 1.2295) Since I was not able to find any flag signaling support of these registers, we should detect link mode abilities first and then based on these abilities doing EEE link modes detection. Results of EEE ability detection will be stored into new variable phydev->supported_eee. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13net: ethtool: extend ringparam set/get APIs for rx_pushShannon Nelson1-0/+4
Similar to what was done for TX_PUSH, add an RX_PUSH concept to the ethtool interfaces. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13Merge tag 'trace-v6.2-rc7' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fix from Steven Rostedt: "Fix showing of TASK_COMM_LEN instead of its value The TASK_COMM_LEN was converted from a macro into an enum so that BTF would have access to it. But this unfortunately caused TASK_COMM_LEN to display in the format fields of trace events, as they are created by the TRACE_EVENT() macro and such, macros convert to their values, where as enums do not. To handle this, instead of using the field itself to be display, save the value of the array size as another field in the trace_event_fields structure, and use that instead. Not only does this fix the issue, but also converts the other trace events that have this same problem (but were not breaking tooling). With this change, the original work around b3bc8547d3be6 ("tracing: Have TRACE_DEFINE_ENUM affect trace event types as well") could be reverted (but that should be done in the merge window)" * tag 'trace-v6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix TASK_COMM_LEN in trace event format file
2023-02-12tracing: Fix TASK_COMM_LEN in trace event format fileYafang Shao1-0/+1
After commit 3087c61ed2c4 ("tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN"), the content of the format file under /sys/kernel/tracing/events/task/task_newtask was changed from field:char comm[16]; offset:12; size:16; signed:0; to field:char comm[TASK_COMM_LEN]; offset:12; size:16; signed:0; John reported that this change breaks older versions of perfetto. Then Mathieu pointed out that this behavioral change was caused by the use of __stringify(_len), which happens to work on macros, but not on enum labels. And he also gave the suggestion on how to fix it: :One possible solution to make this more robust would be to extend :struct trace_event_fields with one more field that indicates the length :of an array as an actual integer, without storing it in its stringified :form in the type, and do the formatting in f_show where it belongs. The result as follows after this change, $ cat /sys/kernel/tracing/events/task/task_newtask/format field:char comm[16]; offset:12; size:16; signed:0; Link: https://lore.kernel.org/lkml/Y+QaZtz55LIirsUO@google.com/ Link: https://lore.kernel.org/linux-trace-kernel/20230210155921.4610-1-laoar.shao@gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/20230212151303.12353-1-laoar.shao@gmail.com Cc: stable@vger.kernel.org Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Kajetan Puchalski <kajetan.puchalski@arm.com> CC: Qais Yousef <qyousef@layalina.io> Fixes: 3087c61ed2c4 ("tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN") Reported-by: John Stultz <jstultz@google.com> Debugged-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-11bpf: allow to disable bpf map memory accountingYafang Shao1-0/+8
We can simply set root memcg as the map's memcg to disable bpf memory accounting. bpf_map_area_alloc is a little special as it gets the memcg from current rather than from the map, so we need to disable GFP_ACCOUNT specifically for it. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20230210154734.4416-4-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-11bpf: use bpf_map_kvcalloc in bpf_local_storageYafang Shao1-0/+8
Introduce new helper bpf_map_kvcalloc() for the memory allocation in bpf_local_storage(). Then the allocation will charge the memory from the map instead of from current, though currently they are the same thing as it is only used in map creation path now. By charging map's memory into the memcg from the map, it will be more clear. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20230210154734.4416-3-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-11mm: memcontrol: add new kernel parameter cgroup.memory=nobpfYafang Shao1-0/+11
Add new kernel parameter cgroup.memory=nobpf to allow user disable bpf memory accounting. This is a preparation for the followup patch. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20230210154734.4416-2-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-11Daniel Borkmann says:Jakub Kicinski3-7/+34
==================== pull-request: bpf-next 2023-02-11 We've added 96 non-merge commits during the last 14 day(s) which contain a total of 152 files changed, 4884 insertions(+), 962 deletions(-). There is a minor conflict in drivers/net/ethernet/intel/ice/ice_main.c between commit 5b246e533d01 ("ice: split probe into smaller functions") from the net-next tree and commit 66c0e13ad236 ("drivers: net: turn on XDP features") from the bpf-next tree. Remove the hunk given ice_cfg_netdev() is otherwise there a 2nd time, and add XDP features to the existing ice_cfg_netdev() one: [...] ice_set_netdev_features(netdev); netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | NETDEV_XDP_ACT_XSK_ZEROCOPY; ice_set_ops(netdev); [...] Stephen's merge conflict mail: https://lore.kernel.org/bpf/20230207101951.21a114fa@canb.auug.org.au/ The main changes are: 1) Add support for BPF trampoline on s390x which finally allows to remove many test cases from the BPF CI's DENYLIST.s390x, from Ilya Leoshkevich. 2) Add multi-buffer XDP support to ice driver, from Maciej Fijalkowski. 3) Add capability to export the XDP features supported by the NIC. Along with that, add a XDP compliance test tool, from Lorenzo Bianconi & Marek Majtyka. 4) Add __bpf_kfunc tag for marking kernel functions as kfuncs, from David Vernet. 5) Add a deep dive documentation about the verifier's register liveness tracking algorithm, from Eduard Zingerman. 6) Fix and follow-up cleanups for resolve_btfids to be compiled as a host program to avoid cross compile issues, from Jiri Olsa & Ian Rogers. 7) Batch of fixes to the BPF selftest for xdp_hw_metadata which resulted when testing on different NICs, from Jesper Dangaard Brouer. 8) Fix libbpf to better detect kernel version code on Debian, from Hao Xiang. 9) Extend libbpf to add an option for when the perf buffer should wake up, from Jon Doron. 10) Follow-up fix on xdp_metadata selftest to just consume on TX completion, from Stanislav Fomichev. 11) Extend the kfuncs.rst document with description on kfunc lifecycle & stability expectations, from David Vernet. 12) Fix bpftool prog profile to skip attaching to offline CPUs, from Tonghao Zhang. ==================== Link: https://lore.kernel.org/r/20230211002037.8489-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10fbdev: Fix invalid page access after closing deferred I/O devicesTakashi Iwai1-0/+1
When a fbdev with deferred I/O is once opened and closed, the dirty pages still remain queued in the pageref list, and eventually later those may be processed in the delayed work. This may lead to a corruption of pages, hitting an Oops. This patch makes sure to cancel the delayed work and clean up the pageref list at closing the device for addressing the bug. A part of the cleanup code is factored out as a new helper function that is called from the common fb_release(). Reviewed-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Tested-by: Miko Larsson <mikoxyzzz@gmail.com> Fixes: 56c134f7f1b5 ("fbdev: Track deferred-I/O pages in pageref struct") Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230129082856.22113-1-tiwai@suse.de
2023-02-10net: skbuff: drop the word head from skb cacheJakub Kicinski1-1/+1
skbuff_head_cache is misnamed (perhaps for historical reasons?) because it does not hold heads. Head is the buffer which skb->data points to, and also where shinfo lives. struct sk_buff is a metadata structure, not the head. Eric recently added skb_small_head_cache (which allocates actual head buffers), let that serve as an excuse to finally clean this up :) Leave the user-space visible name intact, it could possibly be uAPI. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-10string_helpers: Move string_is_valid() to the headerAndy Shevchenko1-0/+5
Move string_is_valid() to the header for wider use. While at it, rename to string_is_terminated() to be precise about its semantics. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230208133153.22528-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: introduce default_rps_mask netns attributePaolo Abeni1-0/+1
If RPS is enabled, this allows configuring a default rps mask, which is effective since receive queue creation time. A default RPS mask allows the system admin to ensure proper isolation, avoiding races at network namespace or device creation time. The default RPS mask is initially empty, and can be modified via a newly added sysctl entry. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10mm: shrinkers: fix deadlock in shrinker debugfsQi Zheng1-2/+3
The debugfs_remove_recursive() is invoked by unregister_shrinker(), which is holding the write lock of shrinker_rwsem. It will waits for the handler of debugfs file complete. The handler also needs to hold the read lock of shrinker_rwsem to do something. So it may cause the following deadlock: CPU0 CPU1 debugfs_file_get() shrinker_debugfs_count_show()/shrinker_debugfs_scan_write() unregister_shrinker() --> down_write(&shrinker_rwsem); debugfs_remove_recursive() // wait for (A) --> wait_for_completion(); // wait for (B) --> down_read_killable(&shrinker_rwsem) debugfs_file_put() -- (A) up_write() -- (B) The down_read_killable() can be killed, so that the above deadlock can be recovered. But it still requires an extra kill action, otherwise it will block all subsequent shrinker-related operations, so it's better to fix it. [akpm@linux-foundation.org: fix CONFIG_SHRINKER_DEBUG=n stub] Link: https://lkml.kernel.org/r/20230202105612.64641-1-zhengqi.arch@bytedance.com Fixes: 5035ebc644ae ("mm: shrinkers: introduce debugfs interface for memory shrinkers") Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski9-21/+41
net/devlink/leftover.c / net/core/devlink.c: 565b4824c39f ("devlink: change port event netdev notifier from per-net to global") f05bd8ebeb69 ("devlink: move code to a dedicated directory") 687125b5799c ("devlink: split out core code") https://lore.kernel.org/all/20230208094657.379f2b1a@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09Merge tag 'net-6.2-rc8' of ↵Linus Torvalds1-3/+10
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from can and ipsec subtrees. Current release - regressions: - sched: fix off by one in htb_activate_prios() - eth: mana: fix accessing freed irq affinity_hint - eth: ice: fix out-of-bounds KASAN warning in virtchnl Current release - new code bugs: - eth: mtk_eth_soc: enable special tag when any MAC uses DSA Previous releases - always broken: - core: fix sk->sk_txrehash default - neigh: make sure used and confirmed times are valid - mptcp: be careful on subflow status propagation on errors - xfrm: prevent potential spectre v1 gadget in xfrm_xlate32_attr() - phylink: move phy_device_free() to correctly release phy device - eth: mlx5: - fix crash unsetting rx-vlan-filter in switchdev mode - fix hang on firmware reset - serialize module cleanup with reload and remove" * tag 'net-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits) selftests: forwarding: lib: quote the sysctl values net: mscc: ocelot: fix all IPv6 getting trapped to CPU when PTP timestamping is used rds: rds_rm_zerocopy_callback() use list_first_entry() net: txgbe: Update support email address selftests: Fix failing VXLAN VNI filtering test selftests: mptcp: stop tests earlier selftests: mptcp: allow more slack for slow test-case mptcp: be careful on subflow status propagation on errors mptcp: fix locking for in-kernel listener creation mptcp: fix locking for setsockopt corner-case mptcp: do not wait for bare sockets' timeout net: ethernet: mtk_eth_soc: fix DSA TX tag hwaccel for switch port 0 nfp: ethtool: fix the bug of setting unsupported port speed txhash: fix sk->sk_txrehash default net: ethernet: mtk_eth_soc: fix wrong parameters order in __xdp_rxq_info_reg() net: ethernet: mtk_eth_soc: enable special tag when any MAC uses DSA net: sched: sch: Fix off by one in htb_activate_prios() igc: Add ndo_tx_timeout support net: mana: Fix accessing freed irq affinity_hint hv_netvsc: Allocate memory in netvsc_dma_map() with GFP_ATOMIC ...
2023-02-09Merge tag 'linux-can-next-for-6.3-20230208' of ↵Jakub Kicinski1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== can-next 2023-02-08 The 1st patch is by Oliver Hartkopp and cleans up the CAN_RAW's raw_setsockopt() for CAN_RAW_FD_FRAMES. The 2nd patch is by me and fixes the compilation if CONFIG_CAN_CALC_BITTIMING is disabled. (Problem introduced in last pull request to next-next.) * tag 'linux-can-next-for-6.3-20230208' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: can: bittiming: can_calc_bittiming(): add missing parameter to no-op function can: raw: use temp variable instead of rolling back config ==================== Link: https://lore.kernel.org/r/20230208210014.3169347-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09Merge tag 'mlx5-next-netdev-deadlock' of ↵Jakub Kicinski3-5/+48
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-next-netdev-deadlock This series from Jiri solves a deadlock when removing a network namespace with mlx5 devlink instance being in it. The deadlock is between: 1) mlx5_ib->unregister_netdevice_notifier() AND 2) mlx5_core->devlink_reload->cleanup_net() To slove this introduced mlx5 netdev added/removed events to track uplink netdev to be used for register_netdevice_notifier_dev_net() purposes. * tag 'mlx5-next-netdev-deadlock' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregister net/mlx5e: Propagate an internal event in case uplink netdev changes net/mlx5e: Fix trap event handling net/mlx5: Introduce CQE error syndrome ==================== Link: https://lore.kernel.org/r/20230208005626.72930-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09net/mlx5e: Propagate an internal event in case uplink netdev changesJiri Pirko2-0/+6
Whenever uplink netdev is set/cleared, propagate newly introduced event to inform notifier blocks netdev was added/removed. Move the set() helper to core.c from header, introduce clear() and netdev_added_event_replay() helpers. The last one is going to be called from rdma driver, so export it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-09Merge tag 'mlx5-updates-2023-02-07' of ↵Jakub Kicinski2-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-02-07 1) Minor and trivial code Cleanups 2) Minor fixes for net-next 3) From Shay: dynamic FW trace strings update. * tag 'mlx5-updates-2023-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: fw_tracer, Add support for unrecognized string net/mlx5: fw_tracer, Add support for strings DB update event net/mlx5: fw_tracer, allow 0 size string DBs net/mlx5: fw_tracer: Fix debug print net/mlx5: fs, Remove redundant assignment of size net/mlx5: fs_core, Remove redundant variable err net/mlx5: Fix memory leak in error flow of port set buffer net/mlx5e: Remove incorrect debugfs_create_dir NULL check in TLS net/mlx5e: Remove incorrect debugfs_create_dir NULL check in hairpin net/mlx5: fs, Remove redundant vport_number assignment net/mlx5e: Remove redundant code for handling vlan actions net/mlx5e: Don't listen to remove flows event net/mlx5: fw reset: Skip device ID check if PCI link up failed net/mlx5: Remove redundant health work lock mlx5: reduce stack usage in mlx5_setup_tc ==================== Link: https://lore.kernel.org/r/20230208003712.68386-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-08can: bittiming: can_calc_bittiming(): add missing parameter to no-op functionMarc Kleine-Budde1-1/+1
In commit 286c0e09e8e0 ("can: bittiming: can_changelink() pass extack down callstack") a new parameter was added to can_calc_bittiming(), however the static inline no-op (which is used if CONFIG_CAN_CALC_BITTIMING is disabled) wasn't converted. Add the new parameter to the static inline no-op of can_calc_bittiming(). Fixes: 286c0e09e8e0 ("can: bittiming: can_changelink() pass extack down callstack") Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/20230207201734.2905618-1-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>