summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-01-04bpf: Add objcg to bpf_mem_allocYonghong Song2-5/+7
The objcg is a bpf_mem_alloc level property since all bpf_mem_cache's are with the same objcg. This patch made such a property explicit. The next patch will use this property to save and restore objcg for percpu unit allocator. Acked-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20231222031739.1288590-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04bpf: Avoid unnecessary extra percpu memory allocationYonghong Song1-1/+3
Currently, for percpu memory allocation, say if the user requests allocation size to be 32 bytes, the actually calculated size will be 40 bytes and it further rounds to 64 bytes, and eventually 64 bytes are allocated, wasting 32-byte memory. Change bpf_mem_alloc() to calculate the cache index based on the user-provided allocation size so unnecessary extra memory can be avoided. Suggested-by: Hou Tao <houtao1@huawei.com> Acked-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20231222031734.1288400-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04net: kcm: fix direct access to bv_lenMina Almasry1-1/+1
Minor fix for kcm: code wanting to access the fields inside an skb frag should use the skb_frag_*() helpers, instead of accessing the fields directly. Signed-off-by: Mina Almasry <almasrymina@google.com> Link: https://lore.kernel.org/r/20240102205959.794513-1-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04vsock/virtio: use skb_frag_*() helpersMina Almasry1-3/+3
Minor fix for virtio: code wanting to access the fields inside an skb frag should use the skb_frag_*() helpers, instead of accessing the fields directly. This allows for extensions where the underlying memory is not a page. Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Mina Almasry <almasrymina@google.com> Link: https://lore.kernel.org/r/20240102205905.793738-1-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04net/sched: sch_api: conditional netlink notificationsPedro Tammela1-11/+68
Implement conditional netlink notifications for Qdiscs and classes, which were missing in the initial patches that targeted tc filters and actions. Notifications will only be built after passing a check for 'rtnl_notify_needed()'. For both Qdiscs and classes 'get' operations now call a dedicated notification function as it was not possible to distinguish between 'create' and 'get' before. This distinction is necessary because 'get' always send a notification. Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20231229132642.1489088-2-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04net/sched: introduce ACT_P_BOUND return codePedro Tammela21-21/+22
Bound actions always return '0' and as of today we rely on '0' being returned in order to properly skip bound actions in tcf_idr_insert_many. In order to further improve maintainability, introduce the ACT_P_BOUND return code. Actions are updated to return 'ACT_P_BOUND' instead of plain '0'. tcf_idr_insert_many is then updated to check for 'ACT_P_BOUND'. Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20231229132642.1489088-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04net-device: move xdp_prog to net_device_read_rxEric Dumazet3-3/+3
xdp_prog is used in receive path, both from XDP enabled drivers and from netif_elide_gro(). This patch also removes two 4-bytes holes. Fixes: 43a71cd66b9c ("net-device: reorganize net_device fast path variables") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Coco Li <lixiaoyan@google.com> Cc: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240102162220.750823-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04Merge branch '10GbE' of ↵Jakub Kicinski16-359/+342
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-01-02 (ixgbe, i40e) This series contains updates to ixgbe and i40e drivers. Ovidiu Panait adds reporting of VF link state to ixgbe. Jedrzej removes uses of IXGBE_ERR* codes to instead use standard error codes. Andrii modifies behavior of VF disable to properly shut down queues on i40e. Simon Horman removes, undesired, use of comma operator for i40e. * '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: i40e: Avoid unnecessary use of comma operator i40e: Fix VF disable behavior to block all traffic ixgbe: Refactor returning internal error codes ixgbe: Refactor overtemp event handling ixgbe: report link state for VF devices ==================== Link: https://lore.kernel.org/r/20240102222429.699129-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04Merge tag 'nf-24-01-03' of ↵Jakub Kicinski2-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix nat packets in the related state in OVS, from Brad Cowie. 2) Drop chain reference counter on error path in case chain binding fails. * tag 'nf-24-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_immediate: drop chain reference counter on error netfilter: nf_nat: fix action not being set for all ct states ==================== Link: https://lore.kernel.org/r/20240103113001.137936-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04Merge branch 'ena-driver-xdp-changes'Jakub Kicinski7-692/+736
David Arinzon says: ==================== ENA driver XDP changes This patchset contains multiple XDP-related changes in the ENA driver, including moving the XDP code to dedicated files. ==================== Link: https://lore.kernel.org/r/20240101190855.18739-1-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04net: ena: Take xdp packets stats into account in ena_get_stats64()David Arinzon1-3/+10
Queue stats using ifconfig and ip are retrieved via ena_get_stats64(). This function currently does not take the xdp sent or dropped packets stats into account. This commit adds the following xdp stats to ena_get_stats64(): tx bytes sent tx packets sent rx dropped packets Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20240101190855.18739-12-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04net: ena: Make queue stats code cleaner by removing the if blockDavid Arinzon1-10/+7
Also shorten comment related to it. Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20240101190855.18739-11-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04net: ena: Always register RX queue infoDavid Arinzon3-4/+13
The RX queue info contains information about the RX queue which might be relevant to the kernel. To avoid configuring this queue for different scenarios, this patch moves the RX queue configuration to ena_up()/ena_down() function and makes it configured every interface state toggle. Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20240101190855.18739-10-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04net: ena: Add more debug prints to XDP related functionDavid Arinzon1-0/+4
Used for better readability and debugging of XDP flow. Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20240101190855.18739-9-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04net: ena: Refactor napi functionsDavid Arinzon2-23/+30
This patch focuses on changes to the XDP part of the napi polling routine. 1. Update the `napi_comp` stat only when napi is actually complete. 2. Simplify the code by using a function pointer to the right napi routine (XDP vs non-XDP path) 3. Remove unnecessary local variables. 4. Adjust a debug print to show the processed XDP frame index rather than the pointer. Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20240101190855.18739-8-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04net: ena: Don't check if XDP program is loaded in ena_xdp_execute()David Arinzon1-3/+0
This check is already done in ena_clean_rx_irq() which indirectly calls it. This function is called in napi context and the driver doesn't allow to change the XDP program without performing destruction and reinitialization of napi context (part of ena_down/ena_up sequence). Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20240101190855.18739-7-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04net: ena: Use tx_ring instead of xdp_ring for XDP channel TXDavid Arinzon4-65/+61
When an XDP program is loaded the existing channels in the driver split into two halves: - The first half of the channels contain RX and TX rings, these queues are used for receiving traffic and sending packets originating from kernel. - The second half of the channels contain only a TX ring. These queues are used for sending packets that were redirected using XDP_TX or XDP_REDIRECT. Referring to the queues in the second half of the channels as "xdp_ring" can be confusing and may give the impression that ENA has the capability to generate an additional special queue. This patch ensures that the xdp_ring field is exclusively used to describe the XDP TX queue that a specific RX queue needs to utilize when forwarding packets with XDP TX and XDP REDIRECT, preserving the integrity of the xdp_ring field in ena_ring. Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20240101190855.18739-6-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04net: ena: Introduce total_tx_size field in ena_tx_buffer structDavid Arinzon2-2/+5
To avoid de-referencing skb or xdp_frame when we poll for TX completion (where they might not be in the cache), save the total TX packet size in the ena_tx_buffer object representing the packet. Also the 'print_once' field's type was changed from u32 to u8 to allow adding the 'total_tx_size' without changing the total size of the struct. Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20240101190855.18739-5-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04net: ena: Put orthogonal fields in ena_tx_buffer in a unionDavid Arinzon1-5/+7
The skb and xdpf pointers cannot be set together in the driver (each TX descriptor can send either an SKB or an XDP frame), and so it makes more sense to put them both in a union. This decreases the overall size of the ena_tx_buffer struct which improves cache locality. Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20240101190855.18739-4-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04net: ena: Pass ena_adapter instead of net_device to ena_xmit_common()David Arinzon4-11/+10
This change will enable the ability to use ena_xmit_common() in functions that don't have a net_device pointer. While it can be retrieved by dereferencing ena_adapter (adapter->netdev), there's no reason to do it in fast path code where this pointer is only needed for debug prints. Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20240101190855.18739-3-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04net: ena: Move XDP code to its new filesDavid Arinzon7-657/+680
XDP system has a very large footprint in the driver's overall code. makes the whole driver's code much harder to read. Moving XDP code to dedicated files. This patch doesn't make any changes to the code itself and only cut-pastes the code into ena_xdp.c and ena_xdp.h files so the change is purely cosmetic. Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20240101190855.18739-2-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04octeontx2-af: Fix max NPC MCAM entry check while validating ref_entrySuman Ghosh1-7/+6
As of today, the last MCAM entry was not getting allocated because of a <= check with the max_bmap count. This patch modifies that and if the requested entry is greater than the available entries then set it to the max value. Signed-off-by: Suman Ghosh <sumang@marvell.com> Link: https://lore.kernel.org/r/20240101145042.419697-1-sumang@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04Merge tag 'drm-misc-fixes-2024-01-03' of ↵Dave Airlie9-14/+48
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for v6.7 final: - 2 small qaic fixes. - Fixes for overflow in aux xfer. - Fix uninitialised gamma lut in gmag200. - Small compiler warning fix for backports of a ps8640 fix. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/9ba866b4-3144-47a9-a2c0-7313c67249d7@linux.intel.com
2024-01-04selftests/net: change shebang to bash to support "source"Yujie Liu2-4/+4
The patch set [1] added a general lib.sh in net selftests, and converted several test scripts to source the lib.sh. unicast_extensions.sh (converted in [1]) and pmtu.sh (converted in [2]) have a /bin/sh shebang which may point to various shells in different distributions, but "source" is only available in some of them. For example, "source" is a built-it function in bash, but it cannot be used in dash. Refer to other scripts that were converted together, simply change the shebang to bash to fix the following issues when the default /bin/sh points to other shells. not ok 51 selftests: net: unicast_extensions.sh # exit=1 v1 -> v2: - Fix pmtu.sh which has the same issue as unicast_extensions.sh, suggested by Hangbin - Change the style of the "source" line to be consistent with other tests, suggested by Hangbin Link: https://lore.kernel.org/all/20231202020110.362433-1-liuhangbin@gmail.com/ [1] Link: https://lore.kernel.org/all/20231219094856.1740079-1-liuhangbin@gmail.com/ [2] Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: 378f082eaf37 ("selftests/net: convert pmtu.sh to run it in unique namespace") Fixes: 0f4765d0b48d ("selftests/net: convert unicast_extensions.sh to run it in unique namespace") Signed-off-by: Yujie Liu <yujie.liu@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20231229131931.3961150-1-yujie.liu@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04Merge branch '1GbE' of ↵Jakub Kicinski2-3/+40
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-12-27 (igc) This series contains updates to igc driver only. Kurt Kanzenbach resolves issues around VLAN ntuple rules; correctly reporting back added rules and checking for valid values. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: igc: Check VLAN EtherType mask igc: Check VLAN TCI mask igc: Report VLAN EtherType matching back to user ==================== Link: https://lore.kernel.org/r/20231227210041.3035055-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04Merge branch '100GbE' of ↵Jakub Kicinski3-10/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-12-27 (ice, i40e) This series contains updates to ice and i40e drivers. Katarzyna changes message to no longer be reported as error under certain conditions as it can be expected on ice. Ngai-Mint ensures VSI is always closed when stopping interface to prevent NULL pointer dereference for ice. Arkadiusz corrects reporting of phase offset value for ice. Sudheer corrects checking on ADQ filters to prevent invalid values on i40e. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: i40e: Fix filter input checks to prevent config with invalid values ice: dpll: fix phase offset value ice: Shut down VSI with "link-down-on-close" enabled ice: Fix link_down_on_close message ==================== Link: https://lore.kernel.org/r/20231227182541.3033124-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04net/smc: fix invalid link access in dumping SMC-R connectionsWen Gu1-2/+1
A crash was found when dumping SMC-R connections. It can be reproduced by following steps: - environment: two RNICs on both sides. - run SMC-R between two sides, now a SMC_LGR_SYMMETRIC type link group will be created. - set the first RNIC down on either side and link group will turn to SMC_LGR_ASYMMETRIC_LOCAL then. - run 'smcss -R' and the crash will be triggered. BUG: kernel NULL pointer dereference, address: 0000000000000010 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 8000000101fdd067 P4D 8000000101fdd067 PUD 10ce46067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 3 PID: 1810 Comm: smcss Kdump: loaded Tainted: G W E 6.7.0-rc6+ #51 RIP: 0010:__smc_diag_dump.constprop.0+0x36e/0x620 [smc_diag] Call Trace: <TASK> ? __die+0x24/0x70 ? page_fault_oops+0x66/0x150 ? exc_page_fault+0x69/0x140 ? asm_exc_page_fault+0x26/0x30 ? __smc_diag_dump.constprop.0+0x36e/0x620 [smc_diag] smc_diag_dump_proto+0xd0/0xf0 [smc_diag] smc_diag_dump+0x26/0x60 [smc_diag] netlink_dump+0x19f/0x320 __netlink_dump_start+0x1dc/0x300 smc_diag_handler_dump+0x6a/0x80 [smc_diag] ? __pfx_smc_diag_dump+0x10/0x10 [smc_diag] sock_diag_rcv_msg+0x121/0x140 ? __pfx_sock_diag_rcv_msg+0x10/0x10 netlink_rcv_skb+0x5a/0x110 sock_diag_rcv+0x28/0x40 netlink_unicast+0x22a/0x330 netlink_sendmsg+0x240/0x4a0 __sock_sendmsg+0xb0/0xc0 ____sys_sendmsg+0x24e/0x300 ? copy_msghdr_from_user+0x62/0x80 ___sys_sendmsg+0x7c/0xd0 ? __do_fault+0x34/0x1a0 ? do_read_fault+0x5f/0x100 ? do_fault+0xb0/0x110 __sys_sendmsg+0x4d/0x80 do_syscall_64+0x45/0xf0 entry_SYSCALL_64_after_hwframe+0x6e/0x76 When the first RNIC is set down, the lgr->lnk[0] will be cleared and an asymmetric link will be allocated in lgr->link[SMC_LINKS_PER_LGR_MAX - 1] by smc_llc_alloc_alt_link(). Then when we try to dump SMC-R connections in __smc_diag_dump(), the invalid lgr->lnk[0] will be accessed, resulting in this issue. So fix it by accessing the right link. Fixes: f16a7dd5cf27 ("smc: netlink interface for SMC sockets") Reported-by: henaumars <henaumars@sina.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7616 Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Link: https://lore.kernel.org/r/1703662835-53416-1-git-send-email-guwen@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04Merge branch 'fix sockmap + stream af_unix memleak'Martin KaFai Lau3-4/+236
John Fastabend says: ==================== There was a memleak when streaming af_unix sockets were inserted into multiple sockmap slots and/or maps. This is because each insert would call a proto update operatino and these must be allowed to be called multiple times. The streaming af_unix implementation recently added a refcnt to handle a use after free issue, however it introduced a memleak when inserted into multiple maps. This series fixes the memleak, adds a note in the code so we remember that proto updates need to support this. And then we add three tests for each of the slightly different iterations of adding sockets into multiple maps. I kept them as 3 independent test cases here. I have some slight preference for this they could however be a single test, but then you don't get to run them independently which was sort of useful while debugging. ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-04bpf: sockmap, add tests for proto updates replace socketJohn Fastabend1-0/+69
Add test that replaces the same socket with itself. This exercises a corner case where old element and new element have the same posck. Test protocols: TCP, UDP, stream af_unix and dgram af_unix. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/r/20231221232327.43678-6-john.fastabend@gmail.com
2024-01-04bpf: sockmap, add tests for proto updates single socket to many mapJohn Fastabend1-0/+74
Add test with multiple maps where each socket is inserted in multiple maps. Test protocols: TCP, UDP, stream af_unix and dgram af_unix. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/r/20231221232327.43678-5-john.fastabend@gmail.com
2024-01-04bpf: sockmap, add tests for proto updates many to single mapJohn Fastabend1-1/+70
Add test with a single map where each socket is inserted multiple times. Test protocols: TCP, UDP, stream af_unix and dgram af_unix. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/r/20231221232327.43678-4-john.fastabend@gmail.com
2024-01-04bpf: sockmap, added comments describing update proto rulesJohn Fastabend1-0/+5
Add a comment describing that the psock update proto callbback can be called multiple times and this must be safe. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/r/20231221232327.43678-3-john.fastabend@gmail.com
2024-01-04net/qla3xxx: fix potential memleak in ql_alloc_buffer_queuesDinghao Liu1-0/+2
When dma_alloc_coherent() fails, we should free qdev->lrg_buf to prevent potential memleak. Fixes: 1357bfcf7106 ("qla3xxx: Dynamically size the rx buffer queue based on the MTU.") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Link: https://lore.kernel.org/r/20231227070227.10527-1-dinghao.liu@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04bpf: sockmap, fix proto update hook to avoid dup callsJohn Fastabend1-3/+18
When sockets are added to a sockmap or sockhash we allocate and init a psock. Then update the proto ops with sock_map_init_proto the flow is sock_hash_update_common sock_map_link psock = sock_map_psock_get_checked() <-returns existing psock sock_map_init_proto(sk, psock) <- updates sk_proto If the socket is already in a map this results in the sock_map_init_proto being called multiple times on the same socket. We do this because when a socket is added to multiple maps this might result in a new set of BPF programs being attached to the socket requiring an updated ops struct. This creates a rule where it must be safe to call psock_update_sk_prot multiple times. When we added a fix for UAF through unix sockets in patch 4dd9a38a753fc we broke this rule by adding a sock_hold in that path to ensure the sock is not released. The result is if a af_unix stream sock is placed in multiple maps it results in a memory leak because we call sock_hold multiple times with only a single sock_put on it. Fixes: 8866730aed51 ("bpf, sockmap: af_unix stream sockets need to hold ref for pair sock") Reported-by: Xingwei Lee <xrivendell7@gmail.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/r/20231221232327.43678-2-john.fastabend@gmail.com
2024-01-04Merge branch '200GbE' of ↵Jakub Kicinski3-5/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-12-26 (idpf) This series contains updates to idpf driver only. Alexander resolves issues in singleq mode to prevent corrupted frames and leaking skbs. Pavan prevents extra padding on RSS struct causing load failure due to unexpected size. * '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: idpf: avoid compiler introduced padding in virtchnl2_rss_key struct idpf: fix corrupted frames and skb leaks in singleq mode ==================== Link: https://lore.kernel.org/r/20231226174125.2632875-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04virtio_net: fix missing dma unmap for resizeXuan Zhuo1-30/+30
For rq, we have three cases getting buffers from virtio core: 1. virtqueue_get_buf{,_ctx} 2. virtqueue_detach_unused_buf 3. callback for virtqueue_resize But in commit 295525e29a5b("virtio_net: merge dma operations when filling mergeable buffers"), I missed the dma unmap for the #3 case. That will leak some memory, because I did not release the pages referred by the unused buffers. If we do such script, we will make the system OOM. while true do ethtool -G ens4 rx 128 ethtool -G ens4 rx 256 free -m done Fixes: 295525e29a5b ("virtio_net: merge dma operations when filling mergeable buffers") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20231226094333.47740-1-xuanzhuo@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04fib: remove unnecessary input parameters in fib_default_rule_addZhengchao Shao6-11/+9
When fib_default_rule_add is invoked, the value of the input parameter 'flags' is always 0. Rules uses kzalloc to allocate memory, so 'flags' has been initialized to 0. Therefore, remove the input parameter 'flags' in fib_default_rule_add. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240102071519.3781384-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04net: mvpp2: initialize port fwnode pointerMarcin Wojtas1-1/+1
Update the port's device structure also with its fwnode pointer with a recommended device_set_node() helper routine. Signed-off-by: Marcin Wojtas <marcin.s.wojtas@gmail.com> Reviewed-by: Suman Ghosh <sumang@marvell.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20231231122019.123344-1-marcin.s.wojtas@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04net: mdio: mux-bcm-iproc: Use alignment helpers and SZ_4KIlpo Järvinen1-2/+4
Instead of open coding, use IS_ALIGNED() and ALIGN_DOWN() when dealing with alignment. Replace also literals with SZ_4K. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Acked-by: Ray Jui <ray.jui@broadcom.com> Link: https://lore.kernel.org/r/20231229145232.6163-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04Merge tag 'pci-v6.7-fixes-2' of ↵Linus Torvalds5-3/+33
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI fixes from Bjorn Helgaas: - Revert an ASPM patch that caused an unintended reboot when resuming after suspend (Bjorn Helgaas) - Orphan Cadence PCIe IP (Bjorn Helgaas) * tag 'pci-v6.7-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: MAINTAINERS: Orphan Cadence PCIe IP Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"
2024-01-04Merge tag 'apparmor-pr-2024-01-03' of ↵Linus Torvalds2-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor Pull apparmor fix from John Johansen: "Detect that the source mount is not in the namespace and if it isn't don't use it as a source path match. This prevent apparmor from applying the attach_disconnected flag to move_mount() source which prevents detached mounts from appearing as / when applying mount mediation, which is not only incorrect but could result in bad policy being generated" * tag 'apparmor-pr-2024-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor: apparmor: Fix move_mount mediation by detecting if source is detached
2024-01-03apparmor: Fix move_mount mediation by detecting if source is detachedJohn Johansen2-0/+5
Prevent move_mount from applying the attach_disconnected flag to move_mount(). This prevents detached mounts from appearing as / when applying mount mediation, which is not only incorrect but could result in bad policy being generated. Basic mount rules like allow mount, allow mount options=(move) -> /target/, will allow detached mounts, allowing older policy to continue to function. New policy gains the ability to specify `detached` as a source option allow mount detached -> /target/, In addition make sure support of move_mount is advertised as a feature to userspace so that applications that generate policy can respond to the addition. Note: this fixes mediation of move_mount when a detached mount is used, it does not fix the broader regression of apparmor mediation of mounts under the new mount api. Link: https://lore.kernel.org/all/68c166b8-5b4d-4612-8042-1dee3334385b@leemhuis.info/T/#mb35fdde37f999f08f0b02d58dc1bf4e6b65b8da2 Fixes: 157a3537d6bc ("apparmor: Fix regression in mount mediation") Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
2024-01-03Merge tag 'efi-urgent-for-v6.7-3' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fix from Ard Biesheuvel: - Ensure that the KASLR load flag is set in boot_params when loading the kernel randomized directly from the EFI stub * tag 'efi-urgent-for-v6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi/x86: Fix the missing KASLR_FLAG bit in boot_params->hdr.loadflags
2024-01-03Merge tag 'trace-v6.7-rc8' of ↵Linus Torvalds2-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix a NULL kernel dereference in set_gid() on tracefs mounting. When tracefs is mounted with "gid=1000", it will update the existing dentries to have the new gid. The tracefs_inode which is retrieved by a container_of(dentry->d_inode) has flags to see if the inode belongs to the eventfs system. The issue that was fixed was if getdents() was called on tracefs that was previously mounted, and was not closed. It will leave a "cursor dentry" in the subdirs list of the current dentries that set_gid() walks. On a remount of tracefs, the container_of(dentry->d_inode) will dereference a NULL pointer and cause a crash when referenced. Simply have a check for dentry->d_inode to see if it is NULL and if so, skip that entry. - Fix the bits of the eventfs_inode structure. The "is_events" bit was taken from the nr_entries field, but the nr_entries field wasn't updated to be 30 bits and was still 31. Including the "is_freed" bit this would use 33 bits which would make the structure use another integer for just one bit. * tag 'trace-v6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: eventfs: Fix bitwise fields for "is_events" tracefs: Check for dentry->d_inode exists in set_gid()
2024-01-03Merge tag 'bcachefs-2024-01-01' of https://evilpiepirate.org/git/bcachefsLinus Torvalds30-423/+977
Pull bcachefs from Kent Overstreet: "More bcachefs bugfixes for 6.7, and forwards compatibility work: - fix for a nasty extents + snapshot interaction, reported when reflink of a snapshotted file wouldn't complete but turned out to be a more general bug - fix for an invalid free in dio write path when iov vector was longer than our inline vector - fix for a buffer overflow in the nocow write path - BCH_REPLICAS_MAX doesn't actually limit the number of pointers in an extent when cached pointers are included - RO snapshots are actually RO now - And, a new superblock section to avoid future breakage when the disk space acounting rewrite rolls out: the new superblock section describes versions that need work to downgrade, where the work required is a list of recovery passes and errors to silently fix" * tag 'bcachefs-2024-01-01' of https://evilpiepirate.org/git/bcachefs: bcachefs: make RO snapshots actually RO bcachefs: bch_sb_field_downgrade bcachefs: bch_sb.recovery_passes_required bcachefs: Add persistent identifiers for recovery passes bcachefs: prt_bitflags_vector() bcachefs: move BCH_SB_ERRS() to sb-errors_types.h bcachefs: fix buffer overflow in nocow write path bcachefs: DARRAY_PREALLOCATED() bcachefs: Switch darray to kvmalloc() bcachefs: Factor out darray resize slowpath bcachefs: fix setting version_upgrade_complete bcachefs: fix invalid free in dio write path bcachefs: Fix extents iteration + snapshots interaction
2024-01-03igc: Fix hicredit calculationRodrigo Cataldo1-1/+1
According to the Intel Software Manual for I225, Section 7.5.2.7, hicredit should be multiplied by the constant link-rate value, 0x7736. Currently, the old constant link-rate value, 0x7735, from the boards supported on igb are being used, most likely due to a copy'n'paste, as the rest of the logic is the same for both drivers. Update hicredit accordingly. Fixes: 1ab011b0bf07 ("igc: Add support for CBS offloading") Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Rodrigo Cataldo <rodrigo.cadore@l-acoustics.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-03ice: fix Get link status data lengthPaul Greenwalt1-1/+2
Get link status version 2 (opcode 0x0607) is returning an error because FW expects a data length of 56 bytes, and this is causing the driver to fail probe. Update the get link status version 2 data length to 56 bytes by adding 5 byte reserved5 field to the end of struct ice_aqc_get_link_status_data and passing it as parameter to offsetofend() to the fix error. Fixes: 2777d24ec6d1 ("ice: Add ice_get_link_status_datalen") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-03i40e: Restore VF MSI-X state during PCI resetAndrii Staikov3-0/+32
During a PCI FLR the MSI-X Enable flag in the VF PCI MSI-X capability register will be cleared. This can lead to issues when a VF is assigned to a VM because in these cases the VF driver receives no indication of the PF PCI error/reset and additionally it is incapable of restoring the cleared flag in the hypervisor configuration space without fully reinitializing the driver interrupt functionality. Since the VF driver is unable to easily resolve this condition on its own, restore the VF MSI-X flag during the PF PCI reset handling. Fixes: 19b7960b2da1 ("i40e: implement split PCI error reset handler") Co-developed-by: Karen Ostrowska <karen.ostrowska@intel.com> Signed-off-by: Karen Ostrowska <karen.ostrowska@intel.com> Co-developed-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Andrii Staikov <andrii.staikov@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-03Merge branch 'bpf-volatile-compare'Andrii Nakryiko29-275/+171
Alexei Starovoitov says: ==================== bpf: volatile compare From: Alexei Starovoitov <ast@kernel.org> v2->v3: Debugged profiler.c regression. It was caused by basic block layout. Introduce bpf_cmp_likely() and bpf_cmp_unlikely() macros. Debugged redundant <<=32, >>=32 with u32 variables. Added cast workaround. v1->v2: Fixed issues pointed out by Daniel, added more tests, attempted to convert profiler.c, but barrier_var() wins vs bpf_cmp(). To be investigated. Patches 1-4 are good to go, but 5 needs more work. ==================== Link: https://lore.kernel.org/r/20231226191148.48536-1-alexei.starovoitov@gmail.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-01-03selftests/bpf: Convert profiler.c to bpf_cmp.Alexei Starovoitov1-46/+18
Convert profiler[123].c to "volatile compare" to compare barrier_var() approach vs bpf_cmp_likely() vs bpf_cmp_unlikely(). bpf_cmp_unlikely() produces correct code, but takes much longer to verify: ./veristat -C -e prog,insns,states before after_with_unlikely Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) ------------------------------------ --------- --------- ------------------ ---------- ---------- ----------------- kprobe__proc_sys_write 1603 19606 +18003 (+1123.08%) 123 1678 +1555 (+1264.23%) kprobe__vfs_link 11815 70305 +58490 (+495.05%) 971 4967 +3996 (+411.53%) kprobe__vfs_symlink 5464 42896 +37432 (+685.07%) 434 3126 +2692 (+620.28%) kprobe_ret__do_filp_open 5641 44578 +38937 (+690.25%) 446 3162 +2716 (+608.97%) raw_tracepoint__sched_process_exec 2770 35962 +33192 (+1198.27%) 226 3121 +2895 (+1280.97%) raw_tracepoint__sched_process_exit 1526 2135 +609 (+39.91%) 133 208 +75 (+56.39%) raw_tracepoint__sched_process_fork 265 337 +72 (+27.17%) 19 24 +5 (+26.32%) tracepoint__syscalls__sys_enter_kill 18782 140407 +121625 (+647.56%) 1286 12176 +10890 (+846.81%) bpf_cmp_likely() is equivalent to barrier_var(): ./veristat -C -e prog,insns,states before after_with_likely Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) ------------------------------------ --------- --------- -------------- ---------- ---------- ------------- kprobe__proc_sys_write 1603 1663 +60 (+3.74%) 123 127 +4 (+3.25%) kprobe__vfs_link 11815 12090 +275 (+2.33%) 971 971 +0 (+0.00%) kprobe__vfs_symlink 5464 5448 -16 (-0.29%) 434 426 -8 (-1.84%) kprobe_ret__do_filp_open 5641 5739 +98 (+1.74%) 446 446 +0 (+0.00%) raw_tracepoint__sched_process_exec 2770 2608 -162 (-5.85%) 226 216 -10 (-4.42%) raw_tracepoint__sched_process_exit 1526 1526 +0 (+0.00%) 133 133 +0 (+0.00%) raw_tracepoint__sched_process_fork 265 265 +0 (+0.00%) 19 19 +0 (+0.00%) tracepoint__syscalls__sys_enter_kill 18782 18970 +188 (+1.00%) 1286 1286 +0 (+0.00%) kprobe__proc_sys_write 2700 2809 +109 (+4.04%) 107 109 +2 (+1.87%) kprobe__vfs_link 12238 12366 +128 (+1.05%) 267 269 +2 (+0.75%) kprobe__vfs_symlink 7139 7365 +226 (+3.17%) 167 175 +8 (+4.79%) kprobe_ret__do_filp_open 7264 7070 -194 (-2.67%) 180 182 +2 (+1.11%) raw_tracepoint__sched_process_exec 3768 3453 -315 (-8.36%) 211 199 -12 (-5.69%) raw_tracepoint__sched_process_exit 3138 3138 +0 (+0.00%) 83 83 +0 (+0.00%) raw_tracepoint__sched_process_fork 265 265 +0 (+0.00%) 19 19 +0 (+0.00%) tracepoint__syscalls__sys_enter_kill 26679 24327 -2352 (-8.82%) 1067 1037 -30 (-2.81%) kprobe__proc_sys_write 1833 1833 +0 (+0.00%) 157 157 +0 (+0.00%) kprobe__vfs_link 9995 10127 +132 (+1.32%) 803 803 +0 (+0.00%) kprobe__vfs_symlink 5606 5672 +66 (+1.18%) 451 451 +0 (+0.00%) kprobe_ret__do_filp_open 5716 5782 +66 (+1.15%) 462 462 +0 (+0.00%) raw_tracepoint__sched_process_exec 3042 3042 +0 (+0.00%) 278 278 +0 (+0.00%) raw_tracepoint__sched_process_exit 1680 1680 +0 (+0.00%) 146 146 +0 (+0.00%) raw_tracepoint__sched_process_fork 299 299 +0 (+0.00%) 25 25 +0 (+0.00%) tracepoint__syscalls__sys_enter_kill 18372 18372 +0 (+0.00%) 1558 1558 +0 (+0.00%) default (mcpu=v3), no_alu32, cpuv4 have similar differences. Note one place where bpf_nop_mov() is used to workaround the verifier lack of link between the scalar register and its spill to stack. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231226191148.48536-7-alexei.starovoitov@gmail.com