summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-10-27net: hns3: fix data endian problem of some functions of debugfsJie Wang1-16/+14
The member data in struct hclge_desc is type of __le32, it needs endian conversion before using it, and some functions of debugfs didn't do that, so this patch fixes it. Fixes: c0ebebb9ccc1 ("net: hns3: Add "dcb register" status information query function") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27net: hns3: ignore reset event before initialization process is doneGuangbin Huang3-0/+5
Currently, if there is a reset event triggered by RAS during device in initialization process, driver may run reset process concurrently with initialization process. In this case, it may cause problem. For example, the RSS indirection table may has not been alloc memory in initialization process yet, but it is used in reset process, it will cause a call trace like this: [61228.744836] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 ... [61228.897677] Workqueue: hclgevf hclgevf_service_task [hclgevf] [61228.911390] pstate: 40400009 (nZcv daif +PAN -UAO -TCO BTYPE=--) [61228.918670] pc : hclgevf_set_rss_indir_table+0xb4/0x190 [hclgevf] [61228.927812] lr : hclgevf_set_rss_indir_table+0x90/0x190 [hclgevf] [61228.937248] sp : ffff8000162ebb50 [61228.941087] x29: ffff8000162ebb50 x28: ffffb77add72dbc0 x27: ffff0820c7dc8080 [61228.949516] x26: 0000000000000000 x25: ffff0820ad4fc880 x24: ffff0820c7dc8080 [61228.958220] x23: ffff0820c7dc8090 x22: 00000000ffffffff x21: 0000000000000040 [61228.966360] x20: ffffb77add72b9c0 x19: 0000000000000000 x18: 0000000000000030 [61228.974646] x17: 0000000000000000 x16: ffffb77ae713feb0 x15: ffff0820ad4fcce8 [61228.982808] x14: ffffffffffffffff x13: ffff8000962eb7f7 x12: 00003834ec70c960 [61228.991990] x11: 00e0fafa8c206982 x10: 9670facc78a8f9a8 x9 : ffffb77add717530 [61229.001123] x8 : ffff0820ad4fd6b8 x7 : 0000000000000000 x6 : 0000000000000011 [61229.010249] x5 : 00000000000cb1b0 x4 : 0000000000002adb x3 : 0000000000000049 [61229.018662] x2 : ffff8000162ebbb8 x1 : 0000000000000000 x0 : 0000000000000480 [61229.027002] Call trace: [61229.030177] hclgevf_set_rss_indir_table+0xb4/0x190 [hclgevf] [61229.039009] hclgevf_rss_init_hw+0x128/0x1b4 [hclgevf] [61229.046809] hclgevf_reset_rebuild+0x17c/0x69c [hclgevf] [61229.053862] hclgevf_reset_service_task+0x4cc/0xa80 [hclgevf] [61229.061306] hclgevf_service_task+0x6c/0x630 [hclgevf] [61229.068491] process_one_work+0x1dc/0x48c [61229.074121] worker_thread+0x15c/0x464 [61229.078562] kthread+0x168/0x16c [61229.082873] ret_from_fork+0x10/0x18 [61229.088221] Code: 7900e7f6 f904a683 d503201f 9101a3e2 (38616b43) [61229.095357] ---[ end trace 153661a538f6768c ]--- To fix this problem, don't schedule reset task before initialization process is done. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27net: hns3: change hclge/hclgevf workqueue to WQ_UNBOUND modeYufeng Mo3-31/+6
Currently, the workqueue of hclge/hclgevf is executed on the CPU that initiates scheduling requests by default. In stress scenarios, the CPU may be busy and workqueue scheduling is completed after a long period of time. To avoid this situation and implement proper scheduling, use the WQ_UNBOUND mode instead. In this way, the workqueue can be performed on a relatively idle CPU. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27net: hns3: fix pause config problem after autoneg disabledGuangbin Huang5-10/+57
If a TP port is configured by follow steps: 1.ethtool -s ethx autoneg off speed 100 duplex full 2.ethtool -A ethx rx on tx on 3.ethtool -s ethx autoneg on(rx&tx negotiated pause results are off) 4.ethtool -s ethx autoneg off speed 100 duplex full In step 3, driver will set rx&tx pause parameters of hardware to off as pause parameters negotiated with link partner are off. After step 4, the "ethtool -a ethx" command shows both rx and tx pause parameters are on. However, pause parameters of hardware are still off and port has no flow control function actually. To fix this problem, if autoneg is disabled, driver uses its saved parameters to restore pause of hardware. If the speed is not changed in this case, there is no link state changed for phy, it will cause the pause parameter is not taken effect, so we need to force phy to go down and up. Fixes: aacbe27e82f0 ("net: hns3: modify how pause options is displayed") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27Merge tag 'mlx5-updates-2021-10-26' of ↵David S. Miller26-164/+1300
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-10-26 HW-GRO support in mlx5 Beside the HW GRO this series includes two trivial non-mlx5 patches: - net: Prevent HW-GRO and LRO features operate together - lib: bitmap: Introduce node-aware alloc API Khalid Manaa Says: ================== This series implements the HW-GRO offload using the HW feature SHAMPO. HW-GRO: Hardware offload for the Generic Receive Offload feature. SHAMPO: Split Headers And Merge Payload Offload. This feature performs headers data split for each received packed and merge the payloads of the packets of the same session. There are new HW components for this feature: The headers buffer: – cyclic buffer where the packets headers will be located Reservation buffer: – capability to divide RQ WQEs to reservations, a definite size in granularity of 4KB, the reservation is used to define the largest segment that we can create by packets stitching. Each reservation will have a session and the new received packet can be merged to the session, terminate it, or open a new one according to the match criteria. When a new packet is received the headers will be written to the headers buffer and the data will be written to the reservation, in case the packet matches the session the data will be written continuously otherwise it will be written after performing an alignment. SHAMPO RQ, WQ and CQE changes: ----------------------------- RQ (receive queue) new params: -shampo_no_match_alignment_granularity: the HW alignment granularity in case the received packet doesn't match the current session. -shampo_match_criteria_type: the type of match criteria. -reservation_timeout: the maximum time that the HW will hold the reservation. -Each RQ has SKB that represents the current opened flow. WQ (work queue) new params: -headers_mkey: mkey that represents the headers buffer, where the packets headers will be written by the HW. -shampo_enable: flag to verify if the WQ supports SHAMPO feature. -log_reservation_size: the log of the reservation size where the data of the packet will be written by the HW. -log_max_num_of_packets_per_reservation: log of the maximum number of packets that can be written to the same reservation. -log_headers_entry_size: log of the header entry size of the headers buffer. -log_headers_buffer_entry_num: log of the entries number of the headers buffer. CQEs (Completion queue entry) SHAMPO fields: -match: in case it is set, then the current packet matches the opened session. -flush: in case it is set, the opened session must be flushed. -header_size: the size of the packet’s headers. -header_entry_index: the entry index in the headers buffer of the received packet headers. -data_offset: the offset of the received packet data in the WQE. HW-GRO works as follow: ---------------------- The feature can be enabled on the interface using the ethtool command by setting on rx-gro-hw. When the feature is on the mlx5 driver will reopen the RQ to support the SHAMPO feature: Will allocate the headers buffer and fill the parameters regarding the reservation and the match criteria. Receive packet flow: each RQ will hold SKB that represents the current GRO opened session. The driver has a new CQE handler mlx5e_handle_rx_cqe_mpwrq_shampo which will use the CQE SHAMPO params to extract the location of the packet’s headers in the headers buffer and the location of the packets data in the RQ. Also, the CQE has two flags flush and match that indicate if the current packet matches the current session or not and if we need to close the session. In case there is an opened session, and we receive a matched packet then the handler will merge the packet's payload to the current SKB, in case we receive no match then the handler will flush the SKB and create a new one for the new packet. In case the flash flag is set then the driver will close the session, the SKB will be passed to the network stack. In case the driver merges packets in the SKB, before passing the SKB to the network stack the driver will update the checksum of the packet’s headers. SKB build: --------- The driver will build a new SKB in the following situations: in case there is no current opened session. In case the current packet doesn’t match the current session. In case there is no place to add the packets data to the SKB that represents the current session. Otherwise, the driver will add the packet’s data to the SKB. When the driver builds a new SKB, the linear area will contain only the packet headers and the data will be added to the SKB fragments. In case the entry size of the headers buffer is sufficient to build the SKB it will be used, otherwise the driver will allocate new memory to build the SKB. ================== ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27net/mlx5: Lag, Make mlx5_lag_is_multipath() be static inlineMaor Dickman1-1/+1
Fix "no previous prototype" W=1 warnings when CONFIG_MLX5_CORE_EN is not set: drivers/net/ethernet/mellanox/mlx5/core/lag_mp.h:34:6: error: no previous prototype for ‘mlx5_lag_is_multipath’ [-Werror=missing-prototypes] 34 | bool mlx5_lag_is_multipath(struct mlx5_core_dev *dev) { return false; } | ^~~~~~~~~~~~~~~~~~~~~ Fixes: 14fe2471c628 ("net/mlx5: Lag, change multipath and bonding to be mutually exclusive") Signed-off-by: Maor Dickman <maord@nvidia.com>
2021-10-27net/mlx5e: Prevent HW-GRO and CQE-COMPRESS features operate togetherKhalid Manaa2-0/+10
HW-GRO and CQE-COMPRESS are mutually exclusive, this commit adds this restriction. Signed-off-by: Khalid Manaa <khalidm@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-27net/mlx5e: Add HW-GRO offloadKhalid Manaa1-0/+39
This commit introduces HW-GRO offload by using the SHAMPO feature - Add set feature handler for HW-GRO. Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Signed-off-by: Khalid Manaa <khalidm@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-27net/mlx5e: Add HW_GRO statisticsKhalid Manaa3-3/+34
This patch adds HW_GRO counters to RX packets statistics: - gro_match_packets: counter of received packets with set match flag. - gro_packets: counter of received packets over the HW_GRO feature, this counter is increased by one for every received HW_GRO cqe. - gro_bytes: counter of received bytes over the HW_GRO feature, this counter is increased by the received bytes for every received HW_GRO cqe. - gro_skbs: counter of built HW_GRO skbs, increased by one when we flush HW_GRO skb (when we call a napi_gro_receive with hw_gro skb). - gro_large_hds: counter of received packets with large headers size, in case the packet needs new SKB, the driver will allocate new one and will not use the headers entry to build it. Signed-off-by: Khalid Manaa <khalidm@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-27net/mlx5e: HW_GRO cqe handler implementationKhalid Manaa3-11/+242
this patch updates the SHAMPO CQE handler to support HW_GRO, changes in the SHAMPO CQE handler: - CQE match and flush fields are used to determine if to build new skb using the new received packet, or to add the received packet data to the existing RQ.hw_gro_skb, also this fields are used to determine when to flush the skb. - in the end of the function mlx5e_poll_rx_cq the RQ.hw_gro_skb is flushed. Signed-off-by: Khalid Manaa <khalidm@nvidia.com> Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-27net/mlx5e: Add data path for SHAMPO featureBen Ben-Ishay2-0/+189
The header buffer is used to store the headers of the rx packets. The header buffer size deduced from WorkQueue size + restriction of max packets per WorkQueueElement. This commit adds the functionality for posting/updating memory for the header buffer during the posting/updating of WQEs. Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-27net/mlx5e: Add handle SHAMPO cqe supportKhalid Manaa3-35/+194
This patch adds the new CQE SHAMPO fields: - flush: indicates that we must close the current session and pass the SKB to the network stack. - match: indicates that the current packet matches the oppened session, the packet will be merge into the current SKB. - header_size: the size of the packet headers that written into the headers buffer. - header_entry_index: the entry index in the headers buffer. - data_offset: packets data offset in the WQE. Also new cqe handler is added to handle SHAMPO packets: - The new handler uses CQE SHAMPO fields to build the SKB. CQE's Flush and match fields are not used in this patch, packets are not merged in this patch. Signed-off-by: Khalid Manaa <khalidm@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-27net/mlx5e: Add control path for SHAMPO featureBen Ben-Ishay6-30/+400
This commit introduces the control path infrastructure for SHAMPO feature. SHAMPO feature enables packet stitching by splitting packets to header and payload, the header is placed on a dedicated buffer and the payload on the RX ring, this allows stitching the data part of a flow together continuously in the receive buffer. SHAMPO feature is implemented as linked list striding RQ feature. To support packets splitting and payload stitching: - Enlarge the ICOSQ and the correspond CQ to support the header buffer memory regions. - Add support to create linked list striding RQ with SHAMPO feature set in the open_rq function. - Add deallocation function and corresponded calls for SHAMPO header buffer. - Add mlx5e_create_umr_klm_mkey to support KLM mkey for the header buffer. - Rename mlx5e_create_umr_mkey to mlx5e_create_umr_mtt_mkey. Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-27net/mlx5e: Add support to klm_umr_wqeBen Ben-Ishay2-1/+24
This commit adds the needed definitions for using the klm_umr_wqe. UMR stands for user-mode memory registration, is a mechanism to alter address translation properties of MKEY by posting WorkQueueElement aka WQE on send queue. MKEY stands for memory key, MKEY are used to describe a region in memory that can be later used by HW. KLM stands for {Key, Length, MemVa}, KLM_MKEY is indirect MKEY that enables to map multiple memory spaces with different sizes in unified MKEY. klm_umr_wqe is a UMR that use to update a KLM_MKEY. SHAMPO feature uses KLM_MKEY for memory registration of his header buffer. Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-27net/mlx5e: Rename TIR lro functions to TIR packet merge functionsKhalid Manaa15-92/+95
This series introduces new packet merge type, therefore rename lro functions to packet merge to support the new merge type: - Generalize + rename mlx5e_build_tir_ctx_lro to mlx5e_build_tir_ctx_packet_merge. - Rename mlx5e_modify_tirs_lro to mlx5e_modify_tirs_packet_merge. - Rename lro bit in mlx5_ifc_modify_tir_bitmask_bits to packet_merge. - Rename lro_en in mlx5e_params to packet_merge_type type and combine packet_merge params into one struct mlx5e_packet_merge_param. Signed-off-by: Khalid Manaa <khalidm@nvidia.com> Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-27net/mlx5: Add SHAMPO caps, HW bits and enumerationsBen Ben-Ishay4-3/+64
This commit adds SHAMPO bit to hca_cap and SHAMPO capabilities structure, SHAMPO related HW spec hardware fields and enumerations. SHAMPO stands for: split headers and merge payload offload. SHAMPO new fields: WQ: - headers_mkey: mkey that represents the headers buffer, where the packets headers will be written by the HW. - shampo_enable: flag to verify if the WQ supports SHAMPO feature. - log_reservation_size: the log of the reservation size where the data of the packet will be written by the HW. - log_max_num_of_packets_per_reservation: log of the maximum number of packets that can be written to the same reservation. - log_headers_entry_size: log of the header entry size of the headers buffer. - log_headers_buffer_entry_num: log of the entries number of the headers buffer. RQ: - shampo_no_match_alignment_granularity: the HW alignment granularity in case the received packet doesn't match the current session. - shampo_match_criteria_type: the type of match criteria. - reservation_timeout: the maximum time that the HW will hold the reservation. mlx5_ifc_shampo_cap_bits, the capabilities of the SHAMPO feature: - shampo_log_max_reservation_size: the maximum allowed value of the field WQ.log_reservation_size. - log_reservation_size: the minimum allowed value of the field WQ.log_reservation_size. - shampo_min_mss_size: the minimum payload size of packet that can open a new session or be merged to a session. - shampo_max_log_headers_entry_size: the maximum allowed value of the field WQ.log_headers_entry_size Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-27net/mlx5e: Rename lro_timeout to packet_merge_timeoutBen Ben-Ishay5-9/+9
TIR stands for transport interface receive, the TIR object is responsible for performing all transport related operations on the receive side like packet processing, demultiplexing the packets to different RQ's, etc. lro_timeout is a field in the TIR that is used to set the timeout for lro session, this series introduces new packet merge type, therefore rename lro_timeout to packet_merge_timeout for all packet merge types. Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-27net: Prevent HW-GRO and LRO features operate togetherBen Ben-ishay1-0/+5
LRO and HW-GRO are mutually exclusive, this commit adds this restriction in netdev_fix_feature. HW-GRO is preferred, that means in case both HW-GRO and LRO features are requested, LRO is cleared. Signed-off-by: Ben Ben-ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-27lib: bitmap: Introduce node-aware alloc APITariq Toukan2-0/+15
Expose new node-aware API for bitmap allocation: bitmap_alloc_node() / bitmap_zalloc_node(). Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-27Merge tag 'arm-soc-fixes-5.15-3' of ↵Linus Torvalds10-13/+64
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "One last set of small fixes for the soc tree: - Incorrect ethernet phy settings found on i.mx and allwinner platforms - a revert for a Qualcomm DT change that caused a boot regression - four patches for incorrect settings in i.MX DT files - new MAINTAINER file entries for dhcom boards - a Kconfig fix for a reset driver that became unselectable - three more code changes for bugs in reset drivers" * tag 'arm-soc-fixes-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: MAINTAINERS: Add maintainers for DHCOM i.MX6 and DHCOM/DHCOR STM32MP1 Revert "arm64: dts: qcom: sm8250: remove bus clock from the mdss node for sm8250 target" arm64: dts: imx8mm-kontron: Fix connection type for VSC8531 RGMII PHY arm64: dts: imx8mm-kontron: Fix CAN SPI clock frequency arm64: dts: imx8mm-kontron: Fix polarity of reg_rst_eth2 arm64: dts: imx8mm-kontron: Set lower limit of VDD_SNVS to 800 mV arm64: dts: imx8mm-kontron: Make sure SOC and DRAM supply voltages are correct reset: socfpga: add empty driver allowing consumers to probe reset: tegra-bpmp: Handle errors in BPMP response reset: pistachio: Re-enable driver selection reset: brcmstb-rescal: fix incorrect polarity of status bit ARM: dts: sun7i: A20-olinuxino-lime2: Fix ethernet phy-mode arm64: dts: allwinner: h5: NanoPI Neo 2: Fix ethernet node
2021-10-27Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski23-98/+118
Daniel Borkmann says: ==================== pull-request: bpf 2021-10-26 We've added 12 non-merge commits during the last 7 day(s) which contain a total of 23 files changed, 118 insertions(+), 98 deletions(-). The main changes are: 1) Fix potential race window in BPF tail call compatibility check, from Toke Høiland-Jørgensen. 2) Fix memory leak in cgroup fs due to missing cgroup_bpf_offline(), from Quanyang Wang. 3) Fix file descriptor reference counting in generic_map_update_batch(), from Xu Kuohai. 4) Fix bpf_jit_limit knob to the max supported limit by the arch's JIT, from Lorenz Bauer. 5) Fix BPF sockmap ->poll callbacks for UDP and AF_UNIX sockets, from Cong Wang and Yucong Sun. 6) Fix BPF sockmap concurrency issue in TCP on non-blocking sendmsg calls, from Liu Jian. 7) Fix build failure of INODE_STORAGE and TASK_STORAGE maps on !CONFIG_NET, from Tejun Heo. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix potential race in tail call compatibility check bpf: Move BPF_MAP_TYPE for INODE_STORAGE and TASK_STORAGE outside of CONFIG_NET selftests/bpf: Use recv_timeout() instead of retries net: Implement ->sock_is_readable() for UDP and AF_UNIX skmsg: Extract and reuse sk_msg_is_readable() net: Rename ->stream_memory_read to ->sock_is_readable tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function cgroup: Fix memory leak caused by missing cgroup_bpf_offline bpf: Fix error usage of map_fd and fdget() in generic_map_update_batch() bpf: Prevent increasing bpf_jit_limit above max bpf: Define bpf_jit_alloc_exec_limit for arm64 JIT bpf: Define bpf_jit_alloc_exec_limit for riscv JIT ==================== Link: https://lore.kernel.org/r/20211026201920.11296-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-27net: phy: fixed warning: Function parameter not describedLuo Jie1-0/+1
Fixed warning: Function parameter or member 'enable' not described in 'genphy_c45_fast_retrain' Signed-off-by: Luo Jie <luoj@codeaurora.org> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20211026102957.17100-1-luoj@codeaurora.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-26bpf: Fix potential race in tail call compatibility checkToke Høiland-Jørgensen4-11/+23
Lorenzo noticed that the code testing for program type compatibility of tail call maps is potentially racy in that two threads could encounter a map with an unset type simultaneously and both return true even though they are inserting incompatible programs. The race window is quite small, but artificially enlarging it by adding a usleep_range() inside the check in bpf_prog_array_compatible() makes it trivial to trigger from userspace with a program that does, essentially: map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0); pid = fork(); if (pid) { key = 0; value = xdp_fd; } else { key = 1; value = tc_fd; } err = bpf_map_update_elem(map_fd, &key, &value, 0); While the race window is small, it has potentially serious ramifications in that triggering it would allow a BPF program to tail call to a program of a different type. So let's get rid of it by protecting the update with a spinlock. The commit in the Fixes tag is the last commit that touches the code in question. v2: - Use a spinlock instead of an atomic variable and cmpxchg() (Alexei) v3: - Put lock and the members it protects into an embedded 'owner' struct (Daniel) Fixes: 3324b584b6f6 ("ebpf: misc core cleanup") Reported-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211026110019.363464-1-toke@redhat.com
2021-10-26bpf: Move BPF_MAP_TYPE for INODE_STORAGE and TASK_STORAGE outside of CONFIG_NETTejun Heo1-4/+4
bpf_types.h has BPF_MAP_TYPE_INODE_STORAGE and BPF_MAP_TYPE_TASK_STORAGE declared inside #ifdef CONFIG_NET although they are built regardless of CONFIG_NET. So, when CONFIG_BPF_SYSCALL && !CONFIG_NET, they are built without the declarations leading to spurious build failures and not registered to bpf_map_types making them unavailable. Fix it by moving the BPF_MAP_TYPE for the two map types outside of CONFIG_NET. Reported-by: kernel test robot <lkp@intel.com> Fixes: a10787e6d58c ("bpf: Enable task local storage for tracing programs") Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/YXG1cuuSJDqHQfRY@slm.duckdns.org
2021-10-26Merge branch 'sock_map: fix ->poll() and update selftests'Alexei Starovoitov13-78/+58
Cong Wang says: ==================== This patchset fixes ->poll() for sockets in sockmap and updates selftests accordingly with select(). Please check each patch for more details. Fixes: c50524ec4e3a ("Merge branch 'sockmap: add sockmap support for unix datagram socket'") Fixes: 89d69c5d0fbc ("Merge branch 'sockmap: introduce BPF_SK_SKB_VERDICT and support UDP'") Acked-by: John Fastabend <john.fastabend@gmail.com> --- v4: add a comment in udp_poll() v3: drop sk_psock_get_checked() reuse tcp_bpf_sock_is_readable() v2: rename and reuse ->stream_memory_read() fix a compile error in sk_psock_get_checked() Cong Wang (3): net: rename ->stream_memory_read to ->sock_is_readable skmsg: extract and reuse sk_msg_is_readable() net: implement ->sock_is_readable() for UDP and AF_UNIX ==================== Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-10-26selftests/bpf: Use recv_timeout() instead of retriesYucong Sun1-55/+20
We use non-blocking sockets in those tests, retrying for EAGAIN is ugly because there is no upper bound for the packet arrival time, at least in theory. After we fix poll() on sockmap sockets, now we can switch to select()+recv(). Signed-off-by: Yucong Sun <sunyucong@gmail.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211008203306.37525-5-xiyou.wangcong@gmail.com
2021-10-26net: Implement ->sock_is_readable() for UDP and AF_UNIXCong Wang4-0/+10
Yucong noticed we can't poll() sockets in sockmap even when they are the destination sockets of redirections. This is because we never poll any psock queues in ->poll(), except for TCP. With ->sock_is_readable() now we can overwrite >sock_is_readable(), invoke and implement it for both UDP and AF_UNIX sockets. Reported-by: Yucong Sun <sunyucong@gmail.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211008203306.37525-4-xiyou.wangcong@gmail.com
2021-10-26skmsg: Extract and reuse sk_msg_is_readable()Cong Wang3-14/+16
tcp_bpf_sock_is_readable() is pretty much generic, we can extract it and reuse it for non-TCP sockets. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211008203306.37525-3-xiyou.wangcong@gmail.com
2021-10-26net: Rename ->stream_memory_read to ->sock_is_readableCong Wang6-11/+14
The proto ops ->stream_memory_read() is currently only used by TCP to check whether psock queue is empty or not. We need to rename it before reusing it for non-TCP protocols, and adjust the exsiting users accordingly. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211008203306.37525-2-xiyou.wangcong@gmail.com
2021-10-26tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict functionLiu Jian1-0/+12
With two Msgs, msgA and msgB and a user doing nonblocking sendmsg calls (or multiple cores) on a single socket 'sk' we could get the following flow. msgA, sk msgB, sk ----------- --------------- tcp_bpf_sendmsg() lock(sk) psock = sk->psock tcp_bpf_sendmsg() lock(sk) ... blocking tcp_bpf_send_verdict if (psock->eval == NONE) psock->eval = sk_psock_msg_verdict .. < handle SK_REDIRECT case > release_sock(sk) < lock dropped so grab here > ret = tcp_bpf_sendmsg_redir psock = sk->psock tcp_bpf_send_verdict lock_sock(sk) ... blocking on B if (psock->eval == NONE) <- boom. psock->eval will have msgA state The problem here is we dropped the lock on msgA and grabbed it with msgB. Now we have old state in psock and importantly psock->eval has not been cleared. So msgB will run whatever action was done on A and the verdict program may never see it. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20211012052019.184398-1-liujian56@huawei.com
2021-10-26watchdog: Fix OMAP watchdog early handlingWalter Stoll1-1/+5
TI's implementation does not service the watchdog even if the kernel command line parameter omap_wdt.early_enable is set to 1. This patch fixes the issue. Signed-off-by: Walter Stoll <walter.stoll@duagon.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/88a8fe5229cd68fa0f1fd22f5d66666c1b7057a0.camel@duagon.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2021-10-26watchdog: ixp4xx_wdt: Fix address space warningGuenter Roeck1-1/+1
sparse reports the following address space warning. drivers/watchdog/ixp4xx_wdt.c:122:20: sparse: incorrect type in assignment (different address spaces) drivers/watchdog/ixp4xx_wdt.c:122:20: sparse: expected void [noderef] __iomem *base drivers/watchdog/ixp4xx_wdt.c:122:20: sparse: got void *platform_data Add a typecast to solve the problem. Fixes: 21a0a29d16c6 ("watchdog: ixp4xx: Rewrite driver to use core") Cc: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20210911042925.556889-1-linux@roeck-us.net Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2021-10-26watchdog: sbsa: drop unneeded MODULE_ALIASKrzysztof Kozlowski1-1/+0
The MODULE_DEVICE_TABLE already creates proper alias for platform driver. Having another MODULE_ALIAS causes the alias to be duplicated. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20210917092024.19323-1-krzysztof.kozlowski@canonical.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2021-10-26watchdog: sbsa: only use 32-bit accessorsJamie Iles1-2/+2
SBSA says of the generic watchdog: All registers are 32 bits in size and should be accessed using 32-bit reads and writes. If an access size other than 32 bits is used then the results are IMPLEMENTATION DEFINED. and for qemu, the implementation will only allow 32-bit accesses resulting in a synchronous external abort when configuring the watchdog. Use lo_hi_* accessors rather than a readq/writeq. Fixes: abd3ac7902fb ("watchdog: sbsa: Support architecture version 1") Signed-off-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Link: https://lore.kernel.org/r/20210903112101.493552-1-quic_jiles@quicinc.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2021-10-26Revert "watchdog: iTCO_wdt: Account for rebooting on second timeout"Guenter Roeck1-9/+3
This reverts commit cb011044e34c ("watchdog: iTCO_wdt: Account for rebooting on second timeout") and commit aec42642d91f ("watchdog: iTCO_wdt: Fix detection of SMI-off case") since those patches cause a regression on certain boards (https://bugzilla.kernel.org/show_bug.cgi?id=213809). While this revert may result in some boards to only reset after twice the configured timeout value, that is still better than a watchdog reset after half the configured value. Fixes: cb011044e34c ("watchdog: iTCO_wdt: Account for rebooting on second timeout") Fixes: aec42642d91f ("watchdog: iTCO_wdt: Fix detection of SMI-off case") Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Mantas Mikulėnas <grawity@gmail.com> Reported-by: Javier S. Pedro <debbugs@javispedro.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20211008003302.1461733-1-linux@roeck-us.net Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2021-10-26net/mlx5: remove the recent devlink paramsJakub Kicinski10-207/+9
revert commit 46ae40b94d88 ("net/mlx5: Let user configure io_eq_size param") revert commit a6cb08daa3b4 ("net/mlx5: Let user configure event_eq_size param") revert commit 554604061979 ("net/mlx5: Let user configure max_macs param") The EQE parameters are applicable to more drivers, they should be configured via standard API, probably ethtool. Example of another driver needing something similar: https://lore.kernel.org/all/1633454136-14679-3-git-send-email-sbhatta@marvell.com/ The last param for "max_macs" is probably fine but the documentation is severely lacking. The meaning and implications for changing the param need to be stated. Link: https://lore.kernel.org/r/20211026152939.3125950-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-26MAINTAINERS: Add maintainers for DHCOM i.MX6 and DHCOM/DHCOR STM32MP1Christoph Niedermaier1-0/+13
Add maintainers for DH electronics DHCOM i.MX6 and DHCOM/DHCOR STM32MP1 boards. Signed-off-by: Christoph Niedermaier <cniedermaier@dh-electronics.com> Cc: linux-arm-kernel@lists.infradead.org Cc: kernel@dh-electronics.com Cc: arnd@arndb.de Link: https://lore.kernel.org/r/20211025073706.2794-1-cniedermaier@dh-electronics.com' To: soc@kernel.org To: linux-kernel@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-10-26Merge tag 'qcom-arm64-fixes-for-5.15-2' of ↵Arnd Bergmann1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes Qualcomm ARM64 DTS one more fix for 5.15 This reverts a clock change in the Qualcomm RB5 devicetree which in some combinations of firmware and configuration causes the device to crash during boot. Data on an adjacent platform indicates that this is probably not be the root cause of the problem, but this resolves the regression seen on RB5 and will allow the SM8250 platform to boot v5.15. * tag 'qcom-arm64-fixes-for-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: Revert "arm64: dts: qcom: sm8250: remove bus clock from the mdss node for sm8250 target" Link: https://lore.kernel.org/r/20211025201213.1145348-1-bjorn.andersson@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-10-26MAINTAINERS: please remove myself from the Prestera driverVadym Kochan1-1/+0
Signed-off-by: Vadym Kochan <vkochan@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26net: lan78xx: fix division by zero in send pathJohan Hovold1-0/+6
Add the missing endpoint max-packet sanity check to probe() to avoid division by zero in lan78xx_tx_bh() in case a malicious device has broken descriptors (or when doing descriptor fuzz testing). Note that USB core will reject URBs submitted for endpoints with zero wMaxPacketSize but that drivers doing packet-size calculations still need to handle this (cf. commit 2548288b4fb0 ("USB: Fix: Don't skip endpoint descriptors with maxpacket=0")). Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Cc: stable@vger.kernel.org # 4.3 Cc: Woojung.Huh@microchip.com <Woojung.Huh@microchip.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26Merge branch 'phy-supported-interfaces-bitmap'David S. Miller3-2/+81
Russell King says: ==================== Introduce supported interfaces bitmap This series introduces a new bitmap to allow us to indicate which phy_interface_t modes are supported. Currently, phylink will call ->validate with PHY_INTERFACE_MODE_NA to request all link mode capabilities from the MAC driver before choosing an interface to use. This leads in some cases to some rather hairly code. This can be simplified if phylink is aware of the interface modes that the MAC supports, and it can instead walk those modes, calling ->validate for each one, and combining the results. This series merely introduces the support; there is no change of behaviour until MAC drivers populate their supported_interfaces bitmap. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26net: phylink: use supported_interfaces for phylink validationRussell King (Oracle)2-2/+46
If the network device supplies a supported interface bitmap, we can use that during phylink's validation to simplify MAC drivers in two ways by using the supported_interfaces bitmap to: 1. reject unsupported interfaces before calling into the MAC driver. 2. generate the set of all supported link modes across all supported interfaces (used mainly for SFP, but also some 10G PHYs.) Suggested-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26net: phylink: add MAC phy_interface_t bitmapRussell King1-0/+1
Add a phy_interface_t bitmap so the MAC driver can specifiy which PHY interface modes it supports. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26net: phy: add phy_interface_t bitmap supportRussell King (Oracle)1-0/+34
Add support for a bitmap for phy interface modes, which includes: - a macro to declare the interface bitmap - an inline helper to zero the interface bitmap - an inline helper to detect an empty interface bitmap - inline helpers to do a bitwise AND and OR operations on two interface bitmaps Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26Merge branch 'dsa-isolation-prep'David S. Miller2-3/+2
Vladimir Oltean says: ==================== DSA preparations for FDB isolation between bridges This series makes 2 small changes to DSA's SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE handler, which will make it possible to offer switch drivers a stable association between a FDB entry and a bridge device in a future series. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26net: dsa: stop calling dev_hold in dsa_slave_fdb_eventVladimir Oltean1-3/+0
Now that we guarantee that SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE events have finished executing by the time we leave our bridge upper interface, we've established a stronger boundary condition for how long the dsa_slave_switchdev_event_work() might run. As such, it is no longer possible for DSA slave interfaces to become unregistered, since they are still bridge ports. So delete the unnecessary dev_hold() and dev_put(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26net: dsa: flush switchdev workqueue when leaving the bridgeVladimir Oltean1-0/+2
DSA is preparing to offer switch drivers an API through which they can associate each FDB entry with a struct net_device *bridge_dev. This can be used to perform FDB isolation (the FDB lookup performed on the ingress of a standalone, or bridged port, should not find an FDB entry that is present in the FDB of another bridge). In preparation of that work, DSA needs to ensure that by the time we call the switch .port_fdb_add and .port_fdb_del methods, the dp->bridge_dev pointer is still valid, i.e. the port is still a bridge port. This is not guaranteed because the SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE API requires drivers that must have sleepable context to handle those events to schedule the deferred work themselves. DSA does this through the dsa_owq. It can happen that a port leaves a bridge, del_nbp() flushes the FDB on that port, SWITCHDEV_FDB_DEL_TO_DEVICE is notified in atomic context, DSA schedules its deferred work, but del_nbp() finishes unlinking the bridge as a master from the port before DSA's deferred work is run. Fundamentally, the port must not be unlinked from the bridge until all FDB deletion deferred work items have been flushed. The bridge must wait for the completion of these hardware accesses. An attempt has been made to address this issue centrally in switchdev by making SWITCHDEV_FDB_DEL_TO_DEVICE deferred (=> blocking) at the switchdev level, which would offer implicit synchronization with del_nbp: https://patchwork.kernel.org/project/netdevbpf/cover/20210820115746.3701811-1-vladimir.oltean@nxp.com/ but it seems that any attempt to modify switchdev's behavior and make the events blocking there would introduce undesirable side effects in other switchdev consumers. The most undesirable behavior seems to be that switchdev_deferred_process_work() takes the rtnl_mutex itself, which would be worse off than having the rtnl_mutex taken individually from drivers which is what we have now (except DSA which has removed that lock since commit 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work")). So to offer the needed guarantee to DSA switch drivers, I have come up with a compromise solution that does not require switchdev rework: we already have a hook at the last moment in time when the bridge is still an upper of ours: the NETDEV_PRECHANGEUPPER handler. We can flush the dsa_owq manually from there, which makes all FDB deletions synchronous. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26ifb: Depend on netfilter alternatively to tcLukas Wunner1-1/+1
IFB originally depended on NET_CLS_ACT for traffic redirection. But since v4.5, that may be achieved with NFT_FWD_NETDEV as well. Fixes: 39e6dea28adc ("netfilter: nf_tables: add forward expression to the netdev family") Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: <stable@vger.kernel.org> # v4.5+: bcfabee1afd9: netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress Cc: <stable@vger.kernel.org> # v4.5+ Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26mctp: Implement extended addressingJeremy Kerr5-39/+170
This change allows an extended address struct - struct sockaddr_mctp_ext - to be passed to sendmsg/recvmsg. This allows userspace to specify output ifindex and physical address information (for sendmsg) or receive the input ifindex/physaddr for incoming messages (for recvmsg). This is typically used by userspace for MCTP address discovery and assignment operations. The extended addressing facility is conditional on a new sockopt: MCTP_OPT_ADDR_EXT; userspace must explicitly enable addressing before the kernel will consume/populate the extended address data. Includes a fix for an uninitialised var: Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26net: ax88796c: Remove pointless check in ax88796c_open()Nathan Chancellor1-5/+4
Clang warns: drivers/net/ethernet/asix/ax88796c_main.c:851:24: error: address of array 'ax_local->phydev->advertising' will always evaluate to 'true' [-Werror,-Wpointer-bool-conversion] if (ax_local->phydev->advertising && ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~ ~~ advertising cannot be NULL here if ax_local is not NULL, which cannot happen due to the check in ax88796c_probe(). Remove the check. Link: https://github.com/ClangBuiltLinux/linux/issues/1492 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>