summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2023-05-27soc: qcom: socinfo: move SMEM item struct and defines to a headerRobert Marko1-0/+70
Move SMEM item struct and related defines to a header in order to be able to reuse them in the SMEM driver instead of duplicating them. Signed-off-by: Robert Marko <robimarko@gmail.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230526204802.3081168-1-robimarko@gmail.com
2023-05-27Merge tag 'for-netdev' of ↵Jakub Kicinski2-9/+13
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-05-26 We've added 54 non-merge commits during the last 10 day(s) which contain a total of 76 files changed, 2729 insertions(+), 1003 deletions(-). The main changes are: 1) Add the capability to destroy sockets in BPF through a new kfunc, from Aditi Ghag. 2) Support O_PATH fds in BPF_OBJ_PIN and BPF_OBJ_GET commands, from Andrii Nakryiko. 3) Add capability for libbpf to resize datasec maps when backed via mmap, from JP Kobryn. 4) Move all the test kfuncs for CI out of the kernel and into bpf_testmod, from Jiri Olsa. 5) Big batch of xsk selftest improvements to prep for multi-buffer testing, from Magnus Karlsson. 6) Show the target_{obj,btf}_id in tracing link's fdinfo and dump it via bpftool, from Yafang Shao. 7) Various misc BPF selftest improvements to work with upcoming LLVM 17, from Yonghong Song. 8) Extend bpftool to specify netdevice for resolving XDP hints, from Larysa Zaremba. 9) Document masking in shift operations for the insn set document, from Dave Thaler. 10) Extend BPF selftests to check xdp_feature support for bond driver, from Lorenzo Bianconi. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits) bpf: Fix bad unlock balance on freeze_mutex libbpf: Ensure FD >= 3 during bpf_map__reuse_fd() libbpf: Ensure libbpf always opens files with O_CLOEXEC selftests/bpf: Check whether to run selftest libbpf: Change var type in datasec resize func bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command libbpf: Selftests for resizing datasec maps libbpf: Add capability for resizing datasec maps selftests/bpf: Add path_fd-based BPF_OBJ_PIN and BPF_OBJ_GET tests libbpf: Add opts-based bpf_obj_pin() API and add support for path_fd bpf: Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands libbpf: Start v1.3 development cycle bpf: Validate BPF object in BPF_OBJ_PIN before calling LSM bpftool: Specify XDP Hints ifname when loading program selftests/bpf: Add xdp_feature selftest for bond device selftests/bpf: Test bpf_sock_destroy selftests/bpf: Add helper to get port using getsockname bpf: Add bpf_sock_destroy kfunc bpf: Add kfunc filter function to 'struct btf_kfunc_id_set' bpf: udp: Implement batching for sockets iterator ... ==================== Link: https://lore.kernel.org/r/20230526222747.17775-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-27Merge tag 'arm-fixes-6.4-1' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "There have not been a lot of fixes for for the soc tree in 6.4, but these have been sitting here for too long. For the devicetree side, there is one minor warning fix for vexpress, the rest all all for the the NXP i.MX platforms: SoC specific bugfixes for the iMX8 clocks and its USB-3.0 gadget device, as well as board specific fixes for regulators and the phy on some of the i.MX boards. The microchip risc-v and arm32 maintainers now also add a shared maintainer file entry for the arm64 parts. The remaining fixes are all for firmware drivers, addressing mistakes in the optee, scmi and ff-a firmware driver implementation, mostly in the error handling code, incorrect use of the alloc_workqueue() interface in SCMI, and compatibility with corner cases of the firmware implementation" * tag 'arm-fixes-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: MAINTAINERS: update arm64 Microchip entries arm64: dts: imx8: fix USB 3.0 Gadget Failure in QM & QXPB0 at super speed dt-binding: cdns,usb3: Fix cdns,on-chip-buff-size type arm64: dts: colibri-imx8x: delete adc1 and dsp arm64: dts: colibri-imx8x: fix iris pinctrl configuration arm64: dts: colibri-imx8x: move pinctrl property from SoM to eval board arm64: dts: colibri-imx8x: fix eval board pin configuration arm64: dts: imx8mp: Fix video clock parents ARM: dts: imx6qdl-mba6: Add missing pvcie-supply regulator ARM: dts: imx6ull-dhcor: Set and limit the mode for PMIC buck 1, 2 and 3 arm64: dts: imx8mn-var-som: fix PHY detection bug by adding deassert delay arm64: dts: imx8mn: Fix video clock parents firmware: arm_ffa: Set reserved/MBZ fields to zero in the memory descriptors firmware: arm_ffa: Fix FFA device names for logical partitions firmware: arm_ffa: Fix usage of partition info get count flag firmware: arm_ffa: Check if ffa_driver remove is present before executing arm64: dts: arm: add missing cache properties ARM: dts: vexpress: add missing cache properties firmware: arm_scmi: Fix incorrect alloc_workqueue() invocation optee: fix uninited async notif value
2023-05-27kallsyms: remove unsed API lookup_symbol_attrsManinder Singh2-15/+0
with commit '7878c231dae0 ("slab: remove /proc/slab_allocators")' lookup_symbol_attrs usage is removed. Thus removing redundant API. Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-05-26overflow: Add struct_size_t() helperKees Cook1-1/+17
While struct_size() is normally used in situations where the structure type already has a pointer instance, there are places where no variable is available. In the past, this has been worked around by using a typed NULL first argument, but this is a bit ugly. Add a helper to do this, and replace the handful of instances of the code pattern with it. Instances were found with this Coccinelle script: @struct_size_t@ identifier STRUCT, MEMBER; expression COUNT; @@ - struct_size((struct STRUCT *)\(0\|NULL\), + struct_size_t(struct STRUCT, MEMBER, COUNT) Suggested-by: Christoph Hellwig <hch@infradead.org> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: James Smart <james.smart@broadcom.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: HighPoint Linux Team <linux@highpoint-tech.com> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> Cc: Don Brace <don.brace@microchip.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: Guo Xuenan <guoxuenan@huawei.com> Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Daniel Latypov <dlatypov@google.com> Cc: kernel test robot <lkp@intel.com> Cc: intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org Cc: linux-nvme@lists.infradead.org Cc: linux-scsi@vger.kernel.org Cc: megaraidlinux.pdl@broadcom.com Cc: storagedev@microchip.com Cc: linux-xfs@vger.kernel.org Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20230522211810.never.421-kees@kernel.org
2023-05-26HID: ensure timely release of driver-allocated resourcesDmitry Torokhov1-0/+1
More and more drivers rely on devres to manage their resources, however if bus' probe() and release() methods are not trivial and control some of resources as well (for example enable or disable clocks, or attach device to a power domain), we need to make sure that driver-allocated resources are released immediately after driver's remove() method returns, and not postponed until driver core gets around to releasing resources. In case of HID we should not try to close the report and release associated memory until after all devres callbacks are executed. To fix that we open a new devres group before calling driver's probe() and explicitly release it when we return from driver's remove(). This is similar to what we did for I2C bus in commit 5b5475826c52 ("i2c: ensure timely release of driver-allocated resources"). It is tempting to try and move this into driver core, but actually doing so is challenging, we need to split bus' remove() method into pre- and post-remove methods, which would make the logic even less clear. Reported-by: Stephen Boyd <swboyd@chromium.org> Link: https://lore.kernel.org/r/20230505232417.1377393-1-swboyd@chromium.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2023-05-26Merge tag 'ffa-fixes-6.4' of ↵Arnd Bergmann1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into arm/fixes Arm FF-A fixes for v6.4 Quite a few fixes to address set of assorted issues: 1. NULL pointer dereference if the ffa driver doesn't provide remove() callback as it is currently executed unconditionally 2. FF-A core probe failure on systems with v1.0 firmware as the new partition info get count flag is used unconditionally 3. Failure to register more than one logical partition or service within the same physical partition as the device name contains only VM ID which will be same for all but each will have unique UUID. 4. Rejection of certain memory interface transmissions by the receivers (secure partitions) as few MBZ fields are non-zero due to lack of explicit re-initialization of those fields * tag 'ffa-fixes-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: firmware: arm_ffa: Set reserved/MBZ fields to zero in the memory descriptors firmware: arm_ffa: Fix FFA device names for logical partitions firmware: arm_ffa: Fix usage of partition info get count flag firmware: arm_ffa: Check if ffa_driver remove is present before executing Link: https://lore.kernel.org/r/20230509143453.1188753-1-sudeep.holla@arm.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-05-26Merge tag 'gpio-omap-descriptors-v6.5' of ↵Arnd Bergmann6-29/+4
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio into soc/arm This removes all usage of global GPIO numbers from arch/arm/mach-omap[12]. The patches have been reviewed and tested by everyone who showed interest which was one person that tested on OSK1 and Nokia 770, and we smoked out the bugs and also addressed all review comments. Any remaining problems can certainly be fixed in-tree. * tag 'gpio-omap-descriptors-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: ARM/musb: omap2: Remove global GPIO numbers from TUSB6010 ARM: omap2: Rewrite WLAN quirk to use GPIO descriptors ARM: omap2: Get USB hub reset GPIO from descriptor ARM/gpio: Push OMAP2 quirk down into TWL4030 driver ARM: omap1: Exorcise the legacy GPIO header ARM: omap1: Make serial wakeup GPIOs use descriptors ARM: omap1: Fix up the Nokia 770 board device IRQs ARM/mmc: Convert old mmci-omap to GPIO descriptors Input: ads7846 - Convert to use software nodes ARM: omap1: Remove reliance on GPIO numbers from SX1 ARM: omap1: Remove reliance on GPIO numbers from PalmTE ARM: omap1: Drop header on AMS Delta ARM/mfd/gpio: Fixup TPS65010 regression on OMAP1 OSK1 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-05-26arm-cci: add cci_enable_port_for_self prototypeArnd Bergmann1-0/+2
The cci_enable_port_for_self() is called from assembler, so add the prototype only to shut up the W=1 warning: drivers/bus/arm-cci.c:298:25: error: no previous prototype for 'cci_enable_port_for_self' [-Werror=missing-prototypes] Link: https://lore.kernel.org/r/20230516201218.556437-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-05-26ARM: pxa: fix missing-prototypes warningsArnd Bergmann3-0/+23
The PXA platform has a number of configurations that end up with a warning like these when building with W=1: drivers/hwmon/max1111.c:83:5: error: no previous prototype for 'max1111_read_channel' [-Werror=missing-prototypes] arch/arm/mach-pxa/reset.c:86:6: error: no previous prototype for 'pxa_restart' [-Werror=missing-prototypes] arch/arm/mach-pxa/mfp-pxa2xx.c:254:5: error: no previous prototype for 'keypad_set_wake' [-Werror=missing-prototypes] drivers/clk/pxa/clk-pxa25x.c:70:14: error: no previous prototype for 'pxa25x_get_clk_frequency_khz' [-Werror=missing-prototypes] drivers/clk/pxa/clk-pxa25x.c:325:12: error: no previous prototype for 'pxa25x_clocks_init' [-Werror=missing-prototypes] drivers/clk/pxa/clk-pxa27x.c:74:14: error: no previous prototype for 'pxa27x_get_clk_frequency_khz' [-Werror=missing-prototypes] drivers/clk/pxa/clk-pxa27x.c:102:6: error: no previous prototype for 'pxa27x_is_ppll_disabled' [-Werror=missing-prototypes] drivers/clk/pxa/clk-pxa27x.c:470:12: error: no previous prototype for 'pxa27x_clocks_init' [-Werror=missing-prototypes] arch/arm/mach-pxa/pxa27x.c:44:6: error: no previous prototype for 'pxa27x_clear_otgph' [-Werror=missing-prototypes] arch/arm/mach-pxa/pxa27x.c:58:6: error: no previous prototype for 'pxa27x_configure_ac97reset' [-Werror=missing-prototypes] arch/arm/mach-pxa/spitz_pm.c:170:15: error: no previous prototype for 'spitzpm_read_devdata' [-Werror=missing-prototypes] The problem is that there is a declaration for each of these, but it's only seen by the caller and not the callee. Moving these into appropriate header files ensures that both use the same calling conventions and it avoids the warnings. Acked-by: Stephen Boyd <sboyd@kernel.org> Link: https://lore.kernel.org/r/20230516153109.514251-11-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-05-26ARM: davinci: fix davinci_cpufreq_init() declarationArnd Bergmann1-0/+6
The davinci_cpufreq_init() declaration is only seen by its caller but not the definition: drivers/cpufreq/davinci-cpufreq.c:153:12: error: no previous prototype for 'davinci_cpufreq_init' Move it into the platform_data header that is already used an interface between the two places. Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230516153109.514251-2-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-05-26Merge tag 'drm-misc-next-2023-05-24' of ↵Dave Airlie1-53/+2
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v6.5: UAPI Changes: Cross-subsystem Changes: * fbdev: Move framebuffer I/O helpers to <asm/fb.h>, fix naming * firmware: Init sysfb as early as possible Core Changes: * DRM scheduler: Rename interfaces * ttm: Store ttm_device_funcs in .rodata * Replace strlcpy() with strscpy() in various places * Cleanups Driver Changes: * bridge: analogix: Fix endless probe loop; samsung-dsim: Support swapping clock/data polarity; tc358767: Use devm_ Cleanups; * gma500: Fix I/O-memory access * panel: boe-tv101wum-nl6: Improve initialization; sharp-ls043t1le001: Mode fixes; simple: Add BOE EV121WXM-N10-1850 plus DT bindings; AddS6D7AA0 plus DT bindings; Cleanups * ssd1307x: Style fixes * sun4i: Release clocks * msm: Fix I/O-memory access * nouveau: Cleanups * shmobile: Support Renesas; Enable framebuffer console; Various fixes * vkms: Fix RGB565 conversion Signed-off-by: Dave Airlie <airlied@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmRuBXEACgkQaA3BHVML # eiPLkwgAqCa7IuSDQhFMWVOI0EJpPPEHtHM8SCT1Pp8aniXk23Ru+E16c5zck53O # uf4tB+zoFrwD9npy60LIvX1OZmXS1KI4+ZO8itYFk6GSjxqbTWbjNFREBeWFdIpa # OG54nEqjFQZzEXY+gJYDpu5zqLy3xLN07ZgQkcMyfW3O/Krj4LLzfQTDl+jP5wkO # 7/v5Eu5CG5QjupMxIjb4e+ruUflp73pynur5bhZsfS1bPNGFTnxHlwg7NWnBXU7o # Hg23UYfCuZZWPmuO26EeUDlN33rCoaycmVgtpdZft2eznca5Mg74Loz1Qc3GQfjw # LLvKsAIlBcZvEIhElkzhtXitBoe7LQ== # =/9zV # -----END PGP SIGNATURE----- # gpg: Signature made Wed 24 May 2023 22:39:13 AEST # gpg: using RSA key 7217FBAC8CE9CF6344A168E5680DC11D530B7A23 # gpg: Can't check signature: No public key # Conflicts: # MAINTAINERS From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230524124237.GA25416@linux-uq9g
2023-05-26Merge tag 'mlx5-fixes-2023-05-24' of ↵Jakub Kicinski1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2023-05-24 This series includes bug fixes for the mlx5 driver. * tag 'mlx5-fixes-2023-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: Documentation: net/mlx5: Wrap notes in admonition blocks Documentation: net/mlx5: Add blank line separator before numbered lists Documentation: net/mlx5: Use bullet and definition lists for vnic counters description Documentation: net/mlx5: Wrap vnic reporter devlink commands in code blocks net/mlx5: Fix check for allocation failure in comp_irqs_request_pci() net/mlx5: DR, Add missing mutex init/destroy in pattern manager net/mlx5e: Move Ethernet driver debugfs to profile init callback net/mlx5e: Don't attach netdev profile while handling internal error net/mlx5: Fix post parse infra to only parse every action once net/mlx5e: Use query_special_contexts cmd only once per mdev net/mlx5: fw_tracer, Fix event handling net/mlx5: SF, Drain health before removing device net/mlx5: Drain health before unregistering devlink net/mlx5e: Do not update SBCM when prio2buffer command is invalid net/mlx5e: Consider internal buffers size in port buffer calculations net/mlx5e: Prevent encap offload when neigh update is running net/mlx5e: Extract remaining tunnel encap code to dedicated file ==================== Link: https://lore.kernel.org/r/20230525034847.99268-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski5-5/+17
Cross-merge networking fixes after downstream PR. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski4-3/+15
Cross-merge networking fixes after downstream PR. Conflicts: net/ipv4/raw.c 3632679d9e4f ("ipv{4,6}/raw: fix output xfrm lookup wrt protocol") c85be08fc4fa ("raw: Stop using RTO_ONLINK.") https://lore.kernel.org/all/20230525110037.2b532b83@canb.auug.org.au/ Adjacent changes: drivers/net/ethernet/freescale/fec_main.c 9025944fddfe ("net: fec: add dma_wmb to ensure correct descriptor values") 144470c88c5d ("net: fec: using the standard return codes when xdp xmit errors") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-26module: error out early on concurrent load of the same module fileLinus Torvalds1-0/+6
It turns out that udev under certain circumstances will concurrently try to load the same modules over-and-over excessively. This isn't a kernel bug, but it ends up affecting the kernel, to the point that under certain circumstances we can fail to boot, because the kernel uses a lot of memory to read all the module data all at once. Note that it isn't a memory leak, it's just basically a thundering herd problem happening at bootup with a lot of CPUs, with the worst cases then being pretty bad. Admittedly the worst situations are somewhat contrived: lots and lots of CPUs, not a lot of memory, and KASAN enabled to make it all slower and as such (unintentionally) exacerbate the problem. Luis explains: [1] "My best assessment of the situation is that each CPU in udev ends up triggering a load of duplicate set of modules, not just one, but *a lot*. Not sure what heuristics udev uses to load a set of modules per CPU." Petr Pavlu chimes in: [2] "My understanding is that udev workers are forked. An initial kmod context is created by the main udevd process but no sharing happens after the fork. It means that the mentioned memory pool logic doesn't really kick in. Multiple parallel load requests come from multiple udev workers, for instance, each handling an udev event for one CPU device and making the exactly same requests as all others are doing at the same time. The optimization idea would be to recognize these duplicate requests at the udevd/kmod level and converge them" Note that module loading has tried to mitigate this issue before, see for example commit 064f4536d139 ("module: avoid allocation if module is already present and ready"), which has a few ASCII graphs on memory use due to this same issue. However, while that noticed that the module was already loaded, and exited with an error early before spending any more time on setting up the module, it didn't handle the case of multiple concurrent module loads all being active - but not complete - at the same time. Yes, one of them will eventually win the race and finalize its copy, and the others will then notice that the module already exists and error out, but while this all happens, we have tons of unnecessary concurrent work being done. Again, the real fix is for udev to not do that (maybe it should use threads instead of fork, and have actual shared data structures and not cause duplicate work). That real fix is apparently not trivial. But it turns out that the kernel already has a pretty good model for dealing with concurrent access to the same file: the i_writecount of the inode. In fact, the module loading already indirectly uses 'i_writecount' , because 'kernel_file_read()' will in fact do ret = deny_write_access(file); if (ret) return ret; ... allow_write_access(file); around the read of the file data. We do not allow concurrent writes to the file, and return -ETXTBUSY if the file was open for writing at the same time as the module data is loaded from it. And the solution to the reader concurrency problem is to simply extend this "no concurrent writers" logic to simply be "exclusive access". Note that "exclusive" in this context isn't really some absolute thing: it's only exclusion from writers and from other "special readers" that do this writer denial. So we simply introduce a variation of that "deny_write_access()" logic that not only denies write access, but also requires that this is the _only_ such access that denies write access. Which means that you can't start loading a module that is already being loaded as a module by somebody else, or you will get the same -ETXTBSY error that you would get if there were writers around. [ It also means that you can't try to load a currently executing executable as a module, for the same reason: executables do that same "deny_write_access()" thing, and that's obviously where the whole ETXTBSY logic traditionally came from. This is not a problem for kernel modules, since the set of normal executable files and kernel module files is entirely disjoint. ] This new function is called "exclusive_deny_write_access()", and the implementation is trivial, in that it's just an atomic decrement of i_writecount if it was 0 before. To use that new exclusivity check, all we then do is wrap the module loading with that exclusive_deny_write_access()() / allow_write_access() pair. The actual patch is a bit bigger than that, because we want to surround not just the "load file data" part, but the whole module setup, to get maximum exclusion. So this ends up splitting up "finit_module()" into a few helper functions to make it all very clear and legible. In Luis' test-case (bringing up 255 vcpu's in a virtual machine [3]), the "wasted vmalloc" space (ie module data read into a vmalloc'ed area in order to be loaded as a module, but then discarded because somebody else loaded the same module instead) dropped from 1.8GiB to 474kB. Yes, that's gigabytes to kilobytes. It doesn't drop completely to zero, because even with this change, you can still end up having completely serial pointless module loads, where one udev process has loaded a module fully (and thus the kernel has released that exclusive lock on the module file), and then another udev process tries to load the same module again. So while we cannot fully get rid of the fundamental bug in user space, we _can_ get rid of the excessive concurrent thundering herd effect. A couple of final side notes on this all: - This tweak only affects the "finit_module()" system call, which gives the kernel a file descriptor with the module data. You can also just feed the module data as raw data from user space with "init_module()" (note the lack of 'f' at the beginning), and obviously for that case we do _not_ have any "exclusive read" logic. So if you absolutely want to do things wrong in user space, and try to load the same module multiple times, and error out only later when the kernel ends up saying "you can't load the same module name twice", you can still do that. And in fact, some distros will do exactly that, because they will uncompress the kernel module data in user space before feeding it to the kernel (mainly because they haven't started using the new kernel side decompression yet). So this is not some absolute "you can't do concurrent loads of the same module". It's literally just a very simple heuristic that will catch it early in case you try to load the exact same module file at the same time, and in that case avoid a potentially nasty situation. - There is another user of "deny_write_access()": the verity code that enables fs-verity on a file (the FS_IOC_ENABLE_VERITY ioctl). If you use fs-verity and you care about verifying the kernel modules (which does make sense), you should do it *before* loading said kernel module. That may sound obvious, but now the implementation basically requires it. Because if you try to do it concurrently, the kernel may refuse to load the module file that is being set up by the fs-verity code. - This all will obviously mean that if you insist on loading the same module in parallel, only one module load will succeed, and the others will return with an error. That was true before too, but what is different is that the -ETXTBSY error can be returned *before* the success case of another process fully loading and instantiating the module. Again, that might sound obvious, and it is indeed the whole point of the whole change: we are much quicker to notice the whole "you're already in the process of loading this module". So it's very much intentional, but it does mean that if you just spray the kernel with "finit_module()", and expect that the module is immediately loaded afterwards without checking the return value, you are doing something horribly horribly wrong. I'd like to say that that would never happen, but the whole _reason_ for this commit is that udev is currently doing something horribly horribly wrong, so ... Link: https://lore.kernel.org/all/ZEGopJ8VAYnE7LQ2@bombadil.infradead.org/ [1] Link: https://lore.kernel.org/all/23bd0ce6-ef78-1cd8-1f21-0e706a00424a@suse.com/ [2] Link: https://lore.kernel.org/lkml/ZG%2Fa+nrt4%2FAAUi5z@bombadil.infradead.org/ [3] Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-25Merge tag 'vfs/v6.4-rc3/misc.fixes' of ↵Linus Torvalds1-21/+21
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - During the acl rework we merged this cycle the generic_listxattr() helper had to be modified in a way that in principle it would allow for POSIX ACLs to be reported. At least that was the impression we had initially. Because before the acl rework POSIX ACLs would be reported if the filesystem did have POSIX ACL xattr handlers in sb->s_xattr. That logic changed and now we can simply check whether the superblock has SB_POSIXACL set and if the inode has inode->i_{default_}acl set report the appropriate POSIX ACL name. However, we didn't realize that generic_listxattr() was only ever used by two filesystems. Both of them don't support POSIX ACLs via sb->s_xattr handlers and so never reported POSIX ACLs via generic_listxattr() even if they raised SB_POSIXACL and did contain inodes which had acls set. The example here is nfs4. As a result, generic_listxattr() suddenly started reporting POSIX ACLs when it wouldn't have before. Since SB_POSIXACL implies that the umask isn't stripped in the VFS nfs4 can't just drop SB_POSIXACL from the superblock as it would also alter umask handling for them. So just have generic_listxattr() not report POSIX ACLs as it never did anyway. It's documented as such. - Our SB_* flags currently use a signed integer and we shift the last bit causing UBSAN to complain about undefined behavior. Switch to using unsigned. While the original patch used an explicit unsigned bitshift it's now pretty common to rely on the BIT() macro in a lot of headers nowadays. So the patch has been adjusted to use that. - Add Namjae as ntfs reviewer. They're already active this cycle so let's make it explicit right now. * tag 'vfs/v6.4-rc3/misc.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: ntfs: Add myself as a reviewer fs: don't call posix_acl_listxattr in generic_listxattr fs: fix undefined behavior in bit shift for SB_NOUSER
2023-05-25Merge tag 'net-6.4-rc4' of ↵Linus Torvalds4-3/+15
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth and bpf. Current release - regressions: - net: fix skb leak in __skb_tstamp_tx() - eth: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs Current release - new code bugs: - handshake: - fix sock->file allocation - fix handshake_dup() ref counting - bluetooth: - fix potential double free caused by hci_conn_unlink - fix UAF in hci_conn_hash_flush Previous releases - regressions: - core: fix stack overflow when LRO is disabled for virtual interfaces - tls: fix strparser rx issues - bpf: - fix many sockmap/TCP related issues - fix a memory leak in the LRU and LRU_PERCPU hash maps - init the offload table earlier - eth: mlx5e: - do as little as possible in napi poll when budget is 0 - fix using eswitch mapping in nic mode - fix deadlock in tc route query code Previous releases - always broken: - udplite: fix NULL pointer dereference in __sk_mem_raise_allocated() - raw: fix output xfrm lookup wrt protocol - smc: reset connection when trying to use SMCRv2 fails - phy: mscc: enable VSC8501/2 RGMII RX clock - eth: octeontx2-pf: fix TSOv6 offload - eth: cdc_ncm: deal with too low values of dwNtbOutMaxSize" * tag 'net-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits) udplite: Fix NULL pointer dereference in __sk_mem_raise_allocated(). net: phy: mscc: enable VSC8501/2 RGMII RX clock net: phy: mscc: remove unnecessary phydev locking net: phy: mscc: add support for VSC8501 net: phy: mscc: add VSC8502 to MODULE_DEVICE_TABLE net/handshake: Enable the SNI extension to work properly net/handshake: Unpin sock->file if a handshake is cancelled net/handshake: handshake_genl_notify() shouldn't ignore @flags net/handshake: Fix uninitialized local variable net/handshake: Fix handshake_dup() ref counting net/handshake: Remove unneeded check from handshake_dup() ipv6: Fix out-of-bounds access in ipv6_find_tlv() net: ethernet: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs docs: netdev: document the existence of the mail bot net: fix skb leak in __skb_tstamp_tx() r8169: Use a raw_spinlock_t for the register locks. page_pool: fix inconsistency for page_pool_ring_[un]lock() bpf, sockmap: Test progs verifier error with latest clang bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer ...
2023-05-25Merge tag 'for-v6.4-rc' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply Pull power supply fixes from Sebastian Reichel: - Fix power_supply_get_battery_info for devices without parent devices resulting in NULL pointer dereference - Fix desktop systems reporting to run on battery once a power-supply device with device scope appears (e.g. a HID keyboard with a battery) - Ratelimit debug print about driver not providing data - Fix race condition related to external_power_changed in multiple drivers (ab8500, axp288, bq25890, sc27xx, bq27xxx) - Fix LED trigger switching from blinking to solid-on when charging finishes - Fix multiple races in bq27xxx battery driver - mt6360: handle potential ENOMEM from devm_work_autocancel - sbs-charger: Fix SBS_CHARGER_STATUS_CHARGE_INHIBITED bit - rt9467: avoid passing 0 to dev_err_probe * tag 'for-v6.4-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (21 commits) power: supply: Fix logic checking if system is running from battery power: supply: mt6360: add a check of devm_work_autocancel in mt6360_charger_probe power: supply: sbs-charger: Fix INHIBITED bit for Status reg power: supply: rt9467: Fix passing zero to 'dev_err_probe' power: supply: Ratelimit no data debug output power: supply: Fix power_supply_get_battery_info() if parent is NULL power: supply: bq24190: Call power_supply_changed() after updating input current power: supply: bq25890: Call power_supply_changed() after updating input current or voltage power: supply: bq27xxx: Use mod_delayed_work() instead of cancel() + schedule() power: supply: bq27xxx: After charger plug in/out wait 0.5s for things to stabilize power: supply: bq27xxx: Ensure power_supply_changed() is called on current sign changes power: supply: bq27xxx: Move bq27xxx_battery_update() down power: supply: bq27xxx: Add cache parameter to bq27xxx_battery_current_and_status() power: supply: bq27xxx: Fix poll_interval handling and races on remove power: supply: bq27xxx: Fix I2C IRQ race on remove power: supply: bq27xxx: Fix bq27xxx_battery_update() race condition power: supply: leds: Fix blink to LED on transition power: supply: sc27xx: Fix external_power_changed race power: supply: bq25890: Fix external_power_changed race power: supply: axp288_fuel_gauge: Fix external_power_changed race ...
2023-05-25mmc: sdio: Add/rename SDIO ID of the RTL8723DS SDIO wifi cardsMartin Blumenstingl1-1/+2
RTL8723DS comes in two variant and each of them has their own SDIO ID: - 0xd723 can connect two antennas. The WiFi part is still 1x1 so the second antenna can be dedicated to Bluetooth - 0xd724 can only connect one antenna so it's shared between WiFi and Bluetooth Add a new entry for the single antenna RTL8723DS (0xd724) which can be found on the MangoPi MQ-Quad. Also rename the existing RTL8723DS entry (0xd723) so it's name reflects that it's the variant with support for two antennas. Reviewed-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20230522202425.1827005-4-martin.blumenstingl@googlemail.com
2023-05-25io_uring/cmd: add cmd lazy tw wake helperPavel Begunkov1-2/+16
We want to use IOU_F_TWQ_LAZY_WAKE in commands. First, introduce a new cmd tw helper accepting TWQ flags, and then add io_uring_cmd_do_in_task_laz() that will pass IOU_F_TWQ_LAZY_WAKE and imply the "lazy" semantics, i.e. it posts no more than 1 CQE and delaying execution of this tw should not prevent forward progress. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5b9f6716006df7e817f18bd555aee2f8f9c8b0c3.1684154817.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-25mfd: axp20x: Add support for AXP313a PMICMartin Botka1-0/+32
The AXP313a is a PMIC chip produced by X-Powers, it can be connected via an I2C bus. The name AXP1530 seems to appear as well, and this is what is used in the BSP driver. From all we know it's the same chip, just a different name. However we have only seen AXP313a chips in the wild, so go with this name. Compared to the other AXP PMICs it's a rather simple affair: just three DCDC converters, three LDOs, and no battery charging support. Describe the regmap and the MFD bits, along with the registers exposed via I2C. Aside from the various regulators, also describe the power key interrupts, and adjust the shutdown handler routine to use a different register than the other PMICs. Eventually advertise the device using the new compatible string. Signed-off-by: Martin Botka <martin.botka@somainline.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Chen-Yu Tsai <wens@csie.org> Link: https://lore.kernel.org/r/20230524000012.15028-2-andre.przywara@arm.com Signed-off-by: Lee Jones <lee@kernel.org>
2023-05-25leds: Fix oops about sleeping in led_trigger_blink()Hans de Goede1-0/+24
led_trigger_blink() calls led_blink_set() from a RCU read-side critical section so led_blink_set() must not sleep. Note sleeping was not allowed before the switch to RCU either because a spinlock was held before. led_blink_set() does not sleep when sw-blinking is used, but many LED controller drivers with hw blink support have a blink_set function which may sleep, leading to an oops like this one: [ 832.605062] ------------[ cut here ]------------ [ 832.605085] Voluntary context switch within RCU read-side critical section! [ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690 <snip> [ 832.606453] Call Trace: [ 832.606466] <TASK> [ 832.606487] __schedule+0x9f/0x1480 [ 832.606527] schedule+0x5d/0xe0 [ 832.606549] schedule_timeout+0x79/0x140 [ 832.606572] ? __pfx_process_timeout+0x10/0x10 [ 832.606599] wait_for_completion_timeout+0x6f/0x140 [ 832.606627] i2c_dw_xfer+0x101/0x460 [ 832.606659] ? psi_group_change+0x168/0x400 [ 832.606680] __i2c_transfer+0x172/0x6d0 [ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0 [ 832.606732] ? __schedule+0x430/0x1480 [ 832.606753] ? preempt_count_add+0x6a/0xa0 [ 832.606778] ? get_nohz_timer_target+0x18/0x190 [ 832.606796] ? lock_timer_base+0x61/0x80 [ 832.606817] ? preempt_count_add+0x6a/0xa0 [ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0 [ 832.606862] i2c_smbus_xfer+0x66/0xf0 [ 832.606882] i2c_smbus_read_byte_data+0x41/0x70 [ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40 [ 832.606922] ? __pm_runtime_suspend+0x46/0xc0 [ 832.606946] cht_wc_byte_reg_read+0x2e/0x60 [ 832.606972] _regmap_read+0x5c/0x120 [ 832.606997] _regmap_update_bits+0x96/0xc0 [ 832.607023] regmap_update_bits_base+0x5b/0x90 [ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove] [ 832.607094] led_blink_setup+0x28/0x100 [ 832.607119] led_trigger_blink+0x40/0x70 [ 832.607145] power_supply_update_leds+0x1b7/0x1c0 [ 832.607174] power_supply_changed_work+0x67/0xe0 [ 832.607198] process_one_work+0x1c8/0x3c0 [ 832.607222] worker_thread+0x4d/0x380 [ 832.607243] ? __pfx_worker_thread+0x10/0x10 [ 832.607258] kthread+0xe9/0x110 [ 832.607279] ? __pfx_kthread+0x10/0x10 [ 832.607300] ret_from_fork+0x2c/0x50 [ 832.607337] </TASK> [ 832.607344] ---[ end trace 0000000000000000 ]--- Add a new led_blink_set_nosleep() function which defers the actual led_blink_set() call to a workqueue when necessary to fix this. This also fixes an existing race where a pending led_set_brightness() has been deferred to set_brightness_work and might then race with a later led_cdev->blink_set() call. Note this race is only an issue with triggers mixing led_trigger_event() and led_trigger_blink() calls, sysfs API calls and led_trigger_blink_oneshot() are not affected. Note rather then adding a separate blink_set_blocking callback this uses the presence of the already existing brightness_set_blocking callback to detect if the blinking call should be deferred to set_brightness_work. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com> Tested-by: Yauhen Kharuzhy <jekhor@gmail.com> Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com Signed-off-by: Lee Jones <lee@kernel.org>
2023-05-25leds: Fix set_brightness_delayed() raceHans de Goede1-0/+3
When a trigger wants to switch from blinking to LED on it needs to call: led_set_brightness(LED_OFF); led_set_brightness(LED_FULL); To first call disables blinking and the second then turns the LED on (the power-supply charging-blink-full-solid triggers do this). These calls happen immediately after each other, so it is possible that set_brightness_delayed() from the first call has not run yet when the led_set_brightness(LED_FULL) call finishes. If this race hits then this is causing problems for both sw- and hw-blinking: For sw-blinking set_brightness_delayed() clears delayed_set_value when LED_BLINK_DISABLE is set causing the led_set_brightness(LED_FULL) call effects to get lost when hitting the race, resulting in the LED turning off instead of on. For hw-blinking if the race hits delayed_set_value has been set to LED_FULL by the time set_brightness_delayed() runs. So led_cdev->brightness_set_blocking() is never called with LED_OFF as argument and the hw-blinking is never disabled leaving the LED blinking instead of on. Fix both issues by adding LED_SET_BRIGHTNESS and LED_SET_BRIGHTNESS_OFF work_flags making this 2 separate actions to be run by set_brightness_delayed(). Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com> Tested-by: Yauhen Kharuzhy <jekhor@gmail.com> Link: https://lore.kernel.org/r/20230510162234.291439-3-hdegoede@redhat.com Signed-off-by: Lee Jones <lee@kernel.org>
2023-05-25leds: Change led_trigger_blink[_oneshot]() delay parameters to pass-by-valueHans de Goede1-8/+8
led_blink_set[_oneshot]()'s delay_on and delay_off function parameters are pass by reference, so that hw-blink implementations can report back the actual achieved delays when the values have been rounded to something the hw supports. This is really only interesting for the sysfs API / the timer trigger. Other triggers don't really care about this and none of the callers of led_trigger_blink[_oneshot]() do anything with the returned delay values. Change the led_trigger_blink[_oneshot]() delay parameters to pass-by-value, there are 2 reasons for this: 1. led_cdev->blink_set() may sleep, while led_trigger_blink() may not. So on hw where led_cdev->blink_set() sleeps the call needs to be deferred to a workqueue, in which case the actual achieved delays are unknown (this is a preparation patch for the deferring). 2. Since the callers don't care about the actual achieved delays, allowing callers to directly pass a value leads to simpler code for most callers. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com> Tested-by: Yauhen Kharuzhy <jekhor@gmail.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Acked-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230510162234.291439-2-hdegoede@redhat.com Signed-off-by: Lee Jones <lee@kernel.org>
2023-05-25leds: lp55xx: Configure internal charge pumpMaarten Zanders1-0/+3
The LP55xx range of devices have an internal charge pump which can (automatically) increase the output voltage towards the LED's, boosting the output voltage to 4.5V. Implement this option from the devicetree. When the setting is not present it will operate in automatic mode as before. Tested on LP55231. Datasheet analysis shows that LP5521, LP5523 and LP8501 are identical in topology and are modified in the same way. Signed-off-by: Maarten Zanders <maarten.zanders@mind.be> Signed-off-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20230421075305.37597-3-maarten.zanders@mind.be
2023-05-25efi: fix missing prototype warningsArnd Bergmann2-0/+8
The cper.c file needs to include an extra header, and efi_zboot_entry needs an extern declaration to avoid these 'make W=1' warnings: drivers/firmware/efi/libstub/zboot.c:65:1: error: no previous prototype for 'efi_zboot_entry' [-Werror=missing-prototypes] drivers/firmware/efi/efi.c:176:16: error: no previous prototype for 'efi_attr_is_visible' [-Werror=missing-prototypes] drivers/firmware/efi/cper.c:626:6: error: no previous prototype for 'cper_estatus_print' [-Werror=missing-prototypes] drivers/firmware/efi/cper.c:649:5: error: no previous prototype for 'cper_estatus_check_header' [-Werror=missing-prototypes] drivers/firmware/efi/cper.c:662:5: error: no previous prototype for 'cper_estatus_check' [-Werror=missing-prototypes] To make this easier, move the cper specific declarations to include/linux/cper.h. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-05-25Merge tag 'for-netdev' of ↵Jakub Kicinski1-2/+1
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-05-24 We've added 19 non-merge commits during the last 10 day(s) which contain a total of 20 files changed, 738 insertions(+), 448 deletions(-). The main changes are: 1) Batch of BPF sockmap fixes found when running against NGINX TCP tests, from John Fastabend. 2) Fix a memleak in the LRU{,_PERCPU} hash map when bucket locking fails, from Anton Protopopov. 3) Init the BPF offload table earlier than just late_initcall, from Jakub Kicinski. 4) Fix ctx access mask generation for 32-bit narrow loads of 64-bit fields, from Will Deacon. 5) Remove a now unsupported __fallthrough in BPF samples, from Andrii Nakryiko. 6) Fix a typo in pkg-config call for building sign-file, from Jeremy Sowden. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, sockmap: Test progs verifier error with latest clang bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer bpf, sockmap: Test shutdown() correctly exits epoll and recv()=0 bpf, sockmap: Build helper to create connected socket pair bpf, sockmap: Pull socket helpers out of listen test for general use bpf, sockmap: Incorrectly handling copied_seq bpf, sockmap: Wake up polling after data copy bpf, sockmap: TCP data stall on recv before accept bpf, sockmap: Handle fin correctly bpf, sockmap: Improved check for empty queue bpf, sockmap: Reschedule is now done through backlog bpf, sockmap: Convert schedule_work into delayed_work bpf, sockmap: Pass skb ownership through read_skb bpf: fix a memory leak in the LRU and LRU_PERCPU hash maps bpf: Fix mask generation for 32-bit narrow loads of 64-bit fields samples/bpf: Drop unnecessary fallthrough bpf: netdev: init the offload table earlier selftests/bpf: Fix pkg-config call building sign-file ==================== Link: https://lore.kernel.org/r/20230524170839.13905-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-25net/mlx5e: Use query_special_contexts cmd only once per mdevDragos Tatulea1-0/+1
Don't query the firmware so many times (num rqs * num wqes * wqe frags) because it slows down linearly the interface creation time when the product is larger. Do it only once per mdev and store the result in mlx5e_param. Due to helper function being called from different files, move it to an appropriate location. Rename the function with a proper prefix and add a small cleanup. This fix applies only for legacy rq. Fixes: 1b1e4868836a ("net/mlx5e: Use query_special_contexts for mkeys") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-24PM: suspend: add a arch_resume_nosmt() prototypeArnd Bergmann1-0/+2
The arch_resume_nosmt() has a __weak definition, plus an x86 specific override, but no prototype that ensures the two have the same arguments. This causes a W=1 warning: arch/x86/power/hibernate.c:189:5: error: no previous prototype for 'arch_resume_nosmt' [-Werror=missing-prototypes] Add the prototype in linux/suspend.h, which is included in both places. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24PM: suspend: Fix pm_suspend_target_state handling for !CONFIG_PMKai-Heng Feng1-1/+3
Move the pm_suspend_target_state definition for CONFIG_SUSPEND unset from the wakeup code into the headers so as to allow it to still be used elsewhere when CONFIG_SUSPEND is not set. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> [ rjw: Changelog and subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Introduce core support for TPMI interfaceZhang Rui1-0/+5
Compared with existing RAPL MSR/MMIO Interface, the RAPL TPMI Interface 1. has per Power Limit register, thus has per Power Limit Lock and Enable bit. 2. doesn't have Power Limit Clamp bit. 3. the Power Limit Lock and Enable bits have different bit offsets. These mean RAPL TPMI Interface needs its own primitive information. RAPL TPMI Interface also has per domain unit register but with a different register layout. This requires a TPMI specific rapl_defaults call to decode the unit register. Introduce the RAPL core support for TPMI Interface. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Introduce RAPL I/F typeZhang Rui1-0/+6
Different RAPL Interfaces may have different primitive information and rapl_defaults calls. To better distinguish this difference in the RAPL framework code, introduce a new enum to represent different types of RAPL Interfaces. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Make cpu optional for rapl_packageZhang Rui1-4/+4
MSR RAPL Interface always removes a rapl_package when all the CPUs in that rapl_package are offlined. This is because it relies on an online CPU to access the MSR. But for RAPL Interface using MMIO registers, when all the cpus within the rapl_package are offlined, 1. the register can still be accessed 2. monitoring and setting the Power Pimits for the rapl_package is still meaningful because of uncore power. This means that, a valid rapl_package doesn't rely on one or more cpus being onlined. For this sense, make cpu optional for rapl_package. A rapl_package can be registered either using a CPU id to represent the physical package/die, or using the physical package id directly. Note that, the thermal throttling interrupt is not disabled via MSR_IA32_PACKAGE_THERM_INTERRUPT for such rapl_package at the moment. If it is still needed in the future, this can be achieved by selecting an onlined CPU using the physical package id. Note that, processor_thermal_rapl, the current MMIO RAPL Interface driver, can also be converted to register using a package id instead. But this is not done right now because processor_thermal_rapl driver works on single-package systems only, and offlining the only package will not happen. So keep the previous logic. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Add support for lock bit per Power LimitZhang Rui1-0/+2
With RAPL MSR/MMIO Interface, each RAPL domain has one Power Limit register. Each Power Limit register has one lock bit which tells the OS if the power limit register can be used or not. Depending on the number of power limits supported by the power limit register, the lock bit may apply to one or more power limits. With RAPL TPMI Interface, each RAPL domain has multiple Power Limits, and each Power Limit has its own register, with a lock bit. To handle this, introduce support for lock bit per Power Limit. For existing RAPL MSR/MMIO Interfaces, the lock bit in the Power Limit register applies to all the Power Limits controlled by this register. Remove the per domain DOMAIN_STATE_BIOS_LOCKED flag at the same time because it can be replaced by the per Power Limit lock. No functional change intended. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Cleanup Power Limits supportZhang Rui1-1/+0
The same set of operations are shared by different Powert Limits, including Power Limit get/set, Power Limit enable/disable, clamping enable/disable, time window get/set, and max power get/set, etc. But the same operation for different Power Limit has different primitives because they use different registers/register bits. A lot of dirty/duplicate code was introduced to handle this difference. Introduce a universal way to issue Power Limit operations. Instead of using hardcoded primitive name directly, use Power Limit id + operation type, and hide all the Power Limit difference details in a central place, get_pl_prim(). Two helpers, rapl_read_pl_data() and rapl_write_pl_data(), are introduced at the same time to simplify the code for issuing Power Limit operations. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Change primitive orderZhang Rui1-2/+3
The same set of operations are shared by different Powert Limits, including Power Limit get/set, Power Limit enable/disable, clamping enable/disable, time window get/set, and max power get/set, etc. But the same operation for different Power Limit has different primitives because they use different registers/register bits. A lot of dirty/duplicate code was introduced to handle this difference. Instead of using hardcoded primitive name directly, using Power Limit id + operation type is much cleaner. For this sense, move POWER_LIMIT1/POWER_LIMIT2/POWER_LIMIT4 to the beginning of enum rapl_primitives so that they can be reused as Power Limit ids. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Support per domain energy/power/time unitZhang Rui1-4/+4
RAPL MSR/MMIO Interface has package scope unit register but some RAPL domains like Dram/Psys may use a fixed energy unit value instead of the default unit value on certain platforms. RAPL TPMI Interface supports per domain unit register. For the above reasons, add support for per domain unit register and per domain energy/power/time unit. When per domain unit register is not available, use the package scope unit register as the per domain unit register for each RAPL domain so that this change is transparent to MSR/MMIO Interface. No functional change intended. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Support per Interface primitive informationZhang Rui1-0/+2
RAPL primitive information is Interface specific. Although current MSR and MMIO Interface share the same RAPL primitives, new Interface like TPMI has its own RAPL primitive information. Save the primitive information in the Interface private structure. Plus, using variant name "rp" for struct rapl_primitive_info is confusing because "rp" is also used for struct rapl_package. Use "rpi" as the variant name for struct rapl_primitive_info, and rename the previous rpi[] array to avoid conflict. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Support per Interface rapl_defaultsZhang Rui1-0/+2
rapl_defaults is Interface specific. Although current MSR and MMIO Interface share the same rapl_defaults, new Interface like TPMI need its own rapl_defaults callbacks. Save the rapl_defaults information in the Interface private structure. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Remove unused field in struct rapl_if_privZhang Rui1-1/+0
After commit f1e8d7560d30 ("powercap/intel_rapl: enumerate Psys RAPL domain together with package RAPL domain"), the platform_rapl_domain field is not used anymore. Remove it from rapl_if_priv structure. Fixes: f1e8d7560d30 ("powercap/intel_rapl: enumerate Psys RAPL domain together with package RAPL domain") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24net: phylink: add function to resolve clause 73 negotiationRussell King (Oracle)1-0/+2
Add a function to resolve clause 73 negotiation according to the priority resolution function described in clause 73.3.6. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24net: mdio: add clause 73 to ethtool conversion helperRussell King (Oracle)1-0/+39
Add a helper to convert a clause 73 advertisement to an ethtool bitmap. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24x86/pci/xen: populate MSI sysfs entriesMaximilian Heyne1-1/+8
Commit bf5e758f02fc ("genirq/msi: Simplify sysfs handling") reworked the creation of sysfs entries for MSI IRQs. The creation used to be in msi_domain_alloc_irqs_descs_locked after calling ops->domain_alloc_irqs. Then it moved into __msi_domain_alloc_irqs which is an implementation of domain_alloc_irqs. However, Xen comes with the only other implementation of domain_alloc_irqs and hence doesn't run the sysfs population code anymore. Commit 6c796996ee70 ("x86/pci/xen: Fixup fallout from the PCI/MSI overhaul") set the flag MSI_FLAG_DEV_SYSFS for the xen msi_domain_info but that doesn't actually have an effect because Xen uses it's own domain_alloc_irqs implementation. Fix this by making use of the fallback functions for sysfs population. Fixes: bf5e758f02fc ("genirq/msi: Simplify sysfs handling") Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230503131656.15928-1-mheyne@amazon.de Signed-off-by: Juergen Gross <jgross@suse.com>
2023-05-24block: Add BIO_PAGE_PINNED and associated infrastructureDavid Howells2-1/+3
Add BIO_PAGE_PINNED to indicate that the pages in a bio are pinned (FOLL_PIN) and that the pin will need removing. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Hubbard <jhubbard@nvidia.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Jan Kara <jack@suse.cz> cc: Matthew Wilcox <willy@infradead.org> cc: Logan Gunthorpe <logang@deltatee.com> cc: linux-block@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230522205744.2825689-5-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-24block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logicChristoph Hellwig2-2/+2
Replace BIO_NO_PAGE_REF with a BIO_PAGE_REFFED flag that has the inverted meaning is only set when a page reference has been acquired that needs to be released by bio_release_pages(). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Jan Kara <jack@suse.cz> cc: Matthew Wilcox <willy@infradead.org> cc: Logan Gunthorpe <logang@deltatee.com> cc: linux-block@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230522205744.2825689-4-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-24block: Fix bio_flagged() so that gcc can better optimise itDavid Howells1-1/+1
Fix bio_flagged() so that multiple instances of it, such as: if (bio_flagged(bio, BIO_PAGE_REFFED) || bio_flagged(bio, BIO_PAGE_PINNED)) can be combined by the gcc optimiser into a single test in assembly (arguably, this is a compiler optimisation issue[1]). The missed optimisation stems from bio_flagged() comparing the result of the bitwise-AND to zero. This results in an out-of-line bio_release_page() being compiled to something like: <+0>: mov 0x14(%rdi),%eax <+3>: test $0x1,%al <+5>: jne 0xffffffff816dac53 <bio_release_pages+11> <+7>: test $0x2,%al <+9>: je 0xffffffff816dac5c <bio_release_pages+20> <+11>: movzbl %sil,%esi <+15>: jmp 0xffffffff816daba1 <__bio_release_pages> <+20>: jmp 0xffffffff81d0b800 <__x86_return_thunk> However, the test is superfluous as the return type is bool. Removing it results in: <+0>: testb $0x3,0x14(%rdi) <+4>: je 0xffffffff816e4af4 <bio_release_pages+15> <+6>: movzbl %sil,%esi <+10>: jmp 0xffffffff816dab7c <__bio_release_pages> <+15>: jmp 0xffffffff81d0b7c0 <__x86_return_thunk> instead. Also, the MOVZBL instruction looks unnecessary[2] - I think it's just 're-booling' the mark_dirty parameter. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Hubbard <jhubbard@nvidia.com> cc: Jens Axboe <axboe@kernel.dk> cc: linux-block@vger.kernel.org Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108370 [1] Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108371 [2] Link: https://lore.kernel.org/r/167391056756.2311931.356007731815807265.stgit@warthog.procyon.org.uk/ # v6 Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230522205744.2825689-3-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-24Merge branch 'for-6.5/splice' into for-6.5/blockJens Axboe3-19/+6
Merge splice bits as subsequent block cleanups and improvements for DIO depend on them. * for-6.5/splice: (31 commits) splice: kdoc for filemap_splice_read() and copy_splice_read() iov_iter: Kill ITER_PIPE splice: Remove generic_file_splice_read() splice: Use filemap_splice_read() instead of generic_file_splice_read() cifs: Use filemap_splice_read() trace: Convert trace/seq to use copy_splice_read() zonefs: Provide a splice-read wrapper xfs: Provide a splice-read wrapper orangefs: Provide a splice-read wrapper ocfs2: Provide a splice-read wrapper ntfs3: Provide a splice-read wrapper nfs: Provide a splice-read wrapper f2fs: Provide a splice-read wrapper ext4: Provide a splice-read wrapper ecryptfs: Provide a splice-read wrapper ceph: Provide a splice-read wrapper afs: Provide a splice-read wrapper 9p: Add splice_read wrapper net: Make sock_splice_read() use copy_splice_read() by default tty, proc, kernfs, random: Use copy_splice_read() ...
2023-05-24iov_iter: Kill ITER_PIPEDavid Howells1-14/+0
The ITER_PIPE-type iterator was only used by generic_file_splice_read() and that has been replaced and removed. This leaves ITER_PIPE unused - so remove it too. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: David Hildenbrand <david@redhat.com> cc: John Hubbard <jhubbard@nvidia.com> cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20230522135018.2742245-31-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-24splice: Remove generic_file_splice_read()David Howells1-2/+0
Remove generic_file_splice_read() as it has been replaced with calls to filemap_splice_read() and copy_splice_read(). With this, ITER_PIPE is no longer used. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Steve French <smfrench@gmail.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: David Hildenbrand <david@redhat.com> cc: John Hubbard <jhubbard@nvidia.com> cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20230522135018.2742245-30-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>