starfive-tech/linux.git - StarFive Tech Linux Kernel for VisionFive (JH7110) boards (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2023-05-26	module: error out early on concurrent load of the same module file	Linus Torvalds	1	-0/+6
	It turns out that udev under certain circumstances will concurrently try to load the same modules over-and-over excessively. This isn't a kernel bug, but it ends up affecting the kernel, to the point that under certain circumstances we can fail to boot, because the kernel uses a lot of memory to read all the module data all at once. Note that it isn't a memory leak, it's just basically a thundering herd problem happening at bootup with a lot of CPUs, with the worst cases then being pretty bad. Admittedly the worst situations are somewhat contrived: lots and lots of CPUs, not a lot of memory, and KASAN enabled to make it all slower and as such (unintentionally) exacerbate the problem. Luis explains: [1] "My best assessment of the situation is that each CPU in udev ends up triggering a load of duplicate set of modules, not just one, but a lot. Not sure what heuristics udev uses to load a set of modules per CPU." Petr Pavlu chimes in: [2] "My understanding is that udev workers are forked. An initial kmod context is created by the main udevd process but no sharing happens after the fork. It means that the mentioned memory pool logic doesn't really kick in. Multiple parallel load requests come from multiple udev workers, for instance, each handling an udev event for one CPU device and making the exactly same requests as all others are doing at the same time. The optimization idea would be to recognize these duplicate requests at the udevd/kmod level and converge them" Note that module loading has tried to mitigate this issue before, see for example commit 064f4536d139 ("module: avoid allocation if module is already present and ready"), which has a few ASCII graphs on memory use due to this same issue. However, while that noticed that the module was already loaded, and exited with an error early before spending any more time on setting up the module, it didn't handle the case of multiple concurrent module loads all being active - but not complete - at the same time. Yes, one of them will eventually win the race and finalize its copy, and the others will then notice that the module already exists and error out, but while this all happens, we have tons of unnecessary concurrent work being done. Again, the real fix is for udev to not do that (maybe it should use threads instead of fork, and have actual shared data structures and not cause duplicate work). That real fix is apparently not trivial. But it turns out that the kernel already has a pretty good model for dealing with concurrent access to the same file: the i_writecount of the inode. In fact, the module loading already indirectly uses 'i_writecount' , because 'kernel_file_read()' will in fact do ret = deny_write_access(file); if (ret) return ret; ... allow_write_access(file); around the read of the file data. We do not allow concurrent writes to the file, and return -ETXTBUSY if the file was open for writing at the same time as the module data is loaded from it. And the solution to the reader concurrency problem is to simply extend this "no concurrent writers" logic to simply be "exclusive access". Note that "exclusive" in this context isn't really some absolute thing: it's only exclusion from writers and from other "special readers" that do this writer denial. So we simply introduce a variation of that "deny_write_access()" logic that not only denies write access, but also requires that this is the _only_ such access that denies write access. Which means that you can't start loading a module that is already being loaded as a module by somebody else, or you will get the same -ETXTBSY error that you would get if there were writers around. [ It also means that you can't try to load a currently executing executable as a module, for the same reason: executables do that same "deny_write_access()" thing, and that's obviously where the whole ETXTBSY logic traditionally came from. This is not a problem for kernel modules, since the set of normal executable files and kernel module files is entirely disjoint. ] This new function is called "exclusive_deny_write_access()", and the implementation is trivial, in that it's just an atomic decrement of i_writecount if it was 0 before. To use that new exclusivity check, all we then do is wrap the module loading with that exclusive_deny_write_access()() / allow_write_access() pair. The actual patch is a bit bigger than that, because we want to surround not just the "load file data" part, but the whole module setup, to get maximum exclusion. So this ends up splitting up "finit_module()" into a few helper functions to make it all very clear and legible. In Luis' test-case (bringing up 255 vcpu's in a virtual machine [3]), the "wasted vmalloc" space (ie module data read into a vmalloc'ed area in order to be loaded as a module, but then discarded because somebody else loaded the same module instead) dropped from 1.8GiB to 474kB. Yes, that's gigabytes to kilobytes. It doesn't drop completely to zero, because even with this change, you can still end up having completely serial pointless module loads, where one udev process has loaded a module fully (and thus the kernel has released that exclusive lock on the module file), and then another udev process tries to load the same module again. So while we cannot fully get rid of the fundamental bug in user space, we _can_ get rid of the excessive concurrent thundering herd effect. A couple of final side notes on this all: - This tweak only affects the "finit_module()" system call, which gives the kernel a file descriptor with the module data. You can also just feed the module data as raw data from user space with "init_module()" (note the lack of 'f' at the beginning), and obviously for that case we do _not_ have any "exclusive read" logic. So if you absolutely want to do things wrong in user space, and try to load the same module multiple times, and error out only later when the kernel ends up saying "you can't load the same module name twice", you can still do that. And in fact, some distros will do exactly that, because they will uncompress the kernel module data in user space before feeding it to the kernel (mainly because they haven't started using the new kernel side decompression yet). So this is not some absolute "you can't do concurrent loads of the same module". It's literally just a very simple heuristic that will catch it early in case you try to load the exact same module file at the same time, and in that case avoid a potentially nasty situation. - There is another user of "deny_write_access()": the verity code that enables fs-verity on a file (the FS_IOC_ENABLE_VERITY ioctl). If you use fs-verity and you care about verifying the kernel modules (which does make sense), you should do it before loading said kernel module. That may sound obvious, but now the implementation basically requires it. Because if you try to do it concurrently, the kernel may refuse to load the module file that is being set up by the fs-verity code. - This all will obviously mean that if you insist on loading the same module in parallel, only one module load will succeed, and the others will return with an error. That was true before too, but what is different is that the -ETXTBSY error can be returned before the success case of another process fully loading and instantiating the module. Again, that might sound obvious, and it is indeed the whole point of the whole change: we are much quicker to notice the whole "you're already in the process of loading this module". So it's very much intentional, but it does mean that if you just spray the kernel with "finit_module()", and expect that the module is immediately loaded afterwards without checking the return value, you are doing something horribly horribly wrong. I'd like to say that that would never happen, but the whole _reason_ for this commit is that udev is currently doing something horribly horribly wrong, so ... Link: https://lore.kernel.org/all/ZEGopJ8VAYnE7LQ2@bombadil.infradead.org/ [1] Link: https://lore.kernel.org/all/23bd0ce6-ef78-1cd8-1f21-0e706a00424a@suse.com/ [2] Link: https://lore.kernel.org/lkml/ZG%2Fa+nrt4%2FAAUi5z@bombadil.infradead.org/ [3] Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-25	Merge tag 'vfs/v6.4-rc3/misc.fixes' of ↵	Linus Torvalds	1	-21/+21
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - During the acl rework we merged this cycle the generic_listxattr() helper had to be modified in a way that in principle it would allow for POSIX ACLs to be reported. At least that was the impression we had initially. Because before the acl rework POSIX ACLs would be reported if the filesystem did have POSIX ACL xattr handlers in sb->s_xattr. That logic changed and now we can simply check whether the superblock has SB_POSIXACL set and if the inode has inode->i_{default_}acl set report the appropriate POSIX ACL name. However, we didn't realize that generic_listxattr() was only ever used by two filesystems. Both of them don't support POSIX ACLs via sb->s_xattr handlers and so never reported POSIX ACLs via generic_listxattr() even if they raised SB_POSIXACL and did contain inodes which had acls set. The example here is nfs4. As a result, generic_listxattr() suddenly started reporting POSIX ACLs when it wouldn't have before. Since SB_POSIXACL implies that the umask isn't stripped in the VFS nfs4 can't just drop SB_POSIXACL from the superblock as it would also alter umask handling for them. So just have generic_listxattr() not report POSIX ACLs as it never did anyway. It's documented as such. - Our SB_* flags currently use a signed integer and we shift the last bit causing UBSAN to complain about undefined behavior. Switch to using unsigned. While the original patch used an explicit unsigned bitshift it's now pretty common to rely on the BIT() macro in a lot of headers nowadays. So the patch has been adjusted to use that. - Add Namjae as ntfs reviewer. They're already active this cycle so let's make it explicit right now. * tag 'vfs/v6.4-rc3/misc.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: ntfs: Add myself as a reviewer fs: don't call posix_acl_listxattr in generic_listxattr fs: fix undefined behavior in bit shift for SB_NOUSER
2023-05-25	Merge tag 'net-6.4-rc4' of ↵	Linus Torvalds	4	-3/+15
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth and bpf. Current release - regressions: - net: fix skb leak in __skb_tstamp_tx() - eth: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs Current release - new code bugs: - handshake: - fix sock->file allocation - fix handshake_dup() ref counting - bluetooth: - fix potential double free caused by hci_conn_unlink - fix UAF in hci_conn_hash_flush Previous releases - regressions: - core: fix stack overflow when LRO is disabled for virtual interfaces - tls: fix strparser rx issues - bpf: - fix many sockmap/TCP related issues - fix a memory leak in the LRU and LRU_PERCPU hash maps - init the offload table earlier - eth: mlx5e: - do as little as possible in napi poll when budget is 0 - fix using eswitch mapping in nic mode - fix deadlock in tc route query code Previous releases - always broken: - udplite: fix NULL pointer dereference in __sk_mem_raise_allocated() - raw: fix output xfrm lookup wrt protocol - smc: reset connection when trying to use SMCRv2 fails - phy: mscc: enable VSC8501/2 RGMII RX clock - eth: octeontx2-pf: fix TSOv6 offload - eth: cdc_ncm: deal with too low values of dwNtbOutMaxSize" * tag 'net-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits) udplite: Fix NULL pointer dereference in __sk_mem_raise_allocated(). net: phy: mscc: enable VSC8501/2 RGMII RX clock net: phy: mscc: remove unnecessary phydev locking net: phy: mscc: add support for VSC8501 net: phy: mscc: add VSC8502 to MODULE_DEVICE_TABLE net/handshake: Enable the SNI extension to work properly net/handshake: Unpin sock->file if a handshake is cancelled net/handshake: handshake_genl_notify() shouldn't ignore @flags net/handshake: Fix uninitialized local variable net/handshake: Fix handshake_dup() ref counting net/handshake: Remove unneeded check from handshake_dup() ipv6: Fix out-of-bounds access in ipv6_find_tlv() net: ethernet: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs docs: netdev: document the existence of the mail bot net: fix skb leak in __skb_tstamp_tx() r8169: Use a raw_spinlock_t for the register locks. page_pool: fix inconsistency for page_pool_ring_[un]lock() bpf, sockmap: Test progs verifier error with latest clang bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer ...
2023-05-25	Merge tag 'for-v6.4-rc' of ↵	Linus Torvalds	1	-0/+4
	git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply Pull power supply fixes from Sebastian Reichel: - Fix power_supply_get_battery_info for devices without parent devices resulting in NULL pointer dereference - Fix desktop systems reporting to run on battery once a power-supply device with device scope appears (e.g. a HID keyboard with a battery) - Ratelimit debug print about driver not providing data - Fix race condition related to external_power_changed in multiple drivers (ab8500, axp288, bq25890, sc27xx, bq27xxx) - Fix LED trigger switching from blinking to solid-on when charging finishes - Fix multiple races in bq27xxx battery driver - mt6360: handle potential ENOMEM from devm_work_autocancel - sbs-charger: Fix SBS_CHARGER_STATUS_CHARGE_INHIBITED bit - rt9467: avoid passing 0 to dev_err_probe * tag 'for-v6.4-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (21 commits) power: supply: Fix logic checking if system is running from battery power: supply: mt6360: add a check of devm_work_autocancel in mt6360_charger_probe power: supply: sbs-charger: Fix INHIBITED bit for Status reg power: supply: rt9467: Fix passing zero to 'dev_err_probe' power: supply: Ratelimit no data debug output power: supply: Fix power_supply_get_battery_info() if parent is NULL power: supply: bq24190: Call power_supply_changed() after updating input current power: supply: bq25890: Call power_supply_changed() after updating input current or voltage power: supply: bq27xxx: Use mod_delayed_work() instead of cancel() + schedule() power: supply: bq27xxx: After charger plug in/out wait 0.5s for things to stabilize power: supply: bq27xxx: Ensure power_supply_changed() is called on current sign changes power: supply: bq27xxx: Move bq27xxx_battery_update() down power: supply: bq27xxx: Add cache parameter to bq27xxx_battery_current_and_status() power: supply: bq27xxx: Fix poll_interval handling and races on remove power: supply: bq27xxx: Fix I2C IRQ race on remove power: supply: bq27xxx: Fix bq27xxx_battery_update() race condition power: supply: leds: Fix blink to LED on transition power: supply: sc27xx: Fix external_power_changed race power: supply: bq25890: Fix external_power_changed race power: supply: axp288_fuel_gauge: Fix external_power_changed race ...
2023-05-25	mmc: sdio: Add/rename SDIO ID of the RTL8723DS SDIO wifi cards	Martin Blumenstingl	1	-1/+2
	RTL8723DS comes in two variant and each of them has their own SDIO ID: - 0xd723 can connect two antennas. The WiFi part is still 1x1 so the second antenna can be dedicated to Bluetooth - 0xd724 can only connect one antenna so it's shared between WiFi and Bluetooth Add a new entry for the single antenna RTL8723DS (0xd724) which can be found on the MangoPi MQ-Quad. Also rename the existing RTL8723DS entry (0xd723) so it's name reflects that it's the variant with support for two antennas. Reviewed-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20230522202425.1827005-4-martin.blumenstingl@googlemail.com
2023-05-25	io_uring/cmd: add cmd lazy tw wake helper	Pavel Begunkov	1	-2/+16
	We want to use IOU_F_TWQ_LAZY_WAKE in commands. First, introduce a new cmd tw helper accepting TWQ flags, and then add io_uring_cmd_do_in_task_laz() that will pass IOU_F_TWQ_LAZY_WAKE and imply the "lazy" semantics, i.e. it posts no more than 1 CQE and delaying execution of this tw should not prevent forward progress. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5b9f6716006df7e817f18bd555aee2f8f9c8b0c3.1684154817.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-25	mfd: axp20x: Add support for AXP313a PMIC	Martin Botka	1	-0/+32
	The AXP313a is a PMIC chip produced by X-Powers, it can be connected via an I2C bus. The name AXP1530 seems to appear as well, and this is what is used in the BSP driver. From all we know it's the same chip, just a different name. However we have only seen AXP313a chips in the wild, so go with this name. Compared to the other AXP PMICs it's a rather simple affair: just three DCDC converters, three LDOs, and no battery charging support. Describe the regmap and the MFD bits, along with the registers exposed via I2C. Aside from the various regulators, also describe the power key interrupts, and adjust the shutdown handler routine to use a different register than the other PMICs. Eventually advertise the device using the new compatible string. Signed-off-by: Martin Botka <martin.botka@somainline.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Chen-Yu Tsai <wens@csie.org> Link: https://lore.kernel.org/r/20230524000012.15028-2-andre.przywara@arm.com Signed-off-by: Lee Jones <lee@kernel.org>
2023-05-25	efi: fix missing prototype warnings	Arnd Bergmann	2	-0/+8
	The cper.c file needs to include an extra header, and efi_zboot_entry needs an extern declaration to avoid these 'make W=1' warnings: drivers/firmware/efi/libstub/zboot.c:65:1: error: no previous prototype for 'efi_zboot_entry' [-Werror=missing-prototypes] drivers/firmware/efi/efi.c:176:16: error: no previous prototype for 'efi_attr_is_visible' [-Werror=missing-prototypes] drivers/firmware/efi/cper.c:626:6: error: no previous prototype for 'cper_estatus_print' [-Werror=missing-prototypes] drivers/firmware/efi/cper.c:649:5: error: no previous prototype for 'cper_estatus_check_header' [-Werror=missing-prototypes] drivers/firmware/efi/cper.c:662:5: error: no previous prototype for 'cper_estatus_check' [-Werror=missing-prototypes] To make this easier, move the cper specific declarations to include/linux/cper.h. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-05-25	Merge tag 'for-netdev' of ↵	Jakub Kicinski	1	-2/+1
	https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-05-24 We've added 19 non-merge commits during the last 10 day(s) which contain a total of 20 files changed, 738 insertions(+), 448 deletions(-). The main changes are: 1) Batch of BPF sockmap fixes found when running against NGINX TCP tests, from John Fastabend. 2) Fix a memleak in the LRU{,_PERCPU} hash map when bucket locking fails, from Anton Protopopov. 3) Init the BPF offload table earlier than just late_initcall, from Jakub Kicinski. 4) Fix ctx access mask generation for 32-bit narrow loads of 64-bit fields, from Will Deacon. 5) Remove a now unsupported __fallthrough in BPF samples, from Andrii Nakryiko. 6) Fix a typo in pkg-config call for building sign-file, from Jeremy Sowden. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, sockmap: Test progs verifier error with latest clang bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer bpf, sockmap: Test shutdown() correctly exits epoll and recv()=0 bpf, sockmap: Build helper to create connected socket pair bpf, sockmap: Pull socket helpers out of listen test for general use bpf, sockmap: Incorrectly handling copied_seq bpf, sockmap: Wake up polling after data copy bpf, sockmap: TCP data stall on recv before accept bpf, sockmap: Handle fin correctly bpf, sockmap: Improved check for empty queue bpf, sockmap: Reschedule is now done through backlog bpf, sockmap: Convert schedule_work into delayed_work bpf, sockmap: Pass skb ownership through read_skb bpf: fix a memory leak in the LRU and LRU_PERCPU hash maps bpf: Fix mask generation for 32-bit narrow loads of 64-bit fields samples/bpf: Drop unnecessary fallthrough bpf: netdev: init the offload table earlier selftests/bpf: Fix pkg-config call building sign-file ==================== Link: https://lore.kernel.org/r/20230524170839.13905-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-25	net/mlx5e: Use query_special_contexts cmd only once per mdev	Dragos Tatulea	1	-0/+1
	Don't query the firmware so many times (num rqs * num wqes * wqe frags) because it slows down linearly the interface creation time when the product is larger. Do it only once per mdev and store the result in mlx5e_param. Due to helper function being called from different files, move it to an appropriate location. Rename the function with a proper prefix and add a small cleanup. This fix applies only for legacy rq. Fixes: 1b1e4868836a ("net/mlx5e: Use query_special_contexts for mkeys") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-24	PM: suspend: add a arch_resume_nosmt() prototype	Arnd Bergmann	1	-0/+2
	The arch_resume_nosmt() has a __weak definition, plus an x86 specific override, but no prototype that ensures the two have the same arguments. This causes a W=1 warning: arch/x86/power/hibernate.c:189:5: error: no previous prototype for 'arch_resume_nosmt' [-Werror=missing-prototypes] Add the prototype in linux/suspend.h, which is included in both places. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24	PM: suspend: Fix pm_suspend_target_state handling for !CONFIG_PM	Kai-Heng Feng	1	-1/+3
	Move the pm_suspend_target_state definition for CONFIG_SUSPEND unset from the wakeup code into the headers so as to allow it to still be used elsewhere when CONFIG_SUSPEND is not set. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> [ rjw: Changelog and subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24	powercap: intel_rapl: Introduce core support for TPMI interface	Zhang Rui	1	-0/+5
	Compared with existing RAPL MSR/MMIO Interface, the RAPL TPMI Interface 1. has per Power Limit register, thus has per Power Limit Lock and Enable bit. 2. doesn't have Power Limit Clamp bit. 3. the Power Limit Lock and Enable bits have different bit offsets. These mean RAPL TPMI Interface needs its own primitive information. RAPL TPMI Interface also has per domain unit register but with a different register layout. This requires a TPMI specific rapl_defaults call to decode the unit register. Introduce the RAPL core support for TPMI Interface. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24	powercap: intel_rapl: Introduce RAPL I/F type	Zhang Rui	1	-0/+6
	Different RAPL Interfaces may have different primitive information and rapl_defaults calls. To better distinguish this difference in the RAPL framework code, introduce a new enum to represent different types of RAPL Interfaces. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24	powercap: intel_rapl: Make cpu optional for rapl_package	Zhang Rui	1	-4/+4
	MSR RAPL Interface always removes a rapl_package when all the CPUs in that rapl_package are offlined. This is because it relies on an online CPU to access the MSR. But for RAPL Interface using MMIO registers, when all the cpus within the rapl_package are offlined, 1. the register can still be accessed 2. monitoring and setting the Power Pimits for the rapl_package is still meaningful because of uncore power. This means that, a valid rapl_package doesn't rely on one or more cpus being onlined. For this sense, make cpu optional for rapl_package. A rapl_package can be registered either using a CPU id to represent the physical package/die, or using the physical package id directly. Note that, the thermal throttling interrupt is not disabled via MSR_IA32_PACKAGE_THERM_INTERRUPT for such rapl_package at the moment. If it is still needed in the future, this can be achieved by selecting an onlined CPU using the physical package id. Note that, processor_thermal_rapl, the current MMIO RAPL Interface driver, can also be converted to register using a package id instead. But this is not done right now because processor_thermal_rapl driver works on single-package systems only, and offlining the only package will not happen. So keep the previous logic. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24	powercap: intel_rapl: Add support for lock bit per Power Limit	Zhang Rui	1	-0/+2
	With RAPL MSR/MMIO Interface, each RAPL domain has one Power Limit register. Each Power Limit register has one lock bit which tells the OS if the power limit register can be used or not. Depending on the number of power limits supported by the power limit register, the lock bit may apply to one or more power limits. With RAPL TPMI Interface, each RAPL domain has multiple Power Limits, and each Power Limit has its own register, with a lock bit. To handle this, introduce support for lock bit per Power Limit. For existing RAPL MSR/MMIO Interfaces, the lock bit in the Power Limit register applies to all the Power Limits controlled by this register. Remove the per domain DOMAIN_STATE_BIOS_LOCKED flag at the same time because it can be replaced by the per Power Limit lock. No functional change intended. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24	powercap: intel_rapl: Cleanup Power Limits support	Zhang Rui	1	-1/+0
	The same set of operations are shared by different Powert Limits, including Power Limit get/set, Power Limit enable/disable, clamping enable/disable, time window get/set, and max power get/set, etc. But the same operation for different Power Limit has different primitives because they use different registers/register bits. A lot of dirty/duplicate code was introduced to handle this difference. Introduce a universal way to issue Power Limit operations. Instead of using hardcoded primitive name directly, use Power Limit id + operation type, and hide all the Power Limit difference details in a central place, get_pl_prim(). Two helpers, rapl_read_pl_data() and rapl_write_pl_data(), are introduced at the same time to simplify the code for issuing Power Limit operations. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24	powercap: intel_rapl: Change primitive order	Zhang Rui	1	-2/+3
	The same set of operations are shared by different Powert Limits, including Power Limit get/set, Power Limit enable/disable, clamping enable/disable, time window get/set, and max power get/set, etc. But the same operation for different Power Limit has different primitives because they use different registers/register bits. A lot of dirty/duplicate code was introduced to handle this difference. Instead of using hardcoded primitive name directly, using Power Limit id + operation type is much cleaner. For this sense, move POWER_LIMIT1/POWER_LIMIT2/POWER_LIMIT4 to the beginning of enum rapl_primitives so that they can be reused as Power Limit ids. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24	powercap: intel_rapl: Support per domain energy/power/time unit	Zhang Rui	1	-4/+4
	RAPL MSR/MMIO Interface has package scope unit register but some RAPL domains like Dram/Psys may use a fixed energy unit value instead of the default unit value on certain platforms. RAPL TPMI Interface supports per domain unit register. For the above reasons, add support for per domain unit register and per domain energy/power/time unit. When per domain unit register is not available, use the package scope unit register as the per domain unit register for each RAPL domain so that this change is transparent to MSR/MMIO Interface. No functional change intended. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24	powercap: intel_rapl: Support per Interface primitive information	Zhang Rui	1	-0/+2
	RAPL primitive information is Interface specific. Although current MSR and MMIO Interface share the same RAPL primitives, new Interface like TPMI has its own RAPL primitive information. Save the primitive information in the Interface private structure. Plus, using variant name "rp" for struct rapl_primitive_info is confusing because "rp" is also used for struct rapl_package. Use "rpi" as the variant name for struct rapl_primitive_info, and rename the previous rpi[] array to avoid conflict. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24	powercap: intel_rapl: Support per Interface rapl_defaults	Zhang Rui	1	-0/+2
	rapl_defaults is Interface specific. Although current MSR and MMIO Interface share the same rapl_defaults, new Interface like TPMI need its own rapl_defaults callbacks. Save the rapl_defaults information in the Interface private structure. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24	powercap: intel_rapl: Remove unused field in struct rapl_if_priv	Zhang Rui	1	-1/+0
	After commit f1e8d7560d30 ("powercap/intel_rapl: enumerate Psys RAPL domain together with package RAPL domain"), the platform_rapl_domain field is not used anymore. Remove it from rapl_if_priv structure. Fixes: f1e8d7560d30 ("powercap/intel_rapl: enumerate Psys RAPL domain together with package RAPL domain") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24	net: phylink: add function to resolve clause 73 negotiation	Russell King (Oracle)	1	-0/+2
	Add a function to resolve clause 73 negotiation according to the priority resolution function described in clause 73.3.6. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24	net: mdio: add clause 73 to ethtool conversion helper	Russell King (Oracle)	1	-0/+39
	Add a helper to convert a clause 73 advertisement to an ethtool bitmap. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24	x86/pci/xen: populate MSI sysfs entries	Maximilian Heyne	1	-1/+8
	Commit bf5e758f02fc ("genirq/msi: Simplify sysfs handling") reworked the creation of sysfs entries for MSI IRQs. The creation used to be in msi_domain_alloc_irqs_descs_locked after calling ops->domain_alloc_irqs. Then it moved into __msi_domain_alloc_irqs which is an implementation of domain_alloc_irqs. However, Xen comes with the only other implementation of domain_alloc_irqs and hence doesn't run the sysfs population code anymore. Commit 6c796996ee70 ("x86/pci/xen: Fixup fallout from the PCI/MSI overhaul") set the flag MSI_FLAG_DEV_SYSFS for the xen msi_domain_info but that doesn't actually have an effect because Xen uses it's own domain_alloc_irqs implementation. Fix this by making use of the fallback functions for sysfs population. Fixes: bf5e758f02fc ("genirq/msi: Simplify sysfs handling") Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230503131656.15928-1-mheyne@amazon.de Signed-off-by: Juergen Gross <jgross@suse.com>
2023-05-24	block: Add BIO_PAGE_PINNED and associated infrastructure	David Howells	2	-1/+3
	Add BIO_PAGE_PINNED to indicate that the pages in a bio are pinned (FOLL_PIN) and that the pin will need removing. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Hubbard <jhubbard@nvidia.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Jan Kara <jack@suse.cz> cc: Matthew Wilcox <willy@infradead.org> cc: Logan Gunthorpe <logang@deltatee.com> cc: linux-block@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230522205744.2825689-5-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-24	block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logic	Christoph Hellwig	2	-2/+2
	Replace BIO_NO_PAGE_REF with a BIO_PAGE_REFFED flag that has the inverted meaning is only set when a page reference has been acquired that needs to be released by bio_release_pages(). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Jan Kara <jack@suse.cz> cc: Matthew Wilcox <willy@infradead.org> cc: Logan Gunthorpe <logang@deltatee.com> cc: linux-block@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230522205744.2825689-4-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-24	block: Fix bio_flagged() so that gcc can better optimise it	David Howells	1	-1/+1
	Fix bio_flagged() so that multiple instances of it, such as: if (bio_flagged(bio, BIO_PAGE_REFFED) \|\| bio_flagged(bio, BIO_PAGE_PINNED)) can be combined by the gcc optimiser into a single test in assembly (arguably, this is a compiler optimisation issue[1]). The missed optimisation stems from bio_flagged() comparing the result of the bitwise-AND to zero. This results in an out-of-line bio_release_page() being compiled to something like: <+0>: mov 0x14(%rdi),%eax <+3>: test $0x1,%al <+5>: jne 0xffffffff816dac53 <bio_release_pages+11> <+7>: test $0x2,%al <+9>: je 0xffffffff816dac5c <bio_release_pages+20> <+11>: movzbl %sil,%esi <+15>: jmp 0xffffffff816daba1 <__bio_release_pages> <+20>: jmp 0xffffffff81d0b800 <__x86_return_thunk> However, the test is superfluous as the return type is bool. Removing it results in: <+0>: testb $0x3,0x14(%rdi) <+4>: je 0xffffffff816e4af4 <bio_release_pages+15> <+6>: movzbl %sil,%esi <+10>: jmp 0xffffffff816dab7c <__bio_release_pages> <+15>: jmp 0xffffffff81d0b7c0 <__x86_return_thunk> instead. Also, the MOVZBL instruction looks unnecessary[2] - I think it's just 're-booling' the mark_dirty parameter. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Hubbard <jhubbard@nvidia.com> cc: Jens Axboe <axboe@kernel.dk> cc: linux-block@vger.kernel.org Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108370 [1] Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108371 [2] Link: https://lore.kernel.org/r/167391056756.2311931.356007731815807265.stgit@warthog.procyon.org.uk/ # v6 Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230522205744.2825689-3-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-24	Merge branch 'for-6.5/splice' into for-6.5/block	Jens Axboe	3	-19/+6
	Merge splice bits as subsequent block cleanups and improvements for DIO depend on them. * for-6.5/splice: (31 commits) splice: kdoc for filemap_splice_read() and copy_splice_read() iov_iter: Kill ITER_PIPE splice: Remove generic_file_splice_read() splice: Use filemap_splice_read() instead of generic_file_splice_read() cifs: Use filemap_splice_read() trace: Convert trace/seq to use copy_splice_read() zonefs: Provide a splice-read wrapper xfs: Provide a splice-read wrapper orangefs: Provide a splice-read wrapper ocfs2: Provide a splice-read wrapper ntfs3: Provide a splice-read wrapper nfs: Provide a splice-read wrapper f2fs: Provide a splice-read wrapper ext4: Provide a splice-read wrapper ecryptfs: Provide a splice-read wrapper ceph: Provide a splice-read wrapper afs: Provide a splice-read wrapper 9p: Add splice_read wrapper net: Make sock_splice_read() use copy_splice_read() by default tty, proc, kernfs, random: Use copy_splice_read() ...
2023-05-24	iov_iter: Kill ITER_PIPE	David Howells	1	-14/+0
	The ITER_PIPE-type iterator was only used by generic_file_splice_read() and that has been replaced and removed. This leaves ITER_PIPE unused - so remove it too. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: David Hildenbrand <david@redhat.com> cc: John Hubbard <jhubbard@nvidia.com> cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20230522135018.2742245-31-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-24	splice: Remove generic_file_splice_read()	David Howells	1	-2/+0
	Remove generic_file_splice_read() as it has been replaced with calls to filemap_splice_read() and copy_splice_read(). With this, ITER_PIPE is no longer used. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Steve French <smfrench@gmail.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: David Hildenbrand <david@redhat.com> cc: John Hubbard <jhubbard@nvidia.com> cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20230522135018.2742245-30-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-24	splice: Make do_splice_to() generic and export it	David Howells	1	-0/+3
	Rename do_splice_to() to vfs_splice_read() and export it so that it can be used as a helper when calling down to a lower layer filesystem as it performs all the necessary checks[1]. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> cc: Miklos Szeredi <miklos@szeredi.hu> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: John Hubbard <jhubbard@nvidia.com> cc: David Hildenbrand <david@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-unionfs@vger.kernel.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/CAJfpeguGksS3sCigmRi9hJdUec8qtM9f+_9jC1rJhsXT+dV01w@mail.gmail.com/ [1] Link: https://lore.kernel.org/r/20230522135018.2742245-6-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-24	splice: Rename direct_splice_read() to copy_splice_read()	David Howells	1	-3/+3
	Rename direct_splice_read() to copy_splice_read() to better reflect as to what it does. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> cc: Steve French <sfrench@samba.org> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: linux-cifs@vger.kernel.org cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20230522135018.2742245-4-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-24	ARM/musb: omap2: Remove global GPIO numbers from TUSB6010	Linus Walleij	1	-13/+0
	The TUSB6010 (MUSB) device is picking up some GPIO lines hardcoded by number and passing on to the TUSB6010 device when registering it. Instead of nasty workarounds, provide a GPIO descriptor table and then make the TUSB6010 MUSB glue driver pick up the GPIO lines directly, convert it to an IRQ and pass down to the MUSB driver. OMAP2 is the only system using the TUSB6010. Stash the GPIO descriptors in the glue layer and use then to power up and down the TUSB6010 on-demand, instead of using boardfile callbacks. Since the OMAP2 boards are the only boards using the .set_power() and .board_set_power() callbacks, we can just delete them as the power is now handled directly in the TUSB6010 glue code. Cc: Bin Liu <b-liu@ti.com> Cc: linux-usb@vger.kernel.org Fixes: 92bf78b33b0b ("gpio: omap: use dynamic allocation of base") Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-05-24	ARM/gpio: Push OMAP2 quirk down into TWL4030 driver	Linus Walleij	1	-3/+0
	The TWL4030 GPIO driver has a custom platform data .set_up() callback to call back into the platform and do misc stuff such as hog and export a GPIO for WLAN PWR on a specific OMAP3 board. Avoid all the kludgery in the platform data and the boardfile and just put the quirks right into the driver. Make it conditional on OMAP3. I think the exported GPIO is used by some kind of userspace so ordinary DTS hogs will probably not work. Fixes: 92bf78b33b0b ("gpio: omap: use dynamic allocation of base") Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-05-24	ARM/mmc: Convert old mmci-omap to GPIO descriptors	Linus Walleij	1	-2/+0
	A recent change to the OMAP driver making it use a dynamic GPIO base created problems with some old OMAP1 board files, among them Nokia 770, SX1 and also the OMAP2 Nokia n8x0. Fix up all instances of GPIOs being used for the MMC driver by pushing the handling of power, slot selection and MMC "cover" into the driver as optional GPIOs. This is maybe not the most perfect solution as the MMC framework have some central handlers for some of the stuff, but it at least makes the situtation better and solves the immediate issue. Fixes: 92bf78b33b0b ("gpio: omap: use dynamic allocation of base") Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-05-24	Input: ads7846 - Convert to use software nodes	Linus Walleij	2	-4/+0
	The Nokia 770 is using GPIOs from the global numberspace on the CBUS node to pass down to the LCD controller. This regresses when we let the OMAP GPIO driver use dynamic GPIO base. The Nokia 770 now has dynamic allocation of IRQ numbers, so this needs to be fixed for it to work. As this is the only user of LCD MIPID we can easily augment the driver to use a GPIO descriptor instead and resolve the issue. The platform data .shutdown() callback wasn't even used in the code, but we encode a shutdown asserting RESET in the remove() callback for completeness sake. The CBUS also has the ADS7846 touchscreen attached. Populate the devices on the Nokia 770 CBUS I2C using software nodes instead of platform data quirks. This includes the LCD and the ADS7846 touchscreen so the conversion just brings the LCD along with it as software nodes is an all-or-nothing design pattern. The ADS7846 has some limited support for using GPIO descriptors, let's convert it over completely to using device properties and then fix all remaining boardfile users to provide all platform data using software nodes. Dump the of includes and of_match_ptr() in the ADS7846 driver as part of the job. Since we have to move ADS7846 over to obtaining the GPIOs it is using exclusively from descriptors, we provide descriptor tables for the two remaining in-kernel boardfiles using ADS7846: - PXA Spitz - MIPS Alchemy DB1000 development board It was too hard for me to include software node conversion of these two remaining users at this time: the spitz is using a hscync callback in the platform data that would require further GPIO descriptor conversion of the Spitz, and moving the hsync callback down into the driver: it will just become too big of a job, but it can be done separately. The MIPS Alchemy DB1000 is simply something I cannot test, so take the easier approach of just providing some GPIO descriptors in this case as I don't want the patch to grow too intrusive. As we see that several device trees have incorrect polarity flags and just expect to bypass the gpiolib polarity handling, fix up all device trees too, in a separate patch. Suggested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Fixes: 92bf78b33b0b ("gpio: omap: use dynamic allocation of base") Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-05-24	ARM/mfd/gpio: Fixup TPS65010 regression on OMAP1 OSK1	Linus Walleij	1	-7/+4
	Aaro reports problems on the OSK1 board after we altered the dynamic base for GPIO allocations. It appears this happens because the OMAP driver now allocates GPIO numbers dynamically, so all that is references by number is a bit up in the air. Let's bite the bullet and try to just move the gpio_chip in the tps65010 MFD driver over to using dynamic allocations. Alter everything in the OSK1 board file to use a GPIO descriptor table and lookups. Utilize the NULL device to define some board-specific GPIO lookups and use these to immediately look up the same GPIOs, convert to IRQ numbers and pass as resources to the devices. This is ugly but should work. The .setup() callback for tps65010 was used for some GPIO hogging, but since the OSK1 is the only user in the entire kernel we can alter the signatures to something that is helpful and make a clean transition. Fixes: 92bf78b33b0b ("gpio: omap: use dynamic allocation of base") Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: andy.shevchenko@gmail.com Cc: Andreas Kemnade <andreas@kemnade.info> Acked-by: Lee Jones <lee@kernel.org> Reviewed-by: Lee Jones <lee@kernel.org> Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-05-24	spi: Merge up v6.4-rc3	Mark Brown	13	-18/+43
	Merge up v6.4-rc3 in order to get fixes to improve the stability of my CI.
2023-05-24	regulator: Merge up v6.4-rc3	Mark Brown	13	-18/+43
	Merge up v6.4-rc3 in order to get fixes to improve the stability of my CI.
2023-05-24	genirq: Use hlist for managing resend handlers	Shanker Donthineni	1	-0/+3
	The current implementation utilizes a bitmap for managing interrupt resend handlers, which is allocated based on the SPARSE_IRQ/NR_IRQS macros. However, this method may not efficiently utilize memory during runtime, particularly when IRQ_BITMAP_BITS is large. Address this issue by using an hlist to manage interrupt resend handlers instead of relying on a static bitmap memory allocation. Additionally, a new function, clear_irq_resend(), is introduced and called from irq_shutdown to ensure a graceful teardown of the interrupt. Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230519134902.1495562-2-sdonthineni@nvidia.com
2023-05-24	net: phy: avoid kernel warning dump when stopping an errored PHY	Russell King (Oracle)	1	-2/+5
	When taking a network interface down (or removing a SFP module) after the PHY has encountered an error, phy_stop() complains incorrectly that it was called from HALTED state. The reason this is incorrect is that the network driver will have called phy_start() when the interface was brought up, and the fact that the PHY has a problem bears no relationship to the administrative state of the interface. Taking the interface administratively down (which calls phy_stop()) is always the right thing to do after a successful phy_start() call, whether or not the PHY has encountered an error. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-24	sysctl: Refactor base paths registrations	Joel Granados	1	-23/+0
	This is part of the general push to deprecate register_sysctl_paths and register_sysctl_table. The old way of doing this through register_sysctl_base and DECLARE_SYSCTL_BASE macro is replaced with a call to register_sysctl_init. The 5 base paths affected are: "kernel", "vm", "debug", "dev" and "fs". We remove the register_sysctl_base function and the DECLARE_SYSCTL_BASE macro since they are no longer needed. In order to quickly acertain that the paths did not actually change I executed `find /proc/sys/ \| sha1sum` and made sure that the sha was the same before and after the commit. We end up saving 563 bytes with this change: ./scripts/bloat-o-meter vmlinux.0.base vmlinux.1.refactor-base-paths add/remove: 0/5 grow/shrink: 2/0 up/down: 77/-640 (-563) Function old new delta sysctl_init_bases 55 111 +56 init_fs_sysctls 12 33 +21 vm_base_table 128 - -128 kernel_base_table 128 - -128 fs_base_table 128 - -128 dev_base_table 128 - -128 debug_base_table 128 - -128 Total: Before=21258215, After=21257652, chg -0.00% [mcgrof: modified to use register_sysctl_init() over register_sysctl() and add bloat-o-meter stats] Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Christian Brauner <brauner@kernel.org>
2023-05-24	sysctl: stop exporting register_sysctl_table	Joel Granados	1	-7/+1
	We make register_sysctl_table static because the only function calling it is in fs/proc/proc_sysctl.c (__register_sysctl_base). We remove it from the sysctl.h header and modify the documentation in both the header and proc_sysctl.c files to mention "register_sysctl" instead of "register_sysctl_table". This plus the commits that remove register_sysctl_table from parport save 217 bytes: ./scripts/bloat-o-meter .bsysctl/vmlinux.old .bsysctl/vmlinux.new add/remove: 0/1 grow/shrink: 5/1 up/down: 458/-675 (-217) Function old new delta __register_sysctl_base 8 286 +278 parport_proc_register 268 379 +111 parport_device_proc_register 195 247 +52 kzalloc.constprop 598 608 +10 parport_default_proc_register 62 69 +7 register_sysctl_table 291 - -291 parport_sysctl_template 1288 904 -384 Total: Before=8603076, After=8602859, chg -0.00% Signed-off-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-05-24	parport: Move magic number "15" to a define	Joel Granados	1	-0/+2
	Put the size of a parport name behind a define so we can use it in other files. This is a preparation patch to be able to use this size in parport/procfs.c. Signed-off-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-05-24	net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES	David Howells	1	-0/+3
	Add a function to handle MSG_SPLICE_PAGES being passed internally to sendmsg(). Pages are spliced into the given socket buffer if possible and copied in if not (e.g. they're slab pages or have a zero refcount). Signed-off-by: David Howells <dhowells@redhat.com> cc: David Ahern <dsahern@kernel.org> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24	net: Pass max frags into skb_append_pagefrags()	David Howells	1	-1/+1
	Pass the maximum number of fragments into skb_append_pagefrags() rather than using MAX_SKB_FRAGS so that it can be used from code that wants to specify sysctl_max_skb_frags. Signed-off-by: David Howells <dhowells@redhat.com> cc: David Ahern <dsahern@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24	net: Declare MSG_SPLICE_PAGES internal sendmsg() flag	David Howells	1	-0/+3
	Declare MSG_SPLICE_PAGES, an internal sendmsg() flag, that hints to a network protocol that it should splice pages from the source iterator rather than copying the data if it can. This flag is added to a list that is cleared by sendmsg syscalls on entry. This is intended as a replacement for the ->sendpage() op, allowing a way to splice in several multipage folios in one go. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24	tracing: Rename stacktrace field to common_stacktrace	Steven Rostedt (Google)	1	-0/+1
	The histogram and synthetic events can use a pseudo event called "stacktrace" that will create a stacktrace at the time of the event and use it just like it was a normal field. We have other pseudo events such as "common_cpu" and "common_timestamp". To stay consistent with that, convert "stacktrace" to "common_stacktrace". As this was used in older kernels, to keep backward compatibility, this will act just like "common_cpu" did with "cpu". That is, "cpu" will be the same as "common_cpu" unless the event has a "cpu" field. In which case, the event's field is used. The same is true with "stacktrace". Also update the documentation to reflect this change. Link: https://lore.kernel.org/linux-trace-kernel/20230523230913.6860e28d@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-05-24	tracing/user_events: Document user_event_mm one-shot list usage	Beau Belgrave	1	-0/+1
	During 6.4 development it became clear that the one-shot list used by the user_event_mm's next field was confusing to others. It is not clear how this list is protected or what the next field usage is for unless you are familiar with the code. Add comments into the user_event_mm struct indicating lock requirement and usage. Also document how and why this approach was used via comments in both user_event_enabler_update() and user_event_mm_get_all() and the rules to properly use it. Link: https://lkml.kernel.org/r/20230519230741.669-5-beaub@linux.microsoft.com Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wicngggxVpbnrYHjRTwGE0WYscPRM+L2HO2BF8ia1EXgQ@mail.gmail.com/ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>