summaryrefslogtreecommitdiff
path: root/Documentation/admin-guide
AgeCommit message (Collapse)AuthorFilesLines
2024-07-12MIPS: Implement ieee754 NAN2008 emulation modeJiaxun Yang1-1/+3
Implement ieee754 NAN2008 emulation mode. When this mode is enabled, kernel will accept ELF file compiled for both NaN 2008 and NaN legacy, but if hardware does not have capability to match ELF's NaN mode, __own_fpu will fail for corresponding thread and fpuemu will then kick in. This mode trade performance for correctness, while maintaining support for both NaN mode regardless of hardware capability. It is useful for multilib installation that have both types of binary exist in system. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2024-07-11x86/xen: remove deprecated xen_nopvspin boot parameterJuergen Gross1-5/+0
The xen_nopvspin boot parameter is deprecated since 2019. nopvspin can be used instead. Remove the xen_nopvspin boot parameter and replace the xen_pvspin variable use cases with nopvspin. This requires to move the nopvspin variable out of the .initdata section, as it needs to be accessed for cpuhotplug, too. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Message-ID: <20240710110139.22300-1-jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-07-11xen: make multicall debug boot time selectableJuergen Gross1-0/+6
Today Xen multicall debugging needs to be enabled via modifying a define in a source file for getting debug data of multicall errors encountered by users. Switch multicall debugging to depend on a boot parameter "xen_mc_debug" instead, enabling affected users to boot with the new parameter set in order to get better diagnostics. With debugging enabled print the following information in case at least one of the batched calls failed: - all calls of the batch with operation, result and caller - all parameters of each call - all parameters stored in the multicall data for each call Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Message-ID: <20240710092749.13595-1-jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-07-11Merge branch 'sched/urgent' into sched/core, to pick up fixes and refresh ↵Ingo Molnar3-32/+35
the branch Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-07-11platform/x86/amd/pmf: Remove update system state documentShyam Sundar S K1-24/+0
This commit removes the "pmf.rst" document, which was associated with the PMF driver that enabled system state updates based on TA output actions. The driver now uses existing input events (KEY_SCREENLOCK, KEY_SLEEP, and KEY_SUSPEND) instead of defining new udev rules in the "/etc/udev/rules.d/" directory. Consequently, the pmf.rst document is no longer necessary. Therefore, the pmf.rst documentation is being removed. Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Link: https://lore.kernel.org/r/20240711052047.1531957-2-Shyam-sundar.S-k@amd.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-07-10Docs/admin-guide/mm/damon/start: add access pattern snapshot exampleSeongJae Park1-4/+42
DAMON user-space tool (damo) provides access pattern snapshot feature, which is expected to be frequently used for real time access pattern analysis. The snapshot output is also showing what DAMON provides on its own, including the 'age' information. In contrast, the recorded access patterns, which is shown as an example usage on the quick start section, shows what users can make from what DAMON provided. It includes information that generated outside of DAMON and makes the 'age' concept bit unclear. Hence snapshot output is easier at understanding the raw realtime output of DAMON. Add the snapshot usage example on the quick start section. Link: https://lkml.kernel.org/r/20240701192706.51415-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-10dm-crypt: limit the size of encryption requestsMikulas Patocka1-0/+11
There was a performance regression reported where dm-crypt would perform worse on new kernels than on old kernels. The reason is that the old kernels split the bios to NVMe request size (that is usually 65536 or 131072 bytes) and the new kernels pass the big bios through dm-crypt and split them underneath. If a big 1MiB bio is passed to dm-crypt, dm-crypt processes it on a single core without parallelization and this is what causes the performance degradation. This commit introduces new tunable variables /sys/module/dm_crypt/parameters/max_read_size and /sys/module/dm_crypt/parameters/max_write_size that specify the maximum bio size for dm-crypt. Bios larger than this value are split, so that they can be encrypted in parallel by multiple cores. If these variables are '0', a default 131072 is used. Splitting bios may cause performance regressions in other workloads - if this happens, the user should increase the value in max_read_size and max_write_size variables. max_read_size: 128k 2399MiB/s 256k 2368MiB/s 512k 1986MiB/s 1024 1790MiB/s max_write_size: 128k 1712MiB/s 256k 1651MiB/s 512k 1537MiB/s 1024k 1332MiB/s Note that if you run dm-crypt inside a virtual machine, you may need to do "echo numa >/sys/module/workqueue/parameters/default_affinity_scope" to improve performance. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: Laurence Oberman <loberman@redhat.com>
2024-07-10Merge back cpufreq material for 6.11.Rafael J. Wysocki2-1/+21
2024-07-09Documentation: add reference from dynamic debug to loglevel kernel paramsDaniel Lublin1-0/+5
This is useful information for somebody who has managed to dig into enabling debug output, but is wondering why there is no such output appearing on the console. Signed-off-by: Daniel Lublin <daniel@lublin.se> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/4633bdb82c1c7c014d79840887878624a55c59f8.1720533043.git.daniel@lublin.se
2024-07-09LoongArch: KVM: Add PV steal time support in guest sideBibo Mao1-3/+3
Per-cpu struct kvm_steal_time is added here, its size is 64 bytes and also defined as 64 bytes, so that the whole structure is in one physical page. When a VCPU is online, function pv_enable_steal_time() is called. This function will pass guest physical address of struct kvm_steal_time and tells hypervisor to enable steal time. When a vcpu is offline, physical address is set as 0 and tells hypervisor to disable steal time. Here is an output of vmstat on guest when there is workload on both host and guest. It shows steal time stat information. procs -----------memory---------- -----io---- -system-- ------cpu----- r b swpd free inact active bi bo in cs us sy id wa st 15 1 0 7583616 184112 72208 20 0 162 52 31 6 43 0 20 17 0 0 7583616 184704 72192 0 0 6318 6885 5 60 8 5 22 16 0 0 7583616 185392 72144 0 0 1766 1081 0 49 0 1 50 16 0 0 7583616 184816 72304 0 0 6300 6166 4 62 12 2 20 18 0 0 7583632 184480 72240 0 0 2814 1754 2 58 4 1 35 Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-09gpio: virtuser: new virtual testing driver for the GPIO APIBartosz Golaszewski2-0/+178
The GPIO subsystem used to have a serious problem with undefined behavior and use-after-free bugs on hot-unplug of GPIO chips. This can be considered a corner-case by some as most GPIO controllers are enabled early in the boot process and live until the system goes down but most GPIO drivers do allow unbind over sysfs, many are loadable modules that can be (force) unloaded and there are also GPIO devices that can be dynamically detached, for instance CP2112 which is a USB GPIO expender. Bugs can be triggered both from user-space as well as by in-kernel users. We have the means of testing it from user-space via the character device but the issues manifest themselves differently in the kernel. This is a proposition of adding a new virtual driver - a configurable GPIO consumer that can be configured over configfs (similarly to gpio-sim) or described on the device-tree. This driver is aimed as a helper in spotting any regressions in hot-unplug handling in GPIOLIB. Link: https://lore.kernel.org/r/20240708142912.120570-1-brgl@bgdev.pl Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-07-05docs: mm: add enable_soft_offline sysctlJiaqi Yan1-0/+38
Add the documentation for soft offline behaviors / costs, and what the new enable_soft_offline sysctl is for. [jiaqiyan@google.com: fix kerneldoc warnings] Link: https://lkml.kernel.org/r/CACw3F52=GxTCDw-PqFh3-GDM-fo3GbhGdu0hedxYXOTT4TQSTg@mail.gmail.com [jiaqiyan@google.com: there are more blank lines needed] Link: https://lkml.kernel.org/r/CACw3F52_obAB742XeDRNun4BHBYtrxtbvp5NkUincXdaob0j1g@mail.gmail.com Link: https://lkml.kernel.org/r/20240626050818.2277273-5-jiaqiyan@google.com Signed-off-by: Jiaqi Yan <jiaqiyan@google.com> Acked-by: Oscar Salvador <osalvador@suse.de> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Frank van der Linden <fvdl@google.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <ioworker0@gmail.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-05mm: add swappiness= arg to memory.reclaimDan Schatzberg1-7/+11
Allow proactive reclaimers to submit an additional swappiness=<val> argument to memory.reclaim. This overrides the global or per-memcg swappiness setting for that reclaim attempt. For example: echo "2M swappiness=0" > /sys/fs/cgroup/memory.reclaim will perform reclaim on the rootcg with a swappiness setting of 0 (no swap) regardless of the vm.swappiness sysctl setting. Userspace proactive reclaimers use the memory.reclaim interface to trigger reclaim. The memory.reclaim interface does not allow for any way to effect the balance of file vs anon during proactive reclaim. The only approach is to adjust the vm.swappiness setting. However, there are a few reasons we look to control the balance of file vs anon during proactive reclaim, separately from reactive reclaim: * Swapout should be limited to manage SSD write endurance. In near-OOM situations we are fine with lots of swap-out to avoid OOMs. As these are typically rare events, they have relatively little impact on write endurance. However, proactive reclaim runs continuously and so its impact on SSD write endurance is more significant. Therefore it is desireable to control swap-out for proactive reclaim separately from reactive reclaim * Some userspace OOM killers like systemd-oomd[1] support OOM killing on swap exhaustion. This makes sense if the swap exhaustion is triggered due to reactive reclaim but less so if it is triggered due to proactive reclaim (e.g. one could see OOMs when free memory is ample but anon is just particularly cold). Therefore, it's desireable to have proactive reclaim reduce or stop swap-out before the threshold at which OOM killing occurs. In the case of Meta's Senpai proactive reclaimer, we adjust vm.swappiness before writes to memory.reclaim[2]. This has been in production for nearly two years and has addressed our needs to control proactive vs reactive reclaim behavior but is still not ideal for a number of reasons: * vm.swappiness is a global setting, adjusting it can race/interfere with other system administration that wishes to control vm.swappiness. In our case, we need to disable Senpai before adjusting vm.swappiness. * vm.swappiness is stateful - so a crash or restart of Senpai can leave a misconfigured setting. This requires some additional management to record the "desired" setting and ensure Senpai always adjusts to it. With this patch, we avoid these downsides of adjusting vm.swappiness globally. [1]https://www.freedesktop.org/software/systemd/man/latest/systemd-oomd.service.html [2]https://github.com/facebookincubator/oomd/blob/main/src/oomd/plugins/Senpai.cpp#L585-L598 Link: https://lkml.kernel.org/r/20240103164841.2800183-3-schatzberg.dan@gmail.com Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com> Suggested-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Chris Li <chrisl@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Yue Zhao <findns94@gmail.com> Cc: Zefan Li <lizefan.x@bytedance.com> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04rcu: Add rcutree.nohz_full_patience_delay to reduce nohz_full OS jitterPaul E. McKenney1-0/+8
If a CPU is running either a userspace application or a guest OS in nohz_full mode, it is possible for a system call to occur just as an RCU grace period is starting. If that CPU also has the scheduling-clock tick enabled for any reason (such as a second runnable task), and if the system was booted with rcutree.use_softirq=0, then RCU can add insult to injury by awakening that CPU's rcuc kthread, resulting in yet another task and yet more OS jitter due to switching to that task, running it, and switching back. In addition, in the common case where that system call is not of excessively long duration, awakening the rcuc task is pointless. This pointlessness is due to the fact that the CPU will enter an extended quiescent state upon returning to the userspace application or guest OS. In this case, the rcuc kthread cannot do anything that the main RCU grace-period kthread cannot do on its behalf, at least if it is given a few additional milliseconds (for example, given the time duration specified by rcutree.jiffies_till_first_fqs, give or take scheduling delays). This commit therefore adds a rcutree.nohz_full_patience_delay kernel boot parameter that specifies the grace period age (in milliseconds, rounded to jiffies) before which RCU will refrain from awakening the rcuc kthread. Preliminary experimentation suggests a value of 1000, that is, one second. Increasing rcutree.nohz_full_patience_delay will increase grace-period latency and in turn increase memory footprint, so systems with constrained memory might choose a smaller value. Systems with less-aggressive OS-jitter requirements might choose the default value of zero, which keeps the traditional immediate-wakeup behavior, thus avoiding increases in grace-period latency. [ paulmck: Apply Leonardo Bras feedback. ] Link: https://lore.kernel.org/all/20240328171949.743211-1-leobras@redhat.com/ Reported-by: Leonardo Bras <leobras@redhat.com> Suggested-by: Leonardo Bras <leobras@redhat.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Leonardo Bras <leobras@redhat.com>
2024-07-04ACPI: Add acpi=nospcr to disable ACPI SPCR as default console on ARM64Liu Wei1-3/+7
For varying privacy and security reasons, sometimes we would like to completely silence the _serial_ console, and only enable it when needed. But there are many existing systems that depend on this _serial_ console, so add acpi=nospcr to disable console in ACPI SPCR table as default _serial_ console. Signed-off-by: Liu Wei <liuwei09@cestc.cn> Suggested-by: Prarit Bhargava <prarit@redhat.com> Suggested-by: Will Deacon <will@kernel.org> Suggested-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Reviewed-by: Prarit Bhargava <prarit@redhat.com> Link: https://lore.kernel.org/r/20240625030504.58025-1-liuwei09@cestc.cn Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-07-04Documentation: kernel-parameters: Add DEVNAME:0.0 format for serial portsTony Lindgren1-0/+19
Document the console option for DEVNAME:0.0 style addressing for serial ports. Suggested-by: Sebastian Reichel <sre@kernel.org> Signed-off-by: Tony Lindgren <tony@atomide.com> Reviewed-by: Dhruva Gole <d-gole@ti.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240703100615.118762-4-tony.lindgren@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-04vmalloc: modify the alloc_vmap_area() error message for better diagnosticsShubhang Kaushik OS1-3/+6
'vmap allocation for size %lu failed: use vmalloc=<size> to increase size' The above warning is seen in the kernel functionality for allocation of the restricted virtual memory range till exhaustion. This message is misleading because 'vmalloc=' is supported on arm32, x86 platforms and is not a valid kernel parameter on a number of other platforms (in particular its not supported on arm64, alpha, loongarch, arc, csky, hexagon, microblaze, mips, nios2, openrisc, parisc, m64k, powerpc, riscv, sh, um, xtensa, s390, sparc). With the update, the output gets modified to include the function parameters along with the start and end of the virtual memory range allowed. The warning message after fix on kernel version 6.10.0-rc1+: vmalloc_node_range for size 33619968 failed: Address range restricted between 0xffff800082640000 - 0xffff800084650000 Backtrace with the misleading error message: vmap allocation for size 33619968 failed: use vmalloc=<size> to increase size insmod: vmalloc error: size 33554432, vm_struct allocation failed, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0 CPU: 46 PID: 1977 Comm: insmod Tainted: G E 6.10.0-rc1+ #79 Hardware name: INGRASYS Yushan Server iSystem TEMP-S000141176+10/Yushan Motherboard, BIOS 2.10.20230517 (SCP: xxx) yyyy/mm/dd Call trace: dump_backtrace+0xa0/0x128 show_stack+0x20/0x38 dump_stack_lvl+0x78/0x90 dump_stack+0x18/0x28 warn_alloc+0x12c/0x1b8 __vmalloc_node_range_noprof+0x28c/0x7e0 custom_init+0xb4/0xfff8 [test_driver] do_one_initcall+0x60/0x290 do_init_module+0x68/0x250 load_module+0x236c/0x2428 init_module_from_file+0x8c/0xd8 __arm64_sys_finit_module+0x1b4/0x388 invoke_syscall+0x78/0x108 el0_svc_common.constprop.0+0x48/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x3c/0x130 el0t_64_sync_handler+0x100/0x130 el0t_64_sync+0x190/0x198 [Shubhang@os.amperecomputing.com: v5] Link: https://lkml.kernel.org/r/CH2PR01MB5894B0182EA0B28DF2EFB916F5C72@CH2PR01MB5894.prod.exchangelabs.com Link: https://lkml.kernel.org/r/MN2PR01MB59025CC02D1D29516527A693F5C62@MN2PR01MB5902.prod.exchangelabs.com Signed-off-by: Shubhang Kaushik <shubhang@os.amperecomputing.com> Reviewed-by: Christoph Lameter (Ampere) <cl@linux.com> Cc: Christoph Lameter <cl@linux.com> Cc: Guo Ren <guoren@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Xiongwei Song <xiongwei.song@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04Docs/damon: document damos_migrate_{hot,cold}Honggyu Kim1-3/+7
This patch adds damon description for "migrate_hot" and "migrate_cold" actions for both usage and design documents as long as a new "target_nid" knob to set the migration target node. [sj@kernel.org: trivial fixups for DAMOS_MIGRATE_{HOT,COLD} documentation] Link: https://lkml.kernel.org/r/20240618213630.84846-2-sj@kernel.org Link: https://lkml.kernel.org/r/20240614030010.751-8-honggyu.kim@sk.com Signed-off-by: Honggyu Kim <honggyu.kim@sk.com> Signed-off-by: SeongJae Park <sj@kernel.org>Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Gregory Price <gregory.price@memverge.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Hyeongtak Ji <hyeongtak.ji@sk.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04Documentation/admin-guide/mm/pagemap.rst: drop "Using pagemap to do ↵David Hildenbrand1-21/+0
something useful" That example was added in 2008. In 2015, we restricted access to the PFNs in the pagemap to CAP_SYS_ADMIN, making that approach quite less usable. It's 2024 now, and using that racy and low-lewel mechanism to calculate the USS should not be considered a good example anymore. /proc/$pid/smaps and /proc/$pid/smaps_rollup can do a much better job without any of that low-level handling. Let's just drop that example. Link: https://lkml.kernel.org/r/20240607122357.115423-7-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Oscar Salvador <osalvador@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04mm: shmem: add mTHP counters for anonymous shmemBaolin Wang1-0/+13
Add mTHP counters for anonymous shmem. [baolin.wang@linux.alibaba.com: update Documentation/admin-guide/mm/transhuge.rst] Link: https://lkml.kernel.org/r/d86e2e7f-4141-432b-b2ba-c6691f36ef0b@linux.alibaba.com Link: https://lkml.kernel.org/r/4fd9e467d49ae4a747e428bcd821c7d13125ae67.1718090413.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Lance Yang <ioworker0@gmail.com> Cc: Barry Song <v-songbaohua@oppo.com> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04mm: shmem: add multi-size THP sysfs interface for anonymous shmemBaolin Wang1-0/+25
To support the use of mTHP with anonymous shmem, add a new sysfs interface 'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-kB/' directory for each mTHP to control whether shmem is enabled for that mTHP, with a value similar to the top level 'shmem_enabled', which can be set to: "always", "inherit (to inherit the top level setting)", "within_size", "advise", "never". An 'inherit' option is added to ensure compatibility with these global settings, and the options 'force' and 'deny' are dropped, which are rather testing artifacts from the old ages. By default, PMD-sized hugepages have enabled="inherit" and all other hugepage sizes have enabled="never" for '/sys/kernel/mm/transparent_hugepage/hugepages-xxkB/shmem_enabled'. In addition, if top level value is 'force', then only PMD-sized hugepages have enabled="inherit", otherwise configuration will be failed and vice versa. That means now we will avoid using non-PMD sized THP to override the global huge allocation. [baolin.wang@linux.alibaba.com: fix transhuge.rst indentation] Link: https://lkml.kernel.org/r/b189d815-998b-4dfd-ba89-218ff51313f8@linux.alibaba.com [akpm@linux-foundation.org: reflow transhuge.rst addition to 80 cols] [baolin.wang@linux.alibaba.com: move huge_shmem_orders_lock under CONFIG_SYSFS] Link: https://lkml.kernel.org/r/eb34da66-7f12-44f3-a39e-2bcc90c33354@linux.alibaba.com [akpm@linux-foundation.org: huge_memory.c needs mm_types.h] Link: https://lkml.kernel.org/r/ffddfa8b3cb4266ff963099ab78cfd7184c57ac7.1718090413.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <v-songbaohua@oppo.com> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04docs/admin-guide/mm: correct typo 'quired' to 'queried'Daniel Watson1-1/+1
Convert the word "quired" to the word "queried" which makes more sense in this context. Signed-off-by: Daniel Watson <ozzloy@each.do> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/878qymrjrg.fsf@trent-reznor
2024-07-03cgroup/misc: Introduce misc.peakXiu Jianfeng1-0/+9
Introduce misc.peak to record the historical maximum usage of the resource, as in some scenarios the value of misc.max could be adjusted based on the peak usage of the resource. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-02cpufreq: docs: Add missing scaling_available_frequencies descriptionRaphael Gallais-Pou1-0/+4
Add a description of the scaling_available_frequencies attribute in sysfs to the documentation. Signed-off-by: Raphael Gallais-Pou <rgallaispou@gmail.com> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://patch.msgid.link/20240701171040.369030-1-rgallaispou@gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-02x86/efi: Drop support for fake EFI memory mapsArd Biesheuvel1-21/+0
Between kexec and confidential VM support, handling the EFI memory maps correctly on x86 is already proving to be rather difficult (as opposed to other EFI architectures which manage to never modify the EFI memory map to begin with) EFI fake memory map support is essentially a development hack (for testing new support for the 'special purpose' and 'more reliable' EFI memory attributes) that leaked into production code. The regions marked in this manner are not actually recognized as such by the firmware itself or the EFI stub (and never have), and marking memory as 'more reliable' seems rather futile if the underlying memory is just ordinary RAM. Marking memory as 'special purpose' in this way is also dubious, but may be in use in production code nonetheless. However, the same should be achievable by using the memmap= command line option with the ! operator. EFI fake memmap support is not enabled by any of the major distros (Debian, Fedora, SUSE, Ubuntu) and does not exist on other architectures, so let's drop support for it. Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-07-01Merge 6.10-rc6 into usb-nextGreg Kroah-Hartman1-25/+0
We need the USB fixes in here as well for some follow-on patches. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-30Merge tag 'tty-6.10-rc6' of ↵Linus Torvalds1-19/+0
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty / serial / console fixes from Greg KH: "Here are a bunch of fixes/reverts for 6.10-rc6. Include in here are: - revert the bunch of tty/serial/console changes that landed in -rc1 that didn't quite work properly yet. Everyone agreed to just revert them for now and will work on making them better for a future release instead of trying to quick fix the existing changes this late in the release cycle - 8250 driver port count bugfix - Other tiny serial port bugfixes for reported issues All of these have been in linux-next this week with no reported issues" * tag 'tty-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: Revert "printk: Save console options for add_preferred_console_match()" Revert "printk: Don't try to parse DEVNAME:0.0 console options" Revert "printk: Flag register_console() if console is set on command line" Revert "serial: core: Add support for DEVNAME:0.0 style naming for kernel console" Revert "serial: core: Handle serial console options" Revert "serial: 8250: Add preferred console in serial8250_isa_init_ports()" Revert "Documentation: kernel-parameters: Add DEVNAME:0.0 format for serial ports" Revert "serial: 8250: Fix add preferred console for serial8250_isa_init_ports()" Revert "serial: core: Fix ifdef for serial base console functions" serial: bcm63xx-uart: fix tx after conversion to uart_port_tx_limited() serial: core: introduce uart_port_tx_limited_flags() Revert "serial: core: only stop transmit when HW fifo is empty" serial: imx: set receiver level before starting uart tty: mcf: MCF54418 has 10 UARTS serial: 8250_omap: Implementation of Errata i2310 tty: serial: 8250: Fix port count mismatch with the device
2024-06-29media: em28xx: Add support for MyGica UTV3Nils Rothaug1-0/+8
The MyGica UTV3 Analog USB2.0 TV Box is a USB video capture card that has analog TV, composite video, and FM radio inputs, an IR remote, and provides audio only as Line Out, but not over USB. Mine is prepared for an FM tuner, but not equipped with one. Support for FM radio is therefore missing. The device contains: - Empia EM2860 USB bridge - Philips SAA7113 video decoder - NXP TDA9801T demodulator - Tena TNF931D-DFDR1 tuner - ST HCF4052 demux, switches input audio to Line Out Signed-off-by: Nils Rothaug <nils.rothaug@gmx.de> Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-06-29media: tuner-simple: Add support for Tena TNF931D-DFDR1Nils Rothaug1-0/+2
Tuner ranges were determined by USB capturing the vendor driver of a MyGica UTV3 video capture card. Signed-off-by: Nils Rothaug <nils.rothaug@gmx.de> Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-06-28x86/bugs: Add 'spectre_bhi=vmexit' cmdline optionJosh Poimboeuf1-3/+9
In cloud environments it can be useful to *only* enable the vmexit mitigation and leave syscalls vulnerable. Add that as an option. This is similar to the old spectre_bhi=auto option which was removed with the following commit: 36d4fe147c87 ("x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto") with the main difference being that this has a more descriptive name and is disabled by default. Mitigation switch requested by Maksim Davydov <davydov-max@yandex-team.ru>. [ bp: Massage. ] Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/2cbad706a6d5e1da2829e5e123d8d5c80330148c.1719381528.git.jpoimboe@kernel.org
2024-06-28x86/bugs: Remove duplicate Spectre cmdline option descriptionsJosh Poimboeuf1-76/+10
Duplicating the documentation of all the Spectre kernel cmdline options in two separate files is unwieldy and error-prone. Instead just add a reference to kernel-parameters.txt from spectre.rst. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Link: https://lore.kernel.org/r/450b5f4ffe891a8cc9736ec52b0c6f225bab3f4b.1719381528.git.jpoimboe@kernel.org
2024-06-28documentation: media: vivid: Update documentation on vivid loopback supportDorcas Anono Litunya1-23/+29
Modify section "Video and Sliced VBI Looping" in Documentation to explain the vivid loopback support for video across multiple vivid instances. Previous documentation is out-of-date as it was explaining looping in a single vivid instance only. Also, in "Some Future Improvements" the item "Add support to loop from a specific output to a specific input across vivid instances" can be dropped since that's now implemented. Signed-off-by: Dorcas Anono Litunya <anonolitunya@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-06-28media: vivid: loopback based on 'Connected To' controlsHans Verkuil1-5/+0
Instead of using hardwired video loopback limited to a single vivid instance, use the new 'Connected To' controls to only loopback if an HDMI or S-Video input is connected to another output, which can be in another vivid instance. Effectively this emulates connecting and disconnecting an HDMI/S-Video cable. The Loop Video control is dropped since it has now been replaced by the new 'Connected To' controls. The Display Present has also been dropped since it no longer fits. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Co-developed-by: Dorcas Anono Litunya <anonolitunya@gmail.com> Signed-off-by: Dorcas Anono Litunya <anonolitunya@gmail.com>
2024-06-28media: Documentation: vivid.rst: Remove documentation for Capture OverlayDorcas Anono Litunya1-68/+0
Modifying documentation to remove 'Capture Overlay section' as destructive capture overlay support was removed. See commit ccaa9d50ca73 ("media: vivid: drop overlay support") Signed-off-by: Dorcas Anono Litunya <anonolitunya@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-06-28media: Documentation: vivid.rst: add supports_requestsHans Verkuil1-0/+9
The module option supports_requests was not documented, add it. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-06-28media: Documentation: vivid.rst: drop "Video, VBI and RDS Looping"Hans Verkuil1-15/+17
Drop the "Video, VBI and RDS Looping" section, instead moving the Video/VBI info to section "Video and Sliced VBI looping" and the RDS info to section "Radio & RDS Looping". Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-06-28media: Documentation: vivid.rst: fix confusing section refsHans Verkuil1-20/+21
The documentation contained several instances of "section X" references, which no longer map to whatever X was. Replace these by the section titles. Also fix a single confusing typo in the "Radio & RDS Looping" section: "are regular frequency intervals" -> "at regular frequency intervals" Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-06-27Merge tag 'amd-pstate-v6.11-2024-06-26' of ↵Rafael J. Wysocki1-0/+16
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux Merge more amd-pstate driver updates for 6.11 from Mario Limonciello: "Add support for amd-pstate core performance boost support which allows controlling which CPU cores can operate above nominal frequencies for short periods of time." * tag 'amd-pstate-v6.11-2024-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux: Documentation: cpufreq: amd-pstate: update doc for Per CPU boost control method cpufreq: amd-pstate: Cap the CPPC.max_perf to nominal_perf if CPB is off cpufreq: amd-pstate: initialize core precision boost state cpufreq: acpi: move MSR_K7_HWCR_CPB_DIS_BIT into msr-index.h
2024-06-27Merge back cpufreq material for v6.11.Rafael J. Wysocki1-1/+1
2024-06-27Documentation: Remove IA-64 from kernel-parametersThomas Huth2-43/+3
IA-64 has been removed from the tree, so we should also remove the corresponding kernel-parameters documentation now. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240627162458.387700-1-thuth@redhat.com
2024-06-27media: admin-guide: Document the Raspberry Pi PiSP BEJacopo Mondi3-0/+130
Add documentation for the PiSP Back End memory-to-memory ISP. Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-06-27docs: verify/bisect: Fix rendered version URLDiederik de Haas1-1/+1
When rendering the documentation, the 'html' file extension replaces the 'rst' file extension, not add it. So remove the 'rst' part of the URL. Signed-off-by: Diederik de Haas <didi.debian@cknow.org> Reviewed-by: Thorsten Leemhuis <linux@leemhuis.info> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240620081355.11549-1-didi.debian@cknow.org
2024-06-26Documentation: cpufreq: amd-pstate: update doc for Per CPU boost control methodPerry Yuan1-0/+16
Updates the documentation in `amd-pstate.rst` to include information about the per CPU boost control feature. Users can now enable or disable the Core Performance Boost (CPB) feature on individual CPUs using the `boost` sysfs attribute. Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Perry Yuan <perry.yuan@amd.com> Co-developed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20240626042733.3747-5-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-06-25Revert "Documentation: kernel-parameters: Add DEVNAME:0.0 format for serial ↵Greg Kroah-Hartman1-19/+0
ports" This reverts commit 5c3a766e9f057ee7a54b5d7addff7fab02676fea. Let's roll back all of the serial core and printk console changes that went into 6.10-rc1 as there still are problems with them that need to be sorted out. Link: https://lore.kernel.org/r/ZnpRozsdw6zbjqze@tlindgre-MOBL1 Reported-by: Petr Mladek <pmladek@suse.com> Reported-by: Tony Lindgren <tony@atomide.com> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-24media: Documentation: ipu6: Fix examples in ipu6-isys admin-guideSamuel Wein1-7/+7
Fix flags in X1 Yoga example. MEDIA_LNK_FL_DYNAMIC (0x4 in the link flag) was removed in V4 Intel IPU6 and IPU6 input system drivers. Added -V flag to media-ctl commands for X1 Yoga, lower-case v only makes it verbose upper-case V sets the format. Signed-off-by: Samuel Wein <sam@samwein.com> [Sakari Ailus: Align subject line, rewrap commit message.] Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-06-21Documentation: PM: amd-pstate: add guided mode to the Operation modePerry Yuan1-1/+1
the guided mode is also supported, so the operation mode should include that mode as well. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Perry Yuan <perry.yuan@amd.com> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Link: https://lore.kernel.org/r/a61d825ef71f6aacc8f1624fe9fb982b8446b5a7.1718811234.git.perry.yuan@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-06-19cgroup/cpuset: Make cpuset.cpus.exclusive independent of cpuset.cpusWaiman Long1-2/+2
The "cpuset.cpus.exclusive.effective" value is currently limited to a subset of its "cpuset.cpus". This makes the exclusive CPUs distribution hierarchy subsumed within the larger "cpuset.cpus" hierarchy. We have to decide on what CPUs are used locally and what CPUs can be passed down as exclusive CPUs down the hierarchy and combine them into "cpuset.cpus". The advantage of the current scheme is to have only one hierarchy to worry about. However, it make it harder to use as all the "cpuset.cpus" values have to be properly set along the way down to the designated remote partition root. It also makes it more cumbersome to find out what CPUs can be used locally. Make creation of remote partition simpler by breaking the dependency of "cpuset.cpus.exclusive" on "cpuset.cpus" and make them independent entities. Now we have two separate hierarchies - one for setting "cpuset.cpus.effective" and the other one for setting "cpuset.cpus.exclusive.effective". We may not need to set "cpuset.cpus" when we activate a partition root anymore. Also update Documentation/admin-guide/cgroup-v2.rst and cpuset.c comment to document this change. Suggested-by: Petr Malat <oss@malat.biz> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-06-19cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE until valid partitionWaiman Long1-2/+6
The CS_CPU_EXCLUSIVE flag is currently set whenever cpuset.cpus.exclusive is set to make sure that the exclusivity test will be run to ensure its exclusiveness. At the same time, this flag can be changed whenever the partition root state is changed. For example, the CS_CPU_EXCLUSIVE flag will be reset whenever a partition root becomes invalid. This makes using CS_CPU_EXCLUSIVE to ensure exclusiveness a bit fragile. The current scheme also makes setting up a cpuset.cpus.exclusive hierarchy to enable remote partition harder as cpuset.cpus.exclusive cannot overlap with any cpuset.cpus of sibling cpusets if their cpuset.cpus.exclusive aren't set. Solve these issues by deferring the setting of CS_CPU_EXCLUSIVE flag until the cpuset become a valid partition root while adding new checks in validate_change() to ensure that cpuset.cpus.exclusive of sibling cpusets cannot overlap. An additional check is also added to validate_change() to make sure that cpuset.cpus of one cpuset cannot be a subset of cpuset.cpus.exclusive of a sibling cpuset to avoid the problem that none of those CPUs will be available when these exclusive CPUs are extracted out to a newly enabled partition root. The Documentation/admin-guide/cgroup-v2.rst file is updated to document the new constraints. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-06-19pstore/ramoops: Add ramoops.mem_name= command line optionSteven Rostedt (Google)1-0/+13
Add a method to find a region specified by reserve_mem=nn:align:name for ramoops. Adding a kernel command line parameter: reserve_mem=12M:4096:oops ramoops.mem_name=oops Will use the size and location defined by the memmap parameter where it finds the memory and labels it "oops". The "oops" in the ramoops option is used to search for it. This allows for arbitrary RAM to be used for ramoops if it is known that the memory is not cleared on kernel crashes or soft reboots. Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20240613155527.591647061@goodmis.org Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-06-19mm/memblock: Add "reserve_mem" to reserved named memory at boot upSteven Rostedt (Google)1-0/+22
In order to allow for requesting a memory region that can be used for things like pstore on multiple machines where the memory layout is not the same, add a new option to the kernel command line called "reserve_mem". The format is: reserve_mem=nn:align:name Where it will find nn amount of memory at the given alignment of align. The name field is to allow another subsystem to retrieve where the memory was found. For example: reserve_mem=12M:4096:oops ramoops.mem_name=oops Where ramoops.mem_name will tell ramoops that memory was reserved for it via the reserve_mem option and it can find it by calling: if (reserve_mem_find_by_name("oops", &start, &size)) { // start holds the start address and size holds the size given This is typically used for systems that do not wipe the RAM, and this command line will try to reserve the same physical memory on soft reboots. Note, it is not guaranteed to be the same location. For example, if KASLR places the kernel at the location of where the RAM reservation was from a previous boot, the new reservation will be at a different location. Any subsystem using this feature must add a way to verify that the contents of the physical memory is from a previous boot, as there may be cases where the memory will not be located at the same location. Not all systems may work either. There could be bit flips if the reboot goes through the BIOS. Using kexec to reboot the machine is likely to have better results in such cases. Link: https://lore.kernel.org/all/ZjJVnZUX3NZiGW6q@kernel.org/ Suggested-by: Mike Rapoport <rppt@kernel.org> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20240613155527.437020271@goodmis.org Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>