summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2023-08-18mm: move is_ioremap_addr() into new header fileBaoquan He1-0/+1
Now is_ioremap_addr() is only used in kernel/iomem.c and gonna be used in mm/ioremap.c. Move it into its own new header file linux/ioremap.h. Link: https://lkml.kernel.org/r/20230706154520.11257-17-bhe@redhat.com Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Rich Felker <dalias@libc.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18mm/mm_init.c: remove obsolete macro HASH_SMALLMiaohe Lin1-2/+1
HASH_SMALL only works when parameter numentries is 0. But the sole caller futex_init() never calls alloc_large_system_hash() with numentries set to 0. So HASH_SMALL is obsolete and remove it. Link: https://lkml.kernel.org/r/20230625021323.849147-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: André Almeida <andrealmeid@igalia.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18mm: remove arguments of show_mem()Kefeng Wang1-1/+1
All callers of show_mem() pass 0 and NULL, so we can remove the two arguments by directly calling __show_mem(0, NULL, MAX_NR_ZONES - 1) in show_mem(). Link: https://lkml.kernel.org/r/20230630062253.189440-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18cgroup: Avoid -Wstringop-overflow warningsGustavo A. R. Silva1-2/+2
Change the notation from pointer-to-array to pointer-to-pointer. With this, we avoid the compiler complaining about trying to access a region of size zero as an argument during function calls. This is a workaround to prevent the compiler complaining about accessing an array of size zero when evaluating the arguments of a couple of function calls. See below: kernel/cgroup/cgroup.c: In function 'find_css_set': kernel/cgroup/cgroup.c:1206:16: warning: 'find_existing_css_set' accessing 4 bytes in a region of size 0 [-Wstringop-overflow=] 1206 | cset = find_existing_css_set(old_cset, cgrp, template); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/cgroup/cgroup.c:1206:16: note: referencing argument 3 of type 'struct cgroup_subsys_state *[0]' kernel/cgroup/cgroup.c:1071:24: note: in a call to function 'find_existing_css_set' 1071 | static struct css_set *find_existing_css_set(struct css_set *old_cset, | ^~~~~~~~~~~~~~~~~~~~~ With the change to pointer-to-pointer, the functions are not prevented from being executed, and they will do what they have to do when CGROUP_SUBSYS_COUNT == 0. Address the following -Wstringop-overflow warnings seen when built with ARM architecture and aspeed_g4_defconfig configuration (notice that under this configuration CGROUP_SUBSYS_COUNT == 0): kernel/cgroup/cgroup.c:1208:16: warning: 'find_existing_css_set' accessing 4 bytes in a region of size 0 [-Wstringop-overflow=] kernel/cgroup/cgroup.c:1258:15: warning: 'css_set_hash' accessing 4 bytes in a region of size 0 [-Wstringop-overflow=] kernel/cgroup/cgroup.c:6089:18: warning: 'css_set_hash' accessing 4 bytes in a region of size 0 [-Wstringop-overflow=] kernel/cgroup/cgroup.c:6153:18: warning: 'css_set_hash' accessing 4 bytes in a region of size 0 [-Wstringop-overflow=] This results in no differences in binary output. Link: https://github.com/KSPP/linux/issues/316 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-08-17seccomp: Add missing kerndoc notationsKees Cook1-4/+10
The kerndoc for some struct member and function arguments were missing. Add them. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202308171742.AncabIG1-lkp@intel.com/ Signed-off-by: Kees Cook <keescook@chromium.org>
2023-08-17tracing: Fix memleak due to race between current_tracer and traceZheng Yejian3-2/+12
Kmemleak report a leak in graph_trace_open(): unreferenced object 0xffff0040b95f4a00 (size 128): comm "cat", pid 204981, jiffies 4301155872 (age 99771.964s) hex dump (first 32 bytes): e0 05 e7 b4 ab 7d 00 00 0b 00 01 00 00 00 00 00 .....}.......... f4 00 01 10 00 a0 ff ff 00 00 00 00 65 00 10 00 ............e... backtrace: [<000000005db27c8b>] kmem_cache_alloc_trace+0x348/0x5f0 [<000000007df90faa>] graph_trace_open+0xb0/0x344 [<00000000737524cd>] __tracing_open+0x450/0xb10 [<0000000098043327>] tracing_open+0x1a0/0x2a0 [<00000000291c3876>] do_dentry_open+0x3c0/0xdc0 [<000000004015bcd6>] vfs_open+0x98/0xd0 [<000000002b5f60c9>] do_open+0x520/0x8d0 [<00000000376c7820>] path_openat+0x1c0/0x3e0 [<00000000336a54b5>] do_filp_open+0x14c/0x324 [<000000002802df13>] do_sys_openat2+0x2c4/0x530 [<0000000094eea458>] __arm64_sys_openat+0x130/0x1c4 [<00000000a71d7881>] el0_svc_common.constprop.0+0xfc/0x394 [<00000000313647bf>] do_el0_svc+0xac/0xec [<000000002ef1c651>] el0_svc+0x20/0x30 [<000000002fd4692a>] el0_sync_handler+0xb0/0xb4 [<000000000c309c35>] el0_sync+0x160/0x180 The root cause is descripted as follows: __tracing_open() { // 1. File 'trace' is being opened; ... *iter->trace = *tr->current_trace; // 2. Tracer 'function_graph' is // currently set; ... iter->trace->open(iter); // 3. Call graph_trace_open() here, // and memory are allocated in it; ... } s_start() { // 4. The opened file is being read; ... *iter->trace = *tr->current_trace; // 5. If tracer is switched to // 'nop' or others, then memory // in step 3 are leaked!!! ... } To fix it, in s_start(), close tracer before switching then reopen the new tracer after switching. And some tracers like 'wakeup' may not update 'iter->private' in some cases when reopen, then it should be cleared to avoid being mistakenly closed again. Link: https://lore.kernel.org/linux-trace-kernel/20230817125539.1646321-1-zhengyejian1@huawei.com Fixes: d7350c3f4569 ("tracing/core: make the read callbacks reentrants") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-08-17sched/eevdf: Curb wakeup-preemptionPeter Zijlstra2-0/+13
Mike and others noticed that EEVDF does like to over-schedule quite a bit -- which does hurt performance of a number of benchmarks / workloads. In particular, what seems to cause over-scheduling is that when lag is of the same order (or larger) than the request / slice then placement will not only cause the task to be placed left of current, but also with a smaller deadline than current, which causes immediate preemption. [ notably, lag bounds are relative to HZ ] Mike suggested we stick to picking 'current' for as long as it's eligible to run, giving it uninterrupted runtime until it reaches parity with the pack. Augment Mike's suggestion by only allowing it to exhaust it's initial request. One random data point: echo NO_RUN_TO_PARITY > /debug/sched/features perf stat -a -e context-switches --repeat 10 -- perf bench sched messaging -g 20 -t -l 5000 3,723,554 context-switches ( +- 0.56% ) 9.5136 +- 0.0394 seconds time elapsed ( +- 0.41% ) echo RUN_TO_PARITY > /debug/sched/features perf stat -a -e context-switches --repeat 10 -- perf bench sched messaging -g 20 -t -l 5000 2,556,535 context-switches ( +- 0.51% ) 9.2427 +- 0.0302 seconds time elapsed ( +- 0.33% ) Suggested-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230816134059.GC982867@hirez.programming.kicks-ass.net
2023-08-17Merge branches 'doc.2023.07.14b', 'fixes.2023.08.16a', ↵Paul E. McKenney9-57/+285
'rcu-tasks.2023.07.24a', 'rcuscale.2023.07.14b', 'refscale.2023.07.14b', 'torture.2023.08.14a' and 'torturescripts.2023.07.20a' into HEAD doc.2023.07.14b: Documentation updates. fixes.2023.08.16a: Miscellaneous fixes. rcu-tasks.2023.07.24a: RCU Tasks updates. rcuscale.2023.07.14b: RCU (updater) scalability test updates. refscale.2023.07.14b: Reference (reader) scalability test updates. torture.2023.08.14a: Other torture-test updates. torturescripts.2023.07.20a: Other torture-test scripting updates.
2023-08-17rcu: Make the rcu_nocb_poll boot parameter usable via boot configPaul E. McKenney1-2/+2
The rcu_nocb_poll kernel boot parameter is defined via early_param(), whose parsing functions are invoked from parse_early_param() which is in turn invoked by setup_arch(), which is very early indeed.  It is invoked so early that the console output timestamps read 0.000000, in other words, before time begins. This use of early_param() means that the rcu_nocb_poll kernel boot parameter cannot usefully be embedded into the kernel image. Yes, you can embed it, but setup_boot_config() is invoked from start_kernel() too late for it to be parsed. But it makes no sense to parse this parameter so early. After all, it cannot do anything until the rcuog kthreads are created, which is long after rcu_init() time, let alone setup_boot_config() time. This commit therefore switches the rcu_nocb_poll kernel boot parameter from early_param() to __setup(), which allows boot-config parsing of this parameter, in turn allowing it to be embedded into the kernel image. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2023-08-17rcu: Mark __rcu_irq_enter_check_tick() ->rcu_urgent_qs loadPaul E. McKenney1-1/+1
The rcu_request_urgent_qs_task() function does a cross-CPU store to ->rcu_urgent_qs, so this commit therefore marks the load in __rcu_irq_enter_check_tick() with READ_ONCE(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2023-08-16tracing/synthetic: Allocate one additional element for sizeSven Schnelle1-1/+2
While debugging another issue I noticed that the stack trace contains one invalid entry at the end: <idle>-0 [008] d..4. 26.484201: wake_lat: pid=0 delta=2629976084 000000009cc24024 stack=STACK: => __schedule+0xac6/0x1a98 => schedule+0x126/0x2c0 => schedule_timeout+0x150/0x2c0 => kcompactd+0x9ca/0xc20 => kthread+0x2f6/0x3d8 => __ret_from_fork+0x8a/0xe8 => 0x6b6b6b6b6b6b6b6b This is because the code failed to add the one element containing the number of entries to field_size. Link: https://lkml.kernel.org/r/20230816154928.4171614-4-svens@linux.ibm.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-08-16tracing/synthetic: Skip first entry for stack tracesSven Schnelle1-13/+4
While debugging another issue I noticed that the stack trace output contains the number of entries on top: <idle>-0 [000] d..4. 203.322502: wake_lat: pid=0 delta=2268270616 stack=STACK: => 0x10 => __schedule+0xac6/0x1a98 => schedule+0x126/0x2c0 => schedule_timeout+0x242/0x2c0 => __wait_for_common+0x434/0x680 => __wait_rcu_gp+0x198/0x3e0 => synchronize_rcu+0x112/0x138 => ring_buffer_reset_online_cpus+0x140/0x2e0 => tracing_reset_online_cpus+0x15c/0x1d0 => tracing_set_clock+0x180/0x1d8 => hist_register_trigger+0x486/0x670 => event_hist_trigger_parse+0x494/0x1318 => trigger_process_regex+0x1d4/0x258 => event_trigger_write+0xb4/0x170 => vfs_write+0x210/0xad0 => ksys_write+0x122/0x208 Fix this by skipping the first element. Also replace the pointer logic with an index variable which is easier to read. Link: https://lkml.kernel.org/r/20230816154928.4171614-3-svens@linux.ibm.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-08-16tracing/synthetic: Use union instead of castsSven Schnelle2-50/+45
The current code uses a lot of casts to access the fields member in struct synth_trace_events with different sizes. This makes the code hard to read, and had already introduced an endianness bug. Use a union and struct instead. Link: https://lkml.kernel.org/r/20230816154928.4171614-2-svens@linux.ibm.com Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 00cf3d672a9dd ("tracing: Allow synthetic events to pass around stacktraces") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-08-16tracing: Fix cpu buffers unavailable due to 'record_disabled' missedZheng Yejian1-0/+6
Trace ring buffer can no longer record anything after executing following commands at the shell prompt: # cd /sys/kernel/tracing # cat tracing_cpumask fff # echo 0 > tracing_cpumask # echo 1 > snapshot # echo fff > tracing_cpumask # echo 1 > tracing_on # echo "hello world" > trace_marker -bash: echo: write error: Bad file descriptor The root cause is that: 1. After `echo 0 > tracing_cpumask`, 'record_disabled' of cpu buffers in 'tr->array_buffer.buffer' became 1 (see tracing_set_cpumask()); 2. After `echo 1 > snapshot`, 'tr->array_buffer.buffer' is swapped with 'tr->max_buffer.buffer', then the 'record_disabled' became 0 (see update_max_tr()); 3. After `echo fff > tracing_cpumask`, the 'record_disabled' become -1; Then array_buffer and max_buffer are both unavailable due to value of 'record_disabled' is not 0. To fix it, enable or disable both array_buffer and max_buffer at the same time in tracing_set_cpumask(). Link: https://lkml.kernel.org/r/20230805033816.3284594-2-zhengyejian1@huawei.com Cc: <mhiramat@kernel.org> Cc: <vnagarnaik@google.com> Cc: <shuah@kernel.org> Fixes: 71babb2705e2 ("tracing: change CPU ring buffer state from tracing_cpumask") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-08-16printk: export symbols for debug modulesEnlin Mu1-0/+2
the module is out-of-tree, it saves kernel logs when panic Signed-off-by: Enlin Mu <enlin.mu@unisoc.com> Acked-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230815020711.2604939-1-yunlong.xing@unisoc.com
2023-08-16bpf: Fix uninitialized symbol in bpf_perf_link_fill_kprobe()Yafang Shao1-3/+2
The commit 1b715e1b0ec5 ("bpf: Support ->fill_link_info for perf_event") leads to the following Smatch static checker warning: kernel/bpf/syscall.c:3416 bpf_perf_link_fill_kprobe() error: uninitialized symbol 'type'. That can happens when uname is NULL. So fix it by verifying the uname when we really need to fill it. Fixes: 1b715e1b0ec5 ("bpf: Support ->fill_link_info for perf_event") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Jiri Olsa <jolsa@kernel.org> Closes: https://lore.kernel.org/bpf/85697a7e-f897-4f74-8b43-82721bebc462@kili.mountain Link: https://lore.kernel.org/bpf/20230813141900.1268-2-laoar.shao@gmail.com
2023-08-16perf/hw_breakpoint: Remove arch breakpoint hooksBenjamin Gray1-28/+0
PowerPC was the only user of these hooks, and has been refactored to no longer require them. There is no need to keep them around, so remove them to reduce complexity. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230801011744.153973-8-bgray@linux.ibm.com
2023-08-16sysctl: Add size to register_sysctlJoel Granados1-1/+1
This commit adds table_size to register_sysctl in preparation for the removal of the sentinel elements in the ctl_table arrays (last empty markers). And though we do *not* remove any sentinels in this commit, we set things up by either passing the table_size explicitly or using ARRAY_SIZE on the ctl_table arrays. We replace the register_syctl function with a macro that will add the ARRAY_SIZE to the new register_sysctl_sz function. In this way the callers that are already using an array of ctl_table structs do not change. For the callers that pass a ctl_table array pointer, we pass the table_size to register_sysctl_sz instead of the macro. Signed-off-by: Joel Granados <j.granados@samsung.com> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-16sysctl: Add a size arg to __register_sysctl_tableJoel Granados1-1/+2
We make these changes in order to prepare __register_sysctl_table and its callers for when we remove the sentinel element (empty element at the end of ctl_table arrays). We don't actually remove any sentinels in this commit, but we *do* make sure to use ARRAY_SIZE so the table_size is available when the removal occurs. We add a table_size argument to __register_sysctl_table and adjust callers, all of which pass ctl_table pointers and need an explicit call to ARRAY_SIZE. We implement a size calculation in register_net_sysctl in order to forward the size of the array pointer received from the network register calls. The new table_size argument does not yet have any effect in the init_header call which is still dependent on the sentinel's presence. table_size *does* however drive the `kzalloc` allocation in __register_sysctl_table with no adverse effects as the allocated memory is either one element greater than the calculated ctl_table array (for the calls in ipc_sysctl.c, mq_sysctl.c and ucount.c) or the exact size of the calculated ctl_table array (for the call from sysctl_net.c and register_sysctl). This approach will allows us to "just" remove the sentinel without further changes to __register_sysctl_table as table_size will represent the exact size for all the callers at that point. Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-16audit: move trailing statements to next lineAtul Kumar Pant2-2/+4
Fixes following checkpatch.pl issue: ERROR: trailing statements should be on next line Signed-off-by: Atul Kumar Pant <atulpant.linux@gmail.com> [PM: subject line tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-08-16audit: cleanup function braces and assignment-in-if-conditionAtul Kumar Pant1-2/+4
The patch fixes following checkpatch.pl issue: ERROR: open brace '{' following function definitions go on the next line ERROR: do not use assignment in if condition Signed-off-by: Atul Kumar Pant <atulpant.linux@gmail.com> [PM: subject line tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-08-16audit: add space before parenthesis and around '=', "==", and '<'Atul Kumar Pant3-10/+10
Fixes following checkpatch.pl issue: ERROR: space required before the open parenthesis '(' ERROR: spaces required around that '=' ERROR: spaces required around that '<' ERROR: spaces required around that '==' Signed-off-by: Atul Kumar Pant <atulpant.linux@gmail.com> [PM: subject line tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-08-15bpf: Support default .validate() and .update() behavior for struct_ops linksDavid Vernet1-6/+9
Currently, if a struct_ops map is loaded with BPF_F_LINK, it must also define the .validate() and .update() callbacks in its corresponding struct bpf_struct_ops in the kernel. Enabling struct_ops link is useful in its own right to ensure that the map is unloaded if an application crashes. For example, with sched_ext, we want to automatically unload the host-wide scheduler if the application crashes. We would likely never support updating elements of a sched_ext struct_ops map, so we'd have to implement these callbacks showing that they _can't_ support element updates just to benefit from the basic lifetime management of struct_ops links. Let's enable struct_ops maps to work with BPF_F_LINK even if they haven't defined these callbacks, by assuming that a struct_ops map element cannot be updated by default. Acked-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230814185908.700553-2-void@manifault.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-15cgroup:namespace: Remove unused cgroup_namespaces_init()Lu Jialin1-6/+0
cgroup_namspace_init() just return 0. Therefore, there is no need to call it during start_kernel. Just remove it. Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces") Signed-off-by: Lu Jialin <lujialin4@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-08-15workqueue: Rename rescuer kworkerAaron Tomlin1-1/+1
Each CPU-specific and unbound kworker kthread conforms to a particular naming scheme. However, this does not extend to the rescuer kworker. At present, a rescuer kworker is simply named according to its workqueue's name. This can be cryptic. This patch modifies a rescuer to follow the kworker naming scheme. The "R" is indicative of a rescuer and after "-" is its workqueue's name e.g. "kworker/R-ext4-rsv-conver". tj: Use "R" instead of "r" as the prefix to make it more distinctive and consistent with how highpri pools are marked. Signed-off-by: Aaron Tomlin <atomlin@atomlin.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-08-15rcutorture: Stop right-shifting torture_random() return valuesPaul E. McKenney1-3/+3
Now that torture_random() uses swahw32(), its callers no longer see not-so-random low-order bits, as these are now swapped up into the upper 16 bits of the torture_random() function's return value. This commit therefore removes the right-shifting of torture_random() return values. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-08-15torture: Stop right-shifting torture_random() return valuesPaul E. McKenney1-2/+2
Now that torture_random() uses swahw32(), its callers no longer see not-so-random low-order bits, as these are now swapped up into the upper 16 bits of the torture_random() function's return value. This commit therefore removes the right-shifting of torture_random() return values. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-08-15torture: Move stutter_wait() timeouts to hrtimersPaul E. McKenney1-2/+2
In order to gain better race coverage, move the test start/stop waits in stutter_wait() to torture_hrtimeout_jiffies(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-08-15torture: Move torture_shuffle() timeouts to hrtimersPaul E. McKenney1-1/+3
In order to gain better race coverage, move the CPU-migration timed waits in torture_shuffle() to torture_hrtimeout_jiffies(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-08-15torture: Move torture_onoff() timeouts to hrtimersPaul E. McKenney1-3/+3
In order to gain better race coverage, move the CPU-hotplug-related timed waits in torture_onoff() to torture_hrtimeout_jiffies(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-08-15torture: Make torture_hrtimeout_*() use TASK_IDLEPaul E. McKenney1-1/+1
Given that it is expected that more code will use torture_hrtimeout_*(), including for longer timeouts, make it use TASK_IDLE instead of TASK_UNINTERRUPTIBLE. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-08-15torture: Add lock_torture writer_fifo module parameterDietmar Eggemann2-6/+9
This commit adds a module parameter that causes the locktorture writer to run at real-time priority. To use it: insmod /lib/modules/torture.ko random_shuffle=1 insmod /lib/modules/locktorture.ko torture_type=mutex_lock rt_boost=1 rt_boost_factor=50 nested_locks=3 writer_fifo=1 ^^^^^^^^^^^^^ A predecessor to this patch has been helpful to uncover issues with the proxy-execution series. [ paulmck: Remove locktorture-specific code from kernel/torture.c. ] Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: kernel-team@android.com Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> [jstultz: Include header change to build, reword commit message] Signed-off-by: John Stultz <jstultz@google.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-08-15torture: Add a kthread-creation callback to _torture_create_kthread()Paul E. McKenney1-1/+5
This commit adds a kthread-creation callback to the _torture_create_kthread() function, which allows callers of a new torture_create_kthread_cb() macro to specify a function to be invoked after the kthread is created but before it is awakened for the first time. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: kernel-team@android.com Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: John Stultz <jstultz@google.com>
2023-08-15rcu-tasks: Fix boot-time RCU tasks debug-only deadlockPaul E. McKenney1-0/+2
In kernels built with CONFIG_PROVE_RCU=y (for example, lockdep kernels), the following sequence of events can occur: o rcu_init_tasks_generic() is invoked just before init is spawned. It invokes rcu_spawn_tasks_kthread() and friends. o rcu_spawn_tasks_kthread() invokes rcu_spawn_tasks_kthread_generic(), which uses kthread_run() to create the needed kthread. o Control returns to rcu_init_tasks_generic(), which, because this is a CONFIG_PROVE_RCU=y kernel, invokes the version of the rcu_tasks_initiate_self_tests() function that actually does something, including invoking synchronize_rcu_tasks(), which in turn invokes synchronize_rcu_tasks_generic(). o synchronize_rcu_tasks_generic() sees that the ->kthread_ptr is still NULL, because the newly spawned kthread has not yet started. o The new kthread starts, preempting synchronize_rcu_tasks_generic() just after its check. This kthread invokes rcu_tasks_one_gp(), which acquires ->tasks_gp_mutex, and, seeing no work, blocks in rcuwait_wait_event(). Note that this step requires either a preemptible kernel or a fault-injection-style sleep at the beginning of mutex_lock(). o synchronize_rcu_tasks_generic() resumes and invokes rcu_tasks_one_gp(). o rcu_tasks_one_gp() attempts to acquire ->tasks_gp_mutex, which is still held by the newly spawned kthread's rcu_tasks_one_gp() function. Deadlock. Because the only reason for ->tasks_gp_mutex is to handle pre-kthread synchronous grace periods, this commit avoids this deadlock by having rcu_tasks_one_gp() momentarily release ->tasks_gp_mutex while invoking rcuwait_wait_event(). This allows the call to rcu_tasks_one_gp() from synchronize_rcu_tasks_generic() proceed. Note that it is not necessary to release the mutex anywhere else in rcu_tasks_one_gp() because rcuwait_wait_event() is the only function that can block indefinitely. Reported-by: Guenter Roeck <linux@roeck-us.net> Reported-by: Roy Hopkins <rhopkins@suse.de> Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Roy Hopkins <rhopkins@suse.de>
2023-08-14sched: Simplify sched_core_cpu_{starting,deactivate}()Peter Zijlstra1-15/+12
Use guards to reduce gotos and simplify control flow. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211812.371787909@infradead.org
2023-08-14sched: Simplify try_steal_cookie()Peter Zijlstra1-12/+9
Use guards to reduce gotos and simplify control flow. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211812.304154828@infradead.org
2023-08-14sched: Simplify sched_tick_remote()Peter Zijlstra1-23/+16
Use guards to reduce gotos and simplify control flow. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211812.236247952@infradead.org
2023-08-14sched: Simplify sched_exec()Peter Zijlstra1-12/+9
Use guards to reduce gotos and simplify control flow. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211812.168490417@infradead.org
2023-08-14sched: Simplify ttwu()Peter Zijlstra1-112/+109
Use guards to reduce gotos and simplify control flow. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211812.101069260@infradead.org
2023-08-14sched: Simplify wake_up_if_idle()Peter Zijlstra2-14/+21
Use guards to reduce gotos and simplify control flow. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211812.032678917@infradead.org
2023-08-14sched: Simplify: migrate_swap_stop()Peter Zijlstra2-16/+27
Use guards to reduce gotos and simplify control flow. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211811.964370836@infradead.org
2023-08-14sched: Simplify sysctl_sched_uclamp_handler()Peter Zijlstra1-7/+4
Use guards to reduce gotos and simplify control flow. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211811.896559109@infradead.org
2023-08-14sched: Simplify get_nohz_timer_target()Peter Zijlstra1-9/+6
Use guards to reduce gotos and simplify control flow. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211811.828443100@infradead.org
2023-08-14sched/rt: sysctl_sched_rr_timeslice show default timeslice after resetCyril Hrubis1-0/+3
The sched_rr_timeslice can be reset to default by writing value that is <= 0. However after reading from this file we always got the last value written, which is not useful at all. $ echo -1 > /proc/sys/kernel/sched_rr_timeslice_ms $ cat /proc/sys/kernel/sched_rr_timeslice_ms -1 Fix this by setting the variable that holds the sysctl file value to the jiffies_to_msecs(RR_TIMESLICE) in case that <= 0 value was written. Signed-off-by: Cyril Hrubis <chrubis@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Petr Vorel <pvorel@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Tested-by: Petr Vorel <pvorel@suse.cz> Link: https://lore.kernel.org/r/20230802151906.25258-3-chrubis@suse.cz
2023-08-14sched/rt: Fix sysctl_sched_rr_timeslice intial valueCyril Hrubis1-1/+1
There is a 10% rounding error in the intial value of the sysctl_sched_rr_timeslice with CONFIG_HZ_300=y. This was found with LTP test sched_rr_get_interval01: sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90 sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90 What this test does is to compare the return value from the sched_rr_get_interval() and the sched_rr_timeslice_ms sysctl file and fails if they do not match. The problem it found is the intial sysctl file value which was computed as: static int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE; which works fine as long as MSEC_PER_SEC is multiple of HZ, however it introduces 10% rounding error for CONFIG_HZ_300: (MSEC_PER_SEC / HZ) * (100 * HZ / 1000) (1000 / 300) * (100 * 300 / 1000) 3 * 30 = 90 This can be easily fixed by reversing the order of the multiplication and division. After this fix we get: (MSEC_PER_SEC * (100 * HZ / 1000)) / HZ (1000 * (100 * 300 / 1000)) / 300 (1000 * 30) / 300 = 100 Fixes: 975e155ed873 ("sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds") Signed-off-by: Cyril Hrubis <chrubis@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Petr Vorel <pvorel@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Tested-by: Petr Vorel <pvorel@suse.cz> Link: https://lore.kernel.org/r/20230802151906.25258-2-chrubis@suse.cz
2023-08-14printk: ringbuffer: Fix truncating buffer size min_t castKees Cook1-1/+1
If an output buffer size exceeded U16_MAX, the min_t(u16, ...) cast in copy_data() was causing writes to truncate. This manifested as output bytes being skipped, seen as %NUL bytes in pstore dumps when the available record size was larger than 65536. Fix the cast to no longer truncate the calculation. Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: John Ogness <john.ogness@linutronix.de> Reported-by: Vijay Balakrishna <vijayb@linux.microsoft.com> Link: https://lore.kernel.org/lkml/d8bb1ec7-a4c5-43a2-9de0-9643a70b899f@linux.microsoft.com/ Fixes: b6cf8b3f3312 ("printk: add lockless ringbuffer") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Tested-by: Vijay Balakrishna <vijayb@linux.microsoft.com> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Reviewed-by: Tyler Hicks (Microsoft) <code@tyhicks.com> Tested-by: Tyler Hicks (Microsoft) <code@tyhicks.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230811054528.never.165-kees@kernel.org
2023-08-14Merge back system-wide sleep material for v6.6.Rafael J. Wysocki1-38/+149
2023-08-11Merge tag 'pm-6.5-rc6' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix an amd-pstate cpufreq driver issues and recently introduced hibernation-related breakage. Specifics: - Make amd-pstate use device_attributes as expected by the CPU root kobject (Thomas Weißschuh) - Restore the previous behavior of resume_store() when hibernation is not available which is to return the full number of bytes that were to be written by user space (Vlastimil Babka)" * tag 'pm-6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: amd-pstate: fix global sysfs attribute type PM: hibernate: fix resume_store() return value when hibernation not available
2023-08-11Merge tag 'for-netdev' of ↵Jakub Kicinski9-25/+44
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Martin KaFai Lau says: ==================== pull-request: bpf-next 2023-08-09 We've added 19 non-merge commits during the last 6 day(s) which contain a total of 25 files changed, 369 insertions(+), 141 deletions(-). The main changes are: 1) Fix array-index-out-of-bounds access when detaching from an already empty mprog entry from Daniel Borkmann. 2) Adjust bpf selftest because of a recent llvm change related to the cpu-v4 ISA from Eduard Zingerman. 3) Add uprobe support for the bpf_get_func_ip helper from Jiri Olsa. 4) Fix a KASAN splat due to the kernel incorrectly accepted an invalid program using the recent cpu-v4 instruction from Yonghong Song. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: bpf: btf: Remove two unused function declarations bpf: lru: Remove unused declaration bpf_lru_promote() selftests/bpf: relax expected log messages to allow emitting BPF_ST selftests/bpf: remove duplicated functions bpf, docs: Fix small typo and define semantics of sign extension selftests/bpf: Add bpf_get_func_ip test for uprobe inside function selftests/bpf: Add bpf_get_func_ip tests for uprobe on function entry bpf: Add support for bpf_get_func_ip helper for uprobe program selftests/bpf: Add a movsx selftest for sign-extension of R10 bpf: Fix an incorrect verification success with movsx insn bpf, docs: Formalize type notation and function semantics in ISA standard bpf: change bpf_alu_sign_string and bpf_movsx_string to static libbpf: Use local includes inside the library bpf: fix bpf_dynptr_slice() to stop return an ERR_PTR. bpf: fix inconsistent return types of bpf_xdp_copy_buf(). selftests/bpf: fix the incorrect verification of port numbers. selftests/bpf: Add test for detachment on empty mprog entry bpf: Fix mprog detachment for empty mprog entry bpf: bpf_struct_ops: Remove unnecessary initial values of variables ==================== Link: https://lore.kernel.org/r/20230810055123.109578-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+42
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/intel/igc/igc_main.c 06b412589eef ("igc: Add lock to safeguard global Qbv variables") d3750076d464 ("igc: Add TransmissionOverrun counter") drivers/net/ethernet/microsoft/mana/mana_en.c a7dfeda6fdec ("net: mana: Fix MANA VF unload when hardware is unresponsive") a9ca9f9ceff3 ("page_pool: split types and declarations from page_pool.h") 92272ec4107e ("eth: add missing xdp.h includes in drivers") net/mptcp/protocol.h 511b90e39250 ("mptcp: fix disconnect vs accept race") b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning") tools/testing/selftests/net/mptcp/mptcp_join.sh c8c101ae390a ("selftests: mptcp: join: fix 'implicit EP' test") 03668c65d153 ("selftests: mptcp: join: rework detailed report") Signed-off-by: Jakub Kicinski <kuba@kernel.org>