summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2021-05-21bpf, offload: Reorder offload callback 'prepare' in verifierYinjun Zhang1-6/+6
Commit 4976b718c355 ("bpf: Introduce pseudo_btf_id") switched the order of resolve_pseudo_ldimm(), in which some pseudo instructions are rewritten. Thus those rewritten instructions cannot be passed to driver via 'prepare' offload callback. Reorder the 'prepare' offload callback to fix it. Fixes: 4976b718c355 ("bpf: Introduce pseudo_btf_id") Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210520085834.15023-1-simon.horman@netronome.com
2021-05-21bpf: Avoid using ARRAY_SIZE on an uninitialized pointerFlorent Revest1-2/+3
The cppcheck static code analysis reported the following error: if (WARN_ON_ONCE(nest_level > ARRAY_SIZE(bufs->tmp_bufs))) { ^ ARRAY_SIZE is a macro that expands to sizeofs, so bufs is not actually dereferenced at runtime, and the code is actually safe. But to keep things tidy, this patch removes the need for a call to ARRAY_SIZE by extracting the size of the array into a macro. Cppcheck should no longer be confused and the code ends up being a bit cleaner. Fixes: e2d5b2bb769f ("bpf: Fix nested bpf_bprintf_prepare with more per-cpu buffers") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/bpf/20210517092830.1026418-2-revest@chromium.org
2021-05-21bpf: Clarify a bpf_bprintf_prepare macroFlorent Revest1-4/+5
The per-cpu buffers contain bprintf data rather than printf arguments. The macro name and comment were a bit confusing, this rewords them in a clearer way. Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/bpf/20210517092830.1026418-1-revest@chromium.org
2021-05-21bpf: Fix BPF_JIT kconfig symbol dependencyDaniel Borkmann1-0/+1
Randy reported a randconfig build error recently on i386: ld: arch/x86/net/bpf_jit_comp32.o: in function `do_jit': bpf_jit_comp32.c:(.text+0x28c9): undefined reference to `__bpf_call_base' ld: arch/x86/net/bpf_jit_comp32.o: in function `bpf_int_jit_compile': bpf_jit_comp32.c:(.text+0x3694): undefined reference to `bpf_jit_blind_constants' ld: bpf_jit_comp32.c:(.text+0x3719): undefined reference to `bpf_jit_binary_free' ld: bpf_jit_comp32.c:(.text+0x3745): undefined reference to `bpf_jit_binary_alloc' ld: bpf_jit_comp32.c:(.text+0x37d3): undefined reference to `bpf_jit_prog_release_other' [...] The cause was that b24abcff918a ("bpf, kconfig: Add consolidated menu entry for bpf with core options") moved BPF_JIT from net/Kconfig into kernel/bpf/Kconfig and previously BPF_JIT was guarded by a 'if NET'. However, there is no actual dependency on NET, it's just that menuconfig NET selects BPF. And the latter in turn causes kernel/bpf/core.o to be built which contains above symbols. Randy's randconfig didn't have NET set, and BPF wasn't either, but BPF_JIT otoh was. Detangle this by making BPF_JIT depend on BPF instead. arm64 was the only arch that pulled in its JIT in net/ via obj-$(CONFIG_NET), all others unconditionally pull this dir in via obj-y. Do the same since CONFIG_NET guard there is really useless as we compiled the JIT via obj-$(CONFIG_BPF_JIT) += bpf_jit_comp.o anyway. Fixes: b24abcff918a ("bpf, kconfig: Add consolidated menu entry for bpf with core options") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org>
2021-05-20wq: handle VM suspension in stall detectionSergey Senozhatsky1-2/+10
If VCPU is suspended (VM suspend) in wq_watchdog_timer_fn() then once this VCPU resumes it will see the new jiffies value, while it may take a while before IRQ detects PVCLOCK_GUEST_STOPPED on this VCPU and updates all the watchdogs via pvclock_touch_watchdogs(). There is a small chance of misreported WQ stalls in the meantime, because new jiffies is time_after() old 'ts + thresh'. wq_watchdog_timer_fn() { for_each_pool(pool, pi) { if (time_after(jiffies, ts + thresh)) { pr_emerg("BUG: workqueue lockup - pool"); } } } Save jiffies at the beginning of this function and use that value for stall detection. If VM gets suspended then we continue using "old" jiffies value and old WQ touch timestamps. If IRQ at some point restarts the stall detection cycle (pvclock_touch_watchdogs()) then old jiffies will always be before new 'ts + thresh'. Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-05-20cgroup: disable controllers at parse timeShakeel Butt1-8/+5
This patch effectively reverts the commit a3e72739b7a7 ("cgroup: fix too early usage of static_branch_disable()"). The commit 6041186a3258 ("init: initialize jump labels before command line option parsing") has moved the jump_label_init() before parse_args() which has made the commit a3e72739b7a7 unnecessary. On the other hand there are consequences of disabling the controllers later as there are subsystems doing the controller checks for different decisions. One such incident is reported [1] regarding the memory controller and its impact on memory reclaim code. [1] https://lore.kernel.org/linux-mm/921e53f3-4b13-aab8-4a9e-e83ff15371e4@nec.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Reported-by: NOMURA JUNICHI(野村 淳一) <junichi.nomura@nec.com> Signed-off-by: Tejun Heo <tj@kernel.org> Tested-by: Jun'ichi Nomura <junichi.nomura@nec.com>
2021-05-19bpf: Make some symbols staticPu Lehui1-2/+2
The sparse tool complains as follows: kernel/bpf/syscall.c:4567:29: warning: symbol 'bpf_sys_bpf_proto' was not declared. Should it be static? kernel/bpf/syscall.c:4592:29: warning: symbol 'bpf_sys_close_proto' was not declared. Should it be static? This symbol is not used outside of syscall.c, so marks it static. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210519064116.240536-1-pulehui@huawei.com
2021-05-19sched: Fix a stale comment in pick_next_task()Masahiro Yamada1-1/+1
fair_sched_class->next no longer exists since commit: a87e749e8fa1 ("sched: Remove struct sched_class::next field"). Now the sched_class order is specified by the linker script. Rewrite the comment in a more generic way. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210519063709.323162-1-masahiroy@kernel.org
2021-05-19Merge branch 'irq/affinity' into irq/coreThomas Gleixner1-1/+32
Merge the export of irq_set_affinity() which is a standalone commit so it can be pulled into other trees.
2021-05-19genirq: Export affinity setter for modulesThomas Gleixner1-1/+32
Perf modules abuse irq_set_affinity_hint() to set the affinity of system PMU interrupts just because irq_set_affinity() was not exported. The fact that irq_set_affinity_hint() actually sets the affinity is a non-documented side effect and the name is clearly saying it's a hint. To clean this up, export the real affinity setter. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20210518093117.968251441@linutronix.de
2021-05-19sched/uclamp: Fix locking around cpu_util_update_eff()Qais Yousef1-0/+7
cpu_cgroup_css_online() calls cpu_util_update_eff() without holding the uclamp_mutex or rcu_read_lock() like other call sites, which is a mistake. The uclamp_mutex is required to protect against concurrent reads and writes that could update the cgroup hierarchy. The rcu_read_lock() is required to traverse the cgroup data structures in cpu_util_update_eff(). Surround the caller with the required locks and add some asserts to better document the dependency in cpu_util_update_eff(). Fixes: 7226017ad37a ("sched/uclamp: Fix a bug in propagating uclamp value in new cgroups") Reported-by: Quentin Perret <qperret@google.com> Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210510145032.1934078-3-qais.yousef@arm.com
2021-05-19sched/uclamp: Fix wrong implementation of cpu.uclamp.minQais Yousef1-4/+17
cpu.uclamp.min is a protection as described in cgroup-v2 Resource Distribution Model Documentation/admin-guide/cgroup-v2.rst which means we try our best to preserve the minimum performance point of tasks in this group. See full description of cpu.uclamp.min in the cgroup-v2.rst. But the current implementation makes it a limit, which is not what was intended. For example: tg->cpu.uclamp.min = 20% p0->uclamp[UCLAMP_MIN] = 0 p1->uclamp[UCLAMP_MIN] = 50% Previous Behavior (limit): p0->effective_uclamp = 0 p1->effective_uclamp = 20% New Behavior (Protection): p0->effective_uclamp = 20% p1->effective_uclamp = 50% Which is inline with how protections should work. With this change the cgroup and per-task behaviors are the same, as expected. Additionally, we remove the confusing relationship between cgroup and !user_defined flag. We don't want for example RT tasks that are boosted by default to max to change their boost value when they attach to a cgroup. If a cgroup wants to limit the max performance point of tasks attached to it, then cpu.uclamp.max must be set accordingly. Or if they want to set different boost value based on cgroup, then sysctl_sched_util_clamp_min_rt_default must be used to NOT boost to max and set the right cpu.uclamp.min for each group to let the RT tasks obtain the desired boost value when attached to that group. As it stands the dependency on !user_defined flag adds an extra layer of complexity that is not required now cpu.uclamp.min behaves properly as a protection. The propagation model of effective cpu.uclamp.min in child cgroups as implemented by cpu_util_update_eff() is still correct. The parent protection sets an upper limit of what the child cgroups will effectively get. Fixes: 3eac870a3247 (sched/uclamp: Use TG's clamps to restrict TASK's clamps) Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@arm.com
2021-05-19bpf: Add bpf_sys_close() helper.Alexei Starovoitov1-0/+19
Add bpf_sys_close() helper to be used by the syscall/loader program to close intermediate FDs and other cleanup. Note this helper must never be allowed inside fdget/fdput bracketing. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-11-alexei.starovoitov@gmail.com
2021-05-19bpf: Add bpf_btf_find_by_name_kind() helper.Alexei Starovoitov2-0/+64
Add new helper: long bpf_btf_find_by_name_kind(char *name, int name_sz, u32 kind, int flags) Description Find BTF type with given name and kind in vmlinux BTF or in module's BTFs. Return Returns btf_id and btf_obj_fd in lower and upper 32 bits. It will be used by loader program to find btf_id to attach the program to and to find btf_ids of ksyms. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-10-alexei.starovoitov@gmail.com
2021-05-19bpf: Introduce fd_idxAlexei Starovoitov2-11/+38
Typical program loading sequence involves creating bpf maps and applying map FDs into bpf instructions in various places in the bpf program. This job is done by libbpf that is using compiler generated ELF relocations to patch certain instruction after maps are created and BTFs are loaded. The goal of fd_idx is to allow bpf instructions to stay immutable after compilation. At load time the libbpf would still create maps as usual, but it wouldn't need to patch instructions. It would store map_fds into __u32 fd_array[] and would pass that pointer to sys_bpf(BPF_PROG_LOAD). Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-9-alexei.starovoitov@gmail.com
2021-05-19bpf: Make btf_load command to be bpfptr_t compatible.Alexei Starovoitov2-7/+8
Similar to prog_load make btf_load command to be availble to bpf_prog_type_syscall program. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-7-alexei.starovoitov@gmail.com
2021-05-19bpf: Prepare bpf syscall to be used from kernel and user space.Alexei Starovoitov3-61/+99
With the help from bpfptr_t prepare relevant bpf syscall commands to be used from kernel and user space. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-4-alexei.starovoitov@gmail.com
2021-05-19bpf: Introduce bpf_sys_bpf() helper and program type.Alexei Starovoitov2-0/+61
Add placeholders for bpf_sys_bpf() helper and new program type. Make sure to check that expected_attach_type is zero for future extensibility. Allow tracing helper functions to be used in this program type, since they will only execute from user context via bpf_prog_test_run. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-2-alexei.starovoitov@gmail.com
2021-05-19signal: Deliver all of the siginfo perf data in _perfEric W. Biederman1-8/+13
Don't abuse si_errno and deliver all of the perf data in _perf member of siginfo_t. Note: The data field in the perf data structures in a u64 to allow a pointer to be encoded without needed to implement a 32bit and 64bit version of the same structure. There already exists a 32bit and 64bit versions siginfo_t, and the 32bit version can not include a 64bit member as it only has 32bit alignment. So unsigned long is used in siginfo_t instead of a u64 as unsigned long can encode a pointer on all architectures linux supports. v1: https://lkml.kernel.org/r/m11rarqqx2.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210503203814.25487-10-ebiederm@xmission.com v3: https://lkml.kernel.org/r/20210505141101.11519-11-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-4-ebiederm@xmission.com Reviewed-by: Marco Elver <elver@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-05-19signal: Factor force_sig_perf out of perf_sigtrapEric W. Biederman2-9/+15
Separate filling in siginfo for TRAP_PERF from deciding that siginal needs to be sent. There are enough little details that need to be correct when properly filling in siginfo_t that it is easy to make mistakes if filling in the siginfo_t is in the same function with other logic. So factor out force_sig_perf to reduce the cognative load of on reviewers, maintainers and implementors. v1: https://lkml.kernel.org/r/m17dkjqqxz.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-10-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-3-ebiederm@xmission.com Reviewed-by: Marco Elver <elver@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-05-19signal: Implement SIL_FAULT_TRAPNOEric W. Biederman1-22/+12
Now that si_trapno is part of the union in _si_fault and available on all architectures, add SIL_FAULT_TRAPNO and update siginfo_layout to return SIL_FAULT_TRAPNO when the code assumes si_trapno is valid. There is room for future changes to reduce when si_trapno is valid but this is all that is needed to make si_trapno and the other members of the the union in _sigfault mutually exclusive. Update the code that uses siginfo_layout to deal with SIL_FAULT_TRAPNO and have the same code ignore si_trapno in in all other cases. v1: https://lkml.kernel.org/r/m1o8dvs7s7.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-6-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-2-ebiederm@xmission.com Reviewed-by: Marco Elver <elver@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-05-19siginfo: Move si_trapno inside the union inside _si_faultEric W. Biederman1-0/+1
It turns out that linux uses si_trapno very sparingly, and as such it can be considered extra information for a very narrow selection of signals, rather than information that is present with every fault reported in siginfo. As such move si_trapno inside the union inside of _si_fault. This results in no change in placement, and makes it eaiser to extend _si_fault in the future as this reduces the number of special cases. In particular with si_trapno included in the union it is no longer a concern that the union must be pointer aligned on most architectures because the union follows immediately after si_addr which is a pointer. This change results in a difference in siginfo field placement on sparc and alpha for the fields si_addr_lsb, si_lower, si_upper, si_pkey, and si_perf. These architectures do not implement the signals that would use si_addr_lsb, si_lower, si_upper, si_pkey, and si_perf. Further these architecture have not yet implemented the userspace that would use si_perf. The point of this change is in fact to correct these placement issues before sparc or alpha grow userspace that cares. This change was discussed[1] and the agreement is that this change is currently safe. [1]: https://lkml.kernel.org/r/CAK8P3a0+uKYwL1NhY6Hvtieghba2hKYGD6hcKx5n8=4Gtt+pHA@mail.gmail.com Acked-by: Marco Elver <elver@google.com> v1: https://lkml.kernel.org/r/m1tunns7yf.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-5-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-1-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-05-18kcsan: Report observed value changesMark Rutland3-9/+33
When a thread detects that a memory location was modified without its watchpoint being hit, the report notes that a change was detected, but does not provide concrete values for the change. Knowing the concrete values can be very helpful in tracking down any racy writers (e.g. as specific values may only be written in some portions of code, or under certain conditions). When we detect a modification, let's report the concrete old/new values, along with the access's mask of relevant bits (and which relevant bits were modified). This can make it easier to identify potential racy writers. As the snapshots are at most 8 bytes, we can only report values for acceses up to this size, but this appears to cater for the common case. When we detect a race via a watchpoint, we may or may not have concrete values for the modification. To be helpful, let's attempt to log them when we do as they can be ignored where irrelevant. The resulting reports appears as follows, with values zero-padded to the access width: | ================================================================== | BUG: KCSAN: data-race in el0_svc_common+0x34/0x25c arch/arm64/kernel/syscall.c:96 | | race at unknown origin, with read to 0xffff00007ae6aa00 of 8 bytes by task 223 on cpu 1: | el0_svc_common+0x34/0x25c arch/arm64/kernel/syscall.c:96 | do_el0_svc+0x48/0xec arch/arm64/kernel/syscall.c:178 | el0_svc arch/arm64/kernel/entry-common.c:226 [inline] | el0_sync_handler+0x1a4/0x390 arch/arm64/kernel/entry-common.c:236 | el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:674 | | value changed: 0x0000000000000000 -> 0x0000000000000002 | | Reported by Kernel Concurrency Sanitizer on: | CPU: 1 PID: 223 Comm: syz-executor.1 Not tainted 5.8.0-rc3-00094-ga73f923ecc8e-dirty #3 | Hardware name: linux,dummy-virt (DT) | ================================================================== If an access mask is set, it is shown underneath the "value changed" line as "bits changed: 0x<bits changed> with mask 0x<non-zero mask>". Signed-off-by: Mark Rutland <mark.rutland@arm.com> [ elver@google.com: align "value changed" and "bits changed" lines, which required massaging the message; do not print bits+mask if no mask set. ] Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-18kcsan: Remove kcsan_report_typeMark Rutland2-42/+20
Now that the reporting code has been refactored, it's clear by construction that print_report() can only be passed KCSAN_REPORT_RACE_SIGNAL or KCSAN_REPORT_RACE_UNKNOWN_ORIGIN, and these can also be distinguished by the presence of `other_info`. Let's simplify things and remove the report type enum, and instead let's check `other_info` to distinguish these cases. This allows us to remove code for cases which are impossible and generally makes the code simpler. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> [ elver@google.com: add updated comments to kcsan_report_*() functions ] Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-18kcsan: Remove reporting indirectionMark Rutland1-66/+49
Now that we have separate kcsan_report_*() functions, we can factor the distinct logic for each of the report cases out of kcsan_report(). While this means each case has to handle mutual exclusion independently, this minimizes the conditionality of code and makes it easier to read, and will permit passing distinct bits of information to print_report() in future. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> [ elver@google.com: retain comment about lockdep_off() ] Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-18kcsan: Refactor access_info initializationMark Rutland1-17/+25
In subsequent patches we'll want to split kcsan_report() into distinct handlers for each report type. The largest bit of common work is initializing the `access_info`, so let's factor this out into a helper, and have the kcsan_report_*() functions pass the `aaccess_info` as a parameter to kcsan_report(). There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-18kcsan: Fold panic() call into print_report()Mark Rutland1-13/+8
So that we can add more callers of print_report(), lets fold the panic() call into print_report() so the caller doesn't have to handle this explicitly. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-18kcsan: Refactor passing watchpoint/other_infoMark Rutland1-9/+4
The `watchpoint_idx` argument to kcsan_report() isn't meaningful for races which were not detected by a watchpoint, and it would be clearer if callers passed the other_info directly so that a NULL value can be passed in this case. Given that callers manipulate their watchpoints before passing the index into kcsan_report_*(), and given we index the `other_infos` array using this before we sanity-check it, the subsequent sanity check isn't all that useful. Let's remove the `watchpoint_idx` sanity check, and move the job of finding the `other_info` out of kcsan_report(). Other than the removal of the check, there should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-18kcsan: Distinguish kcsan_report() callsMark Rutland3-15/+33
Currently kcsan_report() is used to handle three distinct cases: * The caller hit a watchpoint when attempting an access. Some information regarding the caller and access are recorded, but no output is produced. * A caller which previously setup a watchpoint detected that the watchpoint has been hit, and possibly detected a change to the location in memory being watched. This may result in output reporting the interaction between this caller and the caller which hit the watchpoint. * A caller detected a change to a modification to a memory location which wasn't detected by a watchpoint, for which there is no information on the other thread. This may result in output reporting the unexpected change. ... depending on the specific case the caller has distinct pieces of information available, but the prototype of kcsan_report() has to handle all three cases. This means that in some cases we pass redundant information, and in others we don't pass all the information we could pass. This also means that the report code has to demux these three cases. So that we can pass some additional information while also simplifying the callers and report code, add separate kcsan_report_*() functions for the distinct cases, updating callers accordingly. As the watchpoint_idx is unused in the case of kcsan_report_unknown_origin(), this passes a dummy value into kcsan_report(). Subsequent patches will refactor the report code to avoid this. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> [ elver@google.com: try to make kcsan_report_*() names more descriptive ] Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-18kcsan: Simplify value change detectionMark Rutland1-24/+16
In kcsan_setup_watchpoint() we store snapshots of a watched value into a union of u8/u16/u32/u64 sized fields, modify this in place using a consistent field, then later check for any changes via the u64 field. We can achieve the safe effect more simply by always treating the field as a u64, as smaller values will be zero-extended. As the values are zero-extended, we don't need to truncate the access_mask when we apply it, and can always apply the full 64-bit access_mask to the 64-bit value. Finally, we can store the two snapshots and calculated difference separately, which makes the code a little easier to read, and will permit reporting the old/new values in subsequent patches. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-18kcsan: Fix debugfs initcall return typeArnd Bergmann1-1/+2
clang with CONFIG_LTO_CLANG points out that an initcall function should return an 'int' due to the changes made to the initcall macros in commit 3578ad11f3fb ("init: lto: fix PREL32 relocations"): kernel/kcsan/debugfs.c:274:15: error: returning 'void' from a function with incompatible result type 'int' late_initcall(kcsan_debugfs_init); ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~ include/linux/init.h:292:46: note: expanded from macro 'late_initcall' #define late_initcall(fn) __define_initcall(fn, 7) Fixes: e36299efe7d7 ("kcsan, debugfs: Move debugfs file creation out of early init") Cc: stable <stable@vger.kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-18Merge branches 'bitmaprange.2021.05.10c', 'doc.2021.05.10c', ↵Paul E. McKenney15-469/+731
'fixes.2021.05.13a', 'kvfree_rcu.2021.05.10c', 'mmdumpobj.2021.05.10c', 'nocb.2021.05.12a', 'srcu.2021.05.12a', 'tasks.2021.05.18a' and 'torture.2021.05.10c' into HEAD bitmaprange.2021.05.10c: Allow "all" for bitmap ranges. doc.2021.05.10c: Documentation updates. fixes.2021.05.13a: Miscellaneous fixes. kvfree_rcu.2021.05.10c: kvfree_rcu() updates. mmdumpobj.2021.05.10c: mem_dump_obj() updates. nocb.2021.05.12a: RCU NOCB CPU updates, including limited deoffloading. srcu.2021.05.12a: SRCU updates. tasks.2021.05.18a: Tasks-RCU updates. torture.2021.05.10c: Torture-test updates.
2021-05-18tasks-rcu: Make show_rcu_tasks_gp_kthreads() be static inlinePaul E. McKenney2-1/+4
In some architectures, the no-op variant of show_rcu_tasks_gp_kthreads() get "no previous prototype" compiler warnings. These are false positives given that kernel/rcu/tasks.h is included only once. But why put up with the compiler noise? This commit therefore adds "static inline" to this definition to force the compiler to accept this situation, while also moving it to its proper place in kernel/rcu/rcu.h. Reported-by: kernel test robot <lkp@intel.com> [ paulmck: Update per Stephen Rothwell feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-18rcu-tasks: Make ksoftirqd provide RCU Tasks quiescent statesPaul E. McKenney1-0/+1
Heavy networking load can cause a CPU to execute continuously and indefinitely within ksoftirqd, in which case there will be no voluntary task switches and thus no RCU-tasks quiescent states. This commit therefore causes the exiting rcu_softirq_qs() to provide an RCU-tasks quiescent state. This of course means that __do_softirq() and its callers cannot be invoked from within a tracing trampoline. Reported-by: Toke Høiland-Jørgensen <toke@redhat.com> Tested-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org>
2021-05-18sched: Make the idle task quack like a per-CPU kthreadValentin Schneider2-18/+33
For all intents and purposes, the idle task is a per-CPU kthread. It isn't created via the same route as other pcpu kthreads however, and as a result it is missing a few bells and whistles: it fails kthread_is_per_cpu() and it doesn't have PF_NO_SETAFFINITY set. Fix the former by giving the idle task a kthread struct along with the KTHREAD_IS_PER_CPU flag. This requires some extra iffery as init_idle() call be called more than once on the same idle task. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210510151024.2448573-2-valentin.schneider@arm.com
2021-05-18sched,stats: Further simplify sched_infoPeter Zijlstra1-9/+12
There's no point doing delta==0 updates. Suggested-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2021-05-18locking/mutex: clear MUTEX_FLAGS if wait_list is empty due to signalZqiang4-11/+17
When a interruptible mutex locker is interrupted by a signal without acquiring this lock and removed from the wait queue. if the mutex isn't contended enough to have a waiter put into the wait queue again, the setting of the WAITER bit will force mutex locker to go into the slowpath to acquire the lock every time, so if the wait queue is empty, the WAITER bit need to be clear. Fixes: 040a0a371005 ("mutex: Add support for wound/wait style locks") Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Zqiang <qiang.zhang@windriver.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210517034005.30828-1-qiang.zhang@windriver.com
2021-05-18locking/lockdep: Correct calling tracepointsLeo Yan1-2/+2
The commit eb1f00237aca ("lockdep,trace: Expose tracepoints") reverses tracepoints for lock_contended() and lock_acquired(), thus the ftrace log shows the wrong locking sequence that "acquired" event is prior to "contended" event: <idle>-0 [001] d.s3 20803.501685: lock_acquire: 0000000008b91ab4 &sg_policy->update_lock <idle>-0 [001] d.s3 20803.501686: lock_acquired: 0000000008b91ab4 &sg_policy->update_lock <idle>-0 [001] d.s3 20803.501689: lock_contended: 0000000008b91ab4 &sg_policy->update_lock <idle>-0 [001] d.s3 20803.501690: lock_release: 0000000008b91ab4 &sg_policy->update_lock This patch fixes calling tracepoints for lock_contended() and lock_acquired(). Fixes: eb1f00237aca ("lockdep,trace: Expose tracepoints") Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210512120937.90211-1-leo.yan@linaro.org
2021-05-17genirq: Add a IRQF_NO_DEBUG flagThomas Gleixner4-2/+19
The whole call to note_interrupt() can be avoided or return early when interrupts would be marked accordingly. For IPI handlers which always return HANDLED the whole procedure is pretty pointless to begin with. Add a IRQF_NO_DEBUG flag and mark the interrupt accordingly if supplied when the interrupt is requested. When noirqdebug is set on the kernel commandline, then the interrupt is marked unconditionally so that there is only one condition in the hotpath to evaluate. [ clg: Add changelog ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/7a8ad02f-63a8-c1aa-fdd1-39d973593d02@kaod.org
2021-05-17kdb: Switch to use %ptTsAndy Shevchenko1-8/+1
Use %ptTs instead of open-coded variant to print contents of time64_t type in human readable form. Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: kgdb-bugreport@lists.sourceforge.net Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210511153958.34527-2-andriy.shevchenko@linux.intel.com
2021-05-17module: check for exit sections in layout_sections() instead of ↵Jessica Yu1-6/+11
module_init_section() Previously, when CONFIG_MODULE_UNLOAD=n, the module loader just does not attempt to load exit sections since it never expects that any code in those sections will ever execute. However, dynamic code patching (alternatives, jump_label and static_call) can have sites in __exit code, even if __exit is never executed. Therefore __exit must be present at runtime, at least for as long as __init code is. Commit 33121347fb1c ("module: treat exit sections the same as init sections when !CONFIG_MODULE_UNLOAD") solves the requirements of jump_labels and static_calls by putting the exit sections in the init region of the module so that they are at least present at init, and discarded afterwards. It does this by including a check for exit sections in module_init_section(), so that it also returns true for exit sections, and the module loader will automatically sort them in the init region of the module. However, the solution there was not completely arch-independent. ARM is a special case where it supplies its own module_{init, exit}_section() functions. Instead of pushing the exit section checks into module_init_section(), just implement the exit section check in layout_sections(), so that we don't have to touch arch-dependent code. Fixes: 33121347fb1c ("module: treat exit sections the same as init sections when !CONFIG_MODULE_UNLOAD") Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2021-05-16Merge tag 'timers-urgent-2021-05-16' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Two fixes for timers: - Use the ALARM feature check in the alarmtimer core code insted of the old method of checking for the set_alarm() callback. Drivers can have that callback set but the feature bit cleared. If such a RTC device is selected then alarms wont work. - Use a proper define to let the preprocessor check whether Hyper-V VDSO clocksource should be active. The code used a constant in an enum with #ifdef, which evaluates to always false and disabled the clocksource for VDSO" * tag 'timers-urgent-2021-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/drivers/hyper-v: Re-enable VDSO_CLOCKMODE_HVCLOCK on X86 alarmtimer: Check RTC features instead of ops
2021-05-15Merge tag 'sched-urgent-2021-05-15' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Fix an idle CPU selection bug, and an AMD Ryzen maximum frequency enumeration bug" * tag 'sched-urgent-2021-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, sched: Fix the AMD CPPC maximum performance value on certain AMD Ryzen generations sched/fair: Fix clearing of has_idle_cores flag in select_idle_cpu()
2021-05-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-1/+1
Merge misc fixes from Andrew Morton: "13 patches. Subsystems affected by this patch series: resource, squashfs, hfsplus, modprobe, and mm (hugetlb, slub, userfaultfd, ksm, pagealloc, kasan, pagemap, and ioremap)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/ioremap: fix iomap_max_page_shift docs: admin-guide: update description for kernel.modprobe sysctl hfsplus: prevent corruption in shrinking truncate mm/filemap: fix readahead return types kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled mm: fix struct page layout on 32-bit systems ksm: revert "use GET_KSM_PAGE_NOLOCK to get ksm page in remove_rmap_item_from_tree()" userfaultfd: release page in error path to avoid BUG_ON squashfs: fix divide error in calculate_skip() kernel/resource: fix return code check in __request_free_mem_region mm, slub: move slub_debug static key enabling outside slab_mutex mm/hugetlb: fix cow where page writtable in child mm/hugetlb: fix F_SEAL_FUTURE_WRITE
2021-05-15kernel/resource: fix return code check in __request_free_mem_regionAlistair Popple1-1/+1
Splitting an earlier version of a patch that allowed calling __request_region() while holding the resource lock into a series of patches required changing the return code for the newly introduced __request_region_locked(). Unfortunately this change was not carried through to a subsequent commit 56fd94919b8b ("kernel/resource: fix locking in request_free_mem_region") in the series. This resulted in a use-after-free due to freeing the struct resource without properly releasing it. Fix this by correcting the return code check so that the struct is not freed if the request to add it was successful. Link: https://lkml.kernel.org/r/20210512073528.22334-1-apopple@nvidia.com Fixes: 56fd94919b8b ("kernel/resource: fix locking in request_free_mem_region") Signed-off-by: Alistair Popple <apopple@nvidia.com> Reported-by: kernel test robot <oliver.sang@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Muchun Song <smuchun@gmail.com> Cc: Oliver Sang <oliver.sang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14Merge tag 'trace-v5.13-rc1' of ↵Linus Torvalds1-4/+27
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Fix trace_check_vprintf() for %.*s The sanity check of all strings being read from the ring buffer to make sure they are in safe memory space did not account for the %.*s notation having another parameter to process (the length). Add that to the check" * tag 'trace-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Handle %.*s in trace_check_vprintf()
2021-05-14kernel/module: Use BUG_ON instead of if condition followed by BUGzhouchuangao1-2/+1
Fix the following coccinelle report: kernel/module.c:1018:2-5: WARNING: Use BUG_ON instead of if condition followed by BUG. BUG_ON uses unlikely in if(). Through disassembly, we can see that brk #0x800 is compiled to the end of the function. As you can see below: ...... ffffff8008660bec: d65f03c0 ret ffffff8008660bf0: d4210000 brk #0x800 Usually, the condition in if () is not satisfied. For the multi-stage pipeline, we do not need to perform fetch decode and excute operation on brk instruction. In my opinion, this can improve the efficiency of the multi-stage pipeline. Signed-off-by: zhouchuangao <zhouchuangao@vivo.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2021-05-13tracing: Handle %.*s in trace_check_vprintf()Steven Rostedt (VMware)1-4/+27
If a trace event uses the %*.s notation, the trace_check_vprintf() will fail and will warn about a bad processing of strings, because it does not take into account the length field when processing the star (*) part. Have it handle this case as well. Link: https://lore.kernel.org/linux-nfs/238C0E2D-C2A4-4578-ADD2-C565B3B99842@oracle.com/ Reported-by: Chuck Lever III <chuck.lever@oracle.com> Fixes: 9a6944fee68e2 ("tracing: Add a verifier to check string pointers for trace events") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-05-13rcu: Add missing __releases() annotationJules Irenge1-0/+1
Sparse reports a warning at rcu_print_task_stall(): "warning: context imbalance in rcu_print_task_stall - unexpected unlock" The root cause is a missing annotation on rcu_print_task_stall(). This commit therefore adds the missing __releases(rnp->lock) annotation. Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-13rcu: Improve comments describing RCU read-side critical sectionsPaul E. McKenney1-10/+14
There are a number of places that call out the fact that preempt-disable regions of code now act as RCU read-side critical sections, where preempt-disable regions of code include irq-disable regions of code, bh-disable regions of code, hardirq handlers, and NMI handlers. However, someone relying solely on (for example) the call_rcu() header comment might well have no idea that preempt-disable regions of code have RCU semantics. This commit therefore updates the header comments for call_rcu(), synchronize_rcu(), rcu_dereference_bh_check(), and rcu_dereference_sched_check() to call out these new(ish) forms of RCU readers. Reported-by: Michel Lespinasse <michel@lespinasse.org> [ paulmck: Apply Matthew Wilcox and Michel Lespinasse feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>