summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2022-04-21rcuscale: Allow rcuscale without RCU Tasks Rude/TracePaul E. McKenney2-3/+11
Currently, a CONFIG_PREEMPT_NONE=y kernel substitutes normal RCU for RCU Tasks Rude and RCU Tasks Trace. Unless that kernel builds rcuscale, whether built-in or as a module, in which case these RCU Tasks flavors are (unnecessarily) built in. This both increases kernel size and increases the complexity of certain tracing operations. This commit therefore decouples the presence of rcuscale from the presence of RCU Tasks Rude and RCU Tasks Trace. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-21rcuscale: Allow rcuscale without RCU TasksPaul E. McKenney2-2/+11
Currently, a CONFIG_PREEMPT_NONE=y kernel substitutes normal RCU for RCU Tasks. Unless that kernel builds rcuscale, whether built-in or as a module, in which case RCU Tasks is (unnecessarily) built. This both increases kernel size and increases the complexity of certain tracing operations. This commit therefore decouples the presence of rcuscale from the presence of RCU Tasks. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-21refscale: Allow refscale without RCU Tasks Rude/TracePaul E. McKenney2-3/+11
Currently, a CONFIG_PREEMPT_NONE=y kernel substitutes normal RCU for RCU Tasks Rude and RCU Tasks Trace. Unless that kernel builds refscale, whether built-in or as a module, in which case these RCU Tasks flavors are (unnecessarily) built in. This both increases kernel size and increases the complexity of certain tracing operations. This commit therefore decouples the presence of refscale from the presence of RCU Tasks Rude and RCU Tasks Trace. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-21refscale: Allow refscale without RCU TasksPaul E. McKenney2-2/+11
Currently, a CONFIG_PREEMPT_NONE=y kernel substitutes normal RCU for RCU Tasks. Unless that kernel builds refscale, whether built-in or as a module, in which case RCU Tasks is (unnecessarily) built in. This both increases kernel size and increases the complexity of certain tracing operations. This commit therefore decouples the presence of refscale from the presence of RCU Tasks. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-21rcutorture: Allow rcutorture without RCU Tasks RudePaul E. McKenney3-9/+26
Unless a kernel builds rcutorture, whether built-in or as a module, that kernel is also built with CONFIG_TASKS_RUDE_RCU, whether anything else needs Tasks Rude RCU or not. This unnecessarily increases kernel size. This commit therefore decouples the presence of rcutorture from the presence of RCU Tasks Rude. However, there is a need to select CONFIG_TASKS_RUDE_RCU for testing purposes. Except that casual users must not be bothered with questions -- for them, this needs to be fully automated. There is thus a CONFIG_FORCE_TASKS_RUDE_RCU that selects CONFIG_TASKS_RUDE_RCU, is user-selectable, but which depends on CONFIG_RCU_EXPERT. [ paulmck: Apply kernel test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-21rcutorture: Allow rcutorture without RCU TasksPaul E. McKenney3-7/+25
Currently, a CONFIG_PREEMPT_NONE=y kernel substitutes normal RCU for RCU Tasks. Unless that kernel builds rcutorture, whether built-in or as a module, in which case RCU Tasks is (unnecessarily) used. This both increases kernel size and increases the complexity of certain tracing operations. This commit therefore decouples the presence of rcutorture from the presence of RCU Tasks. However, there is a need to select CONFIG_TASKS_RCU for testing purposes. Except that casual users must not be bothered with questions -- for them, this needs to be fully automated. There is thus a CONFIG_FORCE_TASKS_RCU that selects CONFIG_TASKS_RCU, is user-selectable, but which depends on CONFIG_RCU_EXPERT. [ paulmck: Apply kernel test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-21rcutorture: Allow rcutorture without RCU Tasks TracePaul E. McKenney3-53/+71
Unless a kernel builds rcutorture, whether built-in or as a module, that kernel is also built with CONFIG_TASKS_TRACE_RCU, whether anything else needs Tasks Trace RCU or not. This unnecessarily increases kernel size. This commit therefore decouples the presence of rcutorture from the presence of RCU Tasks Trace. However, there is a need to select CONFIG_TASKS_TRACE_RCU for testing purposes. Except that casual users must not be bothered with questions -- for them, this needs to be fully automated. There is thus a CONFIG_FORCE_TASKS_TRACE_RCU that selects CONFIG_TASKS_TRACE_RCU, is user-selectable, but which depends on CONFIG_RCU_EXPERT. [ paulmck: Apply kernel test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-21rcu: Make the TASKS_RCU Kconfig option be selectedPaul E. McKenney3-1/+4
Currently, any kernel built with CONFIG_PREEMPTION=y also gets CONFIG_TASKS_RCU=y, which is not helpful to people trying to build preemptible kernels of minimal size. Because CONFIG_TASKS_RCU=y is needed only in kernels doing tracing of one form or another, this commit moves from TASKS_RCU deciding when it should be enabled to the tracing Kconfig options explicitly selecting it. This allows building preemptible kernels without TASKS_RCU, if desired. This commit also updates the SRCU-N and TREE09 rcutorture scenarios in order to avoid Kconfig errors that would otherwise result from CONFIG_TASKS_RCU being selected without its CONFIG_RCU_EXPERT dependency being met. [ paulmck: Apply BPF_SYSCALL feedback from Andrii Nakryiko. ] Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-21rcu: Use IRQ_WORK_INIT_HARD() to avoid rcu_read_unlock() hangsZqiang1-1/+7
When booting kernels built with both CONFIG_RCU_STRICT_GRACE_PERIOD=y and CONFIG_PREEMPT_RT=y, the rcu_read_unlock_special() function's invocation of irq_work_queue_on() the init_irq_work() causes the rcu_preempt_deferred_qs_handler() function to work execute in SCHED_FIFO irq_work kthreads. Because rcu_read_unlock_special() is invoked on each rcu_read_unlock() in such kernels, the amount of work just keeps piling up, resulting in a boot-time hang. This commit therefore avoids this hang by using IRQ_WORK_INIT_HARD() instead of init_irq_work(), but only in kernels built with both CONFIG_PREEMPT_RT=y and CONFIG_RCU_STRICT_GRACE_PERIOD=y. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-21rcu_sync: Fix comment to properly reflect rcu_sync_exit() behaviorDavid Vernet1-1/+1
The rcu_sync_enter() function is used by updaters to force RCU readers (e.g. percpu-rwsem) to use their slow paths during an update. This is accomplished by setting the ->gp_state of the rcu_sync structure to GP_ENTER. In the case of percpu-rwsem, the readers' slow path waits on a semaphore instead of just incrementing a reader count. Each updater invokes the rcu_sync_exit() function to signal to readers that they may again take their fastpaths. The rcu_sync_exit() function sets the ->gp_state of the rcu_sync structure to GP_EXIT, and if all goes well, after a grace period the ->gp_state reverts back to GP_IDLE. Unfortunately, the rcu_sync_enter() function currently has a comment incorrectly stating that rcu_sync_exit() (by an updater) will re-enable reader "slowpaths". This patch changes the comment to state that this function re-enables reader fastpaths. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-21rcu: Check for successful spawn of ->boost_kthread_taskZqiang1-1/+2
For the spawning of the priority-boost kthreads can fail, improbable though this might seem. This commit therefore refrains from attemoting to initiate RCU priority boosting when The ->boost_kthread_task pointer is NULL. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-21rcu: Fix preemption mode check on synchronize_rcu[_expedited]()Frederic Weisbecker1-1/+3
An early check on synchronize_rcu[_expedited]() tries to determine if the current CPU is in UP mode on an SMP no-preempt kernel, in which case there is no need to start a grace period since the current assumed quiescent state is all we need. However the preemption mode doesn't take into account the boot selected preemption mode under CONFIG_PREEMPT_DYNAMIC=y, missing a possible early return if the running flavour is "none" or "voluntary". Use the shiny new preempt mode accessors to fix this. However, avoid invoking them during early boot because doing so triggers a WARN_ON_ONCE(). [ paulmck: Update for mainlined API. ] Reported-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-21rcu: Print number of online CPUs in RCU CPU stall-warning messagesPaul E. McKenney1-4/+4
RCU's synchronous grace periods act quite differently when there is only one online CPU, especially in the no-op case in kernels built with CONFIG_PREEMPTION=n. This change in behavior can be important debugging information, so this commit adds the number of online CPUs to the RCU CPU stall warning messages. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-21rcu: Add comments to final rcu_gp_cleanup() "if" statementPaul E. McKenney1-5/+20
The final "if" statement in rcu_gp_cleanup() has proven to be rather confusing, straightforward though it might have seemed when initially written. This commit therefore adds comments to its "then" and "else" clauses to at least provide a more elevated form of confusion. Reported-by: Boqun Feng <boqun.feng@gmail.com> Reported-by: Frederic Weisbecker <frederic@kernel.org> Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Reported-by: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-21kernel/smp: Provide boot-time timeout for CSD lock diagnosticsPaul E. McKenney1-2/+5
Debugging of problems involving insanely long-running SMI handlers proceeds better if the CSD-lock timeout can be adjusted. This commit therefore provides a new smp.csd_lock_timeout kernel boot parameter that specifies the timeout in milliseconds. The default remains at the previously hard-coded value of five seconds. [ paulmck: Apply feedback from Juergen Gross. ] Cc: Rik van Riel <riel@surriel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-20bpf: Fix usage of trace RCU in local storage.KP Singh3-14/+23
bpf_{sk,task,inode}_storage_free() do not need to use call_rcu_tasks_trace as no BPF program should be accessing the owner as it's being destroyed. The only other reader at this point is bpf_local_storage_map_free() which uses normal RCU. The only path that needs trace RCU are: * bpf_local_storage_{delete,update} helpers * map_{delete,update}_elem() syscalls Fixes: 0fe4b381a59e ("bpf: Allow bpf_local_storage to be used by sleepable programs") Signed-off-by: KP Singh <kpsingh@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220418155158.2865678-1-kpsingh@kernel.org
2022-04-20bpf: Ensure type tags precede modifiers in BTFKumar Kartikeya Dwivedi1-0/+54
It is guaranteed that for modifiers, clang always places type tags before other modifiers, and then the base type. We would like to rely on this guarantee inside the kernel to make it simple to parse type tags from BTF. However, a user would be allowed to construct a BTF without such guarantees. Hence, add a pass to check that in modifier chains, type tags only occur at the head of the chain, and then don't occur later in the chain. If we see a type tag, we can have one or more type tags preceding other modifiers that then never have another type tag. If we see other modifiers, all modifiers following them should never be a type tag. Instead of having to walk chains we verified previously, we can remember the last good modifier type ID which headed a good chain. At that point, we must have verified all other chains headed by type IDs less than it. This makes the verification process less costly, and it becomes a simple O(n) pass. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220419164608.1990559-2-memxor@gmail.com
2022-04-19perf/core: Fix perf_mmap fail when CONFIG_PERF_USE_VMALLOC enabledZhipeng Xie3-6/+6
This problem can be reproduced with CONFIG_PERF_USE_VMALLOC enabled on both x86_64 and aarch64 arch when using sysdig -B(using ebpf)[1]. sysdig -B works fine after rebuilding the kernel with CONFIG_PERF_USE_VMALLOC disabled. I tracked it down to the if condition event->rb->nr_pages != nr_pages in perf_mmap is true when CONFIG_PERF_USE_VMALLOC is enabled where event->rb->nr_pages = 1 and nr_pages = 2048 resulting perf_mmap to return -EINVAL. This is because when CONFIG_PERF_USE_VMALLOC is enabled, rb->nr_pages is always equal to 1. Arch with CONFIG_PERF_USE_VMALLOC enabled by default: arc/arm/csky/mips/sh/sparc/xtensa Arch with CONFIG_PERF_USE_VMALLOC disabled by default: x86_64/aarch64/... Fix this problem by using data_page_nr() [1] https://github.com/draios/sysdig Fixes: 906010b2134e ("perf_event: Provide vmalloc() based mmap() backing") Signed-off-by: Zhipeng Xie <xiezhipeng1@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220209145417.6495-1-xiezhipeng1@huawei.com
2022-04-19sched/pelt: Fix attach_entity_load_avg() corner casekuyo chang1-5/+5
The warning in cfs_rq_is_decayed() triggered: SCHED_WARN_ON(cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || cfs_rq->avg.runnable_avg) There exists a corner case in attach_entity_load_avg() which will cause load_sum to be zero while load_avg will not be. Consider se_weight is 88761 as per the sched_prio_to_weight[] table. Further assume the get_pelt_divider() is 47742, this gives: se->avg.load_avg is 1. However, calculating load_sum: se->avg.load_sum = div_u64(se->avg.load_avg * se->avg.load_sum, se_weight(se)); se->avg.load_sum = 1*47742/88761 = 0. Then enqueue_load_avg() adds this to the cfs_rq totals: cfs_rq->avg.load_avg += se->avg.load_avg; cfs_rq->avg.load_sum += se_weight(se) * se->avg.load_sum; Resulting in load_avg being 1 with load_sum is 0, which will trigger the WARN. Fixes: f207934fb79d ("sched/fair: Align PELT windows between cfs_rq and its se") Signed-off-by: kuyo chang <kuyo.chang@mediatek.com> [peterz: massage changelog] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lkml.kernel.org/r/20220414090229.342-1-kuyo.chang@mediatek.com
2022-04-19bpf: Move rcu lock management out of BPF_PROG_RUN routinesStanislav Fomichev2-18/+111
Commit 7d08c2c91171 ("bpf: Refactor BPF_PROG_RUN_ARRAY family of macros into functions") switched a bunch of BPF_PROG_RUN macros to inline routines. This changed the semantic a bit. Due to arguments expansion of macros, it used to be: rcu_read_lock(); array = rcu_dereference(cgrp->bpf.effective[atype]); ... Now, with with inline routines, we have: array_rcu = rcu_dereference(cgrp->bpf.effective[atype]); /* array_rcu can be kfree'd here */ rcu_read_lock(); array = rcu_dereference(array_rcu); I'm assuming in practice rcu subsystem isn't fast enough to trigger this but let's use rcu API properly. Also, rename to lower caps to not confuse with macros. Additionally, drop and expand BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY. See [1] for more context. [1] https://lore.kernel.org/bpf/CAKH8qBs60fOinFdxiiQikK_q0EcVxGvNTQoWvHLEUGbgcj1UYg@mail.gmail.com/T/#u v2 - keep rcu locks inside by passing cgroup_bpf Fixes: 7d08c2c91171 ("bpf: Refactor BPF_PROG_RUN_ARRAY family of macros into functions") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220414161233.170780-1-sdf@google.com
2022-04-19Merge branch irq/gpio-immutable into irq/irqchip-nextMarc Zyngier1-0/+1
* irq/gpio-immutable: : . : First try at preventing the GPIO subsystem from abusing irq_chip : data structures. The general idea is to have an irq_chip flag : to tell the GPIO subsystem that these structures are immutable, : and to convert drivers one by one. : . Documentation: Update the recommended pattern for GPIO irqchips gpio: Update TODO to mention immutable irq_chip structures pinctrl: amd: Make the irqchip immutable pinctrl: msmgpio: Make the irqchip immutable pinctrl: apple-gpio: Make the irqchip immutable gpio: pl061: Make the irqchip immutable gpio: tegra186: Make the irqchip immutable gpio: Add helpers to ease the transition towards immutable irq_chip gpio: Expose the gpiochip_irq_re[ql]res helpers gpio: Don't fiddle with irqchips marked as immutable Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-04-19gpio: Don't fiddle with irqchips marked as immutableMarc Zyngier1-0/+1
In order to move away from gpiolib messing with the internals of unsuspecting irqchips, add a flag by which irqchips advertise that they are not to be messed with, and do solemnly swear that they correctly call into the gpiolib helpers when required. Also nudge the users into converting their drivers to the new model. Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Bartosz Golaszewski <brgl@bgdev.pl> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220419141846.598305-2-maz@kernel.org
2022-04-18Merge drm/drm-next into drm-misc-nextPaul Cercueil10-715/+615
drm/drm-next has a build fix for the NewVision NV3052C panel (drivers/gpu/drm/panel/panel-newvision-nv3052c.c), which needs to be merged back to drm-misc-next, as it was failing to build there. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
2022-04-18swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tblChristoph Hellwig1-57/+20
No users left. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2022-04-18swiotlb: provide swiotlb_init variants that remap the bufferChristoph Hellwig1-3/+33
To shared more code between swiotlb and xen-swiotlb, offer a swiotlb_init_remap interface and add a remap callback to swiotlb_init_late that will allow Xen to remap the buffer without duplicating much of the logic. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2022-04-18swiotlb: pass a gfp_mask argument to swiotlb_init_lateChristoph Hellwig1-5/+2
Let the caller chose a zone to allocate from. This will be used later on by the xen-swiotlb initialization on arm. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2022-04-18swiotlb: add a SWIOTLB_ANY flag to lift the low memory restrictionChristoph Hellwig1-2/+9
Power SVM wants to allocate a swiotlb buffer that is not restricted to low memory for the trusted hypervisor scheme. Consolidate the support for this into the swiotlb_init interface by adding a new flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2022-04-18swiotlb: make the swiotlb_init interface more usefulChristoph Hellwig1-16/+19
Pass a boolean flag to indicate if swiotlb needs to be enabled based on the addressing needs, and replace the verbose argument with a set of flags, including one to force enable bounce buffering. Note that this patch removes the possibility to force xen-swiotlb use with the swiotlb=force parameter on the command line on x86 (arm and arm64 never supported that), but this interface will be restored shortly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2022-04-18swiotlb: rename swiotlb_late_init_with_default_sizeChristoph Hellwig1-4/+2
swiotlb_late_init_with_default_size is an overly verbose name that doesn't even catch what the function is doing, given that the size is not just a default but the actual requested size. Rename it to swiotlb_init_late. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2022-04-18swiotlb: simplify swiotlb_max_segmentChristoph Hellwig1-17/+3
Remove the bogus Xen override that was usually larger than the actual size and just calculate the value on demand. Note that swiotlb_max_segment still doesn't make sense as an interface and should eventually be removed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2022-04-18swiotlb: make swiotlb_exit a no-op if SWIOTLB_FORCE is setChristoph Hellwig1-0/+3
If force bouncing is enabled we can't release the buffers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2022-04-18dma-direct: use is_swiotlb_active in dma_direct_map_pageChristoph Hellwig1-1/+1
Use the more specific is_swiotlb_active check instead of checking the global swiotlb_force variable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2022-04-17Merge tag 'timers-urgent-2022-04-17' of ↵Linus Torvalds2-7/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "A small set of fixes for the timers core: - Fix the warning condition in __run_timers() which does not take into account that a CPU base (especially the deferrable base) never has a timer armed on it and therefore the next_expiry value can become stale. - Replace a WARN_ON() in the NOHZ code with a WARN_ON_ONCE() to prevent endless spam in dmesg. - Remove the double star from a comment which is not meant to be in kernel-doc format" * tag 'timers-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick/sched: Fix non-kernel-doc comment tick/nohz: Use WARN_ON_ONCE() to prevent console saturation timers: Fix warning condition in __run_timers()
2022-04-17Merge tag 'smp-urgent-2022-04-17' of ↵Linus Torvalds2-19/+19
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP fixes from Thomas Gleixner: "Two fixes for the SMP core: - Make the warning condition in flush_smp_call_function_queue() correct, which checked a just emptied list head for being empty instead of validating that there was no pending entry on the offlined CPU at all. - The @cpu member of struct cpuhp_cpu_state is initialized when the CPU hotplug thread for the upcoming CPU is created. That's too late because the creation of the thread can fail and then the following rollback operates on CPU0. Get rid of the CPU member and hand the CPU number to the involved functions directly" * tag 'smp-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Remove the 'cpu' member of cpuhp_cpu_state smp: Fix offline cpu check in flush_smp_call_function_queue()
2022-04-17Merge tag 'irq-urgent-2022-04-17' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "A single fix for the interrupt affinity spreading logic to take into account that there can be an imbalance between present and possible CPUs, which causes already assigned bits to be overwritten" * tag 'irq-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/affinity: Consider that CPUs on nodes can be unbalanced
2022-04-16Merge tag 'dma-mapping-5.18-2' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds1-1/+2
Pull dma-mapping fix from Christoph Hellwig: - avoid a double memory copy for swiotlb (Chao Gao) * tag 'dma-mapping-5.18-2' of git://git.infradead.org/users/hch/dma-mapping: dma-direct: avoid redundant memory sync for swiotlb
2022-04-16irq_work: use kasan_record_aux_stack_noalloc() record callstackZqiang1-1/+1
On PREEMPT_RT kernel and KASAN is enabled. the kasan_record_aux_stack() may call alloc_pages(), and the rt-spinlock will be acquired, if currently in atomic context, will trigger warning: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 239, name: bootlogd Preemption disabled at: [<ffffffffbab1a531>] rt_mutex_slowunlock+0xa1/0x4e0 CPU: 3 PID: 239 Comm: bootlogd Tainted: G W 5.17.1-rt17-yocto-preempt-rt+ #105 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 Call Trace: __might_resched.cold+0x13b/0x173 rt_spin_lock+0x5b/0xf0 get_page_from_freelist+0x20c/0x1610 __alloc_pages+0x25e/0x5e0 __stack_depot_save+0x3c0/0x4a0 kasan_save_stack+0x3a/0x50 __kasan_record_aux_stack+0xb6/0xc0 kasan_record_aux_stack+0xe/0x10 irq_work_queue_on+0x6a/0x1c0 pull_rt_task+0x631/0x6b0 do_balance_callbacks+0x56/0x80 __balance_callbacks+0x63/0x90 rt_mutex_setprio+0x349/0x880 rt_mutex_slowunlock+0x22a/0x4e0 rt_spin_unlock+0x49/0x80 uart_write+0x186/0x2b0 do_output_char+0x2e9/0x3a0 n_tty_write+0x306/0x800 file_tty_write.isra.0+0x2af/0x450 tty_write+0x22/0x30 new_sync_write+0x27c/0x3a0 vfs_write+0x3f7/0x5d0 ksys_write+0xd9/0x180 __x64_sys_write+0x43/0x50 do_syscall_64+0x44/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Fix it by using kasan_record_aux_stack_noalloc() to avoid the call to alloc_pages(). Link: https://lkml.kernel.org/r/20220402142555.2699582-1-qiang1.zhang@intel.com Signed-off-by: Zqiang <qiang1.zhang@intel.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-15ftrace: Fix build warningYueHaibing1-2/+2
If CONFIG_SYSCTL and CONFIG_DYNAMIC_FTRACE is n, build warns: kernel/trace/ftrace.c:7912:13: error: ‘is_permanent_ops_registered’ defined but not used [-Werror=unused-function] static bool is_permanent_ops_registered(void) ^~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/trace/ftrace.c:89:12: error: ‘last_ftrace_enabled’ defined but not used [-Werror=unused-variable] static int last_ftrace_enabled; ^~~~~~~~~~~~~~~~~~~ Move is_permanent_ops_registered() to ifdef block and mark last_ftrace_enabled as __maybe_unused to fix this. Fixes: 7cde53da38a3 ("ftrace: move sysctl_ftrace_enabled to ftrace.c") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-04-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni8-712/+612
2022-04-14Merge tag 'tai-for-tracing' into timers/coreThomas Gleixner11-715/+632
Pull in the NMI safe TAI accessor which was provided for the tracing tree to prepare for further changes in this area.
2022-04-14timekeeping: Introduce fast accessor to clock taiKurt Kanzenbach1-0/+17
Introduce fast/NMI safe accessor to clock tai for tracing. The Linux kernel tracing infrastructure has support for using different clocks to generate timestamps for trace events. Especially in TSN networks it's useful to have TAI as trace clock, because the application scheduling is done in accordance to the network time, which is based on TAI. With a tai trace_clock in place, it becomes very convenient to correlate network activity with Linux kernel application traces. Use the same implementation as ktime_get_boot_fast_ns() does by reading the monotonic time and adding the TAI offset. The same limitations as for the fast boot implementation apply. The TAI offset may change at run time e.g., by setting the time or using adjtimex() with an offset. However, these kind of offset changes are rare events. Nevertheless, the user has to be aware and deal with it in post processing. An alternative approach would be to use the same implementation as ktime_get_real_fast_ns() does. However, this requires to add an additional u64 member to the tk_read_base struct. This struct together with a seqcount is designed to fit into a single cache line on 64 bit architectures. Adding a new member would violate this constraint. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20220414091805.89667-2-kurt@linutronix.de
2022-04-14genirq: Take the proposed affinity at face value if force==trueMarc Zyngier1-2/+8
Although setting the affinity of an interrupt to a set of CPUs that doesn't have any online CPU is generally frowned apon, there are a few limited cases where such affinity is set from a CPUHP notifier, setting the affinity to a CPU that isn't online yet. The saving grace is that this is always done using the 'force' attribute, which gives a hint that the affinity setting can be outside of the online CPU mask and the callsite set this flag with the knowledge that the underlying interrupt controller knows to handle it. This restores the expected behaviour on Marek's system. Fixes: 33de0aa4bae9 ("genirq: Always limit the affinity to online CPUs") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/4b7fc13c-887b-a664-26e8-45aed13f048a@samsung.com Link: https://lore.kernel.org/r/20220414140011.541725-1-maz@kernel.org
2022-04-14ELF: Remove elf_core_copy_kernel_regs()Brian Gerst1-1/+1
x86-32 was the last architecture that implemented separate user and kernel registers. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220325153953.162643-3-brgerst@gmail.com
2022-04-14dma-direct: avoid redundant memory sync for swiotlbChao Gao1-1/+2
When we looked into FIO performance with swiotlb enabled in VM, we found swiotlb_bounce() is always called one more time than expected for each DMA read request. It turns out that the bounce buffer is copied to original DMA buffer twice after the completion of a DMA request (one is done by in dma_direct_sync_single_for_cpu(), the other by swiotlb_tbl_unmap_single()). But the content in bounce buffer actually doesn't change between the two rounds of copy. So, one round of copy is redundant. Pass DMA_ATTR_SKIP_CPU_SYNC flag to swiotlb_tbl_unmap_single() to skip the memory copy in it. This fix increases FIO 64KB sequential read throughput in a guest with swiotlb=force by 5.6%. Fixes: 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code") Reported-by: Wang Zhaoyang1 <zhaoyang1.wang@intel.com> Reported-by: Gao Liang <liang.gao@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-04-14bpf: Remove unnecessary type castingsYu Zhe2-3/+3
Remove/clean up unnecessary void * type castings. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220413015048.12319-1-yuzhe@nfschina.com
2022-04-13Merge branch 'pr/bpf-sysctl' into bpf-nextDaniel Borkmann2-79/+87
Pull the migration of the BPF sysctl handling into BPF core. This work is needed in both sysctl-next and bpf-next tree. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/bpf/b615bd44-6bd1-a958-7e3f-dd2ff58931a1@iogearbox.net
2022-04-13Merge remote-tracking branch 'bpf-next/pr/bpf-sysctl' into sysctl-nextLuis Chamberlain2-79/+87
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-04-13bpf: Move BPF sysctls from kernel/sysctl.c to BPF coreYan Zhu2-79/+87
We're moving sysctls out of kernel/sysctl.c as it is a mess. We already moved all filesystem sysctls out. And with time the goal is to move all sysctls out to their own subsystem/actual user. kernel/sysctl.c has grown to an insane mess and its easy to run into conflicts with it. The effort to move them out into various subsystems is part of this. Signed-off-by: Yan Zhu <zhuyan34@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/bpf/20220407070759.29506-1-zhuyan34@huawei.com
2022-04-13cpu/hotplug: Initialise all cpuhp_cpu_state structs earlierSteven Price1-9/+13
Rather than waiting until a CPU is first brought online, do the initialisation of the cpuhp_cpu_state structure for each CPU during the __init phase. This saves a (small) amount of non-__init memory and avoids potential confusion about when the cpuhp_cpu_state struct is valid. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220411152233.474129-3-steven.price@arm.com
2022-04-13Merge branch 'smp/urgent' into smp/coreThomas Gleixner2-19/+19
Pick up the hotplug fix for dependent changes.