summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2020-09-17rcu-tasks: Prevent complaints of unused show_rcu_tasks_classic_gp_kthread()Paul E. McKenney1-1/+1
Commit 8344496e8b49 ("rcu-tasks: Conditionally compile show_rcu_tasks_gp_kthreads()") introduced conditional compilation of several functions, but forgot one occurrence of show_rcu_tasks_classic_gp_kthread() that causes the compiler to warn of an unused static function. This commit uses "static inline" to avoid these complaints and possibly also to avoid emitting an actual definition of this function. Fixes: 8344496e8b49 ("rcu-tasks: Conditionally compile show_rcu_tasks_gp_kthreads()") Cc: <stable@vger.kernel.org> # 5.8.x Reported-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-09-17rcu-tasks: Use more aggressive polling for RCU Tasks TracePaul E. McKenney1-2/+14
The RCU Tasks Trace grace periods are too slow, as in 40x slower than those of RCU Tasks. This is due to my having assumed a one-second grace period was OK, and thus not having optimized any further. This commit provides the first step in this optimization process, namely by allowing the task_list scan backoff interval to be specified on a per-flavor basis, and then speeding up the scans for RCU Tasks Trace. However, kernels built with CONFIG_TASKS_TRACE_RCU_READ_MB=y continue to use the old slower backoff, consistent with that Kconfig option's goal of reducing IPIs. Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/ Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: <bpf@vger.kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-09-17rcu-tasks: Mark variables staticPaul E. McKenney1-3/+3
The n_heavy_reader_attempts, n_heavy_reader_updates, and n_heavy_reader_ofl_updates variables are not used outside of their translation unit, so this commit marks them static. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-09-16irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()Thomas Gleixner1-22/+48
To support MSI irq domains which do not fit at all into the regular MSI irqdomain scheme, like the XEN MSI interrupt management for PV/HVM/DOM0, it's necessary to allow to override the alloc/free implementation. This is a preperatory step to switch X86 away from arch_*_msi_irqs() and store the irq domain pointer right in struct device. No functional change for existing MSI irq domain users. Aside of the evil XEN wrapper this is also useful for special MSI domains which need to do extra alloc/free work before/after calling the generic core function. Work like allocating/freeing MSI descriptors, MSI storage space etc. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200826112333.526797548@linutronix.de
2020-09-16irqdomain/msi: Provide DOMAIN_BUS_VMD_MSIThomas Gleixner1-1/+6
PCI devices behind a VMD bus are not subject to interrupt remapping, but the irq domain for VMD MSI cannot be distinguished from a regular PCI/MSI irq domain. Add a new domain bus token and allow it in the bus token check in msi_check_reservation_mode() to keep the functionality the same once VMD uses this token. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Jon Derrick <jonathan.derrick@intel.com> Link: https://lore.kernel.org/r/20200826112332.954409970@linutronix.de
2020-09-16x86/msi: Use generic MSI domain opsThomas Gleixner1-6/+0
pci_msi_get_hwirq() and pci_msi_set_desc are not longer special. Enable the generic MSI domain ops in the core and PCI MSI code unconditionally and get rid of the x86 specific implementations in the X86 MSI code and in the hyperv PCI driver. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200826112332.564274859@linutronix.de
2020-09-16genirq/chip: Use the first chip in irq_chip_compose_msi_msg()Thomas Gleixner2-5/+13
The documentation of irq_chip_compose_msi_msg() claims that with hierarchical irq domains the first chip in the hierarchy which has an irq_compose_msi_msg() callback is chosen. But the code just keeps iterating after it finds a chip with a compose callback. The x86 HPET MSI implementation relies on that behaviour, but that does not make it more correct. The message should always be composed at the domain which manages the underlying resource (e.g. APIC or remap table) because that domain knows about the required layout of the message. On X86 the following hierarchies exist: 1) vector -------- PCI/MSI 2) vector -- IR -- PCI/MSI The vector domain has a different message format than the IR (remapping) domain. So obviously the PCI/MSI domain can't compose the message without having knowledge about the parent domain, which is exactly the opposite of what hierarchical domains want to achieve. X86 actually has two different PCI/MSI chips where #1 has a compose callback and #2 does not. #2 delegates the composition to the remap domain where it belongs, but #1 does it at the PCI/MSI level. For the upcoming device MSI support it's necessary to change this and just let the first domain which can compose the message take care of it. That way the top level chip does not have to worry about it and the device MSI code does not need special knowledge about topologies. It just sets the compose callback to NULL and lets the hierarchy pick the first chip which has one. Due to that the attempt to move the compose callback from the direct delivery PCI/MSI domain to the vector domain made the system fail to boot with interrupt remapping enabled because in the remapping case irq_chip_compose_msi_msg() keeps iterating and choses the compose callback of the vector domain which obviously creates the wrong format for the remap table. Break out of the loop when the first irq chip with a compose callback is found and fixup the HPET code temporarily. That workaround will be removed once the direct delivery compose callback is moved to the place where it belongs in the vector domain. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200826112331.047917603@linutronix.de
2020-09-16locking/percpu-rwsem: Use this_cpu_{inc,dec}() for read_countHou Tao1-2/+2
The __this_cpu*() accessors are (in general) IRQ-unsafe which, given that percpu-rwsem is a blocking primitive, should be just fine. However, file_end_write() is used from IRQ context and will cause load-store issues on architectures where the per-cpu accessors are not natively irq-safe. Fix it by using the IRQ-safe this_cpu_*() for operations on read_count. This will generate more expensive code on a number of platforms, which might cause a performance regression for some of the other percpu-rwsem users. If any such is reported, we can consider alternative solutions. Fixes: 70fe2f48152e ("aio: fix freeze protection of aio writes") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20200915140750.137881-1-houtao1@huawei.com
2020-09-16softirq: Add debug check to __raise_softirq_irqoff()Jiafei Pan1-0/+1
__raise_softirq_irqoff() must be called with interrupts disabled to protect the per CPU softirq pending state update against an interrupt and soft interrupt handling on return from interrupt. Add a lockdep assertion to validate the calling convention. [ tglx: Massaged changelog ] Signed-off-by: Jiafei Pan <Jiafei.Pan@nxp.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200814045522.45719-1-Jiafei.Pan@nxp.com
2020-09-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2-12/+7
Alexei Starovoitov says: ==================== pull-request: bpf 2020-09-15 The following pull-request contains BPF updates for your *net* tree. We've added 12 non-merge commits during the last 19 day(s) which contain a total of 10 files changed, 47 insertions(+), 38 deletions(-). The main changes are: 1) docs/bpf fixes, from Andrii. 2) ld_abs fix, from Daniel. 3) socket casting helpers fix, from Martin. 4) hash iterator fixes, from Yonghong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16bpf: Add BPF_PROG_BIND_MAP syscallYiFei Zhu1-0/+63
This syscall binds a map to a program. Returns success if the map is already bound to the program. Signed-off-by: YiFei Zhu <zhuyifei@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Cc: YiFei Zhu <zhuyifei1999@gmail.com> Link: https://lore.kernel.org/bpf/20200915234543.3220146-3-sdf@google.com
2020-09-16bpf: Mutex protect used_maps array and countYiFei Zhu2-8/+23
To support modifying the used_maps array, we use a mutex to protect the use of the counter and the array. The mutex is initialized right after the prog aux is allocated, and destroyed right before prog aux is freed. This way we guarantee it's initialized for both cBPF and eBPF. Signed-off-by: YiFei Zhu <zhuyifei@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Cc: YiFei Zhu <zhuyifei1999@gmail.com> Link: https://lore.kernel.org/bpf/20200915234543.3220146-2-sdf@google.com
2020-09-16bpf: Fix a rcu warning for bpffs map pretty-printYonghong Song1-1/+3
Running selftest ./btf_btf -p the kernel had the following warning: [ 51.528185] WARNING: CPU: 3 PID: 1756 at kernel/bpf/hashtab.c:717 htab_map_get_next_key+0x2eb/0x300 [ 51.529217] Modules linked in: [ 51.529583] CPU: 3 PID: 1756 Comm: test_btf Not tainted 5.9.0-rc1+ #878 [ 51.530346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014 [ 51.531410] RIP: 0010:htab_map_get_next_key+0x2eb/0x300 ... [ 51.542826] Call Trace: [ 51.543119] map_seq_next+0x53/0x80 [ 51.543528] seq_read+0x263/0x400 [ 51.543932] vfs_read+0xad/0x1c0 [ 51.544311] ksys_read+0x5f/0xe0 [ 51.544689] do_syscall_64+0x33/0x40 [ 51.545116] entry_SYSCALL_64_after_hwframe+0x44/0xa9 The related source code in kernel/bpf/hashtab.c: 709 static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key) 710 { 711 struct bpf_htab *htab = container_of(map, struct bpf_htab, map); 712 struct hlist_nulls_head *head; 713 struct htab_elem *l, *next_l; 714 u32 hash, key_size; 715 int i = 0; 716 717 WARN_ON_ONCE(!rcu_read_lock_held()); In kernel/bpf/inode.c, bpffs map pretty print calls map->ops->map_get_next_key() without holding a rcu_read_lock(), hence causing the above warning. To fix the issue, just surrounding map->ops->map_get_next_key() with rcu read lock. Fixes: a26ca7c982cb ("bpf: btf: Add pretty print support to the basic arraymap") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Cc: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200916004401.146277-1-yhs@fb.com
2020-09-15printk: reimplement log_cont using record extensionJohn Ogness1-78/+20
Use the record extending feature of the ringbuffer to implement continuous messages. This preserves the existing continuous message behavior. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200914123354.832-7-john.ogness@linutronix.de
2020-09-15printk: ringbuffer: add finalization/extension supportJohn Ogness2-55/+476
Add support for extending the newest data block. For this, introduce a new finalization state (desc_finalized) denoting a committed descriptor that cannot be extended. Until a record is finalized, a writer can reopen that record to append new data. Reopening a record means transitioning from the desc_committed state back to the desc_reserved state. A writer can explicitly finalize a record if there is no intention of extending it. Also, records are automatically finalized when a new record is reserved. This relieves writers of needing to explicitly finalize while also making such records available to readers sooner. (Readers can only traverse finalized records.) Four new memory barrier pairs are introduced. Two of them are insignificant additions (data_realloc:A/desc_read:D and data_realloc:A/data_push_tail:B) because they are alternate path memory barriers that exactly match the purpose, pairing, and context of the two existing memory barrier pairs they provide an alternate path for. The other two new memory barrier pairs are significant additions: desc_reopen_last:A / _prb_commit:B - When reopening a descriptor, ensure the state transitions back to desc_reserved before fully trusting the descriptor data. _prb_commit:B / desc_reserve:D - When committing a descriptor, ensure the state transitions to desc_committed before checking the head ID to see if the descriptor needs to be finalized. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200914123354.832-6-john.ogness@linutronix.de
2020-09-15printk: ringbuffer: change representation of statesJohn Ogness2-32/+27
Rather than deriving the state by evaluating bits within the flags area of the state variable, assign the states explicit values and set those values in the flags area. Introduce macros to make it simple to read and write state values for the state variable. Although the functionality is preserved, the binary representation for the states is changed. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200914123354.832-5-john.ogness@linutronix.de
2020-09-15printk: ringbuffer: clear initial reserved fieldsJohn Ogness2-16/+26
prb_reserve() will set some meta data values and leave others uninitialized (or rather, containing the values of the previous wrap). Simplify the API by always clearing out all the fields. Only the sequence number is filled in. The caller is now responsible for filling in the rest of the meta data fields. In particular, for correctly filling in text and dict lengths. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200914123354.832-4-john.ogness@linutronix.de
2020-09-15printk: ringbuffer: add BLK_DATALESS() macroJohn Ogness1-2/+4
Rather than continually needing to explicitly check @begin and @next to identify a dataless block, introduce and use a BLK_DATALESS() macro. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200914123354.832-3-john.ogness@linutronix.de
2020-09-15printk: ringbuffer: relocate get_data()John Ogness1-58/+58
Move the internal get_data() function as-is above prb_reserve() so that a later change can make use of the static function. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200914123354.832-2-john.ogness@linutronix.de
2020-09-15printk: ringbuffer: avoid memcpy() on state_varJohn Ogness1-2/+7
@state_var is copied as part of the descriptor copying via memcpy(). This is not allowed because @state_var is an atomic type, which in some implementations may contain a spinlock. Avoid using memcpy() with @state_var by explicitly copying the other fields of the descriptor. @state_var is set using atomic set operator before returning. Fixes: b6cf8b3f3312 ("printk: add lockless ringbuffer") Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200914094803.27365-2-john.ogness@linutronix.de
2020-09-15printk: ringbuffer: fix setting state in desc_read()John Ogness1-7/+19
It is expected that desc_read() will always set at least the @state_var field. However, if the descriptor is in an inconsistent state, no fields are set. Also, the second load of @state_var is not stored in @desc_out and so might not match the state value that is returned. Always set the last loaded @state_var into @desc_out, regardless of the descriptor consistency. Fixes: b6cf8b3f3312 ("printk: add lockless ringbuffer") Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200914094803.27365-1-john.ogness@linutronix.de
2020-09-14core/entry: Report syscall correctly for trace and auditKees Cook1-2/+4
On v5.8 when doing seccomp syscall rewrites (e.g. getpid into getppid as seen in the seccomp selftests), trace (and audit) correctly see the rewritten syscall on entry and exit: seccomp_bpf-1307 [000] .... 22974.874393: sys_enter: NR 110 (... seccomp_bpf-1307 [000] .N.. 22974.874401: sys_exit: NR 110 = 1304 With mainline we see a mismatched enter and exit (the original syscall is incorrectly visible on entry): seccomp_bpf-1030 [000] .... 21.806766: sys_enter: NR 39 (... seccomp_bpf-1030 [000] .... 21.806767: sys_exit: NR 110 = 1027 When ptrace or seccomp change the syscall, this needs to be visible to trace and audit at that time as well. Update the syscall earlier so they see the correct value. Fixes: d88d59b64ca3 ("core/entry: Respect syscall number rewrites") Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200912005826.586171-1-keescook@chromium.org
2020-09-14Merge v5.9-rc5 into drm-nextDaniel Vetter4-10/+24
Paul needs 1a21e5b930e8 ("drm/ingenic: Fix leak of device_node pointer") and 3b5b005ef7d9 ("drm/ingenic: Fix driver not probing when IPU port is missing") from -fixes to be able to merge further ingenic patches into -next. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2020-09-14kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()Masami Hiramatsu1-2/+3
Commit: 0cb2f1372baa ("kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler") fixed one bug but the underlying bugs are not completely fixed yet. If we run a kprobe_module.tc of ftracetest, a warning triggers: # ./ftracetest test.d/kprobe/kprobe_module.tc === Ftrace unit tests === [1] Kprobe dynamic event - probing module ... ------------[ cut here ]------------ Failed to disarm kprobe-ftrace at trace_printk_irq_work+0x0/0x7e [trace_printk] (-2) WARNING: CPU: 7 PID: 200 at kernel/kprobes.c:1091 __disarm_kprobe_ftrace.isra.0+0x7e/0xa0 This is because the kill_kprobe() calls disarm_kprobe_ftrace() even if the given probe is not enabled. In that case, ftrace_set_filter_ip() fails because the given probe point is not registered to ftrace. Fix to check the given (going) probe is enabled before invoking disarm_kprobe_ftrace(). Fixes: 0cb2f1372baa ("kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/159888672694.1411785.5987998076694782591.stgit@devnote2
2020-09-14lockdep: fix order in trace_hardirqs_off_caller()Sven Schnelle1-2/+2
Switch order so that locking state is consistent even if the IRQ tracer calls into lockdep again. Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-13genirq: Allow interrupts to be excluded from /proc/interruptsMarc Zyngier3-1/+9
A number of architectures implement IPI statistics directly, duplicating the core kstat_irqs accounting. As we move IPIs to being actual IRQs, we would end-up with a confusing display in /proc/interrupts (where the IPIs would appear twice). In order to solve this, allow interrupts to be flagged as "hidden", which excludes them from /proc/interrupts. Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-09-13genirq: Add fasteoi IPI flowMarc Zyngier1-0/+27
For irqchips using the fasteoi flow, IPIs are a bit special. They need to be EOI'd early (before calling the handler), as funny things may happen in the handler (they do not necessarily behave like a normal interrupt). Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-09-12Merge tag 'seccomp-v5.9-rc5' of ↵Linus Torvalds1-6/+18
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp fixes from Kees Cook: "This fixes a rare race condition in seccomp when using TSYNC and USER_NOTIF together where a memory allocation would not get freed (found by syzkaller, fixed by Tycho). Additionally updates Tycho's MAINTAINERS and .mailmap entries for his new address" * tag 'seccomp-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: seccomp: don't leave dangling ->notif if file allocation fails mailmap, MAINTAINERS: move to tycho.pizza seccomp: don't leak memory when filter install races
2020-09-11gcov: add support for GCC 10.1Peter Oberparleiter2-2/+3
Using gcov to collect coverage data for kernels compiled with GCC 10.1 causes random malfunctions and kernel crashes. This is the result of a changed GCOV_COUNTERS value in GCC 10.1 that causes a mismatch between the layout of the gcov_info structure created by GCC profiling code and the related structure used by the kernel. Fix this by updating the in-kernel GCOV_COUNTERS value. Also re-enable config GCOV_KERNEL for use with GCC 10. Reported-by: Colin Ian King <colin.king@canonical.com> Reported-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Tested-and-Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-11kernel/debug: Fix spelling mistake in debug_core.cYouling Tang1-2/+2
Fix typo: "notifiter" --> "notifier" "overriden" --> "overridden" Signed-off-by: Youling Tang <tangyouling@loongson.cn> Link: https://lore.kernel.org/r/1596793480-22559-1-git-send-email-tangyouling@loongson.cn Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-09-11dma-mapping: move the dma_declare_coherent_memory documentationChristoph Hellwig1-0/+17
dma_declare_coherent_memory should not be in a DMA API guide aimed at driver writers (that is consumers of the API). Move it to a comment near the function instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2020-09-11dma-mapping: move dma_common_{mmap,get_sgtable} out of mapping.cChristoph Hellwig3-46/+53
Add a new file that contains helpers for misc DMA ops, which is only built when CONFIG_DMA_OPS is set. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2020-09-11dma-direct: rename and cleanup __phys_to_dmaChristoph Hellwig2-6/+6
The __phys_to_dma vs phys_to_dma distinction isn't exactly obvious. Try to improve the situation by renaming __phys_to_dma to phys_to_dma_unencryped, and not forcing architectures that want to override phys_to_dma to actually provide __phys_to_dma. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2020-09-11dma-direct: remove __dma_to_physChristoph Hellwig1-5/+1
There is no harm in just always clearing the SME encryption bit, while significantly simplifying the interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2020-09-11dma-direct: use phys_to_dma_direct in dma_direct_allocChristoph Hellwig1-4/+1
Replace the currently open code copy. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2020-09-11dma-direct: lift gfp_t manipulation out of__dma_direct_alloc_pagesChristoph Hellwig1-7/+5
Move the detailed gfp_t setup from __dma_direct_alloc_pages into the caller to clean things up a little. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2020-09-11dma-direct: remove dma_direct_{alloc,free}_pagesChristoph Hellwig2-25/+16
Just merge these helpers into the main dma_direct_{alloc,free} routines, as the additional checks are always false for the two callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2020-09-11dma-mapping: add (back) arch_dma_mark_clean for ia64Christoph Hellwig2-0/+9
Add back a hook to optimize dcache flushing after reading executable code using DMA. This gets ia64 out of the business of pretending to be dma incoherent just for this optimization. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-11dma-mapping: fix DMA_OPS dependenciesChristoph Hellwig1-0/+1
Driver that select DMA_OPS need to depend on HAS_DMA support to work. The vop driver was missing that dependency, so add it, and also add a another depends in DMA_OPS itself. That won't fix the issue due to how the Kconfig dependencies work, but at least produce a warning about unmet dependencies. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2020-09-11dma-debug: remove most exportsChristoph Hellwig1-10/+0
Now that the main dma mapping entry points are out of line most of the symbols in dma-debug.c can only be called from built-in code. Remove the unused exports. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2020-09-11dma-mapping: remove the dma_dummy_ops exportChristoph Hellwig1-1/+0
dma_dummy_ops is only used by the ACPI code, which can't be modular. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2020-09-10objtool: Rename frame.h -> objtool.hJulien Thierry2-2/+2
Header frame.h is getting more code annotations to help objtool analyze object files. Rename the file to objtool.h. [ jpoimboe: add objtool.h to MAINTAINERS ] Signed-off-by: Julien Thierry <jthierry@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
2020-09-10swiotlb: Mark max_segment with static keywordAndy Shevchenko1-1/+1
Sparse is not happy about max_segment declaration: CHECK kernel/dma/swiotlb.c kernel/dma/swiotlb.c:96:14: warning: symbol 'max_segment' was not declared. Should it be static? Mark it static as suggested. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2020-09-10swiotlb: Use %pa to print phys_addr_t variablesAndy Shevchenko1-3/+1
There is an extension to a %p to print phys_addr_t type of variables. Use it here. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2020-09-10perf/core: Pull pmu::sched_task() into perf_event_context_sched_out()Kan Liang1-30/+17
The pmu::sched_task() is a context switch callback. It passes the cpuctx->task_ctx as a parameter to the lower code. To find the cpuctx->task_ctx, the current code iterates a cpuctx list. The same context will iterated in perf_event_context_sched_out() soon. Share the cpuctx->task_ctx can avoid the unnecessary iteration of the cpuctx list. The pmu::sched_task() is also required for the optimization case for equivalent contexts. The task_ctx_sched_out() will eventually disable and reenable the PMU when schedule out events. Add perf_pmu_disable() and perf_pmu_enable() around task_ctx_sched_out() don't break anything. Drop the cpuctx->ctx.lock for the pmu::sched_task(). The lock is for per-CPU context, which is not necessary for the per-task context schedule. No one uses sched_cb_entry, perf_sched_cb_usages, sched_cb_list, and perf_pmu_sched_task() any more. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200821195754.20159-2-kan.liang@linux.intel.com
2020-09-10perf/core: Pull pmu::sched_task() into perf_event_context_sched_in()Kan Liang1-20/+31
The pmu::sched_task() is a context switch callback. It passes the cpuctx->task_ctx as a parameter to the lower code. To find the cpuctx->task_ctx, the current code iterates a cpuctx list. The same context was just iterated in perf_event_context_sched_in(), which is invoked right before the pmu::sched_task(). Reuse the cpuctx->task_ctx from perf_event_context_sched_in() can avoid the unnecessary iteration of the cpuctx list. Both pmu::sched_task and perf_event_context_sched_in() have to disable PMU. Pull the pmu::sched_task into perf_event_context_sched_in() can also save the overhead from the PMU disable and reenable. The new and old tasks may have equivalent contexts. The current code optimize this case by swapping the context, which avoids the scheduling. For this case, pmu::sched_task() is still required, e.g., restore the LBR content. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200821195754.20159-1-kan.liang@linux.intel.com
2020-09-10timekeeping: Use seqcount_latch_tAhmed S. Darwish1-5/+5
Latch sequence counters are a multiversion concurrency control mechanism where the seqcount_t counter even/odd value is used to switch between two data storage copies. This allows the seqcount_t read path to safely interrupt its write side critical section (e.g. from NMIs). Initially, latch sequence counters were implemented as a single write function, raw_write_seqcount_latch(), above plain seqcount_t. The read path was expected to use plain seqcount_t raw_read_seqcount(). A specialized read function was later added, raw_read_seqcount_latch(), and became the standardized way for latch read paths. Having unique read and write APIs meant that latch sequence counters are basically a data type of their own -- just inappropriately overloading plain seqcount_t. The seqcount_latch_t data type was thus introduced at seqlock.h. Use that new data type instead of seqcount_raw_spinlock_t. This ensures that only latch-safe APIs are to be used with the sequence counter. Note that the use of seqcount_raw_spinlock_t was not very useful in the first place. Only the "raw_" subset of seqcount_t APIs were used at timekeeping.c. This subset was created for contexts where lockdep cannot be used. seqcount_LOCKTYPE_t's raison d'être -- verifying that the seqcount_t writer serialization lock is held -- cannot thus be done. References: 0c3351d451ae ("seqlock: Use raw_ prefix instead of _no_lockdep") References: 55f3560df975 ("seqlock: Extend seqcount API with associated locks") Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200827114044.11173-6-a.darwish@linutronix.de
2020-09-10time/sched_clock: Use seqcount_latch_tAhmed S. Darwish1-2/+2
Latch sequence counters have unique read and write APIs, and thus seqcount_latch_t was recently introduced at seqlock.h. Use that new data type instead of plain seqcount_t. This adds the necessary type-safety and ensures only latching-safe seqcount APIs are to be used. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200827114044.11173-5-a.darwish@linutronix.de
2020-09-10time/sched_clock: Use raw_read_seqcount_latch() during suspendAhmed S. Darwish1-1/+1
sched_clock uses seqcount_t latching to switch between two storage places protected by the sequence counter. This allows it to have interruptible, NMI-safe, seqcount_t write side critical sections. Since 7fc26327b756 ("seqlock: Introduce raw_read_seqcount_latch()"), raw_read_seqcount_latch() became the standardized way for seqcount_t latch read paths. Due to the dependent load, it has one read memory barrier less than the currently used raw_read_seqcount() API. Use raw_read_seqcount_latch() for the suspend path. Commit aadd6e5caaac ("time/sched_clock: Use raw_read_seqcount_latch()") missed changing that instance of raw_read_seqcount(). References: 1809bfa44e10 ("timers, sched/clock: Avoid deadlock during read from NMI") Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200715092345.GA231464@debian-buster-darwi.lab.linutronix.de
2020-09-10Merge branch 'linus' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "This fixes a regression in padata" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: padata: fix possible padata_works_lock deadlock