summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2018-08-14Merge tag 'pm-4.19-rc1' of ↵Linus Torvalds8-23/+33
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add a new framework for CPU idle time injection, to be used by all of the idle injection code in the kernel in the future, fix some issues and add a number of relatively small extensions in multiple places. Specifics: - Add a new framework for CPU idle time injection (Daniel Lezcano). - Add AVS support to the armada-37xx cpufreq driver (Gregory CLEMENT). - Add support for current CPU frequency reporting to the ACPI CPPC cpufreq driver (George Cherian). - Rework the cooling device registration in the imx6q/thermal driver (Bastian Stender). - Make the pcc-cpufreq driver refuse to work with dynamic scaling governors on systems with many CPUs to avoid scalability issues with it (Rafael Wysocki). - Fix the intel_pstate driver to report different maximum CPU frequencies on systems where they really are different and to ignore the turbo active ratio if hardware-managend P-states (HWP) are in use; make it use the match_string() helper (Xie Yisheng, Srinivas Pandruvada). - Fix a minor deferred probe issue in the qcom-kryo cpufreq driver (Niklas Cassel). - Add a tracepoint for the tracking of frequency limits changes (from Andriod) to the cpufreq core (Ruchi Kandoi). - Fix a circular lock dependency between CPU hotplug and sysfs locking in the cpufreq core reported by lockdep (Waiman Long). - Avoid excessive error reports on driver registration failures in the ARM cpuidle driver (Sudeep Holla). - Add a new device links flag to the driver core to make links go away automatically on supplier driver removal (Vivek Gautam). - Eliminate potential race condition between system-wide power management transitions and system shutdown (Pingfan Liu). - Add a quirk to save NVS memory on system suspend for the ASUS 1025C laptop (Willy Tarreau). - Make more systems use suspend-to-idle (instead of ACPI S3) by default (Tristian Celestin). - Get rid of stack VLA usage in the low-level hibernation code on 64-bit x86 (Kees Cook). - Fix error handling in the hibernation core and mark an expected fall-through switch in it (Chengguang Xu, Gustavo Silva). - Extend the generic power domains (genpd) framework to support attaching a device to a power domain by name (Ulf Hansson). - Fix device reference counting and user limits initialization in the devfreq core (Arvind Yadav, Matthias Kaehlcke). - Fix a few issues in the rk3399_dmc devfreq driver and improve its documentation (Enric Balletbo i Serra, Lin Huang, Nick Milner). - Drop a redundant error message from the exynos-ppmu devfreq driver (Markus Elfring)" * tag 'pm-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits) PM / reboot: Eliminate race between reboot and suspend PM / hibernate: Mark expected switch fall-through cpufreq: intel_pstate: Ignore turbo active ratio in HWP cpufreq: Fix a circular lock dependency problem cpu/hotplug: Add a cpus_read_trylock() function x86/power/hibernate_64: Remove VLA usage cpufreq: trace frequency limits change cpufreq: intel_pstate: Show different max frequency with turbo 3 and HWP cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems cpufreq: qcom-kryo: Silently error out on EPROBE_DEFER cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC cpufreq: armada-37xx: Add AVS support dt-bindings: marvell: Add documentation for the Armada 3700 AVS binding PM / devfreq: rk3399_dmc: Fix duplicated opp table on reload. PM / devfreq: Init user limits from OPP limits, not viceversa PM / devfreq: rk3399_dmc: fix spelling mistakes. PM / devfreq: rk3399_dmc: do not print error when get supply and clk defer. dt-bindings: devfreq: rk3399_dmc: move interrupts to be optional. PM / devfreq: rk3399_dmc: remove wait for dcf irq event. dt-bindings: clock: add rk3399 DDR3 standard speed bins. ...
2018-08-14Merge tag 'dma-mapping-4.19' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds2-13/+11
Pull dma-mapping updates from Christoph Hellwig: - a series from Robin to fix bus imposed dma limits by adding a separate mask for them to struct device instead of trying to squeeze a second meaning out of the existing dma mask as we did before. This has ACKs from the various other subsystems touched - a small swiotlb cleanup from Kees (acked by Konrad) - conversion of nios2 and sh to the new generic dma-noncoherent code. Various other architecture conversions will come through the architectures maintainers trees. * tag 'dma-mapping-4.19' of git://git.infradead.org/users/hch/dma-mapping: sh: use generic dma_noncoherent_ops sh: split arch/sh/mm/consistent.c sh: use dma_direct_ops for the CONFIG_DMA_COHERENT case sh: introduce a sh_cacheop_vaddr helper sh: simplify get_arch_dma_ops OF: Don't set default coherent DMA mask ACPI/IORT: Don't set default coherent DMA mask iommu/dma: Respect bus DMA limit for IOVAs of/device: Set bus DMA mask as appropriate ACPI/IORT: Set bus DMA mask as appropriate dma-mapping: Generalise dma_32bit_limit flag ACPI/IORT: Support address size limit for root complexes of/platform: Initialise default DMA masks nios2: use generic dma_noncoherent_ops swiotlb: clean up reporting dma-mapping: relax warning for per-device areas
2018-08-14Merge tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-blockLinus Torvalds2-3/+8
Pull block updates from Jens Axboe: "First pull request for this merge window, there will also be a followup request with some stragglers. This pull request contains: - Fix for a thundering heard issue in the wbt block code (Anchal Agarwal) - A few NVMe pull requests: * Improved tracepoints (Keith) * Larger inline data support for RDMA (Steve Wise) * RDMA setup/teardown fixes (Sagi) * Effects log suppor for NVMe target (Chaitanya Kulkarni) * Buffered IO suppor for NVMe target (Chaitanya Kulkarni) * TP4004 (ANA) support (Christoph) * Various NVMe fixes - Block io-latency controller support. Much needed support for properly containing block devices. (Josef) - Series improving how we handle sense information on the stack (Kees) - Lightnvm fixes and updates/improvements (Mathias/Javier et al) - Zoned device support for null_blk (Matias) - AIX partition fixes (Mauricio Faria de Oliveira) - DIF checksum code made generic (Max Gurtovoy) - Add support for discard in iostats (Michael Callahan / Tejun) - Set of updates for BFQ (Paolo) - Removal of async write support for bsg (Christoph) - Bio page dirtying and clone fixups (Christoph) - Set of bcache fix/changes (via Coly) - Series improving blk-mq queue setup/teardown speed (Ming) - Series improving merging performance on blk-mq (Ming) - Lots of other fixes and cleanups from a slew of folks" * tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits) blkcg: Make blkg_root_lookup() work for queues in bypass mode bcache: fix error setting writeback_rate through sysfs interface null_blk: add lock drop/acquire annotation Blk-throttle: reduce tail io latency when iops limit is enforced block: paride: pd: mark expected switch fall-throughs block: Ensure that a request queue is dissociated from the cgroup controller block: Introduce blk_exit_queue() blkcg: Introduce blkg_root_lookup() block: Remove two superfluous #include directives blk-mq: count the hctx as active before allocating tag block: bvec_nr_vecs() returns value for wrong slab bcache: trivial - remove tailing backslash in macro BTREE_FLAG bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section bcache: set max writeback rate when I/O request is idle bcache: add code comments for bset.c bcache: fix mistaken comments in request.c bcache: fix mistaken code comments in bcache.h bcache: add a comment in super.c bcache: avoid unncessary cache prefetch bch_btree_node_get() bcache: display rate debug parameters to 0 when writeback is not running ...
2018-08-14Merge branch 'l1tf-final' of ↵Linus Torvalds4-30/+283
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Merge L1 Terminal Fault fixes from Thomas Gleixner: "L1TF, aka L1 Terminal Fault, is yet another speculative hardware engineering trainwreck. It's a hardware vulnerability which allows unprivileged speculative access to data which is available in the Level 1 Data Cache when the page table entry controlling the virtual address, which is used for the access, has the Present bit cleared or other reserved bits set. If an instruction accesses a virtual address for which the relevant page table entry (PTE) has the Present bit cleared or other reserved bits set, then speculative execution ignores the invalid PTE and loads the referenced data if it is present in the Level 1 Data Cache, as if the page referenced by the address bits in the PTE was still present and accessible. While this is a purely speculative mechanism and the instruction will raise a page fault when it is retired eventually, the pure act of loading the data and making it available to other speculative instructions opens up the opportunity for side channel attacks to unprivileged malicious code, similar to the Meltdown attack. While Meltdown breaks the user space to kernel space protection, L1TF allows to attack any physical memory address in the system and the attack works across all protection domains. It allows an attack of SGX and also works from inside virtual machines because the speculation bypasses the extended page table (EPT) protection mechanism. The assoicated CVEs are: CVE-2018-3615, CVE-2018-3620, CVE-2018-3646 The mitigations provided by this pull request include: - Host side protection by inverting the upper address bits of a non present page table entry so the entry points to uncacheable memory. - Hypervisor protection by flushing L1 Data Cache on VMENTER. - SMT (HyperThreading) control knobs, which allow to 'turn off' SMT by offlining the sibling CPU threads. The knobs are available on the kernel command line and at runtime via sysfs - Control knobs for the hypervisor mitigation, related to L1D flush and SMT control. The knobs are available on the kernel command line and at runtime via sysfs - Extensive documentation about L1TF including various degrees of mitigations. Thanks to all people who have contributed to this in various ways - patches, review, testing, backporting - and the fruitful, sometimes heated, but at the end constructive discussions. There is work in progress to provide other forms of mitigations, which might be less horrible performance wise for a particular kind of workloads, but this is not yet ready for consumption due to their complexity and limitations" * 'l1tf-final' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits) x86/microcode: Allow late microcode loading with SMT disabled tools headers: Synchronise x86 cpufeatures.h for L1TF additions x86/mm/kmmio: Make the tracer robust against L1TF x86/mm/pat: Make set_memory_np() L1TF safe x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert x86/speculation/l1tf: Invert all not present mappings cpu/hotplug: Fix SMT supported evaluation KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry x86/speculation: Simplify sysfs report of VMX L1TF vulnerability Documentation/l1tf: Remove Yonah processors from not vulnerable list x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr() x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d x86: Don't include linux/irq.h from asm/hardirq.h x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d x86/irq: Demote irq_cpustat_t::__softirq_pending to u16 x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush() x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond' x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush() cpu/hotplug: detect SMT disabled by BIOS ...
2018-08-14Merge branch 'pm-cpufreq'Rafael J. Wysocki1-0/+6
Merge cpufreq changes for 4.19. These are driver extensions, some driver and core fixes and a new tracepoint for the tracking of frequency limits changes (coming from Android). * pm-cpufreq: cpufreq: intel_pstate: Ignore turbo active ratio in HWP cpufreq: Fix a circular lock dependency problem cpu/hotplug: Add a cpus_read_trylock() function cpufreq: trace frequency limits change cpufreq: intel_pstate: Show different max frequency with turbo 3 and HWP cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems cpufreq: qcom-kryo: Silently error out on EPROBE_DEFER cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC cpufreq: armada-37xx: Add AVS support dt-bindings: marvell: Add documentation for the Armada 3700 AVS binding cpufreq: imx6q/thermal: imx: register cooling device depending on OF cpufreq: intel_pstate: use match_string() helper
2018-08-14Merge branches 'pm-core', 'pm-domains', 'pm-sleep', 'acpi-pm' and 'pm-cpuidle'Rafael J. Wysocki7-23/+27
Merge changes in the PM core, system-wide PM infrastructure, generic power domains (genpd) framework, ACPI PM infrastructure and cpuidle for 4.19. * pm-core: driver core: Add flag to autoremove device link on supplier unbind driver core: Rename flag AUTOREMOVE to AUTOREMOVE_CONSUMER * pm-domains: PM / Domains: Introduce dev_pm_domain_attach_by_name() PM / Domains: Introduce option to attach a device by name to genpd PM / Domains: dt: Add a power-domain-names property * pm-sleep: PM / reboot: Eliminate race between reboot and suspend PM / hibernate: Mark expected switch fall-through x86/power/hibernate_64: Remove VLA usage PM / hibernate: cast PAGE_SIZE to int when comparing with error code * acpi-pm: ACPI / PM: save NVS memory for ASUS 1025C laptop ACPI / PM: Default to s2idle in all machines supporting LP S0 * pm-cpuidle: ARM: cpuidle: silence error on driver registration failure
2018-08-14Merge tag 'mips_4.19' of ↵Linus Torvalds1-2/+6
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS updates from Paul Burton: "Here are the main MIPS changes for 4.19. An overview of the general architecture changes: - Massive DMA ops refactoring from Christoph Hellwig (huzzah for deleting crufty code!). - We introduce NT_MIPS_DSP & NT_MIPS_FP_MODE ELF notes & corresponding regsets to expose DSP ASE & floating point mode state respectively, both for live debugging & core dumps. - We better optimize our code by hard-coding cpu_has_* macros at compile time where their values are known due to the ISA revision that the kernel build is targeting. - The EJTAG exception handler now better handles SMP systems, where it was previously possible for CPUs to clobber a register value saved by another CPU. - Our implementation of memset() gained a couple of fixes for MIPSr6 systems to return correct values in some cases where stores fault. - We now implement ioremap_wc() using the uncached-accelerated cache coherency attribute where supported, which is detected during boot, and fall back to plain uncached access where necessary. The MIPS-specific (and unused in tree) ioremap_uncached_accelerated() & ioremap_cacheable_cow() are removed. - The prctl(PR_SET_FP_MODE, ...) syscall is better supported for SMP systems by reworking the way we ensure remote CPUs that may be running threads within the affected process switch mode. - Systems using the MIPS Coherence Manager will now set the MIPS_IC_SNOOPS_REMOTE flag to avoid some unnecessary cache maintenance overhead when flushing the icache. - A few fixes were made for building with clang/LLVM, which now sucessfully builds kernels for many of our platforms. - Miscellaneous cleanups all over. And some platform-specific changes: - ar7 gained stubs for a few clock API functions to fix build failures for some drivers. - ath79 gained support for a few new SoCs, a few fixes & better gpio-keys support. - Ci20 now exposes its SPI bus using the spi-gpio driver. - The generic platform can now auto-detect a suitable value for PHYS_OFFSET based upon the memory map described by the device tree, allowing us to avoid wasting memory on page book-keeping for systems where RAM starts at a non-zero physical address. - Ingenic systems using the jz4740 platform code now link their vmlinuz higher to allow for kernels of a realistic size. - Loongson32 now builds the kernel targeting MIPSr1 rather than MIPSr2 to avoid CPU errata. - Loongson64 gains a couple of fixes, a workaround for a write buffering issue & support for the Loongson 3A R3.1 CPU. - Malta now uses the piix4-poweroff driver to handle powering down. - Microsemi Ocelot gained support for its SPI bus & NOR flash, its second MDIO bus and can now be supported by a FIT/.itb image. - Octeon saw a bunch of header cleanups which remove a lot of duplicate or unused code" * tag 'mips_4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (123 commits) MIPS: Remove remnants of UASM_ISA MIPS: netlogic: xlr: Remove erroneous check in nlm_fmn_send() MIPS: VDSO: Force link endianness MIPS: Always specify -EB or -EL when using clang MIPS: Use dins to simplify __write_64bit_c0_split() MIPS: Use read-write output operand in __write_64bit_c0_split() MIPS: Avoid using array as parameter to write_c0_kpgd() MIPS: vdso: Allow clang's --target flag in VDSO cflags MIPS: genvdso: Remove GOT checks MIPS: Remove obsolete MIPS checks for DST node "chosen@0" MIPS: generic: Remove input symbols from defconfig MIPS: Delete unused code in linux32.c MIPS: Remove unused sys_32_mmap2 MIPS: Remove nabi_no_regargs mips: dts: mscc: enable spi and NOR flash support on ocelot PCB123 mips: dts: mscc: Add spi on Ocelot MIPS: Loongson: Merge load addresses MIPS: Loongson: Set Loongson32 to MIPS32R1 MIPS: mscc: ocelot: add interrupt controller properties to GPIO controller MIPS: generic: Select MIPS_AUTO_PFN_OFFSET ...
2018-08-14Merge branch 'parisc-4.19-1' of ↵Linus Torvalds1-11/+2
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: - parisc now uses the generic dma_noncoherent_ops implementation (Christoph Hellwig) - further memory barrier and spinlock improvements (John David Anglin) - prepare removal of current_text_addr() functions (Nick Desaulniers) - improve kernel stack unwinding on parisc (me) - drop ENOTSUP which was defined on parisc only (me) * 'parisc-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Fix and improve kernel stack unwinding parisc: Remove unnecessary barriers from spinlock.h parisc: Remove ordered stores from syscall.S parisc: prefer _THIS_IP_ and _RET_IP_ statement expressions parisc: Add HAVE_REGS_AND_STACK_ACCESS_API feature parisc: Drop architecture-specific ENOTSUP define parisc: use generic dma_noncoherent_ops parisc: always use flush_kernel_dcache_range for DMA cache maintainance parisc: merge pcx_dma_ops and pcxl_dma_ops
2018-08-14Merge branch 'x86-timers-for-linus' of ↵Linus Torvalds5-50/+74
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 timer updates from Thomas Gleixner: "Early TSC based time stamping to allow better boot time analysis. This comes with a general cleanup of the TSC calibration code which grew warts and duct taping over the years and removes 250 lines of code. Initiated and mostly implemented by Pavel with help from various folks" * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) x86/kvmclock: Mark kvm_get_preset_lpj() as __init x86/tsc: Consolidate init code sched/clock: Disable interrupts when calling generic_sched_clock_init() timekeeping: Prevent false warning when persistent clock is not available sched/clock: Close a hole in sched_clock_init() x86/tsc: Make use of tsc_calibrate_cpu_early() x86/tsc: Split native_calibrate_cpu() into early and late parts sched/clock: Use static key for sched_clock_running sched/clock: Enable sched clock early sched/clock: Move sched clock initialization and merge with generic clock x86/tsc: Use TSC as sched clock early x86/tsc: Initialize cyc2ns when tsc frequency is determined x86/tsc: Calibrate tsc only once ARM/time: Remove read_boot_clock64() s390/time: Remove read_boot_clock64() timekeeping: Default boot time offset to local_clock() timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset() s390/time: Add read_persistent_wall_and_boot_offset() x86/xen/time: Output xen sched_clock time from 0 x86/xen/time: Initialize pv xen time in init_hypervisor_platform() ...
2018-08-14Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds1-6/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Thomas Gleixner: - Make lazy TLB mode even lazier to avoid pointless switch_mm() operations, which reduces CPU load by 1-2% for memcache workloads - Small cleanups and improvements all over the place * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Remove redundant check for kmem_cache_create() arm/asm/tlb.h: Fix build error implicit func declaration x86/mm/tlb: Make clear_asid_other() static x86/mm/tlb: Skip atomic operations for 'init_mm' in switch_mm_irqs_off() x86/mm/tlb: Always use lazy TLB mode x86/mm/tlb: Only send page table free TLB flush to lazy TLB CPUs x86/mm/tlb: Make lazy TLB mode lazier x86/mm/tlb: Restructure switch_mm_irqs_off() x86/mm/tlb: Leave lazy TLB mode at page table free time mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids x86/mm: Add TLB purge to free pmd/pte page interfaces ioremap: Update pgtable free interfaces with addr x86/mm: Disable ioremap free page handling on x86-PAE
2018-08-13Merge branch 'timers-core-for-linus' of ↵Linus Torvalds18-137/+342
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The timers departement more or less proudly presents: - More Y2038 timekeeping work mostly in the core code. The work is slowly, but steadily targeting the actuall syscalls. - Enhanced timekeeping suspend/resume support by utilizing clocksources which do not stop during suspend, but are otherwise not the main timekeeping clocksources. - Make NTP adjustmets more accurate and immediate when the frequency is set directly and not incrementally. - Sanitize the overrung handing of posix timers - A new timer driver for Mediatek SoCs - The usual pile of fixes and updates all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) clockevents: Warn if cpu_all_mask is used as cpumask tick/broadcast-hrtimer: Use cpu_possible_mask for ce_broadcast_hrtimer clocksource/drivers/arm_arch_timer: Fix bogus cpu_all_mask usage clocksource: ti-32k: Remove CLOCK_SOURCE_SUSPEND_NONSTOP flag timers: Clear timer_base::must_forward_clk with timer_base::lock held clocksource/drivers/sprd: Register one always-on timer to compensate suspend time clocksource/drivers/timer-mediatek: Add support for system timer clocksource/drivers/timer-mediatek: Convert the driver to timer-of clocksource/drivers/timer-mediatek: Use specific prefix for GPT clocksource/drivers/timer-mediatek: Rename mtk_timer to timer-mediatek clocksource/drivers/timer-mediatek: Add system timer bindings clocksource/drivers: Set clockevent device cpumask to cpu_possible_mask time: Introduce one suspend clocksource to compensate the suspend time time: Fix extra sleeptime injection when suspend fails timekeeping/ntp: Constify some function arguments ntp: Use kstrtos64 for s64 variable ntp: Remove redundant arguments timer: Fix coding style ktime: Provide typesafe ktime_to_ns() hrtimer: Improve kernel message printing ...
2018-08-13Merge branch 'perf-core-for-linus' of ↵Linus Torvalds7-288/+105
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf update from Thomas Gleixner: "The perf crowd presents: Kernel updates: - Removal of jprobes - Cleanup and consolidatation the handling of kprobes - Cleanup and consolidation of hardware breakpoints - The usual pile of fixes and updates to PMUs and event descriptors Tooling updates: - Updates and improvements all over the place. Nothing outstanding, just the (good) boring incremental grump work" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits) perf trace: Do not require --no-syscalls to suppress strace like output perf bpf: Include uapi/linux/bpf.h from the 'perf trace' script's bpf.h perf tools: Allow overriding MAX_NR_CPUS at compile time perf bpf: Show better message when failing to load an object perf list: Unify metric group description format with PMU event description perf vendor events arm64: Update ThunderX2 implementation defined pmu core events perf cs-etm: Generate branch sample for CS_ETM_TRACE_ON packet perf cs-etm: Generate branch sample when receiving a CS_ETM_TRACE_ON packet perf cs-etm: Support dummy address value for CS_ETM_TRACE_ON packet perf cs-etm: Fix start tracing packet handling perf build: Fix installation directory for eBPF perf c2c report: Fix crash for empty browser perf tests: Fix indexing when invoking subtests perf trace: Beautify the AF_INET & AF_INET6 'socket' syscall 'protocol' args perf trace beauty: Add beautifiers for 'socket''s 'protocol' arg perf trace beauty: Do not print NULL strarray entries perf beauty: Add a generator for IPPROTO_ socket's protocol constants tools include uapi: Grab a copy of linux/in.h perf tests: Fix complex event name parsing perf evlist: Fix error out while applying initial delay and LBR ...
2018-08-13Merge branch 'locking-core-for-linus' of ↵Linus Torvalds4-74/+64
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking/atomics update from Thomas Gleixner: "The locking, atomics and memory model brains delivered: - A larger update to the atomics code which reworks the ordering barriers, consolidates the atomic primitives, provides the new atomic64_fetch_add_unless() primitive and cleans up the include hell. - Simplify cmpxchg() instrumentation and add instrumentation for xchg() and cmpxchg_double(). - Updates to the memory model and documentation" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits) locking/atomics: Rework ordering barriers locking/atomics: Instrument cmpxchg_double*() locking/atomics: Instrument xchg() locking/atomics: Simplify cmpxchg() instrumentation locking/atomics/x86: Reduce arch_cmpxchg64*() instrumentation tools/memory-model: Rename litmus tests to comply to norm7 tools/memory-model/Documentation: Fix typo, smb->smp sched/Documentation: Update wake_up() & co. memory-barrier guarantees locking/spinlock, sched/core: Clarify requirements for smp_mb__after_spinlock() sched/core: Use smp_mb() in wake_woken_function() tools/memory-model: Add informal LKMM documentation to MAINTAINERS locking/atomics/Documentation: Describe atomic_set() as a write operation tools/memory-model: Make scripts executable tools/memory-model: Remove ACCESS_ONCE() from model tools/memory-model: Remove ACCESS_ONCE() from recipes locking/memory-barriers.txt/kokr: Update Korean translation to fix broken DMA vs. MMIO ordering example MAINTAINERS: Add Daniel Lustig as an LKMM reviewer tools/memory-model: Fix ISA2+pooncelock+pooncelock+pombonce name tools/memory-model: Add litmus test for full multicopy atomicity locking/refcount: Always allow checked forms ...
2018-08-13Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug update from Thomas Gleixner: "A trivial name fix for the hotplug state machine" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Clarify CPU hotplug step name for timers
2018-08-13Merge branch 'sched-core-for-linus' of ↵Linus Torvalds23-819/+966
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Thomas Gleixner: - Cleanup and improvement of NUMA balancing - Refactoring and improvements to the PELT (Per Entity Load Tracking) code - Watchdog simplification and related cleanups - The usual pile of small incremental fixes and improvements * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) watchdog: Reduce message verbosity stop_machine: Reflow cpu_stop_queue_two_works() sched/numa: Move task_numa_placement() closer to numa_migrate_preferred() sched/numa: Use group_weights to identify if migration degrades locality sched/numa: Update the scan period without holding the numa_group lock sched/numa: Remove numa_has_capacity() sched/numa: Modify migrate_swap() to accept additional parameters sched/numa: Remove unused task_capacity from 'struct numa_stats' sched/numa: Skip nodes that are at 'hoplimit' sched/debug: Reverse the order of printing faults sched/numa: Use task faults only if numa_group is not yet set up sched/numa: Set preferred_node based on best_cpu sched/numa: Simplify load_too_imbalanced() sched/numa: Evaluate move once per node sched/numa: Remove redundant field sched/debug: Show the sum wait time of a task group sched/fair: Remove #ifdefs from scale_rt_capacity() sched/core: Remove get_cpu() from sched_fork() sched/cpufreq: Clarify sugov_get_util() sched/sysctl: Remove unused sched_time_avg_ms sysctl ...
2018-08-13Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Thomas Gleixner: "A single bugfix to prevent a pinned thread which queues stomp machine work to be preempted by the stopper thread on its CPU which causes a live lock as it is unable to wake the second CPUs stopper thread" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: stop_machine: Atomically queue and wake stopper threads
2018-08-13Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds12-810/+1201
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Thomas Gleixner: "A large update to RCU: Preparatory work for consolidating the RCU flavors: - Introduce grace-period sequence numbers to the RCU-bh, RCU-preempt, and RCU-sched flavors, replacing the old ->gpnum and ->completed pair of fields. This change allows lockless code to obtain the complete grace-period state with a single READ_ONCE(), which is needed to maintain tolerable lock contention during the upcoming consolidation of the three RCU flavors. Note that grace-period sequence numbers are already used by rcu_barrier(), expedited RCU grace periods, and SRCU, and are thus already heavily used and well-tested. Joel Fernandes contributed a number of excellent fixes and improvements. - Clean up some grace-period-reporting loose ends, including improving the handling of quiescent states from offline CPUs and fixing some false-positive WARN_ON_ONCE() invocations. (Strictly speaking, the WARN_ON_ONCE() invocations were quite correct, but their invariants were (harmlessly) violated by the earlier sloppy handling of quiescent states from offline CPUs.) In addition, improve grace-period forward-progress guarantees so as to allow removal of fail-safe checks that required otherwise needless lock acquisitions. Finally, add more diagnostics to help debug the upcoming consolidation of the RCU-bh, RCU-preempt, and RCU-sched flavors. The rest: - SRCU updates - Updates to rcutorture and associated scripting. - The usual pile of miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (118 commits) rcutorture: Fix rcu_barrier successes counter rcutorture: Add support to detect if boost kthread prio is too low rcutorture: Use monotonic timestamp for stall detection rcutorture: Make boost test more robust rcutorture: Disable RT throttling for boost tests rcutorture: Emphasize testing of single reader protection type rcutorture: Handle extended read-side critical sections rcutorture: Make rcu_torture_timer() use rcu_torture_one_read() rcutorture: Use per-CPU random state for rcu_torture_timer() rcutorture: Use atomic increment for n_rcu_torture_timers rcutorture: Extract common code from rcu_torture_reader() rcuperf: Remove unused torturing_tasks() function rcu: Remove rcutorture test version and sequence number rcutorture: Change units of onoff_interval to jiffies rcu: Assign higher prio to RCU threads if rcutorture is built-in rculist: Improve documentation for list_for_each_entry_from_rcu() srcu: Add grace-period number to rcutorture statistics printout rcu: Print stall-warning NMI dyntick state in hexadecimal MAINTAINERS: Update RCU, SRCU, and TORTURE-TEST entries rcu: Make rcu_seq_diff() more exact ...
2018-08-13Merge branch 'irq-core-for-linus' of ↵Linus Torvalds4-36/+47
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull genirq updates from Thomas Gleixner: "The irq departement provides: - A synchronization fix for free_irq() to synchronize just the removed interrupt thread on shared interrupt lines. - Consolidate the multi low level interrupt entry handling and mvoe it to the generic code instead of adding yet another copy for RISC-V - Refactoring of the ARM LPI allocator and LPI exposure to the hypervisor - Yet another interrupt chip driver for the JZ4725B SoC - Speed up for /proc/interrupts as people seem to love reading this file with high frequency - Miscellaneous fixes and updates" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) irqchip/gic-v3-its: Make its_lock a raw_spin_lock_t genirq/irqchip: Remove MULTI_IRQ_HANDLER as it's now obselete openrisc: Use the new GENERIC_IRQ_MULTI_HANDLER arm64: Use the new GENERIC_IRQ_MULTI_HANDLER ARM: Convert to GENERIC_IRQ_MULTI_HANDLER irqchip: Port the ARM IRQ drivers to GENERIC_IRQ_MULTI_HANDLER irqchip/gic-v3-its: Reduce minimum LPI allocation to 1 for PCI devices dt-bindings: irqchip: renesas-irqc: Document r8a77980 support dt-bindings: irqchip: renesas-irqc: Document r8a77470 support irqchip/ingenic: Add support for the JZ4725B SoC irqchip/stm32: Add exti0 translation for stm32mp1 genirq: Remove redundant NULL pointer check in __free_irq() irqchip/gic-v3-its: Honor hypervisor enforced LPI range irqchip/gic-v3: Expose GICD_TYPER in the rdist structure irqchip/gic-v3-its: Drop chunk allocation compatibility irqchip/gic-v3-its: Move minimum LPI requirements to individual busses irqchip/gic-v3-its: Use full range of LPIs irqchip/gic-v3-its: Refactor LPI allocator genirq: Synchronize only with single thread on free_irq() genirq: Update code comments wrt recycled thread_mask ...
2018-08-13parisc: Drop architecture-specific ENOTSUP defineHelge Deller1-11/+2
parisc is the only Linux architecture which has defined a value for ENOTSUP. All other architectures #define ENOTSUP as EOPNOTSUPP in their libc headers. Having an own value for ENOTSUP which is different than EOPNOTSUPP often gives problems with userspace programs which expect both to be the same. One such example is a build error in the libuv package, as can be seen in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900237. Since we dropped HP-UX support, there is no real benefit in keeping an own value for ENOTSUP. This patch drops the parisc value for ENOTSUP from the kernel sources. glibc needs no patch, it reuses the exported headers. Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-12init: rename and re-order boot_cpu_state_init()Linus Torvalds1-1/+1
This is purely a preparatory patch for upcoming changes during the 4.19 merge window. We have a function called "boot_cpu_state_init()" that isn't really about the bootup cpu state: that is done much earlier by the similarly named "boot_cpu_init()" (note lack of "state" in name). This function initializes some hotplug CPU state, and needs to run after the percpu data has been properly initialized. It even has a comment to that effect. Except it _doesn't_ actually run after the percpu data has been properly initialized. On x86 it happens to do that, but on at least arm and arm64, the percpu base pointers are initialized by the arch-specific 'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init(). This had some unexpected results, and in particular we have a patch pending for the merge window that did the obvious cleanup of using 'this_cpu_write()' in the cpu hotplug init code: - per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE; + this_cpu_write(cpuhp_state.state, CPUHP_ONLINE); which is obviously the right thing to do. Except because of the ordering issue, it actually failed miserably and unexpectedly on arm64. So this just fixes the ordering, and changes the name of the function to be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu hotplug state, because the core CPU state was supposed to have already been done earlier. Marked for stable, since the (not yet merged) patch that will show this problem is marked for stable. Reported-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-09xdp: fix bug in devmap teardown code pathJesper Dangaard Brouer1-5/+9
Like cpumap teardown, the devmap teardown code also flush remaining xdp_frames, via bq_xmit_all() in case map entry is removed. The code can call xdp_return_frame_rx_napi, from the the wrong context, in-case ndo_xdp_xmit() fails. Fixes: 389ab7f01af9 ("xdp: introduce xdp_return_frame_rx_napi") Fixes: 735fc4054b3a ("xdp: change ndo_xdp_xmit API to support bulking") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-09xdp: fix bug in cpumap teardown code pathJesper Dangaard Brouer1-6/+9
When removing a cpumap entry, a number of syncronization steps happen. Eventually the teardown code __cpu_map_entry_free is invoked from/via call_rcu. The teardown code __cpu_map_entry_free() flushes remaining xdp_frames, by invoking bq_flush_to_queue, which calls xdp_return_frame_rx_napi(). The issues is that the teardown code is not running in the RX NAPI code path. Thus, it is not allowed to invoke the NAPI variant of xdp_return_frame. This bug was found and triggered by using the --stress-mode option to the samples/bpf program xdp_redirect_cpu. It is hard to trigger, because the ptr_ring have to be full and cpumap bulk queue max contains 8 packets, and a remote CPU is racing to empty the ptr_ring queue. Fixes: 389ab7f01af9 ("xdp: introduce xdp_return_frame_rx_napi") Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-08bpf, sockmap: fix leak in bpf_tcp_sendmsg wait for mem pathDaniel Borkmann1-2/+5
In bpf_tcp_sendmsg() the sk_alloc_sg() may fail. In the case of ENOMEM, it may also mean that we've partially filled the scatterlist entries with pages. Later jumping to sk_stream_wait_memory() we could further fail with an error for several reasons, however we miss to call free_start_sg() if the local sk_msg_buff was used. Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-08bpf, sockmap: fix bpf_tcp_sendmsg sock error handlingDaniel Borkmann1-1/+1
While working on bpf_tcp_sendmsg() code, I noticed that when a sk->sk_err is set we error out with err = sk->sk_err. However this is problematic since sk->sk_err is a positive error value and therefore we will neither go into sk_stream_error() nor will we report an error back to user space. I had this case with EPIPE and user space was thinking sendmsg() succeeded since EPIPE is a positive value, thinking we submitted 32 bytes. Fix it by negating the sk->sk_err value. Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-07cpu/hotplug: Fix SMT supported evaluationThomas Gleixner2-13/+30
Josh reported that the late SMT evaluation in cpu_smt_state_init() sets cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied on the kernel command line as it cannot differentiate between SMT disabled by BIOS and SMT soft disable via 'nosmt'. That wreckages the state and makes the sysfs interface unusable. Rework this so that during bringup of the non boot CPUs the availability of SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a 'primary' thread then set the local cpu_smt_available marker and evaluate this explicitely right after the initial SMP bringup has finished. SMT evaulation on x86 is a trainwreck as the firmware has all the information _before_ booting the kernel, but there is no interface to query it. Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS") Reported-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-06Merge tag 'irqchip-4.19' of ↵Thomas Gleixner41-303/+4843
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates from Marc Zyngier: - GICv3 ITS LPI allocation revamp - GICv3 support for hypervisor-enforced LPI range - GICv3 ITS conversion to raw spinlock
2018-08-06PM / reboot: Eliminate race between reboot and suspendPingfan Liu6-21/+24
At present, "systemctl suspend" and "shutdown" can run in parrallel. A system can suspend after devices_shutdown(), and resume. Then the shutdown task goes on to power off. This causes many devices are not really shut off. Hence replacing reboot_mutex with system_transition_mutex (renamed from pm_mutex) to achieve the exclusion. The renaming of pm_mutex as system_transition_mutex can be better to reflect the purpose of the mutex. Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-08-06PM / hibernate: Mark expected switch fall-throughGustavo A. R. Silva1-0/+1
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This addresses Coverity-ID: 114713 ("Missing break in switch"). Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-08-06Merge back cpufreq changes for 4.19.Rafael J. Wysocki1-0/+6
2018-08-06Merge tag 'v4.18-rc6' into for-4.19/block2Jens Axboe15-78/+160
Pull in 4.18-rc6 to get the NVMe core AEN change to avoid a merge conflict down the line. Signed-of-by: Jens Axboe <axboe@kernel.dk>
2018-08-05stop_machine: Atomically queue and wake stopper threadsPrasad Sodagudi1-0/+2
When cpu_stop_queue_work() releases the lock for the stopper thread that was queued into its wake queue, preemption is enabled, which leads to the following deadlock: CPU0 CPU1 sched_setaffinity(0, ...) __set_cpus_allowed_ptr() stop_one_cpu(0, ...) stop_two_cpus(0, 1, ...) cpu_stop_queue_work(0, ...) cpu_stop_queue_two_works(0, ..., 1, ...) -grabs lock for migration/0- -spins with preemption disabled, waiting for migration/0's lock to be released- -adds work items for migration/0 and queues migration/0 to its wake_q- -releases lock for migration/0 and preemption is enabled- -current thread is preempted, and __set_cpus_allowed_ptr has changed the thread's cpu allowed mask to CPU1 only- -acquires migration/0 and migration/1's locks- -adds work for migration/0 but does not add migration/0 to wake_q, since it is already in a wake_q- -adds work for migration/1 and adds migration/1 to its wake_q- -releases migration/0 and migration/1's locks, wakes migration/1, and enables preemption- -since migration/1 is requested to run, migration/1 begins to run and waits on migration/0, but migration/0 will never be able to run, since the thread that can wake it is affine to CPU1- Disable preemption in cpu_stop_queue_work() before queueing works for stopper threads, and queueing the stopper thread in the wake queue, to ensure that the operation of queueing the works and waking the stopper threads is atomic. Fixes: 0b26351b910f ("stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock") Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: peterz@infradead.org Cc: matt@codeblueprint.co.uk Cc: bigeasy@linutronix.de Cc: gregkh@linuxfoundation.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1533329766-4856-1-git-send-email-isaacm@codeaurora.org Co-Developed-by: Isaac J. Manjarres <isaacm@codeaurora.org>
2018-08-05Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Two oneliners addressing NOHZ failures: - Use a bitmask to check for the pending timer softirq and not the bit number. The existing code using the bit number checked for the wrong bit, which caused timers to either expire late or stop completely. - Make the nohz evaluation on interrupt exit more robust. The existing code did not re-arm the hardware when interrupting a running softirq in task context (ksoftirqd or tail of local_bh_enable()), which caused timers to either expire late or stop completely" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: nohz: Fix missing tick reprogram when interrupting an inline softirq nohz: Fix local_timer_softirq_pending()
2018-08-05Merge 4.18-rc7 into master to pick up the KVM dependcyThomas Gleixner50-372/+5046
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-03nohz: Fix missing tick reprogram when interrupting an inline softirqFrederic Weisbecker1-1/+1
The full nohz tick is reprogrammed in irq_exit() only if the exit is not in a nesting interrupt. This stands as an optimization: whether a hardirq or a softirq is interrupted, the tick is going to be reprogrammed when necessary at the end of the inner interrupt, with even potential new updates on the timer queue. When soft interrupts are interrupted, it's assumed that they are executing on the tail of an interrupt return. In that case tick_nohz_irq_exit() is called after softirq processing to take care of the tick reprogramming. But the assumption is wrong: softirqs can be processed inline as well, ie: outside of an interrupt, like in a call to local_bh_enable() or from ksoftirqd. Inline softirqs don't reprogram the tick once they are done, as opposed to interrupt tail softirq processing. So if a tick interrupts an inline softirq processing, the next timer will neither be reprogrammed from the interrupting tick's irq_exit() nor after the interrupted softirq processing. This situation may leave the tick unprogrammed while timers are armed. To fix this, simply keep reprogramming the tick even if a softirq has been interrupted. That can be optimized further, but for now correctness is more important. Note that new timers enqueued in nohz_full mode after a softirq gets interrupted will still be handled just fine through self-IPIs triggered by the timer code. Reported-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: stable@vger.kernel.org # 4.14+ Link: https://lkml.kernel.org/r/1533303094-15855-1-git-send-email-frederic@kernel.org
2018-08-03genirq: Make force irq threading setup more robustThomas Gleixner1-1/+8
The support of force threading interrupts which are set up with both a primary and a threaded handler wreckaged the setup of regular requested threaded interrupts (primary handler == NULL). The reason is that it does not check whether the primary handler is set to the default handler which wakes the handler thread. Instead it replaces the thread handler with the primary handler as it would do with force threaded interrupts which have been requested via request_irq(). So both the primary and the thread handler become the same which then triggers the warnon that the thread handler tries to wakeup a not configured secondary thread. Fortunately this only happens when the driver omits the IRQF_ONESHOT flag when requesting the threaded interrupt, which is normaly caught by the sanity checks when force irq threading is disabled. Fix it by skipping the force threading setup when a regular threaded interrupt is requested. As a consequence the interrupt request which lacks the IRQ_ONESHOT flag is rejected correctly instead of silently wreckaging it. Fixes: 2a1d3ab8986d ("genirq: Handle force threading of irqs with primary and thread handler") Reported-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de> Cc: stable@vger.kernel.org
2018-08-03watchdog: Reduce message verbositySinan Kaya1-2/+2
Code is emitting the following error message during boot on systems without PMU hardware support while probing NMI capability. NMI watchdog: Perf event create on CPU 0 failed with -2 This error is emitted as the perf subsystem returns -ENOENT due to lack of PMUs in the system. It is followed by the warning that NMI watchdog is disabled: NMI watchdog: Perf NMI watchdog permanently disabled While NMI disabled information is useful for ordinary users, seeing a PERF event create failed with error code -2 is not. Reduce the message severity to debug so that if debugging is still possible in case the error code returned by perf is required for analysis. Signed-off-by: Sinan Kaya <okaya@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Colin Ian King <colin.king@canonical.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=599368 Link: https://lkml.kernel.org/r/20180803060943.2643-1-okaya@kernel.org
2018-08-03genirq/irqchip: Remove MULTI_IRQ_HANDLER as it's now obseletePalmer Dabbelt1-1/+0
Now that every user of MULTI_IRQ_HANDLER has been convereted over to use GENERIC_IRQ_MULTI_HANDLER remove the references to MULTI_IRQ_HANDLER. Signed-off-by: Palmer Dabbelt <palmer@sifive.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux@armlinux.org.uk Cc: catalin.marinas@arm.com Cc: Will Deacon <will.deacon@arm.com> Cc: jonas@southpole.se Cc: stefan.kristiansson@saunalahti.fi Cc: shorne@gmail.com Cc: jason@lakedaemon.net Cc: marc.zyngier@arm.com Cc: Arnd Bergmann <arnd@arndb.de> Cc: nicolas.pitre@linaro.org Cc: vladimir.murzin@arm.com Cc: keescook@chromium.org Cc: jinb.park7@gmail.com Cc: yamada.masahiro@socionext.com Cc: alexandre.belloni@bootlin.com Cc: pombredanne@nexb.com Cc: Greg KH <gregkh@linuxfoundation.org> Cc: kstewart@linuxfoundation.org Cc: jhogan@kernel.org Cc: mark.rutland@arm.com Cc: ard.biesheuvel@linaro.org Cc: james.morse@arm.com Cc: linux-arm-kernel@lists.infradead.org Cc: openrisc@lists.librecores.org Link: https://lkml.kernel.org/r/20180622170126.6308-6-palmer@sifive.com
2018-08-02stop_machine: Reflow cpu_stop_queue_two_works()Peter Zijlstra1-18/+23
The code flow in cpu_stop_queue_two_works() is a little arcane; fix this by lifting the preempt_disable() to the top to create more natural nesting wrt the spinlocks and make the wake_up_q() and preempt_enable() unconditional at the end. Furthermore, enable preemption in the -EDEADLK case, such that we spin-wait with preemption enabled. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: isaacm@codeaurora.org Cc: matt@codeblueprint.co.uk Cc: psodagud@codeaurora.org Cc: gregkh@linuxfoundation.org Cc: pkondeti@codeaurora.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180730112140.GH2494@hirez.programming.kicks-ass.net
2018-08-02clockevents: Warn if cpu_all_mask is used as cpumaskSudeep Holla1-0/+6
Using cpu_all_mask in clockevents cpumask may result in issues while comparing multiple clockevent devices to choose the preferred one. On one of the platforms with 2 system (i.e. non per-CPU) timers with different ratings, having cpu_all_mask for one of the device resulted in a boot hang due to a endless loop in clockevents_notify_released() as both were clocksources were selected as preferred. In order to prevent such issues in the future, warn if any clockevent driver sets cpu_all_mask as it's cpumask and just override it to use cpu_possible_mask. All the existing occurrences of cpu_all_mask are already replaced with cpu_possible_mask. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Link: https://lkml.kernel.org/r/1531308264-24220-3-git-send-email-sudeep.holla@arm.com
2018-08-02tick/broadcast-hrtimer: Use cpu_possible_mask for ce_broadcast_hrtimerSudeep Holla1-1/+1
This is the last instance of cpu_all_mask usage in the core framework. Replace it with cpu_possible_mask like all other instances in the clockevent drivers. This makes it possible to add a warning in the core clockevents_register_device on usage of cpu_all_mask from any clockevent drivers in the future. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Link: https://lkml.kernel.org/r/1531308264-24220-2-git-send-email-sudeep.holla@arm.com
2018-08-02timers: Clear timer_base::must_forward_clk with timer_base::lock heldGaurav Kohli1-13/+16
timer_base::must_forward_clock is indicating that the base clock might be stale due to a long idle sleep. The forwarding of the base clock takes place in the timer softirq or when a timer is enqueued to a base which is idle. If the enqueue of timer to an idle base happens from a remote CPU, then the following race can happen: CPU0 CPU1 run_timer_softirq mod_timer base = lock_timer_base(timer); base->must_forward_clk = false if (base->must_forward_clk) forward(base); -> skipped enqueue_timer(base, timer, idx); -> idx is calculated high due to stale base unlock_timer_base(timer); base = lock_timer_base(timer); forward(base); The root cause is that timer_base::must_forward_clk is cleared outside the timer_base::lock held region, so the remote queuing CPU observes it as cleared, but the base clock is still stale. This can cause large granularity values for timers, i.e. the accuracy of the expiry time suffers. Prevent this by clearing the flag with timer_base::lock held, so that the forwarding takes place before the cleared flag is observable by a remote CPU. Signed-off-by: Gaurav Kohli <gkohli@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john.stultz@linaro.org Cc: sboyd@kernel.org Cc: linux-arm-msm@vger.kernel.org Link: https://lkml.kernel.org/r/1533199863-22748-1-git-send-email-gkohli@codeaurora.org
2018-08-02Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar13-28/+130
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-08-01blk-cgroup: clear the throttle queue on forkJosef Bacik1-0/+5
We were hitting a panic in production where we put too many times on the request queue. This is because we'd get the throttle_queue of the parent if we fork()'ed while we needed to be throttled, but we didn't have a reference on it. Instead just clear these flags on fork so the child doesn't pay for the sins of its father. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-31Merge tag 'audit-pr-20180731' of ↵Linus Torvalds1-4/+9
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit fix from Paul Moore: "A single small audit fix to guard against memory allocation failures when logging information about a kernel module load. It's small, easy to understand, and self-contained; while nothing is zero risk, this should be pretty low" * tag 'audit-pr-20180731' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: fix potential null dereference 'context->module.name'
2018-07-31nohz: Fix local_timer_softirq_pending()Anna-Maria Gleixner1-1/+1
local_timer_softirq_pending() checks whether the timer softirq is pending with: local_softirq_pending() & TIMER_SOFTIRQ. This is wrong because TIMER_SOFTIRQ is the softirq number and not a bitmask. So the test checks for the wrong bit. Use BIT(TIMER_SOFTIRQ) instead. Fixes: 5d62c183f9e9 ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()") Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Cc: bigeasy@linutronix.de Cc: peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180731161358.29472-1-anna-maria@linutronix.de
2018-07-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2-2/+14
Pull networking fixes from David Miller: "Several smallish fixes, I don't think any of this requires another -rc but I'll leave that up to you: 1) Don't leak uninitialzed bytes to userspace in xfrm_user, from Eric Dumazet. 2) Route leak in xfrm_lookup_route(), from Tommi Rantala. 3) Premature poll() returns in AF_XDP, from Björn Töpel. 4) devlink leak in netdevsim, from Jakub Kicinski. 5) Don't BUG_ON in fib_compute_spec_dst, the condition can legitimately happen. From Lorenzo Bianconi. 6) Fix some spectre v1 gadgets in generic socket code, from Jeremy Cline. 7) Don't allow user to bind to out of range multicast groups, from Dmitry Safonov with a follow-up by Dmitry Safonov. 8) Fix metrics leak in fib6_drop_pcpu_from(), from Sabrina Dubroca" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits) netlink: Don't shift with UB on nlk->ngroups net/ipv6: fix metrics leak xen-netfront: wait xenbus state change when load module manually can: ems_usb: Fix memory leak on ems_usb_disconnect() openvswitch: meter: Fix setting meter id for new entries netlink: Do not subscribe to non-existent groups NET: stmmac: align DMA stuff to largest cache line length tcp_bbr: fix bw probing to raise in-flight data for very small BDPs net: socket: Fix potential spectre v1 gadget in sock_is_registered net: socket: fix potential spectre v1 gadget in socketcall net: mdio-mux: bcm-iproc: fix wrong getter and setter pair ipv4: remove BUG_ON() from fib_compute_spec_dst enic: handle mtu change for vf properly net: lan78xx: fix rx handling before first packet is send nfp: flower: fix port metadata conversion bug bpf: use GFP_ATOMIC instead of GFP_KERNEL in bpf_parse_prog() bpf: fix bpf_skb_load_bytes_relative pkt length check perf build: Build error in libbpf missing initialization net: ena: Fix use of uninitialized DMA address bits field bpf: btf: Use exact btf value_size match in map_check_btf() ...
2018-07-31audit: fix potential null dereference 'context->module.name'Yi Wang1-4/+9
The variable 'context->module.name' may be null pointer when kmalloc return null, so it's better to check it before using to avoid null dereference. Another one more thing this patch does is using kstrdup instead of (kmalloc + strcpy), and signal a lost record via audit_log_lost. Cc: stable@vger.kernel.org # 4.11 Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-07-30cpu/hotplug: Clarify CPU hotplug step name for timersMukesh Ojha1-1/+1
After commit 249d4a9b3246 ("timers: Reinitialize per cpu bases on hotplug") i.e. the introduction of state CPUHP_TIMERS_PREPARE instead of CPUHP_TIMERS_DEAD the step name "timers:dead" is not longer accurate. Rename it to "timers:prepare". [ tglx: Massaged changelog ] Signed-off-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: gkohli@codeaurora.org Cc: neeraju@codeaurora.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Brendan Jackman <brendan.jackman@arm.com> Cc: Mathieu Malaterre <malat@debian.org> Link: https://lkml.kernel.org/r/1532443668-26810-1-git-send-email-mojha@codeaurora.org
2018-07-30Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds4-3/+19
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Misc fixes: - a deadline scheduler related bug fix which triggered a kernel warning - an RT_RUNTIME_SHARE fix - a stop_machine preemption fix - a potential NULL dereference fix in sched_domain_debug_one()" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/rt: Restore rt_runtime after disabling RT_RUNTIME_SHARE sched/deadline: Update rq_clock of later_rq when pushing a task stop_machine: Disable preemption after queueing stopper threads sched/topology: Check variable group before dereferencing it
2018-07-30Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-2/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes: - AMD IBS data corruptor fix (uncovered by UBSAN) - an Intel PEBS entry unwind error fix - a HW-tracing crash fix - a MAINTAINERS update" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix crash when using HW tracing kernel filters perf/x86/intel: Fix unwind errors from PEBS entries (mk-II) MAINTAINERS: Add Naveen N. Rao as kprobes co-maintainer perf/x86/amd/ibs: Don't access non-started event