summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2017-08-13Merge branch 'clockevents/4.13-fixes' of ↵Ingo Molnar1-2/+2
http://git.linaro.org/people/daniel.lezcano/linux into timers/urgent Pull clockevents fixes from Daniel Lezcano: " - Fix error check against IS_ERR() instead of NULL for the timer-of code (Dan Carpenter) - Fix infinite recusion with ftrace for the ARM architected timer (Ding Tianhong) - Fix the error code return in the em_sti's probe function (Gustavo A. R. Silva) - Fix Kconfig dependency for the pistachio driver (Matt Redfearn) - Fix mem frame loop initialization for the ARM architected timer (Matthias Kaehlcke)" Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-11Merge tag 'powerpc-4.13-6' of ↵Linus Torvalds8-68/+101
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "All fixes for code that went in this cycle. - a revert of an optimisation to the syscall exit path, which could lead to an oops on either older machines or machines with > 1TB of memory - disable some deep idle states if the firmware configuration for them fails - re-enable HARD/SOFT lockup detectors in defconfigs after a Kconfig change - six fairly small patches fixing bugs in our new watchdog code Thanks to: Gautham R Shenoy, Nicholas Piggin" * tag 'powerpc-4.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/watchdog: add locking around init/exit functions powerpc/watchdog: Fix marking of stuck CPUs powerpc/watchdog: Fix final-check recovered case powerpc/watchdog: Moderate touch_nmi_watchdog overhead powerpc/watchdog: Improve watchdog lock primitive powerpc: NMI IPI improve lock primitive powerpc/configs: Re-enable HARD/SOFT lockup detectors powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT states when stop-api fails Revert "powerpc/64: Avoid restore_math call if possible in syscall exit"
2017-08-11clocksource/drivers/arm_arch_timer: Avoid infinite recursion when ftrace is ↵Ding Tianhong1-2/+2
enabled On platforms with an arch timer erratum workaround, it's possible for arch_timer_reg_read_stable() to recurse into itself when certain tracing options are enabled, leading to stack overflows and related problems. For example, when PREEMPT_TRACER and FUNCTION_GRAPH_TRACER are selected, it's possible to trigger this with: $ mount -t debugfs nodev /sys/kernel/debug/ $ echo function_graph > /sys/kernel/debug/tracing/current_tracer The problem is that in such cases, preempt_disable() instrumentation attempts to acquire a timestamp via trace_clock(), resulting in a call back to arch_timer_reg_read_stable(), and hence recursion. This patch changes arch_timer_reg_read_stable() to use preempt_{disable,enable}_notrace(), which avoids this. This problem is similar to the fixed by upstream commit 96b3d28bf4 ("sched/clock: Prevent tracing recursion in sched_clock_cpu()"). Fixes: 6acc71ccac71 ("arm64: arch_timer: Allows a CPU-specific erratum to only affect a subset of CPUs") Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2017-08-11mm: fix MADV_[FREE|DONTNEED] TLB flush miss problemMinchan Kim5-6/+23
Nadav reported parallel MADV_DONTNEED on same range has a stale TLB problem and Mel fixed it[1] and found same problem on MADV_FREE[2]. Quote from Mel Gorman: "The race in question is CPU 0 running madv_free and updating some PTEs while CPU 1 is also running madv_free and looking at the same PTEs. CPU 1 may have writable TLB entries for a page but fail the pte_dirty check (because CPU 0 has updated it already) and potentially fail to flush. Hence, when madv_free on CPU 1 returns, there are still potentially writable TLB entries and the underlying PTE is still present so that a subsequent write does not necessarily propagate the dirty bit to the underlying PTE any more. Reclaim at some unknown time at the future may then see that the PTE is still clean and discard the page even though a write has happened in the meantime. I think this is possible but I could have missed some protection in madv_free that prevents it happening." This patch aims for solving both problems all at once and is ready for other problem with KSM, MADV_FREE and soft-dirty story[3]. TLB batch API(tlb_[gather|finish]_mmu] uses [inc|dec]_tlb_flush_pending and mmu_tlb_flush_pending so that when tlb_finish_mmu is called, we can catch there are parallel threads going on. In that case, forcefully, flush TLB to prevent for user to access memory via stale TLB entry although it fail to gather page table entry. I confirmed this patch works with [4] test program Nadav gave so this patch supersedes "mm: Always flush VMA ranges affected by zap_page_range v2" in current mmotm. NOTE: This patch modifies arch-specific TLB gathering interface(x86, ia64, s390, sh, um). It seems most of architecture are straightforward but s390 need to be careful because tlb_flush_mmu works only if mm->context.flush_mm is set to non-zero which happens only a pte entry really is cleared by ptep_get_and_clear and friends. However, this problem never changes the pte entries but need to flush to prevent memory access from stale tlb. [1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net [2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de [3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com [4] https://patchwork.kernel.org/patch/9861621/ [minchan@kernel.org: decrease tlb flush pending count in tlb_finish_mmu] Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox Link: http://lkml.kernel.org/r/20170802000818.4760-7-namit@vmware.com Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Nadav Amit <namit@vmware.com> Reported-by: Nadav Amit <namit@vmware.com> Reported-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Jeff Dike <jdike@addtoit.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-11mm: refactor TLB gathering APIMinchan Kim5-15/+23
This patch is a preparatory patch for solving race problems caused by TLB batch. For that, we will increase/decrease TLB flush pending count of mm_struct whenever tlb_[gather|finish]_mmu is called. Before making it simple, this patch separates architecture specific part and rename it to arch_tlb_[gather|finish]_mmu and generic part just calls it. It shouldn't change any behavior. Link: http://lkml.kernel.org/r/20170802000818.4760-5-namit@vmware.com Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Jeff Dike <jdike@addtoit.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds6-11/+63
Pull sparc updates from David Miller: 1) Recognize M8 cpus, just basic chip ID matching, from Allen Pais. 2) Prevent crashes when bringing up sunvdc virtual block devices in some environments. From Jim Quigley. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sunvdc: prevent sunvdc panic when mpgroup disk added to guest domain sparc64: Increase max_phys_bits to 51 and VA bits to 53 for M8. sparc64: recognize and support sparc M8 cpu type sparc64: properly name the cpu constants
2017-08-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2-1/+1952
Pull networking fixes from David Miller: "The pull requests are getting smaller, that's progress I suppose :-) 1) Fix infinite loop in CIPSO option parsing, from Yujuan Qi. 2) Fix remote checksum handling in VXLAN and GUE tunneling drivers, from Koichiro Den. 3) Missing u64_stats_init() calls in several drivers, from Florian Fainelli. 4) TCP can set the congestion window to an invalid ssthresh value after congestion window reductions, from Yuchung Cheng. 5) Fix BPF jit branch generation on s390, from Daniel Borkmann. 6) Correct MIPS ebpf JIT merge, from David Daney. 7) Correct byte order test in BPF test_verifier.c, from Daniel Borkmann. 8) Fix various crashes and leaks in ASIX driver, from Dean Jenkins. 9) Handle SCTP checksums properly in mlx4 driver, from Davide Caratti. 10) We can potentially enter tcp_connect() with a cached route already, due to fastopen, so we have to explicitly invalidate it. 11) skb_warn_bad_offload() can bark in legitimate situations, fix from Willem de Bruijn" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits) net: avoid skb_warn_bad_offload false positives on UFO qmi_wwan: fix NULL deref on disconnect ppp: fix xmit recursion detection on ppp channels rds: Reintroduce statistics counting tcp: fastopen: tcp_connect() must refresh the route net: sched: set xt_tgchk_param par.net properly in ipt_init_target net: dsa: mediatek: add adjust link support for user ports net/mlx4_en: don't set CHECKSUM_COMPLETE on SCTP packets qed: Fix a memory allocation failure test in 'qed_mcp_cmd_init()' hysdn: fix to a race condition in put_log_buffer s390/qeth: fix L3 next-hop in xmit qeth hdr asix: Fix small memory leak in ax88772_unbind() asix: Ensure asix_rx_fixup_info members are all reset asix: Add rx->ax_skb = NULL after usbnet_skb_return() bpf: fix selftest/bpf/test_pkt_md_access on s390x netvsc: fix race on sub channel creation bpf: fix byte order test in test_verifier xgene: Always get clk source, but ignore if it's missing for SGMII ports MIPS: Add missing file for eBPF JIT. bpf, s390: fix build for libbpf and selftest suite ...
2017-08-09powerpc/watchdog: add locking around init/exit functionsNicholas Piggin1-1/+10
When CPUs start and stop the watchdog, they manipulate shared data that is normally protected by the lock. Other CPUs can be running concurrently at this time, so it's a good idea to use locking here to be on the safe side. Remove the barrier which is undocumented and didn't do anything. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-09powerpc/watchdog: Fix marking of stuck CPUsNicholas Piggin1-6/+9
When the SMP detector finds other CPUs stuck, it iterates over them and marks them as stuck. This pulls them out of the pending mask and allows the detector to continue with remaining good CPUs (if nmi_watchdog=panic is not enabled). The code to dothat was buggy because when setting a CPU stuck, if the pending mask became empty, it resets it to keep the watchdog running. However the iterator will continue to run over the new pending mask and mark remaining good CPUs sas stuck. Fix this by doing it with cpumask bitwise operations. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-09powerpc/watchdog: Fix final-check recovered caseNicholas Piggin1-1/+5
When the watchdog decides to panic, it takes the lock and double checks everything (to avoid races with the CPU being unstuck or panic()ed by something else). The exit label was misplaced and would result in all-CPUs backtrace and watchdog panic even in the case that the condition was found to be resolved. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-09powerpc/watchdog: Moderate touch_nmi_watchdog overheadNicholas Piggin1-1/+3
Some code can go into a tight loop calling touch_nmi_watchdog (e.g., stop_machine CPU hotplug code). This can cause contention on watchdog locks particularly if all CPUs with watchdog enabled are spinning in the loops. Avoid this storm of activity by running the watchdog timer callback from this path if we have exceeded the timer period since it was last run. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-09powerpc/watchdog: Improve watchdog lock primitiveNicholas Piggin1-4/+9
- Hard-disable interrupts before taking the lock, which prevents soft-NMI re-entrancy and therefore can prevent deadlocks. - Use raw_ variants of local_irq_disable to avoid irq debugging. - When the lock is contended, spin at low SMT priority, using loads only, and with interrupts enabled (where possible). Some stalls have been noticed at high loads that go away with improved locking. There should not be so much locking contention in the first place (which is addressed in a subsequent patch), but locking should still be improved. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-09powerpc: NMI IPI improve lock primitiveNicholas Piggin1-3/+3
When the NMI IPI lock is contended, spin at low SMT priority, using loads only, and with interrupts enabled (where possible). This improves behaviour under high contention (e.g., a system crash when a number of CPUs are trying to enter the debugger). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-09powerpc/configs: Re-enable HARD/SOFT lockup detectorsMichael Ellerman3-3/+6
In commit 05a4a9527931 ("kernel/watchdog: split up config options"), CONFIG_LOCKUP_DETECTOR was split into two separate config options, CONFIG_HARDLOCKUP_DETECTOR and CONFIG_SOFTLOCKUP_DETECTOR. Our defconfigs still have CONFIG_LOCKUP_DETECTOR=y, but that is no longer user selectable, and we don't mention the new options, so we end up with none of them enabled. So update the defconfigs to turn on the new SOFT and HARD options, the end result being the same as what we had previously. Fixes: 05a4a9527931 ("kernel/watchdog: split up config options") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-08powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT states when stop-api failsGautham R. Shenoy1-3/+38
Currently, we use the opal call opal_slw_set_reg() to inform the Sleep-Winkle Engine (SLW) to restore the contents of some of the Hypervisor state on wakeup from deep idle states that lose full hypervisor context (characterized by the flag OPAL_PM_LOSE_FULL_CONTEXT). However, the current code has a bug in that if opal_slw_set_reg() fails, we don't disable the use of these deep states (winkle on POWER8, stop4 onwards on POWER9). This patch fixes this bug by ensuring that if programing the sleep-winkle engine to restore the hypervisor states in pnv_save_sprs_for_deep_states() fails, then we exclude such states by clearing the OPAL_PM_LOSE_FULL_CONTEXT flag from supported_cpuidle_states. As a result POWER8 will be prevented from using winkle for CPU-Hotplug, and POWER9 will put the offlined CPUs to the default stop state when available. Further, we ensure in the initialization of the cpuidle-powernv driver to only include those states whose flags are present in supported_cpuidle_states, thereby skipping OPAL_PM_LOSE_FULL_CONTEXT states when they have been disabled due to stop-api failure. Fixes: 1e1601b38e6 ("powerpc/powernv/idle: Restore SPRs for deep idle states via stop API.") Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-08Merge tag 'xtensa-20170807' of git://github.com/jcmvbkbc/linux-xtensaLinus Torvalds5-43/+10
Pull Xtensa fixes from Max Filippov: - use asm-generic instances of asm/param.h and asm/device.h instead of exact copies in arch/xtensa/include/asm; - fix build error for xtensa cores with aliasing WT cache: define cache flushing functions and copy_{to,from}_user_page; - add missing EXPORT_SYMBOLs for clear_user_highpage, copy_user_highpage, flush_dcache_page, local_flush_cache_range, local_flush_cache_page, csum_partial and csum_partial_copy_generic. * tag 'xtensa-20170807' of git://github.com/jcmvbkbc/linux-xtensa: xtensa: mm/cache: add missing EXPORT_SYMBOLs xtensa: don't limit csum_partial export by CONFIG_NET xtensa: fix cache aliasing handling code for WT cache xtensa: remove wrapper header for asm/param.h xtensa: remove wrapper header for asm/device.h
2017-08-07Revert "powerpc/64: Avoid restore_math call if possible in syscall exit"Michael Ellerman2-46/+18
This reverts commit bc4f65e4cf9d6cc43e0e9ba0b8648cf9201cd55f. As reported by Andreas, this commit is causing unrecoverable SLB misses in the system call exit path: Unrecoverable exception 4100 at c00000000000a1ec Oops: Unrecoverable exception, sig: 6 [#1] SMP NR_CPUS=2 PowerMac ... CPU: 0 PID: 18626 Comm: rm Not tainted 4.13.0-rc3 #1 task: c00000018335e080 task.stack: c000000139e50000 NIP: c00000000000a1ec LR: c00000000000a118 CTR: 0000000000000000 REGS: c000000139e53bb0 TRAP: 4100 Not tainted (4.13.0-rc3) MSR: 9000000000001030 <SF,HV,ME,IR,DR> CR: 24000044 XER: 20000000 SOFTE: 1 GPR00: 0000000000000000 c000000139e53e30 c000000000abb500 fffffffffffffffe GPR04: c0000001eb866298 0000000000000000 0000000000000000 c00000018335e080 GPR08: 900000000000d032 0000000000000000 0000000000000002 fffffffffffff001 GPR12: c000000139e50000 c00000000ffff000 00003fffa8c0dca0 00003fffa8c0dc88 GPR16: 0000000010000000 0000000000000001 00003fffa8c0eaa0 0000000000000000 GPR20: 00003fffa8c27528 00003fffa8c27b00 0000000000000000 0000000000000000 GPR24: 00003fffa8c0d918 00003ffff1b3efa0 00003fffa8c26d68 0000000000000000 GPR28: 00003fffa8c249e8 00003fffa8c263d0 00003fffa8c27550 00003ffff1b3ef10 NIP [c00000000000a1ec] system_call_exit+0xc0/0x21c LR [c00000000000a118] system_call+0x58/0x6c Call Trace: [c000000139e53e30] [c00000000000a118] system_call+0x58/0x6c (unreliable) Instruction dump: 64a51000 7c6300d0 f8a101a0 4bffff9c 3c000000 60000006 780007c6 64000000 60000000 7c004039 4082001c e8ed0170 <88070b78> 88c70b79 7c003214 2c200000 This is caused by us trying to load THREAD_LOAD_FP with MSR_RI=0, and taking an SLB miss on the thread struct. Reported-by: Andreas Schwab <schwab@linux-m68k.org> Diagnosed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-06Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds2-0/+3
Pull MIPS fixes from Ralf Baechle: "This fixes two build issues for ralink platforms, both due to missing #includes which used to be included indirectly via other headers" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: MIPS: ralink: mt7620: Add missing header MIPS: ralink: Fix build error due to missing header
2017-08-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds5-85/+190
Pull KVM fixes from Radim Krčmář: "ARM: - Yet another race with VM destruction plugged - A set of small vgic fixes x86: - Preserve pending INIT - RCU fixes in paravirtual async pf, VM teardown, and VMXOFF emulation - nVMX interrupt injection and dirty tracking fixes - initialize to make UBSAN happy" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: arm/arm64: vgic: Use READ_ONCE fo cmpxchg KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit" KVM: nVMX: mark vmcs12 pages dirty on L2 exit kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown KVM: nVMX: do not pin the VMCS12 KVM: avoid using rcu_dereference_protected KVM: X86: init irq->level in kvm_pv_kick_cpu_op KVM: X86: Fix loss of pending INIT due to race KVM: async_pf: make rcu irq exit if not triggered from idle task KVM: nVMX: fixes to nested virt interrupt injection KVM: nVMX: do not fill vm_exit_intr_error_code in prepare_vmcs12 KVM: arm/arm64: Handle hva aging while destroying the vm KVM: arm/arm64: PMU: Fix overflow interrupt injection KVM: arm/arm64: Fix bug in advertising KVM_CAP_MSI_DEVID capability
2017-08-05Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds1-16/+11
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Thomas Gleixner: "The recent irq core changes unearthed API abuse in the HPET code, which manifested itself in a suspend/resume regression. The fix replaces the cruft with the proper function calls and cures the regression" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/hpet: Cure interface abuse in the resume path
2017-08-05Merge tag 'armsoc-fixes' of ↵Linus Torvalds43-102/+356
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann: "This comes a bit later than I planned, and as a consequence is a larger than it should be. Most of the changes are devicetree fixes, across lots of platforms: Renesas, Samsung Exynos, Marvell EBU, TI OMAP, Rockchips, Amlogic Meson, Sigma Desings Tango, Allwinner SUNxi and TI Davinci. Also across many platforms, I applied an older series of simple randconfig build fixes. This includes making the CONFIG_MTD_XIP option compile again, which had been broken for many years and probably has not been missed, but it felt wrong to just remove it completely. The only other changes are: - We enable HWSPINLOCK in defconfig to get some Qualcomm boards to work out of the box. - A few regression fixes for Texas Instruments OMAP2+. - A boot regression fix for the Renesas regulator quirk. - A suspend/resume fix for Uniphier SoCs, fixing the resume of the system bus" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (43 commits) ARM: dts: tango4: Request RGMII RX and TX clock delays bus: uniphier-system-bus: set up registers when resuming ARM64: dts: marvell: armada-37xx: Fix the number of GPIO on south bridge ARM: shmobile: rcar-gen2: Fix deadlock in regulator quirk arm64: defconfig: enable missing HWSPINLOCK ARM: pxa: select both FB and FB_W100 for eseries ARM: ixp4xx: fix ioport_unmap definition ARM: ep93xx: use ARM_PATCH_PHYS_VIRT correctly ARM: mmp: mark usb_dma_mask as __maybe_unused ARM: omap2: mark unused functions as __maybe_unused ARM: omap1: avoid unused variable warning ARM: sirf: mark sirfsoc_init_late as __maybe_unused ARM: ixp4xx: use normal prototype for {read,write}s{b,w,l} ARM: omap1/ams-delta: warn about failed regulator enable ARM: rpc: rename RAM_SIZE macro ARM: w90x900: normalize clk API ARM: ep93xx: normalize clk API ARM: dts: sun8i: a83t: Switch to CCU device tree binding macros arm64: allwinner: sun50i-a64: Correct emac register size ARM: dts: sunxi: h3/h5: Correct emac register size ...
2017-08-04Merge tag 'arm64-fixes' of ↵Linus Torvalds3-10/+13
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "Here are some more arm64 fixes for 4.13. The main one is the PTE race with the hardware walker, but there are a couple of other things too. - Report correct timer frequency to userspace when trapping CNTFRQ_EL0 - Fix race with hardware page table updates when updating access flags - Silence clang overflow warning in VA_START and PAGE_OFFSET calculations" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: avoid overflow in VA_START and PAGE_OFFSET arm64: Fix potential race with hardware DBM in ptep_set_access_flags() arm64: Use arch_timer_get_rate when trapping CNTFRQ_EL0
2017-08-04MIPS: Add missing file for eBPF JIT.David Daney1-0/+1950
Inexplicably, commit f381bf6d82f0 ("MIPS: Add support for eBPF JIT.") lost a file somewhere on its path to Linus' tree. Add back the missing ebpf_jit.c so that we can build with CONFIG_BPF_JIT selected. This version of ebpf_jit.c is identical to the original except for two minor change need to resolve conflicts with changes merged from the BPF branch: A) Set prog->jited_len = image_size; B) Use BPF_TAIL_CALL instead of BPF_CALL | BPF_X Fixes: f381bf6d82f0 ("MIPS: Add support for eBPF JIT.") Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-04bpf, s390: fix jit branch offset related to ldimm64Daniel Borkmann1-1/+2
While testing some other work that required JIT modifications, I run into test_bpf causing a hang when JIT enabled on s390. The problematic test case was the one from ddc665a4bb4b (bpf, arm64: fix jit branch offset related to ldimm64), and turns out that we do have a similar issue on s390 as well. In bpf_jit_prog() we update next instruction address after returning from bpf_jit_insn() with an insn_count. bpf_jit_insn() returns either -1 in case of error (e.g. unsupported insn), 1 or 2. The latter is only the case for ldimm64 due to spanning 2 insns, however, next address is only set to i + 1 not taking actual insn_count into account, thus fix is to use insn_count instead of 1. bpf_jit_enable in mode 2 provides also disasm on s390: Before fix: 000003ff800349b6: a7f40003 brc 15,3ff800349bc ; target 000003ff800349ba: 0000 unknown 000003ff800349bc: e3b0f0700024 stg %r11,112(%r15) 000003ff800349c2: e3e0f0880024 stg %r14,136(%r15) 000003ff800349c8: 0db0 basr %r11,%r0 000003ff800349ca: c0ef00000000 llilf %r14,0 000003ff800349d0: e320b0360004 lg %r2,54(%r11) 000003ff800349d6: e330b03e0004 lg %r3,62(%r11) 000003ff800349dc: ec23ffeda065 clgrj %r2,%r3,10,3ff800349b6 ; jmp 000003ff800349e2: e3e0b0460004 lg %r14,70(%r11) 000003ff800349e8: e3e0b04e0004 lg %r14,78(%r11) 000003ff800349ee: b904002e lgr %r2,%r14 000003ff800349f2: e3b0f0700004 lg %r11,112(%r15) 000003ff800349f8: e3e0f0880004 lg %r14,136(%r15) 000003ff800349fe: 07fe bcr 15,%r14 After fix: 000003ff80ef3db4: a7f40003 brc 15,3ff80ef3dba 000003ff80ef3db8: 0000 unknown 000003ff80ef3dba: e3b0f0700024 stg %r11,112(%r15) 000003ff80ef3dc0: e3e0f0880024 stg %r14,136(%r15) 000003ff80ef3dc6: 0db0 basr %r11,%r0 000003ff80ef3dc8: c0ef00000000 llilf %r14,0 000003ff80ef3dce: e320b0360004 lg %r2,54(%r11) 000003ff80ef3dd4: e330b03e0004 lg %r3,62(%r11) 000003ff80ef3dda: ec230006a065 clgrj %r2,%r3,10,3ff80ef3de6 ; jmp 000003ff80ef3de0: e3e0b0460004 lg %r14,70(%r11) 000003ff80ef3de6: e3e0b04e0004 lg %r14,78(%r11) ; target 000003ff80ef3dec: b904002e lgr %r2,%r14 000003ff80ef3df0: e3b0f0700004 lg %r11,112(%r15) 000003ff80ef3df6: e3e0f0880004 lg %r14,136(%r15) 000003ff80ef3dfc: 07fe bcr 15,%r14 test_bpf.ko suite runs fine after the fix. Fixes: 054623105728 ("s390/bpf: Add s390x eBPF JIT compiler backend") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-04sparc64: Increase max_phys_bits to 51 and VA bits to 53 for M8.Vijay Kumar1-1/+11
On M8 chips, use a max_phys_bits value of 51. Also, M8 supports VA bits up to 54 bits. However, for now restrict VA bits to 53 due to 4-level pagetable limitation. Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-04sparc64: recognize and support sparc M8 cpu typeAllen Pais6-2/+30
Recognize SPARC-M8 cpu type, hardware caps and cpu distribution map. Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: David Aldridge <david.j.aldridge@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-04sparc64: properly name the cpu constantsAllen Pais2-8/+22
Acked-by: Sam Ravnborg <sam@ravnborg.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds7-18/+48
Pull sparc fixes from David Miller: - block interrupts properly across the entire MMU context change (both the hw MMU context change and the TSB table change) so that we don't get a perf event interrupt in the middle. From Rob Gardner. - be sure to register hugepages early enough, from Nitin Gupta. - UltraSPARC-III user copy exception handling would return garbage for the copied length in some circumstances. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: Fix exception handling in UltraSPARC-III memcpy. sbus: Convert to using %pOF instead of full_name sparc: defconfig: Cleanup from old Kconfig options sparc64: Register hugepages during arch init sparc64: Prevent perf from running during super critical sections
2017-08-04Merge tag 'powerpc-4.13-5' of ↵Linus Torvalds9-21/+71
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Fixes for recently merged code: - a fix for the _PAGE_DEVMAP support, which was breaking KVM on Power9 radix - avoid a (harmless) lockdep warning in the early SMP code - return failure for some uses of dma_set_mask() rather than falling back to 32-bits - fix stack setup in watchdog soft_nmi_common() to use emergency stack - fix of_irq_to_resource() error check in of_fsl_spi_probe() Two fixes going to stable: - fix saving of Transactional Memory SPRs in core dump - fix __check_irq_replay missing decrementer interrupt And two misc: - fix 64-bit boot wrapper build with non-biarch compiler - work around a POWER9 PMU hang after state-loss idle Thanks to: Alistair Popple, Aneesh Kumar K.V, Cyril Bur, Gustavo Romero, Jose Ricardo Ziviani, Laurent Vivier, Nicholas Piggin, Oliver O'Halloran, Sergei Shtylyov, Suraj Jitindar Singh, Thomas Gleixner" * tag 'powerpc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64: Fix __check_irq_replay missing decrementer interrupt powerpc/perf: POWER9 PMU stops after idle workaround powerpc/83xx/mpc832x_rdb: fix of_irq_to_resource() error check powerpc/64s: Fix stack setup in watchdog soft_nmi_common() powerpc/powernv/pci: Return failure for some uses of dma_set_mask() powerpc/boot: Fix 64-bit boot wrapper build with non-biarch compiler powerpc/smp: Call smp_ops->setup_cpu() directly on the boot CPU powerpc/tm: Fix saving of TM SPRs in core dump powerpc/mm: Fix pmd/pte_devmap() on non-leaf entries
2017-08-04sparc64: Fix exception handling in UltraSPARC-III memcpy.David S. Miller1-2/+2
Mikael Pettersson reported that some test programs in the strace-4.18 testsuite cause an OOPS. After some debugging it turns out that garbage values are returned when an exception occurs, causing the fixup memset() to be run with bogus arguments. The problem is that two of the exception handler stubs write the successfully copied length into the wrong register. Fixes: ee841d0aff64 ("sparc64: Convert U3copy_{from,to}_user to accurate exception reporting.") Reported-by: Mikael Pettersson <mikpelinux@gmail.com> Tested-by: Mikael Pettersson <mikpelinux@gmail.com> Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-04arm64: avoid overflow in VA_START and PAGE_OFFSETNick Desaulniers1-2/+4
The bitmask used to define these values produces overflow, as seen by this compiler warning: arch/arm64/kernel/head.S:47:8: warning: integer overflow in preprocessor expression #elif (PAGE_OFFSET & 0x1fffff) != 0 ^~~~~~~~~~~ arch/arm64/include/asm/memory.h:52:46: note: expanded from macro 'PAGE_OFFSET' #define PAGE_OFFSET (UL(0xffffffffffffffff) << (VA_BITS - 1)) ~~~~~~~~~~~~~~~~~~ ^ It would be preferrable to use GENMASK_ULL() instead, but it's not set up to be used from assembly (the UL() macro token pastes UL suffixes when not included in assembly sources). Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Suggested-by: Yury Norov <ynorov@caviumnetworks.com> Suggested-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-08-04arm64: Fix potential race with hardware DBM in ptep_set_access_flags()Catalin Marinas1-7/+8
In a system with DBM (dirty bit management) capable agents there is a possible race between a CPU executing ptep_set_access_flags() (maybe non-DBM capable) and a hardware update of the dirty state (clearing of PTE_RDONLY). The scenario: a) the pte is writable (PTE_WRITE set), clean (PTE_RDONLY set) and old (PTE_AF clear) b) ptep_set_access_flags() is called as a result of a read access and it needs to set the pte to writable, clean and young (PTE_AF set) c) a DBM-capable agent, as a result of a different write access, is marking the entry as young (setting PTE_AF) and dirty (clearing PTE_RDONLY) The current ptep_set_access_flags() implementation would set the PTE_RDONLY bit in the resulting value overriding the DBM update and losing the dirty state. This patch fixes such race by setting PTE_RDONLY to the most permissive (lowest value) of the current entry and the new one. Fixes: 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM") Cc: Will Deacon <will.deacon@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-08-04Merge tag 'davinci-fixes-for-v4.13' of ↵Arnd Bergmann2-28/+0
git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci into fixes Pull "DaVinci fixes for v4.13" from Sekhar Nori: Drop unused VPIF endpoints from device-tree. They should be used only when an actual remote-endpoint is connected. * tag 'davinci-fixes-for-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci: ARM: dts: da850-lcdk: drop unused VPIF endpoints ARM: dts: da850-evm: drop unused VPIF endpoints
2017-08-04Merge tag 'sunxi-fixes-for-4.13' of ↵Arnd Bergmann3-9/+11
https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into fixes Pull "Allwinner fixes for 4.13" from Chen-Yu Tsai: Two fixes to correct the EMAC blocks memory region size to match the datasheet. One that converts raw A83T clock indices to macros from the clk dt-binding header, completing the A83T sunxi-ng clk driver. * tag 'sunxi-fixes-for-4.13' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux: ARM: dts: sun8i: a83t: Switch to CCU device tree binding macros arm64: allwinner: sun50i-a64: Correct emac register size ARM: dts: sunxi: h3/h5: Correct emac register size
2017-08-04Merge tag 'qcom-arm64-defconfig-fixes-for-4.13-rc2' of ↵Arnd Bergmann1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into fixes Pull "Qualcomm ARM64 based defconfig Fixes for v4.13-rc2" from Andy Gross: * Enable missing HWSPINLOCK * tag 'qcom-arm64-defconfig-fixes-for-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux: arm64: defconfig: enable missing HWSPINLOCK
2017-08-04ARM: dts: tango4: Request RGMII RX and TX clock delaysMarc Gonzalez1-1/+1
RX and TX clock delays are required. Request them explicitly. Fixes: cad008b8a77e6 ("ARM: dts: tango4: Initial device trees") Cc: stable@vger.kernel.org Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2017-08-04Merge tag 'renesas-fixes3-for-v4.13' of ↵Arnd Bergmann1-1/+5
https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into fixes Pull "Third Round of Renesas ARM Based SoC Fixes for v4.13" from Simon Horman: Fix deadlock in regulator quirk for R-Car Gen 2 SoCs The da9063/da9210 regulator quirk for R-Car Gen2 boards uses a bus notifier, and unregisters the notifier when it is no longer needed. However, a notifier must not be unregistered from within the call chain. This bug went unnoticed, as blocking_notifier_chain_unregister() didn't take the semaphore during early boot. This is no longer the case as of upstream commit 1c3c5eab171590f8 ("sched/core: Enable might_sleep() and smp_processor_id() checks early") and a deadlock occurs. * tag 'renesas-fixes3-for-v4.13' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas: ARM: shmobile: rcar-gen2: Fix deadlock in regulator quirk
2017-08-04Merge tag 'mvebu-fixes-4.13-2' of git://git.infradead.org/linux-mvebu into fixesArnd Bergmann3-2/+4
Pull "mvebu fixes for 4.13 (part 2)" from Gregory CLEMENT: All the fixes are for ARM64 mvebu: - Fix the RTC interrupt on A7K/A8K which was missed when switching from GIC to ICU - Mark the A7K/A8K crypto engine as dma coherent - Fix the number of GPIO on south bridge on Armada 3700 * tag 'mvebu-fixes-4.13-2' of git://git.infradead.org/linux-mvebu: ARM64: dts: marvell: armada-37xx: Fix the number of GPIO on south bridge arm64: dts: marvell: mark the cp110 crypto engine as dma coherent arm64: dts: marvell: use ICU for the CP110 slave RTC
2017-08-04Merge tag 'amlogic-fixes' of ↵Arnd Bergmann3-15/+94
git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic into fixes Pull "Amlogic fixes for v4.13-rc" from Kevin Hilman: - 2 minor DT fixes * tag 'amlogic-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic: ARM64: dts: meson-gxl-s905x-libretech-cc: fixup board definition ARM64: dts: meson-gx: use specific compatible for the AO pwms
2017-08-04Merge tag 'v4.13-rockchip-dts32fixes-1' of ↵Arnd Bergmann1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into fixes Pull "Rockchip dts32 fixes for 4.13" from Heiko Stübner: Fix for the recently added mali dt support. The example showed a wrong value, so fix it before it gets copy-pasted to much. * tag 'v4.13-rockchip-dts32fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: ARM: dts: rockchip: fix mali gpu node on rk3288 dt-bindings: gpu: drop wrong compatible from midgard binding example
2017-08-04powerpc/64: Fix __check_irq_replay missing decrementer interruptNicholas Piggin1-1/+14
If the decrementer wraps again and de-asserts the decrementer exception while hard-disabled, __check_irq_replay() has a test to notice the wrap when interrupts are re-enabled. The decrementer check must be done when clearing the PACA_IRQ_HARD_DIS flag, not when the PACA_IRQ_DEC flag is tested. Previously this worked because the decrementer interrupt was always the first one checked after clearing the hard disable flag, but HMI check was moved ahead of that, which introduced this bug. This can cause a missed decrementer interrupt if we soft-disable interrupts then take an HMI which is recorded in irq_happened, then hard-disable interrupts for > 4s to wrap the decrementer. Fixes: e0e0d6b7390b ("powerpc/64: Replay hypervisor maintenance interrupt first") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-04powerpc/perf: POWER9 PMU stops after idle workaroundNicholas Piggin1-1/+7
POWER9 DD2 PMU can stop after a state-loss idle in some conditions. A solution is to set then clear MMCRA[60] after wake from state-loss idle. MMCRA[60] is a non-architected bit, see the user manual for details. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03Merge tag 'pm-4.13-rc4' of ↵Linus Torvalds1-14/+26
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix two cpufreq issues, one introduced recently and one related to recent changes, fix cpufreq documentation, fix up recently added code in the Thunderbolt driver and update runtime PM framework documentation. Specifics: - Fix the handling of the scaling_cur_freq cpufreq policy attribute on x86 systems with the MPERF/APERF registers present to make it behave more as expected after recent changes (Rafael Wysocki). - Drop a leftover callback from the intel_pstate driver which also prevents the cpuinfo_cur_freq cpufreq policy attribute from being incorrectly exposed when intel_pstate works in the active mode (Rafael Wysocki). - Add a missing piece describing the cpuinfo_cur_freq policy attribute to cpufreq documentation (Rafael Wysocki). - Fix up a recently added part of the Thunderbolt driver to avoid aborting system suspends if its mailbox commands time out (Rafael Wysocki). - Update device runtime PM framework documentation to reflect the current behavior of the code (Johan Hovold)" * tag 'pm-4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thunderbolt: icm: Ignore mailbox errors in icm_suspend() cpufreq: x86: Make scaling_cur_freq behave more as expected PM / runtime: Document new pm_runtime_set_suspended() constraint cpufreq: docs: Add missing cpuinfo_cur_freq description cpufreq: intel_pstate: Drop ->get from intel_pstate structure
2017-08-03Merge branches 'pm-cpufreq-x86', 'pm-cpufreq-docs' and 'intel_pstate'Rafael J. Wysocki1-14/+26
* pm-cpufreq-x86: cpufreq: x86: Make scaling_cur_freq behave more as expected * pm-cpufreq-docs: cpufreq: docs: Add missing cpuinfo_cur_freq description * intel_pstate: cpufreq: intel_pstate: Drop ->get from intel_pstate structure
2017-08-03Merge tag 'kvm-arm-for-v4.13-rc4' of ↵Radim Krčmář1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM Fixes for v4.13-rc4 - Yet another race with VM destruction plugged - A set of small vgic fixes
2017-08-03KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit"Wanpeng Li1-2/+9
------------[ cut here ]------------ WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel] CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7 RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel] Call Trace: vmx_check_nested_events+0x131/0x1f0 [kvm_intel] ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel] kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm] ? vmx_vcpu_load+0x1be/0x220 [kvm_intel] ? kvm_arch_vcpu_load+0x62/0x230 [kvm] kvm_vcpu_ioctl+0x340/0x700 [kvm] ? kvm_vcpu_ioctl+0x340/0x700 [kvm] ? __fget+0xfc/0x210 do_vfs_ioctl+0xa4/0x6a0 ? __fget+0x11d/0x210 SyS_ioctl+0x79/0x90 do_syscall_64+0x8f/0x750 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL64_slow_path+0x25/0x25 This can be reproduced by booting L1 guest w/ 'noapic' grub parameter, which means that tells the kernel to not make use of any IOAPICs that may be present in the system. Actually external_intr variable in nested_vmx_vmexit() is the req_int_win variable passed from vcpu_enter_guest() which means that the L0's userspace requests an irq window. I observed the scenario (!kvm_cpu_has_interrupt(vcpu) && L0's userspace reqeusts an irq window) is true, so there is no interrupt which L1 requires to inject to L2, we should not attempt to emualte "Acknowledge interrupt on exit" for the irq window requirement in this scenario. This patch fixes it by not attempt to emulate "Acknowledge interrupt on exit" if there is no L1 requirement to inject an interrupt to L2. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> [Added code comment to make it obvious that the behavior is not correct. We should do a userspace exit with open interrupt window instead of the nested VM exit. This patch still improves the behavior, so it was accepted as a (temporary) workaround.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-02KVM: nVMX: mark vmcs12 pages dirty on L2 exitDavid Matlack1-10/+43
The host physical addresses of L1's Virtual APIC Page and Posted Interrupt descriptor are loaded into the VMCS02. The CPU may write to these pages via their host physical address while L2 is running, bypassing address-translation-based dirty tracking (e.g. EPT write protection). Mark them dirty on every exit from L2 to prevent them from getting out of sync with dirty tracking. Also mark the virtual APIC page and the posted interrupt descriptor dirty when KVM is virtualizing posted interrupt processing. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-02kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardownDavid Matlack1-5/+11
According to the Intel SDM, software cannot rely on the current VMCS to be coherent after a VMXOFF or shutdown. So this is a valid way to handle VMCS12 flushes. 24.11.1 Software Use of Virtual-Machine Control Structures ... If a logical processor leaves VMX operation, any VMCSs active on that logical processor may be corrupted (see below). To prevent such corruption of a VMCS that may be used either after a return to VMX operation or on another logical processor, software should execute VMCLEAR for that VMCS before executing the VMXOFF instruction or removing power from the processor (e.g., as part of a transition to the S3 and S4 power states). ... This fixes a "suspicious rcu_dereference_check() usage!" warning during kvm_vm_release() because nested_release_vmcs12() calls kvm_vcpu_write_guest_page() without holding kvm->srcu. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-02KVM: nVMX: do not pin the VMCS12Paolo Bonzini1-17/+7
Since the current implementation of VMCS12 does a memcpy in and out of guest memory, we do not need current_vmcs12 and current_vmcs12_page anymore. current_vmptr is enough to read and write the VMCS12. And David Matlack noted: This patch also fixes dirty tracking (memslot->dirty_bitmap) of the VMCS12 page by using kvm_write_guest. nested_release_page() only marks the struct page dirty. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [Added David Matlack's note and nested_release_page_clean() fix.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-02KVM: X86: init irq->level in kvm_pv_kick_cpu_opLongpeng(Mike)1-0/+1
'lapic_irq' is a local variable and its 'level' field isn't initialized, so 'level' is random, it doesn't matter but makes UBSAN unhappy: UBSAN: Undefined behaviour in .../lapic.c:... load of value 10 is not a valid value for type '_Bool' ... Call Trace: [<ffffffff81f030b6>] dump_stack+0x1e/0x20 [<ffffffff81f03173>] ubsan_epilogue+0x12/0x55 [<ffffffff81f03b96>] __ubsan_handle_load_invalid_value+0x118/0x162 [<ffffffffa1575173>] kvm_apic_set_irq+0xc3/0xf0 [kvm] [<ffffffffa1575b20>] kvm_irq_delivery_to_apic_fast+0x450/0x910 [kvm] [<ffffffffa15858ea>] kvm_irq_delivery_to_apic+0xfa/0x7a0 [kvm] [<ffffffffa1517f4e>] kvm_emulate_hypercall+0x62e/0x760 [kvm] [<ffffffffa113141a>] handle_vmcall+0x1a/0x30 [kvm_intel] [<ffffffffa114e592>] vmx_handle_exit+0x7a2/0x1fa0 [kvm_intel] ... Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>