BMC/Intel-BMC/linux.git - Intel OpenBMC Linux kernel source tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2019-06-19	arm64: dts: ti: Add support for J721E Common Processor Board	Nishanth Menon	3	-0/+81
	Add Support for J721E Common Processor board support. The EVM architecture is as follows: +------------------------------------------------------+ \| +-------------------------------------------+ \| \| \| \| \| \| \| Add-on Card 1 Options \| \| \| \| \| \| \| +-------------------------------------------+ \| \| \| \| \| \| +-------------------+ \| \| \| \| \| \| \| SOM \| \| \| +--------------+ \| \| \| \| \| \| \| \| \| \| \| Add-on \| +-------------------+ \| \| \| Card 2 \| \| Power Supply \| \| Options \| \| \| \| \| \| \| \| \| +--------------+ \| <--- +------------------------------------------------------+ Common Processor Board Common Processor board is the baseboard that has most of the actual connectors, power supply etc. A SOM (System on Module) is plugged on to the common processor board and this contains the SoC, PMIC, DDR and basic high speed components necessary for functionality. Add-n card options add further functionality (such as additional Audio, Display, networking options). Note: A) The minimum configuration required to boot up the board is System On Module(SOM) + Common Processor Board. B) Since there is just a single SOM and Common Processor Board, we are maintaining common processor board as the base dts and SOM as the dtsi that we include. In the future as more SOM's appear, we should move common processor board as a dtsi and include configurations as dts. C) All daughter cards beyond the basic boards shall be maintained as overlays. Signed-off-by: Nishanth Menon <nm@ti.com> Signed-off-by: Tero Kristo <t-kristo@ti.com>
2019-06-19	arm64: dts: ti: Add Support for J721E SoC	Nishanth Menon	3	-0/+450
	The J721E SoC belongs to the K3 Multicore SoC architecture platform, providing advanced system integration to enable lower system costs of automotive applications such as infotainment, cluster, premium Audio, Gateway, industrial and a range of broad market applications. This SoC is designed around reducing the system cost by eliminating the need of an external system MCU and is targeted towards ASIL-B/C certification/requirements in addition to allowing complex software and system use-cases. Some highlights of this SoC are: * Dual Cortex-A72s in a single cluster, three clusters of lockstep capable dual Cortex-R5F MCUs, Deep-learning Matrix Multiply Accelerator(MMA), C7x floating point Vector DSP, Two C66x floating point DSPs. * 3D GPU PowerVR Rogue 8XE GE8430 * Vision Processing Accelerator (VPAC) with image signal processor and Depth and Motion Processing Accelerator (DMPAC) * Two Gigabit Industrial Communication Subsystems (ICSSG), each with dual PRUs and dual RTUs * Two CSI2.0 4L RX plus one CSI2.0 4L TX, one eDP/DP, One DSI Tx, and up to two DPI interfaces. * Integrated Ethernet switch supporting up to a total of 8 external ports in addition to legacy Ethernet switch of up to 2 ports. * System MMU (SMMU) Version 3.0 and advanced virtualisation capabilities. * Upto 4 PCIe-GEN3 controllers, 2 USB3.0 Dual-role device subsystems, 16 MCANs, 12 McASP, eMMC and SD, UFS, OSPI/HyperBus memory controller, QSPI, I3C and I2C, eCAP/eQEP, eHRPWM, MLB among other peripherals. * Two hardware accelerator block containing AES/DES/SHA/MD5 called SA2UL management. * Configurable L3 Cache and IO-coherent architecture with high data throughput capable distributed DMA architecture under NAVSS * Centralized System Controller for Security, Power, and Resource Management (DMSC) See J721E Technical Reference Manual (SPRUIL1, May 2019) for further details: http://www.ti.com/lit/pdf/spruil1 Signed-off-by: Nishanth Menon <nm@ti.com> Reviewed-by: Suman Anna <s-anna@ti.com> Signed-off-by: Tero Kristo <t-kristo@ti.com>
2019-06-19	x86/microcode: Fix the microcode load on CPU hotplug for real	Thomas Gleixner	1	-5/+10
	A recent change moved the microcode loader hotplug callback into the early startup phase which is running with interrupts disabled. It missed that the callbacks invoke sysfs functions which might sleep causing nice 'might sleep' splats with proper debugging enabled. Split the callbacks and only load the microcode in the early startup phase and move the sysfs handling back into the later threaded and preemptible bringup phase where it was before. Fixes: 78f4e932f776 ("x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: stable@vger.kernel.org Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1906182228350.1766@nanos.tec.linutronix.de
2019-06-19	arm64: dts: qcom: pm8998: Use qcom,pm8998-pon binding for second gen pon	John Stultz	1	-1/+1
	This changes pm8998 to use the new qcom,pm8998-pon compatible string for the pon in order to support the gen2 pon functionality properly. Cc: Andy Gross <agross@kernel.org> Cc: David Brown <david.brown@linaro.org> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Amit Pundir <amit.pundir@linaro.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: linux-arm-msm@vger.kernel.org Cc: devicetree@vger.kernel.org Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Andy Gross <agross@kernel.org>
2019-06-19	arm64: dts: qcom: msm8996: Enable SMMUs	Bjorn Andersson	1	-6/+0
	Enable SMMUs on 8996 now that the WRZ workaround in the arm-smmu driver has landed. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Andy Gross <agross@kernel.org>
2019-06-19	arm64: dts: qcom: msm8996: Correct apr-domain property	Bjorn Andersson	1	-1/+1
	The domain specifier was changed from using "reg" to "qcom,apr-domain", update the dts accordingly. Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2019-06-19	arm64: dts: qcom: Add Dragonboard 845c	Bjorn Andersson	2	-0/+558
	This adds an initial dts for the Dragonboard 845. Supported functionality includes Debug UART, UFS, USB-C (peripheral), USB-A (host), microSD-card and Bluetooth. Initializing the SMMU is clearing the mapping used for the splash screen framebuffer, which causes the board to reboot. This can be worked around using: fastboot oem select-display-panel none Reviewed-by: Vinod Koul <vkoul@kernel.org> Reviewed-by: Vivek Gautam <vivek.gautam@codeaurora.org> Tested-by: Vinod Koul <vkoul@kernel.org> Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2019-06-18	arm64: dts: msm8998-mtp: Add pm8005_s1 regulator	Jeffrey Hugo	1	-0/+17
	The pm8005_s1 is VDD_GFX, and needs to be on to enable the GPU. This should be hooked up to the GPU CPR, but we don't have support for that yet, so until then, just turn on the regulator and keep it on so that we can focus on basic GPU bringup. Signed-off-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-06-18	Merge tag 'v5.2-rc4' into regulator-5.3	Mark Brown	3392	-23922/+4732
	Linux 5.2-rc4
2019-06-18	Merge tag 'armsoc-fixes' of ↵	Linus Torvalds	42	-40/+100
	git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Olof Johansson: "I've been bad at collecting fixes this release cycle, so this is a fairly large batch that's been trickling in for a while. It's the usual mix, more or less. Some of the bigger things fixed: - Voltage fix for MMC on TI DRA7 that sometimes would overvoltage cards - Regression fixes for D_CAN on am355x - i.MX6SX cpuidle fix to deal with wakeup latency (dropped uart chars) - DT fixes for some DRA7 variants that don't share the superset of blocks on the chip plus the usual mix of stuff: minor build/warning fixes, Kconfig dependencies, and some DT fixlets" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (28 commits) soc: ixp4xx: npe: Fix an IS_ERR() vs NULL check in probe ARM: ixp4xx: include irqs.h where needed ARM: ixp4xx: mark ixp4xx_irq_setup as __init ARM: ixp4xx: don't select SERIAL_OF_PLATFORM firmware: trusted_foundations: add ARMv7 dependency MAINTAINERS: Change QCOM repo location ARM: davinci: da8xx: specify dma_coherent_mask for lcdc ARM: davinci: da850-evm: call regulator_has_full_constraints() ARM: mvebu_v7_defconfig: fix Ethernet on Clearfog ARM: dts: am335x phytec boards: Fix cd-gpios active level ARM: dts: dra72x: Disable usb4_tm target module arm64: arch_k3: Fix kconfig dependency warning ARM: dts: Drop bogus CLKSEL for timer12 on dra7 MAINTAINERS: Update Stefan Wahren email address ARM: dts: bcm: Add missing device_type = "memory" property soc: bcm: brcmstb: biuctrl: Register writes require a barrier soc: brcmstb: Fix error path for unsupported CPUs ARM: dts: dra71x: Disable usb4_tm target module ARM: dts: dra71x: Disable rtc target module ARM: dts: dra76x: Disable usb4_tm target module ...
2019-06-18	KVM: nVMX: shadow pin based execution controls	Paolo Bonzini	1	-0/+1
	The VMX_PREEMPTION_TIMER flag may be toggled frequently, though not very frequently. Since it does not affect KVM's dirty logic, e.g. the preemption timer value is loaded from vmcs12 even if vmcs12 is "clean", there is no need to mark vmcs12 dirty when L1 writes pin controls, and shadowing the field achieves that. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: VMX: Leave preemption timer running when it's disabled	Sean Christopherson	3	-25/+41
	VMWRITEs to the major VMCS controls, pin controls included, are deceptively expensive. CPUs with VMCS caching (Westmere and later) also optimize away consistency checks on VM-Entry, i.e. skip consistency checks if the relevant fields have not changed since the last successful VM-Entry (of the cached VMCS). Because uops are a precious commodity, uCode's dirty VMCS field tracking isn't as precise as software would prefer. Notably, writing any of the major VMCS fields effectively marks the entire VMCS dirty, i.e. causes the next VM-Entry to perform all consistency checks, which consumes several hundred cycles. As it pertains to KVM, toggling PIN_BASED_VMX_PREEMPTION_TIMER more than doubles the latency of the next VM-Entry (and again when/if the flag is toggled back). In a non-nested scenario, running a "standard" guest with the preemption timer enabled, toggling the timer flag is uncommon but not rare, e.g. roughly 1 in 10 entries. Disabling the preemption timer can change these numbers due to its use for "immediate exits", even when explicitly disabled by userspace. Nested virtualization in particular is painful, as the timer flag is set for the majority of VM-Enters, but prepare_vmcs02() initializes vmcs02's pin controls to clear the flag since its the timer's final state isn't known until vmx_vcpu_run(). I.e. the majority of nested VM-Enters end up unnecessarily writing pin controls twice. Rather than toggle the timer flag in pin controls, set the timer value itself to the largest allowed value to put it into a "soft disabled" state, and ignore any spurious preemption timer exits. Sadly, the timer is a 32-bit value and so theoretically it can fire before the head death of the universe, i.e. spurious exits are possible. But because KVM does not save the timer value on VM-Exit and because the timer runs at a slower rate than the TSC, the maximuma timer value is still sufficiently large for KVM's purposes. E.g. on a modern CPU with a timer that runs at 1/32 the frequency of a 2.4ghz constant-rate TSC, the timer will fire after ~55 seconds of uninterrupted guest execution. In other words, spurious VM-Exits are effectively only possible if the host is completely tickless on the logical CPU, the guest is not using the preemption timer, and the guest is not generating VM-Exits for any other reason. To be safe from bad/weird hardware, disable the preemption timer if its maximum delay is less than ten seconds. Ten seconds is mostly arbitrary and was selected in no small part because it's a nice round number. For simplicity and paranoia, fall back to __kvm_request_immediate_exit() if the preemption timer is disabled by KVM or userspace. Previously KVM continued to use the preemption timer to force immediate exits even when the timer was disabled by userspace. Now that KVM leaves the timer running instead of truly disabling it, allow userspace to kill it entirely in the unlikely event the timer (or KVM) malfunctions. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	arm64/mm: don't initialize pgd_cache twice	Mike Rapoport	1	-2/+1
	When PGD_SIZE != PAGE_SIZE, arm64 uses kmem_cache for allocation of PGD memory. That cache was initialized twice: first through pgtable_cache_init() alias and then as an override for weak pgd_cache_init(). Remove the alias from pgtable_cache_init() and keep the only pgd_cache initialization in pgd_cache_init(). Fixes: caa841360134 ("x86/mm: Initialize PGD cache during mm initialization") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-06-18	arm64/sve: <uapi/asm/ptrace.h> should not depend on <uapi/linux/prctl.h>	Anisse Astier	1	-5/+3
	Pulling linux/prctl.h into asm/ptrace.h in the arm64 UAPI headers causes userspace build issues for any program (e.g. strace and qemu) that includes both <sys/prctl.h> and <linux/ptrace.h> when using musl libc: \| error: redefinition of 'struct prctl_mm_map' \| struct prctl_mm_map { See https://github.com/foundriesio/meta-lmp/commit/6d4a106e191b5d79c41b9ac78fd321316d3013c0 for a public example of people working around this issue. Although it's a bit grotty, fix this breakage by duplicating the prctl constant definitions. Since these are part of the kernel ABI, they cannot be changed in future and so it's not the end of the world to have them open-coded. Fixes: 43d4da2c45b2 ("arm64/sve: ptrace and ELF coredump support") Cc: stable@vger.kernel.org Acked-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Anisse Astier <aastier@freebox.fr> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-06-18	arm64: dts: fsl: librem5: Add a device tree for the Librem5 devkit	Angus Ainslie (Purism)	2	-0/+807
	This is for the development kit board for the Librem 5. The current level of support yields a working console and is able to boot userspace from the network or eMMC. Additional subsystems that are active : - Both USB ports - SD card socket - WiFi usdhc - WWAN modem - GNSS - GPIO keys - LEDs - gyro - magnetometer - touchscreen - pwm - backlight - haptic motor Signed-off-by: Angus Ainslie (Purism) <angus@akkea.ca> Reviewed-by: Fabio Estevam <festevam@gmail.com> Reviewed-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2019-06-18	arm64: dts: fsl: ls1028a: Add qDMA node	Peng Ma	1	-0/+20
	Add the qDMA device tree nodes for LS1028A devices Signed-off-by: Peng Ma <peng.ma@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2019-06-18	ARM: ixp4xx: include irqs.h where needed	Arnd Bergmann	5	-0/+10
	Multiple ixp4xx specific files require macros from irqs.h that were moved out from mach/irqs.h, e.g.: arch/arm/mach-ixp4xx/vulcan-pci.c:41:19: error: this function declaration is not a prototype [-Werror,-Wstrict-prototypes] arch/arm/mach-ixp4xx/vulcan-pci.c:49:10: error: implicit declaration of function 'IXP4XX_GPIO_IRQ' [-Werror,-Wimplicit-function-declaration] return IXP4XX_GPIO_IRQ(INTA); Include this header in all files that failed to build because of that. Fixes: dc8ef8cd3a05 ("ARM: ixp4xx: Convert to SPARSE_IRQ") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Olof Johansson <olof@lixom.net>
2019-06-18	ARM: ixp4xx: don't select SERIAL_OF_PLATFORM	Arnd Bergmann	1	-1/+0
	Platforms should not normally select all the device drivers, leave that up to the user and the defconfig file. In this case, we get a warning for randconfig builds: WARNING: unmet direct dependencies detected for SERIAL_OF_PLATFORM Depends on [n]: TTY [=y] && HAS_IOMEM [=y] && SERIAL_8250 [=n] && OF [=y] Selected by [y]: - MACH_IXP4XX_OF [=y] && ARCH_IXP4XX [=y] Fixes: 9540724ca29d ("ARM: ixp4xx: Add device tree boot support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Olof Johansson <olof@lixom.net>
2019-06-18	arm64: dts: renesas: r8a774a1: Add dynamic power coefficient	Biju Das	1	-0/+2
	Describe the dynamic power coefficient of A57 and A53 CPUs. Based on work by Gaku Inami <gaku.inami.xw@bp.renesas.com> and others. Signed-off-by: Biju Das <biju.das@bp.renesas.com> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
2019-06-18	arm64: dts: renesas: r8a774a1: Create thermal zone to support IPA	Biju Das	1	-1/+24
	Setup a thermal zone driven by SoC temperature sensor. Create passive trip points and bind them to CPUFreq cooling device that supports power extension. Based on work by Dien Pham <dien.pham.ry@renesas.com> for r8a7796 SoC. Signed-off-by: Biju Das <biju.das@bp.renesas.com> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
2019-06-18	arm64: dts: renesas: r8a774a1: Add CPU capacity-dmips-mhz	Biju Das	1	-0/+6
	Set the capacity-dmips-mhz for RZ/G2M(r8a774a1) SoC, that is based on dhrystone. Based on work done by Gaku Inami <gaku.inami.xw@bp.renesas.com> for r8a7796 SoC. The average dhrystone result for 5 iterations is as below: r8a774a1 SoC (CA57x2 + CA53x4) CPU max-freq dhrystone --------------------------------- CA57 1500 MHz 11428571 lps/s CA53 1200 MHz 5000000 lps/s From this, CPU capacity-dmips-mhz for CA57 and CA53 are calculated as follows: r8a774a1 SoC CA57 : 1024 / (11428571 / 1500) * (11428571 / 1500) = 1024 CA53 : 1024 / (11428571 / 1500) * ( 5000000 / 1200) = 560 Since each CPUs have different max frequencies, the final CPU capacities of A53 scaled by the above difference is as below $ cat /sys/devices/system/cpu/cpu*/cpu_capacity 1024 1024 448 448 448 448 Signed-off-by: Biju Das <biju.das@bp.renesas.com> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
2019-06-18	arm64: dts: renesas: r8a774a1: Add CPU topology on r8a774a1 SoC	Biju Das	1	-0/+26
	This patch adds the "cpu-map" into r8a774a1 composed of multi-cluster. This definition is used to parse the cpu topology. Based on work by Gaku Inami <gaku.inami.xw@bp.renesas.com> for r8a7796 SoC. Signed-off-by: Biju Das <biju.das@bp.renesas.com> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
2019-06-18	arm64: dts: renesas: hihope-common: Add LEDs support	Fabrizio Castro	1	-0/+24
	This patch adds LEDs support to the HiHope RZ/G2[MN] Main Board common device tree. Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
2019-06-18	arm64: dts: renesas: hihope-common: Enable USB3.0	Biju Das	1	-0/+29
	This patch enables USB3.0 host/peripheral device node for the HiHope RZ/G2M board. Signed-off-by: Biju Das <biju.das@bp.renesas.com> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
2019-06-18	KVM: VMX: Drop hv_timer_armed from 'struct loaded_vmcs'	Sean Christopherson	3	-8/+2
	... now that it is fully redundant with the pin controls shadow. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: nVMX: Preset *DT exiting in vmcs02 when emulating UMIP	Sean Christopherson	1	-0/+8
	KVM dynamically toggles SECONDARY_EXEC_DESC to intercept (a subset of) instructions that are subject to User-Mode Instruction Prevention, i.e. VMCS.SECONDARY_EXEC_DESC == CR4.UMIP when emulating UMIP. Preset the VMCS control when preparing vmcs02 to avoid unnecessarily VMWRITEs, e.g. KVM will clear VMCS.SECONDARY_EXEC_DESC in prepare_vmcs02_early() and then set it in vmx_set_cr4(). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: nVMX: Preserve last USE_MSR_BITMAPS when preparing vmcs02	Sean Christopherson	1	-1/+11
	KVM dynamically toggles the CPU_BASED_USE_MSR_BITMAPS execution control for nested guests based on whether or not both L0 and L1 want to pass through the same MSRs to L2. Preserve the last used value from vmcs02 so as to avoid multiple VMWRITEs to (re)set/(re)clear the bit on nested VM-Entry. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: VMX: Explicitly initialize controls shadow at VMCS allocation	Sean Christopherson	3	-15/+12
	Or: Don't re-initialize vmcs02's controls on every nested VM-Entry. VMWRITEs to the major VMCS controls are deceptively expensive. Intel CPUs with VMCS caching (Westmere and later) also optimize away consistency checks on VM-Entry, i.e. skip consistency checks if the relevant fields have not changed since the last successful VM-Entry (of the cached VMCS). Because uops are a precious commodity, uCode's dirty VMCS field tracking isn't as precise as software would prefer. Notably, writing any of the major VMCS fields effectively marks the entire VMCS dirty, i.e. causes the next VM-Entry to perform all consistency checks, which consumes several hundred cycles. Zero out the controls' shadow copies during VMCS allocation and use the optimized setter when "initializing" controls. While this technically affects both non-nested and nested virtualization, nested virtualization is the primary beneficiary as avoid VMWRITEs when prepare vmcs02 allows hardware to optimizie away consistency checks. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: nVMX: Don't reset VMCS controls shadow on VMCS switch	Sean Christopherson	2	-9/+0
	... now that the shadow copies are per-VMCS. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: nVMX: Shadow VMCS controls on a per-VMCS basis	Sean Christopherson	2	-15/+16
	... to pave the way for not preserving the shadow copies across switches between vmcs01 and vmcs02, and eventually to avoid VMWRITEs to vmcs02 when the desired value is unchanged across nested VM-Enters. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: VMX: Shadow VMCS secondary execution controls	Sean Christopherson	3	-26/+28
	Prepare to shadow all major control fields on a per-VMCS basis, which allows KVM to avoid costly VMWRITEs when switching between vmcs01 and vmcs02. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: VMX: Shadow VMCS primary execution controls	Sean Christopherson	3	-31/+23
	Prepare to shadow all major control fields on a per-VMCS basis, which allows KVM to avoid VMREADs when switching between vmcs01 and vmcs02, and more importantly can eliminate costly VMWRITEs to controls when preparing vmcs02. Shadowing exec controls also saves a VMREAD when opening virtual INTR/NMI windows, yay... Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: VMX: Shadow VMCS pin controls	Sean Christopherson	3	-7/+8
	Prepare to shadow all major control fields on a per-VMCS basis, which allows KVM to avoid costly VMWRITEs when switching between vmcs01 and vmcs02. Shadowing pin controls also allows a future patch to remove the per-VMCS 'hv_timer_armed' flag, as the shadow copy is a superset of said flag. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: VMX: Add builder macros for shadowing controls	Sean Christopherson	1	-64/+36
	... to pave the way for shadowing all (five) major VMCS control fields without massive amounts of error prone copy+paste+modify. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: nVMX: Use adjusted pin controls for vmcs02	Sean Christopherson	3	-4/+4
	KVM provides a module parameter to allow disabling virtual NMI support to simplify testing (hardware without virtual NMI support is hard to come by but it does have users). When preparing vmcs02, use the accessor for pin controls to ensure that the module param is respected for nested guests. Opportunistically swap the order of applying L0's and L1's pin controls to better align with other controls and to prepare for a future patche that will ignore L1's, but not L0's, preemption timer flag. Fixes: d02fcf50779ec ("kvm: vmx: Allow disabling virtual NMI support") Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: nVMX: Copy PDPTRs to/from vmcs12 only when necessary	Sean Christopherson	1	-5/+22
	Per Intel's SDM: ... the logical processor uses PAE paging if CR0.PG=1, CR4.PAE=1 and IA32_EFER.LME=0. A VM entry to a guest that uses PAE paging loads the PDPTEs into internal, non-architectural registers based on the setting of the "enable EPT" VM-execution control. and: [GUEST_PDPTR] values are saved into the four PDPTE fields as follows: - If the "enable EPT" VM-execution control is 0 or the logical processor was not using PAE paging at the time of the VM exit, the values saved are undefined. In other words, if EPT is disabled or the guest isn't using PAE paging, then the PDPTRS aren't consumed by hardware on VM-Entry and are loaded with junk on VM-Exit. From a nesting perspective, all of the above hold true, i.e. KVM can effectively ignore the VMCS PDPTRs. E.g. KVM already loads the PDPTRs from memory when nested EPT is disabled (see nested_vmx_load_cr3()). Because KVM intercepts setting CR4.PAE, there is no danger of consuming a stale value or crushing L1's VMWRITEs regardless of whether L1 intercepts CR4.PAE. The vmcs12's values are unchanged up until the VM-Exit where L2 sets CR4.PAE, i.e. L0 will see the new PAE state on the subsequent VM-Entry and propagate the PDPTRs from vmcs12 to vmcs02. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: x86: introduce is_pae_paging	Paolo Bonzini	4	-8/+12
	Checking for 32-bit PAE is quite common around code that fiddles with the PDPTRs. Add a function to compress all checks into a single invocation. Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: nVMX: Don't update GUEST_BNDCFGS if it's clean in HV eVMCS	Sean Christopherson	1	-4/+4
	L1 is responsible for dirtying GUEST_GRP1 if it writes GUEST_BNDCFGS. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: nVMX: Update vmcs12 for MSR_IA32_DEBUGCTLMSR when it's written	Sean Christopherson	2	-3/+9
	KVM unconditionally intercepts WRMSR to MSR_IA32_DEBUGCTLMSR. In the unlikely event that L1 allows L2 to write L1's MSR_IA32_DEBUGCTLMSR, but but saves L2's value on VM-Exit, update vmcs12 during L2's WRMSR so as to eliminate the need to VMREAD the value from vmcs02 on nested VM-Exit. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: nVMX: Update vmcs12 for SYSENTER MSRs when they're written	Sean Christopherson	2	-3/+10
	For L2, KVM always intercepts WRMSR to SYSENTER MSRs. Update vmcs12 in the WRMSR handler so that they don't need to be (re)read from vmcs02 on every nested VM-Exit. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: nVMX: Update vmcs12 for MSR_IA32_CR_PAT when it's written	Sean Christopherson	2	-4/+4
	As alluded to by the TODO comment, KVM unconditionally intercepts writes to the PAT MSR. In the unlikely event that L1 allows L2 to write L1's PAT directly but saves L2's PAT on VM-Exit, update vmcs12 when L2 writes the PAT. This eliminates the need to VMREAD the value from vmcs02 on VM-Exit as vmcs12 is already up to date in all situations. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: nVMX: Don't speculatively write APIC-access page address	Sean Christopherson	1	-8/+0
	If nested_get_vmcs12_pages() fails to map L1's APIC_ACCESS_ADDR into L2, then it disables SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES in vmcs02. In other words, the APIC_ACCESS_ADDR in vmcs02 is guaranteed to be written with the correct value before being consumed by hardware, drop the unneessary VMWRITE. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: nVMX: Don't speculatively write virtual-APIC page address	Sean Christopherson	1	-13/+8
	The VIRTUAL_APIC_PAGE_ADDR in vmcs02 is guaranteed to be updated before it is consumed by hardware, either in nested_vmx_enter_non_root_mode() or via the KVM_REQ_GET_VMCS12_PAGES callback. Avoid an extra VMWRITE and only stuff a bad value into vmcs02 when mapping vmcs12's address fails. This also eliminates the need for extra comments to connect the dots between prepare_vmcs02_early() and nested_get_vmcs12_pages(). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: nVMX: Don't dump VMCS if virtual APIC page can't be mapped	Sean Christopherson	1	-3/+0
	... as a malicious userspace can run a toy guest to generate invalid virtual-APIC page addresses in L1, i.e. flood the kernel log with error messages. Fixes: 690908104e39d ("KVM: nVMX: allow tests to use bad virtual-APIC page address") Cc: stable@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: nVMX: Don't reread VMCS-agnostic state when switching VMCS	Sean Christopherson	3	-6/+15
	When switching between vmcs01 and vmcs02, there is no need to update state tracking for values that aren't tied to any particular VMCS as the per-vCPU values are already up-to-date (vmx_switch_vmcs() can only be called when the vCPU is loaded). Avoiding the update eliminates a RDMSR, and potentially a RDPKRU and posted-interrupt update (cmpxchg64() and more). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: nVMX: Don't "put" vCPU or host state when switching VMCS	Sean Christopherson	3	-26/+53
	When switching between vmcs01 and vmcs02, KVM isn't actually switching between guest and host. If guest state is already loaded (the likely, if not guaranteed, case), keep the guest state loaded and manually swap the loaded_cpu_state pointer after propagating saved host state to the new vmcs0{1,2}. Avoiding the switch between guest and host reduces the latency of switching between vmcs01 and vmcs02 by several hundred cycles, and reduces the roundtrip time of a nested VM by upwards of 1000 cycles. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: VMX: simplify vmx_prepare_switch_to_{guest,host}	Paolo Bonzini	2	-19/+25
	vmx->loaded_cpu_state can only be NULL or equal to vmx->loaded_vmcs, so change it to a bool. Because the direction of the bool is now the opposite of vmx->guest_msrs_dirty, change the direction of vmx->guest_msrs_dirty so that they match. Finally, do not imply that MSRs have to be reloaded when vmx->guest_state_loaded is false; instead, set vmx->guest_msrs_ready to false explicitly in vmx_prepare_switch_to_host. Cc: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: nVMX: Don't rewrite GUEST_PML_INDEX during nested VM-Entry	Sean Christopherson	1	-11/+10
	Emulation of GUEST_PML_INDEX for a nested VMM is a bit weird. Because L0 flushes the PML on every VM-Exit, the value in vmcs02 at the time of VM-Enter is a constant -1, regardless of what L1 thinks/wants. Fixes: 09abe32002665 ("KVM: nVMX: split pieces of prepare_vmcs02() to prepare_vmcs02_early()") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: nVMX: Write ENCLS-exiting bitmap once per vmcs02	Sean Christopherson	1	-3/+3
	KVM doesn't yet support SGX virtualization, i.e. writes a constant value to ENCLS_EXITING_BITMAP so that it can intercept ENCLS and inject a #UD. Fixes: 0b665d3040281 ("KVM: vmx: Inject #UD for SGX ENCLS instruction in guest") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18	KVM: nVMX: Always sync GUEST_BNDCFGS when it comes from vmcs01	Sean Christopherson	1	-7/+6
	If L1 does not set VM_ENTRY_LOAD_BNDCFGS, then L1's BNDCFGS value must be propagated to vmcs02 since KVM always runs with VM_ENTRY_LOAD_BNDCFGS when MPX is supported. Because the value effectively comes from vmcs01, vmcs02 must be updated even if vmcs12 is clean. Fixes: 62cf9bd8118c4 ("KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS") Cc: stable@vger.kernel.org Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>