summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2021-08-24KVM: PPC: Book3S HV: Stop exporting symbols from book3s_64_mmu_radixFabiano Rosas1-3/+0
The book3s_64_mmu_radix.o object is not part of the KVM builtins and all the callers of the exported symbols are in the same kvm-hv.ko module so we should not need to export any symbols. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210805212616.2641017-4-farosas@linux.ibm.com
2021-08-24KVM: PPC: Book3S HV: Add sanity check to copy_tofrom_guestFabiano Rosas1-0/+3
Both paths into __kvmhv_copy_tofrom_guest_radix ensure that we arrive with an effective address that is smaller than our total addressable space and addresses quadrant 0. - The H_COPY_TOFROM_GUEST hypercall path rejects the call with H_PARAMETER if the effective address has any of the twelve most significant bits set. - The kvmhv_copy_tofrom_guest_radix path clears the top twelve bits before calling the internal function. Although the callers make sure that the effective address is sane, any future use of the function is exposed to a programming error, so add a sanity check. Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210805212616.2641017-3-farosas@linux.ibm.com
2021-08-24KVM: PPC: Book3S HV: Fix copy_tofrom_guest routinesFabiano Rosas1-2/+4
The __kvmhv_copy_tofrom_guest_radix function was introduced along with nested HV guest support. It uses the platform's Radix MMU quadrants to provide a nested hypervisor with fast access to its nested guests memory (H_COPY_TOFROM_GUEST hypercall). It has also since been added as a fast path for the kvmppc_ld/st routines which are used during instruction emulation. The commit def0bfdbd603 ("powerpc: use probe_user_read() and probe_user_write()") changed the low level copy function from raw_copy_from_user to probe_user_read, which adds a check to access_ok. In powerpc that is: static inline bool __access_ok(unsigned long addr, unsigned long size) { return addr < TASK_SIZE_MAX && size <= TASK_SIZE_MAX - addr; } and TASK_SIZE_MAX is 0x0010000000000000UL for 64-bit, which means that setting the two MSBs of the effective address (which correspond to the quadrant) now cause access_ok to reject the access. This was not caught earlier because the most common code path via kvmppc_ld/st contains a fallback (kvm_read_guest) that is likely to succeed for L1 guests. For nested guests there is no fallback. Another issue is that probe_user_read (now __copy_from_user_nofault) does not return the number of bytes not copied in case of failure, so the destination memory is not being cleared anymore in kvmhv_copy_from_guest_radix: ret = kvmhv_copy_tofrom_guest_radix(vcpu, eaddr, to, NULL, n); if (ret > 0) <-- always false! memset(to + (n - ret), 0, ret); This patch fixes both issues by skipping access_ok and open-coding the low level __copy_to/from_user_inatomic. Fixes: def0bfdbd603 ("powerpc: use probe_user_read() and probe_user_write()") Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210805212616.2641017-2-farosas@linux.ibm.com
2021-08-24block: remove CONFIG_DEBUG_BLOCK_EXT_DEVTChristoph Hellwig2-2/+0
This might have been a neat debug aid when the extended dev_t was added, but that time is long gone. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210824075216.1179406-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-24x86/kaslr: Have process_mem_region() return a booleanJing Yangyang1-1/+1
Fix the following coccicheck warning: ./arch/x86/boot/compressed/kaslr.c:671:10-11:WARNING:return of 0/1 in function 'process_mem_region' with return type bool Generated by: scripts/coccinelle/misc/boolreturn.cocci Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Jing Yangyang <jing.yangyang@zte.com.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210824070515.61065-1-deng.changcheng@zte.com.cn
2021-08-24x86/mce: Defer processing of early errorsBorislav Petkov2-3/+9
When a fatal machine check results in a system reset, Linux does not clear the error(s) from machine check bank(s) - hardware preserves the machine check banks across a warm reset. During initialization of the kernel after the reboot, Linux reads, logs, and clears all machine check banks. But there is a problem. In: 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver") the call to mce_register_decode_chain() moved later in the boot sequence. This means that /dev/mcelog doesn't see those early error logs. This was partially fixed by: cd9c57cad3fe ("x86/MCE: Dump MCE to dmesg if no consumers") which made sure that the logs were not lost completely by printing to the console. But parsing console logs is error prone. Users of /dev/mcelog should expect to find any early errors logged to standard places. Add a new flag MCP_QUEUE_LOG to machine_check_poll() to be used in early machine check initialization to indicate that any errors found should just be queued to genpool. When mcheck_late_init() is called it will call mce_schedule_work() to actually log and flush any errors queued in the genpool. [ Based on an original patch, commit message by and completely productized by Tony Luck. ] Fixes: 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver") Reported-by: Sumanth Kamatala <skamatala@juniper.net> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210824003129.GA1642753@agluck-desk2.amr.corp.intel.com
2021-08-24Merge tag 'aspeed-5.15-defconfig' of ↵Arnd Bergmann2-27/+14
git://git.kernel.org/pub/scm/linux/kernel/git/joel/bmc into arm/defconfig ASPEED defconfig updates for 5.15 - Enable new KCS SerIO driver - Enable SGPIO and EDAC for AST2400 now they are supported there - Switch to SLUB and enable SLAB_FREELIST_HARDENED - Regenerate defconfigs atop v5.14-rc2 * tag 'aspeed-5.15-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/joel/bmc: ARM: config: aspeed: Regenerate defconfigs ARM: config: aspeed_g4: Enable EDAC and SPGIO ARM: config: aspeed: Enable KCS adapter for raw SerIO ARM: config: aspeed: Enable hardened allocator feature Link: https://lore.kernel.org/r/CACPK8XdzKdnyrpjKukGWieBhLgQnBs+y=LuSr_weot=Ovy3+9A@mail.gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-08-23m68k: Fix asm register constraints for atomic opsGeert Uytterhoeven1-2/+2
Depending on register assignment by the compiler: {standard input}:3084: Error: operands mismatch -- statement `andl %a1,%d1' ignored {standard input}:3145: Error: operands mismatch -- statement `orl %a1,%d1' ignored {standard input}:3195: Error: operands mismatch -- statement `eorl %a1,%d1' ignored Indeed, the first operand must not be an address register. However, it can be an immediate value. Fix this by adjusting the register constraint from "g" (general purpose register) to "di" (data register or immediate). Fixes: e39d88ea3ce4a471 ("locking/atomic, arch/m68k: Implement atomic_fetch_{add,sub,and,or,xor}()") Fixes: d839bae4269aea46 ("locking,arch,m68k: Fold atomic_ops") Fixes: 1da177e4c3f41524 ("Linux-2.6.12-rc2") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20210809112903.3898660-1-geert@linux-m68k.org
2021-08-23arm64: PCI: Support root bridge preparation for Hyper-VBoqun Feng1-1/+11
Currently at root bridge preparation, the corresponding ACPI device will be set as the companion, however for a Hyper-V virtual PCI root bridge, there is no corresponding ACPI device, because a Hyper-V virtual PCI root bridge is discovered via VMBus rather than ACPI table. In order to support this, we need to make pcibios_root_bridge_prepare() work with cfg->parent being NULL. Use a NULL pointer as the ACPI device if there is no corresponding ACPI device, and this is fine because: 1) ACPI_COMPANION_SET() can work with the second parameter being NULL, 2) semantically, if a NULL pointer is set via ACPI_COMPANION_SET(), ACPI_COMPANION() (the read API for this field) will return NULL, and since ACPI_COMPANION() may return NULL, so users must have handled the cases where it returns NULL, and 3) since there is no corresponding ACPI device, it would be wrong to use any other value here. Link: https://lore.kernel.org/r/20210726180657.142727-5-boqun.feng@gmail.com Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-23arm64: PCI: Restructure pcibios_root_bridge_prepare()Boqun Feng1-7/+12
Restructure the pcibios_root_bridge_prepare() as the preparation for supporting cases when no real ACPI device is related to the PCI host bridge. No functional change. Link: https://lore.kernel.org/r/20210726180657.142727-4-boqun.feng@gmail.com Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-23powerpc/compat_sys: Declare syscallsCédric Le Goater1-0/+30
This fixes a compile error with W=1. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210823090039.166120-3-clg@kaod.org
2021-08-23powerpc/prom: Fix unused variable ‘reserve_map’ when CONFIG_PPC32 is not setCédric Le Goater1-2/+3
This fixes a compile error with W=1. arch/powerpc/kernel/prom.c: In function ‘early_reserve_mem’: arch/powerpc/kernel/prom.c:625:10: error: variable ‘reserve_map’ set but not used [-Werror=unused-but-set-variable] __be64 *reserve_map; ^~~~~~~~~~~ cc1: all warnings being treated as errors Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210823090039.166120-2-clg@kaod.org
2021-08-23powerpc/syscalls: Remove __NR__exitChristophe Leroy1-2/+0
__NR__exit is nowhere used. On most architectures it was removed by commit 135ab6ec8fda ("[PATCH] remove remaining errno and __KERNEL_SYSCALLS__ references") but not on powerpc. powerpc removed __KERNEL_SYSCALLS__ in commit 3db03b4afb3e ("[PATCH] rename the provided execve functions to kernel_execve"), but __NR__exit was left over. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/6457eb4f327313323ed1f70e540bbb4ddc9178fa.1629701106.git.christophe.leroy@csgroup.eu
2021-08-23ARM: dts: omap: Drop references to opp.txtRob Herring2-2/+0
opp.txt is getting removed with the OPP binding converted to DT schema. As it is unusual to reference a binding doc from a dts file, let's just remove the reference. Cc: "Benoît Cousson" <bcousson@baylibre.com> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-08-23x86/tools/relocs: Mark die() with the printf function attr formatBorislav Petkov2-17/+21
Mark die() as a function which accepts printf-style arguments so that the compiler can typecheck them against the supplied format string. Use the C99 inttypes.h format specifiers as relocs.c gets built for both 32- and 64-bit. Original version of the patch by Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: http://lkml.kernel.org/r/YNnb6Q4QHtNYC049@zn.tnic
2021-08-23m68knommu: only set CONFIG_ISA_DMA_API for ColdFire sub-archArnd Bergmann1-1/+1
> Hi Arnd, > > First bad commit (maybe != root cause): > > tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master > head: 2f73937c9aa561e2082839bc1a8efaac75d6e244 > commit: 47fd22f2b84765a2f7e3f150282497b902624547 [4771/5318] cs89x0: rework driver configuration > config: m68k-randconfig-c003-20210804 (attached as .config) > compiler: m68k-linux-gcc (GCC) 10.3.0 > reproduce (this is a W=1 build): > wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross > chmod +x ~/bin/make.cross > # https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=47fd22f2b84765a2f7e3f150282497b902624547 > git remote add linux-next https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > git fetch --no-tags linux-next master > git checkout 47fd22f2b84765a2f7e3f150282497b902624547 > # save the attached .config to linux build tree > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-10.3.0 make.cross ARCH=m68k > > If you fix the issue, kindly add following tag as appropriate > Reported-by: kernel test robot <lkp@intel.com> > > All errors (new ones prefixed by >>): > > In file included from include/linux/kernel.h:19, > from include/linux/list.h:9, > from include/linux/module.h:12, > from drivers/net/ethernet/cirrus/cs89x0.c:51: > drivers/net/ethernet/cirrus/cs89x0.c: In function 'net_open': > drivers/net/ethernet/cirrus/cs89x0.c:897:20: error: implicit declaration of function 'isa_virt_to_bus'; did you mean 'virt_to_bus'? [-Werror=implicit-function-declaration] > 897 | (unsigned long)isa_virt_to_bus(lp->dma_buff)); > | ^~~~~~~~~~~~~~~ > include/linux/printk.h:141:17: note: in definition of macro 'no_printk' > 141 | printk(fmt, ##__VA_ARGS__); \ > | ^~~~~~~~~~~ > drivers/net/ethernet/cirrus/cs89x0.c:86:3: note: in expansion of macro 'pr_debug' > 86 | pr_##level(fmt, ##__VA_ARGS__); \ > | ^~~ > drivers/net/ethernet/cirrus/cs89x0.c:894:3: note: in expansion of macro 'cs89_dbg' > 894 | cs89_dbg(1, debug, "%s: dma %lx %lx\n", > | ^~~~~~~~ > >> drivers/net/ethernet/cirrus/cs89x0.c:914:3: error: implicit declaration of function 'disable_dma'; did you mean 'disable_irq'? [-Werror=implicit-function-declaration] As far as I can tell, this is a bug with the m68kmmu architecture, not with my driver: The CONFIG_ISA_DMA_API option is provided for coldfire, which implements it, but dragonball also sets the option as a side-effect, without actually implementing the interfaces. The patch below should fix it. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2021-08-23m68k: coldfire: return success for clk_enable(NULL)Dan Carpenter1-1/+1
The clk_enable is supposed work when CONFIG_HAVE_CLK is false, but it returns -EINVAL. That means some drivers fail during probe. [ 1.680000] flexcan: probe of flexcan.0 failed with error -22 Fixes: c1fb1bf64bb6 ("m68k: let clk_enable() return immediately if clk is NULL") Fixes: bea8bcb12da0 ("m68knommu: Add support for the Coldfire m5441x.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2021-08-23m68k: m5441x: add flexcan supportAngelo Dureghello3-4/+67
Add flexcan support. Signed-off-by: Angelo Dureghello <angelo@kernel-space.org> Made the flexcan resource inclusion conditional based on the enablement of the flexcan driver. This commit is no longer dependant on the presence of the updated driver in mainline. Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2021-08-23m68k: stmark2: update board setupAngelo Dureghello1-2/+4
Add configuration for flexcan pads. Signed-off-by: Angelo Dureghello <angelo@kernel-space.org> Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2021-08-23m68k/nommu: prevent setting ROMKERNEL when ROM is not setRandy Dunlap1-0/+1
When CONFIG_ROMKERNEL is set but CONFIG_ROM is not set, the linker complains: m68k-linux-ld:./arch/m68k/kernel/vmlinux.lds:5: undefined symbol `CONFIG_ROMSTART' referenced in expression # CONFIG_ROM is not set # CONFIG_RAMKERNEL is not set CONFIG_ROMKERNEL=y Since ROMSTART depends on ROM, make ROMKERNEL also depend on ROM. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: linux-m68k@lists.linux-m68k.org Cc: uclinux-dev@uclinux.org Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2021-08-22Merge tag 'powerpc-5.14-6' of ↵Linus Torvalds3-14/+31
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix random crashes on some 32-bit CPUs by adding isync() after locking/unlocking KUEP - Fix intermittent crashes when loading modules with strict module RWX - Fix a section mismatch introduce by a previous fix. Thanks to Christophe Leroy, Fabiano Rosas, Laurent Vivier, Murilo Opsfelder Araújo, Nathan Chancellor, and Stan Johnson. h# -----BEGIN PGP SIGNATURE----- * tag 'powerpc-5.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/mm: Fix set_memory_*() against concurrent accesses powerpc/32s: Fix random crashes by adding isync() after locking/unlocking KUEP powerpc/xive: Do not mark xive_request_ipi() as __init
2021-08-22x86/build: Remove stale cc-option checksNick Desaulniers1-32/+14
cc-option, __cc-option, cc-option-yn, and cc-disable-warning all invoke the compiler during build time, and can slow down the build when these checks become stale for our supported compilers, whose minimally supported versions increases over time. See Documentation/process/changes.rst for the current supported minimal versions (GCC 4.9+, clang 10.0.1+). Compiler version support for these flags may be verified on godbolt.org. The following flags are supported by all supported versions of GCC and Clang. Remove their cc-option, __cc-option, and cc-option-yn tests. -Wno-address-of-packed-member -mno-avx -m32 -mno-80387 -march=k8 -march=nocona -march=core2 -march=atom -mtune=generic -mfentry [ mingo: Fixed regression on GCC, via partial revert of the stack-boundary changes. ] Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://github.com/ClangBuiltLinux/linux/issues/1436 Link: https://lkml.kernel.org/r/20210812183848.1519994-1-ndesaulniers@google.com
2021-08-22x86/resctrl: Fix a maybe-uninitialized build warning treated as errorBabu Moger1-0/+6
The recent commit 064855a69003 ("x86/resctrl: Fix default monitoring groups reporting") caused a RHEL build failure with an uninitialized variable warning treated as an error because it removed the default case snippet. The RHEL Makefile uses '-Werror=maybe-uninitialized' to force possibly uninitialized variable warnings to be treated as errors. This is also reported by smatch via the 0day robot. The error from the RHEL build is: arch/x86/kernel/cpu/resctrl/monitor.c: In function ‘__mon_event_count’: arch/x86/kernel/cpu/resctrl/monitor.c:261:12: error: ‘m’ may be used uninitialized in this function [-Werror=maybe-uninitialized] m->chunks += chunks; ^~ The upstream Makefile does not build using '-Werror=maybe-uninitialized'. So, the problem is not seen there. Fix the problem by putting back the default case snippet. [ bp: note that there's nothing wrong with the code and other compilers do not trigger this warning - this is being done just so the RHEL compiler is happy. ] Fixes: 064855a69003 ("x86/resctrl: Fix default monitoring groups reporting") Reported-by: Terry Bowman <Terry.Bowman@amd.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/162949631908.23903.17090272726012848523.stgit@bmoger-ubuntu
2021-08-21Merge tag 'riscv-for-linus-5.14-rc7' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - fix the sifive-l2-cache device tree bindings for json-schema compatibility. This does not change the intended behavior of the binding. - avoid improperly freeing necessary resources during early boot. * tag 'riscv-for-linus-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix a number of free'd resources in init_resources() dt-bindings: sifive-l2-cache: Fix 'select' matching
2021-08-21Merge tag 's390-5.14-5' of ↵Linus Torvalds2-0/+11
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fix from Vasily Gorbik: - fix use after free of zpci_dev in pci code * tag 's390-5.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/pci: fix use after free of zpci_dev
2021-08-21x86/efi: Restore Firmware IDT before calling ExitBootServices()Joerg Roedel2-9/+24
Commit 79419e13e808 ("x86/boot/compressed/64: Setup IDT in startup_32 boot path") introduced an IDT into the 32-bit boot path of the decompressor stub. But the IDT is set up before ExitBootServices() is called, and some UEFI firmwares rely on their own IDT. Save the firmware IDT on boot and restore it before calling into EFI functions to fix boot failures introduced by above commit. Fixes: 79419e13e808 ("x86/boot/compressed/64: Setup IDT in startup_32 boot path") Reported-by: Fabio Aiuto <fabioaiuto83@gmail.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Ard Biesheuvel <ardb@kernel.org> Cc: stable@vger.kernel.org # 5.13+ Link: https://lkml.kernel.org/r/20210820125703.32410-1-joro@8bytes.org
2021-08-21MIPS: mscc: ocelot: mark the phy-mode for internal PHY portsVladimir Oltean2-0/+8
The ocelot driver was converted to phylink, and that expects a valid phy_interface_t. Without a phy-mode, of_get_phy_mode returns PHY_INTERFACE_MODE_NA, which is not ideal because phylink rejects that. The ocelot driver was patched to treat PHY_INTERFACE_MODE_NA as PHY_INTERFACE_MODE_INTERNAL to work with the broken DT blobs, but we should fix the device trees and specify the phy-mode too. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-08-21MIPS: mscc: ocelot: disable all switch ports by defaultVladimir Oltean3-0/+23
The ocelot switch driver used to ignore ports which do not have a phy-handle property and not probe those, but this is not quite ok since it is valid to not have a phy-handle property if there is a fixed-link. It seems that checking for a phy-handle was a proxy for the proper check which is for the status, but that doesn't make a lot of sense, since the ocelot driver already iterates using for_each_available_child_of_node which skips the disabled ports, so I have no idea. Anyway, a widespread pattern in device trees is for a SoC dtsi to disable by default all hardware, and let board dts files enable what is used. So let's do that and enable only the ports with a phy-handle in the pcb120 and pcb123 device tree files. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-08-21MIPS: Return true/false (not 1/0) from bool functionsHuilong Deng2-7/+7
./arch/mips/kernel/uprobes.c:261:8-9: WARNING: return of 0/1 in function 'arch_uprobe_skip_sstep' with return type bool ./arch/mips/kernel/uprobes.c:78:10-11: WARNING: return of 0/1 in function 'is_trap_insn' with return type bool ./arch/mips/kvm/mmu.c:489:9-10: WARNING: return of 0/1 in function 'kvm_test_age_gfn' with return type bool ./arch/mips/kvm/mmu.c:445:8-9: WARNING: return of 0/1 in function 'kvm_unmap_gfn_range' with return type bool Signed-off-by: Huilong Deng <denghuilong@cdjrlc.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-08-20KVM: SVM: Add 5-level page table support for SVMWei Huang1-6/+1
When the 5-level page table is enabled on host OS, the nested page table for guest VMs must use 5-level as well. Update get_npt_level() function to reflect this requirement. In the meanwhile, remove the code that prevents kvm-amd driver from being loaded when 5-level page table is detected. Signed-off-by: Wei Huang <wei.huang2@amd.com> Message-Id: <20210818165549.3771014-4-wei.huang2@amd.com> [Tweak condition as suggested by Sean. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: x86/mmu: Support shadowing NPT when 5-level paging is enabled in hostWei Huang2-16/+38
When the 5-level page table CPU flag is set in the host, but the guest has CR4.LA57=0 (including the case of a 32-bit guest), the top level of the shadow NPT page tables will be fixed, consisting of one pointer to a lower-level table and 511 non-present entries. Extend the existing code that creates the fixed PML4 or PDP table, to provide a fixed PML5 table if needed. This is not needed on EPT because the number of layers in the tables is specified in the EPTP instead of depending on the host CR4. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Wei Huang <wei.huang2@amd.com> Message-Id: <20210818165549.3771014-3-wei.huang2@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: x86: Allow CPU to force vendor-specific TDP levelWei Huang4-7/+15
AMD future CPUs will require a 5-level NPT if host CR4.LA57 is set. To prevent kvm_mmu_get_tdp_level() from incorrectly changing NPT level on behalf of CPUs, add a new parameter in kvm_configure_mmu() to force a fixed TDP level. Signed-off-by: Wei Huang <wei.huang2@amd.com> Message-Id: <20210818165549.3771014-2-wei.huang2@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: x86: clamp host mapping level to max_level in kvm_mmu_max_mapping_levelPaolo Bonzini1-8/+5
This change started as a way to make kvm_mmu_hugepage_adjust a bit simpler, but it does fix two bugs as well. One bug is in zapping collapsible PTEs. If a large page size is disallowed but not all of them, kvm_mmu_max_mapping_level will return the host mapping level and the small PTEs will be zapped up to that level. However, if e.g. 1GB are prohibited, we can still zap 4KB mapping and preserve the 2MB ones. This can happen for example when NX huge pages are in use. The second would happen when userspace backs guest memory with a 1gb hugepage but only assign a subset of the page to the guest. 1gb pages would be disallowed by the memslot, but not 2mb. kvm_mmu_max_mapping_level() would fall through to the host_pfn_mapping_level() logic, see the 1gb hugepage, and map the whole thing into the guest. Fixes: 2f57b7051fe8 ("KVM: x86/mmu: Persist gfn_lpage_is_disallowed() to max_level") Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: x86: implement KVM_GUESTDBG_BLOCKIRQMaxim Levitsky3-1/+7
KVM_GUESTDBG_BLOCKIRQ will allow KVM to block all interrupts while running. This change is mostly intended for more robust single stepping of the guest and it has the following benefits when enabled: * Resuming from a breakpoint is much more reliable. When resuming execution from a breakpoint, with interrupts enabled, more often than not, KVM would inject an interrupt and make the CPU jump immediately to the interrupt handler and eventually return to the breakpoint, to trigger it again. From the user point of view it looks like the CPU never executed a single instruction and in some cases that can even prevent forward progress, for example, when the breakpoint is placed by an automated script (e.g lx-symbols), which does something in response to the breakpoint and then continues the guest automatically. If the script execution takes enough time for another interrupt to arrive, the guest will be stuck on the same breakpoint RIP forever. * Normal single stepping is much more predictable, since it won't land the debugger into an interrupt handler. * RFLAGS.TF has less chance to be leaked to the guest: We set that flag behind the guest's back to do single stepping but if single step lands us into an interrupt/exception handler it will be leaked to the guest in the form of being pushed to the stack. This doesn't completely eliminate this problem as exceptions can still happen, but at least this reduces the chances of this happening. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210811122927.900604-6-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: SVM: split svm_handle_invalid_exitMaxim Levitsky1-8/+9
Split the check for having a vmexit handler to svm_check_exit_valid, and make svm_handle_invalid_exit only handle a vmexit that is already not valid. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210811122927.900604-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: x86/mmu: Drop 'shared' param from tdp_mmu_link_page()Sean Christopherson1-13/+4
Drop @shared from tdp_mmu_link_page() and hardcode it to work for mmu_lock being held for read. The helper has exactly one caller and in all likelihood will only ever have exactly one caller. Even if KVM adds a path to install translations without an initiating page fault, odds are very, very good that the path will just be a wrapper to the "page fault" handler (both SNP and TDX RFCs propose patches to do exactly that). No functional change intended. Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210810224554.2978735-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: x86/mmu: Add detailed page size statsMingwei Zhang5-31/+33
Existing KVM code tracks the number of large pages regardless of their sizes. Therefore, when large page of 1GB (or larger) is adopted, the information becomes less useful because lpages counts a mix of 1G and 2M pages. So remove the lpages since it is easy for user space to aggregate the info. Instead, provide a comprehensive page stats of all sizes from 4K to 512G. Suggested-by: Ben Gardon <bgardon@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Cc: Jing Zhang <jingzhangos@google.com> Cc: David Matlack <dmatlack@google.com> Cc: Sean Christopherson <seanjc@google.com> Message-Id: <20210803044607.599629-4-mizhang@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: x86/mmu: Avoid collision with !PRESENT SPTEs in TDP MMU lpage statsSean Christopherson1-7/+13
Factor in whether or not the old/new SPTEs are shadow-present when adjusting the large page stats in the TDP MMU. A modified MMIO SPTE can toggle the page size bit, as bit 7 is used to store the MMIO generation, i.e. is_large_pte() can get a false positive when called on a MMIO SPTE. Ditto for nuking SPTEs with REMOVED_SPTE, which sets bit 7 in its magic value. Opportunistically move the logic below the check to verify at least one of the old/new SPTEs is shadow present. Use is/was_leaf even though is/was_present would suffice. The code generation is roughly equivalent since all flags need to be computed prior to the code in question, and using the *_leaf flags will minimize the diff in a future enhancement to account all pages, i.e. will change the check to "is_leaf != was_leaf". Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Fixes: 1699f65c8b65 ("kvm/x86: Fix 'lpages' kvm stat for TDM MMU") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Message-Id: <20210803044607.599629-3-mizhang@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: x86/mmu: Remove redundant spte present check in mmu_set_spteMingwei Zhang1-10/+6
Drop an unnecessary is_shadow_present_pte() check when updating the rmaps after installing a non-MMIO SPTE. set_spte() is used only to create shadow-present SPTEs, e.g. MMIO SPTEs are handled early on, mmu_set_spte() runs with mmu_lock held for write, i.e. the SPTE can't be zapped between writing the SPTE and updating the rmaps. Opportunistically combine the "new SPTE" logic for large pages and rmaps. No functional change intended. Suggested-by: Ben Gardon <bgardon@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Message-Id: <20210803044607.599629-2-mizhang@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: stats: Add halt polling related histogram statsJing Zhang1-2/+14
Add three log histogram stats to record the distribution of time spent on successful polling, failed polling and VCPU wait. halt_poll_success_hist: Distribution of spent time for a successful poll. halt_poll_fail_hist: Distribution of spent time for a failed poll. halt_wait_hist: Distribution of time a VCPU has spent on waiting. Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210802165633.1866976-6-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: stats: Add halt_wait_ns stats for all architecturesJing Zhang4-4/+1
Add simple stats halt_wait_ns to record the time a VCPU has spent on waiting for all architectures (not just powerpc). Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210802165633.1866976-5-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: stats: Support linear and logarithmic histogram statisticsJing Zhang6-24/+0
Add new types of KVM stats, linear and logarithmic histogram. Histogram are very useful for observing the value distribution of time or size related stats. Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210802165633.1866976-2-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: SVM: AVIC: drop unsupported AVIC base relocation codeMaxim Levitsky3-13/+2
APIC base relocation is not supported anyway and won't work correctly so just drop the code that handles it and keep AVIC MMIO bar at the default APIC base. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-17-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: SVM: call avic_vcpu_load/avic_vcpu_put when enabling/disabling AVICMaxim Levitsky1-0/+5
Currently it is possible to have the following scenario: 1. AVIC is disabled by svm_refresh_apicv_exec_ctrl 2. svm_vcpu_blocking calls avic_vcpu_put which does nothing 3. svm_vcpu_unblocking enables the AVIC (due to KVM_REQ_APICV_UPDATE) and then calls avic_vcpu_load 4. warning is triggered in avic_vcpu_load since AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK was never cleared While it is possible to just remove the warning, it seems to be more robust to fully disable/enable AVIC in svm_refresh_apicv_exec_ctrl by calling the avic_vcpu_load/avic_vcpu_put Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-16-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: SVM: move check for kvm_vcpu_apicv_active outside of avic_vcpu_{put|load}Maxim Levitsky2-8/+9
No functional change intended. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-15-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: SVM: avoid refreshing avic if its state didn't changeMaxim Levitsky1-1/+8
Since AVIC can be inhibited and uninhibited rapidly it is possible that we have nothing to do by the time the svm_refresh_apicv_exec_ctrl is called. Detect and avoid this, which will be useful when we will start calling avic_vcpu_load/avic_vcpu_put when the avic inhibition state changes. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-14-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: SVM: remove svm_toggle_avic_for_irq_windowMaxim Levitsky3-14/+2
Now that kvm_request_apicv_update doesn't need to drop the kvm->srcu lock, we can call kvm_request_apicv_update directly. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210810205251.424103-13-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in useVitaly Kuznetsov2-6/+32
APICV_INHIBIT_REASON_HYPERV is currently unconditionally forced upon SynIC activation as SynIC's AutoEOI is incompatible with APICv/AVIC. It is, however, possible to track whether the feature was actually used by the guest and only inhibit APICv/AVIC when needed. TLFS suggests a dedicated 'HV_DEPRECATING_AEOI_RECOMMENDED' flag to let Windows know that AutoEOI feature should be avoided. While it's up to KVM userspace to set the flag, KVM can help a bit by exposing global APICv/AVIC enablement. Maxim: - always set HV_DEPRECATING_AEOI_RECOMMENDED in kvm_get_hv_cpuid, since this feature can be used regardless of AVIC Paolo: - use arch.apicv_update_lock to protect the hv->synic_auto_eoi_used instead of atomic ops Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-12-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: SVM: add warning for mistmatch between AVIC vcpu state and AVIC inhibitionMaxim Levitsky1-0/+2
It is never a good idea to enter a guest on a vCPU when the AVIC inhibition state doesn't match the enablement of the AVIC on the vCPU. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-11-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: x86: APICv: fix race in kvm_request_apicv_update on SVMMaxim Levitsky2-15/+30
Currently on SVM, the kvm_request_apicv_update toggles the APICv memslot without doing any synchronization. If there is a mismatch between that memslot state and the AVIC state, on one of the vCPUs, an APIC mmio access can be lost: For example: VCPU0: enable the APIC_ACCESS_PAGE_PRIVATE_MEMSLOT VCPU1: access an APIC mmio register. Since AVIC is still disabled on VCPU1, the access will not be intercepted by it, and neither will it cause MMIO fault, but rather it will just be read/written from/to the dummy page mapped into the APIC_ACCESS_PAGE_PRIVATE_MEMSLOT. Fix that by adding a lock guarding the AVIC state changes, and carefully order the operations of kvm_request_apicv_update to avoid this race: 1. Take the lock 2. Send KVM_REQ_APICV_UPDATE 3. Update the apic inhibit reason 4. Release the lock This ensures that at (2) all vCPUs are kicked out of the guest mode, but don't yet see the new avic state. Then only after (4) all other vCPUs can update their AVIC state and resume. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-10-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>