Age | Commit message (Collapse) | Author | Files | Lines |
|
This function allocates IRQ on a specific chip. VAS needs per chip
IRQ allocation and will have IRQ handler per VAS instance.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1587016720.2275.1047.camel@hbabu-laptop
|
|
As does XMON, the debugfs file /sys/kernel/debug/powerpc/xive exposes
the XIVE internal state of the machine CPUs and interrupts. Available
on the PowerNV and sPAPR platforms.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[mpe: Make the debugfs file 0400]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200306150143.5551-5-clg@kaod.org
|
|
Some firmwares or hypervisors can advertise different source
characteristics. Track their value under XMON. What we are mostly
interested in is the StoreEOI flag.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200306150143.5551-4-clg@kaod.org
|
|
The PowerNV platform has multiple IRQ chips and the xmon command
dumping the state of the XIVE interrupt should only operate on the
XIVE IRQ chip.
Fixes: 5896163f7f91 ("powerpc/xmon: Improve output of XIVE interrupts")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200306150143.5551-3-clg@kaod.org
|
|
When a CPU is brought up, an IPI number is allocated and recorded
under the XIVE CPU structure. Invalid IPI numbers are tracked with
interrupt number 0x0.
On the PowerNV platform, the interrupt number space starts at 0x10 and
this works fine. However, on the sPAPR platform, it is possible to
allocate the interrupt number 0x0 and this raises an issue when CPU 0
is unplugged. The XIVE spapr driver tracks allocated interrupt numbers
in a bitmask and it is not correctly updated when interrupt number 0x0
is freed. It stays allocated and it is then impossible to reallocate.
Fix by using the XIVE_BAD_IRQ value instead of zero on both platforms.
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200306150143.5551-2-clg@kaod.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"A pretty small batch for us, and apologies for it being a bit late, I
wanted to sneak Christophe's user_access_begin() series in.
Summary:
- Implement user_access_begin() and friends for our platforms that
support controlling kernel access to userspace.
- Enable CONFIG_VMAP_STACK on 32-bit Book3S and 8xx.
- Some tweaks to our pseries IOMMU code to allow SVMs ("secure"
virtual machines) to use the IOMMU.
- Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 32-bit
VDSO, and some other improvements.
- A series to use the PCI hotplug framework to control opencapi
card's so that they can be reset and re-read after flashing a new
FPGA image.
As well as other minor fixes and improvements as usual.
Thanks to: Alastair D'Silva, Alexandre Ghiti, Alexey Kardashevskiy,
Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Bai Yingjie, Chen
Zhou, Christophe Leroy, Frederic Barrat, Greg Kurz, Jason A.
Donenfeld, Joel Stanley, Jordan Niethe, Julia Lawall, Krzysztof
Kozlowski, Laurent Dufour, Laurentiu Tudor, Linus Walleij, Michael
Bringmann, Nathan Chancellor, Nicholas Piggin, Nick Desaulniers,
Oliver O'Halloran, Peter Ujfalusi, Pingfan Liu, Ram Pai, Randy Dunlap,
Russell Currey, Sam Bobroff, Sebastian Andrzej Siewior, Shawn
Anastasio, Stephen Rothwell, Steve Best, Sukadev Bhattiprolu, Thiago
Jung Bauermann, Tyrel Datwyler, Vaibhav Jain"
* tag 'powerpc-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (131 commits)
powerpc: configs: Cleanup old Kconfig options
powerpc/configs/skiroot: Enable some more hardening options
powerpc/configs/skiroot: Disable xmon default & enable reboot on panic
powerpc/configs/skiroot: Enable security features
powerpc/configs/skiroot: Update for symbol movement only
powerpc/configs/skiroot: Drop default n CONFIG_CRYPTO_ECHAINIV
powerpc/configs/skiroot: Drop HID_LOGITECH
powerpc/configs: Drop NET_VENDOR_HP which moved to staging
powerpc/configs: NET_CADENCE became NET_VENDOR_CADENCE
powerpc/configs: Drop CONFIG_QLGE which moved to staging
powerpc: Do not consider weak unresolved symbol relocations as bad
powerpc/32s: Fix kasan_early_hash_table() for CONFIG_VMAP_STACK
powerpc: indent to improve Kconfig readability
powerpc: Provide initial documentation for PAPR hcalls
powerpc: Implement user_access_save() and user_access_restore()
powerpc: Implement user_access_begin and friends
powerpc/32s: Prepare prevent_user_access() for user_access_end()
powerpc/32s: Drop NULL addr verification
powerpc/kuap: Fix set direction in allow/prevent_user_access()
powerpc/32s: Fix bad_kuap_fault()
...
|
|
Instead of opencoding, use probe_user_read() to failessly read
a user location and probe_user_write() for writing to user.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e041f5eedb23f09ab553be8a91c3de2087147320.1579800517.git.christophe.leroy@c-s.fr
|
|
A load on an ESB page returning all 1's means that the underlying
device has invalidated the access to the PQ state of the interrupt
through mmio. It may happen, for example when querying a PHB interrupt
while the PHB is in an error state.
In that case, we should consider the interrupt to be invalid when
checking its state in the irq_get_irqchip_state() handler.
Fixes: da15c03b047d ("powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown race")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
[clg: wrote a commit log, introduced XIVE_ESB_INVALID ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200113130118.27969-1-clg@kaod.org
|
|
The mpic_ipi_chip and mpic_irq_ht_chip structures are only copied
into other structures, so make them const.
The opportunity for this change was found using Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1577864614-5543-10-git-send-email-Julia.Lawall@inria.fr
|
|
The PCI INTx interrupts and other LSI interrupts are handled differently
under a sPAPR platform. When the interrupt source characteristics are
queried, the hypervisor returns an H_INT_ESB flag to inform the OS
that it should be using the H_INT_ESB hcall for interrupt management
and not loads and stores on the interrupt ESB pages.
A default -1 value is returned for the addresses of the ESB pages. The
driver ignores this condition today and performs a bogus IO mapping.
Recent changes and the DEBUG_VM configuration option make the bug
visible with :
kernel BUG at arch/powerpc/include/asm/book3s/64/pgtable.h:612!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=1024 NUMA pSeries
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-0.rc6.git0.1.fc32.ppc64le #1
NIP: c000000000f63294 LR: c000000000f62e44 CTR: 0000000000000000
REGS: c0000000fa45f0d0 TRAP: 0700 Not tainted (5.4.0-0.rc6.git0.1.fc32.ppc64le)
...
NIP ioremap_page_range+0x4c4/0x6e0
LR ioremap_page_range+0x74/0x6e0
Call Trace:
ioremap_page_range+0x74/0x6e0 (unreliable)
do_ioremap+0x8c/0x120
__ioremap_caller+0x128/0x140
ioremap+0x30/0x50
xive_spapr_populate_irq_data+0x170/0x260
xive_irq_domain_map+0x8c/0x170
irq_domain_associate+0xb4/0x2d0
irq_create_mapping+0x1e0/0x3b0
irq_create_fwspec_mapping+0x27c/0x3e0
irq_create_of_mapping+0x98/0xb0
of_irq_parse_and_map_pci+0x168/0x230
pcibios_setup_device+0x88/0x250
pcibios_setup_bus_devices+0x54/0x100
__of_scan_bus+0x160/0x310
pcibios_scan_phb+0x330/0x390
pcibios_init+0x8c/0x128
do_one_initcall+0x60/0x2c0
kernel_init_freeable+0x290/0x378
kernel_init+0x2c/0x148
ret_from_kernel_thread+0x5c/0x80
Fixes: bed81ee181dd ("powerpc/xive: introduce H_INT_ESB hcall")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191203163642.2428-1-clg@kaod.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights:
- Infrastructure for secure boot on some bare metal Power9 machines.
The firmware support is still in development, so the code here
won't actually activate secure boot on any existing systems.
- A change to xmon (our crash handler / pseudo-debugger) to restrict
it to read-only mode when the kernel is lockdown'ed, otherwise it's
trivial to drop into xmon and modify kernel data, such as the
lockdown state.
- Support for KASLR on 32-bit BookE machines (Freescale / NXP).
- Fixes for our flush_icache_range() and __kernel_sync_dicache()
(VDSO) to work with memory ranges >4GB.
- Some reworks of the pseries CMM (Cooperative Memory Management)
driver to make it behave more like other balloon drivers and enable
some cleanups of generic mm code.
- A series of fixes to our hardware breakpoint support to properly
handle unaligned watchpoint addresses.
Plus a bunch of other smaller improvements, fixes and cleanups.
Thanks to: Alastair D'Silva, Andrew Donnellan, Aneesh Kumar K.V,
Anthony Steinhauser, Cédric Le Goater, Chris Packham, Chris Smart,
Christophe Leroy, Christopher M. Riedl, Christoph Hellwig, Claudio
Carvalho, Daniel Axtens, David Hildenbrand, Deb McLemore, Diana
Craciun, Eric Richter, Geert Uytterhoeven, Greg Kroah-Hartman, Greg
Kurz, Gustavo L. F. Walbon, Hari Bathini, Harish, Jason Yan, Krzysztof
Kozlowski, Leonardo Bras, Mathieu Malaterre, Mauro S. M. Rodrigues,
Michal Suchanek, Mimi Zohar, Nathan Chancellor, Nathan Lynch, Nayna
Jain, Nick Desaulniers, Oliver O'Halloran, Qian Cai, Rasmus Villemoes,
Ravi Bangoria, Sam Bobroff, Santosh Sivaraj, Scott Wood, Thomas Huth,
Tyrel Datwyler, Vaibhav Jain, Valentin Longchamp, YueHaibing"
* tag 'powerpc-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (144 commits)
powerpc/fixmap: fix crash with HIGHMEM
x86/efi: remove unused variables
powerpc: Define arch_is_kernel_initmem_freed() for lockdep
powerpc/prom_init: Use -ffreestanding to avoid a reference to bcmp
powerpc: Avoid clang warnings around setjmp and longjmp
powerpc: Don't add -mabi= flags when building with Clang
powerpc: Fix Kconfig indentation
powerpc/fixmap: don't clear fixmap area in paging_init()
selftests/powerpc: spectre_v2 test must be built 64-bit
powerpc/powernv: Disable native PCIe port management
powerpc/kexec: Move kexec files into a dedicated subdir.
powerpc/32: Split kexec low level code out of misc_32.S
powerpc/sysdev: drop simple gpio
powerpc/83xx: map IMMR with a BAT.
powerpc/32s: automatically allocate BAT in setbat()
powerpc/ioremap: warn on early use of ioremap()
powerpc: Add support for GENERIC_EARLY_IOREMAP
powerpc/fixmap: Use __fix_to_virt() instead of fix_to_virt()
powerpc/8xx: use the fixmapped IMMR in cpm_reset()
powerpc/8xx: add __init to cpm1 init functions
...
|
|
Using a mask to represent bus DMA constraints has a set of limitations.
The biggest one being it can only hold a power of two (minus one). The
DMA mapping code is already aware of this and treats dev->bus_dma_mask
as a limit. This quirk is already used by some architectures although
still rare.
With the introduction of the Raspberry Pi 4 we've found a new contender
for the use of bus DMA limits, as its PCIe bus can only address the
lower 3GB of memory (of a total of 4GB). This is impossible to represent
with a mask. To make things worse the device-tree code rounds non power
of two bus DMA limits to the next power of two, which is unacceptable in
this case.
In the light of this, rename dev->bus_dma_mask to dev->bus_dma_limit all
over the tree and treat it as such. Note that dev->bus_dma_limit should
contain the higher accessible DMA address.
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
There is a config item CONFIG_SIMPLE_GPIO which
provides simple memory mapped GPIOs specific to powerpc.
However, the only platform which selects this option is
mpc5200, and this platform doesn't use it.
There are three boards calling simple_gpiochip_init(), but
as they don't select CONFIG_SIMPLE_GPIO, this is just a nop.
Simple_gpio is just redundant with the generic MMIO GPIO
driver which can be found in driver/gpio/ and selected via
CONFIG_GPIO_GENERIC_PLATFORM, so drop simple_gpio driver.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bf930402613b41b42d0441b784e0cc43fc18d1fb.1572529632.git.christophe.leroy@c-s.fr
|
|
When the machine crash handler is invoked, all interrupts are masked
but interrupts which have not been started yet do not have an ESB page
mapped in the Linux address space. This crashes the 'crash kexec'
sequence on sPAPR guests.
To fix, force the mapping of the ESB page when an interrupt is being
mapped in the Linux IRQ number space. This is done by setting the
initial state of the interrupt to OFF which is not necessarily the
case on PowerNV.
Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191031063100.3864-1-clg@kaod.org
|
|
On a 2-socket Power9 system with 32 cores/128 threads (SMT4) and 1TB
of memory running the following guest configs:
guest A:
- 224GB of memory
- 56 VCPUs (sockets=1,cores=28,threads=2), where:
VCPUs 0-1 are pinned to CPUs 0-3,
VCPUs 2-3 are pinned to CPUs 4-7,
...
VCPUs 54-55 are pinned to CPUs 108-111
guest B:
- 4GB of memory
- 4 VCPUs (sockets=1,cores=4,threads=1)
with the following workloads (with KSM and THP enabled in all):
guest A:
stress --cpu 40 --io 20 --vm 20 --vm-bytes 512M
guest B:
stress --cpu 4 --io 4 --vm 4 --vm-bytes 512M
host:
stress --cpu 4 --io 4 --vm 2 --vm-bytes 256M
the below soft-lockup traces were observed after an hour or so and
persisted until the host was reset (this was found to be reliably
reproducible for this configuration, for kernels 4.15, 4.18, 5.0,
and 5.3-rc5):
[ 1253.183290] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1253.183319] rcu: 124-....: (5250 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=1941
[ 1256.287426] watchdog: BUG: soft lockup - CPU#105 stuck for 23s! [CPU 52/KVM:19709]
[ 1264.075773] watchdog: BUG: soft lockup - CPU#24 stuck for 23s! [worker:19913]
[ 1264.079769] watchdog: BUG: soft lockup - CPU#31 stuck for 23s! [worker:20331]
[ 1264.095770] watchdog: BUG: soft lockup - CPU#45 stuck for 23s! [worker:20338]
[ 1264.131773] watchdog: BUG: soft lockup - CPU#64 stuck for 23s! [avocado:19525]
[ 1280.408480] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791]
[ 1316.198012] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1316.198032] rcu: 124-....: (21003 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=8243
[ 1340.411024] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791]
[ 1379.212609] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1379.212629] rcu: 124-....: (36756 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=14714
[ 1404.413615] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791]
[ 1442.227095] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1442.227115] rcu: 124-....: (52509 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=21403
[ 1455.111787] INFO: task worker:19907 blocked for more than 120 seconds.
[ 1455.111822] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
[ 1455.111833] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1455.111884] INFO: task worker:19908 blocked for more than 120 seconds.
[ 1455.111905] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
[ 1455.111925] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1455.111966] INFO: task worker:20328 blocked for more than 120 seconds.
[ 1455.111986] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
[ 1455.111998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1455.112048] INFO: task worker:20330 blocked for more than 120 seconds.
[ 1455.112068] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
[ 1455.112097] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1455.112138] INFO: task worker:20332 blocked for more than 120 seconds.
[ 1455.112159] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
[ 1455.112179] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1455.112210] INFO: task worker:20333 blocked for more than 120 seconds.
[ 1455.112231] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
[ 1455.112242] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1455.112282] INFO: task worker:20335 blocked for more than 120 seconds.
[ 1455.112303] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
[ 1455.112332] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1455.112372] INFO: task worker:20336 blocked for more than 120 seconds.
[ 1455.112392] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
CPUs 45, 24, and 124 are stuck on spin locks, likely held by
CPUs 105 and 31.
CPUs 105 and 31 are stuck in smp_call_function_many(), waiting on
target CPU 42. For instance:
# CPU 105 registers (via xmon)
R00 = c00000000020b20c R16 = 00007d1bcd800000
R01 = c00000363eaa7970 R17 = 0000000000000001
R02 = c0000000019b3a00 R18 = 000000000000006b
R03 = 000000000000002a R19 = 00007d537d7aecf0
R04 = 000000000000002a R20 = 60000000000000e0
R05 = 000000000000002a R21 = 0801000000000080
R06 = c0002073fb0caa08 R22 = 0000000000000d60
R07 = c0000000019ddd78 R23 = 0000000000000001
R08 = 000000000000002a R24 = c00000000147a700
R09 = 0000000000000001 R25 = c0002073fb0ca908
R10 = c000008ffeb4e660 R26 = 0000000000000000
R11 = c0002073fb0ca900 R27 = c0000000019e2464
R12 = c000000000050790 R28 = c0000000000812b0
R13 = c000207fff623e00 R29 = c0002073fb0ca808
R14 = 00007d1bbee00000 R30 = c0002073fb0ca800
R15 = 00007d1bcd600000 R31 = 0000000000000800
pc = c00000000020b260 smp_call_function_many+0x3d0/0x460
cfar= c00000000020b270 smp_call_function_many+0x3e0/0x460
lr = c00000000020b20c smp_call_function_many+0x37c/0x460
msr = 900000010288b033 cr = 44024824
ctr = c000000000050790 xer = 0000000000000000 trap = 100
CPU 42 is running normally, doing VCPU work:
# CPU 42 stack trace (via xmon)
[link register ] c00800001be17188 kvmppc_book3s_radix_page_fault+0x90/0x2b0 [kvm_hv]
[c000008ed3343820] c000008ed3343850 (unreliable)
[c000008ed33438d0] c00800001be11b6c kvmppc_book3s_hv_page_fault+0x264/0xe30 [kvm_hv]
[c000008ed33439d0] c00800001be0d7b4 kvmppc_vcpu_run_hv+0x8dc/0xb50 [kvm_hv]
[c000008ed3343ae0] c00800001c10891c kvmppc_vcpu_run+0x34/0x48 [kvm]
[c000008ed3343b00] c00800001c10475c kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm]
[c000008ed3343b90] c00800001c0f5a78 kvm_vcpu_ioctl+0x470/0x7c8 [kvm]
[c000008ed3343d00] c000000000475450 do_vfs_ioctl+0xe0/0xc70
[c000008ed3343db0] c0000000004760e4 ksys_ioctl+0x104/0x120
[c000008ed3343e00] c000000000476128 sys_ioctl+0x28/0x80
[c000008ed3343e20] c00000000000b388 system_call+0x5c/0x70
--- Exception: c00 (System Call) at 00007d545cfd7694
SP (7d53ff7edf50) is in userspace
It was subsequently found that ipi_message[PPC_MSG_CALL_FUNCTION]
was set for CPU 42 by at least 1 of the CPUs waiting in
smp_call_function_many(), but somehow the corresponding
call_single_queue entries were never processed by CPU 42, causing the
callers to spin in csd_lock_wait() indefinitely.
Nick Piggin suggested something similar to the following sequence as
a possible explanation (interleaving of CALL_FUNCTION/RESCHEDULE
IPI messages seems to be most common, but any mix of CALL_FUNCTION and
!CALL_FUNCTION messages could trigger it):
CPU
X: smp_muxed_ipi_set_message():
X: smp_mb()
X: message[RESCHEDULE] = 1
X: doorbell_global_ipi(42):
X: kvmppc_set_host_ipi(42, 1)
X: ppc_msgsnd_sync()/smp_mb()
X: ppc_msgsnd() -> 42
42: doorbell_exception(): // from CPU X
42: ppc_msgsync()
105: smp_muxed_ipi_set_message():
105: smb_mb()
// STORE DEFERRED DUE TO RE-ORDERING
--105: message[CALL_FUNCTION] = 1
| 105: doorbell_global_ipi(42):
| 105: kvmppc_set_host_ipi(42, 1)
| 42: kvmppc_set_host_ipi(42, 0)
| 42: smp_ipi_demux_relaxed()
| 42: // returns to executing guest
| // RE-ORDERED STORE COMPLETES
->105: message[CALL_FUNCTION] = 1
105: ppc_msgsnd_sync()/smp_mb()
105: ppc_msgsnd() -> 42
42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored
105: // hangs waiting on 42 to process messages/call_single_queue
This can be prevented with an smp_mb() at the beginning of
kvmppc_set_host_ipi(), such that stores to message[<type>] (or other
state indicated by the host_ipi flag) are ordered vs. the store to
to host_ipi.
However, doing so might still allow for the following scenario (not
yet observed):
CPU
X: smp_muxed_ipi_set_message():
X: smp_mb()
X: message[RESCHEDULE] = 1
X: doorbell_global_ipi(42):
X: kvmppc_set_host_ipi(42, 1)
X: ppc_msgsnd_sync()/smp_mb()
X: ppc_msgsnd() -> 42
42: doorbell_exception(): // from CPU X
42: ppc_msgsync()
// STORE DEFERRED DUE TO RE-ORDERING
-- 42: kvmppc_set_host_ipi(42, 0)
| 42: smp_ipi_demux_relaxed()
| 105: smp_muxed_ipi_set_message():
| 105: smb_mb()
| 105: message[CALL_FUNCTION] = 1
| 105: doorbell_global_ipi(42):
| 105: kvmppc_set_host_ipi(42, 1)
| // RE-ORDERED STORE COMPLETES
-> 42: kvmppc_set_host_ipi(42, 0)
42: // returns to executing guest
105: ppc_msgsnd_sync()/smp_mb()
105: ppc_msgsnd() -> 42
42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored
105: // hangs waiting on 42 to process messages/call_single_queue
Fixing this scenario would require an smp_mb() *after* clearing
host_ipi flag in kvmppc_set_host_ipi() to order the store vs.
subsequent processing of IPI messages.
To handle both cases, this patch splits kvmppc_set_host_ipi() into
separate set/clear functions, where we execute smp_mb() prior to
setting host_ipi flag, and after clearing host_ipi flag. These
functions pair with each other to synchronize the sender and receiver
sides.
With that change in place the above workload ran for 20 hours without
triggering any lock-ups.
Fixes: 755563bc79c7 ("powerpc/powernv: Fixes for hypervisor doorbell handling") # v4.0
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190911223155.16045-1-mdroth@linux.vnet.ibm.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"This is a bit late, partly due to me travelling, and partly due to a
power outage knocking out some of my test systems *while* I was
travelling.
- Initial support for running on a system with an Ultravisor, which
is software that runs below the hypervisor and protects guests
against some attacks by the hypervisor.
- Support for building the kernel to run as a "Secure Virtual
Machine", ie. as a guest capable of running on a system with an
Ultravisor.
- Some changes to our DMA code on bare metal, to allow devices with
medium sized DMA masks (> 32 && < 59 bits) to use more than 2GB of
DMA space.
- Support for firmware assisted crash dumps on bare metal (powernv).
- Two series fixing bugs in and refactoring our PCI EEH code.
- A large series refactoring our exception entry code to use gas
macros, both to make it more readable and also enable some future
optimisations.
As well as many cleanups and other minor features & fixups.
Thanks to: Adam Zerella, Alexey Kardashevskiy, Alistair Popple, Andrew
Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman Khandual,
Balbir Singh, Benjamin Herrenschmidt, Cédric Le Goater, Christophe
JAILLET, Christophe Leroy, Christopher M. Riedl, Christoph Hellwig,
Claudio Carvalho, Daniel Axtens, David Gibson, David Hildenbrand,
Desnes A. Nunes do Rosario, Ganesh Goudar, Gautham R. Shenoy, Greg
Kurz, Guerney Hunt, Gustavo Romero, Halil Pasic, Hari Bathini, Joakim
Tjernlund, Jonathan Neuschafer, Jordan Niethe, Leonardo Bras, Lianbo
Jiang, Madhavan Srinivasan, Mahesh Salgaonkar, Mahesh Salgaonkar,
Masahiro Yamada, Maxiwell S. Garcia, Michael Anderson, Nathan
Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver
O'Halloran, Qian Cai, Ram Pai, Ravi Bangoria, Reza Arbab, Ryan Grimm,
Sam Bobroff, Santosh Sivaraj, Segher Boessenkool, Sukadev Bhattiprolu,
Thiago Bauermann, Thiago Jung Bauermann, Thomas Gleixner, Tom
Lendacky, Vasant Hegde"
* tag 'powerpc-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (264 commits)
powerpc/mm/mce: Keep irqs disabled during lockless page table walk
powerpc: Use ftrace_graph_ret_addr() when unwinding
powerpc/ftrace: Enable HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
ftrace: Look up the address of return_to_handler() using helpers
powerpc: dump kernel log before carrying out fadump or kdump
docs: powerpc: Add missing documentation reference
powerpc/xmon: Fix output of XIVE IPI
powerpc/xmon: Improve output of XIVE interrupts
powerpc/mm/radix: remove useless kernel messages
powerpc/fadump: support holes in kernel boot memory area
powerpc/fadump: remove RMA_START and RMA_END macros
powerpc/fadump: update documentation about option to release opalcore
powerpc/fadump: consider f/w load area
powerpc/opalcore: provide an option to invalidate /sys/firmware/opal/core file
powerpc/opalcore: export /sys/firmware/opal/core for analysing opal crashes
powerpc/fadump: update documentation about CONFIG_PRESERVE_FA_DUMP
powerpc/fadump: add support to preserve crash data on FADUMP disabled kernel
powerpc/fadump: improve how crashed kernel's memory is reserved
powerpc/fadump: consider reserved ranges while releasing memory
powerpc/fadump: make crash memory ranges array allocation generic
...
|
|
When dumping the XIVE state of an CPU IPI, xmon does not check if the
CPU is started or not which can cause an error. Add a check for that
and change the output to be on one line just as the XIVE interrupts of
the machine.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190910081850.26038-3-clg@kaod.org
|
|
When looping on the list of interrupts, add the current value of the
PQ bits with a load on the ESB page. This has the side effect of
faulting the ESB page of all interrupts.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190910081850.26038-2-clg@kaod.org
|
|
There's a bug in skiboot that causes the OPAL_XIVE_ALLOCATE_IRQ call
to return the 32-bit value 0xffffffff when OPAL has run out of IRQs.
Unfortunatelty, OPAL return values are signed 64-bit entities and
errors are supposed to be negative. If that happens, the linux code
confusingly treats 0xffffffff as a valid IRQ number and panics at some
point.
A fix was recently merged in skiboot:
e97391ae2bb5 ("xive: fix return value of opal_xive_allocate_irq()")
but we need a workaround anyway to support older skiboots already
in the field.
Internally convert 0xffffffff to OPAL_RESOURCE which is the usual error
returned upon resource exhaustion.
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821713818.1985334.14123187368108582810.stgit@bahia.lan
|
|
functions
There are some POWER9 machines where the OPAL firmware does not support
the OPAL_XIVE_GET_QUEUE_STATE and OPAL_XIVE_SET_QUEUE_STATE calls.
The impact of this is that a guest using XIVE natively will not be able
to be migrated successfully. On the source side, the get_attr operation
on the KVM native device for the KVM_DEV_XIVE_GRP_EQ_CONFIG attribute
will fail; on the destination side, the set_attr operation for the same
attribute will fail.
This adds tests for the existence of the OPAL get/set queue state
functions, and if they are not supported, the XIVE-native KVM device
is not created and the KVM_CAP_PPC_IRQ_XIVE capability returns false.
Userspace can then either provide a software emulation of XIVE, or
else tell the guest that it does not have a XIVE controller available
to it.
Cc: stable@vger.kernel.org # v5.2+
Fixes: 3fab2d10588e ("KVM: PPC: Book3S HV: XIVE: Activate XIVE exploitation mode")
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
The xmon 'dxi' command calls OPAL to query the XIVE configuration of a
interrupt. This can only be done on baremetal (PowerNV) and it will
crash a pseries machine.
Introduce a new XIVE get_irq_config() operation which implements a
different query depending on the platform, PowerNV or pseries, and
modify xmon to use a top level wrapper.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190814154754.23682-3-clg@kaod.org
|
|
At the moment we create a small window only for 32bit devices, the window
maps 0..2GB of the PCI space only. For other devices we either use
a sketchy bypass or hardware bypass but the former can only work if
the amount of RAM is no bigger than the device's DMA mask and the latter
requires devices to support at least 59bit DMA.
This extends the default DMA window to the maximum size possible to allow
a wider DMA mask than just 32bit. The default window size is now limited
by the the iommu_table::it_map allocation bitmap which is a contiguous
array, 1 bit per an IOMMU page.
This increases the default IOMMU page size from hard coded 4K to
the system page size to allow wider DMA masks.
This increases the level number to not exceed the max order allocation
limit per TCE level. By the same time, this keeps minimal levels number
as 2 in order to save memory.
As the extended window now overlaps the 32bit MMIO region, this adds
an area reservation to iommu_init_table().
After this change the default window size is 0x80000000000==1<<43 so
devices limited to DMA mask smaller than the amount of system RAM can
still use more than just 2GB of memory for DMA.
This is an optimization and not a bug fix for DMA API usage.
With the on-demand allocation of indirect TCE table levels enabled and
2 levels, the first TCE level size is just
1<<ceil((log2(0x7ffffffffff+1)-16)/2)=16384 TCEs or 2 system pages.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190718051139.74787-5-aik@ozlabs.ru
|
|
Merge our ppc-kvm topic branch. This contains several fixes for the XIVE
interrupt controller that we are sharing with the KVM tree.
|
|
Testing has revealed the existence of a race condition where a XIVE
interrupt being shut down can be in one of the XIVE interrupt queues
(of which there are up to 8 per CPU, one for each priority) at the
point where free_irq() is called. If this happens, can return an
interrupt number which has been shut down. This can lead to various
symptoms:
- irq_to_desc(irq) can be NULL. In this case, no end-of-interrupt
function gets called, resulting in the CPU's elevated interrupt
priority (numerically lowered CPPR) never gets reset. That then
means that the CPU stops processing interrupts, causing device
timeouts and other errors in various device drivers.
- The irq descriptor or related data structures can be in the process
of being freed as the interrupt code is using them. This typically
leads to crashes due to bad pointer dereferences.
This race is basically what commit 62e0468650c3 ("genirq: Add optional
hardware synchronization for shutdown", 2019-06-28) is intended to
fix, given a get_irqchip_state() method for the interrupt controller
being used. It works by polling the interrupt controller when an
interrupt is being freed until the controller says it is not pending.
With XIVE, the PQ bits of the interrupt source indicate the state of
the interrupt source, and in particular the P bit goes from 0 to 1 at
the point where the hardware writes an entry into the interrupt queue
that this interrupt is directed towards. Normally, the code will then
process the interrupt and do an end-of-interrupt (EOI) operation which
will reset PQ to 00 (assuming another interrupt hasn't been generated
in the meantime). However, there are situations where the code resets
P even though a queue entry exists (for example, by setting PQ to 01,
which disables the interrupt source), and also situations where the
code leaves P at 1 after removing the queue entry (for example, this
is done for escalation interrupts so they cannot fire again until
they are explicitly re-enabled).
The code already has a 'saved_p' flag for the interrupt source which
indicates that a queue entry exists, although it isn't maintained
consistently. This patch adds a 'stale_p' flag to indicate that
P has been left at 1 after processing a queue entry, and adds code
to set and clear saved_p and stale_p as necessary to maintain a
consistent indication of whether a queue entry may or may not exist.
With this, we can implement xive_get_irqchip_state() by looking at
stale_p, saved_p and the ESB PQ bits for the interrupt.
There is some additional code to handle escalation interrupts
properly; because they are enabled and disabled in KVM assembly code,
which does not have access to the xive_irq_data struct for the
escalation interrupt. Hence, stale_p may be incorrect when the
escalation interrupt is freed in kvmppc_xive_{,native_}cleanup_vcpu().
Fortunately, we can fix it up by looking at vcpu->arch.xive_esc_on,
with some careful attention to barriers in order to ensure the correct
result if xive_esc_irq() races with kvmppc_xive_cleanup_vcpu().
Finally, this adds code to make noise on the console (pr_crit and
WARN_ON(1)) if we find an interrupt queue entry for an interrupt
which does not have a descriptor. While this won't catch the race
reliably, if it does get triggered it will be an indication that
the race is occurring and needs to be debugged.
Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190813100648.GE9567@blackberry
|
|
The result of this kzalloc is not checked. Add a check and corresponding
error handling code.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/cc53462734dfeaf15b6bad0e626b483de18656b4.1564647619.git.christophe.jaillet@wanadoo.fr
|
|
There is no need to use GFP_ATOMIC here. GFP_KERNEL should be enough.
GFP_KERNEL is also already used for another allocation just a few lines
below.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/85d5d247ce753befd6aa63c473f7823de6520ccd.1564647619.git.christophe.jaillet@wanadoo.fr
|
|
The comment above xive_esb_read() references magic loads from an ESB as
described xive.h. This has been inaccurate since commit 12c1f339cd49
("powerpc/xive: Move definition of ESB bits") which moved the
description. Update the comment to reference the new location of the
description in xive-regs.h
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Acked-by: Stewart Smith <stewart@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802000835.26191-1-jniethe5@gmail.com
|
|
The powernv platform is the only one that directly accesses SCOMs.
Move the support code to platforms/powernv, and get rid of the
PPC_SCOM Kconfig option, as SCOM support is always selected when
compiling for powernv.
This also means that the Kconfig item for CONFIG_SCOM_DEBUGFS will
show up in menuconfig in the platform menu, rather than at the root,
which is a much better location.
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190509051119.7694-1-ajd@linux.ibm.com
|
|
xive_find_target_in_mask() has the following for(;;) loop which has a
bug when @first == cpumask_first(@mask) and condition 1 fails to hold
for every CPU in @mask. In this case we loop forever in the for-loop.
first = cpu;
for (;;) {
if (cpu_online(cpu) && xive_try_pick_target(cpu)) // condition 1
return cpu;
cpu = cpumask_next(cpu, mask);
if (cpu == first) // condition 2
break;
if (cpu >= nr_cpu_ids) // condition 3
cpu = cpumask_first(mask);
}
This is because, when @first == cpumask_first(@mask), we never hit the
condition 2 (cpu == first) since prior to this check, we would have
executed "cpu = cpumask_next(cpu, mask)" which will set the value of
@cpu to a value greater than @first or to nr_cpus_ids. When this is
coupled with the fact that condition 1 is not met, we will never exit
this loop.
This was discovered by the hard-lockup detector while running LTP test
concurrently with SMT switch tests.
watchdog: CPU 12 detected hard LOCKUP on other CPUs 68
watchdog: CPU 12 TB:85587019220796, last SMP heartbeat TB:85578827223399 (15999ms ago)
watchdog: CPU 68 Hard LOCKUP
watchdog: CPU 68 TB:85587019361273, last heartbeat TB:85576815065016 (19930ms ago)
CPU: 68 PID: 45050 Comm: hxediag Kdump: loaded Not tainted 4.18.0-100.el8.ppc64le #1
NIP: c0000000006f5578 LR: c000000000cba9ec CTR: 0000000000000000
REGS: c000201fff3c7d80 TRAP: 0100 Not tainted (4.18.0-100.el8.ppc64le)
MSR: 9000000002883033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 24028424 XER: 00000000
CFAR: c0000000006f558c IRQMASK: 1
GPR00: c0000000000afc58 c000201c01c43400 c0000000015ce500 c000201cae26ec18
GPR04: 0000000000000800 0000000000000540 0000000000000800 00000000000000f8
GPR08: 0000000000000020 00000000000000a8 0000000080000000 c00800001a1beed8
GPR12: c0000000000b1410 c000201fff7f4c00 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000540 0000000000000001
GPR20: 0000000000000048 0000000010110000 c00800001a1e3780 c000201cae26ed18
GPR24: 0000000000000000 c000201cae26ed8c 0000000000000001 c000000001116bc0
GPR28: c000000001601ee8 c000000001602494 c000201cae26ec18 000000000000001f
NIP [c0000000006f5578] find_next_bit+0x38/0x90
LR [c000000000cba9ec] cpumask_next+0x2c/0x50
Call Trace:
[c000201c01c43400] [c000201cae26ec18] 0xc000201cae26ec18 (unreliable)
[c000201c01c43420] [c0000000000afc58] xive_find_target_in_mask+0x1b8/0x240
[c000201c01c43470] [c0000000000b0228] xive_pick_irq_target.isra.3+0x168/0x1f0
[c000201c01c435c0] [c0000000000b1470] xive_irq_startup+0x60/0x260
[c000201c01c43640] [c0000000001d8328] __irq_startup+0x58/0xf0
[c000201c01c43670] [c0000000001d844c] irq_startup+0x8c/0x1a0
[c000201c01c436b0] [c0000000001d57b0] __setup_irq+0x9f0/0xa90
[c000201c01c43760] [c0000000001d5aa0] request_threaded_irq+0x140/0x220
[c000201c01c437d0] [c00800001a17b3d4] bnx2x_nic_load+0x188c/0x3040 [bnx2x]
[c000201c01c43950] [c00800001a187c44] bnx2x_self_test+0x1fc/0x1f70 [bnx2x]
[c000201c01c43a90] [c000000000adc748] dev_ethtool+0x11d8/0x2cb0
[c000201c01c43b60] [c000000000b0b61c] dev_ioctl+0x5ac/0xa50
[c000201c01c43bf0] [c000000000a8d4ec] sock_do_ioctl+0xbc/0x1b0
[c000201c01c43c60] [c000000000a8dfb8] sock_ioctl+0x258/0x4f0
[c000201c01c43d20] [c0000000004c9704] do_vfs_ioctl+0xd4/0xa70
[c000201c01c43de0] [c0000000004ca274] sys_ioctl+0xc4/0x160
[c000201c01c43e30] [c00000000000b388] system_call+0x5c/0x70
Instruction dump:
78aad182 54a806be 3920ffff 78a50664 794a1f24 7d294036 7d43502a 7d295039
4182001c 48000034 78a9d182 79291f24 <7d23482a> 2fa90000 409e0020 38a50040
To fix this, move the check for condition 2 after the check for
condition 3, so that we are able to break out of the loop soon after
iterating through all the CPUs in the @mask in the problem case. Use
do..while() to achieve this.
Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.12+
Reported-by: Indira P. Joga <indira.priya@in.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1563359724-13931-1-git-send-email-ego@linux.vnet.ibm.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Notable changes:
- Removal of the NPU DMA code, used by the out-of-tree Nvidia driver,
as well as some other functions only used by drivers that haven't
(yet?) made it upstream.
- A fix for a bug in our handling of hardware watchpoints (eg. perf
record -e mem: ...) which could lead to register corruption and
kernel crashes.
- Enable HAVE_ARCH_HUGE_VMAP, which allows us to use large pages for
vmalloc when using the Radix MMU.
- A large but incremental rewrite of our exception handling code to
use gas macros rather than multiple levels of nested CPP macros.
And the usual small fixes, cleanups and improvements.
Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Andreas Schwab,
Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann,
Athira Rajeev, Cédric Le Goater, Christian Lamparter, Christophe
Leroy, Christophe Lombard, Christoph Hellwig, Daniel Axtens, Denis
Efremov, Enrico Weigelt, Frederic Barrat, Gautham R. Shenoy, Geert
Uytterhoeven, Geliang Tang, Gen Zhang, Greg Kroah-Hartman, Greg Kurz,
Gustavo Romero, Krzysztof Kozlowski, Madhavan Srinivasan, Masahiro
Yamada, Mathieu Malaterre, Michael Neuling, Nathan Lynch, Naveen N.
Rao, Nicholas Piggin, Nishad Kamdar, Oliver O'Halloran, Qian Cai, Ravi
Bangoria, Sachin Sant, Sam Bobroff, Satheesh Rajendran, Segher
Boessenkool, Shaokun Zhang, Shawn Anastasio, Stewart Smith, Suraj
Jitindar Singh, Thiago Jung Bauermann, YueHaibing"
* tag 'powerpc-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (163 commits)
powerpc/powernv/idle: Fix restore of SPRN_LDBAR for POWER9 stop state.
powerpc/eeh: Handle hugepages in ioremap space
ocxl: Update for AFU descriptor template version 1.1
powerpc/boot: pass CONFIG options in a simpler and more robust way
powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h
powerpc/irq: Don't WARN continuously in arch_local_irq_restore()
powerpc/module64: Use symbolic instructions names.
powerpc/module32: Use symbolic instructions names.
powerpc: Move PPC_HA() PPC_HI() and PPC_LO() to ppc-opcode.h
powerpc/module64: Fix comment in R_PPC64_ENTRY handling
powerpc/boot: Add lzo support for uImage
powerpc/boot: Add lzma support for uImage
powerpc/boot: don't force gzipped uImage
powerpc/8xx: Add microcode patch to move SMC parameter RAM.
powerpc/8xx: Use IO accessors in microcode programming.
powerpc/8xx: replace #ifdefs by IS_ENABLED() in microcode.c
powerpc/8xx: refactor programming of microcode CPM params.
powerpc/8xx: refactor printing of microcode patch name.
powerpc/8xx: Refactor microcode write
powerpc/8xx: refactor writing of CPM microcode arrays
...
|
|
Pull Documentation updates from Jonathan Corbet:
"It's been a relatively busy cycle for docs:
- A fair pile of RST conversions, many from Mauro. These create more
than the usual number of simple but annoying merge conflicts with
other trees, unfortunately. He has a lot more of these waiting on
the wings that, I think, will go to you directly later on.
- A new document on how to use merges and rebases in kernel repos,
and one on Spectre vulnerabilities.
- Various improvements to the build system, including automatic
markup of function() references because some people, for reasons I
will never understand, were of the opinion that
:c:func:``function()`` is unattractive and not fun to type.
- We now recommend using sphinx 1.7, but still support back to 1.4.
- Lots of smaller improvements, warning fixes, typo fixes, etc"
* tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits)
docs: automarkup.py: ignore exceptions when seeking for xrefs
docs: Move binderfs to admin-guide
Disable Sphinx SmartyPants in HTML output
doc: RCU callback locks need only _bh, not necessarily _irq
docs: format kernel-parameters -- as code
Doc : doc-guide : Fix a typo
platform: x86: get rid of a non-existent document
Add the RCU docs to the core-api manual
Documentation: RCU: Add TOC tree hooks
Documentation: RCU: Rename txt files to rst
Documentation: RCU: Convert RCU UP systems to reST
Documentation: RCU: Convert RCU linked list to reST
Documentation: RCU: Convert RCU basic concepts to reST
docs: filesystems: Remove uneeded .rst extension on toctables
scripts/sphinx-pre-install: fix out-of-tree build
docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/
Documentation: PGP: update for newer HW devices
Documentation: Add section about CPU vulnerabilities for Spectre
Documentation: platform: Delete x86-laptop-drivers.txt
docs: Note that :c:func: should no longer be used
...
|
|
Only 8xx selects CPM1 and related CONFIG options are already
in platforms/8xx/Kconfig
Move the related C files to platforms/8xx/.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Minor formatting fixes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
On most arches having function flush_dcache_range(), including PPC32,
this function does a writeback and invalidation of the cache bloc.
On PPC64, flush_dcache_range() only does a writeback while
flush_inval_dcache_range() does the invalidation in addition.
In addition it looks like within arch/powerpc/, there are no PPC64
platforms using flush_dcache_range()
This patch drops the existing 64 bits version of flush_dcache_range()
and renames flush_inval_dcache_range() into flush_dcache_range().
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Formatting of Kconfig files doesn't look so pretty, so let the
Great White Handkerchief come around and clean it up.
Also convert "---help---" as requested.
Signed-off-by: Enrico Weigelt, metux IT consult <info@metux.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The kbuild documentation clearly shows that the documents
there are written at different times: some use markdown,
some use their own peculiar logic to split sections.
Convert everything to ReST without affecting too much
the author's style and avoiding adding uneeded markups.
The conversion is actually:
- add blank lines and identation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
|
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation version 2 of the license
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 315 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531190115.503150771@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
On POWER9, if the hypervisor supports XIVE exploitation mode, the
guest OS will unconditionally requests for the XIVE interrupt mode
even if XIVE was deactivated with the kernel command line xive=off.
Later on, when the spapr XIVE init code handles xive=off, it disables
XIVE and tries to fall back on the legacy mode XICS.
This discrepency causes a kernel panic because the hypervisor is
configured to provide the XIVE interrupt mode to the guest :
kernel BUG at arch/powerpc/sysdev/xics/xics-common.c:135!
...
NIP xics_smp_probe+0x38/0x98
LR xics_smp_probe+0x2c/0x98
Call Trace:
xics_smp_probe+0x2c/0x98 (unreliable)
pSeries_smp_probe+0x40/0xa0
smp_prepare_cpus+0x62c/0x6ec
kernel_init_freeable+0x148/0x448
kernel_init+0x2c/0x148
ret_from_kernel_thread+0x5c/0x68
Look for xive=off during prom_init and don't ask for XIVE in this
case. One exception though: if the host only supports XIVE, we still
want to boot so we ignore xive=off.
Similarly, have the spapr XIVE init code to looking at the interrupt
mode negotiated during CAS, and ignore xive=off if the hypervisor only
supports XIVE.
Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.20
Reported-by: Pavithra R. Prakash <pavrampu@in.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of version 2 of the gnu general public license as
published by the free software foundation
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 107 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190528171438.615055994@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program if not write to the free software foundation inc
59 temple place suite 330 boston ma 02111 1307 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 1334 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070033.113240726@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 or at your option any
later version this program is distributed in the hope that it will
be useful but without any warranty without even the implied warranty
of merchantability or fitness for a particular purpose see the gnu
general public license for more details you should have received a
copy of the gnu general public license along with this program if
not write to the free software foundation inc 675 mass ave cambridge
ma 02139 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 77 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.837555891@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 3029 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program if not write to the free software foundation inc
675 mass ave cambridge ma 02139 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 441 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190520071858.739733335@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add SPDX license identifiers to all Make/Kconfig files which:
- Have no license information of any form
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Pull KVM updates from Paolo Bonzini:
"ARM:
- support for SVE and Pointer Authentication in guests
- PMU improvements
POWER:
- support for direct access to the POWER9 XIVE interrupt controller
- memory and performance optimizations
x86:
- support for accessing memory not backed by struct page
- fixes and refactoring
Generic:
- dirty page tracking improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (155 commits)
kvm: fix compilation on aarch64
Revert "KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU"
kvm: x86: Fix L1TF mitigation for shadow MMU
KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible
KVM: PPC: Book3S: Remove useless checks in 'release' method of KVM device
KVM: PPC: Book3S HV: XIVE: Fix spelling mistake "acessing" -> "accessing"
KVM: PPC: Book3S HV: Make sure to load LPID for radix VCPUs
kvm: nVMX: Set nested_run_pending in vmx_set_nested_state after checks complete
tests: kvm: Add tests for KVM_SET_NESTED_STATE
KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS state before setting new state
tests: kvm: Add tests for KVM_CAP_MAX_VCPUS and KVM_CAP_MAX_CPU_ID
tests: kvm: Add tests to .gitignore
KVM: Introduce KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
KVM: Fix kvm_clear_dirty_log_protect off-by-(minus-)one
KVM: Fix the bitmap range to copy during clear dirty
KVM: arm64: Fix ptrauth ID register masking logic
KVM: x86: use direct accessors for RIP and RSP
KVM: VMX: Use accessors for GPRs outside of dedicated caching logic
KVM: x86: Omit caching logic for always-available GPRs
kvm, x86: Properly check whether a pfn is an MMIO or not
...
|
|
This patch fixes following (similar) warning reported by kbuild test robot:
In function ‘memcpy’,
inlined from ‘smsc75xx_init_mac_address’ at drivers/net/usb/smsc75xx.c:778:3,
inlined from ‘smsc75xx_bind’ at drivers/net/usb/smsc75xx.c:1501:2:
./include/linux/string.h:355:9: warning: argument 2 null where non-null expected [-Wnonnull]
return __builtin_memcpy(p, q, size);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/usb/smsc75xx.c: In function ‘smsc75xx_bind’:
./include/linux/string.h:355:9: note: in a call to built-in function ‘__builtin_memcpy’
I've replaced the offending memcpy with ether_addr_copy, because I'm
100% sure, that of_get_mac_address can't return NULL as it returns valid
pointer or ERR_PTR encoded value, nothing else.
I'm hesitant to just change IS_ERR into IS_ERR_OR_NULL check, as this
would make the warning disappear also, but it would be confusing to
check for impossible return value just to make a compiler happy.
I'm now changing all occurencies of memcpy to ether_addr_copy after the
of_get_mac_address call, as it's very likely, that we're going to get
similar reports from kbuild test robot in the future.
Fixes: ea168cdf1299 ("powerpc: tsi108: support of_get_mac_address new ERR_PTR error")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Petr Štetiar <ynezz@true.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Slightly delayed due to the issue with printk() calling
probe_kernel_read() interacting with our new user access prevention
stuff, but all fixed now.
The only out-of-area changes are the addition of a cpuhp_state, small
additions to Documentation and MAINTAINERS updates.
Highlights:
- Support for Kernel Userspace Access/Execution Prevention (like
SMAP/SMEP/PAN/PXN) on some 64-bit and 32-bit CPUs. This prevents
the kernel from accidentally accessing userspace outside
copy_to/from_user(), or ever executing userspace.
- KASAN support on 32-bit.
- Rework of where we map the kernel, vmalloc, etc. on 64-bit hash to
use the same address ranges we use with the Radix MMU.
- A rewrite into C of large parts of our idle handling code for
64-bit Book3S (ie. power8 & power9).
- A fast path entry for syscalls on 32-bit CPUs, for a 12-17% speedup
in the null_syscall benchmark.
- On 64-bit bare metal we have support for recovering from errors
with the time base (our clocksource), however if that fails
currently we hang in __delay() and never crash. We now have support
for detecting that case and short circuiting __delay() so we at
least panic() and reboot.
- Add support for optionally enabling the DAWR on Power9, which had
to be disabled by default due to a hardware erratum. This has the
effect of enabling hardware breakpoints for GDB, the downside is a
badly behaved program could crash the machine by pointing the DAWR
at cache inhibited memory. This is opt-in obviously.
- xmon, our crash handler, gets support for a read only mode where
operations that could change memory or otherwise disturb the system
are disabled.
Plus many clean-ups, reworks and minor fixes etc.
Thanks to: Christophe Leroy, Akshay Adiga, Alastair D'Silva, Alexey
Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar,
Anton Blanchard, Ben Hutchings, Bo YU, Breno Leitao, Cédric Le Goater,
Christopher M. Riedl, Christoph Hellwig, Colin Ian King, David Gibson,
Ganesh Goudar, Gautham R. Shenoy, George Spelvin, Greg Kroah-Hartman,
Greg Kurz, Horia Geantă, Jagadeesh Pagadala, Joel Stanley, Joe
Perches, Julia Lawall, Laurentiu Tudor, Laurent Vivier, Lukas Bulwahn,
Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Malaterre, Michael
Neuling, Mukesh Ojha, Nathan Fontenot, Nathan Lynch, Nicholas Piggin,
Nick Desaulniers, Oliver O'Halloran, Peng Hao, Qian Cai, Ravi
Bangoria, Rick Lindsley, Russell Currey, Sachin Sant, Stewart Smith,
Sukadev Bhattiprolu, Thomas Huth, Tobin C. Harding, Tyrel Datwyler,
Valentin Schneider, Wei Yongjun, Wen Yang, YueHaibing"
* tag 'powerpc-5.2-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (205 commits)
powerpc/64s: Use early_mmu_has_feature() in set_kuap()
powerpc/book3s/64: check for NULL pointer in pgd_alloc()
powerpc/mm: Fix hugetlb page initialization
ocxl: Fix return value check in afu_ioctl()
powerpc/mm: fix section mismatch for setup_kup()
powerpc/mm: fix redundant inclusion of pgtable-frag.o in Makefile
powerpc/mm: Fix makefile for KASAN
powerpc/kasan: add missing/lost Makefile
selftests/powerpc: Add a signal fuzzer selftest
powerpc/booke64: set RI in default MSR
ocxl: Provide global MMIO accessors for external drivers
ocxl: move event_fd handling to frontend
ocxl: afu_irq only deals with IRQ IDs, not offsets
ocxl: Allow external drivers to use OpenCAPI contexts
ocxl: Create a clear delineation between ocxl backend & frontend
ocxl: Don't pass pci_dev around
ocxl: Split pci.c
ocxl: Remove some unused exported symbols
ocxl: Remove superfluous 'extern' from headers
ocxl: read_pasid never returns an error, so make it void
...
|
|
There was NVMEM support added to of_get_mac_address, so it could now return
ERR_PTR encoded error values, so we need to adjust all current users of
of_get_mac_address to this new fact.
Signed-off-by: Petr Štetiar <ynezz@true.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Each thread has an associated Thread Interrupt Management context
composed of a set of registers. These registers let the thread handle
priority management and interrupt acknowledgment. The most important
are :
- Interrupt Pending Buffer (IPB)
- Current Processor Priority (CPPR)
- Notification Source Register (NSR)
They are exposed to software in four different pages each proposing a
view with a different privilege. The first page is for the physical
thread context and the second for the hypervisor. Only the third
(operating system) and the fourth (user level) are exposed the guest.
A custom VM fault handler will populate the VMA with the appropriate
pages, which should only be the OS page for now.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
The support for XIVE native exploitation mode in Linux/KVM needs a
couple more OPAL calls to get and set the state of the XIVE internal
structures being used by a sPAPR guest.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|