summaryrefslogtreecommitdiff
path: root/drivers/xen
AgeCommit message (Collapse)AuthorFilesLines
2025-10-19xen/events: Update virq_to_irq on migrationJason Andryuk1-1/+12
commit 3fcc8e146935415d69ffabb5df40ecf50e106131 upstream. VIRQs come in 3 flavors, per-VPU, per-domain, and global, and the VIRQs are tracked in per-cpu virq_to_irq arrays. Per-domain and global VIRQs must be bound on CPU 0, and bind_virq_to_irq() sets the per_cpu virq_to_irq at registration time Later, the interrupt can migrate, and info->cpu is updated. When calling __unbind_from_irq(), the per-cpu virq_to_irq is cleared for a different cpu. If bind_virq_to_irq() is called again with CPU 0, the stale irq is returned. There won't be any irq_info for the irq, so things break. Make xen_rebind_evtchn_to_cpu() update the per_cpu virq_to_irq mappings to keep them update to date with the current cpu. This ensures the correct virq_to_irq is cleared in __unbind_from_irq(). Fixes: e46cdb66c8fc ("xen: event channels") Cc: stable@vger.kernel.org Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250828003604.8949-4-jason.andryuk@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19xen/events: Return -EEXIST for bound VIRQsJason Andryuk1-5/+14
commit 07ce121d93a5e5fb2440a24da3dbf408fcee978e upstream. Change find_virq() to return -EEXIST when a VIRQ is bound to a different CPU than the one passed in. With that, remove the BUG_ON() from bind_virq_to_irq() to propogate the error upwards. Some VIRQs are per-cpu, but others are per-domain or global. Those must be bound to CPU0 and can then migrate elsewhere. The lookup for per-domain and global will probably fail when migrated off CPU 0, especially when the current CPU is tracked. This now returns -EEXIST instead of BUG_ON(). A second call to bind a per-domain or global VIRQ is not expected, but make it non-fatal to avoid trying to look up the irq, since we don't know which per_cpu(virq_to_irq) it will be in. Cc: stable@vger.kernel.org Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250828003604.8949-3-jason.andryuk@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19xen/manage: Fix suspend error pathLukas Wunner1-1/+2
commit f770c3d858687252f1270265ba152d5c622e793f upstream. The device power management API has the following asymmetry: * dpm_suspend_start() does not clean up on failure (it requires a call to dpm_resume_end()) * dpm_suspend_end() does clean up on failure (it does not require a call to dpm_resume_start()) The asymmetry was introduced by commit d8f3de0d2412 ("Suspend-related patches for 2.6.27") in June 2008: It removed a call to device_resume() from device_suspend() (which was later renamed to dpm_suspend_start()). When Xen began using the device power management API in May 2008 with commit 0e91398f2a5d ("xen: implement save/restore"), the asymmetry did not yet exist. But since it was introduced, a call to dpm_resume_end() is missing in the error path of dpm_suspend_start(). Fix it. Fixes: d8f3de0d2412 ("Suspend-related patches for 2.6.27") Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v2.6.27 Reviewed-by: "Rafael J. Wysocki (Intel)" <rafael@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <22453676d1ddcebbe81641bb68ddf587fee7e21e.1756990799.git.lukas@wunner.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19xen/events: Cleanup find_virq() return codesJason Andryuk1-3/+4
commit 08df2d7dd4ab2db8a172d824cda7872d5eca460a upstream. rc is overwritten by the evtchn_status hypercall in each iteration, so the return value will be whatever the last iteration is. This could incorrectly return success even if the event channel was not found. Change to an explicit -ENOENT for an un-found virq and return 0 on a successful match. Fixes: 62cc5fc7b2e0 ("xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports") Cc: stable@vger.kernel.org Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250828003604.8949-2-jason.andryuk@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-15xen/gntdev: remove struct gntdev_copy_batch from stackJuergen Gross2-21/+54
[ Upstream commit 70045cf6593cbf0740956ea9b7b4269142c6ee38 ] When compiling the kernel with LLVM, the following warning was issued: drivers/xen/gntdev.c:991: warning: stack frame size (1160) exceeds limit (1024) in function 'gntdev_ioctl' The main reason is struct gntdev_copy_batch which is located on the stack and has a size of nearly 1kb. For performance reasons it shouldn't by just dynamically allocated instead, so allocate a new instance when needed and instead of freeing it put it into a list of free structs anchored in struct gntdev_priv. Fixes: a4cdb556cae0 ("xen/gntdev: add ioctl for grant copy") Reported-by: Abinash Singh <abinashsinghlalotra@gmail.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250703073259.17356-1-jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-19xen/x86: fix initial memory balloon targetRoger Pau Monne1-5/+8
[ Upstream commit 74287971dbb3fe322bb316afd9e7fb5807e23bee ] When adding extra memory regions as ballooned pages also adjust the balloon target, otherwise when the balloon driver is started it will populate memory to match the target value and consume all the extra memory regions added. This made the usage of the Xen `dom0_mem=,max:` command line parameter for dom0 not work as expected, as the target won't be adjusted and when the balloon is started it will populate memory straight to the 'max:' value. It would equally affect domUs that have memory != maxmem. Kernels built with CONFIG_XEN_UNPOPULATED_ALLOC are not affected, because the extra memory regions are consumed by the unpopulated allocation driver, and then balloon_add_regions() becomes a no-op. Reported-by: John <jw@nuclearfallout.net> Fixes: 87af633689ce ('x86/xen: fix balloon target initialization for PVH dom0') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Message-ID: <20250514080427.28129-1-roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04xenbus: Allow PVH dom0 a non-local xenstoreJason Andryuk1-6/+8
[ Upstream commit 90989869baae47ee2aa3bcb6f6eb9fbbe4287958 ] Make xenbus_init() allow a non-local xenstore for a PVH dom0 - it is currently forced to XS_LOCAL. With Hyperlaunch booting dom0 and a xenstore stubdom, dom0 can be handled as a regular XS_HVM following the late init path. Ideally we'd drop the use of xen_initial_domain() and just check for the event channel instead. However, ARM has a xen,enhanced no-xenstore mode, where the event channel and PFN would both be 0. Retain the xen_initial_domain() check, and use that for an additional check when the event channel is 0. Check the full 64bit HVM_PARAM_STORE_EVTCHN value to catch the off chance that high bits are set for the 32bit event channel. Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Change-Id: I5506da42e4c6b8e85079fefb2f193c8de17c7437 Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250506204456.5220-1-jason.andryuk@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04xen: Add support for XenServer 6.1 platform deviceFrediano Ziglio1-0/+4
[ Upstream commit 2356f15caefc0cc63d9cc5122641754f76ef9b25 ] On XenServer on Windows machine a platform device with ID 2 instead of 1 is used. This device is mainly identical to device 1 but due to some Windows update behaviour it was decided to use a device with a different ID. This causes compatibility issues with Linux which expects, if Xen is detected, to find a Xen platform device (5853:0001) otherwise code will crash due to some missing initialization (specifically grant tables). Specifically from dmesg RIP: 0010:gnttab_expand+0x29/0x210 Code: 90 0f 1f 44 00 00 55 31 d2 48 89 e5 41 57 41 56 41 55 41 89 fd 41 54 53 48 83 ec 10 48 8b 05 7e 9a 49 02 44 8b 35 a7 9a 49 02 <8b> 48 04 8d 44 39 ff f7 f1 45 8d 24 06 89 c3 e8 43 fe ff ff 44 39 RSP: 0000:ffffba34c01fbc88 EFLAGS: 00010086 ... The device 2 is presented by Xapi adding device specification to Qemu command line. Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com> Acked-by: Juergen Gross <jgross@suse.com> Message-ID: <20250227145016.25350-1-frediano.ziglio@cloud.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-18xenbus: Use kref to track req lifetimeJason Andryuk4-8/+23
commit 1f0304dfd9d217c2f8b04a9ef4b3258a66eedd27 upstream. Marek reported seeing a NULL pointer fault in the xenbus_thread callstack: BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: e030:__wake_up_common+0x4c/0x180 Call Trace: <TASK> __wake_up_common_lock+0x82/0xd0 process_msg+0x18e/0x2f0 xenbus_thread+0x165/0x1c0 process_msg+0x18e is req->cb(req). req->cb is set to xs_wake_up(), a thin wrapper around wake_up(), or xenbus_dev_queue_reply(). It seems like it was xs_wake_up() in this case. It seems like req may have woken up the xs_wait_for_reply(), which kfree()ed the req. When xenbus_thread resumes, it faults on the zero-ed data. Linux Device Drivers 2nd edition states: "Normally, a wake_up call can cause an immediate reschedule to happen, meaning that other processes might run before wake_up returns." ... which would match the behaviour observed. Change to keeping two krefs on each request. One for the caller, and one for xenbus_thread. Each will kref_put() when finished, and the last will free it. This use of kref matches the description in Documentation/core-api/kref.rst Link: https://lore.kernel.org/xen-devel/ZO0WrR5J0xuwDIxW@mail-itl/ Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Fixes: fd8aa9095a95 ("xen: optimize xenbus driver for multiple concurrent xenstore accesses") Cc: stable@vger.kernel.org Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250506210935.5607-1-jason.andryuk@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-18xen: swiotlb: Use swiotlb bouncing if kmalloc allocation demands itJohn Ernberg1-0/+1
commit cd9c058489053e172a6654cad82ee936d1b09fab upstream. Xen swiotlb support was missed when the patch set starting with 4ab5f8ec7d71 ("mm/slab: decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN") was merged. When running Xen on iMX8QXP, a SoC without IOMMU, the effect was that USB transfers ended up corrupted when there was more than one URB inflight at the same time. Add a call to dma_kmalloc_needs_bounce() to make sure that allocations too small for DMA get bounced via swiotlb. Closes: https://lore.kernel.org/linux-usb/ab2776f0-b838-4cf6-a12a-c208eb6aad59@actia.se/ Fixes: 4ab5f8ec7d71 ("mm/slab: decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN") Cc: stable@kernel.org # v6.5+ Signed-off-by: John Ernberg <john.ernberg@actia.se> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250502114043.1968976-2-john.ernberg@actia.se> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-02xen: Change xen-acpi-processor dom0 dependencyJason Andryuk1-1/+1
[ Upstream commit 0f2946bb172632e122d4033e0b03f85230a29510 ] xen-acpi-processor functions under a PVH dom0 with only a xen_initial_domain() runtime check. Change the Kconfig dependency from PV dom0 to generic dom0 to reflect that. Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Juergen Gross <jgross@suse.com> Tested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250331172913.51240-1-jason.andryuk@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-25x86/xen: fix balloon target initialization for PVH dom0Roger Pau Monne1-10/+24
commit 87af633689ce16ddb166c80f32b120e50b1295de upstream. PVH dom0 re-uses logic from PV dom0, in which RAM ranges not assigned to dom0 are re-used as scratch memory to map foreign and grant pages. Such logic relies on reporting those unpopulated ranges as RAM to Linux, and mark them as reserved. This way Linux creates the underlying page structures required for metadata management. Such approach works fine on PV because the initial balloon target is calculated using specific Xen data, that doesn't take into account the memory type changes described above. However on HVM and PVH the initial balloon target is calculated using get_num_physpages(), and that function does take into account the unpopulated RAM regions used as scratch space for remote domain mappings. This leads to PVH dom0 having an incorrect initial balloon target, which causes malfunction (excessive memory freeing) of the balloon driver if the dom0 memory target is later adjusted from the toolstack. Fix this by using xen_released_pages to account for any pages that are part of the memory map, but are already unpopulated when the balloon driver is initialized. This accounts for any regions used for scratch remote mappings. Note on x86 xen_released_pages definition is moved to enlighten.c so it's uniformly available for all Xen-enabled builds. Take the opportunity to unify PV with PVH/HVM guests regarding the usage of get_num_physpages(), as that avoids having to add different logic for PV vs PVH in both balloon_add_regions() and arch_xen_unpopulated_init(). Much like a6aa4eb994ee, the code in this changeset should have been part of 38620fc4e893. Fixes: a6aa4eb994ee ('xen/x86: add extra pages to unpopulated-alloc if available') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250407082838.65495-1-roger.pau@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-25xenfs/xensyms: respect hypervisor's "next" indicationJan Beulich1-2/+2
commit 5c4e79e29a9fe4ea132118ac40c2bc97cfe23077 upstream. The interface specifies the symnum field as an input and output; the hypervisor sets it to the next sequential symbol's index. xensyms_next() incrementing the position explicitly (and xensyms_next_sym() decrementing it to "rewind") is only correct as long as the sequence of symbol indexes is non-sparse. Use the hypervisor-supplied value instead to update the position in xensyms_next(), and use the saved incoming index in xensyms_next_sym(). Cc: stable@kernel.org Fixes: a11f4f0a4e18 ("xen: xensyms support") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <15d5e7fa-ec5d-422f-9319-d28bed916349@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-22Xen/swiotlb: mark xen_swiotlb_fixup() __initJan Beulich1-1/+1
[ Upstream commit 75ad02318af2e4ae669e26a79f001bd5e1f97472 ] It's sole user (pci_xen_swiotlb_init()) is __init, too. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Message-ID: <e1198286-99ec-41c1-b5ad-e04e285836c9@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-21xen/swiotlb: relax alignment requirementsJuergen Gross1-8/+12
[ Upstream commit 85fcb57c983f423180ba6ec5d0034242da05cc54 ] When mapping a buffer for DMA via .map_page or .map_sg DMA operations, there is no need to check the machine frames to be aligned according to the mapped areas size. All what is needed in these cases is that the buffer is contiguous at machine level. So carve out the alignment check from range_straddles_page_boundary() and move it to a helper called by xen_swiotlb_alloc_coherent() and xen_swiotlb_free_coherent() directly. Fixes: 9f40ec84a797 ("xen/swiotlb: add alignment check for dma buffers") Reported-by: Jan Vejvalka <jan.vejvalka@lfmotol.cuni.cz> Tested-by: Jan Vejvalka <jan.vejvalka@lfmotol.cuni.cz> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-09xen: Fix the issue of resource not being properly released in xenbus_dev_probe()Qiu-ji Chen1-1/+7
commit afc545da381ba0c651b2658966ac737032676f01 upstream. This patch fixes an issue in the function xenbus_dev_probe(). In the xenbus_dev_probe() function, within the if (err) branch at line 313, the program incorrectly returns err directly without releasing the resources allocated by err = drv->probe(dev, id). As the return value is non-zero, the upper layers assume the processing logic has failed. However, the probe operation was performed earlier without a corresponding remove operation. Since the probe actually allocates resources, failing to perform the remove operation could lead to problems. To fix this issue, we followed the resource release logic of the xenbus_dev_remove() function by adding a new block fail_remove before the fail_put block. After entering the branch if (err) at line 313, the function will use a goto statement to jump to the fail_remove block, ensuring that the previously acquired resources are correctly released, thus preventing the reference count leak. This bug was identified by an experimental static analysis tool developed by our team. The tool specializes in analyzing reference count operations and detecting potential issues where resources are not properly managed. In this case, the tool flagged the missing release operation as a potential problem, which led to the development of this patch. Fixes: 4bac07c993d0 ("xen: add the Xenbus sysfs and virtual device hotplug driver") Cc: stable@vger.kernel.org Signed-off-by: Qiu-ji Chen <chenqiuji666@gmail.com> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20241105130919.4621-1-chenqiuji666@gmail.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04xen/swiotlb: fix allocated sizeJuergen Gross1-2/+2
[ Upstream commit c3dea3d54f4d399f8044547f0f1abdccbdfb0fee ] The allocated size in xen_swiotlb_alloc_coherent() and xen_swiotlb_free_coherent() is calculated wrong for the case of XEN_PAGE_SIZE not matching PAGE_SIZE. Fix that. Fixes: 7250f422da04 ("xen-swiotlb: use actually allocated size on check physical continuous") Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04xen/swiotlb: add alignment check for dma buffersJuergen Gross1-0/+6
[ Upstream commit 9f40ec84a7976d95c34e7cc070939deb103652b0 ] When checking a memory buffer to be consecutive in machine memory, the alignment needs to be checked, too. Failing to do so might result in DMA memory not being aligned according to its requested size, leading to error messages like: 4xxx 0000:2b:00.0: enabling device (0140 -> 0142) 4xxx 0000:2b:00.0: Ring address not aligned 4xxx 0000:2b:00.0: Failed to initialise service qat_crypto 4xxx 0000:2b:00.0: Resetting device qat_dev0 4xxx: probe of 0000:2b:00.0 failed with error -14 Fixes: 9435cce87950 ("xen/swiotlb: Add support for 64KB page granularity") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-12xen: privcmd: Fix possible access to a freed kirqfd instanceViresh Kumar1-1/+9
[ Upstream commit 611ff1b1ae989a7bcce3e2a8e132ee30e968c557 ] Nothing prevents simultaneous ioctl calls to privcmd_irqfd_assign() and privcmd_irqfd_deassign(). If that happens, it is possible that a kirqfd created and added to the irqfds_list by privcmd_irqfd_assign() may get removed by another thread executing privcmd_irqfd_deassign(), while the former is still using it after dropping the locks. This can lead to a situation where an already freed kirqfd instance may be accessed and cause kernel oops. Use SRCU locking to prevent the same, as is done for the KVM implementation for irqfds. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/9e884af1f1f842eacbb7afc5672c8feb4dea7f3f.1718703669.git.viresh.kumar@linaro.org Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-04Revert "change alloc_pages name in dma_map_ops to avoid name conflicts"Greg Kroah-Hartman2-2/+2
This reverts commit 983e6b2636f0099dbac1874c9e885bbe1cf2df05 which is commit 8a2f11878771da65b8ac135c73b47dae13afbd62 upstream. It wasn't needed and caused a build break on s390, so just revert it entirely. Reported-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20240830221217.GA3837758@thelio-3990X Cc: Suren Baghdasaryan <surenb@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-29change alloc_pages name in dma_map_ops to avoid name conflictsSuren Baghdasaryan2-2/+2
[ Upstream commit 8a2f11878771da65b8ac135c73b47dae13afbd62 ] After redefining alloc_pages, all uses of that name are being replaced. Change the conflicting names to prevent preprocessor from replacing them when it's not intended. Link: https://lkml.kernel.org/r/20240321163705.3067592-18-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 61ebe5a747da ("mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14xen: privcmd: Switch from mutex to spinlock for irqfdsViresh Kumar1-10/+15
[ Upstream commit 1c682593096a487fd9aebc079a307ff7a6d054a3 ] irqfd_wakeup() gets EPOLLHUP, when it is called by eventfd_release() by way of wake_up_poll(&ctx->wqh, EPOLLHUP), which gets called under spin_lock_irqsave(). We can't use a mutex here as it will lead to a deadlock. Fix it by switching over to a spin lock. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/a66d7a7a9001424d432f52a9fc3931a1f345464f.1718703669.git.viresh.kumar@linaro.org Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12drivers/xen: Improve the late XenStore init protocolHenry Wang1-13/+23
[ Upstream commit a3607581cd49c17128a486a526a36a97bafcb2bb ] Currently, the late XenStore init protocol is only triggered properly for the case that HVM_PARAM_STORE_PFN is ~0ULL (invalid). For the case that XenStore interface is allocated but not ready (the connection status is not XENSTORE_CONNECTED), Linux should also wait until the XenStore is set up properly. Introduce a macro to describe the XenStore interface is ready, use it in xenbus_probe_initcall() to select the code path of doing the late XenStore init protocol or not. Since now we have more than one condition for XenStore late init, rework the check in xenbus_probe() for the free_irq(). Take the opportunity to enhance the check of the allocated XenStore interface can be properly mapped, and return error early if the memremap() fails. Fixes: 5b3353949e89 ("xen: add support for initializing xenstore later as HVM domain") Signed-off-by: Henry Wang <xin.wang2@amd.com> Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/20240517011516.1451087-1-xin.wang2@amd.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-13x86/xen: attempt to inflate the memory balloon on PVHRoger Pau Monne1-2/+0
[ Upstream commit 38620fc4e8934f1801c7811ef39a041914ac4c1d ] When running as PVH or HVM Linux will use holes in the memory map as scratch space to map grants, foreign domain pages and possibly miscellaneous other stuff. However the usage of such memory map holes for Xen purposes can be problematic. The request of holesby Xen happen quite early in the kernel boot process (grant table setup already uses scratch map space), and it's possible that by then not all devices have reclaimed their MMIO space. It's not unlikely for chunks of Xen scratch map space to end up using PCI bridge MMIO window memory, which (as expected) causes quite a lot of issues in the system. At least for PVH dom0 we have the possibility of using regions marked as UNUSABLE in the e820 memory map. Either if the region is UNUSABLE in the native memory map, or it has been converted into UNUSABLE in order to hide RAM regions from dom0, the second stage translation page-tables can populate those areas without issues. PV already has this kind of logic, where the balloon driver is inflated at boot. Re-use the current logic in order to also inflate it when running as PVH. onvert UNUSABLE regions up to the ratio specified in EXTRA_MEM_RATIO to RAM, while reserving them using xen_add_extra_mem() (which is also moved so it's no longer tied to CONFIG_PV). [jgross: fixed build for CONFIG_PVH without CONFIG_XEN_PVH] Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20240220174341.56131-1-roger.pau@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27xen/events: increment refcnt only if event channel is refcountedJuergen Gross1-9/+13
[ Upstream commit d277f9d82802223f242cd9b60c988cfdda1d6be0 ] In bind_evtchn_to_irq_chip() don't increment the refcnt of the event channel blindly. In case the event channel is NOT refcounted, issue a warning instead. Add an additional safety net by doing the refcnt increment only if the caller has specified IRQF_SHARED in the irqflags parameter. Fixes: 9e90e58c11b7 ("xen: evtchn: Allow shared registration of IRQ handers") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Link: https://lore.kernel.org/r/20240313071409.25913-3-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27xen/evtchn: avoid WARN() when unbinding an event channelJuergen Gross1-0/+6
[ Upstream commit 51c23bd691c0f1fb95b29731c356c6fd69925d17 ] When unbinding a user event channel, the related handler might be called a last time in case the kernel was built with CONFIG_DEBUG_SHIRQ. This might cause a WARN() in the handler. Avoid that by adding an "unbinding" flag to struct user_event which will short circuit the handler. Fixes: 9e90e58c11b7 ("xen: evtchn: Allow shared registration of IRQ handers") Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com> Tested-by: Demi Marie Obenour <demi@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Link: https://lore.kernel.org/r/20240313071409.25913-2-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01xen/events: fix error code in xen_bind_pirq_msi_to_irq()Dan Carpenter1-1/+3
commit 7f3da4b698bcc21a6df0e7f114af71d53a3e26ac upstream. Return -ENOMEM if xen_irq_init() fails. currently the code returns an uninitialized variable or zero. Fixes: 5dd9ad32d775 ("xen/events: drop xen_allocate_irqs_dynamic()") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Juergen Gross <jgross@ssue.com> Link: https://lore.kernel.org/r/3b9ab040-a92e-4e35-b687-3a95890a9ace@moroto.mountain Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01xen/events: close evtchn after mapping cleanupMaximilian Heyne1-2/+6
[ Upstream commit fa765c4b4aed2d64266b694520ecb025c862c5a9 ] shutdown_pirq and startup_pirq are not taking the irq_mapping_update_lock because they can't due to lock inversion. Both are called with the irq_desc->lock being taking. The lock order, however, is first irq_mapping_update_lock and then irq_desc->lock. This opens multiple races: - shutdown_pirq can be interrupted by a function that allocates an event channel: CPU0 CPU1 shutdown_pirq { xen_evtchn_close(e) __startup_pirq { EVTCHNOP_bind_pirq -> returns just freed evtchn e set_evtchn_to_irq(e, irq) } xen_irq_info_cleanup() { set_evtchn_to_irq(e, -1) } } Assume here event channel e refers here to the same event channel number. After this race the evtchn_to_irq mapping for e is invalid (-1). - __startup_pirq races with __unbind_from_irq in a similar way. Because __startup_pirq doesn't take irq_mapping_update_lock it can grab the evtchn that __unbind_from_irq is currently freeing and cleaning up. In this case even though the event channel is allocated, its mapping can be unset in evtchn_to_irq. The fix is to first cleanup the mappings and then close the event channel. In this way, when an event channel gets allocated it's potential previous evtchn_to_irq mappings are guaranteed to be unset already. This is also the reverse order of the allocation where first the event channel is allocated and then the mappings are setup. On a 5.10 kernel prior to commit 3fcdaf3d7634 ("xen/events: modify internal [un]bind interfaces"), we hit a BUG like the following during probing of NVMe devices. The issue is that during nvme_setup_io_queues, pci_free_irq is called for every device which results in a call to shutdown_pirq. With many nvme devices it's therefore likely to hit this race during boot because there will be multiple calls to shutdown_pirq and startup_pirq are running potentially in parallel. ------------[ cut here ]------------ blkfront: xvda: barrier or flush: disabled; persistent grants: enabled; indirect descriptors: enabled; bounce buffer: enabled kernel BUG at drivers/xen/events/events_base.c:499! invalid opcode: 0000 [#1] SMP PTI CPU: 44 PID: 375 Comm: kworker/u257:23 Not tainted 5.10.201-191.748.amzn2.x86_64 #1 Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006 Workqueue: nvme-reset-wq nvme_reset_work RIP: 0010:bind_evtchn_to_cpu+0xdf/0xf0 Code: 5d 41 5e c3 cc cc cc cc 44 89 f7 e8 2b 55 ad ff 49 89 c5 48 85 c0 0f 84 64 ff ff ff 4c 8b 68 30 41 83 fe ff 0f 85 60 ff ff ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 RSP: 0000:ffffc9000d533b08 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006 RDX: 0000000000000028 RSI: 00000000ffffffff RDI: 00000000ffffffff RBP: ffff888107419680 R08: 0000000000000000 R09: ffffffff82d72b00 R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000001ed R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000002 FS: 0000000000000000(0000) GS:ffff88bc8b500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000002610001 CR4: 00000000001706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? show_trace_log_lvl+0x1c1/0x2d9 ? show_trace_log_lvl+0x1c1/0x2d9 ? set_affinity_irq+0xdc/0x1c0 ? __die_body.cold+0x8/0xd ? die+0x2b/0x50 ? do_trap+0x90/0x110 ? bind_evtchn_to_cpu+0xdf/0xf0 ? do_error_trap+0x65/0x80 ? bind_evtchn_to_cpu+0xdf/0xf0 ? exc_invalid_op+0x4e/0x70 ? bind_evtchn_to_cpu+0xdf/0xf0 ? asm_exc_invalid_op+0x12/0x20 ? bind_evtchn_to_cpu+0xdf/0xf0 ? bind_evtchn_to_cpu+0xc5/0xf0 set_affinity_irq+0xdc/0x1c0 irq_do_set_affinity+0x1d7/0x1f0 irq_setup_affinity+0xd6/0x1a0 irq_startup+0x8a/0xf0 __setup_irq+0x639/0x6d0 ? nvme_suspend+0x150/0x150 request_threaded_irq+0x10c/0x180 ? nvme_suspend+0x150/0x150 pci_request_irq+0xa8/0xf0 ? __blk_mq_free_request+0x74/0xa0 queue_request_irq+0x6f/0x80 nvme_create_queue+0x1af/0x200 nvme_create_io_queues+0xbd/0xf0 nvme_setup_io_queues+0x246/0x320 ? nvme_irq_check+0x30/0x30 nvme_reset_work+0x1c8/0x400 process_one_work+0x1b0/0x350 worker_thread+0x49/0x310 ? process_one_work+0x350/0x350 kthread+0x11b/0x140 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x22/0x30 Modules linked in: ---[ end trace a11715de1eee1873 ]--- Fixes: d46a78b05c0e ("xen: implement pirq type event channels") Cc: stable@vger.kernel.org Co-debugged-by: Andrew Panyakin <apanyaki@amazon.com> Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20240124163130.31324-1-mheyne@amazon.de Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01xen/events: modify internal [un]bind interfacesJuergen Gross1-135/+124
[ Upstream commit 3fcdaf3d7634338c3f5cbfa7451eb0b6b0024844 ] Modify the internal bind- and unbind-interfaces to take a struct irq_info parameter. When allocating a new IRQ pass the pointer from the allocating function further up. This will reduce the number of info_for_irq() calls and make the code more efficient. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Signed-off-by: Juergen Gross <jgross@suse.com> Stable-dep-of: fa765c4b4aed ("xen/events: close evtchn after mapping cleanup") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01xen/events: drop xen_allocate_irqs_dynamic()Juergen Gross1-30/+44
[ Upstream commit 5dd9ad32d7758b1a76742f394acf0eb3ac8a636a ] Instead of having a common function for allocating a single IRQ or a consecutive number of IRQs, split up the functionality into the callers of xen_allocate_irqs_dynamic(). This allows to handle any allocation error in xen_irq_init() gracefully instead of panicing the system. Let xen_irq_init() return the irq_info pointer or NULL in case of an allocation error. Additionally set the IRQ into irq_info already at allocation time, as otherwise the IRQ would be '0' (which is a valid IRQ number) until being set. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Signed-off-by: Juergen Gross <jgross@suse.com> Stable-dep-of: fa765c4b4aed ("xen/events: close evtchn after mapping cleanup") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01xen/events: remove some simple helpers from events_base.cJuergen Gross1-59/+38
[ Upstream commit 3bdb0ac350fe5e6301562143e4573971dd01ae0b ] The helper functions type_from_irq() and cpu_from_irq() are just one line functions used only internally. Open code them where needed. At the same time modify and rename get_evtchn_to_irq() to return a struct irq_info instead of the IRQ number. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Signed-off-by: Juergen Gross <jgross@suse.com> Stable-dep-of: fa765c4b4aed ("xen/events: close evtchn after mapping cleanup") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01xen/events: reduce externally visible helper functionsJuergen Gross3-9/+13
[ Upstream commit 686464514fbebb6c8de4415238319e414c3500a4 ] get_evtchn_to_irq() has only one external user while irq_from_evtchn() provides the same functionality and is exported for a wider user base. Modify the only external user of get_evtchn_to_irq() to use irq_from_evtchn() instead and make get_evtchn_to_irq() static. evtchn_from_irq() and irq_from_virq() have a single external user and can easily be combined to a new helper irq_evtchn_from_virq() allowing to drop irq_from_virq() and to make evtchn_from_irq() static. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Signed-off-by: Juergen Gross <jgross@suse.com> Stable-dep-of: fa765c4b4aed ("xen/events: close evtchn after mapping cleanup") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01xen: evtchn: Allow shared registration of IRQ handersViresh Kumar2-2/+3
[ Upstream commit 9e90e58c11b74c2bddac4b2702cf79d36b981278 ] Currently the handling of events is supported either in the kernel or userspace, but not both. In order to support fast delivery of interrupts from the guest to the backend, we need to handle the Queue notify part of Virtio protocol in kernel and the rest in userspace. Update the interrupt handler registration flag to IRQF_SHARED for event channels, which would allow multiple entities to bind their interrupt handler for the same event channel port. Also increment the reference count of irq_info when multiple entities try to bind event channel to irqchip, so the unbinding happens only after all the users are gone. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/99b1edfd3147c6b5d22a5139dab5861e767dc34a.1697439990.git.viresh.kumar@linaro.org Signed-off-by: Juergen Gross <jgross@suse.com> Stable-dep-of: fa765c4b4aed ("xen/events: close evtchn after mapping cleanup") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05xen/gntdev: Fix the abuse of underlying struct page in DMA-buf importOleksandr Tyshchenko1-25/+25
[ Upstream commit 2d2db7d40254d5fb53b11ebd703cd1ed0c5de7a1 ] DO NOT access the underlying struct page of an sg table exported by DMA-buf in dmabuf_imp_to_refs(), this is not allowed. Please see drivers/dma-buf/dma-buf.c:mangle_sg_table() for details. Fortunately, here (for special Xen device) we can avoid using pages and calculate gfns directly from dma addresses provided by the sg table. Suggested-by: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Daniel Vetter <daniel@ffwll.ch> Link: https://lore.kernel.org/r/20240107103426.2038075-1-olekstysh@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-03swiotlb-xen: provide the "max_mapping_size" methodKeith Busch1-0/+1
commit bff2a2d453a1b683378b4508b86b84389f551a00 upstream. There's a bug that when using the XEN hypervisor with bios with large multi-page bio vectors on NVMe, the kernel deadlocks [1]. The deadlocks are caused by inability to map a large bio vector - dma_map_sgtable always returns an error, this gets propagated to the block layer as BLK_STS_RESOURCE and the block layer retries the request indefinitely. XEN uses the swiotlb framework to map discontiguous pages into contiguous runs that are submitted to the PCIe device. The swiotlb framework has a limitation on the length of a mapping - this needs to be announced with the max_mapping_size method to make sure that the hardware drivers do not create larger mappings. Without max_mapping_size, the NVMe block driver would create large mappings that overrun the maximum mapping size. Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Link: https://lore.kernel.org/stable/ZTNH0qtmint%2FzLJZ@mail-itl/ [1] Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Suggested-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/151bef41-e817-aea9-675-a35fdac4ed@redhat.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28acpi/processor: sanitize _OSC/_PDC capabilities for Xen dom0Roger Pau Monne1-0/+22
commit bfa993b355d33a438a746523e7129391c8664e8a upstream. The Processor capability bits notify ACPI of the OS capabilities, and so ACPI can adjust the return of other Processor methods taking the OS capabilities into account. When Linux is running as a Xen dom0, the hypervisor is the entity in charge of processor power management, and hence Xen needs to make sure the capabilities reported by _OSC/_PDC match the capabilities of the driver in Xen. Introduce a small helper to sanitize the buffer when running as Xen dom0. When Xen supports HWP, this serves as the equivalent of commit a21211672c9a ("ACPI / processor: Request native thermal interrupt handling via _OSC") to avoid SMM crashes. Xen will set bit ACPI_PROC_CAP_COLLAB_PROC_PERF (bit 12) in the capability bits and the _OSC/_PDC call will apply it. [ jandryuk: Mention Xen HWP's need. Support _OSC & _PDC ] Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: stable@vger.kernel.org Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Michal Wilczynski <michal.wilczynski@intel.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20231108212517.72279-1-jandryuk@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28xen/events: fix delayed eoi list handlingJuergen Gross1-1/+3
[ Upstream commit 47d970204054f859f35a2237baa75c2d84fcf436 ] When delaying eoi handling of events, the related elements are queued into the percpu lateeoi list. In case the list isn't empty, the elements should be sorted by the time when eoi handling is to happen. Unfortunately a new element will never be queued at the start of the list, even if it has a handling time lower than all other list elements. Fix that by handling that case the same way as for an empty list. Fixes: e99502f76271 ("xen/events: defer eoi in case of excessive number of events") Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28xen/events: avoid using info_for_irq() in xen_send_IPI_one()Juergen Gross1-4/+8
[ Upstream commit e64e7c74b99ec9e439abca75f522f4b98f220bd1 ] xen_send_IPI_one() is being used by cpuhp_report_idle_dead() after it calls rcu_report_dead(), meaning that any RCU usage by xen_send_IPI_one() is a bad idea. Unfortunately xen_send_IPI_one() is using notify_remote_via_irq() today, which is using irq_get_chip_data() via info_for_irq(). And irq_get_chip_data() in turn is using a maple-tree lookup requiring RCU. Avoid this problem by caching the ipi event channels in another percpu variable, allowing the use notify_remote_via_evtchn() in xen_send_IPI_one(). Fixes: 721255b9826b ("genirq: Use a maple tree for interrupt descriptor management") Reported-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20xen-pciback: Consider INTx disabled when MSI/MSI-X is enabledMarek Marczykowski-Górecki3-25/+23
[ Upstream commit 2c269f42d0f382743ab230308b836ffe5ae9b2ae ] Linux enables MSI-X before disabling INTx, but keeps MSI-X masked until the table is filled. Then it disables INTx just before clearing MASKALL bit. Currently this approach is rejected by xen-pciback. According to the PCIe spec, device cannot use INTx when MSI/MSI-X is enabled (in other words: enabling MSI/MSI-X implicitly disables INTx). Change the logic to consider INTx disabled if MSI/MSI-X is enabled. This applies to three places: - checking currently enabled interrupts type, - transition to MSI/MSI-X - where INTx would be implicitly disabled, - clearing INTx disable bit - which can be allowed even if MSI/MSI-X is enabled, as device should consider INTx disabled anyway in that case Fixes: 5e29500eba2a ("xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too") Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20231016131348.1734721-1-marmarek@invisiblethingslab.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20xen: Make struct privcmd_irqfd's layout architecture independentViresh Kumar1-1/+1
[ Upstream commit 8dd765a5d769c521d73931850d1c8708fbc490cb ] Using indirect pointers in an ioctl command argument means that the layout is architecture specific, in particular we can't use the same one from 32-bit compat tasks. The general recommendation is to have __u64 members and use u64_to_user_ptr() to access it from the kernel if we are unable to avoid the pointers altogether. Fixes: f8941e6c4c71 ("xen: privcmd: Add support for irqfd") Reported-by: Arnd Bergmann <arnd@kernel.org> Closes: https://lore.kernel.org/all/268a2031-63b8-4c7d-b1e5-8ab83ca80b4a@app.fastmail.com/ Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/a4ef0d4a68fc858b34a81fd3f9877d9b6898eb77.1697439990.git.viresh.kumar@linaro.org Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20xenbus: fix error exit in xenbus_init()Juergen Gross1-1/+1
[ Upstream commit 44961b81a9e9059b5c0443643915386db7035227 ] In case an error occurs in xenbus_init(), xen_store_domain_type should be set to XS_UNKNOWN. Fix one instance where this action is missing. Fixes: 5b3353949e89 ("xen: add support for initializing xenstore later as HVM domain") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/202304200845.w7m4kXZr-lkp@intel.com/ Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Link: https://lore.kernel.org/r/20230822091138.4765-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-09xen/events: replace evtchn_rwlock with RCUJuergen Gross1-41/+46
In unprivileged Xen guests event handling can cause a deadlock with Xen console handling. The evtchn_rwlock and the hvc_lock are taken in opposite sequence in __hvc_poll() and in Xen console IRQ handling. Normally this is no problem, as the evtchn_rwlock is taken as a reader in both paths, but as soon as an event channel is being closed, the lock will be taken as a writer, which will cause read_lock() to block: CPU0 CPU1 CPU2 (IRQ handling) (__hvc_poll()) (closing event channel) read_lock(evtchn_rwlock) spin_lock(hvc_lock) write_lock(evtchn_rwlock) [blocks] spin_lock(hvc_lock) [blocks] read_lock(evtchn_rwlock) [blocks due to writer waiting, and not in_interrupt()] This issue can be avoided by replacing evtchn_rwlock with RCU in xen_free_irq(). Note that RCU is used only to delay freeing of the irq_info memory. There is no RCU based dereferencing or replacement of pointers involved. In order to avoid potential races between removing the irq_info reference and handling of interrupts, set the irq_info pointer to NULL only when freeing its memory. The IRQ itself must be freed at that time, too, as otherwise the same IRQ number could be allocated again before handling of the old instance would have been finished. This is XSA-441 / CVE-2023-34324. Fixes: 54c9de89895e ("xen/events: add a new "late EOI" evtchn framework") Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2023-09-19xen: simplify evtchn_do_upcall() call mazeJuergen Gross2-20/+3
There are several functions involved for performing the functionality of evtchn_do_upcall(): - __xen_evtchn_do_upcall() doing the real work - xen_hvm_evtchn_do_upcall() just being a wrapper for __xen_evtchn_do_upcall(), exposed for external callers - xen_evtchn_do_upcall() calling __xen_evtchn_do_upcall(), too, but without any user Simplify this maze by: - removing the unused xen_evtchn_do_upcall() - removing xen_hvm_evtchn_do_upcall() as the only left caller of __xen_evtchn_do_upcall(), while renaming __xen_evtchn_do_upcall() to xen_evtchn_do_upcall() Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Juergen Gross <jgross@suse.com>
2023-08-30Merge tag 'dma-mapping-6.6-2023-08-29' of ↵Linus Torvalds1-1/+1
git://git.infradead.org/users/hch/dma-mapping Pull dma-maping updates from Christoph Hellwig: - allow dynamic sizing of the swiotlb buffer, to cater for secure virtualization workloads that require all I/O to be bounce buffered (Petr Tesarik) - move a declaration to a header (Arnd Bergmann) - check for memory region overlap in dma-contiguous (Binglei Wang) - remove the somewhat dangerous runtime swiotlb-xen enablement and unexport is_swiotlb_active (Christoph Hellwig, Juergen Gross) - per-node CMA improvements (Yajun Deng) * tag 'dma-mapping-6.6-2023-08-29' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: optimize get_max_slots() swiotlb: move slot allocation explanation comment where it belongs swiotlb: search the software IO TLB only if the device makes use of it swiotlb: allocate a new memory pool when existing pools are full swiotlb: determine potential physical address limit swiotlb: if swiotlb is full, fall back to a transient memory pool swiotlb: add a flag whether SWIOTLB is allowed to grow swiotlb: separate memory pool data from other allocator data swiotlb: add documentation and rename swiotlb_do_find_slots() swiotlb: make io_tlb_default_mem local to swiotlb.c swiotlb: bail out of swiotlb_init_late() if swiotlb is already allocated dma-contiguous: check for memory region overlap dma-contiguous: support numa CMA for specified node dma-contiguous: support per-numa CMA for all architectures dma-mapping: move arch_dma_set_mask() declaration to header swiotlb: unexport is_swiotlb_active x86: always initialize xen-swiotlb when xen-pcifront is enabling xen/pci: add flag for PCI passthrough being possible
2023-08-22xen: privcmd: Add support for irqfdViresh Kumar2-2/+287
Xen provides support for injecting interrupts to the guests via the HYPERVISOR_dm_op() hypercall. The same is used by the Virtio based device backend implementations, in an inefficient manner currently. Generally, the Virtio backends are implemented to work with the Eventfd based mechanism. In order to make such backends work with Xen, another software layer needs to poll the Eventfds and raise an interrupt to the guest using the Xen based mechanism. This results in an extra context switch. This is not a new problem in Linux though. It is present with other hypervisors like KVM, etc. as well. The generic solution implemented in the kernel for them is to provide an IOCTL call to pass the interrupt details and eventfd, which lets the kernel take care of polling the eventfd and raising of the interrupt, instead of handling this in user space (which involves an extra context switch). This patch adds support to inject a specific interrupt to guest using the eventfd mechanism, by preventing the extra context switch. Inspired by existing implementations for KVM, etc.. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/8e724ac1f50c2bc1eb8da9b3ff6166f1372570aa.1692697321.git.viresh.kumar@linaro.org Signed-off-by: Juergen Gross <jgross@suse.com>
2023-08-22xen/xenbus: Avoid a lockdep warning when adding a watchPetr Pavlu1-2/+2
The following lockdep warning appears during boot on a Xen dom0 system: [ 96.388794] ====================================================== [ 96.388797] WARNING: possible circular locking dependency detected [ 96.388799] 6.4.0-rc5-default+ #8 Tainted: G EL [ 96.388803] ------------------------------------------------------ [ 96.388804] xenconsoled/1330 is trying to acquire lock: [ 96.388808] ffffffff82acdd10 (xs_watch_rwsem){++++}-{3:3}, at: register_xenbus_watch+0x45/0x140 [ 96.388847] but task is already holding lock: [ 96.388849] ffff888100c92068 (&u->msgbuffer_mutex){+.+.}-{3:3}, at: xenbus_file_write+0x2c/0x600 [ 96.388862] which lock already depends on the new lock. [ 96.388864] the existing dependency chain (in reverse order) is: [ 96.388866] -> #2 (&u->msgbuffer_mutex){+.+.}-{3:3}: [ 96.388874] __mutex_lock+0x85/0xb30 [ 96.388885] xenbus_dev_queue_reply+0x48/0x2b0 [ 96.388890] xenbus_thread+0x1d7/0x950 [ 96.388897] kthread+0xe7/0x120 [ 96.388905] ret_from_fork+0x2c/0x50 [ 96.388914] -> #1 (xs_response_mutex){+.+.}-{3:3}: [ 96.388923] __mutex_lock+0x85/0xb30 [ 96.388930] xenbus_backend_ioctl+0x56/0x1c0 [ 96.388935] __x64_sys_ioctl+0x90/0xd0 [ 96.388942] do_syscall_64+0x5c/0x90 [ 96.388950] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 96.388957] -> #0 (xs_watch_rwsem){++++}-{3:3}: [ 96.388965] __lock_acquire+0x1538/0x2260 [ 96.388972] lock_acquire+0xc6/0x2b0 [ 96.388976] down_read+0x2d/0x160 [ 96.388983] register_xenbus_watch+0x45/0x140 [ 96.388990] xenbus_file_write+0x53d/0x600 [ 96.388994] vfs_write+0xe4/0x490 [ 96.389003] ksys_write+0xb8/0xf0 [ 96.389011] do_syscall_64+0x5c/0x90 [ 96.389017] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 96.389023] other info that might help us debug this: [ 96.389025] Chain exists of: xs_watch_rwsem --> xs_response_mutex --> &u->msgbuffer_mutex [ 96.413429] Possible unsafe locking scenario: [ 96.413430] CPU0 CPU1 [ 96.413430] ---- ---- [ 96.413431] lock(&u->msgbuffer_mutex); [ 96.413432] lock(xs_response_mutex); [ 96.413433] lock(&u->msgbuffer_mutex); [ 96.413434] rlock(xs_watch_rwsem); [ 96.413436] *** DEADLOCK *** [ 96.413436] 1 lock held by xenconsoled/1330: [ 96.413438] #0: ffff888100c92068 (&u->msgbuffer_mutex){+.+.}-{3:3}, at: xenbus_file_write+0x2c/0x600 [ 96.413446] An ioctl call IOCTL_XENBUS_BACKEND_SETUP (record #1 in the report) results in calling xenbus_alloc() -> xs_suspend() which introduces ordering xs_watch_rwsem --> xs_response_mutex. The xenbus_thread() operation (record #2) creates xs_response_mutex --> &u->msgbuffer_mutex. An XS_WATCH write to the xenbus file then results in a complain about the opposite lock order &u->msgbuffer_mutex --> xs_watch_rwsem. The dependency xs_watch_rwsem --> xs_response_mutex is spurious. Avoid it and the warning by changing the ordering in xs_suspend(), first acquire xs_response_mutex and then xs_watch_rwsem. Reverse also the unlocking order in xs_suspend_cancel() for consistency, but keep xs_resume() as is because it needs to have xs_watch_rwsem unlocked only after exiting xs suspend and re-adding all watches. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230607123624.15739-1-petr.pavlu@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-08-21xen: Fix one kernel-doc commentYang Li1-1/+1
Use colon to separate parameter name from their specific meaning. silence the warning: drivers/xen/grant-table.c:1051: warning: Function parameter or member 'nr_pages' not described in 'gnttab_free_pages' Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6030 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Acked-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230731030037.123946-1-yang.lee@linux.alibaba.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-08-21xen: xenbus: Use helper function IS_ERR_OR_NULL()Li Zetao1-1/+1
Use IS_ERR_OR_NULL() to detect an error pointer or a null pointer open-coding to simplify the code. Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230817014736.3094289-1-lizetao1@huawei.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-08-21xen: Switch to use kmemdup() helperRuan Jinjie1-5/+2
Use kmemdup() helper instead of open-coding to simplify the code. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Chen Jiahao <chenjiahao16@huawei.com> Link: https://lore.kernel.org/r/20230815092434.1206386-1-ruanjinjie@huawei.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-08-21xen-pciback: Remove unused function declarationsYue Haibing2-5/+0
Commit a92336a1176b ("xen/pciback: Drop two backends, squash and cleanup some code.") declared but never implemented these functions. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230808150912.43416-1-yuehaibing@huawei.com Signed-off-by: Juergen Gross <jgross@suse.com>