kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2025-07-06	Drivers: hv: vmbus: Add utility function for querying ring size	Saurabh Sengar	2	-3/+17
	[ Upstream commit e8c4bd6c6e6b7e7b416c42806981c2a81370001e ] Add a function to query for the preferred ring buffer size of VMBus device. This will allow the drivers (eg. UIO) to allocate the most optimized ring buffer size for devices. Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1711788723-8593-2-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 0315fef2aff9 ("uio_hv_generic: Align ring size to system page") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27	Drivers: hv: Allocate interrupt and monitor pages aligned to system page ↵	Long Li	1	-6/+17
	boundary commit 09eea7ad0b8e973dcf5ed49902838e5d68177f8e upstream. There are use cases that interrupt and monitor pages are mapped to user-mode through UIO, so they need to be system page aligned. Some Hyper-V allocation APIs introduced earlier broke those requirements. Fix this by using page allocation functions directly for interrupt and monitor pages. Cc: stable@vger.kernel.org Fixes: ca48739e59df ("Drivers: hv: vmbus: Move Hyper-V page allocator to arch neutral code") Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1746492997-4599-2-git-send-email-longli@linuxonhyperv.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1746492997-4599-2-git-send-email-longli@linuxonhyperv.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-22	Drivers: hv: vmbus: Remove vmbus_sendpacket_pagebuffer()	Michael Kelley	1	-59/+0
	commit 45a442fe369e6c4e0b4aa9f63b31c3f2f9e2090e upstream. With the netvsc driver changed to use vmbus_sendpacket_mpb_desc() instead of vmbus_sendpacket_pagebuffer(), the latter has no remaining callers. Remove it. Cc: <stable@vger.kernel.org> # 6.1.x Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20250513000604.1396-6-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-22	Drivers: hv: Allow vmbus_sendpacket_mpb_desc() to create multiple ranges	Michael Kelley	1	-3/+3
	commit 380b75d3078626aadd0817de61f3143f5db6e393 upstream. vmbus_sendpacket_mpb_desc() is currently used only by the storvsc driver and is hardcoded to create a single GPA range. To allow it to also be used by the netvsc driver to create multiple GPA ranges, no longer hardcode as having a single GPA range. Allow the calling driver to specify the rangecount in the supplied descriptor. Update the storvsc driver to reflect this new approach. Cc: <stable@vger.kernel.org> # 6.1.x Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20250513000604.1396-2-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-22	Drivers: hv: vmbus: Don't release fb_mmio resource in vmbus_free_mmio()	Michael Kelley	1	-0/+13
	[ Upstream commit 73fe9073c0cc28056cb9de0c8a516dac070f1d1f ] The VMBus driver manages the MMIO space it owns via the hyperv_mmio resource tree. Because the synthetic video framebuffer portion of the MMIO space is initially setup by the Hyper-V host for each guest, the VMBus driver does an early reserve of that portion of MMIO space in the hyperv_mmio resource tree. It saves a pointer to that resource in fb_mmio. When a VMBus driver requests MMIO space and passes "true" for the "fb_overlap_ok" argument, the reserved framebuffer space is used if possible. In that case it's not necessary to do another request against the "shadow" hyperv_mmio resource tree because that resource was already requested in the early reserve steps. However, the vmbus_free_mmio() function currently does no special handling for the fb_mmio resource. When a framebuffer device is removed, or the driver is unbound, the current code for vmbus_free_mmio() releases the reserved resource, leaving fb_mmio pointing to memory that has been freed. If the same or another driver is subsequently bound to the device, vmbus_allocate_mmio() checks against fb_mmio, and potentially gets garbage. Furthermore a second unbind operation produces this "nonexistent resource" error because of the unbalanced behavior between vmbus_allocate_mmio() and vmbus_free_mmio(): [ 55.499643] resource: Trying to free nonexistent resource <0x00000000f0000000-0x00000000f07fffff> Fix this by adding logic to vmbus_free_mmio() to recognize when MMIO space in the fb_mmio reserved area would be released, and don't release it. This filtering ensures the fb_mmio resource always exists, and makes vmbus_free_mmio() more parallel with vmbus_allocate_mmio(). Fixes: be000f93e5d7 ("drivers:hv: Track allocations of children of hv_vmbus in private resource tree") Signed-off-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Link: https://lore.kernel.org/r/20250310035208.275764-1-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250310035208.275764-1-mhklinux@outlook.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-27	Drivers: hv: util: Avoid accessing a ringbuffer not initialized yet	Michael Kelley	4	-0/+23
	commit 07a756a49f4b4290b49ea46e089cbe6f79ff8d26 upstream. If the KVP (or VSS) daemon starts before the VMBus channel's ringbuffer is fully initialized, we can hit the panic below: hv_utils: Registering HyperV Utility Driver hv_vmbus: registering driver hv_utils ... BUG: kernel NULL pointer dereference, address: 0000000000000000 CPU: 44 UID: 0 PID: 2552 Comm: hv_kvp_daemon Tainted: G E 6.11.0-rc3+ #1 RIP: 0010:hv_pkt_iter_first+0x12/0xd0 Call Trace: ... vmbus_recvpacket hv_kvp_onchannelcallback vmbus_on_event tasklet_action_common tasklet_action handle_softirqs irq_exit_rcu sysvec_hyperv_stimer0 </IRQ> <TASK> asm_sysvec_hyperv_stimer0 ... kvp_register_done hvt_op_read vfs_read ksys_read __x64_sys_read This can happen because the KVP/VSS channel callback can be invoked even before the channel is fully opened: 1) as soon as hv_kvp_init() -> hvutil_transport_init() creates /dev/vmbus/hv_kvp, the kvp daemon can open the device file immediately and register itself to the driver by writing a message KVP_OP_REGISTER1 to the file (which is handled by kvp_on_msg() ->kvp_handle_handshake()) and reading the file for the driver's response, which is handled by hvt_op_read(), which calls hvt->on_read(), i.e. kvp_register_done(). 2) the problem with kvp_register_done() is that it can cause the channel callback to be called even before the channel is fully opened, and when the channel callback is starting to run, util_probe()-> vmbus_open() may have not initialized the ringbuffer yet, so the callback can hit the panic of NULL pointer dereference. To reproduce the panic consistently, we can add a "ssleep(10)" for KVP in __vmbus_open(), just before the first hv_ringbuffer_init(), and then we unload and reload the driver hv_utils, and run the daemon manually within the 10 seconds. Fix the panic by reordering the steps in util_probe() so the char dev entry used by the KVP or VSS daemon is not created until after vmbus_open() has completed. This reordering prevents the race condition from happening. Reported-by: Dexuan Cui <decui@microsoft.com> Fixes: e0fa3e5e7df6 ("Drivers: hv: utils: fix a race on userspace daemons registration") Cc: stable@vger.kernel.org Signed-off-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20241106154247.2271-3-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20241106154247.2271-3-mhklinux@outlook.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-12	Drivers: hv: vmbus: Fix rescind handling in uio_hv_generic	Naman Jain	1	-0/+1
	commit 6fd28941447bf2c8ca0f26fda612a1cabc41663f upstream. Rescind offer handling relies on rescind callbacks for some of the resources cleanup, if they are registered. It does not unregister vmbus device for the primary channel closure, when callback is registered. Without it, next onoffer does not come, rescind flag remains set and device goes to unusable state. Add logic to unregister vmbus for the primary channel in rescind callback to ensure channel removal and relid release, and to ensure that next onoffer can be received and handled properly. Cc: stable@vger.kernel.org Fixes: ca3cda6fcf1e ("uio_hv_generic: add rescind support") Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Link: https://lore.kernel.org/r/20240829071312.1595-3-namjain@linux.microsoft.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-17	Drivers: hv: vmbus: Don't free ring buffers that couldn't be re-encrypted	Michael Kelley	1	-1/+3
	[ Upstream commit 30d18df6567be09c1433e81993e35e3da573ac48 ] In CoCo VMs it is possible for the untrusted host to cause set_memory_encrypted() or set_memory_decrypted() to fail such that an error is returned and the resulting memory is shared. Callers need to take care to handle these errors to avoid returning decrypted (shared) memory to the page allocator, which could lead to functional or security issues. The VMBus ring buffer code could free decrypted/shared pages if set_memory_decrypted() fails. Check the decrypted field in the struct vmbus_gpadl for the ring buffers to decide whether to free the memory. Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/r/20240311161558.1310-6-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240311161558.1310-6-mhklinux@outlook.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-17	Drivers: hv: vmbus: Track decrypted status in vmbus_gpadl	Rick Edgecombe	1	-4/+21
	[ Upstream commit 211f514ebf1ef5de37b1cf6df9d28a56cfd242ca ] In CoCo VMs it is possible for the untrusted host to cause set_memory_encrypted() or set_memory_decrypted() to fail such that an error is returned and the resulting memory is shared. Callers need to take care to handle these errors to avoid returning decrypted (shared) memory to the page allocator, which could lead to functional or security issues. In order to make sure callers of vmbus_establish_gpadl() and vmbus_teardown_gpadl() don't return decrypted/shared pages to allocators, add a field in struct vmbus_gpadl to keep track of the decryption status of the buffers. This will allow the callers to know if they should free or leak the pages. Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/r/20240311161558.1310-3-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240311161558.1310-3-mhklinux@outlook.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-17	Drivers: hv: vmbus: Leak pages if set_memory_encrypted() fails	Rick Edgecombe	1	-7/+22
	[ Upstream commit 03f5a999adba062456c8c818a683beb1b498983a ] In CoCo VMs it is possible for the untrusted host to cause set_memory_encrypted() or set_memory_decrypted() to fail such that an error is returned and the resulting memory is shared. Callers need to take care to handle these errors to avoid returning decrypted (shared) memory to the page allocator, which could lead to functional or security issues. VMBus code could free decrypted pages if set_memory_encrypted()/decrypted() fails. Leak the pages if this happens. Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/r/20240311161558.1310-2-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240311161558.1310-2-mhklinux@outlook.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27	x86/hyperv: Use per cpu initial stack for vtl context	Saurabh Sengar	1	-0/+1
	[ Upstream commit 2b4b90e053a29057fb05ba81acce26bddce8d404 ] Currently, the secondary CPUs in Hyper-V VTL context lack support for parallel startup. Therefore, relying on the single initial_stack fetched from the current task structure suffices for all vCPUs. However, common initial_stack risks stack corruption when parallel startup is enabled. In order to facilitate parallel startup, use the initial_stack from the per CPU idle thread instead of the current task. Fixes: 3be1bc2fe9d2 ("x86/hyperv: VTL support for Hyper-V") Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1709452896-13342-1-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1709452896-13342-1-git-send-email-ssengar@linux.microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-04	Merge tag 'hyperv-next-signed-20230902' of ↵	Linus Torvalds	6	-66/+225
	git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Support for SEV-SNP guests on Hyper-V (Tianyu Lan) - Support for TDX guests on Hyper-V (Dexuan Cui) - Use SBRM API in Hyper-V balloon driver (Mitchell Levy) - Avoid dereferencing ACPI root object handle in VMBus driver (Maciej Szmigiero) - A few misecllaneous fixes (Jiapeng Chong, Nathan Chancellor, Saurabh Sengar) * tag 'hyperv-next-signed-20230902' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (24 commits) x86/hyperv: Remove duplicate include x86/hyperv: Move the code in ivm.c around to avoid unnecessary ifdef's x86/hyperv: Remove hv_isolation_type_en_snp x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM with the paravisor Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the paravisor x86/hyperv: Introduce a global variable hyperv_paravisor_present Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM x86/hyperv: Fix serial console interrupts for fully enlightened TDX guests Drivers: hv: vmbus: Support fully enlightened TDX guests x86/hyperv: Support hypercalls for fully enlightened TDX guests x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests x86/hyperv: Fix undefined reference to isolation_type_en_snp without CONFIG_HYPERV x86/hyperv: Add missing 'inline' to hv_snp_boot_ap() stub hv: hyperv.h: Replace one-element array with flexible-array member Drivers: hv: vmbus: Don't dereference ACPI root object handle x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES x86/hyperv: Add smp support for SEV-SNP guest clocksource: hyper-v: Mark hyperv tsc page unencrypted in sev-snp enlightened guest x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp enlightened guest drivers: hv: Mark percpu hvcall input arg page unencrypted in SEV-SNP enlightened guest ...
2023-08-25	x86/hyperv: Remove hv_isolation_type_en_snp	Dexuan Cui	2	-9/+3
	In ms_hyperv_init_platform(), do not distinguish between a SNP VM with the paravisor and a SNP VM without the paravisor. Replace hv_isolation_type_en_snp() with !ms_hyperv.paravisor_present && hv_isolation_type_snp(). The hv_isolation_type_en_snp() in drivers/hv/hv.c and drivers/hv/hv_common.c can be changed to hv_isolation_type_snp() since we know !ms_hyperv.paravisor_present is true there. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230824080712.30327-10-decui@microsoft.com
2023-08-25	Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the paravisor	Dexuan Cui	2	-6/+64
	The post_msg_page was removed in commit 9a6b1a170ca8 ("Drivers: hv: vmbus: Remove the per-CPU post_msg_page") However, it turns out that we need to bring it back, but only for a TDX VM with the paravisor: in such a VM, the hyperv_pcpu_input_arg is not decrypted, but the HVCALL_POST_MESSAGE in such a VM needs a decrypted page as the hypercall input page: see the comments in hyperv_init() for a detailed explanation. Except for HVCALL_POST_MESSAGE and HVCALL_SIGNAL_EVENT, the other hypercalls in a TDX VM with the paravisor still use hv_hypercall_pg and must use the hyperv_pcpu_input_arg (which is encrypted in such a VM), when a hypercall input page is used. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230824080712.30327-8-decui@microsoft.com
2023-08-25	x86/hyperv: Introduce a global variable hyperv_paravisor_present	Dexuan Cui	3	-10/+18
	The new variable hyperv_paravisor_present is set only when the VM is a SNP/TDX VM with the paravisor running: see ms_hyperv_init_platform(). We introduce hyperv_paravisor_present because we can not use ms_hyperv.paravisor_present in arch/x86/include/asm/mshyperv.h: struct ms_hyperv_info is defined in include/asm-generic/mshyperv.h, which is included at the end of arch/x86/include/asm/mshyperv.h, but at the beginning of arch/x86/include/asm/mshyperv.h, we would already need to use struct ms_hyperv_info in hv_do_hypercall(). We use hyperv_paravisor_present only in include/asm-generic/mshyperv.h, and use ms_hyperv.paravisor_present elsewhere. In the future, we'll introduce a hypercall function structure for different VM types, and at boot time, the right function pointers would be written into the structure so that runtime testing of TDX vs. SNP vs. normal will be avoided and hyperv_paravisor_present will no longer be needed. Call hv_vtom_init() when it's a VBS VM or when ms_hyperv.paravisor_present is true, i.e. the VM is a SNP VM or TDX VM with the paravisor. Enhance hv_vtom_init() for a TDX VM with the paravisor. In hv_common_cpu_init(), don't decrypt the hyperv_pcpu_input_arg for a TDX VM with the paravisor, just like we don't decrypt the page for a SNP VM with the paravisor. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230824080712.30327-7-decui@microsoft.com
2023-08-25	Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM	Dexuan Cui	1	-7/+23
	Don't set this_cpu_ptr(hyperv_pcpu_input_arg) before the function set_memory_decrypted() returns, otherwise we run into this ticky issue: For a fully enlightened TDX/SNP VM, in hv_common_cpu_init(), this_cpu_ptr(hyperv_pcpu_input_arg) is an encrypted page before the set_memory_decrypted() returns. When such a VM has more than 64 VPs, if the hyperv_pcpu_input_arg is not NULL, hv_common_cpu_init() -> set_memory_decrypted() -> ... -> cpa_flush() -> on_each_cpu() -> ... -> hv_send_ipi_mask() -> ... -> __send_ipi_mask_ex() tries to call hv_do_rep_hypercall() with the hyperv_pcpu_input_arg as the hypercall input page, which must be a decrypted page in such a VM, but the page is still encrypted at this point, and a fatal fault is triggered. Fix the issue by setting *this_cpu_ptr(hyperv_pcpu_input_arg) after set_memory_decrypted(): if the hyperv_pcpu_input_arg is NULL, __send_ipi_mask_ex() returns HV_STATUS_INVALID_PARAMETER immediately, and hv_send_ipi_mask() falls back to orig_apic.send_IPI_mask(), which can use x2apic_send_IPI_all(), which may be slightly slower than the hypercall but still works correctly in such a VM. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230824080712.30327-6-decui@microsoft.com
2023-08-25	Drivers: hv: vmbus: Support fully enlightened TDX guests	Dexuan Cui	1	-2/+7
	Add Hyper-V specific code so that a fully enlightened TDX guest (i.e. without the paravisor) can run on Hyper-V: Don't use hv_vp_assist_page. Use GHCI instead. Don't try to use the unsupported HV_REGISTER_CRASH_CTL. Don't trust (use) Hyper-V's TLB-flushing hypercalls. Don't use lazy EOI. Share the SynIC Event/Message pages with the hypervisor. Don't use the Hyper-V TSC page for now, because non-trivial work is required to share the page with the hypervisor. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230824080712.30327-4-decui@microsoft.com
2023-08-25	x86/hyperv: Support hypercalls for fully enlightened TDX guests	Dexuan Cui	1	-2/+8
	A fully enlightened TDX guest on Hyper-V (i.e. without the paravisor) only uses the GHCI call rather than hv_hypercall_pg. Do not initialize hypercall_pg for such a guest. In hv_common_cpu_init(), the hyperv_pcpu_input_arg page needs to be decrypted in such a guest. Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230824080712.30327-3-decui@microsoft.com
2023-08-25	x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests	Dexuan Cui	1	-0/+6
	No logic change to SNP/VBS guests. hv_isolation_type_tdx() will be used to instruct a TDX guest on Hyper-V to do some TDX-specific operations, e.g. for a fully enlightened TDX guest (i.e. without the paravisor), hv_do_hypercall() should use __tdx_hypercall() and such a guest on Hyper-V should handle the Hyper-V Event/Message/Monitor pages specially. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230824080712.30327-2-decui@microsoft.com
2023-08-22	Drivers: hv: vmbus: Don't dereference ACPI root object handle	Maciej S. Szmigiero	1	-1/+2
	Since the commit referenced in the Fixes: tag below the VMBus client driver is walking the ACPI namespace up from the VMBus ACPI device to the ACPI namespace root object trying to find Hyper-V MMIO ranges. However, if it is not able to find them it ends trying to walk resources of the ACPI namespace root object itself. This object has all-ones handle, which causes a NULL pointer dereference in the ACPI code (from dereferencing this pointer with an offset). This in turn causes an oops on boot with VMBus host implementations that do not provide Hyper-V MMIO ranges in their VMBus ACPI device or its ancestors. The QEMU VMBus implementation is an example of such implementation. I guess providing these ranges is optional, since all tested Windows versions seem to be able to use VMBus devices without them. Fix this by explicitly terminating the lookup at the ACPI namespace root object. Note that Linux guests under KVM/QEMU do not use the Hyper-V PV interface by default - they only do so if the KVM PV interface is missing or disabled. Example stack trace of such oops: [ 3.710827] ? __die+0x1f/0x60 [ 3.715030] ? page_fault_oops+0x159/0x460 [ 3.716008] ? exc_page_fault+0x73/0x170 [ 3.716959] ? asm_exc_page_fault+0x22/0x30 [ 3.717957] ? acpi_ns_lookup+0x7a/0x4b0 [ 3.718898] ? acpi_ns_internalize_name+0x79/0xc0 [ 3.720018] acpi_ns_get_node_unlocked+0xb5/0xe0 [ 3.721120] ? acpi_ns_check_object_type+0xfe/0x200 [ 3.722285] ? acpi_rs_convert_aml_to_resource+0x37/0x6e0 [ 3.723559] ? down_timeout+0x3a/0x60 [ 3.724455] ? acpi_ns_get_node+0x3a/0x60 [ 3.725412] acpi_ns_get_node+0x3a/0x60 [ 3.726335] acpi_ns_evaluate+0x1c3/0x2c0 [ 3.727295] acpi_ut_evaluate_object+0x64/0x1b0 [ 3.728400] acpi_rs_get_method_data+0x2b/0x70 [ 3.729476] ? vmbus_platform_driver_probe+0x1d0/0x1d0 [hv_vmbus] [ 3.730940] ? vmbus_platform_driver_probe+0x1d0/0x1d0 [hv_vmbus] [ 3.732411] acpi_walk_resources+0x78/0xd0 [ 3.733398] vmbus_platform_driver_probe+0x9f/0x1d0 [hv_vmbus] [ 3.734802] platform_probe+0x3d/0x90 [ 3.735684] really_probe+0x19b/0x400 [ 3.736570] ? __device_attach_driver+0x100/0x100 [ 3.737697] __driver_probe_device+0x78/0x160 [ 3.738746] driver_probe_device+0x1f/0x90 [ 3.739743] __driver_attach+0xc2/0x1b0 [ 3.740671] bus_for_each_dev+0x70/0xc0 [ 3.741601] bus_add_driver+0x10e/0x210 [ 3.742527] driver_register+0x55/0xf0 [ 3.744412] ? 0xffffffffc039a000 [ 3.745207] hv_acpi_init+0x3c/0x1000 [hv_vmbus] Fixes: 7f163a6fd957 ("drivers:hv: Modify hv_vmbus to search for all MMIO ranges available.") Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/fd8e64ceeecfd1d95ff49021080cf699e88dbbde.1691606267.git.maciej.szmigiero@oracle.com
2023-08-22	drivers: hv: Mark percpu hvcall input arg page unencrypted in SEV-SNP ↵	Tianyu Lan	2	-3/+67
	enlightened guest Hypervisor needs to access input arg, VMBus synic event and message pages. Mark these pages unencrypted in the SEV-SNP guest and free them only if they have been marked encrypted successfully. Reviewed-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230818102919.1318039-5-ltykernel@gmail.com
2023-08-22	x86/hyperv: Set Virtual Trust Level in VMBus init message	Tianyu Lan	1	-0/+1
	SEV-SNP guests on Hyper-V can run at multiple Virtual Trust Levels (VTL). During boot, get the VTL at which we're running using the GET_VP_REGISTERs hypercall, and save the value for future use. Then during VMBus initialization, set the VTL with the saved value as required in the VMBus init message. Reviewed-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230818102919.1318039-3-ltykernel@gmail.com
2023-08-22	x86/hyperv: Add sev-snp enlightened guest static key	Tianyu Lan	1	-0/+6
	Introduce static key isolation_type_en_snp for enlightened sev-snp guest check. Reviewed-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230818102919.1318039-2-ltykernel@gmail.com
2023-08-12	hv_balloon: Update the balloon driver to use the SBRM API	Mitchell Levy	1	-44/+38
	This patch is intended as a proof-of-concept for the new SBRM machinery[1]. For some brief background, the idea behind SBRM is using the __cleanup__ attribute to automatically unlock locks (or otherwise release resources) when they go out of scope, similar to C++ style RAII. This promises some benefits such as making code simpler (particularly where you have lots of goto fail; type constructs) as well as reducing the surface area for certain kinds of bugs. The changes in this patch should not result in any difference in how the code actually runs (i.e., it's purely an exercise in this new syntax sugar). In one instance SBRM was not appropriate, so I left that part alone, but all other locking/unlocking is handled automatically in this patch. [1] https://lore.kernel.org/all/20230626125726.GU4253@hirez.programming.kicks-ass.net/ Suggested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: "Mitchell Levy (Microsoft)" <levymitchell0@gmail.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230807-sbrm-hyperv-v2-1-9d2ac15305bd@gmail.com
2023-06-28	x86/hyperv: Improve code for referencing hyperv_pcpu_input_arg	Nischala Yelchuri	1	-1/+1
	Several places in code for Hyper-V reference the per-CPU variable hyperv_pcpu_input_arg. Older code uses a multi-line sequence to reference the variable, and usually includes a cast. Newer code does a much simpler direct assignment. The latter is preferable as the complexity of the older code is unnecessary. Update older code to use the simpler direct assignment. Signed-off-by: Nischala Yelchuri <niyelchu@linux.microsoft.com> Link: https://lore.kernel.org/r/1687286438-9421-1-git-send-email-niyelchu@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-06-28	Drivers: hv: Change hv_free_hyperv_page() to take void * argument	Kameron Carr	2	-12/+11
	Currently hv_free_hyperv_page() takes an unsigned long argument, which is inconsistent with the void * return value from the corresponding hv_alloc_hyperv_page() function and variants. This creates unnecessary extra casting. Change the hv_free_hyperv_page() argument type to void *. Also remove redundant casts from invocations of hv_alloc_hyperv_page() and variants. Signed-off-by: Kameron Carr <kameroncarr@linux.microsoft.com> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1687558189-19734-1-git-send-email-kameroncarr@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-06-18	x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline	Michael Kelley	1	-24/+24
	These commits a494aef23dfc ("PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg") 2c6ba4216844 ("PCI: hv: Enable PCI pass-thru devices in Confidential VMs") update the Hyper-V virtual PCI driver to use the hyperv_pcpu_input_arg because that memory will be correctly marked as decrypted or encrypted for all VM types (CoCo or normal). But problems ensue when CPUs in the VM go online or offline after virtual PCI devices have been configured. When a CPU is brought online, the hyperv_pcpu_input_arg for that CPU is initialized by hv_cpu_init() running under state CPUHP_AP_ONLINE_DYN. But this state occurs after state CPUHP_AP_IRQ_AFFINITY_ONLINE, which may call the virtual PCI driver and fault trying to use the as yet uninitialized hyperv_pcpu_input_arg. A similar problem occurs in a CoCo VM if the MMIO read and write hypercalls are used from state CPUHP_AP_IRQ_AFFINITY_ONLINE. When a CPU is taken offline, IRQs may be reassigned in state CPUHP_TEARDOWN_CPU. Again, the virtual PCI driver may fault trying to use the hyperv_pcpu_input_arg that has already been freed by a higher state. Fix the onlining problem by adding state CPUHP_AP_HYPERV_ONLINE immediately after CPUHP_AP_ONLINE_IDLE (similar to CPUHP_AP_KVM_ONLINE) and before CPUHP_AP_IRQ_AFFINITY_ONLINE. Use this new state for Hyper-V initialization so that hyperv_pcpu_input_arg is allocated early enough. Fix the offlining problem by not freeing hyperv_pcpu_input_arg when a CPU goes offline. Retain the allocated memory, and reuse it if the CPU comes back online later. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1684862062-51576-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-05-23	Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs	Michael Kelley	1	-2/+16
	vmbus_wait_for_unload() may be called in the panic path after other CPUs are stopped. vmbus_wait_for_unload() currently loops through online CPUs looking for the UNLOAD response message. But the values of CONFIG_KEXEC_CORE and crash_kexec_post_notifiers affect the path used to stop the other CPUs, and in one of the paths the stopped CPUs are removed from cpu_online_mask. This removal happens in both x86/x64 and arm64 architectures. In such a case, vmbus_wait_for_unload() only checks the panic'ing CPU, and misses the UNLOAD response message except when the panic'ing CPU is CPU 0. vmbus_wait_for_unload() eventually times out, but only after waiting 100 seconds. Fix this by looping through present CPUs in vmbus_wait_for_unload(). The cpu_present_mask is not modified by stopping the other CPUs in the panic path, nor should it be. Also, in a CoCo VM the synic_message_page is not allocated in hv_synic_alloc(), but is set and cleared in hv_synic_enable_regs() and hv_synic_disable_regs() such that it is set only when the CPU is online. If not all present CPUs are online when vmbus_wait_for_unload() is called, the synic_message_page might be NULL. Add a check for this. Fixes: cd95aad55793 ("Drivers: hv: vmbus: handle various crash scenarios") Cc: stable@vger.kernel.org Reported-by: John Starks <jostarks@microsoft.com> Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/1684422832-38476-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-05-08	Drivers: hv: vmbus: Call hv_synic_free() if hv_synic_alloc() fails	Dexuan Cui	1	-3/+2
	Commit 572086325ce9 ("Drivers: hv: vmbus: Cleanup synic memory free path") says "Any memory allocations that succeeded will be freed when the caller cleans up by calling hv_synic_free()", but if the get_zeroed_page() in hv_synic_alloc() fails, currently hv_synic_free() is not really called in vmbus_bus_init(), consequently there will be a memory leak, e.g. hv_context.hv_numa_map is not freed in the error path. Fix this by updating the goto labels. Cc: stable@kernel.org Signed-off-by: Dexuan Cui <decui@microsoft.com> Fixes: 4df4cb9e99f8 ("x86/hyperv: Initialize clockevents earlier in CPU onlining") Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20230504224155.10484-1-decui@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-28	Merge tag 'hyperv-next-signed-20230424' of ↵	Linus Torvalds	8	-425/+421
	git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - PCI passthrough for Hyper-V confidential VMs (Michael Kelley) - Hyper-V VTL mode support (Saurabh Sengar) - Move panic report initialization code earlier (Long Li) - Various improvements and bug fixes (Dexuan Cui and Michael Kelley) * tag 'hyperv-next-signed-20230424' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (22 commits) PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg Drivers: hv: move panic report code from vmbus to hv early init code x86/hyperv: VTL support for Hyper-V Drivers: hv: Kconfig: Add HYPERV_VTL_MODE x86/hyperv: Make hv_get_nmi_reason public x86/hyperv: Add VTL specific structs and hypercalls x86/init: Make get/set_rtc_noop() public x86/hyperv: Exclude lazy TLB mode CPUs from enlightened TLB flushes x86/hyperv: Add callback filter to cpumask_to_vpset() Drivers: hv: vmbus: Remove the per-CPU post_msg_page clocksource: hyper-v: make sure Invariant-TSC is used if it is available PCI: hv: Enable PCI pass-thru devices in Confidential VMs Drivers: hv: Don't remap addresses that are above shared_gpa_boundary hv_netvsc: Remove second mapping of send and recv buffers Drivers: hv: vmbus: Remove second way of mapping ring buffers Drivers: hv: vmbus: Remove second mapping of VMBus monitor pages swiotlb: Remove bounce buffer remapping for Hyper-V Driver: VMBus: Add Devicetree support dt-bindings: bus: Add Hyper-V VMBus Drivers: hv: vmbus: Convert acpi_device to more generic platform_device ...
2023-04-28	Merge tag 'sysctl-6.4-rc1' of ↵	Linus Torvalds	1	-10/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "This only does a few sysctl moves from the kernel/sysctl.c file, the rest of the work has been put towards deprecating two API calls which incur recursion and prevent us from simplifying the registration process / saving memory per move. Most of the changes have been soaking on linux-next since v6.3-rc3. I've slowed down the kernel/sysctl.c moves due to Matthew Wilcox's feedback that we should see if we could save memory with these moves instead of incurring more memory. We currently incur more memory since when we move a syctl from kernel/sysclt.c out to its own file we end up having to add a new empty sysctl used to register it. To achieve saving memory we want to allow syctls to be passed without requiring the end element being empty, and just have our registration process rely on ARRAY_SIZE(). Without this, supporting both styles of sysctls would make the sysctl registration pretty brittle, hard to read and maintain as can be seen from Meng Tang's efforts to do just this [0]. Fortunately, in order to use ARRAY_SIZE() for all sysctl registrations also implies doing the work to deprecate two API calls which use recursion in order to support sysctl declarations with subdirectories. And so during this development cycle quite a bit of effort went into this deprecation effort. I've annotated the following two APIs are deprecated and in few kernel releases we should be good to remove them: - register_sysctl_table() - register_sysctl_paths() During this merge window we should be able to deprecate and unexport register_sysctl_paths(), we can probably do that towards the end of this merge window. Deprecating register_sysctl_table() will take a bit more time but this pull request goes with a few example of how to do this. As it turns out each of the conversions to move away from either of these two API calls also saves memory. And so long term, all these changes will prove to have saved a bit of memory on boot. The way I see it then is if remove a user of one deprecated call, it gives us enough savings to move one kernel/sysctl.c out from the generic arrays as we end up with about the same amount of bytes. Since deprecating register_sysctl_table() and register_sysctl_paths() does not require maintainer coordination except the final unexport you'll see quite a bit of these changes from other pull requests, I've just kept the stragglers after rc3" Link: https://lkml.kernel.org/r/ZAD+cpbrqlc5vmry@bombadil.infradead.org [0] * tag 'sysctl-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (29 commits) fs: fix sysctls.c built mm: compaction: remove incorrect #ifdef checks mm: compaction: move compaction sysctl to its own file mm: memory-failure: Move memory failure sysctls to its own file arm: simplify two-level sysctl registration for ctl_isa_vars ia64: simplify one-level sysctl registration for kdump_ctl_table utsname: simplify one-level sysctl registration for uts_kern_table ntfs: simplfy one-level sysctl registration for ntfs_sysctls coda: simplify one-level sysctl registration for coda_table fs/cachefiles: simplify one-level sysctl registration for cachefiles_sysctls xfs: simplify two-level sysctl registration for xfs_table nfs: simplify two-level sysctl registration for nfs_cb_sysctls nfs: simplify two-level sysctl registration for nfs4_cb_sysctls lockd: simplify two-level sysctl registration for nlm_sysctls proc_sysctl: enhance documentation xen: simplify sysctl registration for balloon md: simplify sysctl registration hv: simplify sysctl registration scsi: simplify sysctl registration with register_sysctl() csky: simplify alignment sysctl registration ...
2023-04-27	Merge tag 'driver-core-6.4-rc1' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.4-rc1. Once again, a busy development cycle, with lots of changes happening in the driver core in the quest to be able to move "struct bus" and "struct class" into read-only memory, a task now complete with these changes. This will make the future rust interactions with the driver core more "provably correct" as well as providing more obvious lifetime rules for all busses and classes in the kernel. The changes required for this did touch many individual classes and busses as many callbacks were changed to take const * parameters instead. All of these changes have been submitted to the various subsystem maintainers, giving them plenty of time to review, and most of them actually did so. Other than those changes, included in here are a small set of other things: - kobject logging improvements - cacheinfo improvements and updates - obligatory fw_devlink updates and fixes - documentation updates - device property cleanups and const * changes - firwmare loader dependency fixes. All of these have been in linux-next for a while with no reported problems" * tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits) device property: make device_property functions take const device * driver core: update comments in device_rename() driver core: Don't require dynamic_debug for initcall_debug probe timing firmware_loader: rework crypto dependencies firmware_loader: Strip off \n from customized path zram: fix up permission for the hot_add sysfs file cacheinfo: Add use_arch[\|_cache]_info field/function arch_topology: Remove early cacheinfo error message if -ENOENT cacheinfo: Check cache properties are present in DT cacheinfo: Check sib_leaf in cache_leaves_are_shared() cacheinfo: Allow early level detection when DT/ACPI info is missing/broken cacheinfo: Add arm64 early level initializer implementation cacheinfo: Add arch specific early level initializer tty: make tty_class a static const structure driver core: class: remove struct class_interface * from callbacks driver core: class: mark the struct class in struct class_interface constant driver core: class: make class_register() take a const * driver core: class: mark class_release() as taking a const * driver core: remove incorrect comment for device_create* MIPS: vpe-cmp: remove module owner pointer from struct class usage. ...
2023-04-25	Merge tag 'x86_sev_for_v6.4_rc1' of ↵	Linus Torvalds	2	-2/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - Add the necessary glue so that the kernel can run as a confidential SEV-SNP vTOM guest on Hyper-V. A vTOM guest basically splits the address space in two parts: encrypted and unencrypted. The use case being running unmodified guests on the Hyper-V confidential computing hypervisor - Double-buffer messages between the guest and the hardware PSP device so that no partial buffers are copied back'n'forth and thus potential message integrity and leak attacks are possible - Name the return value the sev-guest driver returns when the hw PSP device hasn't been called, explicitly - Cleanups * tag 'x86_sev_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/hyperv: Change vTOM handling to use standard coco mechanisms init: Call mem_encrypt_init() after Hyper-V hypercall init is done x86/mm: Handle decryption/re-encryption of bss_decrypted consistently Drivers: hv: Explicitly request decrypted in vmap_pfn() calls x86/hyperv: Reorder code to facilitate future work x86/ioremap: Add hypervisor callback for private MMIO mapping in coco VM x86/sev: Change snp_guest_issue_request()'s fw_err argument virt/coco/sev-guest: Double-buffer messages crypto: ccp: Get rid of __sev_platform_init_locked()'s local function pointer crypto: ccp - Name -1 return value as SEV_RET_NO_FW_CALL
2023-04-21	Drivers: hv: move panic report code from vmbus to hv early init code	Long Li	3	-235/+231
	The panic reporting code was added in commit 81b18bce48af ("Drivers: HV: Send one page worth of kmsg dump over Hyper-V during panic") It was added to the vmbus driver. The panic reporting has no dependence on vmbus, and can be enabled at an earlier boot time when Hyper-V is initialized. This patch moves the panic reporting code out of vmbus. There is no functionality changes. During moving, also refactored some cleanup functions into hv_kmsg_dump_unregister(). Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1682030946-6372-1-git-send-email-longli@linuxonhyperv.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-18	Drivers: hv: Kconfig: Add HYPERV_VTL_MODE	Saurabh Sengar	1	-0/+24
	Add HYPERV_VTL_MODE Kconfig flag for VTL mode. Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1681192532-15460-5-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-17	Drivers: hv: vmbus: Remove the per-CPU post_msg_page	Dexuan Cui	2	-19/+5
	The post_msg_page was introduced in 2014 in commit b29ef3546aec ("Drivers: hv: vmbus: Cleanup hv_post_message()") Commit 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments") introduced the hyperv_pcpu_input_arg in 2018, which can be used in hv_post_message(). Remove post_msg_page to simplify the code a little bit. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Jinank Jain <jinankjain@linux.microsoft.com> Link: https://lore.kernel.org/r/20230408213441.15472-1-decui@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-17	PCI: hv: Enable PCI pass-thru devices in Confidential VMs	Michael Kelley	1	-1/+1
	For PCI pass-thru devices in a Confidential VM, Hyper-V requires that PCI config space be accessed via hypercalls. In normal VMs, config space accesses are trapped to the Hyper-V host and emulated. But in a confidential VM, the host can't access guest memory to decode the instruction for emulation, so an explicit hypercall must be used. Add functions to make the new MMIO read and MMIO write hypercalls. Update the PCI config space access functions to use the hypercalls when such use is indicated by Hyper-V flags. Also, set the flag to allow the Hyper-V PCI driver to be loaded and used in a Confidential VM (a.k.a., "Isolation VM"). The driver has previously been hardened against a malicious Hyper-V host[1]. [1] https://lore.kernel.org/all/20220511223207.3386-2-parri.andrea@gmail.com/ Co-developed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://lore.kernel.org/r/1679838727-87310-13-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-17	Drivers: hv: Don't remap addresses that are above shared_gpa_boundary	Michael Kelley	1	-10/+13
	With the vTOM bit now treated as a protection flag and not part of the physical address, avoid remapping physical addresses with vTOM set since technically such addresses aren't valid. Use ioremap_cache() instead of memremap() to ensure that the mapping provides decrypted access, which will correctly set the vTOM bit as a protection flag. While this change is not required for correctness with the current implementation of memremap(), for general code hygiene it's better to not depend on the mapping functions doing something reasonable with a physical address that is out-of-range. While here, fix typos in two error messages. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/1679838727-87310-12-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-17	hv_netvsc: Remove second mapping of send and recv buffers	Michael Kelley	2	-12/+0
	With changes to how Hyper-V guest VMs flip memory between private (encrypted) and shared (decrypted), creating a second kernel virtual mapping for shared memory is no longer necessary. Everything needed for the transition to shared is handled by set_memory_decrypted(). As such, remove the code to create and manage the second mapping for the pre-allocated send and recv buffers. This mapping is the last user of hv_map_memory()/hv_unmap_memory(), so delete these functions as well. Finally, hv_map_memory() is the last user of vmap_pfn() in Hyper-V guest code, so remove the Kconfig selection of VMAP_PFN. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/1679838727-87310-11-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-17	Drivers: hv: vmbus: Remove second way of mapping ring buffers	Michael Kelley	1	-42/+20
	With changes to how Hyper-V guest VMs flip memory between private (encrypted) and shared (decrypted), it's no longer necessary to have separate code paths for mapping VMBus ring buffers for for normal VMs and for Confidential VMs. As such, remove the code path that uses vmap_pfn(), and set the protection flags argument to vmap() to account for the difference between normal and Confidential VMs. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/1679838727-87310-10-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-17	Drivers: hv: vmbus: Remove second mapping of VMBus monitor pages	Michael Kelley	2	-87/+28
	With changes to how Hyper-V guest VMs flip memory between private (encrypted) and shared (decrypted), creating a second kernel virtual mapping for shared memory is no longer necessary. Everything needed for the transition to shared is handled by set_memory_decrypted(). As such, remove the code to create and manage the second mapping for VMBus monitor pages. Because set_memory_decrypted() and set_memory_encrypted() are no-ops in normal VMs, it's not even necessary to test for being in a Confidential VM (a.k.a., "Isolation VM"). Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/1679838727-87310-9-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-17	Merge remote-tracking branch 'tip/x86/sev' into hyperv-next	Wei Liu	2	-2/+1
	Merge the following 6 patches from tip/x86/sev, which are taken from Michael Kelley's series [0]. The rest of Michael's series depend on them. x86/hyperv: Change vTOM handling to use standard coco mechanisms init: Call mem_encrypt_init() after Hyper-V hypercall init is done x86/mm: Handle decryption/re-encryption of bss_decrypted consistently Drivers: hv: Explicitly request decrypted in vmap_pfn() calls x86/hyperv: Reorder code to facilitate future work x86/ioremap: Add hypervisor callback for private MMIO mapping in coco VM 0: https://lore.kernel.org/linux-hyperv/1679838727-87310-1-git-send-email-mikelley@microsoft.com/
2023-04-17	Driver: VMBus: Add Devicetree support	Saurabh Sengar	2	-6/+65
	Update the driver to support Devicetree boot as well along with ACPI. At present the Devicetree parsing only provides the mmio region info and is not the exact copy of ACPI parsing. This is sufficient to cater all the current Devicetree usecases for VMBus. Currently Devicetree is supported only for x86 systems. Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1679298460-11855-6-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-17	Drivers: hv: vmbus: Convert acpi_device to more generic platform_device	Saurabh Sengar	1	-23/+35
	VMBus driver code currently has direct dependency on ACPI and struct acpi_device. As a staging step toward optionally configuring based on Devicetree instead of ACPI, use a more generic platform device to reduce the dependency on ACPI where possible, though the dependency on ACPI is not completely removed. Also rename the function vmbus_acpi_remove() to the more generic vmbus_mmio_remove(). Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1679298460-11855-4-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-13	hv: simplify sysctl registration	Luis Chamberlain	1	-10/+1
	register_sysctl_table() is a deprecated compatibility wrapper. register_sysctl() can do the directory creation for you so just use that. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Wei Liu <wei.liu@kernel.org>
2023-03-27	x86/hyperv: Change vTOM handling to use standard coco mechanisms	Michael Kelley	1	-1/+0
	Hyper-V guests on AMD SEV-SNP hardware have the option of using the "virtual Top Of Memory" (vTOM) feature specified by the SEV-SNP architecture. With vTOM, shared vs. private memory accesses are controlled by splitting the guest physical address space into two halves. vTOM is the dividing line where the uppermost bit of the physical address space is set; e.g., with 47 bits of guest physical address space, vTOM is 0x400000000000 (bit 46 is set). Guest physical memory is accessible at two parallel physical addresses -- one below vTOM and one above vTOM. Accesses below vTOM are private (encrypted) while accesses above vTOM are shared (decrypted). In this sense, vTOM is like the GPA.SHARED bit in Intel TDX. Support for Hyper-V guests using vTOM was added to the Linux kernel in two patch sets[1][2]. This support treats the vTOM bit as part of the physical address. For accessing shared (decrypted) memory, these patch sets create a second kernel virtual mapping that maps to physical addresses above vTOM. A better approach is to treat the vTOM bit as a protection flag, not as part of the physical address. This new approach is like the approach for the GPA.SHARED bit in Intel TDX. Rather than creating a second kernel virtual mapping, the existing mapping is updated using recently added coco mechanisms. When memory is changed between private and shared using set_memory_decrypted() and set_memory_encrypted(), the PTEs for the existing kernel mapping are changed to add or remove the vTOM bit in the guest physical address, just as with TDX. The hypercalls to change the memory status on the host side are made using the existing callback mechanism. Everything just works, with a minor tweak to map the IO-APIC to use private accesses. To accomplish the switch in approach, the following must be done: * Update Hyper-V initialization to set the cc_mask based on vTOM and do other coco initialization. * Update physical_mask so the vTOM bit is no longer treated as part of the physical address * Remove CC_VENDOR_HYPERV and merge the associated vTOM functionality under CC_VENDOR_AMD. Update cc_mkenc() and cc_mkdec() to set/clear the vTOM bit as a protection flag. * Code already exists to make hypercalls to inform Hyper-V about pages changing between shared and private. Update this code to run as a callback from __set_memory_enc_pgtable(). * Remove the Hyper-V special case from __set_memory_enc_dec() * Remove the Hyper-V specific call to swiotlb_update_mem_attributes() since mem_encrypt_init() will now do it. * Add a Hyper-V specific implementation of the is_private_mmio() callback that returns true for the IO-APIC and vTPM MMIO addresses [1] https://lore.kernel.org/all/20211025122116.264793-1-ltykernel@gmail.com/ [2] https://lore.kernel.org/all/20211213071407.314309-1-ltykernel@gmail.com/ [ bp: Touchups. ] Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/1679838727-87310-7-git-send-email-mikelley@microsoft.com
2023-03-27	Drivers: hv: Explicitly request decrypted in vmap_pfn() calls	Michael Kelley	1	-1/+1
	Update vmap_pfn() calls to explicitly request that the mapping be for decrypted access to the memory. There's no change in functionality since the PFNs passed to vmap_pfn() are above the shared_gpa_boundary, implicitly producing a decrypted mapping. But explicitly requesting "decrypted" allows the code to work before and after changes that cause vmap_pfn() to mask the PFNs to being below the shared_gpa_boundary. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/1679838727-87310-4-git-send-email-mikelley@microsoft.com
2023-03-23	driver core: bus: mark the struct bus_type for sysfs callbacks as constant	Greg Kroah-Hartman	1	-1/+1
	struct bus_type should never be modified in a sysfs callback as there is nothing in the structure to modify, and frankly, the structure is almost never used in a sysfs callback, so mark it as constant to allow struct bus_type to be moved to read-only memory. Cc: "David S. Miller" <davem@davemloft.net> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Bounine <alex.bou9@gmail.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Ben Widawsky <bwidawsk@kernel.org> Cc: Dexuan Cui <decui@microsoft.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Harald Freudenberger <freude@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hu Haowen <src.res@email.cn> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Stuart Yoder <stuyoder@gmail.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Acked-by: Ilya Dryomov <idryomov@gmail.com> # rbd Acked-by: Ira Weiny <ira.weiny@intel.com> # cxl Reviewed-by: Alex Shi <alexs@kernel.org> Acked-by: Iwona Winiarska <iwona.winiarska@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> # scsi Link: https://lore.kernel.org/r/20230313182918.1312597-23-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-06	Drivers: vmbus: Check for channel allocation before looking up relids	Mohammed Gamal	1	-0/+4
	relid2channel() assumes vmbus channel array to be allocated when called. However, in cases such as kdump/kexec, not all relids will be reset by the host. When the second kernel boots and if the guest receives a vmbus interrupt during vmbus driver initialization before vmbus_connect() is called, before it finishes, or if it fails, the vmbus interrupt service routine is called which in turn calls relid2channel() and can cause a null pointer dereference. Print a warning and error out in relid2channel() for a channel id that's invalid in the second kernel. Fixes: 8b6a877c060e ("Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels") Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/20230217204411.212709-1-mgamal@redhat.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-02-24	Merge tag 'driver-core-6.3-rc1' of ↵	Linus Torvalds	1	-2/+2
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.3-rc1. There's a lot of changes this development cycle, most of the work falls into two different categories: - fw_devlink fixes and updates. This has gone through numerous review cycles and lots of review and testing by lots of different devices. Hopefully all should be good now, and Saravana will be keeping a watch for any potential regression on odd embedded systems. - driver core changes to work to make struct bus_type able to be moved into read-only memory (i.e. const) The recent work with Rust has pointed out a number of areas in the driver core where we are passing around and working with structures that really do not have to be dynamic at all, and they should be able to be read-only making things safer overall. This is the contuation of that work (started last release with kobject changes) in moving struct bus_type to be constant. We didn't quite make it for this release, but the remaining patches will be finished up for the release after this one, but the groundwork has been laid for this effort. Other than that we have in here: - debugfs memory leak fixes in some subsystems - error path cleanups and fixes for some never-able-to-be-hit codepaths. - cacheinfo rework and fixes - Other tiny fixes, full details are in the shortlog All of these have been in linux-next for a while with no reported problems" [ Geert Uytterhoeven points out that that last sentence isn't true, and that there's a pending report that has a fix that is queued up - Linus ] * tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (124 commits) debugfs: drop inline constant formatting for ERR_PTR(-ERROR) OPP: fix error checking in opp_migrate_dentry() debugfs: update comment of debugfs_rename() i3c: fix device.h kernel-doc warnings dma-mapping: no need to pass a bus_type into get_arch_dma_ops() driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place Revert "driver core: add error handling for devtmpfs_create_node()" Revert "devtmpfs: add debug info to handle()" Revert "devtmpfs: remove return value of devtmpfs_delete_node()" driver core: cpu: don't hand-override the uevent bus_type callback. devtmpfs: remove return value of devtmpfs_delete_node() devtmpfs: add debug info to handle() driver core: add error handling for devtmpfs_create_node() driver core: bus: update my copyright notice driver core: bus: add bus_get_dev_root() function driver core: bus: constify bus_unregister() driver core: bus: constify some internal functions driver core: bus: constify bus_get_kset() driver core: bus: constify bus_register/unregister_notifier() driver core: remove private pointer from struct bus_type ...