kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2026-01-12	selftests: ublk: add utility to get block device metadata size	Caleb Sander Mateos	2	-2/+39
	Some block device integrity parameters are available in sysfs, but others are only accessible using the FS_IOC_GETLBMD_CAP ioctl. Add a metadata_size utility program to print out the logical block metadata size, PI offset, and PI size within the metadata. Example output: $ metadata_size /dev/ublkb0 metadata_size: 64 pi_offset: 56 pi_tuple_size: 8 Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12	selftests: ublk: display UBLK_F_INTEGRITY support	Caleb Sander Mateos	1	-0/+1
	Add support for printing the UBLK_F_INTEGRITY feature flag in the human-readable kublk features output. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-11	Merge branch 'block-6.19' into for-7.0/block	Jens Axboe	3	-6/+83
	Merge in fixes that went to 6.19 after for-7.0/block was branched. Pending ublk changes depend on particularly the async scan work. * block-6.19: block: zero non-PI portion of auto integrity buffer ublk: fix use-after-free in ublk_partition_scan_work blk-mq: avoid stall during boot due to synchronize_rcu_expedited loop: add missing bd_abort_claiming in loop_set_status block: don't merge bios with different app_tags blk-rq-qos: Remove unlikely() hints from QoS checks loop: don't change loop device under exclusive opener in loop_set_status block, bfq: update outdated comment blk-mq: skip CPU offline notify on unmapped hctx selftests/ublk: fix Makefile to rebuild on header changes selftests/ublk: add test for async partition scan ublk: scan partition in async way block,bfq: fix aux stat accumulation destination md: Fix forward incompatibility from configurable logical block size md: Fix logical_block_size configuration being overwritten md: suspend array while updating raid_disks via sysfs md/raid5: fix possible null-pointer dereferences in raid5_store_group_thread_cnt() md: Fix static checker warning in analyze_sbs
2026-01-11	tools/nolibc: Add a simple test for writing to a FILE and reading it back	Daniel Palmer	1	-0/+53
	Add a test that exercises create->write->seek->read to check that using the stream functions (fwrite() etc) is not totally broken. The only edge cases this is testing for are: - Reading the file after writing but without rewinding reads nothing. - Trying to read more items than the file contains returns the count of fully read items. Signed-off-by: Daniel Palmer <daniel@thingy.jp> Link: https://patch.msgid.link/20260105023629.1502801-4-daniel@thingy.jp Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2026-01-11	selftests/nolibc: also test libc-test through regular selftest framework	Thomas Weißschuh	1	-1/+5
	Hook up libc-test to the regular selftest build to make sure nolibc-test.c stays compatible with a normal libc. As the pattern rule from lib.mk does not handle compiling a target from a differently named source file, add an explicit rule definition. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://patch.msgid.link/20260106-nolibc-selftests-v1-3-f82101c2c505@weissschuh.net
2026-01-11	selftests/nolibc: scope custom flags to the nolibc-test target	Thomas Weißschuh	1	-6/+2
	A new target for 'libc-test' is going to be added which should not be affected by these options. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://patch.msgid.link/20260106-nolibc-selftests-v1-2-f82101c2c505@weissschuh.net
2026-01-11	selftests/nolibc: try to read from stdin in readv_zero test	Thomas Weißschuh	1	-1/+1
	When stdout is redirected to a file this test fails. This happens when running through the kselftest runner since commit d9e6269e3303 ("selftests/run_kselftest.sh: exit with error if tests fail"). For consistency with other tests that read from a file descriptor, switch to stdin over stdout. The tests are still brittle against a redirected stdin, but at least they are now consistently so. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://patch.msgid.link/20260106-nolibc-selftests-v1-1-f82101c2c505@weissschuh.net
2026-01-11	Merge tag 'linux_kselftest-fixes-6.19-rc5' of ↵	Linus Torvalds	1	-1/+17
	git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fix from Shuah Khan: "Fix tracing test_multiple_writes stalls when buffer_size_kb is less than 12KB" * tag 'linux_kselftest-fixes-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/tracing: Fix test_multiple_writes stall
2026-01-11	selftests: net: py: ensure defer() is only used within a test case	Jakub Kicinski	2	-0/+10
	I wasted a couple of hours recently after accidentally adding a defer() from within a function which itself was called as part of defer(). This leads to an infinite loop of defer(). Make sure this cannot happen and raise a helpful exception. I understand that the pair of _ksft_defer_arm() calls may not be the most Pythonic way to implement this, but it's easy enough to understand. Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20260108225257.2684238-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-11	selftests: net: py: capitalize defer queue and improve import	Jakub Kicinski	2	-6/+6
	Import utils and refer to the global defer queue that way instead of importing the queue. This will make it possible to assign value to the global variable. While at it capitalize the name, to comply with the Python coding style. Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20260108225257.2684238-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-11	selftests/net: parametrise iou-zcrx.py with ksft_variants	David Wei	1	-89/+73
	Use ksft_variants to parametrise tests in iou-zcrx.py to either use single queues or RSS contexts, reducing duplication. Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20260108234521.3619621-1-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-11	selftests: drv-net: psp: Better control the used PSP dev	Cosmin Ratiu	3	-29/+26
	The PSP responder fails when zero or multiple PSP devices are detected. There's an option to select the device id to use (-d) but it's currently not used from the PSP self test. It's also hard to use because the PSP test doesn't dump the PSP devices so can't choose one. When zero devices are detected, psp_responder fails which will cause the parent test to fail as well instead of skipping PSP tests. Fix both of these problems. Change psp_responder to: - not fail when no PSP devs are detected. - get an optional -i ifindex argument instead of -d. - select the correct PSP dev from the dump corresponding to ifindex or - select the first PSP dev when -i is not given. - fail when multiple devs are found and -i is not given. - warn and continue when the requested ifindex is not found. Also plumb the ifindex from the Python test. With these, when there are no PSP devs found or the wrong one is chosen, psp_responder opens the server socket, listens for control connections normally, and leaves the skipping of the various test cases which require a PSP device (~most, but not all of them) to the parent test. This results in output like: ok 1 psp.test_case # SKIP No PSP devices found [...] ok 12 psp.dev_get_device # SKIP No PSP devices found ok 13 psp.dev_get_device_bad ok 14 psp.dev_rotate # SKIP No PSP devices found [...] Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Link: https://patch.msgid.link/20260109110851.2952906-2-cratiu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-10	vsock/test: add a final full barrier after run all tests	Stefano Garzarella	1	-0/+12
	If the last test fails, the other side still completes correctly, which could lead to false positives. Let's add a final barrier that ensures that the last test has finished correctly on both sides, but also that the two sides agree on the number of tests to be performed. Fixes: 2f65b44e199c ("VSOCK: add full barrier between test cases") Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260108114419.52747-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-10	selftests: kvm: Verify TILELOADD actually #NM faults when XFD[18]=1	Sean Christopherson	1	-12/+18
	Rework the AMX test's #NM handling to use kvm_asm_safe() to verify an #NM actually occurs. As is, a completely missing #NM could go unnoticed. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-01-10	selftests: kvm: try getting XFD and XSAVE state out of sync	Paolo Bonzini	1	-8/+30
	The host is allowed to set FPU state that includes a disabled xstate component. Check that this does not cause bad effects. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-01-10	selftests: kvm: replace numbered sync points with actions	Paolo Bonzini	1	-45/+43
	Rework the guest=>host syncs in the AMX test to use named actions instead of arbitrary, incrementing numbers. The "stage" of the test has no real meaning, what matters is what action the test wants the host to perform. The incrementing numbers are somewhat helpful for triaging failures, but fully debugging failures almost always requires a much deeper dive into the test (and KVM). Using named actions not only makes it easier to extend the test without having to shift all sync point numbers, it makes the code easier to read. [Commit message by Sean Christopherson] Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-01-10	selftests: forwarding: update PTP tcpdump patterns	Jakub Kicinski	1	-9/+9
	Recent version of tcpdump (tcpdump-4.99.6-1.fc43.x86_64) seems to have removed the spurious space after msg type in PTP info, e.g.: before: PTPv2, majorSdoId: 0x0, msg type : sync msg, length: 44 after: PTPv2, majorSdoId: 0x0, msg type: sync msg, length: 44 Update our patterns to match both. Reviewed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Link: https://patch.msgid.link/20260107145320.1837464-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-10	selftests: drv-net: gro: increase the rcvbuf size	Jakub Kicinski	1	-1/+24
	The gro.py test (testing software GRO) is slightly flaky when running against fbnic. We see one flake per roughly 20 runs in NIPA, mostly in ipip.large, and always including some EAGAIN: # Shouldn't coalesce if exceed IP max pkt size: Test succeeded # Expected {65475 899 }, Total 2 packets # Received {65475 899 }, Total 2 packets. # Expected {64576 900 900 }, Total 3 packets # Received {64576 /home/virtme/testing/wt-24/tools/testing/selftests/drivers/net/gro: could not receive: Resource temporarily unavailable The test sends 2 large frames (64k + change). Looks like the default packet socket rcvbuf (~200kB) may not be large enough to hold them. Bump the rcvbuf to 1MB. Add a debug print showing socket statistics to make debugging this issue easier in the future. Without the rcvbuf increase we see: # Shouldn't coalesce if exceed IP max pkt size: Test succeeded # Expected {65475 899 }, Total 2 packets # Received {65475 899 }, Total 2 packets. # Expected {64576 900 900 }, Total 3 packets # Received {64576 Socket stats: packets=7, drops=3 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # /home/virtme/testing/wt-24/tools/testing/selftests/drivers/net/gro: could not receive: Resource temporarily unavailable Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260107232557.2147760-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-10	selftests: tls: avoid flakiness in data_steal	Jakub Kicinski	1	-4/+12
	We see the following failure a few times a week: # RUN global.data_steal ... # tls.c:3280:data_steal:Expected recv(cfd, buf2, sizeof(buf2), MSG_DONTWAIT) (10000) == -1 (-1) # data_steal: Test failed # FAIL global.data_steal not ok 8 global.data_steal The 10000 bytes read suggests that the child process did a recv() of half of the data using the TLS ULP and we're now getting the remaining half. The intent of the test is to get the child to enter _TCP_ recvmsg handler, so it needs to enter the syscall before parent installed the TLS recvmsg with setsockopt(SOL_TLS). Instead of the 10msec sleep send 1 byte of data and wait for the child to consume it. Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20260106200205.1593915-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-10	selftests/resctrl: Fix non-contiguous CBM check for Hygon	Xiaochen Shen	1	-2/+4
	The resctrl selftest currently fails on Hygon CPUs that always supports non-contiguous CBM, printing the error: "# Hardware and kernel differ on non-contiguous CBM support!" This occurs because the arch_supports_noncont_cat() function lacks vendor detection for Hygon CPUs, preventing proper identification of their non-contiguous CBM capability. Fix this by adding Hygon vendor ID detection to arch_supports_noncont_cat(). Link: https://lore.kernel.org/r/20251217030456.3834956-5-shenxiaochen@open-hieco.net Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-10	selftests/resctrl: Add CPU vendor detection for Hygon	Xiaochen Shen	2	-0/+3
	The resctrl selftest currently fails on Hygon CPUs that support Platform QoS features, printing the error: "# Can not get vendor info..." This occurs because vendor detection is missing for Hygon CPUs. Fix this by extending the CPU vendor detection logic to include Hygon's vendor ID. Link: https://lore.kernel.org/r/20251217030456.3834956-4-shenxiaochen@open-hieco.net Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-10	selftests/resctrl: Define CPU vendor IDs as bits to match usage	Xiaochen Shen	2	-11/+22
	The CPU vendor IDs are required to be unique bits because they're used for vendor_specific bitmask in the struct resctrl_test. Consider for example their usage in test_vendor_specific_check(): return get_vendor() & test->vendor_specific However, the definitions of CPU vendor IDs in file resctrl.h is quite subtle as a bitmask value: #define ARCH_INTEL 1 #define ARCH_AMD 2 A clearer and more maintainable approach is to define these CPU vendor IDs using BIT(). This ensures each vendor corresponds to a distinct bit and makes it obvious when adding new vendor IDs. Accordingly, update the return types of detect_vendor() and get_vendor() from 'int' to 'unsigned int' to align with their usage as bitmask values and to prevent potentially risky type conversions. Furthermore, introduce a bool flag 'initialized' to simplify the get_vendor() -> detect_vendor() logic. This ensures the vendor ID is detected only once and resolves the ambiguity of using the same variable 'vendor' both as a value and as a state. Link: https://lore.kernel.org/r/20251217030456.3834956-3-shenxiaochen@open-hieco.net Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Suggested-by: Fenghua Yu <fenghuay@nvidia.com> Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-10	selftests/resctrl: Fix a division by zero error on Hygon	Xiaochen Shen	1	-0/+10
	Change to adjust effective L3 cache size with SNC enabled change introduced the snc_nodes_per_l3_cache() function to detect the Intel Sub-NUMA Clustering (SNC) feature by comparing #CPUs in node0 with #CPUs sharing LLC with CPU0. The function was designed to return: (1) >1: SNC mode is enabled. (2) 1: SNC mode is not enabled or not supported. However, on certain Hygon CPUs, #CPUs sharing LLC with CPU0 is actually less than #CPUs in node0. This results in snc_nodes_per_l3_cache() returning 0 (calculated as cache_cpus / node_cpus). This leads to a division by zero error in get_cache_size(): *cache_size /= snc_nodes_per_l3_cache(); Causing the resctrl selftest to fail with: "Floating point exception (core dumped)" Fix the issue by ensuring snc_nodes_per_l3_cache() returns 1 when SNC mode is not supported on the platform. Updated commit log to fix commit has issues: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20251217030456.3834956-2-shenxiaochen@open-hieco.net Fixes: a1cd99e700ec ("selftests/resctrl: Adjust effective L3 cache size with SNC enabled") Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-10	selftests/tracing: Fix test_multiple_writes stall	Fushuai Wang	1	-1/+17
	When /sys/kernel/tracing/buffer_size_kb is less than 12KB, the test_multiple_writes test will stall and wait for more input due to insufficient buffer space. Check current buffer_size_kb value before the test. If it is less than 12KB, it temporarily increase the buffer to 12KB, and restore the original value after the tests are completed. Link: https://lore.kernel.org/r/20260109033620.25727-1-fushuai.wang@linux.dev Fixes: 37f46601383a ("selftests/tracing: Add basic test for trace_marker_raw file") Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Fushuai Wang <wangfushuai@baidu.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-09	selftests/landlock: Properly close a file descriptor	Günther Noack	1	-1/+2
	Add a missing close(srv_fd) call, and use EXPECT_EQ() to check the result. Signed-off-by: Günther Noack <gnoack3000@gmail.com> Fixes: f83d51a5bdfe ("selftests/landlock: Check IOCTL restrictions for named UNIX domain sockets") Link: https://lore.kernel.org/r/20260101134102.25938-2-gnoack3000@gmail.com [mic: Use EXPECT_EQ() and update commit message] Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-01-09	memblock: drop redundant 'struct page *' argument from memblock_free_pages()	Shengming Hu	1	-2/+1
	memblock_free_pages() currently takes both a struct page * and the corresponding PFN. The page pointer is always derived from the PFN at call sites (pfn_to_page(pfn)), making the parameter redundant and also allowing accidental mismatches between the two arguments. Simplify the interface by removing the struct page * argument and deriving the page locally from the PFN, after the deferred struct page initialization check. This keeps the behavior unchanged while making the helper harder to misuse. Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn> Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org> Link: https://patch.msgid.link/tencent_F741CE6ECC49EE099736685E60C0DBD4A209@qq.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-01-08	KVM: selftests: Extend vmx_set_nested_state_test to cover SVM	Yosry Ahmed	2	-12/+112
	Add test cases for the validation checks in svm_set_nested_state(), and allow the test to run with SVM as well as VMX. The SVM test also makes sure that KVM_SET_NESTED_STATE accepts GIF being set or cleared if EFER.SVME is cleared, verifying a recently fixed bug where GIF was incorrectly expected to always be set when EFER.SVME is cleared. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251121204803.991707-5-yosry.ahmed@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Use TEST_ASSERT_EQ() in test_vmx_nested_state()	Yosry Ahmed	1	-2/+4
	The assert messages do not add much value, so use TEST_ASSERT_EQ(), which also nicely displays the addresses in hex. While at it, also assert the values of state->flags. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251121204803.991707-4-yosry.ahmed@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Rename vm_get_page_table_entry() to vm_get_pte()	Sean Christopherson	4	-6/+4
	Shorten the API to get a PTE as the "PTE" acronym is ubiquitous, and the "page table entry" makes it unnecessarily difficult to quickly understand what callers are doing. No functional change intended. Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-21-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Extend memstress to run on nested SVM	Yosry Ahmed	1	-7/+35
	Add L1 SVM code and generalize the setup code to work for both VMX and SVM. This allows running 'dirty_log_perf_test -n' on AMD CPUs. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-20-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Extend vmx_dirty_log_test to cover SVM	Yosry Ahmed	2	-21/+54
	Generalize the code in vmx_dirty_log_test.c by adding SVM-specific L1 code, doing some renaming (e.g. EPT -> TDP), and having setup code for both SVM and VMX in test_dirty_log(). Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-19-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Set the user bit on nested NPT PTEs	Yosry Ahmed	4	-2/+9
	According to the APM, NPT walks are treated as user accesses. In preparation for supporting NPT mappings, set the 'user' bit on NPTs by adding a mask of bits to always be set on PTEs in kvm_mmu. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-18-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Add support for nested NPTs	Yosry Ahmed	6	-4/+54
	Implement nCR3 and NPT initialization functions, similar to the EPT equivalents, and create common TDP helpers for enablement checking and initialization. Enable NPT for nested guests by default if the TDP MMU was initialized, similar to VMX. Reuse the PTE masks from the main MMU in the NPT MMU, except for the C and S bits related to confidential VMs. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-17-seanjc@google.com [sean: apply Yosry's fixup for ncr3_gpa] Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Allow kvm_cpu_has_ept() to be called on AMD CPUs	Yosry Ahmed	1	-0/+3
	In preparation for generalizing the nested dirty logging test, checking if either EPT or NPT is enabled will be needed. To avoid needing to gate the kvm_cpu_has_ept() call by the CPU type, make sure the function returns false if VMX is not available instead of trying to read VMX-only MSRs. No functional change intended. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Move TDP mapping functions outside of vmx.c	Sean Christopherson	4	-74/+57
	Now that the functions are no longer VMX-specific, move them to processor.c. Do a minor comment tweak replacing 'EPT' with 'TDP'. No functional change intended. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Reuse virt mapping functions for nested EPTs	Yosry Ahmed	4	-108/+52
	Rework tdp_map() and friends to use __virt_pg_map() and drop the custom EPT code in __tdp_pg_map() and tdp_create_pte(). The EPT code and __virt_pg_map() are practically identical, the main differences are: - EPT uses the EPT struct overlay instead of the PTE masks. - EPT always assumes 4-level EPTs. To reuse __virt_pg_map(), extend the PTE masks to work with EPT's RWX and X-only capabilities, and provide a tdp_mmu_init() API so that EPT can pass in the EPT PTE masks along with the root page level (which is currently hardcoded to '4'). Don't reuse KVM's insane overloading of the USER bit for EPT_R as there's no reason to multiplex bits in the selftests, e.g. selftests aren't trying to shadow guest PTEs and thus don't care about funnelling protections into a common permissions check. Another benefit of reusing the code is having separate handling for upper-level PTEs vs 4K PTEs, which avoids some quirks like setting the large bit on a 4K PTE in the EPTs. For all intents and purposes, no functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Co-developed-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20251230230150.4150236-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Add a stage-2 MMU instance to kvm_vm	Sean Christopherson	1	-0/+5
	Add a stage-2 MMU instance so that architectures that support nested virtualization (more specifically, nested stage-2 page tables) can create and track stage-2 page tables for running L2 guests. Plumb the structure into common code to avoid cyclical dependencies, and to provide some line of sight to having common APIs for creating stage-2 mappings. As a bonus, putting the member in common code justifies using stage2_mmu instead of tdp_mmu for x86. Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Stop passing VMX metadata to TDP mapping functions	Yosry Ahmed	4	-40/+24
	The root GPA is now retrieved from the nested MMU, stop passing VMX metadata. This is in preparation for making these functions work for NPTs as well. Opportunistically drop tdp_pg_map() since it's unused. No functional change intended. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Use a TDP MMU to share EPT page tables between vCPUs	Yosry Ahmed	7	-32/+48
	prepare_eptp() currently allocates new EPTs for each vCPU. memstress has its own hack to share the EPTs between vCPUs. Currently, there is no reason to have separate EPTs for each vCPU, and the complexity is significant. The only reason it doesn't matter now is because memstress is the only user with multiple vCPUs. Add vm_enable_ept() to allocate EPT page tables for an entire VM, and use it everywhere to replace prepare_eptp(). Drop 'eptp' and 'eptp_hva' from 'struct vmx_pages' as they serve no purpose (e.g. the EPTP can be built from the PGD), but keep 'eptp_gpa' so that the MMU structure doesn't need to be passed in along with vmx_pages. Dynamically allocate the TDP MMU structure to avoid a cyclical dependency between kvm_util_arch.h and kvm_util.h. Remove the workaround in memstress to copy the EPT root between vCPUs since that's now the default behavior. Name the MMU tdp_mmu instead of e.g. nested_mmu or nested.mmu to avoid recreating the same mess that KVM has with respect to "nested" MMUs, e.g. does nested refer to the stage-2 page tables created by L1, or the stage-1 page tables created by L2? Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Co-developed-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20251230230150.4150236-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Move PTE bitmasks to kvm_mmu	Yosry Ahmed	3	-39/+76
	Move the PTE bitmasks into kvm_mmu to parameterize them for virt mapping functions. Introduce helpers to read/write different PTE bits given a kvm_mmu. Drop the 'global' bit definition as it's currently unused, but leave the 'user' bit as it will be used in coming changes. Opportunisitcally rename 'large' to 'huge' as it's more consistent with the kernel naming. Leave PHYSICAL_PAGE_MASK alone, it's fixed in all page table formats and a lot of other macros depend on it. It's tempting to move all the other macros to be per-struct instead, but it would be too much noise for little benefit. Keep c_bit and s_bit in vm->arch as they used before the MMU is initialized, through __vmcreate() -> vm_userspace_mem_region_add() -> vm_mem_add() -> vm_arch_has_protected_memory(). No functional change intended. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> [sean: rename accessors to is_<adjective>_pte()] Link: https://patch.msgid.link/20251230230150.4150236-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Add a "struct kvm_mmu_arch arch" member to kvm_mmu	Sean Christopherson	6	-0/+9
	Add an arch structure+field in "struct kvm_mmu" so that architectures can track arch-specific information for a given MMU. No functional change intended. Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Plumb "struct kvm_mmu" into x86's MMU APIs	Sean Christopherson	2	-28/+39
	In preparation for generalizing the x86 virt mapping APIs to work with TDP (stage-2) page tables, plumb "struct kvm_mmu" into all of the helper functions instead of operating on vm->mmu directly. Opportunistically swap the order of the check in virt_get_pte() to first assert that the parent is the PGD, and then check that the PTE is present, as it makes more sense to check if the parent PTE is the PGD/root (i.e. not a PTE) before checking that the PTE is PRESENT. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> [sean: rebase on common kvm_mmu structure, rewrite changelog] Link: https://patch.msgid.link/20251230230150.4150236-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Add "struct kvm_mmu" to track a given MMU instance	Sean Christopherson	8	-88/+94
	Add a "struct kvm_mmu" to track a given MMU instance, e.g. a VM's stage-1 MMU versus a VM's stage-2 MMU, so that x86 can share MMU functionality for both stage-1 and stage-2 MMUs, without creating the potential for subtle bugs, e.g. due to consuming on vm->pgtable_levels when operating a stage-2 MMU. Encapsulate the existing de facto MMU in "struct kvm_vm", e.g instead of burying the MMU details in "struct kvm_vm_arch", to avoid more #ifdefs in ____vm_create(), and in the hopes that other architectures can utilize the formalized MMU structure if/when they too support stage-2 page tables. No functional change intended. Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Stop setting A/D bits when creating EPT PTEs	Yosry Ahmed	1	-8/+0
	Stop setting Accessed/Dirty bits when creating EPT entries for L2 so that the stage-1 and stage-2 (a.k.a. TDP) page table APIs can use common code without bleeding the EPT hack into the common APIs. While commit 094444204570 ("selftests: kvm: add test for dirty logging inside nested guests") is _very_ light on details, the most likely explanation is that vmx_dirty_log_test was attempting to avoid taking an EPT Violation on the first _write_ from L2. static void l2_guest_code(u64 a, u64 b) { READ_ONCE(a); WRITE_ONCE(a, 1); <=== GUEST_SYNC(true); ... } When handling read faults in the shadow MMU, KVM opportunistically creates a writable SPTE if the mapping can be writable and the gPTE is dirty (or doesn't support the Dirty bit), i.e. if KVM doesn't need to intercept writes in order to emulate Dirty-bit updates. By setting A/D bits in the test's EPT entries, the above READ+WRITE will fault only on the read, and in theory expose the bug fixed by KVM commit 1f4e5fc83a42 ("KVM: x86: fix nested guest live migration with PML"). If the Dirty bit is NOT set, the test will get a false pass due; though again, in theory. However, the test is flawed (and always was, at least in the versions posted publicly), as KVM (correctly) marks the corresponding L1 GFN as dirty (in the dirty bitmap) when creating the writable SPTE. I.e. without a check on the dirty bitmap after the READ_ONCE(), the check after the first WRITE_ONCE() will get a false pass due to the dirty bitmap/log having been updated by the read fault, not by PML. Furthermore, the subsequent behavior in the test's l2_guest_code() effectively hides the flawed test behavior, as the straight writes to a new L2 GPA fault also trigger the KVM bug, and so the test will still detect the failure due to lack of isolation between the two testcases (Read=>Write vs. Write=>Write). WRITE_ONCE(b, 1); GUEST_SYNC(true); WRITE_ONCE(b, 1); GUEST_SYNC(true); GUEST_SYNC(false); Punt on fixing vmx_dirty_log_test for the moment as it will be easier to properly fix the test once the TDP code uses the common MMU APIs, at which point it will be trivially easy for the test to retrieve the EPT PTE and set the Dirty bit as needed. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> [sean: rewrite changelog to explain the situation] Link: https://patch.msgid.link/20251230230150.4150236-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Kill eptPageTablePointer	Yosry Ahmed	1	-20/+17
	Replace the struct overlay with explicit bitmasks, which is clearer and less error-prone. See commit f18b4aebe107 ("kvm: selftests: do not use bitfields larger than 32-bits for PTEs") for an example of why bitfields are not preferable. Remove the unused PAGE_SHIFT_4K definition while at it. No functional change intended. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Rename nested TDP mapping functions	Yosry Ahmed	4	-39/+37
	Rename the functions from nested_* to tdp_* to make their purpose clearer. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Stop passing a memslot to nested_map_memslot()	Yosry Ahmed	3	-7/+11
	On x86, KVM selftests use memslot 0 for all the default regions used by the test infrastructure. This is an implementation detail. nested_map_memslot() is currently used to map the default regions by explicitly passing slot 0, which leaks the library implementation into the caller. Rename the function to a very verbose nested_identity_map_default_memslots() to reflect what it actually does. Add an assertion that only memslot 0 is being used so that the implementation does not change from under us. No functional change intended. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Make __vm_get_page_table_entry() static	Yosry Ahmed	2	-4/+2
	The function is only used in processor.c, drop the declaration in processor.h and make it static. No functional change intended. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Fix sign extension bug in get_desc64_base()	MJ Pooladkhay	1	-2/+4
	The function get_desc64_base() performs a series of bitwise left shifts on fields of various sizes. More specifically, when performing '<< 24' on 'desc->base2' (which is a u8), 'base2' is promoted to a signed integer before shifting. In a scenario where base2 >= 0x80, the shift places a 1 into bit 31, causing the 32-bit intermediate value to become negative. When this result is cast to uint64_t or ORed into the return value, sign extension occurs, corrupting the upper 32 bits of the address (base3). Example: Given: base0 = 0x5000 base1 = 0xd6 base2 = 0xf8 base3 = 0xfffffe7c Expected return: 0xfffffe7cf8d65000 Actual return: 0xfffffffff8d65000 Fix this by explicitly casting the fields to 'uint64_t' before shifting to prevent sign extension. Signed-off-by: MJ Pooladkhay <mj@pooladkhay.com> Link: https://patch.msgid.link/20251222174207.107331-1-mj@pooladkhay.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	26	-40/+356
	Cross-merge networking fixes after downstream PR (net-6.19-rc5). No conflicts, or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>