<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/virt/kvm/dirty_ring.c, branch linux-7.1.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-7.1.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-7.1.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-05-12T20:16:16+00:00</updated>
<entry>
<title>KVM: Reject wrapped offset in kvm_reset_dirty_gfn()</title>
<updated>2026-05-12T20:16:16+00:00</updated>
<author>
<name>Aaron Sacks</name>
<email>contact@xchglabs.com</email>
</author>
<published>2026-05-12T06:07:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=577a8d3bae0531f0e5ccfac919cd8192f920a804'/>
<id>urn:sha1:577a8d3bae0531f0e5ccfac919cd8192f920a804</id>
<content type='text'>
kvm_reset_dirty_gfn() guards the gfn range with

	if (!memslot || (offset + __fls(mask)) &gt;= memslot-&gt;npages)
		return;

but offset is u64 and the addition is unchecked.  The check can be
silently bypassed by a u64 wrap.

The dirty ring backing those entries is MAP_SHARED at
KVM_DIRTY_LOG_PAGE_OFFSET of the vcpu fd, so the VMM can rewrite the
slot and offset fields of any entry between when the kernel pushes
them and when KVM_RESET_DIRTY_RINGS consumes them.  On reset,
kvm_dirty_ring_reset() re-reads the values via READ_ONCE() and feeds
them straight back into this check; only the flags handshake is
treated as the handover, the slot/offset payload is taken on trust.

Crafting two entries

	entry[i].offset   = 0xffffffffffffffc1
	entry[i+1].offset = 0

makes the coalescing loop in kvm_dirty_ring_reset() compute

	delta = (s64)(0 - 0xffffffffffffffc1) = 63

which falls in [0, BITS_PER_LONG), so it folds entry[i+1] into the
existing mask by setting bit 63.  The trailing kvm_reset_dirty_gfn()
call then sees offset = 0xffffffffffffffc1 and __fls(mask) = 63;
the sum is 0 in u64 and the bounds check passes.

That offset propagates into kvm_arch_mmu_enable_log_dirty_pt_masked()
unchanged.  On the legacy MMU path -- kvm_memslots_have_rmaps() ==
true, i.e. shadow paging, any VM that has allocated shadow roots, or
a write-tracked slot -- it reaches gfn_to_rmap(), which indexes
slot-&gt;arch.rmap[0][] with a near-U64_MAX gfn.  That is an
out-of-bounds load of a kvm_rmap_head, followed by a conditional
clear of PT_WRITABLE_MASK in whatever the loaded pointer points at.
The path is reachable from any process holding /dev/kvm.

Range-check offset on its own first, so the addition cannot wrap.
memslot-&gt;npages is bounded well below U64_MAX, so once offset &lt;
npages holds, offset + __fls(mask) (with __fls(mask) &lt; BITS_PER_LONG)
stays in range.

Fixes: fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
Cc: stable@vger.kernel.org
Signed-off-by: Aaron Sacks &lt;contact@xchglabs.com&gt;
Link: https://patch.msgid.link/20260512060742.1628959-1-contact@xchglabs.com/
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: Assert that slots_lock is held when resetting per-vCPU dirty rings</title>
<updated>2025-06-20T20:41:04+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2025-05-16T21:35:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=614fb9d1479b1d90721ca70da8b7c55f69fe9ad2'/>
<id>urn:sha1:614fb9d1479b1d90721ca70da8b7c55f69fe9ad2</id>
<content type='text'>
Assert that slots_lock is held in kvm_dirty_ring_reset() and add a comment
to explain _why_ slots needs to be held for the duration of the reset.

Link: https://lore.kernel.org/all/aCSns6Q5oTkdXUEe@google.com
Suggested-by: James Houghton &lt;jthoughton@google.com&gt;
Reviewed-by: Yan Zhao &lt;yan.y.zhao@intel.com&gt;
Reviewed-by: Peter Xu &lt;peterx@redhat.com&gt;
Link: https://lore.kernel.org/r/20250516213540.2546077-7-seanjc@google.com
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>KVM: Use mask of harvested dirty ring entries to coalesce dirty ring resets</title>
<updated>2025-06-20T20:41:03+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2025-05-16T21:35:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e46ad851150f1dd14b8542b6fb7a51f695a99eb1'/>
<id>urn:sha1:e46ad851150f1dd14b8542b6fb7a51f695a99eb1</id>
<content type='text'>
Use "mask" instead of a dedicated boolean to track whether or not there
is at least one to-be-reset entry for the current slot+offset.  In the
body of the loop, mask is zero only on the first iteration, i.e. !mask is
equivalent to first_round.

Opportunistically combine the adjacent "if (mask)" statements into a single
if-statement.

No functional change intended.

Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Yan Zhao &lt;yan.y.zhao@intel.com&gt;
Cc: Maxim Levitsky &lt;mlevitsk@redhat.com&gt;
Reviewed-by: Pankaj Gupta &lt;pankaj.gupta@amd.com&gt;
Reviewed-by: James Houghton &lt;jthoughton@google.com&gt;
Reviewed-by: Binbin Wu &lt;binbin.wu@linux.intel.com&gt;
Reviewed-by: Yan Zhao &lt;yan.y.zhao@intel.com&gt;
Reviewed-by: Peter Xu &lt;peterx@redhat.com&gt;
Link: https://lore.kernel.org/r/20250516213540.2546077-6-seanjc@google.com
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>KVM: Check for empty mask of harvested dirty ring entries in caller</title>
<updated>2025-06-20T20:41:02+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2025-05-16T21:35:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ee188dea1677a0d9b3a8097e71b803896b8aaed7'/>
<id>urn:sha1:ee188dea1677a0d9b3a8097e71b803896b8aaed7</id>
<content type='text'>
When resetting a dirty ring, explicitly check that there is work to be
done before calling kvm_reset_dirty_gfn(), e.g. if no harvested entries
are found and/or on the loop's first iteration, and delete the extremely
misleading comment "This is only needed to make compilers happy".  KVM
absolutely relies on mask to be zero-initialized, i.e. the comment is an
outright lie.  Furthermore, the compiler is right to complain that KVM is
calling a function with uninitialized data, as there are no guarantees
the implementation details of kvm_reset_dirty_gfn() will be visible to
kvm_dirty_ring_reset().

While the flaw could be fixed by simply deleting (or rewording) the
comment, and duplicating the check is unfortunate, checking mask in the
caller will allow for additional cleanups.

Opportunistically drop the zero-initialization of cur_slot and cur_offset.
If a bug were introduced where either the slot or offset was consumed
before mask is set to a non-zero value, then it is highly desirable for
the compiler (or some other sanitizer) to yell.

Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Yan Zhao &lt;yan.y.zhao@intel.com&gt;
Cc: Maxim Levitsky &lt;mlevitsk@redhat.com&gt;
Cc: Binbin Wu &lt;binbin.wu@linux.intel.com&gt;
Reviewed-by: Pankaj Gupta &lt;pankaj.gupta@amd.com&gt;
Reviewed-by: James Houghton &lt;jthoughton@google.com&gt;
Reviewed-by: Binbin Wu &lt;binbin.wu@linux.intel.com&gt;
Reviewed-by: Yan Zhao &lt;yan.y.zhao@intel.com&gt;
Reviewed-by: Peter Xu &lt;peterx@redhat.com&gt;
Link: https://lore.kernel.org/r/20250516213540.2546077-5-seanjc@google.com
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>KVM: Conditionally reschedule when resetting the dirty ring</title>
<updated>2025-06-20T20:39:42+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2025-05-16T21:35:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1333c35c4eea6533874564899371501ee80b9583'/>
<id>urn:sha1:1333c35c4eea6533874564899371501ee80b9583</id>
<content type='text'>
When resetting a dirty ring, conditionally reschedule on each iteration
after the first.  The recently introduced hard limit mitigates the issue
of an endless reset, but isn't sufficient to completely prevent RCU
stalls, soft lockups, etc., nor is the hard limit intended to guard
against such badness.

Note!  Take care to check for reschedule even in the "continue" paths,
as a pathological scenario (or malicious userspace) could dirty the same
gfn over and over, i.e. always hit the continue path.

  rcu: INFO: rcu_sched self-detected stall on CPU
  rcu: 	4-....: (5249 ticks this GP) idle=51e4/1/0x4000000000000000 softirq=309/309 fqs=2563
  rcu: 	(t=5250 jiffies g=-319 q=608 ncpus=24)
  CPU: 4 UID: 1000 PID: 1067 Comm: dirty_log_test Tainted: G             L     6.13.0-rc3-17fa7a24ea1e-HEAD-vm #814
  Tainted: [L]=SOFTLOCKUP
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:kvm_arch_mmu_enable_log_dirty_pt_masked+0x26/0x200 [kvm]
  Call Trace:
   &lt;TASK&gt;
   kvm_reset_dirty_gfn.part.0+0xb4/0xe0 [kvm]
   kvm_dirty_ring_reset+0x58/0x220 [kvm]
   kvm_vm_ioctl+0x10eb/0x15d0 [kvm]
   __x64_sys_ioctl+0x8b/0xb0
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   &lt;/TASK&gt;
  Tainted: [L]=SOFTLOCKUP
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:kvm_arch_mmu_enable_log_dirty_pt_masked+0x17/0x200 [kvm]
  Call Trace:
   &lt;TASK&gt;
   kvm_reset_dirty_gfn.part.0+0xb4/0xe0 [kvm]
   kvm_dirty_ring_reset+0x58/0x220 [kvm]
   kvm_vm_ioctl+0x10eb/0x15d0 [kvm]
   __x64_sys_ioctl+0x8b/0xb0
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   &lt;/TASK&gt;

Fixes: fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
Reviewed-by: James Houghton &lt;jthoughton@google.com&gt;
Reviewed-by: Yan Zhao &lt;yan.y.zhao@intel.com&gt;
Reviewed-by: Peter Xu &lt;peterx@redhat.com&gt;
Link: https://lore.kernel.org/r/20250516213540.2546077-4-seanjc@google.com
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>KVM: Bail from the dirty ring reset flow if a signal is pending</title>
<updated>2025-06-20T20:39:42+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2025-05-16T21:35:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=49005a2a3d2a2974db486e330dfb7c8bc85df17f'/>
<id>urn:sha1:49005a2a3d2a2974db486e330dfb7c8bc85df17f</id>
<content type='text'>
Abort a dirty ring reset if the current task has a pending signal, as the
hard limit of INT_MAX entries doesn't ensure KVM will respond to a signal
in a timely fashion.

Fixes: fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
Reviewed-by: James Houghton &lt;jthoughton@google.com&gt;
Reviewed-by: Binbin Wu &lt;binbin.wu@linux.intel.com&gt;
Reviewed-by: Yan Zhao &lt;yan.y.zhao@intel.com&gt;
Reviewed-by: Peter Xu &lt;peterx@redhat.com&gt;
Link: https://lore.kernel.org/r/20250516213540.2546077-3-seanjc@google.com
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>KVM: Bound the number of dirty ring entries in a single reset at INT_MAX</title>
<updated>2025-06-20T20:39:42+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2025-05-16T21:35:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=530a8ba71b4c3b7fcee323dd997f4bab1be1a6ba'/>
<id>urn:sha1:530a8ba71b4c3b7fcee323dd997f4bab1be1a6ba</id>
<content type='text'>
Cap the number of ring entries that are reset in a single ioctl to INT_MAX
to ensure userspace isn't confused by a wrap into negative space, and so
that, in a truly pathological scenario, KVM doesn't miss a TLB flush due
to the count wrapping to zero.  While the size of the ring is fixed at
0x10000 entries and KVM (currently) supports at most 4096, userspace is
allowed to harvest entries from the ring while the reset is in-progress,
i.e. it's possible for the ring to always have harvested entries.

Opportunistically return an actual error code from the helper so that a
future fix to handle pending signals can gracefully return -EINTR.  Drop
the function comment now that the return code is a stanard 0/-errno (and
because a future commit will add a proper lockdep assertion).

Opportunistically drop a similarly stale comment for kvm_dirty_ring_push().

Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Yan Zhao &lt;yan.y.zhao@intel.com&gt;
Cc: Maxim Levitsky &lt;mlevitsk@redhat.com&gt;
Cc: Binbin Wu &lt;binbin.wu@linux.intel.com&gt;
Fixes: fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
Reviewed-by: James Houghton &lt;jthoughton@google.com&gt;
Reviewed-by: Binbin Wu &lt;binbin.wu@linux.intel.com&gt;
Reviewed-by: Yan Zhao &lt;yan.y.zhao@intel.com&gt;
Reviewed-by: Peter Xu &lt;peterx@redhat.com&gt;
Link: https://lore.kernel.org/r/20250516213540.2546077-2-seanjc@google.com
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>KVM: Add parameter "kvm" to kvm_cpu_dirty_log_size() and its callers</title>
<updated>2025-03-14T18:20:53+00:00</updated>
<author>
<name>Yan Zhao</name>
<email>yan.y.zhao@intel.com</email>
</author>
<published>2025-01-13T03:08:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c4a92f12cf35b83ce81757f6e5e8eb6223b87388'/>
<id>urn:sha1:c4a92f12cf35b83ce81757f6e5e8eb6223b87388</id>
<content type='text'>
Add a parameter "kvm" to kvm_cpu_dirty_log_size() and down to its callers:
kvm_dirty_ring_get_rsvd_entries(), kvm_dirty_ring_alloc().

This is a preparation to make cpu_dirty_log_size a per-VM value rather than
a system-wide value.

No function changes expected.

Signed-off-by: Yan Zhao &lt;yan.y.zhao@intel.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: Discard zero mask with function kvm_dirty_ring_reset</title>
<updated>2024-06-20T21:20:11+00:00</updated>
<author>
<name>Bibo Mao</name>
<email>maobibo@loongson.cn</email>
</author>
<published>2024-06-13T12:28:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=676f819c3e982db3695a371f336a05086585ea4f'/>
<id>urn:sha1:676f819c3e982db3695a371f336a05086585ea4f</id>
<content type='text'>
Function kvm_reset_dirty_gfn may be called with parameters cur_slot /
cur_offset / mask are all zero, it does not represent real dirty page.
It is not necessary to clear dirty page in this condition. Also return
value of macro __fls() is undefined if mask is zero which is called in
funciton kvm_reset_dirty_gfn(). Here just return.

Signed-off-by: Bibo Mao &lt;maobibo@loongson.cn&gt;
Message-ID: &lt;20240613122803.1031511-1-maobibo@loongson.cn&gt;
[Move the conditional inside kvm_reset_dirty_gfn; suggested by
 Sean Christopherson. - Paolo]
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: Allow arch code to track number of memslot address spaces per VM</title>
<updated>2023-11-14T13:01:05+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2023-10-27T18:22:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=eed52e434bc33603ddb0af62b6c4ef818948489d'/>
<id>urn:sha1:eed52e434bc33603ddb0af62b6c4ef818948489d</id>
<content type='text'>
Let x86 track the number of address spaces on a per-VM basis so that KVM
can disallow SMM memslots for confidential VMs.  Confidentials VMs are
fundamentally incompatible with emulating SMM, which as the name suggests
requires being able to read and write guest memory and register state.

Disallowing SMM will simplify support for guest private memory, as KVM
will not need to worry about tracking memory attributes for multiple
address spaces (SMM is the only "non-default" address space across all
architectures).

Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Reviewed-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Reviewed-by: Fuad Tabba &lt;tabba@google.com&gt;
Tested-by: Fuad Tabba &lt;tabba@google.com&gt;
Message-Id: &lt;20231027182217.3615211-23-seanjc@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
</feed>
