<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/virt/kvm/eventfd.c, branch v6.6.132</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.132</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.132'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-02-11T12:39:08+00:00</updated>
<entry>
<title>KVM: Don't clobber irqfd routing type when deassigning irqfd</title>
<updated>2026-02-11T12:39:08+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2026-01-13T17:46:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b61f9b2fcf181451d0a319889478cc53c001123e'/>
<id>urn:sha1:b61f9b2fcf181451d0a319889478cc53c001123e</id>
<content type='text'>
commit b4d37cdb77a0015f51fee083598fa227cc07aaf1 upstream.

When deassigning a KVM_IRQFD, don't clobber the irqfd's copy of the IRQ's
routing entry as doing so breaks kvm_arch_irq_bypass_del_producer() on x86
and arm64, which explicitly look for KVM_IRQ_ROUTING_MSI.  Instead, to
handle a concurrent routing update, verify that the irqfd is still active
before consuming the routing information.  As evidenced by the x86 and
arm64 bugs, and another bug in kvm_arch_update_irqfd_routing() (see below),
clobbering the entry type without notifying arch code is surprising and
error prone.

As a bonus, checking that the irqfd is active provides a convenient
location for documenting _why_ KVM must not consume the routing entry for
an irqfd that is in the process of being deassigned: once the irqfd is
deleted from the list (which happens *before* the eventfd is detached), it
will no longer receive updates via kvm_irq_routing_update(), and so KVM
could deliver an event using stale routing information (relative to
KVM_SET_GSI_ROUTING returning to userspace).

As an even better bonus, explicitly checking for the irqfd being active
fixes a similar bug to the one the clobbering is trying to prevent: if an
irqfd is deactivated, and then its routing is changed,
kvm_irq_routing_update() won't invoke kvm_arch_update_irqfd_routing()
(because the irqfd isn't in the list).  And so if the irqfd is in bypass
mode, IRQs will continue to be posted using the old routing information.

As for kvm_arch_irq_bypass_del_producer(), clobbering the routing type
results in KVM incorrectly keeping the IRQ in bypass mode, which is
especially problematic on AMD as KVM tracks IRQs that are being posted to
a vCPU in a list whose lifetime is tied to the irqfd.

Without the help of KASAN to detect use-after-free, the most common
sympton on AMD is a NULL pointer deref in amd_iommu_update_ga() due to
the memory for irqfd structure being re-allocated and zeroed, resulting
in irqfd-&gt;irq_bypass_data being NULL when read by
avic_update_iommu_vcpu_affinity():

  BUG: kernel NULL pointer dereference, address: 0000000000000018
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 40cf2b9067 P4D 40cf2b9067 PUD 408362a067 PMD 0
  Oops: Oops: 0000 [#1] SMP
  CPU: 6 UID: 0 PID: 40383 Comm: vfio_irq_test
  Tainted: G     U  W  O        6.19.0-smp--5dddc257e6b2-irqfd #31 NONE
  Tainted: [U]=USER, [W]=WARN, [O]=OOT_MODULE
  Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.78.2-0 09/05/2025
  RIP: 0010:amd_iommu_update_ga+0x19/0xe0
  Call Trace:
   &lt;TASK&gt;
   avic_update_iommu_vcpu_affinity+0x3d/0x90 [kvm_amd]
   __avic_vcpu_load+0xf4/0x130 [kvm_amd]
   kvm_arch_vcpu_load+0x89/0x210 [kvm]
   vcpu_load+0x30/0x40 [kvm]
   kvm_arch_vcpu_ioctl_run+0x45/0x620 [kvm]
   kvm_vcpu_ioctl+0x571/0x6a0 [kvm]
   __se_sys_ioctl+0x6d/0xb0
   do_syscall_64+0x6f/0x9d0
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x46893b
    &lt;/TASK&gt;
  ---[ end trace 0000000000000000 ]---

If AVIC is inhibited when the irfd is deassigned, the bug will manifest as
list corruption, e.g. on the next irqfd assignment.

  list_add corruption. next-&gt;prev should be prev (ffff8d474d5cd588),
                       but was 0000000000000000. (next=ffff8d8658f86530).
  ------------[ cut here ]------------
  kernel BUG at lib/list_debug.c:31!
  Oops: invalid opcode: 0000 [#1] SMP
  CPU: 128 UID: 0 PID: 80818 Comm: vfio_irq_test
  Tainted: G     U  W  O        6.19.0-smp--f19dc4d680ba-irqfd #28 NONE
  Tainted: [U]=USER, [W]=WARN, [O]=OOT_MODULE
  Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.78.2-0 09/05/2025
  RIP: 0010:__list_add_valid_or_report+0x97/0xc0
  Call Trace:
   &lt;TASK&gt;
   avic_pi_update_irte+0x28e/0x2b0 [kvm_amd]
   kvm_pi_update_irte+0xbf/0x190 [kvm]
   kvm_arch_irq_bypass_add_producer+0x72/0x90 [kvm]
   irq_bypass_register_consumer+0xcd/0x170 [irqbypass]
   kvm_irqfd+0x4c6/0x540 [kvm]
   kvm_vm_ioctl+0x118/0x5d0 [kvm]
   __se_sys_ioctl+0x6d/0xb0
   do_syscall_64+0x6f/0x9d0
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   &lt;/TASK&gt;
  ---[ end trace 0000000000000000 ]---

On Intel and arm64, the bug is less noisy, as the end result is that the
device keeps posting IRQs to the vCPU even after it's been deassigned.

Note, the worst of the breakage can be traced back to commit cb210737675e
("KVM: Pass new routing entries and irqfd when updating IRTEs"), as before
that commit KVM would pull the routing information from the per-VM routing
table.  But as above, similar bugs have existed since support for IRQ
bypass was added.  E.g. if a routing change finished before irq_shutdown()
invoked kvm_arch_irq_bypass_del_producer(), VMX and SVM would see stale
routing information and potentially leave the irqfd in bypass mode.

Alternatively, x86 could be fixed by explicitly checking irq_bypass_vcpu
instead of irq_entry.type in kvm_arch_irq_bypass_del_producer(), and arm64
could be modified to utilize irq_bypass_vcpu in a similar manner.  But (a)
that wouldn't fix the routing updates bug, and (b) fixing core code doesn't
preclude x86 (or arm64) from adding such code as a sanity check (spoiler
alert).

Fixes: f70c20aaf141 ("KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'")
Fixes: cb210737675e ("KVM: Pass new routing entries and irqfd when updating IRTEs")
Fixes: a0d7e2fc61ab ("KVM: arm64: vgic-v4: Only attempt vLPI mapping for actual MSIs")
Cc: stable@vger.kernel.org
Cc: Marc Zyngier &lt;maz@kernel.org&gt;
Cc: Oliver Upton &lt;oupton@kernel.org&gt;
Link: https://patch.msgid.link/20260113174606.104978-2-seanjc@google.com
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>kvm/eventfd: use list_for_each_entry when deassign ioeventfd</title>
<updated>2023-06-13T21:25:39+00:00</updated>
<author>
<name>Wei Wang</name>
<email>wei.w.wang@intel.com</email>
</author>
<published>2023-02-07T12:37:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cc77b95acf3c7d9a24204b0555fed2014f300fd5'/>
<id>urn:sha1:cc77b95acf3c7d9a24204b0555fed2014f300fd5</id>
<content type='text'>
Simpify kvm_deassign_ioeventfd_idx to use list_for_each_entry as the
loop just ends at the entry that's found and deleted.

Note, coalesced_mmio_ops and ioeventfd_ops are the only instances of
kvm_io_device_ops that implement a destructor, all other callers of
kvm_io_bus_unregister_dev() are unaffected by this change.

Suggested-by: Michal Luczaj &lt;mhal@rbox.co&gt;
Signed-off-by: Wei Wang &lt;wei.w.wang@intel.com&gt;
Reviewed-by: Sean Christopherson &lt;seanjc@google.com&gt;
Link: https://lore.kernel.org/r/20230207123713.3905-3-wei.w.wang@intel.com
[sean: call out that only select users implement a destructor]
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>KVM: destruct kvm_io_device while unregistering it from kvm_io_bus</title>
<updated>2023-06-13T21:18:09+00:00</updated>
<author>
<name>Wei Wang</name>
<email>wei.w.wang@intel.com</email>
</author>
<published>2023-02-07T12:37:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5ea5ca3c2b4bf4090232e18cfc515dcb52f914a6'/>
<id>urn:sha1:5ea5ca3c2b4bf4090232e18cfc515dcb52f914a6</id>
<content type='text'>
Current usage of kvm_io_device requires users to destruct it with an extra
call of kvm_iodevice_destructor after the device gets unregistered from
kvm_io_bus. This is not necessary and can cause errors if a user forgot
to make the extra call.

Simplify the usage by combining kvm_iodevice_destructor into
kvm_io_bus_unregister_dev. This reduces LOCs a bit for users and can
avoid the leakage of destructing the device explicitly.

Signed-off-by: Wei Wang &lt;wei.w.wang@intel.com&gt;
Reviewed-by: Sean Christopherson &lt;seanjc@google.com&gt;
Link: https://lore.kernel.org/r/20230207123713.3905-2-wei.w.wang@intel.com
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>KVM: Don't kfree(NULL) on kzalloc() failure in kvm_assign_ioeventfd_idx()</title>
<updated>2023-06-01T20:48:54+00:00</updated>
<author>
<name>Michal Luczaj</name>
<email>mhal@rbox.co</email>
</author>
<published>2023-03-27T17:54:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=70b0bc4c0a05cb68ffeeaba8c8340896b5ff6fd7'/>
<id>urn:sha1:70b0bc4c0a05cb68ffeeaba8c8340896b5ff6fd7</id>
<content type='text'>
On kzalloc() failure, taking the `goto fail` path leads to kfree(NULL).
Such no-op has no use. Move it out.

Signed-off-by: Michal Luczaj &lt;mhal@rbox.co&gt;
Reviewed-by: Sean Christopherson &lt;seanjc@google.com&gt;
Reviewed-by: Philippe Mathieu-Daudé &lt;philmd@linaro.org&gt;
Link: https://lore.kernel.org/r/20230327175457.735903-1-mhal@rbox.co
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking</title>
<updated>2023-03-27T14:13:28+00:00</updated>
<author>
<name>Dmytro Maluka</name>
<email>dmy@semihalf.com</email>
</author>
<published>2023-03-22T20:43:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fef8f2b90edbd7089a4278021314f11f056b0cbb'/>
<id>urn:sha1:fef8f2b90edbd7089a4278021314f11f056b0cbb</id>
<content type='text'>
KVM irqfd based emulation of level-triggered interrupts doesn't work
quite correctly in some cases, particularly in the case of interrupts
that are handled in a Linux guest as oneshot interrupts (IRQF_ONESHOT).
Such an interrupt is acked to the device in its threaded irq handler,
i.e. later than it is acked to the interrupt controller (EOI at the end
of hardirq), not earlier.

Linux keeps such interrupt masked until its threaded handler finishes,
to prevent the EOI from re-asserting an unacknowledged interrupt.
However, with KVM + vfio (or whatever is listening on the resamplefd)
we always notify resamplefd at the EOI, so vfio prematurely unmasks the
host physical IRQ, thus a new physical interrupt is fired in the host.
This extra interrupt in the host is not a problem per se. The problem is
that it is unconditionally queued for injection into the guest, so the
guest sees an extra bogus interrupt. [*]

There are observed at least 2 user-visible issues caused by those
extra erroneous interrupts for a oneshot irq in the guest:

1. System suspend aborted due to a pending wakeup interrupt from
   ChromeOS EC (drivers/platform/chrome/cros_ec.c).
2. Annoying "invalid report id data" errors from ELAN0000 touchpad
   (drivers/input/mouse/elan_i2c_core.c), flooding the guest dmesg
   every time the touchpad is touched.

The core issue here is that by the time when the guest unmasks the IRQ,
the physical IRQ line is no longer asserted (since the guest has
acked the interrupt to the device in the meantime), yet we
unconditionally inject the interrupt queued into the guest by the
previous resampling. So to fix the issue, we need a way to detect that
the IRQ is no longer pending, and cancel the queued interrupt in this
case.

With IOAPIC we are not able to probe the physical IRQ line state
directly (at least not if the underlying physical interrupt controller
is an IOAPIC too), so in this patch we use irqfd resampler for that.
Namely, instead of injecting the queued interrupt, we just notify the
resampler that this interrupt is done. If the IRQ line is actually
already deasserted, we are done. If it is still asserted, a new
interrupt will be shortly triggered through irqfd and injected into the
guest.

In the case if there is no irqfd resampler registered for this IRQ, we
cannot fix the issue, so we keep the existing behavior: immediately
unconditionally inject the queued interrupt.

This patch fixes the issue for x86 IOAPIC only. In the long run, we can
fix it for other irqchips and other architectures too, possibly taking
advantage of reading the physical state of the IRQ line, which is
possible with some other irqchips (e.g. with arm64 GIC, maybe even with
the legacy x86 PIC).

[*] In this description we assume that the interrupt is a physical host
    interrupt forwarded to the guest e.g. by vfio. Potentially the same
    issue may occur also with a purely virtual interrupt from an
    emulated device, e.g. if the guest handles this interrupt, again, as
    a oneshot interrupt.

Signed-off-by: Dmytro Maluka &lt;dmy@semihalf.com&gt;
Link: https://lore.kernel.org/kvm/31420943-8c5f-125c-a5ee-d2fde2700083@semihalf.com/
Link: https://lore.kernel.org/lkml/87o7wrug0w.wl-maz@kernel.org/
Message-Id: &lt;20230322204344.50138-3-dmy@semihalf.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: irqfd: Make resampler_list an RCU list</title>
<updated>2023-03-27T14:13:28+00:00</updated>
<author>
<name>Dmytro Maluka</name>
<email>dmy@semihalf.com</email>
</author>
<published>2023-03-22T20:43:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d583fbd7066a2dea43050521a95d9770f7d7593e'/>
<id>urn:sha1:d583fbd7066a2dea43050521a95d9770f7d7593e</id>
<content type='text'>
It is useful to be able to do read-only traversal of the list of all the
registered irqfd resamplers without locking the resampler_lock mutex.
In particular, we are going to traverse it to search for a resampler
registered for the given irq of an irqchip, and that will be done with
an irqchip spinlock (ioapic-&gt;lock) held, so it is undesirable to lock a
mutex in this context. So turn this list into an RCU list.

For protecting the read side, reuse kvm-&gt;irq_srcu which is already used
for protecting a number of irq related things (kvm-&gt;irq_routing,
irqfd-&gt;resampler-&gt;list, kvm-&gt;irq_ack_notifier_list,
kvm-&gt;arch.mask_notifier_list).

Signed-off-by: Dmytro Maluka &lt;dmy@semihalf.com&gt;
Message-Id: &lt;20230322204344.50138-2-dmy@semihalf.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: eventfd: Fix false positive RCU usage warning</title>
<updated>2022-05-20T13:10:33+00:00</updated>
<author>
<name>Wanpeng Li</name>
<email>wanpengli@tencent.com</email>
</author>
<published>2022-05-19T08:49:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e332b55fe79cca72451fe0b797219bd9fe6b9434'/>
<id>urn:sha1:e332b55fe79cca72451fe0b797219bd9fe6b9434</id>
<content type='text'>
The splat below can be seen when running kvm-unit-test:

     =============================
     WARNING: suspicious RCU usage
     5.18.0-rc7 #5 Tainted: G          IOE
     -----------------------------
     /home/kernel/linux/arch/x86/kvm/../../../virt/kvm/eventfd.c:80 RCU-list traversed in non-reader section!!

     other info that might help us debug this:

     rcu_scheduler_active = 2, debug_locks = 1
     4 locks held by qemu-system-x86/35124:
      #0: ffff9725391d80b8 (&amp;vcpu-&gt;mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x710 [kvm]
      #1: ffffbd25cfb2a0b8 (&amp;kvm-&gt;srcu){....}-{0:0}, at: vcpu_enter_guest+0xdeb/0x1900 [kvm]
      #2: ffffbd25cfb2b920 (&amp;kvm-&gt;irq_srcu){....}-{0:0}, at: kvm_hv_notify_acked_sint+0x79/0x1e0 [kvm]
      #3: ffffbd25cfb2b920 (&amp;kvm-&gt;irq_srcu){....}-{0:0}, at: irqfd_resampler_ack+0x5/0x110 [kvm]

     stack backtrace:
     CPU: 2 PID: 35124 Comm: qemu-system-x86 Tainted: G          IOE     5.18.0-rc7 #5
     Call Trace:
      &lt;TASK&gt;
      dump_stack_lvl+0x6c/0x9b
      irqfd_resampler_ack+0xfd/0x110 [kvm]
      kvm_notify_acked_gsi+0x32/0x90 [kvm]
      kvm_hv_notify_acked_sint+0xc5/0x1e0 [kvm]
      kvm_hv_set_msr_common+0xec1/0x1160 [kvm]
      kvm_set_msr_common+0x7c3/0xf60 [kvm]
      vmx_set_msr+0x394/0x1240 [kvm_intel]
      kvm_set_msr_ignored_check+0x86/0x200 [kvm]
      kvm_emulate_wrmsr+0x4f/0x1f0 [kvm]
      vmx_handle_exit+0x6fb/0x7e0 [kvm_intel]
      vcpu_enter_guest+0xe5a/0x1900 [kvm]
      kvm_arch_vcpu_ioctl_run+0x16e/0xac0 [kvm]
      kvm_vcpu_ioctl+0x279/0x710 [kvm]
      __x64_sys_ioctl+0x83/0xb0
      do_syscall_64+0x3b/0x90
      entry_SYSCALL_64_after_hwframe+0x44/0xae

resampler-list is protected by irq_srcu (see kvm_irqfd_assign), so fix
the false positive by using list_for_each_entry_srcu().

Signed-off-by: Wanpeng Li &lt;wanpengli@tencent.com&gt;
Message-Id: &lt;1652950153-12489-1-git-send-email-wanpengli@tencent.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: eventfd: Fix false positive RCU usage warning</title>
<updated>2022-01-28T12:38:27+00:00</updated>
<author>
<name>Hou Wenlong</name>
<email>houwenlong93@linux.alibaba.com</email>
</author>
<published>2022-01-27T06:54:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6a0c61703e3a5d67845a4b275e1d9d7bc1b5aad7'/>
<id>urn:sha1:6a0c61703e3a5d67845a4b275e1d9d7bc1b5aad7</id>
<content type='text'>
Fix the following false positive warning:
 =============================
 WARNING: suspicious RCU usage
 5.16.0-rc4+ #57 Not tainted
 -----------------------------
 arch/x86/kvm/../../../virt/kvm/eventfd.c:484 RCU-list traversed in non-reader section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 3 locks held by fc_vcpu 0/330:
  #0: ffff8884835fc0b0 (&amp;vcpu-&gt;mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x88/0x6f0 [kvm]
  #1: ffffc90004c0bb68 (&amp;kvm-&gt;srcu){....}-{0:0}, at: vcpu_enter_guest+0x600/0x1860 [kvm]
  #2: ffffc90004c0c1d0 (&amp;kvm-&gt;irq_srcu){....}-{0:0}, at: kvm_notify_acked_irq+0x36/0x180 [kvm]

 stack backtrace:
 CPU: 26 PID: 330 Comm: fc_vcpu 0 Not tainted 5.16.0-rc4+
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
 Call Trace:
  &lt;TASK&gt;
  dump_stack_lvl+0x44/0x57
  kvm_notify_acked_gsi+0x6b/0x70 [kvm]
  kvm_notify_acked_irq+0x8d/0x180 [kvm]
  kvm_ioapic_update_eoi+0x92/0x240 [kvm]
  kvm_apic_set_eoi_accelerated+0x2a/0xe0 [kvm]
  handle_apic_eoi_induced+0x3d/0x60 [kvm_intel]
  vmx_handle_exit+0x19c/0x6a0 [kvm_intel]
  vcpu_enter_guest+0x66e/0x1860 [kvm]
  kvm_arch_vcpu_ioctl_run+0x438/0x7f0 [kvm]
  kvm_vcpu_ioctl+0x38a/0x6f0 [kvm]
  __x64_sys_ioctl+0x89/0xc0
  do_syscall_64+0x3a/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Since kvm_unregister_irq_ack_notifier() does synchronize_srcu(&amp;kvm-&gt;irq_srcu),
kvm-&gt;irq_ack_notifier_list is protected by kvm-&gt;irq_srcu. In fact,
kvm-&gt;irq_srcu SRCU read lock is held in kvm_notify_acked_irq(), making it
a false positive warning. So use hlist_for_each_entry_srcu() instead of
hlist_for_each_entry_rcu().

Reviewed-by: Sean Christopherson &lt;seanjc@google.com&gt;
Signed-off-by: Hou Wenlong &lt;houwenlong93@linux.alibaba.com&gt;
Message-Id: &lt;f98bac4f5052bad2c26df9ad50f7019e40434512.1643265976.git.houwenlong.hwl@antgroup.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>kvm: irqfd: avoid update unmodified entries of the routing</title>
<updated>2021-09-30T08:27:10+00:00</updated>
<author>
<name>Longpeng(Mike)</name>
<email>longpeng2@huawei.com</email>
</author>
<published>2021-08-27T08:00:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=515a0c79e7963cc4556ca61516cc09d39e592712'/>
<id>urn:sha1:515a0c79e7963cc4556ca61516cc09d39e592712</id>
<content type='text'>
All of the irqfds would to be updated when update the irq
routing, it's too expensive if there're too many irqfds.

However we can reduce the cost by avoid some unnecessary
updates. For irqs of MSI type on X86, the update can be
saved if the msi values are not change.

The vfio migration could receives benefit from this optimi-
zaiton. The test VM has 128 vcpus and 8 VF (with 65 vectors
enabled), so the VM has more than 520 irqfds. We mesure the
cost of the vfio_msix_enable (in QEMU, it would set routing
for each irqfd) for each VF, and we can see the total cost
can be significantly reduced.

                Origin         Apply this Patch
1st             8              4
2nd             15             5
3rd             22             6
4th             24             6
5th             36             7
6th             44             7
7th             51             8
8th             58             8
Total           258ms          51ms

We're also tring to optimize the QEMU part [1], but it's still
worth to optimize the KVM to gain more benefits.

[1] https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg04215.html

Signed-off-by: Longpeng(Mike) &lt;longpeng2@huawei.com&gt;
Message-Id: &lt;20210827080003.2689-1-longpeng2@huawei.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>kvm/eventfd: Drain events from eventfd in irqfd_wakeup()</title>
<updated>2020-11-15T14:49:11+00:00</updated>
<author>
<name>David Woodhouse</name>
<email>dwmw@amazon.co.uk</email>
</author>
<published>2020-10-27T13:55:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b59e00dd8cda75fc8303c9f6847ac720b10664e3'/>
<id>urn:sha1:b59e00dd8cda75fc8303c9f6847ac720b10664e3</id>
<content type='text'>
Don't allow the events to accumulate in the eventfd counter, drain them
as they are handled.

Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Message-Id: &lt;20201027135523.646811-4-dwmw2@infradead.org&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
</feed>
