<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/Documentation/virt, branch v5.15.99</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.15.99</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.15.99'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-03-10T08:40:01+00:00</updated>
<entry>
<title>KVM: s390: disable migration mode when dirty tracking is disabled</title>
<updated>2023-03-10T08:40:01+00:00</updated>
<author>
<name>Nico Boehr</name>
<email>nrb@linux.ibm.com</email>
</author>
<published>2023-01-27T14:05:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6e46d9ff3ed395de419dcc43b1fbef87788b6062'/>
<id>urn:sha1:6e46d9ff3ed395de419dcc43b1fbef87788b6062</id>
<content type='text'>
commit f2d3155e2a6bac44d16f04415a321e8707d895c6 upstream.

Migration mode is a VM attribute which enables tracking of changes in
storage attributes (PGSTE). It assumes dirty tracking is enabled on all
memslots to keep a dirty bitmap of pages with changed storage attributes.

When enabling migration mode, we currently check that dirty tracking is
enabled for all memslots. However, userspace can disable dirty tracking
without disabling migration mode.

Since migration mode is pointless with dirty tracking disabled, disable
migration mode whenever userspace disables dirty tracking on any slot.

Also update the documentation to clarify that dirty tracking must be
enabled when enabling migration mode, which is already enforced by the
code in kvm_s390_vm_start_migration().

Also highlight in the documentation for KVM_S390_GET_CMMA_BITS that it
can now fail with -EINVAL when dirty tracking is disabled while
migration mode is on. Move all the error codes to a table so this stays
readable.

To disable migration mode, slots_lock should be held, which is taken
in kvm_set_memory_region() and thus held in
kvm_arch_prepare_memory_region().

Restructure the prepare code a bit so all the sanity checking is done
before disabling migration mode. This ensures migration mode isn't
disabled when some sanity check fails.

Cc: stable@vger.kernel.org
Fixes: 190df4a212a7 ("KVM: s390: CMMA tracking, ESSA emulation, migration mode")
Signed-off-by: Nico Boehr &lt;nrb@linux.ibm.com&gt;
Reviewed-by: Janosch Frank &lt;frankja@linux.ibm.com&gt;
Reviewed-by: Claudio Imbrenda &lt;imbrenda@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/20230127140532.230651-2-nrb@linux.ibm.com
Message-Id: &lt;20230127140532.230651-2-nrb@linux.ibm.com&gt;
[frankja@linux.ibm.com: fixed commit message typo, moved api.rst error table upwards]
Signed-off-by: Janosch Frank &lt;frankja@linux.ibm.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID</title>
<updated>2023-01-18T10:48:57+00:00</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2022-10-22T08:17:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fbf50151418240cdbfc1f54d7679b0415a499732'/>
<id>urn:sha1:fbf50151418240cdbfc1f54d7679b0415a499732</id>
<content type='text'>
[ Upstream commit 45e966fcca03ecdcccac7cb236e16eea38cc18af ]

Passing the host topology to the guest is almost certainly wrong
and will confuse the scheduler.  In addition, several fields of
these CPUID leaves vary on each processor; it is simply impossible to
return the right values from KVM_GET_SUPPORTED_CPUID in such a way that
they can be passed to KVM_SET_CPUID2.

The values that will most likely prevent confusion are all zeroes.
Userspace will have to override it anyway if it wishes to present a
specific topology to the guest.

Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>Documentation: KVM: add API issues section</title>
<updated>2023-01-18T10:48:57+00:00</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2022-03-22T11:07:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ee16841134be59a8ea4d5617be63b791e8fa1067'/>
<id>urn:sha1:ee16841134be59a8ea4d5617be63b791e8fa1067</id>
<content type='text'>
[ Upstream commit cde363ab7ca7aea7a853851cd6a6745a9e1aaf5e ]

Add a section to document all the different ways in which the KVM API sucks.

I am sure there are way more, give people a place to vent so that userspace
authors are aware.

Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Message-Id: &lt;20220322110712.222449-4-pbonzini@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>KVM: s390: pv: don't allow userspace to set the clock under PV</title>
<updated>2022-11-16T08:58:17+00:00</updated>
<author>
<name>Nico Boehr</name>
<email>nrb@linux.ibm.com</email>
</author>
<published>2022-10-11T16:07:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=119407dc329a5a0c72326dcb9073dc256887d948'/>
<id>urn:sha1:119407dc329a5a0c72326dcb9073dc256887d948</id>
<content type='text'>
[ Upstream commit 6973091d1b50ab4042f6a2d495f59e9db3662ab8 ]

When running under PV, the guest's TOD clock is under control of the
ultravisor and the hypervisor isn't allowed to change it. Hence, don't
allow userspace to change the guest's TOD clock by returning
-EOPNOTSUPP.

When userspace changes the guest's TOD clock, KVM updates its
kvm.arch.epoch field and, in addition, the epoch field in all state
descriptions of all VCPUs.

But, under PV, the ultravisor will ignore the epoch field in the state
description and simply overwrite it on next SIE exit with the actual
guest epoch. This leads to KVM having an incorrect view of the guest's
TOD clock: it has updated its internal kvm.arch.epoch field, but the
ultravisor ignores the field in the state description.

Whenever a guest is now waiting for a clock comparator, KVM will
incorrectly calculate the time when the guest should wake up, possibly
causing the guest to sleep for much longer than expected.

With this change, kvm_s390_set_tod() will now take the kvm-&gt;lock to be
able to call kvm_s390_pv_is_protected(). Since kvm_s390_set_tod_clock()
also takes kvm-&gt;lock, use __kvm_s390_set_tod_clock() instead.

The function kvm_s390_set_tod_clock is now unused, hence remove it.
Update the documentation to indicate the TOD clock attr calls can now
return -EOPNOTSUPP.

Fixes: 0f3035047140 ("KVM: s390: protvirt: Do only reset registers that are accessible")
Reported-by: Marc Hartmayer &lt;mhartmay@linux.ibm.com&gt;
Signed-off-by: Nico Boehr &lt;nrb@linux.ibm.com&gt;
Reviewed-by: Claudio Imbrenda &lt;imbrenda@linux.ibm.com&gt;
Reviewed-by: Janosch Frank &lt;frankja@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/20221011160712.928239-2-nrb@linux.ibm.com
Message-Id: &lt;20221011160712.928239-2-nrb@linux.ibm.com&gt;
Signed-off-by: Janosch Frank &lt;frankja@linux.ibm.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm</title>
<updated>2021-09-07T20:40:51+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-09-07T20:40:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=192ad3c27a4895ee4b2fa31c5b54a932f5bb08c1'/>
<id>urn:sha1:192ad3c27a4895ee4b2fa31c5b54a932f5bb08c1</id>
<content type='text'>
Pull KVM updates from Paolo Bonzini:
 "ARM:
   - Page ownership tracking between host EL1 and EL2
   - Rely on userspace page tables to create large stage-2 mappings
   - Fix incompatibility between pKVM and kmemleak
   - Fix the PMU reset state, and improve the performance of the virtual
     PMU
   - Move over to the generic KVM entry code
   - Address PSCI reset issues w.r.t. save/restore
   - Preliminary rework for the upcoming pKVM fixed feature
   - A bunch of MM cleanups
   - a vGIC fix for timer spurious interrupts
   - Various cleanups

  s390:
   - enable interpretation of specification exceptions
   - fix a vcpu_idx vs vcpu_id mixup

  x86:
   - fast (lockless) page fault support for the new MMU
   - new MMU now the default
   - increased maximum allowed VCPU count
   - allow inhibit IRQs on KVM_RUN while debugging guests
   - let Hyper-V-enabled guests run with virtualized LAPIC as long as
     they do not enable the Hyper-V "AutoEOI" feature
   - fixes and optimizations for the toggling of AMD AVIC (virtualized
     LAPIC)
   - tuning for the case when two-dimensional paging (EPT/NPT) is
     disabled
   - bugfixes and cleanups, especially with respect to vCPU reset and
     choosing a paging mode based on CR0/CR4/EFER
   - support for 5-level page table on AMD processors

  Generic:
   - MMU notifier invalidation callbacks do not take mmu_lock unless
     necessary
   - improved caching of LRU kvm_memory_slot
   - support for histogram statistics
   - add statistics for halt polling and remote TLB flush requests"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (210 commits)
  KVM: Drop unused kvm_dirty_gfn_invalid()
  KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted
  KVM: MMU: mark role_regs and role accessors as maybe unused
  KVM: MIPS: Remove a "set but not used" variable
  x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait
  KVM: stats: Add VM stat for remote tlb flush requests
  KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count()
  KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page
  KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality
  Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()"
  KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page
  kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710
  kvm: x86: Increase MAX_VCPUS to 1024
  kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS
  KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
  KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host
  KVM: s390: index kvm-&gt;arch.idle_mask by vcpu_idx
  KVM: s390: Enable specification exception interpretation
  KVM: arm64: Trim guest debug exception handling
  KVM: SVM: Add 5-level page table support for SVM
  ...
</content>
</entry>
<entry>
<title>Merge tag 'docs-5.15' of git://git.lwn.net/linux</title>
<updated>2021-09-02T01:49:47+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-09-02T01:49:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4ac6d90867a4de2e12117e755dbd76e08d88697f'/>
<id>urn:sha1:4ac6d90867a4de2e12117e755dbd76e08d88697f</id>
<content type='text'>
Pull documentation updates from Jonathan Corbet:
 "Yet another set of documentation changes:

   - A reworking of PDF generation to yield better results for documents
     using CJK fonts in particular.

   - A new set of translations into traditional Chinese, a dialect for
     which I am assured there is a community of interested readers.

   - A lot more regular Chinese translation work as well.

  ... plus the usual assortment of updates, fixes, typo tweaks, etc"

* tag 'docs-5.15' of git://git.lwn.net/linux: (55 commits)
  docs: sphinx-requirements: Move sphinx_rtd_theme to top
  docs: pdfdocs: Enable language-specific font choice of zh_TW translations
  docs: pdfdocs: Teach xeCJK about character classes of quotation marks
  docs: pdfdocs: Permit AutoFakeSlant for CJK fonts
  docs: pdfdocs: One-half spacing for CJK translations
  docs: pdfdocs: Add conf.py local to translations for ascii-art alignment
  docs: pdfdocs: Preserve inter-phrase space in Korean translations
  docs: pdfdocs: Choose Serif font as CJK mainfont if possible
  docs: pdfdocs: Add CJK-language-specific font settings
  docs: pdfdocs: Refactor config for CJK document
  scripts/kernel-doc: Override -Werror from KCFLAGS with KDOC_WERROR
  docs/zh_CN: Add zh_CN/accounting/psi.rst
  doc: align Italian translation
  Documentation/features/vm: riscv supports THP now
  docs/zh_CN: add infiniband user_verbs translation
  docs/zh_CN: add infiniband user_mad translation
  docs/zh_CN: add infiniband tag_matching translation
  docs/zh_CN: add infiniband sysfs translation
  docs/zh_CN: add infiniband opa_vnic translation
  docs/zh_CN: add infiniband ipoib translation
  ...
</content>
</entry>
<entry>
<title>KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ</title>
<updated>2021-08-20T20:06:37+00:00</updated>
<author>
<name>Maxim Levitsky</name>
<email>mlevitsk@redhat.com</email>
</author>
<published>2021-08-11T12:29:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=61e5f69ef08379cdc74e8f15d3770976ed48480a'/>
<id>urn:sha1:61e5f69ef08379cdc74e8f15d3770976ed48480a</id>
<content type='text'>
KVM_GUESTDBG_BLOCKIRQ will allow KVM to block all interrupts
while running.

This change is mostly intended for more robust single stepping
of the guest and it has the following benefits when enabled:

* Resuming from a breakpoint is much more reliable.
  When resuming execution from a breakpoint, with interrupts enabled,
  more often than not, KVM would inject an interrupt and make the CPU
  jump immediately to the interrupt handler and eventually return to
  the breakpoint, to trigger it again.

  From the user point of view it looks like the CPU never executed a
  single instruction and in some cases that can even prevent forward
  progress, for example, when the breakpoint is placed by an automated
  script (e.g lx-symbols), which does something in response to the
  breakpoint and then continues the guest automatically.
  If the script execution takes enough time for another interrupt to
  arrive, the guest will be stuck on the same breakpoint RIP forever.

* Normal single stepping is much more predictable, since it won't
  land the debugger into an interrupt handler.

* RFLAGS.TF has less chance to be leaked to the guest:

  We set that flag behind the guest's back to do single stepping
  but if single step lands us into an interrupt/exception handler
  it will be leaked to the guest in the form of being pushed
  to the stack.
  This doesn't completely eliminate this problem as exceptions
  can still happen, but at least this reduces the chances
  of this happening.

Signed-off-by: Maxim Levitsky &lt;mlevitsk@redhat.com&gt;
Message-Id: &lt;20210811122927.900604-6-mlevitsk@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: stats: Update doc for histogram statistics</title>
<updated>2021-08-20T20:06:32+00:00</updated>
<author>
<name>Jing Zhang</name>
<email>jingzhangos@google.com</email>
</author>
<published>2021-08-02T16:56:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0176ec51290f8ef543a8c18a02e932d6ccedbbc5'/>
<id>urn:sha1:0176ec51290f8ef543a8c18a02e932d6ccedbbc5</id>
<content type='text'>
Add documentations for linear and logarithmic histogram statistics.

Signed-off-by: Jing Zhang &lt;jingzhangos@google.com&gt;
Message-Id: &lt;20210802165633.1866976-3-jingzhangos@google.com&gt;
[Small changes to the phrasing. - Paolo]
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'kvm-tdpmmu-fixes' into HEAD</title>
<updated>2021-08-13T07:35:01+00:00</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2021-08-13T07:35:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9a63b4517c606bfbccd063ffc4188e059d4fa23f'/>
<id>urn:sha1:9a63b4517c606bfbccd063ffc4188e059d4fa23f</id>
<content type='text'>
Merge topic branch with fixes for 5.14-rc6 and 5.15 merge window.
</content>
</entry>
<entry>
<title>KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock</title>
<updated>2021-08-13T07:32:14+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2021-08-12T18:18:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ce25681d59ffc4303321e555a2d71b1946af07da'/>
<id>urn:sha1:ce25681d59ffc4303321e555a2d71b1946af07da</id>
<content type='text'>
Add yet another spinlock for the TDP MMU and take it when marking indirect
shadow pages unsync.  When using the TDP MMU and L1 is running L2(s) with
nested TDP, KVM may encounter shadow pages for the TDP entries managed by
L1 (controlling L2) when handling a TDP MMU page fault.  The unsync logic
is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and
misbehaves when a shadow page is marked unsync via a TDP MMU page fault,
which runs with mmu_lock held for read, not write.

Lack of a critical section manifests most visibly as an underflow of
unsync_children in clear_unsync_child_bit() due to unsync_children being
corrupted when multiple CPUs write it without a critical section and
without atomic operations.  But underflow is the best case scenario.  The
worst case scenario is that unsync_children prematurely hits '0' and
leads to guest memory corruption due to KVM neglecting to properly sync
shadow pages.

Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock
would functionally be ok.  Usurping the lock could degrade performance when
building upper level page tables on different vCPUs, especially since the
unsync flow could hold the lock for a comparatively long time depending on
the number of indirect shadow pages and the depth of the paging tree.

For simplicity, take the lock for all MMUs, even though KVM could fairly
easily know that mmu_lock is held for write.  If mmu_lock is held for
write, there cannot be contention for the inner spinlock, and marking
shadow pages unsync across multiple vCPUs will be slow enough that
bouncing the kvm_arch cacheline should be in the noise.

Note, even though L2 could theoretically be given access to its own EPT
entries, a nested MMU must hold mmu_lock for write and thus cannot race
against a TDP MMU page fault.  I.e. the additional spinlock only _needs_ to
be taken by the TDP MMU, as opposed to being taken by any MMU for a VM
that is running with the TDP MMU enabled.  Holding mmu_lock for read also
prevents the indirect shadow page from being freed.  But as above, keep
it simple and always take the lock.

Alternative #1, the TDP MMU could simply pass "false" for can_unsync and
effectively disable unsync behavior for nested TDP.  Write protecting leaf
shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such
VMMs typically don't modify TDP entries, but the same may not hold true for
non-standard use cases and/or VMMs that are migrating physical pages (from
L1's perspective).

Alternative #2, the unsync logic could be made thread safe.  In theory,
simply converting all relevant kvm_mmu_page fields to atomics and using
atomic bitops for the bitmap would suffice.  However, (a) an in-depth audit
would be required, (b) the code churn would be substantial, and (c) legacy
shadow paging would incur additional atomic operations in performance
sensitive paths for no benefit (to legacy shadow paging).

Fixes: a2855afc7ee8 ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon &lt;bgardon@google.com&gt;
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Message-Id: &lt;20210812181815.3378104-1-seanjc@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
</feed>
