summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-04-26xprtrdma: Add tracepoints showing FastReg WRs and remote invalidationChuck Lever2-0/+4
The Send signaling logic is a little subtle, so add some observability around it. For every xprtrdma_mr_fastreg event, there should be an xprtrdma_mr_localinv or xprtrdma_mr_reminv event. When these tracepoints are enabled, we can see exactly when an MR is DMA-mapped, registered, invalidated (either locally or remotely) and then DMA-unmapped. kworker/u25:2-190 [000] 787.979512: xprtrdma_mr_map: task:351@5 mr.id=4 nents=2 5608@0x8679e0c8f6f56000:0x00000503 (TO_DEVICE) kworker/u25:2-190 [000] 787.979515: xprtrdma_chunk_read: task:351@5 pos=148 5608@0x8679e0c8f6f56000:0x00000503 (last) kworker/u25:2-190 [000] 787.979519: xprtrdma_marshal: task:351@5 xid=0x8679e0c8: hdr=52 xdr=148/5608/0 read list/inline kworker/u25:2-190 [000] 787.979525: xprtrdma_mr_fastreg: task:351@5 mr.id=4 nents=2 5608@0x8679e0c8f6f56000:0x00000503 (TO_DEVICE) kworker/u25:2-190 [000] 787.979526: xprtrdma_post_send: task:351@5 cq.id=0 cid=73 (2 SGEs) ... kworker/5:1H-219 [005] 787.980567: xprtrdma_wc_receive: cq.id=1 cid=161 status=SUCCESS (0/0x0) received=164 kworker/5:1H-219 [005] 787.980571: xprtrdma_post_recvs: peer=[192.168.100.55]:20049 r_xprt=0xffff8884974d4000: 0 new recvs, 70 active (rc 0) kworker/5:1H-219 [005] 787.980573: xprtrdma_reply: task:351@5 xid=0x8679e0c8 credits=64 kworker/5:1H-219 [005] 787.980576: xprtrdma_mr_reminv: task:351@5 mr.id=4 nents=2 5608@0x8679e0c8f6f56000:0x00000503 (TO_DEVICE) kworker/5:1H-219 [005] 787.980577: xprtrdma_mr_unmap: mr.id=4 nents=2 5608@0x8679e0c8f6f56000:0x00000503 (TO_DEVICE) Note that I've moved the xprtrdma_post_send tracepoint so that event always appears after the xprtrdma_mr_fastreg tracepoint. Otherwise the event log looks counterintuitive (FastReg is always supposed to happen before Send). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26xprtrdma: Avoid Send Queue wrappingChuck Lever2-17/+16
Send WRs can be signalled or unsignalled. A signalled Send WR always has a matching Send completion, while a unsignalled Send has a completion only if the Send WR fails. xprtrdma has a Send account mechanism that is designed to reduce the number of signalled Send WRs. This in turn mitigates the interrupt rate of the underlying device. RDMA consumers can't leave all Sends unsignaled, however, because providers rely on Send completions to maintain their Send Queue head and tail pointers. xprtrdma counts the number of unsignaled Send WRs that have been posted to ensure that Sends are signalled often enough to prevent the Send Queue from wrapping. This mechanism neglected to account for FastReg WRs, which are posted on the Send Queue but never signalled. As a result, the Send Queue wrapped on occasion, resulting in duplication completions of FastReg and LocalInv WRs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26xprtrdma: Do not wake RPC consumer on a failed LocalInvChuck Lever3-9/+41
Throw away any reply where the LocalInv flushes or could not be posted. The registered memory region is in an unknown state until the disconnect completes. rpcrdma_xprt_disconnect() will find and release the MR. No need to put it back on the MR free list in this case. The client retransmits pending RPC requests once it reestablishes a fresh connection, so a replacement reply should be forthcoming on the next connection instance. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26xprtrdma: Do not recycle MR after FastReg/LocalInv flushesChuck Lever2-53/+17
Better not to touch MRs involved in a flush or post error until the Send and Receive Queues are drained and the transport is fully quiescent. Simply don't insert such MRs back onto the free list. They remain on mr_all and will be released when the connection is torn down. I had thought that recycling would prevent hardware resources from being tied up for a long time. However, since v5.7, a transport disconnect destroys the QP and other hardware-owned resources. The MRs get cleaned up nicely at that point. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26xprtrdma: Clarify use of barrier in frwr_wc_localinv_done()Chuck Lever1-3/+5
Clean up: The comment and the placement of the memory barrier is confusing. Humans want to read the function statements from head to tail. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26xprtrdma: Rename frwr_release_mr()Chuck Lever3-6/+6
Clean up: To be consistent with other functions in this source file, follow the naming convention of putting the object being acted upon before the action itself. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26xprtrdma: rpcrdma_mr_pop() already does list_del_init()Chuck Lever1-1/+0
The rpcrdma_mr_pop() earlier in the function has already cleared out mr_list, so it must not be done again in the error path. Fixes: 847568942f93 ("xprtrdma: Remove fr_state") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26xprtrdma: Delete rpcrdma_recv_buffer_put()Chuck Lever4-20/+14
Clean up: The name recv_buffer_put() is a vestige of older code, and the function is just a wrapper for the newer rpcrdma_rep_put(). In most of the existing call sites, a pointer to the owning rpcrdma_buffer is already available. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26xprtrdma: Fix cwnd update orderingChuck Lever3-7/+8
After a reconnect, the reply handler is opening the cwnd (and thus enabling more RPC Calls to be sent) /before/ rpcrdma_post_recvs() can post enough Receive WRs to receive their replies. This causes an RNR and the new connection is lost immediately. The race is most clearly exposed when KASAN and disconnect injection are enabled. This slows down rpcrdma_rep_create() enough to allow the send side to post a bunch of RPC Calls before the Receive completion handler can invoke ib_post_recv(). Fixes: 2ae50ad68cd7 ("xprtrdma: Close window between waking RPC senders and posting Receives") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26xprtrdma: Improve locking around rpcrdma_rep creationChuck Lever1-4/+5
Defensive clean up: Protect the rb_all_reps list during rep creation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26xprtrdma: Improve commentary around rpcrdma_reps_unmap()Chuck Lever1-1/+5
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26xprtrdma: Improve locking around rpcrdma_rep destructionChuck Lever1-7/+24
Currently rpcrdma_reps_destroy() assumes that, at transport tear-down, the content of the rb_free_reps list is the same as the content of the rb_all_reps list. Although that is usually true, using the rb_all_reps list should be more reliable because of the way it's managed. And, rpcrdma_reps_unmap() uses rb_all_reps; these two functions should both traverse the "all" list. Ensure that all rpcrdma_reps are always destroyed whether they are on the rep free list or not. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26xprtrdma: Put flushed Receives on free list instead of destroying themChuck Lever1-1/+3
Defer destruction of an rpcrdma_rep until transport tear-down to preserve the rb_all_reps list while Receives flush. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26xprtrdma: Do not refresh Receive Queue while it is drainingChuck Lever2-0/+14
Currently the Receive completion handler refreshes the Receive Queue whenever a successful Receive completion occurs. On disconnect, xprtrdma drains the Receive Queue. The first few Receive completions after a disconnect are typically successful, until the first flushed Receive. This means the Receive completion handler continues to post more Receive WRs after the drain sentinel has been posted. The late- posted Receives flush after the drain sentinel has completed, leading to a crash later in rpcrdma_xprt_disconnect(). To prevent this crash, xprtrdma has to ensure that the Receive handler stops posting Receives before ib_drain_rq() posts its drain sentinel. Suggested-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26xprtrdma: Avoid Receive Queue wrappingChuck Lever1-0/+1
Commit e340c2d6ef2a ("xprtrdma: Reduce the doorbell rate (Receive)") increased the number of Receive WRs that are posted by the client, but did not increase the size of the Receive Queue allocated during transport set-up. This is usually not an issue because RPCRDMA_BACKWARD_WRS is defined as (32) when SUNRPC_BACKCHANNEL is defined. In cases where it isn't, there is a real risk of Receive Queue wrapping. Fixes: e340c2d6ef2a ("xprtrdma: Reduce the doorbell rate (Receive)") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26io_uring: simplify SQPOLL cancellationsPavel Begunkov1-42/+3
All sqpoll rings (even sharing sqpoll task) are currently dead bound to the task that created them, iow when owner task dies it kills all its SQPOLL rings and their inflight requests via task_work infra. It's neither the nicist way nor the most convenient as adds extra locking/waiting and dependencies. Leave it alone and rely on SIGKILL being delivered on its thread group exit, so there are only two cases left: 1) thread group is dying, so sqpoll task gets a signal and exit itself cancelling all requests. 2) an sqpoll ring is dying. Because refs_kill() is called the sqpoll not going to submit any new request, and that's what we need. And io_ring_exit_work() will do all the cancellation itself before actually killing ctx, so sqpoll doesn't need to worry about it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3cd7f166b9c326a2c932b70e71a655b03257b366.1619389911.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-26io_uring: fix work_exit sqpoll cancellationsPavel Begunkov1-7/+18
After closing an SQPOLL ring, io_ring_exit_work() kicks in and starts doing cancellations via io_uring_try_cancel_requests(). It will go through io_uring_try_cancel_iowq(), which uses ctx->tctx_list, but as SQPOLL task don't have a ctx note, its io-wq won't be reachable and so is left not cancelled. It will eventually cancelled when one of the tasks dies, but if a thread group survives for long and changes rings, it will spawn lots of unreclaimed resources and live locked works. Cancel SQPOLL task's io-wq separately in io_ring_exit_work(). Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a71a7fe345135d684025bb529d5cb1d8d6b46e10.1619389911.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-26io_uring: Fix uninitialized variable up.resvColin Ian King1-0/+1
The variable up.resv is not initialized and is being checking for a non-zero value in the call to _io_register_rsrc_update. Fix this by explicitly setting the variable to 0. Addresses-Coverity: ("Uninitialized scalar variable)" Fixes: c3bdad027183 ("io_uring: add generic rsrc update with tags") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210426094735.8320-1-colin.king@canonical.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-26io_uring: fix invalid error check after mallocPavel Begunkov1-1/+1
Now we allocate io_mapped_ubuf instead of bvec, so we clearly have to check its address after allocation. Fixes: 41edf1a5ec967 ("io_uring: keep table of pointers to ubufs") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d28eb1bc4384284f69dbce35b9f70c115ff6176f.1619392565.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-26blk-iocost: don't ignore vrate_min on QD contentionTejun Heo1-4/+0
ioc_adjust_base_vrate() ignored vrate_min when rq_wait_pct indicates that there is QD contention. The reasoning was that QD depletion always reliably indicates device saturation and thus it's safe to override user specified vrate_min. However, this sometimes leads to unnecessary throttling, especially on really fast devices, because vrate adjustments have delays and inertia. It also confuses users because the behavior violates the explicitly specified configuration. This patch drops the special case handling so that vrate_min is always applied. Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/YIIo1HuyNmhDeiNx@slm.duckdns.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-26Merge remote-tracking branch 'torvalds/master' into perf/coreArnaldo Carvalho de Melo197-929/+1311
To pick up fixes. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-26ALSA: hda/realtek: fix static noise on ALC285 Lenovo laptopsSami Loone1-5/+4
Remove a duplicate vendor+subvendor pin fixup entry as one is masking the other and making it unreachable. Consider the more specific newcomer as a second chance instead. The generic entry is made less strict to also match for laptops with slightly different 0x12 pin configuration. Tested on Lenovo Yoga 6 (AMD) where 0x12 is 0x40000000. Fixes: 607184cb1635 ("ALSA: hda/realtek - Add supported for more Lenovo ALC285 Headset Button") Signed-off-by: Sami Loone <sami@loone.fi> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/YIXS+GT/dGI/LtK6@yoga Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-04-26mmc: block: Issue a cache flush only when it's enabledAvri Altman4-3/+21
In command queueing mode, the cache isn't flushed via the mmc_flush_cache() function, but instead by issuing a CMDQ_TASK_MGMT (CMD48) with a FLUSH_CACHE opcode. In this path, we need to check if cache has been enabled, before deciding to flush the cache, along the lines of what's being done in mmc_flush_cache(). To fix this problem, let's add a new bus ops callback ->cache_enabled() and implement it for the mmc bus type. In this way, the mmc block device driver can call it to know whether cache flushing should be done. Fixes: 1e8e55b67030 (mmc: block: Add CQE support) Cc: stable@vger.kernel.org Reported-by: Brendan Peter <bpeter@lytx.com> Signed-off-by: Avri Altman <avri.altman@wdc.com> Tested-by: Brendan Peter <bpeter@lytx.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20210425060207.2591-2-avri.altman@wdc.com Link: https://lore.kernel.org/r/20210425060207.2591-3-avri.altman@wdc.com [Ulf: Squashed the two patches and made some minor updates] Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-04-26selftests: kvm: Fix the check of return valueZhenzhong Duan1-2/+2
In vm_vcpu_rm() and kvm_vm_release(), a stale return value is checked in TEST_ASSERT macro. Fix it by assigning variable ret with correct return value. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Message-Id: <20210426193138.118276-1-zhenzhong.duan@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt()Haiwei Li1-12/+9
`kvm_arch_dy_runnable` checks the pending_interrupt as the code in `kvm_arch_dy_has_pending_interrupt`. So take advantage of it. Signed-off-by: Haiwei Li <lihaiwei@tencent.com> Message-Id: <20210421032513.1921-1-lihaiwei.kernel@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: SVM: Skip SEV cache flush if no ASIDs have been usedSean Christopherson1-12/+13
Skip SEV's expensive WBINVD and DF_FLUSH if there are no SEV ASIDs waiting to be reclaimed, e.g. if SEV was never used. This "fixes" an issue where the DF_FLUSH fails during hardware teardown if the original SEV_INIT failed. Ideally, SEV wouldn't be marked as enabled in KVM if SEV_INIT fails, but that's a problem for another day. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422021125.3417167-16-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids()Sean Christopherson1-1/+0
Remove the forward declaration of sev_flush_asids(), which is only a few lines above the function itself. No functional change intended. Reviewed by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422021125.3417167-15-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: SVM: Drop redundant svm_sev_enabled() helperSean Christopherson2-8/+3
Replace calls to svm_sev_enabled() with direct checks on sev_enabled, or in the case of svm_mem_enc_op, simply drop the call to svm_sev_enabled(). This effectively replaces checks against a valid max_sev_asid with checks against sev_enabled. sev_enabled is forced off by sev_hardware_setup() if max_sev_asid is invalid, all call sites are guaranteed to run after sev_hardware_setup(), and all of the checks care about SEV being fully enabled (as opposed to intentionally handling the scenario where max_sev_asid is valid but SEV enabling fails due to OOM). Reviewed by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422021125.3417167-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: SVM: Move SEV VMCB tracking allocation to sev.cSean Christopherson3-8/+20
Move the allocation of the SEV VMCB array to sev.c to help pave the way toward encapsulating SEV enabling wholly within sev.c. No functional change intended. Reviewed by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422021125.3417167-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup()Sean Christopherson1-2/+1
Query max_sev_asid directly after setting it instead of bouncing through its wrapper, svm_sev_enabled(). Using the wrapper is unnecessary obfuscation. No functional change intended. Reviewed by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422021125.3417167-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: SVM: Unconditionally invoke sev_hardware_teardown()Sean Christopherson1-2/+1
Remove the redundant svm_sev_enabled() check when calling sev_hardware_teardown(), the teardown helper itself does the check. Removing the check from svm.c will eventually allow dropping svm_sev_enabled() entirely. No functional change intended. Reviewed by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422021125.3417167-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported)Sean Christopherson1-2/+2
Enable the 'sev' and 'sev_es' module params by default instead of having them conditioned on CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT. The extra Kconfig is pointless as KVM SEV/SEV-ES support is already controlled via CONFIG_KVM_AMD_SEV, and CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT has the unfortunate side effect of enabling all the SEV-ES _guest_ code due to it being dependent on CONFIG_AMD_MEM_ENCRYPT=y. Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422021125.3417167-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=ySean Christopherson1-1/+8
Define sev_enabled and sev_es_enabled as 'false' and explicitly #ifdef out all of sev_hardware_setup() if CONFIG_KVM_AMD_SEV=n. This kills three birds at once: - Makes sev_enabled and sev_es_enabled off by default if CONFIG_KVM_AMD_SEV=n. Previously, they could be on by default if CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y, regardless of KVM SEV support. - Hides the sev and sev_es modules params when CONFIG_KVM_AMD_SEV=n. - Resolves a false positive -Wnonnull in __sev_recycle_asids() that is currently masked by the equivalent IS_ENABLED(CONFIG_KVM_AMD_SEV) check in svm_sev_enabled(), which will be dropped in a future patch. Reviewed by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422021125.3417167-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variablesSean Christopherson1-12/+12
Rename sev and sev_es to sev_enabled and sev_es_enabled respectively to better align with other KVM terminology, and to avoid pseudo-shadowing when the variables are moved to sev.c in a future patch ('sev' is often used for local struct kvm_sev_info pointers. No functional change intended. Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422021125.3417167-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: SEV: Mask CPUID[0x8000001F].eax according to supported featuresPaolo Bonzini5-1/+20
Add a reverse-CPUID entry for the memory encryption word, 0x8000001F.EAX, and use it to override the supported CPUID flags reported to userspace. Masking the reported CPUID flags avoids over-reporting KVM support, e.g. without the mask a SEV-SNP capable CPU may incorrectly advertise SNP support to userspace. Clear SEV/SEV-ES if their corresponding module parameters are disabled, and clear the memory encryption leaf completely if SEV is not fully supported in KVM. Advertise SME_COHERENT in addition to SEV and SEV-ES, as the guest can use SME_COHERENT to avoid CLFLUSH operations. Explicitly omit SME and VM_PAGE_FLUSH from the reporting. These features are used by KVM, but are not exposed to the guest, e.g. guest access to related MSRs will fault. Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422021125.3417167-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: SVM: Move SEV module params/variables to sev.cSean Christopherson3-16/+13
Unconditionally invoke sev_hardware_setup() when configuring SVM and handle clearing the module params/variable 'sev' and 'sev_es' in sev_hardware_setup(). This allows making said variables static within sev.c and reduces the odds of a collision with guest code, e.g. the guest side of things has already laid claim to 'sev_enabled'. Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422021125.3417167-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: SVM: Disable SEV/SEV-ES if NPT is disabledSean Christopherson1-15/+15
Disable SEV and SEV-ES if NPT is disabled. While the APM doesn't clearly state that NPT is mandatory, it's alluded to by: The guest page tables, managed by the guest, may mark data memory pages as either private or shared, thus allowing selected pages to be shared outside the guest. And practically speaking, shadow paging can't work since KVM can't read the guest's page tables. Fixes: e9df09428996 ("KVM: SVM: Add sev module_param") Cc: Brijesh Singh <brijesh.singh@amd.com Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422021125.3417167-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: SVM: Free sev_asid_bitmap during init if SEV setup failsSean Christopherson1-1/+4
Free sev_asid_bitmap if the reclaim bitmap allocation fails, othwerise KVM will unnecessarily keep the bitmap when SEV is not fully enabled. Freeing the page is also necessary to avoid introducing a bug when a future patch eliminates svm_sev_enabled() in favor of using the global 'sev' flag directly. While sev_hardware_enabled() checks max_sev_asid, which is true even if KVM setup fails, 'sev' will be true if and only if KVM setup fully succeeds. Fixes: 33af3a7ef9e6 ("KVM: SVM: Reduce WBINVD/DF_FLUSH invocations") Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422021125.3417167-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: SVM: Zero out the VMCB array used to track SEV ASID associationSean Christopherson1-3/+2
Zero out the array of VMCB pointers so that pre_sev_run() won't see garbage when querying the array to detect when an SEV ASID is being associated with a new VMCB. In practice, reading random values is all but guaranteed to be benign as a false negative (which is extremely unlikely on its own) can only happen on CPU0 on the first VMRUN and would only cause KVM to skip the ASID flush. For anything bad to happen, a previous instance of KVM would have to exit without flushing the ASID, _and_ KVM would have to not flush the ASID at any time while building the new SEV guest. Cc: Borislav Petkov <bp@suse.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Fixes: 70cd94e60c73 ("KVM: SVM: VMRUN should use associated ASID when SEV is enabled") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422021125.3417167-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26x86/sev: Drop redundant and potentially misleading 'sev_enabled'Sean Christopherson3-8/+4
Drop the sev_enabled flag and switch its one user over to sev_active(). sev_enabled was made redundant with the introduction of sev_status in commit b57de6cd1639 ("x86/sev-es: Add SEV-ES Feature Detection"). sev_enabled and sev_active() are guaranteed to be equivalent, as each is true iff 'sev_status & MSR_AMD64_SEV_ENABLED' is true, and are only ever written in tandem (ignoring compressed boot's version of sev_status). Removing sev_enabled avoids confusion over whether it refers to the guest or the host, and will also allow KVM to usurp "sev_enabled" for its own purposes. No functional change intended. Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422021125.3417167-7-seanjc@google.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: x86: Move reverse CPUID helpers to separate header fileRicardo Koller2-176/+186
Split out the reverse CPUID machinery to a dedicated header file so that KVM selftests can reuse the reverse CPUID definitions without introducing any '#ifdef __KERNEL__' pollution. Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Ricardo Koller <ricarkol@google.com> Message-Id: <20210422005626.564163-2-ricarkol@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: x86: Rename GPR accessors to make mode-aware variants the defaultsSean Christopherson7-36/+41
Append raw to the direct variants of kvm_register_read/write(), and drop the "l" from the mode-aware variants. I.e. make the mode-aware variants the default, and make the direct variants scary sounding so as to discourage use. Accessing the full 64-bit values irrespective of mode is rarely the desired behavior. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422022128.3464144-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: SVM: Use default rAX size for INVLPGA emulationSean Christopherson1-3/+9
Drop bits 63:32 of RAX when grabbing the address for INVLPGA emulation outside of 64-bit mode to make KVM's emulation slightly less wrong. The address for INVLPGA is determined by the effective address size, i.e. it's not hardcoded to 64/32 bits for a given mode. Add a FIXME to call out that the emulation is wrong. Opportunistically tweak the ASID handling to make it clear that it's defined by ECX, not rCX. Per the APM: The portion of rAX used to form the address is determined by the effective address size (current execution mode and optional address size prefix). The ASID is taken from ECX. Fixes: ff092385e828 ("KVM: SVM: Implement INVLPGA") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422022128.3464144-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: x86/xen: Drop RAX[63:32] when processing hypercallSean Christopherson1-1/+1
Truncate RAX to 32 bits, i.e. consume EAX, when retrieving the hypecall index for a Xen hypercall. Per Xen documentation[*], the index is EAX when the vCPU is not in 64-bit mode. [*] http://xenbits.xenproject.org/docs/sphinx-unstable/guest-guide/x86/hypercall-abi.html Fixes: 23200b7a30de ("KVM: x86/xen: intercept xen hypercalls if enabled") Cc: Joao Martins <joao.m.martins@oracle.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422022128.3464144-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: nVMX: Truncate base/index GPR value on address calc in !64-bitSean Christopherson1-2/+2
Drop bits 63:32 of the base and/or index GPRs when calculating the effective address of a VMX instruction memory operand. Outside of 64-bit mode, memory encodings are strictly limited to E*X and below. Fixes: 064aea774768 ("KVM: nVMX: Decoding memory operands of VMX instructions") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422022128.3464144-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: nVMX: Truncate bits 63:32 of VMCS field on nested check in !64-bitSean Christopherson1-1/+1
Drop bits 63:32 of the VMCS field encoding when checking for a nested VM-Exit on VMREAD/VMWRITE in !64-bit mode. VMREAD and VMWRITE always use 32-bit operands outside of 64-bit mode. The actual emulation of VMREAD/VMWRITE does the right thing, this bug is purely limited to incorrectly causing a nested VM-Exit if a GPR happens to have bits 63:32 set outside of 64-bit mode. Fixes: a7cde481b6e8 ("KVM: nVMX: Do not forward VMREAD/VMWRITE VMExits to L1 if required so by vmcs12 vmread/vmwrite bitmaps") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422022128.3464144-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: VMX: Truncate GPR value for DR and CR reads in !64-bit modeSean Christopherson1-3/+3
Drop bits 63:32 when storing a DR/CR to a GPR when the vCPU is not in 64-bit mode. Per the SDM: The operand size for these instructions is always 32 bits in non-64-bit modes, regardless of the operand-size attribute. CR8 technically isn't affected as CR8 isn't accessible outside of 64-bit mode, but fix it up for consistency and to allow for future cleanup. Fixes: 6aa8b732ca01 ("[PATCH] kvm: userspace interface") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422022128.3464144-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit modeSean Christopherson1-4/+4
Drop bits 63:32 on loads/stores to/from DRs and CRs when the vCPU is not in 64-bit mode. The APM states bits 63:32 are dropped for both DRs and CRs: In 64-bit mode, the operand size is fixed at 64 bits without the need for a REX prefix. In non-64-bit mode, the operand size is fixed at 32 bits and the upper 32 bits of the destination are forced to 0. Fixes: 7ff76d58a9dc ("KVM: SVM: enhance MOV CR intercept handler") Fixes: cae3797a4639 ("KVM: SVM: enhance mov DR intercept handler") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422022128.3464144-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: x86: Check CR3 GPA for validity regardless of vCPU modeSean Christopherson1-3/+8
Check CR3 for an invalid GPA even if the vCPU isn't in long mode. For bigger emulation flows, notably RSM, the vCPU mode may not be accurate if CR0/CR4 are loaded after CR3. For MOV CR3 and similar flows, the caller is responsible for truncating the value. Fixes: 660a5d517aaa ("KVM: x86: save/load state on SMM switch") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422022128.3464144-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26KVM: x86: Remove emulator's broken checks on CR0/CR3/CR4 loadsSean Christopherson1-77/+3
Remove the emulator's checks for illegal CR0, CR3, and CR4 values, as the checks are redundant, outdated, and in the case of SEV's C-bit, broken. The emulator manually calculates MAXPHYADDR from CPUID and neglects to mask off the C-bit. For all other checks, kvm_set_cr*() are a superset of the emulator checks, e.g. see CR4.LA57. Fixes: a780a3ea6282 ("KVM: X86: Fix reserved bits check for MOV to CR3") Cc: Babu Moger <babu.moger@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422022128.3464144-2-seanjc@google.com> Cc: stable@vger.kernel.org [Unify check_cr_read and check_cr_write. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>