summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-05-13KVM: x86: replace is_smm checks with kvm_x86_ops.smi_allowedPaolo Bonzini3-5/+5
Do not hardcode is_smm so that all the architectural conditions for blocking SMIs are listed in a single place. Well, in two places because this introduces some code duplication between Intel and AMD. This ensures that nested SVM obeys GIF in kvm_vcpu_has_events. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: x86: Make return for {interrupt_nmi,smi}_allowed() a bool instead of intSean Christopherson3-18/+18
Return an actual bool for kvm_x86_ops' {interrupt_nmi}_allowed() hook to better reflect the return semantics, and to avoid creating an even bigger mess when the related VMX code is refactored in upcoming patches. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200423022550.15113-5-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: x86: Set KVM_REQ_EVENT if run is canceled with req_immediate_exit setSean Christopherson1-0/+2
Re-request KVM_REQ_EVENT if vcpu_enter_guest() bails after processing pending requests and an immediate exit was requested. This fixes a bug where a pending event, e.g. VMX preemption timer, is delayed and/or lost if the exit was deferred due to something other than a higher priority _injected_ event, e.g. due to a pending nested VM-Enter. This bug only affects the !injected case as kvm_x86_ops.cancel_injection() sets KVM_REQ_EVENT to redo the injection, but that's purely serendipitous behavior with respect to the deferred event. Note, emulated preemption timer isn't the only event that can be affected, it simply happens to be the only event where not re-requesting KVM_REQ_EVENT is blatantly visible to the guest. Fixes: f4124500c2c13 ("KVM: nVMX: Fully emulate preemption timer") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200423022550.15113-4-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: nVMX: Open a window for pending nested VMX preemption timerSean Christopherson3-2/+18
Add a kvm_x86_ops hook to detect a nested pending "hypervisor timer" and use it to effectively open a window for servicing the expired timer. Like pending SMIs on VMX, opening a window simply means requesting an immediate exit. This fixes a bug where an expired VMX preemption timer (for L2) will be delayed and/or lost if a pending exception is injected into L2. The pending exception is rightly prioritized by vmx_check_nested_events() and injected into L2, with the preemption timer left pending. Because no window opened, L2 is free to run uninterrupted. Fixes: f4124500c2c13 ("KVM: nVMX: Fully emulate preemption timer") Reported-by: Jim Mattson <jmattson@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Peter Shier <pshier@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200423022550.15113-3-sean.j.christopherson@intel.com> [Check it in kvm_vcpu_has_events too, to ensure that the preemption timer is serviced promptly even if the vCPU is halted and L1 is not intercepting HLT. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: nVMX: Preserve exception priority irrespective of exiting behaviorSean Christopherson1-5/+7
Short circuit vmx_check_nested_events() if an exception is pending and needs to be injected into L2, priority between coincident events is not dependent on exiting behavior. This fixes a bug where a single-step #DB that is not intercepted by L1 is incorrectly dropped due to servicing a VMX Preemption Timer VM-Exit. Injected exceptions also need to be blocked if nested VM-Enter is pending or an exception was already injected, otherwise injecting the exception could overwrite an existing event injection from L1. Technically, this scenario should be impossible, i.e. KVM shouldn't inject its own exception during nested VM-Enter. This will be addressed in a future patch. Note, event priority between SMI, NMI and INTR is incorrect for L2, e.g. SMI should take priority over VM-Exit on NMI/INTR, and NMI that is injected into L2 should take priority over VM-Exit INTR. This will also be addressed in a future patch. Fixes: b6b8a1451fc4 ("KVM: nVMX: Rework interception of IRQs and NMIs") Reported-by: Jim Mattson <jmattson@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Peter Shier <pshier@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200423022550.15113-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: SVM: Implement check_nested_events for NMICathy Avery3-19/+23
Migrate nested guest NMI intercept processing to new check_nested_events. Signed-off-by: Cathy Avery <cavery@redhat.com> Message-Id: <20200414201107.22952-2-cavery@redhat.com> [Reorder clauses as NMIs have higher priority than IRQs; inject immediate vmexit as is now done for IRQ vmexits. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: SVM: immediately inject INTR vmexitPaolo Bonzini1-3/+3
We can immediately leave SVM guest mode in svm_check_nested_events now that we have the nested_run_pending mechanism. This makes things easier because we can run the rest of inject_pending_event with GIF=0, and KVM will naturally end up requesting the next interrupt window. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: SVM: leave halted state on vmexitPaolo Bonzini1-0/+3
Similar to VMX, we need to leave the halted state when performing a vmexit. Failure to do so will cause a hang after vmexit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: SVM: introduce nested_run_pendingPaolo Bonzini3-1/+8
We want to inject vmexits immediately from svm_check_nested_events, so that the interrupt/NMI window requests happen in inject_pending_event right after it returns. This however has the same issue as in vmx_check_nested_events, so introduce a nested_run_pending flag with the exact same purpose of delaying vmexit injection after the vmentry. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13Merge branch 'kvm-amd-fixes' into HEADPaolo Bonzini4723-69549/+185187
2020-05-13KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.cBabu Moger3-18/+18
Though rdpkru and wrpkru are contingent upon CR4.PKE, the PKRU resource isn't. It can be read with XSAVE and written with XRSTOR. So, if we don't set the guest PKRU value here(kvm_load_guest_xsave_state), the guest can read the host value. In case of kvm_load_host_xsave_state, guest with CR4.PKE clear could potentially use XRSTOR to change the host PKRU value. While at it, move pkru state save/restore to common code and the host_pkru field to kvm_vcpu_arch. This will let SVM support protection keys. Cc: stable@vger.kernel.org Reported-by: Jim Mattson <jmattson@google.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Message-Id: <158932794619.44260.14508381096663848853.stgit@naples-babu.amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-11Linux 5.7-rc5Linus Torvalds1-1/+1
2020-05-10Merge tag 'x86-urgent-2020-05-10' of ↵Linus Torvalds11-90/+138
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of fixes for x86: - Ensure that direct mapping alias is always flushed when changing page attributes. The optimization for small ranges failed to do so when the virtual address was in the vmalloc or module space. - Unbreak the trace event registration for syscalls without arguments caused by the refactoring of the SYSCALL_DEFINE0() macro. - Move the printk in the TSC deadline timer code to a place where it is guaranteed to only be called once during boot and cannot be rearmed by clearing warn_once after boot. If it's invoked post boot then lockdep rightfully complains about a potential deadlock as the calling context is different. - A series of fixes for objtool and the ORC unwinder addressing variety of small issues: - Stack offset tracking for indirect CFAs in objtool ignored subsequent pushs and pops - Repair the unwind hints in the register clearing entry ASM code - Make the unwinding in the low level exit to usermode code stop after switching to the trampoline stack. The unwind hint is no longer valid and the ORC unwinder emits a warning as it can't find the registers anymore. - Fix unwind hints in switch_to_asm() and rewind_stack_do_exit() which caused objtool to generate bogus ORC data. - Prevent unwinder warnings when dumping the stack of a non-current task as there is no way to be sure about the validity because the dumped stack can be a moving target. - Make the ORC unwinder behave the same way as the frame pointer unwinder when dumping an inactive tasks stack and do not skip the first frame. - Prevent ORC unwinding before ORC data has been initialized - Immediately terminate unwinding when a unknown ORC entry type is found. - Prevent premature stop of the unwinder caused by IRET frames. - Fix another infinite loop in objtool caused by a negative offset which was not catched. - Address a few build warnings in the ORC unwinder and add missing static/ro_after_init annotations" * tag 'x86-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/unwind/orc: Move ORC sorting variables under !CONFIG_MODULES x86/apic: Move TSC deadline timer debug printk ftrace/x86: Fix trace event registration for syscalls without arguments x86/mm/cpa: Flush direct map alias during cpa objtool: Fix infinite loop in for_offset_range() x86/unwind/orc: Fix premature unwind stoppage due to IRET frames x86/unwind/orc: Fix error path for bad ORC entry type x86/unwind/orc: Prevent unwinding before ORC initialization x86/unwind/orc: Don't skip the first frame for inactive tasks x86/unwind: Prevent false warnings for non-current tasks x86/unwind/orc: Convert global variables to static x86/entry/64: Fix unwind hints in rewind_stack_do_exit() x86/entry/64: Fix unwind hints in __switch_to_asm() x86/entry/64: Fix unwind hints in kernel exit path x86/entry/64: Fix unwind hints in register clearing code objtool: Fix stack offset tracking for indirect CFAs
2020-05-10Merge tag 'objtool-urgent-2020-05-10' of ↵Linus Torvalds1-2/+13
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fix from Thomas Gleixner: "A single fix for objtool to prevent an infinite loop in the jump table search which can be triggered when building the kernel with '-ffunction-sections'" * tag 'objtool-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix infinite loop in find_jump_table()
2020-05-10Merge tag 'locking-urgent-2020-05-10' of ↵Linus Torvalds1-2/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Thomas Gleixner: "A single fix for the fallout of the recent futex uacess rework. With those changes GCC9 fails to analyze arch_futex_atomic_op_inuser() correctly and emits a 'maybe unitialized' warning. While we usually ignore compiler stupidity the conditional store is pointless anyway because the correct case has to store. For the fault case the extra store does no harm" * tag 'locking-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ARM: futex: Address build warning
2020-05-10Merge tag 'iommu-fixes-v5.7-rc4' of ↵Linus Torvalds3-47/+162
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fixes from Joerg Roedel: - Race condition fixes for the AMD IOMMU driver. These are five patches fixing two race conditions around increase_address_space(). The first race condition was around the non-atomic update of the domain page-table root pointer and the variable containing the page-table depth (called mode). This is fixed now be merging page-table root and mode into one 64-bit field which is read/written atomically. The second race condition was around updating the page-table root pointer and making it public before the hardware caches were flushed. This could cause addresses to be mapped and returned to drivers which are not reachable by IOMMU hardware yet, causing IO page-faults. This is fixed too by adding the necessary flushes before a new page-table root is published. Related to the race condition fixes these patches also add a missing domain_flush_complete() barrier to update_domain() and a fix to bail out of the loop which tries to increase the address space when the call to increase_address_space() fails. Qian was able to trigger the race conditions under high load and memory pressure within a few days of testing. He confirmed that he has seen no issues anymore with the fixes included here. - Fix for a list-handling bug in the VirtIO IOMMU driver. * tag 'iommu-fixes-v5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/virtio: Reverse arguments to list_add iommu/amd: Do not flush Device Table in iommu_map_page() iommu/amd: Update Device Table in increase_address_space() iommu/amd: Call domain_flush_complete() in update_domain() iommu/amd: Do not loop forever when trying to increase address space iommu/amd: Fix race in increase_address_space()/fetch_pte()
2020-05-10Merge tag 'block-5.7-2020-05-09' of git://git.kernel.dk/linux-blockLinus Torvalds12-68/+107
Pull block fixes from Jens Axboe: - a small series fixing a use-after-free of bdi name (Christoph,Yufen) - NVMe fix for a regression with the smaller CQ update (Alexey) - NVMe fix for a hang at namespace scanning error recovery (Sagi) - fix race with blk-iocost iocg->abs_vdebt updates (Tejun) * tag 'block-5.7-2020-05-09' of git://git.kernel.dk/linux-block: nvme: fix possible hang when ns scanning fails during error recovery nvme-pci: fix "slimmer CQ head update" bdi: add a ->dev_name field to struct backing_dev_info bdi: use bdi_dev_name() to get device name bdi: move bdi_dev_name out of line vboxsf: don't use the source name in the bdi name iocost: protect iocg->abs_vdebt with iocg->waitq.lock
2020-05-10gcc-10: mark more functions __init to avoid section mismatch warningsLinus Torvalds2-2/+2
It seems that for whatever reason, gcc-10 ends up not inlining a couple of functions that used to be inlined before. Even if they only have one single callsite - it looks like gcc may have decided that the code was unlikely, and not worth inlining. The code generation difference is harmless, but caused a few new section mismatch errors, since the (now no longer inlined) function wasn't in the __init section, but called other init functions: Section mismatch in reference from the function kexec_free_initrd() to the function .init.text:free_initrd_mem() Section mismatch in reference from the function tpm2_calc_event_log_size() to the function .init.text:early_memremap() Section mismatch in reference from the function tpm2_calc_event_log_size() to the function .init.text:early_memunmap() So add the appropriate __init annotation to make modpost not complain. In both cases there were trivially just a single callsite from another __init function. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-10Merge tag 'riscv-for-linus-5.7-rc5' of ↵Linus Torvalds9-34/+121
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: "A smattering of fixes and cleanups: - Dead code removal. - Exporting riscv_cpuid_to_hartid_mask for modules. - Per-CPU tracking of ISA features. - Setting max_pfn correctly when probing memory. - Adding a note to the VDSO so glibc can check the kernel's version without a uname(). - A fix to force the bootloader to initialize the boot spin tables, which still get used as a fallback when SBI-0.1 is enabled" * tag 'riscv-for-linus-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: Remove unused code from STRICT_KERNEL_RWX riscv: force __cpu_up_ variables to put in data section riscv: add Linux note to vdso riscv: set max_pfn to the PFN of the last page RISC-V: Remove N-extension related defines RISC-V: Add bitmap reprensenting ISA features common across CPUs RISC-V: Export riscv_cpuid_to_hartid_mask() API
2020-05-10gcc-10: avoid shadowing standard library 'free()' in cryptoLinus Torvalds2-6/+6
gcc-10 has started warning about conflicting types for a few new built-in functions, particularly 'free()'. This results in warnings like: crypto/xts.c:325:13: warning: conflicting types for built-in function ‘free’; expected ‘void(void *)’ [-Wbuiltin-declaration-mismatch] because the crypto layer had its local freeing functions called 'free()'. Gcc-10 is in the wrong here, since that function is marked 'static', and thus there is no chance of confusion with any standard library function namespace. But the simplest thing to do is to just use a different name here, and avoid this gcc mis-feature. [ Side note: gcc knowing about 'free()' is in itself not the mis-feature: the semantics of 'free()' are special enough that a compiler can validly do special things when seeing it. So the mis-feature here is that gcc thinks that 'free()' is some restricted name, and you can't shadow it as a local static function. Making the special 'free()' semantics be a function attribute rather than tied to the name would be the much better model ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-10gcc-10: disable 'restrict' warning for nowLinus Torvalds1-0/+3
gcc-10 now warns about passing aliasing pointers to functions that take restricted pointers. That's actually a great warning, and if we ever start using 'restrict' in the kernel, it might be quite useful. But right now we don't, and it turns out that the only thing this warns about is an idiom where we have declared a few functions to be "printf-like" (which seems to make gcc pick up the restricted pointer thing), and then we print to the same buffer that we also use as an input. And people do that as an odd concatenation pattern, with code like this: #define sysfs_show_gen_prop(buffer, fmt, ...) \ snprintf(buffer, PAGE_SIZE, "%s"fmt, buffer, __VA_ARGS__) where we have 'buffer' as both the destination of the final result, and as the initial argument. Yes, it's a bit questionable. And outside of the kernel, people do have standard declarations like int snprintf( char *restrict buffer, size_t bufsz, const char *restrict format, ... ); where that output buffer is marked as a restrict pointer that cannot alias with any other arguments. But in the context of the kernel, that 'use snprintf() to concatenate to the end result' does work, and the pattern shows up in multiple places. And we have not marked our own version of snprintf() as taking restrict pointers, so the warning is incorrect for now, and gcc picks it up on its own. If we do start using 'restrict' in the kernel (and it might be a good idea if people find places where it matters), we'll need to figure out how to avoid this issue for snprintf and friends. But in the meantime, this warning is not useful. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-10gcc-10: disable 'stringop-overflow' warning for nowLinus Torvalds1-0/+1
This is the final array bounds warning removal for gcc-10 for now. Again, the warning is good, and we should re-enable all these warnings when we have converted all the legacy array declaration cases to flexible arrays. But in the meantime, it's just noise. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-10nvme: fix possible hang when ns scanning fails during error recoverySagi Grimberg1-1/+1
When the controller is reconnecting, the host fails I/O and admin commands as the host cannot reach the controller. ns scanning may revalidate namespaces during that period and it is wrong to remove namespaces due to these failures as we may hang (see 205da2434301). One command that may fail is nvme_identify_ns_descs. Since we return success due to having ns identify descriptor list optional, we continue to compare ns identifiers in nvme_revalidate_disk, obviously fail and return -ENODEV to nvme_validate_ns, which will remove the namespace. Exactly what we don't want to happen. Fixes: 22802bf742c2 ("nvme: Namepace identification descriptor list is optional") Tested-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-10nvme-pci: fix "slimmer CQ head update"Alexey Dobriyan1-1/+5
Pre-incrementing ->cq_head can't be done in memory because OOB value can be observed by another context. This devalues space savings compared to original code :-\ $ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-32 (-32) Function old new delta nvme_poll_irqdisable 464 456 -8 nvme_poll 455 447 -8 nvme_irq 388 380 -8 nvme_dev_disable 955 947 -8 But the code is minimal now: one read for head, one read for q_depth, one increment, one comparison, single instruction phase bit update and one write for new head. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reported-by: John Garry <john.garry@huawei.com> Tested-by: John Garry <john.garry@huawei.com> Fixes: e2a366a4b0feaeb ("nvme-pci: slimmer CQ head update") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-10bdi: add a ->dev_name field to struct backing_dev_infoChristoph Hellwig2-2/+4
Cache a copy of the name for the life time of the backing_dev_info structure so that we can reference it even after unregistering. Fixes: 68f23b89067f ("memcg: fix a crash in wb_workfn when a device disappears") Reported-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-10bdi: use bdi_dev_name() to get device nameYufen Yu4-8/+10
Use the common interface bdi_dev_name() to get device name. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Add missing <linux/backing-dev.h> include BFQ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-10gcc-10: disable 'array-bounds' warning for nowLinus Torvalds1-0/+1
This is another fine warning, related to the 'zero-length-bounds' one, but hitting the same historical code in the kernel. Because C didn't historically support flexible array members, we have code that instead uses a one-sized array, the same way we have cases of zero-sized arrays. The one-sized arrays come from either not wanting to use the gcc zero-sized array extension, or from a slight convenience-feature, where particularly for strings, the size of the structure now includes the allocation for the final NUL character. So with a "char name[1];" at the end of a structure, you can do things like v = my_malloc(sizeof(struct vendor) + strlen(name)); and avoid the "+1" for the terminator. Yes, the modern way to do that is with a flexible array, and using 'offsetof()' instead of 'sizeof()', and adding the "+1" by hand. That also technically gets the size "more correct" in that it avoids any alignment (and thus padding) issues, but this is another long-term cleanup thing that will not happen for 5.7. So disable the warning for now, even though it's potentially quite useful. Having a slew of warnings that then hide more urgent new issues is not an improvement. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-10gcc-10: disable 'zero-length-bounds' warning for nowLinus Torvalds1-0/+3
This is a fine warning, but we still have a number of zero-length arrays in the kernel that come from the traditional gcc extension. Yes, they are getting converted to flexible arrays, but in the meantime the gcc-10 warning about zero-length bounds is very verbose, and is hiding other issues. I missed one actual build failure because it was hidden among hundreds of lines of warning. Thankfully I caught it on the second go before pushing things out, but it convinced me that I really need to disable the new warnings for now. We'll hopefully be all done with our conversion to flexible arrays in the not too distant future, and we can then re-enable this warning. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09Stop the ad-hoc games with -Wno-maybe-initializedLinus Torvalds3-23/+3
We have some rather random rules about when we accept the "maybe-initialized" warnings, and when we don't. For example, we consider it unreliable for gcc versions < 4.9, but also if -O3 is enabled, or if optimizing for size. And then various kernel config options disabled it, because they know that they trigger that warning by confusing gcc sufficiently (ie PROFILE_ALL_BRANCHES). And now gcc-10 seems to be introducing a lot of those warnings too, so it falls under the same heading as 4.9 did. At the same time, we have a very straightforward way to _enable_ that warning when wanted: use "W=2" to enable more warnings. So stop playing these ad-hoc games, and just disable that warning by default, with the known and straight-forward "if you want to work on the extra compiler warnings, use W=123". Would it be great to have code that is always so obvious that it never confuses the compiler whether a variable is used initialized or not? Yes, it would. In a perfect world, the compilers would be smarter, and our source code would be simpler. That's currently not the world we live in, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09Merge tag 'io_uring-5.7-2020-05-08' of git://git.kernel.dk/linux-blockLinus Torvalds2-70/+40
Pull io_uring fixes from Jens Axboe: - Fix finish_wait() balancing in file cancelation (Xiaoguang) - Ensure early cleanup of resources in ring map failure (Xiaoguang) - Ensure IORING_OP_SLICE does the right file mode checks (Pavel) - Remove file opening from openat/openat2/statx, it's not needed and messes with O_PATH * tag 'io_uring-5.7-2020-05-08' of git://git.kernel.dk/linux-block: io_uring: don't use 'fd' for openat/openat2/statx splice: move f_mode checks to do_{splice,tee}() io_uring: handle -EFAULT properly in io_uring_setup() io_uring: fix mismatched finish_wait() calls in io_uring_cancel_files()
2020-05-08Merge tag 'scsi-fixes' of ↵Linus Torvalds4-6/+7
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Four minor fixes, all in drivers (qla2xxx, ibmvfc, ibmvscsi)" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ibmvscsi: Fix WARN_ON during event pool release scsi: ibmvfc: Don't send implicit logouts prior to NPIV login scsi: qla2xxx: Delete all sessions before unregister local nvme port scsi: qla2xxx: Fix hang when issuing nvme disconnect-all in NPIV
2020-05-08Merge tag 'ceph-for-5.7-rc5' of git://github.com/ceph/ceph-clientLinus Torvalds4-14/+7
Pull ceph fixes from Ilya Dryomov: "Fixes for an endianness handling bug that prevented mounts on big-endian arches, a spammy log message and a couple error paths. Also included a MAINTAINERS update" * tag 'ceph-for-5.7-rc5' of git://github.com/ceph/ceph-client: ceph: demote quotarealm lookup warning to a debug message MAINTAINERS: remove myself as ceph co-maintainer ceph: fix double unlock in handle_cap_export() ceph: fix special error code in ceph_try_get_caps() ceph: fix endianness bug when handling MDS session feature bits
2020-05-08ceph: demote quotarealm lookup warning to a debug messageLuis Henriques1-2/+2
A misconfigured cephx can easily result in having the kernel client flooding the logs with: ceph: Can't lookup inode 1 (err: -13) Change this message to debug level. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/44546 Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-05-08Merge tag 'char-misc-5.7-rc5' of ↵Linus Torvalds14-51/+77
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small driver fixes for 5.7-rc5 that resolve a number of minor reported issues: - mhi bus driver fixes found as people actually use the code - phy driver fixes and compat string additions - most driver fix due to link order changing when the core moved out of staging - mei driver fix - interconnect build warning fix All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: bus: mhi: core: Fix channel device name conflict bus: mhi: core: Fix typo in comment bus: mhi: core: Offload register accesses to the controller bus: mhi: core: Remove link_status() callback bus: mhi: core: Make sure to powerdown if mhi_sync_power_up fails bus: mhi: Fix parsing of mhi_flags mei: me: disable mei interface on LBG servers. phy: qualcomm: usb-hs-28nm: Prepare clocks in init MAINTAINERS: Add Vinod Koul as Generic PHY co-maintainer interconnect: qcom: Move the static keyword to the front of declaration most: core: use function subsys_initcall() bus: mhi: core: Fix a NULL vs IS_ERR check in mhi_create_devices() phy: qcom-qusb2: Re add "qcom,sdm845-qusb2-phy" compat string phy: tegra: Select USB_COMMON for usb_get_maximum_speed()
2020-05-08Merge tag 'driver-core-5.7-rc5' of ↵Linus Torvalds10-30/+48
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are a number of small driver core fixes for 5.7-rc5 to resolve a bunch of reported issues with the current tree. Biggest here are the reverts and patches from John Stultz to resolve a bunch of deferred probe regressions we have been seeing in 5.7-rc right now. Along with those are some other smaller fixes: - coredump crash fix - devlink fix for when permissive mode was enabled - amba and platform device dma_parms fixes - component error silenced for when deferred probe happens All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: regulator: Revert "Use driver_deferred_probe_timeout for regulator_init_complete_work" driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires driver core: Use dev_warn() instead of dev_WARN() for deferred_probe_timeout warnings driver core: Revert default driver_deferred_probe_timeout value to 0 component: Silence bind error on -EPROBE_DEFER driver core: Fix handling of fw_devlink=permissive coredump: fix crash when umh is disabled amba: Initialize dma_parms for amba devices driver core: platform: Initialize dma_parms for platform devices
2020-05-08Merge tag 'staging-5.7-rc5' of ↵Linus Torvalds3-6/+4
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver fixes from Greg KH: "Here are three small driver fixes for 5.7-rc5. Two of these are documentation fixes: - MAINTAINERS update due to removed driver - removing Wolfram from the ks7010 driver TODO file The other patch is a real fix: - fix gasket driver to proper check the return value of a call All of these have been in linux-next for a while with no reported issues" * tag 'staging-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: gasket: Check the return value of gasket_get_bar_index() staging: ks7010: remove me from CC list MAINTAINERS: remove entry after hp100 driver removal
2020-05-08Merge tag 'tty-5.7-rc5' of ↵Linus Torvalds3-5/+9
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial fixes from Greg KH: "Here are three small TTY/Serial/VT fixes for 5.7-rc5: - revert for the bcm63xx driver "fix" that was incorrect - vt unicode console bugfix - xilinx_uartps console driver fix All of these have been in linux next with no reported issues" * tag 'tty-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: tty: xilinx_uartps: Fix missing id assignment to the console vt: fix unicode console freeing with a common interface Revert "tty: serial: bcm63xx: fix missing clk_put() in bcm63xx_uart"
2020-05-08Merge tag 'usb-5.7-rc5' of ↵Linus Torvalds8-10/+24
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small USB fixes for 5.7-rc5 to resolve some reported issues: - syzbot found problems fixed - usbfs dma mapping fix - typec bugfixs - chipidea bugfix - usb4/thunderbolt fix - new device ids/quirks All of these have been in linux-next for a while with no reported issues" * tag 'usb-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: chipidea: msm: Ensure proper controller reset using role switch API usb: typec: mux: intel: Handle alt mode HPD_HIGH usb: usbfs: correct kernel->user page attribute mismatch usb: typec: intel_pmc_mux: Fix the property names USB: core: Fix misleading driver bug report USB: serial: qcserial: Add DW5816e support USB: uas: add quirk for LaCie 2Big Quadra thunderbolt: Check return value of tb_sw_read() in usb4_switch_op() USB: serial: garmin_gps: add sanity checking for data length
2020-05-08Merge tag 'drm-fixes-2020-05-08' of git://anongit.freedesktop.org/drm/drmLinus Torvalds12-31/+57
Pull drm fixes from Dave Airlie: "Another pretty normal week. I didn't get any i915 fixes yet, so next week I'd expect double the usual i915, but otherwise a bunch of amdgpu and some scattered other fixes. hdcp: - fix HDCP regression amdgpu: - Runtime PM fixes - DC fix for PPC - Misc DC fixes virtio: - fix context ordering issue sun4i: - old gcc warning fix ingenic-drm: - missing module support" * tag 'drm-fixes-2020-05-08' of git://anongit.freedesktop.org/drm/drm: drm/amd/display: Prevent dpcd reads with passive dongles drm/amd/display: fix counter in wait_for_no_pipes_pending drm/amd/display: Update DCN2.1 DV Code Revision drm: Fix HDCP failures when SRM fw is missing sun6i: dsi: fix gcc-4.8 drm: ingenic-drm: add MODULE_DEVICE_TABLE drm/virtio: create context before RESOURCE_CREATE_2D in 3D mode drm/amd/display: work around fp code being emitted outside of DC_FP_START/END drm/amdgpu/dc: Use WARN_ON_ONCE for ASSERT drm/amdgpu: drop redundant cg/pg ungate on runpm enter drm/amdgpu: move kfd suspend after ip_suspend_phase1
2020-05-08Merge branch 'akpm' (patches from Andrew)Linus Torvalds14-78/+275
Merge misc fixes from Andrew Morton: "14 fixes and one selftest to verify the ipc fixes herein" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: limit boost_watermark on small zones ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST mm/vmscan: remove unnecessary argument description of isolate_lru_pages() epoll: atomically remove wait entry on wake up kselftests: introduce new epoll60 testcase for catching lost wakeups percpu: make pcpu_alloc() aware of current gfp context mm/slub: fix incorrect interpretation of s->offset scripts/gdb: repair rb_first() and rb_last() eventpoll: fix missing wakeup for ovflist in ep_poll_callback arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory() scripts/decodecode: fix trapping instruction formatting kernel/kcov.c: fix typos in kcov_remote_start documentation mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() mm, memcg: fix error return value of mem_cgroup_css_alloc() ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
2020-05-08iommu/virtio: Reverse arguments to list_addJulia Lawall1-1/+1
Elsewhere in the file, there is a list_for_each_entry with &vdev->resv_regions as the second argument, suggesting that &vdev->resv_regions is the list head. So exchange the arguments on the list_add call to put the list head in the second argument. Fixes: 2a5a31487445 ("iommu/virtio: Add probe request") Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/1588704467-13431-1-git-send-email-Julia.Lawall@inria.fr Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-08KVM: SVM: Disable AVIC before setting V_IRQSuravee Suthikulpanit1-1/+12
The commit 64b5bd270426 ("KVM: nSVM: ignore L1 interrupt window while running L2 with V_INTR_MASKING=1") introduced a WARN_ON, which checks if AVIC is enabled when trying to set V_IRQ in the VMCB for enabling irq window. The following warning is triggered because the requesting vcpu (to deactivate AVIC) does not get to process APICv update request for itself until the next #vmexit. WARNING: CPU: 0 PID: 118232 at arch/x86/kvm/svm/svm.c:1372 enable_irq_window+0x6a/0xa0 [kvm_amd] RIP: 0010:enable_irq_window+0x6a/0xa0 [kvm_amd] Call Trace: kvm_arch_vcpu_ioctl_run+0x6e3/0x1b50 [kvm] ? kvm_vm_ioctl_irq_line+0x27/0x40 [kvm] ? _copy_to_user+0x26/0x30 ? kvm_vm_ioctl+0xb3e/0xd90 [kvm] ? set_next_entity+0x78/0xc0 kvm_vcpu_ioctl+0x236/0x610 [kvm] ksys_ioctl+0x8a/0xc0 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x58/0x210 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes by sending APICV update request to all other vcpus, and immediately update APIC for itself. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lkml.org/lkml/2020/5/2/167 Fixes: 64b5bd270426 ("KVM: nSVM: ignore L1 interrupt window while running L2 with V_INTR_MASKING=1") Message-Id: <1588818939-54264-1-git-send-email-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08KVM: Introduce kvm_make_all_cpus_request_except()Suravee Suthikulpanit4-5/+16
This allows making request to all other vcpus except the one specified in the parameter. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <1588771076-73790-2-git-send-email-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08KVM: VMX: pass correct DR6 for GD userspace exitPaolo Bonzini2-2/+24
When KVM_EXIT_DEBUG is raised for the disabled-breakpoints case (DR7.GD), DR6 was incorrectly copied from the value in the VM. Instead, DR6.BD should be set in order to catch this case. On AMD this does not need any special code because the processor triggers a #DB exception that is intercepted. However, the testcase would fail without the previous patch because both DR6.BS and DR6.BD would be set. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6Paolo Bonzini5-30/+32
There are two issues with KVM_EXIT_DEBUG on AMD, whose root cause is the different handling of DR6 on intercepted #DB exceptions on Intel and AMD. On Intel, #DB exceptions transmit the DR6 value via the exit qualification field of the VMCS, and the exit qualification only contains the description of the precise event that caused a vmexit. On AMD, instead the DR6 field of the VMCB is filled in as if the #DB exception was to be injected into the guest. This has two effects when guest debugging is in use: * the guest DR6 is clobbered * the kvm_run->debug.arch.dr6 field can accumulate more debug events, rather than just the last one that happened (the testcase in the next patch covers this issue). This patch fixes both issues by emulating, so to speak, the Intel behavior on AMD processors. The important observation is that (after the previous patches) the VMCB value of DR6 is only ever observable from the guest is KVM_DEBUGREG_WONT_EXIT is set. Therefore we can actually set vmcb->save.dr6 to any value we want as long as KVM_DEBUGREG_WONT_EXIT is clear, which it will be if guest debugging is enabled. Therefore it is possible to enter the guest with an all-zero DR6, reconstruct the #DB payload from the DR6 we get at exit time, and let kvm_deliver_exception_payload move the newly set bits into vcpu->arch.dr6. Some extra bits may be included in the payload if KVM_DEBUGREG_WONT_EXIT is set, but this is harmless. This may not be the most optimized way to deal with this, but it is simple and, being confined within SVM code, it gets rid of the set_dr6 callback and kvm_update_dr6. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6Paolo Bonzini5-28/+21
kvm_x86_ops.set_dr6 is only ever called with vcpu->arch.dr6 as the second argument. Ensure that the VMCB value is synchronized to vcpu->arch.dr6 on #DB (both "normal" and nested) and nested vmentry, so that the current value of DR6 is always available in vcpu->arch.dr6. The get_dr6 callback can just access vcpu->arch.dr6 and becomes redundant. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08Merge tag 'drm-misc-fixes-2020-05-07' of ↵Dave Airlie6-4/+14
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes A few minor fixes for an ordering issue in virtio, an (old) gcc warning in sun4i, a probe issue in ingenic-drm and a regression in the HDCP support. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20200507160130.id64niqgf5wsha4u@gilmour.lan
2020-05-08Merge tag 'amd-drm-fixes-5.7-2020-05-06' of ↵Dave Airlie6-27/+43
git://people.freedesktop.org/~agd5f/linux into drm-fixes amd-drm-fixes-5.7-2020-05-06: amdgpu: - Runtime PM fixes - DC fix for PPC - Misc DC fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200506212257.3893-1-alexander.deucher@amd.com
2020-05-08Merge branch 'for-v5.7' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem fix from James Morris: "Fix the default value of fs_context_parse_param hook" * 'for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: security: Fix the default value of fs_context_parse_param hook
2020-05-08mm: limit boost_watermark on small zonesHenry Willard1-0/+8
Commit 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs") adds a boost_watermark() function which increases the min watermark in a zone by at least pageblock_nr_pages or the number of pages in a page block. On Arm64, with 64K pages and 512M huge pages, this is 8192 pages or 512M. It does this regardless of the number of managed pages managed in the zone or the likelihood of success. This can put the zone immediately under water in terms of allocating pages from the zone, and can cause a small machine to fail immediately due to OoM. Unlike set_recommended_min_free_kbytes(), which substantially increases min_free_kbytes and is tied to THP, boost_watermark() can be called even if THP is not active. The problem is most likely to appear on architectures such as Arm64 where pageblock_nr_pages is very large. It is desirable to run the kdump capture kernel in as small a space as possible to avoid wasting memory. In some architectures, such as Arm64, there are restrictions on where the capture kernel can run, and therefore, the space available. A capture kernel running in 768M can fail due to OoM immediately after boost_watermark() sets the min in zone DMA32, where most of the memory is, to 512M. It fails even though there is over 500M of free memory. With boost_watermark() suppressed, the capture kernel can run successfully in 448M. This patch limits boost_watermark() to boosting a zone's min watermark only when there are enough pages that the boost will produce positive results. In this case that is estimated to be four times as many pages as pageblock_nr_pages. Mel said: : There is no harm in marking it stable. Clearly it does not happen very : often but it's not impossible. 32-bit x86 is a lot less common now : which would previously have been vulnerable to triggering this easily. : ppc64 has a larger base page size but typically only has one zone. : arm64 is likely the most vulnerable, particularly when CMA is : configured with a small movable zone. Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs") Signed-off-by: Henry Willard <henry.willard@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1588294148-6586-1-git-send-email-henry.willard@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>