summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-11-25s390/vdso: filter out -mstack-guard and -mstack-sizeSven Schnelle2-6/+9
commit 00b55eaf45549ce26424224d069a091c7e5d8bac upstream. When CONFIG_VMAP_STACK is disabled, the user can enable CONFIG_STACK_CHECK, which adds a stack overflow check to each C function in the kernel. This is also done for functions in the vdso page. These functions are run in user context and user stack sizes are usually different to what the kernel uses. This might trigger the stack check although the stack size is valid. Therefore filter the -mstack-guard and -mstack-size flags when compiling vdso C files. Cc: stable@kernel.org # 5.10+ Fixes: 4bff8cb54502 ("s390: convert to GENERIC_VDSO") Reported-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25s390/boot: simplify and fix kernel memory layout setupVasily Gorbik2-58/+32
commit 9a39abb7c9aab50eec4ac4421e9ee7f3de013d24 upstream. Initial KASAN shadow memory range was picked to preserve original kernel modules area position. With protected execution support, which might impose addressing limitation on vmalloc area and hence affect modules area position, current fixed KASAN shadow memory range is only making kernel memory layout setup more complex. So move it to the very end of available virtual space and simplify calculations. At the same time return to previous kernel address space split. In particular commit 0c4f2623b957 ("s390: setup kernel memory layout early") introduced precise identity map size calculation and keeping vmemmap left most starting from a fresh region table entry. This didn't take into account additional mapping region requirement for potential DCSS mapping above available physical memory. So go back to virtual space split between 1:1 mapping & vmemmap array once vmalloc area size is subtracted. Cc: stable@vger.kernel.org Fixes: 0c4f2623b957 ("s390: setup kernel memory layout early") Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25s390/setup: avoid reserving memory above identity mappingVasily Gorbik1-9/+1
commit 420f48f636b98fd685f44a3acc4c0a7c0840910d upstream. Such reserved memory region, if not cleaned up later causes problems when memblock_free_all() is called to release free pages to the buddy allocator and those reserved regions are carried over to reserve_bootmem_region() which marks the pages as PageReserved. Instead use memblock_set_current_limit() to make sure memblock allocations do not go over identity mapping (which could happen when "mem=" option is used or during kdump). Cc: stable@vger.kernel.org Fixes: 73045a08cf55 ("s390: unify identity mapping limits handling") Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25pinctrl: ralink: include 'ralink_regs.h' in 'pinctrl-mt7620.c'Sergio Paracuellos1-0/+1
commit a5b9703fe11cd1d6d7a60102aa2abe686dc1867f upstream. mt7620.h, included by pinctrl-mt7620.c, mentions MT762X_SOC_MT7628AN declared in ralink_regs.h. Fixes: 745ec436de72 ("pinctrl: ralink: move MT7620 SoC pinmux config into a new 'pinctrl-mt7620.c' file") Cc: stable@vger.kernel.org Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com> Link: https://lore.kernel.org/r/20211031064046.13533-1-sergio.paracuellos@gmail.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25scsi: qla2xxx: Fix mailbox direction flags in qla2xxx_get_adapter_id()Ewan D. Milne1-4/+2
commit 392006871bb26166bcfafa56faf49431c2cfaaa8 upstream. The SCM changes set the flags in mcp->out_mb instead of mcp->in_mb so the data was not actually being read into the mcp->mb[] array from the adapter. Link: https://lore.kernel.org/r/20211108183012.13895-1-emilne@redhat.com Fixes: 9f2475fe7406 ("scsi: qla2xxx: SAN congestion management implementation") Cc: stable@vger.kernel.org Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25ata: libata: add missing ata_identify_page_supported() callsDamien Le Moal1-1/+5
commit 06f6c4c6c3e8354dceddd77bd58f9a7a84c67246 upstream. ata_dev_config_ncq_prio() and ata_dev_config_devslp() both access pages of the IDENTIFY DEVICE data log. Before calling ata_read_log_page(), make sure to check for the existence of the IDENTIFY DEVICE data log and of the log page accessed using ata_identify_page_supported(). This avoids useless error messages from ata_read_log_page() and failures with some LLDD scsi drivers using libsas. Reported-by: Nikolay <knv418@gmail.com> Cc: stable@kernel.org # 5.15 Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Matthew Perkowski <mgperkow@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25ata: libata: improve ata_read_log_page() error messageDamien Le Moal1-2/+3
commit 23ef63d5e14f916c5bba39128ebef395859d7c0f upstream. If ata_read_log_page() fails to read a log page, the ata_dev_err() error message only print the page number, omitting the log number. In case of error, facilitate debugging by also printing the log number. Cc: stable@kernel.org # 5.15 Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Matthew Perkowski <mgperkow@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25Revert "parisc: Reduce sigreturn trampoline to 3 instructions"Helge Deller3-8/+9
commit 79df39d535c7a3770856fe9f5aba8c0ad1eebdb6 upstream. This reverts commit e4f2006f1287e7ea17660490569cff323772dac4. This patch shows problems with signal handling. Revert it for now. Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v5.15 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25Revert "drm/i915/tgl/dsi: Gate the ddi clocks after pll mapping"Vandita Kulkarni1-8/+2
commit f15863b27752682bb700c21de5f83f613a0fb77e upstream. This reverts commit 991d9557b0c4 ("drm/i915/tgl/dsi: Gate the ddi clocks after pll mapping"). The Bspec was updated recently with the pll ungate sequence similar to that of icl dsi enable sequence. Hence reverting. Bspec: 49187 Fixes: 991d9557b0c4 ("drm/i915/tgl/dsi: Gate the ddi clocks after pll mapping") Cc: <stable@vger.kernel.org> # v5.4+ Signed-off-by: Vandita Kulkarni <vandita.kulkarni@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211109120428.15211-1-vandita.kulkarni@intel.com (cherry picked from commit 4579509ef181480f4e4510d436c691519167c5c2) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25powerpc/8xx: Fix pinned TLBs with CONFIG_STRICT_KERNEL_RWXChristophe Leroy1-6/+7
commit 1e35eba4055149c578baf0318d2f2f89ea3c44a0 upstream. As spotted and explained in commit c12ab8dbc492 ("powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST"), the selection of STRICT_KERNEL_RWX without selecting DEBUG_RODATA_TEST has spotted the lack of the DIRTY bit in the pinned kernel data TLBs. This problem should have been detected a lot earlier if things had been working as expected. But due to an incredible level of chance or mishap, this went undetected because of a set of bugs: In fact the DTLBs were not pinned, because instead of setting the reserve bit in MD_CTR, it was set in MI_CTR that is the register for ITLBs. But then, another huge bug was there: the physical address was reset to 0 at the boundary between RO and RW areas, leading to the same physical space being mapped at both 0xc0000000 and 0xc8000000. This had by miracle no consequence until now because the entry was not really pinned so it was overwritten soon enough to go undetected. Of course, now that we really pin the DTLBs, it must be fixed as well. Fixes: f76c8f6d257c ("powerpc/8xx: Add function to set pinned TLBs") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Depends-on: c12ab8dbc492 ("powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a21e9a057fe2d247a535aff0d157a54eefee017a.1636963688.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25powerpc/xive: Change IRQ domain to a tree domainCédric Le Goater2-3/+1
commit 8e80a73fa9a7747e3e8255cb149c543aabf65a24 upstream. Commit 4f86a06e2d6e ("irqdomain: Make normal and nomap irqdomains exclusive") introduced an IRQ_DOMAIN_FLAG_NO_MAP flag to isolate the 'nomap' domains still in use under the powerpc arch. With this new flag, the revmap_tree of the IRQ domain is not used anymore. This change broke the support of shared LSIs [1] in the XIVE driver because it was relying on a lookup in the revmap_tree to query previously mapped interrupts. Linux now creates two distinct IRQ mappings on the same HW IRQ which can lead to unexpected behavior in the drivers. The XIVE IRQ domain is not a direct mapping domain and its HW IRQ interrupt number space is rather large : 1M/socket on POWER9 and POWER10, change the XIVE driver to use a 'tree' domain type instead. [1] For instance, a linux KVM guest with virtio-rng and virtio-balloon devices. Fixes: 4f86a06e2d6e ("irqdomain: Make normal and nomap irqdomains exclusive") Cc: stable@vger.kernel.org # v5.14+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Tested-by: Greg Kurz <groug@kaod.org> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211116134022.420412-1-clg@kaod.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25powerpc/signal32: Fix sigset_t copyChristophe Leroy1-2/+8
commit 5499802b2284331788a440585869590f1bd63f7f upstream. The conversion from __copy_from_user() to __get_user() by commit d3ccc9781560 ("powerpc/signal: Use __get_user() to copy sigset_t") introduced a regression in __get_user_sigset() for powerpc/32. The bug was subsequently moved into unsafe_get_user_sigset(). The bug is due to the copied 64 bit value being truncated to 32 bits while being assigned to dst->sig[0] The regression was reported by users of the Xorg packages distributed in Debian/powerpc -- "The symptoms are that the fb screen goes blank, with the backlight remaining on and no errors logged in /var/log; wdm (or startx) run with no effect (I tried logging in in the blind, with no effect). And they are hard to kill, requiring 'kill -KILL ...'" Fix the regression by copying each word of the sigset, not only the first one. __get_user_sigset() was tentatively optimised to copy 64 bits at once in order to minimise KUAP unlock/lock impact, but the unsafe variant doesn't suffer that, so it can just copy words. Fixes: 887f3ceb51cd ("powerpc/signal32: Convert do_setcontext[_tm]() to user access block") Cc: stable@vger.kernel.org # v5.13+ Reported-by: Finn Thain <fthain@linux-m68k.org> Reported-and-tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/99ef38d61c0eb3f79c68942deb0c35995a93a777.1636966353.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFODavid Woodhouse1-1/+1
commit 4e8436479ad3be76a3823e6ce466ae464ce71300 upstream. In commit 319afe68567b ("KVM: xen: do not use struct gfn_to_hva_cache") we stopped storing this in-kernel as a GPA, and started storing it as a GFN. Which means we probably should have stopped calling gpa_to_gfn() on it when userspace asks for it back. Cc: stable@vger.kernel.org Fixes: 319afe68567b ("KVM: xen: do not use struct gfn_to_hva_cache") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211115165030.7422-2-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25KVM: x86/mmu: include EFER.LMA in extended mmu roleMaxim Levitsky2-0/+2
commit b8453cdcf26020030da182f0156d7bf59ae5719f upstream. Incorporate EFER.LMA into kvm_mmu_extended_role, as it used to compute the guest root level and is not reflected in kvm_mmu_page_role.level when TDP is in use. When simply running the guest, it is impossible for EFER.LMA and kvm_mmu.root_level to get out of sync, as the guest cannot transition from PAE paging to 64-bit paging without toggling CR0.PG, i.e. without first bouncing through a different MMU context. And stuffing guest state via KVM_SET_SREGS{,2} also ensures a full MMU context reset. However, if KVM_SET_SREGS{,2} is followed by KVM_SET_NESTED_STATE, e.g. to set guest state when migrating the VM while L2 is active, the vCPU state will reflect L2, not L1. If L1 is using TDP for L2, then root_mmu will have been configured using L2's state, despite not being used for L2. If L2.EFER.LMA != L1.EFER.LMA, and L2 is using PAE paging, then root_mmu will be configured for guest PAE paging, but will match the mmu_role for 64-bit paging and cause KVM to not reconfigure root_mmu on the next nested VM-Exit. Alternatively, the root_mmu's role could be invalidated after a successful KVM_SET_NESTED_STATE that yields vcpu->arch.mmu != vcpu->arch.root_mmu, i.e. that switches the active mmu to guest_mmu, but doing so is unnecessarily tricky, and not even needed if L1 and L2 do have the same role (e.g., they are both 64-bit guests and run with the same CR4). Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211115131837.195527-3-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25KVM: x86: Fix uninitialized eoi_exit_bitmap usage in vcpu_load_eoi_exitmap()黄乐1-2/+6
commit c5adbb3af051079f35abfa26551107e2c653087f upstream. In vcpu_load_eoi_exitmap(), currently the eoi_exit_bitmap[4] array is initialized only when Hyper-V context is available, in other path it is just passed to kvm_x86_ops.load_eoi_exitmap() directly from on the stack, which would cause unexpected interrupt delivery/handling issues, e.g. an *old* linux kernel that relies on PIT to do clock calibration on KVM might randomly fail to boot. Fix it by passing ioapic_handled_vectors to load_eoi_exitmap() when Hyper-V context is not available. Fixes: f2bc14b69c38 ("KVM: x86: hyper-v: Prepare to meet unallocated Hyper-V context") Cc: stable@vger.kernel.org Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Huang Le <huangle1@jd.com> Message-Id: <62115b277dab49ea97da5633f8522daf@jd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25KVM: x86: Assume a 64-bit hypercall for guests with protected stateTom Lendacky4-4/+16
commit b5aead0064f33ae5e693a364e3204fe1c0ac9af2 upstream. When processing a hypercall for a guest with protected state, currently SEV-ES guests, the guest CS segment register can't be checked to determine if the guest is in 64-bit mode. For an SEV-ES guest, it is expected that communication between the guest and the hypervisor is performed to shared memory using the GHCB. In order to use the GHCB, the guest must have been in long mode, otherwise writes by the guest to the GHCB would be encrypted and not be able to be comprehended by the hypervisor. Create a new helper function, is_64_bit_hypercall(), that assumes the guest is in 64-bit mode when the guest has protected state, and returns true, otherwise invoking is_64_bit_mode() to determine the mode. Update the hypercall related routines to use is_64_bit_hypercall() instead of is_64_bit_mode(). Add a WARN_ON_ONCE() to is_64_bit_mode() to catch occurences of calls to this helper function for a guest running with protected state. Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES") Reported-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <e0b20c770c9d0d1403f23d83e785385104211f74.1621878537.git.thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25x86/hyperv: Fix NULL deref in set_hv_tscchange_cb() if Hyper-V setup failsSean Christopherson1-0/+3
commit daf972118c517b91f74ff1731417feb4270625a4 upstream. Check for a valid hv_vp_index array prior to derefencing hv_vp_index when setting Hyper-V's TSC change callback. If Hyper-V setup failed in hyperv_init(), the kernel will still report that it's running under Hyper-V, but will have silently disabled nearly all functionality. BUG: kernel NULL pointer dereference, address: 0000000000000010 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.15.0-rc2+ #75 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:set_hv_tscchange_cb+0x15/0xa0 Code: <8b> 04 82 8b 15 12 17 85 01 48 c1 e0 20 48 0d ee 00 01 00 f6 c6 08 ... Call Trace: kvm_arch_init+0x17c/0x280 kvm_init+0x31/0x330 vmx_init+0xba/0x13a do_one_initcall+0x41/0x1c0 kernel_init_freeable+0x1f2/0x23b kernel_init+0x16/0x120 ret_from_fork+0x22/0x30 Fixes: 93286261de1b ("x86/hyperv: Reenlightenment notifications support") Cc: stable@vger.kernel.org Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20211104182239.1302956-2-seanjc@google.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25x86/sgx: Fix free page accountingReinette Chatre1-6/+6
commit ac5d272a0ad0419f52e08c91953356e32b075af7 upstream. The SGX driver maintains a single global free page counter, sgx_nr_free_pages, that reflects the number of free pages available across all NUMA nodes. Correspondingly, a list of free pages is associated with each NUMA node and sgx_nr_free_pages is updated every time a page is added or removed from any of the free page lists. The main usage of sgx_nr_free_pages is by the reclaimer that runs when it (sgx_nr_free_pages) goes below a watermark to ensure that there are always some free pages available to, for example, support efficient page faults. With sgx_nr_free_pages accessed and modified from a few places it is essential to ensure that these accesses are done safely but this is not the case. sgx_nr_free_pages is read without any protection and updated with inconsistent protection by any one of the spin locks associated with the individual NUMA nodes. For example: CPU_A CPU_B ----- ----- spin_lock(&nodeA->lock); spin_lock(&nodeB->lock); ... ... sgx_nr_free_pages--; /* NOT SAFE */ sgx_nr_free_pages--; spin_unlock(&nodeA->lock); spin_unlock(&nodeB->lock); Since sgx_nr_free_pages may be protected by different spin locks while being modified from different CPUs, the following scenario is possible: CPU_A CPU_B ----- ----- {sgx_nr_free_pages = 100} spin_lock(&nodeA->lock); spin_lock(&nodeB->lock); sgx_nr_free_pages--; sgx_nr_free_pages--; /* LOAD sgx_nr_free_pages = 100 */ /* LOAD sgx_nr_free_pages = 100 */ /* sgx_nr_free_pages-- */ /* sgx_nr_free_pages-- */ /* STORE sgx_nr_free_pages = 99 */ /* STORE sgx_nr_free_pages = 99 */ spin_unlock(&nodeA->lock); spin_unlock(&nodeB->lock); In the above scenario, sgx_nr_free_pages is decremented from two CPUs but instead of sgx_nr_free_pages ending with a value that is two less than it started with, it was only decremented by one while the number of free pages were actually reduced by two. The consequence of sgx_nr_free_pages not being protected is that its value may not accurately reflect the actual number of free pages on the system, impacting the availability of free pages in support of many flows. The problematic scenario is when the reclaimer does not run because it believes there to be sufficient free pages while any attempt to allocate a page fails because there are no free pages available. In the SGX driver the reclaimer's watermark is only 32 pages so after encountering the above example scenario 32 times a user space hang is possible when there are no more free pages because of repeated page faults caused by no free pages made available. The following flow was encountered: asm_exc_page_fault ... sgx_vma_fault() sgx_encl_load_page() sgx_encl_eldu() // Encrypted page needs to be loaded from backing // storage into newly allocated SGX memory page sgx_alloc_epc_page() // Allocate a page of SGX memory __sgx_alloc_epc_page() // Fails, no free SGX memory ... if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) // Wake reclaimer wake_up(&ksgxd_waitq); return -EBUSY; // Return -EBUSY giving reclaimer time to run return -EBUSY; return -EBUSY; return VM_FAULT_NOPAGE; The reclaimer is triggered in above flow with the following code: static bool sgx_should_reclaim(unsigned long watermark) { return sgx_nr_free_pages < watermark && !list_empty(&sgx_active_page_list); } In the problematic scenario there were no free pages available yet the value of sgx_nr_free_pages was above the watermark. The allocation of SGX memory thus always failed because of a lack of free pages while no free pages were made available because the reclaimer is never started because of sgx_nr_free_pages' incorrect value. The consequence was that user space kept encountering VM_FAULT_NOPAGE that caused the same address to be accessed repeatedly with the same result. Change the global free page counter to an atomic type that ensures simultaneous updates are done safely. While doing so, move the updating of the variable outside of the spin lock critical section to which it does not belong. Cc: stable@vger.kernel.org Fixes: 901ddbb9ecf5 ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()") Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lkml.kernel.org/r/a95a40743bbd3f795b465f30922dde7f1ea9e0eb.1637004094.git.reinette.chatre@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25x86/boot: Pull up cmdline preparation and early param parsingBorislav Petkov1-27/+39
commit 8d48bf8206f77aa8687f0e241e901e5197e52423 upstream. Dan reports that Anjaneya Chagam can no longer use the efi=nosoftreserve kernel command line parameter to suppress "soft reservation" behavior. This is due to the fact that the following call-chain happens at boot: early_reserve_memory |-> efi_memblock_x86_reserve_range |-> efi_fake_memmap_early which does if (!efi_soft_reserve_enabled()) return; and that would have set EFI_MEM_NO_SOFT_RESERVE after having parsed "nosoftreserve". However, parse_early_param() gets called *after* it, leading to the boot cmdline not being taken into account. Therefore, carve out the command line preparation into a separate function which does the early param parsing too. So that it all goes together. And then call that function before early_reserve_memory() so that the params would have been parsed by then. Fixes: 8aa83e6395ce ("x86/setup: Call early_reserve_memory() earlier") Reported-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Anjaneya Chagam <anjaneya.chagam@intel.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/e8dd8993c38702ee6dd73b3c11f158617e665607.camel@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25mm/damon/dbgfs: fix missed use of damon_dbgfs_lockSeongJae Park1-3/+8
commit d78f3853f831eee46c6dbe726debf3be9e9c0d05 upstream. DAMON debugfs is supposed to protect dbgfs_ctxs, dbgfs_nr_ctxs, and dbgfs_dirs using damon_dbgfs_lock. However, some of the code is accessing the variables without the protection. This fixes it by protecting all such accesses. Link: https://lkml.kernel.org/r/20211110145758.16558-3-sj@kernel.org Fixes: 75c1c2b53c78 ("mm/damon/dbgfs: support multiple contexts") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25mm/damon/dbgfs: use '__GFP_NOWARN' for user-specified size buffer allocationSeongJae Park1-2/+2
commit db7a347b26fe05d2e8c115bb24dfd908d0252bc3 upstream. Patch series "DAMON fixes". This patch (of 2): DAMON users can trigger below warning in '__alloc_pages()' by invoking write() to some DAMON debugfs files with arbitrarily high count argument, because DAMON debugfs interface allocates some buffers based on the user-specified 'count'. if (unlikely(order >= MAX_ORDER)) { WARN_ON_ONCE(!(gfp & __GFP_NOWARN)); return NULL; } Because the DAMON debugfs interface code checks failure of the 'kmalloc()', this commit simply suppresses the warnings by adding '__GFP_NOWARN' flag. Link: https://lkml.kernel.org/r/20211110145758.16558-1-sj@kernel.org Link: https://lkml.kernel.org/r/20211110145758.16558-2-sj@kernel.org Fixes: 4bc05954d007 ("mm/damon: implement a debugfs-based user space interface") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25kmap_local: don't assume kmap PTEs are linear arrays in memoryArd Biesheuvel3-11/+25
commit 825c43f50e3aa811a291ffcb40e02fbf6d91ba86 upstream. The kmap_local conversion broke the ARM architecture, because the new code assumes that all PTEs used for creating kmaps form a linear array in memory, and uses array indexing to look up the kmap PTE belonging to a certain kmap index. On ARM, this cannot work, not only because the PTE pages may be non-adjacent in memory, but also because ARM/!LPAE interleaves hardware entries and extended entries (carrying software-only bits) in a way that is not compatible with array indexing. Fortunately, this only seems to affect configurations with more than 8 CPUs, due to the way the per-CPU kmap slots are organized in memory. Work around this by permitting an architecture to set a Kconfig symbol that signifies that the kmap PTEs do not form a lineary array in memory, and so the only way to locate the appropriate one is to walk the page tables. Link: https://lore.kernel.org/linux-arm-kernel/20211026131249.3731275-1-ardb@kernel.org/ Link: https://lkml.kernel.org/r/20211116094737.7391-1-ardb@kernel.org Fixes: 2a15ba82fa6c ("ARM: highmem: Switch to generic kmap atomic") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reported-by: Quanyang Wang <quanyang.wang@windriver.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25hugetlb, userfaultfd: fix reservation restore on userfaultfd errorMina Almasry1-3/+4
commit cc30042df6fcc82ea18acf0dace831503e60a0b7 upstream. Currently in the is_continue case in hugetlb_mcopy_atomic_pte(), if we bail out using "goto out_release_unlock;" in the cases where idx >= size, or !huge_pte_none(), the code will detect that new_pagecache_page == false, and so call restore_reserve_on_error(). In this case I see restore_reserve_on_error() delete the reservation, and the following call to remove_inode_hugepages() will increment h->resv_hugepages causing a 100% reproducible leak. We should treat the is_continue case similar to adding a page into the pagecache and set new_pagecache_page to true, to indicate that there is no reservation to restore on the error path, and we need not call restore_reserve_on_error(). Rename new_pagecache_page to page_in_pagecache to make that clear. Link: https://lkml.kernel.org/r/20211117193825.378528-1-almasrymina@google.com Fixes: c7b1850dfb41 ("hugetlb: don't pass page cache pages to restore_reserve_on_error") Signed-off-by: Mina Almasry <almasrymina@google.com> Reported-by: James Houghton <jthoughton@google.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Wei Xu <weixugc@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flagRustam Kovhaev1-1/+1
commit 34dbc3aaf5d9e89ba6cc5e24add9458c21ab1950 upstream. When kmemleak is enabled for SLOB, system does not boot and does not print anything to the console. At the very early stage in the boot process we hit infinite recursion from kmemleak_init() and eventually kernel crashes. kmemleak_init() specifies SLAB_NOLEAKTRACE for KMEM_CACHE(), but kmem_cache_create_usercopy() removes it because CACHE_CREATE_MASK is not valid for SLOB. Let's fix CACHE_CREATE_MASK and make kmemleak work with SLOB Link: https://lkml.kernel.org/r/20211115020850.3154366-1-rkovhaev@gmail.com Fixes: d8843922fba4 ("slab: Ignore internal flags in cache creation") Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Glauber Costa <glommer@parallels.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25shm: extend forced shm destroy to support objects from several IPC nsesAlexander Mikhalitsyn3-47/+159
commit 85b6d24646e4125c591639841169baa98a2da503 upstream. Currently, the exit_shm() function not designed to work properly when task->sysvshm.shm_clist holds shm objects from different IPC namespaces. This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it leads to use-after-free (reproducer exists). This is an attempt to fix the problem by extending exit_shm mechanism to handle shm's destroy from several IPC ns'es. To achieve that we do several things: 1. add a namespace (non-refcounted) pointer to the struct shmid_kernel 2. during new shm object creation (newseg()/shmget syscall) we initialize this pointer by current task IPC ns 3. exit_shm() fully reworked such that it traverses over all shp's in task->sysvshm.shm_clist and gets IPC namespace not from current task as it was before but from shp's object itself, then call shm_destroy(shp, ns). Note: We need to be really careful here, because as it was said before (1), our pointer to IPC ns non-refcnt'ed. To be on the safe side we using special helper get_ipc_ns_not_zero() which allows to get IPC ns refcounter only if IPC ns not in the "state of destruction". Q/A Q: Why can we access shp->ns memory using non-refcounted pointer? A: Because shp object lifetime is always shorther than IPC namespace lifetime, so, if we get shp object from the task->sysvshm.shm_clist while holding task_lock(task) nobody can steal our namespace. Q: Does this patch change semantics of unshare/setns/clone syscalls? A: No. It's just fixes non-covered case when process may leave IPC namespace without getting task->sysvshm.shm_clist list cleaned up. Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife.com Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com Fixes: ab602f79915 ("shm: make exit_shm work proportional to task activity") Co-developed-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Andrei Vagin <avagin@gmail.com> Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Cc: Vasily Averin <vvs@virtuozzo.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25ipc: WARN if trying to remove ipc object which is absentAlexander Mikhalitsyn1-3/+3
commit 126e8bee943e9926238c891e2df5b5573aee76bc upstream. Patch series "shm: shm_rmid_forced feature fixes". Some time ago I met kernel crash after CRIU restore procedure, fortunately, it was CRIU restore, so, I had dump files and could do restore many times and crash reproduced easily. After some investigation I've constructed the minimal reproducer. It was found that it's use-after-free and it happens only if sysctl kernel.shm_rmid_forced = 1. The key of the problem is that the exit_shm() function not handles shp's object destroy when task->sysvshm.shm_clist contains items from different IPC namespaces. In most cases this list will contain only items from one IPC namespace. How can this list contain object from different namespaces? The exit_shm() function is designed to clean up this list always when process leaves IPC namespace. But we made a mistake a long time ago and did not add a exit_shm() call into the setns() syscall procedures. The first idea was just to add this call to setns() syscall but it obviously changes semantics of setns() syscall and that's userspace-visible change. So, I gave up on this idea. The first real attempt to address the issue was just to omit forced destroy if we meet shp object not from current task IPC namespace [1]. But that was not the best idea because task->sysvshm.shm_clist was protected by rwsem which belongs to current task IPC namespace. It means that list corruption may occur. Second approach is just extend exit_shm() to properly handle shp's from different IPC namespaces [2]. This is really non-trivial thing, I've put a lot of effort into that but not believed that it's possible to make it fully safe, clean and clear. Thanks to the efforts of Manfred Spraul working an elegant solution was designed. Thanks a lot, Manfred! Eric also suggested the way to address the issue in ("[RFC][PATCH] shm: In shm_exit destroy all created and never attached segments") Eric's idea was to maintain a list of shm_clists one per IPC namespace, use lock-less lists. But there is some extra memory consumption-related concerns. An alternative solution which was suggested by me was implemented in ("shm: reset shm_clist on setns but omit forced shm destroy"). The idea is pretty simple, we add exit_shm() syscall to setns() but DO NOT destroy shm segments even if sysctl kernel.shm_rmid_forced = 1, we just clean up the task->sysvshm.shm_clist list. This chages semantics of setns() syscall a little bit but in comparision to the "naive" solution when we just add exit_shm() without any special exclusions this looks like a safer option. [1] https://lkml.org/lkml/2021/7/6/1108 [2] https://lkml.org/lkml/2021/7/14/736 This patch (of 2): Let's produce a warning if we trying to remove non-existing IPC object from IPC namespace kht/idr structures. This allows us to catch possible bugs when the ipc_rmid() function was called with inconsistent struct ipc_ids*, struct kern_ipc_perm* arguments. Link: https://lkml.kernel.org/r/20211027224348.611025-1-alexander.mikhalitsyn@virtuozzo.com Link: https://lkml.kernel.org/r/20211027224348.611025-2-alexander.mikhalitsyn@virtuozzo.com Co-developed-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Andrei Vagin <avagin@gmail.com> Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Cc: Vasily Averin <vvs@virtuozzo.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25tipc: check for null after calling kmemdupTadeusz Struk1-0/+4
commit 3e6db079751afd527bf3db32314ae938dc571916 upstream. kmemdup can return a null pointer so need to check for it, otherwise the null key will be dereferenced later in tipc_crypto_key_xmit as can be seen in the trace [1]. Cc: tipc-discussion@lists.sourceforge.net Cc: stable@vger.kernel.org # 5.15, 5.14, 5.10 [1] https://syzkaller.appspot.com/bug?id=bca180abb29567b189efdbdb34cbf7ba851c2a58 Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Link: https://lore.kernel.org/r/20211115160143.5099-1-tadeusz.struk@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25hexagon: clean up timer-regs.hNathan Chancellor3-30/+11
commit 51f2ec593441d3d1ebc0d478fac3ea329c7c93ac upstream. When building allmodconfig, there is a warning about TIMER_ENABLE being redefined: drivers/clocksource/timer-oxnas-rps.c:39:9: error: 'TIMER_ENABLE' macro redefined [-Werror,-Wmacro-redefined] #define TIMER_ENABLE BIT(7) ^ arch/hexagon/include/asm/timer-regs.h:13:9: note: previous definition is here #define TIMER_ENABLE 0 ^ 1 error generated. The values in this header are only used in one file each, if they are used at all. Remove the header and sink all of the constants into their respective files. TCX0_CLK_RATE is only used in arch/hexagon/include/asm/timex.h TIMER_ENABLE, RTOS_TIMER_INT, RTOS_TIMER_REGS_ADDR are only used in arch/hexagon/kernel/time.c. SLEEP_CLK_RATE and TIMER_CLR_ON_MATCH have both been unused since the file's introduction in commit 71e4a47f32f4 ("Hexagon: Add time and timer functions"). TIMER_ENABLE is redefined as BIT(0) so the shift is moved into the definition, rather than its use. Link: https://lkml.kernel.org/r/20211115174250.1994179-3-nathan@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Brian Cain <bcain@codeaurora.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25hexagon: export raw I/O routines for modulesNathan Chancellor1-0/+4
commit ffb92ce826fd801acb0f4e15b75e4ddf0d189bde upstream. Patch series "Fixes for ARCH=hexagon allmodconfig", v2. This series fixes some issues noticed with ARCH=hexagon allmodconfig. This patch (of 3): When building ARCH=hexagon allmodconfig, the following errors occur: ERROR: modpost: "__raw_readsl" [drivers/i3c/master/svc-i3c-master.ko] undefined! ERROR: modpost: "__raw_writesl" [drivers/i3c/master/dw-i3c-master.ko] undefined! ERROR: modpost: "__raw_readsl" [drivers/i3c/master/dw-i3c-master.ko] undefined! ERROR: modpost: "__raw_writesl" [drivers/i3c/master/i3c-master-cdns.ko] undefined! ERROR: modpost: "__raw_readsl" [drivers/i3c/master/i3c-master-cdns.ko] undefined! Export these symbols so that modules can use them without any errors. Link: https://lkml.kernel.org/r/20211115174250.1994179-1-nathan@kernel.org Link: https://lkml.kernel.org/r/20211115174250.1994179-2-nathan@kernel.org Fixes: 013bf24c3829 ("Hexagon: Provide basic implementation and/or stubs for I/O routines.") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Brian Cain <bcain@codeaurora.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25pstore/blk: Use "%lu" to format unsigned longGeert Uytterhoeven1-1/+1
commit 61eb495c83bf6ebde490992bf888ca15b9babc39 upstream. On 32-bit: fs/pstore/blk.c: In function ‘__best_effort_init’: include/linux/kern_levels.h:5:18: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 3 has type ‘long unsigned int’ [-Wformat=] 5 | #define KERN_SOH "\001" /* ASCII Start Of Header */ | ^~~~~~ include/linux/kern_levels.h:14:19: note: in expansion of macro ‘KERN_SOH’ 14 | #define KERN_INFO KERN_SOH "6" /* informational */ | ^~~~~~~~ include/linux/printk.h:373:9: note: in expansion of macro ‘KERN_INFO’ 373 | printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__) | ^~~~~~~~~ fs/pstore/blk.c:314:3: note: in expansion of macro ‘pr_info’ 314 | pr_info("attached %s (%zu) (no dedicated panic_write!)\n", | ^~~~~~~ Cc: stable@vger.kernel.org Fixes: 7bb9557b48fcabaa ("pstore/blk: Use the normal block device I/O path") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210629103700.1935012-1-geert@linux-m68k.org Cc: Jens Axboe <axboe@kernel.dk> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25Revert "mark pstore-blk as broken"Kees Cook1-1/+0
commit d1faacbf67b1944f0e0c618dc581d929263f6fe9 upstream. This reverts commit d07f3b081ee632268786601f55e1334d1f68b997. pstore-blk was fixed to avoid the unwanted APIs in commit 7bb9557b48fc ("pstore/blk: Use the normal block device I/O path"), which landed in the same release as the commit adding BROKEN. Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20211116181559.3975566-1-keescook@chromium.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25tun: fix bonding active backup with arp monitoringNicolas Dichtel1-0/+5
commit a31d27fbed5d518734cb60956303eb15089a7634 upstream. As stated in the bonding doc, trans_start must be set manually for drivers using NETIF_F_LLTX: Drivers that use NETIF_F_LLTX flag must also update netdev_queue->trans_start. If they do not, then the ARP monitor will immediately fail any slaves using that driver, and those slaves will stay down. Link: https://www.kernel.org/doc/html/v5.15/networking/bonding.html#arp-monitor-operation Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25dmaengine: remove debugfs #ifdefArnd Bergmann2-16/+1
commit b3b180e735409ca0c76642014304b59482e0e653 upstream. The ptdma driver has added debugfs support, but this fails to build when debugfs is disabled: drivers/dma/ptdma/ptdma-debugfs.c: In function 'ptdma_debugfs_setup': drivers/dma/ptdma/ptdma-debugfs.c:93:54: error: 'struct dma_device' has no member named 'dbg_dev_root' 93 | debugfs_create_file("info", 0400, pt->dma_dev.dbg_dev_root, pt, | ^ drivers/dma/ptdma/ptdma-debugfs.c:96:55: error: 'struct dma_device' has no member named 'dbg_dev_root' 96 | debugfs_create_file("stats", 0400, pt->dma_dev.dbg_dev_root, pt, | ^ drivers/dma/ptdma/ptdma-debugfs.c:102:52: error: 'struct dma_device' has no member named 'dbg_dev_root' 102 | debugfs_create_dir("q", pt->dma_dev.dbg_dev_root); | ^ Remove the #ifdef in the header, as this only saves a few bytes, but would require ugly #ifdefs in each driver using it. Simplify the other user while we're at it. Fixes: e2fb2e2a33fa ("dmaengine: ptdma: Add debugfs entries for PTDMA") Fixes: 26cf132de6f7 ("dmaengine: Create debug directories for DMA devices") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Link: https://lore.kernel.org/r/20210920122017.205975-1-arnd@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25blk-cgroup: fix missing put device in error path from blkg_conf_pref()Yu Kuai1-4/+5
[ Upstream commit 15c30104965101b8e76b24d27035569d6613a7d6 ] If blk_queue_enter() failed due to queue is dying, the blkdev_put_no_open() is needed because blkcg_conf_open_bdev() succeeded. Fixes: 0c9d338c8443 ("blk-cgroup: synchronize blkg creation against policy deactivation") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20211102020705.2321858-1-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25s390/kexec: fix return code handlingHeiko Carstens2-2/+9
[ Upstream commit 20c76e242e7025bd355619ba67beb243ba1a1e95 ] kexec_file_add_ipl_report ignores that ipl_report_finish may fail and can return an error pointer instead of a valid pointer. Fix this and simplify by returning NULL in case of an error and let the only caller handle this case. Fixes: 99feaa717e55 ("s390/kexec_file: Create ipl report and pass to next kernel") Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25perf/x86/intel/uncore: Fix IIO event constraints for SnowridgeAlexander Antonov1-0/+8
[ Upstream commit bdc0feee05174418dec1fa68de2af19e1750b99f ] According to the latest uncore document, DATA_REQ_OF_CPU (0x83), DATA_REQ_BY_CPU (0xc0) and COMP_BUF_OCCUPANCY (0xd5) events have constraints. Add uncore IIO constraints for Snowridge. Fixes: 210cc5f9db7a ("perf/x86/intel/uncore: Add uncore support for Snow Ridge server") Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20211115090334.3789-4-alexander.antonov@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25perf/x86/intel/uncore: Fix IIO event constraints for Skylake ServerAlexander Antonov1-0/+1
[ Upstream commit 3866ae319c846a612109c008f43cba80b8c15e86 ] According to the latest uncore document, COMP_BUF_OCCUPANCY (0xd5) event can be collected on 2-3 counters. Update uncore IIO event constraints for Skylake Server. Fixes: cd34cd97b7b4 ("perf/x86/intel/uncore: Add Skylake server uncore support") Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20211115090334.3789-3-alexander.antonov@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25perf/x86/intel/uncore: Fix filter_tid mask for CHA events on Skylake ServerAlexander Antonov1-0/+3
[ Upstream commit e324234e0aa881b7841c7c713306403e12b069ff ] According Uncore Reference Manual: any of the CHA events may be filtered by Thread/Core-ID by using tid modifier in CHA Filter 0 Register. Update skx_cha_hw_config() to follow Uncore Guide. Fixes: cd34cd97b7b4 ("perf/x86/intel/uncore: Add Skylake server uncore support") Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20211115090334.3789-2-alexander.antonov@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25pinctrl: qcom: sm8350: Correct UFS and SDC offsetsBjorn Andersson1-4/+4
[ Upstream commit 62209e805b5c68577602a5803a71d8e2e11ee0d3 ] The downstream TLMM binding covers a group of TLMM-related hardware blocks, but the upstream binding only captures the particular block related to controlling the TLMM pins from an OS. In the translation of the driver from downstream, the offset of 0x100000 was lost for the UFS and SDC pingroups. Fixes: d5d348a3271f ("pinctrl: qcom: Add SM8350 pinctrl driver") Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Vinod Koul <vkoul@kernel.org> Reviewed-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org> Link: https://lore.kernel.org/r/20211104170835.1993686-1-bjorn.andersson@linaro.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25pinctrl: qcom: sdm845: Enable dual edge errataBjorn Andersson1-0/+1
[ Upstream commit 3a3a100473d2f6ebf9bdfe6efedd7e18de724388 ] It has been observed that dual edge triggered wakeirq GPIOs on SDM845 doesn't trigger interrupts on the falling edge. Enabling wakeirq_dual_edge_errata for SDM845 indicates that the PDC in SDM845 suffers from the same problem described, and worked around, by Doug in 'c3c0c2e18d94 ("pinctrl: qcom: Handle broken/missing PDC dual edge IRQs on sc7180")', so enable the workaround for SDM845 as well. The specific problem seen without this is that gpio-keys does not detect the falling edge of the LID gpio on the Lenovo Yoga C630 and as such consistently reports the LID as closed. Fixes: e35a6ae0eb3a ("pinctrl/msm: Setup GPIO chip in hierarchy") Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Tested-By: Steev Klimaszewski <steev@kali.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Link: https://lore.kernel.org/r/20211102034115.1946036-1-bjorn.andersson@linaro.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25powerpc/pseries: Fix numa FORM2 parsing fallback codeNicholas Piggin1-16/+12
[ Upstream commit 302039466f6a3b9421ecb9a6a2c528801dc24a86 ] In case the FORM2 distance table from firmware is not the expected size, there is fallback code that just populates the lookup table as local vs remote. However it then continues on to use the distance table. Fix. Fixes: 1c6b5a7e7405 ("powerpc/pseries: Add support for FORM2 associativity") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211109064900.2041386-2-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25powerpc/pseries: rename numa_dist_table to form2_distancesNicholas Piggin1-9/+9
[ Upstream commit 0bd81274e3f1195ee7c820ef02d62f31077c42c3 ] The name of the local variable holding the "form2" property address conflicts with the numa_distance_table global. This patch does 's/numa_dist_table/form2_distances/g' over the function, which also renames numa_dist_table_length to form2_distances_length. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211109064900.2041386-1-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25powerpc: clean vdso32 and vdso64 directoriesMasahiro Yamada1-0/+3
[ Upstream commit 964c33cd0be621b291b5d253d8731eb2680082cb ] Since commit bce74491c300 ("powerpc/vdso: fix unnecessary rebuilds of vgettimeofday.o"), "make ARCH=powerpc clean" does not clean up the arch/powerpc/kernel/{vdso32,vdso64} directories. Use the subdir- trick to let "make clean" descend into them. Fixes: bce74491c300 ("powerpc/vdso: fix unnecessary rebuilds of vgettimeofday.o") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211109185015.615517-1-masahiroy@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25KVM: PPC: Book3S HV: Use GLOBAL_TOC for kvmppc_h_set_dabr/xdabr()Michael Ellerman1-2/+2
[ Upstream commit dae581864609d36fb58855fd59880b4941ce9d14 ] kvmppc_h_set_dabr(), and kvmppc_h_set_xdabr() which jumps into it, need to use _GLOBAL_TOC to setup the kernel TOC pointer, because kvmppc_h_set_dabr() uses LOAD_REG_ADDR() to load dawr_force_enable. When called from hcall_try_real_mode() we have the kernel TOC in r2, established near the start of kvmppc_interrupt_hv(), so there is no issue. But they can also be called from kvmppc_pseries_do_hcall() which is module code, so the access ends up happening with the kvm-hv module's r2, which will not point at dawr_force_enable and could even cause a fault. With the current code layout and compilers we haven't observed a fault in practice, the load hits somewhere in kvm-hv.ko and silently returns some bogus value. Note that we we expect p8/p9 guests to use the DAWR, but SLOF uses h_set_dabr() to test if sc1 works correctly, see SLOF's lib/libhvcall/brokensc1.c. Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Daniel Axtens <dja@axtens.net> Link: https://lore.kernel.org/r/20210923151031.72408-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25riscv: fix building external modulesAndreas Schwab1-0/+2
[ Upstream commit 5a19c7e06236a9c55dfc001bb4d1a8f1950d23e7 ] When building external modules, vdso_prepare should not be run. If the kernel sources are read-only, it will fail. Fixes: fde9c59aebaf ("riscv: explicitly use symbol offsets for VDSO") Signed-off-by: Andreas Schwab <schwab@suse.de> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25tools build: Fix removal of feature-sync-compare-and-swap feature detectionArnaldo Carvalho de Melo1-1/+0
[ Upstream commit e8c04ea0fef5731dbcaabac86d65254c227aedf4 ] The patch removing the feature-sync-compare-and-swap feature detection didn't remove the call to main_test_sync_compare_and_swap(), making the 'test-all' case fail an all the feature tests to be performed individually: $ cat /tmp/build/perf/feature/test-all.make.output In file included from test-all.c:18: test-libpython-version.c:5:10: error: #error 5 | #error | ^~~~~ test-all.c: In function ‘main’: test-all.c:203:9: error: implicit declaration of function ‘main_test_sync_compare_and_swap’ [-Werror=implicit-function-declaration] 203 | main_test_sync_compare_and_swap(argc, argv); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors $ Fix it, now to figure out what is that test-libpython-version.c problem... Fixes: 60fa754b2a5a4e0c ("tools: Remove feature-sync-compare-and-swap feature detection") Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/YZU9Fe0sgkHSXeC2@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25ptp: ocp: Fix a couple NULL vs IS_ERR() checksDan Carpenter1-4/+5
[ Upstream commit c7521d3aa2fa7fc785682758c99b5bcae503f6be ] The ptp_ocp_get_mem() function does not return NULL, it returns error pointers. Fixes: 773bda964921 ("ptp: ocp: Expose various resources on the timecard.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25e100: fix device suspend/resumeJesse Brandeburg1-5/+13
[ Upstream commit 5d2ca2e12dfb2aff3388ca57b06f570fa6206ced ] As reported in [1], e100 was no longer working for suspend/resume cycles. The previous commit mentioned in the fixes appears to have broken things and this attempts to practice best known methods for device power management and keep wake-up working while allowing suspend/resume to work. To do this, I reorder a little bit of code and fix the resume path to make sure the device is enabled. [1] https://bugzilla.kernel.org/show_bug.cgi?id=214933 Fixes: 69a74aef8a18 ("e100: use generic power management") Cc: Vaibhav Gupta <vaibhavgupta40@gmail.com> Reported-by: Alexey Kuznetsov <axet@me.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Alexey Kuznetsov <axet@me.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25NFC: add NCI_UNREG flag to eliminate the raceLin Ma2-2/+18
[ Upstream commit 48b71a9e66c2eab60564b1b1c85f4928ed04e406 ] There are two sites that calls queue_work() after the destroy_workqueue() and lead to possible UAF. The first site is nci_send_cmd(), which can happen after the nci_close_device as below nfcmrvl_nci_unregister_dev | nfc_genl_dev_up nci_close_device | flush_workqueue | del_timer_sync | nci_unregister_device | nfc_get_device destroy_workqueue | nfc_dev_up nfc_unregister_device | nci_dev_up device_del | nci_open_device | __nci_request | nci_send_cmd | queue_work !!! Another site is nci_cmd_timer, awaked by the nci_cmd_work from the nci_send_cmd. ... | ... nci_unregister_device | queue_work destroy_workqueue | nfc_unregister_device | ... device_del | nci_cmd_work | mod_timer | ... | nci_cmd_timer | queue_work !!! For the above two UAF, the root cause is that the nfc_dev_up can race between the nci_unregister_device routine. Therefore, this patch introduce NCI_UNREG flag to easily eliminate the possible race. In addition, the mutex_lock in nci_close_device can act as a barrier. Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation") Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Link: https://lore.kernel.org/r/20211116152732.19238-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25NFC: reorder the logic in nfc_{un,}register_deviceLin Ma1-14/+18
[ Upstream commit 3e3b5dfcd16a3e254aab61bd1e8c417dd4503102 ] There is a potential UAF between the unregistration routine and the NFC netlink operations. The race that cause that UAF can be shown as below: (FREE) | (USE) nfcmrvl_nci_unregister_dev | nfc_genl_dev_up nci_close_device | nci_unregister_device | nfc_get_device nfc_unregister_device | nfc_dev_up rfkill_destory | device_del | rfkill_blocked ... | ... The root cause for this race is concluded below: 1. The rfkill_blocked (USE) in nfc_dev_up is supposed to be placed after the device_is_registered check. 2. Since the netlink operations are possible just after the device_add in nfc_register_device, the nfc_dev_up() can happen anywhere during the rfkill creation process, which leads to data race. This patch reorder these actions to permit 1. Once device_del is finished, the nfc_dev_up cannot dereference the rfkill object. 2. The rfkill_register need to be placed after the device_add of nfc_dev because the parent device need to be created first. So this patch keeps the order but inject device_lock to prevent the data race. Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: be055b2f89b5 ("NFC: RFKILL support") Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Link: https://lore.kernel.org/r/20211116152652.19217-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>