Age | Commit message (Collapse) | Author | Files | Lines |
|
The ioctl KVM_SET_BOOT_CPU_ID fails when called after vcpu creation.
Add this explanation in the documentation.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20210319091650.11967-1-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
After commit 997acaf6b4b59c (lockdep: report broken irq restoration), the guest
splatting below during boot:
raw_local_irq_restore() called with IRQs enabled
WARNING: CPU: 1 PID: 169 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x26/0x30
Modules linked in: hid_generic usbhid hid
CPU: 1 PID: 169 Comm: systemd-udevd Not tainted 5.11.0+ #25
RIP: 0010:warn_bogus_irq_restore+0x26/0x30
Call Trace:
kvm_wait+0x76/0x90
__pv_queued_spin_lock_slowpath+0x285/0x2e0
do_raw_spin_lock+0xc9/0xd0
_raw_spin_lock+0x59/0x70
lockref_get_not_dead+0xf/0x50
__legitimize_path+0x31/0x60
legitimize_root+0x37/0x50
try_to_unlazy_next+0x7f/0x1d0
lookup_fast+0xb0/0x170
path_openat+0x165/0x9b0
do_filp_open+0x99/0x110
do_sys_openat2+0x1f1/0x2e0
do_sys_open+0x5c/0x80
__x64_sys_open+0x21/0x30
do_syscall_64+0x32/0x50
entry_SYSCALL_64_after_hwframe+0x44/0xae
The new consistency checking, expects local_irq_save() and
local_irq_restore() to be paired and sanely nested, and therefore expects
local_irq_restore() to be called with irqs disabled.
The irqflags handling in kvm_wait() which ends up doing:
local_irq_save(flags);
safe_halt();
local_irq_restore(flags);
instead triggers it. This patch fixes it by using
local_irq_disable()/enable() directly.
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1615791328-2735-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
In order to deal with noncoherent DMA, we should execute wbinvd on
all dirty pCPUs when guest wbinvd exits to maintain data consistency.
smp_call_function_many() does not execute the provided function on the
local core, therefore replace it by on_each_cpu_mask().
Reported-by: Nadav Amit <namit@vmware.com>
Cc: Nadav Amit <namit@vmware.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1615517151-7465-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Fix a plethora of issues with MSR filtering by installing the resulting
filter as an atomic bundle instead of updating the live filter one range
at a time. The KVM_X86_SET_MSR_FILTER ioctl() isn't truly atomic, as
the hardware MSR bitmaps won't be updated until the next VM-Enter, but
the relevant software struct is atomically updated, which is what KVM
really needs.
Similar to the approach used for modifying memslots, make arch.msr_filter
a SRCU-protected pointer, do all the work configuring the new filter
outside of kvm->lock, and then acquire kvm->lock only when the new filter
has been vetted and created. That way vCPU readers either see the old
filter or the new filter in their entirety, not some half-baked state.
Yuan Yao pointed out a use-after-free in ksm_msr_allowed() due to a
TOCTOU bug, but that's just the tip of the iceberg...
- Nothing is __rcu annotated, making it nigh impossible to audit the
code for correctness.
- kvm_add_msr_filter() has an unpaired smp_wmb(). Violation of kernel
coding style aside, the lack of a smb_rmb() anywhere casts all code
into doubt.
- kvm_clear_msr_filter() has a double free TOCTOU bug, as it grabs
count before taking the lock.
- kvm_clear_msr_filter() also has memory leak due to the same TOCTOU bug.
The entire approach of updating the live filter is also flawed. While
installing a new filter is inherently racy if vCPUs are running, fixing
the above issues also makes it trivial to ensure certain behavior is
deterministic, e.g. KVM can provide deterministic behavior for MSRs with
identical settings in the old and new filters. An atomic update of the
filter also prevents KVM from getting into a half-baked state, e.g. if
installing a filter fails, the existing approach would leave the filter
in a half-baked state, having already committed whatever bits of the
filter were already processed.
[*] https://lkml.kernel.org/r/20210312083157.25403-1-yaoyuan0329os@gmail.com
Fixes: 1a155254ff93 ("KVM: x86: Introduce MSR filtering")
Cc: stable@vger.kernel.org
Cc: Alexander Graf <graf@amazon.com>
Reported-by: Yuan Yao <yaoyuan0329os@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210316184436.2544875-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Test for the KVM_SET_BOOT_CPU_ID ioctl.
Check that it correctly allows to change the BSP vcpu.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20210318151624.490861-2-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
As in kvm_ioctl and _kvm_ioctl, add
the respective _vm_ioctl for vm_ioctl.
_vm_ioctl invokes an ioctl using the vm fd,
leaving the caller to test the result.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20210318151624.490861-1-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Test the KVM_GET_MSR_FEATURE_INDEX_LIST
and KVM_GET_MSR_INDEX_LIST ioctls.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20210318145629.486450-1-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Introduce a new selftest for Hyper-V clocksources (MSR-based reference TSC
and TSC page). As a starting point, test the following:
1) Reference TSC is 1Ghz clock.
2) Reference TSC and TSC page give the same reading.
3) TSC page gets updated upon KVM_SET_CLOCK call.
4) TSC page does not get updated when guest opted for reenlightenment.
5) Disabled TSC page doesn't get updated.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210318140949.1065740-1-vkuznets@redhat.com>
[Add a host-side test using TSC + KVM_GET_MSR too. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
re-enlightenment
When guest opts for re-enlightenment notifications upon migration, it is
in its right to assume that TSC page values never change (as they're only
supposed to change upon migration and the host has to keep things as they
are before it receives confirmation from the guest). This is mostly true
until the guest is migrated somewhere. KVM userspace (e.g. QEMU) will
trigger masterclock update by writing to HV_X64_MSR_REFERENCE_TSC, by
calling KVM_SET_CLOCK,... and as TSC value and kvmclock reading drift
apart (even slightly), the update causes TSC page values to change.
The issue at hand is that when Hyper-V is migrated, it uses stale (cached)
TSC page values to compute the difference between its own clocksource
(provided by KVM) and its guests' TSC pages to program synthetic timers
and in some cases, when TSC page is updated, this puts all stimer
expirations in the past. This, in its turn, causes an interrupt storm
and L2 guests not making much forward progress.
Note, KVM doesn't fully implement re-enlightenment notification. Basically,
the support for reenlightenment MSRs is just a stub and userspace is only
expected to expose the feature when TSC scaling on the expected destination
hosts is available. With TSC scaling, no real re-enlightenment is needed
as TSC frequency doesn't change. With TSC scaling becoming ubiquitous, it
likely makes little sense to fully implement re-enlightenment in KVM.
Prevent TSC page from being updated after migration. In case it's not the
guest who's initiating the change and when TSC page is already enabled,
just keep it as it is: TSC value is supposed to be preserved across
migration and TSC frequency can't change with re-enlightenment enabled.
The guest is doomed anyway if any of this is not true.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210316143736.964151-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Create an infrastructure for tracking Hyper-V TSC page status, i.e. if it
was updated from guest/host side or if we've failed to set it up (because
e.g. guest wrote some garbage to HV_X64_MSR_REFERENCE_TSC) and there's no
need to retry.
Also, in a hypothetical situation when we are in 'always catchup' mode for
TSC we can now avoid contending 'hv->hv_lock' on every guest enter by
setting the state to HV_TSC_PAGE_BROKEN after compute_tsc_page_parameters()
returns false.
Check for HV_TSC_PAGE_SET state instead of '!hv->tsc_ref.tsc_sequence' in
get_time_ref_counter() to properly handle the situation when we failed to
write the updated TSC page values to the guest.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210316143736.964151-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When KVM_REQ_MASTERCLOCK_UPDATE request is issued (e.g. after migration)
we need to make sure no vCPU sees stale values in PV clock structures and
thus all vCPUs are kicked with KVM_REQ_CLOCK_UPDATE. Hyper-V TSC page
clocksource is global and kvm_guest_time_update() only updates in on vCPU0
but this is not entirely correct: nothing blocks some other vCPU from
entering the guest before we finish the update on CPU0 and it can read
stale values from the page.
Invalidate TSC page in kvm_gen_update_masterclock() to switch all vCPUs
to using MSR based clocksource (HV_X64_MSR_TIME_REF_COUNT).
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210316143736.964151-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
HV_X64_MSR_TSC_EMULATION_STATUS
HV_X64_MSR_TSC_EMULATION_STATUS indicates whether TSC accesses are emulated
after migration (to accommodate for a different host TSC frequency when TSC
scaling is not supported; we don't implement this in KVM). Guest can use
the same MSR to stop TSC access emulation by writing zero. Writing anything
else is forbidden.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210316143736.964151-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Store the address space ID in the TDP iterator so that it can be
retrieved without having to bounce through the root shadow page. This
streamlines the code and fixes a Sparse warning about not properly using
rcu_dereference() when grabbing the ID from the root on the fly.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210315233803.2706477-5-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
In tdp_mmu_iter_cond_resched there is a call to tdp_iter_start which
causes the iterator to continue its walk over the paging structure from
the root. This is needed after a yield as paging structure could have
been freed in the interim.
The tdp_iter_start call is not very clear and something of a hack. It
requires exposing tdp_iter fields not used elsewhere in tdp_mmu.c and
the effect is not obvious from the function name. Factor a more aptly
named function out of tdp_iter_start and call it from
tdp_mmu_iter_cond_resched and tdp_iter_start.
No functional change intended.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210315233803.2706477-4-bgardon@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Fix a missing rcu_dereference in tdp_mmu_zap_spte_atomic.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210315233803.2706477-3-bgardon@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The pt passed into handle_removed_tdp_mmu_page does not need RCU
protection, as it is not at any risk of being freed by another thread at
that point. However, the implicit cast from tdp_sptep_t to u64 * dropped
the __rcu annotation without a proper rcu_derefrence. Fix this by
passing the pt as a tdp_ptep_t and then rcu_dereferencing it in
the function.
Suggested-by: Sean Christopherson <seanjc@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210315233803.2706477-2-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
|
|
Doing a
prctl(PR_SET_MM, PR_SET_MM_AUXV, addr, 1);
will copy 1 byte from userspace to (quite big) on-stack array
and then stash everything to mm->saved_auxv.
AT_NULL terminator will be inserted at the very end.
/proc/*/auxv handler will find that AT_NULL terminator
and copy original stack contents to userspace.
This devious scheme requires CAP_SYS_RESOURCE.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of irqchip updates:
- Make the GENERIC_IRQ_MULTI_HANDLER configuration correct
- Add a missing DT compatible string for the Ingenic driver
- Remove the pointless debugfs_file pointer from struct irqdomain"
* tag 'irq-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/ingenic: Add support for the JZ4760
dt-bindings/irq: Add compatible string for the JZ4760B
irqchip: Do not blindly select CONFIG_GENERIC_IRQ_MULTI_HANDLER
ARM: ep93xx: Select GENERIC_IRQ_MULTI_HANDLER directly
irqdomain: Remove debugfs_file from struct irq_domain
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix in for hrtimers to prevent an interrupt storm caused by
the lack of reevaluation of the timers which expire in softirq context
under certain circumstances, e.g. when the clock was set"
* tag 'timers-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"A set of scheduler updates:
- Prevent a NULL pointer dereference in the migration_stop_cpu()
mechanims
- Prevent self concurrency of affine_move_task()
- Small fixes and cleanups related to task migration/affinity setting
- Ensure that sync_runqueues_membarrier_state() is invoked on the
current CPU when it is in the cpu mask"
* tag 'sched-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/membarrier: fix missing local execution of ipi_sync_rq_state()
sched: Simplify set_affinity_pending refcounts
sched: Fix affine_move_task() self-concurrency
sched: Optimize migration_cpu_stop()
sched: Collate affine_move_task() stoppers
sched: Simplify migration_cpu_stop()
sched: Fix migration_cpu_stop() requeueing
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Thomas Gleixner:
"A single objtool fix to handle the PUSHF/POPF validation correctly for
the paravirt changes which modified arch_local_irq_restore not to use
popf"
* tag 'objtool-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool,x86: Fix uaccess PUSHF/POPF validation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"A couple of locking fixes:
- A fix for the static_call mechanism so it handles unaligned
addresses correctly.
- Make u64_stats_init() a macro so every instance gets a seperate
lockdep key.
- Make seqcount_latch_init() a macro as well to preserve the static
variable which is used for the lockdep key"
* tag 'locking-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
seqlock,lockdep: Fix seqcount_latch_init()
u64_stats,lockdep: Fix u64_stats_init() vs lockdep
static_call: Fix the module key fixup
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Make sure PMU internal buffers are flushed for per-CPU events too and
properly handle PID/TID for large PEBS.
- Handle the case properly when there's no PMU and therefore return an
empty list of perf MSRs for VMX to switch instead of reading random
garbage from the stack.
* tag 'perf_urgent_for_v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case
perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR
perf/core: Flush PMU internal buffers for per-CPU events
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fix from Ard Biesheuvel via Borislav Petkov:
"Fix an oversight in the handling of EFI_RT_PROPERTIES_TABLE, which was
added v5.10, but failed to take the SetVirtualAddressMap() RT service
into account"
* tag 'efi-urgent-for-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: stub: omit SetVirtualAddressMap() if marked unsupported in RT_PROP table
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- A couple of SEV-ES fixes and robustifications: verify usermode stack
pointer in NMI is not coming from the syscall gap, correctly track
IRQ states in the #VC handler and access user insn bytes atomically
in same handler as latter cannot sleep.
- Balance 32-bit fast syscall exit path to do the proper work on exit
and thus not confuse audit and ptrace frameworks.
- Two fixes for the ORC unwinder going "off the rails" into KASAN
redzones and when ORC data is missing.
* tag 'x86_urgent_for_v5.12_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev-es: Use __copy_from_user_inatomic()
x86/sev-es: Correctly track IRQ states in runtime #VC handler
x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack
x86/sev-es: Introduce ip_within_syscall_gap() helper
x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls
x86/unwind/orc: Silence warnings caused by missing ORC data
x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 5.12:
- Fix wrong instruction encoding for lis in ppc_function_entry(),
which could potentially lead to missed kprobes.
- Fix SET_FULL_REGS on 32-bit and 64e, which prevented ptrace of
non-volatile GPRs immediately after exec.
- Clean up a missed SRR specifier in the recent interrupt rework.
- Don't treat unrecoverable_exception() as an interrupt handler, it's
called from other handlers so shouldn't do the interrupt entry/exit
accounting itself.
- Fix build errors caused by missing declarations for
[en/dis]able_kernel_vsx().
Thanks to Christophe Leroy, Daniel Axtens, Geert Uytterhoeven, Jiri
Olsa, Naveen N. Rao, and Nicholas Piggin"
* tag 'powerpc-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/traps: unrecoverable_exception() is not an interrupt handler
powerpc: Fix missing declaration of [en/dis]able_kernel_vsx()
powerpc/64s/exception: Clean up a missed SRR specifier
powerpc: Fix inverted SET_FULL_REGS bitop
powerpc/64s: Use symbolic macros for function entry encoding
powerpc/64s: Fix instruction encoding for lis in ppc_function_entry()
|
|
Pull KVM fixes from Paolo Bonzini:
"More fixes for ARM and x86"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: LAPIC: Advancing the timer expiration on guest initiated write
KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode
KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged
kvm: x86: annotate RCU pointers
KVM: arm64: Fix exclusive limit for IPA size
KVM: arm64: Reject VM creation when the default IPA size is unsupported
KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
KVM: arm64: Don't use cbz/adr with external symbols
KVM: arm64: Fix range alignment when walking page tables
KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility
KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config()
KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available
KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key
KVM: arm64: Fix nVHE hyp panic host context restore
KVM: arm64: Avoid corrupting vCPU context register in guest exit
KVM: arm64: nvhe: Save the SPE context early
kvm: x86: use NULL instead of using plain integer as pointer
KVM: SVM: Connect 'npt' module param to KVM's internal 'npt_enabled'
KVM: x86: Ensure deadline timer has truly expired before posting its IRQ
|
|
Merge misc fixes from Andrew Morton:
"28 patches.
Subsystems affected by this series: mm (memblock, pagealloc, hugetlb,
highmem, kfence, oom-kill, madvise, kasan, userfaultfd, memcg, and
zram), core-kernel, kconfig, fork, binfmt, MAINTAINERS, kbuild, and
ia64"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (28 commits)
zram: fix broken page writeback
zram: fix return value on writeback_store
mm/memcg: set memcg when splitting page
mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument
ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign
ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls
mm/userfaultfd: fix memory corruption due to writeprotect
kasan: fix KASAN_STACK dependency for HW_TAGS
kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC
mm/madvise: replace ptrace attach requirement for process_madvise
include/linux/sched/mm.h: use rcu_dereference in in_vfork()
kfence: fix reports if constant function prefixes exist
kfence, slab: fix cache_alloc_debugcheck_after() for bulk allocations
kfence: fix printk format for ptrdiff_t
linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP*
MAINTAINERS: exclude uapi directories in API/ABI section
binfmt_misc: fix possible deadlock in bm_register_write
mm/highmem.c: fix zero_user_segments() with start > end
hugetlb: do early cow when page pinned on src mm
mm: use is_cow_mapping() across tree where proper
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip fixes from Marc Zyngier:
- More compatible strings for the Ingenic irqchip (introducing the
JZ4760B SoC)
- Select GENERIC_IRQ_MULTI_HANDLER on the ARM ep93xx platform
- Drop all GENERIC_IRQ_MULTI_HANDLER selections from the irqchip
Kconfig, now relying on the architecture to get it right
- Drop the debugfs_file field from struct irq_domain, now that
debugfs can track things on its own
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small misc/char driver fixes to resolve some reported
problems:
- habanalabs driver fixes
- Acrn build fixes (reported many times)
- pvpanic module table export fix
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
misc/pvpanic: Export module FDT device table
misc: fastrpc: restrict user apps from sending kernel RPC messages
virt: acrn: Correct type casting of argument of copy_from_user()
virt: acrn: Use EPOLLIN instead of POLLIN
virt: acrn: Use vfs_poll() instead of f_op->poll()
virt: acrn: Make remove_cpu sysfs invisible with !CONFIG_HOTPLUG_CPU
cpu/hotplug: Fix build error of using {add,remove}_cpu() with !CONFIG_SMP
habanalabs: fix debugfs address translation
habanalabs: Disable file operations after device is removed
habanalabs: Call put_pid() when releasing control device
drivers: habanalabs: remove unused dentry pointer for debugfs files
habanalabs: mark hl_eq_inc_ptr() as static
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for reported problems. They
include:
- wfx header file cleanup patch reverted as it could cause problems
- comedi driver endian fixes
- buffer overflow problems for staging wifi drivers
- build dependency issue for rtl8192e driver
All have been in linux-next for a while with no reported problems"
* tag 'staging-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (23 commits)
Revert "staging: wfx: remove unused included header files"
staging: rtl8188eu: prevent ->ssid overflow in rtw_wx_set_scan()
staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()
staging: rtl8192u: fix ->ssid overflow in r8192_wx_set_scan()
staging: comedi: pcl726: Use 16-bit 0 for interrupt data
staging: comedi: ni_65xx: Use 16-bit 0 for interrupt data
staging: comedi: ni_6527: Use 16-bit 0 for interrupt data
staging: comedi: comedi_parport: Use 16-bit 0 for interrupt data
staging: comedi: amplc_pc236_common: Use 16-bit 0 for interrupt data
staging: comedi: pcl818: Fix endian problem for AI command data
staging: comedi: pcl711: Fix endian problem for AI command data
staging: comedi: me4000: Fix endian problem for AI command data
staging: comedi: dmm32at: Fix endian problem for AI command data
staging: comedi: das800: Fix endian problem for AI command data
staging: comedi: das6402: Fix endian problem for AI command data
staging: comedi: adv_pci1710: Fix endian problem for AI command data
staging: comedi: addi_apci_1500: Fix endian problem for command sample
staging: comedi: addi_apci_1032: Fix endian problem for COS sample
staging: ks7010: prevent buffer overflow in ks_wlan_set_scan()
staging: rtl8712: Fix possible buffer overflow in r8712_sitesurvey_cmd
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small tty and serial driver fixes to resolve some
reported problems:
- led tty trigger fixes based on review and were acked by the led
maintainer
- revert a max310x serial driver patch as it was causing problems
- revert a pty change as it was also causing problems
All of these have been in linux-next for a while with no reported
problems"
* tag 'tty-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "drivers:tty:pty: Fix a race causing data loss on close"
Revert "serial: max310x: rework RX interrupt handling"
leds: trigger/tty: Use led_set_brightness_sync() from workqueue
leds: trigger: Fix error path to not unlock the unlocked mutex
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a small number of USB fixes for 5.12-rc3 to resolve a bunch
of reported issues:
- usbip fixups for issues found by syzbot
- xhci driver fixes and quirk additions
- gadget driver fixes
- dwc3 QCOM driver fix
- usb-serial new ids and fixes
- usblp fix for a long-time issue
- cdc-acm quirk addition
- other tiny fixes for reported problems
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (25 commits)
xhci: Fix repeated xhci wake after suspend due to uncleared internal wake state
usb: xhci: Fix ASMedia ASM1042A and ASM3242 DMA addressing
xhci: Improve detection of device initiated wake signal.
usb: xhci: do not perform Soft Retry for some xHCI hosts
usbip: fix vudc usbip_sockfd_store races leading to gpf
usbip: fix vhci_hcd attach_store() races leading to gpf
usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
usbip: fix vudc to check for stream socket
usbip: fix vhci_hcd to check for stream socket
usbip: fix stub_dev to check for stream socket
usb: dwc3: qcom: Add missing DWC3 OF node refcount decrement
USB: usblp: fix a hang in poll() if disconnected
USB: gadget: udc: s3c2410_udc: fix return value check in s3c2410_udc_probe()
usb: renesas_usbhs: Clear PIPECFG for re-enabling pipe with other EPNUM
usb: dwc3: qcom: Honor wakeup enabled/disabled state
usb: gadget: f_uac1: stop playback on function disable
usb: gadget: f_uac2: always increase endpoint max_packet_size by one audio slot
USB: gadget: u_ether: Fix a configfs return code
usb: dwc3: qcom: add ACPI device id for sc8180x
Goodix Fingerprint device is not a modem
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fix from Gao Xiang:
"Fix an urgent regression introduced by commit baa2c7c97153 ("block:
set .bi_max_vecs as actual allocated vector number"), which could
cause unexpected hung since linux 5.12-rc1.
Resolve it by avoiding using bio->bi_max_vecs completely"
* tag 'erofs-for-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix bio->bi_max_vecs behavior change
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- avoid 'make image_name' invoking syncconfig
- fix a couple of bugs in scripts/dummy-tools
- fix LLD_VENDOR and locale issues in scripts/ld-version.sh
- rebuild GCC plugins when the compiler is upgraded
- allow LTO to be enabled with KASAN_HW_TAGS
- allow LTO to be enabled without LLVM=1
* tag 'kbuild-fixes-v5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: fix ld-version.sh to not be affected by locale
kbuild: remove meaningless parameter to $(call if_changed_rule,dtc)
kbuild: remove LLVM=1 test from HAS_LTO_CLANG
kbuild: remove unneeded -O option to dtc
kbuild: dummy-tools: adjust to scripts/cc-version.sh
kbuild: Allow LTO to be selected with KASAN_HW_TAGS
kbuild: dummy-tools: support MPROFILE_KERNEL checks for ppc
kbuild: rebuild GCC plugins when the compiler is upgraded
kbuild: Fix ld-version.sh script if LLD was built with LLD_VENDOR
kbuild: dummy-tools: fix inverted tests for gcc
kbuild: add image_name to no-sync-config-targets
|
|
commit 0d8359620d9b ("zram: support page writeback") introduced two
problems. It overwrites writeback_store's return value as kstrtol's
return value, which makes return value zero so user could see zero as
return value of write syscall even though it wrote data successfully.
It also breaks index value in the loop in that it doesn't increase the
index any longer. It means it can write only first starting block index
so user couldn't write all idle pages in the zram so lose memory saving
chance.
This patch fixes those issues.
Link: https://lkml.kernel.org/r/20210312173949.2197662-2-minchan@kernel.org
Fixes: 0d8359620d9b("zram: support page writeback")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Amos Bianchi <amosbianchi@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: John Dias <joaodias@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
writeback_store's return value is overwritten by submit_bio_wait's return
value. Thus, writeback_store will return zero since there was no IO
error. In the end, write syscall from userspace will see the zero as
return value, which could make the process stall to keep trying the write
until it will succeed.
Link: https://lkml.kernel.org/r/20210312173949.2197662-1-minchan@kernel.org
Fixes: 3b82a051c101("drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: John Dias <joaodias@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
As described in the split_page() comment, for the non-compound high order
page, the sub-pages must be freed individually. If the memcg of the first
page is valid, the tail pages cannot be uncharged when be freed.
For example, when alloc_pages_exact is used to allocate 1MB continuous
physical memory, 2MB is charged(kmemcg is enabled and __GFP_ACCOUNT is
set). When make_alloc_exact free the unused 1MB and free_pages_exact free
the applied 1MB, actually, only 4KB(one page) is uncharged.
Therefore, the memcg of the tail page needs to be set when splitting a
page.
Michel:
There are at least two explicit users of __GFP_ACCOUNT with
alloc_exact_pages added recently. See 7efe8ef274024 ("KVM: arm64:
Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT") and c419621873713
("KVM: s390: Add memcg accounting to KVM allocations"), so this is not
just a theoretical issue.
Link: https://lkml.kernel.org/r/20210304074053.65527-3-zhouguanghui1@huawei.com
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Rui Xiang <rui.xiang@huawei.com>
Cc: Tianhong Ding <dingtianhong@huawei.com>
Cc: Weilong Chen <chenweilong@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
nr_pages argument
Rename mem_cgroup_split_huge_fixup to split_page_memcg and explicitly pass
in page number argument.
In this way, the interface name is more common and can be used by
potential users. In addition, the complete info(memcg and flag) of the
memcg needs to be set to the tail pages.
Link: https://lkml.kernel.org/r/20210304074053.65527-2-zhouguanghui1@huawei.com
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Tianhong Ding <dingtianhong@huawei.com>
Cc: Weilong Chen <chenweilong@huawei.com>
Cc: Rui Xiang <rui.xiang@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In https://bugs.gentoo.org/769614 Dmitry noticed that
`ptrace(PTRACE_GET_SYSCALL_INFO)` does not return error sign properly.
The bug is in mismatch between get/set errors:
static inline long syscall_get_error(struct task_struct *task,
struct pt_regs *regs)
{
return regs->r10 == -1 ? regs->r8:0;
}
static inline long syscall_get_return_value(struct task_struct *task,
struct pt_regs *regs)
{
return regs->r8;
}
static inline void syscall_set_return_value(struct task_struct *task,
struct pt_regs *regs,
int error, long val)
{
if (error) {
/* error < 0, but ia64 uses > 0 return value */
regs->r8 = -error;
regs->r10 = -1;
} else {
regs->r8 = val;
regs->r10 = 0;
}
}
Tested on v5.10 on rx3600 machine (ia64 9040 CPU).
Link: https://lkml.kernel.org/r/20210221002554.333076-2-slyfox@gentoo.org
Link: https://bugs.gentoo.org/769614
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Reported-by: Dmitry V. Levin <ldv@altlinux.org>
Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In https://bugs.gentoo.org/769614 Dmitry noticed that
`ptrace(PTRACE_GET_SYSCALL_INFO)` does not work for syscalls called via
glibc's syscall() wrapper.
ia64 has two ways to call syscalls from userspace: via `break` and via
`eps` instructions.
The difference is in stack layout:
1. `eps` creates simple stack frame: no locals, in{0..7} == out{0..8}
2. `break` uses userspace stack frame: may be locals (glibc provides
one), in{0..7} == out{0..8}.
Both work fine in syscall handling cde itself.
But `ptrace(PTRACE_GET_SYSCALL_INFO)` uses unwind mechanism to
re-extract syscall arguments but it does not account for locals.
The change always skips locals registers. It should not change `eps`
path as kernel's handler already enforces locals=0 and fixes `break`.
Tested on v5.10 on rx3600 machine (ia64 9040 CPU).
Link: https://lkml.kernel.org/r/20210221002554.333076-1-slyfox@gentoo.org
Link: https://bugs.gentoo.org/769614
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Reported-by: Dmitry V. Levin <ldv@altlinux.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Userfaultfd self-test fails occasionally, indicating a memory corruption.
Analyzing this problem indicates that there is a real bug since mmap_lock
is only taken for read in mwriteprotect_range() and defers flushes, and
since there is insufficient consideration of concurrent deferred TLB
flushes in wp_page_copy(). Although the PTE is flushed from the TLBs in
wp_page_copy(), this flush takes place after the copy has already been
performed, and therefore changes of the page are possible between the time
of the copy and the time in which the PTE is flushed.
To make matters worse, memory-unprotection using userfaultfd also poses a
problem. Although memory unprotection is logically a promotion of PTE
permissions, and therefore should not require a TLB flush, the current
userrfaultfd code might actually cause a demotion of the architectural PTE
permission: when userfaultfd_writeprotect() unprotects memory region, it
unintentionally *clears* the RW-bit if it was already set. Note that this
unprotecting a PTE that is not write-protected is a valid use-case: the
userfaultfd monitor might ask to unprotect a region that holds both
write-protected and write-unprotected PTEs.
The scenario that happens in selftests/vm/userfaultfd is as follows:
cpu0 cpu1 cpu2
---- ---- ----
[ Writable PTE
cached in TLB ]
userfaultfd_writeprotect()
[ write-*unprotect* ]
mwriteprotect_range()
mmap_read_lock()
change_protection()
change_protection_range()
...
change_pte_range()
[ *clear* “write”-bit ]
[ defer TLB flushes ]
[ page-fault ]
...
wp_page_copy()
cow_user_page()
[ copy page ]
[ write to old
page ]
...
set_pte_at_notify()
A similar scenario can happen:
cpu0 cpu1 cpu2 cpu3
---- ---- ---- ----
[ Writable PTE
cached in TLB ]
userfaultfd_writeprotect()
[ write-protect ]
[ deferred TLB flush ]
userfaultfd_writeprotect()
[ write-unprotect ]
[ deferred TLB flush]
[ page-fault ]
wp_page_copy()
cow_user_page()
[ copy page ]
... [ write to page ]
set_pte_at_notify()
This race exists since commit 292924b26024 ("userfaultfd: wp: apply
_PAGE_UFFD_WP bit"). Yet, as Yu Zhao pointed, these races became apparent
since commit 09854ba94c6a ("mm: do_wp_page() simplification") which made
wp_page_copy() more likely to take place, specifically if page_count(page)
> 1.
To resolve the aforementioned races, check whether there are pending
flushes on uffd-write-protected VMAs, and if there are, perform a flush
before doing the COW.
Further optimizations will follow to avoid during uffd-write-unprotect
unnecassary PTE write-protection and TLB flushes.
Link: https://lkml.kernel.org/r/20210304095423.3825684-1-namit@vmware.com
Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Signed-off-by: Nadav Amit <namit@vmware.com>
Suggested-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org> [5.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There's a runtime failure when running HW_TAGS-enabled kernel built with
GCC on hardware that doesn't support MTE. GCC-built kernels always have
CONFIG_KASAN_STACK enabled, even though stack instrumentation isn't
supported by HW_TAGS. Having that config enabled causes KASAN to issue
MTE-only instructions to unpoison kernel stacks, which causes the failure.
Fix the issue by disallowing CONFIG_KASAN_STACK when HW_TAGS is used.
(The commit that introduced CONFIG_KASAN_HW_TAGS specified proper
dependency for CONFIG_KASAN_STACK_ENABLE but not for CONFIG_KASAN_STACK.)
Link: https://lkml.kernel.org/r/59e75426241dbb5611277758c8d4d6f5f9298dac.1615215441.git.andreyknvl@google.com
Fixes: 6a63a63ff1ac ("kasan: introduce CONFIG_KASAN_HW_TAGS")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently, kasan_free_nondeferred_pages()->kasan_free_pages() is called
after debug_pagealloc_unmap_pages(). This causes a crash when
debug_pagealloc is enabled, as HW_TAGS KASAN can't set tags on an
unmapped page.
This patch puts kasan_free_nondeferred_pages() before
debug_pagealloc_unmap_pages() and arch_free_page(), which can also make
the page unavailable.
Link: https://lkml.kernel.org/r/24cd7db274090f0e5bc3adcdc7399243668e3171.1614987311.git.andreyknvl@google.com
Fixes: 94ab5b61ee16 ("kasan, arm64: enable CONFIG_KASAN_HW_TAGS")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
process_madvise currently requires ptrace attach capability.
PTRACE_MODE_ATTACH gives one process complete control over another
process. It effectively removes the security boundary between the two
processes (in one direction). Granting ptrace attach capability even to a
system process is considered dangerous since it creates an attack surface.
This severely limits the usage of this API.
The operations process_madvise can perform do not affect the correctness
of the operation of the target process; they only affect where the data is
physically located (and therefore, how fast it can be accessed). What we
want is the ability for one process to influence another process in order
to optimize performance across the entire system while leaving the
security boundary intact.
Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ and
CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata and
CAP_SYS_NICE for influencing process performance.
Link: https://lkml.kernel.org/r/20210303185807.2160264-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: <stable@vger.kernel.org> [5.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix a sparse warning by using rcu_dereference(). Technically this is a
bug and a sufficiently aggressive compiler could reload the `real_parent'
pointer outside the protection of the rcu lock (and access freed memory),
but I think it's pretty unlikely to happen.
Link: https://lkml.kernel.org/r/20210221194207.1351703-1-willy@infradead.org
Fixes: b18dc5f291c0 ("mm, oom: skip vforked tasks from being selected")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Some architectures prefix all functions with a constant string ('.' on
ppc64). Add ARCH_FUNC_PREFIX, which may optionally be defined in
<asm/kfence.h>, so that get_stack_skipnr() can work properly.
Link: https://lkml.kernel.org/r/f036c53d-7e81-763c-47f4-6024c6c5f058@csgroup.eu
Link: https://lkml.kernel.org/r/20210304144000.1148590-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
cache_alloc_debugcheck_after() performs checks on an object, including
adjusting the returned pointer. None of this should apply to KFENCE
objects. While for non-bulk allocations, the checks are skipped when we
allocate via KFENCE, for bulk allocations cache_alloc_debugcheck_after()
is called via cache_alloc_debugcheck_after_bulk().
Fix it by skipping cache_alloc_debugcheck_after() for KFENCE objects.
Link: https://lkml.kernel.org/r/20210304205256.2162309-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Use %td for ptrdiff_t.
Link: https://lkml.kernel.org/r/3abbe4c9-16ad-c168-a90f-087978ccd8f7@csgroup.eu
Link: https://lkml.kernel.org/r/20210303121157.3430807-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|